# DuckDuckGo Image Scraper

This was originally an image scraper notebook for creating deep learning datasets.

It has since been turned into an installable library, which is much easier to use: you can drop a few lines of code into your own notebook as you experiment.

This notebook now shows you how to use the library.

Docs are at [joedockrill.github.io/jmd_imagescraper/](https://joedockrill.github.io/jmd_imagescraper/)

Hugs & kisses, Joe Dockrill. 

## Install



In [None]:
!pip install -q jmd_imagescraper


## Download images

In [None]:
!rm -rf images  # start clean; -f avoids an error if the folder doesn't exist yet

In [None]:
from pathlib import Path
root = Path.cwd()/"images"

from jmd_imagescraper.core import * # don't worry, it's designed to work with import *

num = 50

duckduckgo_search(root, "initial", "empty glass", max_results=num)
duckduckgo_search(root, "positive", "glass of orange juice", max_results=num)
duckduckgo_search(root, "negative", "glass of drink", max_results=num)

# file paths are returned so if you want to snag a list of downloaded files as you go, do this:

# images = []
# images.extend(duckduckgo_search(root, "Cats", "cute kittens", max_results=10))
# images.extend(duckduckgo_search(root, "Dogs", "cute puppies", max_results=10))
# images.extend(duckduckgo_search(root, "Birds", "cute baby ducks and chickens", max_results=10))
# images

Duckduckgo search: empty glass
Downloading results into /content/images/initial


Duckduckgo search: glass of orange juice
Downloading results into /content/images/positive


Duckduckgo search: glass of drink
Downloading results into /content/images/negative


[PosixPath('/content/images/negative/051_e72a56cd.jpg'),
 PosixPath('/content/images/negative/052_0b378ae4.jpg'),
 PosixPath('/content/images/negative/053_4b268f89.jpg'),
 PosixPath('/content/images/negative/054_e36b00bc.jpg'),
 PosixPath('/content/images/negative/055_847d4bdf.jpg'),
 PosixPath('/content/images/negative/056_41849ab8.jpg'),
 PosixPath('/content/images/negative/057_123bb6db.jpg'),
 PosixPath('/content/images/negative/058_e129a31a.jpg'),
 PosixPath('/content/images/negative/059_126924fb.jpg'),
 PosixPath('/content/images/negative/060_08ff0f7b.jpg'),
 PosixPath('/content/images/negative/061_660f6e02.jpg'),
 PosixPath('/content/images/negative/062_3b9d890d.jpg'),
 PosixPath('/content/images/negative/063_c1eac779.jpg'),
 PosixPath('/content/images/negative/064_3310bc27.jpg'),
 PosixPath('/content/images/negative/065_94ec37dd.jpg'),
 PosixPath('/content/images/negative/066_d535aeb5.jpg'),
 PosixPath('/content/images/negative/067_ee3d7afb.jpg'),
 PosixPath('/content/images/neg
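
After a download it can be handy to check how many files actually landed in each label folder, since a search may return fewer results than `max_results`. The following is a minimal sketch that uses only the standard library and is not part of jmd_imagescraper; the helper name `count_images_per_label` and the list of image extensions are assumptions made for illustration.

```python
from collections import Counter
from pathlib import Path

# Assumed set of image extensions; adjust to whatever your downloads contain.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".webp"}

def count_images_per_label(root):
    """Return {label_folder_name: number_of_image_files} for each subfolder of root."""
    counts = Counter()
    # "*/*" matches entries exactly one level below root, i.e. root/<label>/<file>
    for path in Path(root).glob("*/*"):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            counts[path.parent.name] += 1
    return dict(counts)
```

Calling `count_images_per_label(root)` after the searches above would return a dictionary keyed by `initial`, `positive` and `negative`.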

## Changing params across multiple searches

In [None]:
# If you're going to override default params across multiple searches you can use a 
# dictionary like this (so you can change search params for the entire dataset once).

params = {
    "max_results": 10,             # this can go up to 477 at the time of writing
    "img_size":    ImgSize.Cached, 
    "img_type":    ImgType.Photo,
    "img_layout":  ImgLayout.Square,
    "img_color":   ImgColor.Purple
}

duckduckgo_search(root, "Nice", "nice clowns", **params)
duckduckgo_search(root, "Scary", "scary clowns", **params)

Duckduckgo search: nice clowns
Downloading results into /content/images/Nice


Duckduckgo search: scary clowns
Downloading results into /content/images/Scary


[PosixPath('/content/images/Scary/001_e6a1781b.jpg'),
 PosixPath('/content/images/Scary/002_57e8ce1a.jpg'),
 PosixPath('/content/images/Scary/003_fc2c589b.jpg'),
 PosixPath('/content/images/Scary/004_73b8a3dd.jpg'),
 PosixPath('/content/images/Scary/005_9c438a61.jpg'),
 PosixPath('/content/images/Scary/006_0f359931.jpg'),
 PosixPath('/content/images/Scary/007_edc456f1.jpg'),
 PosixPath('/content/images/Scary/008_7a939d3b.jpg'),
 PosixPath('/content/images/Scary/009_d32354ad.jpg'),
 PosixPath('/content/images/Scary/010_5b463cca.jpg')]
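
If one search needs a different setting from the rest, the shared dictionary can be merged with an override at the call site. The sketch below shows the plain Python unpacking involved. `fake_search` is a hypothetical stand-in that only echoes its arguments (it is not the library's function), and plain strings replace the library's enums so the example runs offline.

```python
# Hypothetical stand-in for duckduckgo_search: it only records the keyword
# arguments it receives, so the unpacking behaviour can be checked offline.
def fake_search(root, label, keywords, **kwargs):
    return {"label": label, "keywords": keywords, **kwargs}

# Shared settings; plain strings stand in for the library's enum values.
shared = {"max_results": 10, "img_layout": "Square"}

# Every call receives the shared settings...
nice = fake_search("images", "Nice", "nice clowns", **shared)

# ...and one call can override a single key without modifying the shared dict.
scary = fake_search("images", "Scary", "scary clowns", **{**shared, "max_results": 25})
```

Later keys win when dictionaries are merged with `{**a, **b}`, so the override has to come after `**shared`.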

## Deleting all images

In [None]:
rmtree(root) # deletes the images folder and everything in it

## Displaying the image cleaner

Use this to get rid of unsuitable images without leaving your notebook.

In [None]:
from jmd_imagescraper.imagecleaner import *

display_image_cleaner(root)


## Create a zip to download or transfer to Google Drive

In [None]:
# create zip

ZIP_NAME = "images.zip" # maybe change this?

!rm -f {ZIP_NAME}
!zip -q -r {ZIP_NAME} {root}
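
Outside Colab the `zip` command may not be available. A portable alternative, sketched here with the standard library's `shutil.make_archive`, produces a similar archive; the helper name `zip_folder` is an assumption for illustration. Unlike the shell command above, paths inside this archive start at the folder name rather than at the full absolute path.

```python
import shutil
from pathlib import Path

def zip_folder(folder, zip_name="images.zip"):
    """Zip `folder` (including the folder itself) and return the archive path."""
    folder = Path(folder).resolve()
    zip_path = Path(zip_name).resolve()
    # make_archive appends ".zip" itself, so pass the name without the suffix.
    archive = shutil.make_archive(
        str(zip_path.with_suffix("")),  # base name of the archive
        "zip",
        root_dir=folder.parent,         # paths in the archive are relative to this
        base_dir=folder.name,           # only this folder is archived
    )
    return Path(archive)
```

In this notebook the call would be `zip_folder(root, ZIP_NAME)`. An existing archive with the same name is overwritten.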

In [None]:
# download to your local system

from google.colab import files
files.download(ZIP_NAME)


In [None]:
# copy to Google Drive

from google.colab import drive
import shutil

DESTINATION_FOLDER = "Datasets" # where would you like this in Google Drive?

drive.mount("/content/drive") 
folder = Path("/content/drive/My Drive")/DESTINATION_FOLDER
folder.mkdir(parents=True, exist_ok=True)

shutil.copyfile(ZIP_NAME, str(folder/ZIP_NAME))

## Create a CSV file of URLs

If you'd rather distribute a file with the image URLs and labels and have people download the images themselves, you can do so here.

In [None]:
CSV_NAME = "images.csv" # maybe change this?

!rm -f {CSV_NAME}

csv = Path.cwd()/CSV_NAME
save_urls_to_csv(csv, "Nice", "nice clowns", max_results=5)
save_urls_to_csv(csv, "Scary", "scary clowns", max_results=5)
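
To sanity-check the file before sharing it, you could read it back and group the URLs by label. This is a rough sketch using only the standard library. It assumes the CSV has a header row with columns named `URL` and `Label`, which is a guess about the file layout, so check the actual header and pass different column names if needed. The helper name `urls_by_label` is also made up for illustration.

```python
from collections import defaultdict
from csv import DictReader  # imported by name so the notebook's `csv` variable is left alone

def urls_by_label(csv_path, url_col="URL", label_col="Label"):
    """Group image URLs by label. The default column names are assumptions."""
    grouped = defaultdict(list)
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in DictReader(f):
            grouped[row[label_col]].append(row[url_col])
    return dict(grouped)
```

With the file written above, `urls_by_label(csv)` would return one list of URLs per label, provided the column-name assumption holds.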