# DuckDuckGo Image Scraper

This was originally an image scraper notebook for creating deep learning datasets.

It has since been turned into an installable library, which is much easier to use: you can drop a few lines of code into your own notebook while you experiment.

This notebook now shows you how to use the library.

Docs are at [joedockrill.github.io/jmd_imagescraper/](https://joedockrill.github.io/jmd_imagescraper/)

Hugs & kisses, Joe Dockrill.

## Install



In [None]:
!pip install -q jmd_imagescraper

## Download images

In [None]:
from pathlib import Path
root = Path().cwd()/"images"

from jmd_imagescraper.core import * # don't worry, it's designed to work with import *

duckduckgo_search(root, "mainan_puzzle", "puzzle", max_results=5000)
# duckduckgo_search(root, "Dogs", "cute puppies", max_results=10)
# duckduckgo_search(root, "Birds", "cute baby ducks and chickens", max_results=10)

# file paths are returned so if you want to snag a list of downloaded files as you go, do this:

# images = []
# images.extend(duckduckgo_search(root, "Cats", "cute kittens", max_results=10))
# images.extend(duckduckgo_search(root, "Dogs", "cute puppies", max_results=10))
# images.extend(duckduckgo_search(root, "Birds", "cute baby ducks and chickens", max_results=10))
# images

Duckduckgo search: puzzle
Downloading results into /content/images/mainan_puzzle


[PosixPath('/content/images/mainan_puzzle/001_10cb919f.jpg'),
 PosixPath('/content/images/mainan_puzzle/002_bb8f1207.jpg'),
 PosixPath('/content/images/mainan_puzzle/003_7b8aee9e.jpg'),
 PosixPath('/content/images/mainan_puzzle/004_68d4ae2f.jpg'),
 PosixPath('/content/images/mainan_puzzle/005_34590e6e.jpg'),
 PosixPath('/content/images/mainan_puzzle/006_aba9dea4.jpg'),
 PosixPath('/content/images/mainan_puzzle/007_fe3bccd2.jpg'),
 PosixPath('/content/images/mainan_puzzle/008_0abf5da2.jpg'),
 PosixPath('/content/images/mainan_puzzle/009_e94020df.jpg'),
 PosixPath('/content/images/mainan_puzzle/010_1e148ef8.jpg'),
 PosixPath('/content/images/mainan_puzzle/011_067236f4.jpg'),
 PosixPath('/content/images/mainan_puzzle/012_7b3400e7.jpg'),
 PosixPath('/content/images/mainan_puzzle/013_119fbf04.jpg'),
 PosixPath('/content/images/mainan_puzzle/014_04a0561a.jpg'),
 PosixPath('/content/images/mainan_puzzle/015_44109b97.jpg'),
 PosixPath('/content/images/mainan_puzzle/016_833ae93b.jpg'),
 PosixPa
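
To check how many files each search actually saved, you can count the images in each label folder. The following is a minimal sketch using only `pathlib`; the helper name `count_images_per_label` and the set of image suffixes are assumptions made for illustration and are not part of jmd_imagescraper.

```python
from collections import Counter
from pathlib import Path

# Assumed set of image extensions; adjust it to match the files you expect.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".webp"}

def count_images_per_label(root):
    """Return a Counter mapping each label folder under `root` to its image count."""
    counts = Counter()
    root = Path(root)
    if not root.is_dir():
        return counts  # nothing has been downloaded yet
    for label_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[label_dir.name] = sum(
            1
            for f in label_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts

# Uses the same "images" folder under the working directory as the cell above.
print(dict(count_images_per_label(Path.cwd() / "images")))
```

A count well below `max_results` for a label suggests the search returned fewer usable images than requested.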

## Changing params across multiple searches

In [None]:
# If you're going to override default params across multiple searches you can use a
# dictionary like this (so you can change search params for the entire dataset once).

params = {
    "max_results": 10,             # this can go up to 477 at the time of writing
    "img_size":    ImgSize.Cached,
    "img_type":    ImgType.Photo,
    "img_layout":  ImgLayout.Square,
    "img_color":   ImgColor.Purple
}

duckduckgo_search(root, "Nice", "nice clowns", **params)
duckduckgo_search(root, "Scary", "scary clowns", **params)

## Deleting all images

In [None]:
from shutil import rmtree  # without this import, rmtree(root) raises NameError

rmtree(root)

## Displaying the image cleaner

Use this to get rid of unsuitable images without leaving your notebook.

In [None]:
from jmd_imagescraper.imagecleaner import *

display_image_cleaner(root)

## Create a zip to download or transfer to google drive

In [None]:
# create zip

ZIP_NAME = "mainanv1.zip" # maybe change this?

!rm -f {ZIP_NAME}
!zip -q -r {ZIP_NAME} {root}
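
The shell commands above rely on `rm` and `zip` being available, as they are on Colab. If you run the notebook somewhere without them, a pure-Python alternative is possible with the standard library's `shutil.make_archive`. This is a sketch, not what the original notebook does; the helper name `zip_folder` is an assumption for illustration.

```python
import shutil
from pathlib import Path

def zip_folder(folder, zip_name):
    """Zip `folder` (including the folder itself) and return the archive's path."""
    folder = Path(folder).resolve()
    target = Path(zip_name).resolve()
    # make_archive appends ".zip" itself, so strip that suffix if it is present.
    base = target.with_suffix("") if target.suffix.lower() == ".zip" else target
    archive = shutil.make_archive(
        str(base),
        "zip",
        root_dir=folder.parent,  # stored paths are relative to the folder's parent
        base_dir=folder.name,    # so entries start with e.g. "images/"
    )
    return Path(archive)

# Example call, using the names from the cell above:
# zip_folder(root, ZIP_NAME)
```

Note that entries in this archive start at the folder name (for example `images/...`), whereas `zip -r` with an absolute path stores the longer path it was given.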

In [None]:
# download to your local system

from google.colab import files
files.download(ZIP_NAME)

In [None]:
# copy to google drive

from google.colab import drive
import shutil

DESTINATION_FOLDER = "Datasets" # where would you like this in Google Drive?

drive.mount("/content/drive")
folder = Path("/content/drive/My Drive")/DESTINATION_FOLDER
folder.mkdir(parents=True, exist_ok=True)

shutil.copyfile(ZIP_NAME, str(folder/ZIP_NAME))

## Create a CSV file of URLs

If you'd rather distribute a file of image URLs and labels and let people download the images themselves, you can create it here.

In [None]:
CSV_NAME = "images.csv" # maybe change this?

!rm -f {CSV_NAME}

csv = Path.cwd()/CSV_NAME
save_urls_to_csv(csv, "Nice", "nice clowns", max_results=5)
save_urls_to_csv(csv, "Scary", "scary clowns", max_results=5)
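
Before sharing the file, it can help to look at a few rows and the total row count. The sketch below makes no assumption about the column names or whether a header row is present; the helper name `preview_rows` is hypothetical. The `csv` module is imported under an alias because the cell above binds the name `csv` to a path.

```python
import csv as csv_module  # aliased: the notebook already uses `csv` for the file path
from pathlib import Path

def preview_rows(path, n=3):
    """Return (first n non-empty rows, total number of non-empty rows) of a CSV file."""
    with Path(path).open(newline="", encoding="utf-8") as f:
        rows = [row for row in csv_module.reader(f) if row]
    return rows[:n], len(rows)

# Example call, using the file name from the cell above:
# first_rows, total = preview_rows(Path.cwd() / "images.csv")
# print(total, first_rows)
```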