# DuckDuckGo Image Scraper

This was originally an image scraper notebook for creating deep learning datasets.

It has since been turned into an installable library, which is much easier to use: you can drop a few lines of code into your own notebook while you're experimenting.

This notebook now shows you how to use the library.

Docs are at [joedockrill.github.io/jmd_imagescraper/](https://joedockrill.github.io/jmd_imagescraper/)

Hugs & kisses, Joe Dockrill.

## Install



In [None]:
!pip install -q jmd_imagescraper

In [None]:
from google.colab import drive
drive.mount('/content/drive')

KeyboardInterrupt: ignored

In [None]:
import pandas as pd

file_path = '/content/drive/My Drive/catbreeds_wcf.csv'

# Read the CSV without a header row; the breed names are in the first column (0)
df = pd.read_csv(file_path, header=None)

# Strip semicolons and non-breaking spaces, and expand the coat-length abbreviations.
# regex=False treats each pattern as a literal string on every pandas version.
df[0] = df[0].str.replace(';', '', regex=False)
df[0] = df[0].str.replace('SH', 'shorthair', regex=False)
df[0] = df[0].str.replace('LH', 'longhair', regex=False)
df[0] = df[0].str.replace('\xa0', '', regex=False)

# Drop rows that are NaN or empty after the replacements, then renumber the index
df = df.dropna()
df = df[df[0] != ''].reset_index(drop=True)

# Convert the DataFrame column to a list
cleaned_strings = df[0].tolist()
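
The chained replacements above are order-sensitive: removing a non-breaking space outright can glue two words together, which is one plausible source of labels such as `Ojos Azulesshorthair` in the search log further down. Here is a minimal sketch of a more defensive cleaner. The function name `clean_breed_label` is hypothetical, the sample inputs are made up for illustration, and it assumes `SH` and `LH` only need expanding when they stand alone as words.

```python
import re

def clean_breed_label(raw: str) -> str:
    """Normalise one raw breed label (hypothetical helper, not part of jmd_imagescraper)."""
    # Turn non-breaking spaces and semicolons into ordinary spaces first.
    text = raw.replace("\xa0", " ").replace(";", " ")
    # Expand the coat-length abbreviations only when they stand alone as words.
    text = re.sub(r"\bSH\b", "shorthair", text)
    text = re.sub(r"\bLH\b", "longhair", text)
    # Collapse runs of whitespace and trim the ends.
    return " ".join(text.split())

# Illustrative inputs only; the real values come from the CSV.
labels = [clean_breed_label(s) for s in ["Ojos Azules\xa0SH;", "American Curl LH;", ";"]]
labels = [s for s in labels if s]  # drop labels that are empty after cleaning
print(labels)
```

Applied to the DataFrame, this could replace the four `str.replace` calls with `df[0].map(clean_breed_label)`, assuming every value in the column is a string.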

In [None]:
cleaned_strings

['Abyssinian',
 'American Bobtail shorthair',
 'American Bobtail longhair',
 'American Curl shorthair',
 'American Curl longhair',
 'American Shorthair',
 'American Wirehair',
 'Anatoli',
 "Aphrodite's Giant shorthair",
 "Aphrodite's Giant longhair",
 'Arabian Mau',
 'Asian',
 'Australian Mist',
 'Balinese',
 'Bengal',
 'Bombay',
 'Brazilian Shorthair',
 'British Shorthair',
 'British Longhair',
 'Burmese',
 'Burmilla',
 'Burmilla longhair',
 'Celtic Shorthair',
 'Ceylon',
 'Chartreux',
 'Chausie',
 'Chinese Li Hau',
 'Classicat',
 'Colourpoint',
 'Colourpoint Shorthair',
 'Cornish Rex',
 'Cymric',
 'Devon Rex',
 'Deutsch Langhaar',
 'Don Sphynx',
 'Egyptian Mau',
 'Exotic Shorthair',
 'Foreign White shorthair',
 'Foreign White longhair',
 'German Rex',
 'Household pet',
 'Havana',
 'Highland Fold',
 'Highland Straight',
 'Highlander shorthair',
 'Highlander longhair',
 'Japanese Bobtail shorthair',
 'Japanese Bobtail longhair',
 'Kanaani',
 'Karelian Bobtail shorthair',
 'Karelian Bob

In [None]:
from pathlib import Path
root = Path().cwd()/"images"

from jmd_imagescraper.core import * # don't worry, it's designed to work with import *

# Loop through the cleaned strings and perform the search
for string in cleaned_strings:
    # Ensure that the string is not just whitespace
    if string.strip():
        duckduckgo_search(root, string, f"{string} cat", max_results=1000)
        # ... handle the search results

Duckduckgo search: Abyssinian cat
Downloading results into /content/images/Abyssinian


Duckduckgo search: American Bobtail shorthair cat
Downloading results into /content/images/American Bobtail shorthair


Duckduckgo search: American Bobtail longhair cat
Downloading results into /content/images/American Bobtail longhair


Duckduckgo search: American Curl shorthair cat
Downloading results into /content/images/American Curl shorthair


Duckduckgo search: American Curl longhair cat
Downloading results into /content/images/American Curl longhair


Duckduckgo search: American Shorthair cat
Downloading results into /content/images/American Shorthair


Duckduckgo search: American Wirehair cat
Downloading results into /content/images/American Wirehair


Duckduckgo search: Anatoli cat
Downloading results into /content/images/Anatoli


Duckduckgo search: Aphrodite's Giant shorthair cat
Downloading results into /content/images/Aphrodite's Giant shorthair


Duckduckgo search: Aphrodite's Giant longhair cat
Downloading results into /content/images/Aphrodite's Giant longhair


Duckduckgo search: Arabian Mau cat
Downloading results into /content/images/Arabian Mau


Duckduckgo search: Asian cat
Downloading results into /content/images/Asian


Duckduckgo search: Australian Mist cat
Downloading results into /content/images/Australian Mist


Duckduckgo search: Balinese cat
Downloading results into /content/images/Balinese


Duckduckgo search: Bengal cat
Downloading results into /content/images/Bengal


Duckduckgo search: Bombay cat
Downloading results into /content/images/Bombay


Duckduckgo search: Brazilian Shorthair cat
Downloading results into /content/images/Brazilian Shorthair


Duckduckgo search: British Shorthair cat
Downloading results into /content/images/British Shorthair


Duckduckgo search: British Longhair cat
Downloading results into /content/images/British Longhair


Duckduckgo search: Burmese cat
Downloading results into /content/images/Burmese


Duckduckgo search: Burmilla cat
Downloading results into /content/images/Burmilla


Duckduckgo search: Burmilla longhair cat
Downloading results into /content/images/Burmilla longhair


Duckduckgo search: Celtic Shorthair cat
Downloading results into /content/images/Celtic Shorthair


Duckduckgo search: Ceylon cat
Downloading results into /content/images/Ceylon


Duckduckgo search: Chartreux cat
Downloading results into /content/images/Chartreux


Duckduckgo search: Chausie cat
Downloading results into /content/images/Chausie


Duckduckgo search: Chinese Li Hau cat
Downloading results into /content/images/Chinese Li Hau


/content/images/Chinese Li Hau/354_0345b464.jpg is invalid
Duckduckgo search: Classicat cat
Downloading results into /content/images/Classicat


Duckduckgo search: Colourpoint cat
Downloading results into /content/images/Colourpoint


Duckduckgo search: Colourpoint Shorthair cat
Downloading results into /content/images/Colourpoint Shorthair


Duckduckgo search: Cornish Rex cat
Downloading results into /content/images/Cornish Rex


Duckduckgo search: Cymric cat
Downloading results into /content/images/Cymric


Duckduckgo search: Devon Rex cat
Downloading results into /content/images/Devon Rex


Duckduckgo search: Deutsch Langhaar cat
Downloading results into /content/images/Deutsch Langhaar


Duckduckgo search: Don Sphynx cat
Downloading results into /content/images/Don Sphynx


Duckduckgo search: Egyptian Mau cat
Downloading results into /content/images/Egyptian Mau


Duckduckgo search: Exotic Shorthair cat
Downloading results into /content/images/Exotic Shorthair


Duckduckgo search: Foreign White shorthair cat
Downloading results into /content/images/Foreign White shorthair


Duckduckgo search: Foreign White longhair cat
Downloading results into /content/images/Foreign White longhair


Duckduckgo search: German Rex cat
Downloading results into /content/images/German Rex


Duckduckgo search: Household pet cat
Downloading results into /content/images/Household pet


Duckduckgo search: Havana cat
Downloading results into /content/images/Havana


Duckduckgo search: Highland Fold cat
Downloading results into /content/images/Highland Fold


Duckduckgo search: Highland Straight cat
Downloading results into /content/images/Highland Straight


Duckduckgo search: Highlander shorthair cat
Downloading results into /content/images/Highlander shorthair


Duckduckgo search: Highlander longhair cat
Downloading results into /content/images/Highlander longhair


Duckduckgo search: Japanese Bobtail shorthair cat
Downloading results into /content/images/Japanese Bobtail shorthair


Duckduckgo search: Japanese Bobtail longhair cat
Downloading results into /content/images/Japanese Bobtail longhair


Duckduckgo search: Kanaani cat
Downloading results into /content/images/Kanaani


Duckduckgo search: Karelian Bobtail shorthair cat
Downloading results into /content/images/Karelian Bobtail shorthair


Duckduckgo search: Karelian Bobtail longhair cat
Downloading results into /content/images/Karelian Bobtail longhair


Duckduckgo search: Korat cat
Downloading results into /content/images/Korat


Duckduckgo search: Kurilian Bobtail shorthair cat
Downloading results into /content/images/Kurilian Bobtail shorthair


Duckduckgo search: Kurilian Bobtail longhair cat
Downloading results into /content/images/Kurilian Bobtail longhair


Duckduckgo search: LaPerm shorthair cat
Downloading results into /content/images/LaPerm shorthair


Duckduckgo search: LaPerm longhair cat
Downloading results into /content/images/LaPerm longhair


Duckduckgo search: Lykoy cat
Downloading results into /content/images/Lykoy


Duckduckgo search: Maine Coon cat
Downloading results into /content/images/Maine Coon


Duckduckgo search: Mandalay cat
Downloading results into /content/images/Mandalay


Duckduckgo search: Manx cat
Downloading results into /content/images/Manx


Duckduckgo search: Mekong Bobtail cat
Downloading results into /content/images/Mekong Bobtail


Duckduckgo search: Minskin cat
Downloading results into /content/images/Minskin


Duckduckgo search: Munchkin shorthair cat
Downloading results into /content/images/Munchkin shorthair


Duckduckgo search: Munchkin longhair cat
Downloading results into /content/images/Munchkin longhair


Duckduckgo search: Nebelung cat
Downloading results into /content/images/Nebelung


Duckduckgo search: Neva Masquerade cat
Downloading results into /content/images/Neva Masquerade


Duckduckgo search: Norwegian Forest cat
Downloading results into /content/images/Norwegian Forest


Duckduckgo search: Ocicat cat
Downloading results into /content/images/Ocicat


Duckduckgo search: Ojos Azulesshorthair cat
Downloading results into /content/images/Ojos Azulesshorthair


Duckduckgo search: Ojos Azules longhair cat
Downloading results into /content/images/Ojos Azules longhair


Duckduckgo search: Oriental (Semi-) Longhair cat
Downloading results into /content/images/Oriental (Semi-) Longhair


Duckduckgo search: Oriental Shorthair cat
Downloading results into /content/images/Oriental Shorthair


Duckduckgo search: Original Longhair cat
Downloading results into /content/images/Original Longhair


Duckduckgo search: Persian cat
Downloading results into /content/images/Persian


Duckduckgo search: Peterbald cat
Downloading results into /content/images/Peterbald


Duckduckgo search: Pixiebob shorthair cat
Downloading results into /content/images/Pixiebob shorthair


Duckduckgo search: Pixiebob longhair cat
Downloading results into /content/images/Pixiebob longhair


Duckduckgo search: Ragamuffin cat
Downloading results into /content/images/Ragamuffin


Duckduckgo search: Ragdoll cat
Downloading results into /content/images/Ragdoll


Duckduckgo search: Russian cat
Downloading results into /content/images/Russian


Duckduckgo search: Russian Blue cat
Downloading results into /content/images/Russian Blue


Duckduckgo search: Sacred Birman cat
Downloading results into /content/images/Sacred Birman


Duckduckgo search: Savannah cat
Downloading results into /content/images/Savannah


Duckduckgo search: Scottish Fold cat
Downloading results into /content/images/Scottish Fold


Duckduckgo search: Scottish Straight cat
Downloading results into /content/images/Scottish Straight


Duckduckgo search: Selkirk Rex shorthair cat
Downloading results into /content/images/Selkirk Rex shorthair


Duckduckgo search: Selkirk Rex longhair cat
Downloading results into /content/images/Selkirk Rex longhair


Duckduckgo search: Serengeti cat
Downloading results into /content/images/Serengeti


Duckduckgo search: Siamese cat
Downloading results into /content/images/Siamese


Duckduckgo search: Siberian cat cat
Downloading results into /content/images/Siberian cat


Duckduckgo search: Singapura cat
Downloading results into /content/images/Singapura


Duckduckgo search: Snowshoe cat
Downloading results into /content/images/Snowshoe


Duckduckgo search: Sokoke cat
Downloading results into /content/images/Sokoke


Duckduckgo search: Somali cat
Downloading results into /content/images/Somali


Duckduckgo search: Sphynx cat
Downloading results into /content/images/Sphynx


Duckduckgo search: Thai cat
Downloading results into /content/images/Thai


Duckduckgo search: Tiffanie cat
Downloading results into /content/images/Tiffanie


Duckduckgo search: Tonkanese shorthair cat
Downloading results into /content/images/Tonkanese shorthair


Duckduckgo search: Tonkanese longhair cat
Downloading results into /content/images/Tonkanese longhair


Duckduckgo search: Toybob cat
Downloading results into /content/images/Toybob


Duckduckgo search: Toyger cat
Downloading results into /content/images/Toyger


Exception occured while retrieving https://tse1.mm.bing.net/th?id=OIP.Qz3CnBm2TfvQEIXOeuPruAHaG4&pid=Api
Duckduckgo search: Turkish Angora cat
Downloading results into /content/images/Turkish Angora


Duckduckgo search: Turkish Van cat
Downloading results into /content/images/Turkish Van


Duckduckgo search: Turkish Vankedisi cat
Downloading results into /content/images/Turkish Vankedisi


Duckduckgo search: Ural Rex shorthair cat
Downloading results into /content/images/Ural Rex shorthair


Duckduckgo search: Ural Rex longhair cat
Downloading results into /content/images/Ural Rex longhair


Duckduckgo search: York cat
Downloading results into /content/images/York
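
The log above includes the query `Siberian cat cat`, because the loop appends `cat` to a label that already ends with it. Below is a small sketch of a query builder that avoids this. `build_query` is a hypothetical helper, and the matching rule (a case-insensitive check on the last word only) is an assumption about what counts as a duplicate.

```python
def build_query(label: str, suffix: str = "cat") -> str:
    """Return a search query for a label, adding the suffix only when it is missing."""
    words = label.split()
    if words and words[-1].lower() == suffix.lower():
        # The label already ends with the suffix, so leave it alone.
        return " ".join(words)
    return " ".join(words + [suffix])

for label in ["Abyssinian", "Siberian cat", "Ocicat"]:
    print(f"{label!r} -> {build_query(label)!r}")
```

The result could be passed as the search term in the loop, for example `duckduckgo_search(root, string, build_query(string), max_results=1000)`, which keeps the folder name unchanged.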


In [None]:
import shutil
import os

# Define your Colab folder where all subdirectories are located
colab_folder = '/content/images'
# Define the Drive target directory to copy these subdirectories into
drive_target_dir = '/content/drive/My Drive/Cat_Breeds'

# Create the target directory in Drive if it doesn't exist
os.makedirs(drive_target_dir, exist_ok=True)

# Loop through all subdirectories in the Colab folder
for breed_dir in os.listdir(colab_folder):
    # Full path to the subdirectory in Colab
    source_breed_path = os.path.join(colab_folder, breed_dir)
    # Full path to the target subdirectory in Drive
    destination_breed_path = os.path.join(drive_target_dir, breed_dir)

    # Ensure the path is a directory before copying
    if os.path.isdir(source_breed_path):
        # If the target subdirectory already exists, it must be removed before copytree can be used
        if os.path.exists(destination_breed_path):
            shutil.rmtree(destination_breed_path)
        # Copy the whole subdirectory to Drive
        shutil.copytree(source_breed_path, destination_breed_path)

print("All directories have been copied to Google Drive.")


All directories have been copied to Google Drive.
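
The remove-then-copy pattern above can be shortened on Python 3.8 or later, where `shutil.copytree` accepts `dirs_exist_ok=True`. This is a minimal sketch, assuming that merging into an existing destination is acceptable: files with the same name are overwritten, but extra files already in the destination are left in place. `copy_subdirs` is a hypothetical helper name, and the demo uses temporary directories rather than the real Colab and Drive paths.

```python
import shutil
import tempfile
from pathlib import Path

def copy_subdirs(source: Path, target: Path) -> list:
    """Copy every subdirectory of source into target and return the names copied."""
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for child in sorted(source.iterdir()):
        if child.is_dir():
            # dirs_exist_ok=True merges into an existing folder instead of raising.
            shutil.copytree(child, target / child.name, dirs_exist_ok=True)
            copied.append(child.name)
    return copied

# Demo on throwaway directories standing in for /content/images and the Drive folder.
with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp) / "images", Path(tmp) / "drive"
    (src / "Korat").mkdir(parents=True)
    (src / "Korat" / "001.jpg").write_bytes(b"demo")
    (src / "notes.txt").write_text("not a directory, so it is skipped")
    print(copy_subdirs(src, dst))  # first run creates the folders
    print(copy_subdirs(src, dst))  # second run merges without raising
```

Unlike `rmtree` followed by `copytree`, this never deletes anything in the destination, so stale images from an earlier run would survive.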


In [None]:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive


In [None]:
colab_folder = '/content/images'
print(f"Subdirectories in '{colab_folder}':")
print(os.listdir(colab_folder))

Subdirectories in '/content/images':
['Somali', 'Highlander longhair', 'Korat', 'Tiffanie', 'Mandalay', 'Munchkin longhair', 'British Longhair', 'American Bobtail longhair', 'Burmese', 'Burmilla longhair', 'Cornish Rex', 'Exotic Shorthair', 'Chinese Li Hau', 'Ural Rex longhair', 'Abyssinian', 'Sphynx', 'Sokoke', 'Pixiebob longhair', "Aphrodite's Giant shorthair", 'Foreign White shorthair', 'Celtic Shorthair', 'Serengeti', 'Siberian cat', 'Chartreux', 'Savannah', 'Deutsch Langhaar', 'Kanaani', 'German Rex', 'Highland Fold', 'Classicat', 'Persian', 'Havana', 'York', 'Ragamuffin', 'Australian Mist', 'Russian', 'Colourpoint', 'Highland Straight', 'LaPerm longhair', 'Oriental (Semi-) Longhair', 'Brazilian Shorthair', 'Minskin', 'Munchkin shorthair', 'Japanese Bobtail longhair', 'Sacred Birman', 'Karelian Bobtail longhair', 'Mekong Bobtail', 'Karelian Bobtail shorthair', 'Kurilian Bobtail shorthair', 'Tonkanese shorthair', 'Kurilian Bobtail longhair', 'Egyptian Mau', 'Bengal', 'Scottish Stra

In [None]:
for breed_dir in os.listdir(colab_folder):
    source_breed_path = os.path.join(colab_folder, breed_dir)
    destination_breed_path = os.path.join(drive_target_dir, breed_dir)

    print(f"Attempting to copy {source_breed_path} to {destination_breed_path}")

    if os.path.isdir(source_breed_path):
        if os.path.exists(destination_breed_path):
            shutil.rmtree(destination_breed_path)
        shutil.copytree(source_breed_path, destination_breed_path)
        print(f"Copied {breed_dir}")
    else:
        print(f"Skipped {breed_dir} because it is not a directory")


Attempting to copy /content/images/Somali to /content/drive/My Drive/Cat_Breeds/Somali
Copied Somali
Attempting to copy /content/images/Highlander longhair to /content/drive/My Drive/Cat_Breeds/Highlander longhair
Copied Highlander longhair
Attempting to copy /content/images/Korat to /content/drive/My Drive/Cat_Breeds/Korat
Copied Korat
Attempting to copy /content/images/Tiffanie to /content/drive/My Drive/Cat_Breeds/Tiffanie
Copied Tiffanie
Attempting to copy /content/images/Mandalay to /content/drive/My Drive/Cat_Breeds/Mandalay
Copied Mandalay
Attempting to copy /content/images/Munchkin longhair to /content/drive/My Drive/Cat_Breeds/Munchkin longhair
Copied Munchkin longhair
Attempting to copy /content/images/British Longhair to /content/drive/My Drive/Cat_Breeds/British Longhair
Copied British Longhair
Attempting to copy /content/images/American Bobtail longhair to /content/drive/My Drive/Cat_Breeds/American Bobtail longhair
Copied American Bobtail longhair
Attempting to copy /cont

## Download images

In [None]:
from pathlib import Path
root = Path().cwd()/"images"

from jmd_imagescraper.core import * # don't worry, it's designed to work with import *



# file paths are returned so if you want to snag a list of downloaded files as you go, do this:

# images = []
# images.extend(duckduckgo_search(root, "Cats", "cute kittens", max_results=10))
# images.extend(duckduckgo_search(root, "Dogs", "cute puppies", max_results=10))
# images.extend(duckduckgo_search(root, "Birds", "cute baby ducks and chickens", max_results=10))
# images
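
Because each search writes into its own subfolder of `root`, the dataset can also be summarised after the fact from the directory tree. This is a rough sketch, assuming images sit directly inside one folder per label. `count_images_per_label` and the extension list are illustrative choices, not part of the library.

```python
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed extensions; adjust as needed

def count_images_per_label(root: Path) -> dict:
    """Map each label folder under root to the number of image files it holds."""
    counts = {}
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[folder.name] = sum(
            1 for f in folder.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts

# Demo on a temporary tree standing in for the real images folder.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    for label, n in [("Cats", 2), ("Dogs", 1)]:
        (demo_root / label).mkdir()
        for i in range(n):
            (demo_root / label / f"{i:03d}.jpg").write_bytes(b"")
    print(count_images_per_label(demo_root))
```

A label with far fewer images than the others is often a sign that the search term needs adjusting.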

## Changing params across multiple searches

In [None]:
# If you're going to override default params across multiple searches you can use a
# dictionary like this (so you can change search params for the entire dataset once).

params = {
    "max_results": 200,             # this can go up to 477 at the time of writing
    "img_size":    ImgSize.Cached,
    "img_type":    ImgType.Photo,
    "img_layout":  ImgLayout.Square,
    "img_color":   ImgColor.Purple
}

duckduckgo_search(root, "Nice", "nice clowns", **params)
duckduckgo_search(root, "Scary", "scary clowns", **params)
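
A shared dictionary can still be adjusted for a single search by building a new dictionary with `{**params, ...}`, which leaves the shared defaults untouched. The sketch below uses a stand-in function, `fake_search`, so it runs without the library installed. It only mirrors the keyword-argument pattern and is not the real `duckduckgo_search`; the option values are placeholders.

```python
def fake_search(folder: str, query: str, **kwargs) -> dict:
    """Stand-in for a search call: just report the options it received."""
    return {"folder": folder, "query": query, **kwargs}

shared = {"max_results": 200, "img_layout": "Square"}  # placeholder values

# Later keys win, so this call overrides max_results without mutating `shared`.
one_off = fake_search("Scary", "scary clowns", **{**shared, "max_results": 50})
default = fake_search("Nice", "nice clowns", **shared)

print(one_off["max_results"], default["max_results"], shared["max_results"])
# prints: 50 200 200
```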

## Deleting all images

In [None]:
rmtree(root)

## Displaying the image cleaner

Use this to get rid of unsuitable images without leaving your notebook.

In [None]:
from jmd_imagescraper.imagecleaner import *

display_image_cleaner(root)

## Create a zip to download or transfer to Google Drive

In [None]:
# create zip

ZIP_NAME = "images_all_911.zip" # maybe change this?

!rm -f {ZIP_NAME}
!zip -q -r {ZIP_NAME} {root}
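
If the `zip` command is not available, the standard library can build the same kind of archive. Below is a minimal sketch using `shutil.make_archive`. `zip_folder` is a hypothetical helper, and the demo zips a temporary folder rather than the real `root`.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path

def zip_folder(folder: Path, zip_path: Path) -> Path:
    """Zip `folder` (keeping its top-level name) into `zip_path` and return the archive path."""
    base_name = str(zip_path.with_suffix(""))  # make_archive appends ".zip" itself
    archive = shutil.make_archive(
        base_name, "zip", root_dir=folder.parent, base_dir=folder.name
    )
    return Path(archive)

# Demo on a temporary folder standing in for the images directory.
with tempfile.TemporaryDirectory() as tmp:
    images = Path(tmp) / "images"
    (images / "Cats").mkdir(parents=True)
    (images / "Cats" / "001.jpg").write_bytes(b"demo")
    archive = zip_folder(images, Path(tmp) / "images_demo.zip")
    with zipfile.ZipFile(archive) as zf:
        print(sorted(zf.namelist()))
```

In the notebook this would be called as `zip_folder(root, Path(ZIP_NAME))`, assuming `ZIP_NAME` ends in `.zip`.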

In [None]:
# download to your local system

from google.colab import files
files.download(ZIP_NAME)


In [None]:
# copy to Google Drive

from google.colab import drive
import shutil

DESTINATION_FOLDER = "Datasets" # where would you like this in Google Drive?

drive.mount("/content/drive")
folder = Path("/content/drive/My Drive")/DESTINATION_FOLDER
folder.mkdir(parents=True, exist_ok=True)

shutil.copyfile(ZIP_NAME, str(folder/ZIP_NAME))

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


'/content/drive/My Drive/Datasets/images_all_911.zip'

## Create a CSV file of URLs

If you'd rather distribute a file of image URLs and labels, and have people download the images themselves, you can create one here.

In [None]:
CSV_NAME = "imageurls.csv" # maybe change this?

!rm -f {CSV_NAME}

csv = Path.cwd()/CSV_NAME
save_urls_to_csv(csv, "Nice", "nice clowns", max_results=5)
save_urls_to_csv(csv, "Scary", "scary clowns", max_results=5)

In [None]:
import shutil
import os

# Define the source folder path in Colab
source_folder_path = '/content/images'  # Replace with your Colab folder path

# Define the destination folder path in Google Drive
destination_drive_path = '/content/drive/My Drive/Webscraping_newtry/'  # Replace with your desired Google Drive folder path

# Use shutil to copy the entire folder and its contents
shutil.copytree(source_folder_path, destination_drive_path)


'/content/drive/My Drive/Webscraping_newtry/'