<a href="https://colab.research.google.com/github/butchland/build-your-own-image-classifier/blob/master/colab-build-image-dataset.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Collect Images for your Image Classifier

## Instructions

1. Fill out the search terms and project name first _(if you want to build the default image classifier, just leave the search terms and project name as is)_ 
1. Click on the `Connect` button on the top right area of the page. This will change into a checkmark with the RAM and Disk health bars once the connection is complete.
1. Press `Cmd/Ctrl+F9` or click on the menu `Runtime/Run all`.
1. Click on the link to `accounts.google.com` that appears and log in to your Google Account if necessary, or select the Google Account to use for your Google Drive. (This will open a new tab.)
1. Authorize `Google Drive File Stream` to access your Google Drive (We will use this to save your collected images to a folder on your Google Drive). 
1. Copy the generated authentication token and paste it into the input box that appears.
1. Let your notebook run all the way to the end. _(You don't need to do anything)_
1. Once the text 'DONE! DONE! DONE!' is printed at the end of the notebook, you can click on the menu `Runtime/Factory reset runtime` and click `Yes` on the dialog box to end your session.

Your image dataset will be saved in your Google Drive under `/My Drive/build-your-own-image-classifier/data/<project-name>/<project-name>.tgz` _(if you didn't change the defaults, it should be under `/My Drive/build-your-own-image-classifier/data/pets/pets.tgz`)_


## What is going on?

This section explains the code behind this notebook.

_(Click on SHOW CODE to display the code)_

### Connect to your Google Drive

We'll need to connect to your Google Drive in order to save the images we'll be collecting later.

In [None]:
#@title {display-mode: "form"}
from google.colab import drive
drive.mount('/content/drive')

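If a later step fails to find your Drive, it can help to check that the mount actually succeeded. The following is a minimal sketch, not part of the original notebook: `drive_is_mounted` is a hypothetical helper, and the folder names it looks for (`My Drive` and `MyDrive`) are an assumption about how Colab exposes the Drive root.

```python
from pathlib import Path

def drive_is_mounted(mount_point="/content/drive"):
    """Return True if a Drive root folder is visible under mount_point."""
    root = Path(mount_point)
    # Assumption: the Drive root shows up as "My Drive" or "MyDrive",
    # depending on the Colab version.
    return any((root / name).is_dir() for name in ("My Drive", "MyDrive"))

# Prints False when run outside Colab or before drive.mount() has completed
print(drive_is_mounted())
```
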
### Specify Search Terms and Project Names

Fill out the `search terms` and `project name`. The project name is used for both the folder and the archive file name. The images for each search term are downloaded into their own subfolder, and all subfolders are then zipped and stored together.

In [None]:
#@title Enter your search terms and project name {display-mode: "form"}
search_terms = "cats,dogs" #@param {type: "string"}
project = "pets" #@param {type: "string"}

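The search terms are entered as one comma-separated string and split into individual terms later in the notebook. The sketch below shows that parsing step on its own. `parse_search_terms` is a hypothetical helper name, and dropping empty entries (for example from a trailing comma) is an addition that the notebook itself does not perform.

```python
def parse_search_terms(raw):
    """Split a comma-separated string into clean, non-empty search terms."""
    # Strip surrounding whitespace from every entry, as the notebook does
    terms = [term.strip() for term in raw.split(",")]
    # Extra step (not in the notebook): skip entries that are empty
    return [term for term in terms if term]

print(parse_search_terms("cats, dogs,, "))  # ['cats', 'dogs']
```

Each resulting term becomes both a search query and the name of a subfolder, so a stray empty term is worth filtering out.
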
### Install Python Packages

Install the Python packages needed to collect the images.

In [None]:
#@title {display-mode: "form"} 
!pip install -Uqq fastai
!pip install -Uqq jmd_imagescraper

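Because the install commands run quietly (`-qq`), a failed install can go unnoticed. One way to confirm the packages are importable is sketched below, using only the standard library. `missing_packages` is a hypothetical helper, not something the notebook defines.

```python
from importlib.util import find_spec

def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]

# An empty list means both packages are available in the current runtime
print(missing_packages(["fastai", "jmd_imagescraper"]))
```
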
### Execute the Search

Search for and download the images for each search term, then display how many images were downloaded for each term.

In [None]:
#@title {display-mode: "form"} 
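from pathlib import Path
from jmd_imagescraper.core import *
# Local folder where the images are downloaded (one subfolder per search term)
path = Path(project)
# Search options applied to every search term
params = {
    "max_results": 500,  # upper limit on images per search term
    "img_size":    ImgSize.Cached,
    "img_type":    ImgType.Photo,
    "img_layout":  ImgLayout.Square,
    "img_color":   ImgColor.All,
    "uuid_names":  True
}
# Split the comma-separated input into individual search terms
search_items = [term.strip() for term in search_terms.split(',')]
all_imgs = []
# Destination folder (relative to My Drive) and archive name used when saving
folder_path = f'build-your-own-image-classifier/data/{project}'
file_name = f'{project}.tgz'
print('Image Counts:')
for search_item in search_items:
    # Use the search term as both the folder label and the search keywords
    imgs = duckduckgo_search(path, search_item, search_item, **params)
    img_counts = len(imgs)
    all_imgs.extend(imgs)
    print(f'{search_item} : {img_counts}')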

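The counts printed above come from the search results. To double-check what actually landed on disk, the files in each subfolder can be counted directly. This is a minimal sketch that assumes each search term's images are stored in a subfolder named after the term. `count_images` is a hypothetical helper.

```python
from pathlib import Path

def count_images(dataset_dir):
    """Return {subfolder name: number of files} for a dataset folder."""
    root = Path(dataset_dir)
    if not root.is_dir():
        # Nothing has been downloaded yet
        return {}
    counts = {}
    for folder in sorted(root.iterdir()):
        if folder.is_dir():
            counts[folder.name] = sum(1 for f in folder.iterdir() if f.is_file())
    return counts

# "pets" is the default project name; prints {} if the folder does not exist
print(count_images("pets"))
```
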
### Zip and save the downloaded images to Google Drive

Bundle the images for all categories into a single gzip-compressed tar archive (`.tgz` format) and copy it to your Google Drive.

In [None]:
#@title {display-mode: "form"} 
!tar -czf "{file_name}" "{project}"
!mkdir -p "/content/drive/My Drive/{folder_path}"
!cp "{file_name}" "/content/drive/My Drive/{folder_path}"

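The same archive-and-copy step can also be written in pure Python, which avoids shell quoting issues when a project name contains spaces. The sketch below uses only the standard library. `archive_and_copy` is a hypothetical helper, not part of the notebook, and the destination path in the usage example is an assumption based on the defaults.

```python
import shutil
import tarfile
from pathlib import Path

def archive_and_copy(dataset_dir, dest_dir):
    """Create <name>.tgz next to dataset_dir and copy it into dest_dir."""
    dataset_dir = Path(dataset_dir)
    dest_dir = Path(dest_dir)
    archive = dataset_dir.parent / f"{dataset_dir.name}.tgz"
    # Write a gzip-compressed tar archive with the folder name as its root
    with tarfile.open(str(archive), "w:gz") as tar:
        tar.add(str(dataset_dir), arcname=dataset_dir.name)
    # Create the destination folder if needed, then copy the archive there
    dest_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy(str(archive), str(dest_dir)))

# Runs only if the default "pets" folder exists in the working directory
if Path("pets").is_dir():
    dest = "/content/drive/My Drive/build-your-own-image-classifier/data/pets"
    print(archive_and_copy("pets", dest))
```
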
### Finish up

Print a completion message and show where the zipped image file was copied.

In [None]:
#@title {display-mode: "form"} 
print('DONE! DONE! DONE!')
print(f'Your image dataset has been saved in your Google Drive/My Drive/{folder_path}/{project}.tgz')