
# Get Image Data for Training the Neural Network

## Structure of this Notebook

Part 1: See how it works for one class (banana peel).

Part 2: Do it in a loop and get the remaining classes.

Part 3: Finish preparation of folders to be used as training data.

## Challenges and Remarks

- Hard to find images containing a single object
- Hard to find realistic images instead of stock photos
- I used glass bottles for now, because they are easier to take care of than most plastic bottles

## Other

Code adapted from fastbook.

In [3]:
# imports
# enable using google drive
from google.colab import drive
# enable using Bing image search API to download and save images as well as many other useful functions
!pip install -Uqq fastbook
from fastbook import *
# enable the operating system interface
import os

In [5]:
# mount google drive
drive.mount('/content/drive')

Mounted at /content/drive


## Get it done for one class
This section walks through the code step by step for a single class, using my favorite one: banana peel.

In [4]:
# see arguments of the function
help(search_images_bing)

Help on function search_images_bing in module fastbook:

search_images_bing(key, term, min_sz=128, max_images=150)



In [None]:
# set the Bing image search key
# CAREFUL: Do not save when your key is visible here!
# delete after running the cell!
key = os.environ.get('AZURE_SEARCH_KEY', 'XXX')
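
One weakness of the `'XXX'` fallback is that a missing key only shows up later, when the search request fails. A minimal sketch of a stricter alternative, assuming the key is stored in the `AZURE_SEARCH_KEY` environment variable as above (`load_search_key` is a hypothetical helper, not part of fastbook):

```python
import os

def load_search_key(env_var="AZURE_SEARCH_KEY", placeholder="XXX"):
    """Return the key stored in env_var, or raise if it is missing or still the placeholder."""
    value = os.environ.get(env_var, placeholder)
    if not value or value == placeholder:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running the search cells."
        )
    return value
```

With this, `key = load_search_key()` stops the notebook right away if the key was never set, and the key itself never appears in a saved cell.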

In [None]:
# get the results of the bing image search 
# variable results contains all info including website and date accessed
# we agreed to use about 20 images. since many image files do not work (see first link),
# I set the max images to 30 in order to reliably get a sufficient amount
results = search_images_bing(key, term="banana peel", max_images=30)
# variable ims contains the urls of the images
ims = results.attrgot('contentUrl')
# print all the urls, makes it possible to open them
for url in ims:
  print(url)
# annoyingly, the first URL does not work
# if any other one failed it would not matter, but the first one is the one that gets clicked

In [None]:
# set the directory where the images will be downloaded to
# they are all in the directory "images"
# first class that I will test the download with is banana peel
# this command creates a new directory called "banana_peel" to contain only banana peel images
dest = Path('drive/MyDrive/Training_WasteWise/images/banana_peel')
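# parents=True also creates missing parent folders, exist_ok=True allows re-running the cell
dest.mkdir(parents=True, exist_ok=True)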

In [None]:
# download the images
download_images(dest, urls=ims)

In [None]:
# get the paths of the downloaded image files
fns = get_image_files(dest)
print(len(fns), "images were downloaded successfully:")
fns

In [None]:
# just to make sure, verify that the files are actually images
# verify_images() returns files that are NOT images (failed)
# map/unlink deletes failed files
failed = verify_images(fns)
failed.map(Path.unlink)
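
To make the idea of "verifying" a download more concrete, here is a rough stand-in that only uses the standard library. It checks whether a file starts with the signature bytes of a JPEG, PNG or GIF, which is enough to catch HTML error pages saved under an image name. This is an assumption-laden sketch and not how `verify_images` works internally; `looks_like_image` and `remove_non_images` are hypothetical helpers.

```python
from pathlib import Path

# Leading bytes ("magic numbers") of a few common image formats
SIGNATURES = (
    b"\xff\xd8\xff",       # JPEG
    b"\x89PNG\r\n\x1a\n",  # PNG
    b"GIF87a",             # GIF (1987 version)
    b"GIF89a",             # GIF (1989 version)
)

def looks_like_image(file_path):
    """Return True if the file starts with one of the known image signatures."""
    with open(file_path, "rb") as f:
        header = f.read(8)
    return header.startswith(SIGNATURES)

def remove_non_images(folder):
    """Delete every file below folder that fails the signature check; return their names."""
    removed = []
    for file_path in sorted(Path(folder).rglob("*")):
        if file_path.is_file() and not looks_like_image(file_path):
            file_path.unlink()
            removed.append(file_path.name)
    return removed
```

A signature check is much weaker than actually opening the image, so it would not catch truncated or corrupted files. It also deletes anything in the folder that is not one of the listed formats, so it should only be pointed at a folder that contains nothing but downloads.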

## Make a loop for the remaining classes
For now, only include the classes from the first prototype.

In [None]:
# set classes 
# in this step, they will be used as search terms
# we can rename them later on, but for now, the search has to be precise
# because of that, use search prompts instead of class names
classes = 'orange fruit','paper','cardboard','glass bottle','plastic packaging waste','smartphone'
# set the path, where the images will be saved
path = Path('drive/MyDrive/Training_WasteWise/images')

In [None]:
# download the images for each class
for trash_type in classes:
    dest = (path/trash_type)
    dest.mkdir(exist_ok=True)
    results = search_images_bing(key, trash_type)
    download_images(dest, urls=results.attrgot('contentUrl'))

In [None]:
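# get rid of failed images in all class folders
# get_image_files searches subfolders recursively, so pass the parent folder here
# (dest would only cover the last class of the loop)
fns = get_image_files(path)
failed = verify_images(fns)
failed.map(Path.unlink)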
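
Since some downloads fail and get deleted, it is worth checking how many images each class ended up with. A minimal sketch using only `pathlib`; the helper names, the list of file extensions and the threshold of 20 (taken from the "about 20 images" we agreed on) are assumptions for illustration:

```python
from pathlib import Path

# File extensions counted as images (an assumption, extend as needed)
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".webp"}

def count_images_per_class(root):
    """Map each class folder directly below root to its number of image files."""
    counts = {}
    for class_dir in sorted(Path(root).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1 for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts

def classes_below(counts, minimum=20):
    """Return the class names that have fewer than minimum images."""
    return [name for name, n in counts.items() if n < minimum]
```

Calling `classes_below(count_images_per_class(path))` would list the classes that need another search, for example with a different search term.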

## Rename the Folders

You can now find the images in Google Drive :)

However, the folder names are not ideal: they contain spaces (which are awkward to handle in many pipelines) and they do not match the class names we want. For example, the folder "orange fruit" should become the class "orange".

In [11]:
# go to the directory where the images are saved
os.chdir("drive/MyDrive/Training_WasteWise/images")

In [12]:
# check if in correct directory
os.getcwd()

'/content/drive/MyDrive/Training_WasteWise/images'

In [13]:
# list files in directory
os.listdir()

['banana_peel',
 'glass bottle',
 'orange fruit',
 'paper',
 'cardboard',
 'plastic packaging waste',
 'smartphone']

In [18]:
# rename the folders that need it
os.rename("glass bottle", "glass_bottle")
os.rename("orange fruit", "orange")
os.rename("plastic packaging waste", "plastic_packaging")
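
The `os.rename` calls above raise a `FileNotFoundError` if the cell is run a second time, because the old folder names no longer exist. A minimal sketch of a version that can be re-run safely, with the same old and new names kept in one dictionary (`rename_class_folders` is a hypothetical helper):

```python
from pathlib import Path

# old folder name (search term) -> new folder name (class name)
RENAMES = {
    "glass bottle": "glass_bottle",
    "orange fruit": "orange",
    "plastic packaging waste": "plastic_packaging",
}

def rename_class_folders(root, renames):
    """Rename folders below root; skip entries that are missing or already renamed."""
    root = Path(root)
    renamed = []
    for old_name, new_name in renames.items():
        old_dir, new_dir = root / old_name, root / new_name
        if old_dir.is_dir() and not new_dir.exists():
            old_dir.rename(new_dir)
            renamed.append(new_name)
    return renamed
```

From inside the images directory, `rename_class_folders(".", RENAMES)` would do the same as the three calls above and simply return an empty list on a second run.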

In [19]:
# list files in directory
os.listdir()

['banana_peel',
 'glass_bottle',
 'orange',
 'paper',
 'cardboard',
 'plastic_packaging',
 'smartphone']

Perfect! Now the folders have the structure and names required for training a model (using the fastai library).