
# Get Image Data for Training the Neural Network

## Structure of this Notebook

Part 1: See how it works for one class (banana peel).

Part 2: Do it in a loop and get the remaining classes.

## Challenges and Remarks

- Hard to find images containing a single object
- Hard to find realistic images instead of stock photos
- I used glass bottles for now, because they are easier to take care of than most plastic bottles
- We need a feasible way of sharing the downloaded data so that we all use the same images. GitHub is probably not a good place for that. Google Drive might work, but we would have to use our private storage for it.
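
Wherever the data ends up being stored, one way to check that everyone really has the same files is to share a small checksum manifest alongside the images. The following is a minimal sketch using only the Python standard library. The helper name `build_manifest` and the manifest idea itself are assumptions for illustration and are not part of fastbook.

```python
import hashlib
from pathlib import Path

def build_manifest(root):
    """Map each file path (relative to root) to the SHA-256 digest of its content."""
    root = Path(root)
    manifest = {}
    for file in sorted(root.rglob('*')):
        if file.is_file():
            digest = hashlib.sha256(file.read_bytes()).hexdigest()
            manifest[file.relative_to(root).as_posix()] = digest
    return manifest
```

If two people run `build_manifest` on their copy of the `images` directory and get equal dictionaries, their files are identical.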

## Other

Code adapted from fastbook.

In [None]:
# imports
# enable using google drive
from google.colab import drive
# enable using Bing image search API to download and save images as well as many other useful functions
from fastbook import *
# enable the operating system interface
import os

In [None]:
# mount google drive
drive.mount('/content/drive')

## Get it done for one class
This section shows how the code works, using my favorite class: banana peel.

In [None]:
# see arguments of the function
help(search_images_bing)

In [None]:
# set the Bing image search key
# CAREFUL: Do not save when your key is visible here!
# delete after running the cell!
key = os.environ.get('AZURE_SEARCH_KEY', 'XXX')
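
To avoid ever typing the key into a cell, it could be read from the environment and, if it is missing there, requested through a hidden prompt. This is only a sketch: the helper name `load_search_key` and its parameters are hypothetical, and the environment variable name is the one used in the cell above.

```python
import os
from getpass import getpass

def load_search_key(env=os.environ, prompt=getpass):
    """Return the key from AZURE_SEARCH_KEY, or ask for it without echoing the input."""
    key = env.get('AZURE_SEARCH_KEY')
    if not key:
        # the typed value is not displayed and is not stored in the notebook
        key = prompt('Bing image search key: ')
    return key
```

Calling `key = load_search_key()` would then replace the manual assignment, so nothing has to be deleted before saving.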

In [None]:
# get the results of the Bing image search
# the variable results contains all info, including website and date accessed
# we agreed to use about 20 images; since many image files do not work (see first link),
# I set max_images to 30 in order to reliably get a sufficient number
results = search_images_bing(key, term="banana peel", max_images=30)
# the variable ims contains the URLs of the images
ims = results.attrgot('contentUrl')
# print all the URLs so that they can be opened
for url in ims:
    print(url)
# it really annoys me that the first one does not work^^
# if any other did not work it would not matter, but this one will be clicked lol

In [None]:
# set the directory where the images will be downloaded to
# all classes live in the directory "images"
# the first class that I test the download with is banana peel
# this command creates a new directory called "banana_peel" to contain only banana peel images
# parents=True also creates missing parent directories; exist_ok=True allows rerunning the cell
dest = Path('drive/MyDrive/Training_WasteWise/images/banana_peel')
dest.mkdir(parents=True, exist_ok=True)

In [None]:
# download the images
download_images(dest, urls=ims)

In [None]:
# get the paths of the downloaded image files
fns = get_image_files(dest)
print(len(fns), "images were downloaded successfully:")
fns

In [None]:
# just to make sure, verify that the files are actually images
# verify_images() returns files that are NOT images (failed)
# map/unlink deletes failed files
failed = verify_images(fns)
failed.map(Path.unlink)
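
To illustrate the "find failed files, then delete them" pattern without fastai, here is a simplified stand-in that only looks at the first bytes of each file. It is an assumption-laden sketch: the helper names are hypothetical, only JPEG, PNG and GIF signatures are recognized, and a header check is much weaker than actually decoding the image, so it does not replace `verify_images`.

```python
from pathlib import Path

# leading bytes of a few common image formats (not an exhaustive list)
SIGNATURES = (b'\xff\xd8\xff', b'\x89PNG\r\n\x1a\n', b'GIF87a', b'GIF89a')

def looks_like_image(file):
    """Return True if the file starts with a known JPEG, PNG or GIF signature."""
    with open(file, 'rb') as f:
        header = f.read(8)
    return header.startswith(SIGNATURES)

def remove_non_images(files):
    """Delete files that fail the signature check and return their paths."""
    failed = [Path(f) for f in files if not looks_like_image(f)]
    for f in failed:
        f.unlink()
    return failed
```

A typical failure this would catch is an HTML error page that was saved under a `.jpg` name.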

## Make a loop for the remaining classes
For now, include only the classes from the first prototype.

In [None]:
# set classes 
# in this step, they will be used as search terms
# we can rename them later on, but for now, the search has to be precise
classes = 'orange fruit','paper','cardboard','glass bottle','plastic packaging waste','smartphone'
# set the path where the images will be saved
path = Path('drive/MyDrive/Training_WasteWise/images')
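
The search terms above contain spaces, whereas the single-class example used the directory name `banana_peel`. If consistent, space-free directory names are wanted, a small helper could derive them from the search terms. The function `class_dir_name` is a hypothetical name introduced only for this sketch.

```python
def class_dir_name(search_term):
    """Turn a search term such as 'glass bottle' into 'glass_bottle'."""
    # split() without arguments also collapses repeated whitespace
    return '_'.join(search_term.lower().split())
```

In a download loop, the destination could then be built as `path/class_dir_name(trash_type)` while the unchanged search term is still passed to the image search.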

In [None]:
# download the images for each class
# note: without max_images, search_images_bing uses its default limit rather than the 30 used above
for trash_type in classes:
    dest = path/trash_type
    dest.mkdir(exist_ok=True)
    results = search_images_bing(key, trash_type)
    download_images(dest, urls=results.attrgot('contentUrl'))

In [None]:
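# get rid of failed images
# search the whole images directory (all class subdirectories),
# not only dest, which points to the last class of the loop
fns = get_image_files(path)
failed = verify_images(fns)
failed.map(Path.unlink)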

You can now find the images in Google Drive :)
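
To see whether each class ended up with roughly the 20 images we aimed for, the files per class directory can be counted. This is a minimal standard-library sketch. The helper name `count_images_per_class` and the list of file suffixes are assumptions, and only files directly inside each class directory are counted.

```python
from pathlib import Path

# suffixes treated as images in this sketch (an assumption, not fastai's list)
IMAGE_SUFFIXES = {'.jpg', '.jpeg', '.png', '.gif', '.bmp'}

def count_images_per_class(root):
    """Return {class directory name: number of image files directly inside it}."""
    counts = {}
    for class_dir in sorted(Path(root).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1 for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts
```

Calling `count_images_per_class(path)` after the cleanup step shows at a glance which classes need more images.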

In [10]:
!git status

On branch data_preparation_DL
Your branch is ahead of 'origin/data_preparation_DL' by 1 commit.
  (use "git push" to publish your local commits)

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
	new file:   AI/DL_data_preparation/downloading_image_data.ipynb

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   AI/DL_data_preparation/downloading_image_data.ipynb



In [8]:
!git add .