# Image Classification Transfer Learning with a Dataset from Local Disk




This Colab closely follows the TensorFlow Lite Model Maker image classification example (https://www.tensorflow.org/lite/models/modify/model_maker/image_classification), with small changes to upload custom images instead of downloading the example dataset.

Colab limits GPU usage, so disconnect the runtime once retraining and downloading are finished; otherwise you may lose access to the GPU for a while.

(This is a local copy of the contents of the Colab link; you don't have to do anything here.)

# Step 0: Prerequisites

In [None]:
import os

In [None]:
!pip install -q tflite-model-maker

In [None]:
import numpy as np

import tensorflow as tf
assert tf.__version__.startswith('2')

from tflite_model_maker import model_spec
from tflite_model_maker import image_classifier
from tflite_model_maker.config import ExportFormat
from tflite_model_maker.config import QuantizationConfig
from tflite_model_maker.image_classifier import DataLoader

import matplotlib.pyplot as plt

# Step 1: Loading data from local disk into TF ImageDataset object
**Before running, click the folder icon and drag a zipped dataset into /content/. (This is the default location when opening the Files tab. It contains sample_data by default; place the zipped dataset next to sample_data.)**

The dataset must have its images sorted into labeled subdirectories. Each subdirectory corresponds to one class in the model's head layer. For example, make sure all images of plastic bottles are in the directory dataset/plastic_bottles/.
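
Before zipping the dataset, it can help to confirm the folder layout described above. The following is a minimal sketch using only the Python standard library; the helper name `count_images_per_class`, the extension list, and the demo folder are assumptions made for illustration, not part of Model Maker.

```python
import tempfile
from collections import Counter
from pathlib import Path

# Extensions treated as images here; this set is an assumption for the sketch.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}


def count_images_per_class(dataset_dir):
    """Return {class_name: image_count} for each subdirectory of dataset_dir."""
    counts = Counter()
    for class_dir in sorted(Path(dataset_dir).iterdir()):
        if not class_dir.is_dir():
            continue  # Loose files at the top level belong to no class.
        counts[class_dir.name] = sum(
            1
            for f in class_dir.rglob("*")
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
    return dict(counts)


# Demo on a throwaway folder so the sketch runs without the real dataset.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "dataset"
    for class_name, n in [("plastic_bottles", 3), ("cans", 2)]:
        (root / class_name).mkdir(parents=True)
        for i in range(n):
            (root / class_name / f"img_{i}.jpg").touch()
    demo_counts = count_images_per_class(root)
    print(demo_counts)
```

A class with very few images, or a count of zero, usually points to a misnamed or misplaced folder.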

In [None]:
!unzip -q Recyclables.zip

Set image_path to the name of the uploaded data folder.

In [None]:
image_path = 'Recyclables'
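
If you are unsure which folder name the archive extracts to, you can list its top-level folders. This is a sketch under the assumption that the zip contains a single dataset folder; the helper `top_level_folders` and the demo file names are hypothetical.

```python
import io
import zipfile


def top_level_folders(zip_file):
    """Return the sorted top-level folder names inside a zip archive."""
    with zipfile.ZipFile(zip_file) as archive:
        names = archive.namelist()
    folders = {name.split("/", 1)[0] for name in names if "/" in name}
    # Skip the metadata folder that macOS adds when zipping.
    folders.discard("__MACOSX")
    return sorted(folders)


# Demo with an in-memory archive; the file names are hypothetical.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("Recyclables/plastic_bottles/a.jpg", b"")
    archive.writestr("Recyclables/cans/b.jpg", b"")
    archive.writestr("__MACOSX/._a.jpg", b"")
buffer.seek(0)
demo_folders = top_level_folders(buffer)
print(demo_folders)
```

In the notebook you would pass the path of the uploaded zip file instead of the in-memory buffer and use the reported folder name as image_path.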

Use the DataLoader from tflite_model_maker.image_classifier to load the data from the folder. This automatically labels each image with the name of its parent directory.

In [None]:
data = DataLoader.from_folder(image_path)

Show 25 example images to make sure the data was loaded properly.

In [None]:
plt.figure(figsize=(10,10))
for i, (image, label) in enumerate(data.gen_dataset().unbatch().take(25)):
  plt.subplot(5,5,i+1)
  plt.xticks([])
  plt.yticks([])
  plt.grid(False)
  plt.imshow(image.numpy(), cmap=plt.cm.gray)
  plt.xlabel(data.index_to_label[label.numpy()])
plt.show()

## Train test splitting
Here we split the data into training, validation, and test sets in a 0.8 / 0.1 / 0.1 ratio: 80% goes to training, and the remaining 20% is split in half.

In [None]:
train_data, rest_data = data.split(0.8)
validation_data, test_data = rest_data.split(0.5)
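
The arithmetic behind the two-stage split can be sketched in plain Python. The function below is illustrative only; it assumes each split rounds the first part down to a whole number of images, which may differ in detail from what the library does.

```python
def split_sizes(total, train_fraction=0.8, validation_share_of_rest=0.5):
    """Return (train, validation, test) sizes for a two-stage split."""
    train = int(total * train_fraction)  # first split: 80% for training
    rest = total - train                 # remaining 20%
    validation = int(rest * validation_share_of_rest)  # half of the rest
    test = rest - validation             # whatever is left
    return train, validation, test


sizes = split_sizes(1000)
print(sizes)
```

With 1000 images this gives 800 for training and 100 each for validation and test.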

In [None]:
print(type(train_data))

# Step 2: Customizing the TF model

Choose a pretrained model to customize. The options available in Model Maker are:
* 'efficientnet_lite0'
* 'efficientnet_lite1'
* 'efficientnet_lite2'
* 'efficientnet_lite3'
* 'efficientnet_lite4'
* 'mobilenet_v2'
* 'resnet_50'

Our benchmarks prioritized accuracy on non-recyclable waste in order to minimize false positives, and mobilenet_v2 with 15 epochs performed best on our current dataset. Benchmarks for the models we tested are included in the GitHub repo under benchmarking sheet.xlsx (open it with Microsoft Excel to see the images). If the dataset changes significantly, please run Step 3 again for more benchmarking.

Note on the benchmarking sheet:

The first sheet of the workbook contains the data we used to choose the model_spec, and the second sheet, marked "Hyperparams", contains the data we used to choose the hyperparameters for our model (epochs and batch_size). Feel free to experiment with these variables in the future to find a

In [None]:
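# Note: this string rebinds the name model_spec, which shadows the
# model_spec module imported in Step 0.
model_spec = 'mobilenet_v2'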

In [None]:
model = image_classifier.create(
  train_data, 
  validation_data=validation_data,
  model_spec=model_spec,
  epochs=20,
  batch_size=256
)
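
When experimenting with epochs and batch_size, it helps to see how many weight updates a setting yields. The sketch below is generic arithmetic, not Model Maker code; it assumes only full batches are counted, and the image counts are hypothetical.

```python
def full_batches_per_epoch(num_train_images, batch_size):
    """Number of complete batches that fit in one pass over the training set."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return num_train_images // batch_size


def total_updates(num_train_images, batch_size, epochs):
    """Total number of full-batch updates over the whole training run."""
    return full_batches_per_epoch(num_train_images, batch_size) * epochs


# Hypothetical training-set size of 800 images.
for batch_size in (32, 64, 256):
    print(batch_size, full_batches_per_epoch(800, batch_size))

updates = total_updates(800, batch_size=256, epochs=20)
print(updates)
```

A batch size larger than the training split leaves no full batch at all, so a small dataset may call for a smaller batch_size than 256.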

# Step 3: Evaluate the model

Optional: do this to test out different specs.
If you don't need to test them, skip this step.



In [None]:
loss, accuracy = model.evaluate(test_data)


In [None]:
# A helper function that returns 'black' if its two input parameters match
# and 'red' otherwise.
def get_label_color(val1, val2):
  if val1 == val2:
    return 'black'
  else:
    return 'red'

# Then plot 100 test images and their predicted labels.
# If a prediction differs from the label provided in the "test" dataset,
# it is highlighted in red.
plt.figure(figsize=(20, 20))
predicts = model.predict_top_k(test_data)
for i, (image, label) in enumerate(test_data.gen_dataset().unbatch().take(100)):
  ax = plt.subplot(10, 10, i+1)
  plt.xticks([])
  plt.yticks([])
  plt.grid(False)
  plt.imshow(image.numpy(), cmap=plt.cm.gray)

  predict_label = predicts[i][0][0]
  color = get_label_color(predict_label,
                          test_data.index_to_label[label.numpy()])
  ax.xaxis.label.set_color(color)
  plt.xlabel('Predicted: %s' % predict_label)
plt.show()
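
Since the benchmarks in Step 2 focused on a single class (non-recyclable waste), overall accuracy alone can hide what matters. The following is a minimal sketch of per-class accuracy in plain Python; the helper name and the example labels are hypothetical.

```python
from collections import defaultdict


def per_class_accuracy(true_labels, predicted_labels):
    """Return {label: fraction of that label's images predicted correctly}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, predicted in zip(true_labels, predicted_labels):
        total[truth] += 1
        if truth == predicted:
            correct[truth] += 1
    return {label: correct[label] / total[label] for label in total}


# Hypothetical labels standing in for real predictions.
true_labels = ["non_recyclable", "non_recyclable", "plastic_bottles", "cans"]
predicted_labels = ["non_recyclable", "cans", "plastic_bottles", "cans"]
demo_accuracy = per_class_accuracy(true_labels, predicted_labels)
print(demo_accuracy)
```

In this notebook the true labels could be collected from `test_data.index_to_label[label.numpy()]` and the predicted labels from `predicts[i][0][0]`, as in the plotting cell above.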

# Export the model

Run this to download the model as a .tflite file that you can load onto the Raspberry Pi.

In [None]:
model.export(export_dir='.', tflite_filename=f'{model_spec}.tflite')

In [None]:
# Download the TFLite model to your local computer.
from google.colab import files
files.download(f'{model_spec}.tflite')
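
Before copying the file to the Raspberry Pi, a quick header check can catch a truncated or wrong download. TensorFlow Lite models are FlatBuffers with the file identifier `TFL3` at bytes 4 to 7. The helper below is a hypothetical sketch that inspects only those bytes; it does not validate the model itself.

```python
import tempfile
from pathlib import Path


def looks_like_tflite(path):
    """Return True if the file carries the TFLite FlatBuffer identifier."""
    with open(path, "rb") as f:
        header = f.read(8)
    return len(header) == 8 and header[4:8] == b"TFL3"


# Demo with fake files, so the sketch runs without a real model.
with tempfile.TemporaryDirectory() as tmp:
    fake_model = Path(tmp) / "fake_model.tflite"
    fake_model.write_bytes(b"\x1c\x00\x00\x00TFL3" + b"\x00" * 16)
    not_a_model = Path(tmp) / "notes.txt"
    not_a_model.write_text("hello")
    good = looks_like_tflite(fake_model)
    bad = looks_like_tflite(not_a_model)
    print(good, bad)
```

On the real export you would call `looks_like_tflite('mobilenet_v2.tflite')`, or whichever filename was downloaded.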