<a href="https://colab.research.google.com/github/aivis-ai/pet-classification/blob/master/Pet_Classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Cats vs. Dogs Classification

### Note - Change runtime to GPU

## Imports

In [0]:
import os
from zipfile import ZipFile
import shutil
import random

import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

from tensorflow.keras import Model
from tensorflow.keras.layers import InputLayer, Conv2D, MaxPool2D, Flatten, Dense

In [2]:
print(tf.version.VERSION)

2.2.0


## Dataset Prep

### Get the Dataset
1. Download the dataset from https://www.kaggle.com/c/dogs-vs-cats/data.  
2. Upload it to your Google Drive, mount the Drive in Google Colab, and copy the archive into the Colab filesystem with the following command:  
a. !cp '/content/gdrive/My Drive/< file_path_on_google_drive >' < file_path_in_colab >

In [3]:
from google.colab import drive
drive.mount('/content/gdrive')

!cp '/content/gdrive/My Drive/Datasets/dogs-vs-cats.zip' '/tmp/dogs-vs-cats.zip'

Mounted at /content/gdrive
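The copy step can also be done from Python instead of the `!cp` shell command. The following is a minimal sketch, not part of the original notebook; the helper name `copy_from_drive` is hypothetical, and it only assumes that the Drive is already mounted.

```python
import os
import shutil


def copy_from_drive(drive_file, local_file):
    """Copy one file from the mounted Drive into the Colab filesystem."""
    if not os.path.isfile(drive_file):
        # Fail early with a readable message instead of a later unzip error
        raise FileNotFoundError(f"Dataset archive not found: {drive_file}")
    # Create the destination folder if it does not exist yet
    os.makedirs(os.path.dirname(local_file) or ".", exist_ok=True)
    # shutil.copyfile returns the destination path
    return shutil.copyfile(drive_file, local_file)


# Example call with the paths used in this notebook:
# copy_from_drive('/content/gdrive/My Drive/Datasets/dogs-vs-cats.zip',
#                 '/tmp/dogs-vs-cats.zip')
```

Raising early when the source file is missing makes a wrong Drive path easier to spot than a failure in the unzip step.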


### Unzip the data

In [0]:
zip_location = '/tmp/dogs-vs-cats.zip'

# Use a name other than `zip` so the built-in function is not shadowed
with ZipFile(zip_location, 'r') as archive:
  archive.extractall('/tmp')

In [0]:
train_zip_location = '/tmp/train.zip'
test_zip_location = '/tmp/test1.zip'
test_csv_location = '/tmp/sampleSubmission.csv'

In [0]:
with ZipFile(train_zip_location, 'r') as archive:
  archive.extractall('/tmp')

In [0]:
with ZipFile(test_zip_location, 'r') as archive:
  archive.extractall('/tmp')
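The three extraction cells above follow the same pattern: unpack the outer archive, then unpack the archives it contains. As a rough sketch, this could be folded into one helper. The function name `extract_nested` and its default `inner_names` are assumptions made for illustration.

```python
import os
from zipfile import ZipFile


def extract_nested(zip_path, dest_dir, inner_names=("train.zip", "test1.zip")):
    """Extract zip_path into dest_dir, then extract the listed inner archives."""
    with ZipFile(zip_path, "r") as archive:
        archive.extractall(dest_dir)

    extracted = []
    for name in inner_names:
        inner_path = os.path.join(dest_dir, name)
        if os.path.isfile(inner_path):  # skip inner archives that are absent
            with ZipFile(inner_path, "r") as inner:
                inner.extractall(dest_dir)
            extracted.append(name)
    return extracted


# Example call with the paths used in this notebook:
# extract_nested('/tmp/dogs-vs-cats.zip', '/tmp')
```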

### Define Paths and Create Directories

In [0]:
base_path = '/tmp'

source_path = os.path.join(base_path, 'train') 

train_path = os.path.join(base_path, 'training')
validation_path = os.path.join(base_path, 'validation')
test_path = os.path.join(base_path, 'test1')

train_cats_dir = os.path.join(train_path, 'cats')
train_dogs_dir = os.path.join(train_path, 'dogs')

validation_cats_dir = os.path.join(validation_path, 'cats')
validation_dogs_dir = os.path.join(validation_path, 'dogs')

In [0]:
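# makedirs with exist_ok=True also creates missing parent directories and does
# not fail on re-runs; a single try/except around several mkdir calls would
# skip every directory after the first one that already exists
for directory in (train_cats_dir, train_dogs_dir, validation_cats_dir, validation_dogs_dir):
  os.makedirs(directory, exist_ok=True)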

### Use 90% of Files for Training and 10% for Validation 

In [0]:
def copyfiles(source, list_of_files, train_dir, validation_dir, split_ratio=0.9):
  random.shuffle(list_of_files)
  split = int(len(list_of_files)*split_ratio)
  train_data, validation_data = list_of_files[:split], list_of_files[split:]

  for filename in train_data:
    shutil.copyfile(os.path.join(source,filename), os.path.join(train_dir,filename))
  
  for filename in validation_data:
    shutil.copyfile(os.path.join(source,filename), os.path.join(validation_dir,filename))


In [0]:
def split_data(source, train_cats_dir, train_dogs_dir, validation_cats_dir, validation_dogs_dir, split_ratio = 0.9):
  # Files in the Kaggle training set are named '<label>.<id>.jpg', e.g. 'cat.0.jpg'
  all_files = os.listdir(source)
  cat_files = []
  dog_files = []

  for filename in all_files:
    label = filename.split('.')[0]
    if label == 'cat':
      cat_files.append(filename)
    elif label == 'dog':
      dog_files.append(filename)
    # Any other file is skipped instead of being counted as a dog

  copyfiles(source, cat_files, train_cats_dir, validation_cats_dir, split_ratio)
  copyfiles(source, dog_files, train_dogs_dir, validation_dogs_dir, split_ratio)


In [0]:
split_data(source_path, train_cats_dir, train_dogs_dir, validation_cats_dir, validation_dogs_dir, 0.9)
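The split above depends on the global random state, so each run produces a different training/validation partition. A minimal sketch of a reproducible variant is shown below. The function `split_filenames` and the `seed` parameter are hypothetical additions, and the file names are generated only for illustration.

```python
import random


def split_filenames(filenames, split_ratio=0.9, seed=42):
    """Return (train, validation) lists without modifying the input list."""
    shuffled = sorted(filenames)           # fixed starting order
    random.Random(seed).shuffle(shuffled)  # private RNG, global state untouched
    split = int(len(shuffled) * split_ratio)
    return shuffled[:split], shuffled[split:]


# Hypothetical file names following the 'cat.<id>.jpg' pattern
names = [f"cat.{i}.jpg" for i in range(12500)]
train, validation = split_filenames(names)
print(len(train), len(validation))  # 11250 1250
```

With 12,500 files per class, a 0.9 ratio gives 11,250 training and 1,250 validation files per class.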

### Data Preprocessing

In [15]:
train_datagen = ImageDataGenerator(rescale = 1./255,
                                   rotation_range = 40,
                                   width_shift_range = 0.2,
                                   height_shift_range = 0.2,
                                   shear_range = 0.2,
                                   zoom_range = 0.2,
                                   horizontal_flip = True)

train_generator = train_datagen.flow_from_directory(train_path, batch_size = 128, class_mode = 'binary', target_size = (150,150))

validation_datagen = ImageDataGenerator(rescale = 1./255)

validation_generator = validation_datagen.flow_from_directory(validation_path, batch_size = 128, class_mode = 'binary', target_size = (150, 150))

Found 22500 images belonging to 2 classes.
Found 2500 images belonging to 2 classes.
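With a batch size of 128, the image counts reported above determine how many batches each generator yields per epoch. The short calculation below is only a worked example of that arithmetic; the variable names are chosen for illustration.

```python
import math

# Counts reported by flow_from_directory in the output above
train_images, validation_images = 22500, 2500
batch_size = 128

# The last batch is smaller when the count is not a multiple of the batch size
train_steps = math.ceil(train_images / batch_size)
validation_steps = math.ceil(validation_images / batch_size)
last_train_batch = train_images - (train_steps - 1) * batch_size

print(train_steps, validation_steps, last_train_batch)  # 176 20 100
```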


## Training

Subclassing `Model` helps when you want to customize your model and its layers - https://www.tensorflow.org/guide/keras/custom_layers_and_models

To improve accuracy further, you can try the following:
1. Increase the number of epochs
2. Add dropout
3. Add L2 regularization
4. Add more layers
5. Add batch normalization
6. Use a pretrained model
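
Three of the suggestions above (dropout, L2 regularization, batch normalization) can be combined in a subclassed model. The class below is a hypothetical sketch, not the model trained in this notebook. The layer sizes, `l2_strength`, and `dropout_rate` are illustrative assumptions and have not been tuned.

```python
import tensorflow as tf
from tensorflow.keras import Model, regularizers
from tensorflow.keras.layers import (BatchNormalization, Conv2D, Dense,
                                     Dropout, Flatten, MaxPool2D)


class RegularizedClassifier(Model):
    """Hypothetical variant with batch norm, dropout and L2 weight decay."""

    def __init__(self, l2_strength=1e-4, dropout_rate=0.5):
        super().__init__()
        l2 = regularizers.l2(l2_strength)
        self.conv1 = Conv2D(16, (5, 5), activation='relu', padding='same',
                            kernel_regularizer=l2)
        self.bn1 = BatchNormalization()
        self.conv2 = Conv2D(32, (3, 3), activation='relu', padding='same',
                            kernel_regularizer=l2)
        self.bn2 = BatchNormalization()
        self.maxpool = MaxPool2D(2, 2)
        self.flatten = Flatten()
        self.dense1 = Dense(128, activation='relu', kernel_regularizer=l2)
        self.dropout = Dropout(dropout_rate)
        self.dense2 = Dense(1, activation='sigmoid')

    def call(self, inputs, training=False):
        # Batch norm and dropout behave differently in training and inference,
        # so the `training` flag is passed through explicitly
        x = self.maxpool(self.bn1(self.conv1(inputs), training=training))
        x = self.maxpool(self.bn2(self.conv2(x), training=training))
        x = self.dense1(self.flatten(x))
        x = self.dropout(x, training=training)
        return self.dense2(x)


model = RegularizedClassifier()
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
```

Whether any of these changes improves validation accuracy on this dataset would have to be checked by training.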

### Define Model

In [0]:
class Classifier(Model):
  def __init__(self):
    super(Classifier, self).__init__()
    self.conv1 = Conv2D(16, (5,5), activation='relu', padding = 'same')
    self.maxpool = MaxPool2D(2,2)
    self.conv2 = Conv2D(32, (5,5), activation='relu', padding = 'same')
    self.conv3 = Conv2D(64, (3,3), activation='relu', padding = 'same')
    self.conv4 = Conv2D(128, (3,3), activation='relu', padding = 'same')
    self.flatten = Flatten()
    self.dense1 = Dense(512, activation='relu')
    self.dense2 = Dense(1, activation='sigmoid')

  def call(self, inputs):
    x = self.conv1(inputs)
    x = self.maxpool(x)
    x = self.conv2(x)
    x = self.maxpool(x)
    x = self.conv3(x)
    x = self.maxpool(x)
    x = self.conv4(x)
    x = self.maxpool(x)
    x = self.flatten(x)
    x = self.dense1(x)
    x = self.dense2(x)
    return x

In [0]:
classifier = Classifier()

classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics = ['accuracy'])

In [18]:
classifier.build(input_shape = (128, 150, 150, 3))
classifier.summary()

Model: "classifier"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              multiple                  1216      
_________________________________________________________________
max_pooling2d (MaxPooling2D) multiple                  0         
_________________________________________________________________
conv2d_1 (Conv2D)            multiple                  12832     
_________________________________________________________________
conv2d_2 (Conv2D)            multiple                  18496     
_________________________________________________________________
conv2d_3 (Conv2D)            multiple                  73856     
_________________________________________________________________
flatten (Flatten)            multiple                  0         
_________________________________________________________________
dense (Dense)                multiple                  5
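
The parameter counts in the summary can be reproduced by hand. A convolution has one weight per kernel position and input channel, plus one bias, for each filter. The sketch below checks the four convolution rows. The dense-layer figures are derived from the architecture rather than read from the output, assuming the Keras default of valid padding in `MaxPool2D`.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # (weights per filter + one bias) * number of filters
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(inputs, units):
    # (one weight per input + one bias) * number of units
    return (inputs + 1) * units


print(conv2d_params(5, 5, 3, 16))    # 1216
print(conv2d_params(5, 5, 16, 32))   # 12832
print(conv2d_params(3, 3, 32, 64))   # 18496
print(conv2d_params(3, 3, 64, 128))  # 73856

# Each 2x2 max-pool halves the spatial size (rounding down): 150 -> 75 -> 37 -> 18 -> 9
side = 150
for _ in range(4):
    side //= 2
flattened = side * side * 128        # 10368 features enter the first dense layer

print(dense_params(flattened, 512))  # 5308928
print(dense_params(512, 1))          # 513
```

Almost all of the parameters sit in the first dense layer, which is why shrinking the feature map before flattening has a large effect on model size.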

### Define Callbacks

In [0]:
# Save checkpoints to your Drive so that training can be resumed from a particular epoch
# Include the epoch in the file name (uses `str.format`)
checkpoint_path = "/content/gdrive/My Drive/Checkpoints/cp{epoch:04d}.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)

# Create the directory if it does not exist yet
os.makedirs(checkpoint_dir, exist_ok=True)
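
The `{epoch:04d}` placeholder in `checkpoint_path` is filled in with `str.format`, which pads the epoch number with zeros to four digits. The snippet below simply illustrates that formatting; the epoch numbers are arbitrary examples.

```python
import os

checkpoint_path = "/content/gdrive/My Drive/Checkpoints/cp{epoch:04d}.ckpt"

for epoch in (1, 7, 10):
    # Only the file name is printed to keep the output short
    print(os.path.basename(checkpoint_path.format(epoch=epoch)))
# cp0001.ckpt
# cp0007.ckpt
# cp0010.ckpt
```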

In [0]:
# Create a callback that checks val_loss after every epoch and saves the model's weights only when it has improved
cp_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_path, 
    verbose=1, 
    save_weights_only=True,
    save_best_only=True,
    save_freq='epoch')

early_stop_callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience = 3, restore_best_weights=True, min_delta = 0.001)
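
To make the effect of `patience = 3` and `min_delta = 0.001` concrete, the stopping rule can be simulated in plain Python. This is a simplified sketch of the idea, not the Keras implementation. The function name and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.001):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:  # an improvement must exceed min_delta
            best, wait = loss, 0
        else:
            wait += 1                # one more epoch without improvement
            if wait >= patience:
                return epoch
    return None


# Hypothetical validation losses, not taken from the run in this notebook
losses = [0.70, 0.60, 0.55, 0.5495, 0.56, 0.551, 0.50]
print(early_stopping_epoch(losses))  # 6
```

Epoch 4 lowers the loss by less than `min_delta`, so it counts as no improvement. Epochs 5 and 6 then use up the remaining patience, and the better value in epoch 7 is never reached.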

### Train

In [21]:
history = classifier.fit(train_generator, validation_data=validation_generator, epochs = 10, callbacks = [early_stop_callback, cp_callback])

Epoch 1/10
Epoch 00001: val_loss improved from inf to 0.68281, saving model to /content/gdrive/My Drive/Checkpoints/cp0001.ckpt
Epoch 2/10
Epoch 00002: val_loss improved from 0.68281 to 0.63359, saving model to /content/gdrive/My Drive/Checkpoints/cp0002.ckpt
Epoch 3/10
Epoch 00003: val_loss improved from 0.63359 to 0.59518, saving model to /content/gdrive/My Drive/Checkpoints/cp0003.ckpt
Epoch 4/10
Epoch 00004: val_loss improved from 0.59518 to 0.52922, saving model to /content/gdrive/My Drive/Checkpoints/cp0004.ckpt
Epoch 5/10
Epoch 00005: val_loss improved from 0.52922 to 0.47029, saving model to /content/gdrive/My Drive/Checkpoints/cp0005.ckpt
Epoch 6/10
Epoch 00006: val_loss did not improve from 0.47029
Epoch 7/10
Epoch 00007: val_loss improved from 0.47029 to 0.43420, saving model to /content/gdrive/My Drive/Checkpoints/cp0007.ckpt
Epoch 8/10
Epoch 00008: val_loss improved from 0.43420 to 0.40666, saving model to /content/gdrive/My Drive/Checkpoints/cp0008.ckpt
Epoch 9/10
Epoch 0
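
`fit` returns a `History` object whose `history` attribute maps each metric name to a list with one value per epoch. A small helper such as the hypothetical `best_epoch` below can pick out the best epoch. For illustration it is applied to the first five `val_loss` values from the log above; later epochs are left out because the log is incomplete.

```python
def best_epoch(history_dict, metric="val_loss"):
    """Return (1-based epoch, value) of the lowest value of `metric`."""
    values = history_dict[metric]
    index = min(range(len(values)), key=values.__getitem__)
    return index + 1, values[index]


# First five val_loss values copied from the training log
logged = {"val_loss": [0.68281, 0.63359, 0.59518, 0.52922, 0.47029]}
print(best_epoch(logged))  # (5, 0.47029)

# With the real run: best_epoch(history.history)
```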

### Freeze Model
The model needs to be frozen before it is deployed with OpenVINO.

In [0]:
for layer in classifier.layers:
  layer.trainable = False

### Save Model
This saves the model's architecture, weights, and training configuration, which allows you to export the model so it can be used without access to the original Python code.

Saving a fully functional model is useful because you can load it in TensorFlow.js or run it on mobile devices using TensorFlow Lite.

In [0]:
# Save the model in your Drive so that it can be reloaded or converted later
model_path = "/content/gdrive/My Drive/Models/pet_classification"
model_dir = os.path.dirname(model_path)

# Create the directory if it does not exist yet
os.makedirs(model_dir, exist_ok=True)

In [24]:
classifier.save(model_path)

Instructions for updating:
If using Keras pass *_constraint arguments to layers.
INFO:tensorflow:Assets written to: /content/gdrive/My Drive/Models/pet_classification/assets


To load the saved model, you can use the following code:

In [0]:
classifier = tf.keras.models.load_model(model_path)
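
The loaded model outputs a single sigmoid value per image. `flow_from_directory` assigns class indices in alphanumeric order of the folder names, so with the `cats` and `dogs` folders used here the output is expected to approximate the probability of "dogs"; `train_generator.class_indices` can be printed to confirm this. The helper below is a hypothetical sketch of turning that value into a label.

```python
# Assumed mapping: alphanumeric order of the folder names
CLASS_INDICES = {"cats": 0, "dogs": 1}


def label_from_probability(probability, threshold=0.5):
    """Map the sigmoid output to a class name."""
    return "dogs" if probability >= threshold else "cats"


print(label_from_probability(0.92))  # dogs
print(label_from_probability(0.08))  # cats
```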

In [26]:
!pip install git+https://github.com/onnx/tensorflow-onnx

Collecting git+https://github.com/onnx/tensorflow-onnx
  Cloning https://github.com/onnx/tensorflow-onnx to /tmp/pip-req-build-ynlq3l0g
  Running command git clone -q https://github.com/onnx/tensorflow-onnx /tmp/pip-req-build-ynlq3l0g
Collecting onnx>=1.4.1
  Downloading https://files.pythonhosted.org/packages/36/ee/bc7bc88fc8449266add978627e90c363069211584b937fd867b0ccc59f09/onnx-1.7.0-cp36-cp36m-manylinux1_x86_64.whl (7.4MB)
Building wheels for collected packages: tf2onnx
  Building wheel for tf2onnx (setup.py) ... done
  Created wheel for tf2onnx: filename=tf2onnx-1.7.0-cp36-none-any.whl size=181185 sha256=58643f86a4667e6eeb70cda6b3b46361fe72ed45ae5f15620060209c55620cda
  Stored in directory: /tmp/pip-ephem-wheel-cache-1_34hwg0/wheels/db/db/21/74f30455028095a1ee011391af71fb68fde8660aad68602f2a
Successfully built tf2onnx
Installing collected packages: onnx, tf2onnx
Successfully installed onnx-1.7.0 tf2onnx-1.

In [27]:
!python -m tf2onnx.convert --saved-model '/content/gdrive/My Drive/Models/pet_classification' --output '/content/gdrive/My Drive/Models/model.onnx'

2020-06-13 22:17:31.645842: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-06-13 22:17:33.739935: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-06-13 22:17:33.742879: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-13 22:17:33.743771: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:00:04.0 name: Tesla P100-PCIE-16GB computeCapability: 6.0
coreClock: 1.3285GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2020-06-13 22:17:33.743830: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-06-13 22:17:33.745802: I tensorflow/stream_executor/pl

In [0]:
from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2


def freeze(model, outputdir):
    # Convert Keras model to ConcreteFunction
    full_model = tf.function(lambda x: model(x))
    full_model = full_model.get_concrete_function(
        tf.TensorSpec((1, 150, 150, 3), tf.float32)
    )
    # Replace the variables with constants to obtain a frozen ConcreteFunction
    frozen_func = convert_variables_to_constants_v2(full_model)
    # Save frozen graph from frozen ConcreteFunction to hard drive
    path = tf.io.write_graph(graph_or_graph_def=frozen_func.graph,
                             logdir=outputdir,
                             name="frozen_graph.pb",
                             as_text=False)

    print(path)
    layers = [op.name for op in frozen_func.graph.get_operations()]

    print("Frozen model layers: ")
    for layerName in layers:
        print(f"layer: {layerName}")
 
    print("-" * 50)
    print("Frozen model inputs: ")
    print(frozen_func.inputs)
    print("Frozen model outputs: ")
    print(frozen_func.outputs)


In [29]:
freeze(classifier, '/content/gdrive/My Drive/Models/')

/content/gdrive/My Drive/Models/frozen_graph.pb
Frozen model layers: 
layer: x
layer: classifier/9244
layer: classifier/9230
layer: classifier/9242
layer: classifier/9252
layer: classifier/9232
layer: classifier/9246
layer: classifier/9238
layer: classifier/9234
layer: classifier/9250
layer: classifier/9240
layer: classifier/9236
layer: classifier/9248
layer: Func/classifier/StatefulPartitionedCall/input_control_node/_0
layer: Func/classifier/StatefulPartitionedCall/input/_1
layer: Func/classifier/StatefulPartitionedCall/StatefulPartitionedCall/input_control_node/_16
layer: Func/classifier/StatefulPartitionedCall/StatefulPartitionedCall/input/_17
layer: Func/classifier/StatefulPartitionedCall/StatefulPartitionedCall/conv2d/StatefulPartitionedCall/input_control_node/_32
layer: Func/classifier/StatefulPartitionedCall/StatefulPartitionedCall/conv2d/StatefulPartitionedCall/input/_33
layer: Func/classifier/StatefulPartitionedCall/input/_2
layer: Func/classifier/StatefulPartitionedCall/Stat

## To Do
1. Add more comments
2. Try improving validation accuracy to 95%
3. Compare against the test data
4. Add plots
5. (New) Add Keras Tuner