# Demo:  CNN Classifier for CIFAR10

In this demo, you will learn to:

* Load the classic CIFAR10 dataset from keras and visualize the images
* Train and test a simple CNN classifier for the dataset
* Enhance the classifier with batch normalization, dropout, and data augmentation, and evaluate the relative performance gains.


The [CIFAR-10 dataset](https://www.cs.toronto.edu/~kriz/cifar.html) is a collection of 60,000 color images of `32x32` pixels in ten classes. The classes are common objects such as airplanes, automobiles, birds, and cats. There are 50,000 training and 10,000 test images. Keras can automatically download the dataset through `keras.datasets`. Note that downloading the dataset for the first time will take some time.

State-of-the-art results are achieved using very large convolutional neural networks. Model performance is reported in this [classification accuracy table](http://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html#43494641522d3130), with state-of-the-art results at 96.5%. Note that human performance on this problem is roughly 94%.  In this demo, we will use a very basic, shallow CNN.  With suitable enhancements, we can achieve an accuracy of ~78%.

Most of the code and description in this demo are taken from `cifar10_cnn.py`, available on the [keras-team GitHub page](https://github.com/keras-team/keras/blob/master/examples/cifar10_cnn.py), as well as from the tutorial page by [Jason Brownlee](https://machinelearningmastery.com/object-recognition-convolutional-neural-networks-keras-deep-learning-library/).  Also, thanks to [Phil Schniter](http://www2.ece.ohio-state.edu/~schniter/) for helping adjust some parameters.


## Loading Basic Packages

We first load some basic packages.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
from tensorflow.keras.datasets import cifar10
import pickle

## Computing Environment

This demo will be **very slow** unless a GPU is used.  The cell below is one way to test whether your installation of TensorFlow has access to a GPU. There are many reasons it may not: for example, the standard TensorFlow package has historically not supported GPUs on Macs.

If you do not have access to a GPU on your local machine, we highly recommend running this lab through [Google Colab](https://colab.research.google.com/). To make sure Colab is using a GPU, open the Runtime menu and choose Change runtime type, then select GPU under Hardware accelerator. Colab also has access to Google's special-purpose [Tensor Processing Units](https://en.wikipedia.org/wiki/Tensor_processing_unit), but we found these to be significantly slower than the standard GPU acceleration.

In [None]:
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
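
As an alternative check, TensorFlow 2.x releases also provide `tf.config.list_physical_devices`, which lists only the devices of a given type. The sketch below assumes that API is available in your installation; it falls back to an empty list if TensorFlow cannot be imported.

```python
# Sketch: count the visible GPUs with the TF 2.x config API.
# Assumes a TensorFlow 2.x installation; falls back to an empty list otherwise.
try:
    import tensorflow as tf
    gpus = tf.config.list_physical_devices('GPU')
except ImportError:
    gpus = []

if gpus:
    print('Found %d GPU(s):' % len(gpus))
    for gpu in gpus:
        print('  ' + gpu.name)
else:
    print('No GPU found; training will run on the CPU.')
```

If the list is empty on Colab, the runtime is most likely still set to CPU.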

We now load the data and create a 3×3 plot of sample photographs. The images are displayed scaled up from their small 32×32 size, but you can clearly see objects such as trucks, horses, and cars. You can also see distortion in some images that have been forced to a square aspect ratio.

In [None]:
# load data
(Xtr,ytr), (Xts,yts) = cifar10.load_data()
ntr, nrow, ncol, nchan = Xtr.shape
nts = Xts.shape[0]

print('Xtr shape:  ' + str(Xtr.shape))
print('Xts shape:  ' + str(Xts.shape))

In [None]:
def plt_image(im):
    plt.imshow(im)
    plt.xticks([])
    plt.yticks([])
    
# Plot a few random samples
nplot = 9
I = np.random.permutation(ntr)
for i in range(nplot):
    plt.subplot(3,3,i+1)
    plt_image(Xtr[I[i]])
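
The labels in `ytr` are integers from 0 to 9. The sketch below maps them to class names, assuming the standard CIFAR-10 label order listed on the dataset page. The `class_names` list and the small `labels` array are illustrative stand-ins and are not part of the original demo.

```python
import numpy as np

# Standard CIFAR-10 class order (label 0 through label 9)
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

# Hypothetical labels with the same (n, 1) shape that cifar10.load_data() returns
labels = np.array([[6], [9], [9], [4]], dtype=np.uint8)

# Flatten to 1-D before indexing into the list of names
names = [class_names[k] for k in labels.ravel()]
print(names)
```

In the plotting loop above, a call such as `plt.title(class_names[ytr[I[i], 0]])` would label each image with its class.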

Import some more packages for building our CNN model and saving the trained model.

In [None]:
from __future__ import print_function

from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential
from tensorflow.keras.models import load_model #save and load models
from tensorflow.keras.layers import Dense, Dropout, Activation, Flatten, Conv2D, MaxPooling2D, BatchNormalization
from tensorflow.keras import optimizers
import tensorflow.keras.backend as K

The pixel values are in the range of 0 to 255 for each of the red, green and blue channels.

It is good practice to work with normalized data. Because the range of the input values is known, we can easily normalize to the range 0 to 1 by dividing each value by the maximum possible value, which is 255.  Note that the data is loaded as integers, so we cast it to floating-point values before performing the division.

In [None]:
Xtr = Xtr.astype('float32') / 255.
Xts = Xts.astype('float32') / 255.
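
The effect of the cast and division can be checked on a tiny array. The sketch below uses a made-up 2×2 RGB image in place of the real data and confirms that the result is `float32` with values in the range 0 to 1.

```python
import numpy as np

# Hypothetical 2x2 RGB image with uint8 pixels, standing in for one CIFAR-10 image
im = np.array([[[0, 128, 255], [64, 64, 64]],
               [[255, 0, 0], [10, 20, 30]]], dtype=np.uint8)

# Same operation as applied to Xtr and Xts above
im_norm = im.astype('float32') / 255.

print(im.dtype, im.min(), im.max())
print(im_norm.dtype, im_norm.min(), im_norm.max())
```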

## Testing Different Classifiers

We now define a function to create a model.  The function has two parameters:

* `use_bn`:  Adds BatchNormalization.
* `use_dropout`:  Adds Dropout.

By setting the parameters, we can experiment with different model features and compare their performance.  The model has two convolutional layers followed by two fully connected (FC) layers.  Dropout, if added, is applied at the input of each FC layer.

In [None]:
def create_mod(use_dropout=False, use_bn=False):
    num_classes = 10
    model = Sequential()
    model.add(Conv2D(32, (3, 3), 
                     padding='valid', activation='relu',
                     input_shape=Xtr.shape[1:]))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    if use_bn:
        model.add(BatchNormalization())
    model.add(Conv2D(32, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    if use_bn:
        model.add(BatchNormalization())
    if use_dropout:
        model.add(Dropout(0.5))
    model.add(Dense(512, activation = 'relu'))
    if use_bn:
        model.add(BatchNormalization())
    if use_dropout:
        model.add(Dropout(0.5))
    model.add(Dense(num_classes, activation='softmax'))
    
    return model
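
To see where the parameters of the basic model (no batch normalization, no dropout) come from, the layer sizes can be worked out by hand. The sketch below assumes stride-1 `'valid'` convolutions and non-overlapping 2×2 pooling, as in `create_mod`; the helper names `conv_out` and `pool_out` are introduced here for illustration only. The totals should agree with what `model.summary()` reports for the basic model.

```python
def conv_out(n, k=3):
    """Output size of a 'valid' convolution with stride 1."""
    return n - k + 1

def pool_out(n, p=2):
    """Output size of non-overlapping max pooling (floor division)."""
    return n // p

n, c_in, c_out, k = 32, 3, 32, 3

# First conv + pool: 32 -> 30 -> 15
n1 = pool_out(conv_out(n, k))
p1 = k * k * c_in * c_out + c_out      # weights + biases

# Second conv + pool: 15 -> 13 -> 6
n2 = pool_out(conv_out(n1, k))
p2 = k * k * c_out * c_out + c_out

# Flatten, then the two FC layers (512 units and 10 classes)
n_flat = n2 * n2 * c_out
p3 = n_flat * 512 + 512
p4 = 512 * 10 + 10

total = p1 + p2 + p3 + p4
print('flattened features:', n_flat)
print('params per layer:', p1, p2, p3, p4)
print('total params:', total)
```

Almost all of the parameters sit in the first FC layer, which is one reason dropout is applied around the FC layers.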

To study data augmentation, we also create an `ImageDataGenerator` object that will create augmented images for the training data set.

In [None]:
def create_datagen():
    datagen = ImageDataGenerator(
        rotation_range=10,  # randomly rotate images in the range (degrees, 0 to 180)
        # randomly shift images horizontally (fraction of total width)
        width_shift_range=0.05,
        # randomly shift images vertically (fraction of total height)
        height_shift_range=0.05,
        horizontal_flip=True,  # randomly flip images
        # image data format, either "channels_first" or "channels_last"
        data_format="channels_last")
    return datagen
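
To get a feel for the size of these transformations, the sketch below reproduces two of them directly in NumPy on a made-up image. It only illustrates the idea and is not how `ImageDataGenerator` is implemented internally.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 32x32 RGB image with values in [0, 1]
im = rng.random((32, 32, 3)).astype('float32')

# Horizontal flip: reverse the column (width) axis
im_flip = im[:, ::-1, :]

# Largest shift, in pixels, implied by a fractional shift range of 0.05
max_shift = 0.05 * im.shape[1]

print(im_flip.shape, max_shift)
```

With `32x32` images, a shift range of 0.05 moves the image by less than two pixels, so the default settings above are fairly mild.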

We can visualize randomly transformed images below. You can change the parameters above to generate more extreme random changes.

In [None]:
def plt_image(im):
    plt.imshow(im)
    plt.xticks([])
    plt.yticks([])
    
# Plot a few randomly transformed samples
datagen1 = create_datagen()
nplot = 9
for i in range(nplot):
    plt.subplot(3,3,i+1)
    plt_image(datagen1.random_transform(Xtr[I[i]]))

We now train the model under each of the following options:
        
* `basic`:  Basic CNN, no batch normalization or dropout   
* `bn`:  Basic CNN + batch normalization
* `dropout`:  Basic CNN + batch normalization + dropout
* `dataaug`:  Basic CNN + batch normalization + dropout + data augmentation

This will take tens of minutes per model even on a GPU. 

We run each model in a separate code cell; otherwise, Colab might time out for "inactivity" on a single block. First, we set some common parameters:

In [None]:
# Parameters
nepochs = 100
batch_size = 32
lr = 1e-3
decay = 1e-4

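# Create the optimizer
# (`learning_rate` replaces the deprecated `lr` argument; some newer
# TensorFlow releases no longer accept `decay`)
opt = optimizers.RMSprop(learning_rate=lr, decay=decay)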
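
To see what the `decay` setting does over a full run, the sketch below evaluates the learning rate at a few epochs. It assumes the time-based rule `lr / (1 + decay * step)` used by the older Keras optimizers, with one step per mini-batch; the helper `decayed_lr` is introduced here for illustration.

```python
import math

ntr = 50000          # number of CIFAR-10 training images
batch_size = 32
nepochs = 100
lr = 1e-3
decay = 1e-4

# One optimizer update per mini-batch
steps_per_epoch = math.ceil(ntr / batch_size)

def decayed_lr(step):
    """Assumed time-based rule: lr / (1 + decay * step)."""
    return lr / (1.0 + decay * step)

for epoch in [0, 1, 10, nepochs]:
    step = epoch * steps_per_epoch
    print('epoch %3d: lr = %.2e' % (epoch, decayed_lr(step)))
```

Under this assumption, the learning rate falls by more than a factor of ten over the 100 epochs.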

In [None]:
# The following line you can ignore. It was needed to properly use the current version of Tensorflow on my Macbook 
# due to issues with OpenMP. Leaving here in case it's useful for others.
import os
os.environ['KMP_DUPLICATE_LIB_OK']='True'

In [None]:
# basic model

K.clear_session()
model = create_mod()  

# Compile (compile() returns None, so there is nothing to assign)
model.compile(loss='sparse_categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
model.summary()

# Fit the model
hist = model.fit(Xtr, ytr, batch_size=batch_size, epochs=nepochs, validation_data=(Xts, yts), shuffle=True)

# Save history
mod_name = 'basic'
hist_fn = ('hist_%s.p' % mod_name)
with open(hist_fn, 'wb') as fp:
    hist_dict = hist.history
    pickle.dump(hist_dict, fp) 
print('History saved as %s' % hist_fn)                                
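
The history is saved as a plain dictionary of per-epoch lists, so it can be reloaded later without retraining. The sketch below shows the save/load round trip with made-up values and a temporary file name; neither comes from an actual training run.

```python
import os
import pickle
import tempfile

# Made-up history values, shaped like the dict in hist.history
hist_dict = {'accuracy': [0.45, 0.58, 0.64],
             'val_accuracy': [0.52, 0.60, 0.63]}

# Write the dictionary to a temporary file
hist_fn = os.path.join(tempfile.gettempdir(), 'hist_example.p')
with open(hist_fn, 'wb') as fp:
    pickle.dump(hist_dict, fp)

# Read it back and confirm nothing was lost
with open(hist_fn, 'rb') as fp:
    loaded = pickle.load(fp)

print(sorted(loaded.keys()))
print(loaded == hist_dict)

os.remove(hist_fn)
```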

In [None]:
# bn model (batch normalization added)

# Create the CNN model with batch normalization
K.clear_session()
model = create_mod(use_bn=True)

# Compile
model.compile(loss='sparse_categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
model.summary()

# Fit the model
hist = model.fit(Xtr, ytr, batch_size=batch_size, epochs=nepochs, validation_data=(Xts, yts), shuffle=True)

# Save history
mod_name = 'bn'
hist_fn = ('hist_%s.p' % mod_name)
with open(hist_fn, 'wb') as fp:
    hist_dict = hist.history
    pickle.dump(hist_dict, fp) 
print('History saved as %s' % hist_fn)                                

In [None]:
# dropout model (batch normalization and dropout added)

# Create the CNN model with batch normalization and dropout
K.clear_session()
model = create_mod(use_bn=True, use_dropout=True)

# Compile
model.compile(loss='sparse_categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
model.summary()

# Fit the model
hist = model.fit(Xtr, ytr, batch_size=batch_size, epochs=nepochs, validation_data=(Xts, yts), shuffle=True)

# Save history
mod_name = 'dropout'
hist_fn = ('hist_%s.p' % mod_name)
with open(hist_fn, 'wb') as fp:
    hist_dict = hist.history
    pickle.dump(hist_dict, fp) 
print('History saved as %s' % hist_fn)                                

In [None]:
## NOTE: For some reason, running model.fit with a data generator is not working in Google Colab right now.
## I'm having trouble diagnosing why, so I have commented out this part of the lab for now.

'''
# dataaug model (data augmentation added)

# Create the CNN model with batch normalization and dropout
K.clear_session()
model = create_mod(use_bn=True, use_dropout=True)

# Compile
model.compile(loss='sparse_categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
model.summary()

# Create a data augmentation object
datagen = create_datagen()

# Fit the model
hist = model.fit(datagen.flow(Xtr, ytr, batch_size=batch_size), epochs=nepochs, validation_data=(Xts,yts)) 

# Save history
mod_name = 'dataaug'
hist_fn = ('hist_%s.p' % mod_name)
with open(hist_fn, 'wb') as fp:
    hist_dict = hist.history
    pickle.dump(hist_dict, fp) 
print('History saved as %s' % hist_fn) 
'''

We now plot the results.  You should get test accuracies of approximately:

* `baseline` ~71%
* `bn` ~75%
* `bn+dropout` ~79%

So batch normalization and dropout help.  Note that `baseline` and `bn` reach training accuracies of ~100%, which suggests overfitting.

In [None]:
mod_name_plot = ['basic', 'bn', 'dropout']
plt.figure(figsize=(10,5))
for iplt in range(2):

    # Left panel: training accuracy; right panel: test accuracy
    key = 'accuracy' if iplt == 0 else 'val_accuracy'

    plt.subplot(1,2,iplt+1)
    for i, mod_name in enumerate(mod_name_plot):

        # Load history
        hist_fn = ('hist_%s.p' % mod_name)
        with open(hist_fn, 'rb') as fp:
            hist_dict = pickle.load(fp)

        acc = hist_dict[key]
        plt.plot(acc, '-', linewidth=3)

    nepochs = len(acc)
    plt.grid()
    plt.xlim([0, nepochs])
    plt.legend(['baseline', 'bn', 'bn+dropout'])
    plt.xlabel('Epoch')
    if iplt == 0:
        plt.ylabel('Train accuracy')
    else:
        plt.ylabel('Test accuracy')
        
plt.tight_layout()

Print final accuracies:

In [None]:
for i, mod_name in enumerate(mod_name_plot):

    # Load history
    hist_fn = ('hist_%s.p' % mod_name)
    with open(hist_fn, 'rb') as fp:        
        hist_dict = pickle.load(fp) 
        
    # Print the average test accuracy over the last 5 epochs
    acc = hist_dict['val_accuracy']
    print('%15s :  %5.3f' % (mod_name, np.mean(acc[-5:])))
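
The same saved histories can be used to quantify overfitting as the gap between training and test accuracy. The sketch below uses made-up per-epoch accuracies, not results from this demo, and averages over the last five epochs as above.

```python
import numpy as np

# Made-up per-epoch accuracies for one model (not results from this demo)
hist_dict = {'accuracy':     [0.70, 0.85, 0.93, 0.97, 0.99, 1.00],
             'val_accuracy': [0.66, 0.70, 0.71, 0.71, 0.70, 0.71]}

# Average over the last few epochs to smooth out epoch-to-epoch noise
nlast = 5
train_acc = np.mean(hist_dict['accuracy'][-nlast:])
test_acc = np.mean(hist_dict['val_accuracy'][-nlast:])
gap = train_acc - test_acc

print('train %5.3f  test %5.3f  gap %5.3f' % (train_acc, test_acc, gap))
```

A large gap with training accuracy near 100% is the pattern described above for the models without dropout.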