# GR5242 Final Project Report

## Team members:
- Jia, Kewei (kj2408@columbia.edu)
- Zhang, Yini (yz3005@columbia.edu)
- Zhu, Chenyun (cz2434@columbia.edu)

## Overview

The [CIFAR-10 Dataset](https://www.cs.toronto.edu/~kriz/cifar.html) is a widely used benchmark for image classification. It consists of 60,000 32x32 colour images in 10 classes (airplanes, automobiles, birds, cats, deer, dogs, frogs, horses, ships, and trucks), with 6,000 images per class. There are 50,000 training images and 10,000 test images.

The **GOALS** of this project are to:
- Learn how to preprocess the image data
- Implement different Convolutional Neural Network (CNN) classifiers using GPU-enabled TensorFlow and the Keras API
- Compare different CNN architectures

**Tools:**
- GPU-enabled TensorFlow
- Keras API

## 1. Data Exploration & Preprocessing

(Please refer to the *Data Exploration and Preprocessing.ipynb* for detailed code.)

### 1-1. Data Description

The version we used is [CIFAR-10 python version](https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz).

The original CIFAR-10 training dataset is split into five batch files, each containing 10,000 images. The test dataset is a single file containing 10,000 images. We use the functions in our script **load_data_helper_functions.py** to load both the images and the labels of the training and test data.
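As a rough illustration of the loading step, the sketch below reads the pickled batch files of the CIFAR-10 python version and stacks the five training batches. The function names `load_batch` and `load_training_set` are hypothetical and are not taken from **load_data_helper_functions.py**; the file names `data_batch_1` to `data_batch_5` and the dictionary keys `data` and `labels` follow the layout of the downloaded archive.

```python
import os
import pickle

import numpy as np


def load_batch(path):
    """Load one CIFAR-10 batch file and return (images, labels)."""
    with open(path, "rb") as f:
        # The batch files store their dictionary keys as bytes when read
        # from Python 3, hence encoding="bytes" and the b"..." keys below.
        batch = pickle.load(f, encoding="bytes")
    images = np.asarray(batch[b"data"], dtype=np.uint8)   # (n_images, 3072)
    labels = np.asarray(batch[b"labels"]).reshape(-1, 1)  # (n_images, 1)
    return images, labels


def load_training_set(data_dir):
    """Stack data_batch_1 ... data_batch_5 into one training set."""
    parts = [
        load_batch(os.path.join(data_dir, f"data_batch_{i}"))
        for i in range(1, 6)
    ]
    images = np.concatenate([p[0] for p in parts], axis=0)
    labels = np.concatenate([p[1] for p in parts], axis=0)
    return images, labels
```

The test file (`test_batch` in the same archive) could be read with a single call to `load_batch`.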

The training set is loaded as a NumPy ndarray of shape (50000, 3072), and the test set as a NumPy ndarray of shape (10000, 3072). Each row of the array stores a 32x32 colour image. The first 1024 entries contain the red channel values, the next 1024 the green, and the final 1024 the blue. The image is stored in row-major order, so the first 32 entries of a row are the red channel values of the first row of the image.

The labels for the training and test datasets are NumPy arrays of shape (50000, 1) and (10000, 1), respectively. They are not yet one-hot encoded.

### 1-2. Data Exploration

Here we reshape each row into a (32, 32, 3) NumPy array in which each innermost array is one pixel with three channel values: red, green and blue. The reshaped training data has shape (50000, 32, 32, 3), and the reshaped test data has shape (10000, 32, 32, 3).
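Because each row stores the full red plane, then the green plane, then the blue plane, a direct `reshape(-1, 32, 32, 3)` would scramble the pixels. A minimal sketch of one way to do the conversion is shown below; the function name `rows_to_images` is hypothetical and the notebook may do this differently.

```python
import numpy as np


def rows_to_images(flat):
    """Convert (N, 3072) channel-major rows into (N, 32, 32, 3) images."""
    # Each row is [1024 red | 1024 green | 1024 blue], each plane row-major,
    # so the natural first view is (N, channel, height, width).
    planes = flat.reshape(-1, 3, 32, 32)
    # Move the channel axis last so that images[n, y, x] is one RGB pixel.
    return planes.transpose(0, 2, 3, 1)


# Toy row whose entries equal their own index, to make the layout visible.
row = np.arange(3072).reshape(1, 3072)
images = rows_to_images(row)
print(images.shape)                # (1, 32, 32, 3)
print(images[0, 0, 0].tolist())    # [0, 1024, 2048]
```

The top-left pixel picks up entry 0 from the red plane, entry 1024 from the green plane and entry 2048 from the blue plane, which matches the storage layout described in the data description.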

Then we plot the first 10 images in the training set with their true class labels to get a better understanding of the dataset. The images are plotted using functions in our script **preprocess_data.py**.

__The first 10 images in training set:__


<img style="float: left;" src="figs/first10.png">

### 1-3. Data Preprocessing

To prepare the data for training the CNN models, we take the following steps:

First, we convert the image labels to a one-hot encoding.
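For reference, one-hot encoding turns each integer label into a length-10 vector with a single 1 at the label's index. The NumPy sketch below shows the idea; `to_one_hot` is a hypothetical helper, and Keras also offers `keras.utils.to_categorical` for the same purpose.

```python
import numpy as np


def to_one_hot(labels, num_classes=10):
    """Convert integer labels of shape (N, 1) or (N,) to an (N, num_classes) array."""
    labels = np.asarray(labels).reshape(-1)
    one_hot = np.zeros((labels.size, num_classes), dtype=np.float32)
    # Set exactly one entry per row: the column given by that row's label.
    one_hot[np.arange(labels.size), labels] = 1.0
    return one_hot
```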

Next, we enlarge the training dataset by adding randomly distorted images, which are cropped, horizontally flipped, or adjusted in hue, contrast and saturation. Distorting images in this way exposes the model to more variation during training, and should therefore help the trained CNN generalize better to the test dataset. We got this idea of data preprocessing from [Magnus Erik Hvass Pedersen](http://www.hvass-labs.org/).
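The sketch below illustrates this kind of random distortion with plain NumPy. It is not the code from our notebook: the function name `random_distort`, the crop size of 24 and the contrast range of 0.5 to 1.5 are assumptions chosen for illustration, and the hue and saturation adjustments are left out to keep the example short.

```python
import numpy as np


def random_distort(image, crop_size=24, rng=None):
    """Return a randomly cropped, flipped and contrast-adjusted copy.

    image is a (32, 32, 3) uint8 array; crop_size=24 is an assumed value.
    """
    rng = np.random.default_rng() if rng is None else rng
    height, width, _ = image.shape

    # Random crop: pick the top-left corner so the window stays inside the image.
    top = rng.integers(0, height - crop_size + 1)
    left = rng.integers(0, width - crop_size + 1)
    out = image[top:top + crop_size, left:left + crop_size].astype(np.float32)

    # Horizontal flip with probability 0.5.
    if rng.random() < 0.5:
        out = out[:, ::-1]

    # Contrast: scale each channel's deviation from its mean by a random factor.
    factor = rng.uniform(0.5, 1.5)
    mean = out.mean(axis=(0, 1), keepdims=True)
    out = (out - mean) * factor + mean

    return np.clip(out, 0, 255).astype(np.uint8)
```

Calling the function several times on the same image yields different variants, which is how a set of distorted examples of one image can be produced.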

Last, the test images are only cropped around the center, without any other adjustment. The crop size is the same as that used for the training set.
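A center crop can be written as a single slice over a batch of images. In the sketch below, `center_crop` is a hypothetical helper and the default crop size of 24 is an assumption; the only requirement from the text is that it matches the crop size used for training.

```python
import numpy as np


def center_crop(images, crop_size=24):
    """Crop a batch of (N, H, W, C) images around the center."""
    _, height, width, _ = images.shape
    # Leave an equal margin on both sides of each spatial axis.
    top = (height - crop_size) // 2
    left = (width - crop_size) // 2
    return images[:, top:top + crop_size, left:left + crop_size, :]
```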

__Plot the distorted images__<br>
Here are 10 examples of the 321st image in the test dataset after preprocessing:

<img style="float: left;" src="figs/distorted.png"> 

As we can see, the distorted images are either flipped or adjusted in some way that makes them differ from the original image. These images will later be used to train the CNN models.

## 2. Convolutional Neural Networks (CNN) with Keras
### 2-1. Brief Introduction for Keras

"Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano." (1) We tried Keras for the following reasons:
- Easy to get started with
- Results in much more readable and succinct code
- Able to run on GPU (much faster than CPU)

### 2-2. CNN with Keras

__I. We first created a basic CNN and trained it for 100 epochs.__<br>
(Please refer to the *Keras_CNN_Baseline.ipynb* for detailed code.)

__Architecture:__ <br>
input = ($32\times32\times3$) -- <br>
2D convolution layer with $32$ filters and kernel size $3\times3$ plus ReLU activation -- <br>
2D convolution layer with $32$ filters and kernel size $3\times3$ plus ReLU activation -- <br>
2D Maxpooling -- <br>
20% Dropout -- <br>

2D convolution layer with $64$ filters and kernel size $3\times3$ plus ReLU activation -- <br>
2D convolution layer with $64$ filters and kernel size $3\times3$ plus ReLU activation -- <br>
2D Maxpooling -- <br>
25% Dropout -- <br>
Softmax Output
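As a hedged sketch of how the architecture listed above might be expressed with the Keras API, the function below stacks the same layers. Several details are assumptions that the listing does not state: the `tensorflow.keras` import path, `padding="same"`, the 2x2 pooling window, and the `Flatten` layer placed before the softmax `Dense` layer. The function name `build_baseline_cnn` is hypothetical; the notebook remains the reference for the actual code.

```python
def build_baseline_cnn(input_shape=(32, 32, 3), num_classes=10):
    """Build a two-block CNN following the architecture listed above."""
    # Imported inside the function so that merely defining it
    # does not require TensorFlow to be installed.
    from tensorflow import keras
    from tensorflow.keras import layers

    return keras.Sequential([
        keras.Input(shape=input_shape),
        # Block 1: two 3x3 convolutions with 32 filters each.
        layers.Conv2D(32, (3, 3), padding="same", activation="relu"),  # padding is assumed
        layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),                         # pool size is assumed
        layers.Dropout(0.20),
        # Block 2: two 3x3 convolutions with 64 filters each.
        layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
        layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Dropout(0.25),
        # Assumed: flatten the feature maps before the softmax classifier.
        layers.Flatten(),
        layers.Dense(num_classes, activation="softmax"),
    ])
```

Calling `build_baseline_cnn().summary()` prints the layer-by-layer output shapes and parameter counts.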

__Baseline CNN Result:__

<img style="float: left; width:60%" src="output/baseline_CNN.png">

__Insights:__ <br>
Notice that the test accuracy is 79.7%. Let's add more layers to the model and see whether the test accuracy improves.

__II. Next, we added more layers to the previous model.__<br>
(Please refer to the *Keras_CNN_Baseline_More_Layer.ipynb* for detailed code.)

__Architecture:__ <br>
input = ($32\times32\times3$) -- <br>
2D convolution layer with $32$ filters and kernel size $3\times3$ plus ReLU activation -- <br>
2D convolution layer with $32$ filters and kernel size $3\times3$ plus ReLU activation -- <br>
2D Maxpooling -- <br>
20% Dropout -- <br>

2D convolution layer with $64$ filters and kernel size $3\times3$ plus ReLU activation -- <br>
2D convolution layer with $64$ filters and kernel size $3\times3$ plus ReLU activation -- <br>
2D Maxpooling -- <br>
25% Dropout -- <br>

2D convolution layer with $128$ filters and kernel size $3\times3$ plus ReLU activation -- <br>
2D convolution layer with $128$ filters and kernel size $3\times3$ plus ReLU activation -- <br>
2D Maxpooling -- <br>
30% Dropout -- <br>
Softmax Output
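To see what the extra block does to the feature maps, the short calculation below traces the spatial size and channel count after each conv-conv-pool block of the three-block architecture listed above. It assumes `"same"` padding for the convolutions and 2x2 max pooling, neither of which is stated in the listing; `trace_shapes` is a hypothetical helper.

```python
def trace_shapes(size=32, block_filters=(32, 64, 128), pool=2):
    """Return (spatial_size, channels) after each conv-conv-pool block."""
    shapes = []
    for filters in block_filters:
        # With 'same' padding the 3x3 convolutions keep the spatial size,
        # so only the pooling layer shrinks it.
        size = size // pool
        shapes.append((size, filters))
    return shapes


shapes = trace_shapes()
final_size, final_channels = shapes[-1]
flattened = final_size * final_size * final_channels

print(shapes)     # [(16, 32), (8, 64), (4, 128)]
print(flattened)  # 2048
```

Under these assumptions the third block halves the feature maps once more, to 4x4 with 128 channels, so 2048 values would reach the output layer after flattening.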

__Baseline CNN with More Layers Result:__

<img style="float: left; width:60%" src="output/baseline_CNN_moreLayers.png">

__Insights:__<br>
The good news is that the test accuracy increases from 79.7% to 83.1%. However, the model appears to be overfitting.

__III. Try to prevent overfitting by adding batch normalization and a kernel regularizer__<br>
The idea comes from a Kaggle comment by [EricAlcaideAldeano](https://www.kaggle.com/ericalcaide9834/discussion). <br>
(Please refer to the *Keras_CNN_Prevent_Overfitting.ipynb* for detailed code.)

__Architecture:__ <br>
input = ($32\times32\times3$) -- <br>
2D convolution layer with $32$ filters and kernel size $3\times3$ plus ReLU activation and *regularizer(0.001)* -- <br>
*BatchNormalization* -- <br>
2D convolution layer with $32$ filters and kernel size $3\times3$ plus ReLU activation and *regularizer(0.001)* -- <br>
*BatchNormalization* -- <br>
2D Maxpooling -- <br>
20% Dropout -- <br>

2D convolution layer with $64$ filters and kernel size $3\times3$ plus ReLU activation and *regularizer(0.001)* -- <br>
*BatchNormalization* -- <br>
2D convolution layer with $64$ filters and kernel size $3\times3$ plus ReLU activation and *regularizer(0.001)* -- <br>
*BatchNormalization* -- <br>
2D Maxpooling -- <br>
25% Dropout -- <br>

2D convolution layer with $128$ filters and kernel size $3\times3$ plus ReLU activation and *regularizer(0.001)* -- <br>
*BatchNormalization* -- <br>
2D convolution layer with $128$ filters and kernel size $3\times3$ plus ReLU activation and *regularizer(0.001)* -- <br>
*BatchNormalization* -- <br>
2D Maxpooling -- <br>
30% Dropout -- <br>
Softmax Output
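A possible Keras rendering of the regularized architecture listed above is sketched below. The listing only says *regularizer(0.001)*, so treating it as an L2 penalty on the kernel is an assumption, as are `padding="same"`, the 2x2 pooling window, the `Flatten` layer and the `tensorflow.keras` import path. The function name `build_regularized_cnn` is hypothetical.

```python
def build_regularized_cnn(input_shape=(32, 32, 3), num_classes=10,
                          weight_decay=0.001):
    """Build the three-block CNN with batch normalization and a kernel regularizer."""
    # Imported inside the function so that merely defining it
    # does not require TensorFlow to be installed.
    from tensorflow import keras
    from tensorflow.keras import layers, regularizers

    model = keras.Sequential()
    model.add(keras.Input(shape=input_shape))

    # (filters, dropout rate) for each conv-conv-pool block, as listed above.
    for filters, drop_rate in [(32, 0.20), (64, 0.25), (128, 0.30)]:
        for _ in range(2):
            model.add(layers.Conv2D(
                filters, (3, 3),
                padding="same",                                       # assumed
                activation="relu",
                kernel_regularizer=regularizers.l2(weight_decay),     # L2 is assumed
            ))
            model.add(layers.BatchNormalization())
        model.add(layers.MaxPooling2D(pool_size=(2, 2)))              # assumed size
        model.add(layers.Dropout(drop_rate))

    # Assumed: flatten the feature maps before the softmax classifier.
    model.add(layers.Flatten())
    model.add(layers.Dense(num_classes, activation="softmax"))
    return model
```

The L2 penalty adds a term proportional to the squared kernel weights to the training loss, which is what pushes the weights toward small values.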

__Result of the Previous Model with Batch Normalization and Kernel Regularizer Added:__

<img style="float: left; width:60%" src="output/prevent_overfit.png">

__Insights:__<br>
After applying batch normalization and the kernel regularizer, the test accuracy of 85.2% is better than the previous model's 83.1%. In addition, the model generalizes better, as the graph shows: the gap between training accuracy and validation accuracy is smaller than before.
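The gap mentioned here can also be computed directly rather than read off the plot. The helper below is a hypothetical sketch that takes a dictionary shaped like the `history` attribute of the object returned by Keras `model.fit`; it accepts both the older `acc` and the newer `accuracy` metric names.

```python
def generalization_gap(history):
    """Return the per-epoch difference: training accuracy minus validation accuracy."""
    # Older Keras releases name the metric 'acc', newer ones 'accuracy'.
    key = "accuracy" if "accuracy" in history else "acc"
    return [train - val for train, val in zip(history[key], history["val_" + key])]
```

A gap that keeps growing over the epochs is the usual sign of overfitting, while a small and stable gap suggests the model generalizes better.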

### 2-3. Findings of CNN with Keras

- Adding more layers can achieve higher accuracy.
- Batch normalization and the kernel regularizer help prevent overfitting and keep the weights small, so that the model generalizes well.

### 2-4. Next Steps

The next step is to apply data augmentation to the model. Running 25 epochs takes 17 hours, so limited time is the main challenge in adding data augmentation.
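To put the timing in perspective, the quick calculation below extrapolates the quoted figure to 100 epochs, the number used for the baseline run. It assumes that every epoch takes the same amount of time, which is only an approximation.

```python
hours_measured = 17.0   # quoted above
epochs_measured = 25    # quoted above
target_epochs = 100     # same epoch count as the baseline run

hours_per_epoch = hours_measured / epochs_measured
minutes_per_epoch = hours_per_epoch * 60
projected_hours = hours_per_epoch * target_epochs

print(f"{minutes_per_epoch:.1f} minutes per epoch")                 # 40.8 minutes per epoch
print(f"{projected_hours:.0f} hours for {target_epochs} epochs")    # 68 hours for 100 epochs
```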

## Reference

(1) *Keras: The Python Deep Learning Library. keras.io/#keras-the-python-deep-learning-library.*