![GIKI_logo_with_text.png](attachment:GIKI_logo_with_text.png)

<h1><center>Deep Neural Networks (AI341) - Assignment No. 2 </center></h1>

# Classification of the CIFAR-10 dataset

The [CIFAR-10 dataset](https://www.cs.toronto.edu/~kriz/cifar.html) provides 60000 32x32-pixel images, classified into 10 categories.  The figure below provides a random sample of some images in each category.

![images.png](Dataset.png)

In this assignment, you will learn how to build a Convolutional Neural Network (CNN) which, once trained, can automatically classify new images into one of these categories. You will also learn to optimize your model using the techniques covered in class.  We will use the [Keras library](https://www.tensorflow.org/guide/keras), which provides a high-level interface to TensorFlow.

# Table of contents

[0. Introduction to Keras](#intro_keras)<br>
[1. Understanding the data set](#dataset)<br>
[2. First naive model](#first_model)<br>
[3. Convolutional Neural Network](#cnn)<br>
[4. Interpreting the results](#results)<br>
[4.1 Making predictions](#results_prediction)<br>
[4.2 Evaluating the results](#results_evaluation)<br>
[5. Improving on current performances](#pretrained_cnn)<br>



<a id='intro_keras'></a>

## 0 - Introduction to Keras

Keras is a high-level API to build and train deep learning models. It's used for fast prototyping, advanced research, and production, with three key advantages:

- __User friendly__: Keras has a simple, consistent interface optimized for common use cases. It provides clear and actionable feedback for user errors.
- __Modular and composable__: Keras models are made by connecting configurable building blocks together, with few restrictions.
- __Easy to extend__: Write custom building blocks to express new ideas for research. Create new layers, loss functions, and develop state-of-the-art models.

In Keras, models are built by assembling multiple layers.  Suppose we want to create a new multilayer perceptron model to categorize 128-feature data into 10 labeled categories.  The Keras code looks like:

```python
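from tensorflow import keras
from tensorflow.keras import layers

# Create a sequential model
model = keras.models.Sequential()
# Add a densely-connected layer with 64 units; input_shape is an argument of the layer, not of add()
model.add(layers.Dense(64, activation='relu', input_shape=[128]))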
# Add another
model.add(layers.Dense(32, activation='relu'))
# Add a softmax layer with 10 output units
model.add(layers.Dense(10, activation='softmax'))
```

The `input_shape` argument must be given for the first layer in the model; all other layers automatically determine their input shape from the previous layer.  Note that the code above is substantially simpler than the corresponding low-level TensorFlow code.  This is particularly useful for building convolutional or other types of layers, as we will see.

Once built, a model's learning can be configured with the `compile()` function:

```python
model.compile(
    loss='categorical_crossentropy', 
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    metrics=['accuracy'])
```

In this case, a cross-entropy loss function is used with the Adam optimization algorithm.  The `metrics` argument lets the model keep track of a number of [training metrics](https://www.tensorflow.org/api_docs/python/tf/keras/metrics) during training.
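
To make the loss concrete, here is a minimal NumPy sketch of categorical cross-entropy for one-hot targets. The function name `categorical_crossentropy_np` and the toy values are assumptions for illustration; this is not the Keras implementation, which offers further options such as logits input.

```python
import numpy as np

def categorical_crossentropy_np(y_true, y_pred, eps=1e-7):
    """Mean cross-entropy between one-hot targets and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0)                      # avoid log(0)
    per_example = -np.sum(y_true * np.log(y_pred), axis=1)  # one loss value per row
    return per_example.mean()

# Two examples, three classes (made-up values)
y_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1]])
loss = categorical_crossentropy_np(y_true, y_pred)
print(round(float(loss), 4))  # 0.2899
```

The loss only looks at the probability assigned to the true class, which is why the output layer is expected to produce probabilities (for example through a softmax activation).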

Once configured, training is performed using the `fit()` function.

```python
model.fit(data, labels, epochs=10, batch_size=32)
```

The function takes an array-like object (for example, a NumPy array) of data and the corresponding target values, and optimizes the learnable parameters of the model.  See the documentation for the [fit()](https://www.tensorflow.org/api_docs/python/tf/keras/models/Model#fit) function for more details.

Once trained, the model can be used to predict, using the `predict()` function. 

```python
prediction = model.predict(new_data)
```
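
For a softmax output layer, `predict()` returns one row of class probabilities per input, so a label is usually recovered with `argmax`. A minimal sketch, with a made-up array standing in for the model output (`fake_prediction` and `class_names` are illustrative names, not part of Keras):

```python
import numpy as np

class_names = ['cat', 'dog', 'frog']            # hypothetical 3-class problem
fake_prediction = np.array([[0.1, 0.7, 0.2],    # stand-in for model.predict(new_data)
                            [0.6, 0.3, 0.1]])

predicted_index = np.argmax(fake_prediction, axis=1)   # most probable class per row
predicted_label = [class_names[i] for i in predicted_index]
print(predicted_label)  # ['dog', 'cat']
```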

<a id='dataset'></a>
# 1 - Understanding the data set

Begin by importing the necessary modules.

In [None]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from keras.layers import Dense, Flatten, Activation
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Dropout, BatchNormalization
from keras.optimizers import SGD
from keras.datasets import cifar10
from keras.utils import to_categorical
from keras.models import Model
from keras.models import Sequential
from keras.callbacks import EarlyStopping

Understanding your dataset is the first prerequisite to training any model.  The CIFAR-10 dataset can be loaded directly from Keras.

**1) Download the dataset. See [`keras.datasets`](https://keras.io/datasets/) for how to download the data, and in what format it is provided.  Note that the dataset is already divided into a training set of 50000 images, and a test set of 10000.**

In [None]:

(X_train, Y_train), (X_test, Y_test) = tf.keras.datasets.cifar10.load_data()

**2) Verify that the shapes of the image and target arrays are what you expect.**


In [None]:
print("X_train {} and Y_train {}".format(X_train.shape, Y_train.shape))
print("X_test {} and Y_test {}".format(X_test.shape, Y_test.shape))

We now create a list of labels corresponding to the 10 categories.  It will be used to convert the 0-9 digits in the target arrays to string labels. The categories are labeled as follows:

  0. airplane
  1. automobile
  2. bird
  3. cat
  4. deer
  5. dog
  6. frog
  7. horse
  8. ship
  9. truck


In [None]:
Classes=['airplane','automobile','bird','cat','deer','dog','frog','horse','ship','truck']

**3) Normalize the image data from [0,255] to be [0,1].  Normalizing improves model training (to test this, you can comment out the normalization later).**

In [None]:
X_train=X_train/255
X_test=X_test/255
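
A small sketch of what the division does, using a random `uint8` array as a stand-in for the CIFAR-10 images. The array `fake_images` and the `float32` cast are assumptions for illustration; the cast only reduces memory compared with the `float64` result of a plain division.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a batch of 8-bit RGB images with values in [0, 255]
fake_images = rng.integers(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)

scaled = fake_images.astype(np.float32) / 255.0   # values now lie in [0, 1]
print(scaled.dtype, scaled.shape)
print(float(scaled.min()) >= 0.0, float(scaled.max()) <= 1.0)
```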

**4) Convert the target arrays to one-hot encodings.  Hint: check out [`keras.utils.to_categorical()`](https://www.tensorflow.org/api_docs/python/tf/keras/utils/to_categorical)**

In [None]:
Y_train=to_categorical(Y_train,num_classes=10)
Y_test=to_categorical(Y_test,num_classes=10)
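
To see what a one-hot encoding looks like, here is a NumPy sketch that mimics the effect of `to_categorical` on integer labels. The helper `one_hot` is a hypothetical name written for this illustration, not a Keras function.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer class labels into rows with a single 1 at the label position."""
    labels = np.asarray(labels).reshape(-1)          # CIFAR-10 labels arrive with shape (N, 1)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

example = one_hot([[3], [0], [9]], num_classes=10)
print(example.shape)           # (3, 10)
print(example.argmax(axis=1))  # [3 0 9]
```

Taking `argmax` along the class axis recovers the original integer labels, which is handy later when comparing predictions with targets.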

**5) Visualize some images in each category using the `imshow()` function in `matplotlib.pyplot`.  Can you recreate the figure below?  Hint: the below figure was created using the first 8 images belonging to each category in the training data.**

![Dataset.png](attachment:Dataset.png)

In [None]:
Classes=['airplane','automobile','bird','cat','deer','dog','frog','horse','ship','truck']
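
One possible way to build the grid is sketched below. The helper names `first_indices_per_class` and `plot_class_grid` are hypothetical, and the sketch assumes every class has at least `per_class` samples, which holds for the CIFAR-10 training set.

```python
import numpy as np

def first_indices_per_class(labels, num_classes=10, per_class=8):
    """Return an array of shape (num_classes, per_class) with the first sample indices of each class."""
    labels = np.asarray(labels)
    if labels.ndim == 2 and labels.shape[1] > 1:   # one-hot encoded targets
        labels = labels.argmax(axis=1)
    labels = labels.reshape(-1)                    # also handles integer labels of shape (N, 1)
    return np.stack([np.flatnonzero(labels == c)[:per_class]
                     for c in range(num_classes)])

def plot_class_grid(images, labels, class_names, per_class=8):
    """Show one row of images per class."""
    import matplotlib.pyplot as plt  # imported here so the index helper works without matplotlib
    idx = first_indices_per_class(labels, len(class_names), per_class)
    fig, axes = plt.subplots(len(class_names), per_class,
                             figsize=(per_class, len(class_names)))
    for row, name in enumerate(class_names):
        for col in range(per_class):
            axes[row, col].imshow(images[idx[row, col]])
            axes[row, col].axis('off')
        axes[row, 0].set_title(name, fontsize=8, loc='left')
    plt.show()
    return fig

# Toy check of the index helper with two classes
toy_labels = np.array([[0], [1], [0], [1], [1], [0]])
print(first_indices_per_class(toy_labels, num_classes=2, per_class=2))
```

With the arrays from the previous steps, the call would be `plot_class_grid(X_train, Y_train, Classes)`.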

<a id='first_model'></a>

# 2 - First naive model

In order to better understand the importance of CNNs, it is instructive to first see how well a naive dense network performs on the dataset.

**6) Create a sequential model with 4 `Dense` hidden layers of 2048, 1024, 512, and 256 nodes each, with ReLU activation, and a final output layer of 10 nodes. Compile the model with a `categorical_crossentropy` loss, using the SGD optimizer, and the `accuracy` metric. 
Note that you will need to use the `Flatten` layer first in order to convert the 3D (x, y, rgb) image data into 1D.**

In [None]:

from tensorflow.keras import models, layers

model = models.Sequential()
# Flatten each 32x32x3 image into a 3072-element vector
model.add(layers.Flatten(input_shape=(32, 32, 3)))
model.add(layers.Dense(2048, activation='relu'))
model.add(layers.Dense(1024, activation='relu'))
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dense(256, activation='relu'))
# Softmax output so that categorical_crossentropy receives probabilities
model.add(layers.Dense(10, activation='softmax'))
model.compile(optimizer='SGD',
              loss=tf.keras.losses.categorical_crossentropy,
              metrics=['accuracy'])
model.fit(X_train, Y_train, epochs=8)

**7) Compute by hand the total number of trainable parameters (weights and biases) in the model.**

The `Flatten` layer turns each 32x32x3 image into 3072 inputs and has no trainable parameters. A `Dense` layer with n inputs and m nodes has n x m weights and m biases:

- Hidden layer 1: 3072 x 2048 + 2048 = 6,293,504
- Hidden layer 2: 2048 x 1024 + 1024 = 2,098,176
- Hidden layer 3: 1024 x 512 + 512 = 524,800
- Hidden layer 4: 512 x 256 + 256 = 131,328
- Output layer: 256 x 10 + 10 = 2,570

Total: 9,050,378 trainable parameters.
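
The hand count can be cross-checked with a few lines of plain Python. This is only a sketch; `dense_params` is a hypothetical helper name, and the layer sizes are the ones given in task 6.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units

# Flattened input followed by the four hidden layers and the output layer
layer_sizes = [32 * 32 * 3, 2048, 1024, 512, 256, 10]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]
total = sum(per_layer)

print(per_layer)  # [6293504, 2098176, 524800, 131328, 2570]
print(total)      # 9050378
```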



**8) Use the `summary()` function on model to get a text summary of the model.  Did you compute the number of parameters correctly?**

In [None]:
model.summary()

**9) Train the model:**
  - Start with a small batch size of 32 and train for 10 epochs
  - Use early stopping on the validation accuracy with a patience of 2 (use 10% of your training set as the validation set)
  
**How does the model perform?**

In [None]:
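model.compile(optimizer='adam',
              loss=tf.keras.losses.categorical_crossentropy,
              metrics=['accuracy'])
# Hold out 10% of the training set; stop when val_accuracy has not improved for 2 epochs
early_stopping = EarlyStopping(monitor='val_accuracy', patience=2)
model.fit(X_train, Y_train, epochs=10, batch_size=32,
          validation_split=0.1, callbacks=[early_stopping])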
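
The patience rule can be illustrated without Keras. The sketch below is a simplified model of the idea, not the Keras implementation (it ignores options such as `min_delta`); the function name and the accuracy values are made up. Patience is a whole number of epochs.

```python
def early_stopping_epoch(val_accuracies, patience=2):
    """Return the 0-based epoch at which training would stop, or None if it runs to the end."""
    best = float('-inf')
    wait = 0                      # epochs since the last improvement
    for epoch, acc in enumerate(val_accuracies):
        if acc > best:
            best = acc
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

history = [0.30, 0.40, 0.39, 0.38, 0.50]   # made-up validation accuracies
print(early_stopping_epoch(history, patience=2))  # 3
```

In this toy run, training stops after the fourth epoch, so the later improvement to 0.50 is never reached. That is the trade-off a small patience value makes.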

**10) Try changing the batch size to see if there is any improvement.**

In [None]:
model.fit(X_train, Y_train, epochs=10, batch_size=40, validation_split=0.1)

**11) Try adding batch normalization after each hidden layer.  Any better?**

In [None]:
from tensorflow.keras import models, layers

model = models.Sequential()
model.add(layers.Flatten(input_shape=(32, 32, 3)))
model.add(layers.Dense(2048, activation='relu'))
# Default BatchNormalization settings (momentum=0.99, epsilon=0.001);
# momentum=1 would freeze the moving statistics used at inference time
model.add(layers.BatchNormalization())
model.add(layers.Dense(1024, activation='relu'))
model.add(layers.BatchNormalization())
model.add(layers.Dense(512, activation='relu'))
model.add(layers.BatchNormalization())
model.add(layers.Dense(256, activation='relu'))
model.add(layers.BatchNormalization())
# Softmax output so that categorical_crossentropy receives probabilities
model.add(layers.Dense(10, activation='softmax'))
model.compile(optimizer='SGD',
              loss=tf.keras.losses.categorical_crossentropy,
              metrics=['accuracy'])
model.fit(X_train, Y_train, epochs=8)
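
What a batch normalization layer computes during training can be sketched in NumPy. This is an illustration under simplifying assumptions: `batch_norm_train` is a hypothetical helper, the scale `gamma` and shift `beta` are fixed rather than learned, and the moving averages used at inference time are left out.

```python
import numpy as np

def batch_norm_train(x, gamma=1.0, beta=0.0, epsilon=1e-3):
    """Normalize each feature over the batch, then scale and shift."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + epsilon)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
activations = rng.normal(loc=5.0, scale=3.0, size=(64, 4))   # made-up layer outputs
normalized = batch_norm_train(activations)
print(normalized.mean(axis=0).round(3))
print(normalized.std(axis=0).round(3))
```

Each feature ends up with a mean close to 0 and a standard deviation close to 1, whatever the scale of the incoming activations.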

<a id='cnn'></a>

# 3 - Convolutional Neural Network
 

Convolutional neural networks allow us to do drastically better on this dataset (and many image classification problems in general). In this task, you will build a convolutional network and see how it performs during training.

**12) Create a new model with the following layers**
  - 3x3 2D convolution with zero padding (same), 32 filters
  - ReLU activation
  - 3x3 2D convolution, no padding, 32 filters
  - ReLU activation
  - Max pooling with size (2,2)
  - 3x3 2D convolution, no padding, 64 filters
  - ReLU activation
  - 3x3 2D convolution, no padding, 64 filters
  - ReLU activation
  - Max pooling with size (2,2)
  - Flatten
  - Dense layer with 512 nodes, ReLU activation
  - Softmax output layer with 10 nodes
  
**Compile the network with the same optimizer and metrics as the dense network.**

In [None]:
from tensorflow import keras
model1 = models.Sequential()
# padding='same' zero-pads the input so the 32x32 spatial size is preserved
model1.add(layers.Conv2D(32, (3, 3), padding='same', activation='relu', input_shape=(32, 32, 3)))
model1.add(layers.Conv2D(32, (3, 3), activation='relu'))
model1.add(layers.MaxPool2D(pool_size=(2, 2)))
model1.add(layers.Conv2D(64, (3, 3), activation='relu'))
model1.add(layers.Conv2D(64, (3, 3), activation='relu'))
model1.add(layers.MaxPool2D(pool_size=(2, 2)))
model1.add(layers.Flatten())
model1.add(layers.Dense(512, activation='relu'))
model1.add(layers.Dense(10, activation='softmax'))
model1.compile(optimizer='adam',
               loss=tf.keras.losses.categorical_crossentropy,
               metrics=['accuracy'])
model1.fit(X_train, Y_train)

**13) Compute by hand the number of trainable parameters in this network.  Are there more or fewer than in the simpler dense network?  Why?  Confirm with `summary()`.**

In [None]:
model1.summary()
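
A sketch of the hand count in plain Python, assuming exactly the layer list from task 12 (`padding='same'` on the first convolution only, a 32x32x3 input). The helper names are hypothetical.

```python
def conv_params(kernel, in_channels, filters):
    """Weights plus biases of a 2D convolution with a square kernel."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units

size, channels = 32, 3
counts = []
# (filters, padding) of the four 3x3 convolutions; pooling follows the 2nd and 4th
conv_layers = [(32, 'same'), (32, 'valid'), (64, 'valid'), (64, 'valid')]
for i, (filters, padding) in enumerate(conv_layers):
    counts.append(conv_params(3, channels, filters))
    channels = filters
    if padding == 'valid':
        size -= 2      # a 3x3 kernel without padding trims one pixel per side
    if i in (1, 3):
        size //= 2     # 2x2 max pooling halves the spatial size

flat = size * size * channels
counts.append(dense_params(flat, 512))
counts.append(dense_params(512, 10))

print(flat)         # 1600
print(counts)       # [896, 9248, 18496, 36928, 819712, 5130]
print(sum(counts))  # 890410
```

Under these assumptions the convolutional layers contribute only about 65 thousand parameters because each filter is shared across all image positions; most of the remaining weights sit in the first dense layer.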

**14) Use the same training procedure as before for 10 epochs and batch size of 32. How does the validation accuracy change with each epoch?**

In [None]:
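(X_train, Y_train), (X_test, Y_test) = tf.keras.datasets.cifar10.load_data()
# Scale the images only; the labels are class indices and must not be divided by 255
X_train = X_train / 255
X_test = X_test / 255
Y_train = to_categorical(Y_train, num_classes=10)
Y_test = to_categorical(Y_test, num_classes=10)
# Same procedure as before: 10% validation split and early stopping with patience 2
model1.fit(X_train, Y_train, batch_size=32, epochs=10, validation_split=0.1,
           callbacks=[EarlyStopping(monitor='val_accuracy', patience=2)])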

**15) Increase the batch size to 64 and retrain.  Better or worse?  Try 128 as well.  How does increasing the batch size improve the training?**

In [None]:
#With batch size of 64 accuracy is improving 
model1.fit(X_train,Y_train,epochs=10,batch_size=64)
#With batch size of 128 accuracy is improving 
model1.fit(X_train, Y_train, epochs=10, batch_size=128)

**16) Note how the validation accuracy begins to decrease at some point, while the training accuracy continues to increase.  What is this phenomenon called?  Try adding 3 dropout layers to the model, one before each max pooling layer and one before the last layer, using a dropout ratio of 0.25.  Does this improve the model?**

In [None]:
model1 = models.Sequential()
# padding='same' zero-pads the input so the 32x32 spatial size is preserved
model1.add(layers.Conv2D(32, (3, 3), padding='same', activation='relu', input_shape=(32, 32, 3)))
model1.add(layers.Conv2D(32, (3, 3), activation='relu'))
# Dropout before each max pooling layer, as the task asks
model1.add(layers.Dropout(0.25))
model1.add(layers.MaxPool2D(pool_size=(2, 2)))
model1.add(layers.Conv2D(64, (3, 3), activation='relu'))
model1.add(layers.Conv2D(64, (3, 3), activation='relu'))
model1.add(layers.Dropout(0.25))
model1.add(layers.MaxPool2D(pool_size=(2, 2)))
model1.add(layers.Flatten())
model1.add(layers.Dense(512, activation='relu'))
# Dropout before the output layer
model1.add(layers.Dropout(0.25))
model1.add(layers.Dense(10, activation='softmax'))
model1.compile(optimizer='adam',
               loss=tf.keras.losses.categorical_crossentropy,
               metrics=['accuracy'])
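
What a dropout layer does during training can be sketched in NumPy. This is an illustration of the usual "inverted dropout" scheme, not the Keras source: `dropout_train` is a hypothetical helper, and at inference time the layer simply passes its input through.

```python
import numpy as np

def dropout_train(x, rate=0.25, rng=None):
    """Zero a random fraction `rate` of the inputs and rescale the rest by 1 / (1 - rate)."""
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)

x = np.ones((1000, 100))
dropped = dropout_train(x, rate=0.25, rng=np.random.default_rng(0))
print(round(float((dropped == 0).mean()), 2))  # fraction of zeroed entries, close to the rate
print(round(float(dropped.mean()), 2))         # average stays close to the input average
```

The rescaling keeps the expected activation unchanged, so no correction is needed when dropout is switched off for prediction.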


**17) Play with batch normalization.  For example, add batch normalization layers after each dropout layer.  Do you notice a faster increase in the model improvement? Why?**

In [None]:
model1 = models.Sequential()
# padding='same' zero-pads the input so the 32x32 spatial size is preserved
model1.add(layers.Conv2D(32, (3, 3), padding='same', activation='relu', input_shape=(32, 32, 3)))
model1.add(layers.Conv2D(32, (3, 3), activation='relu'))
model1.add(layers.Dropout(0.25))
# Default BatchNormalization settings; momentum=1 would never update the moving statistics
model1.add(layers.BatchNormalization())
model1.add(layers.MaxPool2D(pool_size=(2, 2)))
model1.add(layers.Conv2D(64, (3, 3), activation='relu'))
model1.add(layers.Conv2D(64, (3, 3), activation='relu'))
model1.add(layers.Dropout(0.25))
model1.add(layers.BatchNormalization())
model1.add(layers.MaxPool2D(pool_size=(2, 2)))
model1.add(layers.Flatten())
model1.add(layers.Dense(512, activation='relu'))
model1.add(layers.Dropout(0.25))
model1.add(layers.BatchNormalization())
model1.add(layers.Dense(10, activation='softmax'))
model1.compile(optimizer='adam',
               loss=tf.keras.losses.categorical_crossentropy,
               metrics=['accuracy'])


<a id='results'></a>

# 4 - Interpreting the results
 
<a id='results_prediction'></a>

## 4.1 - Making predictions

Assuming all went well during the previous tasks, you can now predict the category of a new image!  Here are a few examples of my predictions:

![Results.png](attachment:Results.png)

**18) Use `predict` on your trained model to test its prediction on a few example images of the test set. Using `imshow` and `barh` from `matplotlib.pyplot`, try to recreate the image above for a few example images.**
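
One possible layout is sketched below: the image on the left and a horizontal bar chart of the class probabilities on the right. The helper names `top_k` and `plot_prediction` are hypothetical, and the probabilities in the toy check are made up.

```python
import numpy as np

def top_k(probabilities, class_names, k=3):
    """Return the k most probable (class name, probability) pairs, best first."""
    order = np.argsort(probabilities)[::-1][:k]
    return [(class_names[i], float(probabilities[i])) for i in order]

def plot_prediction(image, probabilities, class_names):
    """Show an image next to a horizontal bar chart of its class probabilities."""
    import matplotlib.pyplot as plt  # imported here so top_k works without matplotlib
    fig, (ax_img, ax_bar) = plt.subplots(1, 2, figsize=(6, 3))
    ax_img.imshow(image)
    ax_img.axis('off')
    ax_bar.barh(class_names, probabilities)
    ax_bar.set_xlim(0, 1)
    ax_bar.invert_yaxis()            # first class at the top
    fig.tight_layout()
    plt.show()
    return fig

# Toy check with made-up probabilities for three classes
print(top_k(np.array([0.1, 0.6, 0.3]), ['cat', 'dog', 'frog'], k=2))
```

With a trained model, a call could look like `plot_prediction(X_test[0], model1.predict(X_test[:1])[0], Classes)`.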

<!---**Hint:** at this point, it is probably convenient to use the `save` and `load_model` functions from Keras.  You can save the model after training it, and then decide to load from saved file instead of building a new one (if available) on successive runs.--->

<a id='results_evaluation'></a>

## 4.2 - Evaluating the results

A confusion matrix is often used in supervised learning to understand how well (or not) each category is being classified.  Each element (i,j) of the confusion matrix counts the examples of true class i that were predicted as class j.  Consider the following 10 predictions for a 2-category model predicting male or female:

| example     | true category  | predicted category  |
|-------------|----------------|---------------------|
| 1           | male           | male                |
| 2           | female         | male                |
| 3           | female         | female              |
| 4           | male           | male                |
| 5           | male           | female              |
| 6           | male           | male                |
| 7           | female         | female              |
| 8           | male           | female              |
| 9           | female         | female              |
| 10          | female         | female              |

Based on the above data, the model is accurate 70% of the time.  The confusion matrix is

| true \ predicted | male | female |
|------------------|------|--------|
| male             | 3    | 2      |
| female           | 1    | 4      |

The confusion matrix gives us more information than a simple accuracy measurement.  In this case, we see that the class female is classified correctly more often (4 of 5, 80%) than the class male (3 of 5, 60%).
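
The matrix for the ten examples can be reproduced with a short NumPy sketch. The helper `confusion_matrix_np` and the encoding 0 = male, 1 = female are assumptions made for this illustration.

```python
import numpy as np

def confusion_matrix_np(true_idx, pred_idx, num_classes):
    """Count examples per (true class, predicted class) pair."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(matrix, (true_idx, pred_idx), 1)   # rows: true class, columns: predicted class
    return matrix

# 0 = male, 1 = female, in the order of the ten examples in the table
true_idx = np.array([0, 1, 1, 0, 0, 0, 1, 0, 1, 1])
pred_idx = np.array([0, 0, 1, 0, 1, 0, 1, 1, 1, 1])

cm = confusion_matrix_np(true_idx, pred_idx, num_classes=2)
per_class_accuracy = cm.diagonal() / cm.sum(axis=1)
print(cm)                   # rows [3 2] and [1 4]
print(per_class_accuracy)   # [0.6 0.8]
```

For one-hot targets and softmax outputs, the class indices can be obtained with `argmax(axis=1)` before calling the helper.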

**19) Create the confusion matrix for the CIFAR-10 dataset using the test data.  What does it tell you about the relationships between each class?**

<a id='pretrained_cnn'></a>
# 5 - Improving on current performances

**20) Play with different CNN architectures. Provide a few attempts (at least 1 and at most 3).**

Note that several pre-trained networks are directly accessible via keras (see https://nbviewer.jupyter.org/github/fchollet/deep-learning-with-python-notebooks/blob/master/5.3-using-a-pretrained-convnet.ipynb)