# Lab 2: Convolutional Neural Network (CNN)

# 2.1 Dataset pre-processing
The first step when working with a new dataset is pre-processing. Data pre-processing refers to the steps applied to make the data more suitable for learning.
In this section we are going to deal with:
* 2.1.1 Dataset loading
* 2.1.2 Normalization
* 2.1.3 Standardization
* 2.1.4 Splitting and label preprocessing


## 2.1.1 Dataset loading
In this section we load the CIFAR-10 dataset, which Keras provides already split into a training set and a test set.

In [None]:
import tensorflow as tf
# Load the training and test sets, which come already separated
(train_X, train_Y),(test_X, test_Y) = tf.keras.datasets.cifar10.load_data()

print(train_X.shape, test_X.shape)



## 2.1.2 Normalization
One common practice when training a neural network is to normalize the images by dividing each pixel value by the maximum value it can take, i.e. 255.<br>
The purpose of this is to rescale every pixel value to the range [0, 1].<br>
Keeping the inputs on a small, common scale generally speeds up learning and leads to faster convergence.

In [None]:
import numpy as np
# Normalizing the data
print("Normalizing training set..")
train_X = np.asarray(train_X, dtype=np.float32) / 255										# Normalizing training set
print("Normalizing test set..")
test_X = np.asarray(test_X, dtype=np.float32) / 255											# Normalizing test set

## 2.1.3 Standardization
Another common practice in data pre-processing is standardization.<br>
The idea behind standardization is to compute the dataset mean $\mu$ and standard deviation $\sigma$, then subtract $\mu$ from every data point $x$ and divide the result by $\sigma$.<br>
That is, we apply the following operation:<br>
<img src="https://drive.google.com/uc?id=1rpuybw_fmI8XK38JQhWWxX2TOExBAV2V" width="150px"><br>
The outcome of this operation is a distribution with mean equal to 0 and standard deviation equal to 1.<br>
By standardizing our data we bring the features onto a comparable scale, which usually makes the learning process easier.<br>
To better understand this, the example below shows what happens to a dataset after standardization is applied:
<img src="https://drive.google.com/uc?id=1wtqTW4hz8n8k7b7q0mUSzCc9X0npNUY2" width="500px" align="left"><br>

In [None]:
# Standardizing the data
def compute_mean_and_std(X):
	# Per-channel mean and standard deviation, computed over all images and all pixels of X
	mean = np.mean(X, axis=(0,1,2))
	std = np.std(X, axis=(0,1,2))

	return [mean, std]

In [None]:
# From every image we subtract the training set mean and divide by the training set standard deviation

dataset_mean, dataset_std = compute_mean_and_std(train_X)
print("Standardizing training set..")
train_X = (train_X-dataset_mean)/dataset_std												# Standardizing the training set
print("Standardizing test set..")
test_X = (test_X-dataset_mean)/dataset_std												# Standardizing the test set
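
As a sanity check on the operation above, the following minimal sketch standardizes a small synthetic batch and verifies the result. The random `images` array is only a stand-in for CIFAR-10, and the variable names are chosen for illustration.

```python
import numpy as np

# Synthetic stand-in for an image batch: 8 RGB images of 32x32 pixels in [0, 1).
rng = np.random.default_rng(0)
images = rng.random((8, 32, 32, 3)).astype(np.float32)

# Per-channel statistics: reduce over the batch, height and width axes.
channel_mean = images.mean(axis=(0, 1, 2))
channel_std = images.std(axis=(0, 1, 2))

# Broadcasting applies the (3,) statistics along the last axis of every image.
standardized = (images - channel_mean) / channel_std

print(standardized.mean(axis=(0, 1, 2)))  # each entry is approximately 0
print(standardized.std(axis=(0, 1, 2)))   # each entry is approximately 1
```

Note that the test set is standardized with the training set statistics, so its mean and standard deviation will only be approximately 0 and 1.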

## 2.1.4 Splitting and label preprocessing
Now we just need to split our training set in order to get the validation set, and convert our labels to a one-hot representation.

In [None]:
# Creating the validation set
from sklearn.model_selection import train_test_split
print("Splitting training set to create validation set..")
train_X, valid_X, train_Y, valid_Y = train_test_split(train_X, train_Y, test_size=0.2, random_state=13)
print(train_X.shape)
# Converting labels to one-hot representation
from keras.utils import to_categorical
train_Y_one_hot = to_categorical(train_Y)						# Converting training labels to one-hot representation
valid_Y_one_hot = to_categorical(valid_Y)						# Converting validation labels to one-hot representation
test_Y_one_hot = to_categorical(test_Y)							# Converting test labels to one-hot representation
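
To make the one-hot representation concrete, here is a minimal NumPy sketch of the same idea. The `one_hot` helper is hypothetical and written only for illustration; it is not how `to_categorical` is implemented.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (n_samples, num_classes) float32 one-hot matrix."""
    labels = np.asarray(labels).reshape(-1)      # flatten (n, 1) labels to (n,)
    # Row k of the identity matrix is the one-hot vector for class k.
    return np.eye(num_classes, dtype=np.float32)[labels]

example_labels = np.array([[3], [0], [9]])       # same (n, 1) layout as the CIFAR-10 labels
encoded = one_hot(example_labels, num_classes=10)

print(encoded.shape)            # (3, 10)
print(encoded.argmax(axis=1))   # [3 0 9]
```

Taking the `argmax` of each row recovers the original integer label, which is also how class predictions are read from the output of a softmax layer.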

# 2.2 Training a model from scratch
Now that we have properly pre-processed our data, we are going to create a convolutional model in Keras. 
Usually a convolutional model is made of two consecutive parts:
* A convolutional part
* A fully connected part

We can show an example of the general structure in the next picture:
<img src="https://drive.google.com/uc?id=1duP8u9bs6ELNu4degUuYP4-YS1mBYn2O" width="600px"><br>

Usually the convolutional part is made of blocks composed of the following layers:
* convolutional layer: performs a spatial convolution over images (see Conv2D)
* pooling layer: reduces the spatial dimension of the output by replacing each window of $n$ values with a single value, either their average or their maximum (see MaxPooling2D)
* dropout layer: randomly "drops out" (i.e. sets to zero) a fraction of the output features of the layer it is applied to during training (see Dropout)

The convolutional part extracts features from the input, and the fully connected part ties that information together in order to solve the classification problem.
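
The pooling and dropout operations described above can be sketched in plain NumPy. The helpers `max_pool_2x2` and `dropout` are hypothetical names written for illustration under simplifying assumptions (a single 2D feature map with even sides, and a non-overlapping 2x2 window); they are not the Keras layers themselves.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a (height, width) array with even sides."""
    h, w = feature_map.shape
    # Split the map into non-overlapping 2x2 blocks, then keep each block's maximum.
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

def dropout(activations, rate, rng):
    """Zero each value with probability `rate` and rescale the survivors by 1 / (1 - rate)."""
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

feature_map = np.arange(16, dtype=np.float32).reshape(4, 4)
pooled = max_pool_2x2(feature_map)
print(pooled)    # rows: [5, 7] and [13, 15]

rng = np.random.default_rng(0)
print(dropout(np.ones(8), rate=0.5, rng=rng))    # every entry is either 0 or 2
```

The rescaling in `dropout` keeps the expected value of each activation unchanged, so no correction is needed at test time, when dropout is switched off.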

In [None]:
# Creating the model from scratch
import keras
from keras.models import Sequential, Model
from keras.layers import Input, Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import BatchNormalization, LeakyReLU
from sklearn.metrics import accuracy_score

categories = ['airplane','automobile','bird','cat','deer','dog','frog','horse','ship','truck']


# Network parameters
batch_size = 16													# Setting the batch size
epochs = 6														# Setting the number of epochs
num_classes = len(categories)									# Getting the amount of classes

scratch_model = Sequential()	

# Build your Keras model here.
# Try to use one or more convolutional layers, combined with pooling and dropout layers











# Compile the model with the Adam optimizer
scratch_model.compile(loss=keras.losses.categorical_crossentropy, optimizer=keras.optimizers.Adam(),metrics=['accuracy'])

# Visualize the model through the summary function
scratch_model.summary()

In [None]:
# Let's train the model!



In [None]:
# Getting the results
scratch_model_train_acc = scratch_model_history.history['accuracy']
scratch_model_valid_acc = scratch_model_history.history['val_accuracy']
scratch_model_train_loss = scratch_model_history.history['loss']
scratch_model_valid_loss = scratch_model_history.history['val_loss']

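# predict returns class probabilities; argmax picks the most likely class for each image
print("Test accuracy: ", accuracy_score(test_Y.ravel(), np.argmax(scratch_model.predict(test_X), axis=1)))			# Testing the model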

**Is the obtained value consistent with what you expected?**<br>
**What changes when you use a different batch size? Why?**

# 2.3 Data Augmentation
Before even starting to load the dataset we should ask ourselves whether the available amount of data is sufficient for our purposes.<br>
When the answer is negative we may need to do "data augmentation".<br>
Doing data augmentation means increasing the number of available data points. For images, this means increasing the number of images in the dataset. A common way to do this is to generate new images by applying a linear transformation to the original images in the dataset.<br>
The most common linear transformations are the following:<br>
* Rotation
* Shifting
* Blurring
* Change lighting conditions

In the picture below we show an example of augmentation:<br>
<img src="https://drive.google.com/uc?id=1B74snda_oJKkhVzxch9Ov8Y1XL63U3w5" width="600px" align="left"><br>
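
To illustrate two of the transformations listed above, the following minimal NumPy sketch applies a random horizontal flip and a random shift to a single image. The function `random_flip_and_shift` and its `max_shift` parameter are hypothetical, and the shift is done by zero-padding and cropping, which is only one of several possible choices; this is not how `ImageDataGenerator` is implemented.

```python
import numpy as np

def random_flip_and_shift(image, rng, max_shift=4):
    """Randomly flip an (H, W, C) image horizontally and shift it by up to max_shift pixels."""
    if rng.random() < 0.5:
        image = image[:, ::-1, :]                  # horizontal flip
    h, w, _ = image.shape
    # Zero-pad the borders, then crop an H x W window at a random offset.
    pad = ((max_shift, max_shift), (max_shift, max_shift), (0, 0))
    padded = np.pad(image, pad)
    top = rng.integers(0, 2 * max_shift + 1)
    left = rng.integers(0, 2 * max_shift + 1)
    return padded[top:top + h, left:left + w, :]

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3)).astype(np.float32)   # stand-in for one CIFAR-10 image
augmented = [random_flip_and_shift(image, rng) for _ in range(4)]
print([a.shape for a in augmented])                  # four arrays of shape (32, 32, 3)
```

Each call produces a different variant of the same image with the same shape and label, which is what lets augmentation enlarge the training set without collecting new data.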

In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation parameters
noise_range = 5																	# Gaussian blur range
flip_hor_prob = 0.5																# Probability to flip horizontally the image
rot_range = 30																	# Rotation range

print(train_X.shape, test_X.shape)

#Try different augmentation strategies
cifar10_datagen = ImageDataGenerator(
    featurewise_center=False,
    featurewise_std_normalization=False,
    #rotation_range=20,
    #width_shift_range=0.1,
    #height_shift_range=0.1,
    horizontal_flip=True)


In [None]:
from keras.utils import to_categorical

scratch_model = Sequential()	

# Build your Keras model here.
# Try to use one or more convolutional layers, combined with pooling and dropout layers









# Train your model



# Getting the results
scratch_model_train_acc = scratch_model_history.history['accuracy']
scratch_model_valid_acc = scratch_model_history.history['val_accuracy']
scratch_model_train_loss = scratch_model_history.history['loss']
scratch_model_valid_loss = scratch_model_history.history['val_loss']

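# predict returns class probabilities; argmax picks the most likely class for each image
print("Test accuracy: ", accuracy_score(test_Y.ravel(), np.argmax(scratch_model.predict(test_X), axis=1)))			# Testing the model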

**What is the performance obtained on this new augmented dataset?**<br>
**How can you explain the obtained result?**

# 2.4 Using a pre-trained model

A common alternative to training a model from scratch is to use a pre-trained model.<br>
The idea is to replace the convolutional part with a highly optimized convolutional part engineered and trained previously by someone else.<br>
Usually the models available through keras.applications have been trained on the ImageNet dataset. <br>
Today we are going to use the VGG19 model. Its architecture is shown below:
<img src="https://www.researchgate.net/profile/Clifford_Yang/publication/325137356/figure/fig2/AS:670371271413777@1536840374533/llustration-of-the-network-architecture-of-VGG-19-model-conv-means-convolution-FC-means_W640.jpg" width="600px"><br>
After replacing the convolutional part we still need to set up a fully connected part.<br>
**Why can't we use the fully connected part of VGG19 in this lab?<br>
What should we do to use it?<br>
And more generally, in which situations can we do that?**

Moreover, using a pre-trained network is not always the best choice.<br>
**Can you guess in which situations it could be useful to use a pre-trained model?**

In [None]:
# Creating the model based on the pretrained VGG19 network
from keras import applications
vgg19 = applications.VGG19(weights = "imagenet", include_top=False, input_shape = (32, 32, 3))

# Produce the features of the training and validation sets using the VGG19 predict function



from keras import models
from keras import layers
from keras import optimizers

# Creating a simple model that will classify the extracted features from the VGG19 network
pretrained_model = models.Sequential()








In [None]:
# Visualize the model through the summary function
vgg19.summary()

In [None]:
# Let's train the model!


In [None]:
# Getting the results (the keys below assume the model was compiled with metrics=['accuracy'])
pretrained_model_train_acc = pretrained_model_history.history['accuracy']
pretrained_model_valid_acc = pretrained_model_history.history['val_accuracy']
pretrained_model_train_loss = pretrained_model_history.history['loss']
pretrained_model_valid_loss = pretrained_model_history.history['val_loss']

test_X_feature = vgg19.predict(test_X)						# Producing the test feature
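# predict returns class probabilities; argmax picks the most likely class for each image
print("Test accuracy: ", accuracy_score(test_Y.ravel(), np.argmax(pretrained_model.predict(test_X_feature), axis=1))) # Testing the model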

# 2.5 Comparing the models


Now that we have trained both the "from scratch" and the "pre-trained" models, we are going to compare the results obtained during training. We are going to consider accuracy and loss.<br>
**What can you expect from these plots?**

In [None]:
# Create here the plots to compare the "from scratch" model and the "pretrained" model
# Try to produce a comparison plot for the accuracies (train and validation) and another plot for the losses
# Producing accuracy over epochs plot
print("Producing accuracy over epochs plot")
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(16,7))

plt.plot(scratch_model_train_acc, label="Scratch Train Acc.", color="#4db8ff")
plt.plot(scratch_model_valid_acc, label="Scratch Valid. Acc.", color="#006bb3")

plt.plot(pretrained_model_train_acc, label="Pretrained Train Acc.", color="#ff4d4d")
plt.plot(pretrained_model_valid_acc, label="Pretrained Valid. Acc.", color="#b30000")

plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend(loc='lower right', fancybox=True, shadow=True, ncol=4)
plt.grid()
plt.savefig('acc_epochs.png', dpi=300)


# Producing loss over epochs plot
print("Producing loss over epochs plot")
fig = plt.figure(figsize=(16,7))

plt.plot(scratch_model_train_loss, label="Scratch Train Loss", color="#4db8ff")
plt.plot(scratch_model_valid_loss, label="Scratch Valid. Loss", color="#006bb3")

plt.plot(pretrained_model_train_loss, label="Pretrained Train Loss", color="#ff4d4d")
plt.plot(pretrained_model_valid_loss, label="Pretrained Valid. Loss", color="#b30000")

plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend(loc='upper right', fancybox=True, shadow=True, ncol=4)
plt.grid()
plt.savefig('loss_epochs.png', dpi=300)








**What information can you get from these plots?**<br>
**Are they showing what you expected?**