
# Midterm 3 ISPR 2023 - Assignment 2 - Salvatore Correnti (matr. 584136)
In this assignment we design a custom Convolutional Neural Network and test it on the `CIFAR-10` dataset.

## Initial Imports
As usual, we start with a couple of cells that change the working directory and perform all the necessary imports before we start coding.

In [None]:
%cd "/content/drive/MyDrive/Colab Notebooks/ISPR-Midterms-2023"

/content/drive/MyDrive/Colab Notebooks/ISPR-Midterms-2023


In [None]:
# Below is just to make sure we can build Tensorflow with GPU and to avoid a verbose output for installation
!pip install tensorflow 1> /dev/null

The system cannot find the path specified.


In [None]:
import numpy as np
import tensorflow as tf
from PIL import Image
import matplotlib.pyplot as plt

from tensorflow.keras.utils import to_categorical

## CIFAR-10 Dataset
For training and evaluating our Convolutional Neural Network, we will use the `CIFAR-10` dataset, which is made up of $50,000$ training and $10,000$ test RGB images of size $32 \times 32$, and is available in `keras` as a "built-in" dataset.

In [None]:
(x_dev, y_dev), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

# Show an image as an example
exampleImage = Image.fromarray(x_dev[0], mode='RGB')
exampleImage = exampleImage.resize((64, 64))  # Just to show it better
exampleImage.show()

We now convert the CIFAR-10 labels into `one-hot` format for use with a "Softmax-based" CNN classifier.

In [None]:
y_dev = to_categorical(y_dev)
y_test = to_categorical(y_test)
y_dev[0]

array([0., 0., 0., 0., 0., 0., 1., 0., 0., 0.], dtype=float32)
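
As a rough illustration of what `to_categorical` produces, the following NumPy-only sketch builds the same kind of one-hot matrix by hand. The helper name `one_hot` and the toy labels are assumptions made for this example; they are not part of the notebook or of `keras`.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Hypothetical helper (illustration only): one-hot encode integer labels."""
    labels = np.asarray(labels).reshape(-1)       # CIFAR-10 labels come with shape (n, 1)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    encoded[np.arange(labels.size), labels] = 1.0  # set a single 1 per row
    return encoded

toy_labels = np.array([[6], [9], [0]])            # toy labels in the same (n, 1) layout
toy_one_hot = one_hot(toy_labels, num_classes=10)
print(toy_one_hot[0])
```

Each row contains a single $1$ at the index of the class, which is the target format expected by a `Softmax` output trained with a categorical cross-entropy loss.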

Before proceeding, we also normalize pixel values into the range $[0, 1]$ and split the development data into a training set and a validation set (we keep the test data aside to evaluate the final model).

In [None]:
x_dev = x_dev.astype(np.float32) / 255
x_test = x_test.astype(np.float32) / 255

from sklearn.model_selection import train_test_split

x_train, x_eval, y_train, y_eval = train_test_split(x_dev, y_dev, test_size=0.2, random_state=0)
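
To make the effect of these two steps concrete, here is a minimal sketch on synthetic data, assuming a plain shuffled hold-out split in place of `train_test_split`. The array `fake_images` and the index names are placeholders invented for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for CIFAR-10: 100 RGB images of 32x32 uint8 pixels
fake_images = rng.integers(0, 256, size=(100, 32, 32, 3), dtype=np.uint8)

# Same scaling as above: uint8 values in [0, 255] become floats in [0, 1]
scaled = fake_images.astype(np.float32) / 255

# Hold out 20% of the shuffled indices for validation
indices = rng.permutation(len(scaled))
n_eval = len(scaled) // 5
eval_idx, train_idx = indices[:n_eval], indices[n_eval:]

print(scaled.min(), scaled.max(), len(train_idx), len(eval_idx))
```

With the real $50,000$ development images, the same $80/20$ proportion gives $40,000$ training and $10,000$ validation images.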

## Designing a Convolutional Neural Network for CIFAR-10
When designing a custom Convolutional Neural Network for a specific task, it is important to take into account at least the main "building blocks" for CNNs that have been developed through the years and used in "reference" models. Indeed, since CNNs have been in use for about a decade, it is convenient to take inspiration from one of these reference models, or to modify one directly, to achieve our objectives. Monitoring the accuracy and the space and time required for training a given model (e.g. by calculating the number of parameters) is also fundamental. In particular, since we do not have the hardware required for huge models, we look for a tradeoff between accuracy and model size.

Our discussion is organized around the following design choices:

1. `Target Accuracy`: we want to achieve an accuracy of at least $70\%$, since this is what one can easily get with a "reduced" `VGG` model (see below). On top of that, we can consider additional design choices, based on what the experiments show, to reach a higher accuracy;
2. `"Style" of the CNN`: we will use the "traditional" design pattern of a series of Convolutional-MaxPooling layers, ended by a `Fully-Connected` block and `Softmax` activation, as employed in `AlexNet`, `VGG` and (apart from MaxPool) in `GoogLeNet`;
3. `Base Model`: since `CIFAR-10` images are of size $32 \times 32$, we do not need a very deep network to shrink the feature maps to a size that is reasonable for a final sequence of fully-connected layers. In other words, if we use `max-pooling` with a pool size of $(2, 2)$, $3$ `MaxPool2D` layers suffice to get a feature map of size $4 \times 4$; with $N$ filters in the last convolutional layer, this gives $16N$ input units for the `dense` part of the network, which is reasonable if we take, for example, a single hidden layer of size $\leq 128$, or if we skip the `dense hidden` layers entirely. As a consequence, we will model our CNNs as a sort of "reduced" version of `VGG`, which has proved to be quite effective in classification tasks over `ImageNet`;
4. `Number of Parameters`: ideally, we want to keep our Convolutional Neural Network below $1,000,000$ parameters, which is suitable for a 2-deep or 3-deep (in the sense of Conv2D-MaxPool blocks) `VGG-like` network. Once we have built a "satisfactory" network without explicit design choices for restricting the number of parameters, we can explore the use of $1 \times 1$ convolutions and reduced `Dense` blocks to lower the number of parameters while retaining most of the accuracy;
5. `Regularization`: since a `VGG-like` CNN can quickly become quite big, especially in the `Dense` part, it is essential to adopt regularization strategies to limit overfitting and improve overall performance. From an architectural point of view, two simple yet effective strategies are `Dropout` and `Batch Normalization`, and we will experiment with both of them to see whether they improve the overall performance of the network;
6. `Advanced Blocks`: if we manage to keep our network "sufficiently small" (i.e. with 2 or 3 Conv2D-MaxPool blocks), we can avoid advanced architectural patterns like `Skip Connections` and `Inception Blocks`, which complicate the design and coding of the Neural Network, especially since avoiding a too deep network already keeps `vanishing gradient` phenomena under control.
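
The feature-map arithmetic used in the `Base Model` item can be checked with a few lines of plain Python. This is only a sketch: the function name `dense_inputs` is made up for this example, and it assumes `same`-padded convolutions, so that only the pooling layers change the spatial size.

```python
def dense_inputs(image_size, num_pools, num_filters, pool_size=2):
    """Hypothetical helper: spatial side and flattened size after `num_pools` poolings."""
    side = image_size
    for _ in range(num_pools):
        side //= pool_size               # each (2, 2) max-pooling halves the side
    return side, side * side * num_filters

side, units = dense_inputs(image_size=32, num_pools=3, num_filters=128)
print(side, units)  # 4 2048
```

With $N = 128$ filters in the last block, the flattened vector has $16N = 2048$ entries.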

### VGG Network
<img src="vgg16.png">

**`VGG16`** (**`Visual Geometry Group 16`**) network was developed in (?) by (?) and won the (?) competition on the `ImageNet` dataset. As we can see in the above figure, VGG16 is composed of a sequence of $5$ convolutional blocks, each made of Convolutional Layers with `same` padding and $3 \times 3$ kernels, followed by a single MaxPooling layer with a pool size of $2 \times 2$ that halves the feature map sizes. The first two blocks contain $2$ Convolutional Layers each and the last three contain $3$ each, for a total of $13$; the number of filters increases from block to block ($64$, $128$, $256$, $512$, $512$). The convolutional part is followed by two fully-connected hidden layers of size $4096$ and a `Softmax` output layer.

The idea behind this pattern is that, as the feature maps shrink, they progressively represent "higher-level" features that "summarize" the information carried by lower-level ones. Hence the number of "descriptors" we keep should increase as we traverse the network, and we also want "feature descriptors" coming from more and more distant areas of the image to be combined.
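
One way to quantify how far apart the combined image areas can be is the theoretical receptive field of a unit. The sketch below uses the standard recurrence (receptive field grows by `(kernel - 1) * jump`, and the jump is multiplied by the stride) for blocks of the `Conv2D-Conv2D-MaxPool` kind. The function name and the layer encoding are assumptions of this example, and border effects due to padding are ignored.

```python
def receptive_field(layers):
    """Theoretical receptive field of a stack of (kernel_size, stride) layers."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump   # each layer widens the field by (k - 1) input steps
        jump *= stride              # strided layers increase the step between units
    return rf

# Two 3x3 convolutions (stride 1) followed by a 2x2 max-pooling (stride 2)
block = [(3, 1), (3, 1), (2, 2)]
for n_blocks in (1, 2, 3):
    print(n_blocks, receptive_field(block * n_blocks))
# 1 6
# 2 16
# 3 36
```

After three such blocks each output unit can, in principle, depend on a $36 \times 36$ window, which already covers a whole $32 \times 32$ image.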

#### Reduced VGG
We can keep the above design pattern in our CNN by simply reducing the number of blocks and the size of the hidden layer, or by removing the hidden layer altogether. For example, we can employ a first block made up of two convolutional layers, each with $32$ filters of size $3 \times 3$, a second block with $64$ filters per layer and a third one with $128$. As in VGG16, we use `same` padding, even though the padding values may introduce some bias at the borders; since the kernel is very small, only a one-pixel border is padded.

We will start with a "simple" Reduced VGG without any Dropout, Batch Normalization or Dense Hidden Layer to see if this is a viable baseline.

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, Flatten, MaxPool2D, Dropout, BatchNormalization

In [None]:
baseReducedVGG = Sequential()

baseReducedVGG.add(Conv2D(32, kernel_size=3, activation="relu", input_shape=(32,32,3), padding='same'))
baseReducedVGG.add(Conv2D(32, kernel_size=3, activation="relu", padding='same'))
baseReducedVGG.add(MaxPool2D())

baseReducedVGG.add(Conv2D(64, kernel_size=3, activation="relu", padding='same'))
baseReducedVGG.add(Conv2D(64, kernel_size=3, activation="relu", padding='same'))
baseReducedVGG.add(MaxPool2D())

baseReducedVGG.add(Conv2D(128, kernel_size=3, activation="relu", padding='same'))
baseReducedVGG.add(Conv2D(128, kernel_size=3, activation="relu", padding='same'))
baseReducedVGG.add(MaxPool2D())

# Now we flatten for fully-connected part
baseReducedVGG.add(Flatten())
baseReducedVGG.add(Dense(10, activation="softmax"))
baseReducedVGG.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 32, 32, 32)        896       
                                                                 
 conv2d_1 (Conv2D)           (None, 32, 32, 32)        9248      
                                                                 
 max_pooling2d (MaxPooling2D  (None, 16, 16, 32)       0         
 )                                                               
                                                                 
 conv2d_2 (Conv2D)           (None, 16, 16, 64)        18496     
                                                                 
 conv2d_3 (Conv2D)           (None, 16, 16, 64)        36928     
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 8, 8, 64)         0         
 2D)                                                    

As we can see, our `baseReducedVGG` model has $307,498$ parameters, which is a reasonable amount and far below the $1,000,000$ maximum we mentioned before. We also notice that, since we have not used any Dense hidden layer, the two Conv2D layers with $128$ filters account for $\approx 72\%$ of the total parameters, hence we will not use a higher number of filters, in order to keep the number of parameters (and thus training time) under control.
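
The parameter count can be reproduced by hand from the layer shapes. The following sketch assumes the usual formulas for layers with bias (`k * k * in_channels * filters + filters` for a convolution, `in_units * out_units + out_units` for a dense layer); the helper names are made up for this example.

```python
def conv2d_params(in_channels, filters, kernel=3):
    """Weights plus biases of a Conv2D layer with a square kernel."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(in_units, out_units):
    """Weights plus biases of a Dense layer."""
    return in_units * out_units + out_units

filters_per_layer = [32, 32, 64, 64, 128, 128]
in_channels = 3                      # RGB input
conv_counts = []
for filters in filters_per_layer:
    conv_counts.append(conv2d_params(in_channels, filters))
    in_channels = filters

flat_units = 4 * 4 * 128             # 4x4 feature maps with 128 filters after 3 poolings
total = sum(conv_counts) + dense_params(flat_units, 10)
share_128 = sum(conv_counts[-2:]) / total

print(total)      # 307498
print(share_128)  # fraction of parameters in the two 128-filter layers
```

The first four entries of `conv_counts` match the `Param #` column of the summary above ($896$, $9248$, $18496$, $36928$).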

We now compile and train the model with a batch size of $64$ and for $10$ epochs, using `Adam` optimizer and `Categorical Cross Entropy` loss:

In [None]:
baseReducedVGG.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
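
For reference, the quantity minimized by the `categorical_crossentropy` loss can be sketched in NumPy as the mean over the batch of $-\sum_c y_c \log \hat{y}_c$. This is an illustrative re-implementation under that assumption, not the `keras` code; the function name, the clipping constant `eps` and the toy arrays are invented for this example.

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean cross-entropy between one-hot targets and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)             # avoid log(0)
    per_sample = -np.sum(y_true * np.log(y_pred), axis=1)
    return float(np.mean(per_sample))

# Toy batch: 2 samples, 3 classes (one-hot targets, softmax-like predictions)
y_true = np.array([[0.0, 0.0, 1.0],
                   [1.0, 0.0, 0.0]])
y_pred = np.array([[0.1, 0.2, 0.7],
                   [0.8, 0.1, 0.1]])

loss = categorical_cross_entropy(y_true, y_pred)
print(loss)
```

Only the probability assigned to the correct class contributes to each sample's loss, so here the result is the average of $-\log 0.7$ and $-\log 0.8$.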

In [None]:
baseReducedVGGHistory = baseReducedVGG.fit(
    x_train, y_train, validation_data=(x_eval, y_eval), epochs=10, batch_size=64,
    callbacks=[tf.keras.callbacks.CSVLogger('baseReducedVGG_log.csv')],
)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
