# Deep learning with convolutional neural networks
#### Part of the course on "Foundations of machine learning", Department of Mathematics and Statistics, University of Turku, Finland
#### Lectures available on YouTube: https://youtube.com/playlist?list=PLbkSohdmxoVAZ9DEHEWHjeGK7Ei-DjKHI&si=Msu74_I0qhLrRWcu
#### Code available on GitHub: https://github.com/ionpetre/FoundML_course_assignments

#### This notebook is partially based on the following sources:

> https://www.tensorflow.org/tutorials/keras/classification

In this notebook we demonstrate the use of convolutional neural networks for classification, using TensorFlow and Keras as the Python libraries. We also demonstrate the use of a pre-trained ResNet model.

Datasets used in this notebook: MNIST, Fashion MNIST, CIFAR-10.

#### This notebook uses some fairly big models. On a "standard" CPU, running the notebook may take much longer than usual, on the scale of hours, perhaps up to a day or so. Using a GPU speeds this up by a factor of 10 or so. You can do small experiments on your CPU and run the full trainings on a GPU platform such as Google Colaboratory.

In [None]:
# From https://www.tensorflow.org/tutorials/keras/classification:

# MIT License
#
# Copyright (c) 2017 François Chollet
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.

#### Load the libraries and our own support functions

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf


from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report

In [None]:
# Make TensorFlow deterministic, reset the Keras/TensorFlow environment,
# and reset the seeds of the random number generators, for reproducibility purposes.

def my_reset_env(SEED):
  from keras.utils import set_random_seed

  # Make TensorFlow ops as deterministic as possible for reproducibility of results
  # This will affect the overall performance.
  # `enable_op_determinism()` is introduced in TensorFlow 2.9.
  tf.config.experimental.enable_op_determinism()

  # We reset all variables implicitly instantiated by Keras/tensorflow
  tf.keras.backend.clear_session()

  # Set the seed using keras.utils.set_random_seed. This will set:
  # 1) `numpy` seed
  # 2) `tensorflow` random seed
  # 3) `python` random seed
  set_random_seed(SEED)

In [None]:
# This callback will stop the training when there is no improvement in the loss
#      for three consecutive epochs.
callback_loss_patience = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=3)
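To make the patience rule concrete, here is a simplified, stand-alone re-implementation of the stopping logic. It is our own sketch, not the Keras source, and the function name `epochs_run` is hypothetical. It assumes the default `min_delta=0`, so the loss has to strictly decrease to count as an improvement. Note also that with the default `restore_best_weights=False`, Keras keeps the weights of the last epoch, not those of the best one.

```python
def epochs_run(losses, patience=3, min_delta=0.0):
    """Number of epochs trained before a patience-based early stop (simplified sketch)."""
    best = float("inf")  # best loss seen so far
    wait = 0             # consecutive epochs without improvement
    for epoch, loss in enumerate(losses):
        if loss < best - min_delta:  # a strict improvement resets the counter
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:     # patience exhausted: stop after this epoch
                return epoch + 1
    return len(losses)               # never triggered: all epochs are run


# The loss fails to improve at epochs 4, 5 and 6 (0.70 only ties the best value),
# so training stops after 6 epochs and the 7th loss value is never reached.
print(epochs_run([1.0, 0.8, 0.7, 0.71, 0.72, 0.70, 0.69]))  # 6
```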

In [None]:
my_metrics = [tf.keras.metrics.CategoricalAccuracy(),
           #   tf.keras.metrics.TruePositives(),
              ]

In [None]:
# Plot the evolution of the loss and the accuracy throughout the epochs.
# This is useful to detect overfitting and to decide on early stopping of the training.
def plot_train_history(history_dict):

  print(history_dict.keys())

  import matplotlib.pyplot as plt

  train_loss = history_dict['loss']
  val_loss = history_dict['val_loss']
  train_acc = history_dict['categorical_accuracy']
  val_acc = history_dict['val_categorical_accuracy']
  # train_tp = np.array(history_dict['true_positives']) / X_train_std.shape[0]
  # val_tp = np.array(history_dict['val_true_positives']) / X_valid_std.shape[0]
  epochs = range(1, len(train_loss) + 1)


  plt.figure(figsize=(15, 5))

  plt.subplot(1,2,1)
  plt.plot(epochs, train_loss, 'b', label='Training cat. cross-entropy')
  plt.plot(epochs, val_loss, 'r', label='Validation cat. cross-entropy')
  plt.title('Training and validation loss')
  plt.xlabel('Epochs')
  plt.ylabel('Loss')
  plt.legend()


  plt.subplot(1,2,2)
  plt.plot(epochs, train_acc, 'b', label='Training accuracy')
  plt.plot(epochs, val_acc, 'r', label='Validation accuracy')
  plt.title('Training and validation accuracy')
  plt.xlabel('Epochs')
  plt.ylabel('Categorical accuracy')
  plt.legend()

  #plt.subplot(1,3,3)
  #plt.plot(epochs, train_tp, 'b', label='Training TP')
  #plt.plot(epochs, val_tp, 'r', label='Validation TP')
  #plt.title('Training and validation true positives')
  #plt.xlabel('Epochs')
  #plt.ylabel('True positives')
  #plt.legend()

  plt.show()

## Demo 1: a convolutional neural network classifier on the MNIST dataset
#### We use the LeNet-5 architecture.

In [None]:
from keras.datasets import mnist
from keras.utils import to_categorical
from keras import backend as K

(X_train_valid, y_train_valid), (X_test, y_test) = mnist.load_data()
img_height = X_train_valid.shape[1]
img_width = X_train_valid.shape[2]
print('The size of our training dataset: samples x height x width =', X_train_valid.shape)

# Reshape the data to add an extra dimension for the grayscale 'color' channel.
# Depending on the Keras configuration (K.image_data_format()), the channel axis is
# expected either first ('channels_first') or last ('channels_last', the default).

if K.image_data_format() == 'channels_first':
    X_train_valid = X_train_valid.reshape(X_train_valid.shape[0], 1, img_height, img_width)
    X_test = X_test.reshape(X_test.shape[0], 1, img_height, img_width)
    input_shape = (1, img_height, img_width)
else:
    X_train_valid = X_train_valid.reshape(X_train_valid.shape[0], img_height, img_width, 1)
    X_test = X_test.reshape(X_test.shape[0], img_height, img_width, 1)
    input_shape = (img_height, img_width, 1)

print('The shape of our training dataset, including the channel axis =', X_train_valid.shape)
print('We train on ',X_train_valid.shape[0], 'samples.')
print('We test on ', X_test.shape[0], 'samples.')
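Adding the channel axis does not move any pixel values: reshaping with an extra axis of length 1 is equivalent to `np.expand_dims`. A minimal sketch on a toy array standing in for MNIST (not the real data):

```python
import numpy as np

# Toy stand-in for MNIST: 5 grayscale "images" of 28x28 pixels with distinct values
images = np.arange(5 * 28 * 28).reshape(5, 28, 28)

# channels_last layout: (samples, height, width, 1)
last = images.reshape(images.shape[0], 28, 28, 1)
assert np.array_equal(last, np.expand_dims(images, axis=-1))

# channels_first layout: (samples, 1, height, width)
first = images.reshape(images.shape[0], 1, 28, 28)
assert np.array_equal(first, np.expand_dims(images, axis=1))

print(last.shape, first.shape)  # (5, 28, 28, 1) (5, 1, 28, 28)
```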


#### Data preprocessing
The data must be preprocessed before training the network. If you inspect the first image in the training set, you will see that the pixel values fall in the range of 0 to 255:

In [None]:
plt.figure()
plt.imshow(X_train_valid[0])
plt.colorbar()
plt.grid(False)
plt.show()

In [None]:
# Scale the data into [0,1] by dividing by 255

X_train_valid_std = X_train_valid/255
X_test_std  = X_test/255

del X_train_valid
del X_test
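Dividing a `uint8` array by 255 produces `float64` values in [0, 1]. As an optional variant (our suggestion, not what this notebook does), casting to `float32` first halves the memory footprint; `float32` is also Keras's default float type. A toy check:

```python
import numpy as np

# Toy uint8 "image" covering the full 0..255 pixel range
pixels = np.arange(256, dtype=np.uint8).reshape(1, 16, 16)

scaled64 = pixels / 255                       # what the notebook does: float64
scaled32 = pixels.astype("float32") / 255.0   # optional variant: float32

print(scaled64.dtype, scaled32.dtype)          # float64 float32
print(scaled32.min(), scaled32.max())          # 0.0 1.0
print(scaled64.nbytes // scaled32.nbytes)      # 2
```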

In [None]:
# Input size: LeNet-5 expects 32x32 pixel images, while MNIST images are 28x28 pixels.
# We simply pad the images with a 2-pixel border of zeros on each side to overcome that.
# Another possibility is to modify the input layer size in the LeNet architecture;
# we prefer to leave it unchanged for historical reasons.
# Note: the padding below assumes the 'channels_last' data format.

# Pad images with 0s
X_train_valid_std = np.pad(X_train_valid_std, ((0,0),(2,2),(2,2),(0,0)), 'constant')
X_test_std = np.pad(X_test_std, ((0,0),(2,2),(2,2),(0,0)), 'constant')

print("Updated Image Shape: {}".format(X_train_valid_std.shape))
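Each tuple in the `np.pad` call gives the (before, after) padding for one axis, so only the height and the width grow, by 2 + 2 = 4 pixels each. This assumes channels_last; with channels_first the channel axis is axis 1 and the tuples would have to be reordered. A toy check on an all-ones batch:

```python
import numpy as np

# A toy batch of 2 single-channel 28x28 images filled with ones (channels_last)
batch = np.ones((2, 28, 28, 1))

# Pad 2 zero pixels on each side of the height and width axes only
padded = np.pad(batch, ((0, 0), (2, 2), (2, 2), (0, 0)), 'constant')

print(padded.shape)                  # (2, 32, 32, 1)
print(padded[0, :2, :, 0].sum())     # 0.0: the top two rows are the zero border
print(padded.sum() == batch.sum())   # True: the original pixels are untouched
```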

In [None]:
# Display some images

class_names = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

plt.figure(figsize=(20,12))
for i in range(50):
    plt.subplot(5,10,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(X_train_valid_std[i])
    plt.xlabel(class_names[y_train_valid[i]])
plt.show()

In [None]:
# Train - validation split

X_train_std, X_valid_std, y_train, y_valid = train_test_split(
    X_train_valid_std,
    y_train_valid,
    test_size=0.2,
    random_state=150,
    stratify=y_train_valid,
    shuffle=True
)

# Check the result of the data split

print('# of training images:', X_train_std.shape[0])
print('# of validation images:', X_valid_std.shape[0])

del X_train_valid_std
del y_train_valid

# Encode the labels from numerical to categorical

from keras.utils import to_categorical

y_train_cat = to_categorical(y_train, num_classes=10)
y_valid_cat = to_categorical(y_valid, num_classes=10)
y_test_cat = to_categorical(y_test, num_classes=10)
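`to_categorical` turns each integer label into a one-hot row vector. For integer labels in 0..9 this matches indexing into a 10x10 identity matrix; the NumPy sketch below only illustrates the encoding and is not the Keras implementation. Taking `argmax` along each row recovers the original labels, which is exactly how the predictions are decoded later in this notebook.

```python
import numpy as np

labels = np.array([3, 0, 9, 3])      # integer class labels
one_hot = np.eye(10)[labels]          # row i of the identity matrix = one-hot code of class i

print(one_hot.shape)                  # (4, 10)
print(one_hot[0])                     # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
print(np.argmax(one_hot, axis=1))     # [3 0 9 3]
```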

#### The LeNet-5 model architecture

![lenet.png](https://raw.githubusercontent.com/MostafaGazar/mobile-ml/master/files/lenet.png)
> LeNet-5 Architecture. Credit: [LeCun et al., 1998](http://yann.lecun.com/exdb/publis/psgz/lecun-98.ps.gz)

#### Input
    32x32x1 pixel image

#### Architecture
* **Convolutional #1** outputs 28x28x6
    * **Activation** `sigmoid`

* **Pooling #1** outputs 14x14x6.

* **Convolutional #2** outputs 10x10x16.
    * **Activation** `sigmoid`

* **Pooling #2** outputs 5x5x16.
    * **Flatten** Flatten the output shape of the final pooling layer

* **Fully Connected #1** outputs 120
    * **Activation** `sigmoid`

* **Fully Connected #2** outputs 84
    * **Activation** `sigmoid`

* **Fully Connected #3** outputs 10
    * **Activation** `softmax`
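The output sizes in the list follow from the rule for a 'valid' convolution, floor((n - k) / s) + 1 for an n x n input, k x k kernel and stride s, and from 2x2 pooling, which in Keras uses a stride equal to the pool size by default and therefore halves each spatial side. A small pure-Python trace (the helper names are our own):

```python
def conv_out(n, k, s=1):
    """Spatial size after a 'valid' convolution: floor((n - k) / s) + 1."""
    return (n - k) // s + 1


def pool_out(n, p=2):
    """Spatial size after non-overlapping p x p pooling (stride p, 'valid' padding)."""
    return (n - p) // p + 1


def lenet5_shapes(n=32):
    """Output shape of each LeNet-5 feature-extraction stage for an n x n input."""
    shapes = []
    n = conv_out(n, 5)
    shapes.append((n, n, 6))    # Convolutional #1
    n = pool_out(n)
    shapes.append((n, n, 6))    # Pooling #1
    n = conv_out(n, 5)
    shapes.append((n, n, 16))   # Convolutional #2
    n = pool_out(n)
    shapes.append((n, n, 16))   # Pooling #2
    shapes.append(n * n * 16)   # Flatten
    return shapes


print(lenet5_shapes())  # [(28, 28, 6), (14, 14, 6), (10, 10, 16), (5, 5, 16), 400]
```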

In [None]:
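# The model can be set up by specifying each layer:
#          its type, its size, its activation function.

from keras import models
from keras import layers

# Seed before building the model: the initial weights are drawn here,
# so seeding only before fit() would not make them reproducible.
my_reset_env(2023)

LeNet5model = models.Sequential([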
    layers.Conv2D(filters=6,
                  kernel_size=(5, 5),
                  strides=(1,1),
                  padding='valid',
                  activation='sigmoid',
                  input_shape=X_train_std.shape[1:] # (32,32,1)
                 ),
    layers.AveragePooling2D(pool_size=(2, 2)),
    layers.Conv2D(filters=16,
                  kernel_size=(5, 5),
                  strides=[1,1],
                  padding='valid',
                  activation='sigmoid'
                 ),
    layers.AveragePooling2D(pool_size=(2, 2)),
    layers.Flatten(),
    layers.Dense(units=120, activation='sigmoid'),
    layers.Dense(units=84, activation='sigmoid'),
    layers.Dense(units=10, activation = 'softmax')
])

LeNet5model.summary()

Our model has 61,706 trainable parameters. Most of them, 59,134 (48,120 + 10,164 + 850), come from the three dense layers. Let's see how it trains.
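The count can be checked by hand: a Conv2D layer with k x k kernels over c_in input channels has k*k*c_in weights plus one bias per filter, a Dense layer has (n_in + 1) * n_out parameters, and the pooling and flatten layers have none. A sketch reproducing the total for the 32x32x1 input (the helper names are hypothetical):

```python
def conv_params(k, c_in, filters):
    """k*k*c_in weights per filter, plus one bias per filter."""
    return (k * k * c_in + 1) * filters


def dense_params(n_in, n_out):
    """One weight per input-output pair, plus one bias per output unit."""
    return (n_in + 1) * n_out


counts = {
    "conv1": conv_params(5, 1, 6),      # 156
    "conv2": conv_params(5, 6, 16),     # 2416
    "dense1": dense_params(400, 120),   # 48120 (400 = 5*5*16 flattened features)
    "dense2": dense_params(120, 84),    # 10164
    "dense3": dense_params(84, 10),     # 850
}
print(sum(counts.values()))  # 61706
```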

In [None]:
# The model must be compiled by specifying the numerical optimizer algorithm,
#     the loss function, and metrics to be followed up epoch by epoch

LeNet5model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss=tf.keras.losses.CategoricalCrossentropy(),
    metrics=my_metrics,
)

In [None]:
# Reset the Keras/tensorflow environment, including the random number generator seed
my_reset_env(2023)


# Fit the model by specifying the number of epochs and the batch size
# We also indicate the validation data so we can collect the evolution
#      of the metrics through the epochs, both on train, as well as on validation.

LeNet5_fit_history = LeNet5model.fit(X_train_std,
                               y_train_cat,
                               epochs=100,
                               batch_size=500,
                               callbacks=[callback_loss_patience],
                               validation_data=(X_valid_std, y_valid_cat)
                              )

plot_train_history(LeNet5_fit_history.history)

The accuracy is very high, both on the training set and on the validation set. Let's check the classification report.

In [None]:
# Use the model to predict in the form of a 10-class probability distribution
y_train_prob = LeNet5model.predict(X_train_std)

# Select the most likely class
y_train_pred=np.argmax(y_train_prob, axis=1)

print("\n The classification results on the train data:")
print(classification_report(y_train,y_train_pred))
print("Confusion matrix (train data):\n", confusion_matrix(y_train,y_train_pred))




# The classification results for the validation data

y_valid_prob = LeNet5model.predict(X_valid_std)
y_valid_pred=np.argmax(y_valid_prob, axis=1)
print("\n The classification results on the validation data:")
print(classification_report(y_valid,y_valid_pred))
print("Confusion matrix (validation data):\n", confusion_matrix(y_valid,y_valid_pred))




# The classification results for the test data

y_test_prob = LeNet5model.predict(X_test_std)
y_test_pred=np.argmax(y_test_prob, axis=1)
print("\n The classification results on the test data:")
print(classification_report(y_test,y_test_pred))
print("Confusion matrix (test data):\n", confusion_matrix(y_test,y_test_pred))
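The categorical accuracy tracked during training is the fraction of samples whose argmax prediction equals the true label, which is also the trace of the confusion matrix divided by its total. A NumPy-only toy check with made-up probabilities (not results from this notebook); the matrix uses the same layout as scikit-learn's `confusion_matrix`, with true labels as rows and predicted labels as columns:

```python
import numpy as np

# Made-up predicted probabilities for 5 samples and 3 classes (rows sum to 1)
y_prob = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.3, 0.3, 0.4],
                   [0.6, 0.3, 0.1],
                   [0.2, 0.2, 0.6]])
y_true = np.array([0, 1, 2, 1, 2])

y_pred = np.argmax(y_prob, axis=1)   # most likely class per sample

n_classes = y_prob.shape[1]
cm = np.zeros((n_classes, n_classes), dtype=int)
np.add.at(cm, (y_true, y_pred), 1)   # count each (true, predicted) pair

accuracy = np.trace(cm) / cm.sum()   # correct predictions lie on the diagonal
print(cm)
print(accuracy)  # 0.8
```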

In [None]:
del X_train_std
del X_valid_std
del X_test_std
del y_train
del y_train_prob
del y_train_pred
del y_valid
del y_valid_prob
del y_valid_pred
del y_test
del y_test_prob
del y_test_pred
del LeNet5model

## Challenge 1: train a convolutional neural network classifier on the Fashion MNIST dataset


#### Data: the Fashion MNIST dataset

This is a dataset of 60,000 28x28 grayscale images of 10 fashion categories, along with a test set of 10,000 images. This dataset can be used as a drop-in replacement for MNIST.

The classes are:

| Label | Description |
|-------|-------------|
| 0     | T-shirt/top |
| 1     | Trouser     |
| 2     | Pullover    |
| 3     | Dress       |
| 4     | Coat        |
| 5     | Sandal      |
| 6     | Shirt       |
| 7     | Sneaker     |
| 8     | Bag         |
| 9     | Ankle boot  |

License: The copyright for Fashion-MNIST is held by Zalando SE. Fashion-MNIST is licensed under the MIT license.

The data is available from the Keras datasets.

#### Coding instructions

> Use the LeNet-5 architecture, originally introduced to classify the MNIST dataset.

> Loading the data works as for MNIST; the following code should help:
>>from keras.datasets import fashion_mnist
>>(X_train_valid, y_train_valid), (X_test, y_test) = fashion_mnist.load_data()

> Use the 28 x 28 original images in the fashion MNIST dataset, skip the padding to size 32x32. Instead, indicate in the input layer that the input size is (28,28,1).

> Scale the data by dividing by 255.

> Use the LeNet-5 architecture demonstrated in this notebook for the MNIST dataset, modified so that it uses 'relu' instead of the 'sigmoid' activation and the MaxPooling2D layer instead of the AveragePooling2D, with the same parameters.

> Reset the tensorflow environment and the random number generator seed in exactly the same way as for the MNIST dataset (this is important for reproducibility). Do this before every re-training of the model. 

> Train the model for 50 epochs with the same callback function we used before. Use a batch size of 500.

> Get the classification report on train, validation, and test.
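Both pooling layers mentioned above slide a 2x2 window with stride 2 (in Keras the stride defaults to the pool size); they differ only in how each window is summarized: `MaxPooling2D` keeps the largest value, `AveragePooling2D` the mean. A NumPy sketch on a single-channel toy array (the helper `pool2x2` is our own, not a Keras function):

```python
import numpy as np


def pool2x2(x, reduce):
    """Non-overlapping 2x2 pooling (stride 2) of a 2D array with even sides."""
    h, w = x.shape
    blocks = x.reshape(h // 2, 2, w // 2, 2)   # group the pixels into 2x2 blocks
    return reduce(blocks, axis=(1, 3))          # summarize each block


x = np.array([[1, 3, 0, 0],
              [2, 6, 0, 4],
              [5, 5, 1, 1],
              [5, 1, 1, 1]], dtype=float)

print(pool2x2(x, np.max))    # block maxima: 6, 4 (top row) and 5, 1 (bottom row)
print(pool2x2(x, np.mean))   # block means:  3, 1 (top row) and 4, 1 (bottom row)
```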

#### Questions
> Q1. How many (trainable) parameters does your LeNet-5 model for Fashion MNIST have?

> Q2. What is the categorical accuracy of your LeNet-5 model on the Fashion MNIST training dataset? 

> Q3. What is the categorical accuracy of your LeNet-5 model on the Fashion MNIST validation dataset? 

> Q4. What is the categorical accuracy of your LeNet-5 model on the Fashion MNIST test dataset? 


In [None]:
# Your code here



## Demo 2: train a convolutional neural network classifier on the CIFAR-10 dataset



#### The CIFAR datasets

The CIFAR-10 and CIFAR-100 are labeled subsets of the 80 million tiny images dataset. They were collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. The 10 classes are:

| Label | Description |
|-------|-------------|
| 0     | Airplane    |
| 1     | Automobile  |
| 2     | Bird        |
| 3     | Cat         |
| 4     | Deer        |
| 5     | Dog         |
| 6     | Frog        |
| 7     | Horse       |
| 8     | Ship        |
| 9     | Truck       |

The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains exactly 1000 randomly-selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5000 images from each class.

Webpage, including download: https://www.cs.toronto.edu/~kriz/cifar.html

Dataset on Keras: https://www.tensorflow.org/api_docs/python/tf/keras/datasets/cifar10

#### Data preparation

In [None]:
from keras.datasets import cifar10
from keras.utils import to_categorical

(X_train_valid, y_train_valid), (X_test, y_test) = cifar10.load_data()

print('We have %2d training pictures and %2d test pictures.' % (X_train_valid.shape[0],X_test.shape[0]))
print('Each picture is of size (%2d,%2d)' % (X_train_valid.shape[1], X_train_valid.shape[2]))

In [None]:
# Scale the data into [0,1] by dividing by 255

X_train_valid_std = X_train_valid/255
X_test_std  = X_test/255


In [None]:
# Display some images

class_names = ['Airplane', 'Automobile', 'Bird', 'Cat', 'Deer', 'Dog', 'Frog', 'Horse', 'Ship', 'Truck']


plt.figure(figsize=(20,12))
for i in range(50):
    plt.subplot(5,10,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(X_train_valid_std[i])
    plt.xlabel(class_names[y_train_valid[i, 0]])  # CIFAR-10 labels have shape (N, 1)
plt.show()


In [None]:
# Train - validation split

X_train_std, X_valid_std, y_train, y_valid = train_test_split(
    X_train_valid_std,
    y_train_valid,
    test_size=0.2,
    random_state=150,
    stratify=y_train_valid,
    shuffle=True
)

# Check the result of the data split

print('# of training images:', X_train_std.shape[0])
print('# of validation images:', X_valid_std.shape[0])
print("Note the shape of the data (3 color channels):", X_train_std.shape)

# Encode the labels from numerical to categorical

from keras.utils import to_categorical

y_train_cat = to_categorical(y_train, num_classes=10)
y_valid_cat = to_categorical(y_valid, num_classes=10)
y_test_cat = to_categorical(y_test, num_classes=10)

#### Train a LeNet-5 model

In [None]:
# Train a CNN model with an input layer of shape (32, 32, 3), accounting for the 3 color channels.
# Try first the LeNet-5 architecture.

from keras import models
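from keras import layers

# Seed before building the model: the initial weights are drawn here,
# so seeding only before fit() would not make them reproducible.
my_reset_env(2023)

LeNet5model = models.Sequential([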
    layers.Conv2D(filters=6,
                  kernel_size=(5, 5),
                  strides=(1,1),
                  padding='valid',
                  activation='relu',
                  input_shape=X_train_std.shape[1:]  #(32,32,3)
                 ),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Conv2D(filters=16,
                  kernel_size=(5, 5),
                  strides=[1,1],
                  padding='valid',
                  activation='relu'
                 ),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Flatten(),
    layers.Dense(units=120, activation='relu'),
    layers.Dense(units=84, activation='relu'),
    layers.Dense(units=10, activation = 'softmax')
])

LeNet5model.summary()

LeNet5model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss=tf.keras.losses.CategoricalCrossentropy(),
    metrics=my_metrics,
)


In [None]:
# Reset the Keras/tensorflow environment, including the random number generator seed
my_reset_env(2023)


# Fit the model by specifying the number of epochs and the batch size
# We also indicate the validation data so we can collect the evolution
#      of the metrics through the epochs, both on train, as well as on validation.

LeNet5_fit_history = LeNet5model.fit(X_train_std,
                               y_train_cat,
                               epochs=100,
                               batch_size=500,
                               callbacks=[callback_loss_patience],
                               validation_data=(X_valid_std, y_valid_cat)
                              )

In [None]:
plot_train_history(LeNet5_fit_history.history)

The model starts to overfit from around epoch 40 onwards, and its accuracy on the validation data is only about 60%. Let's try a bigger and deeper model.
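One way to locate where overfitting sets in is to find the epoch with the lowest validation loss in the history dictionary returned by `fit` (the same `history` attribute passed to `plot_train_history`). A sketch on made-up numbers; for the real run you would pass `LeNet5_fit_history.history` instead of the hypothetical `toy_history`:

```python
import numpy as np


def best_val_epoch(history_dict):
    """1-indexed epoch with the lowest validation loss, and the val-train loss gap there."""
    val_loss = np.asarray(history_dict['val_loss'])
    train_loss = np.asarray(history_dict['loss'])
    best = int(np.argmin(val_loss))
    return best + 1, float(val_loss[best] - train_loss[best])


# Made-up history: the training loss keeps falling, the validation loss turns up after epoch 3
toy_history = {'loss':     [1.9, 1.5, 1.2, 1.0, 0.8, 0.6],
               'val_loss': [1.8, 1.5, 1.3, 1.35, 1.4, 1.6]}
print(best_val_epoch(toy_history))
```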

In [None]:
del LeNet5model
del LeNet5_fit_history

## Challenge 2: classify CIFAR-10 with about 80% accuracy on the validation set.

#### Coding instructions:
> Layer 1: a convolutional layer with 32 filters, kernel size (3,3), stride 1, padding 'same', and activation 'relu'

> Layer 2: a batch normalization layer that normalizes the activation output of the first layer (declare it using "layers.BatchNormalization()")

> Layer 3: the same as layer 1

> Layer 4: the same as layer 2

> Layer 5: a max pooling layer of size (2,2) with strides 2

> Layer 6: a dropout layer with rate 0.25 (declare it using "layers.Dropout(0.25)"). Its role is one of regularization, to help avoid overfitting the model. During each training iteration, each activation from the previous layer is set to zero independently with probability 0.25, so about 25% of them, chosen at random, are nullified. The layer is inactive at inference time.

> Layers 7-12: similar to layers 1-6, except that we use now 64 filters in the convolutional layers.

> Layer 13: flatten the input

> Layer 14: a dense layer with 64 neurons and 'relu' activation

> Layer 15: dropout with rate 0.25 (that is, 25%)

> Layer 16: a dense layer with 10 neurons and 'softmax' activation. 

> Reset the tensorflow environment and the random number generator seed in exactly the same way as for the MNIST dataset (this is important for reproducibility). Do this before every re-training of the model. 

> Train the model for 100 epochs with the same callback function we used before. Use a batch size of 500.

> Get the classification report on train, validation, and test.
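The dropout described in Layer 6 is, in Keras, so-called inverted dropout: during training each input unit is set to zero independently with probability `rate`, and the surviving units are scaled by 1 / (1 - rate) so that the expected sum stays unchanged; at inference time the input passes through untouched. A NumPy sketch of that rule (our own illustration; `inverted_dropout` is a hypothetical helper, not the Keras source):

```python
import numpy as np


def inverted_dropout(x, rate, training, rng):
    """Zero each unit with probability `rate` during training and rescale the survivors."""
    if not training or rate == 0.0:
        return x                                   # inference: identity
    keep = rng.random(x.shape) >= rate             # True with probability 1 - rate
    return np.where(keep, x / (1.0 - rate), 0.0)


rng = np.random.default_rng(0)
activations = np.ones(1_000_000)

dropped = inverted_dropout(activations, rate=0.25, training=True, rng=rng)
print(np.mean(dropped == 0.0))   # close to 0.25
print(dropped.mean())            # close to 1.0, the mean of the input
print(inverted_dropout(activations, 0.25, training=False, rng=rng) is activations)  # True
```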

#### Questions
> Q5. How many trainable parameters does your CNN model for CIFAR-10 have? 

> Q6. What is the categorical accuracy of your CNN model for CIFAR-10 on the training set?

> Q7. What is the categorical accuracy of your CNN model for CIFAR-10 on the validation set?

> Q8. Do you consider your CNN model for CIFAR-10 to be overfit (in other words, is the loss on the validation set clearly increasing through the later stage of training)? 

> Q9. What is the categorical accuracy of your CNN model for CIFAR-10 on the test set?


In [None]:
# Your code here

