In [None]:
# Copyright (c)
# SANTE 2023
# Philippe Bouchet 2022
# philippe.bouchet@epita.fr

# Deep Learning Workshop - Google Developer Student Club EPITA

Welcome to GDSC EPITA's first workshop!

In this notebook we will be discovering the very basics of Deep Learning with Python's TensorFlow library!
This notebook is meant to be accessible for people who have never used TensorFlow (or NumPy) before! A working knowledge of Python is required to finish this notebook.

Remember:
- You can discuss with your neighbours, and exchange ideas so that you can improve your models!
- If you don't have enough time to finish the notebook here, you can always take your time at home!
- If you have any issues with the notebook, don't hesitate to ask for help, or send a message on the Discord!

Good luck!

## Setting up the environment

In order to accomplish this task, we will need to load some Python libraries.
We are going to use:
- **TensorFlow 2**: For building and training our deep learning model
- **NumPy**: For linear algebra, and ndarray operations
- **Matplotlib**: To visualize the data

In [None]:
# Let's start with importing tensorflow into our work environment

import tensorflow as tf
from tensorflow.keras import models, layers
print("TensorFlow version:", tf.__version__)

# Fixed seed for deterministic results
tf.random.set_seed(0)

In [None]:
# NumPy for linear algebra

import numpy as np

In [None]:
# Matplotlib for data visualization

import matplotlib.pyplot as plt

Since we will be on Google Colab, we can make use of its GPUs.
To enable GPUs on your session, go to `Edit`, then `Notebook settings` and select `GPU` as the hardware accelerator.

## A brief introduction to NumPy

This section is meant for those who have never used NumPy. If you're already familiar with NumPy, **then you can skip this section**.

NumPy, or *Numerical Python* is a library dedicated to make numerical computing with Python very easy and straightforward. We will not go too in depth with NumPy, so all you need to know for now is how to use the `np.ndarray` type.

What is an `np.ndarray` type? It stands for "*N-Dimensional Array*". This type is very powerful, as it allows you to manipulate multidimensional arrays quickly and with ease, as opposed to standard lists in Python.
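To make the comparison with standard lists concrete, here is a minimal sketch (the variable names are only illustrative) that doubles every value, first with a plain list and then with an `np.ndarray`:

```python
import numpy as np

values = [1, 2, 3, 4]

# With a plain list, element-wise arithmetic needs an explicit loop
doubled_list = [2 * v for v in values]

# With an ndarray, the same operation is applied to every element at once
doubled_array = 2 * np.array(values)

print(doubled_list)   # [2, 4, 6, 8]
print(doubled_array)  # [2 4 6 8]
```

Note that `2 * values` on a plain list would repeat the list instead of doubling its elements.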

Here are a few ways that we can use the `np.ndarray` type:

Initializing an array

In [None]:
simple_array = np.array([1, 2, 3, 4])  # Initialize from list
array = np.ones((2, 2)) # Initialize a 2x2 (2D) array full of ones

print(f"arr:\n{simple_array}\n")
print(f"arr_ones:\n{array}")

Checking the array's dimensions using its `shape` attribute

In [None]:
array.shape  # Shape of (2x2) 

Performing array wide operations quickly and efficiently


In [None]:
array += 1 # Add 1 to every element in array
print(f"Increment array:\n{array}\n")

array = np.sin(array) # Apply the sine function to every element in array
print(f"Sine function applied to array:\n{array}")

## Classification

The task in this workshop is that of image **classification**.

We seek to determine if the contents of an image correspond to one of the classes from the dataset that was used to train our network. A class simply represents the label that is associated with data from the dataset.

For example, imagine we have a dataset that contains 50000 images of horses, shoes and guitars. We have only 3 **classes** here. We can use this dataset to train our network to classify images of horses, shoes and guitars.

But can you guess what happens when you try to give an image of an airplane to your network? It will probably think it's a horse! This network is specialized **only** for classifying those 3 classes.

## Loading the dataset

Let's start off by loading our dataset! It's really important to correctly load our dataset. In most real life situations, we will be confronted with corrupted datasets, often populated with invalid pieces of data. However, the datasets that we will be using for this workshop (**MNIST**, then **cifar10**) are clean and rich! So we can jump straight into writing the network once we load and verify our data!

### What are $X$ and $y$?

In **supervised learning**, we call $X$ the input data, and $y$ the label data. When training our model, we will input $X$, and the model will predict a variable $\hat{y}$. In order to measure the difference between the predicted value ($\hat{y}$) and the expected value ($y$), we use a *loss function*.
We call this loss function $L$. For a single example with $J$ classes:

$$
L_{CE}(\hat{y},y) = - \sum_{j=1}^{J} y_j \log(\hat{y}_j)
$$

This is the formula for **cross entropy** ($CE$).

We will be using **categorical cross entropy** ($CCE$) for this workshop, which averages this quantity over the $N$ examples of a batch:

$$
L_{CCE}(\hat{y}, y) = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{J} y_{ij} \log(\hat{y}_{ij})
$$

No need to memorize these formulas! It's just to give you an idea of how the network calculates the error between the predictions and the expected values.
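To get a feel for the numbers, here is a small NumPy sketch of cross entropy for a single example with 3 classes. The `cross_entropy` helper and the example vectors are made up for illustration, they are not part of TensorFlow:

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    # Clip predictions to avoid computing log(0)
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.sum(y_true * np.log(y_pred))

# The expected class is the last one (one-hot encoded label)
y_true = np.array([0.0, 0.0, 1.0])

good_guess = np.array([0.1, 0.2, 0.7])  # most of the probability on the right class
bad_guess = np.array([0.7, 0.2, 0.1])   # most of the probability on a wrong class

loss_good = cross_entropy(y_true, good_guess)  # -log(0.7), roughly 0.357
loss_bad = cross_entropy(y_true, bad_guess)    # -log(0.1), roughly 2.303

print(f"Loss for the good guess: {loss_good:.3f}")
print(f"Loss for the bad guess:  {loss_bad:.3f}")
```

The worse the prediction for the expected class, the larger the loss.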

The network then computes the gradient of the loss function with respect to each of its parameters by using the chain rule, and uses **gradient descent** to update the parameters in the direction that reduces the loss.
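As a toy illustration of gradient descent (not what TensorFlow does internally, which handles millions of parameters automatically), the sketch below minimizes a made-up one-parameter loss $(w - 3)^2$. The learning rate and the number of steps are arbitrary choices:

```python
def loss(w):
    # Toy loss with its minimum at w = 3
    return (w - 3.0) ** 2

def grad(w):
    # Derivative of the toy loss with respect to w
    return 2.0 * (w - 3.0)

w = 0.0               # initial parameter value
learning_rate = 0.1   # size of each update step

for step in range(100):
    # Move the parameter against the gradient to reduce the loss
    w -= learning_rate * grad(w)

print(f"w after training: {w:.4f}, loss: {loss(w):.6f}")
```

After enough steps, `w` ends up very close to 3, the value that minimizes the loss.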

We'll try and understand the mathematical aspect of deep learning in another workshop perhaps... ;) 

### Train and test data, why split the dataset?

In Deep Learning, and by extension Machine Learning, it's absolutely vital to separate our data into training and testing datasets.

Why, you ask? When you train the network with all of the data from the dataset, the model develops a bias towards this data. So if we were to use that same data to evaluate the model's accuracy, our result would be overly optimistic.
We have to use data that the model has never seen before!

One way to do this is by splitting the dataset into two:
- **Training dataset**: This dataset will be used to train the model.
- **Test dataset**: This data will be used to evaluate the model's accuracy once it has been trained.
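The datasets used in this workshop come already split, but here is a minimal sketch of how such a split could be done by hand with NumPy. The fake data, the `_demo` names and the 80/20 ratio are assumptions chosen for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# 10 fake samples with 2 features each, and one label per sample
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)

# Shuffle the indices so that the split is random
indices = rng.permutation(len(X_demo))
split = int(0.8 * len(X_demo))  # 80% for training, 20% for testing

train_idx, test_idx = indices[:split], indices[split:]

X_demo_train, y_demo_train = X_demo[train_idx], y_demo[train_idx]
X_demo_test, y_demo_test = X_demo[test_idx], y_demo[test_idx]

print(X_demo_train.shape, X_demo_test.shape)  # (8, 2) (2, 2)
```

No sample appears in both parts, so the test part really is data the model has never seen.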

We'll start off by loading the MNIST handwritten digits dataset. It contains images of handwritten digits from 0 to 9. This dataset has **10** classes!

In [None]:
# Load the data
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()

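# MNIST labels already have the shape (60000,), so flatten() changes nothing here.
# It is kept because other datasets (e.g. cifar10) return labels of shape (N, 1).
y_train = y_train.flatten()
y_test  = y_test.flatten()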

## Visualizing the data

Let's take a look at the data we just loaded!

In [None]:
X_train.shape

In [None]:
y_train.shape

By reading the `shape` attribute of `X_train`, we get a tuple containing 3 elements: `(60000, 28, 28)`. These elements correspond to the dimensions of our data.

We have 60000 elements, with each element having a dimension of 28x28. These dimensions correspond to image dimensions: an image of width 28 and of height 28.

If we had an RGB image, we would have an extra dimension. We'd simply have a tuple that would look like `(60000, 28, 28, 3)`.
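To see what this extra dimension looks like, here is a short sketch using arrays full of zeros as stand-ins for real images (the batch of 5 images is an arbitrary choice). It also shows how a grayscale batch can be given an explicit channel dimension of size 1 with `np.expand_dims`:

```python
import numpy as np

# 5 fake grayscale images of 28x28 pixels: no channel dimension
gray_batch = np.zeros((5, 28, 28), dtype=np.uint8)

# Add a trailing channel dimension of size 1
gray_with_channel = np.expand_dims(gray_batch, axis=-1)

# 5 fake RGB images of 28x28 pixels: 3 channels per pixel
rgb_batch = np.zeros((5, 28, 28, 3), dtype=np.uint8)

print(gray_batch.shape)         # (5, 28, 28)
print(gray_with_channel.shape)  # (5, 28, 28, 1)
print(rgb_batch.shape)          # (5, 28, 28, 3)
```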

Let's go ahead and visualize our data!

In [None]:
# idx - Select a random index between 0 and 59999
idx = np.random.randint(X_train.shape[0])

img_data = X_train[idx]  # Data containing the image

_ = plt.imshow(img_data)

We can see an image from our dataset! Amazing! 

Now, let's try and see the corresponding label for this image!

In [None]:
print(f"All possible y values: {np.unique(y_train)}")
print(f"\nLabel for image {idx}: {y_train[idx]}")

# Pre-processing

Before we can start feeding our network with the data, we're gonna have to process it a bit. This step is called **pre-processing** since it will be carried out *before* it is passed on to the network.

We're gonna look at **one-hot encoding** for our y labels, and **normalization** for our X values.

## One-hot encoding

Since we have 10 different classes, our labels vary from 0 to 9.

However, our network can't have a single output that predicts from 0 to 9! Since we are bound by the properties of the softmax activation function, each output can only range continuously from 0 to 1!

To solve this, we need a network with 10 different outputs, with each output corresponding to a certain class. Now our network is able to predict these different classes! So we have 10 output neurons in this case, and a neuron whose value is close to `1` means that the network predicted that class!

But our label values range from 0 to 9, so how can we make our labels compatible with this network of 10 different outputs?

We will be using "*one-hot encoding*"!

The concept is simple: given $n$ possible classes, we will create a vector of size $n$ for each label.

With $n = 10$ for this situation.

For each vector, the element at the corresponding value of the label will be set to 1, whereas all the others will be set to 0.

For example:

The label for the number `3` becomes the vector `[0,0,0,1,0,0,0,0,0,0]`

The label for the number `9` becomes the vector `[0,0,0,0,0,0,0,0,0,1]`

So on and so forth...
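To see the mechanics without any library magic, here is a NumPy-only sketch of the same idea. It relies on the rows of the identity matrix `np.eye(10)`, and the three labels are picked arbitrarily:

```python
import numpy as np

labels = np.array([3, 9, 0])

# Row k of the 10x10 identity matrix is the one-hot vector for class k
one_hot = np.eye(10, dtype=int)[labels]

print(one_hot[0])  # [0 0 0 1 0 0 0 0 0 0]
print(one_hot[1])  # [0 0 0 0 0 0 0 0 0 1]

# argmax gives back the original label from a one-hot vector
recovered = one_hot.argmax(axis=1)
print(recovered)   # [3 9 0]
```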

Here's how to use one-hot encoding in TensorFlow!

In [None]:
# One-hot encoding
# depth - corresponds to the number of classes we wish to encode
y_train = tf.one_hot(y_train, depth=10)

Now let's see what we get!

In [None]:
print(f"\nLabel for image {idx}: {y_train[idx]}")

Our labels are now encoded in a way that is understandable by our network!

## Normalization

Values in 8-bit images range from 0 to 255. However, such large input values can make the network struggle to converge towards a minimum of the loss function during gradient descent. Reaching a good minimum is essentially what gives the network its best parameters.

Since we want our network to have the best possible parameters, we're going to help it converge by **normalizing** our input data.

Let's take a look at a pixel value's range of an image in our dataset.

In [None]:
print(f"Range for X: [{X_train[idx].min()}, {X_train[idx].max()}]")

We can see that our pixel values span a wide range!

Pixel values are usually defined within $[0, 255]$. If we *normalize* the range from $[0, 255]$ to $[0, 1]$, then our network can more easily find the best parameters to update itself with.

Let's code the normalization function!

https://en.wikipedia.org/wiki/Normalization_(image_processing)

If you're interested you can also read [this article](https://jermwatt.github.io/machine_learning_refined/notes/3_First_order_methods/3_9_Normalized.html) about normalization.


### FIXME - Normalization function

In [None]:
# Arguments:
#   X - np.ndarray - Values ranging from [0, 255]
#
# Return:
#   X_norm - Normalized np.ndarray in the range [0, 1]
#
# Hint: This function can be coded in one line! ;)
def normalize(X: np.ndarray) -> np.ndarray:
  return X / 255

Let's test it out!

In [None]:
X_train = normalize(X_train)
print(f"Range for X normalized: [{X_train[idx].min()} , {X_train[idx].max()}]")


Can you imagine what the normalized image looks like? (*Don't peek at the answer below!*)

In [None]:
_ = plt.imshow(X_train[idx])

That's right, it looks exactly the same! Normalization is a linear operation, therefore each pixel's value is proportionally scaled from $[0,255]$ to $[0,1]$.

## The model

Now that we know how to load our data, and normalize it, let's design the model!

For this task, we have opted for a **Convolutional Neural Network** (CNN) model, since we want to be able to classify images.

We will be using the Keras API for the structure of our network.

It does seem overwhelming at first, but this is definitely within your reach! It will take a bit of time to understand, so if you are having trouble with designing the network later on in the notebook, you can take inspiration from this model's structure!

You can always refer to the slides from the presentation on CNN models.

### Structure of the model

Let's start off by defining the general structure of our model!

We instantiate the `Sequential` class from the Keras `models` API.

To its constructor, we can pass a list containing different layers. The way this list is constructed allows us to build models in a very efficient, lego-like way, as if we were using building blocks.

It's important to note that the first layer should have a keyword argument specifying the input shape! Without it, Keras can only build the model once it sees the first batch of data, so `model.summary()` may not be able to show the output shapes before training.

In [None]:
# Defining the structure of the model.
model = models.Sequential([
    layers.Conv2D(64, 7, padding='same', activation='relu', input_shape=(28,28,1)),
    layers.MaxPooling2D(),

    layers.Conv2D(32, 5, padding='same', activation='relu'),
    layers.MaxPooling2D(),

    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(10,  activation='softmax'),
])

Let's break this down a little!

- `layers.Conv2D()` creates a layer in our network dedicated to applying convolutions to the image. The first two integer parameters correspond to the number of kernels (filters) we will be using, and the size of each kernel, respectively.
- `layers.MaxPooling2D()` creates another layer in the network dedicated to downsampling its input using the max-pool algorithm. By default, it keeps the maximum of each 2x2 window, which halves the width and the height.
- `layers.Flatten()` allows us to transition from a multi-dimensional array shape to a 1-dimensional vector.
- `layers.Dense()` creates a fully connected layer. The first integer parameter corresponds to the number of neurons in the layer. The final dense layer of the list acts as the output layer.
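To illustrate what max-pooling does, here is a NumPy sketch that applies a 2x2 max-pool to a single made-up 4x4 feature map. This is only a hand-written illustration of the idea, not the code Keras runs:

```python
import numpy as np

feature_map = np.array([[1, 3, 2, 0],
                        [4, 2, 1, 5],
                        [0, 1, 7, 2],
                        [3, 2, 1, 4]])

h, w = feature_map.shape

# Split the map into non-overlapping 2x2 windows, then keep the max of each one
pooled = feature_map.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

print(pooled)
# [[4 5]
#  [3 7]]
```

The 4x4 map becomes a 2x2 map, and each output value is the strongest response in its window.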

We also notice that most layers take an `activation` parameter, with the final layer being different.

- `relu` is an activation that sets all negative values to 0, and keeps all positive values unchanged. It is especially useful in visual feature extraction, and we use it for all intermediary layers.

- `softmax` is an activation that we use at the very end of the network. It turns the values of the output neurons into values in $[0,1]$ that sum to 1, so they can be read as class probabilities. We use it for the final layer.
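Both activations are simple enough to write by hand. The sketch below uses NumPy and a made-up vector of scores; the `relu` and `softmax` helpers are illustrative and are not the Keras implementations:

```python
import numpy as np

def relu(x):
    # Negative values become 0, positive values are kept as they are
    return np.maximum(x, 0)

def softmax(x):
    # Subtracting the max keeps the exponentials numerically stable
    exps = np.exp(x - np.max(x))
    return exps / exps.sum()

scores = np.array([-1.0, 0.0, 2.0])

activated = relu(scores)
probs = softmax(scores)

print(activated)    # [0. 0. 2.]
print(probs)        # three values between 0 and 1
print(probs.sum())  # 1, up to floating point rounding
```

The largest score gets the largest probability, which is why we read the highest output neuron as the predicted class.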

### Model parameters

Now that we've constructed our network, we'll have to specify the loss function, the optimizer and the metrics of the network!

We can do so by calling the model's `compile` method:


In [None]:
model.compile(optimizer="SGD",
              loss='categorical_crossentropy',
              metrics=['accuracy'])

Now you may be asking yourself what exactly these arguments mean.

- `optimizer` corresponds to the algorithm that is used to perform gradient descent! Some are better than others, and you can always search for the list of optimizers available in TensorFlow! Here we have `SGD`, which is short for `Stochastic Gradient Descent`, the most basic gradient descent algorithm.

- `loss` corresponds to the function that measures the error between our predicted value and our expected value. `categorical_crossentropy` is used in this example, as it is suited to classification over several classes with one-hot encoded labels.

- `metrics` takes a list of different metrics to be evaluated during model training. By passing a list containing only `accuracy` we tell the model that we want to see how its accuracy evolves at each epoch.

You can use `model.summary()` once you have compiled your model, to take a look at how it's configured!

In [None]:
model.summary()

### Training the model

Once we've compiled our model, we can call the `fit` method from our model with our training dataset, `X_train` and `y_train`.

We can see that we're calling this method with two other keyword arguments, `epochs` and `batch_size`:

- `epochs` corresponds to the number of complete passes over the training dataset.
- `batch_size` corresponds to the number of examples used for each parameter update. During a single epoch, the training data is shuffled and split into batches of `batch_size` examples, and the network updates its parameters with gradient descent once per batch.
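As a quick sanity check of what these two values imply, the arithmetic below counts the parameter updates for the 60000 MNIST training images, using a `batch_size` of 128 and 10 epochs:

```python
import math

n_samples = 60000   # number of images in the MNIST training dataset
batch_size = 128
epochs = 10

# The last batch of an epoch may be smaller, hence the rounding up
steps_per_epoch = math.ceil(n_samples / batch_size)
total_updates = steps_per_epoch * epochs

print(f"Updates per epoch: {steps_per_epoch}")  # 469
print(f"Updates in total:  {total_updates}")    # 4690
```

The number of updates per epoch is the step count that Keras displays in its progress bar during training.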

We also see that we have `validation_data` set to the test dataset. The reason why we have this is because the `accuracy` metric is not representative of the model's ability to perform on new data. It shows the accuracy on data it was already trained with.

In general, we want `val_accuracy` and `accuracy` to stay close to each other. When `val_accuracy` becomes significantly lower than `accuracy`, our model is not generalizing well on new data. This is called ***overfitting***.
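Here is a rough sketch of how this gap could be monitored. The accuracy values below are invented for the example, and the threshold of 0.10 is an arbitrary choice. With a real model, the same lists can be read from the `history` attribute of the object returned by `model.fit`, under the keys `accuracy` and `val_accuracy`:

```python
# Hypothetical per-epoch values, made up for this example
accuracy = [0.60, 0.75, 0.85, 0.92, 0.97]
val_accuracy = [0.58, 0.72, 0.78, 0.79, 0.78]

gap_threshold = 0.10  # arbitrary limit for "significantly lower"

first_overfit_epoch = None
for epoch, (acc, val_acc) in enumerate(zip(accuracy, val_accuracy), start=1):
    gap = acc - val_acc
    print(f"Epoch {epoch}: accuracy={acc:.2f}, val_accuracy={val_acc:.2f}, gap={gap:.2f}")
    # Remember the first epoch where the gap becomes too large
    if first_overfit_epoch is None and gap > gap_threshold:
        first_overfit_epoch = epoch

print(f"The gap first exceeds the threshold at epoch {first_overfit_epoch}")
```

In this made-up run, the training accuracy keeps climbing while the validation accuracy stalls, which is the typical signature of overfitting.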

In order to have an idea of how well the network performs on new data during the training, we can use the test dataset as validation data.

Since the validation data is not used to update the model's parameters, we can also use it later on to evaluate the final model's performance.

We must preprocess the test dataset the same way as the train dataset before using it for validation.

In [None]:
# Normalize X_test
X_test = normalize(X_test)

# One-hot encoding for y_test
y_test  = tf.one_hot(y_test, depth=10)

In [None]:
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=128)

### Evaluating the model

Now that we've finished training our network, we can evaluate its performance by using the test dataset!

We evaluate our model with the `evaluate` method! We call this method with the test dataset.

This method will return a list containing the loss, followed by the accuracy.

In [None]:
model_result = model.evaluate(X_test, y_test, verbose=0)

In [None]:
print(f"With the test dataset, the model has:\n\n"
      f"An accuracy of:\t{model_result[1]*100:.2f}% on new data\n"
      f"A loss of:\t{model_result[0]:.2f}")

How can we interpret these results?

Well for the accuracy it's simple: the closer it is to 100%, the more accurate our network is on a dataset. On the test dataset, this indicates how well our model performs on data that it has never seen before, so it is a good indicator of how well our model generalizes.

The value of the loss measures how far the network's predictions were from the expected values on the dataset.

The higher it is, the worse the model generalizes to new data.

The lower it is, the better.
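To make the accuracy less of a black box, here is a sketch of how it could be computed by hand with NumPy. The 4 labels and predictions over 3 classes are invented for the example:

```python
import numpy as np

# One-hot encoded expected labels for 4 fake images
y_true_demo = np.array([[1, 0, 0],
                        [0, 1, 0],
                        [0, 0, 1],
                        [0, 1, 0]])

# Made-up softmax outputs for the same 4 images
y_prob_demo = np.array([[0.8, 0.1, 0.1],
                        [0.2, 0.7, 0.1],
                        [0.3, 0.4, 0.3],
                        [0.1, 0.6, 0.3]])

# The predicted class is the index of the highest output
true_labels = y_true_demo.argmax(axis=1)  # [0 1 2 1]
pred_labels = y_prob_demo.argmax(axis=1)  # [0 1 1 1]

# Accuracy is the fraction of images where both agree
manual_accuracy = np.mean(true_labels == pred_labels)
print(f"Accuracy: {manual_accuracy * 100:.0f}%")  # 75%
```

Three of the four fake predictions match their label, hence the 75%.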

Let's try and predict the value of an image from the test dataset!

In [None]:
# Get predicted labels for every image in test dataset
# Only need to run once!
y_pred = model.predict(X_test)

In [None]:
# idx - Select a random image from test dataset
idx = np.random.randint(X_test.shape[0])

# Select the index from the one hot encoded vector
# that corresponds to the number. Since we're
# dealing with MNIST, the index corresponds to 
# the number we're trying to predict.
# This is NOT the case for other datasets, such as
# cifar10!
y_test_val = tf.argmax(y_test[idx])
y_pred_val = tf.argmax(y_pred[idx])

print(f"Expected value for image {idx}:\t{y_test_val}\n"
      f"Predicted value for image {idx}:\t{y_pred_val}")

# Display randomly selected image
_ = plt.imshow(X_test[idx])

## Your turn!

We did it! We trained a CNN model to classify images from the MNIST handwritten digits dataset! Now it's your turn!
The dataset that you'll be working on is the cifar10 dataset.

This dataset contains images of airplanes, automobiles, birds, cats, deer, dogs, frogs, horses, ships and trucks.

We loaded the X and y train and test datasets for you, the rest is up to you!

If you have any questions, feel free to ask!

Good luck!


## Bonus challenge

If you ***really*** want to challenge yourself, try and make a network that scores more than **75%** on the test dataset!

Experiment with different layers, parameters, activation functions, and optimizers! Try training your model with a different batch size, or more epochs!

See what works best! You can even use data augmentation to enrich your dataset, adding more variability to your training data!
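One of the simplest forms of data augmentation is the horizontal flip. Here is a minimal NumPy sketch on random stand-in images with the same shape as cifar10 images (the batch of 4 images and the `_demo` names are assumptions made for the example):

```python
import numpy as np

rng = np.random.default_rng(0)

# 4 fake RGB images of 32x32 pixels, and one fake label per image
images_demo = rng.integers(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)
labels_demo = np.array([0, 3, 5, 9])

# Flip every image horizontally by reversing the width axis
flipped_demo = images_demo[:, :, ::-1, :]

# A flipped image keeps its label, so the labels are simply repeated
augmented_images = np.concatenate([images_demo, flipped_demo], axis=0)
augmented_labels = np.concatenate([labels_demo, labels_demo], axis=0)

print(augmented_images.shape)  # (8, 32, 32, 3)
print(augmented_labels.shape)  # (8,)
```

A flipped truck is still a truck, which makes this transformation safe for cifar10. It would not be safe for every dataset though: a mirrored handwritten digit is usually not a valid digit anymore.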


Encoded value for each class for reference:

```
0. airplane
1. automobile
2. bird
3. cat
4. deer
5. dog
6. frog
7. horse
8. ship
9. truck
```

In [None]:

# Load the data
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.cifar10.load_data()

# Change shape from (50000, 1) -> (50000,)
y_train = y_train.flatten()
y_test  = y_test.flatten()

In [None]:
# Class names for easier understanding of the label
# Each index corresponds to the class
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

In [None]:
# FIXME - Check the dimensions of your data
print(f"X_train dimensions: {X_train.shape}")
print(f"y_train dimensions: {y_train.shape}")

In [None]:
# FIXME - Use matplotlib to visualize the images
idx = np.random.randint(X_train.shape[0])
label = class_names[y_train[idx]]

print(f"Image is: {label}")

_ = plt.imshow(X_train[idx])

In [None]:
# FIXME - Preprocess the train data
X_train = normalize(X_train)
y_train = tf.one_hot(y_train, depth=10)

In [None]:
# FIXME - Preprocess the test dataset
X_test = normalize(X_test)
y_test = tf.one_hot(y_test, depth=10)

In [None]:
# FIXME - Create your model's structure with models.Sequential()
model = models.Sequential([  
    layers.Conv2D(64, 5, padding='same', activation='relu'),
    layers.Conv2D(64, 5, padding='same', activation='relu'),
    layers.MaxPooling2D(),

    layers.Conv2D(32, 5, padding='same', activation='relu'),
    layers.Conv2D(32, 5, padding='same', activation='relu'),
    layers.MaxPooling2D(),
  
    layers.Flatten(),
    layers.Dense(128,  activation='relu'),
    layers.Dense(10,  activation='softmax'),
])

In [None]:
# FIXME - Compile your model, specify the optimizer, loss and metrics
model.compile(optimizer="Adam",
              loss='categorical_crossentropy',
              metrics=['accuracy'])

In [None]:
# FIXME - Run the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=25, batch_size=128)

In [None]:
# FIXME - Evaluate your model on the test dataset
model_result = model.evaluate(X_test, y_test)

print(f"\nWith the test dataset, the model has:\n\n"
      f"An accuracy of:\t{model_result[1]*100:.2f}% on new data\n"
      f"A loss of:\t{model_result[0]:.2f}")