<center><img src="https://raw.githubusercontent.com/bazylip/gradient-live-session/main/lab3/img/logo.png" width=100 height=100></center>

<center><h1><b>Introduction to CNNs</b></h1></center>
<center><h2><b>Gradient PG, 2021</b></h2></center>
<center><h4><b>Marcin Walkowski</b></h4></center>

---
<img src="https://raw.githubusercontent.com/bazylip/gradient-live-session/main/lab4/img/colab.png" width=30%>

<a href="https://colab.research.google.com/github/bazylip/gradient-live-session/blob/main/lab4/introduction_to_cnns.ipynb">Run in Google Colab</a>


# Fashion MNIST Classifier

In this notebook, we return to the problem of image classification. This time, however, we will use the popular Fashion MNIST dataset, which consists of photos of clothing items. We will try an approach based on an architecture typical for computer vision problems: using the Keras library, we will build a simple classifier based on convolutional neural networks. We will also draw on the knowledge gained from the <a href="https://colab.research.google.com/github/bazylip/gradient-live-session/blob/main/lab3/introduction_to_deep_learning.ipynb">previous hands-on</a>.

Remember to complete all sections marked with #TODO!

# 1. Prepare the environment

Before you get to work, open the ```Runtime``` → ```Change runtime type``` menu and select ```GPU```. Hardware acceleration will speed up code execution. This is particularly important in computer vision problems, where the data are images and deep architectures are used.

If you want to start the whole notebook from scratch, press ```Ctrl+F9```. Pressing ```Ctrl+Enter``` will only run the cell in which the cursor is currently located.

Let's start by importing the packages we need. Like last time, we will use Keras to build our classifier model. We will use NumPy for mathematical operations and the PyPlot package for visualization.

In [None]:
%tensorflow_version 2.x
import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt
import numpy as np

# 2. Explore the dataset

The Fashion MNIST dataset comes with the Keras library. The dataset consists of pictures of clothing items and labels in the form of numbers representing specific categories of clothing. Keras provides a dataset already split into training and test subsets.

**Reminder: The training set is used only to train our model. The test set is intended for the evaluation of the obtained model. It cannot be used for training!**

In [None]:
(train_images, train_labels), (test_images, test_labels) = keras.datasets.fashion_mnist.load_data()

In [None]:
class_names = [
    'T-shirt/top',
    'Trouser',
    'Pullover',
    'Dress',
    'Coat',
    'Sandal',
    'Shirt',
    'Sneaker',
    'Bag',
    'Ankle boot'
]

Let's begin with some data exploration.

In [None]:
print(f"Train dataset shape: {train_images.shape}")
print(f"Test dataset shape: {test_images.shape}")
print(f"Minimal value: {np.min(train_images)}")
print(f"Maximal value: {np.max(train_images)}")
print("-"*20)
print(f"Train labels: {train_labels}")
print(f"Test labels: {test_labels}")
print(f"Train examples: {len(train_labels)}")
print(f"Test examples: {len(test_labels)}")

In [None]:
plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i], cmap=plt.cm.binary)
    plt.xlabel(class_names[train_labels[i]])
plt.show()

As you can see, the data structure is similar to the original MNIST dataset. The images are represented as 28x28 monochrome bitmaps, and the labels are numbers between 0 and 9.


Like last time, we normalize the pixel values to the range from 0.0 to 1.0 and store them as float32.
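
As a hedged illustration of the idea, here is a minimal sketch on a made-up array rather than on the dataset itself. It assumes 8-bit pixel values (0 to 255), so dividing by 255 maps them to the range 0.0 to 1.0. The names `toy_pixels` and `toy_scaled` are hypothetical.

```python
import numpy as np

# Made-up 2x2 "image" with 8-bit pixel values (hypothetical example data).
toy_pixels = np.array([[0, 51], [102, 255]], dtype=np.uint8)

# Cast to float32 first, then divide by the maximal 8-bit value.
toy_scaled = toy_pixels.astype("float32") / 255.0

print(toy_scaled.dtype, toy_scaled.min(), toy_scaled.max())
```

Casting before dividing keeps the result in float32 instead of NumPy's default float64.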

In [None]:
train_images = #TODO normalize train images
test_images = #TODO normalize test images

print(f"New minimal value: {np.min(train_images)}")
print(f"New maximal value: {np.max(train_images)}")

In [None]:
plt.figure()
plt.imshow(train_images[0], cmap=plt.cm.binary)
plt.colorbar()
plt.grid(False)
plt.title(class_names[train_labels[0]])
plt.show()

To pass the images as input to the convolutional layer, we need to extend them with an additional dimension representing depth (the number of channels). This is because the implementation of convolutional layers in Keras is designed to process more complex data as well, such as colour (RGB) images, where each pixel is encoded with three values. The images from the Fashion MNIST set are monochromatic, so the number of channels is one.


In [None]:
train_images = np.expand_dims(train_images, -1)
test_images = np.expand_dims(test_images, -1)

print(f"New train image shape: {train_images[0].shape}")
print(f"New test image shape: {test_images[0].shape}")

This extra dimension of size one is often called a dummy dimension.
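
To see what happens to the shape of a whole batch, the following small sketch applies the same `np.expand_dims` call to a hypothetical array of zeros (`toy_batch` is an illustrative name, not part of the notebook's data).

```python
import numpy as np

# Hypothetical batch of three 28x28 monochrome images.
toy_batch = np.zeros((3, 28, 28), dtype="float32")

# axis=-1 appends the channel dimension at the end: (N, H, W) -> (N, H, W, 1).
toy_batch_with_channel = np.expand_dims(toy_batch, -1)

print(toy_batch.shape, "->", toy_batch_with_channel.shape)
```

The pixel values are unchanged; only the array shape differs.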

# 3. Create the network
Our model will consist of two parts. The first part is responsible for creating the image representation vector using the convolution operation. The second part is a dense classifier, whose task is to predict the label for the obtained representation.

<img src="https://raw.githubusercontent.com/bazylip/gradient-live-session/main/lab4/img/nn.jpg">

We will use the ```keras.Sequential``` type to build our model.

In [None]:
model = keras.Sequential()

input_image_shape = (28, 28, 1)
model.add(keras.Input(shape=input_image_shape))

We explicitly define the size of the input data by adding the ```keras.Input()``` object to the model.



## 3.1 Convolution layer

The convolution operation applies a filter (kernel) to the input data. The kernel multiplies each patch of the input element-wise by its learned weights and sums the results into a single value. The purpose of the operation is to extract features, such as edges, from the image. By shifting the kernel across the input with some step (stride), we create a new representation of the data, called a feature map. By applying many different filters, we can create multiple feature maps of the same input image.

<img src="https://raw.githubusercontent.com/bazylip/gradient-live-session/main/lab4/img/conv.gif">

<i>Image source: <a href="https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53">A Comprehensive Guide to Convolutional Neural Networks</a></i>
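
The sliding-window computation can be sketched in plain NumPy. This is an illustrative, single-channel version without padding, not how Keras implements the layer. The function `conv2d_valid`, the toy image and the hand-picked kernel are all assumptions made for this example; in a real network the kernel weights are learned. As in most deep learning libraries, the kernel is applied without flipping.

```python
import numpy as np

def conv2d_valid(image, kernel, stride=1):
    """Naive single-channel 2D convolution without padding (illustrative only)."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    output = np.zeros((out_h, out_w), dtype=np.float32)
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            patch = image[top:top + kh, left:left + kw]
            # Multiply the patch by the kernel weights element-wise, then sum.
            output[i, j] = np.sum(patch * kernel)
    return output

# Made-up 5x5 image: dark on the left, bright on the right.
toy_image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=np.float32)
# Hand-picked vertical-edge kernel (hypothetical weights).
edge_kernel = np.array([[-1, 0, 1]] * 3, dtype=np.float32)

feature_map = conv2d_valid(toy_image, edge_kernel)
print(feature_map)
```

The feature map responds strongly where the window covers the dark-to-bright transition and gives zero over the uniform bright region.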

Keras provides an implementation of the convolutional layer; our task is to supply the right parameters. We will use 32 filters of size 3x3. Because the default filter step (stride) is one and no padding is applied by default, a 28x28 input yields 32 feature maps of size 26x26 each (28 - 3 + 1 = 26). We will use the ReLU activation function, one of the most widely used activation functions in convolutional networks.
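
The 26x26 size follows from a simple formula for one spatial dimension. The helper below is a hypothetical function written for this illustration, assuming no padding unless stated otherwise.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

print(conv_output_size(28, 3))  # 26
```

The same formula can be reused to predict the shapes after any further convolutional layers you add.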

In [None]:
conv_layer_1 = keras.layers.Conv2D(32, kernel_size=(3, 3), activation="relu")

model.add(conv_layer_1)

## 3.2 Pooling layer

The purpose of the pooling layer is to reduce the spatial size of data. We want to reduce the computation time and extract only the dominant image features. Similarly to the convolution operation, a filter/kernel is shifted over the data. For Max-Pooling, the highest activation is selected from the kernel area. For Average-Pooling, we use the average activation value.

<img src="https://raw.githubusercontent.com/bazylip/gradient-live-session/main/lab4/img/pool.gif">

<i>Image source: <a href="https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53">A Comprehensive Guide to Convolutional Neural Networks</a></i>
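
Both pooling variants can be sketched in a few lines of NumPy. This is an illustrative single-channel version with non-overlapping windows, where the step equals the window size. The function `pool2d` and the toy feature map are assumptions made for this example.

```python
import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    """Non-overlapping pooling; the step equals the window size (illustrative only)."""
    h, w = feature_map.shape
    # Drop trailing rows/columns that do not fill a whole window.
    trimmed = feature_map[:h - h % size, :w - w % size]
    # Split into (row block, row in window, column block, column in window).
    windows = trimmed.reshape(h // size, size, w // size, size)
    if mode == "max":
        return windows.max(axis=(1, 3))
    return windows.mean(axis=(1, 3))

toy_map = np.array([[1, 3, 2, 0],
                    [4, 2, 1, 1],
                    [0, 0, 5, 6],
                    [1, 2, 7, 8]], dtype=np.float32)

max_pooled = pool2d(toy_map, mode="max")      # largest value in each 2x2 window
avg_pooled = pool2d(toy_map, mode="average")  # mean value of each 2x2 window
print(max_pooled)
print(avg_pooled)
```

In both cases a 4x4 map shrinks to 2x2, which is the halving of width and height described above.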

We will use the Max-Pooling operation available in Keras. We set the kernel size to 2x2. The default step is equal to the kernel width/height. The feature map size will be reduced from 26x26 to 13x13. Remembering that we are operating on 32 feature maps, the shape of the output data will be 13x13x32.

In [None]:
pool_layer_1 = keras.layers.MaxPooling2D(pool_size=(2, 2))

model.add(pool_layer_1)

Now let's add an additional convolution and pooling layer to our model.

In [None]:
#TODO Add Conv2D layer (I suggest 3x3 kernel and 64 filters, but feel free to experiment)
#TODO Add MaxPooling2D layer (I suggest 2x2 kernel)

# model.add(conv_layer_2)
# model.add(pool_layer_2)

## 3.3 Dense classifier

After the convolutional operations, we have obtained a new high-level representation of the data. We will use a fully connected network to classify it. To pass the processed data to the dense layer, we first have to transform it into a vector. The Flatten operation available in Keras "flattens" the feature maps of each image into one dimension.
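
The effect of flattening can be sketched with a NumPy reshape. The 5x5x64 shape below is an assumption: it is what the suggested second convolution (3x3 kernel, 64 filters) and 2x2 pooling would produce from 13x13 feature maps (13 - 3 + 1 = 11, then 11 // 2 = 5). With other layer parameters the numbers will differ.

```python
import numpy as np

# Hypothetical batch of two feature-map stacks of shape 5x5x64.
toy_features = np.zeros((2, 5, 5, 64), dtype="float32")

# Keep the batch dimension and merge all remaining dimensions into one.
toy_flat = toy_features.reshape(toy_features.shape[0], -1)

print(toy_flat.shape)  # (2, 1600)
```

Each image is now described by a single vector of 5 * 5 * 64 = 1600 values.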



In [None]:
model.add(keras.layers.Flatten())


The Dense layer should consist of 10 nodes and use the Softmax activation function so that the output on the nth node corresponds to the probability of the nth class.
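
To make the role of Softmax concrete, here is a minimal NumPy sketch of the function applied to made-up raw scores (`toy_logits` is a hypothetical name). It only illustrates the maths and is not a replacement for the Keras activation.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)

toy_logits = np.array([2.0, 1.0, 0.1])
probs = softmax(toy_logits)
print(probs, probs.sum())
```

The outputs are positive and sum to one, so they can be read as class probabilities, with the largest raw score receiving the largest probability.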

In [None]:
num_classes = 10
#TODO Add Dense layer
# model.add(dense_layer)

Our simple classifier model is now ready! As you may have noticed, even building more complex architectures with ML libraries can be fast. For a better understanding of CNN architectures, I recommend reading through the article <a href="https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53">"A Comprehensive Guide to Convolutional Neural Networks"</a>.

In [None]:
model.summary()

# 4. Train the model

Before training, we configure the model with the ```model.compile()``` method, which sets the loss function, the optimizer, and the metrics to track. As we are dealing with a classification problem, we will use the Categorical Crossentropy loss function. Because our labels are encoded as integers rather than one-hot vectors, we will use the ```keras.losses.sparse_categorical_crossentropy``` implementation.
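
The difference between the sparse and the regular variant lies only in the label format. The NumPy sketch below uses made-up probabilities and a hypothetical helper `sparse_ce_sketch` to show that picking the probability of the true class by its integer index gives the same loss as the one-hot formulation. It illustrates the formula and is not the Keras implementation.

```python
import numpy as np

def sparse_ce_sketch(labels, probs):
    """Mean negative log-probability of the true class (illustrative only)."""
    true_class_probs = probs[np.arange(len(labels)), labels]
    return -np.mean(np.log(true_class_probs))

# Made-up predicted probabilities for two examples and three classes.
toy_probs = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.8, 0.1]])
toy_labels = np.array([0, 1])  # integer labels, no one-hot encoding needed

# The same loss computed from one-hot encoded labels.
one_hot = np.eye(3)[toy_labels]
dense_loss = -np.mean(np.sum(one_hot * np.log(toy_probs), axis=1))

sparse_loss = sparse_ce_sketch(toy_labels, toy_probs)
print(sparse_loss, dense_loss)
```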

In [None]:
loss_fn = #TODO Create loss function
optimizer = #TODO Pick optimizer (Adam is a common choice)

model.compile(loss=loss_fn, optimizer=optimizer, metrics=["accuracy"])

It's time to train our model! Adjust the number of epochs and the mini-batch size (15 epochs and a batch size of 128 are a good starting point).
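
To get a feeling for what these two numbers mean, the sketch below counts the weight updates they imply. It assumes the 60,000 training examples of Fashion MNIST and that the last, smaller batch of each epoch is also used. The variable names are hypothetical and deliberately differ from the ones in the exercise cell.

```python
import math

num_train_examples = 60000   # size of the Fashion MNIST training set
example_batch_size = 128     # suggested starting point
example_epochs = 15          # suggested starting point

# One weight update per mini-batch; the final batch may be smaller.
steps_per_epoch = math.ceil(num_train_examples / example_batch_size)
total_updates = steps_per_epoch * example_epochs

print(steps_per_epoch, total_updates)
```

A smaller batch size means more, noisier updates per epoch, while a larger one means fewer updates that each use more data.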

In [None]:
batch_size = #TODO 
epochs = #TODO
model.fit(train_images, train_labels, batch_size=batch_size, epochs=epochs)

Finally, let's use the ```model.evaluate()``` method to see how our network deals with the test set.
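
The accuracy metric we requested in ```model.compile()``` is the fraction of examples for which the most probable class matches the true label. Here is a minimal NumPy sketch with made-up predictions (all names and values are illustrative).

```python
import numpy as np

# Made-up softmax outputs for four examples and three classes.
toy_predictions = np.array([[0.8, 0.1, 0.1],
                            [0.2, 0.5, 0.3],
                            [0.3, 0.3, 0.4],
                            [0.6, 0.3, 0.1]])
toy_true_labels = np.array([0, 1, 2, 1])

# The predicted class is the index of the highest probability.
predicted = np.argmax(toy_predictions, axis=1)
accuracy = np.mean(predicted == toy_true_labels)

print(f"Accuracy: {accuracy * 100:.2f}%")
```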

In [None]:
score = #TODO Evaluate model on the test set
print(f"Test loss: {score[0]}")
print(f"Test accuracy: {score[1]*100:.2f}%")

If you have completed all the #TODO sections correctly, you should get an accuracy of around 90% on the test set. It is a good result but leaves room for improvement. Try different network parameter configurations and see if you can improve this score.

# 5. Use the trained model

Use the code below to see how your trained model works. Each time you execute the cell below, the image is sampled from the test set, and the model predicts the class.

In [None]:
# Sample image index
index = np.random.randint(0, len(test_images))
img = np.expand_dims(test_images[index], 0)
# Predict label
predictions_array = model.predict(img).squeeze()
predicted_label = np.argmax(predictions_array)
true_label = test_labels[index]

plt.figure(figsize=(10,5))

# Plot image
plt.subplot(1,2,1)
plt.grid(False)
plt.xticks([])
plt.yticks([])

plt.imshow(img.squeeze(), cmap=plt.cm.binary)
if predicted_label == true_label:
    color = 'blue'
else:
    color = 'red'

plt.xlabel(
    f"{class_names[predicted_label]} " + 
    f"{100*np.max(predictions_array):2.0f}% " +
    f"({class_names[true_label]})",
    color=color
    )

# Plot probabilities
plt.subplot(1,2,2)
plt.grid(False)
plt.xticks(range(10), class_names, rotation=45)
plt.yticks([])
thisplot = plt.bar(range(10), predictions_array, color="#777777")
plt.ylim([0, 1])

thisplot[predicted_label].set_color('red')
thisplot[true_label].set_color('blue')

plt.show()

# 6. Experiment... with more complex data!

Now that you know how to quickly build an image classification model, try your hand at more complex data. The CIFAR10 dataset, available in Keras, contains pictures, each showing one of ten types of objects.

In [None]:
(cifar_train_images, cifar_train_labels), (cifar_test_images, cifar_test_labels) = keras.datasets.cifar10.load_data()

cifar_class_names = [
    'Airplane',
    'Automobile',
    'Bird',
    'Cat',
    'Deer',
    'Dog',
    'Frog',
    'Horse',
    'Ship',
    'Truck'
]

The dataset format is similar to Fashion MNIST, except that the images are 32x32 and in colour (each pixel is encoded with three values).
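
Because the channel dimension is already present, no dummy dimension needs to be added here; only the model's input shape changes. The sketch below uses random stand-in data instead of the real dataset, and the names `toy_cifar` and `cifar_input_shape` are hypothetical.

```python
import numpy as np

# Stand-in for a few CIFAR10 images: 32x32 pixels, three colour channels.
toy_cifar = np.random.randint(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)

# Same normalization as before: float32 values in the range 0.0 to 1.0.
toy_cifar_scaled = toy_cifar.astype("float32") / 255.0

# Shape of a single image, usable as the input shape of the model.
cifar_input_shape = toy_cifar_scaled.shape[1:]

print(cifar_input_shape)  # (32, 32, 3)
```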

In [None]:
plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(cifar_train_images[i])  # colour images need no colormap
    plt.xlabel(cifar_class_names[cifar_train_labels[i][0]])
plt.show()