# Fashion MNIST

🎯 <b><u>Goals:</u></b>
- Work on the [**`Fashion MNIST`**](https://github.com/zalandoresearch/fashion-mnist) dataset created by [Zalando Research](https://research.zalando.com/)
- Revisit all the main concepts of Convolutional Neural Networks

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

❓ **Question: Load the fashion MNIST dataset** ❓

In [None]:
# YOUR CODE HERE

## (1) Image preprocessing

❓ **Questions: Image preprocessing** ❓

1. Look at the shape of your images. Are they ready to be fed to a CNN? If not, perform the necessary operation(s).
2. What kind of task are you trying to achieve? Encode what needs to be encoded.
3. Last but not least, do you need to scale your dataset?

In [None]:
# YOUR CODE HERE
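
One possible way to approach the three preprocessing questions is sketched below. It assumes the raw images are a `uint8` array of shape `(n, 28, 28)` with pixel values in `[0, 255]` and that the targets are integer class indices from 0 to 9. The helper name `preprocess` and the toy arrays are illustrative only, not part of the challenge.

```python
import numpy as np


def preprocess(X, y, n_classes=10):
    """Hypothetical helper: add a channel axis, scale pixels, one-hot encode targets."""
    # A Conv2D layer expects (height, width, channels): add the single grayscale channel
    X = np.expand_dims(X, axis=-1).astype("float32")
    # Bring pixel intensities from [0, 255] down to [0, 1]
    X = X / 255.0
    # Multiclass classification: turn integer labels into one-hot vectors
    y_cat = np.eye(n_classes, dtype="float32")[y]
    return X, y_cat


# Toy data standing in for the real images (assumption: same shape and dtype)
rng = np.random.default_rng(0)
X_raw = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)
y_raw = np.array([0, 9, 3, 3, 5])

X_prep, y_prep = preprocess(X_raw, y_raw)
print(X_prep.shape, y_prep.shape)
```

One-hot targets pair with a `categorical_crossentropy` loss; keeping the integer labels and using `sparse_categorical_crossentropy` would be an equally valid choice.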

## (2) Data exploration

👩🏻‍🏫  Great, your preprocessing is done.

👇 By the way, we collected the names of the labels for you directly from [**github/zalandoresearch/fashion-mnist/labels**](https://github.com/zalandoresearch/fashion-mnist#labels)

In [None]:
labels = ["T-shirt/top",
          "Trouser",
          "Pullover",
          "Dress",
          "Coat",
          "Sandal",
          "Shirt",
          "Sneaker",
          "Bag",
          "Ankle boot"]

❓ **Question: visualizing your images** ❓

Before we move on to the CNN, display a few randomly chosen pictures.

In [None]:
# YOUR CODE HERE
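
A minimal sketch of such a visualization follows, assuming `X` holds images of shape `(n, 28, 28)` or `(n, 28, 28, 1)` and `y` holds integer labels (take the `argmax` first if the targets are already one-hot encoded). The function name `show_random_images` and the demo arrays are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt


def show_random_images(X, y, label_names, n=8, seed=None):
    """Hypothetical helper: draw n distinct random images with their label as title."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=n, replace=False)
    fig, axes = plt.subplots(1, n, figsize=(2 * n, 2.5))
    axes = np.atleast_1d(axes)  # keeps the loop valid when n == 1
    for ax, i in zip(axes, idx):
        # squeeze drops the channel axis, if any, so imshow receives a 2D array
        ax.imshow(np.squeeze(X[i]), cmap="gray")
        ax.set_title(label_names[int(y[i])])
        ax.axis("off")
    return fig


# Toy data standing in for the real dataset
rng = np.random.default_rng(0)
X_demo = rng.random((20, 28, 28, 1))
y_demo = rng.integers(0, 10, size=20)
demo_names = [f"class {k}" for k in range(10)]

fig = show_random_images(X_demo, y_demo, demo_names, n=4, seed=1)
```

With the real data, the `labels` list defined above can be passed as `label_names`.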

## (3) Working on a small dataset

🔋 Training a CNN on 60K pictures takes too much time. Let's work on a subset to build and iterate over the CNN architecture; only when we are satisfied will we train the CNN on the full training set.

In [None]:
# YOUR CODE HERE
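
As an illustration, a random 10% subset could be drawn as follows. This is a sketch under the assumption that images and targets are NumPy arrays aligned on their first axis; `take_subset` and the toy arrays are hypothetical names.

```python
import numpy as np


def take_subset(X, y, fraction=0.1, seed=0):
    """Hypothetical helper: keep a random fraction of the samples, without replacement."""
    rng = np.random.default_rng(seed)
    n_keep = int(round(len(X) * fraction))
    idx = rng.choice(len(X), size=n_keep, replace=False)
    # The same indices are applied to X and y so that images and labels stay aligned
    return X[idx], y[idx]


# Toy arrays standing in for the preprocessed training set
X_demo = np.zeros((1000, 28, 28, 1), dtype="float32")
y_demo = np.arange(1000) % 10

X_small, y_small = take_subset(X_demo, y_demo, fraction=0.1)
print(X_small.shape, y_small.shape)
```

Sampling at random rather than slicing the first rows guards against any ordering in the data, although a plain slice would also work if the dataset is already shuffled.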

## (4) Building and Training CNN Model


❓ **Question: Building your own CNN architecture** ❓

Build a CNN with fewer than 0.5M parameters.

In [None]:
from tensorflow.keras import Sequential, layers, models


def my_cnn():
    
    pass  # YOUR CODE HERE
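
For reference, here is one architecture among many that stays well under the parameter budget (roughly 106k parameters). It assumes inputs of shape `(28, 28, 1)` and one-hot encoded targets over 10 classes; the name `sketch_cnn`, the layer sizes, and the optimizer are illustrative choices rather than the expected solution.

```python
from tensorflow.keras import Sequential, layers


def sketch_cnn(input_shape=(28, 28, 1), n_classes=10):
    """Hypothetical small CNN: two conv/pool stages followed by a dense classifier."""
    model = Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(16, (3, 3), padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),   # 28x28 -> 14x14
        layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),   # 14x14 -> 7x7
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),
    ])
    # categorical_crossentropy assumes one-hot targets
    model.compile(loss="categorical_crossentropy",
                  optimizer="adam",
                  metrics=["accuracy"])
    return model


model = sketch_cnn()
model.summary()
```

Most of the parameters sit in the first dense layer, so the pooling stages before `Flatten` are what keep the total small.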

❓ **Question: Train your model** ❓

And don't forget to check the convergence of the loss curves!

In [None]:
# YOUR CODE HERE

<details>
    <summary><i>How many sub-iterations do we have per epoch here ? </i></summary>
    
* We had 60 000 pictures and kept 10% of them in the train set $\implies 6 000$ training pictures
* With a batch size of 32, the whole subset would represent $ \lceil \frac{6 000}{32} \rceil = \lceil 187.5 \rceil = 188 $ mini-batches
* We set the validation split to 0.2, so only $ 6 000 \times 0.8 = 4 800 $ pictures are used for fitting, hence $ \frac{4 800}{32} = 150 $ mini-batches per epoch for the train-train set.
</details>
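
The arithmetic in the hint above can be checked with a few lines of Python. The variable names are illustrative; the figures (60 000 pictures, a 10% subset, a validation split of 0.2, a batch size of 32) come from the hint itself.

```python
import math

n_total = 60_000
subset_fraction = 0.1
validation_split = 0.2
batch_size = 32

n_subset = round(n_total * subset_fraction)   # pictures kept in the small train set
n_val = round(n_subset * validation_split)    # pictures held out for validation
n_train = n_subset - n_val                    # pictures actually used for fitting

# The last mini-batch may be incomplete, hence the ceiling
steps_whole_subset = math.ceil(n_subset / batch_size)
steps_per_epoch = math.ceil(n_train / batch_size)
validation_steps = math.ceil(n_val / batch_size)

print(n_subset, n_train, n_val)
print(steps_whole_subset, steps_per_epoch, validation_steps)
```

The second printed line shows 188, 150 and 38: the progress bar of each epoch counts 150 steps, not 188, because the validation pictures are set aside before batching.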

🎁 We coded a function `plot_loss_accuracy` for you. Feel free to use it.

In [None]:
def plot_loss_accuracy(history):
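    # 'seaborn-deep' was renamed 'seaborn-v0_8-deep' in Matplotlib 3.6: use whichever is available
    style = 'seaborn-v0_8-deep' if 'seaborn-v0_8-deep' in plt.style.available else 'seaborn-deep'
    with plt.style.context(style):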
        
        fig, ax = plt.subplots(1,2,figsize=(15,4))
    
        ## Plot Losses and Accuracies
        x_axis = np.arange(len(history.history['loss']))
        
        ax[0].set_title("Loss")
        ax[0].plot(x_axis, history.history['loss'], color = "blue", linestyle = ":", marker = "X", label = "Train Loss")
        ax[0].plot(x_axis, history.history['val_loss'], color = "orange", linestyle = "-", marker = "X", label = "Val Loss")
        
        ax[1].set_title("Accuracy")
        ax[1].plot(x_axis, history.history['accuracy'], color = "blue", linestyle = ":", marker = "X", label = "Train Accuracy")
        ax[1].plot(x_axis, history.history['val_accuracy'], color = "orange", linestyle = "-", marker = "X", label = "Val Accuracy")    
        
        ## Customization
        ax[0].grid(axis = "x", linewidth = 0.5)
        ax[0].grid(axis = "y", linewidth = 0.5)
        ax[0].legend()        
        ax[1].grid(axis = "x", linewidth = 0.5)
        ax[1].grid(axis = "y", linewidth = 0.5)     
        ax[1].legend()
        
        plt.show()

In [None]:
# YOUR CODE HERE

❓ **Question: Evaluate your CNN** ❓

In [None]:
# YOUR CODE HERE

❓ **Question: let's have a look at the confusion matrix** ❓

<b><u>Step 1</u></b>: Compute the probabilities of belonging to one of the ten classes for each picture

In [None]:
# YOUR CODE HERE

<b><u>Step 2</u></b>: Compute the predicted class for each picture

In [None]:
# YOUR CODE HERE

<b><u>Step 3a</u></b>: Finally, you are now able to compute the Confusion Matrix

In [None]:
# YOUR CODE HERE

<b><u>Step 3b</u></b>: For each class, what share of its occurrences was your model able to capture? (i.e. we are focusing on the recall for each class)

In [None]:
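from sklearn.metrics import ConfusionMatrixDisplay

# With scikit-learn's convention, rows of `cm` are the true classes:
# dividing each row by its total puts the recall of each class on the diagonal
cm_normalized_by_true_class = cm / cm.sum(axis = 1, keepdims = True)

disp_normalized_by_true_class = ConfusionMatrixDisplay(confusion_matrix = cm_normalized_by_true_class,
                                                       display_labels = labels)

fig, ax = plt.subplots(figsize=(10,10))
disp_normalized_by_true_class.plot(cmap = "PuRd", ax = ax);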
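
To make steps 1 to 3b concrete, here is a NumPy-only sketch on a toy problem with 3 classes and 4 pictures. The probabilities are made up for illustration, and `confusion_matrix_np` is a hypothetical helper that follows the same convention as scikit-learn (rows are true classes, columns are predicted classes).

```python
import numpy as np


def confusion_matrix_np(y_true, y_pred, n_classes):
    """Hypothetical helper: rows index the true class, columns the predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    # add.at accumulates correctly even when the same (true, pred) pair repeats
    np.add.at(cm, (y_true, y_pred), 1)
    return cm


# Step 1: predicted probabilities (made-up values standing in for model.predict output)
y_proba = np.array([[0.8, 0.1, 0.1],
                    [0.2, 0.7, 0.1],
                    [0.6, 0.3, 0.1],
                    [0.1, 0.2, 0.7]])
y_true = np.array([0, 1, 1, 2])

# Step 2: the predicted class is the most probable one
y_pred = y_proba.argmax(axis=1)

# Step 3a: raw counts
cm = confusion_matrix_np(y_true, y_pred, n_classes=3)

# Step 3b: divide each row by the number of true samples of that class
recall_matrix = cm / cm.sum(axis=1, keepdims=True)
recall_per_class = recall_matrix.diagonal()

print(cm)
print(recall_per_class)
```

Dividing by the column sums instead (`axis=0`) would normalize by predicted class, which puts precision rather than recall on the diagonal.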

❓ **Question: Do you think it would be useful to try VGG16?** ❓

<details>
    <summary>Answer</summary>

* It would be overkill, as we already reached great performance with a homemade CNN
* VGG16 was trained on RGB pictures. With our grayscale pictures, we could use [this trick](https://stackoverflow.com/questions/51995977/how-can-i-use-a-pre-trained-neural-network-with-grayscale-images), which consists in repeating the single grayscale channel to obtain three channels, but again it is overkill
* Moreover, the pictures are too small: $ 28 \times 28 $ is below the minimum input size of $ 32 \times 32 $ required to use VGG16

</details>

<u>If you worked on this challenge locally:</u>


🏁 **Congratulations! You made it and demystified CNN!** 🏁 

💾 Don't forget to `git add/commit/push` your notebook...

🚀 ... and move on to the next challenge !

---

<u>If you worked on this challenge on Google Colab:</u>


🏁 **Congratulations! You made it and demystified CNN!** 🏁 

1. Download this notebook from your `Google Drive` or directly from `Google Colab` 
2. Drag-and-drop it from your `Downloads` folder to your local `[GITHUB_USERNAME]/data-challenges/06-Deep-Learning/03-Convolutional-Neural-Networks/01-mnist-classification`


💾 Don't forget to push 

3. Follow the usual procedure on your terminal in the `06-Deep-Learning/03-Convolutional-Neural-Networks/01-mnist-classification` folder:
      * *git add mnist_classification.ipynb*
      * *git commit -m "I think I am falling in love with CNN"*
      * *git push origin master*

*Hint*: To find where this Colab notebook has been saved, click on `File` $\rightarrow$ `Locate in Drive`.

🚀 ... and move on to the next challenge !