# Problem Session 11
## MNIST of Fashion I

In this notebook you will work on problems that relate to our neural network content. In particular, this material will touch on the following lecture notebooks:
- `Lectures/Neural Networks/1. Perceptrons`,
- `Lectures/Neural Networks/2. The MNIST Data Set`,
- `Lectures/Neural Networks/3. Multilayer Neural Networks` and
- `Lectures/Neural Networks/4. keras`.

In [None]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
from seaborn import set_style

##### 1. Load the data

In this notebook you will build neural networks to classify images of common fashion items. First, run the code below to load the data set. We will then look at the data in more detail.

In [None]:
## docs: https://keras.io/api/datasets/fashion_mnist/
from keras.datasets import fashion_mnist

In [None]:
## This can take a little bit to run,
## especially if it is your first time running this code
(X_train, y_train), (X_test, y_test) = fashion_mnist.load_data()

label_dict = {0:"T-shirt/top",
                 1:"Trouser",
                 2:"Pullover",
                 3:"Dress",
                 4:"Coat",
                 5:"Sandal",
                 6:"Shirt",
                 7:"Sneaker",
                 8:"Bag",
                 9:"Ankle boot"}

##### 2. Learn about the data set

This data set is an analogue of the MNIST data set, but with grayscale images of common fashion items instead of handwritten digits $0$ through $9$. The ten item classes in this data set are listed in the `label_dict` variable in the code chunk above.

First answer these questions, then run the prewritten code to see a few example images. 

- How many observations are in the training set? 
- How many in the test set? 
- What are the dimensions of the pixel grid for each image?
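As a hedged illustration of how questions like these can be answered, the sketch below inspects the `shape` attribute of a small made-up NumPy array. The names `toy_images`, `n_images` and `grid_shape` are hypothetical and the array is not the Fashion MNIST data; the same attribute can be read from `X_train` and `X_test`.

```python
import numpy as np

# Hypothetical stand-in for an image array: 5 images on a 4 x 3 pixel grid.
# This is NOT the Fashion MNIST data.
toy_images = np.zeros((5, 4, 3), dtype=np.uint8)

# The first axis indexes observations, the remaining axes form the pixel grid.
n_images = toy_images.shape[0]
grid_shape = toy_images.shape[1:]

print(n_images, grid_shape)
```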

In [None]:
## code here



In [None]:
## code here



In [None]:
## code here



In [None]:
## This plots the first 8 training images
fig,ax = plt.subplots(4,2, figsize=(20,40))

for i in range(8):
    ax[i//2, i%2].imshow(X_train[i], cmap='gray')
    ax[i//2, i%2].text(.5,.5,label_dict[y_train[i]], c="white", fontsize=16)
    ax[i//2, i%2].axis("off")
    
plt.subplots_adjust(hspace=.1,wspace=.05)
plt.show()

##### 3. Validation set

Set aside $20\%$ of the training set as a validation set. We will use it to compare neural network performance. Later cells refer to the remaining training portion as `X_tt` and to the validation portion as `X_val`.
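One possible way to carve out a hold-out set is sketched below using only NumPy on toy data. The helper `holdout_split` and the `toy_*` names are hypothetical, and the split is a plain random shuffle rather than a stratified one.

```python
import numpy as np

def holdout_split(X, y, val_frac=0.2, seed=0):
    """Randomly split (X, y) into a training part and a validation part."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(len(X))        # random order of row indices
    n_val = int(round(val_frac * len(X)))     # size of the validation part
    val_idx, tt_idx = shuffled[:n_val], shuffled[n_val:]
    return X[tt_idx], X[val_idx], y[tt_idx], y[val_idx]

# Toy data: 10 "images" of 2 x 2 pixels with labels 0..9 (made up).
toy_X = np.arange(40).reshape(10, 2, 2)
toy_y = np.arange(10)

toy_X_tt, toy_X_val, toy_y_tt, toy_y_val = holdout_split(toy_X, toy_y)
print(toy_X_tt.shape, toy_X_val.shape)
```

For classification data it is common to keep the class proportions equal in both parts; `sklearn.model_selection.train_test_split` with its `stratify` argument is one way to do that.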

In [None]:
## code here



In [None]:
## code here



In [None]:
## code here



##### 4. Prepare the data

The maximum value of any pixel in these images is `255` and the minimum value is `0`.

Using this information, scale the data so that the maximum value maps to `1` and the minimum value maps to `0`.

Then reshape each array so that it is two-dimensional, i.e. each row represents a single image and each column represents a single pixel.
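The two steps can be sketched on a tiny made-up array as follows. The names `toy_images`, `toy_scaled` and `toy_flat` are hypothetical; the same operations would apply to the training, validation and test arrays.

```python
import numpy as np

# Two made-up 2 x 2 "images" with pixel values between 0 and 255.
toy_images = np.array([[[0, 255], [51, 102]],
                       [[255, 0], [204, 153]]], dtype=np.uint8)

# Dividing by 255 maps 0 -> 0 and 255 -> 1 (the result is a float array).
toy_scaled = toy_images / 255.0

# Flatten each image into one row; -1 lets NumPy infer the number of pixels.
toy_flat = toy_scaled.reshape(toy_scaled.shape[0], -1)

print(toy_flat.shape)
```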

In [None]:
## Scale here



## Reshape here



##### 5. Your first neural network

We will start by building a feed forward network with a single hidden layer using `keras`. 

Fill in the missing code below to build and fit this network.
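To make the architecture concrete, here is a minimal NumPy sketch of the forward pass that a network with one `relu` hidden layer and a `softmax` output layer computes. It is not the `keras` solution to this problem: the layer sizes, the random weights and the names `forward`, `W1`, `b1`, `W2`, `b2` are all hypothetical, and no training takes place.

```python
import numpy as np

def relu(z):
    """Elementwise max(z, 0)."""
    return np.maximum(z, 0.0)

def softmax(z):
    """Row-wise softmax; subtracting the row max avoids overflow."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(X, W1, b1, W2, b2):
    """One hidden layer (relu) followed by an output layer (softmax)."""
    hidden = relu(X @ W1 + b1)
    return softmax(hidden @ W2 + b2)

rng = np.random.default_rng(0)
n_features, n_hidden, n_classes = 6, 4, 3   # hypothetical sizes
W1 = rng.normal(scale=0.1, size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, n_classes))
b2 = np.zeros(n_classes)

toy_X = rng.random((5, n_features))          # 5 made-up observations
probs = forward(toy_X, W1, b1, W2, b2)
print(probs.shape)
```

Each row of `probs` is a probability distribution over the classes. Fitting a network amounts to adjusting the weights and biases so that these probabilities match the labels.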

In [None]:
## Import the following
from keras import models
from keras import layers
from keras import optimizers
from keras import losses
from keras import metrics
from keras.utils import to_categorical


### If you have an earlier version of keras and the import above fails ###
# from keras.utils.np_utils import to_categorical

In [None]:
## make an empty model here with models.Sequential
## only run this once
model1 =  

In [None]:
### ONLY Run this once, when your code is entered ###
## Add the first Dense layer, 
## give it 100 nodes
## the 'relu' activation function and
## don't forget to set the input_shape
model1.add(  )


## Add the output layer
## This is a Dense layer
## it should have 10 nodes, because our data has 10 classes
## and its activation function should be the 'softmax'
model1.add(  )

In [None]:
## use model1.summary() to look at the architecture of your model


In [None]:
#### ONLY RUN ONCE ####
## Compile the model here
## Use the `rmsprop` optimizer,
## the 'categorical_crossentropy' loss function
## and return 'accuracy' as a metric
model1.compile(optimizer = 
                 loss = 
                 metrics = )

In [None]:
## fit the model and store the history in a variable
## train for 100 epochs,
## use a batch size of 512
## remember to apply to_categorical to y
## and include the validation_data
n_epochs = 

history1 = 
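
The fit cell above asks for `to_categorical` to be applied to the labels. As a rough illustration of what that kind of one-hot encoding produces, the sketch below builds the same type of binary matrix with plain NumPy; `toy_labels`, `n_classes` and `one_hot` are hypothetical names and the labels are made up.

```python
import numpy as np

# Made-up integer class labels for 5 observations and 4 classes.
toy_labels = np.array([2, 0, 3, 1, 2])
n_classes = 4

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing with the labels yields one row per observation.
one_hot = np.eye(n_classes)[toy_labels]

print(one_hot)
```

The `categorical_crossentropy` loss compares these one-hot rows with the `softmax` output, which is why the output layer has one node per class.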

##### 6. Examine the accuracy

Plot the accuracy of the model on both the validation and training sets. Does it look like we chose enough epochs, or should we have used more than $100$? Does it look like the model has started to overfit the training data?
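Besides reading the plot by eye, the same questions can be checked numerically. The sketch below uses a made-up dictionary, `toy_history`, shaped like a history dictionary with `'accuracy'` and `'val_accuracy'` entries; the numbers are invented for illustration and are not results from any model, and the key names are an assumption that may differ between `keras` versions.

```python
import numpy as np

# Made-up numbers for illustration only; NOT results from any model.
toy_history = {
    "accuracy":     [0.70, 0.80, 0.86, 0.90, 0.93, 0.95],
    "val_accuracy": [0.69, 0.78, 0.83, 0.85, 0.84, 0.83],
}

train_acc = np.array(toy_history["accuracy"])
val_acc = np.array(toy_history["val_accuracy"])

# Epochs are numbered from 1, while array indices start at 0.
best_epoch = int(np.argmax(val_acc)) + 1

# A gap that keeps widening while validation accuracy stalls
# is a typical sign of overfitting.
gap = train_acc - val_acc

print(best_epoch)
print(gap.round(2))
```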

In [None]:
## store the history dictionary in history_dict1
history_dict1 = history1.history

In [None]:
set_style("whitegrid")

plt.figure(figsize=(14,8))

plt.scatter(range(1, n_epochs+1), 
            , 
            label="Training Data")
plt.scatter(range(1, n_epochs+1), 
            ,
            marker='v',
            label="Validation Data")

plt.xlabel("Epoch", fontsize=18)
plt.ylabel("Accuracy", fontsize=18)

plt.xticks(fontsize=14)
plt.yticks(fontsize=14)

plt.legend(fontsize=14)


plt.show()

##### 7. A network with two hidden layers

Now make a network with two hidden layers for this problem. You choose the architecture (the size of each of the two hidden layers). Compare the accuracies on the validation set for the first model and this model. Which one seems to perform better? Choose one of these two models.
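When comparing architectures it can help to know how many trainable parameters each one has. The sketch below counts weights and biases for a stack of fully connected layers; the helper `count_dense_params` and the layer sizes are hypothetical and are not a recommendation for this problem.

```python
def count_dense_params(layer_sizes):
    """Count weights and biases in a stack of fully connected layers.

    layer_sizes lists the number of inputs followed by the number of
    nodes in each subsequent layer, e.g. [64, 32, 10].
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out   # weight matrix plus bias vector
    return total

# Hypothetical sizes: 64 inputs, 10 output classes.
one_hidden = count_dense_params([64, 32, 10])
two_hidden = count_dense_params([64, 32, 16, 10])

print(one_hidden, two_hidden)
```

For a `keras` model made only of `Dense` layers, the parameter count reported by the model summary should follow the same arithmetic.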

In [None]:
## make a sequential model
model2 = 


## Add the layers

## layer 1


## layer 2



## out layer



## compile the network





## fit the network
history2 = 




history_dict2 = history2.history

In [None]:
plt.figure(figsize=(14,8))

plt.scatter(range(1,n_epochs+1), 
            , 
            label="Network 1")
plt.scatter(range(1,n_epochs+1), 
            , 
            marker='v', 
            label="Network 2")

plt.xlabel("Epoch", fontsize=18)
plt.ylabel("Accuracy", fontsize=18)

plt.xticks(fontsize=14)
plt.yticks(fontsize=14)

plt.legend(fontsize=14)


plt.show()

##### 8. PCA preprocessing

It is possible that preprocessing the data by running it through PCA first could help.

Try a neural network fit to PCA transformed data. Note that because we scaled the pixels earlier, we do not need to run the data through `StandardScaler` prior to PCA. Make sure you remember to transform the validation set as well.
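The fit-on-training, transform-both pattern can be sketched with plain NumPy as follows. This is a simplified stand-in for a PCA class, not its implementation: `fit_pca`, `var_kept` and the `toy_*` arrays are hypothetical, and the toy data are generated so that two directions carry almost all of the variance.

```python
import numpy as np

def fit_pca(X, var_kept=0.99):
    """Return the training mean and the leading principal directions
    needed to explain at least var_kept of the variance in X."""
    mean = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    var_ratio = s**2 / np.sum(s**2)
    # First position where the cumulative ratio reaches var_kept.
    n_keep = int(np.searchsorted(np.cumsum(var_ratio), var_kept)) + 1
    n_keep = min(n_keep, len(s))
    return mean, Vt[:n_keep]

# Made-up data: two latent directions spread over 10 features, plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = np.array([[1.0] * 5 + [0.0] * 5,
                   [0.0] * 5 + [1.0] * 5])
toy_X = latent @ mixing + 0.01 * rng.normal(size=(200, 10))
toy_X_tt, toy_X_val = toy_X[:160], toy_X[160:]

# Fit on the training part only, then apply the SAME mean and
# directions to both the training and the validation part.
mean, components = fit_pca(toy_X_tt)
toy_X_tt_pca = (toy_X_tt - mean) @ components.T
toy_X_val_pca = (toy_X_val - mean) @ components.T

print(toy_X_tt_pca.shape, toy_X_val_pca.shape)
```

Fitting on the training part alone keeps information from the validation set out of the preprocessing step. Note that the network fit to PCA data needs an input size equal to the number of retained components.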

In [None]:
## import PCA

In [None]:
## make a pca object with n_components = .99



## fit the pca on X_tt



## transform X_tt



## transform X_val




In [None]:
## Build the network on the pca data here
## make a sequential model
model3 =  


## Add the layers

## Hidden layers here


## output layer here



## compile the network



## fit the network
history3 = 

history_dict3 = history3.history

In [None]:
## Compare the accuracy on the pca and non-pca networks
plt.figure(figsize=(14,8))

plt.scatter(range(1,n_epochs+1), 
            history_dict2['val_accuracy'], 
            label="Non-PCA network")
plt.scatter(range(1,n_epochs+1), 
            history_dict3['val_accuracy'], 
            marker='v', 
            label="PCA network")

plt.xlabel("Epoch", fontsize=18)
plt.ylabel("Accuracy", fontsize=18)

plt.xticks(fontsize=14)
plt.yticks(fontsize=14)

plt.legend(fontsize=14)


plt.show()

Which would you choose?

##### 9. Play around

Feel free to play around and build more networks here. Can you build a network that improves upon the best one you have built so far?

In [None]:
## code here




In [None]:
## code here




In [None]:
## code here




##### 10. Saving a trained model

When you have a model that you are happy with, make a fresh version of the model and train it using the optimal number of epochs. Then run the code below to save the trained model on your computer. We will see how to load this model in a later lecture notebook.

In [None]:
## Create and train your final model here
final_model =  models.Sequential()




In [None]:
## This code will save the final model
## (recent versions of keras require the file name to end in .keras or .h5)
final_model.save("PUT_IN_YOUR_MODEL_NAME_HERE")

--------------------------

This notebook was written for the Erdős Institute Cőde Data Science Boot Camp by Matthew Osborne, Ph.D., 2022.

Any potential redistributors must seek and receive permission from Matthew Tyler Osborne, Ph.D. prior to redistribution. Redistribution of the material contained in this repository is conditional on acknowledgement of Matthew Tyler Osborne, Ph.D.'s original authorship and sponsorship of the Erdős Institute as subject to the license (see License.md).