# MNIST exercise - Now with fashion!

For this exercise we will use a more interesting MNIST-style dataset, Fashion-MNIST, which contains images of fashion articles collected from Zalando!
The great thing about this dataset is that it has a more complex set of features (fashion articles are harder to classify than simple digits) without wildly increasing the computational power we need to use it!

For this exercise, though, you are going to build vital parts of the architecture yourself.
Feel free to use code from the previous notebook, but do type the code in manually instead of copy-pasting.
Muscle memory does exist and helps a lot!

In [None]:
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

Now, instead of using that pesky old digits dataset, "mnist", we want to load "fashion_mnist" instead!
This means that our first job is to replace the call to "keras.datasets.mnist.load_data()" with the "fashion_mnist" equivalent.

Secondly, we want to normalize our images.
An ImageDataGenerator can do this for us when it is given a rescale factor, but here we will do it manually!
Since pixels have a max value of 255, we can simply divide every pixel by 255 to scale the values into the [0, 1] range.
Remember this only needs to be done for the features (pixels in this case), which only exist in the x_train and x_test variables!

Feel free to look at the previous MNIST notebook from today.
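
As a minimal sketch of the scaling step, here it is applied to a tiny made-up array. The `fake_image` values are invented for illustration and are not taken from the dataset; only the dtype (uint8, values 0 to 255) mirrors the raw pixel data.

```python
import numpy as np

# A made-up 2x2 "image" using the same dtype as raw pixel data (uint8, 0-255).
fake_image = np.array([[0, 51], [102, 255]], dtype=np.uint8)

# Cast to float first, then divide by the maximum possible pixel value.
scaled = fake_image.astype("float32") / 255

print(scaled.min(), scaled.max())
print(scaled.dtype)
```

Casting before dividing keeps the result in float32, which is the dtype the rest of the notebook works with.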

In [None]:
num_classes = 10
input_shape = (28, 28, 1)

# the data, split between train and test sets - CHANGE TO FASHION MNIST
(x_train, y_train), (x_test, y_test) = keras.datasets.fashion_mnist.load_data()

# Scale images to the [0, 1] range - NORMALIZE HERE!
x_train = x_train.astype("float32") / 255
x_test = x_test.astype("float32") / 255

# Make sure images have shape (28, 28, 1)
x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)
print("x_train shape:", x_train.shape)
print(x_train.shape[0], "train samples")
print(x_test.shape[0], "test samples")


# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
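
To make "binary class matrices" a bit more concrete, here is a NumPy-only sketch of one-hot encoding. It imitates the result of `keras.utils.to_categorical` for small integer labels rather than calling it, and the `labels` values are made up.

```python
import numpy as np

num_classes = 10
labels = np.array([0, 9, 3])  # hypothetical integer class labels

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing with the labels picks one row per sample.
one_hot = np.eye(num_classes, dtype="float32")[labels]

print(one_hot.shape)
print(one_hot[1])

# argmax along the class axis recovers the original integer labels.
print(np.argmax(one_hot, axis=1))
```

Each row contains a single 1 at the position of its class, which is the target format that the "categorical_crossentropy" loss expects.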

## Original Model

Now we want to start developing the architecture for our network!
Let's first try with the exact same network as with our MNIST digits dataset!

Notice that the cell below assigns a brand new network to the variable model every time it runs, so we can redefine and retrain the model without restarting the notebook each time.

In [None]:
model = keras.Sequential(
    [
        keras.Input(shape=input_shape),
        layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ]
)

model.summary()
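
If you want to check the output shapes reported by the summary by hand, the sketch below traces the spatial size through the layers. It assumes the Keras defaults used above: Conv2D with stride 1 and "valid" padding, and MaxPooling2D with a stride equal to its pool size. The helper names `conv_out` and `pool_out` are made up for this example.

```python
def conv_out(size, kernel=3):
    """Spatial size after a stride-1 convolution with "valid" (no) padding."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Spatial size after max pooling with stride equal to the pool size."""
    return size // pool


sizes = [28]                       # input images are 28x28
sizes.append(conv_out(sizes[-1]))  # Conv2D(32, 3x3)
sizes.append(pool_out(sizes[-1]))  # MaxPooling2D(2x2)
sizes.append(conv_out(sizes[-1]))  # Conv2D(64, 3x3)
sizes.append(pool_out(sizes[-1]))  # MaxPooling2D(2x2)

# Flatten turns height * width * channels into one long feature vector.
flat_features = sizes[-1] * sizes[-1] * 64

print(sizes)
print(flat_features)
```

The last value is the number of features that reach the final dense layer.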

Notice here that we have made the validation split 20% instead of 10%.

In [None]:
batch_size = 128
epochs = 15

model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.2)
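
A quick back-of-the-envelope sketch of what the 20% split means in numbers, assuming the standard 60,000 training images of Fashion-MNIST. Keras takes the validation samples from the end of the arrays passed to `fit`, before any shuffling. The variable names below are chosen for this example only.

```python
import math

n_train = 60000          # size of the Fashion-MNIST training set
validation_split = 0.2
batch_size = 128

n_val = round(n_train * validation_split)  # samples held out for validation
n_fit = n_train - n_val                    # samples actually used for training

# Number of weight updates (batches) per epoch; the last batch may be smaller.
steps_per_epoch = math.ceil(n_fit / batch_size)

print(n_fit, n_val, steps_per_epoch)
```

The step count is what the progress bar counts up to during each epoch.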

In [None]:
score = model.evaluate(x_test, y_test, verbose=0)
print("Test loss:", score[0])
print("Test accuracy:", score[1])

We reached 90% accuracy and 0.27 loss. Not bad.
Let's see if we can improve this.

## Reduced Conv Model

For the next model, we want to remove the second convolutional layer and its max-pooling layer.
Thus the model should only consist of:

- an input layer
- a Conv2D layer with 32 filters, a kernel size of (3, 3) and a relu activation
- a MaxPooling2D layer with a pool size of (2, 2)
- a flattening layer
- a dropout layer with a rate of 0.5
- an output dense layer with the softmax activation

Feel free to look at the first model we have defined!

In [None]:
model = keras.Sequential(
    [
        keras.Input(shape=input_shape),
        layers.,
        layers.,
        layers.,
        layers.,
        layers.Dense(num_classes, activation="softmax"),
    ]
)

model.summary()

In [None]:
batch_size = 128
epochs = 15

model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.2)

In [None]:
score = model.evaluate(x_test, y_test, verbose=0)
print("Test loss:", score[0])
print("Test accuracy:", score[1])

Alright, we reached an accuracy of 89% and a loss of 0.29, which is slightly worse than the original model.

## Increased Dense Model

Hmm, let's try to increase the number of dense layers instead.

Thus the model should consist of:

- an input layer
- a Conv2D layer with 32 filters, a kernel size of (3, 3) and a relu activation
- a MaxPooling2D layer with a pool size of (2, 2)
- a Conv2D layer with 64 filters, a kernel size of (3, 3) and a relu activation
- a MaxPooling2D layer with a pool size of (2, 2)
- a flattening layer
- a dense layer with 64 nodes and a relu activation
- a dropout layer with a rate of 0.5
- an output dense layer with the softmax activation

I have kept the convolutional layers, so you only need to add the dense layer.

In [None]:
model = keras.Sequential(
    [
        keras.Input(shape=input_shape),
        layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        # Dense layer here!
        layers.,
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ]
)

model.summary()

In [None]:
batch_size = 128
epochs = 15

model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.2)

In [None]:
score = model.evaluate(x_test, y_test, verbose=0)
print("Test loss:", score[0])
print("Test accuracy:", score[1])

Only a minor decrease in loss compared to our original model!

It seems the first model was the best choice (about the same result as the last model, but with significantly lower complexity and thus shorter training times)!
So we end up using the exact same model as for the MNIST digits dataset!
This will be our selected model moving forward!
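
To put a rough number on the complexity difference, the sketch below counts trainable parameters for the original model and the increased dense model. It assumes the layer definitions listed earlier in this notebook and a 5x5x64 feature map before flattening (28 -> 26 -> 13 -> 11 -> 5 with the default "valid" padding). The helper names are made up for this example.

```python
def conv2d_params(in_channels, filters, kernel=3):
    """Weights (kernel * kernel * in_channels per filter) plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    """One weight per input-unit pair plus one bias per unit."""
    return inputs * units + units


# Both models share the same two convolutional layers.
conv_stack = conv2d_params(1, 32) + conv2d_params(32, 64)
flat = 5 * 5 * 64  # features after Flatten

original = conv_stack + dense_params(flat, 10)
increased_dense = conv_stack + dense_params(flat, 64) + dense_params(64, 10)

print(original, increased_dense)
print(round(increased_dense / original, 1))
```

Pooling, flatten and dropout layers have no trainable parameters, so they do not appear in the count. You can compare these totals with what `model.summary()` reports for each model.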

Now that we have an architecture that seems to perform acceptably, we need to validate the results!

Of course, we turn to KFold cross-validation.

Now you have to define the KFold splitter, but as we don't have that much time, we only want to create 3 folds!
Remember to set the random_state and to shuffle the data!
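
To get a feel for what the splitter returns, here is a small sketch on a toy array instead of the image data. The names `toy_x` and `kfold_demo` and the seed value 42 are arbitrary choices made for this illustration.

```python
import numpy as np
from sklearn.model_selection import KFold

toy_x = np.arange(9)  # stand-in for 9 samples

# shuffle=True mixes the sample order before splitting;
# random_state makes that shuffle reproducible between runs.
kfold_demo = KFold(n_splits=3, shuffle=True, random_state=42)

# split() yields index arrays, not the data itself.
splits = list(kfold_demo.split(toy_x))

for fold, (train_idx, test_idx) in enumerate(splits, start=1):
    print(f"Fold {fold}: train={train_idx} test={test_idx}")
```

Every sample ends up in the test part of exactly one fold, which is why the loop further down indexes the arrays with the `train` and `test` index arrays.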

In [None]:
from sklearn.model_selection import KFold

acc_fold = []
loss_fold = []

# setup folds - USE THE KFold FUNCTION IMPORTED ABOVE!
kfold = 

fold_no = 1

In [None]:
for train, test in kfold.split(x_train, y_train):
    
    batch_size = 128
    epochs = 5
    
    # DEFINE THE SELECTED MODEL HERE! - ONLY THE HIDDEN LAYERS NEED TO BE DEFINED
    FoldModel = keras.Sequential(
        [
            keras.Input(shape=input_shape),
            # Model to be defined here!
            layers.Dense(num_classes, activation="softmax"),
        ]
    )
    
    
    FoldModel.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
    
    # FILL IN THE CORRECT ARRAY INDEX HERE FOR X AND Y + CHANGE THE VALIDATION SIZE TO 20 %
    History = FoldModel.fit(x_train[train], y_train[train], batch_size=batch_size, epochs=epochs, validation_split=0.1, verbose=3)
    
    
    # FILL IN THE CORRECT ARRAY INDEX HERE FOR X AND Y
    scores = FoldModel.evaluate(x_train[test], y_train[test])
    
    print(f'Score for fold {fold_no}: {FoldModel.metrics_names[0]} of {scores[0]}; {FoldModel.metrics_names[1]} of {scores[1]}')
    acc_fold.append(scores[1]*100)
    loss_fold.append(scores[0])
    
    fold_no = fold_no + 1
    print("\n")
    
print("------------------------------------------------------------------------")
print("Done training")

In [None]:
# == Provide average scores ==
print('------------------------------------------------------------------------')
print('Score per fold')
for i in range(len(acc_fold)):
    print('------------------------------------------------------------------------')
    print(f'> Fold {i+1} - Loss: {loss_fold[i]} - Accuracy: {acc_fold[i]}%')
print('------------------------------------------------------------------------')
print('Average scores for all folds:')
print(f'> Accuracy: {np.mean(acc_fold)} (+- {np.std(acc_fold)})')
print(f'> Loss: {np.mean(loss_fold)}')
print('------------------------------------------------------------------------')
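
A small note on the spread reported above: `np.std` computes the population standard deviation by default (`ddof=0`). With only 3 folds, the sample standard deviation (`ddof=1`) is noticeably larger. The sketch below shows the difference on made-up accuracy values; `acc_demo` is hypothetical and is not a result from this notebook.

```python
import numpy as np

acc_demo = [89.0, 90.0, 91.0]  # made-up fold accuracies in percent

mean_acc = np.mean(acc_demo)
pop_std = np.std(acc_demo)             # default ddof=0: divides by n
sample_std = np.std(acc_demo, ddof=1)  # divides by n - 1

print(f"mean={mean_acc:.2f} pop_std={pop_std:.3f} sample_std={sample_std:.3f}")
```

Whichever variant you report, say which one it is, so that the numbers can be compared across runs.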