<a href="https://colab.research.google.com/github/Bnibling/Thinkful_Data_Science_Immersion_Portfolio/blob/main/DL_Challenge_Fashion_MNIST.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Deep Learning Challenge - ANN models of [fashion MNIST](https://github.com/zalandoresearch/fashion-mnist)

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

from tensorflow.keras.datasets import fashion_mnist
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential 
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD, RMSprop, Adam, Adagrad

from sklearn.metrics import confusion_matrix
import itertools

1. Preprocess your data so that you can feed it into ANN models.
2. Split your data into training and test sets.

In [2]:
(X_train, y_train), (X_test, y_test) = fashion_mnist.load_data()

input_dim = 784 #28x28
output_dim = num_classes = 10 #number of classes 0-9
batch_size = 128
num_epochs = 20

# Flatten each 28x28 image into a 784-vector; -1 lets NumPy infer the sample count
X_train = X_train.reshape(-1, input_dim).astype('float32')
X_test = X_test.reshape(-1, input_dim).astype('float32')
X_train /= 255
X_test /= 255

y_train = to_categorical(y_train, num_classes)
y_test= to_categorical(y_test, num_classes)
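
As a quick sanity check on the preprocessing above, here is a NumPy-only sketch that applies the same three steps (flatten, rescale, one-hot encode) to a tiny synthetic batch. The random `images` and the `labels` array are made-up stand-ins, not Fashion-MNIST data, and the `np.eye` indexing is only meant to mirror what `to_categorical` produces for integer labels.

```python
import numpy as np

# Synthetic stand-in for the dataset: 4 grayscale images of 28x28 uint8 pixels (assumption).
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)
labels = np.array([0, 3, 9, 3])
num_classes = 10

# Flatten each image to a 784-vector and rescale pixels from [0, 255] to [0, 1].
flat = images.reshape(len(images), -1).astype("float32") / 255

# One-hot encode: row i holds a single 1 in column labels[i].
one_hot = np.eye(num_classes, dtype="float32")[labels]

print(flat.shape, flat.dtype)
print(one_hot.shape, one_hot.argmax(axis=1))
```

Taking `argmax` along axis 1 recovers the original integer labels, which is handy later when building a confusion matrix from one-hot targets.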

3. Try different ANN models and train them on your training set.

In [3]:
def build_3_layer_model(
    neuron_list=np.array([512, 256]),
    neuron_multiplier = 1,
    opt='adam', 
    activation='relu', 
    loss='categorical_crossentropy',
    batch_size=128,
    ):
  model = Sequential()

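  # Cast to a plain int: a fractional multiplier (e.g. 0.5) makes the product a float,
  # which some Keras versions may reject as a unit count
  model.add(Dense(int(neuron_list[0]*neuron_multiplier), input_dim=X_train.shape[1], activation=activation))
  model.add(Dense(int(neuron_list[1]*neuron_multiplier), activation=activation))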
  model.add(Dense(output_dim, activation='softmax'))
  model.compile(optimizer=opt, loss=loss, 
              metrics=['accuracy'])
  
  print(f'{neuron_multiplier}_{activation}_{batch_size}_{opt}_3_model')
  return model

In [4]:
def build_4_layer_model(
    neuron_list=np.array([512, 256, 128]),
    neuron_multiplier = 1,
    opt='adam', 
    activation='relu', 
    loss='categorical_crossentropy',
    batch_size=128,
    ):
  model = Sequential()

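  # Cast to a plain int: a fractional multiplier (e.g. 0.5) makes the product a float,
  # which some Keras versions may reject as a unit count
  model.add(Dense(int(neuron_list[0]*neuron_multiplier), input_dim=X_train.shape[1], activation=activation))
  model.add(Dense(int(neuron_list[1]*neuron_multiplier), activation=activation))
  model.add(Dense(int(neuron_list[2]*neuron_multiplier), activation=activation))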
  model.add(Dense(output_dim, activation='softmax'))
  model.compile(optimizer=opt, loss=loss, 
              metrics=['accuracy'])
  
  print(f'{neuron_multiplier}_{activation}_{batch_size}_{opt}_4_model')
  return model

In [5]:
def build_5_layer_model(
    neuron_list=np.array([512, 256, 128, 64]),
    neuron_multiplier = 1,
    opt='adam', 
    activation='relu', 
    loss='categorical_crossentropy',
    batch_size=128,
    ):
  model = Sequential()

  # Cast to a plain int: a fractional multiplier (e.g. 0.5) makes the product a float,
  # which some Keras versions may reject as a unit count
  model.add(Dense(int(neuron_list[0]*neuron_multiplier), input_dim=X_train.shape[1], activation=activation))
  model.add(Dense(int(neuron_list[1]*neuron_multiplier), activation=activation))
  model.add(Dense(int(neuron_list[2]*neuron_multiplier), activation=activation))
  model.add(Dense(int(neuron_list[3]*neuron_multiplier), activation=activation))
  model.add(Dense(output_dim, activation='softmax'))
  model.compile(optimizer=opt, loss=loss, 
              metrics=['accuracy'])
  
  print(f'{neuron_multiplier}_{activation}_{batch_size}_{opt}_5_model')
  return model
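
To get a feel for how depth and the neuron multiplier change model size, the sketch below computes the hidden-layer widths and the trainable parameter count for each of the three architectures without building any Keras model. The helper names `hidden_units` and `dense_param_count` are hypothetical and exist only for this illustration; "depth" counts the Dense layers including the softmax output, matching the `_3_`, `_4_`, `_5_` naming used above.

```python
def hidden_units(depth, multiplier, base=(512, 256, 128, 64)):
    """Units per hidden layer for a model with `depth` Dense layers (output layer included)."""
    return [int(n * multiplier) for n in base[: depth - 1]]


def dense_param_count(input_dim, units, output_dim):
    """Weights plus biases of a fully connected stack."""
    sizes = [input_dim, *units, output_dim]
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


for depth in (3, 4, 5):
    for multiplier in (0.5, 1, 2):
        units = hidden_units(depth, multiplier)
        print(depth, multiplier, units, dense_param_count(784, units, 10))
```

Most of the parameters sit in the first layer (784 inputs times the first hidden width), so the multiplier affects model size far more than adding a fourth or fifth layer does.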

In [6]:
neuron_multipliers = [0.5, 1, 2]
activations = ['relu', 'tanh']
batch_sizes = [64, 128, 256, X_train.shape[0]]
optimizers = ['sgd', 'adam']

In [7]:
ANN_fashion_models = {f'{neuron_multiplier}_{activation}_{batch_size}_{opt}_{i+3}_model':build_model(
              neuron_multiplier=neuron_multiplier, 
              batch_size=batch_size,
              opt=opt,
              activation=activation,
              loss='categorical_crossentropy',  
              ).fit(X_train, y_train, validation_data=(X_test, y_test),
              batch_size=batch_size, epochs=num_epochs, verbose=0) 
              for opt in optimizers 
              for batch_size in batch_sizes 
              for activation in activations 
              for neuron_multiplier in neuron_multipliers
              for i, build_model in enumerate([build_3_layer_model, build_4_layer_model, build_5_layer_model])}

0.5_relu_64_sgd_3_model
0.5_relu_64_sgd_4_model
0.5_relu_64_sgd_5_model
1_relu_64_sgd_3_model
1_relu_64_sgd_4_model
1_relu_64_sgd_5_model
2_relu_64_sgd_3_model
2_relu_64_sgd_4_model
2_relu_64_sgd_5_model
0.5_tanh_64_sgd_3_model
0.5_tanh_64_sgd_4_model
0.5_tanh_64_sgd_5_model
1_tanh_64_sgd_3_model
1_tanh_64_sgd_4_model
1_tanh_64_sgd_5_model
2_tanh_64_sgd_3_model
2_tanh_64_sgd_4_model
2_tanh_64_sgd_5_model
0.5_relu_128_sgd_3_model
0.5_relu_128_sgd_4_model
0.5_relu_128_sgd_5_model
1_relu_128_sgd_3_model
1_relu_128_sgd_4_model
1_relu_128_sgd_5_model
2_relu_128_sgd_3_model
2_relu_128_sgd_4_model
2_relu_128_sgd_5_model
0.5_tanh_128_sgd_3_model
0.5_tanh_128_sgd_4_model
0.5_tanh_128_sgd_5_model
1_tanh_128_sgd_3_model
1_tanh_128_sgd_4_model
1_tanh_128_sgd_5_model
2_tanh_128_sgd_3_model
2_tanh_128_sgd_4_model
2_tanh_128_sgd_5_model
0.5_relu_256_sgd_3_model
0.5_relu_256_sgd_4_model
0.5_relu_256_sgd_5_model
1_relu_256_sgd_3_model
1_relu_256_sgd_4_model
1_relu_256_sgd_5_model
2_relu_256_sgd_3_model
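
The comprehension above trains one model per combination of optimizer, batch size, activation, neuron multiplier, and depth. As a sketch of how large that grid is, the same names can be enumerated with `itertools.product` without training anything. The value 60000 is written out here in place of `X_train.shape[0]` so the snippet runs on its own.

```python
from itertools import product

neuron_multipliers = [0.5, 1, 2]
activations = ["relu", "tanh"]
batch_sizes = [64, 128, 256, 60000]  # 60000 stands in for X_train.shape[0] (full batch)
optimizers = ["sgd", "adam"]
depths = [3, 4, 5]

# Same nesting order as the comprehension: optimizer outermost, depth innermost.
names = [
    f"{mult}_{act}_{bs}_{opt}_{depth}_model"
    for opt, bs, act, mult, depth in product(
        optimizers, batch_sizes, activations, neuron_multipliers, depths
    )
]

print(len(names))   # 2 * 4 * 2 * 3 * 3 combinations
print(names[:3])
```

With 20 epochs per model, the full grid amounts to 144 separate training runs, which is worth keeping in mind before widening any of the lists.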

4. Compare your models' training scores and interpret your results.
5. Evaluate how your models perform on your test set. Compare the results of your models.

In [None]:
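# fit() returns a History object; its .history dict maps each metric to a per-epoch list,
# so score[-1] is the value after the final epoch
model_scores = {model_name:{metric:score[-1] for metric, score in history.history.items()} for model_name, history in ANN_fashion_models.items()}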
model_scores
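
The cell above keeps only the final-epoch value of each metric. The sketch below shows that extraction on plain dictionaries shaped like Keras' `History.history`, alongside one possible alternative (the best validation accuracy over all epochs). The model names and numbers are hypothetical and are not results from this notebook.

```python
# Hypothetical per-epoch histories, shaped like History.history (metric -> list of values).
histories = {
    "model_a": {"accuracy": [0.80, 0.88, 0.91], "val_accuracy": [0.79, 0.86, 0.85]},
    "model_b": {"accuracy": [0.70, 0.82, 0.87], "val_accuracy": [0.71, 0.80, 0.84]},
}

# Final-epoch score per metric, as in the cell above.
final_scores = {
    name: {metric: values[-1] for metric, values in hist.items()}
    for name, hist in histories.items()
}

# Alternative view: best validation accuracy reached at any epoch.
best_val = {name: max(hist["val_accuracy"]) for name, hist in histories.items()}

print(final_scores)
print(best_val)
```

The two views can disagree when validation accuracy peaks before the last epoch, as it does for the made-up `model_a` here.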

In [None]:
scores_df = pd.DataFrame.from_dict(model_scores, orient='index')

In [24]:
# Highest accuracy first; ties broken by lowest loss
scores_df.sort_values(by=['accuracy', 'loss'], ascending=[False, True]).head(10).style.background_gradient()

| model | loss | accuracy | val_loss | val_accuracy |
|---|---|---|---|---|
| 2_relu_256_adam_3_model | 0.142256 | 0.946133 | 0.350847 | 0.8958 |
| 2_relu_256_adam_4_model | 0.141142 | 0.946017 | 0.359076 | 0.8922 |
| 2_relu_64_adam_3_model | 0.139447 | 0.945417 | 0.406653 | 0.8953 |
| 2_relu_128_adam_3_model | 0.139479 | 0.945033 | 0.345708 | 0.8961 |
| 2_tanh_256_adam_4_model | 0.147131 | 0.945 | 0.328712 | 0.8927 |
| 1_relu_256_adam_3_model | 0.144198 | 0.944917 | 0.379712 | 0.8866 |
| 1_relu_128_adam_3_model | 0.145765 | 0.94465 | 0.357992 | 0.8933 |
| 1_tanh_256_adam_4_model | 0.148014 | 0.944367 | 0.326668 | 0.8909 |
| 2_relu_256_adam_5_model | 0.14547 | 0.943883 | 0.391217 | 0.8887 |
| 2_tanh_256_adam_3_model | 0.151043 | 0.943783 | 0.323376 | 0.8919 |


The table above, sorted by training accuracy, shows that the top-performing models have comparable scores across all four metrics. All of the top 10 models were trained with the `adam` optimizer; seven of the ten use a neuron multiplier of 2, and seven use `relu` activation.

In [25]:
# Highest test accuracy first; ties broken by lowest test loss
scores_df.sort_values(by=['val_accuracy', 'val_loss'], ascending=[False, True]).head(10).style.background_gradient()

| model | loss | accuracy | val_loss | val_accuracy |
|---|---|---|---|---|
| 2_relu_64_adam_5_model | 0.159214 | 0.938933 | 0.365093 | 0.8987 |
| 0.5_relu_64_adam_3_model | 0.155123 | 0.940217 | 0.35189 | 0.8965 |
| 2_relu_128_adam_5_model | 0.15283 | 0.941167 | 0.36545 | 0.8964 |
| 2_relu_128_adam_3_model | 0.139479 | 0.945033 | 0.345708 | 0.8961 |
| 1_relu_256_adam_4_model | 0.15151 | 0.942267 | 0.337883 | 0.896 |
| 2_relu_64_adam_4_model | 0.151653 | 0.9417 | 0.416766 | 0.8959 |
| 2_relu_256_adam_3_model | 0.142256 | 0.946133 | 0.350847 | 0.8958 |
| 1_relu_128_adam_4_model | 0.154713 | 0.940183 | 0.341279 | 0.8957 |
| 1_relu_128_adam_5_model | 0.160879 | 0.93815 | 0.350156 | 0.8956 |
| 2_relu_64_adam_3_model | 0.139447 | 0.945417 | 0.406653 | 0.8953 |


As with the training results, when the models are sorted by test scores, all of the top 10 were trained with the `adam` optimizer, and none of them is a full-batch model (batch size 60000). All ten also use `relu` activation.

Given that the score values are all so close, let's look at how each individual modifier influenced the model results.

In [None]:
modifiers = neuron_multipliers + activations + batch_sizes + optimizers

In [32]:
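# Note: filter(like=...) matches substrings of the model name, so short modifiers such as
# '1' or '2' also match names containing '128' or '256'
modifier_scores = {modifier: scores_df.filter(like=str(modifier), axis=0).mean() for modifier in modifiers}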

In [36]:
modifier_scores_df = pd.DataFrame.from_dict(modifier_scores, orient='index')
modifier_scores_df.sort_values(by=['val_accuracy', 'accuracy'], ascending=False, inplace=True)

In [40]:
modifier_scores_df.style.background_gradient()

| modifier | loss | accuracy | val_loss | val_accuracy |
|---|---|---|---|---|
| 64 | 0.241947 | 0.911098 | 0.36505 | 0.877142 |
| 128 | 0.254399 | 0.907469 | 0.37214 | 0.874394 |
| 256 | 0.273608 | 0.901954 | 0.3909 | 0.867081 |
| adam | 0.284415 | 0.896953 | 0.42316 | 0.860319 |
| 2 | 0.373028 | 0.875535 | 0.475971 | 0.84589 |
| 1 | 0.416663 | 0.860102 | 0.516085 | 0.831694 |
| tanh | 0.484754 | 0.845372 | 0.560458 | 0.820535 |
| 0.5 | 0.525267 | 0.827776 | 0.610651 | 0.802667 |
| relu | 0.516689 | 0.828943 | 0.618618 | 0.802501 |
| sgd | 0.717029 | 0.777362 | 0.755916 | 0.762717 |
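
One caveat about the per-modifier means: `DataFrame.filter(like=...)` keeps every row whose label contains the given substring, so `like='1'` also picks up names containing `128`, and `like='2'` picks up `128` and `256`. A possible alternative, sketched here under the assumption that every name follows the `multiplier_activation_batch_optimizer_depth_model` pattern, is to split the names into columns and group on the exact field. The four rows below are taken from the test-score table above purely to keep the example small.

```python
import pandas as pd

# Four rows copied from the test-score table, indexed by model name.
scores = pd.DataFrame(
    {"val_accuracy": [0.8961, 0.8957, 0.8965, 0.8987]},
    index=[
        "2_relu_128_adam_3_model",
        "1_relu_128_adam_4_model",
        "0.5_relu_64_adam_3_model",
        "2_relu_64_adam_5_model",
    ],
)

# Substring matching: '1' also matches the batch size 128 of a multiplier-2 model.
substring_hits = scores.filter(like="1", axis=0).index.tolist()
print(substring_hits)

# Field-based alternative: split each name on '_' and keep the five leading fields.
fields = scores.index.to_series().str.split("_", expand=True).iloc[:, :5]
fields.columns = ["multiplier", "activation", "batch_size", "optimizer", "depth"]
parsed = scores.join(fields)

# Mean test accuracy per exact multiplier value.
by_multiplier = parsed.groupby("multiplier")["val_accuracy"].mean()
print(by_multiplier)
```

Grouping on parsed fields also makes it easy to look at two modifiers at once, for example `parsed.groupby(["optimizer", "batch_size"])`.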


Full-batch training (a batch size equal to all 60,000 training samples) greatly reduces the accuracy and increases the loss of the models, while the other batch sizes perform comparably: mean test accuracy ranges only from 0.867 (256) to 0.877 (64). I would therefore suggest a batch size of 128 or 256, which trains faster, since smaller batches do not improve the results substantially.
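
One likely contributor to the full-batch result, offered as an interpretation rather than something this experiment isolates, is the number of weight updates each model receives. With a fixed number of epochs, the update count is the number of batches per epoch times the number of epochs. The helper name `updates_per_run` is hypothetical.

```python
import math

n_train = 60000
num_epochs = 20


def updates_per_run(batch_size, n_samples=n_train, epochs=num_epochs):
    """Gradient updates over a whole run: batches per epoch (last one may be partial) times epochs."""
    return math.ceil(n_samples / batch_size) * epochs


for batch_size in (64, 128, 256, n_train):
    print(batch_size, updates_per_run(batch_size))
```

A full-batch model takes a single step per epoch, so after 20 epochs it has made only 20 updates, compared with thousands for the mini-batch runs.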

As for optimizers, `adam` is the clear winner over `sgd` (mean test accuracy 0.860 versus 0.763). However, both optimizers were run with their default settings. Further tuning, such as adjusting the learning rate and momentum for `sgd`, would be needed to truly tailor the model. For our ad hoc purposes, `adam` is my suggestion.
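
To illustrate what the learning rate and momentum control, here is a NumPy sketch of the momentum update rule (`velocity = momentum * velocity - learning_rate * grad`, then `w += velocity`) on a toy quadratic. The objective, starting point, and hyperparameter values are assumptions chosen for illustration and say nothing about Fashion-MNIST itself. In Keras, the corresponding knobs are the `learning_rate` and `momentum` arguments of the `SGD` class imported at the top of the notebook.

```python
import numpy as np


def run_gd(learning_rate, momentum, steps=100):
    """Minimise f(w) = 0.5 * sum(curvature * w**2) with momentum gradient descent."""
    curvature = np.array([1.0, 100.0])  # toy, ill-conditioned quadratic (assumption)
    w = np.array([1.0, 1.0])
    velocity = np.zeros_like(w)
    for _ in range(steps):
        grad = curvature * w  # exact gradient of the quadratic, no mini-batch noise
        velocity = momentum * velocity - learning_rate * grad
        w = w + velocity
    return 0.5 * np.sum(curvature * w**2)


plain = run_gd(learning_rate=0.01, momentum=0.0)
with_momentum = run_gd(learning_rate=0.01, momentum=0.9)
print(f"final loss without momentum: {plain:.5f}")
print(f"final loss with momentum:    {with_momentum:.5f}")
```

On this toy problem the momentum run ends at a much lower loss for the same learning rate and step count, which hints at why default `sgd` (no momentum) may be at a disadvantage against `adam` in a 20-epoch budget.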

The neuron multiplier scales the number of nodes in each hidden layer. In the table above, more nodes per layer give a slightly better outcome than fewer nodes, with diminishing returns at higher values: mean test accuracy rises from 0.803 (multiplier 0.5) to 0.832 (1) to 0.846 (2). A multiplier of 1 or 2 is ideal.

As for the activation function, `relu` and `tanh` performed similarly. Strangely, the table shows a higher mean for `tanh` (0.821 versus 0.803 test accuracy), yet nearly all of the top 10 models in the previous tables use `relu`. It would seem that both activations are acceptable for this dataset.