<a href="https://colab.research.google.com/github/RachelRamirez/FashionMNIST_DataAugmentation/blob/main/Fashion_mnist_convnet.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# "Simple MNIST convnet" Architecture with Extremely Limited Data used to Train CNN

**Original Author as applied to MNIST (Numbers):** [fchollet](https://twitter.com/fchollet)<br>
**Date created:** 2015/06/19<br>
**Last modified:** 2020/04/21<br>
**Applied to Fashion MNIST** 2021/08/25

**Description:** A simple convnet that achieves ~90% test accuracy on MNIST, is applied to the Fashion MNIST.

Then data is limited to 1000 training samples to view the effects of data augmentation on increasing model accuracy.

More features are added for residual/error analysis such as confusion matrix and data augmentation.

# Experiment

The first part of this experiment is looking at the space for TrainingSize, ValidationSize, BatchSize, and EpochSize.  These all likely effect one another (40way interactions) so a screening design is used to look for maineffects.

A: BatchSize

B: Epochs

C: Training Size (Count)

D: Validation (as percentage of Training Size)

In [1]:
## Experiment 1 Variables

var_BatchSize =  10,  100
var_Epochs =     15,  50
var_TrainSize = 100,  1000
var_ValPercent = 0.2, 0.5


 

## Setup

In [2]:
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
import tensorflow as tf
import matplotlib.pyplot as plt
import seaborn as sn

## Prepare the data

In [3]:
# Model / data parameters
num_classes = 10
input_shape = (28, 28, 1)

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = keras.datasets.fashion_mnist.load_data()

x_train = x_train[0:1000]
y_train = y_train[0:1000]

# Scale images to the [0, 1] range
x_train = x_train.astype("float32") / 255
x_test = x_test.astype("float32") / 255
# Make sure images have shape (28, 28, 1)
x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)
print("x_train shape:", x_train.shape)
print(x_train.shape[0], "train samples")
print(x_test.shape[0], "test samples")


# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

x_train shape: (1000, 28, 28, 1)
1000 train samples
10000 test samples


## Build the model

In [4]:
model = keras.Sequential(
    [
        keras.Input(shape=input_shape),
        layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ]
)

model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
flatten (Flatten)            (None, 1600)              0         
_________________________________________________________________
dropout (Dropout)            (None, 1600)              0         
_________________________________________________________________
dense (Dense)                (None, 10)                1

# Insert DOE Variables -> Train Model

In [22]:
# scorelist = []
# for b in var_BatchSize:
#   for e in var_Epochs:
#      for t in var_TrainSize:
#        for v in var_ValPercent:
#           scorelist += [[b,e,t,v, 100, "   " ]]



# display(scorelist)

[[10, 15, 100, 0.2, 100, '   '],
 [10, 15, 100, 0.5, 100, '   '],
 [10, 15, 1000, 0.2, 100, '   '],
 [10, 15, 1000, 0.5, 100, '   '],
 [10, 50, 100, 0.2, 100, '   '],
 [10, 50, 100, 0.5, 100, '   '],
 [10, 50, 1000, 0.2, 100, '   '],
 [10, 50, 1000, 0.5, 100, '   '],
 [100, 15, 100, 0.2, 100, '   '],
 [100, 15, 100, 0.5, 100, '   '],
 [100, 15, 1000, 0.2, 100, '   '],
 [100, 15, 1000, 0.5, 100, '   '],
 [100, 50, 100, 0.2, 100, '   '],
 [100, 50, 100, 0.5, 100, '   '],
 [100, 50, 1000, 0.2, 100, '   '],
 [100, 50, 1000, 0.5, 100, '   ']]

## Train the model

In [25]:
counter = 1
scorelist = []

for b in var_BatchSize:
  for e in var_Epochs:
     for t in var_TrainSize:
       for v in var_ValPercent:
        
        batch_size = b
        epochs = e
        x_train = x_train[0:t]
        y_train = y_train[0:t]

        model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
        model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=v, verbose=0)

        score = model.evaluate(x_test, y_test, verbose=0)
        # print("Test loss:", score[0])
        # print("Test accuracy:", score[1])
        scorelist += [[counter, b,  e,   t,   v,   score[0], score[1] ]]
        print("This is run", counter, ": ", b, e, t, v,  "had accuracy: ", score[1], " test loss: ", score[0])
        counter=counter+1

This is run 1 :  10 15 100 0.2 had accuracy:  0.6973000168800354  test loss:  1.6851215362548828
This is run 2 :  10 15 100 0.5 had accuracy:  0.713699996471405  test loss:  1.832808256149292
This is run 3 :  10 15 1000 0.2 had accuracy:  0.7027999758720398  test loss:  1.855257272720337
This is run 4 :  10 15 1000 0.5 had accuracy:  0.7038000226020813  test loss:  2.1999452114105225
This is run 5 :  10 50 100 0.2 had accuracy:  0.6995000243186951  test loss:  2.167445182800293
This is run 6 :  10 50 100 0.5 had accuracy:  0.7178000211715698  test loss:  2.3670055866241455
This is run 7 :  10 50 1000 0.2 had accuracy:  0.6883000135421753  test loss:  2.628786087036133
This is run 8 :  10 50 1000 0.5 had accuracy:  0.7113999724388123  test loss:  3.0950543880462646
This is run 9 :  100 15 100 0.2 had accuracy:  0.692799985408783  test loss:  2.929530382156372
This is run 10 :  100 15 100 0.5 had accuracy:  0.6934000253677368  test loss:  3.111909866333008
This is run 11 :  100 15 1000 0

In [31]:
domains.to_csv("path_to_local_git_folder/domains.csv")

NameError: ignored

In [None]:
# batch_size = 50
# epochs = 30

# model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

# model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.5)

## Evaluate the trained model

In [None]:
          score = model.evaluate(x_test, y_test, verbose=1)
          print("Test loss:", score[0])
          print("Test accuracy:", score[1])

In [None]:
predictions = model.predict(x_test)
predictions = np.argmax(predictions, axis=1)
 
y_test = np.argmax(y_test, axis=1) 

confusion_matrix = tf.math.confusion_matrix(y_test, predictions)  #First Variable is on VERTICAL, second Variable is on X HORIZONTAL
#confusion_matrix = tf.math.confusion_matrix(predictions, tf.Variable(np.ones(predictions.shape)))

f, ax = plt.subplots(figsize=(9, 7))
sn.heatmap(
    confusion_matrix,
    annot=True,
    linewidths=.5,
    fmt="d",
    square=True,
    ax=ax
)
plt.show()


In my first run, no data augmentation, 100 random samples of the Training Set, using 50% of that as the Validation Set, using a Batch Size of 50 and Epochs = 30, the resulting accuracy of the test-set is Test loss: 0.912209689617157
Test accuracy: 0.6866999864578247

In [None]:
## I want to pick the largest value on the confusion matrix not on the diagonal
confusing_part_matrix= np.array(confusion_matrix)-np.identity(confusion_matrix.shape[0])*np.diag(confusion_matrix)

confusing_part_matrix = tf.convert_to_tensor(confusing_part_matrix)

f, ax = plt.subplots(figsize=(9, 7))
sn.heatmap(
    confusing_part_matrix,
    annot=True,
    linewidths=.5
    #,fmt="d"
    #,square=True
    #,ax=ax
)
plt.show()


## find the max value of those remaining numbers
thisnumber=np.max(confusing_part_matrix)

x_thisnumber=np.argmax(confusing_part_matrix,axis=0)
y_thisnumber=np.argmax(confusing_part_matrix,axis=1)
z_thisnumber=np.argmax(confusing_part_matrix)
display(x_thisnumber,y_thisnumber)
#print("The worst the algorithm did is between " confus)
 

In [None]:
confusing_part_matrix[0]

In [None]:
z_thisnumber=np.argmax(confusing_part_matrix)
z_thisnumber
print("So the most confused classes were between: " , np.math.floor(z_thisnumber/10), " a",  LABEL_NAMES[np.math.floor(z_thisnumber/10)] , " and  ", z_thisnumber%10, " a ",  
LABEL_NAMES[z_thisnumber%10])



In [None]:

LABEL_NAMES = ['t_shirt', 'trouser', 'pullover', 'dress', 'coat', 'sandal', 'shirt', 'sneaker', 'bag', 'ankle_boots']

delta = predictions - y_test

wrong = tf.boolean_mask(predictions, delta)
print(wrong)

#  if i != 0
#    display("Prediction ", i, " is " + LABEL_NAMES[predictions[i]], "but it is ",  LABEL_NAMES[y_test[i] )


In [None]:
plt.imshow(x_test[0].reshape((28,28)), cmap=plt.cm.binary)
plt.show()