# Important Notes

- Each architecture has two versions: one tuned by Keras Tuner, and one into which I plugged the tuned values by hand. The Keras Tuner models are in the tuner.ipynb file, while my tuned submission models are in this file.
- The Keras Tuner model-building function has no hook for the batch size, so I tuned it separately with for loops. As a result, batch size was not part of the combinations the tuner tested, which can lead to less-than-optimal results.
- Each model reached roughly 98% test accuracy after 5 epochs.
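
The manual batch-size loop mentioned above can be sketched roughly as follows. This is a minimal illustration, not the exact loop used for the submission: `sweep_batch_sizes` and `train_and_score` are hypothetical names, and the scoring function is a dummy stand-in so the snippet runs without TensorFlow.

```python
from typing import Callable, Dict, Iterable, Tuple


def sweep_batch_sizes(
    train_and_score: Callable[[int], float],
    batch_sizes: Iterable[int],
) -> Tuple[int, Dict[int, float]]:
    """Score every batch size and return the best one plus all scores."""
    scores = {}
    for batch_size in batch_sizes:
        # train_and_score is expected to build a fresh model, fit it with this
        # batch size, and return one accuracy-like number (higher is better).
        scores[batch_size] = train_and_score(batch_size)
    best = max(scores, key=scores.get)
    return best, scores


# Dummy scorer (an assumption for illustration); replace with real training.
def fake_score(batch_size: int) -> float:
    return 1.0 / batch_size


best, scores = sweep_batch_sizes(fake_score, [64, 128, 256, 512, 1024])
print(best, scores)
```

Building a fresh model inside `train_and_score` matters: reusing one model across iterations would let later batch sizes start from already-trained weights.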

# Imports

In [1]:
# Load the TensorBoard notebook extension.
# %load_ext tensorboard

from datetime import datetime

import tensorflow as tf
from tensorflow import keras
import tensorflow.keras.layers as Layer

import tensorboard
import matplotlib.pyplot as plt

import numpy as np

import keras_tuner as kt

import time


# Data Portion

In [2]:
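# Load the dataset and scale pixel values to [0, 1].
(train_images, train_labels), (test_images, test_labels) = keras.datasets.mnist.load_data()
train_images = train_images / 255.0
# Scale the test images the same way. The outputs recorded in this notebook came
# from runs where this step was missing, which likely inflates the test loss.
test_images = test_images / 255.0

x_train = train_images.reshape(-1, 28, 28, 1)  # add a trailing dimension for the single channel
x_test = test_images.reshape(-1, 28, 28, 1)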

np.random.seed(1)
tf.random.set_seed(1)

# Regular Model

In [3]:

LOG_DIR = f"{int(time.time())}"

regular_model = tf.keras.models.Sequential(name='CNN_Increasing_Filters')

regular_model.add(Layer.Conv2D(8, (3, 3), padding='same', activation='relu')) 
regular_model.add(Layer.Conv2D(9, (3, 3), padding='same', activation='relu'))
regular_model.add(Layer.Conv2D(10, (3, 3),padding='same', activation='relu'))
regular_model.add(Layer.MaxPooling2D(pool_size=(2, 2), strides=(2,2)))

regular_model.add(Layer.Conv2D(20, (3, 3), padding='same', activation='relu'))
regular_model.add(Layer.Conv2D(21, (3, 3), padding='same', activation='relu'))
regular_model.add(Layer.Conv2D(22, (3, 3),padding='same', activation='relu'))
regular_model.add(Layer.MaxPooling2D(pool_size=(2, 2), strides=(2,2)))

regular_model.add(Layer.Conv2D(44, (3, 3), padding='same', activation='relu'))
regular_model.add(Layer.Conv2D(45, (3, 3), padding='same', activation='relu'))
regular_model.add(Layer.Conv2D(46, (3, 3), padding='same', activation='relu'))
regular_model.add(Layer.MaxPooling2D(pool_size=(2, 2), strides=(2,2)))

regular_model.add(Layer.Conv2D(160, (3, 3), padding='same', activation='relu'))

regular_model.add(Layer.Flatten())
regular_model.add(Layer.Dense(80))
regular_model.add(Layer.Activation('relu'))
regular_model.add(Layer.Dense(10))
regular_model.add(Layer.Activation('softmax'))

# Keras Tuner reported .01 as the optimal learning rate, but .001 seems to work better
regular_model.compile(loss='sparse_categorical_crossentropy', optimizer=tf.keras.optimizers.Adam(learning_rate=.001), metrics=['accuracy'])

regular_model.build(input_shape=(1,28,28,1))


# Define the Keras TensorBoard callback.
logdir="logs/fit/" + datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = keras.callbacks.TensorBoard(log_dir=logdir)

batch_sizes = [64, 128, 256, 512, 1024]
# for i in range(len(batch_sizes)):


regular_model.fit(
    x_train,
    train_labels, 
    batch_size=64,
    epochs=5,
    callbacks=[tensorboard_callback])

# print(f"batch_size: {batch_sizes[i]}")
score = regular_model.evaluate(x_test, test_labels)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Test loss: 3.7645273208618164
Test accuracy: 0.9901000261306763


<br><br>
# Report For Regular Model
<br>

At first, I changed my hyperparameters manually and reran my models. I later discovered Keras Tuner, which made hyperparameter testing far more efficient.
I used the tuner for most of the hyperparameters, but I had to test the batch size manually with a for loop.

# Summary

Number of models tested: 40
<br>

Best:<br>
- Number Filters 10th Conv. Layer: 160
- Number Neurons Hidden Layer: 80
- Learning Rate: .001
- Optimizer: Adam
- Batch Size: 64

<br>

- epochs: 5
- Test loss: 7.187195301055908
- Test accuracy: 0.9861000180244446


## Values Tuned With Keras Tuner

- learning rate
- optimizer
- batch size (tested manually with a for loop rather than by the tuner)
- Number of filters in 10th convolution layer
- Number of neurons in hidden layer


--- 

# Optimizing Neurons in Hidden Layer & 10th Conv. Layer (Keras)

First, I ran models with various combinations of the following values:

## Parameters Tested
- Filters in 10th Conv.: Range: [50 - 100], Step Size: 10
- Neurons in Hidden Layer.: Range: [20 - 100], Step Size: 20

## Number of Models Tested
- 20
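
To put the number of tested models in context, the full grid implied by the ranges above can be enumerated as follows. This is only a sketch and assumes both endpoints of each range are inclusive; under that assumption the tested models cover part of the grid rather than all of it.

```python
from itertools import product

# Assumption: both range endpoints are inclusive.
conv10_filters = list(range(50, 101, 10))  # 50, 60, ..., 100
hidden_neurons = list(range(20, 101, 20))  # 20, 40, ..., 100

# Every (filters, neurons) pair in the search space.
grid = list(product(conv10_filters, hidden_neurons))
print(len(conv10_filters), len(hidden_neurons), len(grid))
```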

## Top Three Combinations
1. 
- conv10_filters: 100
- hidden_neurons: 80
- Score: 0.9649999737739563
2. 
- conv10_filters: 80
- hidden_neurons: 80
- Score: 0.9643999934196472
3. 
- conv10_filters: 80
- hidden_neurons: 60
- Score: 0.9632999897003174

<br>

Next, since I found a good value for the number of neurons in the hidden layer, I began further testing the number of filters.

## Parameters Tested
- Filters in 10th Conv.: Range: [100 - 200], Step Size: 20

## Number of Models Tested
- 6
  
## Top Three Combinations
1. 
- conv10_filters: 160
- Score: 0.9704999923706055
2. 
- conv10_filters: 180
- Score: 0.9703999757766724
3. 
- conv10_filters: 100
- Score: 0.9675999879837036

Now that the architecture is tuned, I can work on the training hyperparameters.

--- 
# Optimizing Learning Rate & Optimizer (Keras)

First, I ran models with various combinations of the following values:

## Parameters Tested
- Learning Rate: Choices: [.01, .001, .0001]
- Optimizer: Choices: [adam, SGD, RMSprop]

## Number of Models Tested
- 9

## Top Three Combinations
1. 
- optimizer: adam
- learning_rate: 0.01
- Score: 0.9660000205039978
2. 
- optimizer: adam
- learning_rate: 0.001
- Score: 0.9620000123977661
3. 
- optimizer: adam
- learning_rate: 0.0001
- Score: 0.9588000178337097

As the scores show, Adam was the best optimizer at every learning rate tested, with a learning rate of .01 scoring highest.
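
The ranking can be reproduced from the raw scores with a small helper. The sketch below uses only the three scores listed above; `top_k` is a hypothetical name, not part of Keras Tuner. It also prints the margin between first and second place, which is worth checking before treating a result as decisive.

```python
# Scores copied from the top-three list above (deliberately out of order).
trials = [
    {"optimizer": "adam", "learning_rate": 0.001, "score": 0.9620000123977661},
    {"optimizer": "adam", "learning_rate": 0.0001, "score": 0.9588000178337097},
    {"optimizer": "adam", "learning_rate": 0.01, "score": 0.9660000205039978},
]


def top_k(trials, k=3):
    """Return the k highest-scoring trials, best first."""
    return sorted(trials, key=lambda t: t["score"], reverse=True)[:k]


ranked = top_k(trials)
margin = ranked[0]["score"] - ranked[1]["score"]
for t in ranked:
    print(t["optimizer"], t["learning_rate"], round(t["score"], 4))
print("margin between first and second:", round(margin, 4))
```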

--- 

# Optimizing Batch Size (For Loop)

I ran models with various batch size values:

## Parameters Tested
- Batch Size: Choices: [64, 128, 256, 512, 1024]

## Number of Models Tested
- 5

## Top Three Combinations
1. 
- batch_size: 512
- Score: .76
2. 
- batch_size: 64
- Score: .11
3. 
- batch_size: 128
- Score: .11

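In this sweep, 512 was the only batch size that scored well. The 0.11 scores for 64 and 128 are close to chance level for ten classes, which suggests those runs failed to train rather than reflecting the batch size itself. The submission model above uses a batch size of 64.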
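
One thing that changes with batch size is how many weight updates the model receives in a fixed number of epochs. A quick calculation, assuming the 60,000-image MNIST training split and 5 epochs as in the training cell above (the epoch count used during the sweep itself is an assumption):

```python
import math

TRAIN_SAMPLES = 60_000  # size of the MNIST training split
EPOCHS = 5              # assumption: same epoch count as the final training run


def updates(batch_size, samples=TRAIN_SAMPLES, epochs=EPOCHS):
    """Gradient updates: batches per epoch (the last may be partial) times epochs."""
    return math.ceil(samples / batch_size) * epochs


for batch_size in [64, 128, 256, 512, 1024]:
    print(batch_size, updates(batch_size))
```

Going from a batch size of 64 to 1024 cuts the number of updates by roughly a factor of 16, so large batches may simply need more epochs or a different learning rate to reach the same accuracy.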
<br>
<br>


# Inverted Model

In [4]:
inverted_model = tf.keras.models.Sequential(name='CNN_Decreasing_Filters')

inverted_model.add(Layer.Conv2D(80, (3, 3), padding='same', activation='relu')) 
inverted_model.add(Layer.Conv2D(75, (3, 3), padding='same', activation='relu'))
inverted_model.add(Layer.Conv2D(70, (3, 3),padding='same', activation='relu'))
inverted_model.add(Layer.MaxPooling2D(pool_size=(2, 2), strides=(2,2)))

inverted_model.add(Layer.Conv2D(55, (3, 3), padding='same', activation='relu'))
inverted_model.add(Layer.Conv2D(50, (3, 3), padding='same', activation='relu'))
inverted_model.add(Layer.Conv2D(45, (3, 3),padding='same', activation='relu'))
inverted_model.add(Layer.MaxPooling2D(pool_size=(2, 2), strides=(2,2)))

inverted_model.add(Layer.Conv2D(25, (3, 3), padding='same', activation='relu'))
inverted_model.add(Layer.Conv2D(20, (3, 3), padding='same', activation='relu'))
inverted_model.add(Layer.Conv2D(15, (3, 3), padding='same', activation='relu'))
inverted_model.add(Layer.MaxPooling2D(pool_size=(2, 2), strides=(2,2)))

inverted_model.add(Layer.Conv2D(10, (3, 3), padding='same', activation='relu'))

inverted_model.add(Layer.Flatten())
inverted_model.add(Layer.Dense(100))
inverted_model.add(Layer.Activation('relu'))
inverted_model.add(Layer.Dense(10))
inverted_model.add(Layer.Activation('softmax'))

inverted_model.compile(loss='sparse_categorical_crossentropy', optimizer=tf.keras.optimizers.Adam(learning_rate=.001), metrics=['accuracy'])

inverted_model.build(input_shape=(1,28,28,1))


# Define the Keras TensorBoard callback.
logdir="logs/fit/" + datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = keras.callbacks.TensorBoard(log_dir=logdir)

# Train the model.
inverted_model.fit(
    x_train,
    train_labels, 
    batch_size=128,
    epochs=1,
    callbacks=[tensorboard_callback])

# Evaluate
score = inverted_model.evaluate(x_test, test_labels)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

Test loss: 6.480801582336426
Test accuracy: 0.9824000000953674


<br><br>
# Report For Inverted Model
<br>

# Summary

Number of models tested: 36
<br>

Best:<br>
- Learning Rate: .001
- Optimizer: Adam
- Batch Size: 128

<br>

- epochs: 5
- Test loss: 5.121163845062256
- Test accuracy: 0.9890999794006348


At first, I changed my hyperparameters manually and reran my models. I later discovered Keras Tuner, which made hyperparameter testing far more efficient.
I used the tuner for most of the hyperparameters, but I had to test the batch size manually with a for loop.

This network took far longer to train than the other models.

## Values Tuned With Keras Tuner

- learning rate
- optimizer
- batch size (tested manually with a for loop rather than by the tuner)

---

# Optimizing Learning Rate & Optimizer (Keras)

First, I ran models with various combinations of the following values:

## Parameters Tested
- Learning Rate: Choices: [.01, .001, .0001]
- Optimizer: Choices: [adam, SGD, RMSprop]

## Number of Models Tested
- 9

## Top Five Combinations
1. 
- optimizer: adam
- learning_rate: 0.01
- Score: 0.982699990272522
2. 
- optimizer: adam
- learning_rate: 0.001
- Score: 0.9800000190734863
3. 
- optimizer: adam
- learning_rate: 0.0001
- Score: 0.9796000123023987
4. 
- optimizer: RMSprop
- learning_rate: 0.01
- Score: 0.9779999852180481
5. 
- optimizer: RMSprop
- learning_rate: 0.0001
- Score: 0.9771999716758728

As the scores show, Adam was the best optimizer, and the tuner ranked a learning rate of .01 highest. In my own runs, however, a learning rate of .001 seemed to perform better.

--- 

# Optimizing Batch Size (For Loop)

I ran models with various batch size values:

## Parameters Tested
- Batch Size: Choices: [128, 512, 1024]

## Number of Models Tested
- 27

## Top Three Combinations
1. 
- batch_size: 128
- Score: 0.982699990272522
2. 
- batch_size: 512
- Score: 0.9668999910354614
3. 
- batch_size: 1024
- Score: 0.9431999921798706

As the scores show, the best batch size for this model is 128. Interestingly, the larger batch sizes seemed to perform better with a smaller learning rate of .0001, whereas the smaller batch size did better with .01.
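
The count of 27 models is consistent with crossing the three batch sizes with the nine learning-rate and optimizer combinations from the previous section. The sketch below assumes that is how the loop was organised (the report does not say so explicitly) and adds a hypothetical helper, `best_per_batch_size`, for reading off the best setting at each batch size.

```python
from itertools import product

batch_sizes = [128, 512, 1024]
learning_rates = [0.01, 0.001, 0.0001]
optimizers = ["adam", "SGD", "RMSprop"]

# Assumption: every batch size was paired with every learning-rate/optimizer combination.
runs = list(product(batch_sizes, learning_rates, optimizers))
print(len(runs))


def best_per_batch_size(results):
    """results maps (batch_size, learning_rate, optimizer) -> score.

    Returns {batch_size: (best_score, learning_rate, optimizer)}.
    """
    best = {}
    for (batch_size, lr, opt), score in results.items():
        if batch_size not in best or score > best[batch_size][0]:
            best[batch_size] = (score, lr, opt)
    return best
```

Grouping the results this way makes it easy to see whether the preferred learning rate shifts as the batch size grows.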
<br>
<br>

# Hourglass Model

In [5]:
hourglass_model = tf.keras.models.Sequential(name='CNN_HourGlass_Filters')

hourglass_model.add(Layer.Conv2D(30, (3, 3), padding='same', activation='relu')) 
hourglass_model.add(Layer.Conv2D(31, (3, 3), padding='same', activation='relu'))
hourglass_model.add(Layer.Conv2D(32, (3, 3),padding='same', activation='relu'))
hourglass_model.add(Layer.MaxPooling2D(pool_size=(2, 2), strides=(2,2)))

hourglass_model.add(Layer.Conv2D(60, (3, 3), padding='same', activation='relu'))
hourglass_model.add(Layer.Conv2D(65, (3, 3), padding='same', activation='relu'))
hourglass_model.add(Layer.Conv2D(60, (3, 3),padding='same', activation='relu'))
hourglass_model.add(Layer.MaxPooling2D(pool_size=(2, 2), strides=(2,2)))

hourglass_model.add(Layer.Conv2D(32, (3, 3), padding='same', activation='relu'))
hourglass_model.add(Layer.Conv2D(31, (3, 3), padding='same', activation='relu'))
hourglass_model.add(Layer.Conv2D(30, (3, 3), padding='same', activation='relu'))
hourglass_model.add(Layer.MaxPooling2D(pool_size=(2, 2), strides=(2,2)))

hourglass_model.add(Layer.Conv2D(25, (3, 3), padding='same', activation='relu'))

hourglass_model.add(Layer.Flatten())
hourglass_model.add(Layer.Dense(128))
hourglass_model.add(Layer.Activation('relu'))
hourglass_model.add(Layer.Dense(10))
hourglass_model.add(Layer.Activation('softmax'))

hourglass_model.compile(loss='sparse_categorical_crossentropy', optimizer=tf.keras.optimizers.Adam(learning_rate=.001), metrics=['accuracy'])

hourglass_model.build(input_shape=(1,28,28,1))

# Define the Keras TensorBoard callback.
logdir="logs/fit/" + datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = keras.callbacks.TensorBoard(log_dir=logdir)

# Train the model.
hourglass_model.fit(
    x_train,
    train_labels, 
    batch_size=512,
    epochs=1,
    callbacks=[tensorboard_callback])

# Evaluate
score = hourglass_model.evaluate(x_test, test_labels)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

Test loss: 17.370874404907227
Test accuracy: 0.9616000056266785


<br><br>
# Report For Hourglass Model
<br>

# Summary

Number of models tested: 36
<br>

Best:<br>
- Learning Rate: .001
- Optimizer: Adam
- Batch Size: 128

<br>

- epochs: 5
- Test loss: 6.594151496887207
- Test accuracy: 0.9869999885559082


At first, I changed my hyperparameters manually and reran my models. I later discovered Keras Tuner, which made hyperparameter testing far more efficient.
I used the tuner for most of the hyperparameters, but I had to test the batch size manually with a for loop.

This network took far longer to train compared to the other models.


## Values Tuned With Keras Tuner

- learning rate
- optimizer
- batch size (tested manually with a for loop rather than by the tuner)

---

# Optimizing Learning Rate & Optimizer (Keras)

First, I ran models with various combinations of the following values:

## Parameters Tested
- Learning Rate: Choices: [.01, .001, .0001]
- Optimizer: Choices: [adam, SGD, RMSprop]

## Number of Models Tested
- 9

## Top Five Combinations
1. 
- optimizer: adam
- learning_rate: 0.001
- Score: 0.9689000248908997
2. 
- optimizer: adam
- learning_rate: 0.01
- Score: 0.965399980545044
3. 
- optimizer: adam
- learning_rate: 0.0001
- Score: 0.961899995803833
4. 
- optimizer: RMSprop
- learning_rate: 0.0001
- Score: 0.9186999797821045
5. 
- optimizer: RMSprop
- learning_rate: 0.01
- Score: 0.8939999938011169

As the scores show, Adam with a learning rate of .001 was the best combination. This is the only model for which the tuner ranked .001 above .01.

---

# Optimizing Batch Size (For Loop)

I ran models with various batch size values:

## Parameters Tested
- Batch Size: Choices: [128, 512, 1024]

## Number of Models Tested
- 27

## Top Three Combinations
1. 
- batch_size: 128
- Score: 0.984499990940094
2. 
- batch_size: 512
- Score: 0.9689000248908997
3. 
- batch_size: 1024
- Score: 0.943799972534179

As the scores show, the best batch size for this model is 128. Interestingly, the larger batch sizes seemed to perform better with a smaller learning rate of .0001, whereas the smaller batch size did better with .01.
<br>
<br>

<br><br>
# Final Summary

- Each model reached roughly 98% test accuracy after the 5th epoch. This suggests that, for this problem, the exact number of filters and hidden neurons matters little to the final accuracy.
- The main difference between the models was training speed. The hourglass model seemed to train the fastest for me, which may be due to its number of trainable parameters.
- Despite the Keras Tuner results, the best learning rate for each model proved to be .001.
- Each model favored the Adam optimizer, with RMSprop second (a close second for the inverted model). These two optimizers greatly outperformed stochastic gradient descent.
- Lastly, each model seemed to achieve higher accuracy with a smaller batch size, though smaller batch sizes drastically increase training time. I think this is because smaller batches mean more batches per epoch, which gives the network more weight updates through backprop.
- I think training favored the regular and hourglass models because the network does not perform as well when it starts with a larger number of filters.
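
The remark about trainable parameters can be checked by hand. The sketch below counts parameters for the three architectures from the filter lists in the model cells, assuming 3x3 kernels with 'same' padding, 2x2 max-pooling after the 3rd, 6th and 9th convolution, and a 28x28x1 input. `count_parameters` is a hypothetical helper; `model.summary()` would be the authoritative source.

```python
def count_parameters(conv_filters, dense_units, input_size=28, in_channels=1,
                     kernel=3, pool_after=(3, 6, 9), num_classes=10):
    """Trainable parameters of a stack of 'same'-padded convs, 2x2 max-pools,
    one hidden dense layer and a softmax output layer."""
    total, size, channels = 0, input_size, in_channels
    for index, filters in enumerate(conv_filters, start=1):
        total += (kernel * kernel * channels + 1) * filters  # weights + biases
        channels = filters
        if index in pool_after:
            size //= 2  # 2x2 pooling with stride 2 halves each spatial side
    flat = size * size * channels                # units after Flatten
    total += (flat + 1) * dense_units            # hidden dense layer
    total += (dense_units + 1) * num_classes     # output layer
    return total


models = {
    "regular": ([8, 9, 10, 20, 21, 22, 44, 45, 46, 160], 80),
    "inverted": ([80, 75, 70, 55, 50, 45, 25, 20, 15, 10], 100),
    "hourglass": ([30, 31, 32, 60, 65, 60, 32, 31, 30, 25], 128),
}
for name, (filters, dense) in models.items():
    print(name, count_parameters(filters, dense))
```

By this count the hourglass model has the fewest parameters of the three. Parameter count is only a rough proxy for training time, though, since the early layers operate on the largest feature maps and so dominate the compute.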