<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Let's-explore-how-different-optimizers-affect-performance" data-toc-modified-id="Let's-explore-how-different-optimizers-affect-performance-1">Let's explore how different optimizers affect performance</a></span></li><li><span><a href="#Prepare-data" data-toc-modified-id="Prepare-data-2">Prepare data</a></span></li><li><span><a href="#Define-architecture" data-toc-modified-id="Define-architecture-3">Define architecture</a></span></li><li><span><a href="#Stochastic-gradient-descent-(SGD)-in-Keras" data-toc-modified-id="Stochastic-gradient-descent-(SGD)-in-Keras-4">Stochastic gradient descent (SGD) in Keras</a></span></li><li><span><a href="#Train-model" data-toc-modified-id="Train-model-5">Train model</a></span></li><li><span><a href="#Reflection-questions-to-answer" data-toc-modified-id="Reflection-questions-to-answer-6">Reflection questions to answer</a></span></li><li><span><a href="#Sources-of-Inspiration" data-toc-modified-id="Sources-of-Inspiration-7">Sources of Inspiration</a></span></li></ul></div>

<center><h2>Let's explore how different optimizers affect performance</h2></center>

![](http://myselph.de/mnistExamples.png)

In [1]:
reset -fs

In [2]:
from tensorflow import keras 

----
Prepare data
----

In [3]:
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
image_size = 784 # 28 x 28 pixels

x_train = x_train.reshape(x_train.shape[0], image_size) # Transform from matrix to vector
x_train = x_train.astype('float32')
x_train /= 255 # Normalize inputs from 0-255 to 0.0-1.0

x_test = x_test.reshape(x_test.shape[0], image_size) # Transform from matrix to vector
x_test = x_test.astype('float32')
x_test /= 255 # Normalize inputs from 0-255 to 0.0-1.0

print(f'Number of train examples: {x_train.shape[0]:,}')
print(f'Number of test examples:  {x_test.shape[0]:,}')

# Convert class vectors to binary class matrices
num_classes = 10
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
Number of train examples: 60,000
Number of test examples:  10,000
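
To make the label conversion concrete, here is a minimal pure-Python sketch of what one-hot encoding produces. The helper `one_hot` is hypothetical and only mirrors the idea behind `keras.utils.to_categorical`; it is not the Keras implementation.

```python
def one_hot(labels, num_classes):
    """Return one row per label, with a 1.0 at the label's index (hypothetical helper)."""
    rows = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0
        rows.append(row)
    return rows

# Digits 3 and 0 become 10-element vectors with a single 1.0
encoded = one_hot([3, 0], num_classes=10)
print(encoded[0])
```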


----
Define architecture
-----

![](images/MNIST_neuralnet_image.png)

In [4]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation, Dense, Dropout

In [5]:
# Define straightforward MLP model (Do not change)
model = Sequential()
model.add(Dense(128, input_dim=image_size))
model.add(Activation('relu'))
model.add(Dropout(0.15))
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dropout(0.15))
model.add(Dense(num_classes))
model.add(Activation('softmax'))
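
The final `softmax` activation turns the 10 raw outputs into probabilities that sum to 1. A minimal sketch in plain Python, assuming a single example and a made-up list of three scores:

```python
import math

def softmax(scores):
    """Numerically stable softmax for one list of raw scores."""
    shift = max(scores)  # Subtracting the max avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])  # Made-up scores for illustration
print(probs)
```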

In [6]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 128)               100480    
_________________________________________________________________
activation (Activation)      (None, 128)               0         
_________________________________________________________________
dropout (Dropout)            (None, 128)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 128)               16512     
_________________________________________________________________
activation_1 (Activation)    (None, 128)               0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 10)                1
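
The parameter counts in the summary follow from the layer sizes: a `Dense` layer has one weight per input-output pair plus one bias per output unit. A quick check using only the sizes defined in the model above (`dense_params` is a hypothetical helper):

```python
def dense_params(n_inputs, n_outputs):
    """Weights (n_inputs * n_outputs) plus one bias per output unit."""
    return n_inputs * n_outputs + n_outputs

layer_sizes = [(784, 128), (128, 128), (128, 10)]  # (inputs, outputs) per Dense layer
counts = [dense_params(i, o) for i, o in layer_sizes]
print(counts, sum(counts))
```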

In [7]:
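# Plain assignment would make all three names point at the same model object,
# so build independent copies (same architecture, freshly initialized weights)
model_1 = keras.models.clone_model(model)
model_2 = keras.models.clone_model(model)
model_3 = keras.models.clone_model(model)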
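
One Python detail matters when keeping several models around for comparison: plain assignment binds another name to the same object rather than making a copy. A small sketch with a stand-in class (`TinyModel` is hypothetical and only illustrates the name-binding behavior):

```python
import copy

class TinyModel:
    """Stand-in for a model that holds mutable weights (hypothetical)."""
    def __init__(self):
        self.weights = [0.0, 0.0]

base = TinyModel()
alias = base                  # Same object under a second name
clone = copy.deepcopy(base)   # Independent copy with its own weights

base.weights[0] = 1.0         # Simulate a training update on `base`
print(alias.weights[0], clone.weights[0])
```

For Keras models, `keras.models.clone_model` builds a new model with the same architecture and freshly initialized weights.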

Stochastic gradient descent (SGD) in Keras
------

The Keras SGD optimizer includes support for momentum and learning rate decay.

```
Arguments

learning_rate: float >= 0. Learning rate (named `lr` in older Keras releases).
momentum: float >= 0. Parameter updates momentum.
decay: float >= 0. Learning rate decay over each update.
```
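
As a rough illustration of how these three arguments interact, the sketch below applies SGD with momentum and time-based decay to a one-dimensional quadratic. It assumes the classic formulation (a velocity update plus `lr / (1 + decay * step)`); the exact update inside a given Keras version may differ, and `sgd_minimize` is a hypothetical helper.

```python
def sgd_minimize(grad_fn, w, lr=0.04, momentum=0.9, decay=1e-6, steps=200):
    """Minimize a 1-D function with momentum SGD and time-based learning rate decay."""
    velocity = 0.0
    for step in range(steps):
        lr_t = lr / (1.0 + decay * step)                    # Decayed learning rate
        velocity = momentum * velocity - lr_t * grad_fn(w)  # Accumulate past gradients
        w += velocity                                       # Move along the velocity
    return w

# f(w) = (w - 3)^2 has gradient 2 * (w - 3) and its minimum at w = 3
w_final = sgd_minimize(lambda w: 2.0 * (w - 3.0), w=0.0)
print(round(w_final, 3))
```

With `momentum=0.0` the velocity is just the current scaled gradient, which recovers plain gradient descent.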

In [8]:
from tensorflow.keras.optimizers import SGD

In [34]:
# TODO: Tune SGD to improve fit
# How quickly does it reach good asymptotic performance?
optimizer = SGD(
                learning_rate=0.04, # How about double?
                momentum=0.9,       # Try 0.9
                decay=1e-6,         # Try 1e-6
               )

# Test loss: 0.12
# Test accuracy: 96.29%

In [None]:
# # TODO: After you experiment with SGD, try other optimizers and see what happens

In [41]:
from tensorflow.keras.optimizers import RMSprop, Adam

In [None]:
# # Read the docs
# RMSprop?
# Adam?

In [110]:
optimizer = RMSprop(
    learning_rate=0.001,
    rho=0.9,
    momentum=0.0,
    epsilon=1e-07,
    centered=False,
    name="RMSprop",
)

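# Test loss: 0.14
# Test accuracy: 97.47%

# Adam is assigned last in this cell, so it replaces RMSprop as the optimizer passed to compile()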
optimizer = Adam(
    learning_rate=0.001,
    beta_1=0.99,
    beta_2=0.9999,
    epsilon=1e-09,
    amsgrad=False,
    name="Adam",
)

# Test loss: 0.1
# Test accuracy: 97.64%
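
For intuition about `beta_1`, `beta_2`, and `epsilon`, here is a minimal sketch of the Adam update for a single scalar parameter, following the formulation in the original Adam paper. It is an illustration under that assumption, not the Keras implementation. `adam_minimize` is a hypothetical helper, and the call uses a learning rate of 0.1 (larger than the 0.001 used in this notebook) only so that the toy problem converges in few steps.

```python
import math

def adam_minimize(grad_fn, w, lr=0.001, beta_1=0.9, beta_2=0.999,
                  epsilon=1e-7, steps=1000):
    """Minimize a 1-D function with bias-corrected Adam updates."""
    m = 0.0  # Running mean of gradients (first moment)
    v = 0.0  # Running mean of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta_1 * m + (1.0 - beta_1) * g
        v = beta_2 * v + (1.0 - beta_2) * g * g
        m_hat = m / (1.0 - beta_1 ** t)  # Correct the bias toward zero at early steps
        v_hat = v / (1.0 - beta_2 ** t)
        # epsilon guards against division by zero when v_hat is tiny
        w -= lr * m_hat / (math.sqrt(v_hat) + epsilon)
    return w

# f(w) = (w - 3)^2 has gradient 2 * (w - 3) and its minimum at w = 3
w_final = adam_minimize(lambda w: 2.0 * (w - 3.0), w=0.0, lr=0.1)
print(round(w_final, 3))
```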

In [111]:
model.compile(loss='categorical_crossentropy',
              optimizer=optimizer,
              metrics=['accuracy'])
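
`categorical_crossentropy` compares the predicted probability vector with the one-hot label. For a single example it reduces to the negative log of the probability assigned to the true class. A minimal sketch with made-up probabilities (the clipping constant `eps` is an assumption for numerical safety):

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy for one example; y_true is one-hot, y_pred sums to 1."""
    # Clip predictions away from 0 so log() stays finite
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

y_true = [0.0, 1.0, 0.0]
confident = categorical_crossentropy(y_true, [0.05, 0.90, 0.05])
unsure = categorical_crossentropy(y_true, [0.40, 0.30, 0.30])
print(confident, unsure)
```

A confident correct prediction yields a smaller loss than a hesitant one, which is the signal the optimizer follows.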

----
Train model
-----

In [112]:
batch_size = 32 # Number of samples per gradient update
epochs = 1     # Number of passes over complete dataset

# General guidelines:
# Try 1 epoch to make sure model works
# Try 5 epochs to make sure your model is learning
# Then try a bunch, like 10 or more for this dataset 
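
To get a feel for what these settings mean in practice, the number of weight updates per epoch is the training set size divided by the batch size, rounded up. A small sketch using the 60,000 training examples reported when the data was loaded:

```python
import math

num_train = 60_000  # Training examples reported when the data was loaded
batch_size = 32
epochs = 1

steps_per_epoch = math.ceil(num_train / batch_size)  # Last batch may be smaller
total_updates = steps_per_epoch * epochs
print(steps_per_epoch, total_updates)
```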

In [113]:
history = model.fit(x_train, 
                    y_train,
                    batch_size=batch_size, 
                    epochs=epochs,
                    verbose=True)



__Now we wait...__

![](images/waiting.png)

In [114]:
loss, accuracy = model.evaluate(x_test, 
                               y_test, 
                               verbose=False)

In [115]:
print(f"Test loss: {loss:.2}")
print(f"Test accuracy: {accuracy:.2%}")

Test loss: 0.11
Test accuracy: 97.59%
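
The format specs in the print statements explain the shape of the numbers above: `.2` without a type keeps two significant digits, while `.2%` multiplies by 100 and keeps two decimal places. A small sketch with made-up values:

```python
loss_text = f"{0.1123:.2}"       # Two significant digits
short_text = f"{0.104:.2}"       # Trailing zero is dropped
accuracy_text = f"{0.9759:.2%}"  # Percentage with two decimals
print(loss_text, short_text, accuracy_text)
```

This is why a loss close to 0.10 is printed as `0.1`.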


<center><h2>Reflection questions to answer</h2></center>

- Which optimizer and optimizer hyperparameters appear to be the best?
- Does the Keras API make sense to you?

Adam was the best optimizer in these runs (97.64% test accuracy, versus 97.47% for RMSprop and 96.29% for SGD). The learning rate and momentum make sense to me, but I'm not sure about epsilon or centered.
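
On `epsilon` and `centered`, a rough sketch of one common RMSprop formulation may help. `epsilon` is a small constant that keeps the division stable when the running average of squared gradients is near zero, and `centered=True` subtracts the squared running mean of the gradient so that the denominator estimates the gradient's variance. The exact placement of `epsilon` differs between implementations, so treat this as an illustration rather than the Keras code; `rmsprop_step` is a hypothetical helper.

```python
import math

def rmsprop_step(w, g, state, lr=0.001, rho=0.9, epsilon=1e-7, centered=False):
    """Apply one RMSprop update to scalar w given gradient g; state holds running averages."""
    state["sq"] = rho * state["sq"] + (1.0 - rho) * g * g  # Running mean of g^2
    denom = state["sq"]
    if centered:
        state["mean"] = rho * state["mean"] + (1.0 - rho) * g  # Running mean of g
        denom -= state["mean"] ** 2                            # Variance estimate
    return w - lr * g / math.sqrt(denom + epsilon)

state = {"sq": 0.0, "mean": 0.0}
w = rmsprop_step(1.0, g=0.5, state=state)
w_zero = rmsprop_step(w, g=0.0, state=state)  # Zero gradient leaves w unchanged
print(w, w_zero)
```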

<center><h2>Sources of Inspiration</h2></center>

- https://www.kaggle.com/fchollet/simple-deep-mlp-with-keras
- https://keras.io/api/optimizers/rmsprop/
- https://keras.io/api/optimizers/adam/
- https://towardsdatascience.com/learning-rate-schedules-and-adaptive-learning-rate-methods-for-deep-learning-2c8f433990d1


