<table align="center">
   <td align="center"><a target="_blank" href="https://colab.research.google.com/github/umbcdata602/fall2020/blob/master/lab_mnist_keras.ipynb">
<img src="https://www.tensorflow.org/images/colab_logo_32px.png"  style="padding-bottom:5px;" />Run in Google Colab</a></td>
</table>

# Lab: MNIST keras

Classification of MNIST digits with Keras

### References

* Raschka's [ch13_part1.ipynb](https://github.com/rasbt/python-machine-learning-book-3rd-edition/blob/master/ch13/ch13_part1.ipynb) -- github
    * loads mnist (fashion and digits) as tensorflow datasets
* Raschka's [ch13_part2.ipynb](https://github.com/rasbt/python-machine-learning-book-3rd-edition/tree/master/ch13) -- github
    * multilayer NN to classify the Iris (tensorflow) dataset
* Tensorflow example: [Training a neural network on MNIST with Keras](https://www.tensorflow.org/datasets/keras_example) -- tensorflow.org
    * uses tensorflow datasets API to classify MNIST with 1 hidden layer
* Tensorflow example: [Basic Classification: Classifying images of clothing](https://www.tensorflow.org/tutorials/keras/classification) -- tensorflow.org
    * fashion mnist and mnist digits as tensorflow datasets
    * includes plots of model predictions (logits)
    * shows logits for correctly and incorrectly classified images

In [None]:
import tensorflow as tf
import tensorflow_datasets as tfds
import numpy as np
import matplotlib.pyplot as plt

# Load MNIST


In [None]:
# Ref: https://www.tensorflow.org/datasets/keras_example
(ds_train, ds_test), ds_info = tfds.load(
    'mnist',
    split=['train', 'test'],
    shuffle_files=True,
    as_supervised=True, # returns tuple instead of dict
    with_info=True,
)

In [None]:
# Ref: https://www.tensorflow.org/datasets/keras_example
def normalize_img(image, label):
  """Normalizes images: `uint8` -> `float32`."""
  return tf.cast(image, tf.float32) / 255., label

ds_train = ds_train.map(
    normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
ds_train = ds_train.cache()
ds_train = ds_train.shuffle(ds_info.splits['train'].num_examples)
ds_train = ds_train.batch(128)
ds_train = ds_train.prefetch(tf.data.experimental.AUTOTUNE)

ds_test = ds_test.map(
    normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
ds_test = ds_test.batch(128)
ds_test = ds_test.cache()
ds_test = ds_test.prefetch(tf.data.experimental.AUTOTUNE)
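
As a quick sanity check on what the pipeline above does, here is a minimal NumPy sketch of the same normalization and of the batch arithmetic. It is not part of the original lab: `fake_images` is a random stand-in for MNIST (an assumption, so the cell runs without downloading anything), and 60,000 is the size of the standard MNIST training split.

```python
import math
import numpy as np

# Hypothetical stand-in for a few MNIST images: uint8 pixels in [0, 255]
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28, 1), dtype=np.uint8)

# Same arithmetic as normalize_img: cast to float32, then scale to [0, 1]
normalized = fake_images.astype(np.float32) / 255.0
print(normalized.dtype, normalized.min(), normalized.max())

# With batch(128), one epoch over the training split takes
# ceil(num_train / batch_size) steps; the last batch is smaller.
num_train = 60_000
batch_size = 128
steps_per_epoch = math.ceil(num_train / batch_size)
last_batch = num_train - (steps_per_epoch - 1) * batch_size
print(steps_per_epoch, last_batch)
```

The step count is what Keras reports in the progress bar of each epoch when training on `ds_train`.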

# Build and train the model

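* No activation on the output layer when `from_logits=True` (the loss applies softmax to the logits internally)
* Default `batch_size=32` in `fit` applies to array inputs; here `ds_train` is already batched with `batch(128)`
* References
    * [tf.keras.layers.Dense](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense) API docs
        * Default: `activation=None`
    * [tf.keras.Sequential](https://www.tensorflow.org/api_docs/python/tf/keras/Sequential) API docs
        * Default for `fit`: `batch_size=32` (leave unset when passing a batched dataset)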
    * [tf.keras.losses.SparseCategoricalCrossentropy](https://www.tensorflow.org/api_docs/python/tf/keras/losses/SparseCategoricalCrossentropy) API docs
        * Default: `y_pred` encodes a probability distribution (`from_logits=False`)
        * Note: Using `from_logits=True` may be more numerically stable
        * This loss function expects labels to be integers
        * With one-hot-encoding, use [CategoricalCrossentropy](https://www.tensorflow.org/api_docs/python/tf/keras/losses/CategoricalCrossentropy) instead
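
The points about `from_logits=True`, integer labels, and one-hot labels can be illustrated with a small NumPy sketch. This is not how Keras implements its losses; the helper names (`log_softmax`, `sparse_cross_entropy`, `categorical_cross_entropy`) and the example logits are made up for illustration. It shows that the integer-label and one-hot forms give the same loss, and that working from logits with a log-sum-exp stays finite even for extreme values.

```python
import numpy as np

def log_softmax(logits):
    """Row-wise log-softmax; subtracting the row max avoids overflow in exp."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def sparse_cross_entropy(labels, logits):
    """Mean cross-entropy for integer class labels, computed from logits."""
    log_probs = log_softmax(logits)
    return -log_probs[np.arange(len(labels)), labels].mean()

def categorical_cross_entropy(one_hot, logits):
    """Mean cross-entropy for one-hot labels, computed from logits."""
    return -(one_hot * log_softmax(logits)).sum(axis=-1).mean()

# Hypothetical logits for 2 samples and 3 classes
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
labels = np.array([0, 1])        # integer labels
one_hot = np.eye(3)[labels]      # the same labels, one-hot encoded

sparse_loss = sparse_cross_entropy(labels, logits)
onehot_loss = categorical_cross_entropy(one_hot, logits)
print(sparse_loss, onehot_loss)

# Extreme logits: the loss computed from logits remains finite
big_logits = np.array([[1000.0, 0.0, -1000.0]])
big_loss = sparse_cross_entropy(np.array([0]), big_logits)
print(big_loss)
```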

In [None]:
# Build the model
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28,28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10)  # no activation: the outputs are logits
])

# Compile the model
# No optimizer is specified, so Keras uses its default ("rmsprop")
model.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"]
)

In [None]:
# Each epoch takes about 1 second with GPU (2 seconds with CPU)
history = model.fit(
    ds_train,
    epochs=6,
    validation_data=ds_test,
)

Epoch 1/6
Epoch 2/6
Epoch 3/6
Epoch 4/6
Epoch 5/6
Epoch 6/6


# Softmax activation

* What does the model actually predict?
* By default, the loss function uses `from_logits=False`, i.e., it expects the model to output probabilities
    * [tf.keras.losses.sparse_categorical_crossentropy](https://www.tensorflow.org/api_docs/python/tf/keras/losses/sparse_categorical_crossentropy) API docs -- tensorflow.org
* By default, layers use `activation=None`
    * [tf.keras.layers.Dense](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense) API docs -- tensorflow.org
* For multinomial classification
    * Loss function: categorical cross entropy
    * Use `activation="softmax"` to get probabilities
* Tensorflow example: [Basic Classification: Classifying images of clothing](https://www.tensorflow.org/tutorials/keras/classification) -- tensorflow.org
* Trainable parameters
    * Review: Perceptron & Adaline
    * Figure reference: Chapter 2 of Raschka

<img src="https://github.com/rasbt/python-machine-learning-book-3rd-edition/raw/master/ch02/images/02_09.png" width="600"/>
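
To make the logits-versus-probabilities distinction concrete, here is a minimal NumPy sketch of softmax. The `softmax` helper and the logit values are assumptions for illustration, not output from the trained model. It shows that softmax turns each row of logits into a probability distribution without changing which class is predicted.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

# Hypothetical logits for 2 images and 10 digit classes
logits = np.array([
    [-3.1, 0.2, 1.5, 7.8, -2.0, 0.9, -4.4, 0.1, 1.1, 0.3],
    [4.2, -1.0, 0.5, 0.0, -0.7, 1.3, 3.9, -2.2, 0.8, -0.1],
])

probs = softmax(logits)
print(probs.sum(axis=-1))       # each row sums to 1, up to rounding
print(logits.argmax(axis=-1))   # predicted class taken from the logits
print(probs.argmax(axis=-1))    # same predicted class from the probabilities
```

For the model trained above, which outputs logits, one option is to apply `tf.nn.softmax` to the result of `model.predict`, or to wrap the model with a `tf.keras.layers.Softmax()` layer as the TensorFlow classification tutorial does.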

# Trainable parameters

With no hidden layers (i.e., `Flatten` followed directly by `Dense(10)`, without the `Dense(128)` layer used above)...
* Flatten has no trainable parameters
* Dense(10): 
    * Each unit has 784 weights (1 per pixel)
    * Each unit has 1 bias
    * 10 * (784 + 1) = 7850
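
The count above can be checked with a few lines of plain Python. The helper `dense_params` is a made-up name for illustration. The sketch also applies the same formula to the model with one hidden layer of 128 units defined earlier, as an extra worked case.

```python
def dense_params(n_inputs, n_units):
    """Trainable parameters of a dense layer: n_inputs weights plus 1 bias per unit."""
    return n_units * (n_inputs + 1)

n_pixels = 28 * 28  # Flatten turns each 28x28 image into 784 inputs

# No hidden layer: Flatten -> Dense(10)
no_hidden = dense_params(n_pixels, 10)

# One hidden layer: Flatten -> Dense(128) -> Dense(10)
hidden = dense_params(n_pixels, 128)
output = dense_params(128, 10)
with_hidden = hidden + output

print(no_hidden)                    # 7850
print(hidden, output, with_hidden)  # 100480 1290 101770
```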

In [None]:
model.summary()

Model: "sequential_1"
_________________________________________________________________
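Layer (type)                 Output Shape              Param #   
=================================================================
flatten_1 (Flatten)          (None, 784)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 10)                7850      
=================================================================
Total params: 7,850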
Trainable params: 7,850
Non-trainable params: 0
_________________________________________________________________
