# MNIST TensorFlow Quick Intro

_Copied from [TensorFlow Tutorials](https://www.tensorflow.org/tutorials/quickstart/beginner)_

A quick introduction to TensorFlow using the [MNIST dataset](http://yann.lecun.com/exdb/mnist/). First, import `tensorflow`, load the data, and convert the samples from integers to floating-point numbers:

In [None]:
import tensorflow as tf
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import ImageGrid

mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Preprocessing: scale pixel values from integers in [0, 255] to floats in [0.0, 1.0]
x_train, x_test = x_train / 255.0, x_test / 255.0

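The MNIST dataset is a collection of 28x28 grayscale images of handwritten digits (60,000 for training and 10,000 for testing), each labeled with the digit it shows. It is commonly used to train models that classify handwritten digits. The first 16 training samples and their labels look like this: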

In [None]:
# Add a trailing channel axis so each image has shape (28, 28, 1)
x_3d = x_train[..., tf.newaxis]
num_samples = 16
# Collect the first num_samples images for plotting
im_list = [x_3d[i] for i in range(num_samples)]

print("Expected digits:")
print(y_train[0:num_samples])
print("Handwritten digits:")

fig = plt.figure(figsize=(4., 4.))
grid = ImageGrid(fig, 111, # similar to subplot(111)
                 nrows_ncols=(4, 4), # creates 4x4 grid of axes
                 axes_pad=0.1, # pad between axes in inches
                )
for ax, im in zip(grid, im_list):
    # Drop the channel axis again and draw the image in grayscale
    ax.imshow(im[:, :, 0], cmap='gray')
plt.show()

Build a Keras `Sequential` model by stacking layers. The optimizer and loss function used for training are chosen in later cells:

In [None]:
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10)
])
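
As a quick sanity check on the architecture, the number of trainable parameters can be counted by hand. This is only a sketch: it assumes the usual dense-layer layout of one weight per input-output pair plus one bias per output unit, and the helper `dense_params` is a hypothetical name, not part of Keras. The total should match what `model.summary()` reports.

```python
# Hand count of trainable parameters for the layer sizes used above.
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

flat_size = 28 * 28                      # Flatten: 28x28 image -> 784 values, no parameters
hidden = dense_params(flat_size, 128)    # Dense(128, activation='relu')
output = dense_params(128, 10)           # Dense(10): one logit per digit class
total = hidden + output                  # Dropout adds no parameters

print(hidden, output, total)
```

Almost all of the parameters sit in the first dense layer, because it connects every one of the 784 pixels to every hidden unit.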

For each example, the model returns a vector of "logits" (or "log-odds") scores, one per class:

In [None]:
predictions = model(x_train[:1]).numpy()
predictions

The `tf.nn.softmax` function converts these logits to "probabilities" for each class.

**NOTE:** It is possible to make `softmax` the activation function of the last layer of the network, but this is discouraged because it's impossible to provide an exact and numerically stable loss calculation for all models in that case.

In [None]:
tf.nn.softmax(predictions).numpy()
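
To make the conversion concrete, here is a minimal pure-Python sketch of softmax. It is an illustration only, not TensorFlow's implementation, and `example_logits` holds made-up values rather than real model output.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    # Subtracting the maximum leaves the result unchanged
    # but keeps exp() from overflowing on large logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

example_logits = [2.0, 1.0, 0.1]  # made-up values for illustration
probs = softmax(example_logits)
print(probs, sum(probs))
```

The largest logit gets the largest probability, and only the differences between logits matter: adding the same constant to every logit gives the same output.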

`losses.SparseCategoricalCrossentropy` takes a vector of logits and the index of the true class and returns a scalar loss for each example. The loss is equal to the negative log probability of the true class, so it is zero if the model is sure of the correct class.

The untrained model gives probabilities close to random (about 0.1 for each of the 10 classes), so the initial loss should be close to $$-\log(0.1) \approx 2.3$$

In [None]:
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
loss_fn(y_train[:1], predictions).numpy()
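
The same quantity can be worked out by hand for a single example. The sketch below is a pure-Python illustration, not the Keras implementation; `sparse_ce_from_logits` is a hypothetical helper and the logits are made up. It computes the loss directly from logits with the log-sum-exp trick, which is one way to keep the calculation numerically stable.

```python
import math

def sparse_ce_from_logits(logits, true_class):
    """Negative log probability of the true class, computed from logits."""
    # log-sum-exp trick: log(sum(exp(z))) = m + log(sum(exp(z - m)))
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    # -log(softmax(logits)[true_class]) = log_sum_exp - logits[true_class]
    return log_sum_exp - logits[true_class]

# Equal logits give probability 0.1 per class, so the loss is -log(0.1).
uniform_logits = [0.0] * 10
loss_uniform = sparse_ce_from_logits(uniform_logits, 3)

# A model that strongly favors the correct class has a loss near zero.
confident_logits = [0.0] * 10
confident_logits[3] = 20.0
loss_confident = sparse_ce_from_logits(confident_logits, 3)

print(loss_uniform, loss_confident)
```

Working from logits avoids taking the log of a probability that has already been rounded to zero, which is the stability problem that arises when softmax is applied before the loss.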

Compile the model with the Adam optimizer and the loss function defined above, then fit it to minimize the loss over 5 epochs:

In [None]:
model.compile(optimizer='adam',
              loss=loss_fn,
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)

The `evaluate` method checks the model's performance, usually on a validation or test set, and reports the trained model's loss and accuracy on that data:

In [None]:
model.evaluate(x_test, y_test, verbose=2)
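
As a rough illustration of what the accuracy figure measures, the sketch below computes it by hand: the predicted class is the index of the largest logit, and accuracy is the fraction of examples where that index equals the label. The helper `accuracy` and the toy logits and labels are hypothetical and are not taken from the trained model.

```python
def accuracy(logit_rows, labels):
    """Fraction of examples whose highest-scoring class equals the label."""
    correct = 0
    for logits, label in zip(logit_rows, labels):
        # argmax: index of the largest logit
        predicted = max(range(len(logits)), key=logits.__getitem__)
        correct += predicted == label
    return correct / len(labels)

# Made-up logits for four examples with three classes each.
toy_logits = [
    [2.0, 0.1, -1.0],
    [0.3, 0.2, 1.5],
    [0.0, 3.0, 0.5],
    [1.0, 0.9, 0.8],
]
toy_labels = [0, 2, 1, 2]

print(accuracy(toy_logits, toy_labels))
```

Softmax is not needed here, since it preserves the ordering of the logits and therefore does not change which class scores highest.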