# MNIST TensorFlow Quick Intro

_Copied from [TensorFlow Tutorials](https://www.tensorflow.org/tutorials/quickstart/beginner)_

A quick introduction to TensorFlow (TF) using the [MNIST dataset](http://yann.lecun.com/exdb/mnist/) of handwritten digits. First, import `tensorflow`, load the dataset, and convert the samples from integers to floating-point numbers:

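```python
import tensorflow as tf

mnist = tf.keras.datasets.mnist

# Load the 60,000 training and 10,000 test images (28x28 uint8 arrays) and their labels
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Scale pixel values from integers in [0, 255] to floats in [0.0, 1.0]
x_train, x_test = x_train / 255.0, x_test / 255.0
```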
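Dividing by `255.0` is what performs the integer-to-float conversion: MNIST pixels are 8-bit values from 0 to 255, so every scaled value lies in [0.0, 1.0]. A framework-free sketch of the same scaling on a plain Python list (the `normalize` helper and the pixel values are hypothetical, for illustration only):

```python
def normalize(pixels):
    """Hypothetical helper: scale 0-255 integer pixel values to floats in [0.0, 1.0]."""
    return [p / 255.0 for p in pixels]

# Hypothetical pixel values, not taken from MNIST
row = [0, 51, 128, 255]
print(normalize(row))  # black (0) -> 0.0, white (255) -> 1.0
```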

Build a Keras `Sequential` model by stacking layers (the optimizer and loss function for training are chosen later, when the model is compiled):

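```python
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),  # 28x28 image -> 784-element vector
    tf.keras.layers.Dense(128, activation='relu'),  # fully connected hidden layer
    tf.keras.layers.Dropout(0.2),                   # zero out 20% of inputs during training
    tf.keras.layers.Dense(10)                       # one logit per digit class (no softmax)
])
```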
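As a sanity check on the architecture, the number of trainable parameters can be worked out by hand: a `Dense` layer with `n_in` inputs and `n_out` units holds an `n_in x n_out` weight matrix plus `n_out` biases, while `Flatten` and `Dropout` have no parameters. A small sketch (the `dense_params` helper is hypothetical); the total should match what `model.summary()` reports for the model above:

```python
def dense_params(n_in, n_out):
    """Hypothetical helper: weights (n_in * n_out) plus biases (n_out) of a Dense layer."""
    return n_in * n_out + n_out

flat = 28 * 28                    # Flatten: 28x28 image -> 784 values, no parameters
hidden = dense_params(flat, 128)  # Dense(128): 784*128 + 128 = 100,480
output = dense_params(128, 10)    # Dense(10):  128*10 + 10 = 1,290
total = hidden + output           # Dropout adds no parameters
print(total)  # 101770
```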

For each example, the model returns a vector of "logits" (or "log-odds") scores, one per class:

```python
predictions = model(x_train[:1]).numpy()
predictions
```

```
array([[-0.31826448, -0.77816063, -0.1293759 ,  0.20483407, -0.18944708,
         0.02010749,  0.10319845,  0.42724004,  0.21663046, -0.17527917]],
      dtype=float32)
```
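These logits are raw, unnormalized scores: they can be negative and need not sum to 1, because the last `Dense(10)` layer has no activation. A framework-free sketch of a `Dense -> ReLU -> Dense` forward pass shows this; the shapes (3 inputs, 2 hidden units, 3 classes) and all weights are hypothetical and much smaller than the real 784 -> 128 -> 10 model, and `Dropout` is left out because it only acts during training:

```python
def dense(x, weights, bias):
    """Compute x @ weights + bias for one example; weights is an n_in x n_out nested list."""
    return [sum(x[i] * weights[i][j] for i in range(len(x))) + bias[j]
            for j in range(len(bias))]

def relu(v):
    """Element-wise max(0, a)."""
    return [max(0.0, a) for a in v]

# Hypothetical input and weights, chosen only for illustration
x = [0.5, 1.0, 0.0]
w1 = [[1.0, -1.0], [0.5, 0.5], [2.0, 0.0]]  # 3 x 2
b1 = [0.0, 0.1]
w2 = [[1.0, -1.0, 0.0], [0.0, 2.0, -1.0]]   # 2 x 3
b2 = [0.0, 0.0, 0.5]

logits = dense(relu(dense(x, w1, b1)), w2, b2)
print(logits)  # approximately [1.0, -0.8, 0.4]: one negative score, sum != 1
```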

The `tf.nn.softmax` function converts these logits to "probabilities" for each class (**NOTE:** while `softmax` could be made the activation function of the network's last layer, this is discouraged, because it then becomes impossible to provide an exact and numerically stable loss calculation for all models):

```python
tf.nn.softmax(predictions).numpy()
```

```
array([[0.07374999, 0.04656199, 0.08908308, 0.12443449, 0.08388931,
        0.10344631, 0.11240897, 0.15542842, 0.12591106, 0.08508631]],
      dtype=float32)
```
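Softmax exponentiates each logit and divides by the sum of the exponentials, so the outputs are positive and sum to 1. A plain-Python sketch of the standard numerically stable form, which subtracts the largest logit first (softmax is unchanged by adding a constant to every logit, and the shift keeps `exp` from overflowing). This is an illustration, not TensorFlow's implementation; applied to the logits printed above, it should reproduce the probabilities shown up to float32 rounding:

```python
import math

def softmax(logits):
    """Numerically stable softmax: shift by the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Logits copied from the model output above
logits = [-0.31826448, -0.77816063, -0.1293759, 0.20483407, -0.18944708,
          0.02010749, 0.10319845, 0.42724004, 0.21663046, -0.17527917]
probs = softmax(logits)
print(sum(probs))  # 1.0, up to floating-point rounding

# Without the max shift, math.exp(1000.0) would raise OverflowError
print(softmax([1000.0, 1000.0]))  # [0.5, 0.5]
```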

`losses.SparseCategoricalCrossentropy` takes the index of the true class and a vector of logits, and returns a scalar loss for each example (by default, calling it on a batch averages these per-example losses). The loss equals the negative log probability of the true class: it is zero if the model is certain of the correct class.

The untrained model gives probabilities close to uniform (about 0.1 for each of the 10 classes), so the initial loss should be close to $$-\log(0.1) = \log(10) \approx 2.3$$

```python
# from_logits=True: the model outputs raw logits, not softmax probabilities
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
loss_fn(y_train[:1], predictions).numpy()
```

```
2.2687025
```
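The same number can be reproduced without TensorFlow. Working from logits, the per-example loss $-\log \mathrm{softmax}(z)_y$ can be computed as $\log \sum_k e^{z_k} - z_y$, a standard numerically stable formulation (TensorFlow's internal implementation may differ in detail). The sketch below assumes the true label of the first training image is 5, which is the label of the first MNIST training digit and is consistent with the `2.2687025` printed above; the `sparse_ce_from_logits` name is hypothetical:

```python
import math

def sparse_ce_from_logits(logits, label):
    """Hypothetical helper: -log(softmax(logits)[label]) via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

# Uniform logits: every class gets probability 0.1, so the loss is log(10)
print(sparse_ce_from_logits([0.0] * 10, 3))  # approximately 2.302585

# Logits from the untrained model above, assuming true class 5
logits = [-0.31826448, -0.77816063, -0.1293759, 0.20483407, -0.18944708,
          0.02010749, 0.10319845, 0.42724004, 0.21663046, -0.17527917]
print(sparse_ce_from_logits(logits, 5))  # approximately 2.2687

# A confidently correct prediction gives a loss of (almost) zero
print(sparse_ce_from_logits([100.0, 0.0], 0))
```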

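Compile the model with the Adam optimizer, the `loss_fn` defined above, and accuracy as a metric. Then call `fit` to adjust the model's parameters and minimize the loss over 5 epochs: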

```python
model.compile(optimizer='adam',
              loss=loss_fn,
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)
```

```
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

<tensorflow.python.keras.callbacks.History at 0x13f270eb0>
```
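`fit` does not process all 60,000 training images at once: it splits them into mini-batches and performs one optimizer step per batch. Because no `batch_size` is passed, Keras uses its default of 32, so each epoch takes a fixed number of steps. The same arithmetic, applied to the 10,000 test images, gives the `313/313` batches reported by `evaluate`. A short sketch (the `steps_per_epoch` helper is hypothetical):

```python
import math

def steps_per_epoch(num_examples, batch_size=32):
    """Hypothetical helper: batches needed to cover a dataset once (last batch may be partial)."""
    return math.ceil(num_examples / batch_size)

print(steps_per_epoch(60_000))  # 1875 optimizer steps per training epoch
print(steps_per_epoch(10_000))  # 313 test batches (10,000 / 32 = 312.5, rounded up)
```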

The `evaluate` method checks the model's performance on a validation or test set; here it reports the trained model's loss and accuracy on the 10,000 test images:

```python
model.evaluate(x_test, y_test, verbose=2)
```

```
313/313 - 0s - loss: 0.0733 - accuracy: 0.9777

[0.07325803488492966, 0.9776999950408936]
```
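`evaluate` returns the loss followed by each compiled metric, here `[loss, accuracy]`, so an accuracy of about 0.9777 means roughly 9,777 of the 10,000 test digits were classified correctly. With integer labels and a sparse cross-entropy loss, Keras resolves `'accuracy'` to a sparse categorical accuracy: the predicted class is the index of the largest logit (the same as the largest softmax probability, since softmax preserves order), compared against the true label. A plain-Python sketch with hypothetical logits and labels (`sparse_accuracy` is a hypothetical helper, not a Keras function):

```python
def sparse_accuracy(batch_logits, labels):
    """Hypothetical helper: fraction of examples whose argmax logit equals the true label."""
    correct = 0
    for logits, label in zip(batch_logits, labels):
        predicted = max(range(len(logits)), key=lambda k: logits[k])  # argmax
        correct += int(predicted == label)
    return correct / len(labels)

# Hypothetical logits for 4 examples with 3 classes each
batch = [[2.0, 0.1, -1.0], [0.0, 3.0, 1.0], [0.5, 0.2, 0.9], [1.0, 2.0, 0.0]]
labels = [0, 1, 2, 0]
print(sparse_accuracy(batch, labels))  # 0.75: the last example is predicted as class 1
```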