# MNIST: Subclassing and GradientTape edition

An example showing how to use Keras [Subclassing](https://www.tensorflow.org/guide/keras) in TensorFlow 2.0. We'll also use a [GradientTape](https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/GradientTape) to write our training loop. 

You can find more details about this style, and how it compares to the previous one, in this [article](https://medium.com/tensorflow/what-are-symbolic-and-imperative-apis-in-tensorflow-2-0-dfccecb01021).

### Install the nightly build

This is an early-stage preview, so expect a few bugs and rough edges.

In [13]:
!pip install tf-nightly-2.0-preview



In [0]:
import tensorflow as tf

In [15]:
print("You have version", tf.__version__)
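# Compare the major version numerically; a plain string comparison would
# misjudge versions such as "10.0" (which sorts before "2.0" as a string).
assert int(tf.__version__.split(".")[0]) >= 2  # TensorFlow ≥ 2.0 required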

You have version 2.0.0-dev20190130


In [0]:
import numpy as np

from tensorflow.keras import Model
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.nn import relu

### Parameters

In [0]:
epochs = 10
batch_size = 128

### Load the dataset

In [0]:
mnist = tf.keras.datasets.mnist

# Dataset will be cached locally after it's downloaded
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Normalize pixel values to [0, 1], as float32 (the default dtype of Keras layers)
x_train, x_test = x_train.astype(np.float32) / 255.0, x_test.astype(np.float32) / 255.0

# MNIST labels load as uint8, but the loss op we use later
# (sparse_softmax_cross_entropy_with_logits) needs int32 or int64 labels.
y_train = y_train.astype(np.int32)
y_test = y_test.astype(np.int32)

### Batch and shuffle the data

Next, we'll use `tf.data` to batch up our dataset. 

* This is a bit of a low-level approach (batching is handled automatically by `model.fit`, which we're not using here). The TensorFlow team is also working on a helpful [library](https://github.com/tensorflow/datasets) of built-in datasets to make this easier. It's really nice, but doesn't support 2.0 quite yet.
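As a rough illustration of what `.batch(batch_size)` produces, here is a plain-Python sketch that groups a sequence into consecutive chunks. It assumes the default behavior of keeping a final partial batch (`drop_remainder=False` in tf.data); the `make_batches` name is hypothetical. With 60,000 training images and `batch_size = 128`, it yields 469 batches per epoch, the last holding only 96 images, which is consistent with the step counter in the training log passing 400 during the first epoch and 500 early in the second.

```python
import math

def make_batches(items, batch_size):
    """Yield consecutive lists of up to `batch_size` items (hypothetical helper).

    Like tf.data's `batch` with the default drop_remainder=False, the final
    batch may be smaller than `batch_size`.
    """
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # keep the leftover partial batch
        yield batch

num_examples, batch_size = 60000, 128
batches = list(make_batches(range(num_examples), batch_size))
print(len(batches), math.ceil(num_examples / batch_size))  # 469 469
print(len(batches[-1]))  # 96
```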

Then, we'll shuffle it.

* In the code below, you'll notice we pass a `buffer_size` argument to `shuffle`. What is this, and why is it necessary? Datasets are streams, and they can be potentially infinite (e.g., if you're reading images from a directory and performing data augmentation). Since we can't hold an entire stream in memory to shuffle it, `shuffle` keeps a buffer of `buffer_size` elements in memory and samples randomly from that buffer. Since MNIST is tiny, we'll just use a buffer the size of the entire dataset.
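To make the buffer idea concrete, here is a pure-Python sketch of the strategy the tf.data documentation describes for `shuffle`: fill a buffer with `buffer_size` elements, emit a randomly chosen one, and refill its slot with the next element from the stream. The `buffered_shuffle` function and its `seed` parameter are hypothetical; this is not TensorFlow's actual implementation.

```python
import random

def buffered_shuffle(stream, buffer_size, seed=None):
    """Approximately shuffle a (possibly unbounded) stream using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in stream:
        if len(buffer) < buffer_size:
            buffer.append(item)  # still filling the buffer
            continue
        # Buffer is full: emit a random element and put the new item in its slot.
        i = rng.randrange(buffer_size)
        yield buffer[i]
        buffer[i] = item
    # Stream exhausted: drain the remaining elements in random order.
    rng.shuffle(buffer)
    yield from buffer

print(list(buffered_shuffle(range(10), buffer_size=3, seed=0)))
```

With `buffer_size=1` the output order is unchanged, while a buffer at least as large as the dataset never fills during the loop, so the final `rng.shuffle` produces a full random permutation. That is why the cell below sets `shuffle_buffer = len(x_train)`.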

In [0]:
shuffle_buffer = len(x_train)
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train)).shuffle(shuffle_buffer)
train_dataset = train_dataset.batch(batch_size)

### Define a model

This style feels like object-oriented Python + NumPy development: create your layers in the constructor (`__init__`), then write your forward pass in the `call` method.

In [0]:
class MyModel(Model):
  def __init__(self):
    super(MyModel, self).__init__()
    self.flatten = Flatten()
    self.d1 = Dense(128)
    self.d2 = Dense(10)

  def call(self, x):
    # Unroll the images into arrays
    x = self.flatten(x)
    x = self.d1(x)
    x = relu(x)
    x = self.d2(x)
    return x 
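Since this style is meant to feel like NumPy, here is a NumPy sketch of the same forward pass: flatten each 28×28 image to 784 values, apply a 128-unit dense layer and ReLU, then a 10-unit dense layer that produces logits. The `NumpyMLP` class is hypothetical, and its random-normal weight initialization is an arbitrary choice for illustration (Keras `Dense` defaults to Glorot-uniform weights and zero biases).

```python
import numpy as np

class NumpyMLP:
    """Hypothetical NumPy mirror of MyModel: Flatten -> Dense(128) -> ReLU -> Dense(10)."""

    def __init__(self, in_features=28 * 28, hidden=128, classes=10, seed=0):
        rng = np.random.default_rng(seed)
        # Small random weights and zero biases (illustrative choice only).
        self.w1 = rng.normal(0.0, 0.05, size=(in_features, hidden)).astype(np.float32)
        self.b1 = np.zeros(hidden, dtype=np.float32)
        self.w2 = rng.normal(0.0, 0.05, size=(hidden, classes)).astype(np.float32)
        self.b2 = np.zeros(classes, dtype=np.float32)

    def __call__(self, x):
        x = x.reshape(x.shape[0], -1)  # Flatten: (batch, 28, 28) -> (batch, 784)
        x = x @ self.w1 + self.b1      # Dense(128)
        x = np.maximum(x, 0.0)         # ReLU
        return x @ self.w2 + self.b2   # Dense(10): raw logits, no softmax

images = np.random.default_rng(1).random((4, 28, 28), dtype=np.float32)
logits = NumpyMLP()(images)
print(logits.shape)  # (4, 10)
```

As in `MyModel`, the last layer returns raw logits; the softmax is folded into the loss function in the next cell.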

### Calculate loss


In [0]:
def loss(logits, labels):
  return tf.reduce_mean(
      tf.nn.sparse_softmax_cross_entropy_with_logits(
          logits=logits, labels=labels))
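For intuition: `sparse_softmax_cross_entropy_with_logits` takes integer class labels (not one-hot vectors) and raw logits. For one example with logits `z` and label `y`, the loss is `-log(softmax(z)[y]) = log(sum_j exp(z_j)) - z_y`, and `loss` above averages this over the batch. The NumPy sketch below computes the same quantity, using the usual max-subtraction trick for numerical stability; the `sparse_softmax_xent` name is hypothetical.

```python
import numpy as np

def sparse_softmax_xent(logits, labels):
    """Mean cross-entropy from raw logits and integer labels (NumPy sketch)."""
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels)
    # Subtract the per-row max before exponentiating to avoid overflow.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_sum_exp = np.log(np.exp(shifted).sum(axis=1))
    # Per-example loss: log(sum_j exp(z_j)) - z_y (the shift cancels out).
    per_example = log_sum_exp - shifted[np.arange(len(labels)), labels]
    return per_example.mean()

# With all-equal logits over 10 classes, every class has probability 1/10,
# so the loss is log(10) ≈ 2.3026 regardless of the labels.
print(sparse_softmax_xent(np.zeros((3, 10)), [0, 4, 9]))
```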

### Training loop

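Here, we use a [GradientTape](https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/GradientTape) rather than the built-in `model.fit` method to train our model. The tape records the operations of the forward pass so that it can compute the gradients of the loss with respect to the model's weights.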

In [0]:
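def train_on_batch(model, images, labels):
  # Record the forward pass so the tape can differentiate through it
  with tf.GradientTape() as tape:
    logits = model(images)
    loss_value = loss(logits, labels)
  # Differentiate only with respect to the weights the optimizer should update
  grads = tape.gradient(loss_value, model.trainable_variables)
  # `optimizer` is the global created in the "Train the model" cell below
  optimizer.apply_gradients(zip(grads, model.trainable_variables))
  return loss_value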
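`tape.gradient` saves us from deriving gradients by hand. To see what it automates, here is a NumPy sketch of one training step for a deliberately simpler model (a single linear layer, `logits = x @ W + b`) where the gradient can be written out manually: for mean softmax cross-entropy, the gradient with respect to the logits is `(softmax(logits) - one_hot(labels)) / batch_size`. It swaps in plain gradient descent with an arbitrary learning rate instead of Adam, and `sgd_step` is a hypothetical name, so this illustrates the mechanics only and does not replace `train_on_batch`.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mean_xent(logits, labels):
    probs = softmax(logits)
    return -np.log(probs[np.arange(len(labels)), labels]).mean()

def sgd_step(W, b, x, labels, lr=0.05):
    """One manual forward/backward/update step for a linear softmax classifier."""
    n = x.shape[0]
    logits = x @ W + b  # forward pass
    loss = mean_xent(logits, labels)
    # Backward pass: gradient of the mean loss with respect to the logits.
    d_logits = softmax(logits)
    d_logits[np.arange(n), labels] -= 1.0
    d_logits /= n
    # Chain rule through logits = x @ W + b.
    grad_W = x.T @ d_logits
    grad_b = d_logits.sum(axis=0)
    # Update step (what optimizer.apply_gradients does, minus Adam's extra state).
    return W - lr * grad_W, b - lr * grad_b, loss

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 784))
labels = rng.integers(0, 10, size=32)
W, b = np.zeros((784, 10)), np.zeros(10)

W, b, loss_before = sgd_step(W, b, x, labels)
loss_after = mean_xent(x @ W + b, labels)
print(loss_before, loss_after)  # the loss on this batch should go down
```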

### A method to calculate accuracy

This method takes the logits (the output of our model) and the labels, and returns an accuracy score as a percentage. TensorFlow has helpers for this (for example, `tf.keras.metrics.SparseCategoricalAccuracy`), but I figured it'd be useful to show you how to do it from scratch. In your head, you can replace `tf.*` with `np.*` to understand what it's doing.

In [0]:
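def calc_accuracy(logits, labels):
  # Predicted class = index of the largest logit; int32 matches the label dtype
  predictions = tf.argmax(logits, axis=1, output_type=tf.int32)
  batch_size = int(logits.shape[0])
  # Fraction of predictions that match the labels
  acc = tf.reduce_sum(
      tf.cast(tf.equal(predictions, labels), dtype=tf.float32)) / batch_size
  return acc * 100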
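Taking the "replace `tf.*` with `np.*`" suggestion literally gives the sketch below. It is a hypothetical NumPy counterpart (`calc_accuracy_np`), and it uses `np.mean` in place of summing and dividing by the batch size, which computes the same fraction.

```python
import numpy as np

def calc_accuracy_np(logits, labels):
    """Percentage of rows whose argmax matches the integer label."""
    predictions = np.argmax(logits, axis=1)
    return np.mean(predictions == np.asarray(labels)) * 100

logits = np.array([[0.1, 0.9],   # predicts class 1
                   [0.8, 0.2],   # predicts class 0
                   [0.3, 0.7]])  # predicts class 1
print(calc_accuracy_np(logits, [1, 1, 1]))  # 2 of 3 correct
```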

### Train the model

In this section we will:
* Initialize our model and optimizer. Note that both have internal state, so if you'd like to restart training from scratch, you should initialize both again.

* Iterate over the dataset, grabbing batches of images and labels.

* Call the model on each batch (our forward pass).

* Use `train_on_batch` (defined above) to calculate the loss and gradients and update the weights (backward pass).

After each epoch, we will print the accuracy on the train and test sets. This is only for demonstration: as discussed in class, in practice you should monitor accuracy on a held-out validation set, not the test set.
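One way to follow that advice is to carve a validation set out of the training data before building `train_dataset`. The sketch below is a minimal version, assuming a hypothetical helper `split_validation` that shuffles examples and labels together before holding some out; it runs on small stand-in arrays so it doesn't need the real download.

```python
import numpy as np

def split_validation(x, y, num_val, seed=0):
    """Shuffle (x, y) together, then hold out `num_val` examples for validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))  # same permutation keeps images and labels aligned
    x, y = x[order], y[order]
    return (x[num_val:], y[num_val:]), (x[:num_val], y[:num_val])

# Small stand-in arrays with MNIST-shaped images.
x_full = np.zeros((1000, 28, 28), dtype=np.float32)
y_full = np.arange(1000)
(x_tr, y_tr), (x_val, y_val) = split_validation(x_full, y_full, num_val=100)
print(x_tr.shape, x_val.shape)  # (900, 28, 28) (100, 28, 28)
```

With the real data, you could call something like `split_validation(x_train, y_train, num_val=5000)` (the 5,000 is an arbitrary choice), build `train_dataset` from the first pair, and report accuracy on the second pair inside the epoch loop instead of on the test set.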

In [24]:
model = MyModel()

optimizer = tf.keras.optimizers.Adam()

for epoch in range(epochs):
  print('Epoch', epoch + 1)
  for (batch, (images, labels)) in enumerate(train_dataset):
    loss_value = train_on_batch(model, images, labels)
    step = optimizer.iterations.numpy() 
    if step % 100 == 0:
      print('Step %d\tLoss: %.4f' % (step, loss_value))
  
  print('Train accuracy %.2f' % calc_accuracy(model(x_train), y_train))
  print('Test accuracy %.2f\n' % calc_accuracy(model(x_test), y_test))

Epoch 1
Step 100	Loss: 0.3558
Step 200	Loss: 0.2786
Step 300	Loss: 0.1871
Step 400	Loss: 0.1419
Train accuracy 94.53
Test accuracy 94.39

Epoch 2
Step 500	Loss: 0.1715
Step 600	Loss: 0.1882
Step 700	Loss: 0.1802
Step 800	Loss: 0.1690
Step 900	Loss: 0.0944
Train accuracy 96.26
Test accuracy 95.82

Epoch 3
Step 1000	Loss: 0.0755
Step 1100	Loss: 0.0961
Step 1200	Loss: 0.0965
Step 1300	Loss: 0.0736
Step 1400	Loss: 0.1139
Train accuracy 97.30
Test accuracy 96.49

Epoch 4
Step 1500	Loss: 0.1151
Step 1600	Loss: 0.1948
Step 1700	Loss: 0.0993
Step 1800	Loss: 0.0791
Train accuracy 97.91
Test accuracy 96.92

Epoch 5
Step 1900	Loss: 0.0682
Step 2000	Loss: 0.1072
Step 2100	Loss: 0.1451
Step 2200	Loss: 0.0727
Step 2300	Loss: 0.0696
Train accuracy 98.29
Test accuracy 97.13

Epoch 6
Step 2400	Loss: 0.0415
Step 2500	Loss: 0.0399
Step 2600	Loss: 0.0572
Step 2700	Loss: 0.0474
Step 2800	Loss: 0.0438
Train accuracy 98.59
Test accuracy 97.26

Epoch 7
Step 2900	Loss: 0.0640
Step 3000	Loss: 0.0394
Step 3100	L

Okay! As a next step, you can play with the model to see if you can increase accuracy. There's a lot we can do to optimize runtime performance as well (for example, by using [tf.function](https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/function) to trace Python code into a TensorFlow graph), but we can get into that down the road.