<table align="center">
  <td align="center"><a target="_blank" href="http://introtodeeplearning.com">
        <img src="http://introtodeeplearning.com/images/colab/mit.png" style="padding-bottom:5px;" />
      Visit MIT Deep Learning</a></td>
  <td align="center"><a target="_blank" href="https://colab.research.google.com/github/aamini/introtodeeplearning/blob/master/lab2/Part1_MNIST.ipynb">
        <img src="http://introtodeeplearning.com/images/colab/colab.png?v2.0"  style="padding-bottom:5px;" />Run in Google Colab</a></td>
  <td align="center"><a target="_blank" href="https://github.com/aamini/introtodeeplearning/blob/master/lab2/Part1_MNIST.ipynb">
        <img src="http://introtodeeplearning.com/images/colab/github.png"  height="70px" style="padding-bottom:5px;"  />View Source on GitHub</a></td>
</table>

# Copyright Information

In [1]:
# Copyright 2020 MIT 6.S191 Introduction to Deep Learning. All Rights Reserved.
# 
# Licensed under the MIT License. You may not use this file except in compliance
# with the License. Use and/or modification of this code outside of 6.S191 must
# reference:
#
# © MIT 6.S191: Introduction to Deep Learning
# http://introtodeeplearning.com
#

In [2]:
# Import TensorFlow 2.0
# %tensorflow_version 2.x
import tensorflow as tf
physical_devices = tf.config.list_physical_devices('GPU')
try:
  tf.config.experimental.set_memory_growth(physical_devices[0], True)
except (IndexError, ValueError, RuntimeError):
  # No GPU found, invalid device, or cannot modify virtual devices once initialized.
  pass

# !pip install mitdeeplearning
import mitdeeplearning as mdl

import matplotlib.pyplot as plt
import numpy as np
import random
from tqdm import tqdm

# Check that we are using a GPU, if not switch runtimes
#   using Runtime > Change Runtime Type > GPU
# assert len(tf.config.list_physical_devices('GPU')) > 0

## 1.1 MNIST dataset 

Let's download and load the dataset, add a channel dimension to the images, and scale the pixel values to the range [0, 1]:

In [3]:
mnist = tf.keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
train_images = (np.expand_dims(train_images, axis=-1)/255.).astype(np.float32)
train_labels = (train_labels).astype(np.int64)
test_images = (np.expand_dims(test_images, axis=-1)/255.).astype(np.float32)
test_labels = (test_labels).astype(np.int64)

Our training set is made up of 28x28 grayscale images of handwritten digits. 
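
To make the shapes concrete, here is a minimal sketch of the same preprocessing applied to a synthetic batch. The `fake_images` array is a random stand-in invented for illustration, not MNIST data. It only assumes that the raw images arrive as `uint8` arrays of shape `(N, 28, 28)`.

```python
import numpy as np

# Synthetic stand-in for raw images: uint8 pixel values in [0, 255]
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Same preprocessing as the cell above: add a trailing channel axis,
# scale to [0, 1], and cast to float32
processed = (np.expand_dims(fake_images, axis=-1) / 255.).astype(np.float32)

print(fake_images.shape, fake_images.dtype)  # (4, 28, 28) uint8
print(processed.shape, processed.dtype)      # (4, 28, 28, 1) float32
```

The trailing axis of size 1 is the single grayscale channel that `Conv2D` layers expect.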

Let's visualize what some of these images and their corresponding training labels look like.
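
One possible way to do this is sketched below. `plot_random_samples` is a hypothetical helper written for this illustration, not part of the lab code or of `mitdeeplearning`. It assumes images shaped `(N, 28, 28, 1)` as produced by the preprocessing cell.

```python
import matplotlib.pyplot as plt
import numpy as np

def plot_random_samples(images, labels, n_rows=6, n_cols=6, seed=None):
    """Plot a grid of randomly chosen images, each labeled with its digit."""
    rng = np.random.default_rng(seed)
    # Sample distinct indices so that no image is shown twice
    indices = rng.choice(images.shape[0], size=n_rows * n_cols, replace=False)
    # squeeze=False keeps `axes` a 2D array even for a single row or column
    fig, axes = plt.subplots(n_rows, n_cols, figsize=(10, 10), squeeze=False)
    for ax, idx in zip(axes.ravel(), indices):
        # Drop the channel axis so imshow receives a 2D array
        ax.imshow(np.squeeze(images[idx]), cmap=plt.cm.binary)
        ax.set_xlabel(str(labels[idx]))
        ax.set_xticks([])
        ax.set_yticks([])
    return fig

# Usage in the notebook (assumes train_images and train_labels from above):
# plot_random_samples(train_images, train_labels)
# plt.show()
```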

In [4]:
BATCH_SIZE = 64
EPOCHS = 5

In [5]:
def build_cnn_model():
    cnn_model = tf.keras.Sequential([

        # TODO: Define the first convolutional layer
        tf.keras.layers.Conv2D(24, (3,3), activation=tf.nn.relu), 

        # TODO: Define the first max pooling layer
        tf.keras.layers.MaxPool2D((2,2)),

        # TODO: Define the second convolutional layer
        tf.keras.layers.Conv2D(36, (3,3), activation=tf.nn.relu),

        # TODO: Define the second max pooling layer
        tf.keras.layers.MaxPool2D((2,2)),

        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation=tf.nn.relu),

        # TODO: Define the last Dense layer to output the classification 
        # probabilities. Pay attention to the activation needed for a probability
        # output
        tf.keras.layers.Dense(10, activation=tf.nn.softmax)
    ])
    
    return cnn_model
  
cnn_model = build_cnn_model()
# Initialize the model by passing some data through
cnn_model.predict(train_images[[0]])
# Print the summary of the layers in the model.
# summary() prints directly and returns None, so it is not wrapped in print().
cnn_model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              multiple                  240       
_________________________________________________________________
max_pooling2d (MaxPooling2D) multiple                  0         
_________________________________________________________________
conv2d_1 (Conv2D)            multiple                  7812      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 multiple                  0         
_________________________________________________________________
flatten (Flatten)            multiple                  0         
_________________________________________________________________
dense (Dense)                multiple                  115328    
_________________________________________________________________
dense_1 (Dense)              multiple                  1
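
The parameter counts in a summary like this can be checked by hand. The sketch below assumes the standard formulas: a convolution has `kernel * kernel * in_channels * filters` weights plus one bias per filter, and a dense layer has `inputs * units` weights plus one bias per unit. It also assumes the Keras defaults used in `build_cnn_model`: "valid" padding and stride 1 for the convolutions, and non-overlapping 2x2 pooling. The helper names are made up for illustration.

```python
def conv_params(kernel, in_channels, filters):
    """Weights plus biases of a square-kernel Conv2D layer."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    """Weights plus biases of a Dense layer."""
    return inputs * units + units

def conv_out(size, kernel):
    """Spatial size after a 'valid' convolution with stride 1."""
    return size - kernel + 1

def pool_out(size, pool):
    """Spatial size after non-overlapping max pooling."""
    return size // pool

# Track the spatial size of a 28x28 input through the network
size = 28
size = conv_out(size, 3)  # 26
size = pool_out(size, 2)  # 13
size = conv_out(size, 3)  # 11
size = pool_out(size, 2)  # 5
flat = size * size * 36   # 900 features after Flatten

print(conv_params(3, 1, 24))    # 240
print(conv_params(3, 24, 36))   # 7812
print(dense_params(flat, 128))  # 115328
print(dense_params(128, 10))    # 1290
```

The first three values agree with the `conv2d`, `conv2d_1`, and `dense` rows of the summary.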

## 1.4 Training the model 2.0

Earlier in the lab, we used the [`fit`](https://www.tensorflow.org/api_docs/python/tf/keras/models/Sequential#fit) function call to train the model. This function is high-level and intuitive, which is useful for simpler models. However, it abstracts away many details of the training step, so we have less control over how the model is trained. That finer control can be useful in other contexts.

As an alternative, we can use the [`tf.GradientTape`](https://www.tensorflow.org/api_docs/python/tf/GradientTape) class to record operations for automatic differentiation during training, and then call the [`tf.GradientTape.gradient`](https://www.tensorflow.org/api_docs/python/tf/GradientTape#gradient) function to compute the gradients. You may recall seeing this in Lab 1 Part 1, but let's take another look at it here.
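
As a reminder of the pattern, here is a minimal sketch on a toy problem rather than on the CNN. It fits a single scalar weight `w` so that `w * x` matches `y = 3x`. The data, the learning rate, and the number of steps are assumptions chosen only to keep the example small.

```python
import tensorflow as tf

# Toy data for illustration: the target relationship is y = 3x
x = tf.constant([[1.0], [2.0], [3.0], [4.0]])
y = 3.0 * x

w = tf.Variable(0.0)  # trainable variables are watched by the tape automatically
lr = 0.05

for step in range(50):
    # Operations executed inside the context are recorded for differentiation
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean((w * x - y) ** 2)
    # Differentiate the recorded loss with respect to w
    grad = tape.gradient(loss, w)
    # Plain gradient descent update: w <- w - lr * grad
    w.assign_sub(lr * grad)

print(w.numpy())  # close to 3.0
```

The same three steps (record the forward pass, ask the tape for gradients, update the variables) carry over to a model with many weight tensors.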

We'll use this framework to train our `cnn_model` using stochastic gradient descent.
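
Stochastic gradient descent here means updating the weights once per mini-batch rather than once per pass over the data. The sketch below shows how stepping through the data in strides of the batch size produces those mini-batches. It assumes the 60,000 MNIST training images and a batch size of 64. `batch_slices` is a hypothetical helper, not part of the lab code.

```python
def batch_slices(n_samples, batch_size):
    """Yield (start, stop) index pairs that cover n_samples in order."""
    for start in range(0, n_samples, batch_size):
        # The last batch is shorter when batch_size does not divide n_samples
        yield start, min(start + batch_size, n_samples)

slices = list(batch_slices(60000, 64))
print(len(slices))                    # 938 weight updates per epoch
print(slices[-1][1] - slices[-1][0])  # 32 images in the final batch
```

NumPy slicing truncates at the end of the array in the same way, so `train_images[idx:idx+batch_size]` returns the shorter final batch without an explicit bounds check.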

In [6]:
# tf.keras.backend.clear_session()

In [9]:
import time

from IPython.display import clear_output
from sklearn.metrics import accuracy_score

# Rebuild the CNN model
cnn_model = build_cnn_model()
epochs = 10
batch_size = 64
lr = 1e-3
loss_history = mdl.util.LossHistory(smoothing_factor=0.95) # to record the evolution of the loss
plotter = mdl.util.PeriodicPlotter(sec=2, xlabel='Iterations', ylabel='Loss', scale='semilogy')
optimizer = tf.keras.optimizers.SGD(learning_rate=lr) # define our optimizer

def grad_stats(grads):
    # Count gradient tensors whose entries sum to exactly zero. Note that a
    # zero sum does not guarantee that every entry is zero.
    grad_sum = np.array([grad.numpy().sum() for grad in grads])
    zero_grad_count = (grad_sum == 0).sum()
    if zero_grad_count > 0:
        print(f"\t'{zero_grad_count}' out of '{grad_sum.shape[0]}' grads are zero")

if hasattr(tqdm, '_instances'): tqdm._instances.clear() # clear if it exists
step_counter = 0
for e in range(epochs):
    epoch_start_time = time.time()
    for idx in range(0, train_images.shape[0], batch_size):
        step_counter += 1
        images, labels = train_images[idx:idx+batch_size], train_labels[idx:idx+batch_size]
        images = tf.convert_to_tensor(images)

        with tf.GradientTape() as tape:
            # The last layer applies softmax, so the model outputs probabilities, not logits
            probs = cnn_model(images)
            loss_value = tf.keras.backend.sparse_categorical_crossentropy(labels, probs)

        # loss_value holds one loss per example; tape.gradient differentiates their sum
        grads = tape.gradient(loss_value, cnn_model.trainable_weights)

        # Manual SGD update, equivalent to:
        #   optimizer.apply_gradients(zip(grads, cnn_model.trainable_weights))
        for weight, grad in zip(cnn_model.trainable_weights, grads):
            weight.assign_sub(lr * grad)

        # Only compute and report metrics every 100 steps
        if step_counter % 100 == 0:
            predictions = np.argmax(probs, axis=1)
            acc_score = accuracy_score(labels, predictions)

            val_probs = cnn_model(test_images[0:100])
            val_predictions = np.argmax(val_probs, axis=1)
            val_accuracy = accuracy_score(test_labels[0:100], val_predictions)

            print("epoch: {}, step: {}, training_acc: {}, val_acc: {}".format(e+1, step_counter, acc_score, val_accuracy))
            grad_stats(grads)

    epoch_time = time.time() - epoch_start_time

epoch: 1, step: 100, training_acc: 0.8125, val_acc: 0.82
	'1' out of '8' grads are zero
epoch: 1, step: 200, training_acc: 0.84375, val_acc: 0.95
epoch: 1, step: 300, training_acc: 0.953125, val_acc: 0.96
epoch: 1, step: 400, training_acc: 0.9375, val_acc: 0.97
epoch: 1, step: 500, training_acc: 0.984375, val_acc: 0.98
epoch: 1, step: 600, training_acc: 1.0, val_acc: 0.98
epoch: 1, step: 700, training_acc: 1.0, val_acc: 1.0
epoch: 1, step: 800, training_acc: 1.0, val_acc: 0.98
epoch: 1, step: 900, training_acc: 0.953125, val_acc: 1.0
epoch: 2, step: 1000, training_acc: 0.984375, val_acc: 1.0
epoch: 2, step: 1100, training_acc: 0.96875, val_acc: 0.97
epoch: 2, step: 1200, training_acc: 0.96875, val_acc: 1.0
epoch: 2, step: 1300, training_acc: 0.96875, val_acc: 1.0
	'1' out of '8' grads are zero


KeyboardInterrupt: 

In [None]:
def display_predictions(test_images, cnn_model, n_images = 20):
    # Pick a random contiguous window of n_images test images
    start_point = np.random.randint(test_images.shape[0]-n_images)
    my_test_images = test_images[start_point:start_point+n_images]
    # The predicted digit is the class with the highest probability
    predictions = np.argmax(cnn_model(my_test_images), axis=1)
    
    # Show the images in one row, each titled with its predicted digit
    plt.figure(figsize=(15,3))
    for i in range(my_test_images.shape[0]):
        plt.subplot(1, n_images, i+1)
        plt.title(predictions[i])
        plt.imshow(my_test_images[i].reshape(28,28), cmap="gray")
        plt.axis("off")
    plt.show()
    
display_predictions(test_images, cnn_model, n_images=10)