# Setup

In this section, we check that the hardware (a GPU) and the software environment are configured correctly.

In [1]:
!nvidia-smi

Mon Dec 30 09:37:26 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.44       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   34C    P0    26W / 250W |      0MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
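|  No running processes found                                                 |
+-----------------------------------------------------------------------------+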

As of December 2019, Colab defaults to TensorFlow 1.15. We will install TensorFlow 2.0 (or newer).

In [0]:
# quote the requirement so the shell does not treat ">" as output redirection
!pip install -Uq "tensorflow>=2.0"

In [3]:
import matplotlib.pyplot as plt
import tensorflow.compat.v2 as tf
from sklearn.manifold import TSNE
from sklearn.utils import shuffle
from sklearn.decomposition import PCA
import plotly.graph_objects as go

print("TensorFlow version:", tf.__version__)

TensorFlow version: 2.0.0


# Classifying Handwritten Digits (MNIST)

In this introductory lab, we will use `tf.keras` to build a simple model that classifies handwritten digits from the [MNIST dataset](http://yann.lecun.com/exdb/mnist/). This classic problem is often used as the "Hello World" of deep learning. The dataset contains 70,000 grayscale 28×28 images of handwritten digits (60,000 for training and 10,000 for testing), and the task is to recognise each image as a digit from 0 to 9.

## Loading the MNIST dataset

Luckily for us, a loader for MNIST is already built into `tf.keras`, which makes life easy. Most real-world datasets will not be this convenient. However, many widely used research datasets are available in `tf.keras.datasets` or in [TensorFlow Datasets](https://www.tensorflow.org/datasets).

In [4]:
# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# dataset parameters
num_classes = 10
width, height = 28, 28

# reshape to [samples, height, width, channels]; Keras uses channels-last by default
x_train = x_train.reshape(x_train.shape[0], height, width, 1)
x_test = x_test.reshape(x_test.shape[0], height, width, 1)
input_shape = (height, width, 1)

# scale pixel values from [0, 255] to [0, 1]
x_train = x_train.astype("float32")
x_test = x_test.astype("float32")
x_train /= 255
x_test /= 255

print("x_train shape:", x_train.shape)
print(x_train.shape[0], "train samples")
print(x_test.shape[0], "test samples")

# convert integer class labels to one-hot (categorical) class vectors

y_train = tf.keras.utils.to_categorical(y_train, num_classes)
y_test = tf.keras.utils.to_categorical(y_test, num_classes)

x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
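The preprocessing above is plain array manipulation. The NumPy sketch below repeats the same three steps (add a channel axis, scale to [0, 1], one-hot encode) on a few random stand-in images so that it runs without downloading MNIST; the random data and variable names are illustrative assumptions only. For integer labels, `np.eye(num_classes)[labels]` produces the same one-hot layout as `tf.keras.utils.to_categorical`.

```python
import numpy as np

# Hypothetical stand-in for MNIST: 5 random 28x28 uint8 "images" and their labels.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 28, 28)).astype(np.uint8)
labels = np.array([3, 0, 9, 3, 7])
num_classes = 10

# Add a trailing channel axis: (samples, height, width) -> (samples, height, width, 1).
x = images.reshape(images.shape[0], 28, 28, 1).astype("float32")

# Scale pixel intensities from [0, 255] to [0, 1].
x /= 255

# One-hot encode: row i of the identity matrix is the one-hot vector for class i.
y = np.eye(num_classes, dtype="float32")[labels]

print(x.shape)  # (5, 28, 28, 1)
print(y[0])     # 1.0 at index 3, zeros elsewhere
```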


## Building a Model

We will now build a simple model in `tf.keras`. The simplest option is the [Sequential API](https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/keras/Sequential), but here we use the [Functional API](https://www.tensorflow.org/beta/guide/keras/functional) instead. It gives us more flexibility: in particular, it lets us reuse the same layers to build a second model that outputs intermediate activations.

The [Model Subclassing](https://keras.io/models/about-keras-models/#model-subclassing) design pattern offers even more flexibility.

In [5]:
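# input: one 28x28 grayscale image with a single channel
l_input = tf.keras.layers.Input(shape=input_shape)
# flatten each image into a 784-dimensional vector
l_flat = tf.keras.layers.Flatten()(l_input)
# hidden layer: 10 tanh units with an L1 penalty on the kernel weights
l_dense = tf.keras.layers.Dense(10, activation='tanh',
                                kernel_regularizer=tf.keras.regularizers.l1(0.001))(l_flat)
# output layer: one probability per digit class
preds = tf.keras.layers.Dense(num_classes, activation='softmax')(l_dense)

model = tf.keras.models.Model(inputs=l_input, outputs=preds)

model.summary()

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])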

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, 28, 28, 1)]       0         
_________________________________________________________________
flatten (Flatten)            (None, 784)               0         
_________________________________________________________________
dense (Dense)                (None, 10)                7850      
_________________________________________________________________
dense_1 (Dense)              (None, 10)                110       
Total params: 7,960
Trainable params: 7,960
Non-trainable params: 0
_________________________________________________________________
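The parameter counts in the summary can be checked by hand. A `Dense` layer with `n` inputs and `m` units has an `n × m` kernel plus `m` biases, while `InputLayer` and `Flatten` have no weights. The small helper below (`dense_params` is a hypothetical name for illustration, not a Keras function) reproduces the numbers:

```python
def dense_params(n_inputs, n_units):
    # One weight per (input, unit) pair plus one bias per unit.
    return n_inputs * n_units + n_units

flat_size = 28 * 28 * 1                 # Flatten turns (28, 28, 1) into 784 features
hidden = dense_params(flat_size, 10)    # 784 * 10 + 10 = 7850
output = dense_params(10, 10)           # 10 * 10 + 10 = 110
print(hidden, output, hidden + output)  # 7850 110 7960
```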


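We'll also create another model from the exact same layer objects. It is effectively a truncated version of the original model that stops at the hidden `Dense` layer, so it returns the intermediate (10-dimensional) activations instead of class probabilities. Because the layers are shared, it always uses the same weights as `model`, including after training.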

In [6]:
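# reuse l_input and l_dense, so model_int shares its weights with model
model_int = tf.keras.models.Model(inputs=l_input, outputs=l_dense)
model_int.summary()
# model_int is only used for predict(); compiling it is optional
model_int.compile(optimizer="adam",
                  loss="categorical_crossentropy")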

Model: "model_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, 28, 28, 1)]       0         
_________________________________________________________________
flatten (Flatten)            (None, 784)               0         
_________________________________________________________________
dense (Dense)                (None, 10)                7850      
Total params: 7,850
Trainable params: 7,850
Non-trainable params: 0
_________________________________________________________________
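To make the weight sharing concrete, the NumPy sketch below writes out what the two models compute. The weights `W1`, `b1`, `W2`, `b2` are hypothetical random stand-ins, not values read from Keras: `model_int_forward` corresponds to `Flatten` followed by the `tanh` layer, and `model_forward` applies the softmax layer on top of that same output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical random weights standing in for the trained layers.
W1, b1 = rng.normal(size=(784, 10)), np.zeros(10)  # hidden Dense(10, tanh)
W2, b2 = rng.normal(size=(10, 10)), np.zeros(10)   # output Dense(10, softmax)

def softmax(z):
    # Subtracting the row-wise max improves numerical stability without changing the result.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def model_int_forward(x):
    # Flatten (N, 28, 28, 1) -> (N, 784), then apply the hidden tanh layer.
    return np.tanh(x.reshape(len(x), -1) @ W1 + b1)

def model_forward(x):
    # The full model reuses the hidden layer output (same W1, b1) and adds the softmax head.
    return softmax(model_int_forward(x) @ W2 + b2)

x = rng.random(size=(4, 28, 28, 1))
h = model_int_forward(x)  # intermediate activations, shape (4, 10)
p = model_forward(x)      # class probabilities, shape (4, 10)
```

Any change to `W1` or `b1` changes both outputs, which is why `model_int` reflects whatever `model` has learned.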


## Training a Model

In [7]:
# train for 5 epochs (full passes over the training set) in mini-batches of 64 images
model.fit(x_train, y_train,
          batch_size=64,
          epochs=5)

Train on 60000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x7f33b3a9ec50>
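During `fit`, Keras minimizes the mean categorical cross-entropy over each batch plus any regularization losses, here the L1 penalty attached to the hidden layer's kernel. The simplified NumPy sketch below computes both terms for two made-up predictions and a hypothetical constant kernel; it illustrates the arithmetic and is not Keras's exact implementation.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip so that log(0) never occurs, then take -sum(y * log(p)) per sample.
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.sum(y_true * np.log(y_pred), axis=1)

def l1_penalty(kernel, l1=0.001):
    # An L1 regularizer with factor 0.001 adds 0.001 * sum(|w|) over the kernel.
    return l1 * np.sum(np.abs(kernel))

# Two samples with true classes 3 and 7, and made-up predicted probabilities.
y_true = np.eye(10)[[3, 7]]
y_pred = np.full((2, 10), 0.05)
y_pred[0, 3] = 0.55  # 0.55 + 9 * 0.05 = 1.0
y_pred[1, 7] = 0.55

kernel = np.full((784, 10), 0.01)  # hypothetical hidden-layer kernel
data_loss = categorical_crossentropy(y_true, y_pred).mean()
total_loss = data_loss + l1_penalty(kernel)
print(data_loss, total_loss)
```

Here the data term is `-log(0.55)` for both samples and the penalty is `0.001 * 7840 * 0.01 = 0.0784`, so the L1 term pushes the optimizer toward smaller hidden-layer weights.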

## Visualizing Learnt Representations

In [0]:
# 10-dimensional hidden-layer activations for every test image: shape (10000, 10)
activations = model_int.predict(x_test, batch_size=512)

### PCA

In [9]:
# project the 10-dimensional activations onto their top 3 principal components
pca = PCA(n_components=3)
transformed_values = pca.fit_transform(activations)

color_list = [
    '#1f77b4',  # muted blue
    '#ff7f0e',  # safety orange
    '#2ca02c',  # cooked asparagus green
    '#d62728',  # brick red
    '#9467bd',  # muted purple
    '#8c564b',  # chestnut brown
    '#e377c2',  # raspberry yogurt pink
    '#7f7f7f',  # middle gray
    '#bcbd22',  # curry yellow-green
    '#17becf'   # blue-teal
]

# recover integer labels (0-9) from the one-hot test labels instead of reloading MNIST
y_labels = y_test.argmax(axis=1)

colors = [color_list[i] for i in y_labels]

fig = go.Figure(data=[
    go.Scatter3d(
        x=transformed_values[:, 0],
        y=transformed_values[:, 1],
        z=transformed_values[:, 2],
        mode="markers",
        text=y_labels,  # hover text shows each point's digit
        marker=dict(size=5, color=colors, opacity=0.2),
    )
])

fig.update_layout(title="3D Plot of Activations after PCA")

fig.show()
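`PCA(n_components=3)` finds the three orthogonal directions along which the 10-dimensional activations vary most and projects each activation onto them. The sketch below does the same with a singular value decomposition of the mean-centered data, using random stand-in data in place of `activations`; `pca_svd` is a hypothetical helper, and individual components may differ from scikit-learn's output by a sign flip.

```python
import numpy as np

def pca_svd(X, n_components=3):
    # Center each feature; principal directions are defined on mean-centered data.
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    # Project the centered data onto the top directions.
    projected = Xc @ components.T
    # Sample variance captured by each component.
    explained_variance = S[:n_components] ** 2 / (len(X) - 1)
    return projected, explained_variance

rng = np.random.default_rng(0)
acts = rng.normal(size=(500, 10))  # hypothetical stand-in for `activations`
proj, var = pca_svd(acts, n_components=3)
print(proj.shape)  # (500, 3)
```

The projected coordinates are uncorrelated and ordered by variance, which is why the first few PCA axes give the most informative 3D view.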

### t-SNE

In [10]:
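# t-SNE scales poorly with the number of points, so embed only the first 2,000 test images
n_points = 2000
# note: newer scikit-learn releases rename `n_iter` to `max_iter`
tsne = TSNE(n_components=3, n_iter=300, n_iter_without_progress=100)
print("Running TSNE (slow) ...")
transformed_values = tsne.fit_transform(activations[:n_points])
print("Done!")

fig = go.Figure(data=[
    go.Scatter3d(
        x=transformed_values[:, 0],
        y=transformed_values[:, 1],
        z=transformed_values[:, 2],
        mode="markers",
        # slice labels and colors so they line up with the embedded points
        text=y_labels[:n_points],
        marker=dict(size=5, color=colors[:n_points], opacity=0.5),
    )
])

fig.update_layout(title="3D Plot of Activations after TSNE")

fig.show()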

Running TSNE (slow) ...
Done!
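t-SNE is slow partly because its objective compares every pair of points, which is why the cell above embeds only the first 2,000 test activations. As an illustration of that pairwise structure, the sketch below computes t-SNE's low-dimensional similarities `q_ij`, a Student-t kernel `(1 + ||y_i - y_j||^2)^-1` normalized over all pairs, for a random 3D embedding. It is not scikit-learn's implementation, which by default uses the Barnes-Hut approximation to avoid building the full n × n matrix; `tsne_low_dim_affinities` is a hypothetical helper name.

```python
import numpy as np

def tsne_low_dim_affinities(Y):
    # Squared Euclidean distances between every pair of embedded points: an (n, n) matrix.
    sq_norms = np.sum(Y ** 2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2 * Y @ Y.T
    d2 = np.maximum(d2, 0.0)  # guard against tiny negative values from rounding
    # Student-t kernel with one degree of freedom.
    num = 1.0 / (1.0 + d2)
    np.fill_diagonal(num, 0.0)  # a point is not its own neighbour
    return num / num.sum()      # normalize over all pairs

rng = np.random.default_rng(0)
Y = rng.normal(size=(1000, 3))  # hypothetical 3D embedding of 1,000 points
Q = tsne_low_dim_affinities(Y)
print(Q.shape)  # (1000, 1000)
```

Doubling the number of points quadruples the size of `Q`, so the exact computation quickly becomes expensive on all 10,000 test images.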
