# Handwriting recognition

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import jax.numpy as jnp
import jax
import time

## Data import and visualization

Import the MNIST training dataset ([https://en.wikipedia.org/wiki/MNIST_database](https://en.wikipedia.org/wiki/MNIST_database)).

In [None]:
# This dataset is contained in the sample data directory of Google Colab online runtimes
data = np.genfromtxt('sample_data/mnist_train_small.csv', delimiter=',')
data.shape

Store the data in a matrix and the labels in a vector.

**REMARK**: in this lab we will work with features/classes on rows and samples on columns.

In [None]:
labels = data[:,0]
x_data = data[:,1:].transpose() / 255
labels.shape, x_data.shape

Visualize the first 30 pictures with the corresponding labels

In [None]:
fig, axs = plt.subplots(ncols = 10, nrows = 3, figsize = (20,6))
axs = axs.reshape((-1,))
for i in range(30):
  image_i = x_data[:,i].reshape((28,28))
  axs[i].imshow(image_i, cmap='gray')
  axs[i].set_title(int(labels[i]))
  axs[i].axis('off')

Create a [one-hot](https://en.wikipedia.org/wiki/One-hot) representation of the labels, that is, a matrix in which each row corresponds to a class (i.e. a digit) and each column to a sample.
An entry of the matrix is 1 if the sample corresponds to that digit, 0 otherwise.

In [None]:
# One row per digit, one column per sample (size taken from the data, not hard-coded)
y_data = np.zeros((10, x_data.shape[1]))
for i in range(10):
  y_data[i, labels==i] = 1

Check that the matrix has exactly one element "1" in each column.

In [None]:
# Summing along axis 0 gives one sum per column (i.e. per sample)
col_sums = np.sum(y_data, axis = 0)
col_sums.min(), col_sums.max()

## ANN training

Write a function to initialize the parameters (with Glorot Normal initialization) and a function implementing a feedforward ANN with tanh activation function.

To the last layer of the ANN, apply a *soft-max* layer. If $z_1, \dots, z_n$ are the activations of the last layer neurons, the soft-max layer produces $\hat{z}_1, \dots, \hat{z}_n$, defined as
$$
\hat{z}_i = \frac{e^{z_i}}{\sum_{j=1}^n e^{z_j}}
$$
In this manner the outputs of the ANN satisfy by construction:
- $\hat{z}_i \in [0,1]$
- $\sum_{j=1}^n \hat{z}_j = 1$

Therefore, they can be interpreted as probabilities.

Once the ANN is trained, we will take the digit corresponding to the highest probability as the prediction of the model.

Test the ANN and check that the above properties are satisfied.
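
The steps above could be sketched as follows. This is only one possible solution: the names `initialize_params` and `x_check`, and the choice of storing the parameters as a list of `(W, b)` pairs, are assumptions made here for illustration. The column-wise maximum is subtracted before the exponential, which leaves the soft-max unchanged but avoids overflow.

```python
import jax
import jax.numpy as jnp

def initialize_params(layers_size, key):
    """Glorot normal initialization: W ~ N(0, 2 / (n_in + n_out)), b = 0.

    The list-of-(W, b) layout is an assumption of this sketch.
    """
    params = []
    for n_in, n_out in zip(layers_size[:-1], layers_size[1:]):
        key, subkey = jax.random.split(key)
        std = jnp.sqrt(2.0 / (n_in + n_out))
        W = std * jax.random.normal(subkey, (n_out, n_in))
        b = jnp.zeros((n_out, 1))
        params.append((W, b))
    return params

def ANN(x, params):
    """Feedforward network with tanh hidden layers and a soft-max output.

    x has shape (n_features, n_samples): samples are on columns.
    """
    a = x
    for W, b in params[:-1]:
        a = jnp.tanh(W @ a + b)
    W, b = params[-1]
    z = W @ a + b
    # Shift by the column-wise maximum for numerical stability
    z = z - jnp.max(z, axis=0, keepdims=True)
    e = jnp.exp(z)
    return e / jnp.sum(e, axis=0, keepdims=True)

# Check the soft-max properties on a few random inputs
params = initialize_params([784, 50, 50, 10], jax.random.PRNGKey(0))
x_check = jax.random.uniform(jax.random.PRNGKey(1), (784, 5))
y_check = ANN(x_check, params)
print(y_check.shape)
print(y_check.min(), y_check.max())
print(y_check.sum(axis=0))
```

Each column of `y_check` should contain values in $[0,1]$ that sum to 1, up to floating-point round-off.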

Implement the following metrics:
- mean square error
- cross entropy
- accuracy (fraction of samples correctly classified)
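
As a hedged illustration, the three metrics can be written as functions of the network output rather than of `(x, y, params)`. The names `mse_from_predictions`, `cross_entropy_from_predictions` and `accuracy_from_predictions` are hypothetical, and so are the normalizations (mean over all entries for the MSE, mean over samples for the cross entropy) and the small `eps` guarding the logarithm.

```python
import jax.numpy as jnp

def mse_from_predictions(y_pred, y):
    # Mean over all entries (classes and samples); an assumed normalization
    return jnp.mean((y_pred - y) ** 2)

def cross_entropy_from_predictions(y_pred, y, eps=1e-12):
    # Sum over classes (rows), then average over samples (columns).
    # eps avoids log(0); it is an assumption of this sketch.
    return -jnp.mean(jnp.sum(y * jnp.log(y_pred + eps), axis=0))

def accuracy_from_predictions(y_pred, y):
    # A sample is correct when the most probable class matches the label
    return jnp.mean(jnp.argmax(y_pred, axis=0) == jnp.argmax(y, axis=0))

# Tiny example: 2 classes (rows), 3 samples (columns)
y_true = jnp.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 1.0]])
y_hat = jnp.array([[0.9, 0.2, 0.6],
                   [0.1, 0.8, 0.4]])

print('MSE:       %f' % mse_from_predictions(y_hat, y_true))
print('X entropy: %f' % cross_entropy_from_predictions(y_hat, y_true))
print('accuracy:  %f' % accuracy_from_predictions(y_hat, y_true))
```

In this example the third sample is misclassified, so the accuracy is 2/3. A metric with the signature used in this lab would then simply evaluate the network first, for instance by passing `ANN(x, params)` as `y_pred`.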

In [None]:
def MSE(x, y, params):
  ...

def cross_entropy(x, y, params):
  ...

def accuracy(x, y, params):
  ...

print('MSE:       %f' % MSE(x_data, y_data, params))
print('X entropy: %f' % cross_entropy(x_data, y_data, params))
print('accuracy:  %f' % accuracy(x_data, y_data, params))

Put 10000 images in the training set and 1000 images in the validation set.
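
A minimal sketch of such a split, shown on small stand-in arrays with the same layout as `x_data` and `y_data` (samples on columns). Shuffling the indices before splitting is an assumption of this sketch; taking contiguous slices would also satisfy the request. All names ending in `_demo` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data: 4 features instead of 784 to keep the example light
n_samples = 20000
x_demo = rng.random((4, n_samples))
labels_demo = rng.integers(0, 10, size=n_samples)
y_demo = np.zeros((10, n_samples))
y_demo[labels_demo, np.arange(n_samples)] = 1

n_train, n_valid = 10000, 1000

# Shuffle the column indices, then take disjoint blocks
perm = rng.permutation(n_samples)
idx_train = perm[:n_train]
idx_valid = perm[n_train:n_train + n_valid]

x_train, y_train = x_demo[:, idx_train], y_demo[:, idx_train]
x_valid, y_valid = x_demo[:, idx_valid], y_demo[:, idx_valid]

print(x_train.shape, y_train.shape, x_valid.shape, y_valid.shape)
```

With the real data, `x_demo` and `y_demo` would be replaced by `x_data` and `y_data`.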

Run this cell. We will use it later.

In [None]:
from IPython import display

class Callback:
  def __init__(self, refresh_rate = 250):
    self.refresh_rate = refresh_rate
    self.fig, self.axs = plt.subplots(1, figsize=(16,8))
    self.epoch = 0
    self.__call__(-1)

  def __call__(self, epoch):
    self.epoch = epoch
    if (epoch + 1) % self.refresh_rate == 0:
      self.draw()
      display.clear_output(wait=True)
      display.display(plt.gcf())
      time.sleep(1e-16)

  def draw(self):
    if self.epoch > 0:
      self.axs.clear()
      epochs = np.arange(1,len(history_train_Xen) + 1)
      self.axs.loglog(epochs, history_train_Xen, label = 'train_Xen')
      self.axs.loglog(epochs, history_valid_Xen, label = 'valid_Xen')
      self.axs.loglog(epochs, history_valid_MSE, label = 'valid_MSE')
      self.axs.loglog(epochs, history_valid_acc, label = 'valid_acc')

      self.axs.legend()
      self.axs.set_title('epoch %d - accuracy %0.1f%%' % (self.epoch + 1, 100*history_valid_acc[-1]))

Train an ANN-based classifier with two hidden layers with 50 neurons each.
Run 500 epochs of the RMSProp algorithm with decay rate 0.9, $\delta = 10^{-7}$, and a fixed learning rate $\lambda = 0.002$. Use minibatches with a batch size of 1000.

Use the cross-entropy loss to drive the training.
To monitor training, every 10 epochs append the following metrics to the corresponding lists:
- `history_train_Xen`: cross-entropy (training set)
- `history_valid_Xen`: cross-entropy (validation set)
- `history_valid_MSE`: MSE (validation set)
- `history_valid_acc`: accuracy (validation set)

In [None]:
# Hyperparameters
layers_size = [784, 50, 50, 10]
# Training options
num_epochs = 500
batch_size = 1000
learning_rate = 2e-3
decay_rate = .9
delta = 1e-7

history_train_Xen = list()
history_valid_Xen = list()
history_valid_MSE = list()
history_valid_acc = list()
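
A possible RMSProp loop, given as a sketch under stated assumptions rather than the reference solution. The function name `rmsprop_train` is hypothetical; the loss is assumed to have the signature `loss(x, y, params)`; an epoch is taken here to be one full pass over the shuffled data (the lab may instead mean one minibatch per epoch); and $\delta$ is added outside the square root, whereas some formulations place it inside. The demonstration uses a toy soft-max regression instead of the MNIST network, and omits the history lists and the callback.

```python
import jax
import jax.numpy as jnp
import numpy as np
from jax.scipy.special import logsumexp

def rmsprop_train(loss, params, x, y, num_epochs, batch_size,
                  learning_rate=2e-3, decay_rate=0.9, delta=1e-7, seed=0):
    """Minibatch RMSProp on loss(x, y, params); samples are on columns."""
    grad_jit = jax.jit(jax.grad(loss, argnums=2))
    # Running average of the squared gradients, one entry per parameter
    cumulated = jax.tree_util.tree_map(jnp.zeros_like, params)
    rng = np.random.default_rng(seed)
    n_samples = x.shape[1]
    for epoch in range(num_epochs):
        perm = rng.permutation(n_samples)
        for start in range(0, n_samples, batch_size):
            idx = perm[start:start + batch_size]
            grads = grad_jit(x[:, idx], y[:, idx], params)
            cumulated = jax.tree_util.tree_map(
                lambda r, g: decay_rate * r + (1 - decay_rate) * g ** 2,
                cumulated, grads)
            params = jax.tree_util.tree_map(
                lambda p, g, r: p - learning_rate * g / (delta + jnp.sqrt(r)),
                params, grads, cumulated)
    return params

def toy_loss(x, y, params):
    # Cross entropy of a linear soft-max classifier (toy model for the demo)
    W, b = params
    z = W @ x + b
    log_probs = z - logsumexp(z, axis=0, keepdims=True)
    return -jnp.mean(jnp.sum(y * log_probs, axis=0))

# Toy data: 2 features, 2 classes, label given by the sign of the first feature
x_toy = jax.random.normal(jax.random.PRNGKey(0), (2, 200))
labels_toy = (x_toy[0] > 0).astype(int)
y_toy = jnp.eye(2)[:, labels_toy]

params_toy = (jnp.zeros((2, 2)), jnp.zeros((2, 1)))
loss_before = toy_loss(x_toy, y_toy, params_toy)
params_toy = rmsprop_train(toy_loss, params_toy, x_toy, y_toy,
                           num_epochs=20, batch_size=50)
loss_after = toy_loss(x_toy, y_toy, params_toy)
print(loss_before, loss_after)
```

Because the update is written with `jax.tree_util.tree_map`, the same loop works for any nesting of lists and tuples of arrays used to store the parameters.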

## Testing

Load the dataset `sample_data/mnist_test.csv` and compute the accuracy of the classifier on this dataset.

In [None]:
data_test = np.genfromtxt('sample_data/mnist_test.csv', delimiter=',')
data_test.shape
labels_test = data_test[:,0]
x_test = data_test[:,1:].transpose() / 255
y_test = np.zeros((10, x_test.shape[1]))
for i in range(10):
  y_test[i, labels_test==i] = 1
x_test.shape, y_test.shape

Use the following script to visualize the predictions on a bunch of test images.

In [None]:
offset = 0
n_images = 40

images_per_row = 10
y_predicted = ANN(x_test[:,offset:offset+n_images], params)

def draw_bars(ax, y_predicted, label):
    myplot = ax.bar(range(10), (y_predicted))
    ax.set_ylim([0,1])
    ax.set_xticks(range(10))

    label_predicted = np.argmax(y_predicted)
    if label == label_predicted:
      color = 'green'
    else:
      color = 'red'
    myplot[label_predicted].set_color(color)

import math
n_rows = 2 * math.ceil(n_images / images_per_row)
_, axs = plt.subplots(n_rows, images_per_row, figsize = (3*images_per_row, 3*n_rows))
row = 0
col = 0
for i in range(n_images):
  axs[2*row,col].imshow(x_test[:,offset+i].reshape((28,28)), cmap='gray')
  axs[2*row,col].set_title(int(labels_test[offset+i]))
  axs[2*row,col].axis('off')

  draw_bars(axs[2*row+1,col], y_predicted[:,i], labels_test[offset+i])

  col += 1
  if col == images_per_row:
    col = 0
    row += 1


# Adversarial attacks

You have trained your classifier. Cool, isn't it? Let us now try to fool it.

Consider the last image of the training set. Visualize it and visualize the associated predictions of the classifier.

In [None]:
x = x_data[:,-1][:,None]
y = y_data[:,-1][:,None]
label = np.argmax(y)

_, axs = plt.subplots(1,2, figsize = (8,4))
axs[0].imshow(x.reshape((28,28)), cmap = 'gray')
axs[0].axis('off')

y_pred = ANN(x, params)

draw_bars(axs[1], y_pred[:,0], label)

An adversarial attack consists of an (almost imperceptible) modification of the image, aimed at fooling the classifier into making a mistake.
See e.g. [this article](https://www.wired.com/story/tesla-speed-up-adversarial-example-mgm-breach-ransomware/).

To hack the classifier, compute the gradient of the cross-entropy loss function with respect to the input (not the parameters!). Then, superimpose a multiple of the gradient on the original image.

Visualize the original and the hacked images and the corresponding prediction of the classifier.
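
The idea can be sketched as follows, assuming a loss with the signature `loss(x, y, params)` so that `argnums=0` selects the input. The function name `adversarial_example`, the `step` value and the clipping to $[0,1]$ are assumptions of this sketch, and the demonstration uses a toy linear soft-max model rather than the trained network.

```python
import jax
import jax.numpy as jnp
from jax.scipy.special import logsumexp

def adversarial_example(loss, x, y, params, step=0.1):
    """Perturb x along the gradient of loss(x, y, params) with respect to x."""
    # argnums=0 differentiates with respect to the input, not the parameters
    grad_x = jax.grad(loss, argnums=0)(x, y, params)
    # Moving along the gradient increases the loss; clipping keeps the
    # pixel values in [0, 1] (an extra safeguard assumed here)
    x_adv = jnp.clip(x + step * grad_x, 0.0, 1.0)
    return grad_x, x_adv

def toy_loss(x, y, params):
    # Cross entropy of a linear soft-max classifier (toy model for the demo)
    W, b = params
    z = W @ x + b
    log_probs = z - logsumexp(z, axis=0, keepdims=True)
    return -jnp.mean(jnp.sum(y * log_probs, axis=0))

# Toy setting: 2 "pixels", 2 classes, one sample on a column
params_toy = (jnp.array([[2.0, -1.0], [-1.0, 2.0]]), jnp.zeros((2, 1)))
x_toy = jnp.array([[0.6], [0.4]])
y_toy = jnp.array([[1.0], [0.0]])

grad_toy, x_hacked = adversarial_example(toy_loss, x_toy, y_toy, params_toy,
                                         step=0.5)
print(toy_loss(x_toy, y_toy, params_toy), toy_loss(x_hacked, y_toy, params_toy))
```

In the notebook, the same pattern would be applied with the cross-entropy metric and the trained `params`, choosing the multiple of the gradient small enough that the image still looks unchanged.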

In [None]:
gradient = ...
x_updated = ...
y_updated = ...

_, axs = plt.subplots(1,5,figsize=(20,4))
axs[0].imshow(x.reshape((28,28)), cmap = 'gray')
axs[0].set_title('original picture')
draw_bars(axs[1], y_pred[:,0], label)
axs[2].imshow(gradient.reshape((28,28)), cmap = 'gray')
axs[2].set_title('gradient')
axs[3].imshow(x_updated.reshape((28,28)), cmap = 'gray')
axs[3].set_title('hacked picture')
draw_bars(axs[4], y_updated[:,0], label)