# Classifying digits - a quick start with Keras in Jupyter

Jupyter notebooks provide a convenient way to quickly explore data, algorithms, and visualizations. We shall use them as hands-on tutorials for Keras and start right away with a classic application of neural networks: **classification of hand-written digits**.

## The dataset

We shall use the [MNIST dataset](http://yann.lecun.com/exdb/mnist/), a true classic that can easily be accessed through the deep learning library [Keras](https://keras.io) or the machine learning library [scikit-learn](https://scikit-learn.org).

In [None]:
from keras.datasets import mnist

In [None]:
help(mnist)

Let us load the dataset:

In [None]:
(x_train, y_train), (x_test, y_test) = mnist.load_data()

We obtain four *tensors*, that is, arrays of numbers:

In [None]:
type(x_train), type(y_train)

[NumPy](https://numpy.org) is the go-to library in Python for manipulating multi-dimensional arrays. For an overview of the class [`numpy.ndarray`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html), you can type `help(type(x_train))` or, much better, have a look at the [NumPy documentation](https://numpy.org/doc).

For example, we can find out the shape of our arrays as follows:

In [None]:
x_train.shape, y_train.shape, x_test.shape, y_test.shape

The array `x_train` consists of 60000 samples, each of which is a grayscale image of 28x28 pixels with integer values between 0 and 255. Which digit is the first sample?

In [None]:
x_train[0]

The corresponding digits are stored in `y_train`:

In [None]:
y_train[0]
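
To make the relation between the array shapes and indexing concrete without downloading anything, here is a minimal sketch on synthetic data. The names `fake_images` and `fake_labels` are hypothetical stand-ins for `x_train` and `y_train`; the sketch only assumes the same layout (samples first, then height and width) and the `uint8` pixel type.

```python
import numpy as np

# Hypothetical stand-in for x_train: 6 random "images" of 28x28 uint8 pixels
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(6, 28, 28), dtype=np.uint8)
# Hypothetical stand-in for y_train: one digit label per image
fake_labels = np.array([5, 0, 4, 1, 9, 2], dtype=np.uint8)

print(fake_images.shape)     # (number of samples, height, width)
print(fake_images[0].shape)  # indexing the first axis yields a single 2D image
print(fake_labels.shape)     # one label per sample
print(fake_images.dtype)     # pixel values are stored as 8-bit unsigned integers
```

Indexing along the first axis always selects a sample, which is why `x_train[0]` and `y_train[0]` belong together.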

We can draw our training digits using the venerable `matplotlib` library as follows:

In [None]:
import matplotlib.pyplot as plt

In [None]:
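# Show the first 15 training digits in a 3x5 grid
for i in range(15):
    plt.subplot(3, 5, 1 + i)
    # Invert the uint8 pixel values (0..255) to get dark digits on a light background
    plt.imshow(255 - x_train[i], cmap='gray')
    plt.xticks([])
    plt.yticks([])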
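
Since the pixels are 8-bit unsigned integers, inverting an image amounts to subtracting each value from the maximum, 255. The following sketch shows this on a tiny hypothetical array (`image` is illustrative, not MNIST data).

```python
import numpy as np

# Hypothetical 2x2 grayscale "image" with uint8 pixels (0 = black, 255 = white)
image = np.array([[0, 64], [200, 255]], dtype=np.uint8)

# Subtracting from 255 swaps dark and bright pixels and stays within uint8
inverted = 255 - image

print(inverted)
print(inverted.dtype)
```

An alternative that leaves the data untouched is to plot with matplotlib's reversed colormap, `cmap='gray_r'`.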

## Classification with a simple neural network

### Step 1: Build a simple neural network

We build a very simple neural network with just two densely connected layers, using the [`Sequential`](https://keras.io/getting-started/sequential-model-guide/) class of [Keras](https://keras.io).

In [None]:
from keras.models import Sequential
from keras.layers import Dense

model = Sequential([
    Dense(64, activation='relu', input_shape=(28*28,)),
    Dense(10, activation='softmax')])

model.summary()
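
The parameter counts reported by `model.summary()` can be checked by hand: a dense layer has one weight per pair of input and unit, plus one bias per unit. The helper `dense_params` below is a hypothetical name used only for this sketch.

```python
def dense_params(n_inputs, n_units):
    # One weight per (input, unit) pair plus one bias per unit
    return n_inputs * n_units + n_units

hidden = dense_params(28 * 28, 64)  # hidden layer: 784 inputs, 64 units
output = dense_params(64, 10)       # output layer: 64 inputs, 10 units

print(hidden, output, hidden + output)
```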

### Step 2: Prepare the input and the output

The **input** to our network will be the pixel values of an image, flattened into a vector of dimension $28 \cdot 28 = 784$. To reshape the training and test samples, we can use the `reshape` method as follows:

In [None]:
import numpy as np

In [None]:
x_train_flat = x_train.reshape((x_train.shape[0], -1))
x_test_flat = x_test.reshape((x_test.shape[0], -1))

x_train_flat.shape, x_test_flat.shape
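
In a call to `reshape`, the entry `-1` asks NumPy to infer that dimension from the total number of elements. A small sketch on a hypothetical array `batch` (two "images" of 3x3 pixels) shows that every image is flattened row by row:

```python
import numpy as np

# Hypothetical batch of 2 "images" with 3x3 pixels each
batch = np.arange(2 * 3 * 3).reshape((2, 3, 3))

# Keep the sample axis, let NumPy infer the remaining dimension (3 * 3 = 9)
flat = batch.reshape((batch.shape[0], -1))

print(flat.shape)
print(batch[0])  # the first image as a 3x3 grid
print(flat[0])   # the same pixels, concatenated row by row
```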

The `softmax` activation function in the last layer maps every 10-dimensional vector to a probability distribution $(p_0,\ldots,p_9)$. The idea is that $p_i$ is the probability that the input image shows the digit $i$. Accordingly, for training, we have to one-hot encode our labels, that is, replace each digit $i$ by the vector that is 1 in position $i$ and 0 elsewhere. This can conveniently be done using [NumPy](https://numpy.org) as follows:

In [None]:
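import numpy as np

# Row i of the 10x10 identity matrix is the one-hot vector for digit i
eye = np.eye(10)
# Indexing with the label array picks one row per label
y_train_ohe = eye[y_train]
y_test_ohe = eye[y_test]

y_train[:3], y_train_ohe[:3]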
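
To illustrate what the `softmax` activation computes, here is a minimal NumPy sketch. It is not the Keras implementation; the function name `softmax` and the example vector `logits` are chosen for illustration only.

```python
import numpy as np

def softmax(z):
    # Subtracting the maximum avoids overflow in exp and does not change the result
    shifted = z - np.max(z, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

# Hypothetical raw outputs of the last layer for one input image
logits = np.array([2.0, 1.0, 0.1, 0.0, -1.0, 0.5, 0.3, -0.2, 1.5, 0.7])
p = softmax(logits)

print(p)             # ten positive numbers
print(p.sum())       # they sum to 1 (up to rounding)
print(np.argmax(p))  # the largest input gets the largest probability
```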

### Step 3: Compile and train the network

Before we can train our `model`, we need to call its [`compile`](https://keras.io/getting-started/sequential-model-guide/) method.

In [None]:
help(model.compile)

Behind the scenes, this step creates a computation graph for training the network using gradient descent. We need to select

- a loss function which, for each training sample, measures the deviation of the predicted probability distribution from the true label,
- an optimizer that implements a gradient descent strategy.

Additionally, we can track one or more metrics, such as the accuracy.

In [None]:
model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adadelta')
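
The loss and the metric selected above can be sketched in plain NumPy. This is an illustration under simplifying assumptions, not the Keras code: the function names, the clipping constant `eps`, and the use of only three classes are choices made for this example.

```python
import numpy as np

def categorical_crossentropy(y_true_ohe, y_pred, eps=1e-7):
    # Clip to avoid log(0); eps is an illustrative choice
    y_pred = np.clip(y_pred, eps, 1.0)
    # With one-hot labels, only the log-probability of the true class survives
    return -np.sum(y_true_ohe * np.log(y_pred), axis=-1)

def accuracy(y_true_ohe, y_pred):
    # Fraction of samples whose most probable class equals the true class
    return np.mean(np.argmax(y_true_ohe, axis=-1) == np.argmax(y_pred, axis=-1))

# Two hypothetical samples with true classes 0 and 2
y_true = np.eye(3)[[0, 2]]
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.5, 0.3, 0.2]])

losses = categorical_crossentropy(y_true, y_pred)
print(losses)                    # per-sample loss: -log(0.7) and -log(0.2)
print(losses.mean())             # average loss over the batch
print(accuracy(y_true, y_pred))  # only the first sample is classified correctly
```

The loss is small when the model assigns high probability to the true class and grows as that probability approaches zero.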

Now we are ready to train the model using `model.fit`. To speed things up, we only use the first 5000 samples.

In [None]:
nr_samples = 5000
history = model.fit(
    x_train_flat[:nr_samples], y_train_ohe[:nr_samples],
    batch_size=32, validation_split=0.2, epochs=10)
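
The arguments of `fit` determine how much work one epoch involves. According to the Keras documentation, `validation_split` sets aside the given fraction of the samples, taken from the end of the arrays before shuffling, for validation. The arithmetic below is a sketch of the resulting sizes; the exact rounding used by Keras is an assumption here.

```python
import math

nr_samples = 5000
validation_split = 0.2
batch_size = 32

# Samples held out for validation and samples left for training
n_val = round(nr_samples * validation_split)
n_train = nr_samples - n_val

# Number of gradient updates (batches) per epoch
steps_per_epoch = math.ceil(n_train / batch_size)

print(n_train, n_val, steps_per_epoch)
```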

### Intermezzo - the training history

The [`fit`](https://keras.io/getting-started/sequential-model-guide/) method returns a `History` object with useful information on the training:

In [None]:
scores = history.history
scores

We can plot these scores using matplotlib:

In [None]:
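epochs = range(len(scores['loss']))

# Left panel: training and validation loss per epoch
plt.subplot(1, 2, 1)
plt.plot(epochs, scores['loss'], label='loss')
plt.plot(epochs, scores['val_loss'], label='val_loss')
plt.legend()
plt.grid()

# Right panel: training and validation accuracy per epoch
plt.subplot(1, 2, 2)
plt.plot(epochs, scores['accuracy'], label='accuracy')
plt.plot(epochs, scores['val_accuracy'], label='val_accuracy')
plt.legend()
plt.grid()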

A simpler way to plot this data is to use the [Pandas](https://pandas.pydata.org) library:

In [None]:
import pandas as pd

df = pd.DataFrame(history.history)
df

In [None]:
ax = df[['loss', 'val_loss']].plot.line()

In [None]:
ax = df[['accuracy', 'val_accuracy']].plot.line()

### Step 4: Use the model for prediction

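We can now make predictions with the model's `predict` method. Note that `predict` expects a batch of inputs, so we pass the slice `x_test_flat[0:1]` rather than a single vector. Let us input the first test digit: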

In [None]:
plt.imshow(x_test[0], cmap='gray')

In [None]:
prediction = model.predict(x_test_flat[0:1])
prediction

Remember that the model outputs a probability distribution, where the $i$-th component represents the confidence of the model that the input represents the digit $i$. So, the predicted digit in this case is...

In [None]:
np.argmax(prediction)
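
For a batch of several inputs, the model output has one row per input, and the predicted digits are obtained by taking the `argmax` along the class axis. The sketch below uses a hypothetical output array `predictions` instead of a trained model.

```python
import numpy as np

# Hypothetical model output for a batch of two inputs; each row sums to 1
predictions = np.array([
    [0.01, 0.02, 0.02, 0.05, 0.05, 0.05, 0.1, 0.6, 0.05, 0.05],
    [0.7, 0.05, 0.05, 0.05, 0.05, 0.02, 0.02, 0.02, 0.02, 0.02],
])

# Index of the largest probability in each row is the predicted digit
predicted_digits = np.argmax(predictions, axis=1)
# The largest probability itself indicates how confident the model is
confidences = np.max(predictions, axis=1)

print(predicted_digits)
print(confidences)
```

Without the `axis` argument, `np.argmax` works on the flattened array, which gives the intended digit only when the batch contains a single row, as in the cell above.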