# Simple Neural Network to Recognize Handwritten Digits

In this lab we will use supervised learning to train a simple neural network to recognize digits. Most of the code is taken from this post: https://machinelearningmastery.com/handwritten-digit-recognition-using-convolutional-neural-networks-python-keras/, which also contains advanced improvements on the network for those interested. 

The training setup looks like this:

![Model training setup](model.jpg)

## Setting the Stage

### Importing Useful Libraries

We start by importing some useful libraries: numpy for matrices, keras for machine learning and matplotlib for visualizing data. We also make matplotlib plots appear 'inline', i.e. in the notebook, along with the rest of the content.

In [None]:
%matplotlib inline
import math
import numpy
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import np_utils
import matplotlib.pyplot as plt
import matplotlib.image as img

### Initialize Training and Validation Data

Download the MNIST dataset (if needed) and split it into training data and validation data. The 'X' arrays hold the input images and the 'y' vectors hold the actual outputs, i.e. the labels.

In [None]:
(X_train_raw, y_train_raw), (X_test_raw, y_test_raw) = mnist.load_data()

print("X_train_raw shape:", X_train_raw.shape)
print("y_train_raw shape:", y_train_raw.shape)
print("X_test_raw shape:", X_test_raw.shape)
print("y_test_raw shape:", y_test_raw.shape)

# Some useful constants
NUM_TRAINING_SAMPLES = X_train_raw.shape[0]
# Image arrays are indexed (sample, row, column), so axis 1 is the height and axis 2 the width
IMAGE_HEIGHT = X_train_raw.shape[1]
IMAGE_WIDTH = X_train_raw.shape[2]
NUM_PIXELS_PER_IMAGE = IMAGE_WIDTH * IMAGE_HEIGHT

### Some useful functions for printing sample digits

These functions are useful for visualizing training data. We'll use them later.

In [None]:
def print_labeled_image(image, digit, **kwargs):
    title_options = {'fontsize': 16, 'fontweight': 'bold', 'verticalalignment': 'bottom'}
    plt.imshow(image, cmap=plt.get_cmap('gray'), **kwargs)
    plt.title('Sample - ' + str(digit) + ':', fontdict=title_options, loc='left')

def print_samples(images, digits, number_of_samples, offset = 0, **kwargs):
    number_of_columns = 2
    number_of_rows = math.ceil(number_of_samples / number_of_columns)
    plt.figure(figsize = (number_of_columns * 4, number_of_rows * 4))
    for i in range(number_of_samples):
        sample_index = offset + i
        plt.subplot(number_of_rows, number_of_columns, i + 1)
        print_labeled_image(images[sample_index], digits[sample_index], **kwargs)
    plt.show()

def print_flattened_samples(flattened_images, digits, number_of_samples, offset = 0):
    number_of_columns = 1
    number_of_rows = number_of_samples
    plt.figure(figsize = (number_of_columns * 10, number_of_rows * 1.5))
    for i in range(number_of_samples):
        plt.subplot(number_of_rows, number_of_columns, i + 1)
        print_labeled_image([flattened_images[offset + i]], digits[offset + i], extent=(0, 784, 0, 50), aspect='equal')
    plt.show()

## Preparing Training Data

Let's first have a look at the input and output data in its raw shape. We print some samples just to show what it looks like.

In [None]:
print_samples(X_train_raw, y_train_raw, 4)

### Preparing Input Data (X)

We convert each image to a vector of length 784, each value representing the "whiteness" of a pixel. This is necessary for feeding the data into the training process.

In [None]:
# flatten 28*28 images to a 784 vector for each image
X_train_flattened = X_train_raw.reshape(X_train_raw.shape[0], NUM_PIXELS_PER_IMAGE).astype('float32')
X_test_flattened = X_test_raw.reshape(X_test_raw.shape[0], NUM_PIXELS_PER_IMAGE).astype('float32')
print("X_train_flattened shape: ", X_train_flattened.shape)
print("X_test_flattened shape: ", X_test_flattened.shape)

# normalize inputs from 0-255 to 0-1
X_train = X_train_flattened / 255
X_test = X_test_flattened / 255

print_flattened_samples(X_train, y_train_raw, 4)
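
The flatten-and-normalize step is easier to follow on a tiny example. The sketch below uses two made-up 2x3 "images" instead of MNIST; the names `toy_images`, `toy_flattened` and `toy_normalized` are only for illustration.

```python
import numpy

# Two hypothetical 2x3 "images" with pixel values in the 0-255 range (toy data, not MNIST).
toy_images = numpy.array([
    [[0, 51, 102], [153, 204, 255]],
    [[255, 255, 255], [0, 0, 0]],
], dtype='uint8')

# Flatten each image row by row into one vector per image, then convert to float.
toy_flattened = toy_images.reshape(toy_images.shape[0], -1).astype('float32')

# Scale the values from the 0-255 range down to the 0-1 range.
toy_normalized = toy_flattened / 255

print("flattened shape:", toy_flattened.shape)
print("first image, normalized:", toy_normalized[0])
```

Each image becomes one row, and the pixel order within a row follows the original image row by row, which is why the flattened MNIST samples plot as long horizontal strips.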

### Preparing Output Data (y)

The output (y) data also needs to be converted. The output of our neural network will be a vector of length 10, each value representing the probability of the input sample being an image of that digit.

To make it easier to compare that output to the actual y values, we convert each label into a vector representing a 100% probability that the sample is an image of that digit.

__Example:__ The output value __7__ is converted into the vector __[0, 0, 0, 0, 0, 0, 0, 1, 0, 0]__

This is also known as _one-hot encoding_.

In [None]:
y_train = np_utils.to_categorical(y_train_raw)
y_test = np_utils.to_categorical(y_test_raw)

NUM_CLASSES = y_test.shape[1]

print("y_train shape: ", y_train.shape)
print("Example output transformation : %d => %s" % (y_train_raw[0], str(y_train[0])))
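
To see what the encoding does without relying on Keras, here is a minimal numpy sketch. The helper `one_hot` is hypothetical and written only for this illustration; it is not the Keras implementation.

```python
import numpy

def one_hot(labels, num_classes):
    """Return one row per label, with 1.0 at the label's index and 0.0 elsewhere."""
    labels = numpy.asarray(labels)
    encoded = numpy.zeros((labels.shape[0], num_classes), dtype='float32')
    # For row i, set the column given by labels[i] to 1.
    encoded[numpy.arange(labels.shape[0]), labels] = 1.0
    return encoded

toy_labels = [7, 0, 3]
toy_encoded = one_hot(toy_labels, 10)
print(toy_encoded[0])

# Taking the argmax of each row recovers the original labels.
print(numpy.argmax(toy_encoded, axis=1))
```

The last line shows that the encoding loses no information, which is also how a predicted probability vector is turned back into a digit later on.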

## Creating and Training a Neural Network

Finally we get to the fun part: The neural networking! We'll rely heavily on the Keras library which makes it very simple to create and train the model.

First we define a function that creates a neural network architecture with one hidden layer and initializes its weights to random numbers. NUM_PIXELS_PER_IMAGE is 784 = 28x28. NUM_CLASSES is 10, one for each digit from 0 to 9. The network architecture looks like this:

![Neural network architecture](network.jpg)

Then we build the model and specify what optimization algorithm and cost function we'd like to use when training.

In [None]:
def create_model():
    # define architecture
    model = Sequential()
    model.add(Dense(NUM_PIXELS_PER_IMAGE, input_dim=NUM_PIXELS_PER_IMAGE, kernel_initializer='normal', activation='relu'))
    model.add(Dense(NUM_CLASSES, kernel_initializer='normal', activation='softmax'))
    # Build model with a cost function (loss) and an optimizer.
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
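
For those who want to see what such a network computes, the following numpy sketch reproduces the forward pass (a ReLU hidden layer followed by a softmax output layer) and the categorical cross-entropy loss on toy-sized data. It is an illustration under simplifying assumptions: the function names and the tiny layer sizes are made up here, and Keras' internal implementation differs in detail.

```python
import numpy

def relu(z):
    # Negative values become 0, positive values pass through unchanged.
    return numpy.maximum(z, 0)

def softmax(z):
    # Subtract each row's maximum first for numerical stability.
    exps = numpy.exp(z - numpy.max(z, axis=1, keepdims=True))
    return exps / numpy.sum(exps, axis=1, keepdims=True)

def forward(x, w_hidden, b_hidden, w_out, b_out):
    hidden = relu(x @ w_hidden + b_hidden)
    return softmax(hidden @ w_out + b_out)

def categorical_crossentropy(y_true, y_pred):
    # Mean over samples of -sum(y_true * log(y_pred)); clip to avoid log(0).
    y_pred = numpy.clip(y_pred, 1e-7, 1.0)
    return float(numpy.mean(-numpy.sum(y_true * numpy.log(y_pred), axis=1)))

# Toy sizes: 2 samples, 4 inputs, 4 hidden units, 3 classes.
rng = numpy.random.default_rng(0)
toy_x = rng.random((2, 4))
toy_probabilities = forward(
    toy_x,
    rng.normal(scale=0.05, size=(4, 4)), numpy.zeros(4),
    rng.normal(scale=0.05, size=(4, 3)), numpy.zeros(3),
)
print("each row sums to:", toy_probabilities.sum(axis=1))
```

Because of the softmax, every output row is a set of non-negative values that sum to 1, so it can be read as a probability for each class.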

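This function trains a given model on the given training data and prints the resulting error rate on the test data. The training data is divided into batches of 200 samples each, and we train on them until all samples have been seen 10 times (10 epochs). Note that the function reads the validation data (`X_test`, `y_test`) from the notebook's global variables.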

In [None]:
def train_model(model, X_train, y_train):
    # Train the model
    model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=200, verbose=2)
    # Final evaluation of the model
    scores = model.evaluate(X_test, y_test, verbose=0)
    print("Baseline Error: %.2f%%" % (100-scores[1]*100))
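
As a quick back-of-the-envelope check of what these settings mean, the sketch below counts the number of weight updates, assuming one update per batch and the 60,000 training images of MNIST. The helper `count_weight_updates` is hypothetical.

```python
import math

def count_weight_updates(num_samples, batch_size, epochs):
    """Return (batches per epoch, total number of batches over all epochs)."""
    # A final, smaller batch still counts as a batch, hence the ceil.
    batches_per_epoch = math.ceil(num_samples / batch_size)
    return batches_per_epoch, batches_per_epoch * epochs

batches_per_epoch, total_updates = count_weight_updates(60000, 200, 10)
print(batches_per_epoch, total_updates)  # 300 3000
```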

Finally, we create an untrained model and then train it. Once this is done, it should be able to recognize handwritten digits reasonably well.

In [None]:
model = create_model()
train_model(model, X_train, y_train)

## Evaluating the Trained Model

We start with a utility function to read an image file, assumed to be a 28x28 black-and-white .png image containing one hand-written digit.

In [None]:
def plot_image(image):
    plt.imshow(image, cmap=plt.get_cmap('gray'))
    plt.show()
    
def read_digit(filename, show_image=False):
    fromfile = img.imread(filename)
    grayscale_image = fromfile[:,:,0] # Keep only the first (red) channel; enough for a black-and-white image
    if show_image:
        plot_image(grayscale_image)
   
    return grayscale_image.reshape(784)
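
`read_digit` assumes the file decodes to a 28x28 array with colour channels. If your own images may come in other forms, a more defensive variant could look like the sketch below. The helper `to_input_vector` is hypothetical and works on an already decoded array, so it can be tried without any image file.

```python
import numpy

def to_input_vector(image, expected_shape=(28, 28)):
    """Turn a decoded image array into a flat vector suitable as network input."""
    image = numpy.asarray(image, dtype='float32')
    if image.ndim == 3:
        # Colour (or colour + alpha) image: keep the first channel, as read_digit does.
        image = image[:, :, 0]
    if image.shape != expected_shape:
        raise ValueError("expected a %dx%d image, got %s" % (expected_shape + (image.shape,)))
    return image.reshape(-1)

# Works for both a colour + alpha array and a plain grayscale array.
print(to_input_vector(numpy.zeros((28, 28, 4))).shape)
print(to_input_vector(numpy.ones((28, 28))).shape)
```

Raising an error for an unexpected size is a design choice: a silently reshaped image of the wrong size would otherwise produce a confusing failure later, inside the prediction call.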

Then we define a function to manually test the model with a handwritten digit of your own.

In [None]:
def evaluate_with_image(model, filename):
    test_digit = read_digit(filename, show_image=True)
    # let the model predict what digit this is and visualize the result
    predicted = model.predict(numpy.array([test_digit]), batch_size=1)
    plt.bar([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], predicted[0], align='center')
    plt.show()
    print("Most likely:", numpy.argmax(predicted))
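
Besides the single most likely digit, it can be informative to look at the runners-up, for example when the model confuses a 7 with a 1. The sketch below ranks a made-up prediction vector; `toy_prediction` is invented for illustration and is not actual model output.

```python
import numpy

# A hypothetical prediction for one image: one probability per digit 0-9.
toy_prediction = numpy.array([[0.01, 0.02, 0.05, 0.03, 0.12, 0.04, 0.02, 0.60, 0.03, 0.08]])

# argsort sorts ascending, so reverse it to get the most likely digits first.
ranked_digits = numpy.argsort(toy_prediction[0])[::-1]
top_three = [(int(digit), float(toy_prediction[0][digit])) for digit in ranked_digits[:3]]

for digit, probability in top_three:
    print("digit %d: %.0f%%" % (digit, probability * 100))
```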

And finally we use the evaluation function to test the model. Try it with a few samples of your own!

In [None]:
evaluate_with_image(model, "../drawingboard/digit.png")

## Lab Exercise

### Your task: train a new model to recognize sevens with a horizontal bar.

#### Hints:
1. You need to add training data containing sevens with horizontal bars, both input (X_train) and output (y_train)
2. There are 5 files containing handwritten sevens in "../sevens/digit{1-5}.png"

#### Some useful functions:

`[expr(i) for i in range(MAX)]` <-- creates a list with MAX elements, holding the values expr(i) for i from 0 to MAX - 1, e.g.

`[i*i for i in range(4)]` => `[0, 1, 4, 9]`

`a = [1, 2, 3]; a.append(4)` => `a` is now `[1, 2, 3, 4]` (`append` modifies the list in place and returns `None`)

`numpy.append(numpy.array([[1, 2],[3, 4]]), [[5, 6]], axis=0)` => `array([[1, 2], [3, 4], [5, 6]])`
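
The snippet below tries these functions out on toy-sized arrays, so you can check the resulting shapes before working with the real training data. All names and sizes are made up for illustration (3 samples with 4 "pixels" each instead of 784).

```python
import numpy

# Toy stand-ins for the training data: 3 samples of 4 "pixels", one-hot labels 1, 2, 3.
toy_X = numpy.zeros((3, 4), dtype='float32')
toy_y = numpy.eye(10, dtype='float32')[[1, 2, 3]]

# Build two extra samples and their one-hot labels (digit 7) with list comprehensions.
extra_samples = [numpy.full(4, 0.5, dtype='float32') for i in range(2)]
extra_labels = [numpy.eye(10, dtype='float32')[7] for i in range(2)]

# numpy.append returns a new array; axis=0 adds the new rows at the end.
toy_X_extended = numpy.append(toy_X, extra_samples, axis=0)
toy_y_extended = numpy.append(toy_y, extra_labels, axis=0)

print(toy_X_extended.shape, toy_y_extended.shape)
```

Note that inputs and labels must be extended in the same order, so that row i of the labels still describes row i of the inputs.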

In [None]:
# This function might be useful
def read_seven(sequence_number):
    return read_digit("../sevens/digit" + str(sequence_number) + ".png")

# extend the training data set with a large set of sevens
# 
# ... input images
# X_train_with_sevens = < your code here >
#
# ... and output digits
# y_train_with_sevens = < your code here >
#
# Then initialize and train the model


# Finally evaluate the result
evaluate_with_image(model, "../drawingboard/digit.png")

## Appendix: Some utilities for the curious

Use `describe_model()` if you're curious about the representation and current state (weights) of the neural network.

In [None]:
def print_weights(layer):
    weights = layer.get_weights()
    input_weights = weights[0]
    bias_vector = weights[1]
    print("input weights shape:", numpy.array(input_weights).shape)
    print("bias vector shape:", numpy.array(bias_vector).shape)
    print("input weights first neuron:", input_weights[:, 0])  # column 0 holds the first neuron's incoming weights
    print("bias vector:", bias_vector)

    
def describe_model(model):
    for layer in model.layers:
        print("units:", layer.units)
        print("input:", layer.input)
        print("output:", layer.output)
        print_weights(layer)
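
The shapes printed by `describe_model` also tell you how many trainable numbers the network contains. The sketch below does the arithmetic for the architecture used in this lab (784 inputs, 784 hidden units, 10 outputs); the helper `dense_parameter_count` is hypothetical.

```python
def dense_parameter_count(num_inputs, num_units):
    """One weight per (input, unit) pair plus one bias per unit."""
    return num_inputs * num_units + num_units

hidden_parameters = dense_parameter_count(784, 784)
output_parameters = dense_parameter_count(784, 10)
total_parameters = hidden_parameters + output_parameters

print(hidden_parameters, output_parameters, total_parameters)  # 615440 7850 623290
```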