# Neural Networks in Keras with MNIST

**Time**
- Teaching: 2.5 hours
- Challenges: 30 minutes

**Questions**
- "How do we load and manipulate input data for deep learning?"
- "How can we use keras to design custom neural networks?"
- "How do we validate our models?"
- "How do we decide neural network architectures that perform best on MNIST?"
- "How do we compare our models and test their ability to generalize?"


**Learning Objectives**
- "Understand the data format of inputs to neural networks."
- "The ability to implement, troubleshoot, and modify your own neural networks."

* * * * *

## Install packages

If your browser's URL is currently:

`https://dlab.datahub.berkeley.edu/user/YOURNAME`

Then you are good to go! You are in the dlab's datahub, which already has the packages needed to get started, along with 2GB of memory.

If you want to work locally you need to:
1. Make sure you have a working version of Python
2. Install Keras
3. Install Jupyter

TODO: Add specific windows/mac instructions and test on system.
- My preference is to do this in a venv/conda, but that may create too much technical overhead :(

## Import packages
After installation is complete, we can import the keras library. To simplify the syntax, we will import the specific functions we need from keras.

Note that you can instead import keras and call each function from the module, for example:

`from tensorflow import keras`

`keras.datasets.mnist.load_data()`

In [None]:
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.utils import to_categorical

# Note you may get a warning about CUDA and GPU set up
# You can ignore these for now

We also need to import packages to help us with visualizing and manipulating our data. Remember, this is half the battle!

In [None]:
import numpy as np
import matplotlib.pyplot as plt

### Keras and MNIST

Keras, TensorFlow, and MNIST, oh my!

#### Keras
A deep-learning framework with a user-friendly API, built for researchers to quickly prototype models. It was originally developed by François Chollet at Google.

#### TensorFlow
A backend engine for Keras.

#### MNIST
A dataset of 60,000 training images and 10,000 test images of handwritten digits. It is often considered the "hello world" of deep learning.

#### Problem Statement

You work for a bank and they need to automate the processing of reading mobile check deposits. 

At the moment they have overworked staff looking at each photo of the checks and manually inputting the check number, account number, and amount of money.

Can we make their life easier!?

<img src="https://www.usglobalmail.com/wp-content/uploads/2016/12/check-deposits.png" width="600" />

#### Our Solution

We will build a model that can correctly identify numbers from a picture.

As a first pass, we want to build a model that can take as input an image of a single handwritten digit, and predict the correct target digit.

Let's dive right in and build a neural network model with keras that is able to identify handwritten digits!

## Input Data 

They say 80% of a data scientist's work is data cleaning.

Getting data in the right format is a non-trivial and critical task when building deep learning models!

### Reading in the data

In [None]:
def get_mnist_data(subset=True):
    """
    Returns the MNIST dataset as a tuple:
    (x_train, y_train, x_val, y_val, x_test, y_test)
    
    When subset=True:
    Returns only a subset of the MNIST dataset.
    Especially important to use if you are on datahub and only have 1-2GB of memory.
    """
    
    if subset:
        N_TRAIN = 5000
        N_VALIDATION = 1000
        N_TEST = 1000
    else:
        N_TRAIN = 48000
        N_VALIDATION = 12000
        N_TEST = 10000
    
    (x_train_and_val, y_train_and_val), (x_test, y_test) = mnist.load_data()
    
    x_train = x_train_and_val[:N_TRAIN,:,:]
    y_train = y_train_and_val[:N_TRAIN]
    
    x_val = x_train_and_val[N_TRAIN: N_TRAIN + N_VALIDATION,:,:]
    y_val = y_train_and_val[N_TRAIN: N_TRAIN + N_VALIDATION]
    
    x_test = x_test[:N_TEST]
    y_test = y_test[:N_TEST]
    
    return x_train, y_train, x_val, y_val, x_test, y_test
    
    

In [None]:
# Load the data
# Set subset=False if you want to use the full dataset!
# Note that this will require 2+ GB of memory and will make training take longer

x_train, y_train, x_val, y_val, x_test, y_test = get_mnist_data(subset=True)

### Understanding the data

In [None]:
def data_summary(data):
    """
    Takes a list of our data partitions and prints the shape of each one.
    """
    
    for i, data_partition in enumerate(data):
        if i == 0:
            print("Training Data")
        elif i == 2:
            print()
            print("Validation Data")
        elif i == 4:
            print()
            print("Testing Data")

        print(f"Shape: {data_partition.shape}")
    
    

In [None]:
data_summary([x_train, y_train, x_val, y_val, x_test, y_test])

## Challenge 1: Understanding the Input Data

1. Why do we split our data into train, validation, and test sets?
2. What is the shape of our input data partitions?
3. What is the type of the data?

**BONUS:**

4. What is the distribution of the target classes within the data, is it balanced?

### Understanding and visualizing the images

Let's extract just one example from the training data

In [None]:
one_image = x_train[0]

Let's look inside and see how it's stored!

In [None]:
one_image

It's not very helpful to see the data in its raw format.

Instead, let's utilize the shape attribute and matplotlib to help us visualize this image data!

In [None]:
one_image.shape

In [None]:
plt.imshow(one_image, cmap=plt.cm.binary);

Which dimension represents rows and which represents columns in our image?

Let's find out through a test by grabbing all of the first dimension and half of the second.

If the image is cut off column-wise, we know the dimensions are \[ row pixel, column pixel \]

In [None]:
one_image_first_dimension = one_image[:,0:14]

In [None]:
plt.imshow(one_image_first_dimension, cmap=plt.cm.binary);

So now we feel solid with our input data format!

The input data has 3 dimensions \[ index for image, pixel row, pixel column \]
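
To make that layout concrete, here is a minimal sketch that uses a small made-up array in place of MNIST. The name `toy_images` and its 2 x 3 x 4 shape are illustrative assumptions, chosen only so the slices are easy to read.

```python
import numpy as np

# Hypothetical stand-in for x_train: 2 "images", each 3 rows by 4 columns
toy_images = np.arange(2 * 3 * 4).reshape(2, 3, 4)

first_image = toy_images[0]          # axis 0 selects an image -> shape (3, 4)
top_row = toy_images[0, 0, :]        # axis 1 selects a pixel row -> shape (4,)
left_column = toy_images[0, :, 0]    # axis 2 selects a pixel column -> shape (3,)
left_half = toy_images[0, :, 0:2]    # all rows, first half of the columns

print(first_image.shape, top_row.shape, left_column.shape, left_half.shape)
```

The last slice mirrors the `one_image[:,0:14]` test: it keeps every row and only the left half of the columns.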

### Building a Neural Network in Keras

First we define our neural network architecture.

In [None]:
first_network = Sequential()
first_network.add(Dense(64, activation= "relu", input_shape=(28*28,)))
first_network.add(Dense(10, activation="softmax"))

We define a simple neural network with a single "hidden" layer, not very *deep*.
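
As a rough illustration of what these two `Dense` layers compute, the sketch below reproduces the forward pass in plain NumPy. The weights are random placeholders and the names `W1`, `b1`, `W2`, `b2`, and `batch` are hypothetical; Keras creates and trains its own weights, so this only shows the shape of the computation, not the trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtract the row-wise max for numerical stability
    shifted = z - z.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Hypothetical random weights with the same shapes as the two Dense layers
W1, b1 = rng.normal(scale=0.05, size=(784, 64)), np.zeros(64)
W2, b2 = rng.normal(scale=0.05, size=(64, 10)), np.zeros(10)

batch = rng.random((5, 784))                 # 5 fake flattened images in [0, 1)
hidden = relu(batch @ W1 + b1)               # hidden layer output: shape (5, 64)
probabilities = softmax(hidden @ W2 + b2)    # output layer: shape (5, 10)

print(probabilities.shape, probabilities.sum(axis=1))
```

Each row of `probabilities` sums to 1, which is why the softmax output can be read as one probability per digit class.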

We can check out a summary of our model layout with the method `model_object.summary()`

In [None]:
first_network.summary()

Even this TINY neural network has over 50,000 parameters!
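
The parameter count can be checked by hand: each `Dense` layer has one weight per input-unit pair plus one bias per unit. The helper `dense_params` below is a hypothetical name used only for this arithmetic sketch.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

hidden_params = dense_params(28 * 28, 64)   # 784 * 64 + 64
output_params = dense_params(64, 10)        # 64 * 10 + 10
total_params = hidden_params + output_params

print(hidden_params, output_params, total_params)  # 50240 650 50890
```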

Next we need to give the model:
1. An optimizer strategy
2. A way to calculate loss
3. The metric we want to get out during training.

In [None]:
first_network.compile(optimizer = 'rmsprop', 
                     loss = 'categorical_crossentropy',
                     metrics = ['accuracy'])
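
To give a feel for what the `categorical_crossentropy` loss measures, here is a minimal NumPy sketch on made-up predictions. The function below is a simplified stand-in written for this example, so Keras's own implementation may differ in details such as clipping; the arrays `confident` and `unsure` are hypothetical.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_pred))."""
    y_pred = np.clip(y_pred, eps, 1.0)   # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

# Two hypothetical samples with 3 classes (one-hot targets)
y_true = np.array([[0, 1, 0],
                   [1, 0, 0]])
confident = np.array([[0.05, 0.90, 0.05],
                      [0.80, 0.10, 0.10]])
unsure = np.array([[0.40, 0.30, 0.30],
                   [0.30, 0.40, 0.30]])

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)
print(loss_confident, loss_unsure)
```

The loss is lower when the model puts more probability on the correct class, which is the quantity the optimizer tries to reduce during training.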

Lastly, we train the model using the `model_name.fit()` method

If you ran the code below, you would find that the model fails to train, because the data is not yet in the format the network expects. I suggest not running it, to save memory!

In [None]:
#history = first_network.fit(x_train, y_train, epochs=5, batch_size=128, validation_data=(x_val, y_val))

Remember that half the battle with model training is understanding what format we need the data in!

We won't go over this in too much detail, but in a nutshell we will:
1. Flatten the pixel dimensions from (28,28) into a single dimension (784)
2. Change the x data type from integers with pixel values \[0, 255\] to float values \[0, 1\].
3. Expand the y targets from a single dimension with values \[0:9\] to 10 dimensions each representing a target class. 
    - In each row, the target class column will have a value of 1, while the other columns will have a value of 0
    - This is a common technique to reformat categorical targets known as [One Hot Encoding](https://machinelearningmastery.com/why-one-hot-encode-data-in-machine-learning/).
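
The three steps above can be sketched on a tiny made-up dataset. The arrays `toy_x` and `toy_y` are hypothetical, and indexing `np.eye` stands in for `to_categorical` here purely for illustration.

```python
import numpy as np

# Hypothetical toy data: 2 "images" of 2x2 pixels with integer values in [0, 255]
toy_x = np.array([[[0, 255], [128, 64]],
                  [[255, 255], [0, 0]]], dtype=np.uint8)
toy_y = np.array([2, 0])   # integer class labels
n_classes = 3

# 1. Flatten each image into a single vector
flat = toy_x.reshape(toy_x.shape[0], -1)      # shape (2, 4)

# 2. Convert to float and scale into [0, 1]
scaled = flat.astype("float32") / 255

# 3. One-hot encode the labels by picking rows of an identity matrix
one_hot = np.eye(n_classes, dtype="float32")[toy_y]

print(scaled)
print(one_hot)
```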
 

In [None]:
def transform_data(xdata, ydata):
    """
    Takes lists of the x and y data partitions (train, val, test).
    Transforms image data:
        1. Flattens pixel dimensions from 2 -> 1
        2. Scales pixel values between [0,1]
    One-hot encodes the targets.
    """
    
    x = {}
    for name, partition in zip(["x_train", "x_val", "x_test"],xdata):
        flatten = partition.reshape((partition.shape[0], 28 * 28))
        scaled = flatten.astype('float32') / 255
        x[name] = scaled
    
    y = {}
    for name, partition in zip(["y_train", "y_val", "y_test"],ydata):
        y[name] = to_categorical(partition)
    
    return x['x_train'], y['y_train'], x['x_val'], y['y_val'], x['x_test'], y['y_test']

In [None]:
x_train_trans, y_train_trans, x_val_trans, y_val_trans, x_test_trans, y_test_trans = transform_data([x_train, x_val, x_test],
                                                                                                    [y_train, y_val, y_test])

In [None]:
data_summary([x_train_trans, y_train_trans, x_val_trans, y_val_trans, x_test_trans, y_test_trans])

We have successfully flattened our input images!

In [None]:
history = first_network.fit(x_train_trans, 
                            y_train_trans, 
                            epochs=5, 
                            batch_size=128, 
                            validation_data=(x_val_trans, y_val_trans), )
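
As a rough sketch of what `epochs` and `batch_size` imply for the call above, the arithmetic below counts the weight updates, assuming the 5,000-image training subset and that the final, smaller batch is still used.

```python
import math

n_train = 5000      # size of the subset training partition
batch_size = 128
epochs = 5

steps_per_epoch = math.ceil(n_train / batch_size)   # the last batch is smaller
last_batch_size = n_train - (steps_per_epoch - 1) * batch_size
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, last_batch_size, total_updates)  # 40 8 200
```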

### Visualizing the accuracy

In [None]:
def plot_epoch_accuracy(history_dict):
    """
    Plots the training and validation accuracy of a neural network.
    """
    
    acc = history_dict['accuracy']
    val_acc = history_dict['val_accuracy']
    epochs = range(1, len(acc) + 1)
    plt.plot(epochs, acc, color = 'navy', alpha = 0.8, label='Training Accuracy')
    plt.plot(epochs, val_acc, color = 'green', label='Validation Accuracy')
    plt.title('Training and validation Accuracy')
    plt.xlabel('Epochs')
    plt.ylabel('Accuracy')
    plt.legend()
    return plt.show()

In [None]:
plot_epoch_accuracy(history.history)

Not too shabby for a *tiny* neural network!



In [None]:
second_network = Sequential()
second_network.add(Dense(512, activation= "relu", input_shape=(28*28,)))
second_network.add(Dense(10, activation="softmax"))

In [None]:
second_network.compile(optimizer = 'rmsprop', 
                     loss = 'categorical_crossentropy',
                     metrics = ['accuracy'])

In [None]:
history_two = second_network.fit(x_train_trans, 
                            y_train_trans, 
                            epochs=5, 
                            batch_size=128, 
                            validation_data=(x_val_trans, y_val_trans))

In [None]:
plot_epoch_accuracy(history_two.history)

How are you doing now?

Let's keep training this model for a few more epochs.

In [None]:
history_two_more_epochs = second_network.fit(x_train_trans, 
                            y_train_trans, 
                            epochs=10, 
                            batch_size=128, 
                            validation_data=(x_val_trans, y_val_trans))

In [None]:
def combined_epoch_plot(history_dicts):
    """
    Combines two history dictionaries for extended epochs.
    """
    
    combined_history = {key: [] for key in history_dicts[0].keys()}
    
    for hist in history_dicts:
        for key in combined_history.keys():
            combined_history[key].extend(hist[key])
    
    return plot_epoch_accuracy(combined_history)
        
        

In [None]:
combined_epoch_plot([history_two.history, history_two_more_epochs.history])

Diagnosing our accuracy plot, it looks like even with more epochs, our model is *still* failing to generalize. 

The Dunning-Kruger effect in deep learning.

One solution is to give our network a second hidden layer to extract more features!

In [None]:
third_network = Sequential()
third_network.add(Dense(512, activation= "relu", input_shape=(28*28,)))
third_network.add(Dense(512, activation= "relu"))
third_network.add(Dense(10, activation="softmax"))

In [None]:
third_network.compile(optimizer = 'rmsprop', 
                     loss = 'categorical_crossentropy',
                     metrics = ['accuracy'])

In [None]:
history_two_layers = third_network.fit(x_train_trans, 
                            y_train_trans, 
                            epochs=10, 
                            batch_size=128, 
                            validation_data=(x_val_trans, y_val_trans))

In [None]:
plot_epoch_accuracy(history_two_layers.history)

Our model is still failing to generalize and reach validation accuracy that approaches the training accuracy!

One way we can prevent our model from overfitting to the training data is to add dropout in our hidden layers.

### TLDR for Dropout
Dropout makes our network a bit "forgetful" during training. We choose the proportion of neurons the network will randomly "forget" or "drop", which we pass as a probability when adding the layer: `model.add(Dropout(probability_here))`.

This forces our model to generalize, preventing it from overfitting to the training data. 
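
For intuition, here is a minimal NumPy sketch of the "inverted dropout" idea: during training a random fraction of activations is zeroed and the survivors are scaled up so the expected total stays the same, while at inference nothing is dropped. The function `dropout_forward` and the all-ones `activations` array are hypothetical and written only for this example.

```python
import numpy as np

def dropout_forward(activations, rate, rng, training=True):
    """Inverted dropout: zero a random fraction `rate` of units, rescale the rest."""
    if not training or rate == 0.0:
        return activations
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(42)
activations = np.ones((1000, 512))   # hypothetical hidden-layer outputs

dropped = dropout_forward(activations, rate=0.5, rng=rng)
fraction_zeroed = float(np.mean(dropped == 0))
unchanged = dropout_forward(activations, rate=0.5, rng=rng, training=False)

print(fraction_zeroed, dropped.mean())
```

With a rate of 0.5, roughly half of the values become 0 and the rest become 2, so the average activation stays close to 1.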

Check out [Hinton et al. 2012](https://arxiv.org/abs/1207.0580) describing how dropout helps neural network generalization.
- *Fun fact: they also use MNIST to test model generalization :)*

In [None]:
fourth_network = Sequential()
fourth_network.add(Dense(512, activation= "relu", input_shape=(28*28,)))
fourth_network.add(Dropout(0.5))
fourth_network.add(Dense(512, activation= "relu"))
fourth_network.add(Dropout(0.5))
fourth_network.add(Dense(10, activation="softmax"))

In [None]:
fourth_network.compile(optimizer = 'rmsprop', 
                     loss = 'categorical_crossentropy',
                     metrics = ['accuracy'])

In [None]:
history_dropout = fourth_network.fit(x_train_trans, 
                            y_train_trans, 
                            epochs=20, 
                            batch_size=128, 
                            validation_data=(x_val_trans, y_val_trans))

In [None]:
plot_epoch_accuracy(history_dropout.history)

After implementing dropout, do you think the model is better able to generalize?

## Challenge 2: Build your own neural network

1. Try training a new neural network that has a different:
    - Architecture
    - Activation Function
    - Epochs
2. How does the validation accuracy compare to the one we made?

Now that we are satisfied with this model, let's see how it performs on our holdout test set!

In [None]:
def get_model_accuracy(model, x_test, y_test):
    """
    Takes a model and a test set of data.
    Returns the accuracy.
    """
    
    score = model.evaluate(x_test, y_test, verbose=0)
    
    accuracy = round(score[1]*100, 1)
    
    return accuracy

In [None]:
get_model_accuracy(fourth_network, x_test_trans, y_test_trans)

How does this compare with our other models?

In [None]:
for i, mod in enumerate([first_network, second_network, third_network, fourth_network]):
    acc = get_model_accuracy(mod, x_test_trans, y_test_trans)
    print(f"Model {i+1} has an accuracy of {acc}%")

## Challenge 3: Evaluate your own model

Use your own model from challenge 2 to evaluate its performance on the test set!

### Visualize the test results

In [None]:
def plot_wrong_predictions(model, x_test, y_test):
    """
    Plots 25 incorrectly predicted images.
    """
    
    # Back transform images
    x_images = x_test.reshape(x_test.shape[0], 28, 28)
    
    # Format predictions and targets
    predictions = model.predict(x_test)
    predicted = np.argmax(predictions, axis=1)
    target = np.argmax(y_test, axis = 1)
    
    # Get wrong indices
    wrong_indices = np.where(predicted != target)[0]
    
    fig, axes = plt.subplots(5,5, figsize = (25,25))
    axes = axes.ravel()
    
    for ax, index in zip(axes, wrong_indices[:25]):
        ax.imshow(x_images[index], cmap=plt.cm.binary, interpolation='nearest')
        ax.set_title(f"Predicted {predicted[index]}, Actual is {target[index]}")
    
    return plt.show()

In [None]:
plot_wrong_predictions(fourth_network, x_test_trans, y_test_trans)

Do you feel our model made reasonable mistakes?
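
One way to look at the mistakes in aggregate is to count how often each true digit is predicted as each other digit. The sketch below does this in NumPy for made-up labels; `confusion_matrix` is a hypothetical helper written here, and the `target` and `predicted` arrays are invented rather than model output.

```python
import numpy as np

def confusion_matrix(target, predicted, n_classes):
    """Rows are the true class, columns are the predicted class."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(matrix, (target, predicted), 1)   # add 1 per (true, predicted) pair
    return matrix

# Hypothetical labels for 8 test images and 3 classes
target = np.array([0, 0, 1, 1, 1, 2, 2, 2])
predicted = np.array([0, 1, 1, 1, 2, 2, 2, 0])

cm = confusion_matrix(target, predicted, n_classes=3)
accuracy = np.trace(cm) / cm.sum()   # correct predictions sit on the diagonal
print(cm)
print(accuracy)
```

Large off-diagonal counts point to pairs of digits the model tends to confuse.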

### Convolutional Neural Networks (CNNs)

The neural networks we have created so far are known as *vanilla neural networks*. 

These have many great use cases, but for problems in computer vision, we often use a different architecture called convolutional neural networks.

We will review the details of how they work in our slides tomorrow, but for today, let's just compare their efficacy to vanilla neural nets!

In [None]:
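# Import from tensorflow.keras to stay consistent with the earlier imports
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import Flatten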

In [None]:
convnet = Sequential()

convnet.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
convnet.add(MaxPooling2D((2, 2)))
convnet.add(Conv2D(64, (3, 3), activation='relu'))
convnet.add(MaxPooling2D((2, 2)))
convnet.add(Conv2D(64, (3, 3), activation='relu'))

convnet.add(Flatten())
convnet.add(Dense(64, activation='relu'))
convnet.add(Dense(10, activation='softmax'))

convnet.compile(optimizer='rmsprop',
                loss='categorical_crossentropy',
                metrics=['accuracy'])
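
To see how the image shrinks as it passes through these layers, the arithmetic sketch below tracks the spatial size. It assumes the Keras defaults used above: no padding and stride 1 for `Conv2D`, and non-overlapping windows for `MaxPooling2D`. The helper names are hypothetical.

```python
def conv_output_size(size, kernel):
    """Spatial size after a stride-1 convolution without padding."""
    return size - kernel + 1

def pool_output_size(size, pool):
    """Spatial size after non-overlapping pooling (any remainder is dropped)."""
    return size // pool

size = 28
shapes = []
for layer, arg, channels in [("conv", 3, 32), ("pool", 2, 32),
                             ("conv", 3, 64), ("pool", 2, 64),
                             ("conv", 3, 64)]:
    if layer == "conv":
        size = conv_output_size(size, arg)
    else:
        size = pool_output_size(size, arg)
    shapes.append((size, size, channels))

# Flatten turns the final (height, width, channels) block into one vector
flattened = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
print(shapes)
print(flattened)
```

The final 3 x 3 x 64 block is what `Flatten` hands to the `Dense` layers.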

Since our convnet has a *sense* of the two-dimensional image, we need to back transform our images to be two dimensional.

Why not use the original pixel data then?

We still want to keep the transformation of values between \[0, 1\].

In [None]:
def back_transform_2d(data):
    """
    Takes a list of flattened input pixel data.
    Reshapes pixel data from a single vector to two dimensions.
    """
    
    two_dimensional_data = []
    
    for d in data:
        transformed = d.reshape(d.shape[0], 28, 28, 1)
        two_dimensional_data.append(transformed)
    
    return two_dimensional_data
    

In [None]:
x_train_2d, x_val_2d, x_test_2d = back_transform_2d([x_train_trans, x_val_trans, x_test_trans])

In [None]:
x_train_2d.shape

In [None]:
history_convnet = convnet.fit(x_train_2d,
                              y_train_trans, 
                              epochs=10, 
                              batch_size=128, 
                              validation_data=(x_val_2d, y_val_trans))

In [None]:
plot_epoch_accuracy(history_convnet.history)

So how well does our CNN perform on the test set?

In [None]:
get_model_accuracy(convnet, x_test_2d, y_test_trans)

In [None]:
plot_wrong_predictions(convnet, x_test_2d, y_test_trans)