# Build a Feedforward Neural Network on the MNIST Dataset 4/21/2017

Author: Sylvia Gao, Deep Learning Research Assistant, Data Lab

References:
1. MNIST for beginners (https://www.tensorflow.org/get_started/mnist/beginners)
2. Keras Tutorial: The Ultimate Beginner's Guide to Deep Learning in Python (https://elitedatascience.com/keras-tutorial-deep-learning-in-python)

## Introduction

This is the first part of a tutorial series on building different types of feedforward neural networks. In this part, we will cover:
1. What the MNIST dataset is
2. How to build the simplest feedforward neural network on the MNIST dataset
3. How to train the model
4. How to test the trained model


## Before building the model:

1. Be sure to install Keras before starting this tutorial.
2. Be sure you have basic knowledge of feedforward neural networks (FNN), including:
     
     The structure of a neural network; activation functions; loss functions; dropout; regularization
     
If not, please read some introductory material on feedforward neural networks first.


## 1. Get to know the MNIST dataset

MNIST is a simple computer vision dataset. It consists of images of handwritten digits like these:
<img src='http://rodrigob.github.io/are_we_there_yet/build/images/mnist.png?1363085077'>

It also includes a label for each image, telling us which digit it is. For example, the labels of the images in the first row are 1, 1, 5, 4, 3.

In this tutorial, we're going to train a feedforward neural network model to look at images and predict what digits they are.

## 2. How to build the simplest feedforward neural network on the MNIST dataset

In this section, we will build a feedforward neural network model to predict handwritten digits. The key ingredients of the model are:
-  inputs normalized to the range [0, 1]
-  two hidden layers
-  activation function: ReLU
-  loss function: cross entropy
-  dropout rate: 0
-  no regularization
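
To make the loss in this recipe concrete, here is a minimal NumPy sketch of softmax followed by categorical cross entropy. The helper names `softmax` and `categorical_cross_entropy` and the example scores are assumptions made for illustration; Keras computes the same quantity internally when the loss is set to `'categorical_crossentropy'`.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 along the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

def categorical_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean over the batch of -sum(y_true * log(y_prob))."""
    y_prob = np.clip(y_prob, eps, 1.0)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_prob), axis=-1)))

# Hypothetical scores for two samples and three classes (illustration only).
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 0.5, 0.5]])
# One-hot labels: sample 0 is class 0, sample 1 is class 1.
y_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])

probs = softmax(logits)
loss = categorical_cross_entropy(y_true, probs)
print(probs.sum(axis=1))
print(loss)
```

The loss shrinks toward 0 as the probability assigned to the correct class approaches 1, which is what training tries to achieve.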

### 2.1 Import packages

In [1]:
from __future__ import print_function

import keras.callbacks as cb
from keras.datasets import mnist

# Import the core layers from Keras; these are the layers most commonly used in NN models.
from keras.layers.core import Activation, Dense, Dropout

# Import the Sequential model type from Keras.
# This is a linear stack of neural network layers, which suits an FNN model.
from keras.models import Sequential
from keras.optimizers import SGD
from keras.regularizers import l1, l2
from keras.utils import np_utils

%matplotlib inline
from matplotlib import pyplot as plt

#import numpy 
import numpy as np
import time

Using TensorFlow backend.


### 2.2 Preprocessing

In this section, we will load the MNIST dataset. Each image is 28 pixels by 28 pixels.

<img src="https://www.tensorflow.org/images/MNIST-Matrix.png">

We can interpret each image as a big array of numbers and flatten this array into a vector of 28x28 = 784 numbers. We also need to transform each label into a binary (one-hot) vector.

The result is that x_train is a tensor (an n-dimensional array) with a shape of (60000, 784), and x_test has a shape of (10000, 784). The first dimension is an index into the list of images and the second dimension is the index for each pixel in each image. Each entry in the tensor is a pixel intensity between 0 and 1, for a particular pixel in a particular image. The TensorFlow tutorial this description is based on uses 55,000 training images because it sets 5,000 aside for validation; here we make that split at training time.

<img src="https://www.tensorflow.org/images/mnist-train-xs.png">
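
The flattening, scaling, and label encoding described above can be tried on a small stand-in array before touching the real data. This is only a sketch: the random `images` and the `labels` values are made up, and `np.eye(10)[labels]` is one common way to build one-hot rows, not necessarily what Keras does internally.

```python
import numpy as np

# Hypothetical stand-in for MNIST: 3 random 28x28 grayscale images with labels.
rng = np.random.RandomState(0)
images = rng.randint(0, 256, size=(3, 28, 28)).astype('uint8')
labels = np.array([7, 0, 3])

# Flatten each 28x28 image into a 784-vector and scale pixels to [0, 1].
flat = images.reshape(images.shape[0], -1).astype('float32') / 255

# One-hot encode: row i has a single 1 in column labels[i].
one_hot = np.eye(10, dtype='float32')[labels]

print(flat.shape, flat.min(), flat.max())
print(one_hot[0])
```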

In [2]:
def PreprocessDataset():
    ## Load pre-shuffled MNIST data into train and test sets
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    ## Transform labels from 1-dimensional class arrays to 10-dimensional class matrices
    ## i.e., from '7' to [0,0,0,0,0,0,0,1,0,0]
    y_train = np_utils.to_categorical(y_train, 10)
    y_test = np_utils.to_categorical(y_test, 10)
    
    ## Process features. Convert data type and normalize values
    x_train = x_train.astype('float32')
    x_test = x_test.astype('float32')
    ## Reshape from a matrix of 28 x 28 pixels to 1-D vector of 784 dimensions
    x_train = np.reshape(x_train, (60000, 784))
    x_test = np.reshape(x_test, (10000, 784))
    ## Min-Max Normalize value to [0, 1]
    x_train /= 255
    x_test /= 255
    
    return x_train, x_test, y_train, y_test

x_train, x_test, y_train, y_test = PreprocessDataset()

Now let's see what the data looks like after preprocessing.

In [3]:
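## Show part of training data: features and labels
## Each row is a sample, and each column represents a feature.
## Note: the y column prints only the first entry of the one-hot label,
## so it is 1 when the digit is 0 and 0 otherwise.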
print("{:^43}".format("x"), "|", "{:^4}".format("y"))
print("="*50)
for sample_id in range(10):
    print("{:.2f} {:.2f} ... {:.2f} {:.2f} {:.2f} ...  {:.2f} {:.2f}".format(
            x_train[sample_id][0], x_train[sample_id][1],
            x_train[sample_id][156], x_train[sample_id][157], x_train[sample_id][158],
            x_train[sample_id][-2], x_train[sample_id][-1]), "| ",
           "{:.0f}".format(y_train[sample_id][0]))

                     x                      |  y  
==================================================
0.00 0.00 ... 0.49 0.53 0.69 ...  0.00 0.00 |  0
0.00 0.00 ... 0.99 0.99 0.99 ...  0.00 0.00 |  1
0.00 0.00 ... 0.00 0.00 0.00 ...  0.00 0.00 |  0
0.00 0.00 ... 0.00 0.00 0.49 ...  0.00 0.00 |  0
0.00 0.00 ... 0.00 0.00 0.00 ...  0.00 0.00 |  0
0.00 0.00 ... 0.10 0.39 0.48 ...  0.00 0.00 |  0
0.00 0.00 ... 0.00 0.00 0.00 ...  0.00 0.00 |  0
0.00 0.00 ... 0.99 0.99 0.99 ...  0.00 0.00 |  0
0.00 0.00 ... 0.00 0.00 0.00 ...  0.00 0.00 |  0
0.00 0.00 ... 0.00 0.00 0.00 ...  0.00 0.00 |  0


### 2.3 Define the model:

Now let's define the model:

In [None]:
def DefineModel():
    ##Number of hidden layers:2 (widths below)
    first_layer_width = 128
    second_layer_width = 64    
    
    ##Activation Function:Relu
    activation_func = 'relu' 
  
    ##Loss Function:cross entropy.
    loss_function = 'categorical_crossentropy'
    
    ##Dropout rate:0
    dropout_rate = 0.0
    
    ##Regularization:None (could be l1(...) or l2(...) instead)
    weight_regularizer = None

    ##Learning Rate:0.1
    learning_rate = 0.1
    
    ## Initialize model.
    model = Sequential()

    ## First hidden layer with 'first_layer_width' neurons. 
    ## Also need to specify input dimension.
    ## 'Dense' means fully-connected.
    ## 'kernel_regularizer' is the Keras 2 argument name ('W_regularizer' in Keras 1).
    model.add(Dense(first_layer_width, input_dim=784, kernel_regularizer=weight_regularizer))
    model.add(Activation(activation_func))
    if dropout_rate > 0:
        ## Apply dropout with the configured rate.
        model.add(Dropout(dropout_rate))

    ## Second hidden layer.
    if second_layer_width > 0:
        model.add(Dense(second_layer_width))
        model.add(Activation(activation_func))
        if dropout_rate > 0:
            model.add(Dropout(dropout_rate))
    
    ## Last layer has the same dimension as the number of classes
    model.add(Dense(10))
    ## For classification, the activation is softmax
    model.add(Activation('softmax'))
    ## Define optimizer. In this tutorial/codelab, we select SGD.
    ## You can also use other methods, e.g., opt = RMSprop()
    opt = SGD(lr=learning_rate, clipnorm=5.)
    ## Define loss function = 'categorical_crossentropy' or 'mean_squared_error'
    model.compile(loss=loss_function, optimizer=opt, metrics=["accuracy"])

    return model
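
To see what this stack of layers computes, the following NumPy sketch runs a forward pass through the same layer sizes (784, 128, 64, 10) and counts the trainable parameters. The weights are random placeholders rather than trained values, and the helper names `relu`, `softmax`, and `forward` are assumptions for illustration, not Keras functions.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Layer sizes taken from DefineModel: 784 inputs, 128 and 64 hidden units, 10 classes.
layer_sizes = [784, 128, 64, 10]

# Random placeholder parameters (not trained weights): one (weights, bias) pair per layer.
rng = np.random.RandomState(0)
params = [(rng.normal(0.0, 0.05, size=(n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(x):
    """Dense + ReLU for the hidden layers, Dense + softmax for the output layer."""
    for w, b in params[:-1]:
        x = relu(np.dot(x, w) + b)
    w, b = params[-1]
    return softmax(np.dot(x, w) + b)

batch = rng.rand(4, 784)  # four fake flattened images
probs = forward(batch)
n_params = sum(w.size + b.size for w, b in params)
print(probs.shape, n_params)
```

Each output row is a probability distribution over the ten digits, and the parameter count is the sum of weights and biases over the three dense layers.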

### 2.4 Define the training procedure (mini-batch):

In [None]:
def TrainModel(data=None, epochs=20):
    ## Mini-batch size: each weight update uses 128 samples.
    batch = 128
    if data is None:
        raise ValueError("Must provide data.")
    x_train, x_test, y_train, y_test = data
    start_time = time.time()
    model = DefineModel()
    print('Start training.')
    ## Use the first 55,000 (out of 60,000) samples to train, last 5,000 samples to validate.
    ## 'epochs' is the Keras 2 argument name ('nb_epoch' in Keras 1).
    history = model.fit(x_train[:55000], y_train[:55000], epochs=epochs, batch_size=batch,
              validation_data=(x_train[55000:], y_train[55000:]))
    print("Training took {0} seconds.".format(time.time() - start_time))
    return model, history
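
As a rough picture of what a mini-batch size of 128 means for 55,000 training samples, the sketch below walks through one epoch as index ranges. The generator `iterate_minibatches` is a hypothetical helper, not part of Keras, and it leaves out the shuffling that `fit` applies by default before each epoch.

```python
import math

def iterate_minibatches(n_samples, batch_size):
    """Yield (start, stop) index pairs that cover range(n_samples) in order."""
    for start in range(0, n_samples, batch_size):
        yield start, min(start + batch_size, n_samples)

n_train = 55000      # samples used for training in TrainModel
batch_size = 128     # mini-batch size used in TrainModel

batches = list(iterate_minibatches(n_train, batch_size))
print(len(batches), batches[0], batches[-1])
```

The last batch is smaller than the others because 55,000 is not a multiple of 128; one epoch corresponds to one weight update per batch.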

## 3. Start Training

In [None]:
trained_model, training_history = TrainModel(data=[x_train, x_test, y_train, y_test])

### 3.1 Define Plotting

In [None]:
def PlotHistory(train_value, test_value, value_is_loss_or_acc):
    f, ax = plt.subplots()
    ax.plot([None] + train_value, 'o-')
    ax.plot([None] + test_value, 'x-')
    ## Plot legend and use the best location automatically: loc = 0.
    ax.legend(['Train ' + value_is_loss_or_acc, 'Validation ' + value_is_loss_or_acc], loc = 0) 
    ax.set_title('Training/Validation ' + value_is_loss_or_acc + ' per Epoch')
    ax.set_xlabel('Epoch')
    ax.set_ylabel(value_is_loss_or_acc) 

### 3.2 Observe the training process

In [None]:
PlotHistory(training_history.history['loss'], training_history.history['val_loss'], 'Loss')
## Newer Keras versions store these values under 'accuracy' and 'val_accuracy'.
PlotHistory(training_history.history['acc'], training_history.history['val_acc'], 'Accuracy')
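
When reading these curves, one useful habit is to find the epoch where the validation loss is lowest and to watch the gap between training and validation loss. The sketch below does this on made-up numbers; the lists are hypothetical and are not results from this tutorial.

```python
# Hypothetical per-epoch losses (made-up numbers, not results from this tutorial).
train_loss = [0.60, 0.35, 0.25, 0.18, 0.12, 0.08]
val_loss = [0.55, 0.36, 0.30, 0.28, 0.29, 0.33]

# Epoch (1-based) with the lowest validation loss.
best_epoch = min(range(len(val_loss)), key=lambda i: val_loss[i]) + 1

# Validation loss minus training loss for each epoch.
gap = [v - t for t, v in zip(train_loss, val_loss)]
print(best_epoch, gap[-1])
```

A validation loss that starts rising while the training loss keeps falling is a common sign of overfitting.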

### 3.3 Observe regularization results

In [None]:
def drawWeightHistogram(x):
    ## the histogram of the data
    fig, ax = plt.subplots()
    n, bins, patches = plt.hist(x, 50)
    plt.xlim(-0.5, 0.5)
    plt.xlabel('Weight')
    plt.ylabel('Count')
    zero_counts = (x == 0.0).sum()
    plt.title("Weight Histogram. Num of '0's: %d" % zero_counts)

In [None]:
w1 = trained_model.layers[0].get_weights()[0].flatten()
drawWeightHistogram(w1)
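
The same counts that the histogram title reports can be computed directly with NumPy. The sketch below uses made-up weights with some entries forced to zero, so the numbers are illustrative only; with no regularization, as configured in this tutorial, a trained layer would typically contain few exact zeros.

```python
import numpy as np

# Hypothetical weights: small Gaussian values with 100 entries forced to exactly 0.
rng = np.random.RandomState(0)
weights = rng.normal(0.0, 0.1, size=1000)
weights[:100] = 0.0

# Same binning as drawWeightHistogram: 50 bins, restricted here to [-0.5, 0.5].
counts, bin_edges = np.histogram(weights, bins=50, range=(-0.5, 0.5))
exact_zeros = int((weights == 0.0).sum())
near_zeros = int((np.abs(weights) < 1e-3).sum())
print(exact_zeros, near_zeros, counts.sum())
```

Counting near-zero weights alongside exact zeros can be helpful when comparing runs with and without an L1 or L2 penalty.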

## 4. Define Testing Procedure

In [None]:
def TestModel(model=None, data=None):
    if model is None:
        raise ValueError("Must provide a trained model.")
    if data is None:
        raise ValueError("Must provide data.")
    x_test, y_test = data
    ## Returns [loss, accuracy] because the model was compiled with metrics=["accuracy"].
    scores = model.evaluate(x_test, y_test)
    return scores

## 5. Test Trained Model

In [None]:
test_score = TestModel(model=trained_model, data=[x_test, y_test])
print("Test loss {:.4f}, accuracy {:.2f}%".format(test_score[0], test_score[1] * 100))
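
The accuracy reported here is the fraction of test images whose most probable class matches the true label. A minimal NumPy sketch of that calculation, using hypothetical softmax outputs for four samples and three classes, looks like this:

```python
import numpy as np

# Hypothetical softmax outputs for four samples and three classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.6, 0.3, 0.1]])
# One-hot ground truth for the same four samples.
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 1, 0],
                   [1, 0, 0]])

predicted = np.argmax(probs, axis=1)   # most probable class per sample
actual = np.argmax(y_true, axis=1)     # class index encoded by each one-hot row
accuracy = float(np.mean(predicted == actual))
print(predicted, accuracy)
```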

In [None]:
def ShowInputImage(data):
    """Visualize input image."""
    plot = plt.figure()
    plot.set_size_inches(2,2)
    plt.imshow(np.reshape(-data, (28,28)), cmap='Greys_r')
    plt.title("Input")
    plt.axis('off')
    plt.show()
    
def ShowHiddenLayerOutput(input_data, target_layer_num):
    """Visualize output from the target hidden layer."""
    from keras import backend as K
    ## Backend converter: to TensorFlow
    target_layer = K.function(trained_model.inputs, [trained_model.layers[target_layer_num].output])
    ## Extract output from the target hidden layer.
    target_layer_out = target_layer([input_data])
    plot = plt.figure()
    plot.set_size_inches(2,2)
    plt.imshow(np.reshape(-target_layer_out[0][0], (16,-1)), cmap='Greys_r')
    plt.title("Hidden layer " + str(target_layer_num))
    plt.axis('off')
    plt.show()

def ShowFinalOutput(input_data):
    """Calculate final prediction."""
    from keras import backend as K
    ## Backend converter: to TensorFlow
    ## Calculate final prediction.
    last_layer = K.function(trained_model.inputs, [trained_model.layers[-1].output])
    last_layer_out = last_layer([input_data])
    print("Final prediction: " + str(np.argmax(last_layer_out[0][0])) )

ShowInputImage(x_test[0])
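## Pass the first test sample as a batch of size 1; both functions display the first item in the batch.
ShowHiddenLayerOutput(x_test[:1], 1)
ShowFinalOutput(x_test[:1])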