# Neural Nets: Convolution

In this lab we build a model to detect handwritten digits. The lab introduces you to Keras and should enable you to build and train your own CNNs.

In [None]:
# imports
from keras.datasets import mnist
import matplotlib.pyplot as plt
import numpy as np

# nn
from keras import backend as K
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Reshape, Activation, Conv2D, MaxPooling2D
from keras.utils import np_utils

# image manipulation
import cv2

%matplotlib inline

First of all, we load our data set, which is divided into a training set and a test set.

In [None]:
# load digit dataset with training and test images
(x_train, y_train), (x_test, y_test) = mnist.load_data()

The data set contains 10 different classes (the digits 0 to 9). Each data point is a 28x28 grayscale image of a handwritten digit.

In [None]:
nb_classes = 10
# dimension
img_rows, img_cols = x_train[0].shape
print('number of rows: ' + str(img_rows) + '; number of cols: ' + str(img_cols))

To get a better feeling for the data we take a look at the first 10 instances.

In [None]:
# The data consists of images of digits. Let's have a look at the
# first 10 images in x_train. For every image we know which digit it
# represents: it is given by the corresponding entry of y_train.
num_to_show = 10
for i in range(num_to_show):
    image = x_train[i]
    label = y_train[i]
    plt.subplot(2, num_to_show, i + 1)
    plt.axis('off')
    plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    plt.title(str(label))
plt.show()

# print some statistics
print('number of train images: ' + str(len(x_train)))

<b>Exercise 1:</b>  
Create a histogram showing the class distribution.
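
One possible approach, sketched under the assumption that the labels are the integer arrays `y_train`/`y_test` loaded above: count the examples per class with `np.bincount` and draw the counts as a bar chart. `class_counts` and `plot_class_histogram` are hypothetical helper names; matplotlib is imported inside the plotting function only so that the counting part also runs without it.

```python
import numpy as np


def class_counts(labels, nb_classes=10):
    """Number of examples for each class label 0..nb_classes-1."""
    return np.bincount(np.asarray(labels).ravel(), minlength=nb_classes)


def plot_class_histogram(labels, nb_classes=10, title="class distribution"):
    """Bar chart of class_counts(labels); returns the counts as well."""
    import matplotlib.pyplot as plt  # lazy import, so class_counts works without matplotlib

    counts = class_counts(labels, nb_classes)
    plt.bar(np.arange(nb_classes), counts)
    plt.xticks(np.arange(nb_classes))
    plt.xlabel("digit")
    plt.ylabel("number of images")
    plt.title(title)
    plt.show()
    return counts
```

In the notebook this would be called as `plot_class_histogram(y_train)`; `plt.hist(y_train, bins=np.arange(nb_classes + 1) - 0.5)` is an equivalent one-liner.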

In [None]:
# YOUR CODE GOES HERE


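Keras convolutional layers expect every image to have an explicit channel dimension, so we reshape the training and test data by adding an extra axis of size 1 (the images are grayscale). Whether this axis comes first or last depends on the image data format configured for the Keras backend.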

In [None]:
# transform data set
if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)
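
The reshape only inserts a size-1 axis and does not move any pixels; `np.expand_dims` expresses the same idea. A small sketch with a random stand-in for the MNIST arrays (`add_channel_axis` is a hypothetical helper name):

```python
import numpy as np


def add_channel_axis(images, channels_first=False):
    """Insert a size-1 channel axis: (N, rows, cols) -> (N, 1, rows, cols) or (N, rows, cols, 1)."""
    axis = 1 if channels_first else -1
    return np.expand_dims(images, axis=axis)


# Random stand-in for x_train: 3 grayscale images of 28x28 pixels
rng = np.random.default_rng(0)
demo = rng.integers(0, 256, size=(3, 28, 28)).astype(np.uint8)
last = add_channel_axis(demo)
first = add_channel_axis(demo, channels_first=True)
print(last.shape, first.shape)  # (3, 28, 28, 1) (3, 1, 28, 28)
```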

<b>Simple Neural Model</b>

Our first model is a simple neural net with one hidden layer of 512 units and a ReLU activation function. To reduce overfitting, a dropout layer (rate 0.2) is added after it. The input image is converted to a flat vector of 28x28 = 784 values by the first layer. Please have a look at the architecture and try to understand the structure of this neural net.

In [None]:
def getSimpleModel():
    # simple model with dense layers
    simpleModel = Sequential()
    simpleModel.add(Flatten(input_shape=input_shape))
    simpleModel.add(Dense(512, activation='relu'))
    simpleModel.add(Dropout(0.2))
    simpleModel.add(Dense(nb_classes, activation='softmax'))

    simpleModel.compile(loss='categorical_crossentropy', optimizer='SGD', metrics=['accuracy'])
    return simpleModel

simpleModel = getSimpleModel()
simpleModel.summary()
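
To make the shapes of each layer explicit, the simple model can be mirrored by a short NumPy forward pass. This is an illustrative sketch with random, untrained weights (`W1`, `b1`, `W2`, `b2` and the helper names are hypothetical), not how Keras computes it internally.

```python
import numpy as np


def dense(x, weights, bias):
    """Fully connected layer: x @ W + b."""
    return x @ weights + bias


def simple_model_forward(images, W1, b1, W2, b2):
    """Inference-time forward pass mirroring getSimpleModel()."""
    flat = images.reshape(images.shape[0], -1)          # Flatten: (N, 28, 28, 1) -> (N, 784)
    hidden = np.maximum(dense(flat, W1, b1), 0.0)        # Dense(512, activation='relu')
    # Dropout(0.2) only zeroes units during training; at inference it passes values through.
    logits = dense(hidden, W2, b2)                       # Dense(nb_classes) before softmax
    logits = logits - logits.max(axis=1, keepdims=True)  # shift for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)          # softmax: each row sums to 1


rng = np.random.default_rng(0)
# Random, untrained weights, only to show the shapes involved
W1, b1 = rng.normal(0, 0.01, size=(784, 512)), np.zeros(512)
W2, b2 = rng.normal(0, 0.01, size=(512, 10)), np.zeros(10)
batch = rng.random((5, 28, 28, 1))
probs = simple_model_forward(batch, W1, b1, W2, b2)
print(probs.shape)  # (5, 10)
```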

<b>Simple Convolutional Neural Net</b>

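The second network is a convolutional neural net. It consists of a convolutional layer, a max-pooling layer, and two dense layers at the end (a hidden layer with 128 units and the softmax output layer); dropout layers follow the pooling layer and the hidden dense layer.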

In [None]:
# simple cnn
def getCNNModel():
    nb_filters_one = 32
    nb_filters_two = 64  # currently unused in this model
    nb_conv = 3
    nb_pool = 2
    dense_size = 128
    cnnModel = Sequential()
    cnnModel.add(Conv2D(nb_filters_one, kernel_size=(nb_conv, nb_conv),
                     activation='relu',
                     input_shape=input_shape))
    cnnModel.add(MaxPooling2D(pool_size=(nb_pool, nb_pool)))
    cnnModel.add(Dropout(0.25))
    cnnModel.add(Flatten())
    cnnModel.add(Dense(dense_size, activation='relu'))
    cnnModel.add(Dropout(0.5))
    cnnModel.add(Dense(nb_classes, activation='softmax'))

    cnnModel.compile(loss='categorical_crossentropy', optimizer='SGD', metrics=['accuracy'])
    return cnnModel

cnnModel = getCNNModel()
cnnModel.summary()
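
To see what `Conv2D` and `MaxPooling2D` compute, here is a naive NumPy sketch for a single grayscale image. `conv2d_valid_relu` and `max_pool_2x2` are hypothetical helpers; as in Keras (and most deep-learning libraries) the "convolution" is a cross-correlation without kernel flipping, with 'valid' padding and stride 1. Only one input channel is handled, and the loops favour clarity over speed.

```python
import numpy as np


def conv2d_valid_relu(image, kernels, biases):
    """'valid' 2-D cross-correlation followed by ReLU.

    image: (H, W); kernels: (k, k, F); biases: (F,). Returns (H-k+1, W-k+1, F).
    """
    k = kernels.shape[0]
    out_h, out_w = image.shape[0] - k + 1, image.shape[1] - k + 1
    out = np.empty((out_h, out_w, kernels.shape[2]))
    for r in range(out_h):
        for c in range(out_w):
            patch = image[r:r + k, c:c + k]
            # weighted sum of the k x k patch for all F filters at once -> shape (F,)
            out[r, c] = np.tensordot(patch, kernels, axes=([0, 1], [0, 1])) + biases
    return np.maximum(out, 0.0)


def max_pool_2x2(feature_maps):
    """Non-overlapping 2x2 max pooling; an odd trailing row/column is dropped."""
    h, w, f = feature_maps.shape
    h2, w2 = h // 2, w // 2
    blocks = feature_maps[:2 * h2, :2 * w2].reshape(h2, 2, w2, 2, f)
    return blocks.max(axis=(1, 3))


rng = np.random.default_rng(0)
image = rng.random((28, 28))
kernels = rng.normal(size=(3, 3, 32))  # 32 filters of size 3x3, like nb_filters_one / nb_conv
maps = conv2d_valid_relu(image, kernels, np.zeros(32))
pooled = max_pool_2x2(maps)
print(maps.shape, pooled.shape, pooled.size)  # (26, 26, 32) (13, 13, 32) 5408
```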

<b>Exercise 2:</b>  
Compare the two different network architectures. What can you say about the number of trainable parameters? Which neural net will probably work better?

<b>Answer:</b>
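The CNN has more trainable parameters than the simple model; almost all of them are in the dense layer that follows the flattened feature maps. Still, the CNN will probably work better, because its convolutional filters share weights across image positions and exploit the local spatial structure of the digits.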
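
A worked count of the trainable parameters, assuming the defaults used in the code above: 'valid' padding and stride 1 for `Conv2D`, and pool strides equal to the pool size. `Dropout`, `Flatten` and `MaxPooling2D` have no trainable parameters. The totals should agree with what `summary()` reports; the helper names are hypothetical.

```python
def dense_params(n_in, n_out):
    """Trainable parameters of a Dense layer: weight matrix plus one bias per unit."""
    return n_in * n_out + n_out


def conv2d_params(kernel_size, in_channels, filters):
    """Trainable parameters of a Conv2D layer with square kernels and one bias per filter."""
    return kernel_size * kernel_size * in_channels * filters + filters


rows = cols = 28
classes = 10

# getSimpleModel(): Flatten -> Dense(512) -> Dropout -> Dense(10)
simple_total = dense_params(rows * cols, 512) + dense_params(512, classes)

# getCNNModel(): Conv2D(32, 3x3) -> MaxPooling2D(2x2) -> Dropout -> Flatten
#                -> Dense(128) -> Dropout -> Dense(10)
conv_rows, conv_cols = rows - 3 + 1, cols - 3 + 1    # 26 x 26 feature maps
pool_rows, pool_cols = conv_rows // 2, conv_cols // 2  # 13 x 13 after pooling
flat_size = pool_rows * pool_cols * 32                 # 5408 inputs to Dense(128)
cnn_total = (conv2d_params(3, 1, 32)
             + dense_params(flat_size, 128)
             + dense_params(128, classes))

print(simple_total, cnn_total)  # 407050 693962
```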

To use the integer labels for training, we have to convert them to a one-hot encoding.

In [None]:
oneHotLabelTrain = np_utils.to_categorical(y_train, nb_classes)
oneHotLabelTest  = np_utils.to_categorical(y_test,  nb_classes)
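
The one-hot encoding can be sketched in plain NumPy by selecting rows of an identity matrix with the integer labels. `one_hot` is a hypothetical helper, not part of Keras; for integer labels 0..nb_classes-1 it should produce the same matrix as `np_utils.to_categorical`.

```python
import numpy as np


def one_hot(labels, nb_classes):
    """Row i is all zeros except for a 1.0 in column labels[i]."""
    return np.eye(nb_classes, dtype=np.float32)[np.asarray(labels)]


encoded = one_hot([3, 0, 9], 10)
print(encoded.shape)  # (3, 10)
```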

Now we can train both models. Keras records the training and validation loss and accuracy of every epoch in the `History` object (a callback) returned by `fit()`; here the test set serves as validation data. Training can take some time.

In [None]:
batch_size = 128
simpleModel = getSimpleModel()
learnHistSimple = simpleModel.fit(x_train, oneHotLabelTrain,
                                  validation_data=(x_test, oneHotLabelTest),
                                  batch_size=batch_size,
                                  epochs=10)
cnnModel = getCNNModel()
learnHistCNN = cnnModel.fit(x_train, oneHotLabelTrain,
                            validation_data=(x_test, oneHotLabelTest),
                            batch_size=batch_size,
                            epochs=10)

In [None]:
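print(learnHistCNN.history)
# number of epochs actually trained (numEpochs was never defined before)
numEpochs = len(learnHistSimple.history['loss'])
print(np.arange(1, numEpochs + 1, 1).shape)
print(np.array(learnHistSimple.history['loss']).shape)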

<b>Exercise 3:</b>  
Plot the learning curves for the two neural nets, showing the training and testing loss over the number of epochs. What are the learning curves telling you?
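
A sketch for the learning curves, assuming the `History` objects from the training cell above: because `validation_data` was passed to `fit()`, each `learnHist*.history` dict contains a `'val_loss'` list next to `'loss'`. `loss_curves` and `plot_loss_curves` are hypothetical helper names; matplotlib is imported inside the plotting function only so that the data-extraction part can run on its own.

```python
import numpy as np


def loss_curves(history):
    """Return (epochs, train_loss, val_loss) arrays from a Keras History.history dict."""
    train = np.asarray(history["loss"], dtype=float)
    val = np.asarray(history["val_loss"], dtype=float)
    epochs = np.arange(1, len(train) + 1)
    return epochs, train, val


def plot_loss_curves(histories):
    """histories: mapping from a model name to its History.history dict."""
    import matplotlib.pyplot as plt  # lazy import, see lead-in

    for name, history in histories.items():
        epochs, train, val = loss_curves(history)
        plt.plot(epochs, train, "-o", label=name + " train")
        plt.plot(epochs, val, "--o", label=name + " test")
    plt.xlabel("epoch")
    plt.ylabel("categorical cross-entropy loss")
    plt.legend()
    plt.show()
```

In the notebook: `plot_loss_curves({'simple model': learnHistSimple.history, 'CNN': learnHistCNN.history})`.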

In [None]:
# YOUR CODE GOES HERE


The results of the simple dense network are much worse than those of the CNN.

<b>Exercise 4:</b>

Normalize the input data so that all values are between 0 and 1. After that, retrain the simple model. Are the results better? Can you explain the results?

In [None]:
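# normalize images: MNIST pixels are integers in [0, 255], so dividing by 255 maps them to [0, 1]
x_train_rescaled = x_train.astype('float32') / 255.
x_test_rescaled = x_test.astype('float32') / 255.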

# retrain model
batch_size = 128
simpleModel = getSimpleModel()
learnHistSimple = simpleModel.fit(x_train_rescaled,oneHotLabelTrain,validation_data=(x_test_rescaled,oneHotLabelTest),
                                  batch_size=batch_size,
                                  epochs=10)

<b>Exercise 5:</b>

Write a function that randomly places the digits from the input data on a 2-dimensional image of size 28x28. Do this by first resizing each image to 14x14 and then placing the digit at a random position on a 28x28 grid. Afterwards, the data set should look like the image below.

<img src="files/non-centered.png" width="600" height="600">

<b>Hint</b>: Maybe the function cv2.resize(...) could be helpful.

In [None]:
def add_random_noise(inputData):
    out_data = np.zeros(inputData.shape)
    # YOUR CODE GOES HERE
    
    
    
    return out_data
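
A possible implementation, sketched under stated assumptions: the helper names `place_randomly` and `downsample_2x` are hypothetical, the images are assumed to be 28x28 (any even side length works), and instead of the hinted `cv2.resize` the sketch halves each image by averaging non-overlapping 2x2 blocks in NumPy. For an exact factor of two, `cv2.resize(image, (14, 14), interpolation=cv2.INTER_AREA)` should give a very similar result. The top-left corner of the 14x14 digit is drawn uniformly so that the digit always fits completely on the 28x28 canvas.

```python
import numpy as np


def downsample_2x(image):
    """Halve both sides of a 2-D image by averaging non-overlapping 2x2 blocks."""
    rows, cols = image.shape
    return image.reshape(rows // 2, 2, cols // 2, 2).mean(axis=(1, 3))


def place_randomly(input_data, img_shape=(28, 28), seed=None):
    """Shrink every digit to half size and paste it at a random position.

    input_data may be (N, rows, cols) or carry an extra size-1 channel axis,
    e.g. (N, rows, cols, 1); the result has the same shape, as float32.
    """
    rng = np.random.default_rng(seed)
    rows, cols = img_shape
    images = input_data.reshape(-1, rows, cols).astype(np.float32)
    out = np.zeros_like(images)
    small_rows, small_cols = rows // 2, cols // 2
    for i, image in enumerate(images):
        small = downsample_2x(image)
        # random top-left corner, chosen so the small digit fits completely on the canvas
        top = rng.integers(0, rows - small_rows + 1)
        left = rng.integers(0, cols - small_cols + 1)
        out[i, top:top + small_rows, left:left + small_cols] = small
    return out.reshape(input_data.shape)
```

To use it in the cell below, the body of `add_random_noise` could simply `return place_randomly(inputData)`. The output is float32, so the later in-place division by `255.` works.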

Now we can have a look at the new data set of non-centered digits.

In [None]:
# try to create a more realistic data set
x_train_non_centered = add_random_noise(x_train)
x_test_non_centered = add_random_noise(x_test)

for i in range(50):
    image = x_train_non_centered[i]
    image = image.reshape([img_rows,img_cols])
    label = y_train[i]
    plt.subplot(5, 10, i + 1)
    plt.axis('off')
    plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    plt.title(label)
plt.show()


# scale data
x_train_non_centered /= 255.
x_test_non_centered /= 255.

if K.image_data_format() == 'channels_first':
    x_train_non_centered = x_train_non_centered.reshape(x_train_non_centered.shape[0], 1, img_rows, img_cols)
    x_test_non_centered = x_test_non_centered.reshape(x_test_non_centered.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train_non_centered = x_train_non_centered.reshape(x_train_non_centered.shape[0], img_rows, img_cols, 1)
    x_test_non_centered = x_test_non_centered.reshape(x_test_non_centered.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)


<b>Exercise 6:</b>

Train the simple NN and the CNN on this new data set for 10 epochs and compare their training and testing results. What conclusions can you draw?

In [None]:
batch_size = 128
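simpleModel = getSimpleModel()
cnnModel = getCNNModel()
learnHistSimple = simpleModel.fit(x_train_non_centered, oneHotLabelTrain,
                                  validation_data=(x_test_non_centered, oneHotLabelTest),
                                  batch_size=batch_size, epochs=10)
learnHistCNN = cnnModel.fit(x_train_non_centered, oneHotLabelTrain,
                            validation_data=(x_test_non_centered, oneHotLabelTest),
                            batch_size=batch_size, epochs=10)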


<b>Additional Exercise</b>

Try to build a model that is able to get better classification results on the non-centered data set.