# Foundations of Data Mining: Resit Assignment Part 2

Please complete all assignments in this notebook. You should submit this notebook, as well as a PDF version (See File > Download as).

In [1]:
%matplotlib inline
from preamble import *
plt.rcParams['savefig.dpi'] = 100 # This controls the size of your figures
# Comment out and restart notebook if you only want the last output of each cell.
InteractiveShell.ast_node_interactivity = "all"

## Fully convolutional neural networks (10 points)

The goal of this exercise is to develop a model for recognizing sequences of MNIST digits in an image. You are given **one** image of size $(28, 28 \times n)$, where $n$ is an integer, so the input is a composition of $n$ MNIST images placed side by side. The model needs to recognize the sequence of digits that forms the composite image.

$n$ can take different values larger than 1. Training a separate model for each value of $n$ is inefficient. A more efficient strategy is to develop a model for detecting single digits, slide it over the input image to cover the larger surface, and then post-process its output. In general, sliding a CNN over a large image repeats many computations, because overlapping windows share most of their pixels. One approach that avoids this is to convert the CNN into a fully convolutional neural network (*).

Any Dense layer can be converted to a Convolutional layer. For example, a Dense layer with K=128 neurons that takes a 7×7×512 activation map as input can be equivalently expressed as a Convolutional layer with K=128 filters of size 7×7 (padding 0, stride 1×1). Rather than flattening the activation map and feeding it into a Dense layer, we use a Convolutional layer whose filter size is exactly the size of the input volume (for more details, see https://arxiv.org/pdf/1411.4038.pdf).
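
The equivalence described above can be checked numerically. The following is a minimal NumPy sketch, not part of the assignment code: it uses a smaller channel count than the 7×7×512 example to stay fast, assumes a `channels_last` layout with row-major flattening, and all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small stand-in sizes (the text uses h = w = 7, c = 512, K = 128).
h, w, c, K = 7, 7, 8, 4

activation = rng.standard_normal((h, w, c))        # one channels_last activation map
dense_kernel = rng.standard_normal((h * w * c, K))  # Dense weights: (inputs, neurons)
bias = rng.standard_normal(K)

# Dense layer: flatten the map in row-major order, then multiply.
dense_out = activation.reshape(-1) @ dense_kernel + bias

# Equivalent convolution: the same weights viewed as K filters of size h x w x c.
conv_kernel = dense_kernel.reshape(h, w, c, K)
conv_out = np.einsum("hwc,hwck->k", activation, conv_kernel) + bias

# On a wider map the same filter slides along the width (stride 1, no padding)
# and yields one K-vector per horizontal position.
wide = rng.standard_normal((h, w + 3, c))
sliding_out = np.stack([
    np.einsum("hwc,hwck->k", wide[:, x:x + w, :], conv_kernel) + bias
    for x in range(wide.shape[1] - w + 1)
])

print(np.allclose(dense_out, conv_out))
print(sliding_out.shape)
```

The last part illustrates why the converted layer is useful here: applied to a wider input, the filter produces one prediction vector per window position without any change to the weights.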

To solve this exercise, extend/modify the code given below to:
* Produce a CNN model for single-digit detection of MNIST images
* Convert this model into a fully convolutional neural network (FCNN)
* Run the FCNN model on an image that is a composition of a sequence of MNIST images of length $n=5$
* Post-process the FCNN output from the previous point to produce a numerical sequence of digits as output
* Discuss how your implementation would need to be modified if the input is not a concatenation of MNIST images, but rather the MNIST images are pasted at random locations on a large canvas (no implementation is necessary)

(*) FCNN models are very useful for processing very large images, such as those produced in digital pathology or satellite imaging. These models are applicable to any domain where the objects in the image are much smaller than the image itself.

In [None]:
## Imports
from __future__ import print_function
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Reshape, Activation
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K
import numpy as np
# Training parameters
batch_size = 128
num_classes = 10
epochs = 1

# Data preparation

# input image dimensions
img_rows, img_cols = 28, 28
num_digits = 5

# the data, shuffled and split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
    input_shape_test = (1, img_rows, img_cols*num_digits)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)
    input_shape_test = (img_rows, img_cols * num_digits, 1)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

In [None]:
# CNN Model definition
def build_cnn_model():

    model = Sequential()
    # <<<<<<<<<<<<<<<<<<
    # Add convolutional layers here to finalize the CNN model
    # <<<<<<<<<<<<<<<<<<
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(num_classes, activation='softmax'))

    model.compile(loss=keras.losses.categorical_crossentropy,
                  optimizer=keras.optimizers.Adadelta(),
                  metrics=['accuracy'])
    return model


In [1]:
# FCNN Model definition
def build_full_conv_model():

    model = Sequential()
    # <<<<<<<<<<<<<<<<<<<
    # Add convolutional layers here that match the CNN architecture
    # <<<<<<<<<<<<<<<<<<<
    
    # These are two example Conv layers that perform the same operations
    # as the Dense layer with 128 neurons and the final classification layer in the CNN model.
    # Replace <q> with the filter size that matches the spatial dimensions of the activation map.
    model.add(Conv2D(128, kernel_size=(<q>, <q>), activation='relu'))
    model.add(Dropout(0.5))
    model.add(Conv2D(num_classes, kernel_size=(1, 1)))
    model.add(Reshape((-1, num_classes)))
    model.add(Activation("softmax"))

    model.compile(loss=keras.losses.categorical_crossentropy,
                  optimizer=keras.optimizers.Adadelta(),
                  metrics=['accuracy'])
    return model
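
The value of `<q>` depends on the convolutional layers you choose. As a rough sketch, the helper below tracks the spatial size through a stack of unpadded layers. The layer list (two 3×3 convolutions followed by 2×2 max pooling) is only an assumed example architecture, and the function names are hypothetical.

```python
def valid_output_size(size, kernel, stride=1):
    """Output length of a convolution or pooling layer without padding."""
    return (size - kernel) // stride + 1


def feature_map_size(size, layers):
    """Apply a list of (kernel, stride) layers to a spatial size."""
    for kernel, stride in layers:
        size = valid_output_size(size, kernel, stride)
    return size


# Assumed example architecture: Conv 3x3, Conv 3x3, MaxPooling 2x2 (stride 2).
assumed_layers = [(3, 1), (3, 1), (2, 2)]

# Size of the activation map for a single 28x28 digit: this is <q>.
q = feature_map_size(28, assumed_layers)

# Width of the activation map for five concatenated digits (28 * 5 = 140),
# and the number of horizontal positions the q x q filter produces on it.
wide_map = feature_map_size(28 * 5, assumed_layers)
num_positions = valid_output_size(wide_map, q)

print(q, wide_map, num_positions)
```

With a different architecture the numbers change, but the same bookkeeping applies. Note that padded (`'same'`) convolutions behave differently at the image borders, so this sketch covers only the unpadded case.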

In [None]:
def transplant_weight(weights_in, weights_out):
    for layer in range(len(weights_in)):
        weights_out[layer].flat = weights_in[layer].flat
    return weights_out
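
The flat copy works because, in the `channels_last` layout, `Flatten` orders the activation map row-major, which matches the element order of a convolution kernel of shape (height, width, channels, filters). Because a `.flat` assignment only looks at element order, a mismatch between the two architectures can go unnoticed. The variant below is a hypothetical alternative, not part of the assignment, that makes such mismatches explicit.

```python
import numpy as np


def transplant_weights_checked(weights_in, weights_out):
    """Copy weights tensor by tensor, raising on any size mismatch."""
    if len(weights_in) != len(weights_out):
        raise ValueError("models have a different number of weight tensors")
    result = []
    for i, (src, dst) in enumerate(zip(weights_in, weights_out)):
        if src.size != dst.size:
            raise ValueError(
                "tensor %d: cannot fit %s into %s" % (i, src.shape, dst.shape))
        # Same element order, new shape (e.g. Dense kernel -> Conv kernel).
        result.append(src.reshape(dst.shape).copy())
    return result


# Toy example: a Dense kernel for a 2x2x3 map and 4 neurons becomes
# a Conv kernel with 4 filters of size 2x2x3.
dense_kernel = np.arange(2 * 2 * 3 * 4, dtype=float).reshape(12, 4)
conv_kernel = np.zeros((2, 2, 3, 4))
converted = transplant_weights_checked([dense_kernel], [conv_kernel])
print(converted[0].shape)
```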

In [None]:
def build_concat_mnist(images, labels, size=100, length=5):
    # form the appropriate tensor for the result
    image_width = images.shape[2]
    out = np.zeros((size, images.shape[1], image_width*length, images.shape[3]))
    out_labels = np.zeros((size, length, labels.shape[1]))
    # randomly sample images (size x length samples)
    indexes = np.random.randint(0, images.shape[0], (size, length))
    # fill in the final tensor for images and labels
    for i in range(size):
        for j in range(length):
            start = image_width*j
            end = image_width*(j+1)
            out[i, :, start:end] = images[indexes[i, j]]
            out_labels[i, j] = labels[indexes[i, j]]
    return out, out_labels
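
Note that `build_concat_mnist` assumes `channels_last` data, since it reads the image width from `images.shape[2]`. For reference, the same concatenation can be written without loops. The sketch below is an illustrative alternative under that same layout assumption; the function name is hypothetical, and the indexes are passed in explicitly so the result can be checked.

```python
import numpy as np


def concat_digits_vectorized(images, labels, indexes):
    """images: (N, H, W, C), labels: (N, num_classes), indexes: (size, length)."""
    size, length = indexes.shape
    _, height, width, channels = images.shape
    picked = images[indexes]                    # (size, length, H, W, C)
    picked = picked.transpose(0, 2, 1, 3, 4)    # (size, H, length, W, C)
    # Merging the (length, W) axes places the digits side by side.
    out = picked.reshape(size, height, length * width, channels)
    return out, labels[indexes]


# Tiny synthetic stand-in for MNIST: 10 images of 4x3 pixels, 10 classes.
rng = np.random.default_rng(0)
toy_images = rng.random((10, 4, 3, 1))
toy_labels = np.eye(10)[rng.integers(0, 10, size=10)]
toy_indexes = rng.integers(0, 10, size=(6, 5))

toy_concat, toy_concat_labels = concat_digits_vectorized(
    toy_images, toy_labels, toy_indexes)
print(toy_concat.shape, toy_concat_labels.shape)
```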

In [None]:
# build test data
(x_concat_test, y_concat_test) = build_concat_mnist(x_test, y_test)

# build models
model = build_cnn_model()
full_conv_model = build_full_conv_model()

# model file
model_file = "mnist-model-{epoch:02d}-{val_loss:.2f}.hdf5"

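# train
# NOTE: train_cnn_model is not defined anywhere in this notebook. The helper
# below is an assumed minimal version that saves a checkpoint after each epoch.
def train_cnn_model(model_file, model):
    checkpoint = keras.callbacks.ModelCheckpoint(model_file, monitor='val_loss')
    model.fit(x_train, y_train,
              batch_size=batch_size,
              epochs=epochs,
              verbose=1,
              validation_data=(x_test, y_test),
              callbacks=[checkpoint])
    return model

model = train_cnn_model(model_file, model)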
# During testing you can load the CNN parameters, rather than train each time
#model_load = "mnist-model.hdf5"
#model.load_weights(model_load)

model_weights = model.get_weights()

full_conv_weights = full_conv_model.get_weights()
full_conv_weights = transplant_weight(model_weights, full_conv_weights)
full_conv_model.set_weights(full_conv_weights)

# Execute the FCNN
output = full_conv_model.predict(x_concat_test, batch_size=batch_size)
# The output of the fully conv model needs to be processed to produce the sequence of digits
output = np.argmax(output, axis=2)
print("Model output")

# <<<<<<<<<<<<<<<
# post processing the model output
# <<<<<<<<<<<<<<<


# the true labels for comparison
labels = np.argmax(y_concat_test, axis=2)
print("Label outputs")
print(labels)
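
One possible shape for the post-processing step is sketched below. It assumes an unpadded network, so that the first and last output positions line up with the first and last digits and the remaining digit-aligned positions are evenly spaced between them. The function name and the synthetic data (57 positions, as an unpadded stride-2 network would give for a width of 140) are illustrative assumptions, not results from a trained model.

```python
import numpy as np


def decode_digit_sequence(position_classes, num_digits):
    """Keep the predictions at the evenly spaced, digit-aligned positions.

    position_classes: integer array of shape (samples, positions), e.g. the
    result of np.argmax over the class axis of the FCNN output.
    """
    n_positions = position_classes.shape[1]
    if num_digits == 1:
        return position_classes[:, :1]
    step, remainder = divmod(n_positions - 1, num_digits - 1)
    if remainder != 0:
        raise ValueError("positions do not divide evenly into digit slots")
    aligned = np.arange(num_digits) * step
    return position_classes[:, aligned]


# Synthetic stand-in for the argmax output: only the aligned positions
# carry the true digits, everything in between is filler.
true_digits = np.array([[3, 1, 4, 1, 5], [9, 2, 6, 5, 3]])
fake_output = np.zeros((2, 57), dtype=int)
fake_output[:, ::14] = true_digits

decoded = decode_digit_sequence(fake_output, num_digits=5)
sequence_accuracy = np.mean(np.all(decoded == true_digits, axis=1))
print(decoded)
print(sequence_accuracy)
```

Selecting fixed positions relies on the digits sitting on a known grid. For digits pasted at arbitrary locations, the positions would have to be found from the output itself, for example by looking for confident, locally consistent predictions.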