<H1 Align=center> Challenge Machine Learning Avancée - MDI341  </H1>

### Objective of the Challenge
Learn a predictive system that predicts templates (vectors of 128 elements) from low-resolution images, so as to minimize the gap to the original templates, which were predicted from high-resolution images.

### Datasets and general procedure to generate the predictions
* Fit the model on the training set (X_train, Y_train) of dimension (100'000 x 2304, 100'000 x 128)
* Evaluate the performance on the validation set (X_valid, Y_valid) of dimension (10'000 x 2304 , 10'000 x 128)
* Predict on the test set X_test of dimension 10'000 x 2304 to generate Y_test of dimension 10'000 x 128

### Pre-processing of the data and data augmentation
* Standardization (centering and scaling) of the images so that they have zero mean and unit variance.
* Data augmentation: rotating, mirroring, adjusting contrast, etc. generates additional images from the original ones. This extends the training set with slightly modified images whose labels are known, which helps the model generalize better.
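
As a minimal sketch of these two steps, assuming plain NumPy rather than the TFLearn helpers used later in this notebook, standardization and a horizontal mirror could look like the following. The helper names `standardize` and `mirror_left_right` are illustrative only, and the random array stands in for real images.

```python
import numpy as np

def standardize(images, eps=1e-8):
    """Center and scale a batch using its global mean and standard deviation."""
    mean = images.mean()
    std = images.std()
    return (images - mean) / (std + eps)

def mirror_left_right(images, side=48):
    """Flip each flattened side x side image horizontally."""
    batch = images.reshape(-1, side, side)
    return batch[:, :, ::-1].reshape(len(images), side * side)

# Random stand-in for 8 gray-level images of 48 x 48 pixels
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(8, 48 * 48)).astype(np.float32)

normalized = standardize(fake_images)   # zero mean, unit variance
flipped = mirror_left_right(fake_images)  # same shape, mirrored columns
```

A mirrored image keeps the template of its source image, which is what makes it usable as an extra training example.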

### Coding of the model
Since this challenge involves images, I decided to use a Convolutional Neural Network. I first implemented it in TensorFlow, and afterwards in TFLearn and Keras (deep learning libraries that run on top of TensorFlow).

### Architecture of the Convolutional Neural Networks
We were asked to build a CNN architecture with fewer than 50k parameters. I therefore implemented an architecture that respects this constraint and has the following form:
* Convolutional Layer 1 : 4 x 4 kernel with 10 outputs (filtered images). A ReLU activation is applied to these outputs.
* Pooling 1 : (2, 2) window with a stride of 2, so that each spatial dimension of the filtered images is halved.
* Convolutional Layer 2 : 4 x 4 kernel with 10 outputs. A ReLU activation is applied to these outputs.
* Pooling 2 : (2, 2) window with a stride of 2, so that each spatial dimension of the filtered images is halved.
* Convolutional Layer 3 : 4 x 4 kernel with 10 outputs. A ReLU activation is applied to these outputs.
* Pooling 3 : (2, 2) window with a stride of 2, so that each spatial dimension of the filtered images is halved.
* Fully Connected Layer : takes the output of Pooling 3 (6x6x10), flattens it into a single vector, and connects it to an output layer of dimension 128.

This architecture has 49'628 parameters.
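
The count can be checked by hand. The sketch below assumes 'same' padding (convolutions keep the spatial size) and one bias per output channel; the helper names are illustrative. Weights and biases alone give 49,598. The stated total is reached if each convolutional layer also carries one learnable PReLU slope per channel (30 more), as in the Keras model later in this notebook. That reading is an inference from the arithmetic, not something the text states.

```python
def conv_params(kernel, in_channels, out_channels):
    """Kernel weights plus one bias per output channel."""
    return kernel * kernel * in_channels * out_channels + out_channels

def dense_params(n_in, n_out):
    """Weight matrix plus one bias per output unit."""
    return n_in * n_out + n_out

# Spatial size after three 2x2 poolings with stride 2: 48 -> 24 -> 12 -> 6
side = 48
for _ in range(3):
    side = (side + 1) // 2

conv_total = (conv_params(4, 1, 10)
              + conv_params(4, 10, 10)
              + conv_params(4, 10, 10))
fc_total = dense_params(side * side * 10, 128)

weights_and_biases = conv_total + fc_total
prelu_slopes = 3 * 10  # assumption: one slope per channel in each conv layer
print(side, weights_and_biases, weights_and_biases + prelu_slopes)
```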

To avoid overfitting, I added dropout with a keep probability of 0.75 on the Fully Connected layer. At each weight update (backpropagation step), dropout randomly disables a fraction of the units, and therefore of the connections between nodes.
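
To illustrate the mechanism, here is a small NumPy sketch of "inverted" dropout. It assumes that 0.75 is the keep probability, which is how the value is passed to `tf.nn.dropout` later in this notebook; the function below is a stand-in, not the TensorFlow implementation.

```python
import numpy as np

def dropout(activations, keep_prob, rng):
    """Zero each unit with probability 1 - keep_prob and rescale the survivors."""
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
activations = np.ones((1000, 128), dtype=np.float32)

dropped = dropout(activations, keep_prob=0.75, rng=rng)
kept_fraction = float((dropped != 0).mean())  # close to 0.75
```

Dividing by `keep_prob` keeps the expected activation unchanged, so no rescaling is needed at prediction time, when the keep probability is set to 1.0.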

### Weights initialization
To speed up the convergence of gradient descent, I used the Xavier initializer (in the TFLearn and Keras versions), which scales the initial weights so that the variance of the activations stays roughly constant from layer to layer.
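
For reference, a hedged NumPy sketch of the uniform variant of Xavier (Glorot) initialization: weights are drawn from a uniform distribution on [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out)). The function name is illustrative, and the libraries' own implementations may differ in detail.

```python
import numpy as np

def xavier_uniform(shape, rng):
    """Xavier/Glorot uniform init for a conv kernel (h, w, in, out) or a dense matrix (in, out)."""
    receptive_field = int(np.prod(shape[:-2]))  # 1 for a dense matrix
    fan_in = shape[-2] * receptive_field
    fan_out = shape[-1] * receptive_field
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=shape).astype(np.float32)

rng = np.random.default_rng(0)
wc2 = xavier_uniform((4, 4, 10, 10), rng)     # shape of the 2nd conv kernel
wf1 = xavier_uniform((6 * 6 * 10, 128), rng)  # shape of the fully connected matrix
```

The resulting variance is 2 / (fan_in + fan_out), so wider layers start with smaller weights.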

### Optimizer with an exponential decay learning rate 
For the gradient descent optimization, I chose the AdamOptimizer, which minimizes the loss function by updating the weights W and biases b of each layer using the gradients obtained by backpropagation. When training a model, it is often recommended to lower the learning rate as training progresses, so I applied an exponential decay function to a given initial learning rate.
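
The decay schedule follows the formula learning_rate = initial_lr * decay_rate ^ (step / decay_steps). A plain-Python sketch, using the values set later in this notebook (0.01, 1000 and 0.95); the function is a stand-in for illustration, not the TensorFlow op itself:

```python
def exponential_decay(initial_lr, step, decay_steps, decay_rate, staircase=False):
    """Learning rate after `step` updates under exponential decay."""
    # With staircase=True the rate drops in discrete jumps every decay_steps updates
    exponent = step // decay_steps if staircase else step / decay_steps
    return initial_lr * decay_rate ** exponent

steps = (0, 1000, 2000, 3905)
schedule = [exponential_decay(0.01, s, 1000, 0.95) for s in steps]

for s, lr in zip(steps, schedule):
    print("step {:>4}: learning rate {:.6f}".format(s, lr))
```

With these settings the learning rate shrinks by 5% every 1000 updates, so it stays above 0.008 over the roughly 3900 updates of the TensorFlow run.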

For the training procedure, I process the data in batches of 128 images and save the model every 10 batches.

### Imports

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import tensorflow as tf
import tflearn
import keras as kr
import warnings
warnings.filterwarnings("ignore")

from tflearn.data_preprocessing import ImagePreprocessing
from tflearn.data_augmentation import ImageAugmentation
from tflearn.layers.core import input_data
from sklearn import preprocessing

### Data loading

In [None]:
# Training set
images_train_fname    = 'data_train.bin'
templates_train_fname = 'fv_train.bin'

# Validation set
images_valid_fname    = 'data_valid.bin'
templates_valid_fname = 'fv_valid.bin'

# Testing set
images_test_fname     = 'data_test.bin'

# number of images
num_train_images = 100000
num_valid_images = 10000
num_test_images  = 10000

# size of the images 48*48 pixels in gray levels
image_dim = 48 * 48

# dimension of the templates
template_dim = 128

# read the training files
with open(images_train_fname, 'rb') as f:
    X_train = np.fromfile(f, dtype=np.uint8, count=num_train_images * image_dim).astype(np.float32)
    X_train = X_train.reshape(num_train_images, image_dim)

with open(templates_train_fname, 'rb') as f:
    Y_train = np.fromfile(f, dtype=np.float32, count=num_train_images * template_dim)
    Y_train = Y_train.reshape(num_train_images, template_dim)

# read the validation files
with open(images_valid_fname, 'rb') as f:
    X_valid = np.fromfile(f, dtype=np.uint8, count=num_valid_images * image_dim).astype(np.float32)
    X_valid = X_valid.reshape(num_valid_images, image_dim)

with open(templates_valid_fname, 'rb') as f:
    Y_valid = np.fromfile(f, dtype=np.float32, count=num_valid_images * template_dim)
    Y_valid = Y_valid.reshape(num_valid_images, template_dim)

# read the test file
with open(images_test_fname, 'rb') as f:
    X_test = np.fromfile(f, dtype=np.uint8, count=num_test_images * image_dim).astype(np.float32)
    X_test = X_test.reshape(num_test_images, image_dim)

In [None]:
# Investigation of the shape of the data
print("Shape X_train : {}".format(X_train.shape))
print("Shape Y_train : {}\n".format(Y_train.shape))
print("Shape X_valid : {}".format(X_valid.shape))
print("Shape Y_valid: {}\n".format(Y_valid.shape))
print("Shape X_test : {}".format(X_test.shape))

In [None]:
fig, axes = plt.subplots(nrows=1, ncols=4, figsize=(12, 4))

for i, ax in enumerate(axes):
    ax.imshow(X_train[i].reshape(48, 48), cmap=plt.cm.gray)

### Data Standardization and Data Augmentation

In [None]:
# Real-time data preprocessing
img_prep = ImagePreprocessing()
img_prep.add_featurewise_zero_center()
img_prep.add_featurewise_stdnorm()

# Real-time data augmentation
img_aug = ImageAugmentation()
img_aug.add_random_flip_leftright()
img_aug.add_random_rotation(max_angle=25.)

### Performance measure: MSE

In [None]:
def compute_pred_score(y_true, y_pred):
    err_y = np.mean((y_true - y_pred) ** 2)
    return err_y

## Linear Regression 
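
The cell below solves the normal equations $X^T X W = X^T Y$ directly. `np.linalg.solve` raises an error when $X^T X$ is singular, which can happen if some pixel columns are constant or linearly dependent. In that case `np.linalg.lstsq` still returns a least-squares solution. The sketch below uses small synthetic data (all names ending in `_demo` are stand-ins) and checks that both approaches agree when the problem is well conditioned.

```python
import numpy as np

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 20))
W_true = rng.normal(size=(20, 5))
Y_demo = X_demo @ W_true + 0.01 * rng.normal(size=(200, 5))

# Normal equations, as in the cell below
W_normal = np.linalg.solve(X_demo.T @ X_demo, X_demo.T @ Y_demo)

# Least-squares solver, which also handles rank-deficient inputs
W_lstsq, _, rank, _ = np.linalg.lstsq(X_demo, Y_demo, rcond=None)

max_gap = float(np.abs(W_normal - W_lstsq).max())
print("rank:", rank, "max difference:", max_gap)
```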

In [None]:
%%time 
# Simple linear regression 

# Learn the matrix W from the training data
W = np.linalg.solve(np.dot(X_train.T, X_train),
                    np.dot(X_train.T, Y_train))

# Training prediction
Y_train_pred = np.dot(X_train, W)

# Training error
err_train = compute_pred_score(Y_train, Y_train_pred)

print('Error on the training data : %s' % err_train)

In [None]:
%%time
# Monitor the validation error
Y_valid_pred = np.dot(X_valid, W)
err_valid = compute_pred_score(Y_valid, Y_valid_pred)

print('Error on the validation data : %s' % err_valid)

In [None]:
# Generate the prediction
test_template_pred = np.dot(X_test, W)

with open('template_pred.bin', 'wb') as f:
    for i in range(num_test_images):
        f.write(test_template_pred[i, :])

## Multilayer Convolutional Network (TensorFlow)

#### Convolution and pooling

In [None]:
def convolution(x, W, b):  
    # convolution of input image x by weight tensor W => generation of output features (filtered images).
    x = tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')
    x = tf.nn.bias_add(x, b)
    return tf.nn.relu(x)

def max_pooling(x, k=2):  
    # by default (window size : 2 x 2) with a stride of 2.
    return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1], padding='SAME')

#### Weight initialization

In [None]:
def weight_variable(shape):  
    # Weight W initialization
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):  
    # Bias b initialization
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

#### CNN 

In [None]:
def conv_net(x, weights, biases, dropout):
    
    # 1st Convolutional Layer
    with tf.variable_scope('conv1'):
        x = tf.reshape(x, shape=[-1, 48, 48, 1])
        prep_x = input_data(placeholder=x, data_preprocessing=img_prep, data_augmentation=img_aug)
        h_conv1 = convolution(prep_x, weights['wc1'], biases['bc1'])
    with tf.variable_scope('pool1'):
        h_pool1 = max_pooling(h_conv1)
    
    
    # 2nd Convolutional Layer
    with tf.variable_scope('conv2'):
        h_conv2 = convolution(h_pool1, weights['wc2'],  biases['bc2'])
    with tf.variable_scope('pool2'):
        h_pool2 = max_pooling(h_conv2)
    
    
    # 3rd Convolutional Layer
    with tf.variable_scope('conv3'):
        h_conv3 = convolution(h_pool2, weights['wc3'], biases['bc3'])
    with tf.variable_scope('pool3'):
        h_pool3 = max_pooling(h_conv3)

        
    # Fully (Densely) Connected Layer 
    with tf.variable_scope('fc'):
        h_pool3_flat = tf.reshape(h_pool3, [-1, weights['wf1'].get_shape().as_list()[0]])
        h_fc1 = tf.matmul(h_pool3_flat, weights['wf1']) + biases['bf1']
        h_fc1_drop = tf.nn.dropout(h_fc1, dropout)  # Apply dropout; `dropout` is the keep probability
    
    return h_fc1_drop

#### To run batches

In [None]:
class images_container():

    def __init__(self, _X, _y, _batch_size):
        self.current_position = 0
        self.batch_size = _batch_size
        
        self.X = _X
        self.y = _y
        
    def next_batch(self):
        if(self.current_position + self.batch_size > len(self.X)):
            self.current_position = 0
        
        X_batch = self.X[self.current_position: self.current_position + self.batch_size]
        y_batch = self.y[self.current_position: self.current_position + self.batch_size]
            
        self.current_position += self.batch_size
        return X_batch, y_batch

#### Get total number of parameters of the model

In [None]:
def get_total_param():
    """
    Be sure you are not over the 50 K !!!
    """
    total_parameters = 0
    # Iterate over all trainable variables
    for variable in tf.trainable_variables():
        local_parameters = 1
        shape = variable.get_shape()  # shape of the variable
        for dim in shape:
            local_parameters *= dim.value  # multiply the dimension sizes
        total_parameters += local_parameters
    
    return total_parameters

#### Simulation framework
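
One detail to keep in mind when reading the logs: the graph below minimizes the summed squared error over a batch (`tf.reduce_sum`), whereas `compute_pred_score` reports a mean, so the printed training and validation losses are on different scales. A small NumPy sketch of the relation, using random stand-in data:

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, template_dim = 128, 128
y_true = rng.normal(size=(batch_size, template_dim)).astype(np.float32)
y_pred = rng.normal(size=(batch_size, template_dim)).astype(np.float32)

sum_squared_error = float(np.sum((y_true - y_pred) ** 2))
mean_squared_error = float(np.mean((y_true - y_pred) ** 2))

# The two differ by the number of elements in the batch
scale = batch_size * template_dim
print(sum_squared_error / scale, mean_squared_error)
```

Dividing the training loss by `batch_size * template_dim` makes it directly comparable to the validation score.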

In [None]:
# Graph inputs
with tf.name_scope('data'):
    X = tf.placeholder(tf.float32, [None, image_dim], name="X_placeholder")
    Y = tf.placeholder(tf.float32, [None, template_dim], name="Y_placeholder")
dropout = tf.placeholder(tf.float32, name="dropout") 

tf.summary.scalar('dropout_keep_probability', dropout)

# Weights and biases
weights = {
    # 1st conv. layer : 4x4 filter taking 1 input (image) and generating 10 outputs
    'wc1': weight_variable([4, 4, 1, 10]),
    # Pooling
    # 2nd conv. layer : 4x4 filter, 10 inputs, 10 outputs
    'wc2': weight_variable([4, 4, 10, 10]),
    # 3rd conv. layer : 4x4 filter, 10 inputs, 10 outputs
    'wc3': weight_variable([4, 4, 10, 10]),
    # fully connected, 6*6*10 inputs, 128 outputs
    'wf1': weight_variable([6 * 6 * 10, 128]),
}

biases = {
    'bc1': bias_variable([10]),
    'bc2': bias_variable([10]),
    'bc3': bias_variable([10]),
    'bf1': bias_variable([128]),
}

# Building the model
pred = conv_net(X, weights, biases, dropout)

# Get total number of parameters
print("Total number of parameters : {}".format(get_total_param()))

# Loss function
with tf.name_scope('loss'):
    loss = tf.reduce_sum(tf.square(Y - pred))
tf.summary.scalar('loss', loss)

# Merge all summaries 
merged = tf.summary.merge_all()

# Optimizer
start_learning_rate = 0.01
decay_steps = 1000
decay_rate = 0.95
global_step = tf.Variable(0, trainable=False)
# Decaying learning rate
learning_rate = tf.train.exponential_decay(start_learning_rate, global_step, decay_steps, decay_rate)
optimizer = tf.train.AdamOptimizer(learning_rate).minimize(loss, global_step=global_step)

In [None]:
%%time

# Simulation parameters
BATCH_SIZE = 128
DISPLAY_STEP = 10
DROPOUT = 0.75
n_batches = int(num_train_images / BATCH_SIZE)  # to process all the training dataset (781 batches in total)
N_EPOCHS = 5  # train the model N_EPOCHS times

# Train the model, write training summaries by adding them to the summary_writer 
with tf.Session() as sess: 
    sess.run(tf.global_variables_initializer())  # Run session and initializing variables
    saver = tf.train.Saver() # To save the model 
    writer = tf.summary.FileWriter("./log_files", sess.graph)  # Generate tons of logs (including summaries)

    train_data = images_container(X_train, Y_train, BATCH_SIZE)
    step = 1
    
    # Keep training until reach max iterations
    while step < (N_EPOCHS * n_batches):  # step runs from 1 to 3904 (N_EPOCHS * n_batches = 3905)
        batch_X, batch_Y = train_data.next_batch()
        # Run optimization op (backprop)
        sess.run(optimizer, feed_dict={X: batch_X, Y: batch_Y, dropout: DROPOUT})

        if step % DISPLAY_STEP == 0:
            loss_train, summary = sess.run([loss, merged], feed_dict={X: batch_X, Y: batch_Y, dropout: 1.0}) 
            writer.add_summary(summary, step)
            
            saver.save(sess, "./log_files/model.ckpt", step)
            
            # Make predictions and compute score on validation set
            Y_valid_pred = sess.run(pred, {X: X_valid, dropout: 1.0})
            # Look at the score on validation set
            loss_valid = compute_pred_score(Y_valid_pred, Y_valid)
            print("Epoch ({}/{}) - Batch ({}/{}) ==> Training loss: {}, Validation loss: {}".format(int(step / n_batches) + 1, 
                                                    N_EPOCHS, np.mod(step, n_batches), n_batches, loss_train, loss_valid))
            
        step += 1
        
    print("Optimization finished")

    # Predictions on the testing set
    Y_test = sess.run(pred, {X: X_test, dropout: 1.0})

    # Write prediction in test file
    with open('template_pred.bin', 'wb') as f:
        for i in range(num_test_images):
            f.write(Y_test[i, :])

sess.close()

## Multilayer Convolutional Network (TFLearn)

In [None]:
import tflearn
from tflearn import Momentum
from tflearn.data_utils import shuffle, to_categorical
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.estimator import regression
from tflearn.data_preprocessing import ImagePreprocessing
from tflearn.data_augmentation import ImageAugmentation
from tflearn.metrics import R2
from tflearn.losses import L2

In [None]:
X_train = X_train.reshape([-1, 48, 48, 1])
X_valid = X_valid.reshape([-1, 48, 48, 1])
X_test = X_test.reshape([-1, 48, 48, 1])  # model.predict below also expects 48x48x1 inputs

# Real-time data preprocessing
img_prep = ImagePreprocessing()
img_prep.add_featurewise_zero_center()
img_prep.add_featurewise_stdnorm()

# Real-time data augmentation
img_aug = ImageAugmentation()
img_aug.add_random_flip_leftright()
img_aug.add_random_rotation(max_angle=25.)

# Convolutional network building
network = input_data(shape=[None, 48, 48, 1],
                     data_preprocessing=img_prep,
                     data_augmentation=img_aug)

trunc_norm = tflearn.initializations.truncated_normal(stddev=0.1)

network = conv_2d(network, 10, 4, activation='prelu', weights_init='xavier', name='conv1')
network = max_pool_2d(network, 2, name='pool1')

network = conv_2d(network, 10, 4, activation='prelu', weights_init='xavier', name='conv2')
network = max_pool_2d(network, 2, name='pool2')

network = conv_2d(network, 10, 4, activation='prelu', weights_init='xavier', name='conv3')
network = max_pool_2d(network, 2, name='pool3')

network = fully_connected(network, 128, activation='prelu', weights_init='xavier', name ='fc1', regularizer='L2')
network = dropout(network, 0.75)

network = regression(network, optimizer='adam', loss='mean_square', learning_rate=0.0001)


# Train the regression model
model = tflearn.DNN(network, tensorboard_verbose=3, checkpoint_path="models_tflearn/model.tfl.ckpt")

model.fit(X_train, Y_train, n_epoch=40, validation_set=(X_valid, Y_valid), snapshot_step=250, run_id='cnn_image')

# Save model when training is complete to a file
model.save('model.tfl.ckpt')

# Predictions on the testing set (cast to a float32 array so the rows can be written as raw bytes)
Y_test = np.asarray(model.predict(X_test), dtype=np.float32)

# Write prediction in test file
with open('template_pred.bin', 'wb') as f:
    for i in range(num_test_images):
        f.write(Y_test[i, :])

## Multilayer Convolutional Network (Keras)

In [None]:
import keras  # needed for keras.backend and keras.callbacks used below
from keras.models import Sequential, load_model
from keras.layers import Conv2D, AveragePooling2D, LocallyConnected2D, Dropout, Flatten
from keras.layers.advanced_activations import PReLU
from keras.optimizers import Adam

model = Sequential()

# 1st Convolutional Layer
model.add(Conv2D(10, kernel_size=(4, 4), input_shape=(48, 48, 1), padding='same', kernel_initializer='glorot_normal', name='Conv1'))
model.add(PReLU(shared_axes=[1,2]))
model.add(AveragePooling2D(pool_size=(2,2), name='pool1'))
model.add(Dropout(0.1))

# 2nd Convolutional Layer
model.add(Conv2D(10, kernel_size=(4, 4), padding='same', kernel_initializer='glorot_normal', name='Conv2'))
model.add(PReLU(shared_axes=[1,2]))
model.add(AveragePooling2D(pool_size=(2,2), name='pool2'))
model.add(Dropout(0.1))

# 3rd Convolutional Layer
model.add(Conv2D(10, kernel_size=(4, 4), padding='same', kernel_initializer='glorot_normal', name='Conv3'))
model.add(PReLU(shared_axes=[1,2]))
model.add(AveragePooling2D(pool_size=(2,2), name='pool3'))
model.add(Dropout(0.1))

model.add(LocallyConnected2D(filters=128, kernel_size=(6,6), strides=(1,1)))
model.add(Dropout(0.1))

model.add(Flatten())

In [None]:
model.summary()

In [None]:
def compute_keras_pred_score(y_true, y_pred):
    err_y = keras.backend.mean((y_true - y_pred) ** 2)*100.
    return err_y

In [None]:
learning_rate = 0.001
adam = Adam(lr=learning_rate, decay=0.0)
model.compile(loss='mean_squared_error', optimizer=adam)

batch_size = 128
nb_epoch = 35

X_train = X_train.reshape([-1, 48, 48, 1])
X_valid = X_valid.reshape([-1, 48, 48, 1])
X_test = X_test.reshape([-1, 48, 48, 1])

checkpointer = keras.callbacks.ModelCheckpoint(filepath="./logs/weights.hdf5", verbose=1, save_best_only=False, mode='auto', period=1)
tbCallBack = keras.callbacks.TensorBoard(log_dir='./Graph', histogram_freq=1,write_graph=True, write_images=True)

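# validation_data is passed because the TensorBoard callback needs it when histogram_freq > 0
history = model.fit(X_train, Y_train, batch_size=batch_size, epochs=nb_epoch, verbose=1,
                    validation_data=(X_valid, Y_valid), callbacks=[tbCallBack, checkpointer])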


model.save('model_keras.h5')  
model = load_model('model_keras.h5')

# Prediction
Y_test = model.predict(X_test)
with open('model_keras.bin', 'wb') as f:
    for i in range(num_test_images):
        f.write(Y_test[i, :])