# CHALLENGE MDI 341

## Presentation

My first step was to build a "CNN_Model" class that makes it quick and easy to create a TensorFlow graph for a convolutional network, train it, save/restore checkpoints, and write summaries for visualization in TensorBoard. It is defined in the "model.py" file. The most useful methods are:
- model.add_conv
- model.add_pool
- model.add_batch_norm
- model.add_fully_connected
- model.add_mse_loss
- model.add_adam_optimizer
- model.add_summaries
- model.initialize_session
- model.train
- model.close_session
- model.predict

I also made a "Batch_Normalizer" class for batch normalization, in the "batch_norm.py" file.  
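
As a rough illustration of what such a class does, here is a minimal NumPy sketch of batch normalization over NHWC activations, with exponential moving averages of the statistics kept for inference. It is not the code from "batch_norm.py": the class name `BatchNormSketch`, its attributes, and the choice of normalizing over the batch and spatial axes are assumptions. Only the default decay (0.99) and epsilon (0.01) mirror the values listed in the Parameters section.

```python
import numpy as np


class BatchNormSketch:
    """Hypothetical stand-in for Batch_Normalizer, written with NumPy only."""

    def __init__(self, depth, ema_decay=0.99, epsilon=0.01):
        self.gamma = np.ones(depth)    # scale (trainable in a real model)
        self.beta = np.zeros(depth)    # offset (trainable in a real model)
        self.ema_mean = np.zeros(depth)
        self.ema_var = np.ones(depth)
        self.ema_decay = ema_decay
        self.epsilon = epsilon

    def __call__(self, x, training):
        if training:
            # Per-channel statistics over batch, height and width.
            mean = x.mean(axis=(0, 1, 2))
            var = x.var(axis=(0, 1, 2))
            d = self.ema_decay
            # Moving averages are updated during training and used at inference.
            self.ema_mean = d * self.ema_mean + (1 - d) * mean
            self.ema_var = d * self.ema_var + (1 - d) * var
        else:
            mean, var = self.ema_mean, self.ema_var
        return self.gamma * (x - mean) / np.sqrt(var + self.epsilon) + self.beta


rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(100, 4, 4, 10))
bn = BatchNormSketch(depth=10)
y_train = bn(x, training=True)    # normalized with the batch statistics
y_eval = bn(x, training=False)    # normalized with the moving averages
```

With a decay of 0.99 the moving averages need a few hundred batches to settle, which is why the inference path only becomes meaningful after some training.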

The "data_reader.py" file contains a "load_data" function that loads data from binary files into Numpy arrays.  
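
A minimal sketch of this kind of loader is shown below. The binary layout is an assumption (one unsigned byte per pixel for images, 32-bit floats for templates, records stored back to back), and the helper names `load_images` and `load_templates` are hypothetical; the actual "load_data" function may differ.

```python
import os
import tempfile

import numpy as np


def load_images(path, num_images, image_dim):
    # Assumption: one unsigned byte per pixel, images stored back to back.
    raw = np.fromfile(path, dtype=np.uint8,
                      count=num_images * image_dim * image_dim)
    # reshape raises ValueError if the file holds fewer values than expected.
    return raw.reshape(num_images, image_dim, image_dim, 1).astype(np.float32)


def load_templates(path, num_images, template_dim):
    # Assumption: 32-bit floats, one template of length template_dim per image.
    raw = np.fromfile(path, dtype=np.float32, count=num_images * template_dim)
    return raw.reshape(num_images, template_dim)


# Round trip on small synthetic files to check the layout.
tmp_dir = tempfile.mkdtemp()
image_path = os.path.join(tmp_dir, 'images.bin')
template_path = os.path.join(tmp_dir, 'templates.bin')

fake_images = (np.arange(3 * 48 * 48) % 256).astype(np.uint8)
fake_templates = np.linspace(0.0, 1.0, 2 * 128).astype(np.float32)
fake_images.tofile(image_path)
fake_templates.tofile(template_path)

images = load_images(image_path, num_images=3, image_dim=48)
templates = load_templates(template_path, num_images=2, template_dim=128)
```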

Finally, the "utils.py" file contains some helper functions:
- "create_logs_and_checkpoints_folders" is called when initializing the model with a name.
- "add_description" saves the code that created the model graph to a text file, so that the model can be restored later by copy-pasting this code.
- "num_parameters" counts the number of trainable parameters in the model, which must not exceed 50000.
- "predict_test_and_save" computes the final predictions and saves them to disk in a binary file.

The graph is constructed using the "model.add_" functions. It is then trained using "model.initialize_session", "model.train", and "model.close_session". Data is fed into 2 placeholders (one for images, one for templates). For training, I feed batches of 100 images, randomly sampled from the Numpy arrays loaded in memory at initialization. After every batch, I write the batch loss to a summary to visualize it in TensorBoard. For validation, I chose a step size of 500 batches: since 500 batches of 100 images cover half of the 100000 training images, every 0.5 epoch I feed the whole validation set through the placeholders, compute the validation loss, and write it to another summary. At the same point I also save the model to a checkpoint file, so that training can be stopped and resumed later.  
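
The batch sampling and the validation schedule described above can be sketched as follows. This is only an illustration: the helper names are hypothetical, and drawing indices uniformly with replacement is an assumption, since the text only says that batches are sampled randomly.

```python
import numpy as np


def sample_batch(images, templates, batch_size, rng):
    # Assumption: indices are drawn uniformly with replacement.
    idx = rng.integers(0, len(images), size=batch_size)
    return images[idx], templates[idx]


def validation_batches(n_batches, step_size):
    # 1-based batch numbers after which validation and checkpointing run.
    return [b for b in range(1, n_batches + 1) if b % step_size == 0]


def epochs_seen(n_batches, batch_size, num_train_images):
    # Fraction of the training set covered, counted in images fed.
    return n_batches * batch_size / num_train_images


rng = np.random.default_rng(0)
images = np.zeros((1000, 48, 48, 1), dtype=np.float32)   # small stand-in arrays
templates = np.zeros((1000, 128), dtype=np.float32)

batch_images, batch_templates = sample_batch(images, templates, 100, rng)
print(batch_images.shape, batch_templates.shape)   # (100, 48, 48, 1) (100, 128)
print(validation_batches(1500, 500))               # [500, 1000, 1500]
print(epochs_seen(500, 100, 100000))               # 0.5
```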

An important point is that I constructed and trained several different models. To tell them apart in TensorBoard and when restoring them, the model class takes a "name" parameter at initialization. The log and checkpoint folders are then created accordingly if they don't exist, which means that to restore a model you just need to pass the right "name" parameter at initialization.  
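
A minimal sketch of the per-name folder creation is given below. The helper `create_model_folders` is hypothetical, and the layout (one subfolder per model name under the log and checkpoint directories) is an assumption, although it matches the "checkpoints\config3" path printed when the session is restored.

```python
import os
import tempfile


def create_model_folders(name, log_dir='logs', save_dir='checkpoints'):
    # Assumption: each model gets <log_dir>/<name> and <save_dir>/<name>.
    paths = (os.path.join(log_dir, name), os.path.join(save_dir, name))
    for path in paths:
        os.makedirs(path, exist_ok=True)   # no error if the folder already exists
    return paths


# Demonstration inside a temporary directory.
root = tempfile.mkdtemp()
logs_root = os.path.join(root, 'logs')
ckpt_root = os.path.join(root, 'checkpoints')

model_log_dir, model_save_dir = create_model_folders('config3', logs_root, ckpt_root)
# Initializing a model with the same name again reuses the same folders.
again = create_model_folders('config3', logs_root, ckpt_root)
```

Keying everything on the name keeps TensorBoard runs and checkpoints of different configurations from overwriting each other.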

In my zip file, you can find the logs and checkpoint for my best model (I left out the other models to keep the archive small). Here is the code used to create/restore this model, and then to predict the test templates and save them to disk:

## Module imports

In [1]:
import os
import sys
import numpy as np
import tensorflow as tf

import utils
import data_reader
from batch_norm import Batch_Normalizer
from model import CNN_Model

## Parameters

In [2]:
# File parameters
DATA_DIR = 'data'
TRAIN_IMAGES = 'data_train.bin'
TRAIN_TEMPLATES = 'fv_train.bin'
VALID_IMAGES = 'data_valid.bin'
VALID_TEMPLATES = 'fv_valid.bin'
TEST_IMAGES = 'data_test.bin'
LOG_DIR = 'logs'
SAVE_DIR = 'checkpoints'

# Data parameters
NUM_TRAIN_IMAGES = 100000
NUM_VALID_IMAGES = 10000
NUM_TEST_IMAGES = 10000
IMAGE_DIM = 48
TEMPLATE_DIM = 128

# Batch parameters
BATCH_SIZE = 100
VALID_BATCH_SIZE = 1000
STEP_SIZE = 500

# Batch normalization parameters
EMA_DECAY = 0.99
BN_EPSILON = 0.01

## Model initialization

This model consists of 3 convolutional blocks followed by a fully-connected layer of size 128. Each block contains 2 convolutional layers with 3x3 kernels.  
The first 2 blocks end with a max pooling of stride 2, and the last one with an average pooling of stride 6, which reduces the feature maps to 2x2 before the fully-connected layer (48 -> 24 -> 12 -> 2, assuming the convolutions preserve the spatial size); otherwise the number of parameters would be too high. I added a batch normalization after each convolution (for the second convolution of each block, it comes after the pooling).

In [3]:
model = CNN_Model('config3', DATA_DIR, LOG_DIR, SAVE_DIR, IMAGE_DIM, TEMPLATE_DIM, TRAIN_IMAGES, TRAIN_TEMPLATES, 
                  VALID_IMAGES, VALID_TEMPLATES, NUM_TRAIN_IMAGES, NUM_VALID_IMAGES, EMA_DECAY, BN_EPSILON)

model.add_conv('conv_1_1', ksize=[3, 3, 1, 10])
model.add_batch_norm('bn_1_1')
model.add_conv('conv_1_2', ksize=[3, 3, 10, 10])
model.add_pool('max_pool_1', 'max', ksize=[1, 2, 2, 1], stride=[1, 2, 2, 1])
model.add_batch_norm('bn_1_2')

model.add_conv('conv_2_1', ksize=[3, 3, 10, 20])
model.add_batch_norm('bn_2_1')
model.add_conv('conv_2_2', ksize=[3, 3, 20, 20])
model.add_pool('max_pool_2', 'max', ksize=[1, 2, 2, 1], stride=[1, 2, 2, 1])
model.add_batch_norm('bn_2_2')

model.add_conv('conv_3_1', ksize=[3, 3, 20, 40])
model.add_batch_norm('bn_3_1')
model.add_conv('conv_3_2', ksize=[3, 3, 40, 40])
model.add_pool('avg_pool_3', 'avg', ksize=[1, 6, 6, 1], stride=[1, 6, 6, 1])
model.add_batch_norm('bn_3_2')

model.add_fully_connected('fc', size=128)

model.add_mse_loss('mse_loss')
model.add_adam_optimizer('optimizer', init_learning_rate=0.00001, decay=False)
model.add_summaries('summaries')

## Number of parameters

In [4]:
print("Number of trainable parameters = %d" % utils.num_parameters(model))

Number of trainable parameters = 49018
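
The reported count can be reproduced by hand from the layer definitions above. The sketch below rests on several assumptions that the text does not state: every convolution and the fully-connected layer have a bias, each batch normalization has two trainable vectors (scale and offset) per channel, and the convolutions preserve the spatial size so that only the poolings shrink the 48x48 input. Under these assumptions the fully-connected layer receives a 2x2x40 input and the total matches the 49018 printed above.

```python
def conv_params(kh, kw, c_in, c_out):
    return kh * kw * c_in * c_out + c_out   # weights + biases


def bn_params(depth):
    return 2 * depth                        # scale + offset per channel


def fc_params(n_in, n_out):
    return n_in * n_out + n_out             # weights + biases


# Kernel shapes copied from the model.add_conv calls.
convs = [(3, 3, 1, 10), (3, 3, 10, 10),
         (3, 3, 10, 20), (3, 3, 20, 20),
         (3, 3, 20, 40), (3, 3, 40, 40)]

# Spatial size after the three poolings: 48 -> 24 -> 12 -> 2.
spatial = 48
for stride in (2, 2, 6):
    spatial //= stride

conv_total = sum(conv_params(*k) for k in convs)          # 28130
bn_total = sum(bn_params(k[3]) for k in convs)            # 280
fc_total = fc_params(spatial * spatial * 40, 128)         # 20608
total = conv_total + bn_total + fc_total
print(total)                                              # 49018
```

The fully-connected layer accounts for about 40% of the budget, which shows why the last pooling has to be aggressive to stay under the 50000 limit.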


## Validation loss

This model has been trained for 47 epochs (47000 batches of 100 images). To display the validation loss, I run one more training step here with a batch of 1 image and a step size of 1, which triggers the validation pass (not elegant, I know).

In [7]:
model.initialize_session(restore=True)
model.train(n_batches=1, step_size=1, batch_size=1, valid_batch_size=1000, save=False)
model.close_session()

Restored session checkpoints\config3\model.ckpt-47000
Batch 47001:
--> Train loss = 0.007816
--> Valid loss = 0.006573
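
The `valid_batch_size=1000` argument suggests that the validation set is evaluated in chunks rather than in a single pass. A minimal sketch of such a chunked loss computation follows. The function `chunked_mse` and the `predict_fn` callable are hypothetical, and defining the loss as the mean of squared errors over all template elements is an assumption about what "add_mse_loss" computes.

```python
import numpy as np


def chunked_mse(predict_fn, images, templates, chunk_size):
    # Accumulate the squared error chunk by chunk, then divide once at the end
    # so that a smaller last chunk is weighted correctly.
    total, count = 0.0, 0
    for start in range(0, len(images), chunk_size):
        pred = predict_fn(images[start:start + chunk_size])
        target = templates[start:start + chunk_size]
        total += float(np.sum((pred - target) ** 2))
        count += target.size
    return total / count


# Toy check with a linear "model" standing in for the network.
rng = np.random.default_rng(0)
inputs = rng.normal(size=(250, 16))
weights = rng.normal(size=(16, 8))
targets = rng.normal(size=(250, 8))


def toy_predict(x):
    return x @ weights


loss_chunked = chunked_mse(toy_predict, inputs, targets, chunk_size=100)
loss_direct = float(np.mean((toy_predict(inputs) - targets) ** 2))
```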


## Predictions

In [4]:
utils.predict_test_and_save('template_pred_config3.bin', model, DATA_DIR, TEST_IMAGES, 
                            VALID_TEMPLATES, NUM_TEST_IMAGES, IMAGE_DIM, TEMPLATE_DIM)

Restored session checkpoints\config3\model.ckpt-47000
Predictions saved at data\template_pred_config3.bin
