#### I. Imports

In [2]:
import keras
import numpy as np

from keras.datasets import mnist
from keras.optimizers import Adam
from keras.models import Sequential
from keras.preprocessing import image
from keras.layers.core import Dense
from keras.layers.core import Lambda
from keras.layers.core import Flatten
from keras.layers.core import Dropout
from keras.layers.pooling import MaxPooling2D
from keras.layers.convolutional import Convolution2D
from keras.layers.normalization import BatchNormalization
from keras.utils.np_utils import to_categorical

Using Theano backend.


I want to import Vgg16 as well because I'll want it's low-level features

In [None]:
# import os, sys
# sys.path.insert(1, os.path.join('../utils/'))

Actually, looks like Vgg's ImageNet weights won't be needed.

In [None]:
# from vgg16 import Vgg16
# vgg = Vgg16()

#### II. Load Data

In [3]:
(x_train, y_train), (x_test, y_test) = mnist.load_data()

#### III. Preprocessing
Keras Convolutional layers expect color channels, so expand an empty dimension in the input data, to account for no colors.

In [4]:
x_train = np.expand_dims(x_train, 1) # can also enter <axis=1> for <1>
x_test = np.expand_dims(x_test, 1)
x_train.shape

(60000, 1, 28, 28)

One-Hot Encoding the outputs:

In [5]:
y_train, y_test = to_categorical(y_train), to_categorical(y_test)

Since this notebook's models are all mimicking Vgg16, the input data should be preprocessed in the same way: in this case normalized by subtracting the mean and dividing by the standard deviation. It turns out this is a good idea generally.

In [6]:
x_mean = x_train.mean().astype(np.float32)
x_stdv = x_train.std().astype(np.float32)
def norm_input(x): return (x - x_mean) / x_stdv

#### Create Data Batch Generator
```ImageDataGenerator``` with no arguments will return a generator. Later, when data is augmented, it'll be told how to do so. I don't know what batch-size should be set to: in Lecture it was 64.

In [6]:
gen = image.ImageDataGenerator()
trn_batches = gen.flow(x_train, y_train, batch_size=64)
tst_batches = gen.flow(x_test, y_test, batch_size=64)

General workflow, going forward:
* Define the model's architecture.
* Run 1 Epoch at default learning rate (0.01 ~ 0.001 depending on optimizer) to get it started.
* Jack up the learning to 0.1 (as high as you'll ever want to go) and run 1 Epoch, possibly more if you can get away with it.
* Lower the learning rate by a factor of 10 and run for a number of Epochs -- repeat until model begins to overfit (acc > valacc)

Points on internal architecture:
* Each model will have a data-preprocessing ```Lambda``` layer, which normalizes the input and assigns a shape of (1 color-channel x 28 pixels x 28 pixels)
* Weights are flattened before entering FC layers
* Convolutional Layers will come in 2 pairs (because this is similar to the Vgg model). 
* Convol layer-pairs will start with 32 3x3 filters and double to 64 3x3 layers
* A MaxPooling Layer comes after each Convol-pair.
* When Batch-Normalization is applied, it is done after every layer but last (excluding MaxPooling).
* Final layer is always an FC softmax layer with 10 outputs for our 10 digits.
* Dropout, when applied, should increase toward later layers.
* Optimizer used in Lecture was Adam(), all layers but last use a ReLU activation, loss function is categorical cross-entropy.

### 1. Linear Model
aka 'Dense', 'Fully-Connected'

In [24]:
def LinModel():
    model = Sequential([
        Lambda(norm_input, input_shape=(1, 28, 28)),
        Flatten(),
        Dense(10, activation='softmax')
    ])
    model.compile(Adam(), loss='categorical_crossentropy', metrics=['accuracy'])
    return model

In [25]:
Linear_model = LinModel()
Linear_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=1,
                          validation_data=tst_batches, nb_val_samples=trn_batches.n)

  .format(self.name, input_shape))


Epoch 1/1


<keras.callbacks.History at 0x111b44690>

In [27]:
Linear_model.optimizer.lr=0.1
Linear_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=1,
                          validation_data=tst_batches, nb_val_samples=tst_batches.n)

Epoch 1/1


<keras.callbacks.History at 0x112838790>

In [28]:
Linear_model.optimizer.lr=0.01
Linear_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=4,
                          validation_data=tst_batches, nb_val_samples=tst_batches.n)

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


<keras.callbacks.History at 0x1128388d0>

In [29]:
Linear_model.optimizer.lr=0.001
Linear_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=8,
                          validation_data=tst_batches, nb_val_samples=tst_batches.n)

Epoch 1/8
Epoch 2/8
Epoch 3/8
Epoch 4/8
Epoch 5/8
Epoch 6/8
Epoch 7/8
Epoch 8/8


<keras.callbacks.History at 0x1088a2ed0>

### 2. Single Dense Layer
This is what people in the 80s & 90s thought of as a 'Neural Network': a single Fully-Connected hidden layer. I don't yet know why the hidden layer is ouputting 512 units. For natural-image recognition it's 4096. I'll see whether a ReLU or Softmax hidden layer works better.

By the way, the training and hyper-parameter tuning process *should* be automated. I want to use a NN to figure out how to do that for me.

In [30]:
def FCModel():
    model = Sequential([
        Lambda(norm_input, input_shape=(1, 28, 28)),
        Dense(512, activation='relu'),
        Flatten(),
        Dense(10, activation='softmax')
    ])
    model.compile(Adam(), loss='categorical_crossentropy', metrics=['accuracy'])
    return model

In [32]:
FC_model = FCModel()
FC_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=1,
                      validation_data=tst_batches, nb_val_samples=tst_batches.n)

  .format(self.name, input_shape))


Epoch 1/1


<keras.callbacks.History at 0x113c7ce10>

In [34]:
FC_model.optimizer=0.1
FC_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=1,
                      validation_data=tst_batches, nb_val_samples=tst_batches.n)

Epoch 1/1


<keras.callbacks.History at 0x113e6bbd0>

In [35]:
FC_model.optimizer=0.01
FC_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=4,
                      validation_data=tst_batches, nb_val_samples=tst_batches.n)

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


<keras.callbacks.History at 0x113e7cd90>

With an accuracy of 0.9823 and validation accuracy of 0.9664, the model's starting to overfit significantly and hit its limits, so it's time to go on to the next technique.

### 3. Basic 'VGG' style Convolutional Neural Network

I'm specifying an output shape equal to the input shape, to suppress the warnings keras was giving me; and it stated it was defaulting to that anyway. Or maybe I should've written ```output_shape=input_shape```

Aha: yes it's as I thought. See [this thread](http://forums.fast.ai/t/warning-output-shape-argument-not-specified/416/10) -- output_shape warnings were added to Keras, and neither vgg16.py (nor I until now) were specifying output_shape. It's fine.

The first time I ran this, I forgot to have 2 pairs of Conv layers. At the third λr=0.01 epoch I had acc/val of 0.9964, 0.9878

Also noticing: in lecture JH was using a GPU which I think was an NVidia Titan X. I'm using an Intel Core i5 CPU on a MacBook Pro. His epochs took on average 6 seconds, mine are taking 180~190. Convolutions are also the most computationally-intensive part of the NN being built here.

Interestingly, the model with 2 Conv-layer pairs is taking avg 160s. Best Acc/Val: ```0.9968/0.9944```

Final: ```0.9975/0.9918``` - massive overfitting

In [46]:
def ConvModel():
    model = Sequential([
        Lambda(norm_input, input_shape=(1, 28, 28), output_shape=(1, 28, 28)),
        Convolution2D(32, 3, 3, activation='relu'),
        Convolution2D(32, 3, 3, activation='relu'),
        MaxPooling2D(),
        Convolution2D(64, 3, 3, activation='relu'),
        Convolution2D(64, 3, 3, activation='relu'),
        MaxPooling2D(),
        Flatten(),
        Dense(512, activation='relu'),
        Dense(10, activation='softmax')
    ])
    model.compile(Adam(), loss='categorical_crossentropy', metrics=['accuracy'])
    return model

In [47]:
CNN_model = ConvModel()
CNN_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=1,
                       validation_data=tst_batches, nb_val_samples=tst_batches.n)

Epoch 1/1


<keras.callbacks.History at 0x123ed46d0>

In [48]:
CNN_model.optimizer=0.1
CNN_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=1, verbose=1,
                       validation_data=tst_batches, nb_val_samples=tst_batches.n)

Epoch 1/1


<keras.callbacks.History at 0x124513250>

In [49]:
CNN_model.optimizer=0.01
CNN_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=4, verbose=1,
                       validation_data=tst_batches, nb_val_samples=tst_batches.n)

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


<keras.callbacks.History at 0x124513150>

In [50]:
# Running again until validation accuracy stops increasing
CNN_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=4, verbose=1,
                       validation_data=tst_batches, nb_val_samples=tst_batches.n)

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


<keras.callbacks.History at 0x124513210>

### 4. Data Augmentation

In [7]:
gen = image.ImageDataGenerator(rotation_range=8, width_shift_range=0.08, shear_range=0.3,
                           height_shift_range=0.08, zoom_range=0.08)
trn_batches = gen.flow(x_train, y_train, batch_size=64)
tst_batches = gen.flow(x_test, y_test, batch_size=64)

In [55]:
CNN_Aug_model = ConvModel()
CNN_Aug_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=1, verbose=1,
                  validation_data=tst_batches, nb_val_samples=tst_batches.n)
# upping LR
print("Learning Rate, η = 0.1")
CNN_Aug_model.optimizer.lr=0.1
CNN_Aug_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=1, verbose=1,
                  validation_data=tst_batches, nb_val_samples=tst_batches.n)
# brining LR back down for more epochs
print("Learning Rate, η = 0.01")
CNN_Aug_model.optimizer.lr=0.01
CNN_Aug_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=4, verbose=1,
                  validation_data=tst_batches, nb_val_samples=tst_batches.n)

Epoch 1/1
Learning Rate, η = 0.1
Epoch 1/1
Learning Rate, η = 0.01
Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


<keras.callbacks.History at 0x122656110>

In [56]:
# 4 more epochs at η=0.01
CNN_Aug_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=4, verbose=1,
                  validation_data=tst_batches, nb_val_samples=tst_batches.n)

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


<keras.callbacks.History at 0x1245137d0>

### 5. Batch Normalization + Data Augmentation

[See this thread](http://forums.fast.ai/t/batchnormalization-axis-1-when-used-on-convolutional-layers/214) for info on BatchNorm axis.

In [8]:
def ConvModelBN():
    model = Sequential([
        Lambda(norm_input, input_shape=(1, 28, 28), output_shape=(1, 28, 28)),
        Convolution2D(32, 3, 3, activation='relu'),
        BatchNormalization(axis=1),
        Convolution2D(32, 3, 3, activation='relu'),
        MaxPooling2D(),
        BatchNormalization(axis=1),
        Convolution2D(64, 3, 3, activation='relu'),
        BatchNormalization(axis=1),
        Convolution2D(64, 3, 3, activation='relu'),
        MaxPooling2D(),
        Flatten(),
        BatchNormalization(),
        Dense(512, activation='relu'),
        BatchNormalization(),
        Dense(10, activation='softmax')
    ])
    model.compile(Adam(), loss='categorical_crossentropy', metrics=['accuracy'])
    return model

In [9]:
CNN_BNAug_model = ConvModelBN()
CNN_BNAug_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=1, verbose=1,
                              validation_data=tst_batches, nb_val_samples=tst_batches.n)
print("Learning Rate, η = 0.1")
CNN_BNAug_model.optimizer=0.1
CNN_BNAug_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=2, verbose=1,
                              validation_data=tst_batches, nb_val_samples=tst_batches.n)
print("Learning Rate, η = 0.01")
CNN_BNAug_model.optimizer=0.01
CNN_BNAug_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=6, verbose=1,
                              validation_data=tst_batches, nb_val_samples=tst_batches.n)

Epoch 1/1
Learning Rate, η = 0.1
Epoch 1/2
Epoch 2/2
Learning Rate, η = 0.01
Epoch 1/6
Epoch 2/6
Epoch 3/6
Epoch 4/6
Epoch 5/6
Epoch 6/6


<keras.callbacks.History at 0x11440ff90>

In [10]:
# some more training at 0.1 and 0.01:
print("Learning Rate, η = 0.1")
CNN_BNAug_model.optimizer=0.1
CNN_BNAug_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=1, verbose=1,
                              validation_data=tst_batches, nb_val_samples=tst_batches.n)
print("Learning Rate, η = 0.01")
CNN_BNAug_model.optimizer=0.01
CNN_BNAug_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=6, verbose=1,
                              validation_data=tst_batches, nb_val_samples=tst_batches.n)

Learning Rate, η = 0.1
Epoch 1/1
Learning Rate, η = 0.01
Epoch 1/6
Epoch 2/6
Epoch 3/6
Epoch 4/6
Epoch 5/6
Epoch 6/6


<keras.callbacks.History at 0x10c08eb90>

### 6. Dropout + Batch Normalization + Data Augmentation

In [11]:
def ConvModelBNDo():
    model = Sequential([
        Lambda(norm_input, input_shape=(1, 28, 28), output_shape=(1, 28, 28)),
        Convolution2D(32, 3, 3, activation='relu'),
        BatchNormalization(axis=1),
        Convolution2D(32, 3, 3, activation='relu'),
        MaxPooling2D(),
        BatchNormalization(axis=1),
        Convolution2D(64, 3, 3, activation='relu'),
        BatchNormalization(axis=1),
        Convolution2D(64, 3, 3, activation='relu'),
        MaxPooling2D(),
        Flatten(),
        BatchNormalization(),
        Dense(512, activation='relu'),
        BatchNormalization(),
        Dropout(0.5),
        Dense(10, activation='softmax')
    ])
    model.compile(Adam(), loss='categorical_crossentropy', metrics=['accuracy'])
    return model

In [12]:
CNN_BNDoAug_model = ConvModelBNDo()
CNN_BNDoAug_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=1, verbose=1,
                                validation_data=tst_batches, nb_val_samples=tst_batches.n)
print("Learning Rate, η = 0.1")
CNN_BNDoAug_model.optimizer.lr=0.1
CNN_BNDoAug_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=4, verbose=1,
                                validation_data=tst_batches, nb_val_samples=tst_batches.n)
print("Learning Rate, η = 0.01")
CNN_BNDoAug_model.optimizer.lr=0.01
CNN_BNDoAug_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=6, verbose=1,
                                validation_data=tst_batches, nb_val_samples=tst_batches.n)

Epoch 1/1
Learning Rate, η = 0.1
Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4
Learning Rate, η = 0.01
Epoch 1/6
Epoch 2/6
Epoch 3/6
Epoch 4/6
Epoch 5/6
Epoch 6/6


<keras.callbacks.History at 0x1146df550>

In [13]:
# 6 more epochs at 0.01
CNN_BNDoAug_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=6, verbose=1,
                                validation_data=tst_batches, nb_val_samples=tst_batches.n)

Epoch 1/6
Epoch 2/6
Epoch 3/6
Epoch 4/6
Epoch 5/6
Epoch 6/6


<keras.callbacks.History at 0x11b359210>

In [14]:
print("Learning Rate η = 0.001")
CNN_BNDoAug_model.optimizer.lr=0.001
CNN_BNDoAug_model.fit_generator(trn_batches, trn_batches.n, nb_epoch=12, verbose=1,
                                validation_data=tst_batches, nb_val_samples=tst_batches.n)

Learning Rate η = 0.001
Epoch 1/12
Epoch 2/12
Epoch 3/12
Epoch 4/12
Epoch 5/12
Epoch 6/12
Epoch 7/12
Epoch 8/12
Epoch 9/12
Epoch 10/12
Epoch 11/12
Epoch 12/12


<keras.callbacks.History at 0x11b0d5c90>

### 7. Ensembling

Define a function to automatically train a model:

In [None]:
def train_model():
    model = ConvModelBNDo()
    model.fit_generator(trn_batches, trn_batches.n, nb_epoch=1, verbose=1,
                                    validation_data=tst_batches, nb_val_samples=tst_batches.n)
    model.optimizer.lr=0.1
    model.fit_generator(trn_batches, trn_batches.n, nb_epoch=4, verbose=0,
                                    validation_data=tst_batches, nb_val_samples=tst_batches.n)
    model.optimizer.lr=0.01
    model.fit_generator(trn_batches, trn_batches.n, nb_epoch=12, verbose=0,
                                    validation_data=tst_batches, nb_val_samples=tst_batches.n)
    model.optimizer.lr=0.001
    model.fit_generator(trn_batches, trn_batches.n, nb_epoch=12, verbose=0,
                                    validation_data=tst_batches, nb_val_samples=tst_batches.n)
    return model

Create an array of models

In [None]:
# this'll take some time
models = [train_model for m in xrange(6)]

Save the models' weights -- bc this wasn't computationally cheap

In [None]:
from os import getcwd
path = getcwd() + 'data/mnist/'
model_path = path + 'models/'
for i,m in enumerate(models):
    m.save_weights(model_path + 'MNIST_CNN' + str(i) + '.pkl')

Create an array of predictions from the models on the test-set. I'm using a batch size of ```256``` because that's what was done in lecture, and prediction is such an easier task that I think the large size just helps things go faster.

In [None]:
ensemble_preds = np.stack([m.predict(x_test, batch_size=256) for m in models])

Finally, take the average of the predictions:

In [None]:
avg_preds = ensemble_preds.mean(axis=0)