Wayne Nixalo - 4 Jun 2017

Codealong of Practical Deep Learning I Lesson 4 [statefarm JNB](https://github.com/fastai/courses/blob/master/deeplearning1/nbs/statefarm.ipynb). My comments are in italics.

## Enter State Farm

In [None]:
import theano

In [None]:
%matplotlib inline
from __future__ import print_function, division
path = "data/statefarm/"
import utils; reload(utils)
from utils import *  # get_batches, get_classes, Vgg16, the Keras layers, np, pd, ...
from IPython.display import FileLink

In [None]:
batch_size=32

## Setup Batches

In [None]:
batches = get_batches(path + 'train', batch_size=batch_size)
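# shuffle=False keeps these in directory order, so later predictions line up with val_labels / test_filenames
val_batches = get_batches(path + 'valid', batch_size=batch_size, shuffle=False)
test_batches = get_batches(path + 'test', batch_size=batch_size, shuffle=False)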

In [None]:
(val_classes, trn_classes, val_labels, trn_labels, 
    val_filenames, filenames, test_filenames) = get_classes(path)

Rather than using batches, we could just import all the data into an array to save some processing time. (In most examples I'm using the batches, however, just because that's how I happened to start out.)

In [None]:
# trn = get_data(path + 'train')
# val = get_data(path + 'valid')

In [None]:
# save_array(path + 'results/val.dat', val)
# save_array(path + 'results/trn.dat', trn)

In [None]:
# val = load_array(path + 'results/val.dat')
# trn = load_array(path + 'results/trn.dat')

## Re-run sample experiments on full dataset

We should find that everything that worked on the sample (see statefarm-sample.ipynb) works on the full dataset too. Only better, because now we have more data! So let's see how they go - the models in this section are exact copies of the sample notebook models.

### Single Conv Layer

In [None]:
def conv1(batches):
    model = Sequential([
                BatchNormalization(axis=1, input_shape=(3,224,224)),
                Convolution2D(32, 3, 3, activation='relu'),
                BatchNormalization(axis=1),
                MaxPooling2D((3,3)),
                Convolution2D(64, 3, 3, activation='relu'),
                BatchNormalization(axis=1),
                MaxPooling2D((3,3)),
                Flatten(),
                Dense(200, activation='relu'),
                BatchNormalization(),
                Dense(10, activation='softmax')
            ])
    model.compile(Adam(lr=1e-4), loss='categorical_crossentropy', metrics=['accuracy'])
    model.fit_generator(batches, batches.nb_sample, nb_epoch=2, validation_data=val_batches,
                        nb_val_samples=val_batches.nb_sample)
    
    model.optimizer.lr = 1e-3
    model.fit_generator(batches, batches.nb_sample, nb_epoch=4, validation_data=val_batches,
                        nb_val_samples=val_batches.nb_sample)
    return model

In [None]:
model = conv1(batches)

Interestingly, with no regularization or augmentation, we're getting some reasonable results from our simple convolutional model. So with augmentation, we hopefully will see some very good results.

### Data Augmentation

In [None]:
gen_t = image.ImageDataGenerator(rotation_range=15, height_shift_range=0.05,
                shear_range=0.1, channel_shift_range=20, width_shift_range=0.1)
batches = get_batches(path + 'train', gen_t, batch_size=batch_size)

In [None]:
model = conv1(batches)

In [None]:
model.optimizer.lr = 1e-4
model.fit_generator(batches, batches.nb_sample, nb_epoch=15, validation_data=val_batches,
                    nb_val_samples=val_batches.nb_sample)

I'm shocked by *how* good these results are! We're regularly seeing 75-80% accuracy on the validation set, which puts us into the top third or better of the competition. With such a simple model and no dropout or semi-supervised learning, this really speaks to the power of this approach to data augmentation. *Noted.*
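
To get a feel for what the shift-style parameters above are doing, here is a minimal numpy sketch. It is a simplified stand-in, not what `ImageDataGenerator` does internally: the function name `random_shift_augment` is made up for illustration, and it uses wrap-around `np.roll` instead of interpolated affine transforms (rotation and shear are left out entirely).

```python
import numpy as np

def random_shift_augment(img, rng, width_shift=0.1, height_shift=0.05, channel_shift=20.0):
    """Simplified shift augmentation for a (channels, height, width) image.

    Illustration only: shifts wrap around the border instead of being filled.
    """
    c, h, w = img.shape
    # pick a random horizontal / vertical offset as a fraction of the image size
    dx = int(round(rng.uniform(-width_shift, width_shift) * w))
    dy = int(round(rng.uniform(-height_shift, height_shift) * h))
    out = np.roll(img, shift=(dy, dx), axis=(1, 2))
    # shift each colour channel by its own random intensity offset, staying in [0, 255]
    offsets = rng.uniform(-channel_shift, channel_shift, size=(c, 1, 1))
    return np.clip(out + offsets, 0.0, 255.0)

rng = np.random.RandomState(0)
img = rng.uniform(0, 255, size=(3, 224, 224))   # toy image in Theano dim ordering
augmented = [random_shift_augment(img, rng) for _ in range(4)]
```

Each call returns a slightly different version of the same image, which is why the model effectively never sees exactly the same training example twice.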

### Four Conv/Pooling pairs + Dropout

Unfortunately, the results are still very unstable - the validation accuracy jumps from epoch to epoch. Perhaps a deeper model with some dropout would help.

In [None]:
gen_t = image.ImageDataGenerator(rotation_range=15, height_shift_range=0.05,
                shear_range=0.1, channel_shift_range=20, width_shift_range=0.1)
batches = get_batches(path + 'train', gen_t, batch_size=batch_size)

In [None]:
model = Sequential([
            BatchNormalization(axis=1, input_shape=(3, 224, 224)),
            Convolution2D(32, 3, 3, activation='relu'),
            BatchNormalization(axis=1),
            MaxPooling2D(),
            Convolution2D(64, 3, 3, activation='relu'),
            BatchNormalization(axis=1),
            MaxPooling2D(),
            Convolution2D(128, 3, 3, activation='relu'),
            BatchNormalization(axis=1),
            MaxPooling2D(),
            Flatten(),
            Dense(200, activation='relu'),
            BatchNormalization(),
            Dropout(0.5),
            Dense(200, activation='relu'),
            BatchNormalization(),
            Dropout(0.5),
            Dense(10, activation='softmax')
        ])

In [None]:
model.compile(Adam(lr=1e-5), loss='categorical_crossentropy', metrics=['accuracy'])

In [None]:
model.fit_generator(batches, batches.nb_sample, nb_epoch=2, validation_data=val_batches,
                    nb_val_samples=val_batches.nb_sample)

In [None]:
model.optimizer.lr=1e-3

In [None]:
model.fit_generator(batches, batches.nb_sample, nb_epoch=10, validation_data=val_batches,
                    nb_val_samples=val_batches.nb_sample)

In [None]:
model.optimizer.lr=1e-5

In [None]:
model.fit_generator(batches, batches.nb_sample, nb_epoch=10, validation_data=val_batches,
                    nb_val_samples=val_batches.nb_sample)

This is looking quite a bit better - the accuracy is similar, but the stability is higher. There's still some way to go however...

### Imagenet Conv Features

Since we have so little data, and it is similar to ImageNet images (full-color photos), using pre-trained VGG weights is likely to be helpful - in fact it seems likely that we won't need to fine-tune the convolutional layer weights much, if at all. So we can pre-compute the output of the last convolutional layer, as we did in lesson 3 when we experimented with dropout. (However this means that we can't use full data augmentation, since we can't pre-compute something that changes every image.)

*NOTE: there is a workaround for this, discussed in lecture: add augmented versions of the data to the dataset first.*
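
A toy sketch of the pre-computing idea, assuming nothing about VGG itself: a frozen random projection plays the role of the conv layers, and a logistic-regression "head" plays the role of the dense layers. All names here (`extractor`, `frozen_w`, `head_w`) are hypothetical.

```python
import numpy as np

rng = np.random.RandomState(0)
calls = {'extractor': 0}

# stand-in for frozen, pre-trained conv weights (never updated below)
frozen_w = rng.randn(20, 8) * 0.1

def extractor(x):
    # the expensive part; count how often it actually runs
    calls['extractor'] += 1
    return np.maximum(x.dot(frozen_w), 0.0)

x = rng.randn(32, 20)                 # toy inputs
y = (x[:, 0] > 0).astype(float)       # toy binary labels

feats = extractor(x)                  # pre-compute once, like conv_feat

head_w = np.zeros(8)                  # the only trainable weights
for epoch in range(50):
    p = 1.0 / (1.0 + np.exp(-feats.dot(head_w)))
    head_w -= 0.1 * feats.T.dot(p - y) / len(y)   # logistic-regression gradient step
```

The extractor runs once no matter how many epochs the head trains for. With on-the-fly augmentation the inputs would change every epoch, so the extractor would have to move inside the loop; augmenting first and caching the result avoids that cost.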

In [None]:
vgg = Vgg16()
model = vgg.model
last_conv_idx = [i for i, l in enumerate(model.layers) if type(l) is Convolution2D][-1]
conv_layers = model.layers[:last_conv_idx + 1]

In [None]:
conv_model = Sequential(conv_layers)

In [None]:
# ¡ batches shuffle must be set to False when pre-computing features !
batches = get_batches(path + 'train', batch_size=batch_size, shuffle=False)

In [None]:
(val_classes, trn_classes, val_labels, trn_labels,
    val_filenames, filenames, test_filenames) = get_classes(path)

In [None]:
conv_feat = conv_model.predict_generator(batches, batches.nb_sample)
conv_val_feat = conv_model.predict_generator(val_batches, val_batches.nb_sample)
conv_test_feat = conv_model.predict_generator(test_batches, test_batches.nb_sample)
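
Why the shuffle warning above matters: pre-computed features are stored as a bare array, and the only thing tying row `i` to its label is position. A small numpy sketch, where `extract_features` is a made-up stand-in for `conv_model.predict_generator` and a reversed order stands in for whatever order a shuffling generator might pick:

```python
import numpy as np

n = 8
images = np.arange(n, dtype=float).reshape(n, 1)   # toy "images": one number each
labels = np.arange(n)                               # label i belongs to image i

def extract_features(x):
    # stand-in for the conv model: any fixed, deterministic function
    return x * 10.0

# shuffle=False: features come out in directory order, so row i matches labels[i]
feat_ordered = extract_features(images)
aligned_ordered = np.array_equal(feat_ordered[:, 0] / 10.0, labels)

# shuffle=True: the generator yields some permutation that we never get to see
perm = np.arange(n)[::-1]
feat_shuffled = extract_features(images[perm])
aligned_shuffled = np.array_equal(feat_shuffled[:, 0] / 10.0, labels)
```

Here `aligned_ordered` is `True` and `aligned_shuffled` is `False`. The same reasoning applies to any generator whose predictions are later matched against labels or filenames by position.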

In [None]:
save_array(path + 'results/conv_feat.dat', conv_feat)
save_array(path + 'results/conv_val_feat.dat', conv_val_feat)
save_array(path + 'results/conv_test_feat.dat', conv_test_feat)

In [None]:
conv_feat = load_array(path + 'results/conv_feat.dat')
conv_val_feat = load_array(path + 'results/conv_val_feat.dat')
# conv_test_feat = load_array(path + 'results/conv_test_feat.dat')
conv_val_feat.shape

### BatchNorm Dense layers on pretrained Conv layers

Since we've pre-computed the output of the last convolutional layer, we need to create a network that takes that as input, and predicts our 10 classes. Let's try using a simplified version of VGG's dense layers.

In [None]:
def get_bn_layers(p):
    return [
        MaxPooling2D(input_shape=conv_layers[-1].output_shape[1:]),
        Flatten(),
        Dropout(p/2),
        Dense(128, activation='relu'),
        BatchNormalization(),
        Dropout(p/2),
        Dense(128, activation='relu'),
        BatchNormalization(),
        Dropout(p),
        Dense(10, activation='softmax')
        ]
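
A quick shape check for the layers above, as plain arithmetic. It assumes the last VGG16 conv layer outputs `(512, 14, 14)` for 224x224 input, which is what `conv_val_feat.shape[1:]` should report; the helper `pooled_size` is hypothetical.

```python
def pooled_size(size, pool=2):
    # non-overlapping 2x2 max pooling halves each spatial dimension (rounding down)
    return size // pool

# assumed output shape of the last conv layer: (channels, height, width)
channels, height, width = 512, 14, 14

height, width = pooled_size(height), pooled_size(width)   # MaxPooling2D() -> (512, 7, 7)
flat = channels * height * width                           # Flatten()      -> 25088 values
dense1_weights = flat * 128                                # weights in the first Dense(128)
```

Almost all of the parameters sit in that first dense layer (about 3.2 million weights), which is one reason the heavy dropout in this model is worth trying.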

In [None]:
p = 0.8

In [None]:
bn_model = Sequential(get_bn_layers(p))
bn_model.compile(Adam(lr=0.001), loss='categorical_crossentropy', metrics=['accuracy'])

In [None]:
bn_model.fit(conv_feat, trn_labels, batch_size=batch_size, nb_epoch=1,
             validation_data=(conv_val_feat, val_labels))

In [None]:
bn_model.optimizer.lr=0.01

In [None]:
bn_model.fit(conv_feat, trn_labels, batch_size=batch_size, nb_epoch=2,
             validation_data=(conv_val_feat, val_labels))

In [None]:
bn_model.save_weights(path + 'models/conv8.h5')

Looking good! Let's try pre-computing 5 epochs' worth of augmented data, so we can experiment with combining dropout and augmentation on the pre-trained model.

### Pre-computed DataAugmentation + Dropout

We'll use our usual data augmentation parameters:

In [None]:
gen_t = image.ImageDataGenerator(rotation_range=15, height_shift_range=0.05,
                shear_range=0.1, channel_shift_range=20, width_shift_range=0.1)
da_batches = get_batches(path + 'train', gen_t, batch_size=batch_size, shuffle=False)

We'll use those to create a dataset of convolutional features 5x bigger than the training set.

In [None]:
da_conv_feat = conv_model.predict_generator(da_batches, da_batches.nb_sample*5)

In [None]:
save_array(path + 'results/da_conv_feat.dat', da_conv_feat)

In [None]:
da_conv_feat = load_array(path + 'results/da_conv_feat.dat')

Let's include the real training data as well in its non-augmented form.

In [None]:
da_conv_feat = np.concatenate([da_conv_feat, conv_feat])

Since we've now got a dataset 6x bigger than before, we'll need to copy our labels 6 times too.

In [None]:
da_trn_labels = np.concatenate([trn_labels]*6)
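
Tiling the labels like this only works because the un-shuffled generator is assumed to wrap around in the same order on every pass. A toy numpy check of that bookkeeping (the sizes and the "features that remember their source image" trick are made up for illustration):

```python
import numpy as np

n_train, n_classes, n_passes = 4, 3, 5

# toy one-hot labels in directory order (the role trn_labels plays)
labels = np.eye(n_classes)[[0, 1, 2, 1]]

# stand-in for predicting n_train * n_passes samples with shuffle=False:
# the generator wraps around, so sample k comes from image k % n_train
source_index = np.arange(n_train * n_passes) % n_train
aug_feat = source_index[:, None].astype(float)   # "features" that remember their source

# append one un-augmented pass, as done above with conv_feat
all_feat = np.concatenate([aug_feat, np.arange(n_train, dtype=float)[:, None]])
all_labels = np.concatenate([labels] * (n_passes + 1))

# every row's label should equal the label of the image it was derived from
expected = labels[all_feat[:, 0].astype(int)]
labels_match = np.array_equal(all_labels, expected)
```

If the augmented batches had been shuffled, `labels_match` would no longer be guaranteed, which is why `da_batches` is created with `shuffle=False`.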

Based on some experiments, the previous model works well with bigger dense layers.

In [None]:
def get_bn_da_layers(p):
    return [
        MaxPooling2D(input_shape = conv_layers[-1].output_shape[1:]),
        Flatten(),
        Dropout(p),
        Dense(256, activation='relu'),
        BatchNormalization(),
        Dropout(p),
        Dense(256, activation='relu'),
        BatchNormalization(),
        Dropout(p),
        Dense(10, activation='softmax')
        ]

In [None]:
p=0.8

In [None]:
bn_model = Sequential(get_bn_da_layers(p))
bn_model.compile(Adam(lr=0.001), loss='categorical_crossentropy', metrics=['accuracy'])

Now we can train the model as usual, with pre-computed augmented data.

In [None]:
bn_model.fit(da_conv_feat, da_trn_labels, batch_size=batch_size, nb_epoch=1,
             validation_data=(conv_val_feat, val_labels))

In [None]:
bn_model.optimizer.lr=0.01

In [None]:
bn_model.fit(da_conv_feat, da_trn_labels, batch_size=batch_size, nb_epoch=4,
             validation_data=(conv_val_feat, val_labels))

In [None]:
bn_model.optimizer.lr=1e-4

In [None]:
bn_model.fit(da_conv_feat, da_trn_labels, batch_size=batch_size, nb_epoch=4,
             validation_data=(conv_val_feat, val_labels))

Looks good - let's save those weights.

In [None]:
bn_model.save_weights(path + 'models/da_conv8_1.h5')

### Pseudo-Labeling

We're going to try using a combination of [pseudo-labeling](http://deeplearning.net/wp-content/uploads/2013/03/pseudo_label_final.pdf) and [knowledge distillation](https://arxiv.org/abs/1503.02531) to allow us to use unlabeled data (i.e. do semi-supervised learning). For our initial experiment we'll use the validation set as the unlabeled data, so that we can see that it is working without using the test set. At a later date we'll try using the test set.

To do this, we can simply calculate the predictions of our model...

In [None]:
val_pseudo = bn_model.predict(conv_val_feat, batch_size=batch_size)

...concatenate them with our training labels...

In [None]:
comb_pseudo = np.concatenate([da_trn_labels, val_pseudo])

In [None]:
comb_feat = np.concatenate([da_conv_feat, conv_val_feat])

...and fine-tune our model using that data.

In [None]:
bn_model.load_weights(path + 'models/da_conv8_1.h5')

In [None]:
bn_model.fit(comb_feat, comb_pseudo, batch_size=batch_size, nb_epoch=1,
             validation_data=(conv_val_feat, val_labels))

In [None]:
bn_model.fit(comb_feat, comb_pseudo, batch_size=batch_size, nb_epoch=4,
             validation_data=(conv_val_feat, val_labels))

In [None]:
bn_model.optimizer.lr=1e-5

In [None]:
bn_model.fit(comb_feat, comb_pseudo, batch_size=batch_size, nb_epoch=4,
             validation_data=(conv_val_feat, val_labels))

That's a distinct improvement - even though the validation set isn't very big. This looks encouraging for when we try this on the test set.

In [None]:
bn_model.save_weights(path + 'models/bn-ps8.h5')
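
The pseudo-labeling recipe in this section, reduced to a numpy sketch. Note that the predictions are concatenated as raw probabilities rather than being turned into hard labels, which is where the knowledge-distillation flavour comes from. The random linear "model" `w` and the helper names are assumptions for illustration only.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(targets, probs, eps=1e-7):
    # works for one-hot rows and for soft (probability) rows alike
    return float(-np.mean(np.sum(targets * np.log(np.clip(probs, eps, 1.0)), axis=1)))

rng = np.random.RandomState(0)
n_lab, n_unlab, n_feat, n_cls = 6, 4, 5, 3

x_lab = rng.randn(n_lab, n_feat)
y_lab = np.eye(n_cls)[rng.randint(0, n_cls, size=n_lab)]   # hard one-hot labels
x_unlab = rng.randn(n_unlab, n_feat)                        # no labels available

w = rng.randn(n_feat, n_cls) * 0.1                          # stand-in for a trained model
y_pseudo = softmax(x_unlab.dot(w))                          # soft pseudo-labels

# same two concatenations as comb_feat / comb_pseudo
comb_x = np.concatenate([x_lab, x_unlab])
comb_y = np.concatenate([y_lab, y_pseudo])

loss = cross_entropy(comb_y, softmax(comb_x.dot(w)))
```

Every row of `comb_y` still sums to 1, so categorical cross-entropy is well defined on the mixed hard/soft targets and the model can be fine-tuned on them as usual.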

### Submit

We'll find a good clipping amount using the validation set, prior to submitting.

In [None]:
def do_clip(arr, mx): return np.clip(arr, (1 - mx)/9, mx)
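
A small numpy sketch of why clipping helps with a log-loss metric: one confidently wrong prediction is punished very heavily, and clipping caps that penalty at the price of a slightly worse score on confident correct ones. The `log_loss` helper and the toy predictions are assumptions for illustration; `do_clip` follows the same rule as above.

```python
import numpy as np

def do_clip(arr, mx):
    # same rule as above: floor at (1 - mx) / 9, ceiling at mx
    return np.clip(arr, (1 - mx) / 9, mx)

def log_loss(y_true, y_pred):
    # mean categorical cross-entropy over the rows
    return float(-np.mean(np.sum(y_true * np.log(y_pred), axis=1)))

n_cls = 10
y_true = np.eye(n_cls)[[0, 1]]          # true classes: 0 and 1

# two near-certain predictions for class 0: the first is right, the second is wrong
confident = np.full((2, n_cls), 1e-6)
confident[:, 0] = 1.0 - 9e-6

raw_loss = log_loss(y_true, confident)
clipped = do_clip(confident, 0.93)
clipped_loss = log_loss(y_true, clipped)
```

Here `clipped_loss` comes out well below `raw_loss`. The floor of `(1 - mx) / 9` is chosen so that, with 10 classes, a fully confident row clipped to `mx` still sums to 1.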

In [None]:
val_preds = bn_model.predict(conv_val_feat, batch_size=batch_size*2)
keras.metrics.categorical_crossentropy(val_labels, do_clip(val_preds, 0.93)).eval()

In [None]:
conv_test_feat = load_array(path + 'results/conv_test_feat.dat')

In [None]:
preds = bn_model.predict(conv_test_feat, batch_size=batch_size*2)

In [None]:
subm = do_clip(preds, 0.93)

In [None]:
subm_name = path + 'results/subm.gz'

In [None]:
classes = sorted(batches.class_indices, key=batches.class_indices.get)

In [None]:
submission = pd.DataFrame(subm, columns=classes)
submission.insert(0, 'img', [a[4:] for a in test_filenames]) # <-- why a[4:]?
# submission.insert(0, 'img', [f[8:] for f in test_filenames])
submission.head()
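
On the slicing question in the cell above: the generator reports filenames relative to the test directory, so each one carries its subfolder as a prefix, and the right slice is the subfolder name's length plus one for the slash. A sketch assuming a subfolder called `unknown` (a hypothetical name; check your own `test_filenames`):

```python
import os

# hypothetical filenames in the 'subdir/file' form a directory iterator reports
test_filenames = ['unknown/img_1.jpg', 'unknown/img_10.jpg']

by_slice_4 = [f[4:] for f in test_filenames]                 # only right for a 3-letter subfolder
by_slice_8 = [f[8:] for f in test_filenames]                 # right for 'unknown/' (8 characters)
by_basename = [os.path.basename(f) for f in test_filenames]  # independent of the subfolder name
```

`os.path.basename` gives the bare image name regardless of what the subfolder is called, so it avoids having to hard-code either slice.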

In [None]:
submission.to_csv(subm_name, index=False, compression='gzip')

In [None]:
FileLink(subm_name)

This gets 0.534 on the leaderboard.