# State Farm Distracted Driver Detection - Resnet50

This notebook contains our final attempt at the Kaggle State Farm Distracted Driver Detection competition, using the Resnet50 model. The goal is to train the best possible model.

## Initial Setup

Import libraries and functions for future use.

In [1]:
# Plots displayed inline in notebook
%matplotlib inline

# Make help libraries available
import sys

sys.path.append('D:/anlaursen/libraries')

# Set visible devices, so as to just use a single GPU.
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"   # see issue #152
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

In [2]:
import numpy as np
import pandas as pd
import gc

from kerastools.resnet50 import Resnet50
from kerastools.utils import get_batches, save_array, load_array, get_classes, do_clip

from keras.models import Model
from keras.optimizers import Adam
from keras.layers import Dense, Dropout, Input
from keras.preprocessing import image

Using TensorFlow backend.


## Define model

We set up our initial Resnet50 model.

In [3]:
resnet = Resnet50()
resnet.model.summary()

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
input_1 (InputLayer)             (None, 224, 224, 3)   0                                            
____________________________________________________________________________________________________
lambda_1 (Lambda)                (None, 224, 224, 3)   0           input_1[0][0]                    
____________________________________________________________________________________________________
zero_padding2d_1 (ZeroPadding2D) (None, 230, 230, 3)   0           lambda_1[0][0]                   
____________________________________________________________________________________________________
conv1 (Conv2D)                   (None, 112, 112, 64)  9472        zero_padding2d_1[0][0]           
___________________________________________________________________________________________

## Setup batches

We define our training and validation batches for modelling.

In [4]:
batch_size = 32

#path = ''
path = 'sample/'

train_batches = resnet.get_batches(path + 'train', batch_size = batch_size)
val_batches = resnet.get_batches(path + 'valid', batch_size = batch_size, shuffle = False)

Found 1000 images belonging to 10 classes.
Found 100 images belonging to 10 classes.


## Finetune model - Sample

We need to adapt the standard Resnet50 model to our new output of 10 classes, so we finetune it.

In [5]:
resnet.finetune(train_batches)
resnet.model.summary()

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
input_1 (InputLayer)             (None, 224, 224, 3)   0                                            
____________________________________________________________________________________________________
lambda_1 (Lambda)                (None, 224, 224, 3)   0           input_1[0][0]                    
____________________________________________________________________________________________________
zero_padding2d_1 (ZeroPadding2D) (None, 230, 230, 3)   0           lambda_1[0][0]                   
____________________________________________________________________________________________________
conv1 (Conv2D)                   (None, 112, 112, 64)  9472        zero_padding2d_1[0][0]           
___________________________________________________________________________________________

We train the model for a single epoch using the default learning rate of 0.001.

In [6]:
resnet.fit_batch(train_batches, val_batches, 1)

Epoch 1/1


We see that the accuracy improves on the sample, so we increase the learning rate.

In [7]:
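from keras import backend as K

# A plain float assigned to optimizer.lr is ignored by the already-built
# training function, so update the underlying backend variable instead.
K.set_value(resnet.model.optimizer.lr, 0.1)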

resnet.fit_batch(train_batches, val_batches, 4)

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


We then try 4 more epochs with a lower learning rate.

In [8]:
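from keras import backend as K

# Update the backend learning-rate variable rather than rebinding the attribute.
K.set_value(resnet.model.optimizer.lr, 0.001)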

resnet.fit_batch(train_batches, val_batches, 4)

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


This seems to be as far as we can get on the sample data set: a pretty good baseline, in the area of 0.5 to 0.66.

## Finetune model - Full data

We continue our finetuning on the full data set.

In [9]:
path = ''

train_batches = resnet.get_batches(path + 'train', batch_size = batch_size)
val_batches = resnet.get_batches(path + 'valid', batch_size = batch_size, shuffle = False)

Found 19624 images belonging to 10 classes.
Found 2800 images belonging to 10 classes.


We start with a single epoch.

In [12]:
resnet.fit_batch(train_batches, val_batches, 1)

Epoch 1/1


We see a lot of overfitting. We increase the learning rate and see where that takes us.

In [13]:
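from keras import backend as K

# Update the backend learning-rate variable rather than rebinding the attribute.
K.set_value(resnet.model.optimizer.lr, 0.1)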

resnet.fit_batch(train_batches, val_batches, 4)

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


We then lower the learning rate again and see where we end up.

In [14]:
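from keras import backend as K

# Update the backend learning-rate variable rather than rebinding the attribute.
K.set_value(resnet.model.optimizer.lr, 0.001)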

resnet.fit_batch(train_batches, val_batches, 4)

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


We get near-perfect training accuracy, but we are overfitting quite a lot. Let's save the weights and try a different approach.

In [16]:
resnet.model.save_weights('models/base_resnet50.h5')

## Resnet precomputed features

To experiment with different architectures, and to do knowledge distillation and pseudolabelling, we precompute the output of the convolutional layers for every image. We start by defining a new model without the dense layers.
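To illustrate why precomputing pays off, here is a minimal NumPy sketch, not the notebook's Keras code. A hypothetical `frozen_base` (a fixed random projection plus ReLU) stands in for the Resnet50 convolutional layers: its output is computed once, and only a small softmax head is then trained for many epochs on the cached array.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the frozen convolutional base: a fixed random
# projection followed by ReLU. These weights are never updated.
W_frozen = rng.normal(size=(64, 32)) / 8.0

def frozen_base(x):
    return np.maximum(x @ W_frozen, 0.0)

# Toy flattened "images" and one-hot labels for 3 classes.
X = rng.normal(size=(300, 64))
y = rng.integers(0, 3, size=300)
Y = np.eye(3)[y]

# Precompute the features ONCE; every epoch below reuses this array.
features = frozen_base(X)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Train only a small softmax head on the cached features.
W_head = np.zeros((32, 3))
b_head = np.zeros(3)
losses = []
for epoch in range(50):
    probs = softmax(features @ W_head + b_head)
    losses.append(-np.mean(np.sum(Y * np.log(probs + 1e-12), axis=1)))
    grad = (probs - Y) / len(features)  # cross-entropy gradient w.r.t. logits
    W_head -= 0.05 * features.T @ grad
    b_head -= 0.05 * grad.sum(axis=0)
```

The expensive forward pass through the base happens once, while the cheap head can be retrained as often as needed, which is what makes experimenting with different top models practical.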

In [3]:
resnet_conv = Resnet50(include_top = False)

Let's check out the model.

In [4]:
resnet_conv.model.summary()

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
input_1 (InputLayer)             (None, 224, 224, 3)   0                                            
____________________________________________________________________________________________________
lambda_1 (Lambda)                (None, 224, 224, 3)   0           input_1[0][0]                    
____________________________________________________________________________________________________
zero_padding2d_1 (ZeroPadding2D) (None, 230, 230, 3)   0           lambda_1[0][0]                   
____________________________________________________________________________________________________
conv1 (Conv2D)                   (None, 112, 112, 64)  9472        zero_padding2d_1[0][0]           
___________________________________________________________________________________________

We define new, unshuffled batches. The batch sizes (44, 50 and 2) divide the number of images in each set exactly, so that `predict_generator` returns exactly one precomputed feature row per image.

In [5]:
path = ''

train_batches = get_batches(path + 'train', batch_size = 44, target_size = (224, 224), shuffle = False)

valid_batches = get_batches(path + 'valid', batch_size = 50, target_size = (224, 224), shuffle = False)

test_batches = get_batches(path + 'test', batch_size = 2, target_size = (224, 224), shuffle = False, class_mode = None)

Found 19624 images belonging to 10 classes.
Found 2800 images belonging to 10 classes.
Found 79726 images belonging to 1 classes.
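Since `predict_generator` is given a whole number of steps, a batch size that does not divide the dataset size would silently drop the last images. A small helper like the hypothetical `largest_divisor_at_most` below can pick such a batch size; the cap of 50 is an assumption, though it happens to reproduce the sizes 44, 50 and 2 used above.

```python
def largest_divisor_at_most(n, limit):
    """Return the largest batch size <= limit that divides n exactly."""
    for bs in range(min(limit, n), 0, -1):
        if n % bs == 0:
            return bs
    raise ValueError("n and limit must be positive")

# Dataset sizes reported by get_batches above.
sizes = {"train": 19624, "valid": 2800, "test": 79726}

for name, n in sizes.items():
    bs = largest_divisor_at_most(n, 50)
    steps = n // bs
    # steps * bs == n means every image gets exactly one feature row
    print(name, bs, steps, steps * bs == n)
```

The test set size 79726 is twice an odd number with no divisor below 50, which is why its batch size ends up as small as 2.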


We also extract labels and classes for each dataset.

In [6]:
(val_classes, trn_classes, val_labels, trn_labels, 
 val_filenames, filenames, test_filenames) = get_classes(path)

Found 19624 images belonging to 10 classes.
Found 2800 images belonging to 10 classes.
Found 79726 images belonging to 1 classes.


We then precompute the features for each of our datasets and save the NumPy arrays. This eats a lot of memory on the poor AWS instance, so after each computation we do some cleanup to release the memory. We save and load using bcolz, as it compresses well and does I/O very fast.

In [11]:
conv_feat = resnet_conv.model.predict_generator(train_batches, train_batches.samples // train_batches.batch_size)
save_array(path + 'results/conv_computed/conv_feat_resnet.dat', conv_feat)

del conv_feat
gc.collect()

20

In [12]:
conv_val_feat = resnet_conv.model.predict_generator(valid_batches, valid_batches.samples // valid_batches.batch_size)
save_array(path + 'results/conv_computed/conv_val_feat_resnet.dat', conv_val_feat)

del conv_val_feat
gc.collect()

0

In [13]:
conv_test_feat = resnet_conv.model.predict_generator(test_batches, test_batches.samples // test_batches.batch_size)
save_array(path + 'results/conv_computed/conv_test_feat_resnet.dat', conv_test_feat)

del conv_test_feat
gc.collect()

0

And finally, we can load the three feature sets.

In [7]:
conv_feat = load_array('results/conv_computed/conv_feat_resnet.dat')
conv_val_feat = load_array('results/conv_computed/conv_val_feat_resnet.dat')
conv_test_feat = load_array('results/conv_computed/conv_test_feat_resnet.dat')
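If bcolz is not available in your environment, plain NumPy files are one possible substitute. This is a sketch, not the `kerastools.utils` implementation: the helper names `save_array_npy` and `load_array_npy` are hypothetical, and memory-mapping (`mmap_mode="r"`) lets a large feature file be read without loading it all into RAM at once.

```python
import os
import tempfile

import numpy as np

def save_array_npy(fname, arr):
    """Hypothetical alternative to the bcolz-based save_array: one .npy file."""
    np.save(fname, arr)

def load_array_npy(fname, in_memory=True):
    """Load a .npy file; with in_memory=False the data is memory-mapped."""
    return np.load(fname, mmap_mode=None if in_memory else "r")

with tempfile.TemporaryDirectory() as tmp:
    fname = os.path.join(tmp, "conv_feat_demo.npy")
    feats = np.random.default_rng(0).random((100, 16), dtype=np.float32)  # toy features
    save_array_npy(fname, feats)
    mapped = load_array_npy(fname, in_memory=False)
    shape = mapped.shape
    roundtrip_ok = bool(np.array_equal(mapped, feats))
    del mapped  # release the memory map before the directory is removed
```

Unlike bcolz, `.npy` files are uncompressed, so this trades disk space for one less dependency.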

## Resnet model with precomputed augmentation

We set up a generator of randomly augmented training images, whose convolutional features we will precompute.

In [7]:
gen_t = image.ImageDataGenerator(rotation_range = 15,
                                 height_shift_range = 0.05,
                                 shear_range = 0.1,
                                 channel_shift_range = 20,
                                 width_shift_range = 0.1)

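path = ''
# shuffle = False keeps the augmented images in the same order as trn_labels,
# which the label tiling further down relies on.
da_batches = get_batches(path + 'train',
                         gen_t,
                         batch_size = 11,
                         shuffle = False,
                         target_size = (224, 224))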
val_batches = get_batches(path + 'valid', batch_size = 25, shuffle = False)

Found 19624 images belonging to 10 classes.
Found 2800 images belonging to 10 classes.


We then make 10 full passes (epochs) over the training data, precomputing the convolutional features of the augmented images.

In [8]:
da_conv_feat = resnet_conv.model.predict_generator(da_batches, (da_batches.samples // da_batches.batch_size) * 10)
save_array(path + 'results/conv_computed/da_conv_feat_resnet.dat', da_conv_feat)

del da_conv_feat
gc.collect()

InternalError: Dst tensor is not initialized.
	 [[Node: activation_49/Relu/_1383 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_4501_activation_49/Relu", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

We can then load the precomputed augmented data.

In [None]:
da_conv_feat = load_array(path + 'results/conv_computed/da_conv_feat_resnet.dat')

And then concatenate it with our existing precomputed training data.

In [None]:
da_conv_feat = np.concatenate([da_conv_feat, conv_feat])

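We now have 11 times as many features (10 augmented passes plus the original features). Since order is preserved, which requires the augmented batches to be generated with `shuffle = False`, we can simply repeat the labels 11 times.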

In [None]:
da_trn_labels = np.concatenate([trn_labels] * 11)
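A toy NumPy check of why tiling the labels lines up with the stacked features: each feature row is tagged with the index of the image it came from (an illustration-only device), and the tiled labels are compared with the labels looked up by that index. The sizes are made up; the final print compares the real count 19624 × 11 against the 215864 training samples reported by Keras below.

```python
import numpy as np

n_train, n_aug_epochs = 6, 10  # toy size; the notebook has 19624 images

# Toy one-hot labels in the (unshuffled) directory order.
trn_labels = np.eye(3)[np.arange(n_train) % 3]

# Pretend each feature row remembers which image it came from.
image_ids = np.arange(n_train)
da_ids = np.concatenate([image_ids] * n_aug_epochs)  # 10 augmented passes, same order
all_ids = np.concatenate([da_ids, image_ids])         # followed by the original features

da_trn_labels = np.concatenate([trn_labels] * (n_aug_epochs + 1))

# Row i of the features and row i of the labels refer to the same image.
assert np.array_equal(da_trn_labels, trn_labels[all_ids])
print(len(all_ids), 19624 * (n_aug_epochs + 1))
```

If the generator shuffled the images instead, `da_ids` would no longer repeat `image_ids` in order, and the tiled labels would be attached to the wrong features.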

We can then define a dense layer model.

In [None]:
def get_dense_resnet(base_model, p):
    # Input matches the output shape of the convolutional base (without the batch dimension)
    inputs = Input(shape = base_model.layers[-1].output_shape[1:])

    # Dropout on the precomputed features, followed by a 10-class softmax layer
    dense_layer = Dropout(p)(inputs)
    dense_layer = Dense(10, activation = 'softmax')(dense_layer)

    model = Model(inputs = inputs, outputs = dense_layer)

    return model

# Define level of dropout
p = 0.2

# Define batch size
batch_size = 32
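For intuition, the head defined above (Dropout followed by a 10-way softmax Dense layer) can be written out as a NumPy forward pass. This is a sketch, not Keras internals: `dense_head_forward` is a hypothetical name, the 2048-dimensional input is an assumed feature size, and dropout uses the inverted form (kept units scaled by 1 / (1 - p) during training, nothing dropped at inference).

```python
import numpy as np

def dense_head_forward(x, W, b, p, rng=None):
    """Forward pass of Dropout(p) followed by Dense(10, softmax).

    rng=None means inference (dropout disabled); otherwise inverted dropout
    is applied, scaling the kept units by 1 / (1 - p).
    """
    if rng is not None:
        keep = rng.random(x.shape) >= p  # drop each unit with probability p
        x = x * keep / (1.0 - p)
    z = x @ W + b
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 2048))  # assumed feature size, for illustration only
W = rng.normal(scale=0.01, size=(2048, 10))
b = np.zeros(10)

train_probs = dense_head_forward(x, W, b, p=0.2, rng=rng)
test_probs = dense_head_forward(x, W, b, p=0.2)
print(train_probs.shape, np.allclose(train_probs.sum(axis=1), 1.0))
```

Because of the 1 / (1 - p) scaling during training, no rescaling is needed at prediction time, so the inference path is deterministic.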

We build and compile the dense model and train it for a single epoch.

In [58]:
dense_resnet = get_dense_resnet(resnet_conv.model, p)

dense_resnet.compile(optimizer = Adam(lr = 0.0001),
                     loss = 'categorical_crossentropy',
                     metrics = ['accuracy'])

dense_resnet.fit(x = da_conv_feat,
                 y = da_trn_labels,
                 batch_size = batch_size,
                 epochs = 1,
                 validation_data = (conv_val_feat, val_labels))

Train on 215864 samples, validate on 2800 samples
Epoch 1/1


<keras.callbacks.History at 0x1c182717dd8>

We then continue training on the augmented features, first running some more epochs at the initial learning rate of 0.0001.

In [59]:
dense_resnet.fit(x = da_conv_feat,
                 y = da_trn_labels,
                 batch_size = batch_size,
                 epochs = 4,
                 validation_data = (conv_val_feat, val_labels))

Train on 215864 samples, validate on 2800 samples
Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


<keras.callbacks.History at 0x1c182172c50>

In [60]:
dense_resnet.fit(x = da_conv_feat,
                 y = da_trn_labels,
                 batch_size = batch_size,
                 epochs = 20,
                 validation_data = (conv_val_feat, val_labels))

Train on 215864 samples, validate on 2800 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.callbacks.History at 0x1c181fd6978>

We then increase the learning rate to 0.001 and run some more epochs.

In [61]:
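from keras import backend as K

# A plain float assigned to optimizer.lr is ignored by the already-built
# training function, so update the underlying backend variable instead.
K.set_value(dense_resnet.optimizer.lr, 0.001)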

dense_resnet.fit(x = da_conv_feat,
                 y = da_trn_labels,
                 batch_size = batch_size,
                 epochs = 20,
                 validation_data = (conv_val_feat, val_labels))

Train on 215864 samples, validate on 2800 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
 35424/215864 [===>..........................] - ETA: 23s - loss: 2.2591 - acc: 0.1740

KeyboardInterrupt: 

Finally, we decrease the learning rate and run some more epochs.

In [32]:
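from keras import backend as K

# Update the backend learning-rate variable rather than rebinding the attribute.
K.set_value(resnet.model.optimizer.lr, 0.001)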

resnet.fit_batch(train_batches, val_batches, 20)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


## Pseudolabeling

We're going to try using a combination of [pseudo labeling](http://deeplearning.net/wp-content/uploads/2013/03/pseudo_label_final.pdf) and [knowledge distillation](https://arxiv.org/abs/1503.02531) to allow us to use unlabeled data (i.e. do semi-supervised learning). For our initial experiment we'll use the validation set as the unlabeled data, so that we can see that it is working without using the test set. Afterwards we add the test set as well.
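The idea can be sketched in NumPy under simplifying assumptions: toy random features, a hypothetical random "teacher" whose softmax outputs serve as soft pseudo-labels for the unlabeled rows, and a plain softmax "student" trained on the stacked data. Cross-entropy works unchanged for soft targets, and its gradient with respect to the logits is still `probs - targets` because each target row sums to 1.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(targets, probs):
    # Works for one-hot targets and for soft (probability) targets alike.
    return -np.mean(np.sum(targets * np.log(probs + 1e-12), axis=1))

rng = np.random.default_rng(0)
n_classes, dim = 10, 16

# Labeled training features with one-hot labels.
trn_feat = rng.normal(size=(200, dim))
trn_labels = np.eye(n_classes)[rng.integers(0, n_classes, 200)]

# Unlabeled features (the validation set plays this role in the first experiment).
unl_feat = rng.normal(size=(50, dim))

# Hypothetical teacher: its predicted probabilities become soft pseudo-labels.
W_teacher = rng.normal(size=(dim, n_classes))
unl_pseudo = softmax(unl_feat @ W_teacher)

# Stack features and targets in the same order.
comb_feat = np.concatenate([trn_feat, unl_feat])
comb_targets = np.concatenate([trn_labels, unl_pseudo])

# Train a student softmax model on the combined set.
W = np.zeros((dim, n_classes))
loss_start = cross_entropy(comb_targets, softmax(comb_feat @ W))
for _ in range(100):
    probs = softmax(comb_feat @ W)
    W -= 0.1 * comb_feat.T @ (probs - comb_targets) / len(comb_feat)
loss_end = cross_entropy(comb_targets, softmax(comb_feat @ W))
```

Keeping the teacher's full probability vectors, rather than rounding them to hard labels, is the knowledge-distillation part: the student also learns how confident the teacher was.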

In [22]:
val_pseudo = resnet.predict(val_batches, batch_size = 50)

We concatenate these pseudo-labels with our training labels, and the validation features with our training features.

In [24]:
comb_pseudo = np.concatenate([da_trn_labels, val_pseudo])
comb_feat = np.concatenate([da_conv_feat, conv_val_feat])

And train our model using the extended data set.