# State Farm Distracted Driver Detection - VGG16 Continued

This notebook contains the final attempt at the Kaggle State Farm Distracted Driver Detection competition using the VGG model. The purpose is to train the best possible base VGG model and to test the external vgg16, utils and plot libraries.

## Initial Setup

Import libraries and functions for future use.

In [1]:
# Plots displayed inline in notebook
%matplotlib inline

# Make help libraries available
import sys

#sys.path.append('/home/ubuntu/personal-libraries')
sys.path.append('D:/anlaursen/libraries')

# Set visible devices, so as to just use a single GPU.
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"   # see issue #152
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

In [2]:
import numpy as np
import pandas as pd
import gc

from kerastools.vgg16 import Vgg16
from kerastools.utils import get_batches, save_array, load_array, get_classes, do_clip

from keras.models import Model, Sequential
from keras.layers import Input, Dense, BatchNormalization, Flatten, Dropout
from keras.layers.convolutional import MaxPooling2D
from keras.optimizers import Adam
from keras.metrics import categorical_crossentropy
from keras.preprocessing import image

Using TensorFlow backend.


## Define model

We set up our initial VGG16 model.

In [3]:
vgg = Vgg16()
vgg.model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 224, 224, 3)       0         
_________________________________________________________________
norm_layer (Lambda)          (None, 224, 224, 3)       0         
_________________________________________________________________
zero_padding2d_1 (ZeroPaddin (None, 226, 226, 3)       0         
_________________________________________________________________
conv_layer_1_0 (Conv2D)      (None, 224, 224, 64)      1792      
_________________________________________________________________
zero_padding2d_2 (ZeroPaddin (None, 226, 226, 64)      0         
_________________________________________________________________
conv_layer_1_1 (Conv2D)      (None, 224, 224, 64)      36928     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 112, 112, 64)      0         
__________

## Setup batches

We define our validation and training batches for modelling.

In [4]:
batch_size = 32

#path = ''
path = 'sample/'

train_batches = vgg.get_batches(path + 'train', batch_size = batch_size)
val_batches = vgg.get_batches(path + 'valid', batch_size = batch_size, shuffle = False)

Found 1000 images belonging to 10 classes.
Found 100 images belonging to 10 classes.


## Finetune model - Sample

We need to adjust the standard VGG model to our new input with 10 classes, so we finetune it.

In [5]:
vgg.finetune(train_batches)
vgg.model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 224, 224, 3)       0         
_________________________________________________________________
norm_layer (Lambda)          (None, 224, 224, 3)       0         
_________________________________________________________________
zero_padding2d_1 (ZeroPaddin (None, 226, 226, 3)       0         
_________________________________________________________________
conv_layer_1_0 (Conv2D)      (None, 224, 224, 64)      1792      
_________________________________________________________________
zero_padding2d_2 (ZeroPaddin (None, 226, 226, 64)      0         
_________________________________________________________________
conv_layer_1_1 (Conv2D)      (None, 224, 224, 64)      36928     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 112, 112, 64)      0         
__________

We train the model for a single epoch using the default learning rate of 0.001.

In [6]:
vgg.fit_batch(train_batches, val_batches, 1)

Epoch 1/1


The accuracy increases nicely on the sample, so we increase the learning rate.

In [7]:
vgg.model.optimizer.lr = 0.01

vgg.fit_batch(train_batches, val_batches, 4)

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


It actually seems to be generalising nicely. We try 4 more epochs.

In [8]:
vgg.fit_batch(train_batches, val_batches, 4)

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


Damn. Our validation accuracy is destroyed. We try 4 more epochs with a lower learning rate.

In [9]:
vgg.model.optimizer.lr = 0.001

vgg.fit_batch(train_batches, val_batches, 4)

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


It seems this is as far as we can get on the sample data set: a pretty good baseline in the area of 0.5.

## Finetune model - Full data

We continue our finetuning on the full data set.

In [10]:
path = ''

train_batches = vgg.get_batches(path + 'train', batch_size = batch_size)
val_batches = vgg.get_batches(path + 'valid', batch_size = batch_size, shuffle = False)

Found 19624 images belonging to 10 classes.
Found 2800 images belonging to 10 classes.


We start with a single epoch

In [11]:
vgg.fit_batch(train_batches, val_batches, 1)

Epoch 1/1


We increase the learning rate and see where that takes us.

In [12]:
vgg.model.optimizer.lr = 0.1

vgg.fit_batch(train_batches, val_batches, 4)

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


And then we lower the learning rate again, and see where we end up.

In [13]:
vgg.model.optimizer.lr = 0.001

vgg.fit_batch(train_batches, val_batches, 4)

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


The validation accuracy fluctuates a lot, and generally we are overfitting. Let's try making more layers trainable and see if that helps things along.

In [14]:
layers = vgg.model.layers
# Get the index of the first dense layer...
first_dense_idx = [index for index, layer in enumerate(layers) if type(layer) is Dense][0]
# ...and set this and all subsequent layers to trainable
for layer in layers[first_dense_idx:]: layer.trainable = True

And then we rerun the training. First one epoch with low learning rate.

In [15]:
vgg.fit_batch(train_batches, val_batches, 1)

Epoch 1/1


Then four epochs with a higher learning rate.

In [16]:
vgg.model.optimizer.lr = 0.1

vgg.fit_batch(train_batches, val_batches, 4)

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


And then four epochs with a lower learning rate again.

In [17]:
vgg.model.optimizer.lr = 0.001

vgg.fit_batch(train_batches, val_batches, 4)

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


That did little to improve things. Let's try an even lower learning rate.

In [18]:
vgg.model.optimizer.lr = 0.00001

vgg.fit_batch(train_batches, val_batches, 4)

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


We're not improving at all. Let's save the weights and try a different approach.

In [19]:
vgg.model.save_weights('models/base_vgg16.h5')

## Improved VGG

We continue using the VGG16 network, but attempt to improve it. That is, we want to keep the pretrained convolutional layers fixed, but use a new architecture for the dense layers.

We start by defining a new Vgg16() model.

In [20]:
vgg = Vgg16()
vgg.model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_2 (InputLayer)         (None, 224, 224, 3)       0         
_________________________________________________________________
norm_layer (Lambda)          (None, 224, 224, 3)       0         
_________________________________________________________________
zero_padding2d_14 (ZeroPaddi (None, 226, 226, 3)       0         
_________________________________________________________________
conv_layer_1_0 (Conv2D)      (None, 224, 224, 64)      1792      
_________________________________________________________________
zero_padding2d_15 (ZeroPaddi (None, 226, 226, 64)      0         
_________________________________________________________________
conv_layer_1_1 (Conv2D)      (None, 224, 224, 64)      36928     
_________________________________________________________________
max_pooling2d_6 (MaxPooling2 (None, 112, 112, 64)      0         
__________

We then proceed to find the last max pooling layer of the model.

In [21]:
# Define convolutional layers
last_conv_idx = [i for i, l in enumerate(vgg.model.layers) if type(l) is MaxPooling2D][-1]
conv_layers = vgg.model.layers[:last_conv_idx + 1]

We can then define a model using only the convolutional layers.

In [22]:
conv_model = Sequential(conv_layers)
conv_model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_2 (InputLayer)         (None, 224, 224, 3)       0         
_________________________________________________________________
norm_layer (Lambda)          (None, 224, 224, 3)       0         
_________________________________________________________________
zero_padding2d_14 (ZeroPaddi (None, 226, 226, 3)       0         
_________________________________________________________________
conv_layer_1_0 (Conv2D)      (None, 224, 224, 64)      1792      
_________________________________________________________________
zero_padding2d_15 (ZeroPaddi (None, 226, 226, 64)      0         
_________________________________________________________________
conv_layer_1_1 (Conv2D)      (None, 224, 224, 64)      36928     
_________________________________________________________________
max_pooling2d_6 (MaxPooling2 (None, 112, 112, 64)      0         
__________

The idea now is to pre-compute the output of the convolutional layers for all of our data. This will drastically reduce the training time once we start experimenting with the dense model architecture.

We start by defining our batches.

In [23]:
path = ''

train_batches = get_batches(path + 'train', batch_size = 44, target_size = (224, 224), shuffle = False)

valid_batches = get_batches(path + 'valid', batch_size = 50, target_size = (224, 224), shuffle = False)

test_batches = get_batches(path + 'test', batch_size = 2, target_size = (224, 224), shuffle = False, class_mode = None)

Found 19624 images belonging to 10 classes.
Found 2800 images belonging to 10 classes.
Found 79726 images belonging to 1 classes.


We also extract labels and classes for each dataset.

In [24]:
(val_classes, trn_classes, val_labels, trn_labels, 
 val_filenames, filenames, test_filenames) = get_classes(path)

Found 19624 images belonging to 10 classes.
Found 2800 images belonging to 10 classes.
Found 79726 images belonging to 1 classes.


We then pre-compute each of our datasets and save the numpy arrays. This eats a lot of memory on the poor AWS instance, so after each data computation, we do some cleanup to release the memory. We save and load using bcolz, as it offers great compression and very fast I/O.

In [25]:
# Built-in int replaces np.int, which newer NumPy releases no longer provide
conv_feat = conv_model.predict_generator(train_batches, int(train_batches.samples / train_batches.batch_size))
save_array(path + 'results/conv_computed/conv_feat.dat', conv_feat)

del conv_feat
gc.collect()

5

In [26]:
# Built-in int replaces np.int, which newer NumPy releases no longer provide
conv_val_feat = conv_model.predict_generator(valid_batches, int(valid_batches.samples / valid_batches.batch_size))
save_array(path + 'results/conv_computed/conv_val_feat.dat', conv_val_feat)

del conv_val_feat
gc.collect()

0

In [27]:
# Built-in int replaces np.int, which newer NumPy releases no longer provide
conv_test_feat = conv_model.predict_generator(test_batches, int(test_batches.samples / test_batches.batch_size))
save_array(path + 'results/conv_computed/conv_test_feat.dat', conv_test_feat)

del conv_test_feat
gc.collect()

0
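The step counts passed to predict_generator above truncate samples / batch_size. That only covers every image because the chosen batch sizes happen to divide the dataset sizes exactly. A minimal sketch of that check, using only the counts reported by the cells above; `prediction_steps` is a hypothetical helper and not part of kerastools:

```python
import math

def prediction_steps(n_samples, batch_size):
    """Number of generator batches needed to cover every sample once."""
    return math.ceil(n_samples / batch_size)

# Sample counts and batch sizes taken from the cells above.
datasets = {
    "train": (19624, 44),
    "valid": (2800, 50),
    "test": (79726, 2),
}

for name, (n_samples, batch_size) in datasets.items():
    truncated = n_samples // batch_size           # what truncating the division gives
    covering = prediction_steps(n_samples, batch_size)
    missed = n_samples - truncated * batch_size   # samples a truncated step count would skip
    print(name, truncated, covering, missed)
```

With a batch size that does not divide the dataset size, the truncated count would silently skip the last partial batch, and the saved features would no longer line up with the labels.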

And finally, we can load the three feature sets

In [28]:
conv_feat = load_array('results/conv_computed/conv_feat.dat')
conv_val_feat = load_array('results/conv_computed/conv_val_feat.dat')
conv_test_feat = load_array('results/conv_computed/conv_test_feat.dat')

## VGG model with batchnorm dense layers on pretrained conv layers

Since we've pre-computed the output of the last convolutional layer, we need to create a network that takes that as input, and predicts our 10 classes. Let's try using a simplified version of VGG's dense layers.

In [29]:
def batch_norm_model(base_model, p):
    """
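    Function providing a more modern dense block for the VGG16,
    i.e. one that utilises batch normalisation.
    
    Args:
        - base_model: A list of Keras layers (here the convolutional layers);
          the output shape of the last layer defines the input shape.
        - p: Total level of dropout (will be divided by two in the first layers)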
        
    Returns:
        - A keras model object ready to be compiled.
    """
    
    inputs = Input(shape = base_model[-1].output_shape[1:])
        
    flat_layer = Flatten()(inputs)
        
    dense_layer_1 = Dropout(p / 2)(flat_layer)
    dense_layer_1 = Dense(128, activation='relu')(dense_layer_1)
    dense_layer_2 = BatchNormalization()(dense_layer_1)
    dense_layer_2 = Dropout(p / 2)(dense_layer_2)
    dense_layer_2 = Dense(128, activation='relu')(dense_layer_2)
    dense_layer_3 = BatchNormalization()(dense_layer_2)
    dense_layer_3 = Dropout(p)(dense_layer_3)
    dense_layer_3 = Dense(10, activation='softmax')(dense_layer_3)
    
    model = Model(inputs = inputs, outputs = dense_layer_3)
    
    return model

# Set level of dropout
p = 0.8

# Set batch size
batch_size = 32
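To get a feel for how light this dense block is, the parameter counts can be worked out by hand. The sketch below assumes that the last max pooling layer outputs a 7 x 7 x 512 tensor, which is the standard shape for VGG16 with 224 x 224 inputs but is not shown in the truncated summaries above. It counts only the Dense layers and leaves out the BatchNormalization parameters; `dense_params` is a hypothetical helper.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units

# Assumed output shape of VGG16's last max pooling layer for 224x224 inputs.
flat_features = 7 * 7 * 512

# The three Dense layers defined in batch_norm_model: 128, 128 and 10 units.
small_head = (dense_params(flat_features, 128)
              + dense_params(128, 128)
              + dense_params(128, 10))

# First dense layer of the original VGG16 (4096 units), for comparison.
vgg_fc1 = dense_params(flat_features, 4096)

print("flattened features:", flat_features)
print("dense parameters in batch_norm_model:", small_head)
print("parameters in the first original VGG16 dense layer:", vgg_fc1)
```

Under these assumptions the new block has roughly 3 million dense parameters, against more than 100 million in the first original dense layer alone, which goes some way towards explaining why the epochs on precomputed features run so quickly.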

Let's define and compile the model. We run a single epoch at the default learning rate.

In [30]:
bn_model = batch_norm_model(conv_layers, p)

bn_model.compile(optimizer = Adam(lr = 0.001),
                 loss = 'categorical_crossentropy',
                 metrics = ['accuracy'])

bn_model.fit(x = conv_feat,
             y = trn_labels,
             batch_size = batch_size,
             epochs = 1,
             validation_data = (conv_val_feat, val_labels))

Train on 19624 samples, validate on 2800 samples
Epoch 1/1


<keras.callbacks.History at 0x1a485aa3f28>

So very fast(!), we see a major improvement over the base VGG16 model. Let's continue running for a few more epochs at a lower learning rate.

In [31]:
bn_model.optimizer.lr = 0.0001

bn_model.fit(x = conv_feat,
             y = trn_labels,
             batch_size = batch_size,
             epochs = 4,
             validation_data = (conv_val_feat, val_labels))

Train on 19624 samples, validate on 2800 samples
Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


<keras.callbacks.History at 0x1a483fb96a0>

So we are starting to overfit quite a bit. We have pretty much done as much as we could with the base model, in terms of regularisation. We have added dropout and batchnorm. Time to take a look and see if we can improve the input to the model.

## VGG model with precomputed augmentation and dropout

We precompute some augmented data in order to reduce the overfitting of our model. We start by setting the level of augmentation and then define a new set of batches.

In [32]:
gen_t = image.ImageDataGenerator(rotation_range = 15,
                                 height_shift_range = 0.05,
                                 shear_range = 0.1,
                                 channel_shift_range = 20,
                                 width_shift_range = 0.1)

da_batches = get_batches(path + 'train',
                         gen_t,
                         batch_size = 44,
                         shuffle = False,
                         target_size = (224, 224))

Found 19624 images belonging to 10 classes.


We then make 5 full cycles (epochs) of the training data, precomputing the convolutional layer on the augmented images.

In [33]:
# Built-in int replaces np.int, which newer NumPy releases no longer provide
da_conv_feat = conv_model.predict_generator(da_batches, int(da_batches.samples / da_batches.batch_size) * 5)
save_array(path + 'results/conv_computed/da_conv_feat.dat', da_conv_feat)

del da_conv_feat
gc.collect()

0

We can then load the pre-computed and augmented data.

In [34]:
da_conv_feat = load_array(path + 'results/conv_computed/da_conv_feat.dat')

And then concatenate it with our existing precomputed training data.

In [35]:
da_conv_feat = np.concatenate([da_conv_feat, conv_feat])

We now have 6 times as many features. Luckily the order is preserved, because the batches were created with shuffle = False, so we can simply repeat the labels 6 times.

In [36]:
da_trn_labels = np.concatenate([trn_labels] * 6)
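As a sanity check on the bookkeeping, the sketch below reproduces the sample count with plain Python lists. The labels are stand-ins (one made-up class id per image), not the real one-hot arrays; only the counts come from the cells above.

```python
# Counts reported by the cells above.
n_train = 19624
n_augmented_passes = 5

# Stand-in labels: one class id per training image, in generator order.
trn_labels_demo = [i % 10 for i in range(n_train)]

# Five augmented passes followed by the original features, as in the concatenation above.
n_features = n_augmented_passes * n_train + n_train

# Repeating the label list keeps label k aligned with image k in every pass.
da_labels_demo = trn_labels_demo * (n_augmented_passes + 1)

print(n_features, len(da_labels_demo))
```

The alignment only holds when every pass visits the images in the same order and covers the whole training set.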

We can then run the model again, but this time we try to use bigger dense layers and some more dropout.

In [37]:
def batch_norm_model_bigger(base_model, p):
    
    inputs = Input(shape = base_model[-1].output_shape[1:])
        
    flat_layer = Flatten()(inputs)
        
    dense_layer_1 = Dropout(p)(flat_layer)
    dense_layer_1 = Dense(256, activation='relu')(dense_layer_1)
    dense_layer_2 = BatchNormalization()(dense_layer_1)
    dense_layer_2 = Dropout(p)(dense_layer_2)
    dense_layer_2 = Dense(256, activation='relu')(dense_layer_2)
    dense_layer_3 = BatchNormalization()(dense_layer_2)
    dense_layer_3 = Dropout(p)(dense_layer_3)
    dense_layer_3 = Dense(10, activation='softmax')(dense_layer_3)
    
    model = Model(inputs = inputs, outputs = dense_layer_3)
    
    return model

# Define level of dropout
p = 0.8

# Define batch size
batch_size = 32

We then compile the model, and run it for a single epoch on the augmented data at the default learning rate.

In [38]:
bn_model_bigger = batch_norm_model_bigger(conv_layers, p)

bn_model_bigger.compile(optimizer = Adam(lr = 0.001),
                        loss = 'categorical_crossentropy',
                        metrics = ['accuracy'])

bn_model_bigger.fit(x = da_conv_feat,
                    y = da_trn_labels,
                    batch_size = batch_size,
                    epochs = 1,
                    validation_data = (conv_val_feat, val_labels))

Train on 117744 samples, validate on 2800 samples
Epoch 1/1


<keras.callbacks.History at 0x1a499c258d0>

So this model is a bit heavier. Already we are underfitting quite a bit. We increase the learning rate and run for 4 epochs.

In [39]:
bn_model_bigger.optimizer.lr = 0.01

bn_model_bigger.fit(x = da_conv_feat,
                    y = da_trn_labels,
                    batch_size = batch_size,
                    epochs = 4,
                    validation_data = (conv_val_feat, val_labels))

Train on 117744 samples, validate on 2800 samples
Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


<keras.callbacks.History at 0x1a483e23358>

Now our training and validation accuracy seem more balanced. Let's lower the learning rate again and train for 4 more epochs.

In [40]:
bn_model_bigger.optimizer.lr = 0.0001

bn_model_bigger.fit(x = da_conv_feat,
                    y = da_trn_labels,
                    batch_size = batch_size,
                    epochs = 4,
                    validation_data = (conv_val_feat, val_labels))

Train on 117744 samples, validate on 2800 samples
Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


<keras.callbacks.History at 0x1a485b2aef0>

Seems like we are levelling off at this point. Let's attempt a final improvement by using pseudo labelling.

## Pseudolabeling

We're going to try using a combination of [pseudo labeling](http://deeplearning.net/wp-content/uploads/2013/03/pseudo_label_final.pdf) and [knowledge distillation](https://arxiv.org/abs/1503.02531) to allow us to use unlabeled data (i.e. do semi-supervised learning). For our initial experiment we'll use the validation set as the unlabeled data, so that we can see that it is working without using the test set. Afterwards we add the test set as well.

In [41]:
val_pseudo = bn_model_bigger.predict(conv_val_feat, batch_size = 50)

We concatenate these pseudo labels with our training labels.

In [42]:
comb_pseudo = np.concatenate([da_trn_labels, val_pseudo])
comb_feat = np.concatenate([da_conv_feat, conv_val_feat])
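Mixing one-hot training labels with predicted probabilities in a single label array works because categorical cross-entropy accepts any probability vector as its target. A toy sketch with 3 classes instead of 10; the numbers are made up for illustration and `cross_entropy` is a hypothetical helper, not the Keras implementation:

```python
import math

def cross_entropy(target, predicted, eps=1e-7):
    """Categorical cross-entropy for one sample; target may be one-hot or soft."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, predicted))

hard_label = [0.0, 1.0, 0.0]   # one-hot label, as in the training set
soft_label = [0.1, 0.8, 0.1]   # model output reused as a pseudo label
prediction = [0.2, 0.7, 0.1]   # what the model currently predicts

# Both kinds of target can sit in the same label array and share one loss.
combined_targets = [hard_label, soft_label]
losses = [cross_entropy(t, prediction) for t in combined_targets]
print(losses)
```

With a soft target, the loss also rewards matching the probabilities of the non-winning classes, which is the knowledge distillation part of the idea.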

And train our model using the extended data set.

In [43]:
bn_model_bigger.optimizer.lr = 0.001

bn_model_bigger.fit(x = comb_feat,
                    y = comb_pseudo,
                    batch_size = batch_size,
                    epochs = 1,
                    validation_data = (conv_val_feat, val_labels))

Train on 120544 samples, validate on 2800 samples
Epoch 1/1


<keras.callbacks.History at 0x1a4836af630>

We do not really see much of an improvement. Let's try 4 more epochs.

In [44]:
bn_model_bigger.fit(x = comb_feat,
                    y = comb_pseudo,
                    batch_size = batch_size,
                    epochs = 4,
                    validation_data = (conv_val_feat, val_labels))

Train on 120544 samples, validate on 2800 samples
Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


<keras.callbacks.History at 0x1a499a40f98>

Now we are crossing the 0.9 threshold of accuracy. Let's lower the learning rate, train for 4 more epochs and see where that gets us.

In [45]:
bn_model_bigger.optimizer.lr = 0.00001

bn_model_bigger.fit(x = comb_feat,
                    y = comb_pseudo,
                    batch_size = batch_size,
                    epochs = 4,
                    validation_data = (conv_val_feat, val_labels))

Train on 120544 samples, validate on 2800 samples
Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


<keras.callbacks.History at 0x1a499a34908>

So we do see a pretty nice improvement, enough to warrant trying with the entire test set.

In [46]:
test_pseudo = bn_model_bigger.predict(conv_test_feat, batch_size = 2)

We concatenate these pseudo labels with our training labels and validation pseudo labels.

In [47]:
comb_pseudo = np.concatenate([comb_pseudo, test_pseudo])
comb_feat = np.concatenate([comb_feat, conv_test_feat])

And train our model using the extended data set.

In [48]:
bn_model_bigger.optimizer.lr = 0.001

bn_model_bigger.fit(x = comb_feat,
                    y = comb_pseudo,
                    batch_size = batch_size,
                    epochs = 1,
                    validation_data = (conv_val_feat, val_labels))

Train on 200270 samples, validate on 2800 samples
Epoch 1/1


<keras.callbacks.History at 0x1a48359b630>

Hrm, too early to say, but validation accuracy holds steady. Let's run 4 more epochs.

In [49]:
bn_model_bigger.fit(x = comb_feat,
                    y = comb_pseudo,
                    batch_size = batch_size,
                    epochs = 4,
                    validation_data = (conv_val_feat, val_labels))

Train on 200270 samples, validate on 2800 samples
Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


<keras.callbacks.History at 0x1a485843eb8>

Let's lower the learning rate again and run 4 more epochs.

In [50]:
bn_model_bigger.optimizer.lr = 0.00001

bn_model_bigger.fit(x = comb_feat,
                    y = comb_pseudo,
                    batch_size = batch_size,
                    epochs = 4,
                    validation_data = (conv_val_feat, val_labels))

Train on 200270 samples, validate on 2800 samples
Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


<keras.callbacks.History at 0x1a483f06d30>

We are improving slowly. Let's run for 10 more epochs.

In [51]:
bn_model_bigger.fit(x = comb_feat,
                    y = comb_pseudo,
                    batch_size = batch_size,
                    epochs = 10,
                    validation_data = (conv_val_feat, val_labels))

Train on 200270 samples, validate on 2800 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x1a499c39518>

Having trained the model, we save the weights.

In [52]:
bn_model_bigger.save_weights('models/batchnorm_vgg16.h5')

## Submitting to Kaggle

We finally submit the improved model to Kaggle.

We start by finding the optimal level of clipping.

In [48]:
valid_batches_pred = get_batches(path + 'valid', batch_size = 50, target_size = (224, 224), shuffle = False, class_mode = None)
# Built-in int replaces np.int, which newer NumPy releases no longer provide
conv_val_feat_pred = conv_model.predict_generator(valid_batches_pred, int(valid_batches_pred.samples / valid_batches_pred.batch_size))

val_predictions = bn_model_bigger.predict(conv_val_feat_pred, batch_size = 50)

Found 2800 images belonging to 10 classes.


In [57]:
def do_clip(arr, mx): return np.clip(arr, (1 - mx) / 9, mx)
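Clipping matters because the competition is scored on log loss, which punishes a confident wrong answer far more than it rewards a confident right one. The sketch below is a dependency-free analogue of do_clip applied to a single made-up prediction; `clip_probs` and `log_loss` are hypothetical helpers and the probabilities are illustrative, not taken from the model.

```python
import math

def clip_probs(probs, mx, n_classes=10):
    """Bound each probability to [(1 - mx) / (n_classes - 1), mx], like do_clip."""
    floor = (1 - mx) / (n_classes - 1)
    return [min(max(p, floor), mx) for p in probs]

def log_loss(true_idx, probs):
    """Negative log of the probability assigned to the true class."""
    return -math.log(probs[true_idx])

# A very confident prediction for class 4; the ten values sum to 1.
confident = [0.001] * 9
confident.insert(4, 0.991)
clipped = clip_probs(confident, mx=0.89)

for true_idx, case in [(4, "prediction is right"), (0, "prediction is wrong")]:
    print(case, log_loss(true_idx, confident), log_loss(true_idx, clipped))
```

Clipping costs a little when the model is right and saves a lot when it is wrong, so it tends to pay off only when the model is overconfident on the images it misclassifies.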

We then proceed to determine the optimal level of clipping using the validation data set.

In [58]:
test_clip = []
for i in np.arange(0.70, 1.0, 0.01):
    test_clip.append([i, categorical_crossentropy(val_labels, do_clip(val_predictions, i)).eval().mean()])

min(test_clip, key = lambda x: x[1])

[1.0000000000000002, 0.48164355679327142]

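The search says that no clipping is best, which is weird. The minimum sits at the very top of the grid; the value slightly above 1.0 is floating-point overshoot from np.arange and amounts to no clipping at all. Let's submit two versions, one clipped and one not clipped. First we compute the predictions.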

In [59]:
test_predictions = bn_model_bigger.predict(conv_test_feat, batch_size = 2)

We define the classes.

In [61]:
classes = sorted(valid_batches.class_indices, key = valid_batches.class_indices.get)

Then we clip the predictions at 0.89, a level based on earlier experience.

In [62]:
sumbit_pred = do_clip(test_predictions, 0.89)

Then we prepare a submission without clipping.

In [65]:
submission_no_clip = pd.DataFrame(test_predictions, columns = classes)
submission_no_clip.insert(0, 'img', [a[8:] for a in test_batches.filenames])
submission_no_clip.head()

| | img | c0 | c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8 | c9 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | img_81601.jpg | 0.046099 | 0.006723 | 0.004307 | 0.000682 | 0.001594 | 0.001567 | 0.026646 | 0.020867 | 0.016716 | 0.874798 |
| 1 | img_14887.jpg | 0.68964 | 0.007839 | 0.000621 | 0.002267 | 0.001608 | 0.003189 | 0.001825 | 0.000393 | 0.005292 | 0.287327 |
| 2 | img_62885.jpg | 0.005301 | 0.000154 | 0.000257 | 0.017661 | 0.973055 | 0.000493 | 0.000514 | 1.6e-05 | 0.000978 | 0.001571 |
| 3 | img_45125.jpg | 0.002003 | 0.007311 | 0.017353 | 0.000462 | 0.002328 | 0.000509 | 0.656182 | 0.006239 | 0.302687 | 0.004926 |
| 4 | img_22633.jpg | 0.138601 | 0.055678 | 0.011123 | 0.001665 | 0.004929 | 0.012565 | 0.022692 | 0.007558 | 0.186375 | 0.558814 |
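The `a[8:]` slice used for the img column presumably strips an 8-character folder prefix from each generator filename. The folder name is not shown in this notebook, so `unknown/` below is an assumption. A minimal sketch of a variant that does not depend on the length of the prefix; `image_name` is a hypothetical helper:

```python
import ntpath

def image_name(generator_filename):
    """Drop any directory prefix, whatever its length or separator style."""
    # ntpath.basename splits on both '/' and '\\', on every platform.
    return ntpath.basename(generator_filename)

# Hypothetical generator filenames; the folder names are assumptions.
examples = [
    "unknown/img_81601.jpg",
    "unknown\\img_14887.jpg",
    "test_images/img_62885.jpg",
]
print([image_name(f) for f in examples])
```

The fixed slice gives the same result for an 8-character prefix, but would cut the name in the wrong place for a folder such as the made-up `test_images/`.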


And a submission with clipping

In [66]:
submission_clip = pd.DataFrame(sumbit_pred, columns = classes)
submission_clip.insert(0, 'img', [a[8:] for a in test_batches.filenames])
submission_clip.head()

| | img | c0 | c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8 | c9 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | img_81601.jpg | 0.046099 | 0.012222 | 0.012222 | 0.012222 | 0.012222 | 0.012222 | 0.026646 | 0.020867 | 0.016716 | 0.874798 |
| 1 | img_14887.jpg | 0.68964 | 0.012222 | 0.012222 | 0.012222 | 0.012222 | 0.012222 | 0.012222 | 0.012222 | 0.012222 | 0.287327 |
| 2 | img_62885.jpg | 0.012222 | 0.012222 | 0.012222 | 0.017661 | 0.89 | 0.012222 | 0.012222 | 0.012222 | 0.012222 | 0.012222 |
| 3 | img_45125.jpg | 0.012222 | 0.012222 | 0.017353 | 0.012222 | 0.012222 | 0.012222 | 0.656182 | 0.012222 | 0.302687 | 0.012222 |
| 4 | img_22633.jpg | 0.138601 | 0.055678 | 0.012222 | 0.012222 | 0.012222 | 0.012565 | 0.022692 | 0.012222 | 0.186375 | 0.558814 |


Finally we save the two submissions

In [67]:
submission_file_name_no_clip = 'results/augmented-pseudo-vgg-no-clip.gz'
submission_no_clip.to_csv(submission_file_name_no_clip, index = False, compression = 'gzip')

submission_file_name_clip = 'results/augmented-pseudo-vgg-clip.gz'
submission_clip.to_csv(submission_file_name_clip, index = False, compression = 'gzip')

In [70]:
from IPython.display import FileLink
FileLink('results/augmented-pseudo-vgg-no-clip.gz')

In [71]:
FileLink('results/augmented-pseudo-vgg-clip.gz')

It turns out that in this case the unclipped submission actually performed best, by an absolute 0.03.