## StateFarm Kaggle Challenge

- In this Kaggle challenge, I analyze the dataset provided by StateFarm to predict which driving state (safe driving or one of nine kinds of distraction) the driver in each image is in.


- StateFarm has provided labelled training data: images of drivers, each classified into one of 10 states.


- The states of the drivers are:
    - c0 : safe driving (2489 images)
    - c1 : texting - right (2267 images)
    - c2 : talking on the phone - right (2317 images)
    - c3 : texting - left (2346 images)
    - c4 : talking on the phone - left (2326 images)
    - c5 : operating the radio (2312 images)
    - c6 : drinking (2325 images)
    - c7 : reaching behind (2002 images)
    - c8 : hair and makeup (1911 images)
    - c9 : talking to a passenger (2129 images)


- The test data is unlabelled, as expected.


- My goal for this notebook is to demonstrate an intuitive approach to solving a computer vision problem.


- I will solve this problem by building on top of the Vgg16 model and applying proven methods that improve accuracy. I will not go into the mathematical details of each approach; instead, I will follow a line of reasoning in which each step can be understood intuitively.
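
Before training anything, it helps to know how balanced the classes are. The sketch below uses only the per-class image counts listed above (the dictionary name `class_counts` is my own) and prints each class's share of the training set. The shares fall between roughly 8.5% and 11.1%, so the classes are close to balanced.

```python
# Per-class image counts, copied from the class list above
# (these are the counts before any validation split).
class_counts = {
    "c0": 2489, "c1": 2267, "c2": 2317, "c3": 2346, "c4": 2326,
    "c5": 2312, "c6": 2325, "c7": 2002, "c8": 1911, "c9": 2129,
}

total = sum(class_counts.values())
shares = {label: n / total for label, n in class_counts.items()}

for label in sorted(class_counts):
    print(f"{label}: {class_counts[label]:5d} images ({shares[label]:.1%})")
print("total:", total)
```

With ten nearly balanced classes, a model that guesses at random would be right about 10% of the time, which is a useful reference point for the accuracies reported later.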

In [10]:
import os, sys
current_dir = os.getcwd()
HOME_DIRECTORY = current_dir
DATA_DIRECTORY = current_dir+'/data/statefarm'

from utils import *
from vgg16 import Vgg16

%matplotlib inline

Using gpu device 0: Tesla K80 (CNMeM is disabled, cuDNN 5103)
Using Theano backend.


## Steps

- Creating validation + sample sets
- Rearranging image files into respective directories
- Finetuning & Training model
- Generating Predictions
- Validating Predictions
- Submitting to Kaggle

In [95]:
# %cd $DATA_DIRECTORY
%mkdir valid
%mkdir results
%mkdir -p sample/train
%mkdir -p sample/valid
%mkdir -p sample/results
%mkdir -p test/unknown

In [97]:
%cd train/

/home/ubuntu/nbs/data/statefarm/train


In [98]:
for d in glob('c?'):
    os.mkdir('../sample/train/'+d)
    os.mkdir('../sample/valid/'+d)
    os.mkdir('../valid/'+d)

In [102]:
# Move 1950 randomly chosen images (out of 22424) from the training set to the validation set.
g = glob('c?/*.jpg')
shuf = np.random.permutation(g)
for i in range(1950):
    os.rename(shuf[i], DATA_DIRECTORY+'/valid/'+shuf[i])
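
The split above depends on the current working directory and on an unseeded shuffle, so it cannot be reproduced exactly. As a rough alternative, the same idea can be wrapped in a function that takes explicit paths and a seed. This is only a sketch under my own assumptions: `move_random_subset` is a hypothetical helper, and the demo runs on empty placeholder files in a temporary directory rather than on the real dataset.

```python
import random
import shutil
import tempfile
from pathlib import Path


def move_random_subset(src_root, dst_root, n, seed=0):
    """Move n randomly chosen 'c?/*.jpg' files from src_root to dst_root,
    preserving the class subdirectory of each file."""
    src_root, dst_root = Path(src_root), Path(dst_root)
    # Sort first so that the same seed always selects the same files.
    files = sorted(src_root.glob("c?/*.jpg"))
    chosen = random.Random(seed).sample(files, n)
    for f in chosen:
        target = dst_root / f.relative_to(src_root)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(f), str(target))
    return chosen


# Demo on placeholder files: 2 classes x 5 images, move 3 to 'valid'.
with tempfile.TemporaryDirectory() as tmp:
    train, valid = Path(tmp, "train"), Path(tmp, "valid")
    for c in ("c0", "c1"):
        (train / c).mkdir(parents=True)
        for i in range(5):
            (train / c / f"img_{i}.jpg").touch()
    moved = move_random_subset(train, valid, n=3)
    n_train_left = len(list(train.glob("c?/*.jpg")))
    n_valid = len(list(valid.glob("c?/*.jpg")))
    print(n_train_left, n_valid)
```

Passing the roots explicitly means the cell no longer depends on an earlier `%cd`, and the seed makes it possible to recreate the same validation set later.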

In [121]:
from shutil import copyfile

In [122]:
# Copy a small random sample of the training and validation images so that experiments run quickly.
g = glob('c?/*.jpg')
shuf = np.random.permutation(g)
for i in range(1500): copyfile(shuf[i], '../sample/train/'+shuf[i])

In [123]:
%cd ../valid

/home/ubuntu/nbs/data/statefarm/valid


In [124]:
g = glob('c?/*.jpg')
shuf = np.random.permutation(g)
for i in range(1000): copyfile(shuf[i], '../sample/valid/' + shuf[i])

## Creating Batches

In [128]:
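# Work on the small sample set first; an absolute path avoids depending on earlier %cd calls.
path = DATA_DIRECTORY + '/sample/'
batch_size = 64
batches = get_batches(path+'train', batch_size=batch_size)
# Validation batches can be larger (no backprop) and keep a fixed file order.
val_batches = get_batches(path+'valid', batch_size=batch_size*2, shuffle=False)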

Found 1500 images belonging to 10 classes.
Found 1000 images belonging to 10 classes.


In [129]:
(val_classes, trn_classes, val_labels, trn_labels, val_filenames, filenames, test_filename)=get_classes(path)

Found 1500 images belonging to 10 classes.
Found 1000 images belonging to 10 classes.
Found 0 images belonging to 0 classes.
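
`get_classes` comes from the `utils` module, which is not shown in this notebook. Judging by the variable names, I assume `trn_classes` and `val_classes` hold integer class indices, and `trn_labels` and `val_labels` hold their one-hot encodings, which is the target format that `categorical_crossentropy` expects. Under that assumption, the encoding can be sketched in plain Python (`one_hot` here is a hypothetical stand-in, not the function from `utils`):

```python
def one_hot(class_indices, num_classes):
    """Turn integer class indices into one-hot rows of length num_classes."""
    rows = []
    for idx in class_indices:
        row = [0.0] * num_classes
        row[idx] = 1.0  # a single 1 at the position of the true class
        rows.append(row)
    return rows


# Three example images belonging to classes c0, c4 and c9.
labels = one_hot([0, 4, 9], num_classes=10)
for row in labels:
    print(row)
```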


In [130]:
# Baseline: batch-normalize the raw pixels, flatten them, and feed a single softmax layer.
model = Sequential([
        BatchNormalization(axis=1, input_shape=(3,224,224)),
        Flatten(),
        Dense(10, activation='softmax')
    ])

In [131]:
model.compile(Adam(), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit_generator(batches, batches.nb_sample, nb_epoch=2, validation_data=val_batches, nb_val_samples=val_batches.nb_sample)

Epoch 1/2
Epoch 2/2


<keras.callbacks.History at 0x7f28426f4d10>

In [132]:
model.summary()

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
batchnormalization_1 (BatchNormal(None, 3, 224, 224)   6           batchnormalization_input_1[0][0] 
____________________________________________________________________________________________________
flatten_1 (Flatten)              (None, 150528)        0           batchnormalization_1[0][0]       
____________________________________________________________________________________________________
dense_1 (Dense)                  (None, 10)            1505290     flatten_1[0][0]                  
Total params: 1505296
____________________________________________________________________________________________________
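
The parameter counts in the summary can be checked by hand. The sketch below redoes the arithmetic from the layer shapes. The dense layer has one weight per input pixel value per class plus one bias per class. For the batch-normalization layer, I am assuming that the 6 reported parameters are one scale and one shift for each of the 3 channels.

```python
channels, height, width = 3, 224, 224
num_classes = 10

# Flatten turns each (3, 224, 224) image into one long vector.
flattened = channels * height * width

# Dense layer: a weight for every (input, class) pair plus one bias per class.
dense_params = flattened * num_classes + num_classes

# Assumption: batch norm over axis=1 keeps one scale and one shift per channel.
bn_params = 2 * channels

total_params = dense_params + bn_params
print(flattened, dense_params, bn_params, total_params)
```

Almost all of the roughly 1.5 million parameters sit in the single dense layer, which is worth keeping in mind given that the sample training set has only 1500 images.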


In [133]:
# Inspect the first 10 predictions, rounded to 2 decimals.
np.round(model.predict_generator(batches, batches.nb_sample)[:10], 2)

array([[ 0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.]], dtype=float32)
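
Every one of the ten rows shown puts all of its probability on index 4 (class c4), which suggests the model has collapsed to predicting a single class. To check this across all predictions rather than just the first ten, one option is to count the highest-probability class per row. A minimal sketch in plain Python follows; `predicted_class_counts` is a hypothetical helper, not part of Keras or `utils`, and the input is the rounded output printed above.

```python
from collections import Counter


def predicted_class_counts(probabilities):
    """Count how often each class index has the highest probability in a row."""
    winners = [max(range(len(row)), key=row.__getitem__) for row in probabilities]
    return Counter(winners)


# The ten rounded rows printed above: all of the mass is on index 4.
rows = [[0., 0., 0., 0., 1., 0., 0., 0., 0., 0.] for _ in range(10)]
counts = predicted_class_counts(rows)
print(counts)
```

If a single index accounts for nearly all rows on a roughly balanced dataset, the model is not discriminating between classes, and the training setup (for example the learning rate) is worth revisiting.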

In [134]:
# Rebuild the same model from scratch and start with a much lower learning rate.
model = Sequential([
        BatchNormalization(axis=1, input_shape=(3,224,224)),
        Flatten(),
        Dense(10, activation='softmax')
    ])
model.compile(Adam(lr=1e-5), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit_generator(batches, batches.nb_sample, nb_epoch=2, validation_data=val_batches, 
                 nb_val_samples=val_batches.nb_sample)

Epoch 1/2
Epoch 2/2


<keras.callbacks.History at 0x7f283b1d2990>

In [135]:
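# Update the learning-rate variable in place; plain assignment may not reach the already-compiled training function.
model.optimizer.lr.set_value(0.001)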

In [136]:
model.fit_generator(batches, batches.nb_sample, nb_epoch=4, validation_data=val_batches, 
                 nb_val_samples=val_batches.nb_sample)

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


<keras.callbacks.History at 0x7f283ae02110>

In [137]:
rnd_batches = get_batches(path+'valid', batch_size=batch_size*2, shuffle=True)

Found 1000 images belonging to 10 classes.


In [138]:
val_res = [model.evaluate_generator(rnd_batches, rnd_batches.nb_sample) for i in range(10)]
np.round(val_res, 2)

array([[ 1.01,  0.69],
       [ 1.  ,  0.69],
       [ 0.98,  0.7 ],
       [ 1.01,  0.69],
       [ 0.98,  0.71],
       [ 1.03,  0.68],
       [ 1.  ,  0.7 ],
       [ 1.  ,  0.7 ],
       [ 1.01,  0.69],
       [ 0.99,  0.71]])
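
Each row above is a `[loss, accuracy]` pair, since the model was compiled with `metrics=['accuracy']`. To read the ten runs at a glance, the sketch below summarizes them with the standard library. The numbers are copied from the rounded output above, so the summary is only as precise as those two decimals.

```python
from statistics import mean

# (loss, accuracy) pairs copied from the rounded output above.
val_res = [
    (1.01, 0.69), (1.00, 0.69), (0.98, 0.70), (1.01, 0.69), (0.98, 0.71),
    (1.03, 0.68), (1.00, 0.70), (1.00, 0.70), (1.01, 0.69), (0.99, 0.71),
]

losses = [loss for loss, _ in val_res]
accs = [acc for _, acc in val_res]

print(f"loss:     mean {mean(losses):.3f}, range {min(losses):.2f} to {max(losses):.2f}")
print(f"accuracy: mean {mean(accs):.3f}, range {min(accs):.2f} to {max(accs):.2f}")
```

The accuracy stays within a few points of 0.70 across all ten evaluations, so the validation estimate on this 1000-image sample looks reasonably stable.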