# Action Plan

1. Create Validation and Sample sets
2. Rearrange image files into their respective directories
3. Finetune and Train model
4. Generate predictions
5. Validate predictions
6. Submit predictions to Kaggle



## Create Validation and Sample Sets

In [2]:
#Verify we are in the nbs directory
%pwd

u'/home/ubuntu/courses/deeplearning1/nbs'

In [8]:
#Create references to commonly used directories
import os, sys
current_dir = os.getcwd()
NBS_HOME_DIR = current_dir
DATA_HOME_DIR = current_dir + '/data/redux'

In [5]:
#Import modules
from utils import *
from vgg16 import Vgg16

#Setup inline plots
%matplotlib inline

In [9]:
# Create directories
%cd $DATA_HOME_DIR
%mkdir valid
%mkdir results
%mkdir -p sample/train
%mkdir -p sample/test
%mkdir -p sample/valid
%mkdir -p sample/results
%mkdir -p test/unknown

/home/ubuntu/courses/deeplearning1/nbs/data/redux


In [11]:
%cd $DATA_HOME_DIR/train

/home/ubuntu/courses/deeplearning1/nbs/data/redux/train


In [14]:
#Separate a validation set: move 2000 randomly chosen training images into valid/
g = glob('*.jpg')
shuf = np.random.permutation(g)
for i in range(2000): os.rename(shuf[i], DATA_HOME_DIR+'/valid/'+shuf[i])

In [15]:
from shutil import copyfile

In [17]:
g = glob('*.jpg')
shuf = np.random.permutation(g)
for i in range(200): copyfile(shuf[i], DATA_HOME_DIR+'/sample/train/'+shuf[i])

In [18]:
%cd $DATA_HOME_DIR/valid

/home/ubuntu/courses/deeplearning1/nbs/data/redux/valid


In [19]:
g = glob('*.jpg')
shuf = np.random.permutation(g)
for i in range(50): copyfile(shuf[i], DATA_HOME_DIR+'/sample/valid/'+shuf[i])
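The three cells above share one pattern: shuffle the file list, then move or copy the first n names. A minimal sketch of that pattern as a single helper follows. The function name `take_random_files` and its `seed` parameter are assumptions added here (they are not part of the course code), and the demonstration runs on empty throwaway files rather than the real dataset.

```python
import os
import shutil
import tempfile
from glob import glob

import numpy as np


def take_random_files(src_dir, dst_dir, n, move=False, seed=None):
    """Move or copy n randomly chosen .jpg files from src_dir to dst_dir.

    Hypothetical helper; `seed` is an added parameter so a split can be repeated.
    Returns the sorted list of chosen file names.
    """
    # Sort first so the same seed always selects the same files.
    names = sorted(os.path.basename(p) for p in glob(os.path.join(src_dir, '*.jpg')))
    if n > len(names):
        raise ValueError('asked for %d files but only %d exist' % (n, len(names)))
    rng = np.random.RandomState(seed)
    chosen = [names[i] for i in rng.permutation(len(names))[:n]]
    if not os.path.isdir(dst_dir):
        os.makedirs(dst_dir)
    transfer = shutil.move if move else shutil.copyfile
    for name in chosen:
        transfer(os.path.join(src_dir, name), os.path.join(dst_dir, name))
    return sorted(chosen)


# Demonstration on empty placeholder files in a temporary directory.
demo_root = tempfile.mkdtemp()
demo_train = os.path.join(demo_root, 'train')
os.makedirs(demo_train)
for i in range(10):
    open(os.path.join(demo_train, 'cat.%d.jpg' % i), 'w').close()

moved = take_random_files(demo_train, os.path.join(demo_root, 'valid'), 3, move=True, seed=0)
remaining = glob(os.path.join(demo_train, '*.jpg'))
```

Moving (rather than copying) is what keeps the validation images out of the training set; the sample sets are copies because they only exist for quick experiments.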

## Rearrange image files into respective directories

In [21]:
#Separate cat and dog images into separate directories
%cd $DATA_HOME_DIR/valid
%mkdir cats
%mkdir dogs
%mv cat.*.jpg cats/
%mv dog.*.jpg dogs/

%cd $DATA_HOME_DIR/train
%mkdir cats
%mkdir dogs
%mv cat.*.jpg cats/
%mv dog.*.jpg dogs/

%cd $DATA_HOME_DIR/sample/train
%mkdir cats
%mkdir dogs
%mv cat.*.jpg cats/
%mv dog.*.jpg dogs/

%cd $DATA_HOME_DIR/sample/valid
%mkdir cats
%mkdir dogs
%mv cat.*.jpg cats/
%mv dog.*.jpg dogs/


/home/ubuntu/courses/deeplearning1/nbs/data/redux/valid
mkdir: cannot create directory ‘cats’: File exists
mkdir: cannot create directory ‘dogs’: File exists
/home/ubuntu/courses/deeplearning1/nbs/data/redux/train
/home/ubuntu/courses/deeplearning1/nbs/data/redux/sample/train
/home/ubuntu/courses/deeplearning1/nbs/data/redux/sample/valid
mkdir: cannot create directory ‘cats’: File exists
mkdir: cannot create directory ‘dogs’: File exists
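The `File exists` messages above appear when the cell is run a second time. A sketch of the same rearrangement in Python that can be rerun without errors is shown below. The helper name `sort_into_class_dirs` is hypothetical, and the code assumes the `cat.N.jpg` / `dog.N.jpg` naming implied by the `mv` patterns in the cell.

```python
import os
import shutil
import tempfile
from glob import glob


def sort_into_class_dirs(directory, classes=(('cat', 'cats'), ('dog', 'dogs'))):
    """Move '<prefix>.*.jpg' files in `directory` into per-class subfolders.

    Hypothetical helper; returns {subfolder: number of files moved}.
    """
    moved = {}
    for prefix, subdir in classes:
        target = os.path.join(directory, subdir)
        if not os.path.isdir(target):  # create only when missing, so reruns are safe
            os.makedirs(target)
        paths = glob(os.path.join(directory, prefix + '.*.jpg'))
        for path in paths:
            shutil.move(path, os.path.join(target, os.path.basename(path)))
        moved[subdir] = len(paths)
    return moved


# Demonstration on empty placeholder files in a temporary directory.
demo_dir = tempfile.mkdtemp()
for name in ['cat.0.jpg', 'cat.1.jpg', 'dog.0.jpg']:
    open(os.path.join(demo_dir, name), 'w').close()

first_run = sort_into_class_dirs(demo_dir)
second_run = sort_into_class_dirs(demo_dir)  # nothing left to move, and no error
```

On the real data this would be called once for each of the four directories handled in the cell above.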


In [24]:
# Create a single unknown class for test
%cd $DATA_HOME_DIR/test
%mv *jpg unknown/

/home/ubuntu/courses/deeplearning1/nbs/data/redux/test


## Finetune and Train Model

In [25]:
%cd $DATA_HOME_DIR

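#Use the sample set for quick experiments; switch to the commented line for the full dataset
#path = DATA_HOME_DIR + '/'
path = DATA_HOME_DIR + '/sample/'
test_path = DATA_HOME_DIR + '/test/'
#results_path needs both slashes because file names are appended to it directly
results_path = DATA_HOME_DIR + '/results/'
train_path = path + 'train/'
valid_path = path + 'valid/'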

/home/ubuntu/courses/deeplearning1/nbs/data/redux
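Building paths by string concatenation makes it easy to drop or double a `/`, which matters here because file names are later appended straight onto `results_path`. One possible alternative is `os.path.join`. The function `build_paths` below is a hypothetical helper, not part of the course code; it mirrors the layout used in this notebook (train and valid under the sample directory, test and results under the data home).

```python
import os


def build_paths(data_home, use_sample=True):
    """Return the directories used for training, each ending in a separator."""
    base = os.path.join(data_home, 'sample') if use_sample else data_home

    def as_dir(*parts):
        # Joining with a trailing empty string appends the path separator.
        return os.path.join(*(parts + ('',)))

    return {
        'train': as_dir(base, 'train'),
        'valid': as_dir(base, 'valid'),
        'test': as_dir(data_home, 'test'),
        'results': as_dir(data_home, 'results'),
    }


paths = build_paths('/home/ubuntu/courses/deeplearning1/nbs/data/redux')
weights_file = paths['results'] + 'ft0.h5'
```

Because every entry ends in a separator, appending a file name such as `ft0.h5` yields a path inside the directory rather than a sibling with a fused name.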


In [26]:
#Instantiate the Vgg16 helper class
vgg = Vgg16()

In [29]:
#Set constants
batch_size = 64
no_of_epochs = 3

In [31]:
#Finetune the Model
batches = vgg.get_batches(train_path, batch_size=batch_size)
val_batches = vgg.get_batches(valid_path, batch_size=batch_size*2)
vgg.finetune(batches)

vgg.model.optimizer.lr = 0.01

Found 200 images belonging to 2 classes.
Found 50 images belonging to 2 classes.


In [32]:
#Notice we are passing in the validation dataset to the fit() method
#For each epoch we test our model against the validation set
latest_weights_filename = None
for epoch in range(no_of_epochs):
    print("Running epoch: %d" % epoch)
    vgg.fit(batches, val_batches, nb_epoch=1)
    #Save the weights after every epoch so any epoch can be reloaded later
    latest_weights_filename = 'ft%d.h5' % epoch
    vgg.model.save_weights(results_path+latest_weights_filename)
print("Completed %s fit operations" % no_of_epochs)

Running epoch: 0
Epoch 1/1
Running epoch: 1
Epoch 1/1
Running epoch: 2
Epoch 1/1
Completed 3 fit operations


## Generate Predictions

Let's use the new model to make predictions on the test dataset.

In [None]:
batches, preds = vgg.test(test_path, batch_size=batch_size*2)

Found 12500 images belonging to 1 classes.


In [None]:
print(preds[:5])

filenames = batches.filenames
print(filenames[:5])

In [None]:
#You can verify the column ordering by viewing some images
from PIL import Image
Image.open(test_path + filenames[2])

In [None]:
#Save our test results arrays so we can use them again later
save_array(results_path + 'test_preds.dat', preds)
save_array(results_path + 'filenames.dat', filenames)
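`save_array` comes from the course's `utils` module. Where that helper is not available, a plain NumPy round trip is one possible substitute for caching these arrays. The sketch below uses made-up values and a temporary directory; the `.npy` file names are assumptions and are not interchangeable with the `.dat` outputs written above.

```python
import os
import tempfile

import numpy as np

# Made-up stand-ins for `preds` and `filenames`.
demo_preds = np.array([[0.9, 0.1], [0.2, 0.8]], dtype=np.float32)
demo_filenames = np.array(['unknown/1.jpg', 'unknown/2.jpg'])

out_dir = tempfile.mkdtemp()
np.save(os.path.join(out_dir, 'test_preds.npy'), demo_preds)
np.save(os.path.join(out_dir, 'filenames.npy'), demo_filenames)

# Reload later without rerunning the model.
loaded_preds = np.load(os.path.join(out_dir, 'test_preds.npy'))
loaded_filenames = np.load(os.path.join(out_dir, 'filenames.npy'))
```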

## Validate Predictions

As well as looking at the overall metrics, it's also a good idea to look at examples of each of:

1. A few correct labels at random
2. A few incorrect labels at random
3. The most correct labels of each class (i.e. those with the highest probability that are correct)
4. The most incorrect labels of each class (i.e. those with the highest probability that are incorrect)
5. The most uncertain labels (i.e. those with probability closest to 0.5)

Let's see what we can learn from these examples. (In general, this is a particularly useful technique for debugging problems in the model. However, since this model is so simple, there may not be too much to learn at this stage.)
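The five groups listed above can be picked out with a few NumPy index operations. The following is a minimal sketch on made-up numbers, not the notebook's own implementation. The helper `pick_examples` is hypothetical, and it takes the probability of class 1 (dog) as input, which in this notebook would correspond to `1 - probs[:,0]`.

```python
import numpy as np


def pick_examples(p_dog, true_labels, n=4, seed=0):
    """Return index arrays for the five inspection groups.

    p_dog: predicted probability of class 1 (dog).
    true_labels: 0 (cat) or 1 (dog).
    """
    p_dog = np.asarray(p_dog, dtype=float)
    true_labels = np.asarray(true_labels)
    predicted = (p_dog >= 0.5).astype(int)
    correct = np.where(predicted == true_labels)[0]
    incorrect = np.where(predicted != true_labels)[0]
    rng = np.random.RandomState(seed)

    def sample(idx):
        # Up to n indices chosen at random.
        return rng.permutation(idx)[:n]

    def most_confident(idx, cls):
        # Among idx predicted as `cls`, order by confidence in that class.
        idx = idx[predicted[idx] == cls]
        conf = p_dog[idx] if cls == 1 else 1 - p_dog[idx]
        return idx[np.argsort(-conf)][:n]

    return {
        'random_correct': sample(correct),
        'random_incorrect': sample(incorrect),
        'most_correct_cats': most_confident(correct, 0),
        'most_correct_dogs': most_confident(correct, 1),
        'most_incorrect_cats': most_confident(incorrect, 0),  # predicted cat, actually dog
        'most_incorrect_dogs': most_confident(incorrect, 1),  # predicted dog, actually cat
        'most_uncertain': np.argsort(np.abs(p_dog - 0.5))[:n],
    }


# Made-up probabilities and labels for eight images.
demo_p_dog = np.array([0.95, 0.10, 0.55, 0.48, 0.99, 0.02, 0.70, 0.30])
demo_labels = np.array([1, 0, 0, 1, 0, 0, 1, 1])
groups = pick_examples(demo_p_dog, demo_labels, n=2)
```

Each returned array holds positions into the prediction array, so the matching entries of `filenames` can be opened and viewed directly.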

Calculate predictions on the validation set so we can find correct and incorrect examples:


In [None]:
vgg.model.load_weights(results_path+latest_weights_filename)

In [None]:
val_batches, probs = vgg.test(valid_path, batch_size=batch_size)

In [None]:
filenames = val_batches.filenames
expected_labels = val_batches.classes #0 or 1

#Round our predictions to 0/1 to generate labels
#Column 0 holds the probability of class 0, so 1-p rounds to 0 for class 0 and 1 for class 1
our_predictions = probs[:,0]
our_labels = np.round(1-our_predictions)
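To make the rounding step concrete, here is a small worked example on made-up probabilities, followed by the two overall metrics that are cheapest to compute: accuracy and a 2x2 confusion matrix. It assumes column 0 is the cat class and column 1 the dog class, which matches the alphabetical order of the `cats` and `dogs` folders; the variable names are illustrative only.

```python
import numpy as np

# Made-up model output: column 0 = P(cat), column 1 = P(dog).
demo_probs = np.array([[0.9, 0.1],   # confident cat
                       [0.2, 0.8],   # confident dog
                       [0.6, 0.4],   # leaning cat
                       [0.3, 0.7]])  # leaning dog
demo_expected = np.array([0, 1, 1, 1])  # true labels: 0 = cat, 1 = dog

# Same rule as the cell above: 1 - P(cat), rounded to 0 or 1.
demo_labels = np.round(1 - demo_probs[:, 0]).astype(int)
accuracy = float(np.mean(demo_labels == demo_expected))

# Rows are actual classes, columns are predicted classes.
confusion = np.zeros((2, 2), dtype=int)
for actual, predicted in zip(demo_expected, demo_labels):
    confusion[actual, predicted] += 1
```

Note that `np.round` rounds exact halves to the nearest even number, so a probability of exactly 0.5 is labelled 0 (cat) under this rule.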