Author: Aditya Vora
# Introduction

This notebook demonstrates how to fine-tune the *C3D convolutional features* using Theano and Lasagne. The weights of the network pre-trained on the *Sports-1M dataset* can serve as a starting point for other target applications. C3D can be used as a general video feature and has shown strong performance. You can find more information in the paper [1].

* [1]: Tran, Du, et al. "Learning spatiotemporal features with 3d convolutional networks." 2015 IEEE International Conference on Computer Vision (ICCV). IEEE, 2015. http://vlg.cs.dartmouth.edu/c3d/c3d_video.pdf

# Notes

* Here we fine-tune the *C3D convolutional features* for a target application with two classes: *walking and running*. The original network was trained with Caffe, and that code is available on GitHub (https://github.com/facebook/C3D); here we use the open-source model file c3d_model.pkl instead. The model is downloaded and stored in the *models* directory.

* The c3d_helper.py module contains the helper functions that build the C3D model and set the network weights.

* The data used for training and validation is in lmdb format, created and stored in the data folder. 

In [None]:
# Import general libraries 
import c3d_helper
import cv2
import os
import lmdb
import numpy as np
from datum import Datum4D

In [None]:
# Import theano libraries
import theano 
import theano.tensor as T
dtensor5 = theano.tensor.TensorType(theano.config.floatX, (False,)*5)

In [None]:
# Import lasagne libraries
import lasagne
from lasagne.layers import InputLayer, DenseLayer, NonlinearityLayer
from lasagne.layers.dnn import Conv2DDNNLayer as ConvLayer
from lasagne.layers import Pool2DLayer as PoolLayer
from lasagne.nonlinearities import softmax
from lasagne.utils import floatX
from lasagne.regularization import regularize_network_params, l2
from lasagne.init import Constant, GlorotUniform, GlorotNormal

* The DataFetcher class provides helper functions for data input and output during the training and validation phases. Its load_data member function fetches one batch of data from the lmdb database stored in the data folder.

In [None]:
# Datafetcher class for input and output of the data during training and validation
class DataFetcher(object):
    def __init__(self,database_name,video_shape,batch_size,dtype='float32'):
        self.db_name = database_name
        self.video_shape = video_shape
        self.batch_size = batch_size
        self.env = lmdb.open(database_name)
        self.txn = self.env.begin()
        self.cursor = self.txn.cursor()
        self.cursor.first()
        self.iterator = iter(self.cursor)
        self.dtype = dtype
        self.epoch = 0
        
    def load_data(self):
        TT, HH, WW = self.video_shape
        X = np.empty((self.batch_size,3) + self.video_shape,dtype=self.dtype)
        y = np.empty((self.batch_size,),dtype=self.dtype)
        crossed_epoch = False
        for n in xrange(self.batch_size):
            try:
                key,value = next(self.iterator)
            except StopIteration:
                self.cursor.first() 
                self.iterator = iter(self.cursor)
                crossed_epoch = True
                self.epoch += 1
                key,value = next(self.iterator)
                
            datum = Datum4D.fromstring(value)
            X[n] = datum.array
            y[n] = datum.label
                
        return X, y, crossed_epoch
                
    def __del__(self):
        self.txn.commit()
        self.env.close()
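
The rewind behaviour of `load_data` can be illustrated without LMDB. The following is a minimal sketch, not the notebook's own code: `CyclingBatcher` and `next_batch` are hypothetical names, and a plain list stands in for the database cursor.

```python
class CyclingBatcher(object):
    """Hypothetical stand-in for DataFetcher that cycles over a list."""

    def __init__(self, records, batch_size):
        self.records = records
        self.batch_size = batch_size
        self.iterator = iter(records)
        self.epoch = 0

    def next_batch(self):
        batch = []
        crossed_epoch = False
        for _ in range(self.batch_size):
            try:
                record = next(self.iterator)
            except StopIteration:
                # End of data: rewind, count the epoch, keep filling the batch.
                self.iterator = iter(self.records)
                self.epoch += 1
                crossed_epoch = True
                record = next(self.iterator)
            batch.append(record)
        return batch, crossed_epoch


batcher = CyclingBatcher(records=[0, 1, 2, 3, 4], batch_size=2)
first, crossed_first = batcher.next_batch()
second, crossed_second = batcher.next_batch()
third, crossed_third = batcher.next_batch()  # wraps around to record 0
```

With this scheme, the batch that crosses the boundary mixes the last records of one pass with the first records of the next. When the number of records is an exact multiple of the batch size, the flag is only raised on the following batch.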

In [None]:
""" 
Test function to check that data flows correctly from the lmdb database. It fetches a batch 
of data from the lmdb database and displays one frame of one of the clips in the batch.
""" 
def test():
    import time
    import matplotlib.pyplot as plt    
    db = "./data/ucf-val.lmdb/"
    shape = (16,112,112)
    fetcher = DataFetcher(db,shape,60)
    tic = time.time()
    X, y, _ = fetcher.load_data()
    toc = time.time()
    im = X[12,:,15].transpose(1,2,0) + 127.0
    %matplotlib inline
    plt.imshow(im.astype('uint8'))
    print "Retrieved %d videos in %0.4f milliseconds" % (X.shape[0],1000*(toc-tic))
    print "Average retrieval time: %0.3f ms" % (1000*(toc-tic)/X.shape[0])

In [None]:
test()

In [None]:
# Build the model and load the weights from the pre-trained model. 
net = c3d_helper.build_model()
c3d_helper.set_weights(net['prob'],'./models/c3d_model.pkl')

* The block of code below fine-tunes the network. We replace the final layer of the network, which has 487 outputs for the classes of the *Sports-1M* dataset, with a new layer that classifies only two classes, i.e. walking and running.

In [None]:
# Change the final layer for our purpose 
output_layer = DenseLayer(net['fc7-1'], num_units=2, nonlinearity=softmax)

In [None]:
# Create a directory to save the results
savepath = os.path.join("results", "ucf-%04d" % 2)
if not os.path.isdir(savepath):
    os.makedirs(savepath)
else:
    print "Directory of results already exists. Please delete the directory and rerun the command."

In [None]:
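# Initialize the training parameters. Set the learning rate to be low as we do not want to drastically change 
# the weights of the pre-trained model. 
method = "momentum"   # one of: sgd, momentum, nesterov_momentum, rmsprop
lr = 0.00001          # initial learning rate
lr_decay = 0.5        # factor applied to the learning rate at each decay step
momentum = 0.5        # momentum coefficient
ephs = 6              # number of training epochs
bth = 8               # batch size
neph_dcy = 3          # number of epochs between learning rate decay steps
reg = 0.00001         # weight of the L2 regularization term
train_db = "./data/ucf-train.lmdb/"
val_db = "./data/ucf-val.lmdb/"
vid_shape = (16,112,112)  # frames, height, width of each clip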
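
As a quick sanity check on these settings, the size of one input batch can be worked out directly. The sketch below assumes three colour channels and float32 values, which is how `DataFetcher` allocates `X` by default. It counts only the input tensor, not activations or weights.

```python
bth = 8                      # batch size, as in the cell above
vid_shape = (16, 112, 112)   # frames, height, width, as in the cell above
channels = 3                 # assumption: colour channels per frame
bytes_per_value = 4          # assumption: float32 storage

values_per_batch = bth * channels * vid_shape[0] * vid_shape[1] * vid_shape[2]
mib_per_batch = values_per_batch * bytes_per_value / float(2 ** 20)
print("%d values, %.3f MiB per batch" % (values_per_batch, mib_per_batch))
```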

In [None]:
input_var = dtensor5('inputs')
target_var = T.ivector('targets') 

In [None]:
# Prediction and loss functions
prediction = lasagne.layers.get_output(output_layer,input_var)
loss = lasagne.objectives.categorical_crossentropy(prediction, target_var)
loss = loss.mean()      
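
The loss above is the mean, over the batch, of the categorical cross-entropy between the softmax output and the integer targets. A minimal pure-Python sketch of that computation for two classes follows. The scores and the 0/1 label assignment are made up for illustration, and the helper names are hypothetical.

```python
import math


def softmax(scores):
    # Subtract the maximum first for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_cross_entropy(batch_scores, targets):
    # Average of -log(probability assigned to the correct class).
    losses = []
    for scores, target in zip(batch_scores, targets):
        probs = softmax(scores)
        losses.append(-math.log(probs[target]))
    return sum(losses) / len(losses)


# Two clips, two classes: a fairly confident correct prediction
# and an undecided one.
batch_scores = [[2.0, 0.0], [0.0, 0.0]]
targets = [0, 1]
loss_value = mean_cross_entropy(batch_scores, targets)
```

The undecided clip contributes log(2) to the sum, so a two-class model that always outputs equal probabilities has a loss of about 0.693.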

In [None]:
# Add regularization
weightsl2 = regularize_network_params(output_layer,l2)
loss += weightsl2*reg
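
Lasagne's `l2` penalty is the plain sum of squared entries, with no factor of 1/2. By default `regularize_network_params` applies it only to parameters tagged as regularizable, which typically means weights but not biases. The arithmetic can be sketched in pure Python; the weight values and the data loss here are made up for illustration.

```python
def l2_penalty(weight_arrays):
    # Sum of squared entries over all (flattened) weight arrays.
    return sum(w * w for weights in weight_arrays for w in weights)


reg = 0.00001                      # same strength as in the notebook
data_loss = 0.5                    # hypothetical cross-entropy value
weights = [[1.0, -2.0], [3.0]]     # hypothetical flattened weight arrays

penalty = l2_penalty(weights)
total_loss = data_loss + reg * penalty
```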

In [None]:
# Get all the parameters
params = lasagne.layers.get_all_params(output_layer, trainable=True)

In [None]:
lr_shared = theano.shared(lasagne.utils.floatX(lr))

In [None]:
# Select the update method
if method == 'sgd':
    updates = lasagne.updates.sgd(loss, params, learning_rate=lr_shared)
    print "Training by stochastic gradient descent..."
elif method == 'momentum':
    updates = lasagne.updates.momentum(loss, params, learning_rate=lr_shared, momentum=momentum)
    print "Training by stochastic gradient descent with momentum..."
elif method == 'nesterov_momentum':
    updates = lasagne.updates.nesterov_momentum(loss, params, learning_rate=lr_shared, momentum=momentum)
    print "Training with nesterov momentum..."
elif method == 'rmsprop':
    updates = lasagne.updates.rmsprop(loss, params, learning_rate=lr_shared)
    print "Training with rmsprop..."
else: 
    raise NotImplementedError("Optimization method %s not implemented"%method)
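
For reference, the classical momentum rule selected by default keeps a velocity per parameter and adds it to the parameter at each step. The sketch below shows that rule on a scalar. `momentum_step` is a hypothetical helper, and the toy objective and step size are chosen only so the example converges quickly.

```python
def momentum_step(param, velocity, grad, learning_rate, momentum):
    # The velocity accumulates a decaying sum of past gradient steps.
    velocity = momentum * velocity - learning_rate * grad
    return param + velocity, velocity


# Minimise f(w) = w**2, whose gradient is 2*w.
w, v = 1.0, 0.0
for _ in range(200):
    w, v = momentum_step(w, v, 2.0 * w, learning_rate=0.1, momentum=0.5)
```

With `momentum=0` the rule reduces to plain stochastic gradient descent.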

In [None]:
# Test function
test_prediction = lasagne.layers.get_output(output_layer,input_var, deterministic=True)
test_loss = lasagne.objectives.categorical_crossentropy(test_prediction,target_var)
test_loss = test_loss.mean()
test_acc = T.mean(T.eq(T.argmax(test_prediction, axis=1), target_var),dtype=theano.config.floatX)

In [None]:
# Initialize the train function
train_fn = theano.function([input_var, target_var], loss, updates=updates)

In [None]:
# Initialize the validation function
val_fn = theano.function([input_var, target_var], [test_loss, test_acc])

In [None]:
# Initialize the data objects for train and validation
fetcher_train = DataFetcher(train_db, vid_shape, bth)
fetcher_val = DataFetcher(val_db, vid_shape, bth)

In [None]:
import time
# Finally, launch the training loop.
print("Starting training...")
best_val_acc = 0.0
for epoch in range(ephs):
    # In each epoch, we do a full pass over the training data:
    train_err = 0
    train_batches = 0
    start_time = time.time()
    crossed_epoch_train = False
    while not crossed_epoch_train:
        X,y,crossed_epoch_train = fetcher_train.load_data() 
        inputs, targets = X,y
        targets = targets.astype('int32')
        train_err += train_fn(inputs, targets)
        train_batches += 1

    # And a full pass over the validation data:
    val_err = 0
    val_acc = 0
    val_batches = 0
    crossed_epoch_val = False
    while not crossed_epoch_val:
        X,y, crossed_epoch_val = fetcher_val.load_data()
        inputs, targets = X,y 
        targets = targets.astype('int32')
        err, acc = val_fn(inputs, targets)
        val_err += err
        val_acc += acc
        val_batches += 1
    # Then we print the results for this epoch:
    print("Epoch {} of {} took {:.3f}s".format(
        epoch + 1, ephs, time.time() - start_time))
    print("  training loss:\t\t{:.6f}".format(train_err / train_batches))
    print("  validation loss:\t\t{:.6f}".format(val_err / val_batches))
    print("  validation accuracy:\t\t{:.2f} %".format(
        val_acc / val_batches * 100)) 
        
    # Decay the learning rate after every neph_dcy-th completed epoch
    if (epoch + 1) % neph_dcy == 0:
        new_lr = lr_shared.get_value() * lr_decay
        print "new learning rate: %f"%new_lr
        lr_shared.set_value(lasagne.utils.floatX(new_lr))

    if ((val_acc/val_batches) > best_val_acc):
        print "***Best model so far***"
        best_val_acc = (val_acc/val_batches)
        np.savez(os.path.join(savepath,'best_model.npz'), *lasagne.layers.get_all_param_values(output_layer))
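
The saved archive can later be read back in the same order. Because `np.savez` stores positional arrays under the keys `arr_0`, `arr_1`, and so on, a reload might look like the sketch below. The helper names and the temporary path are hypothetical, and small dummy arrays stand in for the network parameters.

```python
import os
import tempfile

import numpy as np


def save_param_values(path, param_values):
    # Positional arrays are stored under the keys arr_0, arr_1, ...
    np.savez(path, *param_values)


def load_param_values(path):
    # Rebuild the list in its original order from the arr_<i> keys.
    with np.load(path) as archive:
        return [archive['arr_%d' % i] for i in range(len(archive.files))]


# Dummy stand-ins for the values returned by get_all_param_values.
values = [np.ones((2, 3), dtype='float32'), np.zeros(3, dtype='float32')]
path = os.path.join(tempfile.mkdtemp(), 'best_model.npz')
save_param_values(path, values)
restored = load_param_values(path)

# With Lasagne available, the restored list could then be passed to
# lasagne.layers.set_all_param_values(output_layer, restored).
```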