# Gesture Recognition
In this group project, you are going to build a 3D Conv model that will be able to predict the 5 gestures correctly. Please import the following libraries to get started.

<h2><strong><em>Let's import all the required libraries and modules</em></strong></h2>

In [1]:
import numpy as np
import os
from scipy.misc import imread, imresize
import datetime
import os
import cv2
import matplotlib.pyplot as plt
import random as rn
from keras import backend as K
import tensorflow as tf
%matplotlib inline

Using TensorFlow backend.


In [2]:
from keras.callbacks import ModelCheckpoint, ReduceLROnPlateau
from keras.losses import categorical_crossentropy
from keras.optimizers import Adam

from keras.models import Sequential, Model
from keras.layers import Dense, GRU, Flatten, TimeDistributed, Flatten, BatchNormalization
from keras.layers import Activation, Dropout, ZeroPadding3D
from keras.layers.convolutional import Conv2D, MaxPooling3D, Conv3D, MaxPooling2D
from keras.layers.recurrent import LSTM

In [3]:
np.random.seed(30)
rn.seed(30)
tf.set_random_seed(30)

In [4]:
train_doc = np.random.permutation(open('./Project_data/train.csv').readlines())
val_doc = np.random.permutation(open('./Project_data/val.csv').readlines())

<h4><span style="color: #ff0000;">Batch Size</span></h4>

In [5]:
batch_size = 10 #experiment with the batch size

## Generator code

## We are building using a test generator class for easier understanding

In [6]:
class DataGenerator:
    def __init__(self, width=120, height=120, frames=30, channel=3, 
                 crop = True, normalize = False, affine = False, flip = False, edge = False  ):
        self.width = width   # X dimension of the image
        self.height = height # Y dimesnion of the image
        self.frames = frames # length/depth of the video frames
        self.channel = channel # number of channels in images 3 for color(RGB) and 1 for Gray  
        self.affine = affine # augment data with affine transform of the image
        self.flip = flip
        self.normalize =  normalize
        self.edge = edge # edge detection
        self.crop = crop

    # Helper function to generate a random affine transform on the image
    def __get_random_affine(self): # private method
        dx, dy = np.random.randint(-1.7, 1.8, 2)
        M = np.float32([[1, 0, dx], [0, 1, dy]])
        return M

    # Helper function to initialize all the batch image data and labels
    def __init_batch_data(self, batch_size): # private method
        batch_data = np.zeros((batch_size, self.frames, self.width, self.height, self.channel)) 
        batch_labels = np.zeros((batch_size,5)) # batch_labels is the one hot representation of the output
        return batch_data, batch_labels

    def __load_batch_images(self, source_path, folder_list, batch_num, batch_size, t): # private method
    
        batch_data,batch_labels = self.__init_batch_data(batch_size)
        # We will also build a agumented batch data
        if self.affine:
            batch_data_aug,batch_labels_aug = self.__init_batch_data(batch_size)
        if self.flip:
            batch_data_flip,batch_labels_flip = self.__init_batch_data(batch_size)

        #create a list of image numbers you want to use for a particular video
        img_idx = [x for x in range(0, self.frames)] 

        for folder in range(batch_size): # iterate over the batch_size
            # read all the images in the folder
            imgs = sorted(os.listdir(source_path+'/'+ t[folder + (batch_num*batch_size)].split(';')[0])) 
            # Generate a random affine to be used in image transformation for buidling agumented data set
            M = self.__get_random_affine()
            
            #  Iterate over the frames/images of a folder to read them in
            for idx, item in enumerate(img_idx): 
                image = cv2.imread(source_path+'/'+ t[folder + (batch_num*batch_size)].strip().split(';')[0]+'/'+imgs[item], cv2.IMREAD_COLOR)
                image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
                
                #crop the images and resize them. Note that the images are of 2 different shape 
                #and the conv3D will throw error if the inputs in a batch have different shapes  
                if self.crop:
                    image = self.__crop(image)
                # If normalize is set normalize the image else use the raw image.
                if self.normalize:
                    resized = self.__normalize(self.__resize(image))
                else:
                    resized = self.__resize(image)
                # If the input is edge detected image then use the sobelx, sobely and laplacian as 3 channel of the edge detected image
                if self.edge:
                    resized = self.__edge(resized)
                
                batch_data[folder,idx] = resized
                if self.affine:
                    batch_data_aug[folder,idx] = self.__affine(resized, M)   
                if self.flip:
                    batch_data_flip[folder,idx] = self.__flip(resized)   

            batch_labels[folder, int(t[folder + (batch_num*batch_size)].strip().split(';')[2])] = 1
            
            if self.affine:
                batch_labels_aug[folder, int(t[folder + (batch_num*batch_size)].strip().split(';')[2])] = 1
            
            if self.flip:
                if int(t[folder + (batch_num*batch_size)].strip().split(';')[2])==0:
                    batch_labels_flip[folder, 1] = 1
                elif int(t[folder + (batch_num*batch_size)].strip().split(';')[2])==1:
                    batch_labels_flip[folder, 0] = 1
                else:
                    batch_labels_flip[folder, int(t[folder + (batch_num*batch_size)].strip().split(';')[2])] = 1
        
        if self.affine:
            batch_data = np.append(batch_data, batch_data_aug, axis = 0) 
            batch_labels = np.append(batch_labels, batch_labels_aug, axis = 0) 
        if self.flip:
            batch_data = np.append(batch_data, batch_data_flip, axis = 0) 
            batch_labels = np.append(batch_labels, batch_labels_flip, axis = 0) 

        return batch_data, batch_labels
    
    def generator(self, source_path, folder_list, batch_size): # public method
        print( 'Source path = ', source_path, '; batch size =', batch_size)
        while True:
            t = np.random.permutation(folder_list)
            num_batches = len(folder_list)//batch_size # calculate the number of batches
            for batch in range(num_batches): # we iterate over the number of batches
                # you yield the batch_data and the batch_labels, remember what does yield do
                yield self.__load_batch_images(source_path, folder_list, batch, batch_size, t) 
            
            # write the code for the remaining data points which are left after full batches
            if (len(folder_list) != batch_size*num_batches):
                batch_size = len(folder_list) - (batch_size*num_batches)
                yield self.__load_batch_images(source_path, folder_list, num_batches, batch_size, t)
                
## Helper functions for image processing

    #Affine transform on the image
    def __affine(self, image, M):
        return cv2.warpAffine(image, M, (image.shape[0], image.shape[1]))

    # Flipping the image
    def __flip(self, image):
        return np.flip(image,1)
    
    # Helper function to normalise the data
    def __normalize(self, image):
        return image/127.5-1
    
    # Resizing the image
    def __resize(self, image):
        return cv2.resize(image, (self.width,self.height), interpolation = cv2.INTER_AREA)
    
    # Cropoing the image
    def __crop(self, image):
        if image.shape[0] != image.shape[1]:
            return image[0:120, 20:140]
        else:
            return image

    # Edge detection
    def __edge(self, image):
        edge = np.zeros((image.shape[0], image.shape[1], image.shape[2]))
        edge[:,:,0] = cv2.Laplacian(cv2.GaussianBlur(image[:,:,0],(3,3),0),cv2.CV_64F)
        edge[:,:,1] = cv2.Laplacian(cv2.GaussianBlur(image[:,:,1],(3,3),0),cv2.CV_64F)
        edge[:,:,2] = cv2.Laplacian(cv2.GaussianBlur(image[:,:,2],(3,3),0),cv2.CV_64F)
        return edge

## Train class

In [7]:
def train(batch_size, num_epochs, model, train_generator, val_generator, optimiser=None):

    curr_dt_time = datetime.datetime.now()

    num_train_sequences = len(train_doc)
    print('# training sequences =', num_train_sequences)
    num_val_sequences = len(val_doc)
    print('# validation sequences =', num_val_sequences)
    print('# batch size =', batch_size)    
    print('# epochs =', num_epochs)

    #write your optimizer
    if optimiser == None:
        optimiser = Adam() 
    model.compile(optimizer=optimiser, loss='categorical_crossentropy', metrics=['categorical_accuracy'])
    print (model.summary())
    
    model_name = 'model_init' + '_' + str(curr_dt_time).replace(' ','').replace(':','_') + '/'
    
    if not os.path.exists(model_name):
        os.mkdir(model_name)
            
    filepath = model_name + 'model-{epoch:05d}-{loss:.5f}-{categorical_accuracy:.5f}-{val_loss:.5f}-{val_categorical_accuracy:.5f}.h5'

    checkpoint = ModelCheckpoint(filepath, 
                                 monitor='val_loss', 
                                 verbose=1, 
                                 save_best_only=False, 
                                 save_weights_only=False, 
                                 mode='auto', 
                                 period=1)
    LR = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=2, cooldown=1, verbose=1)
    callbacks_list = [checkpoint, LR]

    if (num_train_sequences%batch_size) == 0:
        steps_per_epoch = int(num_train_sequences/batch_size)
    else:
        steps_per_epoch = (num_train_sequences//batch_size) + 1

    if (num_val_sequences%batch_size) == 0:
        validation_steps = int(num_val_sequences/batch_size)
    else:
        validation_steps = (num_val_sequences//batch_size) + 1

    model.fit_generator(train_generator, steps_per_epoch=steps_per_epoch, epochs=num_epochs, verbose=1, 
                callbacks=callbacks_list, validation_data=val_generator, 
                validation_steps=validation_steps, class_weight=None, workers=1, initial_epoch=0)
    
    K.clear_session()

## Model class

In [8]:
class ModelGenerator(object):
    
    @classmethod
    def c3d1(cls, input_shape, nb_classes):
        """
        Build a 3D convolutional network, based loosely on C3D.
            https://arxiv.org/pdf/1412.0767.pdf
        """
        # Model.
        model = Sequential()
        model.add(Conv3D(
            8, (3,3,3), activation='relu', input_shape=input_shape
        ))
        model.add(MaxPooling3D(pool_size=(1, 2, 2), strides=(1, 2, 2)))
        model.add(Conv3D(16, (3,3,3), activation='relu'))
        model.add(MaxPooling3D(pool_size=(1, 2, 2), strides=(1, 2, 2)))
        model.add(Conv3D(32, (3,3,3), activation='relu'))
        model.add(Conv3D(32, (3,3,3), activation='relu'))
        model.add(MaxPooling3D(pool_size=(1, 2, 2), strides=(1, 2, 2)))
        model.add(Conv3D(64, (2,2,2), activation='relu'))
        model.add(Conv3D(64, (2,2,2), activation='relu'))
        model.add(MaxPooling3D(pool_size=(1, 2, 2), strides=(1, 2, 2)))

        model.add(Flatten())
        model.add(Dense(512))
        model.add(Dropout(0.5))
        model.add(Dense(256))
        model.add(Dropout(0.5))
        model.add(Dense(nb_classes, activation='softmax'))

        return model
    
    @classmethod
    def c3d2(cls, input_shape, nb_classes):
        model = Sequential()
        model.add(Conv3D(16, kernel_size=(3, 3, 3), input_shape=input_shape, padding='same'))
        model.add(Activation('relu'))
        model.add(Conv3D(16, padding="same", kernel_size=(3, 3, 3)))
        model.add(Activation('relu'))
        model.add(MaxPooling3D(pool_size=(3, 3, 3), padding="same"))
        model.add(Dropout(0.25))

        model.add(Conv3D(32, padding="same", kernel_size=(3, 3, 3)))
        model.add(Activation('relu'))
        model.add(Conv3D(32, padding="same", kernel_size=(3, 3, 3)))
        model.add(Activation('relu'))
        model.add(MaxPooling3D(pool_size=(3, 3, 3), padding="same"))
        model.add(Dropout(0.25))

        model.add(Conv3D(32, padding="same", kernel_size=(3, 3, 3)))
        model.add(Activation('relu'))
        model.add(Conv3D(32, padding="same", kernel_size=(3, 3, 3)))
        model.add(Activation('relu'))
        model.add(MaxPooling3D(pool_size=(3, 3, 3), padding="same"))
        model.add(Dropout(0.25))

        model.add(Flatten())
        model.add(Dense(512, activation='relu'))
        model.add(BatchNormalization())
        model.add(Dropout(0.5))
        model.add(Dense(nb_classes, activation='softmax'))

        return model
    
    
    @classmethod
    ## CNN(Conv2D) + RNN(LSTM)
    def lrcn(cls, input_shape, nb_classes):
        model = Sequential()

        model.add(TimeDistributed(Conv2D(32, (7, 7), strides=(2, 2),
            activation='relu', padding='same'), input_shape=input_shape))
        model.add(TimeDistributed(Conv2D(32, (3,3),
            kernel_initializer="he_normal", activation='relu')))
        model.add(TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2))))

        model.add(TimeDistributed(Conv2D(64, (3,3),
            padding='same', activation='relu')))
        model.add(TimeDistributed(Conv2D(64, (3,3),
            padding='same', activation='relu')))
        model.add(TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2))))

        model.add(TimeDistributed(Conv2D(128, (3,3),
            padding='same', activation='relu')))
        model.add(TimeDistributed(Conv2D(128, (3,3),
            padding='same', activation='relu')))
        model.add(TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2))))

        model.add(TimeDistributed(Conv2D(256, (3,3),
            padding='same', activation='relu')))
        model.add(TimeDistributed(Conv2D(256, (3,3),
            padding='same', activation='relu')))
        model.add(TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2))))
        
        model.add(TimeDistributed(Conv2D(512, (3,3),
            padding='same', activation='relu')))
        model.add(TimeDistributed(Conv2D(512, (3,3),
            padding='same', activation='relu')))
        model.add(TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2))))

        model.add(TimeDistributed(Flatten()))

        model.add(Dropout(0.5))
        model.add(LSTM(256, return_sequences=False, dropout=0.5))
        model.add(Dense(nb_classes, activation='softmax'))

        return model

In [9]:
train_path = './Project_data/train'
val_path = './Project_data/val'

## Model #1

### Model 1a : Resize to 120*120,  Raw image input, No cropping, No normalisation, No agumentation, No flipped images, No edge detection

In [21]:
train_gen = DataGenerator()
val_gen = DataGenerator()
model_gen = ModelGenerator()

input_shape = (30,120,120, 3)
num_classes = 5

model = model_gen.c3d1(input_shape, num_classes)

batch_size = 20
num_epochs = 20

train_generator = train_gen.generator(train_path, train_doc, batch_size)
val_generator = val_gen.generator(val_path, val_doc, batch_size)
train(batch_size, num_epochs, model, train_generator, val_generator)

# training sequences = 663
# validation sequences = 100
# batch size = 20
# epochs = 20
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv3d_7 (Conv3D)            (None, 28, 118, 118, 8)   656       
_________________________________________________________________
max_pooling3d_5 (MaxPooling3 (None, 28, 59, 59, 8)     0         
_________________________________________________________________
conv3d_8 (Conv3D)            (None, 26, 57, 57, 16)    3472      
_________________________________________________________________
max_pooling3d_6 (MaxPooling3 (None, 26, 28, 28, 16)    0         
_________________________________________________________________
conv3d_9 (Conv3D)            (None, 24, 26, 26, 32)    13856     
_________________________________________________________________
conv3d_10 (Conv3D)           (None, 22, 24, 24, 32)    27680     
______________________________________________________

## Categorical accuracy after 20 epochs  = 0.22
- training sequences = 663
- validation sequences = 100
- batch size = 20
- epochs = 20

### Model 2: Resize to 120*120,  agumentation, flipped images, normalisation, cropping, edge detection

In [None]:
train_gen = DataGenerator(affine=True, flip=True, normalize=True, crop=True, edge=True)
val_gen = DataGenerator()
model_gen = ModelGenerator()

input_shape = (30,120,120, 3)
num_classes = 5

model = model_gen.c3d1(input_shape, num_classes)

batch_size = 20
num_epochs = 20

train_generator = train_gen.generator(train_path, train_doc, batch_size)
val_generator = val_gen.generator(val_path, val_doc, batch_size)
train(batch_size, num_epochs, model, train_generator, val_generator)

# training sequences = 663
# validation sequences = 100
# batch size = 20
# epochs = 20
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv3d_31 (Conv3D)           (None, 28, 118, 118, 8)   656       
_________________________________________________________________
max_pooling3d_21 (MaxPooling (None, 28, 59, 59, 8)     0         
_________________________________________________________________
conv3d_32 (Conv3D)           (None, 26, 57, 57, 16)    3472      
_________________________________________________________________
max_pooling3d_22 (MaxPooling (None, 26, 28, 28, 16)    0         
_________________________________________________________________
conv3d_33 (Conv3D)           (None, 24, 26, 26, 32)    13856     
_________________________________________________________________
conv3d_34 (Conv3D)           (None, 22, 24, 24, 32)    27680     
______________________________________________________

## model 3
<ul>
<li>
<h4 id="Model-3c-:-Resize-to-120*120,--agumentation,-flipped-images,-No-normalisation,-No-cropping,-No-edge-detection"><span style="color: #339966;">Resize to 120*120</span></h4>
</li>
<li>
<h4 id="Model-3c-:-Resize-to-120*120,--agumentation,-flipped-images,-No-normalisation,-No-cropping,-No-edge-detection"><span style="color: #339966;">agumentation</span></h4>
</li>
<li>
<h4 id="Model-3c-:-Resize-to-120*120,--agumentation,-flipped-images,-No-normalisation,-No-cropping,-No-edge-detection"><span style="color: #339966;">flipped images</span></h4>
</li>
<li>
<h4 id="Model-3c-:-Resize-to-120*120,--agumentation,-flipped-images,-No-normalisation,-No-cropping,-No-edge-detection"><span style="color: #339966;"> No normalisation</span></h4>
</li>
<li>
<h4 id="Model-3c-:-Resize-to-120*120,--agumentation,-flipped-images,-No-normalisation,-No-cropping,-No-edge-detection"><span style="color: #339966;"> No cropping</span></h4>
</li>
<li>
<h4 id="Model-3c-:-Resize-to-120*120,--agumentation,-flipped-images,-No-normalisation,-No-cropping,-No-edge-detection"><span style="color: #339966;">No edge detection</span></h4>
<br /><br /></li>
</ul>

In [20]:
train_gen = DataGenerator(affine=True, flip=True)
val_gen = DataGenerator()
model_gen = ModelGenerator()

input_shape = (30,120,120, 3)
num_classes = 5

model = model_gen.lrcn(input_shape, num_classes)

batch_size = 10
#num_epochs = 20

train_generator = train_gen.generator(train_path, train_doc, batch_size)
val_generator = val_gen.generator(val_path, val_doc, batch_size)
train(batch_size, num_epochs, model, train_generator, val_generator)

# training sequences = 663
# validation sequences = 100
# batch size = 10
# epochs = 20
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
time_distributed_1 (TimeDist (None, 30, 60, 60, 32)    4736      
_________________________________________________________________
time_distributed_2 (TimeDist (None, 30, 58, 58, 32)    9248      
_________________________________________________________________
time_distributed_3 (TimeDist (None, 30, 29, 29, 32)    0         
_________________________________________________________________
time_distributed_4 (TimeDist (None, 30, 29, 29, 64)    18496     
_________________________________________________________________
time_distributed_5 (TimeDist (None, 30, 29, 29, 64)    36928     
_________________________________________________________________
time_distributed_6 (TimeDist (None, 30, 14, 14, 64)    0         
______________________________________________________


Epoch 00017: saving model to model_init_2019-03-1610_25_33.308826/model-00017-1.66619-0.23217-1.65883-0.23000.h5
Epoch 18/20

Epoch 00018: saving model to model_init_2019-03-1610_25_33.308826/model-00018-1.67348-0.20398-1.65757-0.25000.h5

Epoch 00018: ReduceLROnPlateau reducing learning rate to 3.906250185536919e-06.
Epoch 19/20

Epoch 00019: saving model to model_init_2019-03-1610_25_33.308826/model-00019-1.66058-0.22388-1.66125-0.25000.h5
Epoch 20/20

Epoch 00020: saving model to model_init_2019-03-1610_25_33.308826/model-00020-1.67395-0.19237-1.66057-0.25000.h5

Epoch 00020: ReduceLROnPlateau reducing learning rate to 1.9531250927684596e-06.


# categorical accuracy of 25

## model 4
<ul>
<li>
<h4 id="Model-3c-:-Resize-to-120*120,--agumentation,-flipped-images,-No-normalisation,-No-cropping,-No-edge-detection"><span style="color: #339966;">Resize to 120*120</span></h4>
</li>
<li>
<h4 id="Model-3c-:-Resize-to-120*120,--agumentation,-flipped-images,-No-normalisation,-No-cropping,-No-edge-detection"><span style="color: #339966;">agumentation</span></h4>
</li>
<li>
<h4 id="Model-3c-:-Resize-to-120*120,--agumentation,-flipped-images,-No-normalisation,-No-cropping,-No-edge-detection"><span style="color: #339966;">flipped images</span></h4>
</li>
<li>
<h4 id="Model-3c-:-Resize-to-120*120,--agumentation,-flipped-images,-No-normalisation,-No-cropping,-No-edge-detection"><span style="color: #339966;"> No normalisation</span></h4>
</li>
<li>
<h4 id="Model-3c-:-Resize-to-120*120,--agumentation,-flipped-images,-No-normalisation,-No-cropping,-No-edge-detection"><span style="color: #339966;"> No cropping</span></h4>
</li>
<li>
<h4 id="Model-3c-:-Resize-to-120*120,--agumentation,-flipped-images,-No-normalisation,-No-cropping,-No-edge-detection"><span style="color: #339966;">No edge detection</span></h4>
<br /><br /></li>
</ul>

In [11]:
train_gen = DataGenerator(affine=True, flip=True)
val_gen = DataGenerator()
model_gen = ModelGenerator()

input_shape = (30,120,120, 3)
num_classes = 5

model = model_gen.c3d2(input_shape, num_classes)

batch_size = 10
#num_epochs = 20

train_generator = train_gen.generator(train_path, train_doc, batch_size)
val_generator = val_gen.generator(val_path, val_doc, batch_size)
train(batch_size, num_epochs, model, train_generator, val_generator)

# training sequences = 663
# validation sequences = 100
# batch size = 10
# epochs = 20
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv3d_1 (Conv3D)            (None, 30, 120, 120, 16)  1312      
_________________________________________________________________
activation_1 (Activation)    (None, 30, 120, 120, 16)  0         
_________________________________________________________________
conv3d_2 (Conv3D)            (None, 30, 120, 120, 16)  6928      
_________________________________________________________________
activation_2 (Activation)    (None, 30, 120, 120, 16)  0         
_________________________________________________________________
max_pooling3d_1 (MaxPooling3 (None, 10, 40, 40, 16)    0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 10, 40, 40, 16)    0         
______________________________________________________


Epoch 00017: saving model to model_init_2019-03-1607_40_59.356783/model-00017-1.01241-0.60697-0.72867-0.71000.h5

Epoch 00017: ReduceLROnPlateau reducing learning rate to 6.25000029685907e-05.
Epoch 18/20

Epoch 00018: saving model to model_init_2019-03-1607_40_59.356783/model-00018-0.96165-0.59370-0.75363-0.69000.h5
Epoch 19/20

Epoch 00019: saving model to model_init_2019-03-1607_40_59.356783/model-00019-0.99887-0.58209-0.66484-0.80000.h5

Epoch 00019: ReduceLROnPlateau reducing learning rate to 3.125000148429535e-05.
Epoch 20/20

Epoch 00020: saving model to model_init_2019-03-1607_40_59.356783/model-00020-0.94170-0.61028-0.66658-0.79000.h5


<h2>Categorical Validation accuracy: <span style="color: #339966;">79%</span></h2>
<ul>
<li>Total params: <strong><span style="color: #339966;">929,461</span></strong></li>
<li>Trainable params: <strong><span style="color: #339966;">928,437</span></strong><br /><br /></li>
</ul>

## Let's try to improve this model with edge detection.

In [18]:
train_gen = DataGenerator(affine=True, flip=True,edge=True)
val_gen = DataGenerator()
model_gen = ModelGenerator()

input_shape = (30,120,120, 3)
num_classes = 5

model = model_gen.c3d2(input_shape, num_classes)

batch_size = 20
num_epochs = 20

train_generator = train_gen.generator(train_path, train_doc, batch_size)
val_generator = val_gen.generator(val_path, val_doc, batch_size)
train(batch_size, num_epochs, model, train_generator, val_generator)

# training sequences = 663
# validation sequences = 100
# batch size = 20
# epochs = 20
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv3d_1 (Conv3D)            (None, 30, 120, 120, 16)  1312      
_________________________________________________________________
activation_1 (Activation)    (None, 30, 120, 120, 16)  0         
_________________________________________________________________
conv3d_2 (Conv3D)            (None, 30, 120, 120, 16)  6928      
_________________________________________________________________
activation_2 (Activation)    (None, 30, 120, 120, 16)  0         
_________________________________________________________________
max_pooling3d_1 (MaxPooling3 (None, 10, 40, 40, 16)    0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 10, 40, 40, 16)    0         
______________________________________________________


Epoch 00017: saving model to model_init_2019-03-1609_52_37.134220/model-00017-1.31849-0.48366-2.12297-0.26000.h5

Epoch 00017: ReduceLROnPlateau reducing learning rate to 6.25000029685907e-05.
Epoch 18/20

Epoch 00018: saving model to model_init_2019-03-1609_52_37.134220/model-00018-1.45869-0.41176-2.33950-0.28000.h5
Epoch 19/20

Epoch 00019: saving model to model_init_2019-03-1609_52_37.134220/model-00019-1.18008-0.52288-2.28002-0.31000.h5

Epoch 00019: ReduceLROnPlateau reducing learning rate to 3.125000148429535e-05.
Epoch 20/20

Epoch 00020: saving model to model_init_2019-03-1609_52_37.134220/model-00020-1.17611-0.53922-2.22794-0.32000.h5


### There was no improvement

<h3><span style="color: #339966;">Back to vanilla code</span></h3>

<h3>Final Model</h3>

In [22]:
# Parameters initialization
nb_rows = 120   # X dimension of the image
nb_cols = 120   # Y dimesnion of the image
#total_frames = 30
nb_frames = 30  # lenght of the video frames
nb_channel = 3 # numbe rof channels in images 3 for color(RGB) and 1 for Gray

# Helper function to generate a random affine transform on the iamge
def get_random_affine():
    dx, dy = np.random.randint(-1.7, 1.8, 2)
    M = np.float32([[1, 0, dx], [0, 1, dy]])
    return M

# Helper function to normalise the data
def normalize_data(data):
    return data/127.5-1

# Helper function to initialize all the batch image data and labels
def init_batch_data(batch_size):
    batch_data = np.zeros((batch_size, nb_frames, nb_rows, nb_cols, nb_channel)) 
    batch_labels = np.zeros((batch_size,5)) # batch_labels is the one hot representation of the output
    return batch_data, batch_labels

def load_batch_images(source_path, folder_list, batch_num, batch_size, t,validation):
    
    batch_data,batch_labels = init_batch_data(batch_size)
    
    # We will also build an augumented batch data with affine transformation
    batch_data_aug,batch_labels_aug = init_batch_data(batch_size)
    
    # We will also build an augmented batch data with horizontal flip
    batch_data_flip,batch_labels_flip = init_batch_data(batch_size)
    
    #create a list of image numbers you want to use for a particular video using full frames
    img_idx = [x for x in range(0, nb_frames)] 

    for folder in range(batch_size): # iterate over the batch_size
        # read all the images in the folder
        imgs = sorted(os.listdir(source_path+'/'+ t[folder + (batch_num*batch_size)].split(';')[0])) 
        # Generate a random affine to be used in image transformation for buidling agumented data set
        M = get_random_affine()
        
        #  Iterate over the frames/images of a folder to read them in
        for idx, item in enumerate(img_idx): 
            ## image = imread(source_path+'/'+ t[folder + (batch_num*batch_size)].strip().split(';')[0]+'/'+imgs[item]).astype(np.float32)
            image = cv2.imread(source_path+'/'+ t[folder + (batch_num*batch_size)].strip().split(';')[0]+'/'+imgs[item], cv2.IMREAD_COLOR)
            image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
            
            # Cropping non symmetric frames
            if image.shape[0] != image.shape[1]:
                image=image[0:120,20:140]
            
            #crop the images and resize them. Note that the images are of 2 different shape 
            #and the conv3D will throw error if the inputs in a batch have different shapes   
            resized = cv2.resize(image, (nb_rows,nb_cols), interpolation = cv2.INTER_AREA)
            #Normal data
            batch_data[folder,idx] = (resized)
            
            #Data with affine transformation
            batch_data_aug[folder,idx] = (cv2.warpAffine(resized, M, (resized.shape[0], resized.shape[1])))
            
            # Data with horizontal flip
            batch_data_flip[folder,idx]= np.flip(resized,1)

        batch_labels[folder, int(t[folder + (batch_num*batch_size)].strip().split(';')[2])] = 1
        batch_labels_aug[folder, int(t[folder + (batch_num*batch_size)].strip().split(';')[2])] = 1
        
        # Labeling data with horizobtal flip, right swipe becomes left swipe and viceversa
        if int(t[folder + (batch_num*batch_size)].strip().split(';')[2])==0:
                    batch_labels_flip[folder, 1] = 1
        elif int(t[folder + (batch_num*batch_size)].strip().split(';')[2])==1:
                    batch_labels_flip[folder, 0] = 1
                    
        else:
                    batch_labels_flip[folder, int(t[folder + (batch_num*batch_size)].strip().split(';')[2])] = 1
                  
    
    batch_data_final = np.append(batch_data, batch_data_aug, axis = 0)
    batch_data_final = np.append(batch_data_final, batch_data_flip, axis = 0)

    batch_labels_final = np.append(batch_labels, batch_labels_aug, axis = 0) 
    batch_labels_final = np.append(batch_labels_final, batch_labels_flip, axis = 0)
    
    if validation:
        batch_data_final=batch_data
        batch_labels_final= batch_labels
        
    return batch_data_final,batch_labels_final

def generator(source_path, folder_list, batch_size, validation=False):
    print( 'Source path = ', source_path, '; batch size =', batch_size)
    while True:
        t = np.random.permutation(folder_list)
        num_batches = len(folder_list)//batch_size # calculate the number of batches
        for batch in range(num_batches): # we iterate over the number of batches
            # you yield the batch_data and the batch_labels, remember what does yield do
            yield load_batch_images(source_path, folder_list, batch, batch_size, t,validation)
            

        
        # Code for the remaining data points which are left after full batches
        if (len(folder_list) != batch_size*num_batches):
            batch_size = len(folder_list) - (batch_size*num_batches)
            yield load_batch_images(source_path, folder_list, batch, batch_size, t,validation)


In [23]:
curr_dt_time = datetime.datetime.now()
train_path = './Project_data/train'
val_path = './Project_data/val'
num_train_sequences = len(train_doc)
print('# training sequences =', num_train_sequences)
num_val_sequences = len(val_doc)
print('# validation sequences =', num_val_sequences)
num_epochs = 10 # choose the number of epochs
print ('# epochs =', num_epochs)

# training sequences = 663
# validation sequences = 100
# epochs = 10


In [24]:
nb_filters = [8,16,32,64]
nb_dense = [256, 128, 5]

# Input
input_shape=(nb_frames,nb_rows,nb_cols,nb_channel)

# Define model
model = Sequential()

model.add(Conv3D(nb_filters[0], 
                 kernel_size=(3,3,3), 
                 input_shape=input_shape,
                 padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))

model.add(MaxPooling3D(pool_size=(2,2,2)))

model.add(Conv3D(nb_filters[1], 
                 kernel_size=(3,3,3), 
                 padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))

model.add(MaxPooling3D(pool_size=(2,2,2)))

model.add(Conv3D(nb_filters[2], 
                 kernel_size=(1,3,3), 
                 padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))

model.add(MaxPooling3D(pool_size=(2,2,2)))

model.add(Conv3D(nb_filters[3], 
                 kernel_size=(1,3,3), 
                 padding='same'))
model.add(Activation('relu'))
model.add(Dropout(0.25))

model.add(MaxPooling3D(pool_size=(2,2,2)))

#Flatten Layers
model.add(Flatten())

model.add(Dense(nb_dense[0], activation='relu'))
model.add(Dropout(0.5))

model.add(Dense(nb_dense[1], activation='relu'))
model.add(Dropout(0.5))

#softmax layer
model.add(Dense(nb_dense[2], activation='softmax'))

In [26]:
optimiser = Adam() #write your optimizer
model.compile(optimizer=optimiser, loss='categorical_crossentropy', metrics=['categorical_accuracy'])
print (model.summary())

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv3d_1 (Conv3D)            (None, 30, 120, 120, 8)   656       
_________________________________________________________________
batch_normalization_1 (Batch (None, 30, 120, 120, 8)   32        
_________________________________________________________________
activation_1 (Activation)    (None, 30, 120, 120, 8)   0         
_________________________________________________________________
max_pooling3d_1 (MaxPooling3 (None, 15, 60, 60, 8)     0         
_________________________________________________________________
conv3d_2 (Conv3D)            (None, 15, 60, 60, 16)    3472      
_________________________________________________________________
batch_normalization_2 (Batch (None, 15, 60, 60, 16)    64        
_________________________________________________________________
activation_2 (Activation)    (None, 15, 60, 60, 16)    0         
__________

Let us create the `train_generator` and the `val_generator` which will be used in `.fit_generator`.

In [27]:
train_generator = generator(train_path, train_doc, batch_size)
val_generator = generator(val_path, val_doc, batch_size,validation=True)

In [28]:
model_name = 'model_init' + '_' + str(curr_dt_time).replace(' ','').replace(':','_') + '/'
    
if not os.path.exists(model_name):
    os.mkdir(model_name)
        
filepath = model_name + 'model-{epoch:05d}-{loss:.5f}-{categorical_accuracy:.5f}-{val_loss:.5f}-{val_categorical_accuracy:.5f}.h5'

checkpoint = ModelCheckpoint(filepath, monitor='val_loss', verbose=1, save_best_only=False, save_weights_only=False, mode='auto', period=1)

# write the Reducelronplateau code here
LR = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=2, cooldown=1, verbose=1)
callbacks_list = [checkpoint, LR]

The `steps_per_epoch` and `validation_steps` are used by `fit_generator` to decide the number of next() calls it need to make.

In [29]:
if (num_train_sequences%batch_size) == 0:
    steps_per_epoch = int(num_train_sequences/batch_size)
else:
    steps_per_epoch = (num_train_sequences//batch_size) + 1

if (num_val_sequences%batch_size) == 0:
    validation_steps = int(num_val_sequences/batch_size)
else:
    validation_steps = (num_val_sequences//batch_size) + 1

Let us now fit the model. This will start training the model and with the help of the checkpoints, you'll be able to save the model at the end of each epoch.

In [30]:
batch_size = 10
num_epochs = 20
model.fit_generator(train_generator, steps_per_epoch=steps_per_epoch, epochs=num_epochs, verbose=1, 
                    callbacks=callbacks_list, validation_data=val_generator, 
                    validation_steps=validation_steps, class_weight=None, workers=1, initial_epoch=0)

Source path =  ./Project_data/val ; batch size = 10
Source path =  ./Project_data/train ; batch size = 10
Epoch 1/20

Epoch 00001: saving model to model_init_2019-03-1610_47_14.696608/model-00001-1.79432-0.26144-1.49009-0.37000.h5
Epoch 2/20

Epoch 00002: saving model to model_init_2019-03-1610_47_14.696608/model-00002-1.50551-0.38474-1.45627-0.32000.h5
Epoch 3/20

Epoch 00003: saving model to model_init_2019-03-1610_47_14.696608/model-00003-1.42493-0.43118-1.45265-0.29000.h5
Epoch 4/20

Epoch 00004: saving model to model_init_2019-03-1610_47_14.696608/model-00004-1.29612-0.47264-1.30985-0.43000.h5
Epoch 5/20

Epoch 00005: saving model to model_init_2019-03-1610_47_14.696608/model-00005-1.41648-0.41128-1.23605-0.51000.h5
Epoch 6/20

Epoch 00006: saving model to model_init_2019-03-1610_47_14.696608/model-00006-1.13529-0.57546-1.61266-0.28000.h5
Epoch 7/20

Epoch 00007: saving model to model_init_2019-03-1610_47_14.696608/model-00007-1.05605-0.56053-2.66717-0.35000.h5

Epoch 00007: Reduc

<keras.callbacks.History at 0x7f96e15a4710>

<h3>Final Model Accuracy is <span style="color: #339966;">82%</span></h3>
<h3>Afer increasing batch size we got <span style="color: #339966;">84%</span></h3>

In [31]:
batch_size = 20
num_epochs = 20

if (num_train_sequences%batch_size) == 0:
    steps_per_epoch = int(num_train_sequences/batch_size)
else:
    steps_per_epoch = (num_train_sequences//batch_size) + 1

if (num_val_sequences%batch_size) == 0:
    validation_steps = int(num_val_sequences/batch_size)
else:
    validation_steps = (num_val_sequences//batch_size) + 1
    
model.fit_generator(train_generator, steps_per_epoch=steps_per_epoch, epochs=num_epochs, verbose=1, 
                    callbacks=callbacks_list, validation_data=val_generator, 
                    validation_steps=validation_steps, class_weight=None, workers=1, initial_epoch=0)

Epoch 1/20

Epoch 00001: saving model to model_init_2019-03-1610_47_14.696608/model-00001-0.43559-0.84314-0.31854-0.86000.h5
Epoch 2/20

Epoch 00002: saving model to model_init_2019-03-1610_47_14.696608/model-00002-0.40964-0.84314-0.56767-0.78000.h5
Epoch 3/20

Epoch 00003: saving model to model_init_2019-03-1610_47_14.696608/model-00003-0.44172-0.85948-0.38684-0.82000.h5

Epoch 00003: ReduceLROnPlateau reducing learning rate to 0.0001250000059371814.
Epoch 4/20

Epoch 00004: saving model to model_init_2019-03-1610_47_14.696608/model-00004-0.29219-0.87908-0.52306-0.78000.h5
Epoch 5/20

Epoch 00005: saving model to model_init_2019-03-1610_47_14.696608/model-00005-0.25718-0.91830-0.50805-0.80000.h5

Epoch 00005: ReduceLROnPlateau reducing learning rate to 6.25000029685907e-05.
Epoch 6/20

Epoch 00006: saving model to model_init_2019-03-1610_47_14.696608/model-00006-0.23940-0.91503-0.36400-0.80000.h5
Epoch 7/20

Epoch 00007: saving model to model_init_2019-03-1610_47_14.696608/model-00007

<keras.callbacks.History at 0x7f979c8839b0>