# Gesture Recognition
In this group project, you are going to build a 3D Conv model that will be able to predict the 5 gestures correctly. Please import the following libraries to get started. Once you have completed the code you can download the notebook for making a submission.

In [2]:
#Importing Necessary Libraries and Modules


import numpy as np
import os
from imageio import imread
from skimage.transform import resize
import datetime
import os 

We set the random seed so that the results don't vary drastically.

In [3]:
#Setting up seed as a constant for consistent results for each run


np.random.seed(30)
import random as rn
rn.seed(30)
from tensorflow import keras
import tensorflow as tf
tf.random.set_seed(30)

In this block, you read the folder names for training and validation. You also set the `batch_size` here. Note that you set the batch size in such a way that you are able to use the GPU in full capacity. You keep increasing the batch size until the machine throws an error.

**data path: /home/datasets/Project_data**

In [4]:
#Reading the csv for train and validation datasets


train_doc = np.random.permutation(open('/home/datasets/Project_data/train.csv').readlines())
val_doc = np.random.permutation(open('/home/datasets/Project_data/val.csv').readlines())
batch_size = 20

FileNotFoundError: [Errno 2] No such file or directory: '/home/datasets/Project_data/train.csv'

## Generator
This is one of the most important part of the code. The overall structure of the generator has been given. In the generator, you are going to preprocess the images as you have images of 2 different dimensions as well as create a batch of video frames. You have to experiment with `img_idx`, `y`,`z` and normalization such that you get high accuracy.

In [5]:
#Function for defining generators which will generate batches for train and validation


def generator(source_path, folder_list, batch_size):
    print( 'Source path = ', source_path, '; batch size =', batch_size)
    img_idx = list(range(5,25))
    while True:
        t = np.random.permutation(folder_list)
        num_batches = len(folder_list)//batch_size
        for batch in range(num_batches): # we iterate over the number of batches
            batch_data = np.zeros((batch_size,20,120,120,3)) # x is the number of images you use for each video, (y,z) is the final size of the input images and 3 is the number of channels RGB
            batch_labels = np.zeros((batch_size,5)) # batch_labels is the one hot representation of the output
            for folder in range(batch_size): # iterate over the batch_size
                imgs = os.listdir(source_path+'/'+ t[folder + (batch*batch_size)].split(';')[0]) # read all the images in the folder
                for idx,item in enumerate(img_idx): #  Iterate iver the frames/images of a folder to read them in
                    image = imread(source_path+'/'+ t[folder + (batch*batch_size)].strip().split(';')[0]+'/'+imgs[item]).astype(np.float32)
                    
                    image=resize(image, (120, 120),anti_aliasing=True, preserve_range=False) #Resizing the image into 120x120
                    image = (image - np.min(image))/(np.max(image)- np.min(image)) #Normalising the image pixel values
                    
                    
                    batch_data[folder,idx,:,:,0] = image[:,:,0]
                    batch_data[folder,idx,:,:,1] = image[:,:,1]
                    batch_data[folder,idx,:,:,2] = image[:,:,2]
                    
                batch_labels[folder, int(t[folder + (batch*batch_size)].strip().split(';')[2])] = 1
            yield batch_data, batch_labels #you yield the batch_data and the batch_labels, remember what does yield do

        #Code for residual data to complete one pass on dataset
        if(len(t)%batch_size) != 0:
            batch_data = np.zeros((batch_size,20,120,120,3)) # x is the number of images you use for each video, (y,z) is the final size of the input images and 3 is the number of channels RGB
            batch_labels = np.zeros((batch_size,5)) # batch_labels is the one hot representation of the output
            for folder in range(batch_size): # iterate over the batch_size
                imgs = os.listdir(source_path+'/'+ t[folder + (batch*batch_size)].split(';')[0]) # read all the images in the folder
                for idx,item in enumerate(img_idx): #  Iterate iver the frames/images of a folder to read them in
                    image = imread(source_path+'/'+ t[folder + (batch*batch_size)].strip().split(';')[0]+'/'+imgs[item]).astype(np.float32)
                    
                    image=resize(image, (120, 120),anti_aliasing=True, preserve_range=False) #Resizing the image into 120x120
                    image = (image - np.min(image))/(np.max(image)- np.min(image)) #Normalising the image pixel values
                    
                    
                    batch_data[folder,idx,:,:,0] = image[:,:,0]
                    batch_data[folder,idx,:,:,1] = image[:,:,1]
                    batch_data[folder,idx,:,:,2] = image[:,:,2]
                    
                batch_labels[folder, int(t[folder + (batch*batch_size)].strip().split(';')[2])] = 1
            yield batch_data, batch_labels #you yield the batch_data and the batch_labels, remember what does yield do


Note here that a video is represented above in the generator as (number of images, height, width, number of channels). Take this into consideration while creating the model architecture.

In [6]:
#Defining the paths for train and validation dataset, getting current date which is used for naming h5 file


curr_dt_time = datetime.datetime.now()
train_path = '/home/datasets/Project_data/train'
val_path = '/home/datasets/Project_data/val'
num_train_sequences = len(train_doc)
print('# training sequences =', num_train_sequences)
num_val_sequences = len(val_doc)
print('# validation sequences =', num_val_sequences)
num_epochs = 20
print ('# epochs =', num_epochs)

# training sequences = 663
# validation sequences = 100
# epochs = 20


## Model
Here you make the model using different functionalities that Keras provides. Remember to use `Conv3D` and `MaxPooling3D` and not `Conv2D` and `Maxpooling2D` for a 3D convolution model. You would want to use `TimeDistributed` while building a Conv2D + RNN model. Also remember that the last layer is the softmax. Design the network in such a way that the model is able to give good accuracy on the least number of parameters so that it can fit in the memory of the webcam.

#### Conv3D+MaxPooling3D Model

In [7]:
#Making necessary imports for Conv3D and Maxpooling3D
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten, BatchNormalization, Activation, Conv3D, MaxPooling3D
from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau
from tensorflow.keras import optimizers

#Defining constants for Model
filter_params = [8, 16, 32, 64]
dense_params = [256, 128, 5]
input_shape = (20, 120, 120, 3)


#Model 
model = Sequential([
    Conv3D(filter_params[0], kernel_size=(3, 3, 3), input_shape=input_shape,padding='same', activation='relu'),
    BatchNormalization(),
    MaxPooling3D(pool_size=(2, 2, 2)),

    Conv3D(filter_params[1], kernel_size=(3, 3, 3), padding='same', activation='relu'),
    BatchNormalization(),
    MaxPooling3D(pool_size=(2, 2, 2)),

    Conv3D(filter_params[2], kernel_size=(1, 3, 3), padding='same', activation='relu'),
    BatchNormalization(),
    MaxPooling3D(pool_size=(2, 2, 2)),

    Conv3D(filter_params[3], kernel_size=(1, 3, 3), padding='same', activation='relu'),
    Dropout(0.25),
    MaxPooling3D(pool_size=(2, 2, 2)),
    
    Flatten(),

    Dense(dense_params[0], activation='relu'),
    Dropout(0.5),

    Dense(dense_params[1], activation='relu'),
    Dropout(0.5),
    
    Dense(dense_params[2], activation='softmax')
])

Now that you have written the model, the next step is to `compile` the model. When you print the `summary` of the model, you'll see the total number of parameters you have to train.

In [8]:
#Importing ADAptive Moment Optimiser
from tensorflow.keras.optimizers import Adam

optimiser = Adam()
model.compile(optimizer=optimiser, loss='categorical_crossentropy', metrics=['categorical_accuracy'])
print (model.summary())

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv3d (Conv3D)              (None, 20, 120, 120, 8)   656       
_________________________________________________________________
batch_normalization (BatchNo (None, 20, 120, 120, 8)   32        
_________________________________________________________________
max_pooling3d (MaxPooling3D) (None, 10, 60, 60, 8)     0         
_________________________________________________________________
conv3d_1 (Conv3D)            (None, 10, 60, 60, 16)    3472      
_________________________________________________________________
batch_normalization_1 (Batch (None, 10, 60, 60, 16)    64        
_________________________________________________________________
max_pooling3d_1 (MaxPooling3 (None, 5, 30, 30, 16)     0         
_________________________________________________________________
conv3d_2 (Conv3D)            (None, 5, 30, 30, 32)     4

Let us create the `train_generator` and the `val_generator` which will be used in `.fit_generator`.

In [9]:
#Using generator function to create training and test batches


train_generator = generator(train_path, train_doc, batch_size)
val_generator = generator(val_path, val_doc, batch_size)

In [10]:
#Defining model name, filepath for h5 file, checkpoint, callback and LR Reduction 


model_name = 'model_init' + '_' + str(curr_dt_time).replace(' ','').replace(':','_') + '/'
    
if not os.path.exists(model_name):
    os.mkdir(model_name)
        
filepath = model_name + 'model-{epoch:05d}-{loss:.5f}-{categorical_accuracy:.5f}-{val_loss:.5f}-{val_categorical_accuracy:.5f}.h5'

checkpoint = ModelCheckpoint(filepath, monitor='val_loss', verbose=1, save_best_only=True, save_weights_only=False, mode='auto', period=1)

LR = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=2, verbose=1)

callbacks_list = [checkpoint, LR]



The `steps_per_epoch` and `validation_steps` are used by `fit` method to decide the number of next() calls it need to make.

In [11]:
#Defining number of runs for the datasets which dependent on the batch size 


if (num_train_sequences%batch_size) == 0:
    steps_per_epoch = int(num_train_sequences/batch_size)
else:
    steps_per_epoch = (num_train_sequences//batch_size) + 1

if (num_val_sequences%batch_size) == 0:
    validation_steps = int(num_val_sequences/batch_size)
else:
    validation_steps = (num_val_sequences//batch_size) + 1

print(steps_per_epoch)
print(validation_steps)

67
10


Let us now fit the model. This will start training the model and with the help of the checkpoints, you'll be able to save the model at the end of each epoch.

In [16]:
#Fitting the Model on Train dataset and Evaluatng on Validation dataset


model.fit(train_generator, steps_per_epoch=steps_per_epoch, epochs=num_epochs, verbose=1, 
                    callbacks=callbacks_list, validation_data=val_generator, 
                    validation_steps=validation_steps, class_weight=None, workers=1, initial_epoch=0)

Epoch 1/20

Epoch 00001: val_loss improved from inf to 1.58134, saving model to model_init_2021-08-0114_55_06.574721/model-00001-1.63648-0.26716-1.58134-0.24000.h5
Epoch 2/20
Epoch 00002: val_loss did not improve from 1.58134
Epoch 3/20
Epoch 00003: val_loss did not improve from 1.58134

Epoch 00003: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
Epoch 4/20
Epoch 00004: val_loss did not improve from 1.58134
Epoch 5/20
Epoch 00005: val_loss improved from 1.58134 to 1.17051, saving model to model_init_2021-08-0114_55_06.574721/model-00005-0.92610-0.62836-1.17051-0.51000.h5
Epoch 6/20
Epoch 00006: val_loss improved from 1.17051 to 1.07876, saving model to model_init_2021-08-0114_55_06.574721/model-00006-0.77584-0.68060-1.07876-0.50000.h5
Epoch 7/20
Epoch 00007: val_loss improved from 1.07876 to 0.73564, saving model to model_init_2021-08-0114_55_06.574721/model-00007-0.65231-0.74179-0.73564-0.73000.h5
Epoch 8/20
Epoch 00008: val_loss did not improve from 0.73564
Epoch 

<tensorflow.python.keras.callbacks.History at 0x7fb13435cc50>

In [None]:
#Model Accuracy on Train : 96% 
#Model Accuracy on Validation : 87%

#### Conv2D+RNN Model

In [12]:
#Making necessary imports for Conv2D, Maxpooling2D, TimeDistributed, GRU
from tensorflow.keras.layers import Conv2D, MaxPooling2D, TimeDistributed, GRU


#Model
model2 = Sequential([
    TimeDistributed(Conv2D(filter_params[0], (3, 3), strides=(2, 2),activation='relu', padding='same'), input_shape=input_shape),

    TimeDistributed(Conv2D(filter_params[1], (3, 3),padding='same', activation='relu')),
    TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2))),

    TimeDistributed(Conv2D(filter_params[2], (3, 3),padding='same', activation='relu')),
    TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2))),

    TimeDistributed(Conv2D(filter_params[3], (2, 2),padding='same', activation='relu')),
    TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2))),

    TimeDistributed(BatchNormalization()),
    Dropout(0.25),

    TimeDistributed(Flatten()),

    Dense(dense_params[0], activation='relu'),
    Dropout(0.25),
    
    Dense(dense_params[1], activation='relu'),
    Dropout(0.25),

    GRU(128, return_sequences=False),
    Dense(dense_params[2], activation='softmax')
])

In [13]:
optimiser = Adam()
model2.compile(optimizer=optimiser, loss='categorical_crossentropy', metrics=['categorical_accuracy'])
print (model2.summary())

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
time_distributed (TimeDistri (None, 20, 60, 60, 8)     224       
_________________________________________________________________
time_distributed_1 (TimeDist (None, 20, 60, 60, 16)    1168      
_________________________________________________________________
time_distributed_2 (TimeDist (None, 20, 30, 30, 16)    0         
_________________________________________________________________
time_distributed_3 (TimeDist (None, 20, 30, 30, 32)    4640      
_________________________________________________________________
time_distributed_4 (TimeDist (None, 20, 15, 15, 32)    0         
_________________________________________________________________
time_distributed_5 (TimeDist (None, 20, 15, 15, 64)    8256      
_________________________________________________________________
time_distributed_6 (TimeDist (None, 20, 7, 7, 64)     

In [30]:
##Fitting the Model on Train dataset and Evaluatng on Validation dataset


model2.fit(train_generator, steps_per_epoch=steps_per_epoch, epochs=num_epochs, verbose=1, 
                    callbacks=callbacks_list, validation_data=val_generator, 
                    validation_steps=validation_steps, class_weight=None, workers=1, initial_epoch=0)

Epoch 1/20
Epoch 00001: val_loss did not improve from 0.28617
Epoch 2/20
Epoch 00002: val_loss did not improve from 0.28617
Epoch 3/20
Epoch 00003: val_loss did not improve from 0.28617
Epoch 4/20
Epoch 00004: val_loss did not improve from 0.28617
Epoch 5/20
Epoch 00005: val_loss did not improve from 0.28617
Epoch 6/20
Epoch 00006: val_loss did not improve from 0.28617

Epoch 00006: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
Epoch 7/20
Epoch 00007: val_loss did not improve from 0.28617
Epoch 8/20
Epoch 00008: val_loss did not improve from 0.28617
Epoch 9/20
Epoch 00009: val_loss did not improve from 0.28617
Epoch 10/20
Epoch 00010: val_loss did not improve from 0.28617

Epoch 00010: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
Epoch 11/20
Epoch 00011: val_loss did not improve from 0.28617
Epoch 12/20
Epoch 00012: val_loss did not improve from 0.28617

Epoch 00012: ReduceLROnPlateau reducing learning rate to 0.0001250000059371814.
Epoch 13/2

<tensorflow.python.keras.callbacks.History at 0x7fb115a35048>

In [14]:
#Model Accuracy on Train dataset : 100%
#Model Accuracy on Validation dataset : 70%

#### Observations on Base Models

We observe that the Base Model for Conv2D+RNN indicates presence of overfitting whereas the base Conv3D+Maxpooling3D gives better performance with a slight overfitting

### Experiments with Base Models

In [15]:
#Checking GPU usage before Experiments
!nvidia-smi

Mon Aug  2 12:07:52 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.42.01    Driver Version: 470.42.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA RTX A6000    Off  | 00000000:3D:00.0 Off |                  Off |
| 30%   49C    P2    90W / 300W |  10839MiB / 48685MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

#### Experiment 1 

##### Running both Base Models with all Frames, Batch size = 20, Window size = 2, Epochs = 10, Resizing image to 100x100

In [16]:
#Making experiment specific constants and defining custom generator for this experiment
batch_size_exp1 = 20
num_epochs_exp1 = 10
filter_params_exp1 = [8,16,32,64]
dense_params_exp1 = [256, 128, 5]
input_shape_exp1 = (30,100,100,3)

def generator_exp1(source_path, folder_list, batch_size):
    print( 'Source path = ', source_path, '; batch size =', batch_size)
    img_idx = list(range(30))
    while True:
        t = np.random.permutation(folder_list)
        num_batches = len(folder_list)//batch_size
        for batch in range(num_batches): # we iterate over the number of batches
            batch_data = np.zeros((batch_size,30,100,100,3)) # x is the number of images you use for each video, (y,z) is the final size of the input images and 3 is the number of channels RGB
            batch_labels = np.zeros((batch_size,5)) # batch_labels is the one hot representation of the output
            for folder in range(batch_size): # iterate over the batch_size
                imgs = os.listdir(source_path+'/'+ t[folder + (batch*batch_size)].split(';')[0]) # read all the images in the folder
                for idx,item in enumerate(img_idx): #  Iterate iver the frames/images of a folder to read them in
                    image = imread(source_path+'/'+ t[folder + (batch*batch_size)].strip().split(';')[0]+'/'+imgs[item]).astype(np.float32)
                    
                    image=resize(image, (100, 100),anti_aliasing=True, preserve_range=False) #Resizing the image into 120x120
                    image = (image - np.min(image))/(np.max(image)- np.min(image)) #Normalising the image pixel values
                    
                    
                    batch_data[folder,idx,:,:,0] = image[:,:,0]
                    batch_data[folder,idx,:,:,1] = image[:,:,1]
                    batch_data[folder,idx,:,:,2] = image[:,:,2]
                    
                batch_labels[folder, int(t[folder + (batch*batch_size)].strip().split(';')[2])] = 1
            yield batch_data, batch_labels #you yield the batch_data and the batch_labels, remember what does yield do

        #Code for residual data to complete one pass on dataset
        if(len(t)%batch_size) != 0:
            batch_data = np.zeros((batch_size,30,100,100,3)) # x is the number of images you use for each video, (y,z) is the final size of the input images and 3 is the number of channels RGB
            batch_labels = np.zeros((batch_size,5)) # batch_labels is the one hot representation of the output
            for folder in range(batch_size): # iterate over the batch_size
                imgs = os.listdir(source_path+'/'+ t[folder + (batch*batch_size)].split(';')[0]) # read all the images in the folder
                for idx,item in enumerate(img_idx): #  Iterate iver the frames/images of a folder to read them in
                    image = imread(source_path+'/'+ t[folder + (batch*batch_size)].strip().split(';')[0]+'/'+imgs[item]).astype(np.float32)
                    
                    image=resize(image, (100, 100),anti_aliasing=True, preserve_range=False) #Resizing the image into 120x120
                    image = (image - np.min(image))/(np.max(image)- np.min(image)) #Normalising the image pixel values
                    
                    
                    batch_data[folder,idx,:,:,0] = image[:,:,0]
                    batch_data[folder,idx,:,:,1] = image[:,:,1]
                    batch_data[folder,idx,:,:,2] = image[:,:,2]
                    
                batch_labels[folder, int(t[folder + (batch*batch_size)].strip().split(';')[2])] = 1
            yield batch_data, batch_labels #you yield the batch_data and the batch_labels, remember what does yield do


In [17]:
#Model 
model3 = Sequential([
    Conv3D(filter_params_exp1[0], kernel_size=(2, 2, 2), input_shape=input_shape_exp1,padding='same', activation='relu'),
    BatchNormalization(),
    MaxPooling3D(pool_size=(2,2,2)),

    Conv3D(filter_params_exp1[1], kernel_size=(2, 2, 2), padding='same', activation='relu'),
    BatchNormalization(),
    MaxPooling3D(pool_size=(2,2,2)),

    Conv3D(filter_params_exp1[2], kernel_size=(1, 2, 2), padding='same', activation='relu'),
    BatchNormalization(),
    MaxPooling3D(pool_size=(2, 2, 2)),

    Conv3D(filter_params_exp1[3], kernel_size=(1, 2, 2), padding='same', activation='relu'),
    Dropout(0.25),
    MaxPooling3D(pool_size=(2, 2, 2)),
    
    Flatten(),

    Dense(dense_params_exp1[0], activation='relu'),
    Dropout(0.5),

    Dense(dense_params_exp1[1], activation='relu'),
    Dropout(0.5),
    
    Dense(dense_params_exp1[2], activation='softmax')
])

In [18]:
optimiser = Adam()
model3.compile(optimizer=optimiser, loss='categorical_crossentropy', metrics=['categorical_accuracy'])
print (model3.summary())

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv3d_4 (Conv3D)            (None, 30, 100, 100, 8)   200       
_________________________________________________________________
batch_normalization_4 (Batch (None, 30, 100, 100, 8)   32        
_________________________________________________________________
max_pooling3d_4 (MaxPooling3 (None, 15, 50, 50, 8)     0         
_________________________________________________________________
conv3d_5 (Conv3D)            (None, 15, 50, 50, 16)    1040      
_________________________________________________________________
batch_normalization_5 (Batch (None, 15, 50, 50, 16)    64        
_________________________________________________________________
max_pooling3d_5 (MaxPooling3 (None, 7, 25, 25, 16)     0         
_________________________________________________________________
conv3d_6 (Conv3D)            (None, 7, 25, 25, 32)    

In [19]:
train_generator_exp1 = generator_exp1(train_path, train_doc, batch_size_exp1)
val_generator_exp1 = generator_exp1(val_path, val_doc, batch_size_exp1)

In [20]:
if (num_train_sequences%batch_size_exp1) == 0:
    steps_per_epoch_exp1 = int(num_train_sequences/batch_size_exp1)
else:
    steps_per_epoch_exp1 = (num_train_sequences//batch_size_exp1) + 1

if (num_val_sequences%batch_size) == 0:
    validation_steps_exp1 = int(num_val_sequences/batch_size_exp1)
else:
    validation_steps_exp1 = (num_val_sequences//batch_size_exp1) + 1

print(steps_per_epoch_exp1)
print(validation_steps_exp1)

34
5


In [45]:
model3.fit(train_generator_exp1, steps_per_epoch=steps_per_epoch_exp1, epochs=num_epochs_exp1, verbose=1, 
                    callbacks=callbacks_list, validation_data=val_generator_exp1, 
                    validation_steps=validation_steps_exp1, class_weight=None, workers=1, initial_epoch=0)

Source path =  /home/datasets/Project_data/train ; batch size = 20
Epoch 1/10

Epoch 00001: val_loss improved from inf to 1.60893, saving model to model_init_2021-08-0208_28_55.541983/model-00001-2.68628-0.23088-1.60893-0.17000.h5
Epoch 2/10
Epoch 00002: val_loss improved from 1.60893 to 1.60471, saving model to model_init_2021-08-0208_28_55.541983/model-00002-1.53723-0.30882-1.60471-0.22000.h5
Epoch 3/10
Epoch 00003: val_loss did not improve from 1.60471
Epoch 4/10
Epoch 00004: val_loss did not improve from 1.60471

Epoch 00004: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
Epoch 5/10
Epoch 00005: val_loss did not improve from 1.60471
Epoch 6/10
Epoch 00006: val_loss did not improve from 1.60471

Epoch 00006: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
Epoch 7/10
Epoch 00007: val_loss did not improve from 1.60471
Epoch 8/10
Epoch 00008: val_loss did not improve from 1.60471

Epoch 00008: ReduceLROnPlateau reducing learning rate to 0.00012500

<tensorflow.python.keras.callbacks.History at 0x7f4f81f37f28>

In [None]:
#Model Accuracy on Train dataset : 70%
#Model Accuracy on Validation dataset : 32%

In [21]:
#Model
model4 = Sequential([
    TimeDistributed(Conv2D(filter_params_exp1[0], (2, 2), strides=(2, 2),activation='relu', padding='same'), input_shape=input_shape_exp1),

    TimeDistributed(Conv2D(filter_params_exp1[1], (2, 2),padding='same', activation='relu')),
    TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2))),

    TimeDistributed(Conv2D(filter_params_exp1[2], (2, 2),padding='same', activation='relu')),
    TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2))),

    TimeDistributed(Conv2D(filter_params_exp1[3], (2, 2),padding='same', activation='relu')),
    TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2))),

    TimeDistributed(BatchNormalization()),
    Dropout(0.25),

    TimeDistributed(Flatten()),

    Dense(dense_params_exp1[0], activation='relu'),
    Dropout(0.25),
    
    Dense(dense_params_exp1[1], activation='relu'),
    Dropout(0.25),

    GRU(128, return_sequences=False),
    Dense(dense_params_exp1[2], activation='softmax')
])

In [22]:
optimiser = Adam()
model4.compile(optimizer=optimiser, loss='categorical_crossentropy', metrics=['categorical_accuracy'])
print (model4.summary())

Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
time_distributed_9 (TimeDist (None, 30, 50, 50, 8)     104       
_________________________________________________________________
time_distributed_10 (TimeDis (None, 30, 50, 50, 16)    528       
_________________________________________________________________
time_distributed_11 (TimeDis (None, 30, 25, 25, 16)    0         
_________________________________________________________________
time_distributed_12 (TimeDis (None, 30, 25, 25, 32)    2080      
_________________________________________________________________
time_distributed_13 (TimeDis (None, 30, 12, 12, 32)    0         
_________________________________________________________________
time_distributed_14 (TimeDis (None, 30, 12, 12, 64)    8256      
_________________________________________________________________
time_distributed_15 (TimeDis (None, 30, 6, 6, 64)     

In [48]:
model4.fit(train_generator_exp1, steps_per_epoch=steps_per_epoch, epochs=num_epochs_exp1, verbose=1, 
                    callbacks=callbacks_list, validation_data=val_generator_exp1, 
                    validation_steps=validation_steps, class_weight=None, workers=1, initial_epoch=0)

Epoch 1/10
Epoch 00001: val_loss improved from 1.60471 to 1.54130, saving model to model_init_2021-08-0208_28_55.541983/model-00001-1.24049-0.47941-1.54130-0.34000.h5
Epoch 2/10
Epoch 00002: val_loss improved from 1.54130 to 1.47270, saving model to model_init_2021-08-0208_28_55.541983/model-00002-0.77886-0.69853-1.47270-0.39000.h5
Epoch 3/10
Epoch 00003: val_loss improved from 1.47270 to 1.25394, saving model to model_init_2021-08-0208_28_55.541983/model-00003-0.51645-0.80588-1.25394-0.44000.h5
Epoch 4/10
Epoch 00004: val_loss did not improve from 1.25394
Epoch 5/10
Epoch 00005: val_loss improved from 1.25394 to 1.13712, saving model to model_init_2021-08-0208_28_55.541983/model-00005-0.13738-0.95147-1.13712-0.58000.h5
Epoch 6/10
Epoch 00006: val_loss improved from 1.13712 to 0.92880, saving model to model_init_2021-08-0208_28_55.541983/model-00006-0.04765-0.98824-0.92880-0.70000.h5
Epoch 7/10
Epoch 00007: val_loss improved from 0.92880 to 0.79324, saving model to model_init_2021-08-0

<tensorflow.python.keras.callbacks.History at 0x7f4f81f37f98>

In [None]:
#Model Accuracy on Train dataset : ~95%
#Model Accuracy on Validation dataset : 56%

#### Observations on Experiment 1

Seems like this experiment didnt yield good result, we still face a problem of overfitting. Hence we will try to make our models less complex

### Experiment 2 

##### Reducing Base Models' complexity

In [23]:
model5 = Sequential([
    Conv3D(filter_params[1], kernel_size=(2, 2, 2), input_shape=input_shape,padding='same', activation='relu'),
    BatchNormalization(),
    MaxPooling3D(pool_size=(2,2,2)),

    #Conv3D(filter_params[1], kernel_size=(2, 2, 2), padding='same', activation='relu'),
    #BatchNormalization(),
    MaxPooling3D(pool_size=(2, 2, 2)),

    Conv3D(filter_params[2], kernel_size=(1, 2, 2), padding='same', activation='relu'),
    BatchNormalization(),
    MaxPooling3D(pool_size=(2, 2, 2)),

    Conv3D(filter_params[3], kernel_size=(1, 2, 2), padding='same', activation='relu'),
    Dropout(0.25),
    MaxPooling3D(pool_size=(2, 2, 2)),
    
    Flatten(),

    Dense(dense_params[1], activation='relu'),
    Dropout(0.5),

    #Dense(dense_params[1], activation='relu'),
    #Dropout(0.5),
    
    Dense(dense_params[2], activation='softmax')
])

In [24]:
optimiser = Adam()
model5.compile(optimizer=optimiser, loss='categorical_crossentropy', metrics=['categorical_accuracy'])
print (model5.summary())

Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv3d_8 (Conv3D)            (None, 20, 120, 120, 16)  400       
_________________________________________________________________
batch_normalization_8 (Batch (None, 20, 120, 120, 16)  64        
_________________________________________________________________
max_pooling3d_8 (MaxPooling3 (None, 10, 60, 60, 16)    0         
_________________________________________________________________
max_pooling3d_9 (MaxPooling3 (None, 5, 30, 30, 16)     0         
_________________________________________________________________
conv3d_9 (Conv3D)            (None, 5, 30, 30, 32)     2080      
_________________________________________________________________
batch_normalization_9 (Batch (None, 5, 30, 30, 32)     128       
_________________________________________________________________
max_pooling3d_10 (MaxPooling (None, 2, 15, 15, 32)    

In [78]:
model5.fit(train_generator, steps_per_epoch=steps_per_epoch, epochs=num_epochs, verbose=1, 
                    callbacks=callbacks_list, validation_data=val_generator, 
                    validation_steps=validation_steps, class_weight=None, workers=1, initial_epoch=0)

Epoch 1/20
Epoch 00001: val_loss did not improve from 0.72895
Epoch 2/20
Epoch 00002: val_loss did not improve from 0.72895
Epoch 3/20
Epoch 00003: val_loss did not improve from 0.72895
Epoch 4/20
Epoch 00004: val_loss did not improve from 0.72895
Epoch 5/20
Epoch 00005: val_loss did not improve from 0.72895
Epoch 6/20
Epoch 00006: val_loss did not improve from 0.72895
Epoch 7/20
Epoch 00007: val_loss did not improve from 0.72895
Epoch 8/20
Epoch 00008: val_loss improved from 0.72895 to 0.55808, saving model to model_init_2021-08-0208_28_55.541983/model-00008-0.35939-0.87463-0.55808-0.82000.h5
Epoch 9/20
Epoch 00009: val_loss improved from 0.55808 to 0.29814, saving model to model_init_2021-08-0208_28_55.541983/model-00009-0.30106-0.88657-0.29814-0.98000.h5
Epoch 10/20
Epoch 00010: val_loss did not improve from 0.29814
Epoch 11/20
Epoch 00011: val_loss did not improve from 0.29814

Epoch 00011: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
Epoch 12/20
Epoch 00012: 

<tensorflow.python.keras.callbacks.History at 0x7f4f6b9d67b8>

In [57]:
#Model Accuracy on Train dataset : ~97%
#Model Accuracy on Validation dataset : 96%

In [32]:
#Model
model6 = Sequential([
    TimeDistributed(Conv2D(filter_params[0], (2, 2), strides=(2, 2),activation='relu', padding='same'), input_shape=input_shape),

    TimeDistributed(Conv2D(filter_params[1], (2, 2),padding='same', activation='relu')),
    TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2))),

    TimeDistributed(Conv2D(filter_params[2], (2, 2),padding='same', activation='relu')),
    TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2))),

    #TimeDistributed(Conv2D(filter_params[3], (2, 2),padding='same', activation='relu')),
    TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2))),

    TimeDistributed(BatchNormalization()),
    Dropout(0.25),

    TimeDistributed(Flatten()),

    Dense(dense_params[0], activation='relu'),
    Dropout(0.25),
    
    #Dense(dense_params[1], activation='relu'),
    #Dropout(0.25),

    GRU(dense_params[1], return_sequences=False),
    Dense(dense_params[2], activation='softmax')
])

In [33]:
optimiser = Adam()
model6.compile(optimizer=optimiser, loss='categorical_crossentropy', metrics=['categorical_accuracy'])
print (model6.summary())

Model: "sequential_7"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
time_distributed_32 (TimeDis (None, 20, 60, 60, 8)     104       
_________________________________________________________________
time_distributed_33 (TimeDis (None, 20, 60, 60, 16)    528       
_________________________________________________________________
time_distributed_34 (TimeDis (None, 20, 30, 30, 16)    0         
_________________________________________________________________
time_distributed_35 (TimeDis (None, 20, 30, 30, 32)    2080      
_________________________________________________________________
time_distributed_36 (TimeDis (None, 20, 15, 15, 32)    0         
_________________________________________________________________
time_distributed_37 (TimeDis (None, 20, 7, 7, 32)      0         
_________________________________________________________________
time_distributed_38 (TimeDis (None, 20, 7, 7, 32)     

In [40]:
model6.fit(train_generator, steps_per_epoch=steps_per_epoch, epochs=num_epochs, verbose=1, 
                    callbacks=callbacks_list, validation_data=val_generator, 
                    validation_steps=validation_steps, class_weight=None, workers=1, initial_epoch=0)

Epoch 1/20
Epoch 00001: val_loss did not improve from 0.71733
Epoch 2/20
Epoch 00002: val_loss did not improve from 0.71733
Epoch 3/20
Epoch 00003: val_loss did not improve from 0.71733
Epoch 4/20
Epoch 00004: val_loss did not improve from 0.71733
Epoch 5/20
Epoch 00005: val_loss did not improve from 0.71733
Epoch 6/20
Epoch 00006: val_loss did not improve from 0.71733
Epoch 7/20
Epoch 00007: val_loss did not improve from 0.71733

Epoch 00007: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
Epoch 8/20
Epoch 00008: val_loss did not improve from 0.71733
Epoch 9/20
Epoch 00009: val_loss did not improve from 0.71733
Epoch 10/20
Epoch 00010: val_loss did not improve from 0.71733
Epoch 11/20
Epoch 00011: val_loss did not improve from 0.71733

Epoch 00011: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
Epoch 12/20
Epoch 00012: val_loss did not improve from 0.71733
Epoch 13/20
Epoch 00013: val_loss did not improve from 0.71733

Epoch 00013: ReduceLROnPlat

<tensorflow.python.keras.callbacks.History at 0x7f7380955be0>

In [None]:
#Model Accuracy on Train dataset : 100%
#Model Accuracy on Validation dataset : 70%

#### Observations on Experiment 2

Model 5 (Conv3D+Maxpooling3D) gave the best performance overall after reducing the model complexity, whereas the Model 6 (Conv2D+RNN) overfit the train data even with reducing the complexity

### Experiment 3

##### Transfer Learning

In [59]:
from tensorflow.keras.applications import mobilenet

model7 = Sequential([
        TimeDistributed(mobilenet.MobileNet(weights='imagenet', include_top=False), input_shape=input_shape),
    
        TimeDistributed(BatchNormalization()),
        TimeDistributed(MaxPooling2D((2, 2))),
    
        TimeDistributed(Flatten()),

        GRU(dense_params[1], return_sequences=False),
        Dropout(0.25),
        
        Dense(dense_params[1],activation='relu'),
        Dropout(0.25),
        
        Dense(dense_params[2], activation='softmax')
    ])



In [60]:
optimiser = Adam()
model7.compile(optimizer=optimiser, loss='categorical_crossentropy', metrics=['categorical_accuracy'])
print (model7.summary())

Model: "sequential_15"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
time_distributed_68 (TimeDis (None, 20, 3, 3, 1024)    3228864   
_________________________________________________________________
time_distributed_69 (TimeDis (None, 20, 3, 3, 1024)    4096      
_________________________________________________________________
time_distributed_70 (TimeDis (None, 20, 1, 1, 1024)    0         
_________________________________________________________________
time_distributed_71 (TimeDis (None, 20, 1024)          0         
_________________________________________________________________
gru_12 (GRU)                 (None, 128)               443136    
_________________________________________________________________
dropout_34 (Dropout)         (None, 128)               0         
_________________________________________________________________
dense_34 (Dense)             (None, 128)             

In [61]:
model7.fit(train_generator, steps_per_epoch=steps_per_epoch, epochs=num_epochs, verbose=1, 
                    callbacks=callbacks_list, validation_data=val_generator, 
                    validation_steps=validation_steps, class_weight=None, workers=1, initial_epoch=0)

Epoch 1/20
Epoch 00001: val_loss improved from 0.71733 to 0.50663, saving model to model_init_2021-08-0212_07_23.286194/model-00001-0.86149-0.66866-0.50663-0.81000.h5
Epoch 2/20
Epoch 00002: val_loss did not improve from 0.50663
Epoch 3/20
Epoch 00003: val_loss improved from 0.50663 to 0.30205, saving model to model_init_2021-08-0212_07_23.286194/model-00003-0.25553-0.91045-0.30205-0.89000.h5
Epoch 4/20
Epoch 00004: val_loss did not improve from 0.30205
Epoch 5/20
Epoch 00005: val_loss improved from 0.30205 to 0.13884, saving model to model_init_2021-08-0212_07_23.286194/model-00005-0.14126-0.95672-0.13884-0.95000.h5
Epoch 6/20
Epoch 00006: val_loss improved from 0.13884 to 0.11505, saving model to model_init_2021-08-0212_07_23.286194/model-00006-0.21607-0.93582-0.11505-0.97000.h5
Epoch 7/20
Epoch 00007: val_loss did not improve from 0.11505
Epoch 8/20
Epoch 00008: val_loss did not improve from 0.11505

Epoch 00008: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
Epo

<tensorflow.python.keras.callbacks.History at 0x7f730ee0ea90>

In [64]:
#Model Accuracy on Train dataset : 100%
#Model Accuracy on Validation dataset : 95%

#### Observations on Experiment 3

Transfer Learning did help in getting a good accuracies on both Train (100%) and Validation (95%), but it is more complex then other models (highest number of parameters)

### Final Observations

After creating out base Models and conducting various experiments with them based on number of images per video, image cropping, images resizing, image normalizing, number of parameters and finally employing transfer learning

We will choose Model 5 (Conv3D+MaxPooling3D with reduced complexity) as our final model as it gives best performance with lower number of parameters

## Testing and Predicting on Final Model 

In [67]:
from tensorflow.keras.models import load_model
model_final = load_model('model_init_2021-08-0208_28_55.541983/model-00014-0.12511-0.95224-0.14157-0.96000.h5')

In [68]:
test_generator = generator(train_path, train_doc, batch_size)
batch_data, batch_labels=next(test_generator)
print(batch_labels)

Source path =  /home/datasets/Project_data/train ; batch size = 10
[[0. 0. 0. 0. 1.]
 [1. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1.]
 [1. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1.]
 [0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1.]]


In [69]:
print(np.argmax(model_final.predict(batch_data[:,:,:,:,:]),axis=1))

[4 0 1 1 3 4 0 4 3 4]
