## Problem Statement
### As data scientists at a home electronics company that manufactures state-of-the-art smart televisions, we want to develop a feature that recognises five different gestures performed by the user, so that users can control the TV without a remote. The training data consists of a few hundred videos, each categorized into one of the five classes. Each video (typically 2-3 seconds long) is divided into a sequence of 30 frames (images). The videos were recorded by various people performing one of the five gestures in front of a webcam, similar to the one the smart TV will use.

### • Thumbs up: Increase the volume.
### • Thumbs down: Decrease the volume.
### • Left swipe: 'Jump' backwards 10 seconds.
### • Right swipe: 'Jump' forward 10 seconds.
### • Stop: Pause the movie.


In [397]:
# Importing required libraries
import numpy as np
import pandas as pd
import warnings
warnings.filterwarnings('ignore')
import os
import datetime
import time
import abc
from abc import abstractmethod, ABCMeta
from sys import getsizeof
from imageio import imread
from skimage.transform import resize
import keras
from keras import backend as k
import tensorflow as tf

# Setting the seed for python random module, numpy and tf for reproducibility and debugging purpose
import random as rn
rn.seed(30)
np.random.seed(30)
tf.random.set_seed(30)

### The required libraries are imported. The same seed is passed to Python's random module, NumPy and TensorFlow so that each generates the same sequence of random numbers every time the code is executed. Without a seed, the code produces different results on each run. Fixing the seed makes the experiment reproducible, although some GPU operations can still introduce small run-to-run differences. ABCMeta and abstractmethod are used to define a common interface for a group of related classes, which matters when several classes must adhere to the same set of methods or properties. imread and resize are used for image preprocessing: reading image files and resizing them.
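### The effect of seeding can be checked with a small sketch. It uses only Python's random module and NumPy (TensorFlow is left out to keep it light), and the helper name draw_numbers is hypothetical.

```python
import random

import numpy as np


def draw_numbers(seed):
    """Seed both generators, then draw a few values from each."""
    random.seed(seed)
    np.random.seed(seed)
    py_values = [random.random() for _ in range(3)]
    np_values = np.random.permutation(10).tolist()
    return py_values, np_values


first_run = draw_numbers(30)
second_run = draw_numbers(30)

# Re-seeding with the same value replays the same random sequence
print(first_run == second_run)
print(first_run[1])
```

### The same idea applies to the shuffling of train.csv and val.csv with np.random.permutation: with a fixed seed, the shuffled order is the same on every run.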

In [398]:
# Importing other libraries for model building purpose
from keras.models import Sequential, Model
from keras.layers import Dense, Flatten, GRU, TimeDistributed, BatchNormalization, Activation, Dropout
from keras.layers import Conv2D, Conv3D, MaxPooling2D, MaxPooling3D
from keras.layers import LSTM
from keras import optimizers
from keras.callbacks import ModelCheckpoint, ReduceLROnPlateau, EarlyStopping
import cv2
import matplotlib.pyplot as plt

### The libraries above are imported for model building. The Sequential model builds a network layer by layer, and the individual layers are imported to define the NN architecture. Optimizers update the model weights during training and so control the learning process. The callbacks are applied during training to save the model when the monitored metric improves, to reduce the LR when that metric plateaus, and to stop training early when it has stopped improving. OpenCV (cv2), the open source computer vision library, is used for image processing and color space conversion.
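### As a rough illustration of how patience-based early stopping behaves, the sketch below tracks the best validation loss and stops once it has failed to improve for a set number of epochs. It is a simplified pure-Python approximation, not Keras's EarlyStopping implementation; the function name and the loss values are hypothetical.

```python
def simulate_early_stopping(val_losses, patience=10):
    """Return (stop_epoch, best_epoch) using 1-based epoch numbers.

    stop_epoch is None when training runs through every epoch.
    """
    best_loss = float("inf")
    best_epoch = None
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch


# Hypothetical validation losses: the best value occurs at epoch 2
losses = [1.00, 0.90, 0.95, 0.96, 0.97]
print(simulate_early_stopping(losses, patience=3))   # (5, 2)
print(simulate_early_stopping(losses, patience=10))  # (None, 2)
```

### A checkpoint that saves only on improvement would keep the model from the best epoch, which is why the two callbacks are usually combined.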

In [399]:
# Lets initialize the path where project data is saved
desktop_path = os.path.join(os.path.join(os.path.expanduser('~')), 'Desktop')
project_folder = os.path.join(desktop_path, 'Gesture Recognition Case Study', 'Data', 'Project_data')

class ModelBuilder(metaclass=ABCMeta):
    def initializepath(self, project_folder):
        
        self.train_doc = np.random.permutation(open(os.path.join(project_folder, 'train.csv')).readlines()) # List of training data csv
        self.val_doc = np.random.permutation(open(os.path.join(project_folder, 'val.csv')).readlines()) # List of val data csv
        
        self.train_path = os.path.join(project_folder, 'train')    # Path to training images folder
        self.val_path = os.path.join(project_folder, 'val')        # Path to val images folder
        
        self.num_train_seq = len(self.train_doc)  # Number of data seq present in train doc
        self.num_val_seq = len(self.val_doc) # Number of data seq present in val doc
    
    # Lets initialize the image properties 
    def initialize_img_prop(self, img_width=100, img_height=100):
        self.img_width = img_width
        self.img_height = img_height
        self.channels = 3  # The attribute channels with value 3 represents color channels
        self.classes = 5  # The attribute classes represents the number of classes used in the classification model
        self.total_frames = 30  # The attribute shows the total no of frames/time steps used by the model to process seq of images/video
    
    # Now lets initialize the hyperparameters to control the learning process 
    def initializehyperparm(self, sample_frames=30, batch_size=20, no_of_epochs=20):
        self.sample_frames = sample_frames  # Number of frames sampled from each video sequence
        self.batch_size = batch_size  # Number of video sequences per training batch
        self.no_of_epochs = no_of_epochs  # Number of full passes over the training data
        
        
        
    # Let's define a method that generates batches of data for training the NN
    def generator(self, source_path, folder_list, augment=False):
        img_idx = np.round(np.linspace(0, self.total_frames-1, self.sample_frames)).astype(int)
        batch_size = self.batch_size
        
        while True:
            shuffled_data = np.random.permutation(folder_list)
            total_batches = len(shuffled_data) // batch_size
            for batch in range(total_batches):
                # Pass the batch index first, then the batch size, to match one_batch_data's signature
                batch_data, batch_label = self.one_batch_data(source_path, shuffled_data,
                                                        batch, batch_size,
                                                        img_idx, augment)
                yield batch_data, batch_label
                
                
            # Number of leftover sequences (not batches) that form a final, smaller batch
            remaining_batches = len(shuffled_data) % batch_size
            if remaining_batches != 0:
                batch_data, batch_label = self.one_batch_data(source_path, shuffled_data,   
                                                          total_batches, batch_size, img_idx, augment, remaining_batches)
                yield batch_data, batch_label
            
            
            
    # Let's generate the data for one batch
    def one_batch_data(self, source_path, shuffled_data, batch, batch_size, img_idx, augment, remaining_batches=0):
        seq_len = remaining_batches if remaining_batches else self.batch_size
        batch_data = np.zeros((seq_len, len(img_idx), self.img_height, self.img_width, self.channels))
        batch_label = np.zeros((seq_len, self.classes))

        if (augment):
            batch_data_aug = np.zeros((seq_len, len(img_idx), self.img_height, self.img_width, self.channels))

        for folder in range(seq_len):
            
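            # os.listdir returns names in arbitrary order; sort so frames stay in sequence
            # (assumes the frame file names sort in temporal order)
            imgs = sorted(os.listdir(source_path + '/' + shuffled_data[folder + (batch*batch_size)].strip().split(';')[0]))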
            
            for idx, item in enumerate(img_idx):
                    
                img_read = imread(source_path + '/' + shuffled_data[folder + (batch*batch_size)].strip().split(';')[0]
                                         + '/' + imgs[item]).astype(np.float32)
                img_resize = resize(img_read, (self.img_height, self.img_width, 3))

                batch_data[folder, idx, :, :, 0] = (img_resize[:, :, 0]) / 255
                batch_data[folder, idx, :, :, 1] = (img_resize[:, :, 1]) / 255
                batch_data[folder, idx, :, :, 2] = (img_resize[:, :, 2]) / 255

                if (augment):
                    transformed_img = cv2.warpAffine(img_read, np.float32([[1, 0, np.random.randint(-30, 30)],
                                                                          [0, 1, np.random.randint(-30, 30)]]),
                                                    (img_read.shape[1], img_read.shape[0]))
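                    # imread returns RGB images, so convert from RGB (not BGR) to grayscale
                    gray_scale = cv2.cvtColor(transformed_img, cv2.COLOR_RGB2GRAY)
                    # argwhere yields (row, col) pairs: x0/x1 are row bounds, y0/y1 are column bounds
                    x0, y0 = np.argwhere(gray_scale > 0).min(axis=0)
                    x1, y1 = np.argwhere(gray_scale > 0).max(axis=0)
                    cropped_img = transformed_img[x0:x1, y0:y1, :]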
                    img_resize = resize(cropped_img, (self.img_height, self.img_width, 3))

                    batch_data_aug[folder, idx, :, :, 0] = (img_resize[:, :, 0]) / 255
                    batch_data_aug[folder, idx, :, :, 1] = (img_resize[:, :, 1]) / 255
                    batch_data_aug[folder, idx, :, :, 2] = (img_resize[:, :, 2]) / 255
                    
                               
            batch_label[folder, int(shuffled_data[folder + (batch*batch_size)].strip().split(';')[2])] = 1
    
        if (augment):
            batch_data = np.concatenate([batch_data, batch_data_aug])
            batch_label = np.concatenate([batch_label, batch_label])

        return batch_data, batch_label                                            
                                                  
        
        
    # Let's define a method that trains the NN, using the generator to load and augment data
    def train_model(self, model, augment_data=False):
        train_data_generator = self.generator(self.train_path, self.train_doc, augment=augment_data)                                            
        val_data_generator = self.generator(self.val_path, self.val_doc)         
         
        model_name = 'model_init' + '_' + str(datetime.datetime.now()).replace(' ', '').replace(':', '_') + '/'                                         
                                                     
        if not os.path.exists(model_name):
            os.mkdir(model_name)                                      
                                                     
        filepath = model_name + 'model-{epoch:05d}-{loss:.5f}-{categorical_accuracy:.5f}-{val_loss:.5f}-{val_categorical_accuracy:.5f}.h5'                                            
                                                     
        checkpoint = ModelCheckpoint(filepath, monitor='val_loss', verbose=1, save_best_only=True, save_weights_only=False,
                                     mode='auto')                                            
        LR = ReduceLROnPlateau(monitor='val_loss', verbose=1, patience=4, factor=0.2)                                            
        earlystop = EarlyStopping(monitor='val_loss', verbose=1, patience=10, min_delta=0)                                            
        callback_list = [checkpoint, LR, earlystop]
                                                     
        if (self.num_train_seq % self.batch_size) == 0:
            steps_per_epoch = int(self.num_train_seq / self.batch_size)                                      
        else:                                     
            steps_per_epoch = (self.num_train_seq // self.batch_size) + 1                                      
                                                     
        if (self.num_val_seq % self.batch_size) == 0:
            val_steps = int(self.num_val_seq / self.batch_size)                                      
        else:                                     
            val_steps = (self.num_val_seq // self.batch_size) + 1                                            
                                                     
        # Model.fit accepts Python generators directly; fit_generator is deprecated in TensorFlow 2.x
        history = model.fit(train_data_generator, steps_per_epoch=steps_per_epoch, epochs=self.no_of_epochs, verbose=1,
                            callbacks=callback_list, validation_data=val_data_generator,
                            validation_steps=val_steps, class_weight=None, initial_epoch=0)
                                                     
        return history                                            
                                                     
    @abc.abstractmethod
    def define_model(self):
        pass                                                                                        

### 1) In the first block of code we define three methods: initializepath, initialize_img_prop and initializehyperparm. The first method reads the CSV files that list the training and validation sequences, shuffles them, and sets the paths to the training and validation folders. The second method initializes the image properties: height, width, color channels, number of classes and total number of frames per video. The third method initializes the hyperparameters that control training: sample frames, batch size and number of epochs.
### 2) The second block defines the generator method, which builds each batch by calling the one_batch_data method. The generator creates the list of frame indices to sample, reads the batch size, shuffles the data and calculates the number of full batches from the length of the data and the batch size. It then iterates over the full batches and yields the data and corresponding labels for each one. Some sequences may be left over that do not fill a whole batch. If so, one_batch_data is called again with adjusted parameters and the generator yields the data and labels for that final, smaller batch.
### 3) In the third block we set the number of sequences in the batch either to the batch size or to the number of leftover sequences, if any. batch_data and batch_label are arrays that store the preprocessed image data and the one-hot labels. The code then loops over each sequence in the batch and lists the images in that sequence's folder. Each frame selected by img_idx is read and resized, and its pixel values are normalised to between 0 and 1. If data augmentation is requested, an additional, more varied copy of each sequence is produced for training. An affine transformation shifts the image horizontally and vertically by a random offset. A grayscale copy of the shifted image is used only to locate its nonzero pixels, and their bounding coordinates are used to crop the color image. The cropped image is resized, normalised and stored in batch_data_aug. For each sequence in the batch, the label is extracted from the shuffled data and the corresponding entry of batch_label is set to 1. If augmentation is enabled, the augmented data and the original batch data are concatenated, which doubles the batch.
### 4) The next block trains the NN using the data generators. The generator method produces the training and validation generators. A directory named after model_name is created to save the trained models and related info; a timestamp in the name makes each run unique. A filepath template then formats the saved model's file name, with placeholders for the epoch, training loss, training accuracy, validation loss and validation accuracy. The callback list contains a checkpoint that monitors the validation loss and saves the entire model whenever it improves (save_best_only=True), a callback that reduces the learning rate when the validation loss plateaus, and early stopping. The model is then trained with the generators, and the training history is returned.
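### Two small calculations from the blocks above can be tried in isolation: the evenly spaced frame indices built with np.linspace, and the number of steps per epoch when the last batch may be partial. This is a minimal sketch; the helper names and the sequence counts (100 and 105) are assumptions for illustration, not values from the dataset.

```python
import math

import numpy as np


def sample_frame_indices(total_frames, sample_frames):
    """Pick evenly spaced frame indices, as the generator's np.linspace call does."""
    return np.round(np.linspace(0, total_frames - 1, sample_frames)).astype(int)


def steps_per_epoch(num_sequences, batch_size):
    """Number of batches per epoch, counting a final partial batch."""
    return math.ceil(num_sequences / batch_size)


print(sample_frame_indices(30, 30))  # every frame, 0..29
print(sample_frame_indices(30, 16))  # 16 indices spread between 0 and 29
print(steps_per_epoch(100, 20))      # 5 full batches
print(steps_per_epoch(105, 20))      # 5 full batches plus one partial batch
```

### math.ceil gives the same result as the if/else on the remainder used in train_model.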

In [400]:
# Lets build a sample model architecture with different layers
class Conv3DModel(ModelBuilder):
    def define_model(self):
        model = Sequential()
        model.add(Conv3D(16,(3,3,3), padding='same', 
                         input_shape = (self.sample_frames, self.img_height, self.img_width, self.channels)))
        model.add(Activation('relu'))
        model.add(BatchNormalization())
        model.add(MaxPooling3D(pool_size=(1, 2, 2))) # Convolutional Layer
    
        model.add(Conv3D(32,(2,2,2), padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())
        model.add(MaxPooling3D(pool_size=(1, 2, 2))) # Convolutional Layer
        
        model.add(Conv3D(64,(2,2,2), padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())
        model.add(MaxPooling3D(pool_size=(1, 2, 2))) # Convolutional Layer
        
        model.add(Conv3D(128,(2,2,2), padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())
        model.add(MaxPooling3D(pool_size=(1, 2, 2))) # Convolutional Layer
        
        
        model.add(Flatten())  # Flatten the feature maps for the dense layers
        
        
        model.add(Dense(128, activation='relu'))
        model.add(BatchNormalization())
        model.add(Dropout(0.5))  # Hidden Layer
        
        model.add(Dense(64, activation='relu'))
        model.add(BatchNormalization())
        model.add(Dropout(0.25)) # Hidden Layer
        
        
        model.add(Dense(self.classes, activation='softmax')) # Output Layer
        
        optimiser = optimizers.Adam()
        model.compile(optimizer=optimiser, loss='categorical_crossentropy', metrics=['categorical_accuracy'])
        return model

### The class Conv3DModel is a child class that inherits from the parent class ModelBuilder. A Sequential model is created and layers are added to it one by one. With padding='same', the input is zero-padded so that each convolution's output has the same spatial shape as its input. Four Conv3D layers are added with 16, 32, 64 and 128 filters; the first uses a (3,3,3) kernel and the other three use (2,2,2). Each is followed by a ReLU activation, which introduces nonlinearity, and a batch normalization layer, which normalizes the output of the previous layer, speeding up training and stabilising the model. A MaxPooling3D layer with pool size (1,2,2) then halves the height and width of the feature maps while keeping the number of frames. A Flatten layer converts the feature maps into a 1D vector for the fully connected layers. Two dense layers with 128 and 64 neurons and ReLU activation follow, each with batch normalization and a dropout layer (rates 0.5 and 0.25) to reduce overfitting. Finally, a dense layer with 5 outputs and a softmax activation produces a probability for each class. The model is compiled with the Adam optimizer, the categorical crossentropy loss and the categorical accuracy metric.
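### To see how the pooling layers shrink the feature maps, the shapes can be traced by hand. The sketch below assumes 'same' padding on every convolution (shape preserved) and max pooling that floor-divides each dimension by its pool factor; the helper name trace_conv3d_shapes is hypothetical.

```python
def trace_conv3d_shapes(input_shape, filters, pool_size):
    """Follow (frames, height, width, channels) through conv + pool blocks."""
    frames, height, width, _ = input_shape
    shapes = []
    for n_filters in filters:
        # The convolution keeps frames/height/width; pooling divides them
        frames //= pool_size[0]
        height //= pool_size[1]
        width //= pool_size[2]
        shapes.append((frames, height, width, n_filters))
    return shapes


shapes = trace_conv3d_shapes((30, 160, 160, 3), [16, 32, 64, 128], (1, 2, 2))
for shape in shapes:
    print(shape)

frames, height, width, channels = shapes[-1]
print("Flatten output length:", frames * height * width * channels)  # 384000
```

### With pool size (1,2,2) the 30 frames are never reduced, so the flattened vector stays large; a pool size of (2,2,2) also halves the frame dimension at every block.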

In [401]:
# Set up the model with different parameters
sample_Conv_model = Conv3DModel()
sample_Conv_model.initializepath(project_folder)
sample_Conv_model.initialize_img_prop(img_width = 160, img_height = 160)
sample_Conv_model.initializehyperparm(sample_frames = 30, batch_size = 10, no_of_epochs = 1)
sample_Conv_model1 = sample_Conv_model.define_model()
sample_Conv_model1.summary()

Model: "sequential_112"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv3d_480 (Conv3D)         (None, 30, 160, 160, 16   1312      
                             )                                   
                                                                 
 activation_480 (Activation  (None, 30, 160, 160, 16   0         
 )                           )                                   
                                                                 
 batch_normalization_746 (B  (None, 30, 160, 160, 16   64        
 atchNormalization)          )                                   
                                                                 
 max_pooling3d_384 (MaxPool  (None, 30, 80, 80, 16)    0         
 ing3D)                                                          
                                                                 
 conv3d_481 (Conv3D)         (None, 30, 80, 80, 32) 

### The Conv3D model is set up with specific parameters. sample_Conv_model is created by instantiating the Conv3DModel class, which inherits the setup methods defined in the parent class ModelBuilder. The abstract method define_model, implemented in Conv3DModel, builds and compiles the model architecture, so every subclass exposes the same interface. The summary lists the layer types, output shapes and number of parameters.
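### The parameter counts in the summary can be reproduced by hand. A Conv3D layer has one weight per kernel position per input channel, plus one bias, for every filter; a batch normalization layer has four values per channel. The helper names below are hypothetical.

```python
def conv3d_param_count(kernel_size, in_channels, filters):
    """Weights plus one bias per filter for a Conv3D layer."""
    depth, height, width = kernel_size
    return (depth * height * width * in_channels + 1) * filters


def batchnorm_param_count(channels):
    """gamma, beta, moving mean and moving variance per channel."""
    return 4 * channels


print(conv3d_param_count((3, 3, 3), in_channels=3, filters=16))  # 1312
print(batchnorm_param_count(16))                                 # 64
```

### Both values match the first Conv3D and BatchNormalization rows of the summary above.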

In [402]:
# Experimenting with image resolution, sample frames to use and batch size
# sample_Conv_model.train_model(sample_Conv_model1)

### sample_Conv_model is an instance of the Conv3DModel class, which is a child class of ModelBuilder. The train_model method defined in ModelBuilder trains sample_Conv_model1, the compiled NN returned by define_model. The call is left commented out in the cell above.

In [403]:
# Lets trade off between these hyperparameters
# sample_Conv_model = Conv3DModel()
# sample_Conv_model.initializepath(project_folder)
# sample_Conv_model.initialize_img_prop(img_width = 100, img_height = 100)
# sample_Conv_model.initializehyperparm(sample_frames = 15, batch_size = 30, no_of_epochs = 2)
# sample_Conv_model1 = sample_Conv_model.define_model()
# print(f'Total Params: {sample_Conv_model1.count_params()}')
# sample_Conv_model.train_model(sample_Conv_model1)

In [404]:
# Lets trade off between these hyperparameters
# sample_Conv_model = Conv3DModel()
# sample_Conv_model.initializepath(project_folder)
# sample_Conv_model.initialize_img_prop(img_width = 100, img_height = 100)
# sample_Conv_model.initializehyperparm(sample_frames = 30, batch_size = 20, no_of_epochs = 2)
# sample_Conv_model1 = sample_Conv_model.define_model()
# print(f'Total Params: {sample_Conv_model1.count_params()}')
# sample_Conv_model.train_model(sample_Conv_model1)

In [405]:
# Lets trade off between these hyperparameters
# sample_Conv_model = Conv3DModel()
# sample_Conv_model.initializepath(project_folder)
# sample_Conv_model.initialize_img_prop(img_width = 160, img_height = 160)
# sample_Conv_model.initializehyperparm(sample_frames = 30, batch_size = 15, no_of_epochs = 2)
# sample_Conv_model1 = sample_Conv_model.define_model()
# print(f'Total Params: {sample_Conv_model1.count_params()}')
# sample_Conv_model.train_model(sample_Conv_model1)

In [406]:
# Lets trade off between these hyperparameters
# sample_Conv_model = Conv3DModel()
# sample_Conv_model.initializepath(project_folder)
# sample_Conv_model.initialize_img_prop(img_width = 160, img_height = 160)
# sample_Conv_model.initializehyperparm(sample_frames = 16, batch_size = 30, no_of_epochs = 2)
# sample_Conv_model1 = sample_Conv_model.define_model()
# print(f'Total Params: {sample_Conv_model1.count_params()}')
# sample_Conv_model.train_model(sample_Conv_model1)

### From the above experiments we can see that the image resolution and the number of sample frames per sequence have more impact on training time than the batch size. We will use batch sizes from 15 to 40 and change the image resolution from (160,160) to (120,120) depending on model performance.
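### One rough proxy for why resolution and frame count matter is the amount of input data per sequence. The sketch below only counts input values and the size of the batch array (np.zeros defaults to float64); it does not measure training time, and the helper names are hypothetical.

```python
import numpy as np


def input_values_per_sequence(sample_frames, img_height, img_width, channels=3):
    """Number of pixel values the network receives for one video sequence."""
    return sample_frames * img_height * img_width * channels


def batch_megabytes(batch_size, values_per_sequence, dtype=np.float64):
    """Size of one input batch array in MB for the given dtype."""
    return batch_size * values_per_sequence * np.dtype(dtype).itemsize / 1e6


# (sample_frames, image size, batch_size) from the four experiment cells above
experiments = [(15, 100, 30), (30, 100, 20), (30, 160, 15), (16, 160, 30)]

for frames, size, batch in experiments:
    values = input_values_per_sequence(frames, size, size)
    print(f"frames={frames:2d} size={size} batch={batch}: "
          f"{values:>9,} values per sequence, "
          f"{batch_megabytes(batch, values):.1f} MB per batch")
```

### Per-sequence input grows with the square of the image side and linearly with the frame count, while the batch size only changes how many sequences are processed together.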

# Model 1: Image Resolution: (160,160), Sample Frames: 20, Batch Size: 40, No of Epochs: 15

In [407]:
# Lets build model 1 architecture with different layers
class Conv3DModel(ModelBuilder):
    def define_model(self, filter_size=(3,3,3), dens_nuerons=64, dropout=0.25):
        model = Sequential()
        model.add(Conv3D(16,filter_size, padding='same', 
                         input_shape = (self.sample_frames, self.img_height, self.img_width, self.channels)))
        model.add(Activation('relu'))
        model.add(BatchNormalization())
        model.add(MaxPooling3D(pool_size=(2, 2, 2))) # Convolutional Layer
    
        model.add(Conv3D(32,filter_size, padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())
        model.add(MaxPooling3D(pool_size=(2, 2, 2))) # Convolutional Layer
        
        model.add(Conv3D(64,filter_size, padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())
        model.add(MaxPooling3D(pool_size=(2, 2, 2))) # Convolutional Layer
        
        model.add(Conv3D(128,filter_size, padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())
        model.add(MaxPooling3D(pool_size=(2, 2, 2))) # Convolutional Layer
        
        
        model.add(Flatten())  # Flatten the feature maps for the dense layers
        
        
        model.add(Dense(dens_nuerons, activation='relu'))
        model.add(BatchNormalization())
        model.add(Dropout(dropout))  # Hidden Layer
        
        model.add(Dense(dens_nuerons, activation='relu'))
        model.add(BatchNormalization())
        model.add(Dropout(dropout)) # Hidden Layer
        
        
        model.add(Dense(self.classes, activation='softmax')) # Output Layer
        
        optimiser = optimizers.Adam()
        model.compile(optimizer=optimiser, loss='categorical_crossentropy', metrics=['categorical_accuracy'])
        return model

In [408]:
# Set up the model 1 with different parameters
Conv_model = Conv3DModel()
Conv_model.initializepath(project_folder)
Conv_model.initialize_img_prop(img_width = 160, img_height = 160)
Conv_model.initializehyperparm(sample_frames = 20, batch_size = 40, no_of_epochs = 15) 
Conv_model1 = Conv_model.define_model()
Conv_model1.summary()

Model: "sequential_113"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv3d_484 (Conv3D)         (None, 20, 160, 160, 16   1312      
                             )                                   
                                                                 
 activation_484 (Activation  (None, 20, 160, 160, 16   0         
 )                           )                                   
                                                                 
 batch_normalization_752 (B  (None, 20, 160, 160, 16   64        
 atchNormalization)          )                                   
                                                                 
 max_pooling3d_388 (MaxPool  (None, 10, 80, 80, 16)    0         
 ing3D)                                                          
                                                                 
 conv3d_485 (Conv3D)         (None, 10, 80, 80, 32) 

In [409]:
# Let's check the total parameter count and then train the model
# print(f'Total Param: {Conv_model1.count_params()}')
# model1_hist = Conv_model.train_model(Conv_model1)

# Model 2: Image Resolution: (160,160), Sample Frames: 20, Batch Size: 20, No of Epochs: 25
## Increasing the dropout rate to 0.5 (with 256 dense neurons and data augmentation) to reduce overfitting

In [410]:
# Set up the model 2 with different parameters
Conv_model1 = Conv3DModel()
Conv_model1.initializepath(project_folder)
Conv_model1.initialize_img_prop(img_width = 160, img_height = 160)
Conv_model1.initializehyperparm(sample_frames = 20, batch_size = 20, no_of_epochs = 25) 
Conv_model2 = Conv_model1.define_model(dens_nuerons=256, dropout=0.5)
Conv_model2.summary()

Model: "sequential_114"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv3d_488 (Conv3D)         (None, 20, 160, 160, 16   1312      
                             )                                   
                                                                 
 activation_488 (Activation  (None, 20, 160, 160, 16   0         
 )                           )                                   
                                                                 
 batch_normalization_758 (B  (None, 20, 160, 160, 16   64        
 atchNormalization)          )                                   
                                                                 
 max_pooling3d_392 (MaxPool  (None, 10, 80, 80, 16)    0         
 ing3D)                                                          
                                                                 
 conv3d_489 (Conv3D)         (None, 10, 80, 80, 32) 

In [411]:
# Let's check the total parameter count and then train the model
# print(f'Total Param: {Conv_model2.count_params()}')
# model2_hist = Conv_model1.train_model(Conv_model2, augment_data=True)

# Model 3: Image Resolution: (120,120), Sample Frames: 16, Batch Size: 30, No of Epochs: 25
## Reduce the filter size to (2,2,2) to capture less complex features, reduce model complexity and lower the memory requirements; reduce the learning rate to 0.0002 to reduce overfitting

In [412]:
# Lets build model 3 architecture with reduced LR
class Conv3DModel(ModelBuilder):
    def define_model(self, filter_size=(3,3,3), dens_nuerons=64, dropout=0.25):
        model = Sequential()
        model.add(Conv3D(16,filter_size, padding='same', 
                         input_shape = (self.sample_frames, self.img_height, self.img_width, self.channels)))
        model.add(Activation('relu'))
        model.add(BatchNormalization())
        model.add(MaxPooling3D(pool_size=(2, 2, 2))) # Convolutional Layer
    
        model.add(Conv3D(32,filter_size, padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())
        model.add(MaxPooling3D(pool_size=(2, 2, 2))) # Convolutional Layer
        
        model.add(Conv3D(64,filter_size, padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())
        model.add(MaxPooling3D(pool_size=(2, 2, 2))) # Convolutional Layer
        
        model.add(Conv3D(128,filter_size, padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())
        model.add(MaxPooling3D(pool_size=(2, 2, 2))) # Convolutional Layer
        
        
        model.add(Flatten())  # Flatten the feature maps for the dense layers
        
        
        model.add(Dense(dens_nuerons, activation='relu'))
        model.add(BatchNormalization())
        model.add(Dropout(dropout))  # Hidden Layer
        
        model.add(Dense(dens_nuerons, activation='relu'))
        model.add(BatchNormalization())
        model.add(Dropout(dropout)) # Hidden Layer
        
        
        model.add(Dense(self.classes, activation='softmax')) # Output Layer
        
        optimiser = optimizers.Adam(learning_rate=0.0002)
        model.compile(optimizer=optimiser, loss='categorical_crossentropy', metrics=['categorical_accuracy'])
        return model

In [413]:
# Set up the model 3 with different parameters
Conv_model2 = Conv3DModel()
Conv_model2.initializepath(project_folder)
Conv_model2.initialize_img_prop(img_width = 120, img_height = 120)
Conv_model2.initializehyperparm(sample_frames = 16, batch_size = 30, no_of_epochs = 25) 
Conv_model3 = Conv_model2.define_model(filter_size=(2,2,2), dens_nuerons=256, dropout=0.5)
Conv_model3.summary()

Model: "sequential_115"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv3d_492 (Conv3D)         (None, 16, 120, 120, 16   400       
                             )                                   
                                                                 
 activation_492 (Activation  (None, 16, 120, 120, 16   0         
 )                           )                                   
                                                                 
 batch_normalization_764 (B  (None, 16, 120, 120, 16   64        
 atchNormalization)          )                                   
                                                                 
 max_pooling3d_396 (MaxPool  (None, 8, 60, 60, 16)     0         
 ing3D)                                                          
                                                                 
 conv3d_493 (Conv3D)         (None, 8, 60, 60, 32)  

In [414]:
# Let's check the total parameter count and then train the model
# print(f'Total Param: {Conv_model3.count_params()}')
# model3_hist = Conv_model2.train_model(Conv_model3, augment_data=True)

### By reducing the filter size (and the input resolution), the total parameter count is about half that of Model 2. Let's add more layers.
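### One way to see where the parameter difference between Model 2 and Model 3 comes from is to compute the length of the flattened vector and the size of the first dense layer for each configuration. This is a sketch assuming four (2,2,2) poolings with floor division and 128 filters in the last convolution; the helper name is hypothetical.

```python
def first_dense_params(sample_frames, img_size, dense_neurons,
                       last_filters=128, n_pools=4):
    """Return (flatten_length, first_dense_layer_params)."""
    frames, side = sample_frames, img_size
    for _ in range(n_pools):
        frames //= 2
        side //= 2
    flatten_length = frames * side * side * last_filters
    # One weight per input value plus one bias, for every neuron
    return flatten_length, (flatten_length + 1) * dense_neurons


print(first_dense_params(20, 160, 256))  # Model 2 settings: (12800, 3277056)
print(first_dense_params(16, 120, 256))  # Model 3 settings: (6272, 1605888)
```

### Under these assumptions the first dense layer holds most of the parameters in both models, so the smaller flattened vector accounts for a large share of the reduction.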

# Model 4: Image Resolution: (120,120), Sample Frames: 16, Batch Size: 20, No of Epochs: 25
## Adding more convolutional layers with earlier filter size

In [415]:
# Let's build the model 4 architecture by adding more layers
class Conv3DModel(ModelBuilder):
    def define_model(self, filter_size=(3,3,3), dens_nuerons=64, dropout=0.25):
        model = Sequential()
        model.add(Conv3D(16,filter_size, padding='same', 
                         input_shape = (self.sample_frames, self.img_height, self.img_width, self.channels)))
        model.add(Activation('relu'))
        model.add(BatchNormalization())             # Convolutional Layer
        
        model.add(Conv3D(16,filter_size, padding='same'))  # input_shape is only needed on the first layer
        model.add(Activation('relu'))
        model.add(BatchNormalization())
        
        model.add(MaxPooling3D(pool_size=(2, 2, 2))) # Convolutional Layer
    
        model.add(Conv3D(32,filter_size, padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())              # Convolutional Layer
        
        model.add(Conv3D(32,filter_size, padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())
        
        model.add(MaxPooling3D(pool_size=(2, 2, 2))) # Convolutional Layer
        
        model.add(Conv3D(64,filter_size, padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())              # Convolutional Layer
        
        model.add(Conv3D(64,filter_size, padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())
        
        model.add(MaxPooling3D(pool_size=(2, 2, 2))) # Convolutional Layer
        
        model.add(Conv3D(128,filter_size, padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())              # Convolutional Layer
        
        model.add(Conv3D(128,filter_size, padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())
        
        model.add(MaxPooling3D(pool_size=(2, 2, 2))) # Convolutional Layer
        
        
        model.add(Flatten())  # Flatten the feature maps before the dense layers
        
        
        model.add(Dense(dens_nuerons, activation='relu'))
        model.add(BatchNormalization())
        model.add(Dropout(dropout))  # Hidden Layer
        
        model.add(Dense(dens_nuerons, activation='relu'))
        model.add(BatchNormalization())
        model.add(Dropout(dropout)) # Hidden Layer
        
        
        model.add(Dense(self.classes, activation='softmax')) # Output Layer
        
        optimiser = optimizers.Adam()
        model.compile(optimizer=optimiser, loss='categorical_crossentropy', metrics=['categorical_accuracy'])
        return model

In [416]:
# Set up the model 4 with different parameters
Conv_model3 = Conv3DModel()
Conv_model3.initializepath(project_folder)
Conv_model3.initialize_img_prop(img_width = 120, img_height = 120)
Conv_model3.initializehyperparm(sample_frames = 16, batch_size = 20, no_of_epochs = 25) 
Conv_model4 = Conv_model3.define_model(filter_size=(3,3,3), dens_nuerons=256, dropout=0.5)
Conv_model4.summary()

Model: "sequential_116"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv3d_496 (Conv3D)         (None, 16, 120, 120, 16   1312      
                             )                                   
                                                                 
 activation_496 (Activation  (None, 16, 120, 120, 16   0         
 )                           )                                   
                                                                 
 batch_normalization_770 (B  (None, 16, 120, 120, 16   64        
 atchNormalization)          )                                   
                                                                 
 conv3d_497 (Conv3D)         (None, 16, 120, 120, 16   6928      
                             )                                   
                                                                 
 activation_497 (Activation  (None, 16, 120, 120, 16

In [417]:
# Let's check the total number of parameters and train the model
# print(f'Total Param: {Conv_model4.count_params()}')
# model4_hist = Conv_model3.train_model(Conv_model4, augment_data=True)

# Model 5: Image Resolution: (120,120), Sample Frames: 16, Batch Size: 20, No of Epochs: 15
## Add a dropout layer after every convolutional block to reduce overfitting

In [418]:
# Let's build the model 5 architecture with added dropout layers
class Conv3DModel(ModelBuilder):
    def define_model(self, filter_size=(3,3,3), dens_nuerons=64, dropout=0.25):
        model = Sequential()
        model.add(Conv3D(16,filter_size, padding='same', 
                         input_shape = (self.sample_frames, self.img_height, self.img_width, self.channels)))
        model.add(Activation('relu'))
        model.add(BatchNormalization())             # Convolutional Layer
        
        model.add(Conv3D(16,filter_size, padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())
        
        model.add(MaxPooling3D(pool_size=(2, 2, 2))) # Convolutional Layer
        model.add(Dropout(dropout))                  # Dropout Layer
    
        model.add(Conv3D(32,filter_size, padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())              # Convolutional Layer
        
        model.add(Conv3D(32,filter_size, padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())
        
        model.add(MaxPooling3D(pool_size=(2, 2, 2))) # Convolutional Layer
        model.add(Dropout(dropout))                  # Dropout Layer
        
        model.add(Conv3D(64,filter_size, padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())              # Convolutional Layer
        
        model.add(Conv3D(64,filter_size, padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())
        
        model.add(MaxPooling3D(pool_size=(2, 2, 2))) # Convolutional Layer
        model.add(Dropout(dropout))                  # Dropout Layer
        
        model.add(Conv3D(128,filter_size, padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())              # Convolutional Layer
        
        model.add(Conv3D(128,filter_size, padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())
        
        model.add(MaxPooling3D(pool_size=(2, 2, 2))) # Convolutional Layer
        model.add(Dropout(dropout))                  # Dropout Layer
        
        
        model.add(Flatten())  # Flatten the feature maps before the dense layers
        
        
        model.add(Dense(dens_nuerons, activation='relu'))
        model.add(BatchNormalization())
        model.add(Dropout(dropout))  # Hidden Layer
        
        model.add(Dense(dens_nuerons, activation='relu'))
        model.add(BatchNormalization())
        model.add(Dropout(dropout)) # Hidden Layer
        
        
        model.add(Dense(self.classes, activation='softmax')) # Output Layer
        
        optimiser = optimizers.Adam()
        model.compile(optimizer=optimiser, loss='categorical_crossentropy', metrics=['categorical_accuracy'])
        return model

In [419]:
# Set up the model 5 with different parameters
Conv_model4 = Conv3DModel()
Conv_model4.initializepath(project_folder)
Conv_model4.initialize_img_prop(img_width = 120, img_height = 120)
Conv_model4.initializehyperparm(sample_frames = 16, batch_size = 20, no_of_epochs = 15) 
Conv_model5 = Conv_model4.define_model(filter_size=(3,3,3), dens_nuerons=256, dropout=0.25)
Conv_model5.summary()

Model: "sequential_117"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv3d_504 (Conv3D)         (None, 16, 120, 120, 16   1312      
                             )                                   
                                                                 
 activation_504 (Activation  (None, 16, 120, 120, 16   0         
 )                           )                                   
                                                                 
 batch_normalization_780 (B  (None, 16, 120, 120, 16   64        
 atchNormalization)          )                                   
                                                                 
 conv3d_505 (Conv3D)         (None, 16, 120, 120, 16   6928      
                             )                                   
                                                                 
 activation_505 (Activation  (None, 16, 120, 120, 16

In [420]:
# Let's check the total number of parameters and train the model
# print(f'Total Param: {Conv_model5.count_params()}')
# model5_hist = Conv_model4.train_model(Conv_model5, augment_data=True)

### The model does not seem to generalize well. Let's reduce the total number of parameters and check the performance.

# Model 6: Image Resolution: (100,100), Sample Frames: 16, Batch Size: 20, No of Epochs: 20
## Reduce the filter size again to lower model complexity. Also reduce the number of neurons in the dense layers

In [421]:
# Let's build the model 6 architecture with a reduced filter size
class Conv3DModel(ModelBuilder):
    def define_model(self, dens_nuerons=64, dropout=0.25):
        model = Sequential()
        model.add(Conv3D(16,(3,3,3), padding='same', 
                         input_shape = (self.sample_frames, self.img_height, self.img_width, self.channels)))
        model.add(Activation('relu'))
        model.add(BatchNormalization())
        model.add(MaxPooling3D(pool_size=(2, 2, 2))) # Convolutional Layer
    
        model.add(Conv3D(32,(2,2,2), padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())
        model.add(MaxPooling3D(pool_size=(2, 2, 2))) # Convolutional Layer
        
        model.add(Conv3D(64,(2,2,2), padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())
        model.add(MaxPooling3D(pool_size=(2, 2, 2))) # Convolutional Layer
        
        model.add(Conv3D(128,(2,2,2), padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())
        model.add(MaxPooling3D(pool_size=(2, 2, 2))) # Convolutional Layer
        
        
        model.add(Flatten())  # Flatten the feature maps before the dense layers
        
        
        model.add(Dense(dens_nuerons, activation='relu'))
        model.add(BatchNormalization())
        model.add(Dropout(dropout))  # Hidden Layer
        
        model.add(Dense(dens_nuerons, activation='relu'))
        model.add(BatchNormalization())
        model.add(Dropout(dropout)) # Hidden Layer
        
        
        model.add(Dense(self.classes, activation='softmax')) # Output Layer
        
        optimiser = optimizers.Adam(learning_rate=0.0002)
        model.compile(optimizer=optimiser, loss='categorical_crossentropy', metrics=['categorical_accuracy'])
        return model

In [422]:
# Set up the model 6 with different parameters
Conv_model5 = Conv3DModel()
Conv_model5.initializepath(project_folder)
Conv_model5.initialize_img_prop(img_width = 100, img_height = 100)
Conv_model5.initializehyperparm(sample_frames = 16, batch_size = 20, no_of_epochs = 20) 
Conv_model6 = Conv_model5.define_model(dens_nuerons=128, dropout=0.25)
Conv_model6.summary()

Model: "sequential_118"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv3d_512 (Conv3D)         (None, 16, 100, 100, 16   1312      
                             )                                   
                                                                 
 activation_512 (Activation  (None, 16, 100, 100, 16   0         
 )                           )                                   
                                                                 
 batch_normalization_790 (B  (None, 16, 100, 100, 16   64        
 atchNormalization)          )                                   
                                                                 
 max_pooling3d_408 (MaxPool  (None, 8, 50, 50, 16)     0         
 ing3D)                                                          
                                                                 
 conv3d_513 (Conv3D)         (None, 8, 50, 50, 32)  

In [423]:
# Let's check the total number of parameters and train the model
# print(f'Total Param: {Conv_model6.count_params()}')
# model6_hist = Conv_model5.train_model(Conv_model6, augment_data=True)

### After reducing the number of parameters, we get the best validation accuracy.
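
Much of the parameter count in these models sits in the first dense layer, whose input size depends on the flattened output of the last pooling layer. As a rough check, the sketch below assumes each `MaxPooling3D(pool_size=(2, 2, 2))` halves every dimension with floor division (consistent with the 16x100x100 to 8x50x50 step in the summary above) and that the `padding='same'` convolutions keep the size unchanged. `flattened_size` and `dense_params` are hypothetical helpers, not part of this notebook.

```python
def flattened_size(frames, height, width, last_filters, n_pools=4):
    """Units produced by Flatten after n_pools rounds of 2x2x2 max pooling."""
    for _ in range(n_pools):
        frames, height, width = frames // 2, height // 2, width // 2
    return frames * height * width * last_filters

def dense_params(inputs, units):
    """Weights plus biases of a fully connected layer."""
    return inputs * units + units

# Model 6 setup: 16 frames of 100x100, 128 filters in the last block, 128 dense neurons
flat_100 = flattened_size(16, 100, 100, 128)
print(flat_100, dense_params(flat_100, 128))  # 4608 589952

# Same architecture at 120x120 input with 64 dense neurons
flat_120 = flattened_size(16, 120, 120, 128)
print(flat_120, dense_params(flat_120, 64))   # 6272 401472
```

Raising the resolution increases the flattened size, so halving the dense width is one way to keep the first dense layer from growing.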

# Model 7: Image Resolution: (120,120), Sample Frames: 16, Batch Size: 20, No of Epochs: 25
## Increase the image resolution and the number of epochs, and reduce the number of neurons in the dense layers to 64

In [424]:
# Let's build the model 7 architecture with a reduced filter size
class Conv3DModel(ModelBuilder):
    def define_model(self, dens_nuerons=64, dropout=0.25):
        model = Sequential()
        model.add(Conv3D(16,(3,3,3), padding='same', 
                         input_shape = (self.sample_frames, self.img_height, self.img_width, self.channels)))
        model.add(Activation('relu'))
        model.add(BatchNormalization())
        model.add(MaxPooling3D(pool_size=(2, 2, 2))) # Convolutional Layer
    
        model.add(Conv3D(32,(3,3,3), padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())
        model.add(MaxPooling3D(pool_size=(2, 2, 2))) # Convolutional Layer
        
        model.add(Conv3D(64,(2,2,2), padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())
        model.add(MaxPooling3D(pool_size=(2, 2, 2))) # Convolutional Layer
        
        model.add(Conv3D(128,(2,2,2), padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())
        model.add(MaxPooling3D(pool_size=(2, 2, 2))) # Convolutional Layer
        
        
        model.add(Flatten())  # Flatten the feature maps before the dense layers
        
        
        model.add(Dense(dens_nuerons, activation='relu'))
        model.add(BatchNormalization())
        model.add(Dropout(dropout))  # Hidden Layer
        
        model.add(Dense(dens_nuerons, activation='relu'))
        model.add(BatchNormalization())
        model.add(Dropout(dropout)) # Hidden Layer
        
        
        model.add(Dense(self.classes, activation='softmax')) # Output Layer
        
        optimiser = optimizers.Adam(learning_rate=0.0002)
        model.compile(optimizer=optimiser, loss='categorical_crossentropy', metrics=['categorical_accuracy'])
        return model

In [425]:
# Set up the model 7 with different parameters
Conv_model6 = Conv3DModel()
Conv_model6.initializepath(project_folder)
Conv_model6.initialize_img_prop(img_width = 120, img_height = 120)
Conv_model6.initializehyperparm(sample_frames = 16, batch_size = 20, no_of_epochs = 25) 
Conv_model7 = Conv_model6.define_model(dens_nuerons=64, dropout=0.25)
Conv_model7.summary()

Model: "sequential_119"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv3d_516 (Conv3D)         (None, 16, 120, 120, 16   1312      
                             )                                   
                                                                 
 activation_516 (Activation  (None, 16, 120, 120, 16   0         
 )                           )                                   
                                                                 
 batch_normalization_796 (B  (None, 16, 120, 120, 16   64        
 atchNormalization)          )                                   
                                                                 
 max_pooling3d_412 (MaxPool  (None, 8, 60, 60, 16)     0         
 ing3D)                                                          
                                                                 
 conv3d_517 (Conv3D)         (None, 8, 60, 60, 32)  

In [426]:
# Let's check the total number of parameters and train the model
# print(f'Total Param: {Conv_model7.count_params()}')
# model7_hist = Conv_model6.train_model(Conv_model7, augment_data=True)

# Model 8: Image Resolution: (120,120), Sample Frames: 18, Batch Size: 20, No of Epochs: 25
## CNN + LSTM Model

In [427]:
# Let's build the model 8 architecture with CNN+LSTM
class CNNLSTM(ModelBuilder):
    def define_model(self, lstm_cells=64, dens_nuerons=64, dropout=0.25):
        model = Sequential()
        model.add(TimeDistributed(Conv2D(16,(3,3), padding='same', activation='relu'),
                         input_shape = (self.sample_frames, self.img_height, self.img_width, self.channels)))
        model.add(TimeDistributed(BatchNormalization()))
        model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2)))) # Convolutional Layer
    
        model.add(TimeDistributed(Conv2D(32,(3,3), padding='same', activation='relu')))
        model.add(TimeDistributed(BatchNormalization()))
        model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2)))) # Convolutional Layer
        
        model.add(TimeDistributed(Conv2D(64,(3,3), padding='same', activation='relu')))
        model.add(TimeDistributed(BatchNormalization()))
        model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2)))) # Convolutional Layer
        
        model.add(TimeDistributed(Conv2D(128,(3,3), padding='same', activation='relu')))
        model.add(TimeDistributed(BatchNormalization()))
        model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2)))) # Convolutional Layer
        
        model.add(TimeDistributed(Conv2D(256,(3,3), padding='same', activation='relu')))
        model.add(TimeDistributed(BatchNormalization()))
        model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2)))) # Convolutional Layer
        
        
        model.add(TimeDistributed(Flatten()))  # Flatten each frame's feature maps into a vector for the LSTM
        
        
        model.add(LSTM(lstm_cells))
        model.add(Dropout(dropout))  # LSTM Layer
        
        model.add(Dense(dens_nuerons, activation='relu'))
        model.add(Dropout(dropout)) # Hidden Layer
        
        
        model.add(Dense(self.classes, activation='softmax')) # Output Layer
        
        optimiser = optimizers.Adam()
        model.compile(optimizer=optimiser, loss='categorical_crossentropy', metrics=['categorical_accuracy'])
        return model

In [428]:
# Set up the model 8 with different parameters
Conv_model7 = CNNLSTM()
Conv_model7.initializepath(project_folder)
Conv_model7.initialize_img_prop(img_width = 120, img_height = 120)
Conv_model7.initializehyperparm(sample_frames = 18, batch_size = 20, no_of_epochs = 20) 
Conv_model8 = Conv_model7.define_model(lstm_cells=128, dens_nuerons=128, dropout=0.25)
Conv_model8.build((None, 18, 120, 120, 3))
Conv_model8.summary()

Model: "sequential_120"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 time_distributed_208 (Time  (None, 18, 120, 120, 16   448       
 Distributed)                )                                   
                                                                 
 time_distributed_209 (Time  (None, 18, 120, 120, 16   64        
 Distributed)                )                                   
                                                                 
 time_distributed_210 (Time  (None, 18, 60, 60, 16)    0         
 Distributed)                                                    
                                                                 
 time_distributed_211 (Time  (None, 18, 60, 60, 32)    4640      
 Distributed)                                                    
                                                                 
 time_distributed_212 (Time  (None, 18, 60, 60, 32) 

In [429]:
# Let's check the total number of parameters and train the model
# print(f'Total Param: {Conv_model8.count_params()}')
# model8_hist = Conv_model7.train_model(Conv_model8, augment_data=True)

### There are still signs of overfitting. Let's augment the data with a slight rotation and run the same set of models again. In the CNN + LSTM model above, the TimeDistributed wrapper applies the same layer to each time step of the input sequence independently, which is how a 2D convolution gets applied to every frame of the video. The convolutional layers extract spatial features from each frame, and the LSTM layer captures the temporal dependencies across the resulting sequence of features.
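
To make the TimeDistributed idea concrete without building a Keras model, here is a minimal NumPy sketch. It assumes a stand-in per-frame function, `frame_features` (hypothetical: a simple spatial mean per channel rather than a real convolution), and shows that applying it to every frame independently is the same as folding the time axis into the batch axis, applying the function once, and unfolding again. This mirrors the idea behind the wrapper and is not Keras's actual implementation.

```python
import numpy as np

def frame_features(frames):
    """Stand-in for a per-frame 2D feature extractor: (n, H, W, C) -> (n, C)."""
    return frames.mean(axis=(1, 2))

def time_distributed(fn, video_batch):
    """Apply fn to every time step of a (batch, time, H, W, C) array."""
    batch, time = video_batch.shape[:2]
    folded = video_batch.reshape((batch * time,) + video_batch.shape[2:])
    out = fn(folded)                                   # same function for every frame
    return out.reshape((batch, time) + out.shape[1:])  # restore the time axis

rng = np.random.default_rng(30)
videos = rng.random((2, 18, 12, 12, 3))  # 2 clips, 18 frames each, small 12x12 RGB frames

features = time_distributed(frame_features, videos)
print(features.shape)  # (2, 18, 3): one feature vector per frame

# Looping over the time steps explicitly gives the same result
looped = np.stack([frame_features(videos[:, t]) for t in range(videos.shape[1])], axis=1)
print(np.allclose(features, looped))  # True
```

The per-frame output keeps its time axis, which is what lets a sequence layer such as an LSTM consume it afterwards.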

# Model 9: Image Resolution: (160,160), Sample Frames: 20, Batch Size: 20, No of Epochs: 20 (Data Augmentation)
## Similar to model 2

In [430]:
# Let's apply some more data augmentation and check model performance
desktop_path = os.path.join(os.path.expanduser('~'), 'Desktop')
project_folder = os.path.join(desktop_path, 'Gesture Recognition Case Study', 'Data', 'Project_data')

class ModelBuilderMoreAugmentation(metaclass=ABCMeta):
    def initializepath(self, project_folder):
        
        with open(os.path.join(project_folder, 'train.csv')) as f:
            self.train_doc = np.random.permutation(f.readlines())  # Shuffled rows of the training CSV
        with open(os.path.join(project_folder, 'val.csv')) as f:
            self.val_doc = np.random.permutation(f.readlines())    # Shuffled rows of the validation CSV
        
        self.train_path = os.path.join(project_folder, 'train')    # Path to training images folder
        self.val_path = os.path.join(project_folder, 'val')        # Path to val images folder
        
        self.num_train_seq = len(self.train_doc)  # Number of data seq present in train doc
        self.num_val_seq = len(self.val_doc) # Number of data seq present in val doc
    
    # Let's initialize the image properties
    def initialize_img_prop(self, img_width=100, img_height=100):
        self.img_width = img_width
        self.img_height = img_height
        self.channels = 3  # Number of color channels (RGB)
        self.classes = 5  # Number of gesture classes the model predicts
        self.total_frames = 30  # Number of frames available in each video
    
    # Now let's initialize the hyperparameters that control the learning process
    def initializehyperparm(self, sample_frames=30, batch_size=20, no_of_epochs=20):
        self.sample_frames = sample_frames  # Number of frames sampled from each video and fed to the model
        self.batch_size = batch_size  # Number of videos per batch during training
        self.no_of_epochs = no_of_epochs  # Number of full passes over the training data
        
        
        
    # Let's define a generator method that yields batches of data for training the NN
    def generator(self, source_path, folder_list, augment=False):
        img_idx = np.round(np.linspace(0, self.total_frames-1, self.sample_frames)).astype(int)
        batch_size = self.batch_size
        
        while True:
            shuffled_data = np.random.permutation(folder_list)
            total_batches = len(shuffled_data) // batch_size
            for batch in range(total_batches):
                batch_data, batch_label = self.one_batch_data(source_path, shuffled_data, 
                                                        batch, batch_size,
                                                        img_idx, augment)
                yield batch_data, batch_label
                
                
            remaining_batches = len(shuffled_data) % batch_size
            if remaining_batches != 0:
                batch_data, batch_label = self.one_batch_data(source_path, shuffled_data,   
                                                          total_batches, batch_size, img_idx, augment, remaining_batches)
                yield batch_data, batch_label
            
            
            
    # Let's generate the data for one batch
    def one_batch_data(self, source_path, shuffled_data, batch, batch_size, img_idx, augment, remaining_batches=0):
        seq_len = remaining_batches if remaining_batches else self.batch_size
        batch_data = np.zeros((seq_len, len(img_idx), self.img_height, self.img_width, self.channels))
        batch_label = np.zeros((seq_len, self.classes))

        if (augment):
            batch_data_aug = np.zeros((seq_len, len(img_idx), self.img_height, self.img_width, self.channels))

        for folder in range(seq_len):
            
            imgs = os.listdir(source_path + '/' + shuffled_data[folder + (batch*batch_size)].split(';')[0])
            
            for idx, item in enumerate(img_idx):
                    
                img_read = imread(source_path + '/' + shuffled_data[folder + (batch*batch_size)].strip().split(';')[0]
                                         + '/' + imgs[item]).astype(np.float32)
                img_resize = resize(img_read, (self.img_height, self.img_width, 3))

                batch_data[folder, idx, :, :, 0] = (img_resize[:, :, 0]) / 255
                batch_data[folder, idx, :, :, 1] = (img_resize[:, :, 1]) / 255
                batch_data[folder, idx, :, :, 2] = (img_resize[:, :, 2]) / 255

                if (augment):
                    transformed_img = cv2.warpAffine(img_read, np.float32([[1, 0, np.random.randint(-30, 30)],
                                                                          [0, 1, np.random.randint(-30, 30)]]),
                                                    (img_read.shape[1], img_read.shape[0]))
                    gray_scale = cv2.cvtColor(transformed_img, cv2.COLOR_BGR2GRAY)
                    x0, y0 = np.argwhere(gray_scale > 0).min(axis=0)
                    x1, y1 = np.argwhere(gray_scale > 0).max(axis=0)
                    cropped_img = transformed_img[x0:x1, y0:y1, :]
                    img_resize = resize(cropped_img, (self.img_height, self.img_width, 3))
                    
                    matrix = cv2.getRotationMatrix2D((self.img_width//2, self.img_height//2), np.random.randint(-10,10), 1.0)
                    rotated = cv2.warpAffine(img_resize, matrix, (self.img_width, self.img_height))

                    batch_data_aug[folder, idx, :, :, 0] = (rotated[:, :, 0]) / 255
                    batch_data_aug[folder, idx, :, :, 1] = (rotated[:, :, 1]) / 255
                    batch_data_aug[folder, idx, :, :, 2] = (rotated[:, :, 2]) / 255
                    
                               
            batch_label[folder, int(shuffled_data[folder + (batch*batch_size)].strip().split(';')[2])] = 1
    
        if (augment):
            batch_data = np.concatenate([batch_data, batch_data_aug])
            batch_label = np.concatenate([batch_label, batch_label])

        return batch_data, batch_label                                            
                                                  
        
        
    # Let's define a method that trains the NN, using the generator to load and augment data
    def train_model(self, model, augment_data=False):
        train_data_generator = self.generator(self.train_path, self.train_doc, augment=augment_data)                                            
        val_data_generator = self.generator(self.val_path, self.val_doc)         
         
        model_name = 'model_init' + '_' + str(datetime.datetime.now()).replace(' ', '').replace(':', '_') + '/'                                         
                                                     
        if not os.path.exists(model_name):
            os.mkdir(model_name)                                      
                                                     
        filepath = model_name + 'model-{epoch:05d}-{loss:.5f}-{categorical_accuracy:.5f}-{val_loss:.5f}-{val_categorical_accuracy:.5f}.h5'                                            
                                                     
        checkpoint = ModelCheckpoint(filepath, monitor='val_loss', verbose=1, save_best_only=True, save_weights_only=False,
                                     mode='auto')                                            
        LR = ReduceLROnPlateau(monitor='val_loss', verbose=1, patience=4, factor=0.2)                                            
                                                   
        callback_list = [checkpoint, LR]
                                                     
        if (self.num_train_seq % self.batch_size) == 0:
            steps_per_epoch = int(self.num_train_seq / self.batch_size)                                      
        else:                                     
            steps_per_epoch = (self.num_train_seq // self.batch_size) + 1                                      
                                                     
        if (self.num_val_seq % self.batch_size) == 0:
            val_steps = int(self.num_val_seq / self.batch_size)                                      
        else:                                     
            val_steps = (self.num_val_seq // self.batch_size) + 1                                            
                                                     
        # model.fit accepts Python generators directly; fit_generator is deprecated in TF 2.x
        history = model.fit(train_data_generator, steps_per_epoch=steps_per_epoch, epochs=self.no_of_epochs, verbose=1,
                            callbacks=callback_list, validation_data=val_data_generator,
                            validation_steps=val_steps)
                                                     
        return history                                            
                                                     
    @abc.abstractmethod
    def define_model(self):
        pass                                                                                        
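
Two small pieces of arithmetic in the class above are easy to check in isolation: which frames `generator` samples from each 30-frame video, and how many steps `train_model` runs per epoch. The sketch below re-implements both with plain NumPy. `sample_frame_indices` and `steps_for` are hypothetical names, and the 663-sequence count is an illustrative assumption rather than the size of the actual dataset.

```python
import numpy as np

def sample_frame_indices(total_frames, sample_frames):
    """Evenly spaced frame indices, as computed at the top of generator()."""
    return np.round(np.linspace(0, total_frames - 1, sample_frames)).astype(int)

def steps_for(num_sequences, batch_size):
    """Batches needed to cover every sequence once, counting a final partial batch."""
    return -(-num_sequences // batch_size)  # ceiling division

idx = sample_frame_indices(30, 16)
print(idx)                 # 16 indices, starting at frame 0 and ending at frame 29

print(steps_for(663, 20))  # 34: assumed count, 33 full batches plus one partial batch
print(steps_for(100, 20))  # 5: exact multiple, no partial batch
```

The ceiling division gives the same result as the if/else blocks in `train_model`, which add one step only when the sequence count is not a multiple of the batch size.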

In [431]:
# Let's build the model 9 architecture with data augmentation
class Conv3DModel9(ModelBuilderMoreAugmentation):
    def define_model(self, filter_size=(3,3,3), dens_nuerons=64, dropout=0.25):
        model = Sequential()
        model.add(Conv3D(16,filter_size, padding='same', 
                         input_shape = (self.sample_frames, self.img_height, self.img_width, self.channels)))
        model.add(Activation('relu'))
        model.add(BatchNormalization())
        model.add(MaxPooling3D(pool_size=(2, 2, 2))) # Convolutional Layer
    
        model.add(Conv3D(32,filter_size, padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())
        model.add(MaxPooling3D(pool_size=(2, 2, 2))) # Convolutional Layer
        
        model.add(Conv3D(64,filter_size, padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())
        model.add(MaxPooling3D(pool_size=(2, 2, 2))) # Convolutional Layer
        
        model.add(Conv3D(128,filter_size, padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())
        model.add(MaxPooling3D(pool_size=(2, 2, 2))) # Convolutional Layer
        
        
        model.add(Flatten())  # Flatten the feature maps before the dense layers
        
        
        model.add(Dense(dens_nuerons, activation='relu'))
        model.add(BatchNormalization())
        model.add(Dropout(dropout))  # Hidden Layer
        
        model.add(Dense(dens_nuerons, activation='relu'))
        model.add(BatchNormalization())
        model.add(Dropout(dropout)) # Hidden Layer
        
        
        model.add(Dense(self.classes, activation='softmax')) # Output Layer
        
        optimiser = optimizers.Adam()
        model.compile(optimizer=optimiser, loss='categorical_crossentropy', metrics=['categorical_accuracy'])
        return model

In [432]:
# Set up the model 9 with different parameters
Conv_model8 = Conv3DModel9()
Conv_model8.initializepath(project_folder)
Conv_model8.initialize_img_prop(img_width = 160, img_height = 160)
Conv_model8.initializehyperparm(sample_frames = 20, batch_size = 20, no_of_epochs = 20) 
Conv_model9 = Conv_model8.define_model(dens_nuerons=256, dropout=0.5)
Conv_model9.summary()

Model: "sequential_121"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv3d_520 (Conv3D)         (None, 20, 160, 160, 16   1312      
                             )                                   
                                                                 
 activation_520 (Activation  (None, 20, 160, 160, 16   0         
 )                           )                                   
                                                                 
 batch_normalization_807 (B  (None, 20, 160, 160, 16   64        
 atchNormalization)          )                                   
                                                                 
 max_pooling3d_416 (MaxPool  (None, 10, 80, 80, 16)    0         
 ing3D)                                                          
                                                                 
 conv3d_521 (Conv3D)         (None, 10, 80, 80, 32) 

In [433]:
# Lets check the total parameters during training of model
# print(f'Total Param: {Conv_model9.count_params()}')
# model9_hist = Conv_model8.train_model(Conv_model9, augment_data=True)
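
### The resolution and frame count decide how many features reach the first Dense layer. The sketch below is a rough check, assuming four MaxPooling3D layers with pool_size=(2, 2, 2) and the Keras defaults (strides equal to the pool size, padding='valid'), and 128 filters in the last convolution. `pooled_shape` and `flatten_size` are hypothetical helper names, not part of the notebook.

```python
def pooled_shape(frames, height, width, n_pools=4, pool=2):
    """Return (frames, height, width) after n_pools MaxPooling3D layers.

    Assumption: every pooling layer floor-divides each dimension by `pool`,
    which is what pool_size=(2, 2, 2) with default strides and 'valid'
    padding does. The 'same'-padded convolutions do not change the shape.
    """
    for _ in range(n_pools):
        frames, height, width = frames // pool, height // pool, width // pool
    return frames, height, width


def flatten_size(frames, height, width, last_filters=128):
    """Length of the vector produced by Flatten() after the last pooling layer."""
    f, h, w = pooled_shape(frames, height, width)
    return f * h * w * last_filters


# Model 9 settings: 20 frames at 160x160
print(pooled_shape(20, 160, 160, n_pools=1))  # matches the first pooling row of the summary
print(pooled_shape(20, 160, 160))
print(flatten_size(20, 160, 160))

# Settings used by several later models: 16 frames at 120x120
print(flatten_size(16, 120, 120))
```

### With these assumptions the 160x160, 20-frame input is flattened to 12800 features and the 120x120, 16-frame input to 6272, so the first Dense layer is roughly half the size at the lower resolution.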

# Model 10: Image Resolution: (120,120), Sample Frames: 16, Batch Size: 30, No of Epochs: 25 (Data Augmentation)
## Similar to model 3

In [434]:
# Lets build model 10 architecture with data augmentation
class Conv3DModel9(ModelBuilderMoreAugmentation):
    def define_model(self, filter_size=(3,3,3), dens_nuerons=64, dropout=0.25):
        model = Sequential()
        model.add(Conv3D(16,filter_size, padding='same', 
                         input_shape = (self.sample_frames, self.img_height, self.img_width, self.channels)))
        model.add(Activation('relu'))
        model.add(BatchNormalization())
        model.add(MaxPooling3D(pool_size=(2, 2, 2))) # Convolutional Layer
    
        model.add(Conv3D(32,filter_size, padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())
        model.add(MaxPooling3D(pool_size=(2, 2, 2))) # Convolutional Layer
        
        model.add(Conv3D(64,filter_size, padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())
        model.add(MaxPooling3D(pool_size=(2, 2, 2))) # Convolutional Layer
        
        model.add(Conv3D(128,filter_size, padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())
        model.add(MaxPooling3D(pool_size=(2, 2, 2))) # Convolutional Layer
        
        
        model.add(Flatten())  # Flatten the feature maps into a vector for the dense layers
        
        
        model.add(Dense(dens_nuerons, activation='relu'))
        model.add(BatchNormalization())
        model.add(Dropout(dropout))  # Hidden Layer
        
        model.add(Dense(dens_nuerons, activation='relu'))
        model.add(BatchNormalization())
        model.add(Dropout(dropout)) # Hidden Layer
        
        
        model.add(Dense(self.classes, activation='softmax')) # Output Layer
        
        optimiser = optimizers.Adam(learning_rate=0.0002)
        model.compile(optimizer=optimiser, loss='categorical_crossentropy', metrics=['categorical_accuracy'])
        return model

In [435]:
# Set up the model 10 with different parameters
Conv_model9 = Conv3DModel9()
Conv_model9.initializepath(project_folder)
Conv_model9.initialize_img_prop(img_width = 120, img_height = 120)
Conv_model9.initializehyperparm(sample_frames = 16, batch_size = 30, no_of_epochs = 25) 
Conv_model10 = Conv_model9.define_model(filter_size=(2,2,2), dens_nuerons=256, dropout=0.5)
Conv_model10.summary()

Model: "sequential_122"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv3d_524 (Conv3D)         (None, 16, 120, 120, 16   400       
                             )                                   
                                                                 
 activation_524 (Activation  (None, 16, 120, 120, 16   0         
 )                           )                                   
                                                                 
 batch_normalization_813 (B  (None, 16, 120, 120, 16   64        
 atchNormalization)          )                                   
                                                                 
 max_pooling3d_420 (MaxPool  (None, 8, 60, 60, 16)     0         
 ing3D)                                                          
                                                                 
 conv3d_525 (Conv3D)         (None, 8, 60, 60, 32)  

In [436]:
# Lets check the total parameters during training of model
# print(f'Total Param: {Conv_model10.count_params()}')
# model10_hist = Conv_model9.train_model(Conv_model10, augment_data=True)

# Model 11: Image Resolution: (120,120), Sample Frames: 16, Batch Size: 20, No of Epochs: 25 (Data Augmentation)
## Similar to model 4

In [437]:
# Lets build model 11 architecture with adding more layers
class Conv3DModel9(ModelBuilderMoreAugmentation):
    def define_model(self, filter_size=(3,3,3), dens_nuerons=64, dropout=0.25):
        model = Sequential()
        model.add(Conv3D(16,filter_size, padding='same', 
                         input_shape = (self.sample_frames, self.img_height, self.img_width, self.channels)))
        model.add(Activation('relu'))
        model.add(BatchNormalization())             # Convolutional Layer
        
        # input_shape is only needed on the first layer, so it is not repeated here
        model.add(Conv3D(16,filter_size, padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())
        
        model.add(MaxPooling3D(pool_size=(2, 2, 2))) # Convolutional Layer
    
        model.add(Conv3D(32,filter_size, padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())              # Convolutional Layer
        
        model.add(Conv3D(32,filter_size, padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())
        
        model.add(MaxPooling3D(pool_size=(2, 2, 2))) # Convolutional Layer
        
        model.add(Conv3D(64,filter_size, padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())              # Convolutional Layer
        
        model.add(Conv3D(64,filter_size, padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())
        
        model.add(MaxPooling3D(pool_size=(2, 2, 2))) # Convolutional Layer
        
        model.add(Conv3D(128,filter_size, padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())              # Convolutional Layer
        
        model.add(Conv3D(128,filter_size, padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())
        
        model.add(MaxPooling3D(pool_size=(2, 2, 2))) # Convolutional Layer
        
        
        model.add(Flatten())  # Flatten the feature maps into a vector for the dense layers
        
        
        model.add(Dense(dens_nuerons, activation='relu'))
        model.add(BatchNormalization())
        model.add(Dropout(dropout))  # Hidden Layer
        
        model.add(Dense(dens_nuerons, activation='relu'))
        model.add(BatchNormalization())
        model.add(Dropout(dropout)) # Hidden Layer
        
        
        model.add(Dense(self.classes, activation='softmax')) # Output Layer
        
        optimiser = optimizers.Adam()
        model.compile(optimizer=optimiser, loss='categorical_crossentropy', metrics=['categorical_accuracy'])
        return model

In [438]:
# Set up the model 11 with different parameters
Conv_model10 = Conv3DModel9()
Conv_model10.initializepath(project_folder)
Conv_model10.initialize_img_prop(img_width = 120, img_height = 120)
Conv_model10.initializehyperparm(sample_frames = 16, batch_size = 20, no_of_epochs = 25) 
Conv_model11 = Conv_model10.define_model(filter_size=(3,3,3), dens_nuerons=256, dropout=0.5)
Conv_model11.summary()

Model: "sequential_123"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv3d_528 (Conv3D)         (None, 16, 120, 120, 16   1312      
                             )                                   
                                                                 
 activation_528 (Activation  (None, 16, 120, 120, 16   0         
 )                           )                                   
                                                                 
 batch_normalization_819 (B  (None, 16, 120, 120, 16   64        
 atchNormalization)          )                                   
                                                                 
 conv3d_529 (Conv3D)         (None, 16, 120, 120, 16   6928      
                             )                                   
                                                                 
 activation_529 (Activation  (None, 16, 120, 120, 16

In [439]:
# Lets check the total parameters during training of model
# print(f'Total Param: {Conv_model11.count_params()}')
# model11_hist = Conv_model10.train_model(Conv_model11, augment_data=True)
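
### The Param # column in the summaries above follows from the kernel size and the channel counts. A minimal sketch, assuming 3-channel (RGB) input frames and the standard Keras counting rules (one bias per filter, four values per channel for BatchNormalization). `conv_params` and `batchnorm_params` are hypothetical helper names, not part of the notebook.

```python
def conv_params(kernel, in_channels, filters):
    """Weights plus biases of a convolution layer (works for Conv2D and Conv3D)."""
    taps = 1
    for k in kernel:
        taps *= k                      # number of positions in one kernel
    return taps * in_channels * filters + filters


def batchnorm_params(channels):
    """gamma, beta, moving mean and moving variance: four values per channel."""
    return 4 * channels


print(conv_params((3, 3, 3), 3, 16))    # first Conv3D of models 9 and 11: 1312
print(conv_params((3, 3, 3), 16, 16))   # second Conv3D of model 11: 6928
print(conv_params((2, 2, 2), 3, 16))    # first Conv3D of model 10 with (2,2,2) filters: 400
print(batchnorm_params(16))             # BatchNormalization after a 16-filter layer: 64
```

### Going from a (3,3,3) to a (2,2,2) filter cuts the weights of a layer by a factor of 27/8, which is why model 10 reports 400 parameters where model 9 reports 1312.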

# Model 12: Image Resolution: (120,120), Sample Frames: 16, Batch Size: 20, No of Epochs: 25 (Data Augmentation)
## Similar to model 5

In [440]:
# Lets build model 12 architecture with adding dropout layers
class Conv3DModel9(ModelBuilderMoreAugmentation):
    def define_model(self, filter_size=(3,3,3), dens_nuerons=64, dropout=0.25):
        model = Sequential()
        model.add(Conv3D(16,filter_size, padding='same', 
                         input_shape = (self.sample_frames, self.img_height, self.img_width, self.channels)))
        model.add(Activation('relu'))
        model.add(BatchNormalization())             # Convolutional Layer
        
        # input_shape is only needed on the first layer, so it is not repeated here
        model.add(Conv3D(16,filter_size, padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())
        
        model.add(MaxPooling3D(pool_size=(2, 2, 2))) # Convolutional Layer
        model.add(Dropout(dropout))                  # Dropout Layer
    
        model.add(Conv3D(32,filter_size, padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())              # Convolutional Layer
        
        model.add(Conv3D(32,filter_size, padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())
        
        model.add(MaxPooling3D(pool_size=(2, 2, 2))) # Convolutional Layer
        model.add(Dropout(dropout))                  # Dropout Layer
        
        model.add(Conv3D(64,filter_size, padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())              # Convolutional Layer
        
        model.add(Conv3D(64,filter_size, padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())
        
        model.add(MaxPooling3D(pool_size=(2, 2, 2))) # Convolutional Layer
        model.add(Dropout(dropout))                  # Dropout Layer
        
        model.add(Conv3D(128,filter_size, padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())              # Convolutional Layer
        
        model.add(Conv3D(128,filter_size, padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())
        
        model.add(MaxPooling3D(pool_size=(2, 2, 2))) # Convolutional Layer
        model.add(Dropout(dropout))                  # Dropout Layer
        
        
        model.add(Flatten())  # Flatten the feature maps into a vector for the dense layers
        
        
        model.add(Dense(dens_nuerons, activation='relu'))
        model.add(BatchNormalization())
        model.add(Dropout(dropout))  # Hidden Layer
        
        model.add(Dense(dens_nuerons, activation='relu'))
        model.add(BatchNormalization())
        model.add(Dropout(dropout)) # Hidden Layer
        
        
        model.add(Dense(self.classes, activation='softmax')) # Output Layer
        
        optimiser = optimizers.Adam(learning_rate=0.0002)
        model.compile(optimizer=optimiser, loss='categorical_crossentropy', metrics=['categorical_accuracy'])
        return model

In [441]:
# Set up the model 12 with different parameters
Conv_model11 = Conv3DModel9()
Conv_model11.initializepath(project_folder)
Conv_model11.initialize_img_prop(img_width = 120, img_height = 120)
Conv_model11.initializehyperparm(sample_frames = 16, batch_size = 20, no_of_epochs = 25) 
Conv_model12 = Conv_model11.define_model(filter_size=(3,3,3), dens_nuerons=256, dropout=0.25)
Conv_model12.summary()

Model: "sequential_124"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv3d_536 (Conv3D)         (None, 16, 120, 120, 16   1312      
                             )                                   
                                                                 
 activation_536 (Activation  (None, 16, 120, 120, 16   0         
 )                           )                                   
                                                                 
 batch_normalization_829 (B  (None, 16, 120, 120, 16   64        
 atchNormalization)          )                                   
                                                                 
 conv3d_537 (Conv3D)         (None, 16, 120, 120, 16   6928      
                             )                                   
                                                                 
 activation_537 (Activation  (None, 16, 120, 120, 16

In [442]:
# Lets check the total parameters during training of model
# print(f'Total Param: {Conv_model12.count_params()}')
# model12_hist = Conv_model11.train_model(Conv_model12, augment_data=True)

# Model 13: Image Resolution: (100,100), Sample Frames: 16, Batch Size: 20, No of Epochs: 25 (Data Augmentation)
## Similar to model 6

In [443]:
# Lets build model 13 architecture with reduced filter size
class Conv3DModel9(ModelBuilderMoreAugmentation):
    def define_model(self, dens_nuerons=64, dropout=0.25):
        model = Sequential()
        model.add(Conv3D(16,(3,3,3), padding='same', 
                         input_shape = (self.sample_frames, self.img_height, self.img_width, self.channels)))
        model.add(Activation('relu'))
        model.add(BatchNormalization())
        model.add(MaxPooling3D(pool_size=(2, 2, 2))) # Convolutional Layer
    
        model.add(Conv3D(32,(2,2,2), padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())
        model.add(MaxPooling3D(pool_size=(2, 2, 2))) # Convolutional Layer
        
        model.add(Conv3D(64,(2,2,2), padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())
        model.add(MaxPooling3D(pool_size=(2, 2, 2))) # Convolutional Layer
        
        model.add(Conv3D(128,(2,2,2), padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())
        model.add(MaxPooling3D(pool_size=(2, 2, 2))) # Convolutional Layer
        
        
        model.add(Flatten())  # Flatten the feature maps into a vector for the dense layers
        
        
        model.add(Dense(dens_nuerons, activation='relu'))
        model.add(BatchNormalization())
        model.add(Dropout(dropout))  # Hidden Layer
        
        model.add(Dense(dens_nuerons, activation='relu'))
        model.add(BatchNormalization())
        model.add(Dropout(dropout)) # Hidden Layer
        
        
        model.add(Dense(self.classes, activation='softmax')) # Output Layer
        
        optimiser = optimizers.Adam(learning_rate=0.0002)
        model.compile(optimizer=optimiser, loss='categorical_crossentropy', metrics=['categorical_accuracy'])
        return model

In [444]:
# Set up the model 13 with different parameters
Conv_model12 = Conv3DModel9()
Conv_model12.initializepath(project_folder)
Conv_model12.initialize_img_prop(img_width = 100, img_height = 100)
Conv_model12.initializehyperparm(sample_frames = 16, batch_size = 20, no_of_epochs = 25) 
Conv_model13 = Conv_model12.define_model(dens_nuerons=128, dropout=0.25)
Conv_model13.summary()

Model: "sequential_125"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv3d_544 (Conv3D)         (None, 16, 100, 100, 16   1312      
                             )                                   
                                                                 
 activation_544 (Activation  (None, 16, 100, 100, 16   0         
 )                           )                                   
                                                                 
 batch_normalization_839 (B  (None, 16, 100, 100, 16   64        
 atchNormalization)          )                                   
                                                                 
 max_pooling3d_432 (MaxPool  (None, 8, 50, 50, 16)     0         
 ing3D)                                                          
                                                                 
 conv3d_545 (Conv3D)         (None, 8, 50, 50, 32)  

In [445]:
# Lets check the total parameters during training of model
# print(f'Total Param: {Conv_model13.count_params()}')
# model13_hist = Conv_model12.train_model(Conv_model13, augment_data=True)

# Model 14: Image Resolution: (100,100), Sample Frames: 16, Batch Size: 20, No of Epochs: 25 (Data Augmentation)
## Similar to model 7

In [446]:
# Lets build model 14 architecture with reduced filter size
class Conv3DModel9(ModelBuilderMoreAugmentation):
    def define_model(self, dens_nuerons=64, dropout=0.25):
        model = Sequential()
        model.add(Conv3D(16,(3,3,3), padding='same', 
                         input_shape = (self.sample_frames, self.img_height, self.img_width, self.channels)))
        model.add(Activation('relu'))
        model.add(BatchNormalization())
        model.add(MaxPooling3D(pool_size=(2, 2, 2))) # Convolutional Layer
    
        model.add(Conv3D(32,(3,3,3), padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())
        model.add(MaxPooling3D(pool_size=(2, 2, 2))) # Convolutional Layer
        
        model.add(Conv3D(64,(2,2,2), padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())
        model.add(MaxPooling3D(pool_size=(2, 2, 2))) # Convolutional Layer
        
        model.add(Conv3D(128,(2,2,2), padding='same'))
        model.add(Activation('relu'))
        model.add(BatchNormalization())
        model.add(MaxPooling3D(pool_size=(2, 2, 2))) # Convolutional Layer
        
        
        model.add(Flatten())  # Flatten the feature maps into a vector for the dense layers
        
        
        model.add(Dense(dens_nuerons, activation='relu'))
        model.add(BatchNormalization())
        model.add(Dropout(dropout))  # Hidden Layer
        
        model.add(Dense(dens_nuerons, activation='relu'))
        model.add(BatchNormalization())
        model.add(Dropout(dropout)) # Hidden Layer
        
        
        model.add(Dense(self.classes, activation='softmax')) # Output Layer
        
        optimiser = optimizers.Adam(learning_rate=0.0002)
        model.compile(optimizer=optimiser, loss='categorical_crossentropy', metrics=['categorical_accuracy'])
        return model

In [447]:
# Set up the model 14 with different parameters
Conv_model13 = Conv3DModel9()
Conv_model13.initializepath(project_folder)
Conv_model13.initialize_img_prop(img_width = 120, img_height = 120)
Conv_model13.initializehyperparm(sample_frames = 16, batch_size = 20, no_of_epochs = 25) 
Conv_model14 = Conv_model13.define_model(dens_nuerons=64, dropout=0.25)
Conv_model14.summary()

Model: "sequential_126"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv3d_548 (Conv3D)         (None, 16, 120, 120, 16   1312      
                             )                                   
                                                                 
 activation_548 (Activation  (None, 16, 120, 120, 16   0         
 )                           )                                   
                                                                 
 batch_normalization_845 (B  (None, 16, 120, 120, 16   64        
 atchNormalization)          )                                   
                                                                 
 max_pooling3d_436 (MaxPool  (None, 8, 60, 60, 16)     0         
 ing3D)                                                          
                                                                 
 conv3d_549 (Conv3D)         (None, 8, 60, 60, 32)  

In [448]:
# Lets check the total parameters during training of model
# print(f'Total Param: {Conv_model14.count_params()}')
# model14_hist = Conv_model13.train_model(Conv_model14, augment_data=True)

# Model 15: Image Resolution: (120,120), Sample Frames: 18, Batch Size: 20, No of Epochs: 20 (Data Augmentation)
## Similar to model 8 CNN+GRU

In [449]:
# Lets build model 15 architecture with CNN+GRU
class CNNLSTM15(ModelBuilderMoreAugmentation):
    def define_model(self, lstm_cells=64, dens_nuerons=64, dropout=0.25):
        model = Sequential()
        # input_shape belongs to the TimeDistributed wrapper, which receives the whole frame sequence
        model.add(TimeDistributed(Conv2D(16,(3,3), padding='same', activation='relu'),
                         input_shape = (self.sample_frames, self.img_height, self.img_width, self.channels)))
        model.add(TimeDistributed(BatchNormalization()))
        model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2)))) # Convolutional Layer
    
        model.add(TimeDistributed(Conv2D(32,(3,3), padding='same', activation='relu')))
        model.add(TimeDistributed(BatchNormalization()))
        model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2)))) # Convolutional Layer
        
        model.add(TimeDistributed(Conv2D(64,(3,3), padding='same', activation='relu')))
        model.add(TimeDistributed(BatchNormalization()))
        model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2)))) # Convolutional Layer
        
        model.add(TimeDistributed(Conv2D(128,(3,3), padding='same', activation='relu')))
        model.add(TimeDistributed(BatchNormalization()))
        model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2)))) # Convolutional Layer
        
        
        model.add(TimeDistributed(Flatten()))  # Flatten each frame's feature maps into a vector
        
        
        model.add(GRU(lstm_cells))
        model.add(Dropout(dropout))  # GRU Layer
        
        model.add(Dense(dens_nuerons, activation='relu'))
        model.add(Dropout(dropout)) # Hidden Layer
        
        
        model.add(Dense(self.classes, activation='softmax')) # Output Layer
        
        optimiser = optimizers.Adam(learning_rate=0.0002)
        model.compile(optimizer=optimiser, loss='categorical_crossentropy', metrics=['categorical_accuracy'])
        return model

In [450]:
# Set up the model 15 with different parameters
Conv_model14 = CNNLSTM15()
Conv_model14.initializepath(project_folder)
Conv_model14.initialize_img_prop(img_width = 120, img_height = 120)
Conv_model14.initializehyperparm(sample_frames = 18, batch_size = 20, no_of_epochs = 20) 
Conv_model15 = Conv_model14.define_model(lstm_cells=128, dens_nuerons=128, dropout=0.25)
Conv_model15.build((None, 18, 120, 120, 3))
Conv_model15.summary()

Model: "sequential_127"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 time_distributed_224 (Time  (None, 18, 120, 120, 16   448       
 Distributed)                )                                   
                                                                 
 time_distributed_225 (Time  (None, 18, 120, 120, 16   64        
 Distributed)                )                                   
                                                                 
 time_distributed_226 (Time  (None, 18, 60, 60, 16)    0         
 Distributed)                                                    
                                                                 
 time_distributed_227 (Time  (None, 18, 60, 60, 32)    4640      
 Distributed)                                                    
                                                                 
 time_distributed_228 (Time  (None, 18, 60, 60, 32) 

In [451]:
# Lets check the total parameters during training of model
# print(f'Total Param: {Conv_model15.count_params()}')
# model15_hist = Conv_model14.train_model(Conv_model15, augment_data=True)

### There is not much improvement in accuracy.

# Model 16: Image Resolution: (120,120), Sample Frames: 16, Batch Size: 5, No of Epochs: 20 (Data Augmentation)
### CNN+LSTM model with a MobileNet backbone (transfer learning)
### Let's import the MobileNet model to speed up training and improve performance. MobileNet is a deep neural network architecture designed for mobile and edge devices with limited computational resources. It is lightweight and efficient, which makes it suitable for real-time applications on constrained devices.

In [452]:
# Importing mobilenet model and creating instance
from keras.applications import mobilenet
mobilenet_transfer = mobilenet.MobileNet(weights='imagenet', include_top=False)

class CNNLSTM_TL(ModelBuilderMoreAugmentation):     # Transfer Learning
    
    def define_model(self,lstm_cells=64,dense_neurons=64,dropout=0.25):
        
        model = Sequential()
        model.add(TimeDistributed(mobilenet_transfer,input_shape=(self.sample_frames, self.img_height, self.img_width, self.channels)))
        
        
        for layer in model.layers:
            layer.trainable = False    # We are not training the mobilenet model weights
        
        
        model.add(TimeDistributed(BatchNormalization()))
        model.add(TimeDistributed(MaxPooling2D((2, 2))))
        model.add(TimeDistributed(Flatten()))

        model.add(LSTM(lstm_cells))
        model.add(Dropout(dropout))
        
        model.add(Dense(dense_neurons,activation='relu'))
        model.add(Dropout(dropout))
        
        model.add(Dense(self.classes, activation='softmax'))
        
        
        optimiser = optimizers.Adam()
        model.compile(optimizer=optimiser, loss='categorical_crossentropy', metrics=['categorical_accuracy'])
        return model




### An instance of the MobileNet model pretrained on the ImageNet dataset is created, and its pretrained weights are used as the starting point. The top (fully connected) layer of MobileNet, which is responsible for ImageNet classification, is excluded so that the model can be used as a feature extractor. MobileNet is wrapped in a TimeDistributed layer, so it is applied to every frame and can process a sequence of images. Its layers are frozen to retain the pretrained weights and avoid further training. The architecture combines MobileNet's feature extraction with the sequential processing capability of an LSTM for the video classification task. The final dense layer classifies each input sequence into one of the gesture classes.
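
### To get a feel for how much of this model is actually trained once MobileNet is frozen, the parameter counts can be estimated by hand. This is a sketch under stated assumptions: the standard Keras formulas for Dense, LSTM and GRU (with the TensorFlow 2 default reset_after=True), five gesture classes as in the problem statement, and the 3,228,864 MobileNet parameters and 1024 features per frame shown in the summary. The helper names are hypothetical. The LSTM figures are estimates, while the GRU total can be compared with the "Total Param" value the notebook prints for model 17.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a Dense layer."""
    return n_in * n_out + n_out


def lstm_params(n_in, units):
    """4 gates, each with input weights, recurrent weights and one bias vector."""
    return 4 * (units * (n_in + units) + units)


def gru_params(n_in, units):
    """3 gates; with reset_after=True each gate keeps two bias vectors."""
    return 3 * (units * (n_in + units) + 2 * units)


MOBILENET = 3_228_864        # frozen backbone, value taken from the model summary
FEATURES = 1024              # features per frame after pooling and Flatten
BN_TOTAL = 4 * FEATURES      # BatchNormalization added after the backbone
BN_TRAINABLE = 2 * FEATURES  # only gamma and beta are updated by the optimizer
UNITS, CLASSES = 128, 5      # lstm_cells / dense_neurons used here; 5 classes assumed


def head_params(rnn):
    """Recurrent layer, one hidden Dense layer and the softmax output layer."""
    return rnn + dense_params(UNITS, UNITS) + dense_params(UNITS, CLASSES)


lstm_total = MOBILENET + BN_TOTAL + head_params(lstm_params(FEATURES, UNITS))
lstm_trainable = BN_TRAINABLE + head_params(lstm_params(FEATURES, UNITS))
gru_total = MOBILENET + BN_TOTAL + head_params(gru_params(FEATURES, UNITS))

print(f"LSTM model, estimated total params:     {lstm_total}")
print(f"LSTM model, estimated trainable params: {lstm_trainable}")
print(f"GRU model, estimated total params:      {gru_total}")
```

### Under these assumptions only about 0.6 million of roughly 3.8 million parameters are trained in the frozen LSTM variant, which is why each epoch is much cheaper than training a network of the same size from scratch.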

In [453]:
# Set up the model 16 with different parameters
Conv_model15 = CNNLSTM_TL()
Conv_model15.initializepath(project_folder)
Conv_model15.initialize_img_prop(img_width = 120, img_height = 120)
Conv_model15.initializehyperparm(sample_frames = 16, batch_size = 5, no_of_epochs = 20) 
Conv_model16 = Conv_model15.define_model(lstm_cells=128, dense_neurons=128, dropout=0.25)
Conv_model16.summary()

Model: "sequential_128"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 time_distributed_237 (Time  (None, 16, 3, 3, 1024)    3228864   
 Distributed)                                                    
                                                                 
 time_distributed_238 (Time  (None, 16, 3, 3, 1024)    4096      
 Distributed)                                                    
                                                                 
 time_distributed_239 (Time  (None, 16, 1, 1, 1024)    0         
 Distributed)                                                    
                                                                 
 time_distributed_240 (Time  (None, 16, 1024)          0         
 Distributed)                                                    
                                                                 
 lstm_14 (LSTM)              (None, 128)            

In [454]:
# Lets check the total parameters during training of model
# print(f'Total Param: {Conv_model16.count_params()}')
# model16_hist = Conv_model15.train_model(Conv_model16, augment_data=True)

# Model 17: Image Resolution: (120,120), Sample Frames: 16, Batch Size: 5, No of Epochs: 20 (Data Augmentation)
### CNN+GRU model with a MobileNet backbone (transfer learning)

In [455]:
# Lets build model 17 architecture with CNN+GRU
class CNNGRU_TL(ModelBuilderMoreAugmentation):     # Transfer Learning
    
    def define_model(self,lstm_cells=64,dense_neurons=64,dropout=0.25):
        
        model = Sequential()
        model.add(TimeDistributed(mobilenet_transfer,input_shape=(self.sample_frames, self.img_height, self.img_width, self.channels)))
        
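        # No layers are frozen in this class: the MobileNet weights are meant to be fine-tuned.
        # Note: mobilenet_transfer is the same instance used by CNNLSTM_TL; if it was frozen there,
        # mobilenet_transfer.trainable may need to be set back to True for the weights to train.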
        
        
        model.add(TimeDistributed(BatchNormalization()))
        model.add(TimeDistributed(MaxPooling2D((2, 2))))
        model.add(TimeDistributed(Flatten()))

        model.add(GRU(lstm_cells))
        model.add(Dropout(dropout))
        
        model.add(Dense(dense_neurons,activation='relu'))
        model.add(Dropout(dropout))
        
        model.add(Dense(self.classes, activation='softmax'))
        
        
        optimiser = optimizers.Adam()
        model.compile(optimizer=optimiser, loss='categorical_crossentropy', metrics=['categorical_accuracy'])
        return model

In [456]:
# Set up the model 17 with different parameters
Conv_model16 = CNNGRU_TL()
Conv_model16.initializepath(project_folder)
Conv_model16.initialize_img_prop(img_width = 120, img_height = 120)
Conv_model16.initializehyperparm(sample_frames = 16, batch_size = 5, no_of_epochs = 20) 
Conv_model17 = Conv_model16.define_model(lstm_cells=128, dense_neurons=128, dropout=0.25)
Conv_model17.summary()

Model: "sequential_129"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 time_distributed_241 (Time  (None, 16, 3, 3, 1024)    3228864   
 Distributed)                                                    
                                                                 
 time_distributed_242 (Time  (None, 16, 3, 3, 1024)    4096      
 Distributed)                                                    
                                                                 
 time_distributed_243 (Time  (None, 16, 1, 1, 1024)    0         
 Distributed)                                                    
                                                                 
 time_distributed_244 (Time  (None, 16, 1024)          0         
 Distributed)                                                    
                                                                 
 gru_4 (GRU)                 (None, 128)            

In [457]:
# Lets check the total parameters during training of model
print(f'Total Param: {Conv_model17.count_params()}')
model17_hist = Conv_model16.train_model(Conv_model17, augment_data=True)

Total Param: 3693253
Epoch 1/20
Epoch 1: val_loss improved from inf to 0.68646, saving model to model_init_2024-01-0318_06_46.290805\model-00001-1.20639-0.51357-0.68646-0.77000.h5
Epoch 2/20
Epoch 2: val_loss improved from 0.68646 to 0.62445, saving model to model_init_2024-01-0318_06_46.290805\model-00002-0.60933-0.77828-0.62445-0.77000.h5
Epoch 3/20
Epoch 3: val_loss improved from 0.62445 to 0.47235, saving model to model_init_2024-01-0318_06_46.290805\model-00003-0.44644-0.83861-0.47235-0.83000.h5
Epoch 4/20
Epoch 4: val_loss did not improve from 0.47235
Epoch 5/20
Epoch 5: val_loss improved from 0.47235 to 0.40367, saving model to model_init_2024-01-0318_06_46.290805\model-00005-0.26917-0.90573-0.40367-0.86000.h5
Epoch 6/20
Epoch 6: val_loss did not improve from 0.40367
Epoch 7/20
Epoch 7: val_loss improved from 0.40367 to 0.27655, saving model to model_init_2024-01-0318_06_46.290805\model-00007-0.12225-0.95928-0.27655-0.88000.h5
Epoch 8/20
Epoch 8: val_loss did not improve from 0.
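
### The checkpoint file names in the log above carry the metrics of each saved epoch, so the best epoch can be picked without re-reading the training history. The sketch below assumes the fields are, in order, epoch, training loss, training accuracy, validation loss and validation accuracy. The validation-loss position agrees with the "val_loss improved" messages, while the other positions are an assumption. `parse_checkpoint` is a hypothetical helper, and only the file names visible in the log are used.

```python
import re

# model-<epoch>-<loss>-<acc>-<val_loss>-<val_acc>.h5 (field order partly assumed)
CHECKPOINT = re.compile(
    r"model-(\d+)-(\d+\.\d+)-(\d+\.\d+)-(\d+\.\d+)-(\d+\.\d+)\.h5$"
)


def parse_checkpoint(name):
    """Split a checkpoint file name into its epoch number and metric values."""
    match = CHECKPOINT.search(name)
    if match is None:
        raise ValueError(f"unexpected checkpoint name: {name}")
    epoch, loss, acc, val_loss, val_acc = match.groups()
    return {
        "epoch": int(epoch),
        "loss": float(loss),
        "acc": float(acc),
        "val_loss": float(val_loss),
        "val_acc": float(val_acc),
    }


# File names copied from the training log (directory prefix omitted)
saved = [
    "model-00001-1.20639-0.51357-0.68646-0.77000.h5",
    "model-00002-0.60933-0.77828-0.62445-0.77000.h5",
    "model-00003-0.44644-0.83861-0.47235-0.83000.h5",
    "model-00005-0.26917-0.90573-0.40367-0.86000.h5",
    "model-00007-0.12225-0.95928-0.27655-0.88000.h5",
]

# The checkpoint with the lowest validation loss among those shown
best = min((parse_checkpoint(name) for name in saved), key=lambda c: c["val_loss"])
print(best["epoch"], best["val_loss"], best["val_acc"])
```

### Among the checkpoints visible in the log, epoch 7 has the lowest validation loss (0.27655).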

### Experimenting with other hyperparameter combinations, such as different activation functions (ReLU, tanh, sigmoid) or other optimizers like Adagrad() and Adadelta(), may help develop more accurate models. Varying the filter size, padding, stride length, batch normalization, dropout, etc. may further improve performance.

# Model 8: Image Resolution: (120,120), Sample Frames: 18, Batch Size: 20, No of Epochs: 25
# CNN + LSTM Model
# Gave Best Results