# Objective: To build a hand gesture recognition model using neural networks.
* Each gesture can be assigned a unique action, which can then be integrated into smart devices that have a camera.
* The camera feeds the gesture to the embedded neural model, which in turn performs gesture classification. The action corresponding to the predicted gesture can then be performed. E.g. we can embed this model in a smart TV and perform basic functions without a remote, such as turning the volume up by showing a thumbs up or changing channels by swiping right/left.
* In short, in this task we will train neural networks for a classification task.

## INDEX

* [Data Sourcing](#Data-Sourcing)
* [Creating Generator](#Creating-Generator)
* [Basic EDA](#Basic-EDA)
* [Model Creation](#Model-Creation)
    1. [Choosing frame shape](#Choosing-frame-shape)
        * [Conclusion](#Conclusion)
    2. [Choosing sample numbers](#Choosing-sample-numbers)
        * [Conclusion](#Conclusion)
    3. [Choosing optimizer](#Choosing-optimizer)
        * [Conclusion](#Conclusion)
    4. [Training Conv3D Model with higher epocs](#Training-Conv3D-Model-with-higher-epocs)
        * [Conclusion](#Conclusion)
    5. [Builing Conv3D Model with augmentation](#Builing-Conv3D-Model-with-augmentation)
        * [Conclusion](#Conclusion)
    6. [Custom CNN + GRU Model](#Custom-CNN-+-GRU-Model)
        * [Conclusion](#Conclusion)
    7. [Resnet50 + GRU Model](#Resnet50-+-GRU-Model)
        * [Conclusion](#Conclusion)
    8. [Resnet50 + GRU Model with augmentation](#Resnet50-+-GRU-Model-with-augmentation)
        * [Conclusion](#Conclusion)
    9. [VGG16 + GRU with data augmentation](#VGG16-+-GRU-with-data-augmentation)
        * [Conclusion](#Conclusion)

* [Final Model Selection](#Final-Model-Selection)

In [None]:
# Importing libraries for data processing and visualization:
import numpy as np
import os
import cv2
import matplotlib.pyplot as plt
import datetime
import random as rn
import shutil
import zipfile
import csv
import pandas as pd

# Importing deep learning libraries:
from tensorflow import keras
import tensorflow as tf
from tensorflow.keras.applications.resnet50 import ResNet50
from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Dense, GRU, Flatten, TimeDistributed, BatchNormalization, Activation
from tensorflow.keras.layers import Conv2D,MaxPooling2D,Dropout
from tensorflow.keras.layers import Conv3D, MaxPooling3D
from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau,EarlyStopping
from tensorflow.keras.optimizers import Adam,SGD

In [None]:
# Set the random seeds so that runs are reproducible and the results do not vary drastically between runs.
np.random.seed(30)
rn.seed(30)
tf.random.set_seed(30)

# For suppressing warnings
import warnings
warnings.filterwarnings("ignore")

### Data Sourcing
* `The training data consists of a few hundred videos categorised into one of the five classes`: 
    - Thumbs up:  Increase the volume
    - Thumbs down: Decrease the volume
    - Left swipe: 'Jump' backwards 10 seconds
    - Right swipe: 'Jump' forward 10 seconds  
    - Stop: Pause the movie
* `Each video (typically 2-3 seconds long) is divided into a sequence of 30 frames(images)`. These videos have been recorded by various people performing one of the five gestures in front of a webcam - similar to what the smart TV will use. 
* Dataset contains a 'train' and a 'val' folder with two CSV files for the two folders:
    * These folders are in turn divided into subfolders where each subfolder represents a video of a particular gesture. `Each subfolder, i.e. a video, contains 30 frames (or images)`.  
    * `Each row of the CSV file represents one video and contains three main pieces of information - the name of the subfolder containing the 30 images of the video, the name of the gesture and the numeric label (between 0-4) of the video`.
* Note that all images in a particular video subfolder have the same dimensions but different videos may have different dimensions.
* These videos have two types of dimensions - either 360x360 or 120x160 (depending on the webcam used to record the videos)
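
As a minimal sketch of how one row of the CSV file can be split into its three fields: the helper name `parse_record` and the sample row are assumptions made for illustration, while the `;` delimiter matches the way rows are split in the code of this notebook.

```python
def parse_record(line):
    """Split one CSV row into (folder name, gesture name, integer label)."""
    folder, gesture, label = line.strip().split(';')
    return folder, gesture, int(label)

# Hypothetical row, made up for illustration only
sample_row = 'video_0001;Some_Gesture;3\n'
folder, gesture, label = parse_record(sample_row)
print(folder, gesture, label)
```

Converting the label to an integer at this point makes it directly usable as an index into a one-hot vector.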


In [None]:
# Function for data augmentation: saves a blurred, horizontally flipped copy of every training video
def aug(source_path):
    
    path1 = './Project_data/aug'
    if not os.path.isdir(path1):
        os.mkdir(path1)
    
    train_record_list = np.random.permutation(open('./Project_data/train.csv').readlines())
    for record in train_record_list:
        
        frame_list = os.listdir(source_path+'/'+ (record.split(';')[0]))
        path2 = './Project_data/aug/'+ 'aug_'+ record.split(';')[0]
        if not os.path.isdir( path2 ) :
            os.mkdir( path2 )
        for frame in frame_list:
            image = cv2.imread(source_path+'/'+ record.split(';')[0]+'/'+frame).astype(np.float32)
            GaussianBlur = cv2.GaussianBlur(image,(3,3),0)   # Applying Gaussian blur
            fliped= cv2.flip(GaussianBlur,1)                 # Flipping horizontally (around the vertical axis)
            dir_path = path2 + "/"  + frame
            status = cv2.imwrite(dir_path, fliped)
        
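        # A horizontal flip turns a left swipe into a right swipe and vice versa,
        # so labels 0 and 1 are swapped; all other gestures keep their label.
        with open('aug_csv.csv', "a", newline="") as aug_file: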
            aug_writer = csv.writer(aug_file, delimiter=';')
            if int(record.split(';')[2]) == 0:
                aug_writer.writerow(['aug_'+ record.split(';')[0], 'Right_Swipe_new', '1'])
            elif int(record.split(';')[2]) == 1:
                aug_writer.writerow(['aug_'+ record.split(';')[0], 'Left Swipe_new', '0'])
            else:
                aug_writer.writerow(['aug_'+ record.split(';')[0], record.split(';')[1], record.split(';')[2].replace('\n','')])
                
    source_dir = path1
    target_dir = './Project_data/train'

    file_names = os.listdir(source_dir)
    for file_name in file_names:
        shutil.move(os.path.join(source_dir, file_name), target_dir)

    with open("./Project_data/train.csv", "a") as file1,open("./aug_csv.csv", "r") as file2:
        for line in file2:
            file1.write(line)

    if os.path.exists(path1):
        shutil.rmtree(path1)

    if os.path.exists("./aug_csv.csv"):
        os.remove("./aug_csv.csv")
        

In [None]:
# function for referencing train and validation data path:
train_doc = None
val_doc = None
train_path = None
val_path = None
num_train_sequences = None
num_val_sequences = None

def read_path(augment):
    global train_doc
    global val_doc
    global train_path
    global val_path
    global num_train_sequences
    global num_val_sequences
    
    if os.path.exists('./Project_data'):
        shutil.rmtree('./Project_data')
    
    with zipfile.ZipFile('./Project_data.zip', 'r') as zip_ref:
        zip_ref.extractall('./')
    
    train_path = './Project_data/train'      # path to train set
    val_path = './Project_data/val'          # path to validation set
    
    if augment:
        aug(train_path)
        
    # Reading rows of CSV file (videos indirectly) at random order to avoid any introduction of bias when training:
    train_doc = np.random.permutation(open('./Project_data/train.csv').readlines())
    val_doc = np.random.permutation(open('./Project_data/val.csv').readlines()) 
    
    num_train_sequences = len(train_doc)
    print('# training sequences =', num_train_sequences)
    num_val_sequences = len(val_doc)
    print('# validation sequences =', num_val_sequences)
    

### Creating Generator

* We have to write our own custom data generator, since we are working with videos rather than regular text or image data.
* Data preprocessing:
    * Since the images come in different shapes, we must bring them to a common shape as part of data formatting; otherwise Conv3D will throw an error when the inputs in a batch have different shapes.
        - We decided not to crop, as useful information may be present anywhere in the frame.
        - `We will resize the image instead of cropping`.
    * Normalizing the pixel values:
        - Since these are natural images, we could divide each channel's pixel value by 255 (i.e. 2^8 - 1, where 8 is the bit depth of the unsigned integer format).
        - However, we have decided to use min-max normalization.
        - We acknowledge that the right type of normalization depends on the nature of the images and on the task at hand.
* Data Augmentation:
    * We will try to make the model more general by increasing the size of the training data through data augmentation.
        - We have decided to apply a Gaussian blur to each image.
        - We will also flip each image around the y-axis. This turns a swipe by the same person into the opposite swipe gesture, and it produces gestures performed with the other hand, which increases variation in the data.
            - We must correct the labels of the augmented swipe gestures, as a left swipe becomes a right swipe and vice versa.
        - We could also rotate the images (by a very small angle), but we will not implement that here.
In [None]:
# Generator function:
def generator(source_path, folder_list, batch_size, input_shape, sampling_type):
    print( 'Source path = ', source_path, '; batch size =', batch_size,'; sampling type =',sampling_type)
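    if sampling_type == 'middle':
        img_idx = list(range(5,25))          # 20 consecutive frames from the middle of the video
    elif sampling_type == 'custom':
        img_idx = [0,1,2,4,6,8,10,12,14,16,18,20,22,24,26,27,28,29]   # 18 frames: both ends plus every second frame in between
    else:
        raise ValueError("sampling_type must be 'middle' or 'custom'")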
    while True:
        t = np.random.permutation(folder_list)
        num_batches = len(folder_list)//batch_size
        for batch in range(num_batches): # we iterate over the number of batches
            batch_data = np.zeros((batch_size,len(img_idx),input_shape[1],input_shape[2],input_shape[3])) 
            batch_labels = np.zeros((batch_size,5)) # batch_labels is the one hot representation of the output
            for folder in range(batch_size): # iterate over the batch_size
                frame_list = sorted(os.listdir(source_path+'/'+ t[folder + (batch*batch_size)].strip().split(';')[0])) # read all the images in the folder, sorted so that frames are in temporal order
                for idx,item in enumerate(img_idx): #  Iterate over the frames/images of a folder to read them in
                    image = cv2.imread(source_path+'/'+ t[folder + (batch*batch_size)].strip().split(';')[0]+'/'+frame_list[item]).astype(np.float32)
                    
                    # Resize the images (no cropping). Note that the images come in 2 different shapes
                    # and the conv3D will throw error if the inputs in a batch have different shapes.
                    # cv2.resize expects the target size as (width, height).
                    
                    image = cv2.resize(image,(input_shape[2],input_shape[1]),interpolation = cv2.INTER_AREA)   # Resizing the image
                    image = cv2.normalize(image, None, alpha=0, beta=1, norm_type=cv2.NORM_MINMAX) #Normalising the image pixel values
                    
                    
                    batch_data[folder,idx,:,:,0] = image[:,:,0]       #feed in the image
                    batch_data[folder,idx,:,:,1] = image[:,:,1]       #feed in the image 
                    batch_data[folder,idx,:,:,2] = image[:,:,2]       #feed in the image
                    
                    
                batch_labels[folder, int(t[folder + (batch*batch_size)].strip().split(';')[2])] = 1
                
            yield batch_data, batch_labels #yield the batch_data and the batch_labels

        
        # Code for the remaining data points which are left after the full batches
        remaining_datapoints = len(folder_list) % batch_size
        if remaining_datapoints != 0:
            offset = num_batches * batch_size   # index of the first video not covered by a full batch
            batch_data = np.zeros((remaining_datapoints,len(img_idx),input_shape[1],input_shape[2],input_shape[3])) 
            batch_labels = np.zeros((remaining_datapoints,5)) # batch_labels is the one hot representation of the output
            for folder in range(remaining_datapoints): # iterate over the remaining videos
                record = t[folder + offset].strip().split(';')
                frame_list = sorted(os.listdir(source_path+'/'+ record[0])) # read all the images in the folder, in temporal order
                for idx,item in enumerate(img_idx): #  Iterate over the frames/images of a folder to read them in
                    image = cv2.imread(source_path+'/'+ record[0]+'/'+frame_list[item]).astype(np.float32)

                    # Resize the images (no cropping). Note that the images come in 2 different shapes
                    # and the conv3D will throw error if the inputs in a batch have different shapes.
                    # cv2.resize expects the target size as (width, height).
                    image = cv2.resize(image,(input_shape[2],input_shape[1]),interpolation = cv2.INTER_AREA)  # Resizing the image
                    image = cv2.normalize(image, None, alpha=0, beta=1, norm_type=cv2.NORM_MINMAX) #Normalising the image pixel values

                    batch_data[folder,idx,:,:,0] = image[:,:,0]           #feed in the image
                    batch_data[folder,idx,:,:,1] = image[:,:,1]           #feed in the image
                    batch_data[folder,idx,:,:,2] = image[:,:,2]           #feed in the image

                batch_labels[folder, int(record[2])] = 1

            yield batch_data, batch_labels #yield the batch_data and the batch_labels

* We will try both the SGD and Adam optimizers.
* Since our target variable is categorical and our task is multiclass classification, we will use categorical_crossentropy as the loss function and categorical_accuracy as the metric for model evaluation.
* We will use the ModelCheckpoint callback of the tf.keras library to save the model/weights at intervals, so that they can be loaded later to continue training from the saved state.
    - To save disk space, we keep only the best weights by using the `save_best_only` parameter.
* We will also use the 'ReduceLROnPlateau' callback of Keras, which reduces the learning rate when a monitored metric has stopped improving for a specified number of epochs.
    - patience=2 means that the learning rate is reduced once the metric has not improved for 2 consecutive epochs.
    - factor is the multiplier applied to the learning rate on each reduction (new learning rate = learning rate * factor).
* We will monitor the validation loss in both callbacks.
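
The interplay of `patience` and `factor` described above can be sketched with a small simulation. This is a simplified stand-in written for illustration, not the Keras implementation: the function name `simulate_reduce_lr` and the loss values are assumptions, and cooldown is ignored.

```python
def simulate_reduce_lr(val_losses, lr=0.001, factor=0.5, patience=2,
                       min_delta=1e-4, min_lr=1e-5):
    """Return the learning rate in effect after each epoch."""
    best = float('inf')
    wait = 0
    history = []
    for loss in val_losses:
        if loss < best - min_delta:   # improvement: remember it and reset the counter
            best = loss
            wait = 0
        else:                         # no improvement this epoch
            wait += 1
            if wait >= patience and lr > min_lr:
                lr = max(lr * factor, min_lr)   # never go below min_lr
                wait = 0
        history.append(lr)
    return history

# Hypothetical validation losses, made up for illustration
losses = [1.0, 0.8, 0.85, 0.9, 0.7, 0.75, 0.76]
print(simulate_reduce_lr(losses))
```

In this made-up run the learning rate is halved twice: once after the two non-improving epochs that follow 0.8, and once after the two that follow 0.7.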

In [None]:
# Function for compiling the model and training:
def run(ModelName,batch_size,num_epochs,optimzr,sampling_type):
    curr_dt_time = datetime.datetime.now()
    if optimzr == 'adam':
        optimiser = tf.keras.optimizers.Adam()
    elif optimzr == 'sgd':
        optimiser = tf.keras.optimizers.SGD()
    else:
        raise ValueError("optimzr must be 'adam' or 'sgd'")
    
    ModelName.compile(optimizer=optimiser, loss='categorical_crossentropy', metrics=['categorical_accuracy'])
    ModelName.summary()   # summary() prints the table itself and returns None
    
    train_generator = generator(train_path, train_doc, batch_size,input_shape,sampling_type)
    val_generator = generator(val_path, val_doc, batch_size,input_shape,sampling_type)
    
    print('\n')
    
    model_name = ModelName._name + '_' + str(curr_dt_time).replace(' ','').replace(':','_') + '/'
    
    if not os.path.exists(model_name):
        os.mkdir(model_name)
        
    filepath = model_name + 'model-{epoch:05d}-{loss:.5f}-{categorical_accuracy:.5f}-{val_loss:.5f}-{val_categorical_accuracy:.5f}.h5'

    checkpoint = ModelCheckpoint(filepath, monitor='val_loss', verbose=1, save_best_only=True, save_weights_only=False, mode='auto')

    LR = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=2, verbose=1, mode='min', min_delta=0.0001, cooldown=0, min_lr=0.00001)
    
    callbacks_list = [checkpoint, LR]
    
    if (num_train_sequences%batch_size) == 0:
        steps_per_epoch = int(num_train_sequences/batch_size)
    else:
        steps_per_epoch = (num_train_sequences//batch_size) + 1

        
    if (num_val_sequences%batch_size) == 0:
        validation_steps = int(num_val_sequences/batch_size)
    else:
        validation_steps = (num_val_sequences//batch_size) + 1
    
    history = ModelName.fit(train_generator, steps_per_epoch=steps_per_epoch, epochs=num_epochs, verbose=1, 
                    callbacks=callbacks_list, validation_data=val_generator, validation_steps=validation_steps, class_weight=None, workers=1, initial_epoch=0)
    
    return(history)

In [None]:
# Plots train and validation accuracy and loss.
def visualize_performance(model_history):
    acc = model_history.history['categorical_accuracy']
    val_acc = model_history.history['val_categorical_accuracy']

    loss = model_history.history['loss']
    val_loss = model_history.history['val_loss']

    epochs_range = range(len(model_history.epoch))

    plt.figure(figsize=(15, 8))
    plt.subplot(1, 2, 1)
    plt.plot(epochs_range, acc, label='Training Accuracy')
    plt.plot(epochs_range, val_acc, label='Validation Accuracy')
    plt.legend(loc='lower right')
    plt.title('Training and Validation Accuracy')

    plt.subplot(1, 2, 2)
    plt.plot(epochs_range, loss, label='Training Loss')
    plt.plot(epochs_range, val_loss, label='Validation Loss')
    plt.legend(loc='upper right')
    plt.title('Training and Validation Loss')
    plt.show()

### Data exploration

In [None]:
read_path(False)

## Basic EDA

In [None]:
random_image = cv2.imread('./Project_data/train/WIN_20180907_15_35_09_Pro_Right Swipe_new/WIN_20180907_15_35_09_Pro_00012.png')
print('original shape is :',random_image.shape)
plt.imshow(cv2.cvtColor(random_image, cv2.COLOR_BGR2RGB))
plt.show()
resize_random_image = cv2.resize(random_image,(100,100),interpolation = cv2.INTER_AREA)
print('Shape after resizing: ',resize_random_image.shape)
plt.imshow(cv2.cvtColor(resize_random_image, cv2.COLOR_BGR2RGB))
plt.show()
gusassian_random = cv2.GaussianBlur(resize_random_image,(3,3),0) 
print('After adding gaussian blur:  ')
plt.imshow(cv2.cvtColor(gusassian_random, cv2.COLOR_BGR2RGB))
plt.show()
gusassian_random_flip = cv2.flip(gusassian_random,1)
print('After flipping an image:  ')
plt.imshow(cv2.cvtColor(gusassian_random_flip, cv2.COLOR_BGR2RGB))
plt.show()

## Model Creation

* Our data consists of videos and is therefore sequential in nature. We will use the following two types of architecture:
    * CNN + RNN architecture:
        - A video is simply a sequence of images. We can therefore pass this sequence of images through a 2D-CNN, which extracts a feature vector for each image (handling the spatial information).
        - We then pass the extracted features, i.e. the output of the CNN, to an RNN to handle the temporal information.
            - Since a GRU has fewer parameters to train, it is computationally more efficient than an LSTM (an LSTM has 4 times as many trainable parameters as a vanilla RNN, whereas a GRU has 3 times as many. Note: this is not exactly true for the Keras implementation of GRU, as it has twice the bias parameters.)
            - We will choose GRU for modelling.
        - Since our ultimate task is to classify among 5 gesture classes, we pass the output of the RNN to a softmax function.
    * 3D convolutional network:
        - A 3D-CNN is a natural extension of 2D convolutions. The only difference from a 2D-CNN is that in a 3D-CNN the filter moves in three directions instead of two, i.e. it also moves across the sequence of images.
        - Since our ultimate task is to classify among 5 gesture classes, we use a softmax function at the end.
        
* Initially, we will build simple models to choose among parameters such as the frame shape and the sampling scheme, and to tune hyperparameters such as the optimizer. In these first models our aim is to find the right combination of these parameters; only after selecting them will we focus on improving the performance of the model.
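
The parameter comparison between the recurrent cells mentioned above can be checked with a short calculation. This is a sketch under stated assumptions: the helper `recurrent_params` and the layer sizes are hypothetical, and the `bias_sets=2` case is meant to mirror the doubled bias of the Keras gated recurrent unit (GRU) noted above.

```python
def recurrent_params(input_dim, units, gates, bias_sets=1):
    """Trainable parameters of a recurrent layer with `gates` weight blocks.

    Each block has an input kernel (input_dim x units), a recurrent kernel
    (units x units) and `bias_sets` bias vectors of length `units`.
    """
    per_gate = input_dim * units + units * units + bias_sets * units
    return gates * per_gate

input_dim, units = 128, 64   # hypothetical sizes, chosen only for illustration

rnn = recurrent_params(input_dim, units, gates=1)
lstm = recurrent_params(input_dim, units, gates=4)
gru = recurrent_params(input_dim, units, gates=3)
gru_two_biases = recurrent_params(input_dim, units, gates=3, bias_sets=2)

print(rnn, lstm, gru, gru_two_biases)
```

The doubled bias adds only `3 * units` parameters, so the 4:3 ratio between LSTM and GRU remains a good approximation.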

## Choosing frame shape
* There are two frame sizes in our dataset: 360x360 and 120x160.
* If we upscaled all images to 360x360, the lower-resolution images would become heavily pixelated.
    * We therefore prefer downsampling over upsampling when resizing.
    * We will try two options for the image shape:
        - img_height x img_width = 64 x 64
        - img_height x img_width = 120 x 120
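
The effect of the frame shape on model size can be estimated before training. The sketch below assumes the architecture used for the Conv3D models in this notebook: four blocks of 'same'-padded convolution followed by 2x2x2 max pooling without padding, 256 channels in the last block, and a 512-unit dense layer. The helper names are hypothetical.

```python
def pooled_size(size, num_pools, pool=2):
    """Size of one dimension after `num_pools` non-overlapping pooling steps (no padding)."""
    for _ in range(num_pools):
        size = size // pool
    return size

def flattened_features(frames, height, width, channels=256, num_pools=4):
    """Number of values reaching the first dense layer after the pooling stack."""
    return (pooled_size(frames, num_pools) * pooled_size(height, num_pools)
            * pooled_size(width, num_pools) * channels)

for side in (64, 120):
    feats = flattened_features(frames=18, height=side, width=side)
    dense_params = feats * 512 + 512     # weights + biases of a 512-unit dense layer
    print(side, feats, dense_params)
```

Under these assumptions the larger frame roughly triples the input to the first dense layer, which is where most of the difference in parameter count comes from.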

#### MODEL 1

In [None]:
# Trying out image with img_height*img_width = 64*64

input_shape = (18, 64, 64, 3)
batch_size = 8
num_epochs = 15
augment = False
optimzr = 'adam'
sampling_type = 'custom'

In [None]:
read_path(augment)

In [None]:
model1 = Sequential()
model1.add(Conv3D(64, (3,3,3), strides=(1,1,1), padding='same', input_shape=input_shape))
model1.add(BatchNormalization())
model1.add(Activation('elu'))
model1.add(MaxPooling3D(pool_size=(2,2,2), strides=(2,2,2)))

model1.add(Conv3D(128, (3,3,3), strides=(1,1,1), padding='same'))
model1.add(BatchNormalization())
model1.add(Activation('elu'))
model1.add(MaxPooling3D(pool_size=(2,2,2), strides=(2,2,2)))

# model1.add(Dropout(0.25))

model1.add(Conv3D(256, (3,3,3), strides=(1,1,1), padding='same'))
model1.add(BatchNormalization())
model1.add(Activation('elu'))
model1.add(MaxPooling3D(pool_size=(2,2,2), strides=(2,2,2)))

# model1.add(Dropout(0.25))

model1.add(Conv3D(256, (3,3,3), strides=(1,1,1), padding='same'))
model1.add(BatchNormalization())
model1.add(Activation('elu'))
model1.add(MaxPooling3D(pool_size=(2,2,2), strides=(2,2,2)))

model1.add(Flatten())
model1.add(Dropout(0.4))
model1.add(Dense(512, activation='elu'))
model1.add(Dropout(0.4))
model1.add(Dense(5, activation='softmax'))

model1._name = 'model_1'

In [None]:
history1 = run(ModelName=model1,batch_size=batch_size,num_epochs=num_epochs,
                optimzr=optimzr,sampling_type=sampling_type)

In [None]:
visualize_performance(history1)

#### MODEL 2

In [None]:
# Trying out image with img_height*img_width = 120*120

input_shape = (18, 120, 120, 3)
batch_size = 8
num_epochs = 15
augment = False
optimzr = 'adam'
sampling_type = 'custom'

In [None]:
read_path(augment)

In [None]:
model2 = Sequential()
model2.add(Conv3D(64, (3,3,3), strides=(1,1,1), padding='same', input_shape=input_shape))
model2.add(BatchNormalization())
model2.add(Activation('elu'))
model2.add(MaxPooling3D(pool_size=(2,2,2), strides=(2,2,2)))

model2.add(Conv3D(128, (3,3,3), strides=(1,1,1), padding='same'))
model2.add(BatchNormalization())
model2.add(Activation('elu'))
model2.add(MaxPooling3D(pool_size=(2,2,2), strides=(2,2,2)))

# model2.add(Dropout(0.25))

model2.add(Conv3D(256, (3,3,3), strides=(1,1,1), padding='same'))
model2.add(BatchNormalization())
model2.add(Activation('elu'))
model2.add(MaxPooling3D(pool_size=(2,2,2), strides=(2,2,2)))

# model2.add(Dropout(0.25))

model2.add(Conv3D(256, (3,3,3), strides=(1,1,1), padding='same'))
model2.add(BatchNormalization())
model2.add(Activation('elu'))
model2.add(MaxPooling3D(pool_size=(2,2,2), strides=(2,2,2)))

model2.add(Flatten())
model2.add(Dropout(0.4))
model2.add(Dense(512, activation='elu'))
model2.add(Dropout(0.4))
model2.add(Dense(5, activation='softmax'))

model2._name = 'model_2'

In [None]:
history2 = run(ModelName=model2,batch_size=batch_size,num_epochs=num_epochs,
                optimzr=optimzr,sampling_type=sampling_type)

In [None]:
visualize_performance(history2)

## Conclusion
Model_1 gives slightly better results than model_2. Model_1 also has fewer total parameters, which makes it the lighter model.
Thus we will use img_height x img_width = 64 x 64.

## Choosing sample numbers
* Each video is a sequence of 30 images.
* But we can sample these images to save memory, reduce the computational power and runtime.
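* In our task it is reasonable to expect most of the useful information to lie in the middle timestamps of a video rather than at the very beginning or very end. We will therefore compare two sampling schemes: one that uses only the middle frames, and a custom selection that keeps the first and last few frames and takes every second frame in between.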
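
As a small sketch, the two index sets that correspond to the `'middle'` and `'custom'` sampling types of the generator in this notebook can be built and inspected as follows. The variable names are chosen here for illustration.

```python
# Index sets matching the two sampling types used by the generator
middle_idx = list(range(5, 25))                                    # frames 5..24
custom_idx = [0, 1, 2] + list(range(4, 26, 2)) + [26, 27, 28, 29]  # ends + every second frame

print(len(middle_idx), middle_idx[0], middle_idx[-1])
print(len(custom_idx), custom_idx)
```

The length of the chosen index set must match the first entry of `input_shape`, since it determines the number of frames per video in each batch.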

####  Model 3

In [None]:
# Using images with middle index/timestamp of a sequence (20 frames):

input_shape = (20, 64, 64, 3)
batch_size = 8
num_epochs = 15
augment = False
optimzr = 'adam'
sampling_type = 'middle'


In [None]:
read_path(augment)

In [None]:
model3 = Sequential()
model3.add(Conv3D(64, (3,3,3), strides=(1,1,1), padding='same', input_shape=input_shape))
model3.add(BatchNormalization())
model3.add(Activation('elu'))
model3.add(MaxPooling3D(pool_size=(2,2,2), strides=(2,2,2)))

model3.add(Conv3D(128, (3,3,3), strides=(1,1,1), padding='same'))
model3.add(BatchNormalization())
model3.add(Activation('elu'))
model3.add(MaxPooling3D(pool_size=(2,2,2), strides=(2,2,2)))

# model3.add(Dropout(0.25))

model3.add(Conv3D(256, (3,3,3), strides=(1,1,1), padding='same'))
model3.add(BatchNormalization())
model3.add(Activation('elu'))
model3.add(MaxPooling3D(pool_size=(2,2,2), strides=(2,2,2)))

# model3.add(Dropout(0.25))

model3.add(Conv3D(256, (3,3,3), strides=(1,1,1), padding='same'))
model3.add(BatchNormalization())
model3.add(Activation('elu'))
model3.add(MaxPooling3D(pool_size=(2,2,2), strides=(2,2,2)))

model3.add(Flatten())
model3.add(Dropout(0.4))
model3.add(Dense(512, activation='elu'))
model3.add(Dropout(0.4))
model3.add(Dense(5, activation='softmax'))

model3._name = 'model_3'

In [None]:
history3 = run(ModelName=model3,batch_size=batch_size,num_epochs=num_epochs,
                optimzr=optimzr,sampling_type=sampling_type)

In [None]:
visualize_performance(history3)

####  Model 4

In [None]:
# Using images with custom index/timestamp of a sequence (18 frames):

input_shape = (18, 64, 64, 3)
batch_size = 8
num_epochs = 15
augment = False
optimzr = 'adam'
sampling_type = 'custom'


In [None]:
read_path(augment)

In [None]:
model4 = Sequential()
model4.add(Conv3D(64, (3,3,3), strides=(1,1,1), padding='same', input_shape=input_shape))
model4.add(BatchNormalization())
model4.add(Activation('elu'))
model4.add(MaxPooling3D(pool_size=(2,2,2), strides=(2,2,2)))

model4.add(Conv3D(128, (3,3,3), strides=(1,1,1), padding='same'))
model4.add(BatchNormalization())
model4.add(Activation('elu'))
model4.add(MaxPooling3D(pool_size=(2,2,2), strides=(2,2,2)))

# model4.add(Dropout(0.25))

model4.add(Conv3D(256, (3,3,3), strides=(1,1,1), padding='same'))
model4.add(BatchNormalization())
model4.add(Activation('elu'))
model4.add(MaxPooling3D(pool_size=(2,2,2), strides=(2,2,2)))

# model4.add(Dropout(0.25))

model4.add(Conv3D(256, (3,3,3), strides=(1,1,1), padding='same'))
model4.add(BatchNormalization())
model4.add(Activation('elu'))
model4.add(MaxPooling3D(pool_size=(2,2,2), strides=(2,2,2)))

model4.add(Flatten())
model4.add(Dropout(0.4))
model4.add(Dense(512, activation='elu'))
model4.add(Dropout(0.4))
model4.add(Dense(5, activation='softmax'))

model4._name = 'model_4'

In [None]:
history4 = run(ModelName=model4,batch_size=batch_size,num_epochs=num_epochs,
                optimzr=optimzr,sampling_type=sampling_type)

In [None]:
visualize_performance(history4)

## Conclusion 
Custom image indexing gives comparatively better results, so we will use it for the remaining models.

## Choosing optimizer
* We will create a new model with the SGD optimizer and compare its results with those of the previous model.

#### Model 5

In [None]:
# Model 5 - Using Stochastic gradient descent algorithm:

input_shape = (18, 64, 64, 3)
batch_size = 8
num_epochs = 15
augment = False
optimzr = 'sgd'
sampling_type = 'custom'

In [None]:
read_path(augment)

In [None]:
model5 = Sequential()
model5.add(Conv3D(64, (3,3,3), strides=(1,1,1), padding='same', input_shape=input_shape))
model5.add(BatchNormalization())
model5.add(Activation('elu'))
model5.add(MaxPooling3D(pool_size=(2,2,2), strides=(2,2,2)))

model5.add(Conv3D(128, (3,3,3), strides=(1,1,1), padding='same'))
model5.add(BatchNormalization())
model5.add(Activation('elu'))
model5.add(MaxPooling3D(pool_size=(2,2,2), strides=(2,2,2)))

# model5.add(Dropout(0.25))

model5.add(Conv3D(256, (3,3,3), strides=(1,1,1), padding='same'))
model5.add(BatchNormalization())
model5.add(Activation('elu'))
model5.add(MaxPooling3D(pool_size=(2,2,2), strides=(2,2,2)))

# model5.add(Dropout(0.25))

model5.add(Conv3D(256, (3,3,3), strides=(1,1,1), padding='same'))
model5.add(BatchNormalization())
model5.add(Activation('elu'))
model5.add(MaxPooling3D(pool_size=(2,2,2), strides=(2,2,2)))

model5.add(Flatten())
model5.add(Dropout(0.4))
model5.add(Dense(512, activation='elu'))
model5.add(Dropout(0.4))
model5.add(Dense(5, activation='softmax'))

model5._name = 'model_5'

history5 = run(ModelName=model5,batch_size=batch_size,num_epochs=num_epochs,
                optimzr=optimzr,sampling_type=sampling_type)

In [None]:
visualize_performance(history5)

## Conclusion
Both optimizers give similar results. We will use Adam as the optimizer.

## Training Conv3D Model with higher epocs

In [None]:
# Model 6 - training with more epochs

input_shape = (18, 64, 64, 3)
batch_size = 8
num_epochs = 20
augment = False
optimzr = 'adam'
sampling_type = 'custom'

In [None]:
read_path(augment)

In [None]:
#Model 
model6 = Sequential()
model6.add(Conv3D(64, (3,3,3), strides=(1,1,1), padding='same', input_shape=input_shape))
model6.add(BatchNormalization())
model6.add(Activation('elu'))
model6.add(MaxPooling3D(pool_size=(2,2,2), strides=(2,2,2)))

model6.add(Conv3D(128, (3,3,3), strides=(1,1,1), padding='same'))
model6.add(BatchNormalization())
model6.add(Activation('elu'))
model6.add(MaxPooling3D(pool_size=(2,2,2), strides=(2,2,2)))

# model6.add(Dropout(0.25))

model6.add(Conv3D(256, (3,3,3), strides=(1,1,1), padding='same'))
model6.add(BatchNormalization())
model6.add(Activation('elu'))
model6.add(MaxPooling3D(pool_size=(2,2,2), strides=(2,2,2)))

# model6.add(Dropout(0.25))

model6.add(Conv3D(256, (3,3,3), strides=(1,1,1), padding='same'))
model6.add(BatchNormalization())
model6.add(Activation('elu'))
model6.add(MaxPooling3D(pool_size=(2,2,2), strides=(2,2,2)))

model6.add(Flatten())
model6.add(Dropout(0.4))
model6.add(Dense(512, activation='elu'))
model6.add(Dropout(0.4))
model6.add(Dense(5, activation='softmax'))

model6._name = 'model_6'

history6 = run(ModelName=model6,batch_size=batch_size,num_epochs=num_epochs,
                optimzr=optimzr,sampling_type=sampling_type)

In [None]:
visualize_performance(history6)

## Conclusion 
The model performs very well.

## Builing Conv3D Model with augmentation

In [None]:
# Model 7 - adam with data augmentation

input_shape = (18, 64, 64, 3)
batch_size = 8
num_epochs = 20
augment = True
optimzr = 'adam'
sampling_type = 'custom'

In [None]:
read_path(augment)

In [None]:
model7 = Sequential()
model7.add(Conv3D(64, (3,3,3), strides=(1,1,1), padding='same', input_shape=input_shape))
model7.add(BatchNormalization())
model7.add(Activation('elu'))
model7.add(MaxPooling3D(pool_size=(2,2,2), strides=(2,2,2)))

model7.add(Conv3D(128, (3,3,3), strides=(1,1,1), padding='same'))
model7.add(BatchNormalization())
model7.add(Activation('elu'))
model7.add(MaxPooling3D(pool_size=(2,2,2), strides=(2,2,2)))

# model7.add(Dropout(0.25))

model7.add(Conv3D(256, (3,3,3), strides=(1,1,1), padding='same'))
model7.add(BatchNormalization())
model7.add(Activation('elu'))
model7.add(MaxPooling3D(pool_size=(2,2,2), strides=(2,2,2)))

# model7.add(Dropout(0.25))

model7.add(Conv3D(256, (3,3,3), strides=(1,1,1), padding='same'))
model7.add(BatchNormalization())
model7.add(Activation('elu'))
model7.add(MaxPooling3D(pool_size=(2,2,2), strides=(2,2,2)))

model7.add(Flatten())
model7.add(Dropout(0.4))
model7.add(Dense(512, activation='elu'))
model7.add(Dropout(0.4))
model7.add(Dense(5, activation='softmax'))

model7._name = 'model_7'

history7 = run(ModelName=model7,batch_size=batch_size,num_epochs=num_epochs,
                optimzr=optimzr,sampling_type=sampling_type)

In [None]:
visualize_performance(history7)

## Conclusion
Model 7 performs well.

## Custom CNN + GRU Model

In [None]:
# Model 8 - CNN + GRU
input_shape = (18, 64, 64, 3)
batch_size = 8
num_epochs = 20
augment = True
optimzr = 'adam'
sampling_type = 'custom'

In [None]:
read_path(augment)

In [None]:
model_CnnGRU = Sequential()
model_CnnGRU.add(TimeDistributed(Conv2D(filters=16,kernel_size=(2,2),padding='same',activation="relu"),input_shape = input_shape))
model_CnnGRU.add(TimeDistributed(BatchNormalization()))
model_CnnGRU.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model_CnnGRU.add(TimeDistributed(Dropout(0.5)))
model_CnnGRU.add(TimeDistributed(Conv2D(filters=32,kernel_size=(2,2),padding='same',activation="relu")))
model_CnnGRU.add(TimeDistributed(BatchNormalization()))
model_CnnGRU.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model_CnnGRU.add(TimeDistributed(Dropout(0.5)))
model_CnnGRU.add(TimeDistributed(Conv2D(filters=64,kernel_size=(2,2),padding='same',activation="relu")))
model_CnnGRU.add(TimeDistributed(BatchNormalization()))
model_CnnGRU.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model_CnnGRU.add(TimeDistributed(Dropout(0.5)))
model_CnnGRU.add(TimeDistributed(Flatten()))
model_CnnGRU.add(GRU(64))
model_CnnGRU.add(Dropout(0.5))
model_CnnGRU.add(Dense(256,activation='relu'))
model_CnnGRU.add(Dropout(0.5))
model_CnnGRU.add(Dense(5,activation='softmax'))

model_CnnGRU._name = 'model_CNN_GRU'

history_CnnGRU = run(ModelName=model_CnnGRU,batch_size=batch_size,num_epochs=num_epochs,
                optimzr=optimzr,sampling_type=sampling_type)

In [None]:
visualize_performance(history_CnnGRU)

## Conclusion
Performance is not satisfactory.
We will use transfer learning to improve it.

## Resnet50 + GRU Model
* We will use transfer learning to improve performance.
* Many pretrained models, such as InceptionNet, VGGNet and GoogLeNet, offer high accuracy.
* However, there is a trade-off between model accuracy and efficiency, i.e. inference time and memory requirement.
* To keep the model light and efficient, we will use ResNet-50, as it has a smaller footprint.

In [None]:
# Model 9 - RESNET50 + GRU
input_shape = (18, 64, 64, 3)
batch_size = 8
num_epochs = 20
optimzr = 'adam'
sampling_type = 'custom'
augment = False

In [None]:
read_path(augment)

In [None]:
resnet = ResNet50(include_top=False,weights='imagenet',input_shape=input_shape[1:])
# Wrap the pretrained backbone so a small convolutional head can be stacked on top
cnn = Sequential([resnet])
cnn.add(Conv2D(16,(2,2),strides=(1,1),padding='same'))
cnn.add(BatchNormalization())
cnn.add(Dropout(0.6))
cnn.add(Conv2D(16,(3,3),strides=(1,1),padding='same'))
cnn.add(BatchNormalization())
cnn.add(Dropout(0.6))
cnn.add(Flatten())

model_Resnet50GRU = Sequential()
model_Resnet50GRU.add(TimeDistributed(cnn,input_shape=input_shape))
# Keras infers the GRU input shape from the TimeDistributed output
model_Resnet50GRU.add(GRU(32,return_sequences=True))
model_Resnet50GRU.add(GRU(16))
model_Resnet50GRU.add(Dropout(0.5))
model_Resnet50GRU.add(Dense(5,activation='softmax'))

model_Resnet50GRU._name = 'model_Resnet50_GRU'

history_Resnet50GRU = run(ModelName=model_Resnet50GRU,batch_size=batch_size,num_epochs=num_epochs,
                optimzr=optimzr,sampling_type=sampling_type)

In [None]:
visualize_performance(history_Resnet50GRU)

## Conclusion
Best weights are saved using checkpoints.
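
For reference, the sequence shape that reaches the first GRU layer in this model can be worked out by hand. The sketch below is a hypothetical helper, not part of the notebook. It assumes that ResNet50 with `include_top=False` reduces height and width by a factor of 32 (true for input sizes that are multiples of 32, such as 64) and that the last `Conv2D` in the head has 16 filters with `padding='same'`.

```python
# Hypothetical helper (not part of the notebook): width of the per-frame
# feature vector produced by the TimeDistributed CNN.

def per_frame_features(height, width, backbone_stride=32, head_filters=16):
    """Flattened feature count per frame after the backbone and the conv head."""
    h = height // backbone_stride   # 64 -> 2
    w = width // backbone_stride    # 64 -> 2
    # The 'same'-padded head convolutions keep h and w and set the channel count.
    return h * w * head_filters


input_shape = (18, 64, 64, 3)
features = per_frame_features(input_shape[1], input_shape[2])
gru_input_shape = (input_shape[0], features)
print(gru_input_shape)  # (18, 64)
```

Each clip therefore reaches the GRU as 18 time steps of 64 features, a shape Keras infers on its own from the preceding layer.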

## Resnet50 + GRU Model with augmentation

In [None]:
# Model 10 - RESNET50 + GRU with data augmentation
input_shape = (18, 64, 64, 3)
batch_size = 8
num_epochs = 15
optimzr = 'adam'
sampling_type = 'custom'
augment = True

In [None]:
read_path(augment)

In [None]:
resnet = ResNet50(include_top=False,weights='imagenet',input_shape=input_shape[1:])
# Wrap the pretrained backbone so a small convolutional head can be stacked on top
cnn = Sequential([resnet])
cnn.add(Conv2D(16,(2,2),strides=(1,1),padding='same'))
cnn.add(BatchNormalization())
cnn.add(Dropout(0.6))
cnn.add(Conv2D(16,(3,3),strides=(1,1),padding='same'))
cnn.add(BatchNormalization())
cnn.add(Dropout(0.6))
cnn.add(Flatten())

model_ResnetGRU_aug = Sequential()
model_ResnetGRU_aug.add(TimeDistributed(cnn,input_shape=input_shape))
# Keras infers the GRU input shape from the TimeDistributed output
model_ResnetGRU_aug.add(GRU(32,return_sequences=True))
model_ResnetGRU_aug.add(GRU(16))
model_ResnetGRU_aug.add(Dropout(0.5))
model_ResnetGRU_aug.add(Dense(5,activation='softmax'))

model_ResnetGRU_aug._name = 'model_Resnet_GRU_aug'

history_ResnetGRU_aug = run(ModelName=model_ResnetGRU_aug,batch_size=batch_size,num_epochs=num_epochs,
                optimzr=optimzr,sampling_type=sampling_type)

In [None]:
visualize_performance(history_ResnetGRU_aug)

## Conclusion
Best weights are saved using checkpoints.

## VGG16 + GRU with data augmentation

In [None]:
# Model 11 - VGG16 + GRU with data augmentation
input_shape = (18, 64, 64, 3)
batch_size = 8
num_epochs = 30
optimzr = 'adam'
sampling_type = 'custom'
augment = True

In [None]:
read_path(augment)

In [None]:
base_model = VGG16(include_top=False,weights='imagenet',input_shape=input_shape[1:])

for layer in base_model.layers:
    layer.trainable = False
        
base_model_output = base_model.output
x = Flatten()(base_model_output)
features_1 = Dense(128, activation='relu')(x)
# features_2 = Dropout(0.4)(features_1)
features_3 = Dense(64, activation='relu')(features_1)
features_4 = Dropout(0.4)(features_3)
init_model = Model(inputs=base_model.input, outputs=features_4)

model_Vgg16GRU = Sequential()
model_Vgg16GRU.add(TimeDistributed(init_model, input_shape=input_shape))
model_Vgg16GRU.add(GRU(32,return_sequences=True))
model_Vgg16GRU.add(GRU(16))
model_Vgg16GRU.add(Dropout(0.1))
model_Vgg16GRU.add(Dense(32, activation='relu'))
model_Vgg16GRU.add(Dropout(0.5))
model_Vgg16GRU.add(Dense(5,activation='softmax'))

model_Vgg16GRU._name = 'model_Vgg16_GRU'

history_Vgg16GRU = run(ModelName=model_Vgg16GRU,batch_size=batch_size,num_epochs=num_epochs,
                optimzr=optimzr,sampling_type=sampling_type)

In [None]:
visualize_performance(history_Vgg16GRU)

## Conclusion
Best weights are saved using checkpoints.
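
Because every VGG16 layer is frozen (`layer.trainable = False`), only the dense head and the recurrent layers are updated during training. The sketch below estimates how many weights that is. The helper functions are hypothetical and not part of the notebook; the GRU formula assumes the TensorFlow 2 Keras default `reset_after=True`, and the frozen-backbone figure is the parameter count Keras reports for VGG16 with `include_top=False`.

```python
# Hypothetical helpers (not part of the notebook): count trainable weights
# in the VGG16 + GRU model when the VGG16 backbone is frozen.

def dense_params(n_in, n_out):
    """Weights plus biases of a Dense layer."""
    return n_in * n_out + n_out


def gru_params(n_in, units):
    """Keras GRU with reset_after=True: three gates, each with an input
    kernel, a recurrent kernel and two bias vectors."""
    return 3 * (n_in * units + units * units + 2 * units)


# VGG16 without its top halves the spatial size five times (64 -> 2)
# and ends with 512 channels.
vgg_features = (64 // 32) * (64 // 32) * 512  # 2048 after Flatten()

trainable = {
    "Dense(128)": dense_params(vgg_features, 128),
    "Dense(64)": dense_params(128, 64),
    "GRU(32)": gru_params(64, 32),
    "GRU(16)": gru_params(32, 16),
    "Dense(32)": dense_params(16, 32),
    "Dense(5)": dense_params(32, 5),
}
total_trainable = sum(trainable.values())

# Assumption: parameter count of VGG16 with include_top=False.
frozen_backbone = 14_714_688

for name, count in trainable.items():
    print(f"{name:>10}: {count:,}")
print("trainable total:", f"{total_trainable:,}")  # 283,045
print("trainable share:", total_trainable / (total_trainable + frozen_backbone))
```

Under these assumptions roughly 2% of the weights are trained, and most of them sit in the first `Dense(128)` layer that follows `Flatten()`.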

## Final Model Selection
We can go with either the Conv3D model or the GRU models trained using transfer learning, since these give better results.
Any of the saved weights that do not overfit and give good accuracy on both the training and validation sets can be used.
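
One way to make this selection rule concrete is to scan a training history for the epoch with the highest validation accuracy among those where training accuracy does not run too far ahead. The sketch below is a hypothetical helper, not part of the notebook. It assumes the history is a dict like Keras' `History.history`; the metric names and the 0.10 gap threshold are illustrative choices, and the demo numbers are made up rather than taken from this notebook's runs.

```python
# Hypothetical helper (not part of the notebook): pick the best epoch
# that does not look overfitted.

def pick_best_epoch(history, train_key="categorical_accuracy",
                    val_key="val_categorical_accuracy", max_gap=0.10):
    """Return (epoch_index, val_accuracy) for the best non-overfitting epoch,
    or None if every epoch exceeds the allowed train/validation gap."""
    best = None
    for epoch, (train_acc, val_acc) in enumerate(
            zip(history[train_key], history[val_key])):
        if train_acc - val_acc > max_gap:
            continue  # training accuracy is too far ahead: likely overfitting
        if best is None or val_acc > best[1]:
            best = (epoch, val_acc)
    return best


# Made-up numbers for illustration only, not results from this notebook.
demo_history = {
    "categorical_accuracy":     [0.40, 0.62, 0.78, 0.90, 0.98],
    "val_categorical_accuracy": [0.38, 0.60, 0.74, 0.79, 0.80],
}
print(pick_best_epoch(demo_history))  # (2, 0.74)
```

In the demo, the last two epochs have higher validation accuracy but are skipped because the gap to training accuracy exceeds the threshold. The returned index is zero-based, so it would need to be mapped to the matching checkpoint file name.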