# Gesture Recognition
To build a 3D Conv model that will be able to predict the 5 gestures correctly. 

### <font color='cyan'> Sections in this notebook: </font>
I. Prerequisites
    
    I.1. Importing all the necessary modules
    I.2. Shuffle the data
    
II. Custom Generator

III. Model Deployment

    III.1. Custom Conv3d Model
    III.2. Train and Val Generators
    III.3. Few more setup related steps 


    

# <font color='goldenrod'> I. Prerequisites </font>

### <font color='skyblue'>  I.1. Importing all the necessary modules </font> 

In [1]:
import numpy as np
import os
import cv2
from cv2 import imread, resize
import matplotlib.pyplot as plt
import random as rn
from keras import backend as K
import tensorflow as tf
import datetime
import os

In [2]:
from tensorflow import keras
from tensorflow.keras import layers, models
from keras.models import Sequential, Model
from keras.layers import Dense, GRU, Flatten, TimeDistributed, Flatten, BatchNormalization, Activation, Dropout
from keras.layers.convolutional import Conv3D, MaxPooling3D
from keras.callbacks import ModelCheckpoint, ReduceLROnPlateau
from keras import optimizers


In [3]:
tf.config.list_physical_devices('GPU')

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

$\Rightarrow$ <font color="asparagus"> We set the random seed so that the results don't vary drastically. </font> 

In [4]:
np.random.seed(30)
rn.seed(30)
tf.random.set_seed(30)

### <font color='skyblue'>  I.2. Shuffle the data </font> 

$\Rightarrow$ <font color="asparagus"> Read all the lines in the csv and randomly permute them. </font>

$\Rightarrow$ <font color="red "> TODO: REMOVE THIS COMMENT !! </font> <br>In this block, you read the folder names for training and validation. You also set the `batch_size` here. Note that you set the batch size in such a way that you are able to use the GPU in full capacity. You keep increasing the batch size until the machine throws an error.

In [5]:
train_path = r"D:\DDownloads\UpGrad\NeuralNetwork\CaseStudy\Project_data\train"
val_path = r"D:\DDownloads\UpGrad\NeuralNetwork\CaseStudy\Project_data\val"

In [6]:
trainCSV = r"D:\DDownloads\UpGrad\NeuralNetwork\CaseStudy\Project_data\train.csv"
valCSV = r"D:\DDownloads\UpGrad\NeuralNetwork\CaseStudy\Project_data\val.csv"

In [7]:
train_doc = np.random.permutation(open(trainCSV).readlines())
val_doc = np.random.permutation(open(valCSV).readlines())


In [8]:
num_train_sequences = len(train_doc)
print('# training sequences =', num_train_sequences)

# training sequences = 663


In [9]:
num_val_sequences = len(val_doc)
print('# validation sequences =', num_val_sequences)

# validation sequences = 100


$\Rightarrow$ <font color="asparagus"> Set the batch size. </font>

In [10]:
batch_size = 10

$\Rightarrow$ <font color="red "> TODO: REMOVE THIS SECTION... Used for debugging cv2 </font>

In [11]:
img_name = os.listdir(train_path + "\\" + train_doc[0].split(';')[0])[0]

In [12]:
img_name

'WIN_20180926_16_54_08_Pro_00006.png'

In [13]:
x = imread(train_path + "\\" + train_doc[0].split(';')[0] + "\\" + img_name)

In [14]:
x.shape[0], x.shape[1]

(120, 160)

In [15]:
dim = (224, 224)

In [16]:
resized_img = resize(x, dim, interpolation = cv2.INTER_AREA)

In [17]:
#cv2.imshow("img", resized_img)
#cv2.waitKey(0)

In [18]:
#cv2.imshow("Orig", x)
#cv2.waitKey(0)

In [19]:
b ,g, r = cv2.split(resized_img)

$\Rightarrow$ <font color="red "> TODO: Up to this part</font>

# <font color='goldenrod'> II. Custom Generator </font>

$\Rightarrow$ <font color="asparagus"> Class Names/Labels:  </font> <br>
- Left to Right : 0 <br>
- Right to Left: 1<br>
- Stop: 2<br>
- Thumbs down: 3<br>
- Thums up: 4

In [20]:
class Generator:
    width = None 
    height = None 
    numChannels = 3
    
    source_path = None
    vectorList = None
    batch_size = None
    frameIdxList = None
    numFramesInVideo = None
    numVideso = None
    def __init__(self,
                 folder_list,
                 imgIdxList,
                 width=224,
                 height=224,
                 source_path=r"D:\DDownloads\UpGrad\NeuralNetwork\CaseStudy\Project_data\train",
                 batch_size=75):
        self.vectorList = np.random.permutation(folder_list) # Shuffle the data and store in a list
        #print(self.vectorList)
        self.frameIdxList = imgIdxList
        self.numFramesInVideo = len(imgIdxList)
        self.numVideos = len(folder_list)
        self.source_path = source_path
        self.batch_size = batch_size
        self.width = width
        self.height = height
        self.numOfBatches = self.numVideos // self.batch_size
        
    # Loop through current batch size --> get one folder at a time -->
    # loop through each image in a folder --> preprocess --> One hot encode the label --> yield
    def __getBatchData(self, batch, curr_batch_size):
        batch_data = np.zeros((batch_size, self.numFramesInVideo, 
                               self.width, self.height, self.numChannels)) 
        # batch_labels is the one hot representation of the output
        batch_labels = np.zeros((batch_size, 5))
        for folderIdx in range(curr_batch_size):
             # Get vector/folder name
            ## Turn this on for debugging
            #print(folderIdx + (batch*batch_size))
            vectorName = self.vectorList[folderIdx + (batch*self.batch_size)].strip().split(';')[0]
            #print(vectorName)
            imgs = os.listdir(self.source_path+'/'+ vectorName)
            # Iterate iver the frames/images of a folder to read them in
            for idx,item in enumerate(self.frameIdxList):
                # Get the image in float32 
                image = imread(self.source_path+'/'+ vectorName +'/'+imgs[item]).astype(np.float32)
                # Resize
                resized_img = resize(image, (self.width, self.height), interpolation = cv2.INTER_AREA)
                # Normalize
                resized_img = resized_img / 255.0
                #crop the images ## TO DO, we are resizing for now
                channels = cv2.split(resized_img) # b g r
                batch_data[folderIdx,idx,:,:,0] = channels[0]
                batch_data[folderIdx,idx,:,:,1] = channels[1]
                batch_data[folderIdx,idx,:,:,2] = channels[2]
            # One hot encoding
            batch_labels[folderIdx, int(self.vectorList[folderIdx + (batch*batch_size)].strip().split(';')[2])] = 1
        return batch_data, batch_labels
    
    # Public method, call this to get generator object
    def generator(self):
        while True:
            for batch in range(self.numOfBatches):
                batch_data, batch_labels = self.__getBatchData(batch, self.batch_size)
                yield batch_data, batch_labels
            # For the remaining data points which are left after full batches
            batch += 1
            rem_batch_size = self.numVideos % self.batch_size
            batch_data, batch_labels = self.__getBatchData(batch, rem_batch_size)
            yield batch_data, batch_labels 

$\Rightarrow$ <font color="red "> TODO: REMOVE THIS COMMENT !! </font> <br> Note here that a video is represented above in the generator as (number of images, height, width, number of channels). Take this into consideration while creating the model architecture.

$\Rightarrow$ <font color="asparagus"> Some global constants

In [21]:
imgIdxList = list(range(0,30))
# (width, height) is the final size of the input images 
# numChannels = 3 (RGB)
width = 224
height = 224
numChannels = 3

$\Rightarrow$ <font color="asparagus"> For testing

In [22]:
#batch_size = 3

In [23]:
train_doc[:4]

array(['WIN_20180926_16_54_08_Pro_Right_Swipe_new;Right_Swipe_new;1\n',
       'WIN_20180925_18_02_58_Pro_Thumbs_Down_new;Thumbs_Down_new;3\n',
       'WIN_20180925_17_33_08_Pro_Left_Swipe_new;Left_Swipe_new;0\n',
       'WIN_20180925_17_51_17_Pro_Thumbs_Up_new;Thumbs_Up_new;4\n'],
      dtype='<U88')

In [24]:
gen = Generator(folder_list=train_doc, 
                      imgIdxList=imgIdxList, 
                      width=width, 
                      height=height, 
                      source_path=train_path, batch_size=batch_size)

In [25]:
train_generator = gen.generator() # Create generator Class' instance

In [26]:
batch_data, batch_label =  next(train_generator)

In [27]:
batch_data.shape # VideoIdx, FrameIdxInVideo, width, height, numChannels

(10, 30, 224, 224, 3)

In [28]:
batch_data[0].shape

(30, 224, 224, 3)

In [29]:
batch_data[0][0].shape

(224, 224, 3)

$\Rightarrow$ <font color="asparagus"> Change the first index to 0, 1, 2, .., (batch_size -1) to view the image. </font>

In [30]:
#cv2.imshow("First", batch_data[2][0])
#cv2.waitKey(0)

In [31]:
batch_data, batch_label =  next(train_generator)

In [32]:
#cv2.imshow("First", batch_data[0][0])
#cv2.waitKey(0)

$\Rightarrow$ <font color="asparagus"> If you try to see batch_data[1][0], it should be all zeros, since batch size is 3, number of videos = 4. The second next(train_generator) statement will generate only one video, the other two will tensors will be all zeros because of: <br> batch_data = np.zeros((batch_size, self.numFramesInVideo, self.width, self.height, self.numChannels)) </font>

# <font color='goldenrod'> III. Model Deployment </font>

$\Rightarrow$ <font color="red "> TODO: REMOVE THIS COMMENT !! </font> <br> 
Here you make the model using different functionalities that Keras provides. Remember to use `Conv3D` and `MaxPooling3D` and not `Conv2D` and `Maxpooling2D` for a 3D convolution model. You would want to use `TimeDistributed` while building a Conv2D + RNN model. Also remember that the last layer is the softmax. Design the network in such a way that the model is able to give good accuracy on the least number of parameters so that it can fit in the memory of the webcam.

In [33]:
# output layer parameters
n_output          =  5 # number of classes in case of classification, 1 in case of regression
output_activation =  "softmax"# "softmax" or "sigmoid" in case of classification, "linear" in case of regression

In [34]:
numFeaturesInFirstLayer = 16

### <font color='skyblue'>  III.1. Custom Conv3d model </font> 

$\Rightarrow$ <font color="asparagus"> Utility to get conv3d model </font>

In [35]:
def get_conv3d_model(numFeaturesInFirstLayer, numFrames, output_activation,
                     width=224, height=224, numChannels=3,
                     numClasses=5, numNeuronsInDenseLayer=256):
    #Model
    model=models.Sequential()

    # Convolution layer with 64 features, 3x3 filter and relu activation with 2x2 pooling
    model.add(layers.Conv3D(numFeaturesInFirstLayer,(3,3,3),padding = 'same',activation='relu', 
                            input_shape=(numFrames,width,height,numChannels)))
    model.add(layers.MaxPooling3D())

    # Convolution layer with 128 features, 3x3 filter and relu activation with 2x2 pooling
    model.add(layers.Conv3D((numFeaturesInFirstLayer*2),(3,3,3),padding = 'same',activation='relu'))
    model.add(layers.MaxPooling3D())

    # Convolution layer with 128 features, 3x3 filter and relu activation with 2x2 pooling
    model.add(layers.Conv3D((numFeaturesInFirstLayer*4),(3,3,3),padding = 'same',activation='relu'))
    model.add(layers.MaxPooling3D())
    
    # Convolution layer with 128 features, 3x3 filter and relu activation with 2x2 pooling
    model.add(layers.Conv3D((numFeaturesInFirstLayer*8),(3,3,3),padding = 'same',activation='relu'))
    model.add(layers.MaxPooling3D())
    
    model.add(layers.Flatten())
    model.add(layers.Dense(numNeuronsInDenseLayer,activation='relu'))
    model.add(Dropout(0.4))
    model.add(layers.Dense(numClasses,activation=output_activation))
    return model

$\Rightarrow$ <font color="asparagus"> Some Constants </font>

In [36]:
numFeaturesInFirstLayer = 16
numFrames = len(imgIdxList)
model = get_conv3d_model(numFeaturesInFirstLayer, numFrames, output_activation=output_activation)

$\Rightarrow$ <font color="asparagus"> Compile model </font>

In [37]:
optimiser = "adam"
model.compile(optimizer=optimiser, loss='categorical_crossentropy', metrics=['categorical_accuracy'])
print (model.summary())

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv3d (Conv3D)             (None, 30, 224, 224, 16)  1312      
                                                                 
 max_pooling3d (MaxPooling3D  (None, 15, 112, 112, 16)  0        
 )                                                               
                                                                 
 conv3d_1 (Conv3D)           (None, 15, 112, 112, 32)  13856     
                                                                 
 max_pooling3d_1 (MaxPooling  (None, 7, 56, 56, 32)    0         
 3D)                                                             
                                                                 
 conv3d_2 (Conv3D)           (None, 7, 56, 56, 64)     55360     
                                                                 
 max_pooling3d_2 (MaxPooling  (None, 3, 28, 28, 64)    0

### <font color='skyblue'>  III.2. Train and Val Generators </font> 

In [38]:
gen = Generator(folder_list=train_doc, 
                      imgIdxList=imgIdxList, 
                      width=width, 
                      height=height, 
                      source_path=train_path, batch_size=batch_size)

In [39]:
train_generator = gen.generator() # Get generator object

In [40]:
val_gen_obj = Generator(folder_list=val_doc,
                        imgIdxList=imgIdxList, 
                        width=width, 
                        height=height, 
                        source_path=val_path, 
                        batch_size=batch_size)
val_generator = val_gen_obj.generator()

### <font color='skyblue'>  III.3. Few more setup related steps </font> 

$\Rightarrow$ <font color="asparagus"> Creating some callbacks</font>

In [41]:
curr_dt_time = datetime.datetime.now()

In [42]:
model_name = 'model_init' + '_' + str(curr_dt_time).replace(' ','').replace(':','_') + '/'
    
if not os.path.exists(model_name):
    os.mkdir(model_name)
        
filepath = model_name + 'model-{epoch:05d}-{loss:.5f}-{categorical_accuracy:.5f}-{val_loss:.5f}-{val_categorical_accuracy:.5f}.h5'

checkpoint = ModelCheckpoint(filepath, monitor='val_loss', verbose=1, save_best_only=False, save_weights_only=False, mode='auto', period=1)

LR = ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=5, min_lr=0.001)
callbacks_list = [checkpoint, LR]



$\Rightarrow$ <font color="asparagus"> The `steps_per_epoch` and `validation_steps` are used by `fit_generator` to decide the number of next() calls it need to make. </font>

In [43]:
if (num_train_sequences%batch_size) == 0:
    steps_per_epoch = int(num_train_sequences/batch_size)
else:
    steps_per_epoch = (num_train_sequences//batch_size) + 1

if (num_val_sequences%batch_size) == 0:
    validation_steps = int(num_val_sequences/batch_size)
else:
    validation_steps = (num_val_sequences//batch_size) + 1

In [44]:
num_epochs = 10

print ('# epochs =', num_epochs)

# epochs = 10


#### III.3.1. Model analysis

In [45]:
history = model.fit(train_generator, 
          steps_per_epoch=steps_per_epoch, 
          epochs=num_epochs, 
          verbose=1,
          callbacks=callbacks_list, 
          validation_data=val_generator,
          validation_steps=validation_steps, 
          class_weight=None, 
          workers=1, initial_epoch=0)


  model.fit_generator(train_generator, steps_per_epoch=steps_per_epoch, epochs=num_epochs, verbose=1,


Epoch 1/10
Epoch 1: saving model to model_init_2022-12-2800_21_56.575700\model-00001-1.58484-0.23134-1.40353-0.48000.h5
Epoch 2/10
Epoch 2: saving model to model_init_2022-12-2800_21_56.575700\model-00002-1.21942-0.47761-0.95839-0.65000.h5
Epoch 3/10
Epoch 3: saving model to model_init_2022-12-2800_21_56.575700\model-00003-0.84276-0.61642-0.79920-0.71000.h5
Epoch 4/10
Epoch 4: saving model to model_init_2022-12-2800_21_56.575700\model-00004-0.63910-0.74179-0.67916-0.82000.h5
Epoch 5/10
Epoch 5: saving model to model_init_2022-12-2800_21_56.575700\model-00005-0.49023-0.82537-0.71483-0.81000.h5
Epoch 6/10
Epoch 6: saving model to model_init_2022-12-2800_21_56.575700\model-00006-0.38508-0.84478-0.57779-0.80000.h5
Epoch 7/10
Epoch 7: saving model to model_init_2022-12-2800_21_56.575700\model-00007-0.26253-0.91194-0.73061-0.77000.h5
Epoch 8/10
Epoch 8: saving model to model_init_2022-12-2800_21_56.575700\model-00008-0.20263-0.91791-0.66000-0.81000.h5
Epoch 9/10
Epoch 9: saving model to mode

<keras.callbacks.History at 0x1c29ea079a0>

$\Rightarrow$ <font color="red"> TODO!! Plot history

### <font color='skyblue'>  III.4. CNN + RNN </font> 

In [48]:
from keras.applications.vgg16 import VGG16

#### III.4.1. Feature Extraction: CNN


$\Rightarrow$ <font color="asparagus"> Base Model: VGG16 </font>

In [85]:
# Base Model
# I don't want the fully connected layers on the right; I'll add my own: include_top = False
baseModel = (VGG16(weights="imagenet", include_top=False, input_shape=(width, height, numChannels)))
baseModel.trainable = False ## Don't start training all over again

In [86]:
baseModel.summary()

Model: "vgg16"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_4 (InputLayer)        [(None, 224, 224, 3)]     0         
                                                                 
 block1_conv1 (Conv2D)       (None, 224, 224, 64)      1792      
                                                                 
 block1_conv2 (Conv2D)       (None, 224, 224, 64)      36928     
                                                                 
 block1_pool (MaxPooling2D)  (None, 112, 112, 64)      0         
                                                                 
 block2_conv1 (Conv2D)       (None, 112, 112, 128)     73856     
                                                                 
 block2_conv2 (Conv2D)       (None, 112, 112, 128)     147584    
                                                                 
 block2_pool (MaxPooling2D)  (None, 56, 56, 128)       0     

$\Rightarrow$ <font color="asparagus"> Add dense layers and ouput layer with 5 classes </font>

In [70]:
flattenLayer = layers.Flatten()
denseLayer = layers.Dense(256, activation="relu")
denseLayer = layers.Dense(512, activation="relu")
outputLayer = layers.Dense(5, activation="softmax")

$\Rightarrow$ <font color="asparagus"> Train and Val Generator objects

In [89]:
gen = Generator(folder_list=train_doc, 
                      imgIdxList=imgIdxList, 
                      width=width, 
                      height=height, 
                      source_path=train_path, batch_size=batch_size)

In [90]:
train_generator = gen.generator() # Get train generator object
val_generator = val_gen_obj.generator() # Get val generator obj

$\Rightarrow$ <font color="asparagus"> Compile the model

In [66]:
optimiser = "adam"
baseModel.compile(optimizer=optimiser, loss='categorical_crossentropy', metrics=['categorical_accuracy'])
print (model.summary())

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 vgg16 (Functional)          (None, 7, 7, 512)         14714688  
                                                                 
Total params: 14,714,688
Trainable params: 0
Non-trainable params: 14,714,688
_________________________________________________________________
None


### Experiments

In [109]:
baseModel.summary()

Model: "vgg16"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_4 (InputLayer)        [(None, 224, 224, 3)]     0         
                                                                 
 block1_conv1 (Conv2D)       (None, 224, 224, 64)      1792      
                                                                 
 block1_conv2 (Conv2D)       (None, 224, 224, 64)      36928     
                                                                 
 block1_pool (MaxPooling2D)  (None, 112, 112, 64)      0         
                                                                 
 block2_conv1 (Conv2D)       (None, 112, 112, 128)     73856     
                                                                 
 block2_conv2 (Conv2D)       (None, 112, 112, 128)     147584    
                                                                 
 block2_pool (MaxPooling2D)  (None, 56, 56, 128)       0     

In [81]:
from keras.preprocessing.image import ImageDataGenerator

In [103]:
# Create the ImageDataGenerator object
datagen = ImageDataGenerator()

In [114]:
# Create the generator object using the flow_from_directory method
generator = datagen.flow_from_directory(train_path,
                                        target_size=(224, 224),
                                        batch_size=batch_size,
                                        class_mode=None,  # Set class_mode to None since we are not using labels
                                        shuffle=False)  # Set shuffle to False to preserve the order of the frames

Found 19890 images belonging to 663 classes.


In [112]:
out = next(generator)

In [121]:
cv2.imshow("img", out[2])
cv2.waitKey(0)

-1

In [123]:
for i in range(10):
    cv2.imshow("img", out[i])
    cv2.waitKey(0)

In [115]:
features = baseModel.predict_generator(generator)

TypeError: Model.predict_generator() got an unexpected keyword argument 'batch_size'

TypeError: 'numpy.ndarray' object is not an iterator