# Gesture Recognition
#### Submitted by
- Sameer Soin
- Ayush Mandowara

## Problem Statement

A smart TV manufacturer wants to add gesture-based controls to its TVs.

To start with, the TV should understand the following 5 gestures:
- Thumbs Up to increase volume
- Thumbs Down to decrease volume
- Left Swipe to move 10 seconds back
- Right Swipe to move 10 seconds ahead
- Open Palm (Stop) to pause

The manufacturer already has the hardware and software to capture gestures and act on them, so our focus will be on `Recognising the Gestures`.

## Data
- The training data consists of image sequences (videos already broken down into frames) of various individuals performing the hand gestures listed above.
- The data is labelled with the classes (gestures) that need to be identified.

## Approach
To do this, we will use `Deep Learning`. Specifically, we will try two approaches:
- Approach 1: 3D CNN Model  
- Approach 2: A CNN + RNN Model


---

# Imports

In [119]:
import cv2
import datetime
import numpy as np
import os
import pandas as pd
import random as rn

from keras import backend as K
import tensorflow as tf
from cv2 import imread
from sklearn.utils import shuffle

import matplotlib.pyplot as plt

%matplotlib inline

In [109]:
# setting up logger to enable / disable debug statements quickly.

import sys
import logging
from importlib import reload
reload(logging)

logging.basicConfig(stream=sys.stdout, format='',
                level=logging.INFO, datefmt=None)
log = logging.getLogger(__name__)
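
As a minimal sketch of how such a logger can be switched between quiet and verbose output, the snippet below toggles the level of a separate logger. The name `gesture_demo` is hypothetical and is used only so the example does not interfere with the notebook's own `log`.

```python
import logging
import sys

# Hypothetical logger name, used only for this illustration.
demo_log = logging.getLogger("gesture_demo")
demo_log.addHandler(logging.StreamHandler(sys.stdout))

# INFO level: debug statements are suppressed.
demo_log.setLevel(logging.INFO)
debug_enabled_at_info = demo_log.isEnabledFor(logging.DEBUG)

# DEBUG level: debug statements are emitted as well.
demo_log.setLevel(logging.DEBUG)
debug_enabled_at_debug = demo_log.isEnabledFor(logging.DEBUG)

demo_log.debug("this line is only printed at DEBUG level")
```

In the notebook itself, the same effect would presumably come from changing `level=logging.INFO` to `level=logging.DEBUG` in the `basicConfig` call.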

In [64]:
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 6007806666564665693
xla_global_id: -1
]


### Fixed Random Seeds
- This helps in reproducing results in subsequent runs

In [5]:
RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)
rn.seed(RANDOM_SEED)
tf.random.set_seed(RANDOM_SEED)
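
The effect of a fixed seed can be illustrated with the standard library alone. This is a small sketch, not part of the original pipeline: `draw_sample` is a hypothetical helper that seeds a private generator and draws a few integers.

```python
import random

SEED = 42  # same value as RANDOM_SEED in the notebook

def draw_sample(seed, n=5):
    """Seed a private generator and draw n integers from it."""
    rng = random.Random(seed)
    return [rng.randint(0, 99) for _ in range(n)]

first_run = draw_sample(SEED)
second_run = draw_sample(SEED)

# Both runs start from the same seed, so they produce the same sequence.
print(first_run == second_run)
```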

## Reading the Data
- The data is labelled
- The file paths along with labels are stored in csv files
- Data is already divided into train and validation folders 

In [6]:
train_doc = np.random.permutation(open('Project_data/train.csv').readlines())
val_doc = np.random.permutation(open('Project_data/val.csv').readlines())

project_root = "Project_data"
train_folder = os.path.join(project_root, "train")
val_folder = os.path.join(project_root, "val")

In [7]:
train_doc[0]

'WIN_20180925_18_23_57_Pro_Thumbs_Down_new;Thumbs_Down_new;3\n'
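
Each line of the csv holds three `;`-separated fields: the video folder, the gesture name and the numeric label. A minimal sketch of splitting one such line by hand is shown below; `parse_row` is a hypothetical helper, and the notebook itself uses `pd.read_csv` for this.

```python
def parse_row(line):
    """Split one 'folder;gesture;label' line into typed fields."""
    folder, gesture, label = line.strip().split(";")
    return folder, gesture, int(label)

sample = 'WIN_20180925_18_23_57_Pro_Thumbs_Down_new;Thumbs_Down_new;3\n'
folder, gesture, label = parse_row(sample)
print(folder, gesture, label)
```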

In [8]:
len(train_doc)

663

In [9]:
train_df = pd.read_csv('Project_data/train.csv', delimiter=';', names=['Video Folder', 'Gesture', 'Label'])

In [10]:
len(train_df)

663

In [11]:
train_df.head(3)

| | Video Folder | Gesture | Label |
|---|---|---|---|
| 0 | WIN_20180925_17_08_43_Pro_Left_Swipe_new | Left_Swipe_new | 0 |
| 1 | WIN_20180925_17_18_28_Pro_Left_Swipe_new | Left_Swipe_new | 0 |
| 2 | WIN_20180925_17_18_56_Pro_Left_Swipe_new | Left_Swipe_new | 0 |


In [12]:
train_df.tail(3)

| | Video Folder | Gesture | Label |
|---|---|---|---|
| 660 | WIN_20180907_16_42_05_Pro_Thumbs Up_new | Thumbs Up_new | 4 |
| 661 | WIN_20180907_16_42_55_Pro_Thumbs Up_new | Thumbs Up_new | 4 |
| 662 | WIN_20180907_16_43_39_Pro_Thumbs Up_new | Thumbs Up_new | 4 |


In [13]:
val_df = pd.read_csv('Project_data/val.csv', delimiter=';', names=['Video Folder', 'Gesture', 'Label'])

In [14]:
val_df.head(3)

| | Video Folder | Gesture | Label |
|---|---|---|---|
| 0 | WIN_20180925_17_17_04_Pro_Left_Swipe_new | Left_Swipe_new | 0 |
| 1 | WIN_20180925_17_43_01_Pro_Left_Swipe_new | Left_Swipe_new | 0 |
| 2 | WIN_20180925_18_01_40_Pro_Left_Swipe_new | Left_Swipe_new | 0 |


In [15]:
val_df.tail(3)

| | Video Folder | Gesture | Label |
|---|---|---|---|
| 97 | WIN_20180907_15_54_30_Pro_Thumbs Up_new | Thumbs Up_new | 4 |
| 98 | WIN_20180907_16_10_59_Pro_Thumbs Up_new | Thumbs Up_new | 4 |
| 99 | WIN_20180907_16_39_59_Pro_Thumbs Up_new | Thumbs Up_new | 4 |


In [16]:
train_df = shuffle(train_df, random_state=RANDOM_SEED)

In [17]:
train_df.head(5)

| | Video Folder | Gesture | Label |
|---|---|---|---|
| 327 | WIN_20180925_18_23_57_Pro_Thumbs_Down_new | Thumbs_Down_new | 3 |
| 579 | WIN_20180907_16_21_11_Pro_Stop Gesture_new | Stop Gesture_new | 2 |
| 513 | WIN_20180907_16_38_29_Pro_Left Swipe_new_Left ... | Left Swipe_new_Left Swipe_new | 0 |
| 362 | WIN_20180926_17_23_38_Pro_Thumbs_Down_new | Thumbs_Down_new | 3 |
| 265 | WIN_20180926_17_21_49_Pro_Stop_new | Stop_new | 2 |


In [18]:
val_df = shuffle(val_df, random_state=RANDOM_SEED)

In [19]:
val_df.head(5)

| | Video Folder | Gesture | Label |
|---|---|---|---|
| 83 | WIN_20180907_16_30_54_Pro_Stop Gesture_new | Stop Gesture_new | 2 |
| 53 | WIN_20180925_17_38_43_Pro_Thumbs_Up_new | Thumbs_Up_new | 4 |
| 70 | WIN_20180907_15_55_06_Pro_Right Swipe_new | Right Swipe_new | 1 |
| 45 | WIN_20180926_16_57_50_Pro_Thumbs_Down_new | Thumbs_Down_new | 3 |
| 44 | WIN_20180926_16_44_04_Pro_Thumbs_Down_new | Thumbs_Down_new | 3 |


In [20]:
train_df.reset_index(drop=True, inplace=True)
val_df.reset_index(drop=True, inplace=True)

### Display a sequence

In [21]:
train_df.head(1)

| | Video Folder | Gesture | Label |
|---|---|---|---|
| 0 | WIN_20180925_18_23_57_Pro_Thumbs_Down_new | Thumbs_Down_new | 3 |


In [22]:
def get_video_path_train(idx):
    video_name = train_df.iloc[idx]['Video Folder']
    video_path = os.path.join(train_folder, video_name)
    return video_path

def get_image_list_train(idx):
    # os.listdir returns entries in arbitrary order; sort so frames stay in sequence
    ims = sorted(os.listdir(get_video_path_train(idx)))
    return ims

In [23]:
get_video_path_train(0)

'Project_data\\train\\WIN_20180925_18_23_57_Pro_Thumbs_Down_new'

In [24]:
ims = get_image_list_train(0)

In [25]:
len(ims)

30

In [26]:
def plot_sequence(train_idx, rows=3, columns=10, fig_size=(20,3), step_size=1):
    fig = plt.figure(figsize=fig_size)
    ims = get_image_list_train(train_idx)
    folder_path = get_video_path_train(train_idx)
    
    for i in range(1, columns*rows+1, step_size):
        img = imread(os.path.join(folder_path, ims[i-1]))
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        fig.add_subplot(rows, columns, i)
        plt.imshow(img)

    plt.show()

In [27]:
# plot_sequence(0)

In [28]:
# plot_sequence(1)

### Checking GPU Specs

In [29]:
# !nvidia-smi

# Generator

### Problems
- The dataset is too large to be loaded in a single go; the machine would throw an out-of-memory error.
- The images come in two sizes (dimension 120x120 and 360x), so we need to bring them to the same dimensions.
- There is some room for skipping images to speed up the training process.
- Data augmentation may be required to increase accuracy.
- Ablation will be required to reduce analysis time.
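
The memory concern can be made concrete with some rough arithmetic. The sketch below rests on assumptions: that all 663 training videos have 30 frames of 120x120 RGB (the larger image size would only increase the total), that values are stored as 8-byte floats, and that a batch uses the settings configured later in this notebook (3 videos, 7 frames, 80x80). `array_bytes` is a hypothetical helper.

```python
def array_bytes(shape, bytes_per_value=8):
    """Bytes needed for a dense array of the given shape (float64 by default)."""
    total = bytes_per_value
    for dim in shape:
        total *= dim
    return total

# Assumption: every one of the 663 training videos has 30 frames of 120x120 RGB.
full_dataset = array_bytes((663, 30, 120, 120, 3))
# One batch as built with the generator settings used in this notebook.
one_batch = array_bytes((3, 7, 80, 80, 3))

print(f"full dataset: {full_dataset / 1024**3:.1f} GiB")
print(f"one batch:    {one_batch / 1024**2:.1f} MiB")
```

Under these assumptions the full array needs several gigabytes, while a single batch needs only a few megabytes.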

### Solution
All of the above can be handled by a custom generator that produces the data in batches as required.
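
The core of any such generator is the index arithmetic that splits the videos into full batches plus one shorter final batch. A minimal sketch of that arithmetic follows; `batch_slices` is a hypothetical helper and not the generator defined in this notebook.

```python
def batch_slices(num_items, batch_size):
    """Yield (start, stop) index pairs covering num_items, including a final short batch."""
    if num_items == 0:
        return
    # A batch can never hold more items than are available.
    batch_size = min(batch_size, num_items)
    for start in range(0, num_items, batch_size):
        yield start, min(start + batch_size, num_items)

# 10 videos in batches of 3: three full batches and one extra batch of size 1.
slices = list(batch_slices(10, 3))
print(slices)
```

The last pair starts where the final full batch ended, which is the property the extra batch of a generator has to preserve.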

In [30]:
class ImageProps:
    """Image class to easily store, retrieve and update properties of input images during training"""
    img_selection_via_idx = [3, 6, 9, 12, 15, 18, 21]
    img_selection_len = len(img_selection_via_idx)
    
    img_resize_height = 100
    img_resize_width = 100
    
    img_crop_width_lower_limit = 10
    img_crop_width_upper_limit = 90
    img_crop_height_lower_limit = 10
    img_crop_height_upper_limit = 90
    
    img_height = 80
    img_width = 80
    
    def normalize_channel(self, input_channel, lower_percentile=5, upper_percentile=95):
        """To normalize input channel using percentile values"""
        lower_percentile_val = np.percentile(input_channel, lower_percentile)
        upper_percentile_val = np.percentile(input_channel, upper_percentile)
        
        numerator = input_channel-lower_percentile_val
        denominator = upper_percentile_val-lower_percentile_val
        
        normalized_channel = numerator/denominator
        
        return normalized_channel
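
The percentile scaling can be tried on a small array. The sketch below is a standalone variant, not the method above: `normalize_channel_safe` is a hypothetical name, and the guard for a flat channel (where both percentiles coincide and the division would fail) is an added assumption.

```python
import numpy as np

def normalize_channel_safe(channel, lower_percentile=5, upper_percentile=95):
    """Percentile-based scaling that returns zeros for a flat channel."""
    channel = np.asarray(channel, dtype=np.float64)
    low = np.percentile(channel, lower_percentile)
    high = np.percentile(channel, upper_percentile)
    if high == low:
        # Flat channel: avoid dividing by zero.
        return np.zeros_like(channel)
    return (channel - low) / (high - low)

ramp = np.arange(101, dtype=np.float64).reshape(1, 101)  # values 0..100
scaled = normalize_channel_safe(ramp)
flat = normalize_channel_safe(np.full((4, 4), 7.0))
```

Values below the lower percentile map to negative numbers and values above the upper percentile map to numbers greater than 1, so the output is not clipped to the range 0 to 1.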

In [31]:
img_props = ImageProps()

In [32]:
img_props.img_selection_via_idx

[3, 6, 9, 12, 15, 18, 21]
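
These indices pick 7 of the 30 frames in a video folder. A small sketch of the selection is shown below. The file names are hypothetical stand-ins, and the sorting step is there because a directory listing is not guaranteed to come back in frame order.

```python
# Hypothetical file names standing in for the 30 frames of one video folder.
frame_names = [f"frame_{i:05d}.png" for i in range(30)]

# Simulate an unordered directory listing, then restore the sequence by sorting.
unordered = frame_names[::-1]
ordered = sorted(unordered)

selection = [3, 6, 9, 12, 15, 18, 21]  # same indices as img_selection_via_idx
selected = [ordered[i] for i in selection]
print(selected)
```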

In [174]:
class VideoBatchGenerator:
    """Generator class to generate images in batches as per requirement
    
    Number of channels in RGB image is 3
    Number of gestures / output classes is 5
    
    Batch Data dimensions:
    - images have 2 dimensions (height x width)
    - rgb images have 3 channels (height x width x 3)
    - videos are sequence of rgb images (sequence of images x height x width x 3)
    - each batch has prespecified number of videos (batch size x sequence of images x height x width x 3)
    """
    batch_size = 3
    num_images_per_video = img_props.img_selection_len
    img_height = img_props.img_height
    img_width = img_props.img_width
    NUM_RGB_CHANNELS = 3
    NUM_CLASSES = 5
    
    def batch_generator(self, parent_folder_path, df):
        num_videos = len(df)
        # batch size cannot be larger than the number of videos
        self.batch_size = min(self.batch_size, num_videos)
        num_batches = num_videos//self.batch_size
        extra_batch_size = num_videos%self.batch_size
        
        log.info(f"Source Path: {parent_folder_path}")
        log.info(f"Number of Videos: {num_videos}")
        log.info(f"Batch Size: {self.batch_size}") 
        log.info(f"Number of Batches: {num_batches}")
        log.info(f"Extra Batch Size (zero means no extra batch): {extra_batch_size}")
        
        while True:
            # reset the index so the integer lookups below follow the shuffled order
            shuffled_df = shuffle(df, random_state=RANDOM_SEED).reset_index(drop=True)
            shuffled_video_folders = shuffled_df['Video Folder']
            shuffled_labels = shuffled_df['Label']
            
            logging.debug(f"{shuffled_df.head()}")
            
            for batch_id in range(num_batches):
                log.info(f"Current Batch: {batch_id}")
                batch_data = np.zeros((self.batch_size, 
                                       self.num_images_per_video, 
                                       self.img_height, self.img_width, 
                                       self.NUM_RGB_CHANNELS))
                batch_labels = np.zeros((self.batch_size, 
                                         self.NUM_CLASSES))
                
                for video_id in range(self.batch_size):
                    video_folder_id = video_id + batch_id*self.batch_size
                    video_folder_path = os.path.join(parent_folder_path, shuffled_video_folders[video_folder_id])
                    logging.debug(f'id: {video_folder_id} video_folder_path: {video_folder_path}')
                    # sort so that the selection indices refer to frames in sequence order
                    imgs_in_video = sorted(os.listdir(video_folder_path))
                    logging.debug(f'first image: {imgs_in_video[0]}')
                    
                    for img_id, img_id_in_video in enumerate(img_props.img_selection_via_idx):
                        img = imgs_in_video[img_id_in_video]
                        logging.debug(f'current image via selection: {img}')
                        img_path = os.path.join(video_folder_path, img)
                        logging.debug(f'current image via selection [path]: {img_path}')
                        img_array = imread(img_path)
                        img_array = cv2.cvtColor(img_array, cv2.COLOR_BGR2RGB)
                        
                        #plt.imshow(img_array)
                        #plt.show()
                        
                        resized_image = cv2.resize(img_array, (
                                        img_props.img_resize_width, 
                                        img_props.img_resize_height,
                                        ))
                        
                        # numpy images are indexed (rows, columns), i.e. (height, width)
                        cropped_image = resized_image[
                            img_props.img_crop_height_lower_limit:img_props.img_crop_height_upper_limit,
                            img_props.img_crop_width_lower_limit:img_props.img_crop_width_upper_limit,
                        ]
                        
                        logging.debug(f"Shape of cropped image (after resize): {cropped_image.shape}")
                        
                        red_channel = cropped_image[:, :, 0]
                        green_channel = cropped_image[:, :, 1]
                        blue_channel = cropped_image[:, :, 2]
                        
                        batch_data[video_id, img_id, :, :, 0] = img_props.normalize_channel(red_channel)
                        batch_data[video_id, img_id, :, :, 1] = img_props.normalize_channel(green_channel)
                        batch_data[video_id, img_id, :, :, 2] = img_props.normalize_channel(blue_channel)
                        
                    batch_labels[video_id, shuffled_labels[video_folder_id]] = 1
                    logging.debug(f"batch data: for video:img [{video_id}:{img_id}] = {batch_data[video_id, img_id, :, :, 0]}")
                    logging.debug(f"batch label: {video_id} = {batch_labels[video_id]}")
                yield batch_data, batch_labels
            
            if extra_batch_size:
                last_batch_id = batch_id + 1
                log.info(f"Current Batch (Extra Batch): {last_batch_id}")
                batch_data = np.zeros((extra_batch_size, 
                                       self.num_images_per_video, 
                                       self.img_height, self.img_width, 
                                       self.NUM_RGB_CHANNELS))
                batch_labels = np.zeros((extra_batch_size, 
                                         self.NUM_CLASSES))
                
                for video_id in range(extra_batch_size):
                    # the extra batch starts right after the last full batch
                    video_folder_id = video_id + num_batches*self.batch_size
                    video_folder_path = os.path.join(parent_folder_path, shuffled_video_folders[video_folder_id])
                    logging.debug(f'id: {video_folder_id} video_folder_path: {video_folder_path}')
                    # sort so that the selection indices refer to frames in sequence order
                    imgs_in_video = sorted(os.listdir(video_folder_path))
                    logging.debug(f'first image: {imgs_in_video[0]}')
                    
                    for img_id, img_id_in_video in enumerate(img_props.img_selection_via_idx):
                        img = imgs_in_video[img_id_in_video]
                        logging.debug(f'current image via selection: {img}')
                        img_path = os.path.join(video_folder_path, img)
                        logging.debug(f'current image via selection [path]: {img_path}')
                        img_array = imread(img_path)
                        img_array = cv2.cvtColor(img_array, cv2.COLOR_BGR2RGB)
                        
                        #plt.imshow(img_array)
                        #plt.show()
                        
                        resized_image = cv2.resize(img_array, (
                                        img_props.img_resize_width, 
                                        img_props.img_resize_height,
                                        ))
                        
                        # numpy images are indexed (rows, columns), i.e. (height, width)
                        cropped_image = resized_image[
                            img_props.img_crop_height_lower_limit:img_props.img_crop_height_upper_limit,
                            img_props.img_crop_width_lower_limit:img_props.img_crop_width_upper_limit,
                        ]
                        
                        logging.debug(f"Shape of cropped image (after resize): {cropped_image.shape}")
                        
                        red_channel = cropped_image[:, :, 0]
                        green_channel = cropped_image[:, :, 1]
                        blue_channel = cropped_image[:, :, 2]
                        
                        batch_data[video_id, img_id, :, :, 0] = img_props.normalize_channel(red_channel)
                        batch_data[video_id, img_id, :, :, 1] = img_props.normalize_channel(green_channel)
                        batch_data[video_id, img_id, :, :, 2] = img_props.normalize_channel(blue_channel)
                        
                    batch_labels[video_id, shuffled_labels[video_folder_id]] = 1
                    logging.debug(f"batch data: for video:img [{video_id}:{img_id}] = {batch_data[video_id, img_id, :, :, 0]}")
                    logging.debug(f"batch label: {video_id} = {batch_labels[video_id]}")
                yield batch_data, batch_labels

In [175]:
v = VideoBatchGenerator()

In [176]:
x = v.batch_generator(train_folder, train_df[0:10])

In [186]:
d = next(x)

Current Batch: 0


In [180]:
train_df.head(2)

| | Video Folder | Gesture | Label |
|---|---|---|---|
| 0 | WIN_20180925_18_23_57_Pro_Thumbs_Down_new | Thumbs_Down_new | 3 |
| 1 | WIN_20180907_16_21_11_Pro_Stop Gesture_new | Stop Gesture_new | 2 |


In [148]:
train_df[0:4]

| | Video Folder | Gesture | Label |
|---|---|---|---|
| 0 | WIN_20180925_18_23_57_Pro_Thumbs_Down_new | Thumbs_Down_new | 3 |
| 1 | WIN_20180907_16_21_11_Pro_Stop Gesture_new | Stop Gesture_new | 2 |
| 2 | WIN_20180907_16_38_29_Pro_Left Swipe_new_Left ... | Left Swipe_new_Left Swipe_new | 0 |
| 3 | WIN_20180926_17_23_38_Pro_Thumbs_Down_new | Thumbs_Down_new | 3 |


In [114]:
v.batch_size = 3

In [136]:
x = v.batch_generator(train_folder, train_df[0:10])

In [137]:
data = next(x)

Source Path: Project_data\train
Number of Videos: 10
Batch Size: 3
Number of Batches: 3
Extra Batch Size (zero means no extra batch): 1
Current Batch: 0


In [144]:
data = next(x)

Current Batch (Extra Batch): 3


In [135]:
data[1]

array([[0., 0., 0., 1., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 1.]])
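
Each row of this array is a one-hot encoded label. As a small sketch, the rows can be mapped back to gesture names. The `GESTURES` list is inferred from the label values seen in the csv rows above (0 for left swipe through 4 for thumbs up), and `decode_one_hot` is a hypothetical helper.

```python
# Gesture names ordered by label value, as inferred from the csv rows above.
GESTURES = ["Left Swipe", "Right Swipe", "Stop", "Thumbs Down", "Thumbs Up"]

def decode_one_hot(rows):
    """Return the gesture name for each one-hot encoded row."""
    return [GESTURES[row.index(max(row))] for row in rows]

batch_labels = [
    [0., 0., 0., 1., 0.],
    [0., 0., 1., 0., 0.],
    [0., 0., 0., 0., 1.],
]
decoded = decode_one_hot(batch_labels)
print(decoded)
```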

In [None]:
%matplotlib inline
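# the generator object does not store images; show the first channel of the
# first frame of the first video in the last batch instead
plt.imshow(data[0][0, 0, :, :, 0])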