----------------------------------------------------------------------------------------------------------
***********************************************************************************************************

# <span style="color:Purple"> Computer Vision for Machine Learning Project: "Detecting hand gestures"</span>


#### Tasks 5 & 6

#### Author: Lynda Attouche
#### Link: https://drive.google.com/file/d/1IoZ_pyLh-EYjCTMZyhkoifGnKLriP7lw/view?usp=sharing
*******************************************************************************************
----------------------------------------------------------------------------------------------------------


## README
* No special commands are needed to run this notebook. Simply run the cells in order.

* Some tasks run in an endless capture loop, so a counter has been added to stop them. You can comment out the counter; in that case, stop the cell manually and continue from the next cell to move on to the next task.

## Imports

#### Libraries

In [None]:
from IPython.display import display, Javascript
from google.colab.output import eval_js
from base64 import b64decode, b64encode
import numpy as np
from PIL import Image
import io
import cv2
from matplotlib import pyplot as plt
import time
import os
import random

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
#dataset 1: letters with equal number of pictures and a lot of variability
#link: https://drive.google.com/drive/folders/1u9XBwpw8lxd2TtpLyBHlbf9eCqCBLwcC?usp=sharing
dataset_path1 = "/content/drive/MyDrive/ComputerVision/HandGesture_1/" 

#dataset 2: letters with unbalanced number of pictures and a lot of variability
#link: https://drive.google.com/drive/folders/1TnHp9qpislCBCd6Y8rVctV9eibujBYnm?usp=sharing
dataset_path2 = "/content/drive/MyDrive/ComputerVision/HandGesture_2/" 

#dataset 3: letters with equal number of pictures and one of them with no variability 
#link: https://drive.google.com/drive/folders/1zGyOR2_XicBWWQ0NY-ShuP1co9jnPzZj?usp=sharing
dataset_path3 = "/content/drive/MyDrive/ComputerVision/HandGesture_3/" 
path = "/content/drive/MyDrive/ComputerVision/"

#### OpenCV

In [None]:
!git clone https://github.com/opencv/opencv/

Cloning into 'opencv'...
remote: Enumerating objects: 305186, done.
remote: Counting objects: 100% (5/5), done.
remote: Compressing objects: 100% (5/5), done.
remote: Total 305186 (delta 0), reused 4 (delta 0), pack-reused 305181
Receiving objects: 100% (305186/305186), 494.50 MiB | 28.10 MiB/s, done.
Resolving deltas: 100% (212371/212371), done.
Checking out files: 100% (7044/7044), done.


In [None]:
face_cascade_path = "/content/opencv/data/haarcascades/haarcascade_frontalface_alt.xml"
face_cascades = cv2.CascadeClassifier(face_cascade_path)

### Goal 
The goal of these two tasks is to generate datasets from the probability map that contains the sign made by the hand once the hand has been detected. Four letters were chosen (A, E, K, Y), and the three datasets are characterized as follows:
* dataset 1: letters with equal number of pictures and a lot of variability

* dataset 2: letters with unbalanced number of pictures and a lot of variability (K: the minority class, A: majority)

* dataset 3: letters with equal number of pictures and one of them with no variability (class K)
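
The balance properties listed above can be checked by counting the image files in each class folder. The sketch below assumes the layout used later in this notebook (one sub-folder per letter inside each dataset folder); `class_counts` and `is_balanced` are hypothetical helpers, not part of the original code.

```python
import os

def class_counts(dataset_dir, labels=("a", "e", "k", "y")):
    """Return {label: number of .jpg files} for each class sub-folder."""
    counts = {}
    for label in labels:
        folder = os.path.join(dataset_dir, label)
        # A missing folder is treated as an empty class
        names = os.listdir(folder) if os.path.isdir(folder) else []
        counts[label] = sum(1 for n in names if n.lower().endswith(".jpg"))
    return counts

def is_balanced(counts):
    """True when every class holds the same number of images."""
    return len(set(counts.values())) <= 1
```

Note that each capture is stored at two sizes (16 and 224), so the raw file count per folder is twice the number of captures.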

### Used functions


In [None]:
#Converting image types
def byte2image(byte):
  jpeg = b64decode(byte.split(',')[1])
  im = Image.open(io.BytesIO(jpeg))
  return np.array(im)

def image2byte(image):
  image = Image.fromarray(image)
  buffer = io.BytesIO()
  image.save(buffer, 'jpeg')
  buffer.seek(0)
  x = b64encode(buffer.read()).decode('utf-8')
  return x
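
`byte2image` relies on the browser returning a data URL of the form `data:image/jpeg;base64,<payload>`, so splitting on the comma isolates the base64 payload. Here is a minimal sketch of that round trip using only the standard library; the helper names are illustrative and the sample bytes are not a real JPEG.

```python
from base64 import b64decode, b64encode

def bytes_to_data_url(raw, mime="image/jpeg"):
    """Wrap raw bytes in a base64 data URL, as canvas.toDataURL does in the browser."""
    return f"data:{mime};base64," + b64encode(raw).decode("utf-8")

def data_url_to_bytes(data_url):
    """Drop the 'data:...;base64,' header and decode the payload."""
    _header, payload = data_url.split(",", 1)
    return b64decode(payload)

raw = b"\xff\xd8\xff\xe0 sample bytes standing in for a JPEG"
url = bytes_to_data_url(raw)
assert data_url_to_bytes(url) == raw  # the round trip is lossless
```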

In [None]:
def VideoCapture():
  js = Javascript('''
    async function create(){
      div = document.createElement('div'); //create new div element
      document.body.appendChild(div); //add the content of the new element to the DOM

      video = document.createElement('video'); //create new video element
      video.setAttribute('playsinline', ''); //setting attributes of the element

      div.appendChild(video); //add the video element to the div element

      //Selecting facing mode of the video stream
      stream = await navigator.mediaDevices.getUserMedia({video: {facingMode: "environment"}});
      video.srcObject = stream;

      await video.play(); //playing video

      canvas =  document.createElement('canvas'); //create new canvas element
      // set canvas size 
      canvas.width = video.videoWidth;
      canvas.height = video.videoHeight;
      canvas.getContext('2d').drawImage(video, 0, 0);

      div_out = document.createElement('div'); //create a new div element that will contain the output
      document.body.appendChild(div_out); //add the content of the div_out to the DOM
      img = document.createElement('img'); //create the image element (will contain the image/capture we'll take)
      div_out.appendChild(img); //add the image element to the output div
    }

    //taking the capture and storing it
    async function capture(){
        return await new Promise(function(resolve, reject){ // grab the current frame right away (no click is awaited)
            pendingResolve = resolve;
            canvas.getContext('2d').drawImage(video, 0, 0); //draw an image onto the canvas.
            result = canvas.toDataURL('image/jpeg', 0.8);
            pendingResolve(result);
        })
    }

    //displaying the capture 
    function showimg(imgb64){
        img.src = "data:image/jpg;base64," + imgb64;
    }

  ''')
  display(js)

In [None]:
def compute_region(img,margin,prev):
  """
  Computes the region of interest
  @params:
          - img (array): image on which detection will be done
          - margin (int): the margin to be taken from the previous region (the coordinate shift)
          - prev (array): previous face detection
  @return
          - region of interest as (x, y, width, height)
  """
  # as seen in the previous task, the CascadeClassifier returns a face as a rectangle,
  # and so is prev: prev = (x,y,w,h) = (prev[0],prev[1],prev[2],prev[3])
  # where (x,y) is the top left corner and (w,h) are the width and height of the rectangle
  # the goal is to compute the corners of the new region from the margin and the previous region:
  # x' = x - margin       (left edge)
  # y' = y - margin       (top edge)
  # w' = (x+w) + margin   (right edge, despite the name)
  # h' = (y+h) + margin   (bottom edge, despite the name)

  #top left corner
  x_prime = prev[0] - margin 
  y_prime = prev[1] - margin 

  #bottom right
  w_prime = prev[0]+prev[2]+margin 
  h_prime = prev[1]+prev[3]+margin 

  # Note:
  # the new region must stay inside the image (tests gave odd results when it did not), i.e.:
  # the top left corner must not be negative (the margin is subtracted from the previous x and y): x_prime>=0 and y_prime>=0
  # if x or y (or both) is negative, it is set to 0
  x_prime = max(0,x_prime) # = 0 if x_prime<0
  y_prime = max(0,y_prime) # = 0 if y_prime<0

  # the bottom right corner must not go beyond the image either
  # (w_prime, for example, is x+w increased by the margin)
  # so each coordinate is capped at the image width or height
  (imgH,imgW) = img.shape[0], img.shape[1]
  w_prime = min(imgW,w_prime) # = image width if w_prime>image width
  h_prime = min(imgH,h_prime) # = image height if h_prime>image height

  return (x_prime,y_prime,w_prime-x_prime,h_prime-y_prime)
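
The clamping performed by `compute_region` can be seen on concrete numbers. The sketch below re-implements the same arithmetic in plain Python, taking the image width and height directly instead of an image array; `expand_and_clamp` is an illustrative name, not a function of this notebook.

```python
def expand_and_clamp(box, margin, img_w, img_h):
    """box is (x, y, w, h); returns the enlarged box clipped to the image."""
    x, y, w, h = box
    left = max(0, x - margin)            # never left of the image
    top = max(0, y - margin)             # never above the image
    right = min(img_w, x + w + margin)   # never past the right edge
    bottom = min(img_h, y + h + margin)  # never past the bottom edge
    return (left, top, right - left, bottom - top)

# Face near the top left corner of a 640x480 frame: the top left corner is clipped to (0, 0)
print(expand_and_clamp((10, 20, 100, 100), 50, 640, 480))    # (0, 0, 160, 170)
# Face near the bottom right corner: the right and bottom edges are clipped
print(expand_and_clamp((500, 300, 120, 150), 50, 640, 480))  # (450, 250, 190, 230)
```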

In [None]:
def detect_face(im,prev,margin):
  """
  Detects face regions
  @param:
        - im (array): image/capture
        - prev (array): previously detected face (x,y,w,h), or None
        - margin (int): margin for detection 
  @return 
        detected face (x,y,w,h) without margin, or None if no face was found;
        when prev is given, the coordinates are relative to the searched region
  """
  new_reg = im.copy() #starter region, whole image (first step, step t before detection)
  curr_face = None #contains the detected face
  #if a face was already detected, search only around it
  if prev is not None :
    #print("I am here")
    x_prime,y_prime,w_prime,h_prime = compute_region(im,margin,prev) #computing the new region (of timestep t+1)
    #my new region:
    new_reg = im[y_prime:y_prime+h_prime,x_prime:x_prime+w_prime] 
    #plot the rectangle 
    cv2.rectangle(im, (x_prime,y_prime), (x_prime+w_prime, y_prime+h_prime), (255,0,0),2)

  gray = cv2.cvtColor(new_reg, cv2.COLOR_BGR2GRAY) # Converting image to gray scale
  #face detection using face cascade 
  faces = face_cascades.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=4)
  if len(faces)==0: # no face has been detected 
    prev = None  #no region of interest for the next step
  else:
    curr_face = faces[0] #we pick the first face detected
    if prev is None: #no previous region: the detection is already in full-image coordinates
      (x,y,w,h) = curr_face #get the face region coordinates
    else: #we searched inside the region starting at (x_prime,y_prime)
      (x,y,w,h)=(x_prime+curr_face[0], y_prime+curr_face[1], curr_face[2], curr_face[3]) #shifting back to full-image coordinates
    #so we update our region (local only: prev is not returned to the caller)
    prev = (x,y,w,h)
    #we plot the rectangle 
    cv2.rectangle(im, (x,y), (x+w, y+h), (255,0,0),2)    
  return curr_face
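
When detection runs on a cropped region, the cascade returns coordinates relative to that crop, which is why `detect_face` adds `x_prime` and `y_prime` back. A minimal sketch of that shift, with illustrative names and numbers:

```python
def roi_to_frame(roi_origin, face_in_roi):
    """Shift a (x, y, w, h) box found inside a crop back to full-frame coordinates."""
    rx, ry = roi_origin            # top left corner of the crop in the full frame
    fx, fy, fw, fh = face_in_roi   # detection relative to the crop
    return (rx + fx, ry + fy, fw, fh)

# A face found at (30, 40) inside a crop whose corner sits at (450, 250)
print(roi_to_frame((450, 250), (30, 40, 80, 80)))  # (480, 290, 80, 80)
```

The width and height are unchanged because cropping does not rescale the image.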

In [None]:
def hand_position(pts):
  """
  Computing the position of the box around the hand
  @params:
          pts: the box points (points in corners)
  @return:
          position of the hand as ((x_top_left, y_top_left), (x_bottom_right, y_bottom_right))
  """
  #we take the min/max of the corner coordinates and clamp them to the image bounds,
  #ensuring that the hand box stays inside the image (the global capture `im` gives the bounds)
  x_top_l = max(0, min(pts[:,0]))
  y_top_l = max(0, min(pts[:,1]))
  x_bottom_r = min(im.shape[1], max(pts[:,0]))
  y_bottom_r = min(im.shape[0],  max(pts[:,1 ]))
  hand =  (x_top_l, y_top_l), (x_bottom_r, y_bottom_r)
  return hand
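
`cv2.boxPoints` returns the four corners of a rotated rectangle, and `hand_position` turns them into an axis-aligned box clipped to the frame. The sketch below shows the same computation with NumPy only, passing the image shape explicitly instead of reading a global; `clamp_box` is an illustrative name.

```python
import numpy as np

def clamp_box(pts, img_shape):
    """Axis-aligned bounding box of the corner points, clipped to the image."""
    height, width = img_shape[:2]
    x1 = max(0, int(pts[:, 0].min()))
    y1 = max(0, int(pts[:, 1].min()))
    x2 = min(width, int(pts[:, 0].max()))
    y2 = min(height, int(pts[:, 1].max()))
    return (x1, y1), (x2, y2)

# Corners of a rotated box that partly leaves a 45-wide, 48-high image
corners = np.array([[-5, 10], [30, -2], [50, 40], [12, 60]])
print(clamp_box(corners, (48, 45)))  # ((0, 0), (45, 48))
```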

In [None]:
def save_hand(dataset_path,prob, hand_box,letter,res_dim,num):
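  """
  Saves the hand region of the probability map into the dataset folder
  @params: 
          dataset_path: path to the dataset folder
          prob : probability map containing hand
          hand_box : hand position ((x1,y1),(x2,y2))
          letter: letter represented by the hand gesture
          res_dim: side length (in pixels) of the saved square image
          num: number/index of the image
  @return:
          True if the image was written, False otherwise
  """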

  hand = prob[hand_box[0][1]:hand_box[1][1], hand_box[0][0]:hand_box[1][0]]
  hand = cv2.resize(hand, (res_dim,res_dim)) #resizing the image to given dimension
  filename = dataset_path +'/'+letter+'/'+letter+'_'+str(num)+'_'+str(res_dim)+'.jpg' #naming the image
  return cv2.imwrite(filename, hand)

In [None]:
def save_image(dataset_path,im,letter,res_dim,num):
  """
  Saves image into the dataset folder
  @params: 
          dataset_path: path to the dataset folder
          im : image to save
          letter: letter represented by the hand gesture
          res_dim: size used in the file name
          num: number/index of the image
  @return:
          True if the image was written, False otherwise
  """
  filename = dataset_path +'/'+letter+'/'+letter+'_'+str(num)+'_'+str(res_dim)+'.jpg' #naming the image
  return cv2.imwrite(filename, im)

In [None]:
def to_txt(path,dataset_path,num_dataset):
  """
  Writes the 16x16 images of a dataset to a text file, one image per line
  @params:
          path: folder in which the text file is created
          dataset_path : path to images
          num_dataset: dataset number, used in the file name
  @return:
          None (creates dataset<num_dataset>.txt, each line being: label,256 pixel values)
  """
  with open(path+'dataset'+str(num_dataset)+'.txt','w') as f: #writing on the file dataset
    for label in ('y','a','e','k'): #classes are written in this order
      for im_ in os.listdir(dataset_path+label+'/'): #for each image of the current label
        im = cv2.imread(dataset_path+label+'/'+im_,cv2.IMREAD_GRAYSCALE) #reading current image
        if im is not None and im.shape[0] == 16: #selecting only readable images of shape 16
          im = im.reshape((1,256)) #reshaping current image of size 16 to (1,256)
          f.write(label.upper()+',') #label first, followed by the pixel values on the same line
          np.savetxt(f, im, delimiter=',', fmt='%d') #writing image on the file

In [None]:
def transformations(dataset_path,image,letter,res_dim,num):
  """
  Applies 7 geometric transformations to a given image and saves each result
  @params:
          dataset_path: path to the dataset folder
          image : image on which we apply transformation
          letter: letter represented by the hand gesture
          res_dim: dimension of saving
          num: number/index of the image

  @return:
          last index used (int); the transformed images are saved to disk
  """
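  #note: image.shape is (rows, cols), i.e. (height, width); the two names are swapped here,
  #which is harmless for the square 16x16 and 224x224 images used in this notebook
  width, height = image.shape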

  #copy the initial image
  transformed_image1 = np.copy(image)
  #rotation matrix 2x3:
  #((height-1)/2.0,(width-1)/2.0) is the center of rotation and here it is the center of image
  #the angle is 180°
  #1 indicates that the scale is not changed
  Mrot = cv2.getRotationMatrix2D(((height-1)/2.0,(width-1)/2.0),180,1) 
  #here an affine transformation(matrix) used to project the rotated image
  transformed_image1 = cv2.warpAffine(transformed_image1,Mrot,(height,width))
  save_image(dataset_path,transformed_image1,letter,res_dim,num+1)

  transformed_image2 = np.copy(image)
  #rotation matrix 2x3:
  #((height-1)/2.0,(width-1)/2.0) is the center of rotation and here it is the center of image
  #the angle is 90°
  #1 indicates that the scale is not changed
  Mrot = cv2.getRotationMatrix2D(((height-1)/2.0,(width-1)/2.0),90,1) 
  #here an affine transformation(matrix) used to project the rotated image
  transformed_image2 = cv2.warpAffine(transformed_image2,Mrot,(height,width))
  save_image(dataset_path,transformed_image2,letter,res_dim,num+2)
  
  transformed_image3 = np.copy(image)
  #rotation matrix 2x3:
  #((height-1)/2.0,(width-1)/2.0) is the center of rotation and here it is the center of image
  #the angle is -90°
  #1 indicates that the scale is not changed
  Mrot = cv2.getRotationMatrix2D(((height-1)/2.0,(width-1)/2.0),-90,1) 
  #here an affine transformation(matrix) used to project the rotated image
  transformed_image3 = cv2.warpAffine(transformed_image3,Mrot,(height,width))
  save_image(dataset_path,transformed_image3,letter,res_dim,num+3)

  transformed_image4 = np.copy(image)
  #rotation matrix 2x3:
  #((height-1)/2.0,(width-1)/2.0) is the center of rotation and here it is the center of image
  #the angle is 60°
  #1 indicates that the scale is not changed
  Mrot = cv2.getRotationMatrix2D(((height-1)/2.0,(width-1)/2.0),60,1) 
  #here an affine transformation(matrix) used to project the rotated image
  transformed_image4 = cv2.warpAffine(transformed_image4,Mrot,(height,width))
  save_image(dataset_path,transformed_image4,letter,res_dim,num+4)

  transformed_image5 = np.copy(image)
  #rotation matrix 2x3:
  #((height-1)/2.0,(width-1)/2.0) is the center of rotation and here it is the center of image
  #the angle is -60°
  #1 indicates that the scale is not changed
  Mrot = cv2.getRotationMatrix2D(((height-1)/2.0,(width-1)/2.0),-60,1) 
  #here an affine transformation(matrix) used to project the rotated image
  transformed_image5 = cv2.warpAffine(transformed_image5,Mrot,(height,width))
  save_image(dataset_path,transformed_image5,letter,res_dim,num+5)

  transformed_image6= np.copy(image)
  #3 points (pts1) and their images (pts2) could define an affine matrix via cv2.getAffineTransform,
  #but they are not used below
  pts1 = np.float32([[50,50],[200,50],[50,200]])
  pts2 = np.float32([[10,100],[200,50],[100,250]])
  #the matrix actually applied is a pure translation by (100,50) pixels
  #(note: this moves all content out of frame for the 16x16 images)
  Maff = np.float32([[1,0,100],[0,1,50]]) 
  #project the image after affine transformation
  transformed_image6 = cv2.warpAffine(transformed_image6,Maff,(height,width))
  save_image(dataset_path,transformed_image6,letter,res_dim,num+6)

  transformed_image7= np.copy(image)
  #translating image by a quarter of its size in each direction
  quarter_height, quarter_width = height / 4, width / 4
  T = np.float32([[1, 0, quarter_width], [0, 1, quarter_height]])
  #translating image using the translation matrix T with warpAffine
  transformed_image7 = cv2.warpAffine(transformed_image7, T, (width, height))

  save_image(dataset_path,transformed_image7,letter,res_dim,num+7)
  return num+7 #returns last index

In [None]:
def apply_transformation(dataset_path,c_y,c_a,c_e,c_k):
  """
  Applies the transformations to each image in each class folder
  @params
          dataset_path: path to the dataset folder
          c_y (int): last index in folder y
          c_a (int): last index in folder a
          c_e (int): last index in folder e
          c_k (int): last index in folder k
  @return 
          None (the transformed images are saved to disk)
  """
  images_y = os.listdir(dataset_path+'y/') #reading content of folder y 
  images_y = [cv2.imread(dataset_path+'y/'+im,cv2.IMREAD_GRAYSCALE) for im in images_y] #reading images of label y
  images_y_16 = [im for im in images_y if im.shape[0]==16] #contains images of label Y size 16x16
  images_y_224 = [im for im in images_y if im.shape[0]==224] #contains images of label Y of size 224x224

  images_a = os.listdir(dataset_path+'a/') #reading content of folder a
  images_a = [cv2.imread(dataset_path+'a/'+im,cv2.IMREAD_GRAYSCALE) for im in images_a] #reading images of label a
  images_a_16 = [im for im in images_a if im.shape[0]==16] #contains images of label A size 16x16
  images_a_224 = [im for im in images_a if im.shape[0]==224] #contains images of label A of size 224x224


  images_e = os.listdir(dataset_path+'e/') #reading content of folder e
  images_e = [cv2.imread(dataset_path+'e/'+im,cv2.IMREAD_GRAYSCALE) for im in images_e] #reading images of label e
  images_e_16 = [im for im in images_e if im.shape[0]==16] #contains images of label E size 16x16
  images_e_224 = [im for im in images_e if im.shape[0]==224] #contains images of label E of size 224x224

  images_k = os.listdir(dataset_path+'k/') #reading content of folder k
  images_k = [cv2.imread(dataset_path+'k/'+im,cv2.IMREAD_GRAYSCALE) for im in images_k] #reading images of label k
  images_k_16 = [im for im in images_k if im.shape[0]==16] #contains images of label K size 16x16
  images_k_224 = [im for im in images_k if im.shape[0]==224] #contains images of label K of size 224x224

  c__a,c__y,c__e,c__k = c_a,c_y,c_e,c_k #keeping track of the initial image indices 
  
  #Label: Y
  for im in images_y_16: #for each image of size 16x16
    c_y = transformations(dataset_path,im,'y',16,c_y) #applying & saving transformations
  c_y = c__y #initial value
  for im in images_y_224: #for each image of size 224x224
    c_y = transformations(dataset_path,im,'y',224,c_y) #applying & saving transformations

  #Label: A
  for im in images_a_16: #for each image of size 16x16
    c_a = transformations(dataset_path,im,'a',16,c_a) #applying & saving transformations
  c_a = c__a #initial value
  for im in images_a_224: #for each image of size 224x224
    c_a = transformations(dataset_path,im,'a',224,c_a)  #applying & saving transformations

  #Label: E
  for im in images_e_16: #for each image of size 16x16
    c_e = transformations(dataset_path,im,'e',16,c_e) #applying & saving transformations
  c_e = c__e  #initial value
  for im in images_e_224: #for each image of size 224x224
    c_e = transformations(dataset_path,im,'e',224,c_e) #applying & saving transformations

  #Label: K
  for im in images_k_16: #for each image of size 16x16
    c_k = transformations(dataset_path,im,'k',16,c_k) #applying & saving transformations
  c_k = c__k  #initial value
  for im in images_k_224: #for each image of size 224x224
    c_k = transformations(dataset_path,im,'k',224,c_k)  #applying & saving transformations


#### 1. Face detection
The first step is to detect the face, so that its histogram can be computed and back-projected onto the new captures. 

In [None]:
start_time = time.time()
VideoCapture()
eval_js('create()')

#our first starting box: 
prev = None 
margin = 50 #arbitrarily chosen
b = True
c = 0 #counter used to stop the algorithm, since the loop is otherwise infinite
while b:
  byte = eval_js('capture()')
  im = byte2image(byte) 
  curr_face = detect_face(im,prev,margin)
   
  if c>20 and curr_face is not None: #stop only once a face is available to crop
    #here only the face frame is displayed, so that the distance to the camera can be varied; the larger rectangle
    #around it is not displayed. To display it, return "prev" from the face detection function and use:
    #eval_js('showimg("{}")'.format(image2byte(im[prev[1]:prev[1]+prev[3], prev[0]:prev[0]+prev[2]])))
    face_frame = im[curr_face[1]:curr_face[1]+curr_face[3], curr_face[0]:curr_face[0]+curr_face[2]]
    tracking_window_face = curr_face
    eval_js('showimg("{}")'.format(image2byte(face_frame)))
    break
  c +=1
  eval_js('showimg("{}")'.format(image2byte(im)))

#### 2. Computing histogram for detected face

After the face is detected, its histogram is computed as follows. The histogram will be back-projected onto later captures to find the regions that have the same color as the face, in this case the hands. 
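
To make the idea of back projection concrete, here is a NumPy-only sketch of what the histogram and the probability map represent. It is an approximation written for illustration, not OpenCV's implementation: it assumes 8-bit hue values in the range 0 to 179, 18 bins, and min-max scaling of the histogram to [0, 255]; `hue_histogram` and `back_project` are hypothetical names.

```python
import numpy as np

def hue_histogram(hue, mask, bins=18, hue_max=180):
    """Count masked hue values per bin, then min-max scale the counts to [0, 255]."""
    idx = (hue.astype(np.int64) * bins) // hue_max   # bin index of every pixel
    hist = np.bincount(idx[mask > 0], minlength=bins).astype(np.float64)
    span = hist.max() - hist.min()
    if span > 0:
        hist = (hist - hist.min()) * 255.0 / span
    return hist

def back_project(hue, hist, hue_max=180):
    """Replace each pixel by the histogram value of its hue bin."""
    idx = (hue.astype(np.int64) * len(hist)) // hue_max
    return np.clip(np.rint(hist[idx]), 0, 255).astype(np.uint8)

# Tiny "face": three pixels with hue near 5, one pixel with hue 100
face_hue = np.array([[5, 5], [7, 100]], dtype=np.uint8)
face_mask = np.ones_like(face_hue)
hist = hue_histogram(face_hue, face_mask)

# New frame: pixels whose hue is common in the face receive a high score
frame_hue = np.array([[6, 105], [50, 179]], dtype=np.uint8)
print(back_project(frame_hue, hist))  # [[255  85] [  0   0]]
```

Pixels with a hue that never appeared in the face score 0, which is why skin-colored regions such as the hand stand out in the map.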

In [None]:
def hsv(face_frame):
  """
  Transforms the face frame into HSV and computes histogram
  @params:
        - face_frame: the frame representing the face
  @return:
        - mask: binary mask that excludes pixels that are too dark or too bright
        - hsv: the face frame converted to HSV
        - histo: histogram computed from the hue channel of the HSV frame
  """
  #transforming the detected face frame into HSV
  hsv = cv2.cvtColor(face_frame, cv2.COLOR_BGR2HSV)
  #Creating a mask using inRange to deal with brightness and darkness:
  #pixels that are too dark and/or too bright fall outside the range and are ignored
  #the mask bounds were found by trial and error (all tests were done in a single location)
  mask = cv2.inRange(hsv,np.array((0,60,32)), np.array((180,200,200)))
  #computing histogram of face frame using hue channel ie [0]
  #here we used: 18 as bins for the histogram
  #and the range of the hue was set to [0,180]
  histo = cv2.calcHist([hsv],[0], mask, [18], [0,180])
  #normalizing histogram 0-255
  histo = cv2.normalize(histo, histo, 0, 255, cv2.NORM_MINMAX)
  return mask, hsv,histo

mask,hsv,histo = hsv(face_frame)
#Displaying histogram
plt.imshow(histo.reshape(1,-1))
plt.show()

#### 3. Detecting hand and saving images
In this step, the probability map containing the hand is stored at each iteration. The counter threshold was set to 20, so roughly 20 probability maps are stored for each chosen letter (the `> 20` check in the loop lets indices 0 to 20 through, i.e. 21 captures). The size of the dataset is increased in the next step. 
For each letter, the "hand" is stored in dataset 1 at sizes 16 and 224, with the image named as letter_index_size. 

* For dataset 1: 20 captures were made with variations of the hand (front, profile, upside down, translated, ...). 

* For dataset 2: no capture was required, as it is the same as dataset 1 with different sizes for the classes.

* For dataset 3: the captures of the letters a, e, and y were reused. For the letter k, 20 captures were taken without variation, so some of the images are identical. 
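
The letter_index_size naming convention makes it possible to recover the label and the resolution from a file name alone. A small sketch of building and parsing such names follows; `make_name` and `parse_name` are illustrative helpers, not functions of this notebook.

```python
import os

def make_name(letter, index, size):
    """Build a file name following the letter_index_size convention."""
    return f"{letter}_{index}_{size}.jpg"

def parse_name(filename):
    """Recover (letter, index, size) from a file name or path."""
    stem, _ext = os.path.splitext(os.path.basename(filename))
    letter, index, size = stem.split("_")
    return letter, int(index), int(size)

print(make_name("k", 7, 16))              # k_7_16.jpg
print(parse_name("/data/a/a_12_224.jpg")) # ('a', 12, 224)
```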

In [None]:
VideoCapture()
eval_js('create()')
#the following line defines the stopping criterion of the CamShift algorithm:
#it stops after 10 iterations or when the search window moves by less than 1 pixel
term_crit = ( cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1 )
tracking_window_hand = (0,0,im.shape[1],im.shape[0]) #to keep track of the hand
c = 0 #counter to stop the algo
c_A, c_E, c_K, c_Y = 0,0,0,0
letter = input("Letter?")
while True:
  time.sleep(2)
  byte = eval_js('capture()') # capture
  im = byte2image(byte) #converting capture 
  # Converting the image to HSV
  hsv = cv2.cvtColor(im, cv2.COLOR_BGR2HSV)
  # Computing mask (inRange) as done in the previous cell
  mask = cv2.inRange(hsv,np.array((0,60,32)), np.array((180,200,200)))
  # Back projecting the face frame histogram into the hsv image
  #basically, we have the histogram of colors of the face and we will backproject it in our current image 
  #to detect the part of the image that fit the histogram (have same color as the face)
  prob = cv2.calcBackProject([hsv],[0], histo, [0,180],scale= 1)
  
  # Apply the mask to the backprojection output
  # Helps us to deal with dark or/and bright pixels
  prob = prob & mask

  #Tracking face 
  # Applying cam shift
  (x,y,w,h) = tracking_window_face
  ret,tracking_window_face = cv2.CamShift(prob,tracking_window_face, term_crit)
  # Retrieve the rotated bounding rectangle
  pts = cv2.boxPoints(ret).astype(int) # builtin int: the np.int alias was removed in NumPy 1.24
  # fill the face area (prob) with zeros
  cv2.fillPoly(prob, [pts], 0)
  # Draw the face area
  cv2.polylines(im, [pts], True, (255, 255 , 0), 2)
 
  #Tracking hand
  ret2, tracking_window_hand = cv2.CamShift(prob, tracking_window_hand, term_crit)
  
  pts2 = cv2.boxPoints(ret2).astype(int) # builtin int: the np.int alias was removed in NumPy 1.24
  hand = hand_position(pts2)

  #drawing the rectangle around hand 
  cv2.rectangle(im, hand[0], hand[1], (0,0,255), 2)

  time.sleep(1) #wait a moment 

  if letter == 'a': #if sign is A
    if c_A >20: #taking only 20 images
      break
    save_hand(dataset_path1,prob,hand,'a',16,c_A) #saving probability map with size 16x16
    save_hand(dataset_path1,prob,hand,'a',224,c_A) #saving probability map size 224x224
    c_A +=1

  elif letter == 'e': #if sign is E
    if c_E >20: #taking only 20 images
      break
    save_hand(dataset_path1,prob,hand,'e',16,c_E) #saving probability map size 16x16
    save_hand(dataset_path1,prob,hand,'e',224,c_E) #saving probability map size 224x224
    c_E +=1
  elif letter == 'k' : #if sign is K
    if c_K >20: #taking only 20 images
      break
    save_hand(dataset_path3,prob,hand,'k',16,c_K) #saving probability map with size 16x16
    save_hand(dataset_path3,prob,hand,'k',224,c_K) #saving probability map with size 224x224
    c_K +=1

  elif letter == 'y': #if sign is Y
    if c_Y >20: #taking only 20 images
      break
    save_hand(dataset_path1,prob,hand,'y',16,c_Y) #saving probability map with size 16x16
    save_hand(dataset_path1,prob,hand,'y',224,c_Y) #saving probability map with size 224x224
    c_Y +=1

  eval_js('showimg("{}")'.format(image2byte(im)))
  eval_js('showimg("{}")'.format(image2byte(prob)))

#### 4. Data augmentation for 1st dataset using OpenCV

Previously, about 20 video captures were taken per letter. However, a dataset of at least 100 images per class was required. 
To increase the size of the dataset, geometric transformations (rotations by 180°, ±90°, and ±60°, and translations) were applied to each image of each class. They are detailed in the `transformations` function above, which uses the OpenCV library.
Each image yields 7 transformed copies, which took each label from the 21 stored captures (indices 0 to 20) to 21 × 8 = 168 images. 
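
For reference, the 2x3 matrix used for the rotations can be written out by hand. The sketch below follows the formula given in the OpenCV documentation for `getRotationMatrix2D` (positive angles rotate counter-clockwise, with the origin at the top left corner); it is a NumPy re-derivation for illustration, and `rotation_matrix_2d` and `apply_affine` are hypothetical names.

```python
import numpy as np

def rotation_matrix_2d(center, angle_deg, scale=1.0):
    """2x3 affine matrix rotating by angle_deg around center."""
    cx, cy = center
    a = scale * np.cos(np.radians(angle_deg))
    b = scale * np.sin(np.radians(angle_deg))
    return np.array([[a, b, (1 - a) * cx - b * cy],
                     [-b, a, b * cx + (1 - a) * cy]])

def apply_affine(M, point):
    """Map an (x, y) point through a 2x3 affine matrix."""
    x, y = point
    return M @ np.array([x, y, 1.0])

center = (7.5, 7.5)  # (size - 1) / 2 for a 16x16 image
M180 = rotation_matrix_2d(center, 180)
print(np.round(apply_affine(M180, center), 6))  # the center stays in place
print(np.round(apply_affine(M180, (0, 0)), 6))  # the top left corner moves to (15, 15)
```

The translation column is what keeps the chosen center fixed; without it the image would rotate around the top left corner.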

In [None]:
c_y,c_a,c_e,c_k = 20,20,20,20 #indices from which new data will be stored 
#applying transformation on saved images using geometric transformation described in the function
apply_transformation(dataset_path1,c_y,c_a,c_e,c_k)

For the second dataset, class k was reduced to 50 images, classes e and y were truncated to 100 images each, and class a is the majority class with 168 images. 
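
The notebook does not show how the images of dataset 2 were selected. One possible way to reach the class sizes stated above is a seeded random subsample per class, sketched below; `subsample`, the seed, and the file names in the example are assumptions made for illustration.

```python
import random

def subsample(files_by_class, quotas, seed=0):
    """Keep at most quotas[label] files per class, chosen reproducibly."""
    rng = random.Random(seed)  # fixed seed so the selection can be repeated
    selected = {}
    for label, files in files_by_class.items():
        n = min(quotas.get(label, len(files)), len(files))
        selected[label] = sorted(rng.sample(sorted(files), n))
    return selected

# Example with 168 hypothetical file names per class
files = {l: [f"{l}_{i}_16.jpg" for i in range(168)] for l in "aeky"}
picked = subsample(files, {"a": 168, "e": 100, "k": 50, "y": 100})
print({label: len(names) for label, names in picked.items()})
# {'a': 168, 'e': 100, 'k': 50, 'y': 100}
```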

#### 5. Increasing 3rd dataset size by duplicating images
For this dataset, class k had to have no variability. To respect this while keeping the same number of images as the other classes, the images were duplicated: each of the 20 captures was copied a number of times to reach the desired size (168 images).
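
Duplicating a small set of originals up to a target size can be planned by cycling over the source indices. The sketch below is one possible scheme rather than the exact procedure used here; `duplication_plan` is a hypothetical helper and the numbers in the example are deliberately small.

```python
from itertools import cycle, islice

def duplication_plan(n_originals, target, start_index=None):
    """Return (new_index, source_index) pairs that bring the class up to target images."""
    start = n_originals if start_index is None else start_index
    n_copies = max(0, target - n_originals)
    sources = islice(cycle(range(n_originals)), n_copies)  # 0, 1, ..., n-1, 0, 1, ...
    return [(start + i, src) for i, src in enumerate(sources)]

# Growing 3 originals to 8 images: 5 copies, taken in turn from images 0, 1, 2
print(duplication_plan(3, 8))  # [(3, 0), (4, 1), (5, 2), (6, 0), (7, 1)]
```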

In [None]:
images_k = os.listdir(dataset_path3+'k/') #reading content of folder of label k in 3rd dataset
images_k = [cv2.imread(dataset_path3+'k/'+im,cv2.IMREAD_GRAYSCALE) for im in images_k] #reading images of 3rd dataset of label K
images_k_16 = [im for im in images_k if im.shape[0]==16] #contains images of size 16x16
images_k_224 = [im for im in images_k if im.shape[0]==224] #contains images of size 224x224
#the following steps were run twice: first starting from index 20, then from index 84
c_k, c__k = 20,20  #indices from which new images will be stored 
for imk in images_k_16: #for each image of size 16x16
  save_image(dataset_path3,imk,'k',16,c_k) #duplicating current image
  c_k +=1
c_k = c__k
for imk in images_k_224: #for each image of size 224x224
  save_image(dataset_path3,imk,'k',224,c_k) #duplicating current image
  c_k+=1

#### 6. Text file saving and shuffling data
Now that the datasets are ready, a text file is generated for each of them. Each line of a file holds one image and its label. These files will be used in the modeling phase (next task).
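
For later use, each line of these files can be read back into a label and a 16x16 array. The sketch below assumes the line format produced by `to_txt` (an uppercase label followed by 256 comma-separated pixel values); `parse_line` and `load_dataset` are hypothetical helpers, and the example feeds them an in-memory file instead of the real datasets.

```python
import io
import numpy as np

def parse_line(line, side=16):
    """Split 'LABEL,p0,p1,...' into the label and a side x side uint8 image."""
    fields = line.strip().split(",")
    pixels = np.array([int(v) for v in fields[1:]], dtype=np.uint8)
    return fields[0], pixels.reshape(side, side)

def load_dataset(fileobj, side=16):
    """Read every non-empty line; returns (labels, images). Expects at least one line."""
    labels, images = [], []
    for line in fileobj:
        if line.strip():
            label, image = parse_line(line, side)
            labels.append(label)
            images.append(image)
    return np.array(labels), np.stack(images)

# Two synthetic lines standing in for the content of a dataset file
text = "Y," + ",".join(str(i % 256) for i in range(256)) + "\n"
text += "K," + ",".join("0" for _ in range(256)) + "\n"
labels, images = load_dataset(io.StringIO(text))
print(labels, images.shape)  # ['Y' 'K'] (2, 16, 16)
```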

In [None]:
#datasets 1, 2 and 3
for num, dataset_path in ((1,dataset_path1),(2,dataset_path2),(3,dataset_path3)):
  to_txt(path,dataset_path,num) #generating text file for the current dataset
  #shuffling rows of generated text file
  with open(path+'dataset'+str(num)+'.txt') as f:
    lines = f.readlines()
  random.shuffle(lines)
  with open(path+'dataset'+str(num)+'.txt', 'w') as f:
    f.writelines(lines)