----------------------------------------------------------------------------------------------------------
***********************************************************************************************************

# <span style="color:Purple"> Computer vision for machine learning Project: "Detecting hand gestures"</span>


#### Task 8: 

#### Author: Lynda Attouche
#### Link: https://colab.research.google.com/drive/1QJxNdlKoWiqh0i70uOAmZ9XLsCyHePBG?usp=sharing
*******************************************************************************************
----------------------------------------------------------------------------------------------------------


## README
* Throughout this notebook, no special commands are needed to run the code. Simply run the cells in order. 

* Some cells loop indefinitely, so a counter was added to stop them. You can comment the counter out; in that case, interrupt the cell manually and continue from the next cell when you want to move on to the next task.

In the previous task, we tested the models on the generated datasets. In this task, we test our models "live": we detect the hand directly in the video capture, turn it into an image, and run the prediction on it. The steps are those seen in the previous tasks, mostly tasks 5-6, and end with a prediction using the models from task 7.

## Importing Libraries

In [None]:
import sklearn
import matplotlib.pyplot as plt
import numpy as np
import os
import keras
from keras.models import Sequential
import tensorflow as tf
from keras.layers import Dense, Dropout
from google.colab.output import eval_js
from IPython.display import display, Javascript
from keras.models import model_from_json
from PIL import Image
import io
import cv2
import time
from base64 import b64decode, b64encode
import random

In [None]:
from google.colab import drive
drive.mount('/content/drive')
path = "/content/drive/MyDrive/ComputerVision/"

Mounted at /content/drive


In [None]:
!git clone https://github.com/opencv/opencv/

Cloning into 'opencv'...
remote: Enumerating objects: 305819, done.
remote: Counting objects: 100% (267/267), done.
remote: Compressing objects: 100% (194/194), done.
remote: Total 305819 (delta 98), reused 170 (delta 64), pack-reused 305552
Receiving objects: 100% (305819/305819), 495.25 MiB | 27.54 MiB/s, done.
Resolving deltas: 100% (212738/212738), done.
Checking out files: 100% (7048/7048), done.


In [None]:
face_cascade_path = "/content/opencv/data/haarcascades/haarcascade_frontalface_alt.xml"
face_cascades = cv2.CascadeClassifier(face_cascade_path)

## Used functions 

In [None]:
#Converting image types
def byte2image(byte):
  jpeg = b64decode(byte.split(',')[1])
  im = Image.open(io.BytesIO(jpeg))
  return np.array(im)

def image2byte(image):
  image = Image.fromarray(image)
  buffer = io.BytesIO()
  image.save(buffer, 'jpeg')
  buffer.seek(0)
  x = b64encode(buffer.read()).decode('utf-8')
  return x
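
As a quick check of the format these helpers work with: `capture()` returns a data URL (`data:image/jpeg;base64,...`), and `byte2image` keeps only the part after the comma before decoding it. The sketch below illustrates that split with the standard library alone. `split_data_url` and the fake payload are hypothetical names used for illustration; they are not part of the notebook.

```python
from base64 import b64decode, b64encode

def split_data_url(data_url):
    """Return (header, raw_bytes) for a base64 data URL (hypothetical helper)."""
    header, payload = data_url.split(',', 1)
    return header, b64decode(payload)

# Hypothetical payload standing in for real JPEG bytes.
fake_jpeg = b'\xff\xd8\xff\xe0 not a real image'
data_url = 'data:image/jpeg;base64,' + b64encode(fake_jpeg).decode('utf-8')

header, raw = split_data_url(data_url)
print(header)            # the MIME/encoding prefix
print(raw == fake_jpeg)  # the decoded bytes match the original payload
```

In the notebook, the decoded bytes are then handed to `Image.open(io.BytesIO(...))`, which is what turns them into an array.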

In [None]:
def VideoCapture():
  js = Javascript('''
    async function create(){
      div = document.createElement('div'); //create new div element
      document.body.appendChild(div); //add the content of the new element to the DOM

      video = document.createElement('video'); //create new video element
      video.setAttribute('playsinline', ''); //setting attributes of the element

      div.appendChild(video); //add the video element to the div element

      //Selecting facing mode of the video stream
      stream = await navigator.mediaDevices.getUserMedia({video: {facingMode: "environment"}});
      video.srcObject = stream;

      await video.play(); //playing video

      canvas =  document.createElement('canvas'); //create new canvas element
      // set canvas size 
      canvas.width = video.videoWidth;
      canvas.height = video.videoHeight;
      canvas.getContext('2d').drawImage(video, 0, 0);

      div_out = document.createElement('div'); //create a new div element that will contain the output
      document.body.appendChild(div_out); //add the content of the div_out to the DOM
      img = document.createElement('img'); //create the image element (will contain the image/capture we'll take)
      div_out.appendChild(img); //add the image element to the output div
    }

    //taking the capture and storing it
    async function capture(){
        return await new Promise(function(resolve, reject){ // resolves right away with the current video frame
            pendingResolve = resolve;
            canvas.getContext('2d').drawImage(video, 0, 0); //draw an image onto the canvas.
            result = canvas.toDataURL('image/jpeg', 0.8);
            pendingResolve(result);
        })
    }

    //displaying the capture 
    function showimg(imgb64){
        img.src = "data:image/jpeg;base64," + imgb64;
    }

  ''')
  display(js)

In [None]:
def compute_region(img,margin,prev):
  """
  Computes  region of interest 
  @params:
          - img (array): image on which detection will be done
          - margin (int): the margin to be taken from the previous region (the coordinate shift)
          - prev (array): previous face detection
  @return 
          - region of interest 
  """
  # as seen in the previous task, the CascadeClassifier returns a face as a rectangle,
  # and so does the param prev, i.e. prev = (x,y,w,h) = (prev[0],prev[1],prev[2],prev[3])
  # where (x,y) is the top left corner and (w,h) are the width and height of the rectangle
  # the goal is to compute the corners of the new region from the previous region and the margin:
  # x' = x - margin        (left)
  # y' = y - margin        (top)
  # w' = (x+w) + margin    (right)
  # h' = (y+h) + margin    (bottom)

  #top left corner
  x_prime = prev[0] - margin 
  y_prime = prev[1] - margin 

  #bottom right
  w_prime = prev[0]+prev[2]+margin 
  h_prime = prev[1]+prev[3]+margin 

  # Note: 
  # the new region must stay inside the image (the results were wrong in some tests when it did not), i.e.
  # the top left corner must not be negative (a margin is subtracted from the previous x and y): x_prime>=0 and y_prime>=0
  # if x or y (or both) is negative, it is set to 0
  x_prime = max(0,x_prime) # = 0 if x_prime<0
  y_prime = max(0,y_prime) # = 0 if y_prime<0

  # likewise, the bottom right corner must not go beyond the image bounds
  # (w_prime, for example, is x+w increased by the margin),
  # so each value is capped at the image width or height
  (imgH,imgW) = img.shape[0], img.shape[1]
  w_prime = min(imgW,w_prime) # = image width if w_prime>image width
  h_prime = min(imgH,h_prime) # = image height if h_prime>image height

  return (x_prime,y_prime,w_prime-x_prime,h_prime-y_prime)
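
To make the clamping concrete, here is a minimal sketch of the same expand-then-clamp arithmetic on plain tuples. `expand_and_clamp` is a hypothetical stand-in that takes the image width and height directly instead of an image array; the 640x480 frame size and the boxes are made-up values.

```python
def expand_and_clamp(prev, margin, img_w, img_h):
    """Grow the (x, y, w, h) box by `margin` on every side, then keep it inside the image."""
    x, y, w, h = prev
    left = max(0, x - margin)
    top = max(0, y - margin)
    right = min(img_w, x + w + margin)
    bottom = min(img_h, y + h + margin)
    # Return in the same (x, y, width, height) convention as the input box.
    return (left, top, right - left, bottom - top)

# Face close to the top-left corner: the left and top sides are clamped to 0.
print(expand_and_clamp((10, 20, 100, 100), 50, 640, 480))    # (0, 0, 160, 170)

# Face close to the bottom-right corner: the right and bottom sides are clamped.
print(expand_and_clamp((560, 400, 100, 100), 50, 640, 480))  # (510, 350, 130, 130)
```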

In [None]:
def detect_face(im,prev,margin):
  """
  Detects face regions
  @param:
        - im (array): image/capture
        - prev (array): previous detected area
        - margin (int): margin for detection 
  @return 
        detected region w/out margin 
  """
  new_reg = im.copy() #starter region, whole image (first step, step t before detection)
  curr_face = None #contains the detected face
  #if a face was already detected, restrict the search to the region around it
  if prev is not None :
    x_prime,y_prime,w_prime,h_prime = compute_region(im,margin,prev) #computing the new region (of timestep t+1)
    #my new region:
    new_reg = im[y_prime:y_prime+h_prime,x_prime:x_prime+w_prime] 
    #plot the rectangle 
    cv2.rectangle(im, (x_prime,y_prime), (x_prime+w_prime, y_prime+h_prime), (255,0,0),2)

  gray = cv2.cvtColor(new_reg, cv2.COLOR_RGB2GRAY) # Converting image to gray scale (PIL decodes the capture as RGB)
  #face detection using face cascade 
  faces = face_cascades.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=4)
  if len(faces)==0: # no face has been detected 
    prev = None  #no region of interest for the next frame
  else:
    curr_face = faces[0] #we pick the first face detected
    if prev is None: #the whole image was searched, so the coordinates are already image coordinates
      (x,y,w,h) = curr_face #get the face region coordinates
    else: #the search was restricted to the region, so we shift by (x_prime,y_prime) to get image coordinates
      (x,y,w,h)=(x_prime+curr_face[0], y_prime+curr_face[1], curr_face[2], curr_face[3])
    #so we update our region 
    prev = (x,y,w,h)
    #we plot the rectangle 
    cv2.rectangle(im, (x,y), (x+w, y+h), (255,0,0),2)    
  #note: prev is only updated locally; it must be returned as well to be reused on the next frame
  return curr_face

In [None]:
def hand_position(im,pts):
  """
  Computes the position of the box around the hand
  @params:
          im (array): image, used to keep the box inside its bounds
          pts (array): the corner points of the (rotated) box
  @return:
          position of the hand as ((x_top_left, y_top_left), (x_bottom_right, y_bottom_right))
  """
  #we take the min and max of the corner coordinates and clamp them to the image bounds,
  #ensuring that the hand box stays inside the image
  #(values are cast to plain ints so that they can be used for drawing and slicing)
  x_top_l = max(0, int(min(pts[:,0])))
  y_top_l = max(0, int(min(pts[:,1])))
  x_bottom_r = min(im.shape[1], int(max(pts[:,0])))
  y_bottom_r = min(im.shape[0], int(max(pts[:,1])))
  hand =  (x_top_l, y_top_l), (x_bottom_r, y_bottom_r)
  return hand
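
As an illustration of what this function computes, the sketch below turns the four corners of a rotated box (the kind of array `cv2.boxPoints` produces) into an axis-aligned box clamped to the image. `bounding_box`, the corner values and the 120x80 image size are hypothetical and chosen only to show the clamping.

```python
import numpy as np

def bounding_box(pts, img_w, img_h):
    """Axis-aligned box around the corner points, clamped to the image bounds."""
    x0 = max(0, int(pts[:, 0].min()))
    y0 = max(0, int(pts[:, 1].min()))
    x1 = min(img_w, int(pts[:, 0].max()))
    y1 = min(img_h, int(pts[:, 1].max()))
    return (x0, y0), (x1, y1)

# Corners of a rotated rectangle that sticks out of the image on the left and at the bottom.
corners = np.array([[-5, 40], [60, 10], [90, 70], [25, 100]], dtype=np.int32)
box = bounding_box(corners, img_w=120, img_h=80)
print(box)  # ((0, 10), (90, 80))
```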

In [None]:
def load_model(name,weights):
  """
  Loads neural network model
  @params:
          name(string): name of saved model
          weights(string): name of saved weights
  @return:
          model
  """
  # load json and create model
  with open(path+name, 'r') as json_file:
    loaded_model_json = json_file.read()
  loaded_model = model_from_json(loaded_model_json)
  # load weights into new model
  loaded_model.load_weights(path+weights)
  return loaded_model

In [None]:
def int_to_letter(i):
  """
  Converts integer to letter according to the used classes
  @params:
          i (int): the one we want to convert
  @return:
          letter corresponding to the given integer
  """
  if i == 0:
    return 'A'
  elif i == 1:
    return 'E'
  elif i==2:
    return 'K'
  else: 
    return 'Y'

In [None]:
def predict(hand_image,loaded_model):
  """
    Makes prediction on given image of hand
    @params:
            hand_image(array): image of the hand whose label we want to predict
            loaded_model: model to use for predicting
    @return: 
            label of hand_image (letter format)
  """
  #use the model passed as argument; hand_image is the probability image of the hand, of shape (1,256)
  prediction = loaded_model.predict(hand_image)
  prediction = prediction.argmax()
  return int_to_letter(prediction)
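
The last two steps (argmax, then integer to letter) can be sketched without a trained model. The example assumes the model outputs one score per class in the order A, E, K, Y, as in `int_to_letter`; `decode_prediction` and the score vector are hypothetical.

```python
import numpy as np

CLASSES = ('A', 'E', 'K', 'Y')  # same class order as int_to_letter

def decode_prediction(scores):
    """Return the letter with the highest score and that score."""
    index = int(np.argmax(scores))  # argmax over the flattened (1, 4) array
    return CLASSES[index], float(np.max(scores))

# Made-up output of a 4-class model for a single image.
scores = np.array([[0.05, 0.10, 0.80, 0.05]])
letter, confidence = decode_prediction(scores)
print(letter, confidence)
```

Returning the score next to the letter is optional, but it makes it easier to compare how confident each of the three models is on the same capture.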

## Hand detection

The main step of this task is hand detection. To get the image of the hand we want to make a prediction on, we first need to detect the hand. To do so, we follow the steps seen in tasks 4, 5 and 6: face detection, histogram computation, back-projection, and capture of the exact hand rectangle.

### a. Face detection

In [None]:
start_time = time.time()
VideoCapture()
eval_js('create()')

#our first starting box: 
prev = None 
margin = 50 #arbitrarily chosen
b = True
c = 0 #counter used only to stop the algorithm (otherwise the loop is infinite)
while b:
  byte = eval_js('capture()')
  im = byte2image(byte) 
  curr_face = detect_face(im,prev,margin)
   
  if c>20 and curr_face is not None: #stop only once a face has actually been detected
    #here, only the face frame is displayed (not the larger rectangle around it), so that I can play with the distance to the camera.
    #to display the larger region, return "prev" from the function that computes the face detection and display it using:
    #eval_js('showimg("{}")'.format(image2byte(im[prev[1]:prev[1]+prev[3], prev[0]:prev[0]+prev[2]])))
    face_frame = im[curr_face[1]:curr_face[1]+curr_face[3], curr_face[0]:curr_face[0]+curr_face[2]]
    tracking_window_face = curr_face
    eval_js('showimg("{}")'.format(image2byte(face_frame)))
    break
  c +=1
  eval_js('showimg("{}")'.format(image2byte(im)))

### b. Histogram computation

In [None]:
def hsv_histogram(face_frame):
  """
  Transforms the face frame into HSV and computes its hue histogram
  @params:
        - face_frame: the frame representing the face
  @return:
        - mask: mask that discards pixels that are too dark or too bright
        - hsv_frame: the face frame converted to HSV
        - histo: histogram computed from the hue channel of the HSV frame
  """
  #transforming the detected face frame into HSV
  hsv_frame = cv2.cvtColor(face_frame, cv2.COLOR_BGR2HSV)
  #Creating a mask using inRange to deal with brightness and darkness:
  #pixels that are too dark and/or too bright fall outside the range and are ignored
  #the parameters of the mask were found by testing many values (all tests were done in a single place)
  mask = cv2.inRange(hsv_frame,np.array((0,60,32)), np.array((180,200,200)))
  #computing histogram of face frame using hue channel ie [0]
  #here we used: 18 as bins for the histogram
  #and the range of the hue was set to [0,180]
  histo = cv2.calcHist([hsv_frame],[0], mask, [18], [0,180])
  #normalizing histogram 0-255
  histo = cv2.normalize(histo, histo, 0, 255, cv2.NORM_MINMAX)
  return mask, hsv_frame, histo

mask,hsv,histo = hsv_histogram(face_frame)
#Displaying histogram
plt.imshow(histo.reshape(1,-1))
plt.show()
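
To show what the masked, normalized hue histogram contains, here is a NumPy-only sketch of the same computation on a tiny made-up hue image. It is a stand-in for `cv2.calcHist` followed by `cv2.normalize` with `NORM_MINMAX`, not the OpenCV implementation; `hue_histogram` and the pixel values are hypothetical.

```python
import numpy as np

def hue_histogram(hue, mask, bins=18, hue_max=180):
    """Count masked hue values in `bins` equal bins, then rescale the counts to 0-255."""
    selected = hue[mask > 0]  # keep only the pixels allowed by the mask
    counts, _ = np.histogram(selected, bins=bins, range=(0, hue_max))
    counts = counts.astype(np.float32)
    span = counts.max() - counts.min()
    if span == 0:  # flat histogram: nothing to rescale
        return np.zeros_like(counts)
    return (counts - counts.min()) / span * 255.0

# Made-up 2x4 hue channel; the pixel with hue 170 is excluded by the mask.
hue = np.array([[5, 5, 15, 170], [5, 15, 95, 95]], dtype=np.uint8)
mask = np.array([[255, 255, 255, 0], [255, 255, 255, 255]], dtype=np.uint8)
histo_demo = hue_histogram(hue, mask)
print(histo_demo)
```

With 18 bins over [0,180], each bin covers 10 hue values, so hues 5, 15 and 95 fall in bins 0, 1 and 9. The most frequent bin is mapped to 255 and empty bins to 0.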

### c. Hand box detection

In [None]:
VideoCapture()
eval_js('create()')
#the following line defines the stopping criteria of the CamShift algorithm:
#it stops after 10 iterations or when the search window moves by less than 1 pixel
term_crit = ( cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1 )
tracking_window_hand = (0,0,im.shape[1],im.shape[0]) #to keep track of the hand
c = 0 #counter to stop the algo
while True:
  time.sleep(2)
  byte = eval_js('capture()') # capture
  im = byte2image(byte) #converting capture 
  # Converting the image to HSV
  hsv = cv2.cvtColor(im, cv2.COLOR_BGR2HSV)
  # Computing mask (inRange) as done in the previous cell
  mask = cv2.inRange(hsv,np.array((0,60,32)), np.array((180,200,200)))
  # Back projecting the face frame histogram into the hsv image
  #basically, we have the histogram of colors of the face and we will backproject it in our current image 
  #to detect the part of the image that fit the histogram (have same color as the face)
  prob = cv2.calcBackProject([hsv],[0], histo, [0,180],scale= 1)
  
  # Apply the mask to the backprojection output
  # Helps us to deal with dark or/and bright pixels
  prob = prob & mask

  #Tracking face 
  # Applying cam shift
  (x,y,w,h) = tracking_window_face
  ret,tracking_window_face = cv2.CamShift(prob,tracking_window_face, term_crit)
  # Retrieve the rotated bounding rectangle
  pts = cv2.boxPoints(ret).astype(np.int32) # np.int was removed from recent NumPy versions
  # fill the face area (prob) with zeros
  cv2.fillPoly(prob, [pts], 0)
  # Draw the face area
  cv2.polylines(im, [pts], True, (255, 255 , 0), 2)
 
  #Tracking hand
  ret2, tracking_window_hand = cv2.CamShift(prob, tracking_window_hand, term_crit)
  
  pts2 = cv2.boxPoints(ret2).astype(np.int32) # np.int was removed from recent NumPy versions
  hand_image = hand_position(im,pts2)

  #drawing the rectangle around hand 
  cv2.rectangle(im, hand_image[0], hand_image[1], (0,0,255), 2)
  eval_js('showimg("{}")'.format(image2byte(im)))
  eval_js('showimg("{}")'.format(image2byte(prob)))
  c+=1
  if(c>5):
    stop = input("Stop?")
    if stop == 'y':
      break
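
The back-projection step in the loop above can be pictured as a lookup: every pixel is replaced by the histogram value of the bin its hue falls into, and the mask then zeroes the pixels that are too dark or too bright. The sketch below reproduces that idea in NumPy under the same settings (18 bins, hue range [0,180]). It is an approximation for illustration, not the code of `cv2.calcBackProject`; `back_project` and all values are hypothetical.

```python
import numpy as np

def back_project(hue, histo, hue_max=180):
    """Replace each hue value by the histogram value of its bin (0-255 probability map)."""
    bins = len(histo)
    # Cast before multiplying so that uint8 hue values do not overflow.
    bin_index = (hue.astype(np.int32) * bins) // hue_max
    return np.clip(np.rint(histo[bin_index]), 0, 255).astype(np.uint8)

# Made-up face histogram: skin-like hues are concentrated in the first two bins.
histo_demo = np.zeros(18, dtype=np.float32)
histo_demo[0] = 255.0
histo_demo[1] = 170.0

hue = np.array([[5, 15, 95], [170, 9, 10]], dtype=np.uint8)
mask = np.array([[255, 255, 255], [255, 0, 255]], dtype=np.uint8)

prob_demo = back_project(hue, histo_demo) & mask  # same masking as in the loop
print(prob_demo)
```

Pixels whose hue is frequent in the face histogram get a high value, which is what lets CamShift lock onto skin-colored areas such as the hand once the face region has been zeroed out.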

In [None]:
hand_image =  prob[hand_image[0][1]:hand_image[1][1], hand_image[0][0]:hand_image[1][0]] #cropping the hand from the probability map
hand_image = cv2.resize(hand_image, (16,16)) #resizing the image to the size used by our models
hand_image = hand_image.reshape((1,256)) #reshaping to fit the data format used by our models
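
The crop, resize and flatten steps can be checked on a synthetic probability map. The sketch below uses a nearest-neighbour subsampling written in NumPy as a stand-in for `cv2.resize` (which interpolates bilinearly by default), so the pixel values may differ slightly from the notebook's; only the shapes are meant to match. `to_model_input` and the map are hypothetical, and the empty-box check is an added safeguard, not something the cell above does.

```python
import numpy as np

def to_model_input(prob_map, top_left, bottom_right, size=16):
    """Crop the box, shrink it to size x size by nearest-neighbour sampling, flatten to one row."""
    (x0, y0), (x1, y1) = top_left, bottom_right
    patch = prob_map[y0:y1, x0:x1]  # rows are y, columns are x
    if patch.size == 0:
        raise ValueError("empty hand box: nothing to resize")
    rows = np.arange(size) * patch.shape[0] // size
    cols = np.arange(size) * patch.shape[1] // size
    small = patch[np.ix_(rows, cols)]
    return small.reshape(1, size * size)

# Synthetic 120x160 probability map with a bright rectangle where the "hand" is.
prob_map = np.zeros((120, 160), dtype=np.uint8)
prob_map[40:80, 50:110] = 255

features = to_model_input(prob_map, (50, 40), (110, 80))
print(features.shape)  # (1, 256)
```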

## Loading models

After detecting the hand, we need to load the second actor of this task: the model. Since we built 3 different models, all of them are loaded and tested.

In [None]:
loaded_model1 = load_model('model1.json',"model1_weights.h5") #loading model 1
loaded_model2 = load_model('model2.json',"model2_weights.h5") #loading model 2
loaded_model3 = load_model('model3.json',"model3_weights.h5") #loading model 3

## Prediction on detected hand

Now that all the elements are available (hand_image + models), the prediction is made as follows.

In [None]:
print('The predicted letter with model 1 is :', predict(hand_image,loaded_model1))
print('The predicted letter with model 2 is :', predict(hand_image,loaded_model2))
print('The predicted letter with model 3 is :', predict(hand_image,loaded_model3))