----------------------------------------------------------------------------------------------------------
***********************************************************************************************************

# <span style="color:Purple"> Computer vision for machine learning project: "Detecting hand gestures"</span>


#### Task 1 & 2: 

#### Author: Lynda Attouche
#### Link: https://colab.research.google.com/drive/15_jQRmQlB2_xS-lNTKlxIHEmjEfgAeFD?usp=sharing
***********************************************************************************************************
----------------------------------------------------------------------------------------------------------


## README
* No special commands are needed to run this notebook. Simply run the cells in order. 

* The Task 1 loop asks `stop?` after each frame: answer `y` to leave it. The Task 2 loop runs until it is interrupted, so stop the cell manually to end it. 

## Imports

#### Libraries

In [None]:
from IPython.display import display, Javascript
from google.colab.output import eval_js
from base64 import b64decode, b64encode
import numpy as np
from PIL import Image
import io
import cv2
import time

#### OpenCV

In [None]:
!git clone https://github.com/opencv/opencv/

Cloning into 'opencv'...
remote: Enumerating objects: 303962, done.
remote: Counting objects: 100% (2/2), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 303962 (delta 0), reused 1 (delta 0), pack-reused 303960
Receiving objects: 100% (303962/303962), 493.11 MiB | 24.71 MiB/s, done.
Resolving deltas: 100% (211516/211516), done.
Checking out files: 100% (7028/7028), done.


## Useful functions

In [None]:
# Create a real time video stream
def VideoCapture():
  js = Javascript('''
    async function create(){
      div = document.createElement('div');
      document.body.appendChild(div);

      video = document.createElement('video');
      video.setAttribute('playsinline', '');

      div.appendChild(video);

      stream = await navigator.mediaDevices.getUserMedia({video: {facingMode: "environment"}});
      video.srcObject = stream;

      await video.play();

      canvas =  document.createElement('canvas');
      canvas.width = video.videoWidth;
      canvas.height = video.videoHeight;
      canvas.getContext('2d').drawImage(video, 0, 0);

      div_out = document.createElement('div');
      document.body.appendChild(div_out);
      img = document.createElement('img');
      div_out.appendChild(img);
    }

    async function capture(){
        return await new Promise(function(resolve, reject){
            pendingResolve = resolve;
            canvas.getContext('2d').drawImage(video, 0, 0);
            result = canvas.toDataURL('image/jpeg', 0.8);
            pendingResolve(result);
        })
    }

    function showimg(imgb64){
        img.src = "data:image/jpeg;base64," + imgb64;
    }

  ''')
  display(js)

In [None]:
#Converting image types
def byte2image(byte):
  jpeg = b64decode(byte.split(',')[1])
  im = Image.open(io.BytesIO(jpeg))
  return np.array(im)

def image2byte(image):
  image = Image.fromarray(image)
  buffer = io.BytesIO()
  image.save(buffer, 'jpeg')
  buffer.seek(0)
  x = b64encode(buffer.read()).decode('utf-8')
  return x
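
The string returned by `capture()` is a data URL, which is why `byte2image` splits on the comma before decoding. The following is a minimal sketch of that step using only the standard library. The helper name `split_data_url` and the fake payload are illustrative assumptions, not part of the notebook.

```python
import base64

def split_data_url(data_url):
    """Split a data URL into its MIME type and its decoded payload bytes."""
    # The header looks like "data:image/jpeg;base64"; the payload follows the first comma.
    header, payload = data_url.split(",", 1)
    mime = header[len("data:"):].split(";")[0]
    return mime, base64.b64decode(payload)

# A made-up payload stands in for real JPEG bytes (assumption for illustration).
fake_jpeg = b"\xff\xd8\xff\xe0 not a real image"
url = "data:image/jpeg;base64," + base64.b64encode(fake_jpeg).decode("ascii")

mime, raw = split_data_url(url)
```

In the notebook, the decoded bytes are then handed to `Image.open(io.BytesIO(...))` to obtain the pixel array.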

## Loading Cascade classifiers from git repository


In [None]:
face_cascade_path = "/content/opencv/data/haarcascades/haarcascade_frontalface_alt.xml"
eyes_cascade_path = "/content/opencv/data/haarcascades/haarcascade_eye.xml"
face_cascades = cv2.CascadeClassifier(face_cascade_path)
eyes_cascades = cv2.CascadeClassifier(eyes_cascade_path)
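
`cv2.CascadeClassifier` does not raise an error when the XML path is wrong: it returns an empty classifier, and the failure only shows up later at detection time. A simple precaution, sketched here with the standard library only, is to check that the files exist before loading them. The helper name `missing_files` is an assumption made for this example.

```python
import os

def missing_files(paths):
    """Return the paths from the list that do not point to an existing file."""
    return [p for p in paths if not os.path.isfile(p)]

# Same paths as in the cell above; they only exist after the git clone in Colab.
cascade_paths = [
    "/content/opencv/data/haarcascades/haarcascade_frontalface_alt.xml",
    "/content/opencv/data/haarcascades/haarcascade_eye.xml",
]

not_found = missing_files(cascade_paths)
if not_found:
    print("Cascade files not found:", not_found)
```

After loading, calling `empty()` on each classifier gives the same information from OpenCV's side.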

## Faces and eyes detection

### Task 1: Basic face and eyes Detection

The aim of this first task is to detect the faces and eyes present in an image, here a real-time capture from the computer's camera. 

Detection relies on **opencv**'s face and eye cascade classifiers, which are used in the following cell. 

The function first detects faces over the whole image with **face_cascades.detectMultiScale**, which returns the position of each face, and draws a rectangle around each one. Each face region is then passed to **eyes_cascades.detectMultiScale**, so that eyes are searched for only inside that face. 
Rectangles are likewise drawn around the detected eyes. 

In [None]:
def faceyesDetection(im):
  """
      Detects faces and eyes on an image and draws the detections in place
      @params:
              - im (array): RGB image on which the detection will take place
      @return:
              - None: rectangles are drawn directly on im (red for faces, green for eyes)
  """
  gray = cv2.cvtColor(im, cv2.COLOR_RGB2GRAY) #take the gray scale version
  
  faces = face_cascades.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=4)   #face detection using face cascade 
  for (x, y, w, h) in faces: #(x,y) is the top left corner of the face rectangle, w and h are its width and height
    cv2.rectangle(im, (x,y), (x+w, y+h), (255,0,0),2) #(x+w,y+h) is bottom right corner
    face_region_gray = gray[y:y+h, x:x+w] # face region detected in gray scale
    face_region = im[y:y+h, x:x+w] # face region detected (a view of im, so drawing on it draws on im)

    eyes = eyes_cascades.detectMultiScale(face_region_gray, 1.1, 3) # Eyes detection in face region 
    for (xeye,yeye,weye,heye) in eyes: #(xeye,yeye) is the top left corner of the eye rectangle, relative to the face region; weye and heye are its width and height
      cv2.rectangle( face_region, (xeye,yeye), (xeye+weye,yeye+heye), (0,255,0),2) # Draw rectangle on detected eyes
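
Because the eye classifier only sees the face region, the eye rectangles it returns are relative to the top left corner of that face. The function above draws them on `face_region`, which handles the offset implicitly. If the eye positions were needed in whole-image coordinates, the offset could be added back as in this sketch. The helper name and the sample numbers are assumptions for illustration.

```python
def eye_to_image_coords(face_box, eye_box):
    """Convert an eye rectangle relative to a face into whole-image coordinates."""
    fx, fy, _, _ = face_box          # top left corner of the face in the image
    ex, ey, ew, eh = eye_box         # eye rectangle relative to the face region
    return (fx + ex, fy + ey, ew, eh)

face = (200, 120, 160, 160)          # hypothetical face rectangle (x, y, w, h)
eye = (30, 45, 40, 40)               # hypothetical eye rectangle inside that face

eye_in_image = eye_to_image_coords(face, eye)
```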


Applying the function above to the live video stream:

In [None]:
start_time = time.time()
VideoCapture()
eval_js('create()')
x = True
while x:
  #Capturing live stream video
  byte =  eval_js('capture()') 
  #Converting the capture to image type
  im = byte2image(byte) 
  #Detecting faces and eyes on the capture (detection+drawing)
  faceyesDetection(im)
  eval_js('showimg("{}")'.format(image2byte(im)))
  print("%s s" % (time.time() - start_time))
  r = input("stop?")
  if r=='y':
    break

**Comments & Conclusion:**

Several tests were carried out with the previous code, namely detection of: one person, several people at the same time, a person in a photo, a person with objects close to the face (a necklace in this case), a person moving slightly, and finally different lighting conditions. 

From the results obtained, the cascade classifiers generally detect faces **quite well**. 

However, they remain quite **limited**. When there is too much light, or in a fairly dark environment, detection is almost impossible. When the face moves slightly, detection is slow and the resulting rectangles are not clean. Finally, when objects are close to the face, the algorithm tends to detect them as well. 




### Task 2: Reducing the computation time of Facedetect

As noted earlier, the classical algorithm used in Task 1 can take several seconds to perform the detection, because the classifier scans the whole image. In this task, the search is therefore restricted to a region of interest around the last detected face, which reduces the computation time. 
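
To get a rough idea of the possible saving, one can compare the number of pixels in the region of interest with the number of pixels in the whole frame. The sketch below is only an estimate under stated assumptions: the frame size (640x480) and the face size (150x150) are made-up values, and the real speed-up also depends on how the classifier scales its search window.

```python
def scanned_fraction(frame_w, frame_h, face_w, face_h, margin):
    """Upper bound on the fraction of the frame covered by the region of interest."""
    # The region is the face rectangle grown by the margin on each side,
    # and it can never be larger than the frame itself.
    roi_w = min(frame_w, face_w + 2 * margin)
    roi_h = min(frame_h, face_h + 2 * margin)
    return (roi_w * roi_h) / (frame_w * frame_h)

# Hypothetical values: 640x480 frame, 150x150 face, margin of 80 pixels.
fraction = scanned_fraction(640, 480, 150, 150, 80)
print(f"About {fraction:.0%} of the frame is searched")
```

With these assumed values, a little under a third of the frame is scanned.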

In the following cell, a function computes the region in which the face search will be carried out at time **t+1**, based on the face rectangle found at time **t** (see the figure on Moodle).

In [None]:
def compute_region(img,margin,prev):
  """
  Computes the region of interest 
  @params:
          - img (array): image on which detection will be done
          - margin (int): margin, in pixels, added around the previous face rectangle
          - prev (array): previous face detection (x, y, w, h)
  @return 
          - region of interest as (x, y, w, h), clipped to the image
  """
  # As seen in the previous task, the cascade classifier returns a face as a rectangle,
  # so prev = (x,y,w,h) = (prev[0],prev[1],prev[2],prev[3]),
  # where (x,y) is the top left corner and (w,h) are the width and height.
  # The new region is the previous rectangle grown by the margin on every side:
  # top left corner:     x' = x - margin,      y' = y - margin
  # bottom right corner: w' = (x+w) + margin,  h' = (y+h) + margin

  #top left corner
  x_prime = prev[0] - margin 
  y_prime = prev[1] - margin 

  #bottom right corner (w_prime and h_prime are coordinates here, not sizes)
  w_prime = prev[0]+prev[2]+margin 
  h_prime = prev[1]+prev[3]+margin 

  # Note: the new region must stay inside the image (the results were wrong otherwise).
  # The top left corner must not be negative (a margin is subtracted from x and y),
  # so any negative coordinate is set to 0.
  x_prime = max(0,x_prime) # = 0 if x_prime<0
  y_prime = max(0,y_prime) # = 0 if y_prime<0

  # The bottom right corner must not go beyond the image either (a margin is added to x+w and y+h),
  # so it is capped at the image width and height.
  (imgH,imgW) = img.shape[0], img.shape[1] # image height and width
  w_prime = min(imgW,w_prime) # = image width if w_prime>image width
  h_prime = min(imgH,h_prime) # = image height if h_prime>image height

  # convert the bottom right corner back to a width and a height
  return (x_prime,y_prime,w_prime-x_prime,h_prime-y_prime)
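
To see the effect of the margin and of the clipping, here is a small standalone sketch of the same computation that takes the image width and height directly instead of an array. The function name `expand_box` and the sample rectangles are assumptions for illustration.

```python
def expand_box(prev, margin, img_w, img_h):
    """Grow the rectangle prev = (x, y, w, h) by margin and clip it to the image."""
    x, y, w, h = prev
    left = max(0, x - margin)             # never left of the image
    top = max(0, y - margin)              # never above the image
    right = min(img_w, x + w + margin)    # never right of the image
    bottom = min(img_h, y + h + margin)   # never below the image
    return (left, top, right - left, bottom - top)

# Face well inside a 640x480 frame: the region grows by 80 pixels on every side.
centered = expand_box((200, 150, 100, 100), 80, 640, 480)

# Face close to the top left corner: the region is clipped at 0.
near_corner = expand_box((10, 20, 100, 100), 80, 640, 480)
```

In the first case the result is `(120, 70, 260, 260)`. In the second case the clipping gives `(0, 0, 190, 200)`, so the region is smaller than a full margin would suggest.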

In [None]:
start_time = time.time()
VideoCapture()
eval_js('create()')

#our first starting box: 
prev = None 
margin = 80 #chosen arbitrarily
b = True

while b:
  byte = eval_js('capture()')
  im = byte2image(byte) 
  new_reg = im.copy() #starter region, whole image (first step, step t before detection)
  #if a face was detected in the previous frame
  if prev is not None :
    x_prime,y_prime,w_prime,h_prime = compute_region(im,margin,prev) #computing the new region (of timestep t+1)
    #my new region (copied so that the rectangle drawn below does not end up in the search area):
    new_reg = im[y_prime:y_prime+h_prime,x_prime:x_prime+w_prime].copy()
    #plot the rectangle 
    cv2.rectangle(im, (x_prime,y_prime), (x_prime+w_prime, y_prime+h_prime), (255,0,0),2)

  gray = cv2.cvtColor(new_reg, cv2.COLOR_RGB2GRAY) # Converting image to gray scale (im is RGB, as in Task 1)
  #face detection using face cascade 
  faces = face_cascades.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=4)
  if len(faces)==0: # no face has been detected 
    prev = None  #no region of interest: the next search covers the whole image
  else:
    curr_face = faces[0] #we pick the first face detected
    if prev is None: #the whole image was searched, so these are already image coordinates
      (x,y,w,h) = curr_face #get the face region coordinates
    else: #the search was done inside the region, so its top left corner (x_prime,y_prime) is added back
      (x,y,w,h)=(x_prime+curr_face[0], y_prime+curr_face[1], curr_face[2], curr_face[3]) #updating with the new coordinates (primes)
    #so we update our region 
    prev = (x,y,w,h)
    #we plot the rectangle 
    cv2.rectangle(im, (x,y), (x+w, y+h), (255,0,0),2)    

  eval_js('showimg("{}")'.format(image2byte(im)))
  #print("%s s" % (time.time() - start_time))

<IPython.core.display.Javascript object>

KeyboardInterrupt: ignored
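
The update of `prev` in the loop above can be summarised as a small function: no detection resets the tracking, and a detection made inside the region of interest is shifted back to whole-image coordinates. This is only a sketch of that logic. The name `update_track` and the sample values are assumptions, and `roi_origin` is `(0, 0)` when the whole image was searched.

```python
def update_track(faces, roi_origin):
    """Return the next value of prev, or None when no face was found."""
    if len(faces) == 0:
        return None                   # fall back to a full-image search next time
    fx, fy, fw, fh = faces[0]         # first detected face, relative to the searched region
    ox, oy = roi_origin               # top left corner of the searched region
    return (ox + fx, oy + fy, fw, fh)

lost = update_track([], (120, 70))
tracked = update_track([(90, 85, 100, 100)], (120, 70))
first_frame = update_track([(5, 5, 10, 10)], (0, 0))
```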

**Conclusion:**

In the tests carried out, this new method proved to be faster, which was the goal. For example, in one of the tests, the classical method took about 4 s to do the computation while the new method took 1 s. 
In terms of time, the new method is therefore quite satisfactory, but its accuracy remains low (at least in my tests). 
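
Note that the time printed in Task 1 is measured from `start_time`, which is set before the camera is created, so it also includes the setup and the time spent waiting at the `stop?` prompt. To compare the two methods more fairly, the detection call alone could be timed, for example with a helper like the one sketched below. The name `time_call` and the dummy workload are assumptions. In the notebook, the timed function would be the detection step.

```python
import time

def time_call(fn, *args, repeats=5):
    """Return the average duration in seconds of fn(*args) over several runs."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        durations.append(time.perf_counter() - start)
    return sum(durations) / len(durations)

def dummy_work(n):
    """Placeholder workload standing in for a detection call."""
    return sum(i * i for i in range(n))

avg = time_call(dummy_work, 10_000)
print(f"average: {avg:.6f} s")
```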