<a href="https://colab.research.google.com/github/JhonHader/HAND-KEYPOINTS-DETECTION/blob/main/Hand_Detection.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# <strong>Hand Keypoints Detection</strong>

*   **<font color='red'> Problem description </font>** 
##### Hand keypoints detection and finger counting.

---
---

#####Developed by: 
<h6 align=center> ${\text{Jhon Hader Fernández}}$ </h6>
<h6 align=center> ${\text{Diego Fernando Díaz}}$ </h6>

#####<h6 align=center>{<i>jhon_fernandez, di-diego</i>}@javeriana.edu.co</h6>
#####<h6 align=center>Pontificia Universidad Javeriana</h6>

<br>

## ***1. ENVIRONMENT***

A GPU runtime is recommended for this project. Running deep neural networks with OpenCV's `dnn` module on the GPU requires an OpenCV build with the NVIDIA CUDA dependencies enabled.
Building it can take a long time, so we have prepared a prebuilt environment stored in Google Drive; to use it, you need its link.

In [None]:
install_GPU_dependencies = False

if install_GPU_dependencies:
  %cd /content
  !git clone https://github.com/opencv/opencv
  !git clone https://github.com/opencv/opencv_contrib
  !mkdir /content/build
  %cd /content/build

  !cmake -DOPENCV_EXTRA_MODULES_PATH=/content/opencv_contrib/modules  -DBUILD_SHARED_LIBS=OFF  -DBUILD_TESTS=OFF  -DBUILD_PERF_TESTS=OFF  -DBUILD_EXAMPLES=OFF  -DWITH_OPENEXR=OFF  -DWITH_CUDA=ON  -DWITH_CUBLAS=ON  -DWITH_CUDNN=ON  -DOPENCV_DNN_CUDA=ON  /content/opencv

  !make -j8 install

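  # Store the compiled module in the same Drive folder that section 1.1 loads it from
  !mkdir -p "/content/drive/My Drive/Hand_Detection/cv2_cuda"
  !cp  /content/build/lib/python3/cv2.cpython-36m-x86_64-linux-gnu.so "/content/drive/My Drive/Hand_Detection/cv2_cuda"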

else:
  print('[INFO...] CUDA-GPU and OpenCV dependencies are already installed!')


### ***1.1. GET ENVIRONMENT***

* **<font color='green'><i> 1.1.1. </i></font>** <br>
Get the environment from Google Drive. Check that your Colab runtime type is set to **<font color='red'><i> GPU </i></font>**.

* **<font color='green'><i> 1.1.2. </i></font>** <br>
Import libraries 
<br>


**<font color='red'><i> REQUIREMENT </i></font>** <br>
The OpenCV version must be **<font color='blue'><i> 4.2.x </i></font>** or later.
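
The version requirement can also be checked in code rather than by reading the printed string. The helper below is a minimal sketch, not part of the original notebook: `version_at_least` is a hypothetical name, and it assumes the version is a dotted string such as the one exposed by `cv2.__version__`.

```python
def version_at_least(version, minimum=(4, 2)):
    """Return True if a dotted version string is at least `minimum`."""
    parts = []
    for token in version.split(".")[:len(minimum)]:
        # Keep only the leading digits so suffixes such as "0-dev" still parse.
        digits = ""
        for ch in token:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    # Pad short versions such as "4" to the length of `minimum`.
    parts += [0] * (len(minimum) - len(parts))
    return tuple(parts) >= tuple(minimum)


print(version_at_least("4.5.1"))  # newer than 4.2
print(version_at_least("4.1.2"))  # older than 4.2
```

In the notebook this could be called as `version_at_least(cv2.__version__)` right after the import cell.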

In [None]:
GPU = True
if GPU:
  print('[INFO...] Loading GPU dependencies.')
  # Assumes Google Drive is already mounted at /content/drive.
  # Copy the CUDA-enabled OpenCV build into the working directory so that `import cv2` picks it up.
  !cp "/content/drive/My Drive/Hand_Detection/cv2_cuda/cv2.cpython-36m-x86_64-linux-gnu.so" .
  print('\n[INFO...] GPU dependencies were loaded, OpenCV version must be 4.2.x or later')

import cv2
import time
import numpy as np
from sklearn.metrics import pairwise
from skimage import exposure
import imutils
import os
import sys
from google.colab.patches import cv2_imshow
from IPython.display import clear_output
print('[INFO...] OpenCV version:', cv2.__version__)

## ***2. DATA INPUT (GETTING)***

Record the input video, extract its frames, and crop the workspace window.

### ***2.1. GET VIDEO STREAM***

This code runs on a cloud server rather than on your own hardware, so accessing your camera requires the browser's media API. This snippet accesses your camera and records a video stream.

In [None]:
from IPython.display import display, Javascript, HTML
from google.colab.output import eval_js
from base64 import b64decode
 
def record_video(filename='video.mp4'):
  js = Javascript("""
    async function recordVideo() {
      // mashes together the advanced_outputs.ipynb function provided by Colab, 
      // a bunch of stuff from Stack overflow, and some sample code from:
      // https://developer.mozilla.org/en-US/docs/Web/API/MediaStream_Recording_API
 
      // Recording options: WebM container with the VP9 codec.
      const options = { mimeType: "video/webm; codecs=vp9" };
      const div = document.createElement('div');
      const capture = document.createElement('button');
      const stopCapture = document.createElement("button");
      capture.textContent = "Start Recording";
      capture.style.background = "green";
      capture.style.color = "white";
 
      stopCapture.textContent = "Stop Recording";
      stopCapture.style.background = "red";
      stopCapture.style.color = "white";
      div.appendChild(capture);

      const canvas = document.createElement('canvas');
      canvas.id = 'my_canvas';
      canvas.height = 480;
      canvas.width = 640;
      
      var ctx = canvas.getContext('2d');
      ctx.strokeStyle = 'rgb(255, 255, 0)';  
      ctx.lineWidth = 5;

      const video = document.createElement('video');
      video.id = 'my_video';
      video.style.display = 'block';

      const stream = await navigator.mediaDevices.getUserMedia({video: true});
      // create a media recorder instance, which is an object
      // that will let you record what you stream.
      let recorder = new MediaRecorder(stream, options);
      document.body.appendChild(div);
      div.appendChild(document.createElement('div'));
      div.appendChild(canvas);
      // Video is a media element.  This line here sets the object which serves
      // as the source of the media associated with the HTMLMediaElement
      // Here, we'll set it equal to the stream.
      video.srcObject = stream;
      // Playback is started below with video.play(); once the 'play' event
      // fires, the listener draws the mirrored frames and the workspace
      // rectangle onto the canvas.
      
      // Add event listener to play video
      video.addEventListener('play', function(){
        draw(this, ctx, canvas.width, canvas.height);
      }, false);

      // Draw rectangle and flip video
      function draw(v, c, w, h) {
          c.save();
          c.translate(w/2, h/2);
          c.scale(-1, 1);
          c.drawImage(v, -w/2, -h/2, w, h);
          c.restore();
          c.strokeRect(300, 50, 280, 300);
          setTimeout(draw, 20, v, c, w, h);
      }

      video.play();
 
      // Resize the output to fit the video element.
      google.colab.output.setIframeHeight(document.documentElement.scrollHeight, true);
      // and now, just wait for the capture button to get clicked in order to
      // start recording
      await new Promise((resolve) => {
        capture.onclick = resolve;
      });
      recorder.start();
      capture.replaceWith(stopCapture);
      // use a promise to tell it to stop recording
      await new Promise((resolve) => stopCapture.onclick = resolve);
      recorder.stop();
 
      let recData = await new Promise((resolve) => recorder.ondataavailable = resolve);
      let arrBuff = await recData.data.arrayBuffer();
      
      // stop the stream and remove the video element
      stream.getVideoTracks()[0].stop();
      div.remove();
 
      let binaryString = "";
      let bytes = new Uint8Array(arrBuff);
      bytes.forEach((byte) => {
        binaryString += String.fromCharCode(byte);
      })
      return btoa(binaryString);
    }
    """)
  try:
    display(js)
    data = eval_js('recordVideo({})')
    binary = b64decode(data)
    with open(filename, "wb") as video_file:
      video_file.write(binary)
    print(
        f"Finished recording video. Saved binary under filename in current working directory: {filename}"
    )
  except Exception as err:
    # Report any error raised while recording or saving the video
    print(str(err))
  return filename

### ***2.2. GET VIDEO FRAMES***

Read all frames of the video and print its properties (FPS and size). The FPS value matters because it is needed later to render the output video.
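
The FPS handling can be illustrated in isolation. The sketch below mirrors the rule used in `get_frames_of_video` (a reported value above 15 is replaced with 6.25); the function names `sanitize_fps` and `clip_duration` are hypothetical, and the extra guard for zero, negative, or NaN values is an assumption that goes beyond the notebook.

```python
def sanitize_fps(reported_fps, max_fps=15.0, fallback_fps=6.25):
    """Return a usable FPS value for re-rendering a recorded clip."""
    # Same rule as get_frames_of_video: values above max_fps are distrusted.
    if reported_fps > max_fps:
        return fallback_fps
    # Assumption beyond the notebook: zero, negative, or NaN values are replaced too.
    if not reported_fps > 0:
        return fallback_fps
    return reported_fps


def clip_duration(n_frames, fps):
    """Duration in seconds of a clip with n_frames played back at fps."""
    return n_frames / fps


print(sanitize_fps(1000.0))
print(clip_duration(50, sanitize_fps(1000.0)))
```

Because the output video is rebuilt from the frame list, the chosen FPS directly determines how long the rendered clip lasts.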

In [None]:
def get_frames_of_video(input_recording):
  cap = cv2.VideoCapture(input_recording)

  fps = cap.get(cv2.CAP_PROP_FPS)
  fps = 6.25 if fps > 15 else fps    # The reported FPS is sometimes wrong; values above 15 are treated as invalid and replaced with 6.25
  size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)), int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))

  print('[INFO...] Video properties:', 'FPS:', fps, '- (width, height):', (size[0], size[1]))

  video = []

  print('[INFO...] Getting frames of video.')
  
  while cap.isOpened():
      ret, frame = cap.read()
      if not ret:
        break
      # Mirror the frame so it matches the flipped preview shown while recording
      frame = cv2.flip(frame, 1)
      video.append(frame)

  cap.release()
  
  print('[REPORT...] Frames of video were obtained successfully. \n')
  return video, size, fps

### ***2.3. GET WORKSPACE***

Crop every video frame to the workspace (the yellow rectangle shown while the video stream is being recorded).
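
The crop indices follow from the rectangle drawn in the recording preview, `strokeRect(300, 50, 280, 300)`, which is given as (x, y, width, height). A minimal sketch of that conversion, using plain nested lists so it runs without OpenCV; `rect_to_slices` and `crop` are hypothetical helper names, not part of the notebook:

```python
def rect_to_slices(x, y, width, height):
    """Convert a canvas rectangle (x, y, width, height) to (row, column) slices."""
    # Image arrays are indexed [row, column], i.e. [y, x], so y comes first.
    return slice(y, y + height), slice(x, x + width)


def crop(image, x, y, width, height):
    """Crop a nested-list image to the given rectangle."""
    rows, cols = rect_to_slices(x, y, width, height)
    return [row[cols] for row in image[rows]]


rows, cols = rect_to_slices(300, 50, 280, 300)
print(rows, cols)  # slice(50, 350, None) slice(300, 580, None)
```

With a NumPy frame the same slices give `frame[rows, cols]`, which matches the `img[50:350, 300:580]` crop used in `get_workspace`.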

In [None]:
def get_workspace(video_frames, window=True):
  window_frames = []

  if window:
    for img in video_frames:
      window_frames.append(img[50:350, 300:580].copy())
  else:
    window_frames = video_frames

  return window_frames

## ***3. DATA OUTPUT***

Render the output video and write it to a file.

### ***3.1. EXPORT VIDEO***

Write the frames to a video file with MoviePy.

In [None]:
import moviepy.video.io.ImageSequenceClip

def create_video(frames, fps, filename):
  clip = moviepy.video.io.ImageSequenceClip.ImageSequenceClip(frames, fps=fps)
  clip.write_videofile(filename)

### ***3.2. RENDER VIDEO***

Render the video: paste each skeleton frame, together with the finger count, back into the complete frame. Remember that processing happens on the workspace window (a segment of the complete frame), so the result has to be joined back onto it.
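
The finger-count list is shorter than the frame list, because `count_fingers` uses the first frame as the background reference. `join_skeleton` therefore aligns the counts with the last frames of the video. A minimal sketch of that alignment, with a hypothetical helper name and `None` standing for "no label on this frame":

```python
def align_counts_to_end(n_frames, counts):
    """Pad `counts` with None at the front so there is one entry per frame."""
    offset = n_frames - len(counts)
    if offset < 0:
        raise ValueError("more counts than frames")
    # Frames before `offset` get no label; the rest map to counts[i - offset].
    return [None] * offset + list(counts)


print(align_counts_to_end(5, ["1", "2", "3"]))
```

Frame `i` then shows `counts[i - offset]` whenever `i >= offset`, which is the same index arithmetic used in `join_skeleton`.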

In [None]:
def join_skeleton(video_frames, skeleton_frames, finger_amount, window):
  # Without a workspace window the skeleton frames are already full frames
  if not window:
    return skeleton_frames

  out_frames = []
  # finger_amount is shorter than the video (the first frame is the background
  # reference), so the counts are aligned with the last frames of the video
  offset = len(video_frames) - len(finger_amount)

  for i, keypoints_frame in enumerate(skeleton_frames):
    vid_frame = cv2.cvtColor(video_frames[i], cv2.COLOR_BGR2RGB)
    # Paste the processed workspace back into the complete frame
    vid_frame[50:350, 300:580] = keypoints_frame
    if i >= offset:
      cv2.putText(vid_frame, finger_amount[i - offset], (5, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), thickness=4)
    out_frames.append(vid_frame)

  return out_frames

## ***4. SOLUTIONS***

We propose two separate subsystems, one for each process: finger counting and keypoint detection.

### ***4.1. FINGER COUNT***

Count the raised fingers in each frame. <br>
[<font color='blue'><i>**See documentation**</i></font>](https://github.com/JhonHader/HAND-KEYPOINTS-DETECTION)
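
The `count_fingers` cell in this section intersects the thresholded hand mask with a ring around the hand center and counts the pieces that remain. The sketch below illustrates the same idea with a simpler, swapped-in technique: it samples points along a circle and counts contiguous foreground runs, instead of using the contour-based counting of the notebook. The function name, the nested-list mask, and the sample count are all assumptions made for illustration.

```python
import math


def count_runs_on_circle(mask, cx, cy, radius, samples=360):
    """Count contiguous foreground runs met while walking once around a circle."""
    values = []
    for k in range(samples):
        angle = 2 * math.pi * k / samples
        x = int(round(cx + radius * math.cos(angle)))
        y = int(round(cy + radius * math.sin(angle)))
        inside = 0 <= y < len(mask) and 0 <= x < len(mask[0])
        values.append(bool(inside and mask[y][x]))
    if all(values):
        # The whole circle is foreground: one unbroken run.
        return 1
    # A run starts wherever a background sample is followed by a foreground one.
    # values[k - 1] wraps around at k == 0, which closes the circle.
    return sum(1 for k in range(samples) if values[k] and not values[k - 1])


# A vertical bar through the center crosses the circle twice (top and bottom).
size = 41
bar = [[1 if 19 <= x <= 21 else 0 for x in range(size)] for y in range(size)]
print(count_runs_on_circle(bar, 20, 20, 15))
```

On a real hand mask each run would correspond to a finger or to the wrist crossing the circle, so such a count would still need the kind of filtering the notebook applies to its contours.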

In [None]:
def count_fingers(hand_frames):

  finger_amount = []

  # The first frame is used as the background reference
  bg = cv2.cvtColor(hand_frames[0], cv2.COLOR_BGR2GRAY)
  
  for frame in hand_frames[1:]:

    # Determine the region of interest
    ROI = frame
    grayROI = cv2.cvtColor(ROI, cv2.COLOR_BGR2GRAY)

    # Region of interest of the image background
    bgROI = bg

    # Compute the binary image (background vs. foreground)
    dif = cv2.absdiff(grayROI, bgROI)
    _, th = cv2.threshold(dif, 30, 255, cv2.THRESH_BINARY)

    # Opening and closing
    th = cv2.morphologyEx(th, cv2.MORPH_OPEN, np.ones((3, 3), np.uint8))
    th = cv2.morphologyEx(th, cv2.MORPH_CLOSE, np.ones((7, 7), np.uint8))

    # Find the contours of the binary image and keep the largest one
    cnts, _ = cv2.findContours(th, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    cnts = sorted(cnts, key=cv2.contourArea, reverse=True)[:1]

    for cnt in cnts:

        # Find the center of the contour
        M = cv2.moments(cnt)
        if M["m00"] == 0: M["m00"] = 1
        x = int(M["m10"] / M["m00"])
        y = int(M["m01"] / M["m00"])

        # Find the bounding box of the contour
        (bnd_x, bnd_y, bnd_w, bnd_h) = cv2.boundingRect(cnt)

        # Convex hull of the contour, computed with cv2.convexHull
        hull = cv2.convexHull(cnt)

        # Distances between the extreme points and the center
        top = tuple(hull[hull[:, :, 1].argmin()][0])
        bottom = tuple(hull[hull[:, :, 1].argmax()][0])
        left = tuple(hull[hull[:, :, 0].argmin()][0])
        right = tuple(hull[hull[:, :, 0].argmax()][0])
        dist = pairwise.euclidean_distances([left, right, top], [[x, y]])
        radi = int(0.7 * dist.max())

        # Circular ROI
        circular_roi = np.zeros(ROI.shape[:-1], dtype=np.uint8)
        cv2.circle(circular_roi, (x, y), radi, 255, 8)
        fingers = cv2.bitwise_and(th, th, mask=circular_roi)

        # Opening
        fingers = cv2.morphologyEx(fingers, cv2.MORPH_OPEN, np.ones((7, 7), np.uint8))

        # Contour area
        fingers_con, _ = cv2.findContours(fingers, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
        finger_count = 0

        for counter in fingers_con:
            if cv2.contourArea(counter) < 300:
                finger_count += 1
        
        if finger_count > 5:
          finger_count = ' '
        else:
          finger_count = str(finger_count)

        finger_amount.append(finger_count)
  return finger_amount

### ***4.2. HAND KEYPOINTS DETECTION***

Detect the hand keypoints in every frame of the video. <br>
[<font color='blue'><i>**See documentation**</i></font>](https://github.com/JhonHader/HAND-KEYPOINTS-DETECTION)
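
For each keypoint, `get_skeleton` takes the network's probability map, finds the location of its maximum, and keeps that point only if the probability exceeds a threshold of 0.1. The sketch below reproduces that selection rule on a plain nested list so it runs without OpenCV; `peak_from_heatmap` is a hypothetical name, and the notebook itself uses `cv2.minMaxLoc` on a map resized to the frame size.

```python
def peak_from_heatmap(prob_map, threshold=0.1):
    """Return (x, y) of the highest value in a 2D map, or None if it is too low."""
    best_value = float("-inf")
    best_point = None
    for y, row in enumerate(prob_map):
        for x, value in enumerate(row):
            if value > best_value:
                best_value = value
                best_point = (x, y)
    # Same rule as get_skeleton: keep the keypoint only above the threshold.
    if best_point is not None and best_value > threshold:
        return best_point
    return None


heatmap = [
    [0.00, 0.05, 0.00],
    [0.00, 0.90, 0.20],
    [0.00, 0.10, 0.00],
]
print(peak_from_heatmap(heatmap))  # (1, 1)
```

Keypoints that come back as `None` are skipped when the skeleton is drawn, which is why a pair in `POSE_PAIRS` is only connected when both of its points were detected.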

In [None]:
def get_skeleton(hand_frames):
  root_path = '/content/drive/My Drive/Hand_Detection'

  protoFile = os.path.join(root_path, "MODELS/pose_deploy.prototxt.txt")
  weightsFile = os.path.join(root_path, "MODELS/pose_iter_102000.caffemodel")
  nPoints = 22
  POSE_PAIRS = [[0, 1], [1, 2], [2, 3], [3, 4], [0, 5], [5, 6], [6, 7], [7, 8], [0, 9],
                [9, 10], [10, 11], [11, 12], [0, 13], [13, 14], [14, 15], [15, 16],
                [0, 17], [17, 18], [18, 19], [19, 20]]

  frameWidth = hand_frames[0].shape[1]
  frameHeight = hand_frames[0].shape[0]
  aspect_ratio = frameWidth / frameHeight

  threshold = 0.1

  inHeight = 368
  inWidth = int(((aspect_ratio * inHeight) * 8) // 8)

  skeleton = []

  net = cv2.dnn.readNetFromCaffe(protoFile, weightsFile)
  if GPU:
    print('[INFO...] GPU backend was configured!')
    net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
    net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

  t_total = time.time()

  print('[INFO...] Processing video, getting hand keypoints!')
  for frame in hand_frames: 

    points = []

    inpBlob = cv2.dnn.blobFromImage(frame, 1.0 / 255, (inWidth, inHeight),
                                    (0, 0, 0), swapRB=False, crop=False)

    net.setInput(inpBlob)
    output = net.forward()
    for i in range(nPoints):

        probMap = output[0, i, :, :]
        probMap = cv2.resize(probMap, (frameWidth, frameHeight))

        minVal, prob, minLoc, point = cv2.minMaxLoc(probMap)

        if prob > threshold:
          points.append((int(point[0]), int(point[1])))
        else:
          points.append(None)

    for pair in POSE_PAIRS:
        partA, partB = pair[0], pair[1]

        if points[partA] and points[partB]:
            cv2.line(frame, points[partA], points[partB], (0, 0, 255), 2)
            cv2.circle(frame, points[partA], 4, (0, 255, 0), thickness=-1, lineType=cv2.FILLED)
            cv2.circle(frame, points[partB], 4, (0, 255, 0), thickness=-1, lineType=cv2.FILLED)

    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    skeleton.append(frame)
  
  print("\n[REPORT...] Hand keypoints were obtained successfully!")
  print("[REPORT...] Time taken to process the video [s] : {:.3f}".format(time.time() - t_total), '\n')
  return skeleton

## ***5. RESULTS***

Record the input video, process it, and produce the output video.
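
The last cell of this section displays the result by embedding the rendered file as a base64 data URL inside an HTML `<video>` tag. A minimal sketch of that encoding step, with a hypothetical helper name:

```python
from base64 import b64encode


def to_data_url(payload, mime_type="video/mp4"):
    """Encode raw bytes as a data URL usable as the source of an HTML <video> tag."""
    encoded = b64encode(payload).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


print(to_data_url(b"abc"))  # data:video/mp4;base64,YWJj
```

Base64 increases the payload size by roughly a third, so this approach is best suited to short clips like the ones recorded here.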

In [None]:
input_recording = 'in.mp4'     
record_video(filename=input_recording)

window = True

video_frames, size, fps = get_frames_of_video(input_recording)
window_frames = get_workspace(video_frames, window=window)
finger_amount = count_fingers(hand_frames=window_frames)
skeleton_frames = get_skeleton(window_frames)
out_frames = join_skeleton(video_frames, skeleton_frames, finger_amount, window=window)
create_video(out_frames, fps, 'out.mp4')


from IPython.display import HTML
from base64 import b64encode

video_width = 640
with open('out.mp4', 'rb') as f:
  video_file = f.read()
video_url = f"data:video/mp4;base64,{b64encode(video_file).decode()}"
HTML(f"""<video width={video_width} controls><source src="{video_url}"></video>""")