# Basics of MediaPipe

## ML Pipeline of MediaPipe Hands

MediaPipe Hands is a high-fidelity hand and finger tracking solution. It employs machine learning (ML) to infer **21 3D landmarks** of a hand from just a single frame. Whereas current state-of-the-art approaches rely primarily on powerful desktop environments for inference, our method achieves real-time performance on a mobile phone, and even scales to multiple hands.

*Tracked 3D hand landmarks are represented by dots in different shades, with the brighter ones denoting landmarks closer to the camera.*

MediaPipe Hands utilizes an **ML pipeline** consisting of multiple models working together:

- A **palm detection model** that operates on the full image and returns an **oriented hand bounding box**.
- A **hand landmark model** that operates on the cropped image region defined by the palm detector and returns **high-fidelity 3D hand keypoints**.

This strategy is similar to that employed in our MediaPipe Face Mesh solution, which uses a face detector together with a face landmark model.

Providing the accurately cropped hand image to the hand landmark model drastically reduces the need for data augmentation (e.g. rotations, translation and scale) and instead allows the network to dedicate most of its capacity towards coordinate prediction accuracy. In addition, in our pipeline the crops can also be generated based on the hand landmarks identified in the previous frame, and only when the landmark model could no longer identify hand presence is palm detection invoked to relocalize the hand.
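The detect-then-track strategy described above can be sketched as a small control loop. This is only an illustration of the idea, not MediaPipe's actual implementation: the three callables (`detect_palm`, `predict_landmarks`, `crop_from_landmarks`) and the `Box` and `Landmarks` types are hypothetical stand-ins for the real models.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Box = Tuple[int, int, int, int]               # hypothetical crop: (x, y, width, height)
Landmarks = List[Tuple[float, float, float]]  # hypothetical list of (x, y, z) points


def run_pipeline(
    frames: Iterable[object],
    detect_palm: Callable[[object], Optional[Box]],
    predict_landmarks: Callable[[object, Box], Optional[Landmarks]],
    crop_from_landmarks: Callable[[Landmarks], Box],
) -> List[Optional[Landmarks]]:
    """Run palm detection only when no crop is carried over from the previous frame."""
    outputs: List[Optional[Landmarks]] = []
    crop: Optional[Box] = None
    for frame in frames:
        if crop is None:
            crop = detect_palm(frame)  # expensive step, runs on the full image
        if crop is None:
            outputs.append(None)       # no hand found in this frame
            continue
        landmarks = predict_landmarks(frame, crop)
        if landmarks is None:
            crop = None                # hand lost: detect again on the next frame
        else:
            crop = crop_from_landmarks(landmarks)  # reuse as the crop for the next frame
        outputs.append(landmarks)
    return outputs
```

As long as the landmark model keeps reporting a hand, the palm detector is never called again, which is where the latency saving comes from.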

### Hand Landmark Model:


<img src=https://google.github.io/mediapipe/images/mobile/hand_landmarks.png />
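The figure numbers the 21 landmarks from 0 to 20. As a reference, the sketch below mirrors that numbering in a plain `IntEnum`. In real code the ready-made `mp_hands.HandLandmark` enum should be used instead; this local copy only exists so the example runs without MediaPipe installed.

```python
from enum import IntEnum


class HandLandmark(IntEnum):
    """Local copy of the landmark indices shown in the figure (illustration only)."""
    WRIST = 0
    THUMB_CMC = 1
    THUMB_MCP = 2
    THUMB_IP = 3
    THUMB_TIP = 4
    INDEX_FINGER_MCP = 5
    INDEX_FINGER_PIP = 6
    INDEX_FINGER_DIP = 7
    INDEX_FINGER_TIP = 8
    MIDDLE_FINGER_MCP = 9
    MIDDLE_FINGER_PIP = 10
    MIDDLE_FINGER_DIP = 11
    MIDDLE_FINGER_TIP = 12
    RING_FINGER_MCP = 13
    RING_FINGER_PIP = 14
    RING_FINGER_DIP = 15
    RING_FINGER_TIP = 16
    PINKY_MCP = 17
    PINKY_PIP = 18
    PINKY_DIP = 19
    PINKY_TIP = 20


# Collect the five fingertip indices by name.
FINGERTIPS = [lm for lm in HandLandmark if lm.name.endswith("_TIP")]
print([int(lm) for lm in FINGERTIPS])  # [4, 8, 12, 16, 20]
```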

In [10]:
import mediapipe as mp  # gives access to the MediaPipe solutions via mp.solutions
# the libraries below are mainly used for handling and displaying the output
import cv2  # OpenCV
import numpy as np

In [11]:
mp_drawing = mp.solutions.drawing_utils

In [12]:
mp_hands = mp.solutions.hands

Now open the webcam using OpenCV.

Supported configuration options:

- static_image_mode
- max_num_hands
- model_complexity
- min_detection_confidence
- min_tracking_confidence


STATIC_IMAGE_MODE
If set to false, the solution treats the input images as a video stream. It will try to detect hands in the first input images, and upon a successful detection further localizes the hand landmarks. In subsequent images, once all max_num_hands hands are detected and the corresponding hand landmarks are localized, it simply tracks those landmarks without invoking another detection until it loses track of any of the hands. This reduces latency and is ideal for processing video frames. If set to true, hand detection runs on every input image, ideal for processing a batch of static, possibly unrelated, images. Default to false.

MAX_NUM_HANDS
Maximum number of hands to detect. Default to 2.

MODEL_COMPLEXITY
Complexity of the hand landmark model: 0 or 1. Landmark accuracy as well as inference latency generally go up with the model complexity. Default to 1.

MIN_DETECTION_CONFIDENCE
Minimum confidence value ([0.0, 1.0]) from the hand detection model for the detection to be considered successful. Default to 0.5.

MIN_TRACKING_CONFIDENCE
Minimum confidence value ([0.0, 1.0]) from the landmark-tracking model for the hand landmarks to be considered tracked successfully, or otherwise hand detection will be invoked automatically on the next input image. Setting it to a higher value can increase robustness of the solution, at the expense of a higher latency. Ignored if static_image_mode is true, where hand detection simply runs on every image. Default to 0.5.

Confidence - detection: threshold for the initial detection to be considered successful.

Confidence - tracking: threshold for tracking the landmarks after the initial detection.
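To make the options above concrete, here is a minimal sketch of a helper that collects them with their documented defaults and checks the documented value ranges. The helper `hands_options` is hypothetical (it is not part of MediaPipe); only the keyword names and defaults come from the option list above.

```python
def hands_options(static_image_mode=False, max_num_hands=2, model_complexity=1,
                  min_detection_confidence=0.5, min_tracking_confidence=0.5):
    """Return keyword arguments for mp_hands.Hands after basic range checks."""
    if max_num_hands < 1:
        raise ValueError("max_num_hands must be at least 1")
    if model_complexity not in (0, 1):
        raise ValueError("model_complexity must be 0 or 1")
    for name, value in (("min_detection_confidence", min_detection_confidence),
                        ("min_tracking_confidence", min_tracking_confidence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be within [0.0, 1.0]")
    return {
        "static_image_mode": static_image_mode,
        "max_num_hands": max_num_hands,
        "model_complexity": model_complexity,
        "min_detection_confidence": min_detection_confidence,
        "min_tracking_confidence": min_tracking_confidence,
    }


# Example: track a single hand in a video stream with a stricter detector.
print(hands_options(max_num_hands=1, min_detection_confidence=0.8))
```

The resulting dictionary can be unpacked into the solution, for example `mp_hands.Hands(**hands_options(max_num_hands=1))`.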

You can use Python's **enumerate()** to get a counter and the value from an iterable at the same time.

DrawingSpec is a MediaPipe class that lets you customize the look of the drawn detections. In this case we use it to customize the landmarks and the connections.
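As a quick illustration of `enumerate()`, the snippet below pairs each item with its position. The list of strings is a made-up stand-in for `results.multi_hand_landmarks`, which holds one entry per detected hand.

```python
# Stand-in for the per-hand results; the real entries are landmark lists.
detected_hands = ["first hand", "second hand"]

# enumerate() yields (counter, value) pairs, with the counter starting at 0.
labels = [f"{num}: {hand}" for num, hand in enumerate(detected_hands)]
print(labels)  # ['0: first hand', '1: second hand']
```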

In [13]:
cap = cv2.VideoCapture(0)

with mp_hands.Hands(min_detection_confidence=0.8, min_tracking_confidence=0.5) as hands:
    # start the loop
    while cap.isOpened():
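        ret, frame = cap.read()
        if not ret:
            # no frame could be read (camera unavailable or stream ended)
            break

        # mirror the frame horizontally for a selfie-view display
        frame = cv2.flip(frame, 1)

        # OpenCV delivers frames in BGR order, but MediaPipe expects RGB
        image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

        # mark the image as read-only so MediaPipe can pass it by reference
        image.flags.writeable = False

        # run hand detection and landmark tracking on the frame
        results = hands.process(image)

        # make the image writeable again and convert it back to BGR for drawing
        image.flags.writeable = True
        image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)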
        
        # print(results)
        
        # render the results
        if results.multi_hand_landmarks:  # None when no hand was detected
            # loop over each detected hand
            for num, hand in enumerate(results.multi_hand_landmarks):
                # draw the landmarks and HAND_CONNECTIONS (which landmarks are linked)
                mp_drawing.draw_landmarks(
                    image, hand, mp_hands.HAND_CONNECTIONS,
                    # first spec styles the landmarks, second spec styles the connections
                    mp_drawing.DrawingSpec(color=(230, 10, 128), thickness=2, circle_radius=4),
                    mp_drawing.DrawingSpec(color=(230, 10, 230), thickness=2, circle_radius=2))
            
        cv2.imshow("Hand Tracking", image)
            
        # exit the loop as soon as any key is pressed
        if cv2.waitKey(1) >= 0:
            break
        
cap.release() # Closes video file or capturing device.
cv2.destroyAllWindows()

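- x = landmark position along the horizontal axis, normalized to [0.0, 1.0] by the image width
- y = landmark position along the vertical axis, normalized to [0.0, 1.0] by the image height
- z = landmark depth, with the depth at the wrist as the origin; smaller values are closer to the camera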
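MediaPipe reports x and y normalized by the image width and height, so they have to be scaled before they can be used as pixel positions. The helper below is a minimal sketch of that conversion. The name `to_pixel` is hypothetical, and clamping to the image bounds is a design choice made here because landmarks of a partly visible hand can fall slightly outside [0.0, 1.0].

```python
def to_pixel(x, y, image_width, image_height):
    """Convert normalized landmark coordinates to integer pixel coordinates."""
    px = int(x * image_width)
    py = int(y * image_height)
    # Clamp so the result is always a valid index into the image.
    px = max(0, min(px, image_width - 1))
    py = max(0, min(py, image_height - 1))
    return px, py


print(to_pixel(0.5, 0.5, 640, 480))  # (320, 240)
```

Inside the capture loop, the width and height can be read from the frame itself with `image_height, image_width = image.shape[:2]`.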

## Accessing Landmarks

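The expression below accesses a single landmark of the first detected hand. Note that `results.multi_hand_landmarks` is `None` when no hand was detected in the last processed frame, in which case the expression raises a `TypeError`.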

In [14]:
results.multi_hand_landmarks[0].landmark[mp_hands.HandLandmark.PINKY_TIP]

TypeError: 'NoneType' object is not subscriptable
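The error above occurs because `results.multi_hand_landmarks` was `None` for the last processed frame, meaning no hand was visible when the loop ended. A minimal sketch of a guarded lookup follows. The helper `get_landmark` is hypothetical, and `SimpleNamespace` objects stand in for real MediaPipe results so the example runs without a camera.

```python
from types import SimpleNamespace

PINKY_TIP = 20  # same value as mp_hands.HandLandmark.PINKY_TIP


def get_landmark(results, landmark_index, hand_index=0):
    """Return the requested landmark, or None if that hand was not detected."""
    hands = results.multi_hand_landmarks
    if not hands or hand_index >= len(hands):
        return None
    return hands[hand_index].landmark[landmark_index]


# Stand-in for a frame without any hand: the lookup returns None instead of raising.
no_hand = SimpleNamespace(multi_hand_landmarks=None)
print(get_landmark(no_hand, PINKY_TIP))  # None

# Stand-in for a frame with one hand and 21 made-up landmarks.
fake_hand = SimpleNamespace(
    landmark=[SimpleNamespace(x=i / 100, y=0.5, z=0.0) for i in range(21)])
one_hand = SimpleNamespace(multi_hand_landmarks=[fake_hand])
print(get_landmark(one_hand, PINKY_TIP).x)  # 0.2
```

With real results the call would be `get_landmark(results, mp_hands.HandLandmark.PINKY_TIP)`.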