# TOPIC - HAND TRACKING

Hand tracking in MediaPipe uses 2 main models in the backend:
1. Palm detection (works on the complete image and provides a cropped image of the hand)
2. Hand landmarks (finds 21 different landmarks on the cropped image of the hand)

To train the hand landmark model, the developers manually annotated around 30,000 images of different hands.
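The 21 landmarks are addressed by index (0 to 20). As a small reference sketch, the enum below mirrors the names used by `mp.solutions.hands.HandLandmark`; it is a stand-alone copy written for these notes, and `FINGERTIPS` is a hypothetical helper list, not something MediaPipe provides.

```python
from enum import IntEnum


class HandLandmark(IntEnum):
    """Stand-alone copy of the 21 hand landmark indices (0 to 20)."""
    WRIST = 0
    THUMB_CMC = 1
    THUMB_MCP = 2
    THUMB_IP = 3
    THUMB_TIP = 4
    INDEX_FINGER_MCP = 5
    INDEX_FINGER_PIP = 6
    INDEX_FINGER_DIP = 7
    INDEX_FINGER_TIP = 8
    MIDDLE_FINGER_MCP = 9
    MIDDLE_FINGER_PIP = 10
    MIDDLE_FINGER_DIP = 11
    MIDDLE_FINGER_TIP = 12
    RING_FINGER_MCP = 13
    RING_FINGER_PIP = 14
    RING_FINGER_DIP = 15
    RING_FINGER_TIP = 16
    PINKY_MCP = 17
    PINKY_PIP = 18
    PINKY_DIP = 19
    PINKY_TIP = 20


# Hypothetical convenience list: the five fingertip indices.
FINGERTIPS = [lm.value for lm in HandLandmark if lm.name.endswith("_TIP")]

print(len(HandLandmark), FINGERTIPS)
```

With this in place, a check such as `id == 4` can be written as `id == HandLandmark.THUMB_TIP`, which is easier to read.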


# Procedure

1. Import required libraries (cv2, mediapipe, time).
2. Start webcam feed using cv2.VideoCapture(0).
3. Initialize MediaPipe Hands module and drawing utilities.
4. Begin frame-by-frame loop to process live video.
5. Capture current frame from webcam.
6. Convert frame from BGR to RGB (MediaPipe requires RGB).
7. Run hands.process() → 🟢 Palm detection and landmark detection both happen inside this call.
8. If hands were detected (results.multi_hand_landmarks is not empty), proceed to read the landmarks.
9. Extract and loop through 21 landmarks per hand.
10. Convert normalized coordinates to pixel values.
11. Optionally highlight specific landmarks (e.g., thumb tip).
12. Draw landmarks and connections using mpDraw.draw_landmarks().
13. Calculate and display FPS to monitor performance.
14. Show annotated frame in window using cv2.imshow().
15. Break loop on 'q' key press, then release camera and close windows.

In [6]:
import cv2                          # for image and video processing (handling video capture)
import mediapipe as mp              # provides pre-built solutions like hand detection
import time                         # used to calculate FPS (frames per second) by measuring time between frame renders

           
cap = cv2.VideoCapture(0)            # Opens the default webcam (device index 0)


mpHands = mp.solutions.hands         # Accesses the hands module inside mediapipe.solutions (Needed to create a hand detection pipeline)
hands = mpHands.Hands()              # Creates a Hands object (main engine that analyzes each frame and detects hands and their positions)
mpDraw = mp.solutions.drawing_utils  # Accesses drawing utilities in MediaPipe (Used to draw landmarks (points) and connections (lines) on the hands)


# Frames per second
pTime = 0                            # Previous Time : Stores the timestamp of the previous frame that was processed
cTime = 0                            # Current Time : Stores the timestamp of the current frame being processed


while True:                         # Infinite loop that processes the video feed frame by frame. Keeps the app running until manually stopped.
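    ret,img = cap.read()                                   # Grabs the next frame; ret is False if no frame could be read
    if not ret:                                            # Stops the loop if the webcam returned no frame (e.g., camera unplugged or in use)
        break
    imgRGB = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)          # Converts the image from BGR (OpenCV's default) to RGB (the format MediaPipe expects)
    results = hands.process(imgRGB)                        # Runs palm detection and the landmark model on the RGB image and returns the detected hands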
    print(results.multi_hand_landmarks)                    # Prints the list of hand landmarks
    print(results.multi_handedness)

    if results.multi_hand_landmarks:                       #  Checks if any hands were detected, only proceed if detection was successful
        for handLms in results.multi_hand_landmarks:       # Loops through each detected hand, since a frame may contain more than one
            for id,lm in enumerate(handLms.landmark):      # Loops through the 21 landmarks of this hand; id is the index (0 to 20) and lm holds the normalized x, y, z values
                #print(id,lm)

                h,w,c=img.shape                            # Height, width, and channels of the frame, needed to convert normalized (0 to 1) landmark coordinates to pixels
                cx,cy=int(lm.x*w),int(lm.y*h)              # Converts normalized coordinates to integer pixel coordinates, which OpenCV drawing functions require
                print(id,cx,cy)                            # Prints the landmark ID and its (x, y) pixel position, for tracking and debugging


                # To highlight a specific landmark, uncomment the two lines below

                #if id==4:                                 # Landmark 4 is the tip of the thumb
                    #cv2.circle(img,(cx,cy),20,(255,0,255),cv2.FILLED)   # Draws a filled circle at that landmark

            
            mpDraw.draw_landmarks(img,handLms,mpHands.HAND_CONNECTIONS)  # Draws the 21 landmarks and the connections between them on the frame, making the tracking visible


    
    cTime = time.time()                                  # current time now in seconds
    fps=1/(cTime-pTime)                                  # FPS = Number of frames per second = 1 / time per frame
    pTime=cTime                                          # set current as previous for next loop

    
    cv2.putText(img, str(int(fps)),(10,70),cv2.FONT_HERSHEY_PLAIN,3,(255,0,255),3)    # Writes the FPS value in the top-left corner of the frame
    

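    cv2.imshow("Image",img)                              # Shows the annotated frame in a window named "Image"
    if cv2.waitKey(1) & 0xFF == ord('q'):                # Waits 1 ms for a key press; the 0xFF mask keeps only the low byte of the key code
        break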

cap.release()
cv2.destroyAllWindows()

[landmark {
  x: 0.828112185
  y: 0.961711
  z: -7.79594e-007
}
landmark {
  x: 0.763981342
  y: 0.975015283
  z: -0.00915439054
}
landmark {
  x: 0.696432531
  y: 0.950211585
  z: -0.0151352044
}
landmark {
  x: 0.646963656
  y: 0.924334526
  z: -0.0266241878
}
landmark {
  x: 0.603148937
  y: 0.901951909
  z: -0.0339592434
}
landmark {
  x: 0.680957079
  y: 0.851983428
  z: 0.0169998035
}
landmark {
  x: 0.632546
  y: 0.833801091
  z: -0.0174773391
}
landmark {
  x: 0.664764881
  y: 0.890550554
  z: -0.035320323
}
landmark {
  x: 0.69363004
  y: 0.908631384
  z: -0.0410861671
}
landmark {
  x: 0.70271939
  y: 0.829070747
  z: 0.00826849137
}
landmark {
  x: 0.653575361
  y: 0.819564223
  z: -0.0294181779
}
landmark {
  x: 0.694942772
  y: 0.890810966
  z: -0.0357742608
}
landmark {
  x: 0.725860953
  y: 0.899314344
  z: -0.0291104726
}
landmark {
  x: 0.730218709
  y: 0.80625695
  z: -0.00718844
}
landmark {
  x: 0.677590132
  y: 0.800727367
  z: -0.0459791794
}
landmark {
  x: 0.716

# 1. mp.solutions :

A submodule in MediaPipe that gives you access to pre-trained ML models.
Provides easy access to powerful models without needing to train them.
Contains high-level APIs for ready-to-use solutions like:

    1. mp.solutions.hands = Hand tracking
    2. mp.solutions.face_detection = Face detection
    3. mp.solutions.pose = Full-body pose estimation
    4. mp.solutions.holistic = Combined face, pose, and hands
    

# 2. Creating the Hand Detection Object :

hands = mpHands.Hands()   ->    This line creates an instance of the Hands class (it initializes the model)

When you call Hands(), it does the following :

    1. Loads the palm detection model (to first find a hand).
    2. Loads the hand landmark model (to find 21 keypoints).
    3. Prepares to run inference on each frame you pass using hands.process(image).

It includes optional configuration parameters like:

    1. static_image_mode – True for still images, False for video stream.
    2. max_num_hands – How many hands to detect at once.
    3. min_detection_confidence – Threshold to declare a hand is found.
    4. min_tracking_confidence – Threshold to continue tracking across frames.
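As a minimal sketch, the four parameters above can be written out explicitly instead of relying on `Hands()` with no arguments. The values shown are the defaults as far as these notes are aware (worth checking against your installed MediaPipe version), and `HANDS_CONFIG` and `make_hands` are hypothetical names introduced only for this example.

```python
# Keyword arguments for mp.solutions.hands.Hands, written out explicitly.
# Assumption: these values match the library defaults in recent releases.
HANDS_CONFIG = {
    "static_image_mode": False,       # False = treat input as a video stream
    "max_num_hands": 2,               # detect at most two hands per frame
    "min_detection_confidence": 0.5,  # threshold for the palm detector
    "min_tracking_confidence": 0.5,   # threshold to keep tracking between frames
}


def make_hands(**overrides):
    """Create a Hands object from HANDS_CONFIG, with optional overrides."""
    # Imported here so the config can be inspected without MediaPipe installed.
    import mediapipe as mp

    config = {**HANDS_CONFIG, **overrides}
    return mp.solutions.hands.Hands(**config)
```

For example, `hands = make_hands(max_num_hands=1)` would track a single hand, which usually reduces the work done per frame.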


# 3. Drawing Landmarks and Connections :

mpDraw = mp.solutions.drawing_utils -> gives access to drawing_utils, a MediaPipe utility module for drawing detection results on an image, which helps with visualization
Specifically, it lets you draw:

    1. Landmarks: small points for joints/fingertips/wrist
    2. Connections: lines that connect the landmarks to form the shape of a hand


# 4. Processing the Image for Hand Detection :

results = hands.process(image) -> Passes the current frame, already converted to RGB, into the MediaPipe Hands object created earlier

MediaPipe internally:
    
    1. Detects palms first (lightweight model).
    2. If palms are found, it runs the landmark model to locate 21 keypoints on each detected hand.
    3. The output is stored in results, which is an object containing all hand-related detection info.

The result now contains the following :

    1. results.multi_hand_landmarks: List of detected hands, each with 21 landmarks.
    2. results.multi_handedness: For each detected hand, a "Left" or "Right" label with a confidence score.
    3. Both fields are None when no hands are detected, so check them before looping.
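To illustrate how these fields are read, here is a small sketch that turns the normalized landmarks into pixel coordinates, the same arithmetic as `cx, cy` in the main loop. The function names are hypothetical, and the `SimpleNamespace` objects are stand-ins that only imitate the shape of a real `results` object so the example runs without a webcam.

```python
from types import SimpleNamespace


def landmarks_to_pixels(hand_landmarks, width, height):
    """Return [(id, x_px, y_px), ...] for one detected hand."""
    return [
        (idx, int(lm.x * width), int(lm.y * height))
        for idx, lm in enumerate(hand_landmarks.landmark)
    ]


def all_hands_to_pixels(results, width, height):
    """Return one pixel-coordinate list per detected hand (empty if none)."""
    if not results.multi_hand_landmarks:  # None when no hand was detected
        return []
    return [
        landmarks_to_pixels(hand, width, height)
        for hand in results.multi_hand_landmarks
    ]


# Stand-in for a real results object: one hand with two landmarks.
fake_hand = SimpleNamespace(landmark=[
    SimpleNamespace(x=0.5, y=0.25, z=0.0),
    SimpleNamespace(x=0.75, y=0.5, z=0.0),
])
fake_results = SimpleNamespace(multi_hand_landmarks=[fake_hand])

print(all_hands_to_pixels(fake_results, 640, 480))
# prints [[(0, 320, 120), (1, 480, 240)]]
```

In the real loop, `width` and `height` would come from `img.shape` and `results` from `hands.process(...)`.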


MediaPipe assigns "Left" or "Right" assuming the input image is **mirrored**, as in a selfie view. A raw webcam frame is not mirrored, so your **left hand may be labeled Right** (and vice versa).
To fix it, flip the image horizontally before processing (img = cv2.flip(img, 1))
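As an alternative to flipping the frame, the label can be swapped after detection. The sketch below reads the label and score from `results.multi_handedness`, where each entry exposes `classification[0].label` and `classification[0].score`. The helper name `handedness_labels` and its `mirrored` flag are hypothetical, and the `SimpleNamespace` objects only imitate the real result structure.

```python
from types import SimpleNamespace


def handedness_labels(results, mirrored=True):
    """Return [(label, score), ...], swapping Left/Right for unmirrored input."""
    if not results.multi_handedness:  # None when no hand was detected
        return []
    swap = {"Left": "Right", "Right": "Left"}
    labels = []
    for hand in results.multi_handedness:
        top = hand.classification[0]  # highest-scoring label for this hand
        label = top.label if mirrored else swap.get(top.label, top.label)
        labels.append((label, top.score))
    return labels


# Stand-in for a real results object: one hand labeled "Right".
fake_results = SimpleNamespace(multi_handedness=[
    SimpleNamespace(classification=[SimpleNamespace(label="Right", score=0.75)])
])

print(handedness_labels(fake_results, mirrored=False))
# prints [('Left', 0.75)]
```

Use `mirrored=False` for a raw webcam frame and `mirrored=True` if the frame was already flipped with `cv2.flip(img, 1)`.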


# 5. Frames per second :

Frames Per Second measures how many images (frames) your program processes or displays every second in a video stream.
Higher FPS = smoother video or detection
Lower FPS = choppy or laggy performance
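The raw value 1 / (time per frame) jumps around from frame to frame. One possible refinement, sketched here under the assumption that a steadier on-screen number is wanted, is to smooth it with an exponential moving average and to skip the update when two timestamps are identical. `FpsCounter` and its `smoothing` parameter are hypothetical names for this example.

```python
import time


class FpsCounter:
    """Tracks a smoothed frames-per-second estimate."""

    def __init__(self, smoothing=0.9):
        self.smoothing = smoothing  # weight given to the previous estimate
        self.prev_time = None       # timestamp of the previous frame
        self.fps = 0.0              # current smoothed estimate

    def tick(self, now=None):
        """Call once per frame; returns the current FPS estimate."""
        now = time.perf_counter() if now is None else now
        if self.prev_time is not None:
            dt = now - self.prev_time
            if dt > 0:  # avoids division by zero on identical timestamps
                instant = 1.0 / dt
                if self.fps == 0.0:
                    self.fps = instant  # first measurement: no history yet
                else:
                    self.fps = (self.smoothing * self.fps
                                + (1 - self.smoothing) * instant)
        self.prev_time = now
        return self.fps


counter = FpsCounter(smoothing=0.5)
counter.tick(0.0)          # first frame: nothing to compare against yet
print(counter.tick(0.5))   # prints 2.0
print(counter.tick(0.75))  # prints 3.0
```

In the main loop, `fps = counter.tick()` would replace the three lines that update `cTime`, `fps`, and `pTime`.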