
# ✋ MediaPipe Hands — 손가락 포즈(펴짐 개수) 카운트 + 화면맞춤

이 노트북은 이전 MediaPipe 데모 구조를 유지하면서 **손가락이 몇 개 펴져 있는지**를 실시간으로 카운트합니다.  
카메라 재시도 초기화, 레터박싱(Screen-Fit), FPS 오버레이, 단축키, 스냅샷 저장 등 동일한 UX를 제공합니다.

**기능**
- 손 랜드마크(21포인트) 탐지
- 각 손별로 **펴진 손가락 개수(0~5)** 계산
- 좌측 하단 HUD에 **왼손/오른손별 개수 + 총합** 표기
- `COUNT_THRESHOLD` 이상이면 콘솔 경고 메시지
- `s`로 스냅샷 저장

**단축키**
- `q` / `ESC`: 종료
- `f`: 풀스크린 토글
- `s`: 스냅샷 저장



## 1) 설치 (필요 시만 실행)
- 로컬/도커 이미지에 이미 포함되어 있으면 **건너뛰세요**.


In [2]:

#!pip install --upgrade pip
#!pip install mediapipe


Defaulting to user installation because normal site-packages is not writeable
Defaulting to user installation because normal site-packages is not writeable
Collecting mediapipe
  Downloading mediapipe-0.10.18-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.metadata (9.7 kB)
Collecting opencv-contrib-python (from mediapipe)
  Downloading opencv_contrib_python-4.12.0.88-cp37-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl.metadata (19 kB)
Collecting sounddevice>=0.4.4 (from mediapipe)
  Downloading sounddevice-0.5.2-py3-none-any.whl.metadata (1.6 kB)
Collecting sentencepiece (from mediapipe)
  Downloading sentencepiece-0.2.1-cp310-cp310-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl.metadata (10 kB)
Collecting CFFI>=1.0 (from sounddevice>=0.4.4->mediapipe)
  Downloading cffi-1.17.1-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.metadata (1.5 kB)
Collecting pycparser (from CFFI>=1.0->sounddevice>=0.4.4->mediapipe)
  Downloading pycparser-2.22-py3-none-a


## 2) 모듈 임포트


In [3]:

import os
import time
from datetime import datetime

import cv2
import numpy as np

# MediaPipe
try:
    import mediapipe as mp
except Exception as e:
    raise RuntimeError("mediapipe가 설치되어 있지 않습니다. 위 설치 셀을 실행하세요.") from e

mp_drawing = mp.solutions.drawing_utils
mp_styles   = mp.solutions.drawing_styles
mp_hands    = mp.solutions.hands



## 3) 화면 해상도 탐지


In [4]:

def _get_screen_size():
    # 1) screeninfo 우선
    try:
        from screeninfo import get_monitors
        m = get_monitors()[0]
        return int(m.width), int(m.height)
    except Exception:
        pass
    # 2) tkinter 대체
    try:
        import tkinter as tk
        root = tk.Tk()
        root.withdraw()
        w = root.winfo_screenwidth()
        h = root.winfo_screenheight()
        root.destroy()
        return int(w), int(h)
    except Exception:
        pass
    # 3) fallback
    return 1280, 720

SCREEN_W, SCREEN_H = _get_screen_size()
print(f"[INFO] Screen size detected: {SCREEN_W}x{SCREEN_H}")


[INFO] Screen size detected: 1920x2160



## 4) 설정값


In [5]:

USE_CAMERA   = True
CAP_INDEX    = 0
VIDEO_SOURCE = "./sample.mp4"

WINDOW_NAME = "MediaPipe Hands — Finger Count"
SAVE_DIR = "./mp_hands_snaps"
os.makedirs(SAVE_DIR, exist_ok=True)

COUNT_THRESHOLD = 8   # 총 펴진 손가락 수가 이 값 이상이면 경고 출력

# MediaPipe Hands 파라미터
HANDS_MAX_NUM = 2
HANDS_DET_CONF = 0.5
HANDS_TRK_CONF = 0.5



## 5) 카메라 초기화 (재시도)
V4L2(YUYV/MJPEG) 및 기본 백엔드 순차 시도 후, **프레임 안정성**을 검증합니다.


In [6]:

def setup_camera_with_retry(index=0):
    methods = [
        {
            'name': 'V4L2_YUYV',
            'backend': cv2.CAP_V4L2,
            'settings': {
                'fourcc': cv2.VideoWriter_fourcc('Y', 'U', 'Y', 'V'),
                'width': 640,
                'height': 480,
                'fps': 30,
                'buffersize': 1,
            }
        },
        {
            'name': 'V4L2_MJPEG',
            'backend': cv2.CAP_V4L2,
            'settings': {
                'fourcc': cv2.VideoWriter_fourcc('M', 'J', 'P', 'G'),
                'width': 640,
                'height': 480,
                'fps': 30,
                'buffersize': 1,
            }
        },
        {
            'name': 'DEFAULT',
            'backend': None,
            'settings': {
                'width': 640,
                'height': 480,
                'fps': 30,
                'buffersize': 1,
            }
        },
    ]

    for method in methods:
        print(f"[CAM] Trying {method['name']}...")
        try:
            cap = cv2.VideoCapture(index) if method['backend'] is None else cv2.VideoCapture(index, method['backend'])
            if not cap.isOpened():
                print(f"[CAM] Open failed with {method['name']}")
                continue

            s = method['settings']
            if 'fourcc' in s:
                cap.set(cv2.CAP_PROP_FOURCC, s['fourcc'])
            cap.set(cv2.CAP_PROP_FRAME_WIDTH,  s['width'])
            cap.set(cv2.CAP_PROP_FRAME_HEIGHT, s['height'])
            cap.set(cv2.CAP_PROP_FPS,          s['fps'])
            cap.set(cv2.CAP_PROP_BUFFERSIZE,   s['buffersize'])

            time.sleep(1.0)

            ok_cnt = 0
            for _ in range(5):
                ret, f = cap.read()
                if ret and f is not None:
                    ok_cnt += 1
                time.sleep(0.1)

            if ok_cnt >= 3:
                print(f"[CAM] Ready with {method['name']}")
                return cap, method['name']
            else:
                print(f"[CAM] Unstable with {method['name']}")
                cap.release()

        except Exception as e:
            print(f"[CAM] Error on {method['name']}: {e}")

    return None, None



## 6) 화면 맞춤 & 손가락 카운트 유틸
**손가락 판정 로직(휴리스틱)**  
- 엄지(Thumb): handedness(좌/우)에 따라 TIP.x vs IP.x 비교  
- 나머지 4개(Index/Middle/Ring/Pinky): TIP.y < PIP.y 이면 '펴짐'으로 간주 (영상 좌표계 기준 y는 아래로 증가)


In [7]:

def letterbox_fit_to_screen(frame, screen_w, screen_h, color=(0,0,0)):
    h, w = frame.shape[:2]
    scale = min(screen_w / w, screen_h / h)
    new_w, new_h = int(w * scale), int(h * scale)
    resized = cv2.resize(frame, (new_w, new_h), interpolation=cv2.INTER_LINEAR)

    canvas = np.zeros((screen_h, screen_w, 3), dtype=np.uint8)
    canvas[:] = color
    x_off = (screen_w - new_w) // 2
    y_off = (screen_h - new_h) // 2
    canvas[y_off:y_off+new_h, x_off:x_off+new_w] = resized
    return canvas, scale, x_off, y_off

# MediaPipe Hands 인덱스
THUMB_TIP = 4
THUMB_IP  = 3
INDEX_TIP = 8
INDEX_PIP = 6
MIDDLE_TIP = 12
MIDDLE_PIP = 10
RING_TIP = 16
RING_PIP = 14
PINKY_TIP = 20
PINKY_PIP = 18

def count_fingers_one_hand(hand_landmarks, handed_label):
    lm = hand_landmarks.landmark

    # 엄지: handedness 기반 x 축 비교
    thumb_open = None
    if handed_label.lower().startswith("right"):
        thumb_open = lm[THUMB_TIP].x < lm[THUMB_IP].x
    else:  # left
        thumb_open = lm[THUMB_TIP].x > lm[THUMB_IP].x

    # 나머지: TIP.y < PIP.y
    index_open  = lm[INDEX_TIP].y  < lm[INDEX_PIP].y
    middle_open = lm[MIDDLE_TIP].y < lm[MIDDLE_PIP].y
    ring_open   = lm[RING_TIP].y   < lm[RING_PIP].y
    pinky_open  = lm[PINKY_TIP].y  < lm[PINKY_PIP].y

    opens = [thumb_open, index_open, middle_open, ring_open, pinky_open]
    return sum(int(v) for v in opens), opens

def draw_hand_info(bgr, hand_landmarks, count, handed_label):
    # 손목 좌표 근처에 텍스트로 표시
    h, w = bgr.shape[:2]
    wrist = hand_landmarks.landmark[0]
    px, py = int(wrist.x * w), int(wrist.y * h)
    txt = f"{handed_label}: {count}"
    cv2.putText(bgr, txt, (px+10, py-10),
                cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0,255,255), 2)



## 7) 실시간 추론 루프 (손가락 개수 HUD 포함)
**왼손/오른손별 개수 + 총합**을 계산하여 화면과 콘솔에 출력합니다.  
총합이 `COUNT_THRESHOLD` 이상이면 콘솔에 경고 메시지를 표시합니다.


In [8]:

print("[INFO] Starting MediaPipe Hands — Finger Count...")

if USE_CAMERA:
    cap, cam_method = setup_camera_with_retry(CAP_INDEX)
    if cap is None:
        raise SystemExit("[FATAL] 카메라 초기화 실패")
    src_desc = f"camera:{CAP_INDEX} ({cam_method})"
else:
    cap = cv2.VideoCapture(VIDEO_SOURCE)
    if not cap.isOpened():
        raise SystemExit(f"[FATAL] 비디오 파일 열기 실패: {VIDEO_SOURCE}")
    src_desc = f"video:{VIDEO_SOURCE}"

print(f"[INFO] Source: {src_desc}")
cv2.namedWindow(WINDOW_NAME, cv2.WINDOW_NORMAL)
cv2.resizeWindow(WINDOW_NAME, SCREEN_W, SCREEN_H)

fullscreen = False
fps = 0.0
frame_count = 0
last_time = time.time()

hands_ctx = mp_hands.Hands(
    static_image_mode=False,
    max_num_hands=HANDS_MAX_NUM,
    min_detection_confidence=HANDS_DET_CONF,
    min_tracking_confidence=HANDS_TRK_CONF,
)

try:
    while True:
        ret, frame = cap.read()
        if not ret or frame is None:
            print("[WARN] Frame read failed")
            time.sleep(0.05)
            continue

        # 셀피 스타일
        frame = cv2.flip(frame, 1)

        # MediaPipe는 RGB 입력 권장
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        results = hands_ctx.process(rgb)

        vis = frame.copy()
        total_open = 0
        left_count = 0
        right_count = 0

        if results.multi_hand_landmarks:
            # handedness 정보(좌/우) 매칭
            handedness_list = []
            if results.multi_handedness:
                for hlabel in results.multi_handedness:
                    handedness_list.append(hlabel.classification[0].label)
            else:
                handedness_list = ["Unknown"] * len(results.multi_hand_landmarks)

            for hand_lm, handed_label in zip(results.multi_hand_landmarks, handedness_list):
                # 랜드마크 그리기
                mp_drawing.draw_landmarks(
                    vis, hand_lm, mp_hands.HAND_CONNECTIONS,
                    mp_styles.get_default_hand_landmarks_style(),
                    mp_styles.get_default_hand_connections_style(),
                )
                # 손가락 개수 계산
                cnt, opens = count_fingers_one_hand(hand_lm, handed_label)
                draw_hand_info(vis, hand_lm, cnt, handed_label)
                total_open += cnt
                if handed_label.lower().startswith("left"):
                    left_count = cnt
                elif handed_label.lower().startswith("right"):
                    right_count = cnt

        # 화면 맞춤(레터박싱)
        disp, scale, x_off, y_off = letterbox_fit_to_screen(vis, SCREEN_W, SCREEN_H, color=(0,0,0))

        # FPS
        frame_count += 1
        if frame_count % 30 == 0:
            now = time.time()
            fps = 30.0 / (now - last_time)
            last_time = now

        # HUD
        hud = f"Left: {left_count} | Right: {right_count} | Total: {total_open} | FPS: {fps:.1f}"
        cv2.putText(disp, hud, (10, SCREEN_H-10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255,255,0), 2)

        # 콘솔 출력 및 임계치 경고
        print(f"[INFO] Finger open count — Left:{left_count} Right:{right_count} Total:{total_open}")
        if total_open >= COUNT_THRESHOLD:
            print(f"⚠️ Total finger open count >= {COUNT_THRESHOLD}!")

        cv2.imshow(WINDOW_NAME, disp)

        key = cv2.waitKey(1) & 0xFF
        if key == ord('q') or key == 27:
            print("[INFO] Exit requested.")
            break
        elif key == ord('f'):
            fullscreen = not fullscreen
            prop = cv2.WND_PROP_FULLSCREEN
            cv2.setWindowProperty(WINDOW_NAME, prop, cv2.WINDOW_FULLSCREEN if fullscreen else cv2.WINDOW_NORMAL)
            if not fullscreen:
                cv2.resizeWindow(WINDOW_NAME, SCREEN_W, SCREEN_H)
        elif key == ord('s'):
            ts = datetime.now().strftime("%Y%m%d_%H%M%S")
            path = os.path.join(SAVE_DIR, f"mp_hands_snap_{ts}.jpg")
            cv2.imwrite(path, vis)
            print(f"[SAVE] Snapshot: {path}")

except KeyboardInterrupt:
    print("[INFO] Interrupted by user.")

finally:
    if cap is not None:
        cap.release()
    cv2.destroyAllWindows()
    hands_ctx.close()
    print("[CLEANUP] Released resources.")


[INFO] Starting MediaPipe Hands — Finger Count...
[CAM] Trying V4L2_YUYV...
[CAM] Ready with V4L2_YUYV
[INFO] Source: camera:0 (V4L2_YUYV)


Error in cpuinfo: prctl(PR_SVE_GET_VL) failed
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
W0000 00:00:1755317206.037856    8688 inference_feedback_manager.cc:114] Feedback manager requires a model with a single signature inference. Disabling support for feedback tensors.
W0000 00:00:1755317206.061050    8688 inference_feedback_manager.cc:114] Feedback manager requires a model with a single signature inference. Disabling support for feedback tensors.
W0000 00:00:1755317206.123101    8687 landmark_projection_calculator.cc:186] Using NORM_RECT without IMAGE_DIMENSIONS is only supported for the square ROI. Provide IMAGE_DIMENSIONS or use PROJECTION_MATRIX.


[INFO] Finger open count — Left:0 Right:0 Total:0
[INFO] Finger open count — Left:0 Right:0 Total:0
[INFO] Finger open count — Left:0 Right:0 Total:0
[INFO] Finger open count — Left:0 Right:0 Total:0
[INFO] Finger open count — Left:0 Right:0 Total:0
[INFO] Finger open count — Left:0 Right:0 Total:0
[INFO] Finger open count — Left:0 Right:0 Total:0
[INFO] Finger open count — Left:3 Right:0 Total:4
[INFO] Finger open count — Left:0 Right:0 Total:0
[INFO] Finger open count — Left:4 Right:0 Total:4
[INFO] Finger open count — Left:4 Right:0 Total:4
[INFO] Finger open count — Left:4 Right:0 Total:4
[INFO] Finger open count — Left:4 Right:0 Total:4
[INFO] Finger open count — Left:4 Right:0 Total:4
[INFO] Finger open count — Left:4 Right:0 Total:4
[INFO] Finger open count — Left:4 Right:0 Total:4
[INFO] Finger open count — Left:4 Right:0 Total:4
[INFO] Finger open count — Left:4 Right:0 Total:4
[INFO] Finger open count — Left:4 Right:0 Total:4
[INFO] Finger open count — Left:4 Right:0 Total:4



## 8) 참고 & 튜닝 팁
- **엄지 판정**은 좌/우 손 구분을 활용하여 TIP.x vs IP.x 비교(간단 휴리스틱)입니다.  
  조명/각도/원근에 따라 간혹 오동작할 수 있으니, 필요한 경우 **각도 기반 보정**(벡터 내적/관절 각도)으로 개선하세요.
- **COUNT_THRESHOLD**를 높이면 경고 발생 빈도가 줄어듭니다.
- 셀피용으로 프레임을 `flip`(좌우 반전)했지만, MediaPipe의 handedness는 **이미지 기준**이므로 판정 로직은 그대로 동작합니다.
- Jetson/도커에서 GUI를 쓰려면 X11 포워딩, 카메라 권한(`--device /dev/video0`), 성능 모드 등을 참고하세요.
