Cell 0 (Markdown)
# 02 - Realtime Webcam: Landmarks + KNN + TTS (speaking the letters)

## Idea
- Webcam → MediaPipe HandLandmarker (VIDEO mode) → 63-D feature vector (21 landmarks × 3 coordinates)
- KNN model → predicted letter + confidence
- Majority-vote smoothing over several frames for stability
- Offline TTS (pyttsx3) speaks the letter in real time

Notes:
- VIDEO mode requires `detect_for_video(image, timestamp_ms)`, and the timestamps must be monotonically increasing.
- TTS uses an external event loop so that speech does not play only once.
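
Because VIDEO mode expects timestamps that keep increasing, a small helper can guard against a clock adjustment or two frames landing in the same millisecond. The sketch below is one possible approach, not something MediaPipe provides: `FrameClock` and `next_ms` are hypothetical names, and the helper relies only on the standard library's `time.monotonic()`.

```python
import time


class FrameClock:
    """Produce strictly increasing millisecond timestamps (hypothetical helper)."""

    def __init__(self):
        self._start = time.monotonic()  # monotonic clock never goes backwards
        self._last_ms = -1

    def next_ms(self) -> int:
        elapsed_ms = int((time.monotonic() - self._start) * 1000)
        # If two calls land in the same millisecond, bump the value by 1 ms
        self._last_ms = max(elapsed_ms, self._last_ms + 1)
        return self._last_ms


clock = FrameClock()
stamps = [clock.next_ms() for _ in range(5)]
print(stamps)  # five strictly increasing integers
```

Each value returned by `clock.next_ms()` could then be passed as the `timestamp_ms` argument of `detect_for_video`.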

Cell 1 (Code) - Import + load model

In [None]:
from pathlib import Path
import time
from collections import deque

import cv2
import numpy as np
import joblib
import mediapipe as mp
import pyttsx3

# Project root: assumes the notebook runs from a subfolder of the project
ROOT = Path.cwd().parent

# ====== FILE PATHS ======
MODEL_PATH = ROOT / "models" / "knn_landmarks.joblib"
LE_PATH = ROOT / "models" / "label_encoder.joblib"
TASK_PATH = ROOT / "assets" / "hand_landmarker.task"

# ====== FILE VALIDATION ======
if not MODEL_PATH.exists():
    raise FileNotFoundError(f"Model not found: {MODEL_PATH} (run Notebook 01 first)")
if not LE_PATH.exists():
    raise FileNotFoundError(f"Label encoder not found: {LE_PATH} (run Notebook 01 first)")
if not TASK_PATH.exists():
    raise FileNotFoundError(f"Task model not found: {TASK_PATH} (download hand_landmarker.task)")

# ====== LOAD ======
model = joblib.load(MODEL_PATH)   # pipeline: scaler + KNN
le = joblib.load(LE_PATH)         # label encoder


Cell 2 (Markdown)
## TTS: why use `startLoop(False)` + `iterate()`?
Calling `runAndWait()` repeatedly inside the webcam loop often causes one of two problems:
- the loop stutters, because `runAndWait()` blocks until the queued speech has finished, or
- audio plays only once and then stalls.

The fix is an external event loop:
- call `engine.startLoop(False)` once, before the webcam loop
- call `engine.iterate()` on every frame
- call `engine.endLoop()` when finished

This follows the pyttsx3 documentation (external event loop).
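
The start/iterate/end lifecycle above can be wrapped so that `endLoop()` is never forgotten. The following is a minimal sketch under stated assumptions: `external_tts_loop` is a hypothetical helper, and `FakeEngine` is a stand-in that only records calls so the example runs without audio hardware. It mirrors the pyttsx3 method names `startLoop`, `say`, `iterate`, and `endLoop` but is not the real engine.

```python
from contextlib import contextmanager


@contextmanager
def external_tts_loop(engine):
    """Start a pyttsx3-style external loop and always close it (hypothetical helper)."""
    engine.startLoop(False)  # False: the caller pumps events via iterate()
    try:
        yield engine
    finally:
        engine.endLoop()  # runs even if the frame loop raises


class FakeEngine:
    """Stand-in that records calls; not part of pyttsx3."""

    def __init__(self):
        self.calls = []

    def startLoop(self, use_driver_loop=True):
        self.calls.append(("startLoop", use_driver_loop))

    def say(self, text):
        self.calls.append(("say", text))

    def iterate(self):
        self.calls.append(("iterate",))

    def endLoop(self):
        self.calls.append(("endLoop",))


fake = FakeEngine()
with external_tts_loop(fake) as tts:
    tts.say("A")          # queue speech without blocking
    for _ in range(3):    # stands in for three webcam frames
        tts.iterate()     # pump pending TTS events once per frame

print(fake.calls)
```

With the real library, `pyttsx3.init()` would presumably take the place of `FakeEngine()`. The notebook itself keeps the three calls explicit across cells, which works equally well as long as `endLoop()` sits in a `finally` block.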

Cell 3 (Code) - Setup TTS external loop

In [None]:
# ====== INIT TTS ======
engine = pyttsx3.init()            # initialize the TTS engine
engine.setProperty("rate", 170)    # speaking rate (tune as needed)
engine.setProperty("volume", 1.0)  # volume, 0..1

# Start the external loop: we call iterate() ourselves on every frame
engine.startLoop(False)

def speak(text: str):
    """
    Queue text for the TTS engine.
    Does not call runAndWait(), so it does not block the webcam loop.
    """
    engine.say(text)


Cell 4 (Markdown)
## Realtime stability: confidence + smoothing
- Confidence: if the highest class probability < threshold → UNKNOWN (nothing is spoken)
- Smoothing: majority vote over the last several frames (reduces flicker)
- Cooldown: prevents TTS spam
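
The three rules above can be tried in isolation, without a webcam or a model. The sketch below is illustrative only: `majority_vote` and `should_speak` are hypothetical helper names, the per-frame predictions are made up, and the window size of 5 is chosen to keep the example short.

```python
from collections import Counter, deque


def majority_vote(history):
    """Return the most common item in the window (ties: first seen wins)."""
    return Counter(history).most_common(1)[0][0]


def should_speak(label, last_label, now, last_time, cooldown=0.7, repeat_every=2.0):
    """Short wait before speaking a new letter, longer wait before a repeat."""
    elapsed = now - last_time
    if label != last_label:
        return elapsed > cooldown
    return elapsed > repeat_every


CONF_TH = 0.60
window = deque(maxlen=5)

# (predicted letter, confidence) per frame; values are made up for illustration
frames = [("A", 0.9), ("A", 0.8), ("B", 0.7), ("A", 0.95), ("C", 0.3)]

shown = []
for letter, conf in frames:
    if conf >= CONF_TH:
        window.append(letter)
        shown.append(majority_vote(window))
    else:
        window.clear()           # low confidence resets the vote
        shown.append("UNKNOWN")

print(shown)  # → ['A', 'A', 'A', 'A', 'UNKNOWN']
```

The single `B` frame is outvoted by the surrounding `A` frames, which is the flicker the smoothing is meant to remove. A larger window gives steadier output at the cost of reacting more slowly when the hand shape really changes.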

Cell 5 (Code) - MediaPipe VIDEO setup + feature function

In [None]:
# ====== REALTIME PARAMETERS (TUNE AS NEEDED) ======
CONF_TH = 0.60          # minimum confidence for a prediction to count as valid
SMOOTH_N = 9            # number of frames in the voting window
SPEAK_COOLDOWN = 0.7    # minimum gap (seconds) before speaking a changed letter
REPEAT_EVERY = 2.0      # repeat the same letter every N seconds (so it is not spoken only once)

history = deque(maxlen=SMOOTH_N)
last_spoken = None
last_spoken_time = 0.0

# ====== MEDIAPIPE TASKS: VIDEO MODE ======
BaseOptions = mp.tasks.BaseOptions
HandLandmarker = mp.tasks.vision.HandLandmarker
HandLandmarkerOptions = mp.tasks.vision.HandLandmarkerOptions
VisionRunningMode = mp.tasks.vision.RunningMode

options = HandLandmarkerOptions(
    base_options=BaseOptions(model_asset_path=str(TASK_PATH)),
    running_mode=VisionRunningMode.VIDEO,  # VIDEO mode for a webcam stream
    num_hands=1
)

def landmarks_to_features(lms):
    """
    Must match the preprocessing used in training:
    - translate so the wrist (landmark 0) is the origin
    - normalize scale by the largest 2D distance from the wrist
    """
    pts = np.array([[lm.x, lm.y, lm.z] for lm in lms], dtype=np.float32)
    pts = pts - pts[0]
    scale = np.max(np.linalg.norm(pts[:, :2], axis=1))
    if scale < 1e-6:
        scale = 1.0  # degenerate hand: avoid division by ~0
    pts = pts / scale
    return pts.reshape(1, -1)  # shape (1, 63)


Cell 6 (Code) - Webcam loop + prediction + TTS

In [4]:
cap = cv2.VideoCapture(0)  # open the webcam at index 0
if not cap.isOpened():
    engine.endLoop()
    raise RuntimeError("Could not open the webcam. Try another index (0/1/2).")

start_time = time.monotonic()   # monotonic reference for detect_for_video timestamps

try:
    # create the landmarker from options
    with HandLandmarker.create_from_options(options) as landmarker:
        while True:
            ok, frame = cap.read()
            if not ok:
                break

            frame = cv2.flip(frame, 1)                 # mirror for a natural view
            H, W = frame.shape[:2]

            # MediaPipe expects RGB/sRGB (OpenCV defaults to BGR)
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=rgb)

            # VIDEO mode requires timestamp_ms; a monotonic clock never goes backwards
            timestamp_ms = int((time.monotonic() - start_time) * 1000)

            # detect landmarks in real time
            result = landmarker.detect_for_video(mp_image, timestamp_ms)

            label_text = "NO_HAND"
            conf = 0.0

            if result.hand_landmarks:
                lms = result.hand_landmarks[0]
                X = landmarks_to_features(lms)

                # take confidence from class probabilities (if available)
                if hasattr(model, "predict_proba"):
                    proba = model.predict_proba(X)[0]
                    conf = float(np.max(proba))
                    # proba columns follow model.classes_, so map the column back to the encoded label
                    pred_idx = int(model.classes_[int(np.argmax(proba))])
                else:
                    pred_idx = int(model.predict(X)[0])
                    conf = 1.0

                if conf >= CONF_TH:
                    # majority-vote smoothing
                    history.append(pred_idx)
                    stable = max(set(history), key=list(history).count)
                    label_text = le.inverse_transform([stable])[0]

                    now = time.time()
                    # speak if the letter changed or REPEAT_EVERY has elapsed
                    should_speak = (
                        (label_text != last_spoken and (now - last_spoken_time) > SPEAK_COOLDOWN) or
                        (label_text == last_spoken and (now - last_spoken_time) > REPEAT_EVERY)
                    )

                    if should_speak:
                        speak(label_text)
                        last_spoken = label_text
                        last_spoken_time = now
                else:
                    label_text = "UNKNOWN"
                    history.clear()

            # IMPORTANT: pump the TTS event loop without blocking (so speech does not play only once)
            engine.iterate()

            # text overlay
            cv2.putText(frame, f"Pred: {label_text} conf={conf:.2f}", (20, 40),
                        cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)

            cv2.imshow("Realtime ASL (KNN Landmarks) + TTS", frame)

            key = cv2.waitKey(1) & 0xFF
            if key in (ord("q"), 27):  # q or ESC
                break
finally:
    cap.release()
    cv2.destroyAllWindows()
    engine.endLoop()  # close the TTS loop

