
# Facial Recognition (Simplified, Step-by-Step) — OpenCV LBPH
**EN:** Minimal, clean pipeline for **Face Detection → Dataset Collection → Training → Real-time Recognition**, built with Python + OpenCV (LBPH).  
**VN:** Pipeline tối giản, rõ ràng cho **Phát hiện khuôn mặt → Thu thập dữ liệu → Huấn luyện → Nhận diện thời gian thực**, viết bằng Python + OpenCV (LBPH).

> This notebook is **intentionally different** from common templates: smaller, modular functions, clear thresholds, and a simple evaluation step to improve accuracy.
>
> Notebook này **được viết khác** với mẫu thường gặp: hàm gọn, ngưỡng rõ ràng, có đánh giá đơn giản để tăng độ chính xác.



## 0. Environment setup / Cài đặt môi trường
- Requires **Python 3.8–3.12**, the range where prebuilt OpenCV wheels are most stable. Python 3.13 may require newer wheels.  
- Requires **opencv-contrib-python** for `cv2.face.LBPHFaceRecognizer_create`; the plain `opencv-python` package does not ship the `face` module.
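
Before installing anything, it can help to confirm what the current kernel already has. The sketch below uses only the standard library. `check_environment` is a hypothetical helper, not part of this notebook, and it only reports whether some `cv2` package can be located; it cannot tell the contrib build apart without importing it.

```python
import importlib.util
import sys

def check_environment(min_py=(3, 8), max_py=(3, 12)):
    """Report the Python version and whether a cv2 package can be found.

    The version bounds mirror the range recommended above; they are a
    guideline, not a hard requirement.
    """
    version = sys.version_info[:2]
    return {
        "python": f"{version[0]}.{version[1]}",
        "python_in_recommended_range": min_py <= version <= max_py,
        # find_spec only locates the package; it does not import it.
        "cv2_found": importlib.util.find_spec("cv2") is not None,
    }

print(check_environment())
```

If `cv2_found` is true, running `import cv2` followed by `hasattr(cv2, "face")` shows whether the contrib build is the one installed, which is the same check the training cells perform.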


In [15]:
# !pip install --upgrade pip
# !pip install "opencv-contrib-python>=4.7,<4.10" imutils numpy scikit-learn

Defaulting to user installation because normal site-packages is not writeable
Collecting opencv-contrib-python<4.10,>=4.7
  Downloading opencv_contrib_python-4.9.0.80-cp37-abi3-win_amd64.whl.metadata (20 kB)
Collecting imutils
  Downloading imutils-0.5.4.tar.gz (17 kB)
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Downloading opencv_contrib_python-4.9.0.80-cp37-abi3-win_amd64.whl (45.3 MB)


## 1. Imports & Config / Thư viện & Cấu hình
- Haar cascade for face detection (fast and robust enough for demos)
- LBPH recognizer for classic on-device recognition
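
To make the LBPH bullet concrete, here is a dependency-free sketch of the basic 3×3 Local Binary Pattern and its histogram. It is an illustration only: OpenCV's recognizer samples neighbors on a circle (`radius`, `neighbors`), splits the face into a `grid_x` × `grid_y` grid, and concatenates the per-cell histograms, so its codes and bit order will differ. `lbp_code` and `lbp_histogram` are hypothetical names.

```python
def lbp_code(img, r, c):
    """8-neighbor LBP code for pixel (r, c) of a grayscale image (list of rows).

    Each neighbor that is >= the center pixel contributes one bit.
    """
    center = img[r][c]
    # Neighbors visited clockwise, starting at the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code

def lbp_histogram(img):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist

sample = [[10, 20, 30],
          [40, 50, 60],
          [70, 80, 90]]
print(lbp_code(sample, 1, 1))
```

Recognition then amounts to comparing the histogram of a new face against the stored ones, which is why the prediction comes back as a distance where smaller means more similar.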


In [24]:
import os, time, json, glob, random
from pathlib import Path
from typing import Dict, List, Tuple

import cv2
import numpy as np

# Optional tools
try:
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import classification_report, accuracy_score
    SKLEARN_OK = True
except Exception:
    SKLEARN_OK = False

# Paths
DATA_DIR = Path("data_faces")        # where cropped face images are stored
MODEL_DIR = Path("models")           # where trained model & label map are saved
MODEL_DIR.mkdir(parents=True, exist_ok=True)
DATA_DIR.mkdir(parents=True, exist_ok=True)

CASCADE_PATH = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
FACE_CASCADE = cv2.CascadeClassifier(CASCADE_PATH)

LBPH_RADIUS = 2
LBPH_NEIGHBORS = 16
LBPH_GRID_X = 8
LBPH_GRID_Y = 8
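FACE_SIZE = (100, 100)   # LBPH input size (not used by preprocess_crop, which defaults to 200x200)
MIN_FACE = 64            # skip face boxes whose shorter side is below this many pixels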

# Prediction thresholds (tune per your dataset/camera)
# For LBPH in OpenCV: smaller distance => better match. We'll convert to "confidence" later.
DIST_THRESHOLD = 70.0  # lower is stricter; start ~60-80; tune experimentally

RANDOM_SEED = 42
random.seed(RANDOM_SEED)
np.random.seed(RANDOM_SEED)

print("OpenCV:", cv2.__version__)
print("SKLEARN available:", SKLEARN_OK)
print("Cascade path:", CASCADE_PATH)


OpenCV: 4.9.0
SKLEARN available: True
Cascade path: C:\Users\Dell\AppData\Roaming\Python\Python310\site-packages\cv2\data\haarcascade_frontalface_default.xml



## 2. Utilities / Tiện ích
**EN:** Helper functions to detect faces, normalize crops (resize + histogram equalization), and draw labeled boxes.  
**VN:** Các hàm hỗ trợ phát hiện khuôn mặt, chuẩn hoá ảnh cắt (đổi kích thước + cân bằng histogram) và vẽ khung kèm nhãn.


In [25]:

def detect_faces_bgr(image_bgr, scaleFactor=1.2, minNeighbors=5, minSize=(80, 80)):
    """Detect faces in a BGR frame; return (boxes as (x, y, w, h), grayscale frame)."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    faces = FACE_CASCADE.detectMultiScale(
        gray, scaleFactor=scaleFactor, minNeighbors=minNeighbors, minSize=minSize
    )
    return faces, gray

def preprocess_crop(gray, x, y, w, h, out_size=(200, 200)):
    """Crop a face box, resize it to out_size, and equalize its histogram.

    Training and recognition must use the same out_size and the same steps.
    """
    crop = gray[y:y+h, x:x+w]
    crop = cv2.resize(crop, out_size, interpolation=cv2.INTER_AREA)
    crop = cv2.equalizeHist(crop)
    return crop

def draw_face_box(img, x, y, w, h, color=(0,255,0), label=None):
    cv2.rectangle(img, (x,y), (x+w, y+h), color, 2)
    if label:
        cv2.putText(img, label, (x, max(0,y-10)), cv2.FONT_HERSHEY_SIMPLEX, 0.7, color, 2, cv2.LINE_AA)
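

When a frame contains several detections, a common choice is to keep the largest box, on the assumption that the subject is the person closest to the camera. A minimal sketch, assuming boxes are `(x, y, w, h)` sequences as returned by `detectMultiScale`; `largest_face` and its `min_side` parameter are hypothetical additions, not part of the helpers above.

```python
def largest_face(faces, min_side=0):
    """Return the (x, y, w, h) box with the largest area, or None.

    Boxes whose shorter side is below min_side are ignored.
    """
    candidates = [
        tuple(int(v) for v in box)
        for box in faces
        if min(box[2], box[3]) >= min_side
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda b: b[2] * b[3])

boxes = [(10, 10, 80, 80), (200, 50, 120, 120), (5, 5, 40, 40)]
print(largest_face(boxes, min_side=64))
```

Returning `None` for an empty result lets the caller skip the frame instead of indexing into an empty detection list.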



## 3. Collect dataset / Thu thập dữ liệu
**Tip:** Capture 10–30 faces per person under different lighting and angles.  
**Gợi ý:** Mỗi người nên chụp 10–30 ảnh ở nhiều góc/ánh sáng.

- Press **c** to capture a detected face (chụp)
- Press **q** to quit (thoát)


In [33]:

def collect_person_dataset(person_name, cam_index=0, max_capture=8, delay_ms=50):
    person_dir = DATA_DIR / person_name
    person_dir.mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(cam_index)
    if not cap.isOpened():
        print("[ERROR] Cannot open webcam.")
        return

    print(f"[INFO] Collecting faces for '{person_name}'. Press 'c' to capture, 'q' to quit.")
    saved = 0
    try:
        while True:
            ret, frame = cap.read()
            if not ret:
                print("[WARN] Failed to grab frame.")
                break

            faces, gray = detect_faces_bgr(frame)
            for (x,y,w,h) in faces:
                draw_face_box(frame, x,y,w,h, (0,255,0), f"{person_name}")
            cv2.imshow("Collecting - press 'c' to capture, 'q' to quit", frame)

            key = cv2.waitKey(delay_ms) & 0xFF
            if key == ord('c') and len(faces) > 0:
                # Save the first detected face
                (x,y,w,h) = faces[0]
                crop = preprocess_crop(gray, x,y,w,h)
                filename = person_dir / f"{int(time.time()*1000)}.png"
                cv2.imwrite(str(filename), crop)
                saved += 1
                print(f"[SAVE] {filename}")
                if saved >= max_capture:
                    print(f"[DONE] Collected {saved} images for {person_name}.")
                    break
            elif key == ord('q'):
                print("[QUIT] Stopped by user.")
                break
    finally:
        cap.release()
        cv2.destroyAllWindows()

# Example (uncomment to use):
# collect_person_dataset("Nguyen Quoc Dai", cam_index=0, max_capture=25)
# collect_person_dataset("Phung Ngoc Hiep", cam_index=0, max_capture=25)
# collect_person_dataset("Duong Ngo Nhat Minh", cam_index=0, max_capture=25)
collect_person_dataset("Nguyen Viet Gia Bao", cam_index=0, max_capture=25)

[INFO] Collecting faces for 'Nguyen Viet Gia Bao'. Press 'c' to capture, 'q' to quit.
[SAVE] data_faces\Nguyen Viet Gia Bao\1762305521452.png
[SAVE] data_faces\Nguyen Viet Gia Bao\1762305521834.png
[SAVE] data_faces\Nguyen Viet Gia Bao\1762305522218.png
[SAVE] data_faces\Nguyen Viet Gia Bao\1762305522515.png
[SAVE] data_faces\Nguyen Viet Gia Bao\1762305522878.png
[SAVE] data_faces\Nguyen Viet Gia Bao\1762305523200.png
[SAVE] data_faces\Nguyen Viet Gia Bao\1762305523430.png
[SAVE] data_faces\Nguyen Viet Gia Bao\1762305523749.png
[SAVE] data_faces\Nguyen Viet Gia Bao\1762305524441.png
[SAVE] data_faces\Nguyen Viet Gia Bao\1762305525041.png
[SAVE] data_faces\Nguyen Viet Gia Bao\1762305530494.png
[SAVE] data_faces\Nguyen Viet Gia Bao\1762305532209.png
[SAVE] data_faces\Nguyen Viet Gia Bao\1762305532442.png
[SAVE] data_faces\Nguyen Viet Gia Bao\1762305532903.png
[SAVE] data_faces\Nguyen Viet Gia Bao\1762305533094.png
[SAVE] data_faces\Nguyen Viet Gia Bao\1762305533468.png
[SAVE] data_faces\


## 4. Load dataset & encode labels / Nạp dữ liệu & mã hoá nhãn
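This scans `data_faces/<person>/*.png` and builds `X` (grayscale images) and `y` (integer label ids). Label ids follow the sorted order of the folder names, and every image must have the same size so that `X` can be stacked into a single array.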
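
The folder-to-label mapping can be tried on its own, without a camera or OpenCV. The sketch below builds the two lookup tables from sub-folder names and lists `(path, label_id)` pairs. `build_label_maps`, `list_samples`, and the folder names `alice` and `bob` are made up for illustration; the files are empty placeholders, not real images.

```python
import tempfile
from pathlib import Path

def build_label_maps(data_dir):
    """Map each sub-folder name to an integer id, in sorted order."""
    classes = sorted(p.name for p in Path(data_dir).iterdir() if p.is_dir())
    label2id = {name: i for i, name in enumerate(classes)}
    id2label = {i: name for name, i in label2id.items()}
    return label2id, id2label

def list_samples(data_dir, label2id, pattern="*.png"):
    """Return (path, label_id) pairs for every matching file."""
    samples = []
    for name, label_id in label2id.items():
        for path in sorted((Path(data_dir) / name).glob(pattern)):
            samples.append((path, label_id))
    return samples

# Demo on a throwaway folder with placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for person, count in [("bob", 2), ("alice", 1)]:
        folder = Path(tmp) / person
        folder.mkdir()
        for i in range(count):
            (folder / f"{i}.png").touch()
    label2id, id2label = build_label_maps(tmp)
    samples = list_samples(tmp, label2id)
    print(label2id, len(samples))
```

Because ids depend on the sorted folder names, adding or renaming a person can shift the ids, so the model and the label map should always be regenerated together.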


In [38]:

def scan_dataset(data_dir=DATA_DIR):
    classes = sorted([p.name for p in data_dir.iterdir() if p.is_dir()])
    label2id = {c:i for i,c in enumerate(classes)}
    id2label = {i:c for c,i in label2id.items()}

    X, y = [], []
    for person in classes:
        for img_path in glob.glob(str(data_dir / person / "*.png")):
            img = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
            if img is None:
                continue
            X.append(img)
            y.append(label2id[person])
    X = np.array(X, dtype=np.uint8)
    y = np.array(y, dtype=np.int32)
    print(f"[INFO] Classes: {classes}")
    print(f"[INFO] Samples: {len(X)}")
    return X, y, label2id, id2label

X, y, label2id, id2label = scan_dataset()
print("Shape:", X.shape, y.shape)


[INFO] Classes: ['Duong Ngo Nhat Minh', 'Nguyen Quoc Dai', 'Nguyen Viet Gia Bao']
[INFO] Samples: 21
Shape: (21, 200, 200) (21,)



## 5. Train LBPH model / Huấn luyện mô hình LBPH
- **LBPH** (Local Binary Patterns Histograms) is simple and works well on small datasets.  
- Save `lbph_model.yml` and `labels.json` for later use.
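
The recognizer stores only integer ids, so `labels.json` is what turns a prediction back into a name. A minimal sketch of the round trip, assuming the map is saved as name → id and inverted on load; `save_labels` and `load_id2label` are hypothetical helper names, and the temp-file-then-replace step is one possible way to avoid leaving a half-written file behind.

```python
import json
import os
import tempfile
from pathlib import Path

def save_labels(label2id, path):
    """Write the name -> id map to JSON via a temp file, then swap it in."""
    path = Path(path)
    tmp_path = path.with_name(path.name + ".tmp")
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(label2id, f, ensure_ascii=False, indent=2)
    os.replace(tmp_path, path)

def load_id2label(path):
    """Read the JSON map back and invert it to id -> name."""
    with open(path, "r", encoding="utf-8") as f:
        label2id = json.load(f)
    return {int(v): k for k, v in label2id.items()}

with tempfile.TemporaryDirectory() as tmp:
    labels_file = Path(tmp) / "labels.json"
    save_labels({"Nguyen Quoc Dai": 1, "Duong Ngo Nhat Minh": 0}, labels_file)
    restored = load_id2label(labels_file)
    print(restored)
```

`ensure_ascii=False` together with UTF-8 encoding keeps accented names readable in the file.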


In [39]:
def train_lbph(X, y):
    # OpenCV's LBPH expects a list of grayscale images and a NumPy array of int labels
    if not hasattr(cv2, "face") or not hasattr(cv2.face, "LBPHFaceRecognizer_create"):
        raise RuntimeError("cv2.face.LBPHFaceRecognizer_create not available. Install opencv-contrib-python.")

    recognizer = cv2.face.LBPHFaceRecognizer_create(
        radius=LBPH_RADIUS,
        neighbors=LBPH_NEIGHBORS,
        grid_x=LBPH_GRID_X,
        grid_y=LBPH_GRID_Y
    )

    # Convert to proper types
    X_list = [np.array(img, dtype=np.uint8) for img in X]
    y_array = np.array(y, dtype=np.int32)

    recognizer.train(X_list, y_array)
    return recognizer

def save_lbph_safely(recognizer, model_path: Path):
    # Write to a temporary file first, then swap it in, so a failed write never leaves a half-written model.
    tmp_path = model_path.with_suffix(model_path.suffix + ".tmp")
    recognizer.write(str(tmp_path))  # OpenCV closes the file itself
    os.replace(tmp_path, model_path) # atomic replace (Windows/Linux/macOS)
    
if len(X) >= 2 and len(np.unique(y)) >= 1:
    recognizer = train_lbph(X, y)
    model_path = MODEL_DIR / "lbph_model.yml"
    save_lbph_safely(recognizer, model_path)
    with open(MODEL_DIR / "labels.json", "w", encoding="utf-8") as f:
        json.dump({str(k):v for k,v in label2id.items()}, f, ensure_ascii=False, indent=2)
    print(f"[SAVE] Model: {model_path}")
    print(f"[SAVE] Label map: {MODEL_DIR/'labels.json'}")
else:
    print("[WARN] Not enough samples to train. Please collect more data.")


[SAVE] Model: models\lbph_model.yml
[SAVE] Label map: models\labels.json


In [23]:
import os, json, cv2, numpy as np
from pathlib import Path
from typing import Dict, List, Tuple

# ====== CONFIGURATION ======
# NOTE: this standalone cell redefines train_lbph(dataset_dir, model_path, labels_path)
# and saves to lbph_model.xml, whereas Sections 6-7 call train_lbph(X, y) and read lbph_model.yml.
DATASET_DIR = Path("data_faces")  # Dataset folder: data_faces/<label>/*.jpg|png|jpeg
ARTIFACTS_DIR = Path("models")
MODEL_PATH = ARTIFACTS_DIR / "lbph_model.xml"
LABELS_PATH = ARTIFACTS_DIR / "labels.json"

LBPH_RADIUS = 2
LBPH_NEIGHBORS = 16
LBPH_GRID_X = 8
LBPH_GRID_Y = 8
FACE_SIZE = (100, 100)   # Input size for LBPH
MIN_FACE = 64            # Skip face boxes whose side is smaller than this
# ===========================

ARTIFACTS_DIR.mkdir(parents=True, exist_ok=True)

# 0) Check for opencv-contrib (cv2.face is required)
if not hasattr(cv2, "face"):
    raise RuntimeError(
        "cv2.face is missing (opencv-contrib-python is required). "
        "Install: pip install --upgrade opencv-contrib-python"
    )

# 1) Remove the old model if it looks corrupted (optional)
def maybe_remove_corrupted_model(p: Path):
    if p.exists() and p.stat().st_size < 100:
        print(f"[WARN] Model looks corrupted (size {p.stat().st_size}B). Deleting file.")
        p.unlink()

maybe_remove_corrupted_model(MODEL_PATH)

# 2) Save the model safely (atomic replace)
def save_lbph_safely(recognizer, model_path: Path):
    tmp_path = model_path.with_suffix(model_path.suffix + ".tmp")
    recognizer.write(str(tmp_path))
    os.replace(tmp_path, model_path)

# 3) Load the cascade to detect faces in images that are not yet cropped
cascade_path = str(Path(cv2.data.haarcascades) / "haarcascade_frontalface_default.xml")
face_cascade = cv2.CascadeClassifier(cascade_path)

def collect_images_and_labels(dataset_dir: Path) -> Tuple[List[np.ndarray], List[int], Dict[str, int]]:
    images: List[np.ndarray] = []
    labels: List[int] = []
    label2id: Dict[str, int] = {}

    if not dataset_dir.exists():
        raise FileNotFoundError(f"Dataset folder not found: {dataset_dir.resolve()}")

    next_id = 0
    total_files = 0
    used_files = 0

    for person_dir in sorted(dataset_dir.iterdir()):
        if not person_dir.is_dir():
            continue
        label = person_dir.name
        if label not in label2id:
            label2id[label] = next_id
            next_id += 1

        for img_path in person_dir.glob("*.*"):
            if img_path.suffix.lower() not in [".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff", ".webp"]:
                continue
            total_files += 1
            img = cv2.imdecode(np.fromfile(str(img_path), dtype=np.uint8), cv2.IMREAD_COLOR)
            if img is None:
                continue

            gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

            # If the image is already cropped tightly around the face, skip detection and only resize.
            # Heuristic: a near-square image is assumed to be a pre-cropped face.
            h, w = gray.shape
            is_probably_cropped = (min(h, w) / max(h, w) > 0.85)

            faces = []
            if not is_probably_cropped:
                faces = face_cascade.detectMultiScale(gray, scaleFactor=1.2, minNeighbors=3, minSize=(MIN_FACE, MIN_FACE))

            if len(faces) == 0:
                # No face detected => use the whole image (for pre-cropped data)
                face_roi = cv2.resize(gray, FACE_SIZE, interpolation=cv2.INTER_AREA)
                images.append(face_roi)
                labels.append(label2id[label])
                used_files += 1
            else:
                # For images with several faces: keep the largest one
                x, y, w, h = max(faces, key=lambda b: b[2]*b[3])
                if min(w, h) < MIN_FACE:
                    continue
                roi = gray[y:y+h, x:x+w]
                roi = cv2.resize(roi, FACE_SIZE, interpolation=cv2.INTER_AREA)
                images.append(roi)
                labels.append(label2id[label])
                used_files += 1

    if len(images) == 0:
        raise RuntimeError(
            f"No valid training images found in '{dataset_dir}'. "
            f"Expected layout: '{dataset_dir}/<label>/*.jpg' with clearly visible faces."
        )

    print(f"[INFO] Ảnh đọc: {total_files}, Ảnh dùng để train: {used_files}, Số nhãn: {len(label2id)}")
    return images, labels, label2id

def sanity_check_model_file(p: Path):
    if not p.exists():
        raise FileNotFoundError(p)
    sz = p.stat().st_size
    with open(p, "r", errors="ignore") as f:
        head = f.read(64)
    print(f"[DEBUG] Model size={sz} bytes, head={repr(head)}")
    if (".xml" in p.suffix.lower() and "<?xml" not in head) or \
       (".yml" in p.suffix.lower() and "%YAML" not in head):
        raise RuntimeError("File ghi ra không phải định dạng FileStorage kỳ vọng.")
    
# 4) Train LBPH
def train_lbph(dataset_dir: Path, model_path: Path, labels_path: Path):
    images, labels, label2id = collect_images_and_labels(dataset_dir)
    labels_np = np.array(labels, dtype=np.int32)

    recognizer = cv2.face.LBPHFaceRecognizer_create(
        radius=LBPH_RADIUS,
        neighbors=LBPH_NEIGHBORS,
        grid_x=LBPH_GRID_X,
        grid_y=LBPH_GRID_Y
    )

    print("[INFO] Bắt đầu train LBPH...")
    recognizer.train(images, labels_np)

    print(f"[INFO] Lưu model → {model_path}")
    save_lbph_safely(recognizer, model_path)

    print(f"[INFO] Lưu labels → {labels_path}")
    with open(labels_path, "w", encoding="utf-8") as f:
        json.dump(label2id, f, ensure_ascii=False, indent=2)

    # Read the model back immediately as a check
    test_rec = cv2.face.LBPHFaceRecognizer_create(
        radius=LBPH_RADIUS, neighbors=LBPH_NEIGHBORS, grid_x=LBPH_GRID_X, grid_y=LBPH_GRID_Y
    )
    test_rec.read(str(model_path))
    print("[OK] Model LBPH (XML) đọc lại thành công.")

# 5) Run
#    - If the old model is corrupted or missing, retrain
#    - To force retraining, delete MODEL_PATH before running this cell
if not MODEL_PATH.exists():
    print("[INFO] Không có model hiện tại. Tiến hành train...")
    train_lbph(DATASET_DIR, MODEL_PATH, LABELS_PATH)
else:
    # Try reading it back to make sure it loads
    try:
        rec = cv2.face.LBPHFaceRecognizer_create(
            radius=LBPH_RADIUS, neighbors=LBPH_NEIGHBORS, grid_x=LBPH_GRID_X, grid_y=LBPH_GRID_Y
        )
        rec.read(str(MODEL_PATH))
        print("[OK] Model hiện có đọc được. Không cần train lại.")
    except cv2.error:
        print("[WARN] File model hiện tại không đọc được. Train lại...")
        train_lbph(DATASET_DIR, MODEL_PATH, LABELS_PATH)


[WARN] File model hiện tại không đọc được. Train lại...
[INFO] Ảnh đọc: 75, Ảnh dùng để train: 75, Số nhãn: 3
[INFO] Bắt đầu train LBPH...
[INFO] Lưu model → models\lbph_model.xml
[INFO] Lưu labels → models\labels.json


error: OpenCV(4.9.0) D:\a\opencv-python\opencv-python\opencv\modules\core\src\persistence.cpp:1601: error: (-215:Assertion failed) ofs == fs_data_blksz[blockIdx] in function 'cv::FileStorage::Impl::normalizeNodeOfs'



## 6. Quick evaluation / Đánh giá nhanh (holdout)
**EN:** Split the dataset into train/test and report simple accuracy.  
**VN:** Chia dữ liệu thành train/test và in độ chính xác đơn giản.

> If scikit-learn is not available, this cell will be skipped.


In [40]:

if SKLEARN_OK and len(X) > 2 and len(np.unique(y)) > 1:
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=RANDOM_SEED, stratify=y
    )
    rec_tmp = train_lbph(X_train, y_train)
    preds = []
    for img in X_test:
        # LBPH predict() returns (label_id, distance); a lower distance means a closer match.
        label_id, _dist = rec_tmp.predict(img)
        preds.append(label_id)
    class_ids = sorted(id2label.keys())
    # Pass labels= so target_names stays aligned even if a class is missing from the test split.
    print(classification_report(y_test, preds, labels=class_ids, target_names=[id2label[i] for i in class_ids]))
    print("Accuracy:", accuracy_score(y_test, preds))
else:
    print("[NOTE] Skip evaluation (missing sklearn or insufficient data).")


                     precision    recall  f1-score   support

Duong Ngo Nhat Minh       1.00      1.00      1.00         3
    Nguyen Quoc Dai       1.00      1.00      1.00         1
Nguyen Viet Gia Bao       1.00      1.00      1.00         2

           accuracy                           1.00         6
          macro avg       1.00      1.00      1.00         6
       weighted avg       1.00      1.00      1.00         6

Accuracy: 1.0
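
When scikit-learn is unavailable the evaluation is skipped entirely. A dependency-free stratified holdout can stand in for it; the sketch below is a simplified version written for illustration and is not scikit-learn's algorithm. `stratified_split` and `accuracy` are hypothetical helpers, and the label counts in the demo are made up.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_size=0.25, seed=42):
    """Return (train_idx, test_idx) with roughly test_size of each class held out.

    Classes with a single sample stay entirely in the training set.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        idxs = by_class[label][:]
        rng.shuffle(idxs)
        # Hold out at least one sample per class when the class has two or more.
        n_test = max(1, round(len(idxs) * test_size)) if len(idxs) > 1 else 0
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

labels = [0] * 12 + [1] * 4 + [2] * 8
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

With only a handful of test images per person, a single misclassification moves the accuracy by a large step, so a perfect score on a small holdout is weak evidence.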



## 7. Real-time recognition / Nhận diện thời gian thực
- Press **q** to quit.  
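- Unknown handling: if the distance exceeds the threshold, the face is labeled `UNKNOWN`. Note that `realtime_recognition` uses its own local `dist_threshold` (130, a loose value for initial calibration) rather than the global `DIST_THRESHOLD` (70).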
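
The decision rule can be checked in isolation before a camera is involved. The sketch below mirrors the notebook's distance-to-confidence mapping and adds a hypothetical `decide` helper that applies the threshold; the two distances are taken from the debug log of the sample run, with the calibration threshold of 130.

```python
def compute_confidence(dist, dist_threshold):
    """Map an LBPH distance to a 0-100 score; 0 at or beyond the threshold."""
    ratio = (dist_threshold - dist) / dist_threshold
    return max(0.0, min(1.0, ratio)) * 100.0

def decide(label_id, dist, dist_threshold, id2label):
    """Return (name, confidence); distances above the threshold become UNKNOWN."""
    if dist > dist_threshold:
        return "UNKNOWN", 0.0
    name = id2label.get(label_id, "UNKNOWN")
    return name, compute_confidence(dist, dist_threshold)

id2label = {0: "Duong Ngo Nhat Minh", 1: "Nguyen Quoc Dai"}
for label_id, dist in [(1, 123.60), (0, 133.37)]:
    name, conf = decide(label_id, dist, 130.0, id2label)
    print(f"{name}: dist={dist:.2f}, conf={conf:.1f}")
```

Because the score is linear in the distance, a match that sits just under a loose threshold gets a confidence near zero, which is what the single-digit `conf` values in the log reflect.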


In [41]:
# ==== FIXES for Section 7: Real-time recognition ====

def compute_confidence(dist, dist_threshold):
    """
    Convert LBPH distance (lower is better) to a 0-100 confidence-like score.
    If dist >= threshold -> return 0.  If dist == 0 -> return ~100.
    """
    if dist is None or np.isnan(dist):
        return 0.0
    conf = (dist_threshold - float(dist)) / float(dist_threshold)
    return float(max(0.0, min(1.0, conf)) * 100.0)

def load_model():
    model_path = MODEL_DIR / "lbph_model.yml"
    label_path = MODEL_DIR / "labels.json"

    # 1) Check that both files exist
    if not model_path.exists() or not label_path.exists():
        raise FileNotFoundError("Model or labels not found. Train first.")

    # 2) Check the size (an empty or tiny file is treated as corrupted)
    if model_path.stat().st_size < 100:
        raise ValueError(
            f"LBPH model file looks corrupted/empty ({model_path}, size={model_path.stat().st_size}B). "
            "Please retrain and save again."
        )

    # 3) Check the file header (OpenCV YAML/XML)
    with open(model_path, "r", errors="ignore") as f:
        head = f.read(64)
    if ("%YAML" not in head) and ("<?xml" not in head):
        raise ValueError(
            f"{model_path} is not a valid OpenCV YAML/XML. "
            "Did you save the model with recognizer.write(...)?"
        )

    # 4) Create the recognizer and read the model
    recognizer = cv2.face.LBPHFaceRecognizer_create(
        radius=LBPH_RADIUS,
        neighbors=LBPH_NEIGHBORS,
        grid_x=LBPH_GRID_X,
        grid_y=LBPH_GRID_Y
    )

    try:
        recognizer.read(str(model_path))
    except cv2.error as e:
        # Raise a clear message so you know a retrain is needed
        raise RuntimeError(
            "Failed to read LBPH model via OpenCV. The file is likely corrupted or saved by a different tool. "
            "Please retrain and re-save using recognizer.write(...)."
        ) from e

    # 5) Read the label map
    with open(label_path, "r", encoding="utf-8") as f:
        label2id = json.load(f)

    # Invert to id -> label for lookups at prediction time
    id2label = {int(v): k for k, v in label2id.items()}

    return recognizer, id2label

def realtime_recognition(cam_index=0, debug=False):
    recognizer, id2label = load_model()

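    # 1) Start with a loose threshold for the first calibration run.
    #    After reviewing the debug log, tighten it (for example 85–95).
    dist_threshold = 130

    # CAP_DSHOW is a Windows-only backend; let OpenCV pick the backend elsewhere.
    backend = cv2.CAP_DSHOW if os.name == "nt" else cv2.CAP_ANY
    cap = cv2.VideoCapture(cam_index, backend)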
    if not cap.isOpened():
        raise RuntimeError("Cannot open camera")

    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                continue

            faces, gray = detect_faces_bgr(frame, scaleFactor=1.2, minNeighbors=5, minSize=(70, 70))
            for (x, y, w, h) in faces:
                # 2) Preprocess exactly as during training
                face_img = preprocess_crop(gray, x, y, w, h, out_size=(200, 200))

                # 3) Predict
                label_id, dist = recognizer.predict(face_img)  # LBPH: lower dist = better
                conf = compute_confidence(dist, dist_threshold)

                if debug:
                    # Log actual distances so you can calibrate the threshold
                    print(f"[DEBUG] label_id={label_id}, name={id2label.get(label_id, '?')}, dist={dist:.2f}, conf={conf:.1f}")

                if dist <= dist_threshold:
                    name = id2label.get(label_id, "UNKNOWN")
                    color = (0, 255, 0)
                else:
                    name = "UNKNOWN"
                    color = (0, 0, 255)

                draw_face_box(frame, x, y, w, h, color, f"{name} | conf:{conf:.1f} | dist:{dist:.1f}")

            cv2.imshow("Real-time Recognition (press 'q' to quit)", frame)
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break
    finally:
        cap.release()
        cv2.destroyAllWindows()

# Try it (enable debug to see the actual distances):
realtime_recognition(cam_index=0, debug=True)


[DEBUG] label_id=1, name=Nguyen Quoc Dai, dist=123.60, conf=4.9
[DEBUG] label_id=1, name=Nguyen Quoc Dai, dist=117.20, conf=9.8
[DEBUG] label_id=1, name=Nguyen Quoc Dai, dist=129.14, conf=0.7
[DEBUG] label_id=1, name=Nguyen Quoc Dai, dist=128.47, conf=1.2
[DEBUG] label_id=1, name=Nguyen Quoc Dai, dist=127.02, conf=2.3
[DEBUG] label_id=1, name=Nguyen Quoc Dai, dist=130.66, conf=0.0
[DEBUG] label_id=0, name=Duong Ngo Nhat Minh, dist=128.85, conf=0.9
[DEBUG] label_id=0, name=Duong Ngo Nhat Minh, dist=129.02, conf=0.8
[DEBUG] label_id=0, name=Duong Ngo Nhat Minh, dist=129.52, conf=0.4
[DEBUG] label_id=1, name=Nguyen Quoc Dai, dist=121.15, conf=6.8
[DEBUG] label_id=1, name=Nguyen Quoc Dai, dist=126.27, conf=2.9
[DEBUG] label_id=1, name=Nguyen Quoc Dai, dist=124.92, conf=3.9
[DEBUG] label_id=1, name=Nguyen Quoc Dai, dist=121.30, conf=6.7
[DEBUG] label_id=0, name=Duong Ngo Nhat Minh, dist=133.37, conf=0.0
[DEBUG] label_id=0, name=Duong Ngo Nhat Minh, dist=140.65, conf=0.0
[DEBUG] label_id=0, 


## 8. Improve accuracy / Mẹo tăng độ chính xác
**Data tips:**
- Capture more faces per person (≥ 20), with **lighting variations**, **angles**, **expressions**.
- Keep the face crop consistent (same size, centered). Use `preprocess_crop` (resize + equalize).
- Remove blurred images from your dataset.

**Training tips:**
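- Tune `DIST_THRESHOLD` (try 60–80) against the distances printed with `debug=True`. The sample run in Section 7 logged distances of roughly 117–141, so a 60–80 threshold would have labeled every one of those frames `UNKNOWN`. Lower threshold → fewer false positives, more UNKNOWN.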
- Adjust LBPH params (`radius`, `neighbors`, `grid_x`, `grid_y`).
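
One way to pick a threshold is to sweep candidate values and look at the trade-off directly. The sketch below assumes you have collected two lists of LBPH distances: one for faces matched to the correct person and one for faces of people outside the dataset. `sweep_thresholds` is a hypothetical helper and the distances are invented for illustration only.

```python
def sweep_thresholds(genuine, impostor, thresholds):
    """For each threshold, report the false-reject and false-accept rates.

    genuine:  distances for faces matched to the correct person
    impostor: distances for faces of people not in the dataset
    A face is accepted when its distance is <= threshold.
    """
    rows = []
    for t in thresholds:
        frr = sum(d > t for d in genuine) / len(genuine)
        far = sum(d <= t for d in impostor) / len(impostor)
        rows.append((t, frr, far))
    return rows

# Made-up distances for illustration only.
genuine = [52.0, 61.5, 66.0, 74.0, 88.0]
impostor = [79.0, 95.0, 102.0, 110.0]
rows = sweep_thresholds(genuine, impostor, [60, 70, 80, 90])
for t, frr, far in rows:
    print(f"threshold={t}: FRR={frr:.2f}, FAR={far:.2f}")
```

A reasonable starting point is the lowest threshold at which the false-reject rate becomes acceptable, since raising it further only admits more impostors.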

**Augmentation (optional):**
- Slight rotations (±10°), horizontal flip, brightness/contrast jitter.
- Histogram equalization is already applied in this template.
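
The flip and brightness/contrast ideas can be sketched without any image library by treating a grayscale image as a list of rows. This is a toy version for illustration: the function names are hypothetical, the jitter values are arbitrary, and rotation is left out because it needs interpolation.

```python
def flip_horizontal(img):
    """Mirror a grayscale image stored as a list of rows."""
    return [row[::-1] for row in img]

def adjust_brightness_contrast(img, alpha=1.0, beta=0):
    """Apply pixel * alpha + beta, clamped to the 0-255 range."""
    return [
        [max(0, min(255, int(round(p * alpha + beta)))) for p in row]
        for row in img
    ]

def augment(img):
    """Return the original image plus a few simple variants."""
    return [
        img,
        flip_horizontal(img),
        adjust_brightness_contrast(img, beta=30),    # brighter
        adjust_brightness_contrast(img, beta=-30),   # darker
        adjust_brightness_contrast(img, alpha=1.2),  # more contrast
    ]

tiny = [[0, 100, 250], [10, 20, 30]]
for variant in augment(tiny):
    print(variant)
```

Augmented copies should go into the training set only; keeping them out of the holdout split avoids evaluating on near-duplicates of training images.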

**When to switch to deep features:**
- If you need higher accuracy across many identities and conditions, try `facenet`-style embeddings or `face_recognition` (dlib/HOG or CNN) and train a classifier (SVM/LogReg) on embeddings.



## 9. Utilities: maintenance / Tiện ích dọn dẹp


In [8]:

def remove_person(person_name):
    person_dir = DATA_DIR / person_name
    if person_dir.exists():
        for p in person_dir.glob("*.png"):
            p.unlink(missing_ok=True)
        try:
            person_dir.rmdir()
            print(f"[CLEAN] Removed folder: {person_dir}")
        except OSError:
            print("[WARN] Folder not empty or in use.")

def wipe_all_data():
    for child in DATA_DIR.glob("*"):
        if child.is_dir():
            for p in child.glob("*.png"):
                p.unlink(missing_ok=True)
            try:
                child.rmdir()
            except OSError:
                pass
    print("[CLEAN] Wiped dataset.")
