# Production-Grade 2D Face Recognition System
## Cell 0: Install Dependencies

This cell installs all required packages for the face recognition pipeline:
- **ultralytics**: YOLOv8-Face detection
- **insightface**: ArcFace embeddings and landmarks
- **onnxruntime**: ONNX model inference
- **scikit-learn**: Classifier training (SVM/KNN)
- **opencv-python**: Image processing
- **numpy, pandas, tqdm**: Data handling and progress bars


In [44]:
# Install all required dependencies
%pip install -q ultralytics insightface onnxruntime scikit-learn opencv-python pillow numpy pandas tqdm

print("‚úÖ All dependencies installed successfully")


Note: you may need to restart the kernel to use updated packages.
‚úÖ All dependencies installed successfully


## Cell 1: Import All Modules

Import all necessary libraries for face detection, alignment, embedding extraction, and classification.


In [45]:
import os
import cv2
import numpy as np
import json
import pickle
from pathlib import Path
from tqdm import tqdm
from PIL import Image
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import cross_val_score
import warnings
warnings.filterwarnings('ignore')

# Face recognition specific imports
from ultralytics import YOLO
import insightface
from insightface.utils.face_align import norm_crop

print("‚úÖ All modules imported successfully")

# Test: Verify key imports
assert YOLO is not None, "YOLO import failed"
assert insightface is not None, "insightface import failed"
print("‚úÖ Import verification passed")


‚úÖ All modules imported successfully
‚úÖ Import verification passed


## Cell 2: Configuration and Paths

Set up all file paths, dataset locations, and artifact directories. Create necessary folders if they don't exist.


In [46]:
# ==================== CONFIGURATION ====================

# Base directory
BASE_DIR = Path("/mnt/NewDisk/sahil_project/FRS copy")
DATASET_DIR = BASE_DIR / "Small_Dataset"
ARTIFACTS_DIR = BASE_DIR / "Artifacts"

# Create artifact subdirectories
EMBEDDINGS_DIR = ARTIFACTS_DIR / "embeddings"
ALIGNED_FACES_DIR = ARTIFACTS_DIR / "aligned_faces"
DETECTION_RESULTS_DIR = ARTIFACTS_DIR / "detection_results"
LOGS_DIR = ARTIFACTS_DIR / "logs"

# Create directories
for dir_path in [ARTIFACTS_DIR, EMBEDDINGS_DIR, ALIGNED_FACES_DIR, DETECTION_RESULTS_DIR, LOGS_DIR]:
    dir_path.mkdir(parents=True, exist_ok=True)

# YOLOv8-Face model path
YOLO_MODEL_PATH = BASE_DIR / "yolov8n-face.pt"

# Recognition threshold (angular distance in radians)
RECOGNITION_THRESHOLD = 0.5  # Will be calibrated during training
UNKNOWN_THRESHOLD = 0.6  # Threshold for unknown users

print("‚úÖ Configuration loaded")
print(f"   Dataset: {DATASET_DIR}")
print(f"   Artifacts: {ARTIFACTS_DIR}")
print(f"   YOLO Model: {YOLO_MODEL_PATH}")

# Test: Verify paths exist
assert DATASET_DIR.exists(), f"Dataset directory not found: {DATASET_DIR}"
assert YOLO_MODEL_PATH.exists(), f"YOLO model not found: {YOLO_MODEL_PATH}"
assert ARTIFACTS_DIR.exists(), f"Artifacts directory not created: {ARTIFACTS_DIR}"
print("‚úÖ Path verification passed")


‚úÖ Configuration loaded
   Dataset: /mnt/NewDisk/sahil_project/FRS copy/Small_Dataset
   Artifacts: /mnt/NewDisk/sahil_project/FRS copy/Artifacts
   YOLO Model: /mnt/NewDisk/sahil_project/FRS copy/yolov8n-face.pt
‚úÖ Path verification passed


## Cell 3: Initialize YOLOv8-Face Detector

Initialize the YOLOv8-Face model for face detection. YOLOv8-Face provides fast and accurate face bounding box detection.


In [47]:
# Initialize YOLOv8-Face detector
print("üîÑ Loading YOLOv8-Face detector...")
face_detector = YOLO(str(YOLO_MODEL_PATH))
print("‚úÖ YOLOv8-Face detector initialized")

# Test: Verify detector can be called
test_img = np.zeros((640, 640, 3), dtype=np.uint8)
test_results = face_detector(test_img, verbose=False)
assert test_results is not None, "YOLO detector failed to process test image"
print("‚úÖ YOLOv8-Face detector test passed")


üîÑ Loading YOLOv8-Face detector...
‚úÖ YOLOv8-Face detector initialized
‚úÖ YOLOv8-Face detector test passed


## Cell 4: Initialize InsightFace Models

Initialize InsightFace with buffalo_l model which includes:
- **RetinaFace detector** (used for landmark extraction only)
- **5-point facial landmarks** (eyes, nose, mouth corners)
- **ArcFace r100 embedder** (512-dimensional face embeddings)


In [48]:
# Initialize InsightFace FaceAnalysis model (buffalo_l)
print("üîÑ Loading InsightFace buffalo_l model...")
print("   (This may take a minute - downloading models on first run)")

face_model = insightface.app.FaceAnalysis(name="buffalo_l", providers=['CPUExecutionProvider'])
face_model.prepare(ctx_id=0, det_size=(640, 640))

print("‚úÖ InsightFace model initialized")
print(f"   Models: Detection, Landmarks (106 points), ArcFace Embedding")

# Test: Verify InsightFace can process an image
test_img = np.zeros((640, 640, 3), dtype=np.uint8)
test_faces = face_model.get(test_img)
assert isinstance(test_faces, list), "InsightFace failed to process test image"
print("‚úÖ InsightFace model test passed")


üîÑ Loading InsightFace buffalo_l model...
   (This may take a minute - downloading models on first run)
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: /home/matrix/.insightface/models/buffalo_l/1k3d68.onnx landmark_3d_68 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: /home/matrix/.insightface/models/buffalo_l/2d106det.onnx landmark_2d_106 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: /home/matrix/.insightface/models/buffalo_l/det_10g.onnx detection [1, 3, '?', '?'] 127.5 128.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: /home/matrix/.insightface/models/buffalo_l/genderage.onnx genderage ['None', 3, 96, 96] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: 

## Cell 5: Helper Functions

Define core helper functions for:
1. **Loading images** from file paths
2. **YOLOv8 face detection** with confidence thresholding
3. **Extracting bounding boxes** from YOLO results
4. **Landmark detection** using InsightFace on cropped faces
5. **Face alignment** using 5-point landmarks and norm_crop
6. **ArcFace embedding extraction** from aligned faces


In [49]:
def load_image(image_path):
    """
    Load image from file path and convert to BGR format (OpenCV standard)
    
    Args:
        image_path: Path to image file
    
    Returns:
        numpy array: Image in BGR format, or None if loading fails
    """
    try:
        img = cv2.imread(str(image_path))
        if img is None:
            return None
        return img
    except Exception as e:
        print(f"Error loading image {image_path}: {e}")
        return None


def detect_faces_yolo(img, conf_threshold=0.4):
    """
    Detect faces using YOLOv8-Face detector
    
    Args:
        img: Image in BGR format (numpy array)
        conf_threshold: Confidence threshold for detections
    
    Returns:
        List of detection results, each containing bbox and confidence
    """
    results = face_detector(img, conf=conf_threshold, verbose=False)
    
    detections = []
    for result in results:
        boxes = result.boxes
        for box in boxes:
            bbox = box.xyxy[0].cpu().numpy().astype(int)  # [x1, y1, x2, y2]
            confidence = float(box.conf.item())
            detections.append({
                'bbox': bbox,
                'confidence': confidence
            })
    
    return detections


def extract_landmarks_insightface(img, bbox, return_embedding=False):
    """
    Extract 5-point facial landmarks from a detected face region
    
    Args:
        img: Full image in BGR format
        bbox: Bounding box [x1, y1, x2, y2]
        return_embedding: If True, also return the ArcFace embedding
    
    Returns:
        numpy array: 5 keypoints [[x, y], ...] or None if not found
        If return_embedding=True, returns (landmarks_5, embedding) tuple
    """
    x1, y1, x2, y2 = bbox
    
    # Crop face region with padding
    padding = 0.2
    h, w = img.shape[:2]
    width = x2 - x1
    height = y2 - y1
    
    crop_x1 = max(0, int(x1 - width * padding))
    crop_y1 = max(0, int(y1 - height * padding))
    crop_x2 = min(w, int(x2 + width * padding))
    crop_y2 = min(h, int(y2 + height * padding))
    
    face_crop = img[crop_y1:crop_y2, crop_x1:crop_x2]
    
    if face_crop.size == 0:
        return (None, None) if return_embedding else None
    
    # Get landmarks from InsightFace
    faces = face_model.get(face_crop)
    
    if len(faces) == 0:
        return (None, None) if return_embedding else None
    
    # Get the best face in the crop
    best_face = max(faces, key=lambda f: f.det_score)
    
    # Extract 5-point landmarks (kps attribute)
    landmarks_5 = best_face.kps.copy()  # Shape: (5, 2)
    
    # Adjust landmarks to original image coordinates
    landmarks_5[:, 0] += crop_x1
    landmarks_5[:, 1] += crop_y1
    
    if return_embedding:
        # Extract embedding from the detected face
        embedding = best_face.embedding.copy()  # 512-dimensional vector
        # Normalize embedding (L2 normalization)
        norm = np.linalg.norm(embedding)
        if norm > 0:
            embedding = embedding / norm
        else:
            embedding = None
        return landmarks_5, embedding
    
    return landmarks_5


def align_face(img, landmarks_5, output_size=112):
    """
    Align face using 5-point landmarks with norm_crop from InsightFace
    
    Args:
        img: Image in BGR format
        landmarks_5: 5 keypoints [[x, y], ...]
        output_size: Size of aligned face (integer, e.g., 112 for 112x112 square)
    
    Returns:
        numpy array: Aligned face in RGB format (112x112), or None if alignment fails
    """
    if landmarks_5 is None:
        return None
    
    try:
        # Crop and align using InsightFace's norm_crop
        # norm_crop expects image_size as an integer (for square output), not a tuple
        if isinstance(output_size, tuple):
            output_size = output_size[0]  # Use first value if tuple provided
        
        aligned = norm_crop(img, landmarks_5, image_size=output_size)
        return aligned  # norm_crop returns RGB format
    except Exception as e:
        print(f"Error in face alignment: {e}")
        return None


def extract_arcface_embedding(aligned_face_rgb):
    """
    Extract 512-dimensional ArcFace embedding from aligned face
    
    Why extraction from aligned faces sometimes fails:
    - InsightFace's face_model.get() first runs face detection
    - On an already-aligned 112x112 face, the detector may fail because:
      1. The face lacks surrounding context the detector expects
      2. The aligned face might not meet the detector's confidence threshold
      3. The detector is trained on full images, not pre-aligned faces
    
    Solution: Use the embedding extracted during landmark detection (already available)
    
    Args:
        aligned_face_rgb: Aligned face image in RGB format (112x112)
    
    Returns:
        numpy array: 512-dimensional embedding vector, or None if extraction fails
    """
    try:
        # Convert RGB to BGR for InsightFace
        aligned_bgr = cv2.cvtColor(aligned_face_rgb, cv2.COLOR_RGB2BGR)
        
        # Try using FaceAnalysis.get() on aligned face
        # This often fails because the detector expects a full image with context
        faces = face_model.get(aligned_bgr)
        
        if len(faces) > 0:
            # Success - detector found the face in aligned image
            best_face = max(faces, key=lambda f: f.det_score)
            embedding = best_face.embedding
        else:
            # Detection failed - this is expected for aligned faces
            # The embedding from the original crop (extracted during landmark detection)
            # is actually more reliable and will be used as fallback
            return None
        
        # Normalize embedding (L2 normalization)
        embedding = embedding.astype(np.float32)
        norm = np.linalg.norm(embedding)
        if norm > 0:
            embedding = embedding / norm
        else:
            return None
        
        return embedding
    except Exception as e:
        # Silently fail - fallback embedding will be used
        return None


print("‚úÖ All helper functions defined")
print("   - load_image()")
print("   - detect_faces_yolo()")
print("   - extract_landmarks_insightface()")
print("   - align_face()")
print("   - extract_arcface_embedding()")

# Test: Verify functions are callable
test_img = np.random.randint(0, 255, (640, 480, 3), dtype=np.uint8)
test_detections = detect_faces_yolo(test_img, conf_threshold=0.1)
assert isinstance(test_detections, list), "detect_faces_yolo() failed"
print("‚úÖ Helper functions test passed")


‚úÖ All helper functions defined
   - load_image()
   - detect_faces_yolo()
   - extract_landmarks_insightface()
   - align_face()
   - extract_arcface_embedding()
‚úÖ Helper functions test passed


## Cell 6: Test Detection + Alignment + Embedding Pipeline

Test the complete pipeline on a single sample image to verify all components work correctly together. This helps catch errors early.


## Cell 7: Dataset Loader - Process All Person Folders

Iterate through all person folders in the dataset, detect faces, align them, extract embeddings, and save:
- Embeddings per person (`.npy` files in `Artifacts/embeddings/`)
- Aligned face crops (optional, saved in `Artifacts/aligned_faces/`)
- Detection logs (saved in `Artifacts/logs/`)


In [50]:
# Find a test image from the dataset
test_image_path = None
for person_dir in DATASET_DIR.iterdir():
    if person_dir.is_dir():
        image_files = list(person_dir.glob("*.jpg")) + list(person_dir.glob("*.png")) + list(person_dir.glob("*.jpeg"))
        if image_files:
            test_image_path = "/mnt/NewDisk/sahil_project/FRS copy/Small_Dataset/6003196229/6003196229_A08A0HEU0S.png"
            break

if test_image_path is None:
    print("‚ö†Ô∏è No test image found in dataset. Creating a synthetic test...")
    test_img = np.random.randint(0, 255, (640, 480, 3), dtype=np.uint8)
else:
    print(f"üß™ Testing pipeline on: {test_image_path}")
    test_img = load_image(test_image_path)

if test_img is None:
    raise ValueError("Could not load test image")

# Step 1: Detect faces
print("Step 1: Detecting faces with YOLOv8...")
detections = detect_faces_yolo(test_img, conf_threshold=0.3)
print(f"   Found {len(detections)} face(s)")

if len(detections) == 0:
    print("‚ö†Ô∏è No faces detected in test image. Pipeline test skipped.")
else:
    # Step 2: Extract landmarks AND embedding together (more reliable)
    print("Step 2: Extracting landmarks and embedding...")
    bbox = detections[0]['bbox']
    landmarks, embedding_from_crop = extract_landmarks_insightface(test_img, bbox, return_embedding=True)
    print(f"   Landmarks extracted: {landmarks is not None}")
    print(f"   Embedding extracted from crop: {embedding_from_crop is not None}")
    
    if landmarks is not None:
        # Step 3: Align face
        print("Step 3: Aligning face...")
        aligned_face = align_face(test_img, landmarks)
        print(f"   Face aligned: {aligned_face is not None}, Shape: {aligned_face.shape if aligned_face is not None else None}")
        
        if aligned_face is not None:
            # Step 4: Try extracting embedding from aligned face (preferred for accuracy)
            print("Step 4: Extracting ArcFace embedding from aligned face...")
            embedding_from_aligned = extract_arcface_embedding(aligned_face)
            
            # Use embedding from aligned face if successful, otherwise use the one from crop
            if embedding_from_aligned is not None:
                embedding = embedding_from_aligned
                print(f"   ‚úÖ Using embedding from aligned face")
            elif embedding_from_crop is not None:
                embedding = embedding_from_crop
                print(f"   ‚ö†Ô∏è Using embedding from original crop (aligned extraction failed)")
            else:
                embedding = None
            
            print(f"   Embedding extracted: {embedding is not None}, Shape: {embedding.shape if embedding is not None else None}")
            
            # Verify embedding dimensions
            if embedding is not None:
                assert embedding.shape == (512,), f"Expected embedding shape (512,), got {embedding.shape}"
                assert np.isclose(np.linalg.norm(embedding), 1.0), "Embedding should be L2 normalized"
                print("‚úÖ Pipeline test PASSED - All components working correctly!")
            else:
                print("‚ùå Pipeline test FAILED - Embedding extraction failed")
        else:
            print("‚ùå Pipeline test FAILED - Face alignment failed")
    else:
        print("‚ùå Pipeline test FAILED - Landmark extraction failed")


üß™ Testing pipeline on: /mnt/NewDisk/sahil_project/FRS copy/Small_Dataset/6003196229/6003196229_A08A0HEU0S.png
Step 1: Detecting faces with YOLOv8...
   Found 1 face(s)
Step 2: Extracting landmarks and embedding...
   Landmarks extracted: True
   Embedding extracted from crop: True
Step 3: Aligning face...
   Face aligned: True, Shape: (112, 112, 3)
Step 4: Extracting ArcFace embedding from aligned face...
   ‚ö†Ô∏è Using embedding from original crop (aligned extraction failed)
   Embedding extracted: True, Shape: (512,)
‚úÖ Pipeline test PASSED - All components working correctly!


In [52]:
# Dataset processing: Extract embeddings for all persons
print("üîÑ Processing dataset...")
print(f"   Dataset directory: {DATASET_DIR}")

# Statistics
total_persons = 0
total_images = 0
successful_embeddings = 0
failed_images = []

# Get all person directories
person_dirs = [d for d in DATASET_DIR.iterdir() if d.is_dir()]
person_dirs.sort()

print(f"   Found {len(person_dirs)} person(s)")

# Process each person
for person_dir in tqdm(person_dirs, desc="Processing persons"):
    person_name = person_dir.name
    total_persons += 1
    
    # Create embeddings directory for this person
    person_emb_dir = EMBEDDINGS_DIR / person_name
    person_emb_dir.mkdir(exist_ok=True)
    
    # Get all images for this person
    image_extensions = ['*.jpg', '*.jpeg', '*.png', '*.JPG', '*.JPEG', '*.PNG']
    image_files = []
    for ext in image_extensions:
        image_files.extend(person_dir.glob(ext))
    
    person_embeddings = []
    
    # Process each image
    for img_path in image_files:
        total_images += 1
        
        # Load image
        img = load_image(img_path)
        if img is None:
            failed_images.append((person_name, img_path.name, "Failed to load image"))
            continue
        
        # Detect faces
        detections = detect_faces_yolo(img, conf_threshold=0.4)
        
        if len(detections) == 0:
            failed_images.append((person_name, img_path.name, "No faces detected"))
            continue
        
        # Process the most confident detection
        best_detection = max(detections, key=lambda x: x['confidence'])
        bbox = best_detection['bbox']
        
        # Extract landmarks AND embedding together (more reliable)
        try:
            result = extract_landmarks_insightface(img, bbox, return_embedding=True)
            if result is None:
                failed_images.append((person_name, img_path.name, "Landmark extraction failed"))
                continue
            
            landmarks, embedding_from_crop = result
            if landmarks is None:
                failed_images.append((person_name, img_path.name, "Landmark extraction failed"))
                continue
        except Exception as e:
            failed_images.append((person_name, img_path.name, f"Landmark extraction error: {str(e)}"))
            continue
        
        # Align face
        try:
            aligned_face = align_face(img, landmarks)
            if aligned_face is None:
                failed_images.append((person_name, img_path.name, "Face alignment failed"))
                continue
        except Exception as e:
            failed_images.append((person_name, img_path.name, f"Face alignment error: {str(e)}"))
            continue
        
        # Try extracting embedding from aligned face (preferred for accuracy)
        embedding_from_aligned = None
        try:
            embedding_from_aligned = extract_arcface_embedding(aligned_face)
        except Exception as e:
            # Silently fail - will use fallback embedding from crop
            pass
        
        # Use embedding from aligned face if successful, otherwise use the one from crop
        if embedding_from_aligned is not None:
            embedding = embedding_from_aligned
        elif embedding_from_crop is not None:
            embedding = embedding_from_crop
        else:
            failed_images.append((person_name, img_path.name, "Embedding extraction failed"))
            continue
        
        # Verify embedding is valid
        if embedding is None or embedding.shape != (512,):
            failed_images.append((person_name, img_path.name, f"Invalid embedding shape: {embedding.shape if embedding is not None else None}"))
            continue
        
        # Save embedding
        embedding_filename = f"{img_path.stem}.npy"
        embedding_path = person_emb_dir / embedding_filename
        np.save(embedding_path, embedding)
        
        # Optionally save aligned face (comment out if not needed to save space)
        # aligned_path = ALIGNED_FACES_DIR / person_name / f"{img_path.stem}.jpg"
        # aligned_path.parent.mkdir(exist_ok=True)
        # cv2.imwrite(str(aligned_path), cv2.cvtColor(aligned_face, cv2.COLOR_RGB2BGR))
        
        person_embeddings.append({
            'person': person_name,
            'image': img_path.name,
            'embedding_path': str(embedding_path),
            'embedding': embedding
        })
        
        successful_embeddings += 1
    
    # Save metadata for this person
    if person_embeddings:
        person_metadata = {
            'person_name': person_name,
            'num_images': len(person_embeddings),
            'embedding_files': [e['embedding_path'] for e in person_embeddings]
        }
        metadata_path = person_emb_dir / "metadata.json"
        with open(metadata_path, 'w') as f:
            json.dump(person_metadata, f, indent=2)

# Save processing log
log_data = {
    'total_persons': total_persons,
    'total_images': total_images,
    'successful_embeddings': successful_embeddings,
    'failed_images': failed_images,
    'success_rate': successful_embeddings / total_images if total_images > 0 else 0
}

log_path = LOGS_DIR / "dataset_processing_log.json"
with open(log_path, 'w') as f:
    json.dump(log_data, f, indent=2)

print("\n‚úÖ Dataset processing completed!")
print(f"   Total persons: {total_persons}")
print(f"   Total images: {total_images}")
print(f"   Successful embeddings: {successful_embeddings}")
print(f"   Failed images: {len(failed_images)}")
print(f"   Success rate: {log_data['success_rate']:.2%}")

if failed_images:
    print(f"\n‚ö†Ô∏è Failed images saved to: {log_path}")

# Test: Verify embeddings were created
assert successful_embeddings > 0, "No embeddings were successfully created"
assert (EMBEDDINGS_DIR).exists(), "Embeddings directory not found"
print("‚úÖ Dataset processing test passed")


üîÑ Processing dataset...
   Dataset directory: /mnt/NewDisk/sahil_project/FRS copy/Small_Dataset
   Found 104 person(s)


Processing persons: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 104/104 [28:52<00:00, 16.66s/it]


‚úÖ Dataset processing completed!
   Total persons: 104
   Total images: 4116
   Successful embeddings: 2268
   Failed images: 1848
   Success rate: 55.10%

‚ö†Ô∏è Failed images saved to: /mnt/NewDisk/sahil_project/FRS copy/Artifacts/logs/dataset_processing_log.json
‚úÖ Dataset processing test passed





## Cell 8: Combine All Embeddings into Master Database

Load all embeddings from individual `.npy` files and combine them into:
- **X**: Master embedding matrix (N samples √ó 512 dimensions)
- **y**: Label vector (N samples)
- **label_encoder**: Maps person names to integer labels

Save the unified database to the Artifacts folder.


In [None]:
# Combine all embeddings into master database
print("üîÑ Combining all embeddings into master database...")

all_embeddings = []
all_labels = []
person_to_index = {}
index_to_person = {}
skipped_persons = []

# Load embeddings from all person directories
person_dirs = [d for d in EMBEDDINGS_DIR.iterdir() if d.is_dir()]
person_dirs.sort()

for person_dir in person_dirs:
    person_name = person_dir.name
    
    # Load all embeddings for this person
    embedding_files = list(person_dir.glob("*.npy"))
    
    # Only process persons that have actual embedding files
    if len(embedding_files) == 0:
        skipped_persons.append(person_name)
        continue  # Skip persons with no embeddings
    
    # Add person to mapping only if they have embeddings
    if person_name not in person_to_index:
        idx = len(person_to_index)
        person_to_index[person_name] = idx
        index_to_person[idx] = person_name
    
    for emb_file in embedding_files:
        embedding = np.load(emb_file)
        all_embeddings.append(embedding)
        all_labels.append(person_name)

# Warn about skipped persons if any
if skipped_persons:
    print(f"   ‚ö†Ô∏è Skipped {len(skipped_persons)} person(s) with no embeddings: {', '.join(skipped_persons[:5])}{'...' if len(skipped_persons) > 5 else ''}")

# Convert to numpy arrays
X = np.array(all_embeddings)  # Shape: (N, 512)
y = np.array(all_labels)      # Shape: (N,)

print(f"   Total embeddings: {X.shape[0]}")
print(f"   Embedding dimension: {X.shape[1]}")
print(f"   Number of unique persons: {len(person_to_index)}")

# Encode labels to integers
label_encoder = LabelEncoder()
y_encoded = label_encoder.fit_transform(y)

# Save master database
database_path = ARTIFACTS_DIR / "embedding_database.npy"
labels_path = ARTIFACTS_DIR / "labels.npy"
label_encoder_path = ARTIFACTS_DIR / "label_encoder.pkl"
person_mapping_path = ARTIFACTS_DIR / "person_mapping.json"

np.save(database_path, X)
np.save(labels_path, y_encoded)

with open(label_encoder_path, 'wb') as f:
    pickle.dump(label_encoder, f)

with open(person_mapping_path, 'w') as f:
    json.dump({
        'person_to_index': person_to_index,
        'index_to_person': index_to_person
    }, f, indent=2)

print("‚úÖ Master database created and saved")
print(f"   Database: {database_path}")
print(f"   Labels: {labels_path}")
print(f"   Label encoder: {label_encoder_path}")
print(f"   Person mapping: {person_mapping_path}")

# Test: Verify database structure
assert X.shape[1] == 512, f"Expected embedding dimension 512, got {X.shape[1]}"
assert len(y_encoded) == X.shape[0], "Mismatch between embeddings and labels"
unique_labels_in_y = len(set(y))
unique_persons_in_mapping = len(person_to_index)
assert unique_labels_in_y == unique_persons_in_mapping, \
    f"Person mapping mismatch: {unique_labels_in_y} unique labels in y, but {unique_persons_in_mapping} persons in mapping. " \
    f"This may occur if some person directories have no embedding files."
print("‚úÖ Database combination test passed")


üîÑ Combining all embeddings into master database...


NameError: name 'EMBEDDINGS_DIR' is not defined

## Cell 9: Train Classifier (SVM vs KNN)

Compare SVM-RBF and KNN classifiers using cross-validation, select the best one, train it on the full dataset, and save the model. The classifier will be used for fast face recognition.


In [54]:
# Train classifier: Compare SVM-RBF vs KNN
print("üîÑ Training classifier...")
print(f"   Dataset size: {X.shape[0]} samples, {len(np.unique(y_encoded))} classes")

# Evaluate SVM-RBF
print("\nüìä Evaluating SVM-RBF classifier...")
svm_classifier = SVC(kernel='rbf', probability=True, random_state=42)
svm_scores = cross_val_score(svm_classifier, X, y_encoded, cv=min(5, len(np.unique(y_encoded))), scoring='accuracy')
svm_mean_score = svm_scores.mean()
svm_std_score = svm_scores.std()
print(f"   SVM-RBF: {svm_mean_score:.4f} (+/- {svm_std_score*2:.4f})")

# Evaluate KNN
print("\nüìä Evaluating KNN classifier...")
knn_classifier = KNeighborsClassifier(n_neighbors=min(5, len(np.unique(y_encoded))-1))
knn_scores = cross_val_score(knn_classifier, X, y_encoded, cv=min(5, len(np.unique(y_encoded))), scoring='accuracy')
knn_mean_score = knn_scores.mean()
knn_std_score = knn_scores.std()
print(f"   KNN: {knn_mean_score:.4f} (+/- {knn_std_score*2:.4f})")

# Select best classifier
if svm_mean_score >= knn_mean_score:
    print("\n‚úÖ Selected: SVM-RBF (better accuracy)")
    best_classifier = svm_classifier
    classifier_type = "SVM-RBF"
    best_score = svm_mean_score
else:
    print("\n‚úÖ Selected: KNN (better accuracy)")
    best_classifier = knn_classifier
    classifier_type = "KNN"
    best_score = knn_mean_score

# Train on full dataset
print(f"\nüîÑ Training {classifier_type} on full dataset...")
best_classifier.fit(X, y_encoded)
print("‚úÖ Classifier trained")

# Save classifier
classifier_path = ARTIFACTS_DIR / "face_classifier.pkl"
with open(classifier_path, 'wb') as f:
    pickle.dump(best_classifier, f)

# Save classifier metadata
classifier_metadata = {
    'classifier_type': classifier_type,
    'cv_score_mean': float(best_score),
    'num_classes': len(np.unique(y_encoded)),
    'num_samples': X.shape[0]
}

classifier_metadata_path = ARTIFACTS_DIR / "classifier_metadata.json"
with open(classifier_metadata_path, 'w') as f:
    json.dump(classifier_metadata, f, indent=2)

print(f"   Saved to: {classifier_path}")
print(f"   Metadata: {classifier_metadata_path}")

# Test: Verify classifier can make predictions
test_pred = best_classifier.predict(X[:5])
test_proba = best_classifier.predict_proba(X[:5])
assert len(test_pred) == 5, "Classifier prediction failed"
assert test_proba.shape[1] == len(np.unique(y_encoded)), "Classifier probability shape mismatch"
print("‚úÖ Classifier training test passed")


üîÑ Training classifier...
   Dataset size: 2268 samples, 104 classes

üìä Evaluating SVM-RBF classifier...
   SVM-RBF: 0.9960 (+/- 0.0033)

üìä Evaluating KNN classifier...
   KNN: 0.9969 (+/- 0.0045)

‚úÖ Selected: KNN (better accuracy)

üîÑ Training KNN on full dataset...
‚úÖ Classifier trained
   Saved to: /mnt/NewDisk/sahil_project/FRS copy/Artifacts/face_classifier.pkl
   Metadata: /mnt/NewDisk/sahil_project/FRS copy/Artifacts/classifier_metadata.json
‚úÖ Classifier training test passed


## Cell 10: Calibrate Recognition Threshold

Calculate the optimal threshold for distinguishing known vs unknown faces. This uses angular distance (cosine similarity) between embeddings to determine if a face belongs to a known person or is unknown.


In [55]:
# Calibrate recognition threshold using angular distance
print("üîÑ Calibrating recognition threshold...")

def angular_distance(embedding1, embedding2):
    """
    Compute angular distance between two normalized embeddings
    Angular distance = arccos(cosine_similarity)
    
    Args:
        embedding1, embedding2: Normalized embedding vectors
    
    Returns:
        float: Angular distance in radians (0 to œÄ)
    """
    cosine_sim = np.dot(embedding1, embedding2)
    cosine_sim = np.clip(cosine_sim, -1.0, 1.0)  # Ensure valid range for arccos
    angular_dist = np.arccos(cosine_sim)
    return angular_dist

# Calculate intra-class distances (same person)
intra_class_distances = []
for person_name in person_to_index.keys():
    person_indices = np.where(y == person_name)[0]
    person_embeddings = X[person_indices]
    
    if len(person_embeddings) > 1:
        # Compute pairwise distances within this person
        for i in range(len(person_embeddings)):
            for j in range(i+1, len(person_embeddings)):
                dist = angular_distance(person_embeddings[i], person_embeddings[j])
                intra_class_distances.append(dist)

# Calculate inter-class distances (different persons)
inter_class_distances = []
unique_labels = np.unique(y_encoded)
for i in range(len(unique_labels)):
    for j in range(i+1, len(unique_labels)):
        label_i = unique_labels[i]
        label_j = unique_labels[j]
        emb_i = X[y_encoded == label_i]
        emb_j = X[y_encoded == label_j]
        
        # Sample some pairs to avoid O(n¬≤) computation
        sample_size = min(50, len(emb_i) * len(emb_j))
        for _ in range(sample_size):
            idx_i = np.random.randint(len(emb_i))
            idx_j = np.random.randint(len(emb_j))
            dist = angular_distance(emb_i[idx_i], emb_j[idx_j])
            inter_class_distances.append(dist)

if intra_class_distances and inter_class_distances:
    intra_mean = np.mean(intra_class_distances)
    intra_std = np.std(intra_class_distances)
    inter_mean = np.mean(inter_class_distances)
    inter_std = np.std(inter_class_distances)
    
    # Set threshold as mean intra-class + 2*std (conservative)
    calibrated_threshold = intra_mean + 2 * intra_std
    
    # Ensure threshold is reasonable (between intra and inter means)
    calibrated_threshold = min(calibrated_threshold, (intra_mean + inter_mean) / 2)
    
    print(f"   Intra-class distance: {intra_mean:.4f} ¬± {intra_std:.4f}")
    print(f"   Inter-class distance: {inter_mean:.4f} ¬± {inter_std:.4f}")
    print(f"   Calibrated threshold: {calibrated_threshold:.4f}")
    
    RECOGNITION_THRESHOLD = float(calibrated_threshold)
    UNKNOWN_THRESHOLD = float(calibrated_threshold * 1.2)  # Slightly higher for unknown detection
    
    # Save threshold
    threshold_data = {
        'recognition_threshold': RECOGNITION_THRESHOLD,
        'unknown_threshold': UNKNOWN_THRESHOLD,
        'intra_class_mean': float(intra_mean),
        'intra_class_std': float(intra_std),
        'inter_class_mean': float(inter_mean),
        'inter_class_std': float(inter_std)
    }
    
    threshold_path = ARTIFACTS_DIR / "recognition_thresholds.json"
    with open(threshold_path, 'w') as f:
        json.dump(threshold_data, f, indent=2)
    
    print(f"‚úÖ Threshold calibrated and saved to: {threshold_path}")
else:
    print("‚ö†Ô∏è Could not calibrate threshold (insufficient data). Using default values.")
    RECOGNITION_THRESHOLD = 0.5
    UNKNOWN_THRESHOLD = 0.6

# Test: Verify threshold values
assert 0 < RECOGNITION_THRESHOLD < np.pi, "Invalid threshold range"
print("‚úÖ Threshold calibration test passed")


üîÑ Calibrating recognition threshold...
   Intra-class distance: 0.5865 ¬± 0.2679
   Inter-class distance: 1.5095 ¬± 0.0742
   Calibrated threshold: 1.0480
‚úÖ Threshold calibrated and saved to: /mnt/NewDisk/sahil_project/FRS copy/Artifacts/recognition_thresholds.json
‚úÖ Threshold calibration test passed


## Cell 11: Recognition Function

Implement the complete face recognition pipeline:
1. YOLOv8 face detection
2. InsightFace landmark extraction and alignment
3. ArcFace embedding extraction
4. Angular distance computation with database
5. Threshold-based classification (known person or unknown)


In [60]:
def recognize_face(image_path_or_array, return_details=False):
    """
    Complete face recognition pipeline
    
    Args:
        image_path_or_array: Image path (str) or numpy array (BGR format)
        return_details: If True, return detailed information
    
    Returns:
        dict: Recognition results with identity, confidence, distances, etc.
    """
    # Load image
    if isinstance(image_path_or_array, str):
        img = load_image(image_path_or_array)
    else:
        img = image_path_or_array.copy()
        if len(img.shape) == 3:
            img = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)
    
    if img is None:
        return {
            'identity': 'unknown', 
            'confidence': 0.0, 
            'angular_distance': None,
            'detection_confidence': 0.0,
            'bbox': None,
            'is_unknown': True,
            'error': 'Failed to load image'
        }
    
    # Step 1: Detect faces
    detections = detect_faces_yolo(img, conf_threshold=0.4)
    
    if len(detections) == 0:
        return {
            'identity': 'unknown', 
            'confidence': 0.0, 
            'angular_distance': None,
            'detection_confidence': 0.0,
            'bbox': None,
            'is_unknown': True,
            'error': 'No faces detected'
        }
    
    # Process the most confident detection
    best_detection = max(detections, key=lambda x: x['confidence'])
    bbox = best_detection['bbox']
    detection_confidence = best_detection['confidence']
    
    # Step 2: Extract landmarks
    landmarks = extract_landmarks_insightface(img, bbox)
    if landmarks is None:
        return {
            'identity': 'unknown', 
            'confidence': 0.0, 
            'angular_distance': None,
            'detection_confidence': float(detection_confidence),
            'bbox': bbox.tolist(),
            'is_unknown': True,
            'error': 'Landmark extraction failed'
        }
    
    # Step 3: Align face
    aligned_face = align_face(img, landmarks)
    if aligned_face is None:
        return {
            'identity': 'unknown', 
            'confidence': 0.0, 
            'angular_distance': None,
            'detection_confidence': float(detection_confidence),
            'bbox': bbox.tolist(),
            'is_unknown': True,
            'error': 'Face alignment failed'
        }
    
    # Step 4: Extract embedding (try with fallback)
    embedding = None
    # Try aligned face first
    embedding = extract_arcface_embedding(aligned_face)
    if embedding is None:
        # Fallback: Extract embedding during landmark extraction
        try:
            _, embedding = extract_landmarks_insightface(img, bbox, return_embedding=True)
        except:
            pass
    
    if embedding is None:
        return {
            'identity': 'unknown', 
            'confidence': 0.0, 
            'angular_distance': None,
            'detection_confidence': float(detection_confidence),
            'bbox': bbox.tolist(),
            'is_unknown': True,
            'error': 'Embedding extraction failed'
        }
    
    # Step 5: Compute angular distances to all database embeddings
    angular_distances = []
    for db_embedding in X:
        dist = angular_distance(embedding, db_embedding)
        angular_distances.append(dist)
    
    angular_distances = np.array(angular_distances)
    
    # Step 6: Find closest match
    min_distance_idx = np.argmin(angular_distances)
    min_distance = angular_distances[min_distance_idx]
    
    # Step 7: Classify using threshold
    if min_distance <= RECOGNITION_THRESHOLD:
        # Known person
        predicted_label_idx = y_encoded[min_distance_idx]
        predicted_person = label_encoder.inverse_transform([predicted_label_idx])[0]
        
        # Also use classifier for verification
        classifier_pred_idx = best_classifier.predict([embedding])[0]
        classifier_pred = label_encoder.inverse_transform([classifier_pred_idx])[0]
        classifier_proba = best_classifier.predict_proba([embedding])[0][classifier_pred_idx]
        
        # Use classifier prediction if it has high confidence, otherwise use distance-based
        if classifier_proba > 0.7 and classifier_pred == predicted_person:
            identity = classifier_pred
            confidence = float(classifier_proba)
        else:
            identity = predicted_person
            confidence = float(1.0 - (min_distance / RECOGNITION_THRESHOLD))  # Normalize to [0, 1]
        
        result = {
            'identity': identity,
            'confidence': confidence,
            'angular_distance': float(min_distance),
            'detection_confidence': float(detection_confidence),
            'bbox': bbox.tolist(),
            'is_unknown': False
        }
    else:
        # Unknown person
        result = {
            'identity': 'unknown',
            'confidence': 0.0,
            'angular_distance': float(min_distance),
            'detection_confidence': float(detection_confidence),
            'bbox': bbox.tolist(),
            'is_unknown': True
        }
    
    if return_details:
        result['embedding'] = embedding
        result['aligned_face'] = aligned_face
        result['landmarks'] = landmarks.tolist()
        result['all_distances'] = angular_distances.tolist()
    
    return result

print("‚úÖ Recognition function defined")

# Test: Verify function can be called
test_result = recognize_face(test_img if 'test_img' in locals() else np.zeros((640, 480, 3), dtype=np.uint8))
assert 'identity' in test_result, "Recognition function failed"
print("‚úÖ Recognition function test passed")


‚úÖ Recognition function defined
‚úÖ Recognition function test passed


## Cell 12: Test Recognition on Dataset Images

Load a random image from the dataset and verify the complete recognition pipeline works correctly end-to-end.


In [61]:
# Test recognition on a random image from dataset
print("üß™ Testing recognition on dataset image...")

# Verify recognize_face function is available and has correct signature
if 'recognize_face' not in globals():
    raise NameError("recognize_face function not defined. Please run Cell 11 (Recognition Function) first.")

# Find a random image from dataset
test_image_path = None
test_person_name = None

for person_dir in DATASET_DIR.iterdir():
    if person_dir.is_dir():
        image_files = list(person_dir.glob("*.jpg")) + list(person_dir.glob("*.png"))
        if image_files:
            import random
            test_image_path = random.choice(image_files)
            test_person_name = person_dir.name
            break

if test_image_path is None:
    print("‚ö†Ô∏è No test image found. Using synthetic image...")
    test_img_array = np.random.randint(0, 255, (640, 480, 3), dtype=np.uint8)
    result = recognize_face(test_img_array)
    print(f"   Result: {result['identity']} (expected: unknown for synthetic image)")
else:
    print(f"   Test image: {test_image_path.name}")
    print(f"   Expected person: {test_person_name}")
    
    # Perform recognition
    result = recognize_face(str(test_image_path), return_details=False)
    
    print(f"\nüìä Recognition Results:")
    print(f"   Identity: {result['identity']}")
    print(f"   Confidence: {result['confidence']:.4f}")
    
    # Handle angular_distance - may be None for errors
    if result.get('angular_distance') is not None:
        print(f"   Angular distance: {result['angular_distance']:.4f}")
    else:
        print(f"   Angular distance: N/A (error occurred)")
    
    print(f"   Detection confidence: {result.get('detection_confidence', 0.0):.4f}")
    print(f"   Is unknown: {result.get('is_unknown', True)}")
    
    # Show error if present
    if 'error' in result:
        print(f"   ‚ö†Ô∏è Error: {result['error']}")
    
    if result['identity'] == test_person_name:
        print(f"\n‚úÖ Recognition test PASSED - Correctly identified as {test_person_name}")
    elif result['identity'] == 'unknown':
        print(f"\n‚ö†Ô∏è Recognition test - Classified as unknown (may be due to threshold)")
    else:
        print(f"\n‚ùå Recognition test - Misidentified: expected {test_person_name}, got {result['identity']}")

# Test: Verify recognition returns valid structure
required_keys = ['identity', 'confidence', 'angular_distance', 'detection_confidence', 'bbox', 'is_unknown']
missing_keys = [key for key in required_keys if key not in result]
if missing_keys:
    print(f"\n‚ö†Ô∏è Warning: Missing keys in result: {missing_keys}")
    print(f"   Result keys: {list(result.keys())}")
else:
    print("\n‚úÖ Recognition test structure verified - all required keys present")


üß™ Testing recognition on dataset image...
   Test image: 6204010540_A08A0HG3HA_aug10.jpg
   Expected person: 6204010540

üìä Recognition Results:
   Identity: 6204010540
   Confidence: 1.0000
   Angular distance: 0.0000
   Detection confidence: 0.8478
   Is unknown: False

‚úÖ Recognition test PASSED - Correctly identified as 6204010540

‚úÖ Recognition test structure verified - all required keys present


## Cell 14: Demo - Recognize Face from Image Path

Final demonstration cell that accepts an image path, performs recognition, and prints detailed results including predicted identity, confidence scores, and angular distances.


## Cell 13: Register New Person

Function to dynamically register a new person by processing multiple images, extracting embeddings, and adding them to the database and classifier.


In [64]:
def register_new_person(person_name, image_paths, min_images=3):
    """
    Register a new person by processing multiple images and adding to database
    
    Args:
        person_name: Name/ID of the person to register
        image_paths: List of image file paths for this person
        min_images: Minimum number of successful embeddings required
    
    Returns:
        dict: Registration results
    """
    # Declare global variables at the start of the function
    global X, y, y_encoded, person_to_index, index_to_person
    
    print(f"üîÑ Registering new person: {person_name}")
    
    if person_name in person_to_index:
        return {'success': False, 'error': f'Person {person_name} already exists in database'}
    
    # Create embeddings directory for this person
    person_emb_dir = EMBEDDINGS_DIR / person_name
    person_emb_dir.mkdir(parents=True, exist_ok=True)
    
    new_embeddings = []
    successful_count = 0
    failed_count = 0
    
    # Process each image
    for img_path in image_paths:
        img_path = Path(img_path)
        if not img_path.exists():
            print(f"   ‚ö†Ô∏è Image not found: {img_path}")
            failed_count += 1
            continue
        
        # Load and process image
        img = load_image(img_path)
        if img is None:
            failed_count += 1
            continue
        
        # Detect faces
        detections = detect_faces_yolo(img, conf_threshold=0.4)
        if len(detections) == 0:
            failed_count += 1
            continue
        
        # Process best detection
        best_detection = max(detections, key=lambda x: x['confidence'])
        bbox = best_detection['bbox']
        
        # Extract landmarks AND embedding together (more reliable)
        try:
            result = extract_landmarks_insightface(img, bbox, return_embedding=True)
            if result is None:
                failed_count += 1
                continue
            
            landmarks, embedding_from_crop = result
            if landmarks is None:
                failed_count += 1
                continue
        except Exception as e:
            failed_count += 1
            continue
        
        # Align face
        try:
            aligned_face = align_face(img, landmarks)
            if aligned_face is None:
                failed_count += 1
                continue
        except Exception as e:
            failed_count += 1
            continue
        
        # Try extracting embedding from aligned face (preferred for accuracy)
        embedding_from_aligned = None
        try:
            embedding_from_aligned = extract_arcface_embedding(aligned_face)
        except Exception as e:
            # Silently fail - will use fallback embedding from crop
            pass
        
        # Use embedding from aligned face if successful, otherwise use the one from crop
        if embedding_from_aligned is not None:
            embedding = embedding_from_aligned
        elif embedding_from_crop is not None:
            embedding = embedding_from_crop
        else:
            failed_count += 1
            continue
        
        # Verify embedding is valid
        if embedding is None or embedding.shape != (512,):
            failed_count += 1
            continue
        
        # Save embedding
        embedding_filename = f"{img_path.stem}.npy"
        embedding_path = person_emb_dir / embedding_filename
        np.save(embedding_path, embedding)
        
        new_embeddings.append(embedding)
        successful_count += 1
    
    # Check if we have enough embeddings
    if successful_count < min_images:
        return {
            'success': False,
            'error': f'Insufficient successful embeddings: {successful_count}/{min_images} required',
            'successful': successful_count,
            'failed': failed_count
        }
    
    # Update global variables (already declared at function start)
    
    # Add new embeddings to database
    if len(new_embeddings) == 0:
        return {
            'success': False,
            'error': 'No embeddings extracted from provided images',
            'successful': 0,
            'failed': failed_count
        }
    
    # Ensure embeddings array has correct shape
    new_embeddings_array = np.array(new_embeddings)
    if new_embeddings_array.ndim == 1:
        # Single embedding case
        new_embeddings_array = new_embeddings_array.reshape(1, -1)
    
    # Verify embedding dimension is 512
    if new_embeddings_array.shape[1] != 512:
        return {
            'success': False,
            'error': f'Invalid embedding dimension: {new_embeddings_array.shape[1]}, expected 512',
            'successful': successful_count,
            'failed': failed_count
        }
    
    X = np.vstack([X, new_embeddings_array])
    
    # Add labels
    new_labels = [person_name] * successful_count
    y = np.hstack([y, np.array(new_labels)])
    
    # Update label encoder
    label_encoder.fit(y)
    y_encoded = label_encoder.transform(y)
    
    # Update person mappings
    if person_name not in person_to_index:
        idx = len(person_to_index)
        person_to_index[person_name] = idx
        index_to_person[idx] = person_name
    
    # Retrain classifier
    print(f"   üîÑ Retraining classifier with {X.shape[0]} total samples...")
    best_classifier.fit(X, y_encoded)
    
    # Save updated database
    database_path = ARTIFACTS_DIR / "embedding_database.npy"
    labels_path = ARTIFACTS_DIR / "labels.npy"
    label_encoder_path = ARTIFACTS_DIR / "label_encoder.pkl"
    person_mapping_path = ARTIFACTS_DIR / "person_mapping.json"
    classifier_path = ARTIFACTS_DIR / "face_classifier.pkl"
    
    np.save(database_path, X)
    np.save(labels_path, y_encoded)
    
    with open(label_encoder_path, 'wb') as f:
        pickle.dump(label_encoder, f)
    
    with open(person_mapping_path, 'w') as f:
        json.dump({
            'person_to_index': person_to_index,
            'index_to_person': index_to_person
        }, f, indent=2)
    
    with open(classifier_path, 'wb') as f:
        pickle.dump(best_classifier, f)
    
    print(f"‚úÖ Person {person_name} registered successfully!")
    print(f"   Successful embeddings: {successful_count}")
    print(f"   Failed images: {failed_count}")
    
    return {
        'success': True,
        'person_name': person_name,
        'successful': successful_count,
        'failed': failed_count,
        'total_in_database': X.shape[0]
    }

print("‚úÖ Register new person function defined")

# Test: Verify function is callable (with empty list, should return error)
test_result = register_new_person("_test_person", [], min_images=0)
assert isinstance(test_result, dict), "Registration function failed"
assert 'success' in test_result, "Test result should contain 'success' key"
# Expected: success=False due to no images provided
print("‚úÖ Registration function test passed")


‚úÖ Register new person function defined
üîÑ Registering new person: _test_person
‚úÖ Registration function test passed


In [65]:
def demo_recognize(image_path):
    """
    Demo function: Recognize face from image path and print results
    
    Args:
        image_path: Path to image file
    """
    print("=" * 60)
    print("FACE RECOGNITION DEMO")
    print("=" * 60)
    print(f"\nüì∏ Processing image: {image_path}")
    
    # Perform recognition
    result = recognize_face(str(image_path), return_details=False)
    
    print("\n" + "-" * 60)
    print("RECOGNITION RESULTS")
    print("-" * 60)
    
    print(f"üë§ Identity: {result['identity']}")
    print(f"üéØ Confidence: {result['confidence']:.4f} ({result['confidence']*100:.2f}%)")
    
    # Handle angular_distance - may be None for errors
    if result.get('angular_distance') is not None:
        print(f"üìè Angular Distance: {result['angular_distance']:.4f} radians")
        print(f"   (Threshold: {RECOGNITION_THRESHOLD:.4f})")
    else:
        print(f"üìè Angular Distance: N/A (error occurred)")
    
    print(f"üîç Detection Confidence: {result.get('detection_confidence', 0.0):.4f}")
    print(f"üì¶ Bounding Box: {result.get('bbox', None)}")
    print(f"‚ùì Is Unknown: {result.get('is_unknown', True)}")
    
    # Show error if present
    if 'error' in result:
        print(f"\n‚ö†Ô∏è Error: {result['error']}")
        return
    
    if result['is_unknown']:
        print("\n‚ö†Ô∏è  Person not recognized - classified as UNKNOWN")
    else:
        print(f"\n‚úÖ Person recognized as: {result['identity']}")
    
    print("=" * 60)
    
    return result

# Example usage (uncomment and provide image path):
# demo_recognize("/path/to/test/image.jpg")

# Or test with a dataset image:
test_image_for_demo = None
for person_dir in DATASET_DIR.iterdir():
    if person_dir.is_dir():
        image_files = list(person_dir.glob("*.jpg")) + list(person_dir.glob("*.png"))
        if image_files:
            test_image_for_demo = image_files[0]
            break

if test_image_for_demo:
    print("üß™ Running demo on sample dataset image...\n")
    demo_recognize(str(test_image_for_demo))
else:
    print("‚úÖ Demo function ready")
    print("   Usage: demo_recognize('/path/to/image.jpg')")

print("\n‚úÖ All cells completed successfully!")
print(f"\nüìÅ Artifacts saved to: {ARTIFACTS_DIR}")
print(f"   - Embedding database")
print(f"   - Trained classifier")
print(f"   - Recognition thresholds")
print(f"   - Label mappings")
print(f"   - Processing logs")


üß™ Running demo on sample dataset image...

FACE RECOGNITION DEMO

üì∏ Processing image: /mnt/NewDisk/sahil_project/FRS copy/Small_Dataset/6204010540/6204010540_A08A0HG3HA_aug19.jpg

------------------------------------------------------------
RECOGNITION RESULTS
------------------------------------------------------------
üë§ Identity: 6204010540
üéØ Confidence: 1.0000 (100.00%)
üìè Angular Distance: 0.0000 radians
   (Threshold: 1.0480)
üîç Detection Confidence: 0.8407
üì¶ Bounding Box: [441, 225, 803, 742]
‚ùì Is Unknown: False

‚úÖ Person recognized as: 6204010540

‚úÖ All cells completed successfully!

üìÅ Artifacts saved to: /mnt/NewDisk/sahil_project/FRS copy/Artifacts
   - Embedding database
   - Trained classifier
   - Recognition thresholds
   - Label mappings
   - Processing logs
