# Face Recognition System - Complete Tutorial

**Author:** Anik Tahabilder  
**Project:** 11 of 22 - Kaggle ML Portfolio  
**Dataset:** LFW (Labeled Faces in the Wild)  
**Difficulty:** 8/10 | **Learning Value:** 9/10

---

## What Will You Learn?

This tutorial teaches **Face Recognition from fundamentals to implementation**.

| Topic | What You'll Understand |
|-------|------------------------|
| **Face Detection vs Recognition** | Two different problems! |
| **Detection Algorithms** | Haar Cascades, HOG+SVM, CNN-based |
| **Recognition Algorithms** | Eigenfaces, LBPH, Deep Learning embeddings |
| **Face Embeddings** | How faces become 128-dimensional vectors |
| **Similarity Metrics** | Euclidean distance, cosine similarity |
| **Parameter Tuning** | How to optimize each algorithm |
| **Complete Pipeline** | Detection → Alignment → Encoding → Recognition |

---

## The Two Problems in Face Recognition

```
FACE DETECTION                      FACE RECOGNITION
"Where are the faces?"              "Who is this person?"

┌─────────────────┐                 ┌─────────────────┐
│  Input Image    │                 │  Detected Face  │
│  ┌───┐  ┌───┐   │                 │     ┌───┐       │
│  │ ? │  │ ? │   │    ────────>    │     │???│       │    ────────>   "John"
│  └───┘  └───┘   │                 │     └───┘       │
└─────────────────┘                 └─────────────────┘
Output: Bounding boxes              Output: Person identity
```

---

## Table of Contents

1. [Part 1: Understanding Face Detection vs Recognition](#part1)
2. [Part 2: Face Detection Algorithms](#part2)
3. [Part 3: Face Recognition Algorithms](#part3)
4. [Part 4: Face Embeddings & Similarity Metrics](#part4)
5. [Part 5: Dataset Loading & Preprocessing](#part5)
6. [Part 6: Face Detection Implementation](#part6)
7. [Part 7: Face Recognition Implementation](#part7)
8. [Part 8: Complete Recognition Pipeline](#part8)
9. [Part 9: Evaluation & Results](#part9)
10. [Part 10: Summary & Key Takeaways](#part10)

---

<a id='part1'></a>
# Part 1: Understanding Face Detection vs Recognition

---

## 1.1 The Two Separate Problems

| Aspect | Face Detection | Face Recognition |
|--------|---------------|------------------|
| **Question** | "Where are faces in this image?" | "Who is this person?" |
| **Input** | Any image | Cropped face image |
| **Output** | Bounding box coordinates | Person identity/name |
| **Type** | Object Detection | Classification/Verification |
| **Difficulty** | Easier | Harder |

## 1.2 Face Recognition Sub-Tasks

| Task | Description | Example |
|------|-------------|--------|
| **Verification (1:1)** | "Is this person X?" | Phone unlock, passport control |
| **Identification (1:N)** | "Who is this person?" | Finding person in database |
| **Clustering** | "Group similar faces" | Photo organization |
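
The difference between verification and identification can be sketched with plain NumPy. This is a minimal illustration, not a production matcher: the 4-D embeddings, the names in `gallery`, and the `0.6` threshold are all assumptions chosen for the example.

```python
import numpy as np

def verify(probe, reference, threshold=0.6):
    """1:1 verification: is the probe the same person as the reference?"""
    return float(np.linalg.norm(probe - reference)) < threshold

def identify(probe, gallery, threshold=0.6):
    """1:N identification: closest gallery name, or None if nobody is close enough."""
    names = list(gallery)
    dists = np.array([np.linalg.norm(probe - gallery[n]) for n in names])
    best = int(np.argmin(dists))
    return names[best] if dists[best] < threshold else None

# Hypothetical 4-D embeddings, for illustration only
gallery = {
    "alice": np.array([1.0, 0.0, 0.0, 0.0]),
    "bob": np.array([0.0, 1.0, 0.0, 0.0]),
}
probe = np.array([0.9, 0.1, 0.0, 0.0])
stranger = np.array([0.0, 0.0, 1.0, 0.0])

print(verify(probe, gallery["alice"]))  # True
print(identify(probe, gallery))         # alice
print(identify(stranger, gallery))      # None
```

Verification needs one comparison, while identification compares the probe against every enrolled person and must also handle the "unknown" case.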

## 1.3 The Complete Pipeline

```
┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
│  Input   │───>│  Face    │───>│  Face    │───>│  Face    │───>│  Match   │
│  Image   │    │Detection │    │Alignment │    │ Encoding │    │ /Identify│
└──────────┘    └──────────┘    └──────────┘    └──────────┘    └──────────┘
                     │               │               │               │
                     v               v               v               v
               Bounding box    Normalized face   128-D vector    "Person X"
```

## 1.4 Why Is Face Recognition Hard?

| Challenge | Description |
|-----------|-------------|
| **Pose Variation** | Same person looks different from different angles |
| **Illumination** | Lighting changes appearance dramatically |
| **Expression** | Smiling vs neutral vs surprised |
| **Occlusion** | Glasses, masks, hair covering face |
| **Age** | People change over time |
| **Image Quality** | Blur, low resolution, compression |

In [None]:
# ============================================================
# SETUP AND IMPORTS
# ============================================================
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os
import cv2
from PIL import Image
from collections import Counter
import warnings
warnings.filterwarnings('ignore')

# Sklearn
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

# Display settings
plt.style.use('seaborn-v0_8-whitegrid')
np.random.seed(42)

print("="*70)
print("FACE RECOGNITION SYSTEM - TUTORIAL")
print("="*70)
print(f"NumPy: {np.__version__}")
print(f"OpenCV: {cv2.__version__}")
print("\nAll libraries loaded!")

---

<a id='part2'></a>
# Part 2: Face Detection Algorithms

---

## 2.1 Overview of Detection Methods

| Algorithm | Year | Type | Speed | Accuracy | Best For |
|-----------|------|------|-------|----------|----------|
| **Haar Cascades** | 2001 | Classical CV | Very Fast | Medium | Real-time, embedded |
| **HOG + SVM** | 2005 | Classical ML | Fast | Good | Balanced performance |
| **MTCNN** | 2016 | Deep Learning | Medium | Very Good | Accuracy-critical |
| **RetinaFace** | 2019 | Deep Learning | Slow | Excellent | State-of-the-art |

---

## 2.2 Haar Cascade Classifier

### How It Works:

1. **Haar Features**: Simple rectangular features that capture patterns
2. **Integral Image**: Fast computation of feature values
3. **AdaBoost**: Selects the most discriminative features and combines their weak classifiers into a strong one
4. **Cascade**: Chain of classifiers that quickly reject non-faces

```
Haar Features:
┌───┬───┐   ┌───┬───┬───┐   ┌───┬───┐
│ W │ B │   │ W │ B │ W │   │ W │ B │
│   │   │   │   │   │   │   ├───┼───┤
│ W │ B │   │ W │ B │ W │   │ B │ W │
└───┴───┘   └───┴───┴───┘   └───┴───┘
Edge feat.   Line feat.     Four-rect.

Value = Σ(white pixels) - Σ(black pixels)
```
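
The integral image is what makes these features cheap: once it is built, the sum over any rectangle costs four lookups. The sketch below is a NumPy illustration under that idea; the function names and the toy 4x4 patch are assumptions, not OpenCV's internal implementation.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero row/column prepended for easy indexing."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, y, x, h, w):
    """Sum of img[y:y+h, x:x+w] using four lookups."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def edge_feature(ii, y, x, h, w):
    """Two-rectangle feature: left half (white) minus right half (black)."""
    half = w // 2
    white = rect_sum(ii, y, x, h, half)
    black = rect_sum(ii, y, x + half, h, half)
    return white - black

# Toy 4x4 patch: bright left half, dark right half
patch = np.array([[9, 9, 1, 1]] * 4)
ii = integral_image(patch)
print(rect_sum(ii, 0, 0, 4, 4))      # 80
print(edge_feature(ii, 0, 0, 4, 4))  # 64
```

A large positive value signals a bright-to-dark vertical edge inside the window, which is the kind of pattern a Haar edge feature responds to.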

### Key Parameters:

| Parameter | Description | Default | Tuning Tip |
|-----------|-------------|---------|------------|
| `scaleFactor` | Image size reduction per scale | 1.1 | Must be > 1; closer to 1 = finer search, slower |
| `minNeighbors` | Min overlapping detections to keep a candidate | 3 | Higher = fewer false positives, more missed faces |
| `minSize` | Minimum face size | None (no limit) | Set based on expected face size |
| `maxSize` | Maximum face size | None (no limit) | Limit for efficiency |

---

## 2.3 HOG + SVM (Histogram of Oriented Gradients)

### How It Works:

1. **Compute Gradients**: Find edge directions in image
2. **Create Histograms**: Count gradient orientations in cells
3. **Normalize Blocks**: Improve invariance to lighting
4. **SVM Classifier**: Trained to separate faces vs non-faces

```
Original     Gradient      HOG Cells     Concatenated
 Image      Directions                    Feature Vector
┌─────┐      ┌─────┐      ┌─┬─┬─┐       ┌─────────────┐
│     │  ->  │↗↘↙↖│  ->  │█│▄│░│  ->   │ 0.1 0.3 ... │
│  :) │      │↗↘↙↖│      │▄│█│▄│       │    3780-D   │
└─────┘      └─────┘      └─┴─┴─┘       └─────────────┘
```
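
The first two steps can be sketched for a single cell in NumPy. This is a simplified illustration: it assigns each pixel's gradient magnitude to one orientation bin and leaves out the vote interpolation and block normalization that a full HOG descriptor uses. The function name and the toy cell are assumptions.

```python
import numpy as np

def cell_histogram(cell, n_bins=9):
    """Orientation histogram for one cell, weighted by gradient magnitude."""
    cell = cell.astype(np.float64)
    gy, gx = np.gradient(cell)  # derivatives along rows (y) and columns (x)
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in [0, 180) degrees
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0
    bins = np.minimum((angle / (180.0 / n_bins)).astype(int), n_bins - 1)
    hist = np.zeros(n_bins)
    np.add.at(hist, bins.ravel(), magnitude.ravel())
    return hist

# Toy 8x8 cell with a vertical edge: intensity changes only along x
cell = np.tile(np.array([0, 0, 0, 0, 10, 10, 10, 10]), (8, 1))
hist = cell_histogram(cell)
print(hist.argmax())  # 0
```

All of the gradient energy lands in bin 0 (orientation near 0 degrees) because the only intensity change is horizontal. The final feature vector is the concatenation of such histograms over all cells, after block normalization.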

### Advantages over Haar:

| Aspect | Haar | HOG |
|--------|------|-----|
| Feature type | Simple rectangles | Gradient histograms |
| Rotation | Sensitive | More robust |
| Lighting | Sensitive | More robust |
| Speed | Faster | Slightly slower |

---

## 2.4 CNN-Based Detection (MTCNN)

### Architecture: Three-Stage Cascade

```
Input Image
     │
     v
┌──────────┐    ┌─────────┐     ┌─────────┐
│  P-Net   │───>│ R-Net   │────>│ O-Net   │
│(Proposal)│    │(Refine) │     │(Output) │
└──────────┘    └─────────┘     └─────────┘
     │               │               │
     v               v               v
 Candidate       Filtered        Final boxes
  boxes          boxes          + landmarks
```

| Stage | Purpose | Output |
|-------|---------|--------|
| **P-Net** | Fast scanning, propose candidates | Many rough boxes |
| **R-Net** | Refine candidates, remove false positives | Fewer, better boxes |
| **O-Net** | Final refinement + facial landmarks | Precise boxes + 5 landmarks |

In [None]:
# ============================================================
# FACE DETECTION ALGORITHMS COMPARISON
# ============================================================
print("="*70)
print("FACE DETECTION ALGORITHMS")
print("="*70)

# Detection Algorithm Summary Table
detection_methods = pd.DataFrame({
    'Algorithm': ['Haar Cascades', 'HOG + SVM', 'MTCNN', 'RetinaFace'],
    'Type': ['Classical CV', 'Classical ML', 'Deep Learning', 'Deep Learning'],
    'Year': [2001, 2005, 2016, 2019],
    'Speed': ['Very Fast', 'Fast', 'Medium', 'Slow'],
    'Accuracy': ['Medium', 'Good', 'Very Good', 'Excellent'],
    'Key_Feature': ['Haar features + AdaBoost', 'Gradient histograms + SVM', 
                    'Three-stage CNN cascade', 'Single-stage detector + FPN'],
    'Best_For': ['Real-time/Embedded', 'Balanced', 'High accuracy', 'State-of-the-art']
})

print("\nFace Detection Methods Comparison:")
print(detection_methods.to_string(index=False))

---

<a id='part3'></a>
# Part 3: Face Recognition Algorithms

---

## 3.1 Overview of Recognition Methods

| Algorithm | Year | Type | Training Data | Accuracy | Interpretable |
|-----------|------|------|---------------|----------|---------------|
| **Eigenfaces (PCA)** | 1991 | Statistical | Medium | Low-Medium | Yes |
| **Fisherfaces (LDA)** | 1997 | Statistical | Medium | Medium | Yes |
| **LBPH** | 2006 | Texture-based | Low | Medium | Somewhat |
| **Deep Learning** | 2014+ | Neural Network | Very High | Very High | No |

---

## 3.2 Eigenfaces (PCA-based)

### How It Works:

1. **Flatten faces**: Convert each face image to a vector
2. **Compute mean face**: Average of all training faces
3. **PCA**: Find principal components (eigenfaces)
4. **Project**: Represent each face as combination of eigenfaces
5. **Classify**: Compare projections using distance metric

```
Training Faces         Eigenfaces              Recognition
┌───┐ ┌───┐ ┌───┐      ┌───┐ ┌───┐ ┌───┐      New Face -> Project -> Compare
│ A │ │ B │ │ C │  ->  │EF1│ │EF2│ │EF3│  ->  [0.3, 0.7, 0.1] ≈ Person B
└───┘ └───┘ └───┘      └───┘ └───┘ └───┘
```

### Mathematical Formulation:

| Step | Formula | Description |
|------|---------|-------------|
| Mean face | $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$ | Average of all faces |
| Centered | $\Phi_i = x_i - \mu$ | Subtract mean |
| Covariance | $C = \frac{1}{N}\sum_{i=1}^{N} \Phi_i \Phi_i^T$ | Compute covariance |
| Eigenfaces | $C v_k = \lambda_k v_k$ | Top eigenvectors $v_k$ (largest $\lambda_k$) |
| Projection | $\omega = U^T (x - \mu)$ | Face in eigenface space, where $U = [v_1, \dots, v_K]$ |

### Key Parameters:

| Parameter | Description | Tuning Tip |
|-----------|-------------|------------|
| `n_components` | Number of eigenfaces to keep | 50-150 typically works well |
| `whiten` | Normalize variance | Often improves results |
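
The formulas above map almost line by line onto NumPy. The sketch below uses random data in place of real faces and takes the eigenvectors from an SVD of the centered data instead of eigen-decomposing $C$ directly, which gives the same eigenfaces without forming the full covariance matrix. The array sizes and `k` are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((20, 64))  # 20 synthetic "faces", 64 pixels each (assumption)

mu = X.mean(axis=0)       # mean face
Phi = X - mu              # centered faces

# Rows of Vt are eigenvectors of C, ordered by decreasing variance
_, s, Vt = np.linalg.svd(Phi, full_matrices=False)
eigenvalues = s ** 2 / len(X)  # lambda_k for C = (1/N) * sum(Phi_i Phi_i^T)

k = 5
U = Vt[:k].T              # shape (n_pixels, k): top-k eigenfaces as columns

omega = U.T @ (X[0] - mu)  # projection of the first face
x_hat = mu + U @ omega     # approximate reconstruction from k eigenfaces

print(omega.shape)                      # (5,)
print(np.allclose(U.T @ U, np.eye(k)))  # True: eigenfaces are orthonormal
```

In practice `sklearn.decomposition.PCA` performs the centering and decomposition in one step; the manual version is only meant to show where each symbol in the table comes from.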

---

## 3.3 LBPH (Local Binary Pattern Histograms)

### How It Works:

1. **Divide face into regions** (e.g., 8x8 grid)
2. **For each pixel**: Compare with neighbors, create binary pattern
3. **Create histogram**: Count pattern occurrences per region
4. **Concatenate**: Combine all region histograms
5. **Compare**: Use histogram distance for matching

```
LBP Computation:
                    Binary Pattern
┌───┬───┬───┐      ┌───┬───┬───┐
│ 6 │ 5 │ 2 │      │ 1 │ 1 │ 0 │    Pattern: 11001011
├───┼───┼───┤  ->  ├───┼───┼───┤    Decimal: 203
│ 7 │[4]│ 1 │      │ 1 │   │ 0 │    (read clockwise from top-left)
├───┼───┼───┤      ├───┼───┼───┤    (Compare each neighbor
│ 8 │ 3 │ 9 │      │ 1 │ 0 │ 1 │     with center value 4)
└───┴───┴───┘      └───┴───┴───┘
```
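
The per-pixel step can be checked by hand on the 3x3 patch from the diagram. The sketch below assumes the bits are read clockwise starting at the top-left neighbor; that ordering is a convention chosen for this example, and a different ordering produces a different decimal code from the same binary grid.

```python
import numpy as np

def lbp_code(patch):
    """LBP code of a 3x3 patch: a neighbor >= center contributes bit 1."""
    center = patch[1, 1]
    # Clockwise from the top-left neighbor (assumed ordering)
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    bits = [int(patch[r, c] >= center) for r, c in order]
    code = int("".join(map(str, bits)), 2)
    return bits, code

patch = np.array([[6, 5, 2],
                  [7, 4, 1],
                  [8, 3, 9]])
bits, code = lbp_code(patch)
print(bits, code)  # [1, 1, 0, 0, 1, 0, 1, 1] 203
```

Adding a constant to every pixel leaves all the comparisons, and therefore the code, unchanged, which is why LBP tolerates uniform brightness shifts.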

### Key Parameters:

| Parameter | Description | Default | Tuning Tip |
|-----------|-------------|---------|------------|
| `radius` | Radius of circular pattern | 1 | Larger = captures larger features |
| `neighbors` | Number of sampling points | 8 | 8 or 16 typical |
| `grid_x, grid_y` | Number of cells | 8x8 | More cells = more detail |

### Advantages of LBPH:

| Advantage | Description |
|-----------|-------------|
| **Robust to illumination** | Compares relative values, not absolute, so uniform brightness changes leave the pattern unchanged |
| **Computationally simple** | No complex training |
| **Works with few samples** | Can work with single training image |
| **Real-time capable** | Fast enough for live recognition |

---

## 3.4 Deep Learning Embeddings

### How Modern Face Recognition Works:

```
Face Image        CNN Backbone         Embedding          Comparison
┌─────────┐      ┌───────────┐      ┌─────────────┐      ┌─────────┐
│         │      │           │      │             │      │         │
│   :)    │ ---> │  ResNet   │ ---> │ 128-D vector│ ---> │ Distance│
│         │      │  VGGFace  │      │ [0.1, 0.3...│      │  < 0.6  │
└─────────┘      └───────────┘      └─────────────┘      └─────────┘
                                                          Same person!
```

### Popular Deep Learning Models:

| Model | Architecture | Embedding Size | Training Data | LFW Accuracy |
|-------|--------------|----------------|---------------|---------------|
| **FaceNet** | Inception | 128-D | 200M faces | 99.63% |
| **VGGFace** | VGG-16 | 4096-D | 2.6M faces | 98.95% |
| **ArcFace** | ResNet | 512-D | 5.8M faces | 99.83% |
| **dlib** | ResNet | 128-D | 3M faces | 99.38% |

### Training Objectives:

| Loss Function | Description |
|---------------|-------------|
| **Triplet Loss** | Pull same-person embeddings together, push different-person embeddings apart by a margin |
| **Contrastive Loss** | Similar to triplet, uses pairs |
| **ArcFace/CosFace** | Angular margin for better separation |
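
As a concrete reference point, triplet loss on a single (anchor, positive, negative) triple can be written in a few lines. This is a sketch using squared Euclidean distances; the 2-D vectors and the margin value are illustrative assumptions.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(d(a, p) - d(a, n) + margin, 0) with squared Euclidean distances."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return float(max(d_pos - d_neg + margin, 0.0))

anchor = np.array([1.0, 0.0])
positive = np.array([0.9, 0.1])  # same identity, close to the anchor
negative = np.array([0.0, 1.0])  # different identity, far from the anchor

print(triplet_loss(anchor, positive, negative))  # 0.0: already well separated
print(triplet_loss(anchor, negative, positive))  # > 0: roles swapped, so the triple is penalized
```

The loss is zero once the negative is farther from the anchor than the positive by at least the margin, so training effort concentrates on triples that are still confused.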

In [None]:
# ============================================================
# FACE RECOGNITION ALGORITHMS COMPARISON
# ============================================================
print("="*70)
print("FACE RECOGNITION ALGORITHMS")
print("="*70)

recognition_methods = pd.DataFrame({
    'Algorithm': ['Eigenfaces (PCA)', 'Fisherfaces (LDA)', 'LBPH', 'Deep Learning'],
    'Year': [1991, 1997, 2006, '2014+'],
    'Type': ['Statistical', 'Statistical', 'Texture-based', 'Neural Network'],
    'How_It_Works': [
        'PCA dimensionality reduction',
        'LDA maximizes class separation',
        'Local binary patterns + histograms',
        'CNN extracts 128-D embeddings'
    ],
    'Training_Needed': ['Medium', 'Medium', 'Low', 'Very High (pre-trained)'],
    'Accuracy': ['Low-Medium', 'Medium', 'Medium', 'Very High'],
    'Pros': [
        'Simple, interpretable',
        'Better class separation',
        'Works with few samples, fast',
        'State-of-the-art accuracy'
    ],
    'Cons': [
        'Sensitive to lighting/pose',
        'Needs multiple samples per class',
        'Not as accurate as DL',
        'Needs large training data'
    ]
})

print("\nFace Recognition Methods:")
for _, row in recognition_methods.iterrows():
    print(f"\n{row['Algorithm']} ({row['Year']})")
    print(f"  Type: {row['Type']}")
    print(f"  How: {row['How_It_Works']}")
    print(f"  Accuracy: {row['Accuracy']}")
    print(f"  Pros: {row['Pros']}")
    print(f"  Cons: {row['Cons']}")

---

<a id='part4'></a>
# Part 4: Face Embeddings & Similarity Metrics

---

## 4.1 What is a Face Embedding?

A face embedding is a **compact numerical representation** of a face.

```
Face Image (150x150x3)          Face Embedding (128-D)
     67,500 values                  128 values
┌─────────────────┐             ┌─────────────────────┐
│                 │    CNN      │                     │
│      :)         │  ────────>  │ [0.12, -0.34, 0.56, │
│                 │             │  0.23, -0.11, ...]  │
└─────────────────┘             └─────────────────────┘
```

### Properties of Good Embeddings:

| Property | Description |
|----------|-------------|
| **Compact** | Low-dimensional (typically 128-512) |
| **Discriminative** | Different people have different embeddings |
| **Robust** | Same person has similar embeddings despite variations |
| **Normalized** | Often unit length (L2 normalized) |

---

## 4.2 Similarity Metrics

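To compare two face embeddings, we use a **distance or similarity metric**. The thresholds below are rules of thumb: 0.6 is the usual default for dlib's 128-D embeddings, and other models need their own cut-off tuned on validation pairs.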

### Euclidean Distance:

$$d(a, b) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}$$

| Distance | Interpretation |
|----------|----------------|
| < 0.6 | Same person (typical threshold) |
| 0.6 - 1.0 | Uncertain |
| > 1.0 | Different people |

### Cosine Similarity:

$$\cos(a, b) = \frac{a \cdot b}{\|a\| \|b\|}$$

| Similarity | Interpretation |
|------------|----------------|
| > 0.7 | Same person |
| 0.5 - 0.7 | Uncertain |
| < 0.5 | Different people |
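
For L2-normalized embeddings the two metrics carry the same information, because $\|a - b\|^2 = 2 - 2\cos(a, b)$. The sketch below checks the identity numerically on random unit vectors; the helper `cosine_from_distance` is a hypothetical name introduced for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(128)
b = rng.standard_normal(128)
a /= np.linalg.norm(a)  # L2-normalize, as most embedding models do
b /= np.linalg.norm(b)

d = np.linalg.norm(a - b)
cos = float(a @ b)
print(np.isclose(d ** 2, 2 - 2 * cos))  # True

def cosine_from_distance(distance):
    """Cosine similarity implied by a Euclidean distance between unit vectors."""
    return 1 - distance ** 2 / 2

print(f"{cosine_from_distance(0.6):.2f}")  # 0.82
```

So on unit vectors a Euclidean threshold and a cosine threshold are interchangeable once converted. The identity does not hold for unnormalized embeddings.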

---

## 4.3 Threshold Selection

The **threshold** determines who is considered a match:

```
                    False Reject Rate (FRR)
                    ←───────────────────────
                         ┌─────┐
Same Person:        ████████   │
                    ████████   │
                         └─────┼─────┐
Different Person:              │█████████████
                               │█████████████
                    ───────────┴─────────────>
                           Threshold
                    ───────────────────────>
                    False Accept Rate (FAR)
```

| Threshold | FAR | FRR | Use Case |
|-----------|-----|-----|----------|
| **Strict (0.4)** | Very Low | High | High security |
| **Balanced (0.6)** | Low | Medium | General use |
| **Lenient (0.8)** | Medium | Low | Convenience |
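
The trade-off in the table can be measured directly when distances for known same-person and different-person pairs are available. The sketch below uses a handful of made-up distances purely for illustration; `far_frr` is a hypothetical helper, not part of any library.

```python
import numpy as np

def far_frr(genuine, impostor, threshold):
    """FAR and FRR for a 'distance < threshold means match' rule.

    genuine:  distances between pairs of the same person
    impostor: distances between pairs of different people
    """
    far = float(np.mean(impostor < threshold))   # different people accepted
    frr = float(np.mean(genuine >= threshold))   # same person rejected
    return far, frr

# Synthetic distances, for illustration only
genuine = np.array([0.30, 0.35, 0.45, 0.50, 0.70])
impostor = np.array([0.55, 0.75, 0.90, 1.00, 1.10])

for t in (0.4, 0.6, 0.8):
    far, frr = far_frr(genuine, impostor, t)
    print(f"threshold={t:.1f}  FAR={far:.2f}  FRR={frr:.2f}")
```

On this toy data, raising the threshold from 0.4 to 0.8 moves FAR from 0.00 to 0.40 while FRR falls from 0.60 to 0.00, which is the same direction of change the table describes.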

In [None]:
# ============================================================
# SIMILARITY METRICS IMPLEMENTATION
# ============================================================
print("="*70)
print("SIMILARITY METRICS")
print("="*70)

def euclidean_distance(a, b):
    """Compute Euclidean distance between two vectors."""
    return np.sqrt(np.sum((a - b) ** 2))

def cosine_similarity(a, b):
    """Compute cosine similarity between two vectors."""
    dot_product = np.dot(a, b)
    norm_a = np.linalg.norm(a)
    norm_b = np.linalg.norm(b)
    return dot_product / (norm_a * norm_b)

# Example with synthetic embeddings
print("\nExample: Comparing Face Embeddings")
print("-" * 50)

# Simulate embeddings
np.random.seed(42)
person_a_face1 = np.random.randn(128) * 0.1 + np.array([1] * 64 + [0] * 64)  # Person A
person_a_face2 = person_a_face1 + np.random.randn(128) * 0.05  # Same person, different photo
person_b_face = np.random.randn(128) * 0.1 + np.array([0] * 64 + [1] * 64)   # Person B

# Normalize (like real embeddings)
person_a_face1 = person_a_face1 / np.linalg.norm(person_a_face1)
person_a_face2 = person_a_face2 / np.linalg.norm(person_a_face2)
person_b_face = person_b_face / np.linalg.norm(person_b_face)

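# Compare, deriving the verdict from the threshold instead of hard-coding it
THRESHOLD = 0.6  # Euclidean cut-off from Section 4.2

def verdict(distance, threshold=THRESHOLD):
    """Label a pair as same/different based on its Euclidean distance."""
    if distance < threshold:
        return f"SAME PERSON (distance < {threshold})"
    return f"DIFFERENT PEOPLE (distance >= {threshold})"

d_same = euclidean_distance(person_a_face1, person_a_face2)
d_diff = euclidean_distance(person_a_face1, person_b_face)

print("\nComparisons:")
print("  Person A (face1) vs Person A (face2):")
print(f"    Euclidean: {d_same:.4f}")
print(f"    Cosine:    {cosine_similarity(person_a_face1, person_a_face2):.4f}")
print(f"    Result:    {verdict(d_same)}")

print("\n  Person A vs Person B:")
print(f"    Euclidean: {d_diff:.4f}")
print(f"    Cosine:    {cosine_similarity(person_a_face1, person_b_face):.4f}")
print(f"    Result:    {verdict(d_diff)}")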

---

<a id='part5'></a>
# Part 5: Dataset Loading & Preprocessing

---

## 5.1 LFW Dataset (Labeled Faces in the Wild)

| Attribute | Value |
|-----------|-------|
| **Kaggle Dataset** | `lfwpeople` |
| **Kaggle Path** | `/kaggle/input/lfwpeople/lfw_funneled` |
| **Total Images** | 13,233 |
| **People** | 5,749 |
| **Image Size** | 250x250 (original) |
| **Format** | JPEG |
| **Challenge** | Unconstrained (real-world conditions) |

In [None]:
# ============================================================
# LOAD LFW DATASET FROM KAGGLE
# ============================================================
print("="*70)
print("LOADING LFW DATASET")
print("="*70)

# ============================================================
# KAGGLE PATH CONFIGURATION
# ============================================================
# Dataset: https://www.kaggle.com/datasets/jessicali9530/lfw-dataset
#
# HOW TO ADD DATASET IN KAGGLE:
# 1. Click "Add data" button (top right of notebook)
# 2. Search for "lfw" or "lfwpeople"
# 3. Click "Add" to attach the dataset

# Check if running on Kaggle
USE_KAGGLE = os.path.exists('/kaggle/input')

if USE_KAGGLE:
    # Try multiple possible paths (dataset structure may vary)
    POSSIBLE_PATHS = [
        '/kaggle/input/lfwpeople/lfw_funneled',           # Primary path
        '/kaggle/input/lfwpeople',                         # Alternative
        '/kaggle/input/lfw-dataset/lfw-deepfunneled/lfw-deepfunneled',
        '/kaggle/input/lfw-dataset/lfw-deepfunneled',
        '/kaggle/input/lfw-dataset'
    ]
    DATASET_PATH = None
    for path in POSSIBLE_PATHS:
        if os.path.exists(path):
            # Check if it has person subdirectories
            subdirs = [d for d in os.listdir(path) if os.path.isdir(os.path.join(path, d))]
            if len(subdirs) > 10:  # LFW has many person folders
                DATASET_PATH = path
                break

    if DATASET_PATH is None:
        print("WARNING: LFW dataset not found!")
        print("Please add the dataset to your Kaggle notebook:")
        print("  1. Click 'Add data' button")
        print("  2. Search for 'lfwpeople' or 'lfw-dataset'")
        print("  3. Click 'Add' and re-run this cell")
        print("\nFalling back to sklearn...")
        USE_KAGGLE = False
else:
    DATASET_PATH = None

def load_lfw_from_kaggle(base_path, min_faces_per_person=20, image_size=(100, 100)):
    """
    Load LFW dataset from Kaggle directory structure.

    Parameters:
    - base_path: Path to lfw folder
    - min_faces_per_person: Minimum images per person to include
    - image_size: Resize images to this size

    Returns:
    - images: numpy array of face images
    - labels: numpy array of person names
    - label_names: list of unique person names
    """
    images = []
    labels = []

    # Get all person directories
    person_dirs = [d for d in os.listdir(base_path)
                   if os.path.isdir(os.path.join(base_path, d))]

    print(f"Found {len(person_dirs)} people in dataset")

    # Filter by minimum faces
    valid_persons = []
    for person in person_dirs:
        person_path = os.path.join(base_path, person)
        # Check for both .jpg and .png files
        n_images = len([f for f in os.listdir(person_path) 
                       if f.lower().endswith(('.jpg', '.jpeg', '.png'))])
        if n_images >= min_faces_per_person:
            valid_persons.append((person, n_images))

    print(f"People with >= {min_faces_per_person} images: {len(valid_persons)}")

    # Load images
    for person, _ in valid_persons:
        person_path = os.path.join(base_path, person)
        for img_file in os.listdir(person_path):
            if img_file.lower().endswith(('.jpg', '.jpeg', '.png')):
                img_path = os.path.join(person_path, img_file)

                # Load and preprocess
                img = cv2.imread(img_path)
                if img is not None:
                    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
                    img = cv2.resize(img, image_size)
                    images.append(img)
                    labels.append(person)

    images = np.array(images)
    labels = np.array(labels)
    label_names = sorted(set(labels))  # sorted so the order is reproducible across runs

    return images, labels, label_names

# Load dataset
if USE_KAGGLE and DATASET_PATH:
    try:
        print(f"\nLoading from: {DATASET_PATH}")
        images, labels, label_names = load_lfw_from_kaggle(
            DATASET_PATH,
            min_faces_per_person=20,
            image_size=(100, 100)
        )
        print(f"\nDataset loaded successfully!")
    except Exception as e:
        print(f"Error loading from Kaggle: {e}")
        print("Falling back to sklearn...")
        USE_KAGGLE = False

if not USE_KAGGLE or DATASET_PATH is None:
    # Import here so it's available when needed
    from sklearn.datasets import fetch_lfw_people
    print("\nLoading from sklearn (this may take a moment)...")
    lfw = fetch_lfw_people(min_faces_per_person=20, resize=0.4)
    images = lfw.images
    labels = np.array([lfw.target_names[i] for i in lfw.target])
    label_names = list(lfw.target_names)
    # Convert grayscale to RGB-like
    images = np.stack([images] * 3, axis=-1) if len(images.shape) == 3 else images

print(f"\n" + "="*50)
print("DATASET SUMMARY")
print("="*50)
print(f"Total images: {len(images)}")
print(f"Image shape: {images[0].shape}")
print(f"Number of people: {len(label_names)}")
print(f"\nImages per person:")
label_counts = Counter(labels)
for person, count in sorted(label_counts.items(), key=lambda x: -x[1])[:10]:
    print(f"  {person}: {count} images")

In [None]:
# Visualize sample faces
print("="*70)
print("SAMPLE FACES FROM DATASET")
print("="*70)

# Show samples from different people
unique_labels = list(set(labels))
n_show = min(10, len(unique_labels))

fig, axes = plt.subplots(2, 5, figsize=(15, 7))

for i, ax in enumerate(axes.flat):
    if i < n_show:
        person = unique_labels[i]
        # Get first image of this person
        idx = np.where(labels == person)[0][0]
        
        ax.imshow(images[idx])
        ax.set_title(person.replace('_', ' '), fontsize=10, fontweight='bold')
    ax.axis('off')  # applied to every panel so unused ones stay blank

plt.suptitle('Sample Faces from LFW Dataset', fontweight='bold', fontsize=14)
plt.tight_layout()
plt.show()

In [None]:
# ============================================================
# PREPROCESS DATA
# ============================================================
print("="*70)
print("DATA PREPROCESSING")
print("="*70)

# Encode labels
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(labels)
n_classes = len(label_encoder.classes_)

print(f"\nLabel encoding: {n_classes} classes")

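# Normalize images to [0, 1]. OpenCV loads uint8 values in 0-255, while
# fetch_lfw_people may already return floats in [0, 1] depending on the
# scikit-learn version, so only rescale when values exceed 1.
X = images.astype('float32')
if X.max() > 1.0:
    X /= 255.0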

# For classical methods, convert to grayscale and flatten
X_gray = np.array([cv2.cvtColor((img * 255).astype('uint8'), cv2.COLOR_RGB2GRAY) 
                   for img in X])
X_flat = X_gray.reshape(len(X_gray), -1)

print(f"\nData shapes:")
print(f"  Original (RGB):  {X.shape}")
print(f"  Grayscale:       {X_gray.shape}")
print(f"  Flattened:       {X_flat.shape}")

# Train/test split: split the flattened and grayscale arrays in one call so
# both share exactly the same samples (the grayscale images are for LBPH)
X_train, X_test, X_train_gray, X_test_gray, y_train, y_test = train_test_split(
    X_flat, X_gray, y, test_size=0.2, random_state=42, stratify=y
)

print(f"\nTrain/Test split:")
print(f"  Train: {len(X_train)} samples")
print(f"  Test:  {len(X_test)} samples")

---

<a id='part6'></a>
# Part 6: Face Detection Implementation

---

In [None]:
# ============================================================
# HAAR CASCADE FACE DETECTION
# ============================================================
print("="*70)
print("HAAR CASCADE FACE DETECTION")
print("="*70)

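# Load pre-trained Haar cascade and fail early if the XML file is missing
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
if face_cascade.empty():
    raise IOError("Could not load haarcascade_frontalface_default.xml")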

print("""
Haar Cascade Parameters:
========================

| Parameter      | Description                    | Our Value | Effect |
|----------------|--------------------------------|-----------|--------|
| scaleFactor    | Image pyramid scale            | 1.1       | Smaller = more accurate, slower |
| minNeighbors   | Detections needed to confirm   | 5         | Higher = fewer false positives |
| minSize        | Minimum face size              | (30, 30)  | Filter out small faces |
""")

def detect_faces_haar(image, scale_factor=1.1, min_neighbors=5, min_size=(30, 30)):
    """
    Detect faces using Haar Cascade.
    
    Parameters:
    - image: Input image (RGB or grayscale)
    - scale_factor: How much image size is reduced at each scale
    - min_neighbors: Minimum number of neighbor rectangles to retain
    - min_size: Minimum face size to detect
    
    Returns:
    - faces: List of (x, y, w, h) bounding boxes
    """
    # Convert to grayscale if needed
    if len(image.shape) == 3:
        gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
    else:
        gray = image
    
    # Detect faces
    faces = face_cascade.detectMultiScale(
        gray,
        scaleFactor=scale_factor,
        minNeighbors=min_neighbors,
        minSize=min_size
    )
    
    return faces

# Test on sample images
print("\nTesting face detection on sample images...")

fig, axes = plt.subplots(2, 4, figsize=(16, 8))

for i, ax in enumerate(axes.flat):
    # Convert normalized image back to uint8
    img = (X[i] * 255).astype('uint8')
    
    # Detect faces
    faces = detect_faces_haar(img)
    
    # Draw bounding boxes
    img_with_boxes = img.copy()
    for (x, y, w, h) in faces:
        cv2.rectangle(img_with_boxes, (x, y), (x+w, y+h), (0, 255, 0), 2)
    
    ax.imshow(img_with_boxes)
    ax.set_title(f'Detected: {len(faces)} face(s)', fontweight='bold')
    ax.axis('off')

plt.suptitle('Haar Cascade Face Detection Results', fontweight='bold', fontsize=14)
plt.tight_layout()
plt.show()

In [None]:
# ============================================================
# EFFECT OF HAAR CASCADE PARAMETERS
# ============================================================
print("="*70)
print("PARAMETER TUNING: EFFECT OF minNeighbors")
print("="*70)

# Test different minNeighbors values
test_img = (X[0] * 255).astype('uint8')
min_neighbors_values = [1, 3, 5, 10]

fig, axes = plt.subplots(1, 4, figsize=(16, 4))

for ax, min_n in zip(axes, min_neighbors_values):
    faces = detect_faces_haar(test_img, min_neighbors=min_n)
    
    img_copy = test_img.copy()
    for (x, y, w, h) in faces:
        cv2.rectangle(img_copy, (x, y), (x+w, y+h), (0, 255, 0), 2)
    
    ax.imshow(img_copy)
    ax.set_title(f'minNeighbors={min_n}\n({len(faces)} detections)', fontweight='bold')
    ax.axis('off')

plt.suptitle('Effect of minNeighbors Parameter', fontweight='bold', fontsize=14)
plt.tight_layout()
plt.show()

print("\nObservation:")
print("  - Low minNeighbors: More detections, more false positives")
print("  - High minNeighbors: Fewer detections, may miss faces")
print("  - Recommended: 3-5 for balanced results")

---

<a id='part7'></a>
# Part 7: Face Recognition Implementation

---

## 7.1 Eigenfaces (PCA-based Recognition)

In [None]:
# ============================================================
# EIGENFACES IMPLEMENTATION
# ============================================================
print("="*70)
print("EIGENFACES (PCA-BASED) FACE RECOGNITION")
print("="*70)

print("""
Eigenfaces Algorithm:
=====================
1. Flatten each face image into a vector
2. Compute mean face (average of all faces)
3. Apply PCA to find principal components (eigenfaces)
4. Project faces into eigenface space
5. Use classifier (KNN/SVM) on projections

Key Parameter: n_components (number of eigenfaces)
  - Too few: Loses important information
  - Too many: Keeps noise, slower
  - Typical: 50-150 components
""")

class EigenfaceRecognizer:
    """
    Face recognition using Eigenfaces (PCA + Classifier).
    """
    
    def __init__(self, n_components=100, classifier='svm'):
        """
        Initialize Eigenface recognizer.
        
        Parameters:
        - n_components: Number of eigenfaces to keep
        - classifier: 'svm' or 'knn'
        """
        self.n_components = n_components
        self.pca = PCA(n_components=n_components, whiten=True)
        
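        if classifier == 'svm':
            self.classifier = SVC(kernel='rbf', C=1.0, gamma='scale')
        elif classifier == 'knn':
            self.classifier = KNeighborsClassifier(n_neighbors=5)
        else:
            raise ValueError(f"Unknown classifier {classifier!r}; expected 'svm' or 'knn'")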
        
        self.mean_face = None
        self.eigenfaces = None
    
    def fit(self, X, y):
        """
        Train the recognizer.
        
        Parameters:
        - X: Training faces, shape (n_samples, n_pixels)
        - y: Labels
        """
        # Compute mean face
        self.mean_face = np.mean(X, axis=0)
        
        # Fit PCA
        X_pca = self.pca.fit_transform(X)
        
        # Store eigenfaces
        self.eigenfaces = self.pca.components_
        
        # Train classifier
        self.classifier.fit(X_pca, y)
        
        return self
    
    def predict(self, X):
        """
        Predict identities.
        
        Parameters:
        - X: Test faces, shape (n_samples, n_pixels)
        
        Returns:
        - Predicted labels
        """
        X_pca = self.pca.transform(X)
        return self.classifier.predict(X_pca)
    
    def get_explained_variance(self):
        """Get cumulative explained variance by components."""
        return np.cumsum(self.pca.explained_variance_ratio_)

# Train Eigenface recognizer
print("\nTraining Eigenface recognizer...")
print(f"  n_components: 100")
print(f"  Classifier: SVM (RBF kernel)")

eigenface_recognizer = EigenfaceRecognizer(n_components=100, classifier='svm')
eigenface_recognizer.fit(X_train, y_train)

# Evaluate
y_pred_eigen = eigenface_recognizer.predict(X_test)
accuracy_eigen = accuracy_score(y_test, y_pred_eigen)

print(f"\nEigenfaces Accuracy: {accuracy_eigen*100:.2f}%")
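
The five steps listed at the top of this cell can also be sketched without scikit-learn, using NumPy's SVD on mean-centered data. This is a minimal illustration on synthetic vectors, not the `EigenfaceRecognizer` trained above: the helper names (`fit_eigenfaces`, `project`, `predict_1nn`) are hypothetical, the "faces" are random prototypes plus noise, and a 1-nearest-neighbour rule stands in for the SVM.

```python
import numpy as np

def fit_eigenfaces(X, n_components):
    """Return (mean_face, eigenfaces) for flattened faces X of shape (n_samples, n_pixels)."""
    mean_face = X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance
    _, _, Vt = np.linalg.svd(X - mean_face, full_matrices=False)
    return mean_face, Vt[:n_components]

def project(X, mean_face, eigenfaces):
    """Project faces into eigenface space: one weight per eigenface."""
    return (X - mean_face) @ eigenfaces.T

def predict_1nn(train_weights, train_labels, test_weights):
    """Label each test face with the label of its nearest training face."""
    diffs = test_weights[:, None, :] - train_weights[None, :, :]
    distances = np.linalg.norm(diffs, axis=2)
    return train_labels[np.argmin(distances, axis=1)]

# Synthetic stand-in data (assumption): 3 "people", 10 noisy samples each
rng = np.random.default_rng(0)
n_people, per_person, n_pixels = 3, 10, 64
prototypes = rng.normal(size=(n_people, n_pixels)) * 5
X_demo = np.vstack([p + rng.normal(scale=0.5, size=(per_person, n_pixels))
                    for p in prototypes])
y_demo = np.repeat(np.arange(n_people), per_person)

# Even rows for training, odd rows for testing
train_mask = np.arange(len(X_demo)) % 2 == 0
mean_face, eigenfaces = fit_eigenfaces(X_demo[train_mask], n_components=5)
W_train = project(X_demo[train_mask], mean_face, eigenfaces)
W_test = project(X_demo[~train_mask], mean_face, eigenfaces)

y_pred_demo = predict_1nn(W_train, y_demo[train_mask], W_test)
accuracy_demo = np.mean(y_pred_demo == y_demo[~train_mask])
print(f"Eigenfaces shape: {eigenfaces.shape}, demo accuracy: {accuracy_demo:.2f}")
```

The class above delegates the same projection to `PCA` and adds whitening, so its numbers will differ from this sketch.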

In [None]:
# Visualize eigenfaces
print("="*70)
print("VISUALIZING EIGENFACES")
print("="*70)

fig, axes = plt.subplots(2, 5, figsize=(15, 7))

# Mean face
img_shape = X_gray[0].shape

axes[0, 0].imshow(eigenface_recognizer.mean_face.reshape(img_shape), cmap='gray')
axes[0, 0].set_title('Mean Face', fontweight='bold')
axes[0, 0].axis('off')

# First 9 eigenfaces
for i in range(1, 10):
    row, col = i // 5, i % 5
    eigenface = eigenface_recognizer.eigenfaces[i-1].reshape(img_shape)
    axes[row, col].imshow(eigenface, cmap='gray')
    axes[row, col].set_title(f'Eigenface {i}', fontweight='bold')
    axes[row, col].axis('off')

plt.suptitle('Mean Face and Top Eigenfaces', fontweight='bold', fontsize=14)
plt.tight_layout()
plt.show()

print("\nInterpretation:")
print("  - Early eigenfaces (e.g. 1-3): Often dominated by lighting variations")
print("  - Later eigenfaces: Tend to capture finer facial structure differences")
print("  - Each face = mean + weighted sum of eigenfaces")
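
The last line of the interpretation (each face = mean + weighted sum of eigenfaces) can be checked numerically. The sketch below uses random arrays as stand-in faces, so it shows only the arithmetic; `reconstruct` is a hypothetical helper and is not part of the recognizer above.

```python
import numpy as np

rng = np.random.default_rng(1)
faces_demo = rng.normal(size=(40, 100))   # 40 synthetic "faces", 100 pixels each
mean_demo = faces_demo.mean(axis=0)

# Eigenfaces = right singular vectors of the mean-centered data
_, _, Vt = np.linalg.svd(faces_demo - mean_demo, full_matrices=False)

def reconstruct(face, k):
    """Rebuild a face from the mean plus its first k eigenface weights."""
    eigenfaces_k = Vt[:k]
    weights = eigenfaces_k @ (face - mean_demo)   # one weight per eigenface
    return mean_demo + weights @ eigenfaces_k

# Reconstruction error shrinks as more eigenfaces are used
ks = (5, 20, 39)
errors = [np.linalg.norm(faces_demo[0] - reconstruct(faces_demo[0], k)) for k in ks]
for k, err in zip(ks, errors):
    print(f"k={k:2d}  reconstruction error = {err:.4f}")
```

With 40 mean-centered samples the data has rank at most 39, so 39 eigenfaces reproduce a training face up to floating-point error.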

In [None]:
# Plot explained variance
print("="*70)
print("PCA EXPLAINED VARIANCE")
print("="*70)

explained_var = eigenface_recognizer.get_explained_variance()

fig, ax = plt.subplots(figsize=(10, 5))

ax.plot(range(1, len(explained_var) + 1), explained_var * 100, 'b-', linewidth=2)
ax.axhline(y=95, color='r', linestyle='--', label='95% variance')
ax.axhline(y=90, color='orange', linestyle='--', label='90% variance')

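# Find the first component count reaching 90% and 95% (None if never reached)
def components_for(threshold):
    reached = explained_var >= threshold
    # np.argmax returns 0 when nothing is True, so check reached.any() first
    return int(np.argmax(reached)) + 1 if reached.any() else None

n_90 = components_for(0.90)
n_95 = components_for(0.95)

if n_90 is not None:
    ax.axvline(x=n_90, color='orange', linestyle=':', alpha=0.7)
if n_95 is not None:
    ax.axvline(x=n_95, color='r', linestyle=':', alpha=0.7)

ax.set_xlabel('Number of Components')
ax.set_ylabel('Cumulative Explained Variance (%)')
ax.set_title('PCA: Explained Variance vs Components', fontweight='bold', fontsize=14)
ax.legend()
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\nComponents needed:")
for pct, n in ((90, n_90), (95, n_95)):
    needed = f"{n} components" if n is not None else f"not reached with {len(explained_var)} components"
    print(f"  {pct}% variance: {needed}")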
print(f"\nUsing {eigenface_recognizer.n_components} components captures {explained_var[-1]*100:.1f}% variance")

## 7.2 LBPH (Local Binary Pattern Histogram) Recognition

In [None]:
# ============================================================
# LBPH FACE RECOGNITION
# ============================================================
print("="*70)
print("LBPH (LOCAL BINARY PATTERN HISTOGRAM) RECOGNITION")
print("="*70)

print("""
LBPH Algorithm:
===============
1. Divide face into grid of cells
2. For each pixel, compute LBP code (compare with neighbors)
3. Build histogram of LBP codes for each cell
4. Concatenate histograms into feature vector
5. Compare using histogram distance

Key Parameters:
  - radius: Radius for LBP (1, 2, or 3)
  - neighbors: Number of sampling points (8, 16, 24)
  - grid_x, grid_y: Number of cells in grid
""")

# Create LBPH recognizer
lbph_recognizer = cv2.face.LBPHFaceRecognizer_create(
    radius=1,        # Radius of circular LBP pattern
    neighbors=8,     # Number of neighbors to sample
    grid_x=8,        # Number of cells in X direction
    grid_y=8         # Number of cells in Y direction
)

print("\nLBPH Parameters:")
print(f"  radius: 1 (local pattern size)")
print(f"  neighbors: 8 (sampling points)")
print(f"  grid: 8x8 (spatial granularity)")

# Train LBPH
print("\nTraining LBPH recognizer...")
lbph_recognizer.train(X_train_gray, y_train)

# Evaluate
y_pred_lbph = []
for face in X_test_gray:
    label, confidence = lbph_recognizer.predict(face)
    y_pred_lbph.append(label)

y_pred_lbph = np.array(y_pred_lbph)
accuracy_lbph = accuracy_score(y_test, y_pred_lbph)

print(f"\nLBPH Accuracy: {accuracy_lbph*100:.2f}%")
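
Steps 1-5 of the LBPH description can be sketched in plain NumPy. This is a simplified illustration, not OpenCV's implementation: it uses the square 3x3 neighbourhood (radius 1, 8 neighbours) without circular interpolation, and the helper names `lbp_codes`, `lbph_descriptor`, and `chi_square_distance` are hypothetical. Chi-square is used here as one common histogram distance.

```python
import numpy as np

def lbp_codes(img):
    """8-neighbour LBP code (0-255) for every interior pixel of a 2-D image."""
    img = np.asarray(img, dtype=np.int32)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    # Neighbour offsets (dy, dx), clockwise from the top-left corner
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set this bit where the neighbour is at least as bright as the center
        codes |= (neighbour >= center).astype(np.int32) << bit
    return codes

def lbph_descriptor(img, grid_x=8, grid_y=8):
    """Concatenate normalized 256-bin LBP histograms over a grid of cells."""
    codes = lbp_codes(img)
    hists = []
    for row_block in np.array_split(codes, grid_y, axis=0):
        for cell in np.array_split(row_block, grid_x, axis=1):
            hist = np.bincount(cell.ravel(), minlength=256).astype(np.float64)
            hists.append(hist / max(hist.sum(), 1.0))
    return np.concatenate(hists)

def chi_square_distance(h1, h2, eps=1e-10):
    """Chi-square histogram distance (lower means more similar)."""
    return float(np.sum((h1 - h2) ** 2 / (h1 + h2 + eps)))

# Synthetic stand-in images (assumption): random textures, not real faces
rng = np.random.default_rng(0)
img_a = rng.integers(0, 200, size=(66, 66))
img_a_brighter = img_a + 20                      # same texture, uniformly brighter
img_b = rng.integers(0, 200, size=(66, 66))      # unrelated texture

desc_a = lbph_descriptor(img_a)
desc_a_brighter = lbph_descriptor(img_a_brighter)
desc_b = lbph_descriptor(img_b)

print("Descriptor length:", desc_a.shape[0])
print("Distance to brighter copy:", chi_square_distance(desc_a, desc_a_brighter))
print("Distance to other image:  ", chi_square_distance(desc_a, desc_b))
```

Because each LBP code depends only on comparisons with the center pixel, adding a constant brightness offset leaves the descriptor unchanged, which is one reason LBPH tolerates uniform lighting changes.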

In [None]:
# ============================================================
# COMPARE DIFFERENT LBPH PARAMETERS
# ============================================================
print("="*70)
print("LBPH PARAMETER TUNING")
print("="*70)

# Test different parameters
param_combinations = [
    {'radius': 1, 'neighbors': 8, 'grid_x': 8, 'grid_y': 8},
    {'radius': 2, 'neighbors': 8, 'grid_x': 8, 'grid_y': 8},
    {'radius': 1, 'neighbors': 16, 'grid_x': 8, 'grid_y': 8},
    {'radius': 1, 'neighbors': 8, 'grid_x': 4, 'grid_y': 4},
    {'radius': 1, 'neighbors': 8, 'grid_x': 12, 'grid_y': 12},
]

results = []
for params in param_combinations:
    lbph = cv2.face.LBPHFaceRecognizer_create(**params)
    lbph.train(X_train_gray, y_train)
    
    y_pred = [lbph.predict(face)[0] for face in X_test_gray]
    acc = accuracy_score(y_test, y_pred)
    
    results.append({
        'radius': params['radius'],
        'neighbors': params['neighbors'],
        'grid': f"{params['grid_x']}x{params['grid_y']}",
        'accuracy': acc
    })

results_df = pd.DataFrame(results)
results_df['accuracy'] = results_df['accuracy'] * 100

print("\nLBPH Parameter Comparison:")
print(results_df.to_string(index=False))

best_idx = results_df['accuracy'].idxmax()
print(f"\nBest parameters: {param_combinations[best_idx]}")
print(f"Best accuracy: {results_df.loc[best_idx, 'accuracy']:.2f}%")

---

<a id='part8'></a>
# Part 8: Complete Recognition Pipeline

---

In [None]:
# ============================================================
# COMPLETE FACE RECOGNITION PIPELINE
# ============================================================
print("="*70)
print("COMPLETE FACE RECOGNITION PIPELINE")
print("="*70)

class FaceRecognitionPipeline:
    """
    Complete face recognition pipeline:
    Detection -> Preprocessing -> Recognition
    """
    
    def __init__(self, recognizer_type='eigenfaces', n_components=100):
        """
        Initialize pipeline.
        
        Parameters:
        - recognizer_type: 'eigenfaces' or 'lbph'
        - n_components: For eigenfaces
        """
        # Face detector (Haar cascade)
        self.face_cascade = cv2.CascadeClassifier(
            cv2.data.haarcascades + 'haarcascade_frontalface_default.xml'
        )
        
        # Face recognizer
        self.recognizer_type = recognizer_type
        if recognizer_type == 'eigenfaces':
            self.recognizer = EigenfaceRecognizer(n_components=n_components)
        else:
            self.recognizer = cv2.face.LBPHFaceRecognizer_create()
        
        self.label_encoder = None
        self.target_size = (100, 100)
        self.is_trained = False
    
    def detect_face(self, image):
        """
        Detect and extract the largest face from an image.
        Returns (face, bbox), or (None, None) if no face is found.
        """
        gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY) if len(image.shape) == 3 else image
        
        faces = self.face_cascade.detectMultiScale(
            gray, scaleFactor=1.1, minNeighbors=5, minSize=(30, 30)
        )
        
        if len(faces) == 0:
            return None, None
        
        # Take largest face
        x, y, w, h = max(faces, key=lambda f: f[2] * f[3])
        face = gray[y:y+h, x:x+w]
        face = cv2.resize(face, self.target_size)
        
        return face, (x, y, w, h)
    
    def train(self, images, labels):
        """
        Train the recognizer.
        
        Parameters:
        - images: List of face images (already cropped)
        - labels: List of person names
        """
        # Encode labels
        self.label_encoder = LabelEncoder()
        y = self.label_encoder.fit_transform(labels)
        
        # Preprocess images
        X_processed = []
        for img in images:
            if len(img.shape) == 3:
                img = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
            img = cv2.resize(img, self.target_size)
            X_processed.append(img)
        X_processed = np.array(X_processed)
        
        # Train recognizer
        if self.recognizer_type == 'eigenfaces':
            X_flat = X_processed.reshape(len(X_processed), -1)
            self.recognizer.fit(X_flat, y)
        else:
            self.recognizer.train(X_processed, y)
        
        self.is_trained = True
        return self
    
    def recognize(self, image, return_confidence=False):
        """
        Recognize person in image.
        
        Returns:
        - name: Predicted person name
        - bbox: Face bounding box
        - confidence: (optional) Recognition confidence
        """
        if not self.is_trained:
            raise ValueError("Pipeline not trained! Call train() first.")
        
        # Detect face
        face, bbox = self.detect_face(image)
        
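        if face is None:
            # Keep the return shape consistent with the return_confidence flag
            if return_confidence:
                return "No face detected", None, 0.0
            return "No face detected", None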
        
        # Recognize
        if self.recognizer_type == 'eigenfaces':
            face_flat = face.flatten().reshape(1, -1)
            pred = self.recognizer.predict(face_flat)[0]
            confidence = 1.0  # Placeholder: this classifier exposes no confidence score
        else:
            pred, confidence = self.recognizer.predict(face)
            # LBPH returns a distance (lower = better match); heuristic mapping to a 0-1 scale
            confidence = max(0, 100 - confidence) / 100
        
        name = self.label_encoder.inverse_transform([pred])[0]
        
        if return_confidence:
            return name, bbox, confidence
        return name, bbox

# Create and train pipeline
print("\nCreating Face Recognition Pipeline...")
pipeline = FaceRecognitionPipeline(recognizer_type='lbph')

# Train on grayscale images
print("Training pipeline...")
pipeline.train(X_gray, labels)

print("\nPipeline ready!")
print(f"  Recognizer: LBPH")
print(f"  Trained on: {len(X_gray)} images")
print(f"  Classes: {len(set(labels))} people")

In [None]:
# Test the pipeline
print("="*70)
print("TESTING RECOGNITION PIPELINE")
print("="*70)

# Test on some images
n_test = 12
test_indices = np.random.choice(len(X), n_test, replace=False)

fig, axes = plt.subplots(3, 4, figsize=(16, 12))

correct = 0
for i, ax in enumerate(axes.flat):
    idx = test_indices[i]
    img = (X[idx] * 255).astype('uint8')
    true_name = labels[idx]
    
    # Recognize
    pred_name, bbox, conf = pipeline.recognize(img, return_confidence=True)
    
    # Compare names with underscores normalized
    is_correct = pred_name.replace('_', ' ') == true_name.replace('_', ' ')
    if is_correct:
        correct += 1
    
    # Draw result (green box = correct, red box = wrong)
    img_display = img.copy()
    if bbox is not None:
        x, y, w, h = bbox
        box_color = (0, 255, 0) if is_correct else (255, 0, 0)
        cv2.rectangle(img_display, (x, y), (x+w, y+h), box_color, 2)
    
    ax.imshow(img_display)
    
    color = 'green' if is_correct else 'red'
    ax.set_title(f"True: {true_name.replace('_', ' ')}\nPred: {pred_name.replace('_', ' ')}", 
                 fontsize=9, color=color, fontweight='bold')
    ax.axis('off')

plt.suptitle(f'Face Recognition Results ({correct}/{n_test} correct)', 
             fontweight='bold', fontsize=14)
plt.tight_layout()
plt.show()

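# Note: the pipeline was trained on all of X_gray, so these samples are not a held-out test set
print(f"\nAccuracy on {n_test} sampled images: {correct}/{n_test} = {correct/n_test*100:.1f}%")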
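
The images sampled in this cell come from the same data the pipeline was trained on, so the score is optimistic. One way to get a held-out estimate is to split indices per person before training. The sketch below is a minimal NumPy version under that assumption; `stratified_holdout` is a hypothetical helper and the label list is made up for the demo.

```python
import numpy as np

def stratified_holdout(labels, test_fraction=0.25, seed=42):
    """Split indices per person so every identity keeps at least one training image."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for person in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == person))
        # Aim for test_fraction of the images, but never take the last one
        n_test = min(len(idx) - 1, max(1, int(round(len(idx) * test_fraction))))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.array(sorted(train_idx)), np.array(sorted(test_idx))

# Made-up labels: 8 images of one person, 4 of another, 1 of a third
demo_labels = ["Ann"] * 8 + ["Bob"] * 4 + ["Cy"] * 1
train_idx, test_idx = stratified_holdout(demo_labels)

print("Train indices:", train_idx)
print("Test indices: ", test_idx)
```

With such a split, the pipeline would be trained on the images at `train_idx` only and the visual check above would draw from `test_idx`.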

---

<a id='part9'></a>
# Part 9: Evaluation & Results

---

In [None]:
# ============================================================
# COMPREHENSIVE EVALUATION
# ============================================================
print("="*70)
print("MODEL COMPARISON")
print("="*70)

# Compare methods
comparison_results = pd.DataFrame({
    'Method': ['Eigenfaces (PCA + SVM)', 'LBPH'],
    'Accuracy': [accuracy_eigen * 100, accuracy_lbph * 100],
    'Training': ['Requires PCA + SVM training', 'Builds histograms per person'],
    'Speed': ['Medium', 'Fast'],
    'Min Samples': ['Many (for good PCA)', 'Few (even 1 per person)'],
})

print("\nMethod Comparison:")
print(comparison_results.to_string(index=False))

# Bar chart
fig, ax = plt.subplots(figsize=(10, 6))

methods = comparison_results['Method']
accuracies = comparison_results['Accuracy']

bars = ax.bar(methods, accuracies, color=['steelblue', 'coral'], edgecolor='black')

for bar, acc in zip(bars, accuracies):
    ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 1,
            f'{acc:.1f}%', ha='center', fontweight='bold', fontsize=12)

ax.set_ylabel('Accuracy (%)')
ax.set_title('Face Recognition Method Comparison', fontweight='bold', fontsize=14)
ax.set_ylim(0, 100)
ax.axhline(y=90, color='green', linestyle='--', alpha=0.5, label='90% reference line')
ax.legend()

plt.tight_layout()
plt.show()

In [None]:
# Confusion matrix for best model
print("="*70)
print("CONFUSION MATRIX (TOP CLASSES)")
print("="*70)

# Get predictions
best_pred = y_pred_lbph if accuracy_lbph > accuracy_eigen else y_pred_eigen
best_name = "LBPH" if accuracy_lbph > accuracy_eigen else "Eigenfaces"

# Get top N classes by frequency
n_top = 10
top_classes = [label_encoder.transform([c])[0] for c in 
               [x[0] for x in Counter(labels).most_common(n_top)]]

# Filter to top classes
mask = np.isin(y_test, top_classes)
y_test_top = y_test[mask]
best_pred_top = best_pred[mask]

# Confusion matrix
cm = confusion_matrix(y_test_top, best_pred_top, labels=top_classes)

fig, ax = plt.subplots(figsize=(12, 10))

top_names = [label_encoder.inverse_transform([c])[0].replace('_', ' ')[:15] 
             for c in top_classes]

sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', ax=ax,
            xticklabels=top_names, yticklabels=top_names)
ax.set_xlabel('Predicted')
ax.set_ylabel('Actual')
ax.set_title(f'Confusion Matrix - {best_name} (Top {n_top} Classes)', fontweight='bold')

plt.tight_layout()
plt.show()
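
In the heatmap, rows are actual identities and columns are predictions, so per-person recall and precision follow from row and column sums. The sketch below uses a made-up 3-class matrix (illustrative numbers, not results from this notebook), and `per_class_metrics` is a hypothetical helper.

```python
import numpy as np

def per_class_metrics(cm):
    """Per-class precision and recall from a confusion matrix (rows = actual)."""
    cm = np.asarray(cm, dtype=np.float64)
    true_pos = np.diag(cm)
    actual_totals = cm.sum(axis=1)      # row sums: images that belong to each class
    predicted_totals = cm.sum(axis=0)   # column sums: images assigned to each class
    recall = np.divide(true_pos, actual_totals,
                       out=np.zeros_like(true_pos), where=actual_totals > 0)
    precision = np.divide(true_pos, predicted_totals,
                          out=np.zeros_like(true_pos), where=predicted_totals > 0)
    return precision, recall

cm_demo = np.array([[8, 1, 1],
                    [2, 6, 0],
                    [0, 0, 4]])
precision_demo, recall_demo = per_class_metrics(cm_demo)

for i, (p, r) in enumerate(zip(precision_demo, recall_demo)):
    print(f"class {i}: precision={p:.2f}  recall={r:.2f}")
```

Because the matrix in this cell is restricted to the most frequent classes, numbers computed from it should be read as indicative only.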

---

<a id='part10'></a>
# Part 10: Summary & Key Takeaways

---

In [None]:
# Final summary
print("="*70)
print("FACE RECOGNITION SYSTEM - SUMMARY")
print("="*70)

print("""
WHAT WE LEARNED:
================

1. FACE DETECTION vs RECOGNITION
   - Detection: WHERE are faces? (bounding boxes)
   - Recognition: WHO is this? (identity)

2. DETECTION METHODS:
   ┌─────────────┬──────────┬──────────┬─────────────┐
   │ Method      │ Type     │ Speed    │ Accuracy    │
   ├─────────────┼──────────┼──────────┼─────────────┤
   │ Haar Cascade│ Classical│ Very Fast│ Medium      │
   │ HOG + SVM   │ ML       │ Fast     │ Good        │
   │ MTCNN       │ DL       │ Medium   │ Very Good   │
   └─────────────┴──────────┴──────────┴─────────────┘

3. RECOGNITION METHODS:
   ┌───────────────┬──────────────────┬─────────────┐
   │ Method        │ How It Works     │ Best For    │
   ├───────────────┼──────────────────┼─────────────┤
   │ Eigenfaces    │ PCA projection   │ Learning    │
   │ LBPH          │ Texture patterns │ Few samples │
   │ Deep Learning │ CNN embeddings   │ Production  │
   └───────────────┴──────────────────┴─────────────┘

4. KEY PARAMETERS TO TUNE:
   - Haar: scaleFactor, minNeighbors, minSize
   - Eigenfaces: n_components (typically 50-150)
   - LBPH: radius, neighbors, grid_x, grid_y

5. FACE EMBEDDINGS:
   - Convert face to 128-D vector
   - Compare using Euclidean or cosine distance
   - Threshold determines match (e.g. 0.6 Euclidean distance for dlib-style
     128-D embeddings; the right value depends on the model and metric)
""")

print(f"\nRESULTS ON LFW DATASET:")
print(f"  Eigenfaces: {accuracy_eigen*100:.2f}%")
print(f"  LBPH:       {accuracy_lbph*100:.2f}%")

print("\n" + "="*70)
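
Point 5 of the summary can be made concrete with a few lines of NumPy. The vectors below are random stand-ins for 128-D embeddings (no real model is involved), the helper names are hypothetical, and the 0.6 threshold is simply the value quoted in the summary; a real system would tune it for its own embedding model.

```python
import numpy as np

def euclidean_distance(a, b):
    """Straight-line distance between two embeddings (lower = more similar)."""
    return float(np.linalg.norm(a - b))

def cosine_similarity(a, b):
    """Cosine of the angle between two embeddings (higher = more similar)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_match(a, b, threshold=0.6):
    """Declare a match when the Euclidean distance falls below the threshold."""
    return euclidean_distance(a, b) < threshold

def unit(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)

# Synthetic embeddings (assumption): a reference, a slightly perturbed copy, an unrelated vector
rng = np.random.default_rng(0)
anchor = unit(rng.normal(size=128))
same_person = unit(anchor + 0.02 * rng.normal(size=128))
other_person = unit(rng.normal(size=128))

for name, emb in (("same person", same_person), ("other person", other_person)):
    d = euclidean_distance(anchor, emb)
    c = cosine_similarity(anchor, emb)
    print(f"{name:12s}  distance={d:.3f}  cosine={c:.3f}  match={is_match(anchor, emb)}")
```

For unit-length embeddings the two metrics carry the same information, since the squared Euclidean distance equals 2 - 2 * cosine similarity.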

## Algorithm Taxonomy

### Face Detection Methods

| Method | Year | Type | Key Idea | Parameters |
|--------|------|------|----------|------------|
| **Haar Cascades** | 2001 | Classical | Haar features + AdaBoost cascade | scaleFactor, minNeighbors |
| **HOG + SVM** | 2005 | ML | Gradient histograms + linear SVM | cell_size, block_size |
| **MTCNN** | 2016 | Deep Learning | 3-stage CNN cascade | confidence threshold |

### Face Recognition Methods

| Method | Year | Type | Key Idea | Parameters |
|--------|------|------|----------|------------|
| **Eigenfaces** | 1991 | Statistical | PCA dimensionality reduction | n_components |
| **Fisherfaces** | 1997 | Statistical | LDA class separation | n_components |
| **LBPH** | 2006 | Texture | Local binary patterns | radius, neighbors, grid |
| **FaceNet** | 2015 | Deep Learning | Triplet loss embeddings | embedding_size |
| **ArcFace** | 2019 | Deep Learning | Angular margin loss | margin, scale |

---

## Checklist

- [x] Understood Face Detection vs Recognition
- [x] Learned Haar Cascade parameters and tuning
- [x] Implemented Eigenfaces (scikit-learn PCA + SVM classifier)
- [x] Implemented LBPH recognition
- [x] Understood face embeddings and similarity
- [x] Built complete recognition pipeline
- [x] Evaluated and compared methods
- [x] Learned parameter tuning for each algorithm

---

## Next Steps

| Step | What to Learn |
|------|---------------|
| 1 | Try deep learning with `face_recognition` library |
| 2 | Implement real-time video recognition |
| 3 | Add face alignment for better accuracy |
| 4 | Deploy as web application |

---

**End of Face Recognition Tutorial**