# Improved Face Recognition with DeepFace (ArcFace Model)
This notebook provides an improved face recognition solution using the DeepFace library, which integrates state-of-the-art models like ArcFace. This approach is designed to be more robust against facial changes (e.g., beard growth, skin tone variations, eyebrow modifications, nose alterations) and to better distinguish between identical twins, compared to simpler methods like YOLO-based classification.
**Note:** Achieving 100% accuracy in real-world face recognition, especially with identical twins and significant facial alterations, is an extremely challenging, if not practically impossible, goal. While state-of-the-art models like ArcFace offer very high accuracy, they are still subject to limitations based on data quality, environmental factors, and the inherent similarities between individuals.

## 1. Setup and Installation
Run the following cell to install the necessary libraries. This may take a few minutes.

In [None]:
!pip install deepface rich numpy opencv-python split-folders

## 2. Mount Google Drive (Optional)
If your dataset is stored in Google Drive, run the following cell to mount your Drive. Otherwise, you can upload your dataset directly to the Colab environment.

In [None]:
from google.colab import drive
import os

if not os.path.exists("/content/drive"):
    drive.mount("/content/drive")
else:
    print("Google Drive is already mounted.")

## 3. Configuration and Data Preparation
**Important:** You need to prepare your dataset. The `DATA_DIR` should point to a directory containing subfolders, where each subfolder represents a unique identity (person), and contains images of that person.
Example directory structure:
```
your_dataset_folder/
  Person_A/
    image1.jpg
    image2.png
  Person_B/
    image3.jpg
    image4.jpeg
```
Update `DATA_DIR` and `OUT_DIR` below to match your setup.
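Before splitting, it can help to confirm that the dataset follows the layout described above. The following is a minimal sketch, not part of the original notebook: `summarize_dataset` and `IMAGE_EXTENSIONS` are hypothetical names, and the extension list is an assumption about which file types count as images.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your data.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def summarize_dataset(data_dir):
    """Return {identity_name: image_count} for each subfolder of data_dir."""
    root = Path(data_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"Dataset directory not found: {root}")
    counts = {}
    for person_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Count only files whose extension looks like an image.
        counts[person_dir.name] = sum(
            1
            for f in person_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts
```

Once `DATA_DIR` is set, calling `summarize_dataset(DATA_DIR)` and looking for identities with very few images is a quick sanity check: with an 80/10/10 split, a folder holding only a handful of images may leave the validation or test split without any image for that person.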

In [None]:
import os  # Needed below even if the optional Google Drive cell was skipped
import random
import shutil
from pathlib import Path
import numpy as np
from rich import print as rprint
import splitfolders

# --- User Configuration ---
# Path to your dataset directory. Replace with your actual path.
# Example: `DATA_DIR = '/content/drive/MyDrive/your_face_dataset'` if using Google Drive
DATA_DIR = './your_dataset_folder' # <--- IMPORTANT: CHANGE THIS PATH

# Output directory for processed data (e.g., train/val/test splits)
OUT_DIR = './face_recognition_data_split'

# DeepFace Model Configuration
# Recommended models for high accuracy: "ArcFace", "Facenet512"
# Detector backends: "opencv", "ssd", "dlib", "mtcnn", "retinaface", "mediapipe"
# "retinaface" is generally robust.
FACE_RECOGNITION_MODEL = "ArcFace"
DETECTOR_BACKEND = "retinaface"

# Set seeds for reproducibility
SEED = 42
random.seed(SEED)
np.random.seed(SEED)

rprint("[bold red]DATA PREPARATION AND CONFIGURATION[/bold red]")

# Data splitting (Train/Validation/Test)
if os.path.exists(OUT_DIR):
    shutil.rmtree(OUT_DIR)

rprint("[bold cyan]Creating dataset split (80/10/10 for train/val/test)...[/bold cyan]")
try:
    splitfolders.ratio(
        DATA_DIR,
        output=OUT_DIR,
        seed=SEED,
        ratio=(0.8, 0.1, 0.1),  # Train/Val/Test split
        group_prefix=None,
        move=False,
    )
    rprint("[bold green]Dataset split created successfully![/bold green]")
except Exception as e:
    rprint(f"[red]Error creating dataset split: {str(e)}. Please ensure DATA_DIR is correct and contains subfolders with images.[/red]")

# Dataset statistics
for split in ["train", "val", "test"]:
    split_dir = os.path.join(OUT_DIR, split)
    if os.path.exists(split_dir):
        classes = [p for p in os.listdir(split_dir) if os.path.isdir(os.path.join(split_dir, p))]
        total_imgs = sum(len(os.listdir(os.path.join(split_dir, cls))) for cls in classes)
        rprint(f"{split.upper()}: {len(classes)} classes, {total_imgs:,} images")



## 4. Face Recognition Implementation with DeepFace
This section defines the function to perform face recognition using DeepFace and provides a conceptual example of how to use it. In a real application, you would iterate through your test images and compare them against your training (or a separate enrollment) dataset.
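
DeepFace compares faces through the distance between their embedding vectors, and a smaller distance means a closer match. As a rough illustration of the cosine distance that `DeepFace.find` reports, here is a minimal pure-Python sketch. `cosine_distance` is a hypothetical helper written for this explanation, not a DeepFace function, and the short vectors stand in for real embeddings.

```python
import math


def cosine_distance(a, b):
    """Cosine distance: 1 - (a . b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("Cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


# Toy vectors (assumed values): one pair points the same way, one is orthogonal.
same_direction = cosine_distance([3.0, 4.0], [6.0, 8.0])
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])
print(f"same direction: {same_direction:.4f}, orthogonal: {orthogonal:.4f}")
```

Vectors pointing in the same direction give a distance near 0 regardless of their length, while unrelated (orthogonal) vectors give a distance of 1.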

In [None]:
from deepface import DeepFace

# Function to perform face recognition
def recognize_faces(img_path, db_path):
    try:
        # DeepFace.find returns a list of dataframes, one for each detected face in the img_path
        # We are interested in the identity and distance.
        dfs = DeepFace.find(
            img_path=img_path,
            db_path=db_path,
            model_name=FACE_RECOGNITION_MODEL,
            detector_backend=DETECTOR_BACKEND,
            distance_metric="cosine", # Cosine distance (1 - cosine similarity) is common for ArcFace
            enforce_detection=False # False: continue when no face is detected; True: raise an error instead
        )
        return dfs
    except Exception as e:
        rprint(f"[red]Error during face recognition for {img_path}: {str(e)}[/red]")
        return []

rprint("[bold yellow]DEMONSTRATING FACE RECOGNITION WITH DEEPFACE:[/bold yellow]")

# --- Conceptual Usage Example ---
# This section demonstrates how you would use the DeepFace library.
# Replace `test_image_path` with an actual image from your test set or a new image.
# The `db_path` should be your training data directory (or a dedicated enrollment database).

# Example: Get a sample image from the test set for demonstration
test_data_path = os.path.join(OUT_DIR, "test")
test_image_path = None
if os.path.exists(test_data_path) and os.listdir(test_data_path):
    sample_person_dir = os.path.join(test_data_path, os.listdir(test_data_path)[0])
    if os.path.exists(sample_person_dir) and os.listdir(sample_person_dir):
        test_image_path = os.path.join(sample_person_dir, os.listdir(sample_person_dir)[0])

if test_image_path and os.path.exists(test_image_path):
    rprint(f"[cyan]Attempting to recognize faces in: {test_image_path}[/cyan]")
    # The db_path is the directory containing subfolders of known identities (your training data)
    results = recognize_faces(test_image_path, os.path.join(OUT_DIR, "train"))

    if results:
        rprint("[bold green]Recognition Results:[/bold green]")
        for df in results:
            if not df.empty:
                # `identity` column contains the path to the matched image in the database.
                # We can extract the person's name from the path.
                matched_identity = Path(df['identity'].iloc[0]).parent.name
                distance = df['distance'].iloc[0]
                rprint(f"  - Detected face matched with: [green]{matched_identity}[/green] (Distance: {distance:.4f})")
            else:
                rprint("  - No match found for a detected face.")
    else:
        rprint("[yellow]No faces detected or no matches found.[/yellow]")
else:
    rprint("[red]No test image found for demonstration. Please ensure your DATA_DIR is populated and the dataset split is successful.[/red]")
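
To go from a single demonstration to the full evaluation described at the start of this section, one option is to loop over every test image and count top-1 matches. The sketch below is an assumption about how that could look, not part of the original notebook: `evaluate_top1` and `predict_fn` are hypothetical names, and `predict_fn` is any callable that maps an image path to a predicted identity name (or `None` when nothing matches).

```python
from pathlib import Path


def evaluate_top1(test_dir, predict_fn):
    """Return (correct, total, accuracy) over test_dir/<identity>/<image>."""
    correct = 0
    total = 0
    for person_dir in sorted(p for p in Path(test_dir).iterdir() if p.is_dir()):
        for img_path in sorted(f for f in person_dir.iterdir() if f.is_file()):
            predicted = predict_fn(str(img_path))
            total += 1
            # The folder name is treated as the ground-truth identity.
            if predicted == person_dir.name:
                correct += 1
    accuracy = correct / total if total else 0.0
    return correct, total, accuracy
```

In this notebook, `predict_fn` could wrap `recognize_faces`, returning `Path(df['identity'].iloc[0]).parent.name` for the first non-empty DataFrame and `None` otherwise. Images with no match count as errors here, which is only one possible convention; a stricter evaluation might track them separately.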


## 5. Further Improvements and Considerations
To achieve the highest possible accuracy, especially with challenging cases like identical twins and significant facial changes, consider the following:
*   **Large and Diverse Dataset:** A very large and diverse dataset covering various facial changes (e.g., different hairstyles, presence/absence of facial hair, aging) and numerous identical twin pairs is crucial for training and evaluating robust models.
*   **Fine-tuning Pre-trained Models:** Fine-tuning pre-trained models (like ArcFace) on your specific dataset can significantly improve performance for your particular use case.
*   **Active Learning/Human-in-the-Loop:** For critical applications, consider incorporating active learning or human-in-the-loop processes to handle ambiguous cases or verify challenging identifications (e.g., twin verification).
*   **Real-time Optimization:** For real-time applications, optimize model inference speed using techniques like ONNX Runtime or NVIDIA TensorRT.
*   **Robust Preprocessing:** Implement robust face detection and alignment as a preprocessing step to ensure consistent input to the recognition model.
*   **Advanced Data Augmentation:** Explore advanced data augmentation techniques that specifically simulate facial changes (e.g., adding synthetic beards, changing skin tones, altering hairstyles) to make the model more invariant to these variations.
*   **Multi-modal Approaches:** For twin recognition, consider multi-modal approaches that combine face recognition with other biometric modalities like voice recognition or gait analysis.
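
The human-in-the-loop idea above can be sketched as a simple routing rule: accept a match automatically only when the best candidate is clearly closer than the nearest candidate from a different identity, and send everything else to manual review. This is an illustrative sketch only. `route_match` is a hypothetical helper, and the `max_distance` and `min_margin` defaults are placeholder values that would need tuning on your own data.

```python
def route_match(candidates, max_distance=0.5, min_margin=0.05):
    """Decide what to do with a list of (identity, distance) pairs.

    Returns ("accept", identity), ("review", identity) or ("reject", None).
    """
    if not candidates:
        return ("reject", None)
    ranked = sorted(candidates, key=lambda c: c[1])
    best_identity, best_distance = ranked[0]
    if best_distance > max_distance:
        # Even the closest candidate is too far away to trust.
        return ("reject", None)
    # Closest candidate that belongs to a different identity, if any.
    rival = next((c for c in ranked[1:] if c[0] != best_identity), None)
    if rival is not None and rival[1] - best_distance < min_margin:
        # Two identities are nearly tied (for example, twins): ask a human.
        return ("review", best_identity)
    return ("accept", best_identity)
```

For example, `route_match([("Ann", 0.20), ("Bea", 0.22)])` is routed to review because the two identities are separated by less than the margin, whereas a lone close match is accepted.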