[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Infineon-X/multi-face-rec/blob/main/train.ipynb)

`train.ipynb` is compatible with Google Colab.

# Face Training Notebook

This notebook mirrors the project's `train.py` and adds step-by-step explanations. It incrementally builds and updates a face encodings database (`encodings.pkl`) from images stored under the `training/` directory. It also records which person folders have already been processed in `trained_folders.csv`, so those folders are skipped on later runs.

## How It Works
- Scans `training/` for subfolders, one per person (folder name = label).
- Optionally loads existing encodings to add new faces incrementally.
- Detects faces in each image and computes embeddings using `face_recognition`.
- Saves updated encodings to `encodings.pkl` and updates `trained_folders.csv`.

If you add a new folder under `training/`, re-run the training cell to append new encodings without retraining previous ones.
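
The incremental step boils down to a set difference between the subfolders of `training/` and the names already recorded. Here is a minimal sketch using a throwaway directory; `find_new_folders` is a hypothetical helper for illustration, not a function from `train.py`:

```python
import tempfile
from pathlib import Path


def find_new_folders(training_dir, trained):
    """Return sorted names of person folders not yet in `trained` (hypothetical helper)."""
    root = Path(training_dir)
    return sorted(p.name for p in root.iterdir() if p.is_dir() and p.name not in trained)


# Demo with a temporary layout: one subfolder per person, plus a stray file
with tempfile.TemporaryDirectory() as tmp:
    for person in ("alex", "saba", "inky"):
        (Path(tmp) / person).mkdir()
    (Path(tmp) / "notes.txt").write_text("not a person folder")

    already_trained = {"saba", "inky"}
    new_folders = find_new_folders(tmp, already_trained)
    print(new_folders)  # ['alex']
```

Only directories count as people, so loose files directly under `training/` are ignored.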

## Colab Setup (Optional)
If running on Google Colab, install the dependencies, optionally mount Drive, and set the training and output paths.

In [5]:
# Detect Colab and install dependencies
import importlib.util, os
IN_COLAB = importlib.util.find_spec('google.colab') is not None
if IN_COLAB:
    # face-recognition depends on dlib, which needs cmake to build
    !pip -q install cmake face-recognition
    print('Dependencies installed for Colab')
else:
    print('Running locally (no Colab installs)')


Dependencies installed for Colab


In [19]:
# Optionally mount Google Drive to read/write data.
# Skip this cell if your training images do not live on Drive.
import os

if IN_COLAB:
    from google.colab import drive
    drive.mount('/content/drive', force_remount=True)
    # Point the training input and the outputs at Drive folders
    os.environ["TRAINING_DIR"] = "/content/drive/MyDrive/infineon-project/training"
    os.environ["OUTPUT_DIR"] = "/content/drive/MyDrive/infineon-project/training-outputs"



Mounted at /content/drive


In [20]:
# Configure paths (override via env if desired)
BASE_DIR = '/content' if 'IN_COLAB' in globals() and IN_COLAB else os.getcwd()
TRAINING_DIR = os.environ.get('TRAINING_DIR', os.path.join(BASE_DIR, 'training'))
OUTPUT_DIR = os.environ.get('OUTPUT_DIR', BASE_DIR)
print('Training dir:', TRAINING_DIR)
print('Output dir  :', OUTPUT_DIR)


Training dir: /content/drive/MyDrive/infineon-project/training
Output dir  : /content/drive/MyDrive/infineon-project/training-outputs


## Imports and Configuration
These imports and constants are identical to those in `train.py`. The `FOLDER_CSV` file records which person folders have already been processed.

In [21]:
import face_recognition
from pathlib import Path
import pickle
import os
import csv

FOLDER_CSV = "trained_folders.csv"


## Tracking Trained Folders
We maintain a simple CSV file containing folder names that have been trained. This prevents reprocessing the same people on subsequent runs (incremental training).

In [None]:
def load_trained_folders(csv_path=FOLDER_CSV):
    trained = set()
    if os.path.exists(csv_path):
        with open(csv_path, newline="") as csvfile:
            reader = csv.reader(csvfile)
            for row in reader:
                if row:
                    trained.add(row[0])
    return trained


def save_trained_folders(trained, csv_path=FOLDER_CSV):
    with open(csv_path, "w", newline="") as csvfile:
        writer = csv.writer(csvfile)
        for folder in sorted(trained):
            writer.writerow([folder])
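
One reason to use the `csv` module here rather than writing bare lines is that it quotes awkward folder names. The sketch below round-trips a set of names through an in-memory buffer instead of a file; the names are made up for illustration.

```python
import csv
import io

folders = {"saba", "smith, john", "inky"}  # illustrative names, one containing a comma

# Write one folder name per row, sorted for a stable file
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
for name in sorted(folders):
    writer.writerow([name])

# Read the rows back; the first column of each non-empty row is a folder name
buffer.seek(0)
restored = {row[0] for row in csv.reader(buffer) if row}

print(restored == folders)  # True
```

The name with a comma is written in quotes, so it comes back as a single value instead of being split into two columns.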


## Training Function
The `train_faces` function is copied directly from `train.py`.
- If `incremental=True` and `encodings.pkl` exists, the existing encodings are loaded and extended.
- The code identifies new person folders (not listed in `trained_folders.csv`) and processes only those. Tracking is per folder, so images added later to an already-trained folder are not picked up.
- Each image is read, faces are located using the HOG model, and encodings are computed and stored along with the associated person name.
- Results are persisted to `encodings.pkl`, and the folder list is updated.

In [None]:
def train_faces(incremental=True, training_dir: str = TRAINING_DIR, output_dir: str = OUTPUT_DIR):
    """Train faces and skip folders (people) already trained before.

    Args:
        incremental: If True, load existing encodings and append new ones.
        training_dir: Directory containing person subfolders with images.
        output_dir: Directory to write `encodings.pkl` and `trained_folders.csv`.
    """

    encodings_path = os.path.join(output_dir, 'encodings.pkl')
    csv_path = os.path.join(output_dir, FOLDER_CSV)

    # Load existing encodings if present
    if incremental and os.path.exists(encodings_path):
        print('Loading existing encodings...')
        with open(encodings_path, 'rb') as f:
            data = pickle.load(f)
            names = data['names']
            encodings = data['encodings']
        print(f'Loaded {len(encodings)} known faces.')
    else:
        print('Starting fresh training...')
        names, encodings = [], []

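    # Load which folders (names) are already trained.
    # A fresh run (incremental=False) ignores the list so every folder is re-encoded.
    trained_folders = load_trained_folders(csv_path) if incremental else set()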
    print(f"Already trained folders: {trained_folders or 'None'}")

    training_root = Path(training_dir)
    new_folders = [p for p in training_root.iterdir() if p.is_dir() and p.name not in trained_folders]

    if not new_folders:
        print('No new folders to train. Everything is up to date.')
        return

    for folder in new_folders:
        person_name = folder.name
        print(f'\nTraining new folder: {person_name}')

        image_count = 0
        for fp in sorted(folder.glob('*')):
            if not fp.is_file():
                continue
            img = face_recognition.load_image_file(fp)
            # HOG detector; an image may yield zero, one, or several faces,
            # and every detected face is labelled with this folder's name
            locs = face_recognition.face_locations(img, model='hog')
            codes = face_recognition.face_encodings(img, locs)
            for code in codes:
                names.append(person_name)
                encodings.append(code)
            image_count += 1

        trained_folders.add(person_name)
        print(f"Trained folder '{person_name}' with {image_count} images.")

    # Save updated encodings
    out = {'names': names, 'encodings': encodings}
    os.makedirs(output_dir, exist_ok=True)
    with open(encodings_path, 'wb') as f:
        pickle.dump(out, f)
    print('✅ Updated model saved as encodings.pkl')

    # Save updated trained folder list
    save_trained_folders(trained_folders, csv_path)
    print(f'✅ Updated folder list saved in {FOLDER_CSV}')
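
`train_faces` passes every file in a person folder to `face_recognition.load_image_file`, so a stray non-image file (for example `.DS_Store`) may raise an error. One possible guard is sketched below, assuming images use one of the listed extensions; `IMAGE_EXTENSIONS` and `iter_image_files` are hypothetical names, not part of `train.py`.

```python
import tempfile
from pathlib import Path

# Assumed set of extensions; adjust to the formats you actually use
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def iter_image_files(folder):
    """Yield files in `folder` whose suffix looks like an image (case-insensitive)."""
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
            yield path


# Demo: empty placeholder files, only the names matter here
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.JPG", "b.png", ".DS_Store", "notes.txt"):
        (Path(tmp) / name).write_bytes(b"")
    (Path(tmp) / "nested").mkdir()
    kept = [p.name for p in iter_image_files(tmp)]
    print(kept)  # ['a.JPG', 'b.png']
```

Inside `train_faces`, the loop over `folder.glob('*')` could iterate over `iter_image_files(folder)` instead.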


## Run Training
Run the following cell to perform training. It will:
- Load existing encodings if available (when `incremental=True`).
- Process only new person folders under `training/`.
- Save the updated encodings file and CSV.

Tip: Add a new subfolder under `training/` (e.g., `training/alex/`) with a few face images, then rerun this cell to append Alex's encodings without retraining others.

In [None]:
# Run training (uses TRAINING_DIR and OUTPUT_DIR above)
train_faces(incremental=True)


Loading existing encodings...
Loaded 428 known faces.
Already trained folders: {'mark_zuckerberg', 'saba', 'trump', 'inky'}
No new folders to train. Everything is up to date.
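
To check what a run produced, the pickle can be loaded and summarised per person. This sketch assumes the `{'names': [...], 'encodings': [...]}` layout written by `train_faces`; `summarize_encodings` is a hypothetical helper, and the demo writes a small stand-in file with dummy vectors rather than real face encodings.

```python
import os
import pickle
import tempfile
from collections import Counter


def summarize_encodings(path):
    """Return a Counter mapping person name -> number of stored encodings."""
    with open(path, "rb") as f:
        data = pickle.load(f)
    if len(data["names"]) != len(data["encodings"]):
        raise ValueError("names and encodings are out of sync")
    return Counter(data["names"])


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "encodings.pkl")
    # Dummy 128-element vectors stand in for real encodings
    demo = {"names": ["saba", "saba", "inky"], "encodings": [[0.0] * 128] * 3}
    with open(demo_path, "wb") as f:
        pickle.dump(demo, f)

    counts = summarize_encodings(demo_path)
    for person, n in sorted(counts.items()):
        print(f"{person}: {n}")
```

Against a real run, the call would be `summarize_encodings(os.path.join(OUTPUT_DIR, 'encodings.pkl'))`.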
