[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Infineon-X/multi-face-rec/blob/main/train.ipynb)

`train.ipynb` is compatible with Google Colab.

# Face Training Notebook

This notebook mirrors the project's `train.py` and adds step-by-step explanations. It incrementally builds and updates a face encodings database (`encodings.pkl`) from images stored under the `training/` directory. It also records which person folders have already been processed in `trained_folders.csv`, so later runs skip those folders instead of retraining them.

## How It Works
- Scans `training/` for subfolders, one per person (folder name = label).
- Optionally loads existing encodings to add new faces incrementally.
- Detects faces in each image and computes embeddings using `face_recognition`.
- Saves updated encodings to `encodings.pkl` and updates `trained_folders.csv`.

If you add a new folder under `training/`, re-run the training cell to append new encodings without retraining previous ones.
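
The folder scan described above can be sketched in isolation. This is a minimal illustration rather than the notebook's own code: `find_new_person_folders` is a hypothetical helper, and the demo builds a throwaway directory instead of reading the real `training/` tree.

```python
from pathlib import Path
import tempfile


def find_new_person_folders(training_root, already_trained):
    """Return the names of subfolders that are not yet marked as trained.

    Hypothetical helper for illustration; plain files under the root are ignored.
    """
    root = Path(training_root)
    return sorted(
        p.name for p in root.iterdir()
        if p.is_dir() and p.name not in already_trained
    )


# Demo on a temporary directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "training"
    for person in ("alex", "saba"):
        (root / person).mkdir(parents=True)
    # A stray file at the top level is not a person folder.
    (root / "notes.txt").write_text("not a person folder")

    pending = find_new_person_folders(root, already_trained={"saba"})

print(pending)
```

Only `alex` is reported as pending here, because `saba` is already in the trained set and `notes.txt` is not a directory.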

## Imports and Configuration
These imports and constants are identical to those in `train.py`. The file named by `FOLDER_CSV` records which person folders have already been processed.

In [1]:
import face_recognition
from pathlib import Path
import pickle
import os
import csv

FOLDER_CSV = "trained_folders.csv"




## Tracking Trained Folders
We maintain a simple CSV file listing the names of folders that have already been trained, one name per row. This prevents reprocessing the same people on subsequent runs (incremental training).

In [2]:
def load_trained_folders(csv_path=FOLDER_CSV):
    trained = set()
    if os.path.exists(csv_path):
        with open(csv_path, newline="") as csvfile:
            reader = csv.reader(csvfile)
            for row in reader:
                if row:
                    trained.add(row[0])
    return trained


def save_trained_folders(trained, csv_path=FOLDER_CSV):
    with open(csv_path, "w", newline="") as csvfile:
        writer = csv.writer(csvfile)
        for folder in sorted(trained):
            writer.writerow([folder])
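
To make the file format concrete, here is a small round trip using only the `csv` module. It is a sketch under assumptions: it writes to a temporary directory rather than the real `trained_folders.csv`, and the folder names are sample values.

```python
import csv
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "trained_folders.csv")

    # Write one folder name per row, sorted, in the same way as save_trained_folders.
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        for folder in sorted({"saba", "inky"}):
            writer.writerow([folder])

    # Read the raw text to see what actually lands on disk.
    with open(csv_path, newline="") as f:
        raw = f.read()

    # Read the rows back into a set, in the same way as load_trained_folders.
    with open(csv_path, newline="") as f:
        restored = {row[0] for row in csv.reader(f) if row}

print(repr(raw))
print(restored)
```

The file holds one name per line with no header row. Because the names are loaded into a set, row order in the file does not matter and duplicate rows collapse to a single entry.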


## Training Function
The `train_faces` function mirrors the one in `train.py`.
- If `incremental=True` and `encodings.pkl` exists, the existing encodings are loaded and extended.
- The code identifies new person folders (not listed in `trained_folders.csv`) and processes only those.
- Each image is read, faces are located using the HOG model, and encodings are computed and stored along with the associated person name.
- Results are persisted to `encodings.pkl`, and the folder list is updated.

In [3]:
def train_faces(incremental=True):
    """Train faces and skip folders (people) already trained before."""

    # Load model if exists
    if incremental and os.path.exists("encodings.pkl"):
        print("Loading existing encodings...")
        with open("encodings.pkl", "rb") as f:
            data = pickle.load(f)
            names = data["names"]
            encodings = data["encodings"]
        print(f"Loaded {len(encodings)} known faces.")
    else:
        print("Starting fresh training...")
        names, encodings = [], []

    # Load which folders (names) are already trained
    trained_folders = load_trained_folders()
    print(f"Already trained folders: {trained_folders or 'None'}")

    training_root = Path("training")
    new_folders = [p for p in training_root.iterdir() if p.is_dir() and p.name not in trained_folders]

    if not new_folders:
        print("No new folders to train. Everything is up to date.")
        return

    for folder in new_folders:
        person_name = folder.name
        print(f"\nTraining new folder: {person_name}")

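        image_count = 0  # files actually processed in this folder
        face_count = 0   # face encodings extracted from those files
        for fp in folder.glob("*"):
            if fp.is_file():
                img = face_recognition.load_image_file(fp)
                # Locate faces with the HOG detector, then encode each one
                locs = face_recognition.face_locations(img, model="hog")
                codes = face_recognition.face_encodings(img, locs)
                for code in codes:
                    names.append(person_name)
                    encodings.append(code)
                image_count += 1
                face_count += len(codes)

        trained_folders.add(person_name)
        print(f"Trained folder '{person_name}': {face_count} face(s) from {image_count} image file(s).")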

    # Save updated encodings
    out = {"names": names, "encodings": encodings}
    with open("encodings.pkl", "wb") as f:
        pickle.dump(out, f)
    print("✅ Updated model saved as encodings.pkl")

    # Save updated trained folder list
    save_trained_folders(trained_folders)
    print(f"✅ Updated folder list saved in {FOLDER_CSV}")
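
Since `encodings.pkl` stores a dictionary with parallel `names` and `encodings` lists, a quick per-person count is a handy sanity check after training. The sketch below assumes that layout; `summarize_encodings` is a hypothetical helper, and the demo pickles stand-in data (plain lists of 128 floats in place of real encoding arrays) to a temporary file.

```python
import pickle
import tempfile
from collections import Counter
from pathlib import Path


def summarize_encodings(pkl_path):
    """Return a Counter mapping each person name to their number of encodings.

    Hypothetical helper; assumes the {"names": [...], "encodings": [...]} layout.
    """
    with open(pkl_path, "rb") as f:
        data = pickle.load(f)
    names, encodings = data["names"], data["encodings"]
    if len(names) != len(encodings):
        raise ValueError("names and encodings must have the same length")
    return Counter(names)


# Stand-in data: three fake 128-value vectors instead of real face encodings.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "encodings.pkl"
    fake = {
        "names": ["saba", "saba", "inky"],
        "encodings": [[0.0] * 128 for _ in range(3)],
    }
    with open(demo_path, "wb") as f:
        pickle.dump(fake, f)

    summary = summarize_encodings(demo_path)

print(dict(summary))
```

A person with zero or very few encodings usually means the detector found no face in most of their images. As with any pickle, only load files you created yourself.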


## Run Training
Run the following cell to perform training. It will:
- Load existing encodings if available (when `incremental=True`).
- Process only new person folders under `training/`.
- Save the updated encodings file and CSV.

Tip: Add a new subfolder under `training/` (e.g., `training/alex/`) with a few face images, then rerun this cell to append Alex's encodings without retraining others.
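
One consequence of tracking at the folder level is that images added later to an already-trained folder are skipped. A possible workaround, sketched below, is to drop that person's entries before rerunning training. `forget_person` is a hypothetical helper that is not part of `train.py`, and the encodings here are stand-in values.

```python
def forget_person(names, encodings, trained_folders, person):
    """Return copies of the inputs with every entry for `person` removed.

    Hypothetical helper: names and encodings are parallel lists, and
    trained_folders is the set of folder names already processed.
    """
    kept = [(n, e) for n, e in zip(names, encodings) if n != person]
    new_names = [n for n, _ in kept]
    new_encodings = [e for _, e in kept]
    new_trained = set(trained_folders) - {person}
    return new_names, new_encodings, new_trained


# Stand-in data; real encodings would be 128-dimensional vectors.
names = ["saba", "alex", "saba"]
encodings = [[0.1], [0.2], [0.3]]
trained = {"saba", "alex"}

names2, encodings2, trained2 = forget_person(names, encodings, trained, "alex")
print(names2, trained2)
```

After writing the filtered lists back to `encodings.pkl` and the reduced set back to `trained_folders.csv`, the next training run would treat `alex` as a new folder and encode all of its images once.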

In [4]:
if __name__ == "__main__":
    train_faces(incremental=True)


Loading existing encodings...
Loaded 428 known faces.
Already trained folders: {'mark_zuckerberg', 'saba', 'trump', 'inky'}
No new folders to train. Everything is up to date.
