# Extract Frame-level Features
Convert MediaPipe Face Mesh landmark outputs into a frame-level feature table that can be used later for window-level aggregation and traditional machine learning models for drowsiness recognition.

---

## 1. Inputs
- A root directory `data/` containing per-video landmark results:
  - `*_landmarks.npz`
      - NumPy archive with:
          - `landmarks`: array of shape (T, 468, 3)
              - T = number of processed frames
              - 468 = MediaPipe Face Mesh points
              - 3 = (x, y, z) normalized coordinates
          - `frame_indices`: array of shape (T,) with original frame indices
  - `*_meta.json`
      - JSON metadata for each video, containing at least:
          - `video_path`: original video path (not necessarily used as real path)
          - `fps`, `total_frames_est`, `processed_frames`, etc.

--

## 2. Outputs
- A CSV file with frame-level features, saved to:
  - `exported_data/features_frame_level.csv`

Each row in the CSV corresponds to one valid frame (where a face was detected), with the following columns:

- `subject_id`: folder name containing the npz/json (e.g. "01")
- `video_id`: npz file stem (e.g. "0_landmarks")
- `frame_idx`: original frame index from `frame_indices`
- `ear_left`: left eye aspect ratio (EAR)
- `ear_right`: right eye aspect ratio (EAR)
- `ear_mean`: mean of left/right EAR
- `mar`: mouth aspect ratio (MAR)
- `label`: placeholder for drowsiness label (alert / low / drowsy)
          (currently None; to be filled later based on video grouping)

---

## 3. Main objectives of this notebook
1. Recursively scans the given `data/` directory for all files matching `*_landmarks.npz`.

2. For each `*_landmarks.npz`:
   - Loads the `landmarks` array (T, 468, 3) and `frame_indices`.
   - Loads the corresponding `*_meta.json`.
   - Derives `subject_id` and `video_id` from the npz file path.
   - For each frame:
       - Skips frames where all landmarks are NaN (no face detected).
       - Computes:
           - eye aspect ratio (EAR) for left and right eye
           - mouth aspect ratio (MAR)
       - Stores these features together with `subject_id`, `video_id`, `frame_idx` and a placeholder `label`.

3. Concatenates all per-video DataFrames into one large frame-level feature table using pandas.

4. Creates the `exported_data/` directory if it does not exist and saves the final table as `features_frame_level.csv`.

---

## Codes

---

### 1. Import modules and database 

In [6]:
import os
import numpy as np
import json
import pandas as pd
from pathlib import Path

npz_path = "data/01/0_landmarks.npz"
data = np.load(npz_path)
print(data.files)   

landmarks = data[data.files[0]]  
print(landmarks.shape)

['landmarks', 'frame_indices']
(18053, 468, 3)


### 2. Define helpers for computation later

In [7]:
# Set index: a commonly used set of eye/mouth points in MediaPipe Face Mesh
LEFT_EYE_IDX  = [33, 160, 158, 133, 153, 144]
RIGHT_EYE_IDX = [263, 385, 387, 362, 380, 373]
MOUTH_IDX     = [13, 14, 78, 308, 82, 312]  # Upper/lower lip + left/right mouth corners

# fonction: get the distance in 2D
def dist2d(a, b):
    return np.linalg.norm(a[:2] - b[:2])

# fonction: compute EAR 
def eye_aspect_ratio(pts, idx):
    # EAR formule： (‖p2-p6‖ + ‖p3-p5‖) / (2‖p1-p4‖ +1e-6)
    p1, p2, p3, p4, p5, p6 = [pts[i] for i in idx]
    vertical = dist2d(p2, p6) + dist2d(p3, p5)
    horizontal = 2.0 * dist2d(p1, p4) + 1e-6
    return vertical / horizontal

# fonction：compute MAR
def mouth_aspect_ratio(pts, idx):
    # MAR formule： (‖p_up1 - p_down1‖ + ‖p_up2-p_down2‖) / (2‖p_left-p_right‖ +1e-6)
    p_up1, p_down1, p_left, p_right, p_up2, p_down2 = [pts[i] for i in idx]
    vertical = dist2d(p_up1, p_down1) + dist2d(p_up2, p_down2)
    horizontal = 2.0 * dist2d(p_left, p_right) + 1e-6
    return vertical / horizontal

def compute_frame_features(pts):
    """
    pts: (468, 3) numpy array for ONE frame
    return: dict of features for this frame
    """
    if np.isnan(pts).all():
        return None  

    ear_left = eye_aspect_ratio(pts, LEFT_EYE_IDX)
    ear_right = eye_aspect_ratio(pts, RIGHT_EYE_IDX)
    mar = mouth_aspect_ratio(pts, MOUTH_IDX)

    return {
        "ear_left": ear_left,
        "ear_right": ear_right,
        "ear_mean": (ear_left + ear_right) / 2.0,
        "mar": mar,
    }


### 3. Extract frame level features from videos

In [8]:
def compute_features_for_video(npz_path, meta_path):
    # read landmarks &  frame_indices
    data = np.load(npz_path)
    landmarks = data["landmarks"]        # (T, 468, 3)
    frame_indices = data["frame_indices"]  # (T,)

    # read meta.json
    with open(meta_path, "r", encoding="utf-8") as f:
        meta = json.load(f)

    p = Path(npz_path)
    subject_id = p.parent.name      # data/01/0_landmarks.npz -> "01"
    video_id = p.stem               # "0_landmarks"

    rows = []
    T = landmarks.shape[0]

    for i in range(T):
        pts = landmarks[i]  # (468, 3)
        feats = compute_frame_features(pts)
        if feats is None:
            continue  # skip the frame without face

        row = {
            "subject_id": subject_id,
            "video_id": video_id,
            "frame_idx": int(frame_indices[i]),
            "label": meta.get("label", None),  
        }
        row.update(feats)
        rows.append(row)

    df = pd.DataFrame(rows)
    return df

npz_path = "data/01/0_landmarks.npz"
meta_path = "data/01/0_meta.json"

df_one = compute_features_for_video(npz_path, meta_path)
print(df_one.head())
print("Frames with features:", len(df_one))


  subject_id     video_id  frame_idx label  ear_left  ear_right  ear_mean  \
0         01  0_landmarks          1  None  0.150208   0.414041  0.282124   
1         01  0_landmarks          2  None  0.129768   0.413464  0.271616   
2         01  0_landmarks          3  None  0.130547   0.411373  0.270960   
3         01  0_landmarks          4  None  0.126204   0.413346  0.269775   
4         01  0_landmarks          5  None  0.128361   0.414641  0.271501   

        mar  
0  0.134435  
1  0.133739  
2  0.132525  
3  0.133026  
4  0.132456  
Frames with features: 18049


### 4. Concatenates all per-video DataFrames into one large frame-level feature table `exported_data/features_frame_level.csv`

In [9]:
root = Path("data") 
all_dfs = []

for npz_path in root.rglob("*_landmarks.npz"):
    meta_path = npz_path.with_name(npz_path.name.replace("_landmarks.npz", "_meta.json"))
    print("Processing:", npz_path)
    
    df_video = compute_features_for_video(npz_path, meta_path)
    all_dfs.append(df_video)

df_all = pd.concat(all_dfs, ignore_index=True)
print("Total frame-level samples:", len(df_all))

# save as csv
export_dir = "exported_data"
os.makedirs(export_dir, exist_ok=True)

output_path = os.path.join(export_dir, "features_frame_level.csv")
df_all.to_csv(output_path, index=False)

print("Saved to:", output_path)

Processing: data\01\0_landmarks.npz
Processing: data\01\10_landmarks.npz
Processing: data\02\0_landmarks.npz
Processing: data\02\10_landmarks.npz
Processing: data\03\0_landmarks.npz
Processing: data\03\10_landmarks.npz
Processing: data\04\0_landmarks.npz
Processing: data\04\10_landmarks.npz
Processing: data\05\0_landmarks.npz
Processing: data\05\10_landmarks.npz
Processing: data\06\0_landmarks.npz
Processing: data\06\10_landmarks.npz
Processing: data\07\0_landmarks.npz
Processing: data\07\10_landmarks.npz
Processing: data\08\0_landmarks.npz
Processing: data\08\10_landmarks.npz
Processing: data\09\0_landmarks.npz
Processing: data\09\10_landmarks.npz
Processing: data\10\0_landmarks.npz
Processing: data\10\10_landmarks.npz
Processing: data\11\0_landmarks.npz
Processing: data\11\10_landmarks.npz
Processing: data\12\0_landmarks.npz
Processing: data\12\10_landmarks.npz
Processing: data\13\0_landmarks.npz
Processing: data\13\10_landmarks.npz
Processing: data\14\0_landmarks.npz
Processing: dat