# Build Window-Level Features

Aggregate frame-level facial features (EAR, MAR, etc.) into window-level samples and add temporal features (blink/yawn ratios, EAR dynamics) that can be used by traditional machine learning models for drowsiness recognition.

The label definition follows the UTA-RLDD (https://sites.google.com/view/utarldd/home) dataset description:
- Each original video belongs to one of three classes:  
  - **alert** (labeled as `0`)  
  - **low vigilant** (labeled as `5`)  
  - **drowsy** (labeled as `10`)  
- In this project, we **only use the alert (`0`) and drowsy (`10`) classes**, and ignore the low vigilant (`5`) class.

---

## 1. Inputs

- A CSV file with **frame-level features**, produced by `extract_frame_features.ipynb`:
  - `exported_data/features_frame_level.csv`

---

## 2. Outputs

- A CSV file with **window-level features**, saved to:
  - `exported_data/features_window_level.csv`

Each row in the window-level CSV corresponds to one **time window** within a given video, with the following columns:

- `subject_id`: subject/folder identifier
- `video_id`: video identifier
- `window_id`: integer window index within the video
- `ear_mean_mean`: mean EAR over all frames in the window
- `ear_mean_std`: standard deviation of EAR in the window
- `mar_mean`: mean MAR over all frames in the window
- `mar_std`: standard deviation of MAR in the window
- `blink_ratio`: fraction of frames in the window where `EAR < BLINK_EAR_THRESH` (proxy for eye closure -> blinking)
- `yawn_ratio`: fraction of frames in the window where `MAR > YAWN_MAR_THRESH` (proxy for mouth opening -> yawning)
- `ear_diff_mean`: mean absolute frame-to-frame change in EAR within the window (captures dynamics of eye opening/closing)
- `num_frames`: number of frames contributing to this window
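- `label`: **window-level drowsiness label**, stored as the string `"alert"` (class `0`) or `"drowsy"` (class `10`) and derived from the `video_id` prefix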
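
To make the three temporal features concrete, here is a minimal sketch on a toy five-frame window. The EAR/MAR values are made up for illustration; the thresholds mirror the parameter values set later in this notebook.

```python
import numpy as np

# Thresholds as configured in the notebook's parameter cell
BLINK_EAR_THRESH = 0.21
YAWN_MAR_THRESH = 0.60

# Toy per-frame values (illustrative only, not real data)
ear = np.array([0.30, 0.28, 0.15, 0.12, 0.29])
mar = np.array([0.13, 0.14, 0.70, 0.65, 0.13])

# Fraction of frames with EAR below the blink threshold (2 of 5 here)
blink_ratio = (ear < BLINK_EAR_THRESH).mean()

# Fraction of frames with MAR above the yawn threshold (2 of 5 here)
yawn_ratio = (mar > YAWN_MAR_THRESH).mean()

# Mean absolute frame-to-frame change in EAR
ear_diff_mean = np.abs(np.diff(ear)).mean()

print(blink_ratio, yawn_ratio, ear_diff_mean)
```

Both ratios are plain frame fractions, so a long eye closure and several short blinks can yield the same `blink_ratio`; `ear_diff_mean` is the feature that reflects how quickly EAR changes.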

---

## 3. Main objectives of this notebook

1. Load the frame-level feature table from `exported_data/features_frame_level.csv` and sort frames by `subject_id`, `video_id`, and `frame_idx`.

2. For each `(subject_id, video_id)` sequence:
   - Assign a `frame_order` index within the video.
   - Group frames into fixed-length time windows using a configurable `WINDOW_SIZE`, and assign a `window_id` to each frame.

3. For each window (unique combination of `subject_id`, `video_id`, `window_id`):
   - Aggregate frame-level EAR/MAR into summary statistics:
     - `ear_mean_mean`, `ear_mean_std`
     - `mar_mean`, `mar_std`
   - Compute temporal features:
     - `blink_ratio` based on `EAR < BLINK_EAR_THRESH`
     - `yawn_ratio` based on `MAR > YAWN_MAR_THRESH`
     - `ear_diff_mean` as the mean absolute difference of EAR between consecutive frames
   - Count `num_frames` in the window.
   - Assign a **window-level label** from the `video_id` prefix:
     - `video_id` starting with `"0"` → label `"alert"` (class `0`)
     - `video_id` starting with `"10"` → label `"drowsy"` (class `10`)

4. Create the `exported_data/` directory if it does not exist and save the final window-level feature table as `features_window_level.csv`, which serves as input for traditional machine learning models for drowsiness recognition.
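
The labeling rule in step 3 can be sketched as a small standalone helper. The function name `label_from_video_id` and the example IDs `"10_landmarks"` and `"5_landmarks"` are hypothetical; only the `"0_landmarks"` naming is visible in this notebook's output.

```python
from typing import Optional


def label_from_video_id(video_id) -> Optional[str]:
    """Map a video_id prefix to a class name (hypothetical helper)."""
    vid = str(video_id)
    if vid.startswith("10"):
        return "drowsy"   # class 10
    if vid.startswith("0"):
        return "alert"    # class 0
    return None           # any other prefix, e.g. low vigilant ("5"), stays unlabeled


for vid in ["0_landmarks", "10_landmarks", "5_landmarks"]:
    print(vid, "->", label_from_video_id(vid))
```

Because the label depends only on `video_id`, every window of a video receives the same label.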

---

## Code

---

### 1. Import modules

In [1]:
import os
from pathlib import Path

import numpy as np
import pandas as pd


### 2. Parameters

In [None]:
# How many frames per time window
WINDOW_SIZE = 60
# STRIDE = 60 -> non-overlapping, stride == window_size
# FPS, which is read from video metadata, ≈ 30
# window_len_sec = WINDOW_SIZE / FPS = 2s

# Thresholds for blink / yawn detection
BLINK_EAR_THRESH = 0.21   # EAR < this -> eye considered "closed"
YAWN_MAR_THRESH  = 0.60   # MAR > this -> mouth considered "open / yawn"
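
The comments above relate `WINDOW_SIZE`, stride, and frame rate. As a rough sketch of that arithmetic, the snippet below computes the window length in seconds and the start indices of full windows for a given stride. `FPS = 30` is the nominal value mentioned in the comments; `window_starts` and the overlapping stride of 30 are hypothetical and not used by this notebook, which keeps the stride equal to the window size.

```python
WINDOW_SIZE = 60   # frames per window (value used in this notebook)
FPS = 30           # nominal frame rate, assumed constant for this sketch

window_len_sec = WINDOW_SIZE / FPS


def window_starts(n_frames: int, window_size: int, stride: int) -> list:
    """Return the start indices of all full windows in a sequence of n_frames."""
    return list(range(0, n_frames - window_size + 1, stride))


# Non-overlapping windows (stride == window size), as in this notebook
non_overlapping = window_starts(150, WINDOW_SIZE, WINDOW_SIZE)

# Hypothetical 50% overlap (stride == window size / 2)
overlapping = window_starts(150, WINDOW_SIZE, 30)

print(window_len_sec, non_overlapping, overlapping)
```

Overlapping windows would produce more training samples from the same video, at the cost of correlated samples; that trade-off is outside the scope of this notebook.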

### 4. Load frame-level features

In [3]:
path = r"exported_data\features_frame_level.csv"
print("Loading frame-level features from:", path)

df = pd.read_csv(path)

# Basic sanity check
expected_cols = {"subject_id", "video_id", "frame_idx", "ear_mean", "mar"}
missing = expected_cols - set(df.columns)
if missing:
    raise ValueError(f"Missing expected columns in frame-level CSV: {missing}")

# Ensure proper sorting by video & time
df = df.sort_values(["subject_id", "video_id", "frame_idx"]).reset_index(drop=True)

Loading frame-level features from: exported_data\features_frame_level.csv


### 5. Create frame_order and window_id within each video

In [4]:
# frame_order
df["frame_order"] = df.groupby(["subject_id", "video_id"]).cumcount()

# window_id
df["window_id"] = (df["frame_order"] // WINDOW_SIZE).astype(int)
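
The same two steps can be checked on a toy table. The sketch below uses a small `WINDOW_SIZE` of 3 and made-up frame counts to show that numbering restarts for each video and that the last window of a video may contain fewer frames than `WINDOW_SIZE`.

```python
import pandas as pd

WINDOW_SIZE = 3  # small value for illustration; the notebook uses 60

# Toy data: 7 frames for one video, 4 frames for another (made-up IDs)
toy = pd.DataFrame({
    "subject_id": [1] * 7 + [2] * 4,
    "video_id": ["0_landmarks"] * 7 + ["10_landmarks"] * 4,
    "frame_idx": list(range(7)) + list(range(4)),
})

# Position of each frame within its own video, starting at 0
toy["frame_order"] = toy.groupby(["subject_id", "video_id"]).cumcount()

# Integer division assigns consecutive blocks of WINDOW_SIZE frames to one window
toy["window_id"] = (toy["frame_order"] // WINDOW_SIZE).astype(int)

# Number of frames per window; trailing windows can be shorter
sizes = toy.groupby(["subject_id", "video_id", "window_id"]).size()
print(sizes)
```

The `num_frames` column in the final table records this count, so short trailing windows can be filtered out downstream if needed.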

### 6. Define aggregation function 

In [5]:
def agg_window(group: pd.DataFrame) -> pd.Series:
    """
    Aggregate all frames of one window into a single feature vector.
    `group` is the subset of df for one (subject_id, video_id, window_id).
    """
    g = group.sort_values("frame_idx")

    ear = g["ear_mean"].values
    mar = g["mar"].values

    # Basic statistics
    ear_mean = ear.mean()
    ear_std  = ear.std()
    mar_mean = mar.mean()
    mar_std  = mar.std()

    # blink_ratio: fraction of frames where EAR < threshold
    blink_ratio = (ear < BLINK_EAR_THRESH).mean()

    # yawn_ratio: fraction of frames where MAR > threshold
    yawn_ratio = (mar > YAWN_MAR_THRESH).mean()

    # ear_diff_mean: mean |EAR_t - EAR_{t-1}|
    if len(ear) > 1:
        ear_diff_mean = np.abs(np.diff(ear)).mean()
    else:
        ear_diff_mean = 0.0

    # label: derived from the video_id prefix (constant for all frames of a video)
    vid = str(g["video_id"].iloc[0])
    if vid.startswith("10"):
        label = "drowsy" # 10
    elif vid.startswith("0"):
        label = "alert" # 0
    else:
        label = None

    return pd.Series({
        "ear_mean_mean": ear_mean,
        "ear_mean_std": ear_std,
        "mar_mean": mar_mean,
        "mar_std": mar_std,
        "blink_ratio": blink_ratio,
        "yawn_ratio": yawn_ratio,
        "ear_diff_mean": ear_diff_mean,
        "num_frames": len(g),
        "label": label,
    })


### 7. Group by (`subject_id`, `video_id`, `window_id`) and aggregate

In [6]:
group_cols = ["subject_id", "video_id", "window_id"]
print("Aggregating windows with groupby on:", group_cols)

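window_df = (
    df
    .groupby(group_cols, as_index=False)
    .apply(agg_window)  # one feature row per (subject_id, video_id, window_id)
    .reset_index()
)

# reset_index() can leave helper columns ("level_0" / "index") behind; drop them if present
for col in ["level_0", "index"]:
    if col in window_df.columns:
        window_df = window_df.drop(columns=[col])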


Aggregating windows with groupby on: ['subject_id', 'video_id', 'window_id']
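
As a possible alternative to `groupby(...).apply(...)`, most of these features can also be computed with pandas named aggregation on precomputed helper columns. This is only a sketch on toy data, not the method used in this notebook; the helper column names `ear_closed`, `mouth_open`, and `ear_abs_diff` are hypothetical.

```python
import pandas as pd

BLINK_EAR_THRESH = 0.21
YAWN_MAR_THRESH = 0.60

# Toy frame-level table: one video, two windows of two frames each
toy = pd.DataFrame({
    "subject_id": [1, 1, 1, 1],
    "video_id": ["0_landmarks"] * 4,
    "window_id": [0, 0, 1, 1],
    "frame_idx": [0, 1, 2, 3],
    "ear_mean": [0.30, 0.20, 0.10, 0.30],
    "mar": [0.10, 0.70, 0.10, 0.10],
})

group_cols = ["subject_id", "video_id", "window_id"]
toy = toy.sort_values(group_cols + ["frame_idx"])

# Boolean helper columns: their mean is the fraction of flagged frames
toy["ear_closed"] = toy["ear_mean"] < BLINK_EAR_THRESH
toy["mouth_open"] = toy["mar"] > YAWN_MAR_THRESH

# |EAR_t - EAR_{t-1}| within each window; the first frame of a window is NaN
toy["ear_abs_diff"] = toy.groupby(group_cols)["ear_mean"].diff().abs()

window_toy = toy.groupby(group_cols, as_index=False).agg(
    ear_mean_mean=("ear_mean", "mean"),
    ear_mean_std=("ear_mean", lambda s: s.std(ddof=0)),  # population std, like ndarray.std()
    blink_ratio=("ear_closed", "mean"),
    yawn_ratio=("mouth_open", "mean"),
    ear_diff_mean=("ear_abs_diff", "mean"),  # NaN entries are skipped
    num_frames=("frame_idx", "size"),
)
print(window_toy)
```

One behavioral difference to keep in mind: for a single-frame window this sketch yields `NaN` for `ear_diff_mean`, whereas `agg_window` returns `0.0`; a `fillna(0.0)` on that column would align the two.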



### 8. Save as .csv

In [7]:
out_path = r"exported_data\features_window_level.csv"
# Create the output directory if it does not exist yet
os.makedirs("exported_data", exist_ok=True)
window_df.to_csv(out_path, index=False)

print("Saved window-level dataset to:", out_path)
print("Num windows:", len(window_df))
print(window_df.head())

Saved window-level dataset to: exported_data\features_window_level.csv
Num windows: 17149
   subject_id     video_id  window_id  ear_mean_mean  ear_mean_std  mar_mean  \
0           1  0_landmarks          0       0.283420      0.012871  0.133524   
1           1  0_landmarks          1       0.297406      0.016750  0.134261   
2           1  0_landmarks          2       0.287027      0.006120  0.134697   
3           1  0_landmarks          3       0.290294      0.008107  0.133690   
4           1  0_landmarks          4       0.291793      0.012964  0.134424   

    mar_std  blink_ratio  yawn_ratio  ear_diff_mean  num_frames  label  
0  0.003331          0.0         0.0       0.004260          60  alert  
1  0.001269          0.0         0.0       0.005684          60  alert  
2  0.001130          0.0         0.0       0.001528          60  alert  
3  0.001217          0.0         0.0       0.002574          60  alert  
4  0.000981          0.0         0.0       0.003801          60 
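
Before training a model, the saved table may need two small clean-up steps: dropping windows without a label (any `video_id` that matches neither prefix gets `None`) and dropping short trailing windows. The sketch below shows one way to do this on a toy table; the rows, the `label_code` column, and the choice to keep only full windows are assumptions for illustration, not part of this notebook.

```python
import pandas as pd

WINDOW_SIZE = 60

# Toy window-level table (made-up rows mimicking the saved CSV's columns)
toy_windows = pd.DataFrame({
    "video_id": ["0_landmarks", "10_landmarks", "5_landmarks", "10_landmarks"],
    "window_id": [0, 0, 0, 1],
    "num_frames": [60, 60, 60, 17],
    "label": ["alert", "drowsy", None, "drowsy"],
})

# Keep only labeled windows (alert / drowsy)
labeled = toy_windows.dropna(subset=["label"])

# Keep only full windows; whether to drop short ones is a design choice
labeled = labeled[labeled["num_frames"] == WINDOW_SIZE].copy()

# Numeric codes matching the UTA-RLDD class values
labeled["label_code"] = labeled["label"].map({"alert": 0, "drowsy": 10})

print(labeled)
```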