
# 👁️ **Step 1: Visual Validation of Blink Annotations Against Signal Data**

This is the **first step** in the blink analysis pipeline. It **visually verifies** that the blink events labeled manually by a human expert in **CVAT** are properly aligned with the physiological signal data:

* **EAR (Eye Aspect Ratio)** time series
* **EEG/EOG signals**

---

## 🎯 **Purpose of This Step**

Before performing any analysis, we need to **validate the accuracy** of the blink annotations. These labels were created **manually in CVAT**, by identifying:

* 🟢 When the blink **starts**
* 🔴 When the eye is **fully closed** (minimum EAR)
* 🟢 When the blink **ends**

These annotations are made on a **per-frame basis** (e.g., 30 Hz video) and must be **mapped** to the sample indices of the high-frequency physiological signals (e.g., EEG at 500–1000 Hz).
This step confirms that the labels visually agree with the signal data before any downstream processing.

---

## ✨ **What This Script Does**

* Loads the raw `.fif` recording (EAR + EEG) and the CVAT `.zip` annotations
* Extracts blink intervals and converts them to time-series sample indices
* Plots:

  * EAR and EEG signals
  * Blink annotations (start, min, end) overlaid on signals
* Exports a visual HTML report for inspection

---

## 📦 **Inputs Required**

| File                      | Description                                       |
| ------------------------- | ------------------------------------------------- |
| `S01_20170519_043933.fif` | Raw physiological data (EEG, EAR, etc.)           |
| `S01_20170519_043933.zip` | Blink annotations from CVAT (JSON/XML inside ZIP) |

---

## 🧠 **How Blink Intervals Are Extracted**

The function `extract_blink_durations(...)` handles the critical step of **converting frame-based CVAT annotations** into **sample indices** that align with high-frequency time-series data (e.g., EEG):

```python
def extract_blink_durations(annotation_df, frame_offset, sfreq, video_fps):
    ...
```

### 🛠️ What It Does:

* Processes blink annotations that appear in **triplets**:
  `start → min → end`
* Subtracts a **frame offset** (if any cropping or indexing shift is needed)
* Converts CVAT **video frame indices** to **sample indices** for time series, using:

  ```
  sample_index = (frame_index - offset) * (sfreq / video_fps)
  ```
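
The conversion above can be sketched as a small standalone function. This is a minimal illustration, not the actual body of `extract_blink_durations`: the name `frame_to_sample` is hypothetical, and rounding to the nearest integer is an assumption, made because the formula generally yields a fractional value while sample indices must be integers.

```python
def frame_to_sample(frame_index, frame_offset, sfreq, video_fps):
    """Map a CVAT video frame index to the nearest time-series sample index.

    Hypothetical helper; rounding to the nearest sample is an assumption.
    """
    if video_fps <= 0 or sfreq <= 0:
        raise ValueError("sfreq and video_fps must be positive")
    adjusted_frame = frame_index - frame_offset
    return int(round(adjusted_frame * (sfreq / video_fps)))


# Example with an offset of 5 frames, 30 fps video, and an assumed
# 500 Hz sampling rate: frame 35 lies 1 s after the offset frame.
print(frame_to_sample(35, frame_offset=5, sfreq=500.0, video_fps=30))  # 500
```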

### 📤 Output Columns:

| Column                               | Description                                           |
| ------------------------------------ | ----------------------------------------------------- |
| `startFrame`, `endFrame`, `minFrame` | Original CVAT frame indices                           |
| `startBlinks_cvat`, `...`            | Adjusted frame indices after offset                   |
| `startBlinks`, `...`                 | Final sample indices aligned to time series           |
| `blink_type`                         | Type/category of blink (e.g., 'blink', 'long\_blink') |

✅ This places the annotation markers on the same sample axis as the raw signals, so they can be overlaid directly.

---

## 📘 **Overview of the 3 Sub-Steps**

| Step | Description                                          |
| ---- | ---------------------------------------------------- |
| 1a   | Visualize the overall time-series signal (EAR + EEG) |
| 1b   | Plot each blink with start, min, and end markers     |
| 1c   | Save a consolidated HTML report using MNE            |

---

## 🧪 **Step 1a: Plot the Full Time-Series Signal**

This helps you check for:

* Overall quality of the data
* Expected variations in EAR and EEG/EOG
* Regions of interest for blinks

```python
raw.plot(
    picks=['avg_ear', 'E8'],
    block=True,
    show_scrollbars=False,
    title='avg_ear Blink Signal'
)
```

---

## 🧩 **Step 1b: Inspect Blink Intervals One-by-One**

Each blink is annotated by a triplet: `start`, `min`, `end`. This step plots those over the EAR signal for **detailed inspection**.

```python
for _, row in blink_df.iterrows():
    plot_with_annotation_lines(
        raw=raw,
        start_frame=row['startBlinks'],
        end_frame=row['endBlinks'],
        mid_frame=row['blink_min'],
        picks='avg_ear',
        sfreq=sfreq,
    )
```

This lets you manually verify that annotations are consistent with the physiological signal dips and recoveries.
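
Visual inspection can be complemented by a quick automated pre-check. The sketch below is only an illustration and is not part of `direct_blink_properties`: the helper `find_suspicious_blinks` is hypothetical, and it assumes each row exposes the `startBlinks`, `blink_min`, and `endBlinks` sample indices used in the loop above. It flags rows whose markers are out of order or fall outside the recording.

```python
def find_suspicious_blinks(rows, n_samples):
    """Return (row_position, reason) pairs for blinks that look mislabeled.

    rows: iterable of mappings with 'startBlinks', 'blink_min', 'endBlinks'.
    n_samples: total number of samples in the recording.
    """
    problems = []
    for position, row in enumerate(rows):
        start, mid, end = row["startBlinks"], row["blink_min"], row["endBlinks"]
        if not (start <= mid <= end):
            problems.append((position, "markers out of order"))
        elif start < 0 or end >= n_samples:
            problems.append((position, "outside recording"))
    return problems


# Toy rows standing in for blink_df.to_dict("records")
toy_rows = [
    {"startBlinks": 100, "blink_min": 120, "endBlinks": 150},
    {"startBlinks": 300, "blink_min": 290, "endBlinks": 340},
    {"startBlinks": 980, "blink_min": 995, "endBlinks": 1010},
]
print(find_suspicious_blinks(toy_rows, n_samples=1000))
# [(1, 'markers out of order'), (2, 'outside recording')]
```

With the real data, `blink_df.to_dict("records")` and `raw.n_times` would supply the two arguments. A flagged row is a candidate for closer visual inspection, not for automatic rejection.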

---

## 📊 **Step 1c: Generate HTML Report with MNE**

Instead of viewing blink events one by one, this step **automatically generates a consolidated HTML report** using **MNE’s reporting tool**.

```python
generate_blink_reports(
    raw=raw,
    blink_df=blink_df,
    picks='avg_ear',
    sfreq=sfreq,
    output_dir='blink_reports',
    base_filename='blink_report',
    max_events_per_report=40
)
```

### 📂 Output:

```
blink_reports/
└── blink_report.html
```

The report provides a scrollable interface to review blink events in bulk with annotation markers.
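
The `max_events_per_report` argument suggests that events may be split into batches so that a single report stays manageable. How `generate_blink_reports` actually handles this is not shown here; the following is only a sketch of one plausible batching scheme, using the hypothetical helper `chunk_events`.

```python
def chunk_events(n_events, max_events_per_report):
    """Split event positions 0..n_events-1 into consecutive batches."""
    if max_events_per_report < 1:
        raise ValueError("max_events_per_report must be at least 1")
    return [
        range(start, min(start + max_events_per_report, n_events))
        for start in range(0, n_events, max_events_per_report)
    ]


# 95 hypothetical blinks with a limit of 40 events per report
batches = chunk_events(95, 40)
print([len(batch) for batch in batches])  # [40, 40, 15]
```

Each batch could then select the rows for one report, for example with `blink_df.iloc[batch.start:batch.stop]`.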



---

## ⚙️ **Behind the Scenes: Blink Frame Mapping**

The function `extract_blink_durations(...)` ensures frame-based CVAT labels are aligned with high-frequency signal data:

```python
sample_index = (frame_index - offset) * (sfreq / video_fps)
```

This mapping converts frame labels to time-series sample indices that can be plotted accurately.
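
Because one video frame spans several signal samples, a frame-based label can only be as precise as the frame duration. The sketch below quantifies that limit. `frame_resolution` is a hypothetical helper, and 500 Hz is just one value from the sampling-rate range mentioned earlier.

```python
def frame_resolution(sfreq, video_fps):
    """Return (samples per video frame, frame duration in milliseconds)."""
    if sfreq <= 0 or video_fps <= 0:
        raise ValueError("sfreq and video_fps must be positive")
    samples_per_frame = sfreq / video_fps
    frame_duration_ms = 1000.0 / video_fps
    return samples_per_frame, frame_duration_ms


samples, duration_ms = frame_resolution(sfreq=500.0, video_fps=30)
print(f"{samples:.1f} samples per frame, {duration_ms:.1f} ms per frame")
# 16.7 samples per frame, 33.3 ms per frame
```

A marker that looks roughly 17 samples off in a 500 Hz plot may therefore reflect a one-frame labeling difference and not a systematic shift.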

---

## 📌 **Why This Step Is Critical**

✅ Confirms label–signal alignment
✅ Flags mislabeled or shifted annotations early
✅ Builds confidence in the dataset before training models or running statistics

---

## ⚠️ **Reminder**

> This is a **quality control step**.
> Visual inspection is **mandatory** to catch any misalignments between annotations and raw signals.


In [None]:
from direct_blink_properties.util import load_fif_and_annotations, extract_blink_durations
from direct_blink_properties.viz import generate_blink_reports, plot_with_annotation_lines

# We will use subject S01 from the dataset
fif_path = r"C:\Users\balan\IdeaProjects\pyblinker_optimize_gpt\data_new_pipeline\S01_20170519_043933.fif"
zip_path = r"C:\Users\balan\IdeaProjects\pyblinker_optimize_gpt\data_new_pipeline\S01_20170519_043933.zip"


# Load data
raw, annotation_df = load_fif_and_annotations(fif_path, zip_path)
# Extract blink intervals
frame_offset = 5            # frames subtracted from the CVAT frame indices
video_fps = 30              # video frame rate (Hz)
sfreq = raw.info['sfreq']   # sampling rate of the time series (Hz)
blink_df = extract_blink_durations(annotation_df, frame_offset, sfreq, video_fps)

# Get an overview of the time-series data
raw.plot(
    picks=['avg_ear','E8'],
    block=True,
    show_scrollbars=False,
    title='avg_ear Blink Signal'
)


# Plot each blink on the EAR signal with its annotation lines
# ⚠️ WARNING: Only the first 10 blinks are plotted for visual inspection
print("⚠️ WARNING: Only plotting the first 10 blink events...")
for _, row in blink_df.head(10).iterrows():
    plot_with_annotation_lines(
        raw=raw,
        start_frame=row['startBlinks'],
        end_frame=row['endBlinks'],
        mid_frame=row['blink_min'],
        picks='avg_ear',
        sfreq=sfreq,
    )

# Generate a report for the blink signal

generate_blink_reports(
    raw=raw,
    blink_df=blink_df,
    picks='avg_ear',
    sfreq=sfreq,
    output_dir='blink_reports',
    base_filename='blink_report',
    max_events_per_report=40
)
