In [None]:
import pandas as pd

df = pd.read_csv("../test_data/outputs/onlyvelocity100_output5.tsv", sep="\t", decimal=",", low_memory=False)

gt_col = "gt_event_type" if "gt_event_type" in df.columns else "Eye movement type"
pred_col = "ivt_event_type_smoothed" if "ivt_event_type_smoothed" in df.columns else "ivt_event_type"

valid_labels = {"Fixation", "Saccade"}
mask_mismatch = (
    df[gt_col].isin(valid_labels)
    & df[pred_col].isin(valid_labels)
    & (df[gt_col].astype(str) != df[pred_col].astype(str))
)

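mismatch_idx = df.index[mask_mismatch].to_list()
# A bare expression in the middle of a cell is not displayed, so print the count.
print(f"{len(mismatch_idx)} mismatched samples")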

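# Number of neighbouring samples to show on each side of a mismatch.
context = 1
n_total = len(df)

# Collect every mismatch plus its neighbours. This treats index labels as
# positions, which holds for the default RangeIndex produced by read_csv.
indices = set()
for i in mismatch_idx:
    start = max(0, i - context)
    end = min(n_total - 1, i + context)
    indices.update(range(start, end + 1))

df_ctx = df.loc[sorted(indices)].copy()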

cols = []
if "time_ms" in df_ctx.columns:
    cols.append("time_ms")
if "velocity_deg_per_sec" in df_ctx.columns:
    cols.append("velocity_deg_per_sec")
cols += [gt_col, pred_col]

df_min = df_ctx[cols].copy()
df_min = df_min.rename(columns={gt_col: "gt_class", pred_col: "ivt_class"})
if "time_ms" in df_min.columns:
    df_min = df_min.sort_values("time_ms")

df_min.head(60)


index,time_ms,velocity_deg_per_sec,gt_class,ivt_class
3660,12404,80.88,Fixation,Fixation
3661,12407,92.74,Saccade,Fixation
3662,12410,204.32,Saccade,Saccade
3705,12554,175.96,Saccade,Saccade
3706,12557,121.89,Fixation,Saccade
3707,12560,98.8,Fixation,Fixation
4885,16487,120.12,Saccade,Saccade
4886,16490,85.34,Saccade,Fixation
4887,16494,195.03,Saccade,Saccade
5600,18870,445.69,Saccade,Saccade


# Analysis: 300Hz Dataset Errors

## Key Findings

**Total Errors:** 63/68,582 samples (99.91% accuracy)
- Fixation→Saccade (classified as Fixation, ground truth Saccade): 35 errors
- Saccade→Fixation (classified as Saccade, ground truth Fixation): 28 errors
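
The counts above can be reproduced with a cross-tabulation of predicted against ground-truth labels. The snippet below is a minimal sketch on hypothetical toy labels; in the notebook, `gt` and `pred` would be `df[gt_col]` and `df[pred_col]`. Restricting the denominator to samples where both labels are Fixation or Saccade is an assumption, since the text does not say how the 68,582 total was counted.

```python
import pandas as pd

# Hypothetical toy labels standing in for df[gt_col] and df[pred_col].
gt = pd.Series(["Fixation", "Saccade", "Saccade", "Fixation", "Fixation", "Saccade"])
pred = pd.Series(["Fixation", "Fixation", "Saccade", "Saccade", "Fixation", "Saccade"])

# Keep only samples where both labels are one of the two classes of interest.
valid = {"Fixation", "Saccade"}
both_valid = gt.isin(valid) & pred.isin(valid)
gt_v, pred_v = gt[both_valid], pred[both_valid]

# Sample-level accuracy and the per-direction error counts.
accuracy = (gt_v == pred_v).mean()
confusion = pd.crosstab(pred_v, gt_v, rownames=["predicted"], colnames=["ground_truth"])

print(f"accuracy: {accuracy:.4f}")
print(confusion)
```

The off-diagonal cells of `confusion` give the two error directions: row `Fixation`, column `Saccade` counts samples classified as Fixation whose ground truth is Saccade, and vice versa.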

### Pattern 1: Event Boundary Clustering
- **81% of errors** (51/63) occur at event boundaries
- GT transitions: S→S→F (15), F→F→S (12), F→S→S (11), S→F→F (9)
- Most are **last saccade sample** or **first fixation sample** at transitions
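
One way to flag boundary errors is to compare each sample's ground-truth label with its neighbours. The sketch below assumes that "at an event boundary" means the previous or next ground-truth label differs from the sample's own; the original analysis may have used a different definition. The toy sequence and the single-letter codes are illustrative only.

```python
import pandas as pd

# Hypothetical ground truth: fixation, saccade, fixation (3 samples each).
gt = pd.Series(list("FFFSSSFFF")).map({"F": "Fixation", "S": "Saccade"})
pred = gt.copy()
pred.iloc[5] = "Fixation"  # last saccade sample misclassified (boundary error)
pred.iloc[1] = "Saccade"   # error in the middle of a fixation (not a boundary)

prev_gt = gt.shift(1)
next_gt = gt.shift(-1)

# A sample is at a boundary if either existing neighbour has a different GT label.
at_boundary = (prev_gt.notna() & (prev_gt != gt)) | (next_gt.notna() & (next_gt != gt))
is_error = gt != pred

# GT transition triplet (previous, current, next), e.g. "S→S→F".
triplet = prev_gt.str[0] + "→" + gt.str[0] + "→" + next_gt.str[0]

boundary_share = at_boundary[is_error].mean()
pattern_counts = triplet[is_error & at_boundary].value_counts()

print(f"share of errors at boundaries: {boundary_share:.0%}")
print(pattern_counts)
```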

### Pattern 2: Near-Threshold Distribution
- Median error velocity: **30.32°/s** (threshold: 30.2°/s)
- 22 errors within 0.5°/s of threshold
- 31 errors within 1.0°/s of threshold
- 39 errors within 2.0°/s of threshold
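
The margin counts above amount to thresholding the absolute distance between each error's velocity and the classifier threshold. A minimal sketch follows; the 30.2°/s threshold comes from the text, while the velocity array is a small illustrative sample rather than the real set of misclassified samples.

```python
import numpy as np

THRESHOLD = 30.2  # deg/s, as stated in the text

# Illustrative velocities of misclassified samples (not the real data).
error_velocities = np.array([28.9, 29.9, 30.32, 31.0, 30.6, 45.0])

# Absolute distance of each error from the threshold.
dist = np.abs(error_velocities - THRESHOLD)

# Count errors falling within each margin around the threshold.
counts = {margin: int((dist <= margin).sum()) for margin in (0.5, 1.0, 2.0)}

for margin, n in counts.items():
    print(f"{n} errors within {margin}°/s of threshold")
```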

### Pattern 3: Acceleration Signature
- **Fixation→Saccade errors** (wrongly kept as Fixation):
  - Mean acceleration: **+3,268 deg/s²** (speeding up)
  - 71% have positive acceleration
  - Median: +5,087 deg/s²
  
- **Saccade→Fixation errors** (wrongly kept as Saccade):
  - Mean acceleration: **-202 deg/s²** (slowing down)
  - 54% have negative acceleration
  - Median: -567 deg/s²
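
The text does not say how acceleration was derived. A simple option, assumed here, is a backward finite difference of velocity over time, using the `time_ms` and `velocity_deg_per_sec` columns from the notebook. The output column name `accel_deg_per_sec2` and the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical samples at roughly 3 ms spacing.
samples = pd.DataFrame({
    "time_ms": [0, 3, 6, 9, 12],
    "velocity_deg_per_sec": [20.0, 29.0, 80.0, 40.0, 25.0],
})

# Backward difference: (v[t] - v[t-1]) / (time[t] - time[t-1]), with time in seconds.
dt_s = samples["time_ms"].diff() / 1000.0
samples["accel_deg_per_sec2"] = samples["velocity_deg_per_sec"].diff() / dt_s

# The first sample has no predecessor, so drop its NaN before summarising.
accel = samples["accel_deg_per_sec2"].dropna()
share_positive = (accel > 0).mean()

print(samples)
print(f"mean: {accel.mean():.0f}, median: {accel.median():.0f}, positive: {share_positive:.0%}")
```

In the notebook, the same summary would be computed separately for each error direction by filtering on the mismatch mask first.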

### Pattern 4: GT vs Velocity Mismatch
Many errors show **ground truth labels contradicting velocity**:
- GT="Saccade" but v=28.9°/s (below threshold)
- GT="Fixation" but v=31.0°/s (above threshold)
- GT="Saccade" but v=29.9°/s (below threshold)

This suggests **inherent ambiguity** in transition samples where:
1. Human annotators use temporal context (event continuity)
2. Velocity-based classifier uses instantaneous measurement
3. Both are "correct" given their information

## Tested Strategies (All Increased Errors)

1. **Dynamic margin** (confidence based on distance from the threshold): 62→63 errors ❌
2. **Acceleration rule** (acceleration > 3,000 deg/s² → Saccade): 62→167 errors ❌
3. **Acceleration + context** (acceleration rule combined with neighboring labels): 62→105 errors ❌
4. **Run-length smoothing** (removal of isolated outlier samples): 62→95 errors ❌
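
For reference, run-length smoothing of the kind listed as strategy 4 can be sketched as follows. This is an assumed formulation, not necessarily the variant that was tested: any interior run of at most `max_run` samples is relabeled when the runs on both sides carry the same label. The function name and `max_run` parameter are hypothetical.

```python
from itertools import groupby


def smooth_isolated_runs(labels, max_run=1):
    """Relabel short interior runs that are flanked by the same label on both sides."""
    # Collapse the sequence into (label, run_length) pairs.
    runs = [(label, len(list(group))) for label, group in groupby(labels)]
    smoothed = []
    for k, (label, length) in enumerate(runs):
        is_interior = 0 < k < len(runs) - 1
        # Neighbours are read from the original run list, not the smoothed one.
        if is_interior and length <= max_run and runs[k - 1][0] == runs[k + 1][0]:
            label = runs[k - 1][0]
        smoothed.extend([label] * length)
    return smoothed


labels = ["F", "F", "S", "F", "F", "S", "S", "S"]
print(smooth_isolated_runs(labels))
```

A rule like this cannot tell a spurious one-sample flip from a genuinely short event, which is one plausible reason it added errors here.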

## Conclusion

The remaining 62 errors represent the **accuracy limit** of threshold-based IVT classification:
- Ground truth uses event-level context (transitions span multiple samples)
- IVT uses sample-level velocity (instantaneous, no temporal context)
- Boundary samples are **fundamentally ambiguous** without higher-level features

Further improvement would require:
- Machine learning classifier with temporal features
- Event-level post-processing (HMM, CRF)
- Multi-modal features (acceleration, pupil, eye model confidence)
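
As an illustration of the HMM option, the sketch below decodes a two-state (Fixation/Saccade) sequence with the Viterbi algorithm. Everything in it is an assumption made for illustration: the logistic emission model centered on the 30.2°/s threshold, the `scale` and `stay_prob` values, and the function name. It has not been evaluated on this dataset.

```python
import numpy as np


def viterbi_two_state(velocities, threshold=30.2, scale=5.0, stay_prob=0.95):
    """Most likely Fixation/Saccade sequence under a simple two-state HMM."""
    v = np.asarray(velocities, dtype=float)
    n = len(v)

    # Assumed emission model: P(Saccade | v) is logistic in the distance from the threshold.
    p_sacc = 1.0 / (1.0 + np.exp(-(v - threshold) / scale))
    p_sacc = np.clip(p_sacc, 1e-9, 1.0 - 1e-9)
    log_emit = np.log(np.stack([1.0 - p_sacc, p_sacc], axis=1))  # columns: Fixation, Saccade

    # Symmetric transitions: staying in a state is much more likely than switching.
    log_trans = np.log(np.array([[stay_prob, 1.0 - stay_prob],
                                 [1.0 - stay_prob, stay_prob]]))

    score = np.empty((n, 2))
    back = np.zeros((n, 2), dtype=int)
    score[0] = np.log(0.5) + log_emit[0]
    for t in range(1, n):
        cand = score[t - 1][:, None] + log_trans  # cand[i, j]: previous state i, current state j
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0) + log_emit[t]

    # Backtrack from the best final state.
    states = np.empty(n, dtype=int)
    states[-1] = score[-1].argmax()
    for t in range(n - 2, -1, -1):
        states[t] = back[t + 1, states[t + 1]]
    return ["Saccade" if s else "Fixation" for s in states]


# A single sample just above the threshold inside a fixation stays Fixation.
print(viterbi_two_state([10, 10, 31, 10, 10]))
# A sustained high-velocity burst is still decoded as a saccade.
print(viterbi_two_state([10, 10, 200, 250, 220, 10, 10]))
```

Unlike a per-sample threshold, the transition penalty makes the decoder use temporal context, which is the kind of information the ground truth appears to rely on at event boundaries.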