# 2024 Validation Metrics & Brain-to-Text 2025 Evaluation Demo

This notebook summarizes the reproduction of the 2024 validation metrics
using the archived prediction samples and walks through the new Kaggle-ready
evaluation pipeline built around `eval_competition.py`.

## Load archived predictions

The `notebooks/samples/` directory contains the CIBR 2024 validation targets
and the hypotheses of two baseline models. We read each file line by line and
compute the character error rate (CER) and word error rate (WER) with `jiwer`.
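
For readers unfamiliar with the two metrics, the sketch below illustrates what they measure: the number of edits (substitutions, deletions, insertions) needed to turn the hypothesis into the reference, divided by the reference length, pooled over all sentences. It is a plain-Python illustration on made-up sentences, not `jiwer`'s implementation; the names `edit_distance` and `corpus_error_rate` are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_item in enumerate(hyp, start=1):
            cost = 0 if ref_item == hyp_item else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def corpus_error_rate(refs, hyps, tokenize):
    """Total edits divided by total reference length, pooled over sentences."""
    edits = sum(edit_distance(tokenize(r), tokenize(h)) for r, h in zip(refs, hyps))
    total = sum(len(tokenize(r)) for r in refs)
    return edits / total


# Toy data, invented for illustration only.
toy_refs = ['the cat sat', 'hello world']
toy_hyps = ['the cat sit', 'hello word']

toy_wer = corpus_error_rate(toy_refs, toy_hyps, str.split)  # word level
toy_cer = corpus_error_rate(toy_refs, toy_hyps, list)       # character level
```

Here two of the five reference words are wrong, so `toy_wer` is 0.4, while only two of the 22 reference characters differ, which is why CER is usually much lower than WER on the same output.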

In [None]:
from pathlib import Path

from jiwer import cer, wer

def find_repo_root(start: Path) -> Path:
    for candidate in (start,) + tuple(start.parents):
        if (candidate / 'eval_competition.py').exists():
            return candidate
    raise RuntimeError('Could not locate repository root from ' + str(start))

PROJECT_ROOT = find_repo_root(Path.cwd())
SAMPLES_DIR = PROJECT_ROOT / 'notebooks' / 'samples'
reference_path = SAMPLES_DIR / 'target_test.txt'
model_paths = {
    'Model 1 test': SAMPLES_DIR / 'model1_test.txt',
    'Model 2 test': SAMPLES_DIR / 'model2_test.txt',
}

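# Read one transcript per line. The encoding is set explicitly so the
# results do not depend on the platform default.
with reference_path.open(encoding='utf-8') as handle:
    references = [line.strip() for line in handle]

metrics = []
for name, pred_path in model_paths.items():
    with pred_path.open(encoding='utf-8') as handle:
        predictions = [line.strip() for line in handle]
    # jiwer takes the reference first and the hypothesis second.
    metrics.append({
        'model': name,
        'CER': cer(references, predictions),
        'WER': wer(references, predictions),
    })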

metrics


## Generate a Kaggle-formatted submission

The `save_submission` helper in `eval_competition.py` formats predictions for
the Brain-to-Text 2025 competition. As a stand-in for model output, we pass
the first five 2024 validation references as predictions to show how the
`.csv` file and its zipped archive are produced.
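
To make the output format concrete, here is a minimal sketch of a writer that produces an `id,text` CSV and an optional zip archive. It is not the code in `eval_competition.py`: the function name `write_demo_submission`, the file name `submission.csv`, and the zero-based integer ids are all assumptions made for illustration.

```python
import csv
import tempfile
import zipfile
from pathlib import Path


def write_demo_submission(predictions, output_dir, compress=True):
    """Hypothetical stand-in for save_submission; not the repository's code."""
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    csv_path = out_dir / 'submission.csv'  # assumed file name
    with csv_path.open('w', newline='', encoding='utf-8') as handle:
        writer = csv.writer(handle)
        writer.writerow(['id', 'text'])
        for idx, text in enumerate(predictions):  # assumed zero-based ids
            writer.writerow([idx, text])

    zip_path = None
    if compress:
        zip_path = csv_path.with_suffix('.zip')
        with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as archive:
            # Store only the file name so the archive has no directory prefix.
            archive.write(csv_path, arcname=csv_path.name)
    return csv_path, zip_path


# Write into a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    demo_csv, demo_zip = write_demo_submission(['hello world', 'a, b'], tmp)
    demo_lines = demo_csv.read_text(encoding='utf-8').splitlines()
    with zipfile.ZipFile(demo_zip) as archive:
        demo_members = archive.namelist()
```

Using `csv.writer` rather than string concatenation matters once a transcript contains a comma or a quote, since the writer quotes such fields automatically.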

In [None]:
import sys

if str(PROJECT_ROOT) not in sys.path:
    sys.path.append(str(PROJECT_ROOT))

from eval_competition import save_submission

demo_dir = PROJECT_ROOT / 'notebooks' / 'samples' / 'demo_submission'
submission_path, zip_path = save_submission(
    predictions=references[:5],
    output_dir=str(demo_dir),
    submission_format='csv',
    compress=True,
)
submission_path, zip_path


## Inspect the submission preview

The `.csv` file uses the `id,text` schema required by Kaggle.
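
Before uploading, it can help to check the schema programmatically rather than by eye. The sketch below parses CSV text with the standard `csv` module and verifies the header and, optionally, the row count. The helper name `check_submission_schema` and the sample text are hypothetical; the only detail taken from this notebook is the `id,text` header.

```python
import csv
import io


def check_submission_schema(csv_text, expected_rows=None):
    """Parse CSV text and confirm it has exactly the columns id and text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != ['id', 'text']:
        raise ValueError(f'unexpected header: {reader.fieldnames}')
    rows = list(reader)
    if expected_rows is not None and len(rows) != expected_rows:
        raise ValueError(f'expected {expected_rows} rows, found {len(rows)}')
    return rows


# Made-up sample text; the second row contains a quoted comma.
sample_text = 'id,text\n0,hello world\n1,"a, b"\n'
sample_rows = check_submission_schema(sample_text, expected_rows=2)
```

Parsing with `csv.DictReader` handles quoted fields correctly, which a plain `splitlines()` preview does not.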

In [None]:
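# Header plus the five demo rows. Wrapping in Path() also works if
# save_submission returns the path as a string.
Path(submission_path).read_text().splitlines()[:6]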
