Paper: CIPHER
CIPHER is an EEG speech-decoding pipeline covering:
- ERP and DDA feature extraction from BIDS EEG,
- multi-task neural decoding for phoneme and articulatory targets,
- matched-split baselines and control analyses,
- automatic generation of publication figures and tables.
- End-to-end pipeline: preprocessing -> training -> evaluation.
- Deterministic defaults for reproducibility (seeded + deterministic backends).
- Baseline suite: chance, LR, LDA, EEGNet, ShallowConvNet, EEG-Conformer.
- WER-focused analyses and sweep scripts for robust model selection.
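As a point of reference for the WER-focused analyses, the sketch below computes a corpus-level error rate from edit distance. It assumes that WER here means the standard word error rate (substitutions, insertions, and deletions divided by the number of reference words). The function names `edit_distance` and `word_error_rate` are illustrative and are not taken from this repository.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def word_error_rate(references, hypotheses):
    """Corpus-level WER: total edits divided by total reference words."""
    total_edits = 0
    total_words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_tokens = ref.split()
        hyp_tokens = hyp.split()
        total_edits += edit_distance(ref_tokens, hyp_tokens)
        total_words += len(ref_tokens)
    if total_words == 0:
        raise ValueError("references contain no words")
    return total_edits / total_words
```

Pooling edits and words over the whole corpus, rather than averaging per-utterance rates, keeps short utterances from dominating the score.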
- preprocess.py: preprocesses raw BIDS EEG into ERP/DDA tensors.
- train_all.py: main CIPHER training entrypoint.
- evaluate_all.py: main evaluation/analysis entrypoint.
- evaluate/run_baselines.py: matched-split baseline benchmarking.
- evaluate/run_wer_baselines_ci.py: WER baselines with bootstrap confidence intervals.
- evaluate/make_paper_figures.py: regenerates paper-ready figures from result tables.
- run_cipher.sh: one-command pipeline runner.
- run_wer_sweep.sh: ERP+DDA WER sweep.
- run_dda_mini_sweep.sh: focused DDA WER sweep.
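The WER baselines script listed above reports bootstrap confidence intervals. The following is a minimal sketch of one common variant, a percentile bootstrap over utterances, and may differ from what evaluate/run_wer_baselines_ci.py actually does. The function name `bootstrap_wer_ci` and its arguments are hypothetical; the default of 2000 resamples only mirrors the `--n-boot 2000` value used elsewhere in this README.

```python
import random


def bootstrap_wer_ci(edits, lengths, n_boot=2000, alpha=0.05, seed=42):
    """Percentile bootstrap CI for a corpus-level error rate.

    edits[i] is the edit count for utterance i and lengths[i] its number
    of reference words (must be positive). Utterances are resampled with
    replacement and the pooled rate is recomputed for each resample.
    """
    if not edits or len(edits) != len(lengths):
        raise ValueError("edits and lengths must be non-empty and equal in size")
    if any(n <= 0 for n in lengths):
        raise ValueError("every reference length must be positive")

    rng = random.Random(seed)  # local generator, so global state is untouched
    n = len(edits)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(sum(edits[i] for i in idx) / sum(lengths[i] for i in idx))
    stats.sort()

    lower = stats[int((alpha / 2) * (n_boot - 1))]
    upper = stats[int((1 - alpha / 2) * (n_boot - 1))]
    point = sum(edits) / sum(lengths)
    return point, lower, upper
```

Resampling whole utterances, not individual words, preserves the within-utterance dependence of errors.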
Python 3.10+ recommended.
Option A (venv, for manual python commands):
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
Option B (conda, recommended if you use run_cipher.sh):
conda create -n cipher python=3.10 -y
conda activate cipher
pip install --upgrade pip
pip install -r requirements.txt
If your GPU/CUDA setup requires a different PyTorch build, install the appropriate wheel after requirements installation.
Dataset page:
Recommended download method (OpenNeuro CLI):
npm install -g openneuro-cli
openneuro download --dataset ds006104 --snapshot 1.0.2 ./ds006104
Expected local structure:
ds006104/
derivatives/
eeglab/
sub-P01/
...
sub-S16/
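A quick way to confirm that the download completed is to list the subject folders that were found. The sketch below is an assumption-laden helper and not part of this repository: the name `find_subject_dirs` is hypothetical, and because the nesting of the `sub-*` folders is not evident from the listing above, it searches at any depth under the dataset root.

```python
from pathlib import Path


def find_subject_dirs(dataset_root):
    """Return sorted relative paths of all directories named 'sub-*'."""
    root = Path(dataset_root)
    if not root.is_dir():
        raise FileNotFoundError(f"dataset root not found: {root}")
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("sub-*")
        if p.is_dir()  # skip files such as sub-P01_eeg.set
    )


if __name__ == "__main__":
    import sys

    # Usage: python check_dataset.py ./ds006104
    for rel_path in find_subject_dirs(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(rel_path)
```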
This repository uses deterministic defaults in preprocessing, training, and evaluation:
- fixed random seeds,
- deterministic torch/cudnn settings,
- deterministic DataLoader generator use in training,
- pinned dependencies in requirements.txt.
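The first three items above typically come down to a few lines of setup code. The following is a minimal sketch of such a helper, not the repository's own implementation; the name `seed_everything` is hypothetical. NumPy and PyTorch are seeded only if they are installed, and the returned `torch.Generator` is meant to be passed to a `DataLoader` via its `generator` argument.

```python
import random


def seed_everything(seed=42):
    """Seed the stdlib RNG and, when available, NumPy and PyTorch.

    Returns a seeded torch.Generator for DataLoader use, or None if
    PyTorch is not installed.
    """
    random.seed(seed)

    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass

    try:
        import torch
    except ImportError:
        return None

    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # silently ignored when CUDA is absent
    torch.backends.cudnn.deterministic = True  # pick deterministic kernels
    torch.backends.cudnn.benchmark = False     # disable autotuned kernel choice

    generator = torch.Generator()
    generator.manual_seed(seed)
    return generator
```

Disabling cuDNN benchmarking can slow training somewhat, which is the usual trade-off for run-to-run repeatability.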
Recommended seed for paper replication:
python preprocess.py --seed 42
python train_all.py --seed 42
python evaluate_all.py --seed 42
For speed-oriented ablations (reduced training budget):
python train_all.py --max-epochs 40 --patience 8 --seed 42
python evaluate_all.py --analysis metrics --analysis wer --dry-run --seed 42
Run everything:
bash run_cipher.sh
Smoke test:
bash run_cipher.sh --dry-run
Stage-wise execution:
bash run_cipher.sh --stage deps
bash run_cipher.sh --stage wav2vec
bash run_cipher.sh --stage preprocess
bash run_cipher.sh --stage train
bash run_cipher.sh --stage eval
python preprocess.py --skip-existing --seed 42
python train_all.py --skip-existing --seed 42
Example targeted run (phoneme identity, NULL condition):
python train_all.py \
--task phoneme_identity \
--feature-type all \
--tms null \
--skip-modality \
--seed 42
python evaluate_all.py --seed 42
Subset of analyses:
python evaluate_all.py --analysis metrics --analysis wer --seed 42
Matched-split baselines:
python evaluate/run_baselines.py
WER baseline table with bootstrap CI:
python evaluate/run_wer_baselines_ci.py --n-boot 2000
Joint ERP+DDA WER sweep:
bash run_wer_sweep.sh
Focused DDA mini-sweep:
bash run_dda_mini_sweep.sh
Generate publication figures from computed tables:
python evaluate/make_paper_figures.py
Outputs are saved under:
results/figures/paper/
- models_out/: trained checkpoints and logs.
- results/tables/: aggregate metrics, ablations, and control tables.
- results/figures/: evaluation and publication plots.
- results/summary_report.txt: consolidated run summary.
For issues, open a GitHub issue with:
- environment details,
- exact command run,
- full traceback/log snippet,
- expected vs observed behavior.