(anonymised companion repository for NeurIPS 2025 submission)
This repo hosts the exact pipeline used in the paper:
| Stage | Script / Module | Purpose |
|---|---|---|
| Word-embedding pre-processing | `preprocessing/**` | Vector-Norm Re-parameterization (VNR) and related steps that reshape GloVe embeddings before any sparse coding (see paper §3.2). |
| Sparse dictionary training | submodule `WordEmbVis` + `scripts/train_sparse_dictionary.py` | Learns the 1 000 interpretable concept atoms used throughout the study. |
| Feature preprocessing | `sparseconcept/preprocessing/store_save_all_features.py` | Converts transcripts → sparse concept-atom matrices (uses the dictionary above). |
| Model fitting | `sparseconcept/model_fitting/fit_banded_ridge_models.py` | Trains voxel-wise Sparse Concept Encoding Models (banded ridge). |
| Evaluation & figures | `scripts/figure_plotting/*` | Reproduces every plot and table in the manuscript. |
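The sparse-coding stage can be illustrated in miniature with scikit-learn. Everything below is a hedged stand-in, not the paper's settings: the embedding dimensions, atom count (the study uses 1 000), and penalty are invented for the sketch.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning, sparse_encode

rng = np.random.default_rng(0)
embeddings = rng.standard_normal((500, 50))   # stand-in for GloVe word vectors

# Learn a small dictionary of "concept atoms" (toy size; the paper uses 1 000).
learner = MiniBatchDictionaryLearning(n_components=32, alpha=1.0, random_state=0)
learner.fit(embeddings)

# Sparse-code each word embedding against the learned atoms.
codes = sparse_encode(embeddings, learner.components_, alpha=1.0)
print(codes.shape)  # (500, 32): one sparse concept vector per word
```

The actual dictionary training lives in the `WordEmbVis` submodule and `scripts/train_sparse_dictionary.py`; this sketch only shows the shape of the computation.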
Everything runs with Python 3.9 and open-source packages only.
```bash
git clone <repo_url> sparseconcept && cd sparseconcept
# pull the sparse-dictionary helper (≈2 MB)
git submodule update --init --recursive   # brings in WordEmbVis/
python3.9 -m venv venv && source venv/bin/activate
pip install -r requirements.txt
```

| Dataset | Size | Anonymous link | Destination |
|---|---|---|---|
| fMRI responses, stimulus features, subject mapper | ≈ 2.5 GB | https://osf.io/2pmrw/?view_only=ef353352b3fc4c3a84407925f32fb9de | `data/` |
```text
data/
├── exps
│   └── stories
│       ├── data
│       │   ├── S01
│       │   │   ├── S01_exp-stories_run-onsets.npy
│       │   │   ├── S01_responses_test.npy
│       │   │   └── S01_responses_train.npy
│       │   └── S02
│       │       ├── S02_exp-stories_run-onsets.npy
│       │       ├── S02_responses_test.npy
│       │       └── S02_responses_train.npy
│       └── features
│           ├── stories-numphonemes.npz
│           ├── stories-numwords.npz
│           └── stories-phonemes.npz
├── mappers
│   ├── S01_mappers.hdf
│   ├── S02_mappers.hdf
│   └── ...              # ⇨ required by all figure scripts
└── stimuli
    ├── grids
    │   ├── BAD_life_jasedit_corr.TextGrid
    │   ├── LonelyBetters2_WH_uncorr.TextGrid
    │   └── ...
    └── trfiles
        ├── alternateithicatom.report
        ├── avatar.report
        └── ...
```
The `.hdf` mapper files inside `data/mappers/` are required by all plotting scripts. Unpack the OSF download into the `data/` folder exactly as shown above before running any of the scripts.
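Given that layout, a per-subject loading step could look like the sketch below. The array shapes and the reading of the onset vector as first-TR indices are assumptions made for illustration, not documented guarantees of the dataset; synthetic stand-ins are used so the sketch runs anywhere.

```python
import numpy as np

subject = "S01"
base = f"data/exps/stories/data/{subject}"
# In the real layout you would load, e.g.:
#   resp_train = np.load(f"{base}/{subject}_responses_train.npy")
#   onsets     = np.load(f"{base}/{subject}_exp-stories_run-onsets.npy")
# Synthetic stand-ins (assumed shapes and orientation):
resp_train = np.zeros((300, 1000))   # (TRs, voxels)
onsets = np.array([0, 120, 210])     # assumed: first TR of each story run

# Split the continuous training time series back into per-story runs.
runs = np.split(resp_train, onsets[1:], axis=0)
print([r.shape[0] for r in runs])  # [120, 90, 90]
```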
| Folder | Contents | Used by |
|---|---|---|
| `exps/stories/data/<SUBJECT>/` | Raw BOLD matrices (`*_train.npy`, `*_test.npy`) and a run-onset vector indicating which TRs belong to which story. | Model-fitting scripts (`fit_banded_ridge_models.py`) |
| `exps/stories/features/` | Pre-saved feature matrices (NPZ). You can replace or extend these with your own features. | Optional baseline models |
| `mappers/` | Subject-specific `.hdf` files containing sparse matrices that map voxel indices → flat-map pixel positions, plus curvature, ROI masks, etc. | All visualisation utilities (`plot_flatmap_from_mapper`, figure scripts) |
| `stimuli/grids/` | Praat TextGrid files with word- and phoneme-level alignments of each story's audio. | Feature-preprocessing step (`store_save_all_features.py`) |
| `stimuli/trfiles/` | Derived `.report` files with metadata. | Feature-preprocessing step (`store_save_all_features.py`) |
Tip: if you add more subjects, place their BOLD files under `exps/stories/data/<NEW_ID>/`.
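The internal key names of the mapper `.hdf` files are not documented in this README, so a safe first step when exploring them is simply to list every group and dataset path. The helper below is a hypothetical sketch (it assumes `h5py` is installed; `list_hdf_contents` is not a function from this repo).

```python
import h5py

def list_hdf_contents(path):
    """Return every group/dataset path inside an HDF5 file."""
    names = []
    with h5py.File(path, "r") as f:
        f.visit(names.append)  # visit() walks all nested paths
    return names

# e.g. list_hdf_contents("data/mappers/S01_mappers.hdf")
```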
1. `python -m sparseconcept.features.store_save_all_features` – transform the stimuli into feature matrices.
2. `python -m sparseconcept.model_fitting.fit_banded_ridge_models S01 --do-nothing` – fit a voxelwise encoding model using the raw features (no normalization). Run this once for each subject before plotting.
3. `python scripts/figure_plotting/Fig_4_model_performance_scatter.py` – reproduce one of the figures.
4. `python scripts/run_all_figures.py` – generate every figure in one go.
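The banded-ridge idea behind the model-fitting step can be sketched with plain NumPy: each feature space ("band") gets its own ridge penalty, which is equivalent to ordinary ridge with a block-diagonal penalty matrix. All shapes and penalty values below are synthetic illustrations, not the repo's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trs, n_voxels = 200, 10
X1 = rng.standard_normal((n_trs, 5))    # band 1, e.g. low-level features
X2 = rng.standard_normal((n_trs, 8))    # band 2, e.g. sparse concept atoms
Y = rng.standard_normal((n_trs, n_voxels))

lambdas = [10.0, 0.1]                   # one penalty per band (assumed values)
X = np.hstack([X1, X2])
# Banded ridge = ridge regression with a block-diagonal penalty:
penalty = np.diag(np.concatenate([np.full(5, lambdas[0]),
                                  np.full(8, lambdas[1])]))
W = np.linalg.solve(X.T @ X + penalty, X.T @ Y)
print(W.shape)  # (13, 10): weights for all bands x voxels
```

The real script additionally cross-validates the per-band penalties per voxel; this closed-form solve only shows the core computation.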
Code and documentation are released under CC BY 4.0.