(anonymised companion repository for NeurIPS 2025 submission)
This repo hosts the exact pipeline used in the paper:
| Stage | Script / Module | Purpose |
|---|---|---|
| Word-embedding pre-processing | `preprocessing/**` | Vector-Norm Re-parameterization (VNR) and related steps that reshape GloVe embeddings before any sparse coding (see paper §3.2). |
| Sparse dictionary training | submodule `WordEmbVis` + `scripts/train_sparse_dictionary.py` | Learns the 1 000 interpretable concept atoms used throughout the study. |
| Feature preprocessing | `sparseconcept/preprocessing/store_save_all_features.py` | Converts transcripts → sparse concept-atom matrices (uses the dictionary above). |
| Model fitting | `sparseconcept/model_fitting/fit_banded_ridge_models.py` | Trains voxel-wise Sparse Concept Encoding Models (banded ridge). |
| Evaluation & figures | `scripts/figure_plotting/*` | Reproduces every plot and table in the manuscript. |
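The sparse-coding stage can be illustrated in miniature with scikit-learn. Everything below is a hedged stand-in, not the paper's settings: the embedding dimensions, atom count (the study uses 1 000), and penalty are invented for the sketch.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning, sparse_encode

rng = np.random.default_rng(0)
embeddings = rng.standard_normal((500, 50))   # stand-in for GloVe word vectors

# Learn a small dictionary of "concept atoms" (toy size; the paper uses 1 000).
learner = MiniBatchDictionaryLearning(n_components=32, alpha=1.0, random_state=0)
learner.fit(embeddings)

# Sparse-code each word embedding against the learned atoms.
codes = sparse_encode(embeddings, learner.components_, alpha=1.0)
print(codes.shape)  # (500, 32): one sparse concept vector per word
```

The actual dictionary training lives in the `WordEmbVis` submodule and `scripts/train_sparse_dictionary.py`; this sketch only shows the shape of the computation.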
Everything runs with Python 3.9 and open-source packages only.
```bash
git clone <repo_url> sparseconcept && cd sparseconcept
# pull the sparse-dictionary helper (≈2 MB)
git submodule update --init --recursive   # brings in WordEmbVis/
python3.9 -m venv venv && source venv/bin/activate
pip install -r requirements.txt
```

| Dataset | Size | Anonymous link | Destination |
|---|---|---|---|
| fMRI responses, stimulus features, subject mapper | ≈ 2.5 GB | https://osf.io/2pmrw/?view_only=ef353352b3fc4c3a84407925f32fb9de | `data/` |
```text
data/
├── exps
│   └── stories
│       ├── data
│       │   ├── S01
│       │   │   ├── S01_exp-stories_run-onsets.npy
│       │   │   ├── S01_responses_test.npy
│       │   │   └── S01_responses_train.npy
│       │   └── S02
│       │       ├── S02_exp-stories_run-onsets.npy
│       │       ├── S02_responses_test.npy
│       │       └── S02_responses_train.npy
│       └── features
│           ├── stories-numphonemes.npz
│           ├── stories-numwords.npz
│           └── stories-phonemes.npz
├── mappers
│   ├── S01_mappers.hdf
│   ├── S02_mappers.hdf
│   └── ...              # ⇨ required by all figure scripts
└── stimuli
    ├── grids
    │   ├── BAD_life_jasedit_corr.TextGrid
    │   ├── LonelyBetters2_WH_uncorr.TextGrid
    │   └── ...
    └── trfiles
        ├── alternateithicatom.report
        ├── avatar.report
        └── ...
```
The `.hdf` mapper files inside `data/mappers/` are required by all plotting scripts. Unpack the OSF download into the `data/` folder exactly as shown above before running any of the scripts.
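Given that layout, a per-subject loading step could look like the sketch below. The array shapes and the reading of the onset vector as first-TR indices are assumptions made for illustration, not documented guarantees of the dataset; synthetic stand-ins are used so the sketch runs anywhere.

```python
import numpy as np

subject = "S01"
base = f"data/exps/stories/data/{subject}"
# In the real layout you would load, e.g.:
#   resp_train = np.load(f"{base}/{subject}_responses_train.npy")
#   onsets     = np.load(f"{base}/{subject}_exp-stories_run-onsets.npy")
# Synthetic stand-ins (assumed shapes and orientation):
resp_train = np.zeros((300, 1000))   # (TRs, voxels)
onsets = np.array([0, 120, 210])     # assumed: first TR of each story run

# Split the continuous training time series back into per-story runs.
runs = np.split(resp_train, onsets[1:], axis=0)
print([r.shape[0] for r in runs])  # [120, 90, 90]
```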
| Folder | Contents | Used by |
|---|---|---|
| `exps/stories/data/<SUBJECT>/` | Raw BOLD matrices (`*_train.npy`, `*_test.npy`) and a run-onset vector indicating which TRs belong to which story. | Model-fitting scripts (`fit_banded_ridge_models.py`) |
| `exps/stories/features/` | Pre-saved feature matrices (NPZ). You can replace or extend these with your own features. | Optional baseline models |
| `mappers/` | Subject-specific `.hdf` files containing sparse matrices that map voxel indices → flat-map pixel positions, plus curvature, ROI masks, etc. | All visualisation utilities (`plot_flatmap_from_mapper`, figure scripts) |
| `stimuli/grids/` | Praat TextGrid files with word- and phoneme-level alignments of each story's audio. | Feature-preprocessing step (`store_save_all_features.py`) |
| `stimuli/trfiles/` | Derived `.report` files with metadata. | Feature-preprocessing step (`store_save_all_features.py`) |
Tip: if you add more subjects, place their BOLD files under `exps/stories/data/<NEW_ID>/`.
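The internal key names of the mapper `.hdf` files are not documented in this README, so a safe first step when exploring them is simply to list every group and dataset path. The helper below is a hypothetical sketch (it assumes `h5py` is installed; `list_hdf_contents` is not a function from this repo).

```python
import h5py

def list_hdf_contents(path):
    """Return every group/dataset path inside an HDF5 file."""
    names = []
    with h5py.File(path, "r") as f:
        f.visit(names.append)  # visit() walks all nested paths
    return names

# e.g. list_hdf_contents("data/mappers/S01_mappers.hdf")
```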
1. `python -m sparseconcept.features.store_save_all_features` – transform the stimuli into feature matrices.
2. `python -m sparseconcept.model_fitting.fit_banded_ridge_models S01 --do-nothing` – fit a voxelwise encoding model using the raw features (no normalization). Run this once for each subject before plotting.
3. `python scripts/figure_plotting/Fig_4_model_performance_scatter.py` – reproduce one of the figures.
4. `python scripts/run_all_figures.py` – generate every figure in one go.
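The banded-ridge idea behind the model-fitting step can be sketched with plain NumPy: each feature space ("band") gets its own ridge penalty, which is equivalent to ordinary ridge with a block-diagonal penalty matrix. All shapes and penalty values below are synthetic illustrations, not the repo's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trs, n_voxels = 200, 10
X1 = rng.standard_normal((n_trs, 5))    # band 1, e.g. low-level features
X2 = rng.standard_normal((n_trs, 8))    # band 2, e.g. sparse concept atoms
Y = rng.standard_normal((n_trs, n_voxels))

lambdas = [10.0, 0.1]                   # one penalty per band (assumed values)
X = np.hstack([X1, X2])
# Banded ridge = ridge regression with a block-diagonal penalty:
penalty = np.diag(np.concatenate([np.full(5, lambdas[0]),
                                  np.full(8, lambdas[1])]))
W = np.linalg.solve(X.T @ X + penalty, X.T @ Y)
print(W.shape)  # (13, 10): weights for all bands x voxels
```

The real script additionally cross-validates the per-band penalties per voxel; this closed-form solve only shows the core computation.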
Code and documentation are released under CC BY 4.0.