
Disentangling Superpositions

Code & Data for “Interpretable Brain Encoding Model with Sparse Concept Atoms”

(anonymised companion repository for NeurIPS 2025 submission)


1 Project Scope

This repo hosts the exact pipeline used in the paper:

| Stage | Script / Module | Purpose |
|---|---|---|
| Word-embedding pre-processing | `preprocessing/**` | Vector-Norm Re-parameterization (VNR) and related steps that reshape GloVe embeddings before any sparse coding (see paper §3.2). |
| Sparse dictionary training | submodule `WordEmbVis` + `scripts/train_sparse_dictionary.py` | Learns the 1 000 interpretable concept atoms used throughout the study. |
| Feature preprocessing | `sparseconcept/preprocessing/store_save_all_features.py` | Converts transcripts into sparse concept-atom matrices (uses the dictionary above). |
| Model fitting | `sparseconcept/model_fitting/fit_banded_ridge_models.py` | Trains voxel-wise Sparse Concept Encoding Models (banded ridge). |
| Evaluation & figures | `scripts/figure_plotting/*` | Reproduces every plot and table in the manuscript. |

Everything runs with Python 3.9 and open-source packages only.
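The sparse dictionary stage is the conceptual core of the pipeline. As a hedged illustration (not the repo's implementation, which lives in the `WordEmbVis` submodule), the idea of learning sparse "concept atoms" from word embeddings can be sketched with off-the-shelf dictionary learning; the toy dimensions and hyperparameters below are invented, and only the 1 000-atom count comes from this README:

```python
# Illustrative sketch only: learn a small dictionary of "atoms" from toy
# embeddings, so each word is a sparse combination of a few atoms.
# The paper's actual training (1 000 atoms from GloVe) is in WordEmbVis.
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
embeddings = rng.standard_normal((200, 50))  # toy stand-in for GloVe vectors

learner = DictionaryLearning(
    n_components=25,                 # the paper uses 1 000 atoms
    transform_algorithm="lasso_lars",
    transform_alpha=0.5,             # larger alpha -> sparser codes
    max_iter=10,
    random_state=0,
)
codes = learner.fit_transform(embeddings)  # sparse code per word
atoms = learner.components_                # learned dictionary atoms

print(codes.shape, atoms.shape)            # (200, 25) (25, 50)
print("zero fraction:", float((codes == 0).mean()))
```

Each row of `codes` is the sparse concept-atom representation of one word; downstream stages stack these rows into the per-story feature matrices.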


2 Installation

git clone <repo_url> sparseconcept && cd sparseconcept

# pull the sparse-dictionary helper (≈2 MB)
git submodule update --init --recursive    # brings in WordEmbVis/

python3.9 -m venv venv && source venv/bin/activate
pip install -r requirements.txt

3 Data Download (anonymous)

| Dataset | Size | Anonymous link | Destination |
|---|---|---|---|
| fMRI responses, stimulus features, subject mapper | ≈ 2.5 GB | https://osf.io/2pmrw/?view_only=ef353352b3fc4c3a84407925f32fb9de | `data/` |

data/
├── exps
│   └── stories
│       ├── data
│       │   ├── S01
│       │   │   ├── S01_exp-stories_run-onsets.npy
│       │   │   ├── S01_responses_test.npy
│       │   │   └── S01_responses_train.npy
│       │   └── S02
│       │       ├── S02_exp-stories_run-onsets.npy
│       │       ├── S02_responses_test.npy
│       │       └── S02_responses_train.npy
│       └── features
│           ├── stories-numphonemes.npz
│           ├── stories-numwords.npz
│           └── stories-phonemes.npz
├── mappers
│   ├── S01_mappers.hdf
│   ├── S02_mappers.hdf
│   └── ...
│       # ⇨ required by all figure scripts
└── stimuli
    ├── grids
    │   ├── BAD_life_jasedit_corr.TextGrid
    │   ├── LonelyBetters2_WH_uncorr.TextGrid
    │   └── ...
    └── trfiles
        ├── alternateithicatom.report
        ├── avatar.report
        └── ...
The .hdf mapper files inside data/mappers/ are required by all plotting scripts. Unpack the OSF download into the data/ folder exactly as shown above before running any of the scripts.
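To make the role of the mappers concrete: each one stores a sparse matrix that projects a per-voxel value vector onto flat-map pixel positions. The sketch below is a hedged, synthetic illustration of that projection; the shapes, the mapper construction, and the variable names are all invented, and the real mappers are read from `data/mappers/*.hdf` by the repo's plotting utilities:

```python
# Synthetic illustration of a voxel -> flat-map projection. The real
# sparse mapper matrices come from data/mappers/*.hdf; here we fabricate
# one where each flat-map pixel samples a single voxel.
import numpy as np
from scipy import sparse

n_voxels = 500
height, width = 40, 50                     # toy flat-map resolution
n_pixels = height * width
rng = np.random.default_rng(0)

rows = np.arange(n_pixels)
cols = rng.integers(0, n_voxels, size=n_pixels)
mapper = sparse.csr_matrix((np.ones(n_pixels), (rows, cols)),
                           shape=(n_pixels, n_voxels))

voxel_scores = rng.random(n_voxels)        # e.g. per-voxel model performance
flatmap = (mapper @ voxel_scores).reshape(height, width)
print(flatmap.shape)                       # (40, 50)
```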

Folder explanations

| Folder | Contents | Used by |
|---|---|---|
| `exps/stories/data/<SUBJECT>/` | Raw BOLD matrices (`*_train.npy`, `*_test.npy`) and a run-onset vector indicating which TRs belong to which story. | Model-fitting scripts (`fit_banded_ridge_models.py`) |
| `exps/stories/features/` | Pre-saved feature matrices (NPZ); you can replace or extend them with your own features. | Optional baseline models |
| `mappers/` | Subject-specific `.hdf` files containing sparse matrices that map voxel indices to flat-map pixel positions, plus curvature, ROI masks, etc. | All visualisation utilities (`plot_flatmap_from_mapper`, figure scripts) |
| `stimuli/grids/` | Praat TextGrid files with word- and phoneme-level alignments of each story’s audio. | Feature-preprocessing step (`store_save_all_features.py`) |
| `stimuli/trfiles/` | Derived `.report` files with metadata. | Feature-preprocessing step (`store_save_all_features.py`) |

Tip: if you add more subjects, place their BOLD files under exps/stories/data/<NEW_ID>/
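The run-onset convention described above can be sketched as follows. This is a hedged toy example: the file names match the README, but the array shapes and onset values are invented, and the repo's model-fitting code may slice the data differently:

```python
# Toy illustration of splitting a concatenated BOLD matrix into stories
# using a run-onset vector (like S01_exp-stories_run-onsets.npy, which
# in the real data would be loaded with np.load).
import numpy as np

n_trs, n_voxels = 300, 10                  # invented dimensions
responses = np.zeros((n_trs, n_voxels))    # stand-in for *_responses_train.npy
run_onsets = np.array([0, 120, 210])       # TR index where each story starts

# Split the TR axis at every onset after the first.
stories = np.split(responses, run_onsets[1:], axis=0)
print([s.shape[0] for s in stories])       # [120, 90, 90] TRs per story
```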

4 Example Usage

  • python -m sparseconcept.features.store_save_all_features – transform the stimuli into feature matrices
  • python -m sparseconcept.model_fitting.fit_banded_ridge_models S01 --do-nothing – fit a voxelwise encoding model using the raw features (no normalization). Run this once for each subject before plotting.
  • python scripts/figure_plotting/Fig_4_model_performance_scatter.py – reproduce one of the figures
  • python scripts/run_all_figures.py – generate every figure in one go
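For orientation, the banded-ridge idea behind `fit_banded_ridge_models.py` can be sketched in a few lines: each feature band gets its own scaling (equivalently, its own regularization strength), selected by validation error. This is a deliberately simplified, hedged toy with invented data and a tiny grid; the repo's script handles cross-validation, feature delays, and many voxels at once:

```python
# Minimal banded-ridge sketch: two feature bands, per-band scaling
# chosen on a held-out split, closed-form ridge solution per candidate.
import numpy as np

rng = np.random.default_rng(0)
n = 200
X1 = rng.standard_normal((n, 5))           # band 1 (e.g. concept atoms)
X2 = rng.standard_normal((n, 3))           # band 2 (e.g. num-words baseline)
y = X1 @ rng.standard_normal(5) + 0.1 * rng.standard_normal(n)

def ridge(X, y, alpha=1.0):
    """Closed-form ridge weights: (X'X + alpha I)^-1 X'y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)

train, val = slice(0, 150), slice(150, n)
best = None
for s1 in (0.1, 1.0, 10.0):                # candidate scale for band 1
    for s2 in (0.1, 1.0, 10.0):            # candidate scale for band 2
        X = np.hstack([s1 * X1, s2 * X2])
        w = ridge(X[train], y[train])
        err = float(np.mean((X[val] @ w - y[val]) ** 2))
        if best is None or err < best[0]:
            best = (err, s1, s2)
print("validation MSE %.4f at band scales (%g, %g)" % best)
```

Scaling a band before ordinary ridge is equivalent to giving that band its own regularization strength, which is why banded ridge can trade off heterogeneous feature spaces (here, concept atoms vs. low-level counts) within one voxel-wise model.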

5 License & Citation

Code and documentation are released under CC BY 4.0.
