
Text2fMRI: Brain Encoding Models using LLMs


Text2fMRI is a lightweight, tutorial-driven framework for building brain encoding models. It demonstrates how to predict whole-brain fMRI responses solely from video transcripts using Large Language Models (LLMs).

Originally developed for the Advanced Neuroimaging course at the Max Planck Institute for Human Cognitive and Brain Sciences, this repository serves as both an educational guide and a competitive baseline model for the Algonauts 2025 Challenge.


🚀 Quick Start

The easiest way to explore this project is via Google Colab. No local installation is required, and you get access to free GPUs.

Click here to launch the interactive tutorial

In this notebook, you will:

  1. Extract semantic features from text using a frozen LLM (e.g., Qwen-0.5B).
  2. Align text features to fMRI repetition time (TR) windows.
  3. Train a linear encoding model (or a lightweight Transformer) to map semantics to brain activity.
  4. Visualize predictions across different brain regions (Visual, Auditory, Language networks).
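Steps 2 and 3 above can be sketched in a few lines of NumPy. This is an illustrative toy, not the notebook's actual code: the variable names, array shapes, and the TR length are assumptions, and the notebook may use a different pooling scheme or regularization.

```python
import numpy as np

def pool_features_to_trs(word_feats, word_times, n_trs, tr=1.49):
    """Mean-pool word-level LLM features into TR windows.

    word_feats: (n_words, dim) feature matrix; word_times: onset in seconds.
    TRs with no words get a zero row. tr=1.49 s is an assumed repetition time.
    """
    pooled = np.zeros((n_trs, word_feats.shape[1]))
    for t in range(n_trs):
        mask = (word_times >= t * tr) & (word_times < (t + 1) * tr)
        if mask.any():
            pooled[t] = word_feats[mask].mean(axis=0)
    return pooled

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression mapping features X (TRs x dim)
    to brain responses Y (TRs x voxels). Returns W of shape (dim, voxels)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ Y)

# Toy data: 100 words with random 8-d features, 20 TRs, 5 "voxels"
rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 8))
times = np.sort(rng.uniform(0, 20 * 1.49, size=100))
X = pool_features_to_trs(feats, times, n_trs=20)
Y = rng.normal(size=(20, 5))
W = fit_ridge(X, Y, alpha=10.0)
pred = X @ W  # predicted fMRI responses, shape (20, 5)
```

The ridge penalty matters here because LLM feature dimensions are typically much larger than the number of TRs, so an unregularized fit would overfit badly.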

🧠 Model Description

Despite its lightweight architecture (approx. 52M trainable parameters), Text2fMRI achieves state-of-the-art (SOTA) level performance in auditory and language-selective cortices.

  • Input: Text transcripts from movies/videos (No audio or pixel data used).
  • Backbone: Uses feature extraction from pre-trained causal LLMs.
  • Dataset: Trained on the CNeuroMods dataset (Friends and Movie10).
  • Efficiency: Designed to run on consumer hardware while beating standard baselines.
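Encoding-model performance of the kind claimed above is conventionally scored as the Pearson correlation between predicted and measured BOLD time courses, computed independently for each voxel. A minimal sketch with synthetic data (shapes and names are illustrative, not taken from this repository):

```python
import numpy as np

def voxelwise_correlation(y_true, y_pred):
    """Pearson r between measured and predicted responses, one value per voxel.

    Both inputs have shape (n_trs, n_voxels); returns shape (n_voxels,).
    """
    yt = y_true - y_true.mean(axis=0)
    yp = y_pred - y_pred.mean(axis=0)
    num = (yt * yp).sum(axis=0)
    den = np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))
    return num / den

# Synthetic check: a prediction that is the truth plus moderate noise
rng = np.random.default_rng(0)
truth = rng.normal(size=(200, 4))                # 200 TRs, 4 voxels
noisy = truth + 0.5 * rng.normal(size=(200, 4))  # correlated prediction
r = voxelwise_correlation(truth, noisy)          # per-voxel r, well above chance
```

Mapping these per-voxel scores back onto the cortical surface is what produces the region-wise visualizations (visual, auditory, language networks) mentioned in the tutorial.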

💻 Local Installation

If you prefer to run this locally instead of on Colab, this project uses a pyproject.toml for dependency management.

  1. Clone the repository:

    git clone https://github.com/ShreyDixit/Text2fMRI.git
    cd Text2fMRI
  2. Install dependencies: We recommend using pip or uv.

    # Using uv
    uv sync

    # Or using pip (editable install from pyproject.toml)
    pip install -e .

📚 Course Context

This material was originally created for the Brain Encoding Models lecture/workshop (2025). It bridges the gap between modern NLP (Transformers) and Neuroscience, showing how artificial neural networks can serve as proxy models for biological brains.

Citation

If you use this notebook or code in your research or coursework, please cite:

@software{dixit_2026_text2fmri,
  author       = {Dixit, Shrey},
  title        = {{Text2fMRI: Brain Encoding Models using LLMs (Course Materials)}},
  year         = 2026,
  publisher    = {Zenodo},
  version      = {v0.1.1},
  doi          = {10.5281/zenodo.18369791},
  url          = {https://doi.org/10.5281/zenodo.18369791}
}
