An open-source pipeline for neural decoding through structure-from-function inference on connectome data.
NEURALDECODE is a six-month, single-author research effort that builds a reproducible pipeline for inferring synaptic connectivity from population activity, benchmarks it against established baselines on open connectome datasets, and probes the latent geometry of mixed selectivity in cortical population codes. Memory is the long-term motivating target; the immediate scope is neural decoding methodology validated across species.
The pipeline progresses across four scales — C. elegans (302 neurons, fully resolved), Drosophila via FlyWire (139,000 neurons), mouse visual cortex via MICrONS (200,000 neurons + 523 million synapses + functional calcium imaging), and human macroscale via the Human Connectome Project — with a parallel translational track on sleep EEG (ANPHY) and place-cell decoding (CRCNS).
This repository documents the full plan, knowledge base, simulation code, source-vetting framework, and ongoing methodological reviews — open-source from day one. It is intentionally transparent about limitations: structure-from-function inference cannot be validated without ground-truth wiring, animal models are therefore required, and recent theoretical work (Locatello et al., 2019) constrains realistic expectations of disentanglement gains.
Modern neuroscience has accumulated open data that vastly exceeds what any individual lab can analyze:
- MICrONS Consortium 2025 — millimeter-scale cortical connectome with co-registered functional imaging
- FlyWire 2024 — complete adult Drosophila connectome
- Cook et al. 2019 — both-sex C. elegans connectome
- Human Connectome Project — 1,200-subject multimodal dataset
- OpenNeuro / ANPHY — full-night sleep EEG with expert-labeled hypnograms
Yet the field still lacks a clean, reproducible reference pipeline that demonstrates structure-from-function inference end-to-end on these datasets, with strong baselines, honest ablations, and explicit acknowledgment of theoretical constraints. This project aims to fill that gap and deliver the pipeline as a community resource, with publication as a tool/methodology paper.
The framing is deliberate. "Memory decoding" — the long-range scientific question — is presently constrained by the unavailability of cellular-resolution connectomes in humans. The realistic intermediate contribution is a neural decoding pipeline whose cross-species validation lays methodological groundwork for memory-specific extensions in subsequent work.
Link prediction on the 302-neuron connectome (Cook et al., 2019) using GCN, GraphSAGE, and GAT architectures, with formal testing of the like-to-like wiring rule reported by the MICrONS Consortium (2025) and a multifractal analysis exploring whether network topology can be partially recovered from single-neuron activity statistics.
Gate metric: mean held-out edge AUC ≥ 0.7 across three random seeds, with permutation-based test of the like-to-like rule yielding p < 0.001.
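The shape of this Phase-1 evaluation can be sketched in a few lines of numpy: hold out a fraction of edges, embed nodes with one GCN-style propagation step, score candidate edges with a dot-product decoder, and report a rank-based edge AUC. Everything below is an illustrative stand-in (toy random graph, untrained weights, one-hot features), not the project's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy random graph standing in for the 302-neuron connectome (illustrative)
n = 30
A = (rng.random((n, n)) < 0.15).astype(float)
A = np.triu(A, 1)
A = A + A.T                                   # undirected, no self-loops

# Hold out 20% of edges as a test set
edges = np.argwhere(np.triu(A, 1) > 0)
rng.shuffle(edges)
n_test = len(edges) // 5
test_pos, train_edges = edges[:n_test], edges[n_test:]
A_train = np.zeros_like(A)
for i, j in train_edges:
    A_train[i, j] = A_train[j, i] = 1.0

# One GCN-style propagation: H = ReLU(A_norm X W), A_norm = D^-1/2 (A+I) D^-1/2
A_hat = A_train + np.eye(n)
d = 1.0 / np.sqrt(A_hat.sum(1))
A_norm = A_hat * d[:, None] * d[None, :]
W = rng.normal(scale=0.1, size=(n, 16))       # untrained weights
H = np.maximum(A_norm @ W, 0)                 # X = I (one-hot), so X W = W

# Dot-product decoder: score held-out edges against sampled non-edges
neg = rng.integers(0, n, size=(4 * n_test, 2))
neg = neg[(A[neg[:, 0], neg[:, 1]] == 0) & (neg[:, 0] != neg[:, 1])][:n_test]
pos_s = np.einsum("ij,ij->i", H[test_pos[:, 0]], H[test_pos[:, 1]])
neg_s = np.einsum("ij,ij->i", H[neg[:, 0]], H[neg[:, 1]])

# Rank-based AUC: probability a true edge outscores a non-edge
auc = (pos_s[:, None] > neg_s[None, :]).mean()
print(f"held-out edge AUC on the toy graph: {auc:.2f}")
```

The real experiments replace the toy graph with the Cook et al. (2019) connectome, train the encoder, and repeat across three seeds before comparing against the 0.7 gate.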
A variational graph autoencoder with attention-based message passing (Veličković et al., 2018; Kipf & Welling, 2016) trained on the 139,000-neuron FlyWire connectome, then transferred to the MICrONS subset where co-registered functional calcium imaging is available (≈1,500 neurons with quality-A annotation). The strong baseline is the Pillow et al. (2008) Generalized Linear Model with stimulus and coupling filters, which any new method must match or surpass.
Gate metric: edge AUC ≥ 0.65 on real biological data, with statistically significant preservation of the like-to-like correlation between predicted edge weight and functional similarity.
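For orientation, the core of the GLM baseline can be sketched as a Poisson regression per neuron with a stimulus term plus lagged coupling terms from the other neurons. The numpy-only sketch below fits by plain gradient ascent on simulated counts; the sizes, learning rate, and absence of basis-function filters are simplifications relative to the full Pillow et al. (2008) model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated spike counts standing in for real recordings (illustrative)
T, n_neurons, lags = 2000, 5, 3
stim = rng.normal(size=T)
spikes = rng.poisson(0.5, size=(T, n_neurons)).astype(float)

# Design matrix for neuron 0: stimulus plus lagged counts of neurons 1..4
X = np.column_stack(
    [stim[lags:]]
    + [spikes[lags - l:T - l, j] for l in range(1, lags + 1)
       for j in range(1, n_neurons)]
)
y = spikes[lags:, 0]

# Poisson GLM, rate = exp(X w + b); gradient ascent on the log-likelihood
w, b = np.zeros(X.shape[1]), 0.0
for _ in range(200):
    rate = np.exp(X @ w + b)
    w += 0.1 * (X.T @ (y - rate)) / len(y)
    b += 0.1 * (y - rate).mean()

print("stimulus weight:", round(w[0], 3), "| coupling weights:", w[1:].round(3))
```

On this uncoupled toy data the fitted coupling weights hover near zero; on real data they are the quantities compared against predicted edge weights.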
A four-step decoder pipeline addressing Rigotti et al.'s (2013) mixed-selectivity challenge:
- Theta-phase binning to separate phase-dependent codes (Lisman, 2005; Tort et al., 2010)
- β-VAE with controlled-capacity training (Burgess et al., 2018), augmented with weak supervision from trial labels to circumvent the unsupervised disentanglement impossibility theorem (Locatello et al., 2019)
- Contrastive embedding via NT-Xent loss with context as the discriminating dimension (Chen et al., 2020)
- A hierarchical decoder mirroring the V1 → V4 → IT progression of the visual cortex
Expected gain over the Pillow GLM strong baseline is 5–15 percentage points (revised downward from the original IDEAS document after deep reading of Locatello et al.). Each step is evaluated with explicit ablations on three or more random seeds.
Gate metric: category decoding accuracy ≥ 75 % against a strong baseline ≥ 60 %.
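Step 1 of the pipeline above can be sketched with the analytic-signal phase of the LFP. The snippet is numpy-only and runs on a synthetic 8 Hz trace; a real pipeline would band-pass filter to the theta band first (e.g. with mne or scipy), and the signal and spike times here are purely illustrative.

```python
import numpy as np

fs, dur = 1000.0, 2.0                        # sampling rate (Hz), duration (s)
t = np.arange(0, dur, 1 / fs)
# Synthetic LFP: 8 Hz theta plus noise (stand-in for a real recording)
lfp = np.sin(2 * np.pi * 8 * t) + 0.2 * np.random.default_rng(2).normal(size=t.size)

def analytic(x):
    """Analytic signal via FFT (same construction as scipy.signal.hilbert)."""
    n = x.size
    F = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1
    if n % 2 == 0:
        h[n // 2] = 1
        h[1:n // 2] = 2
    else:
        h[1:(n + 1) // 2] = 2
    return np.fft.ifft(F * h)

phase = np.angle(analytic(lfp))              # instantaneous theta phase in [-pi, pi]

# Assign each spike to one of 8 phase bins (clip guards the phase == pi edge)
spike_times = np.array([0.10, 0.35, 0.61, 0.87, 1.40])   # seconds, illustrative
spike_phase = phase[(spike_times * fs).astype(int)]
bins = np.clip(np.digitize(spike_phase, np.linspace(-np.pi, np.pi, 9)) - 1, 0, 7)
print("phase bin per spike:", bins)
```

Downstream steps (β-VAE, contrastive embedding) then operate on activity separated by these bins rather than on the raw pooled population.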
Sharp-wave ripple and sleep-spindle detection on the ANPHY ds005555 dataset using validated YASA tooling (Vallat & Walker, 2021), Bayesian decoding of place-cell activity from the Buzsaki Lab CRCNS recordings as a methodological "Rosetta Stone" with clean labels, and a generative model on resting-state fMRI from the HCP corpus to bridge cellular and macroscale principles.
Gate metric: transfer-learning improvement ≥ 5 pp from place-cell pretraining to mixed-selectivity tasks; HCP latent representations consistent with published Yeo network parcellations within ±10 %.
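The place-cell "Rosetta Stone" step uses the standard Bayesian decoder; a minimal numpy sketch with synthetic Gaussian tuning curves follows. All rates, field widths, and the flat prior are illustrative choices, not parameters fit to the CRCNS data.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic tuning curves: 20 place cells with Gaussian fields on 50 bins
n_cells, n_bins, tau = 20, 50, 0.25          # tau: decoding window (s)
pos = np.linspace(0, 1, n_bins)
centers = np.linspace(0.05, 0.95, n_cells)   # evenly tiled place fields
rates = 20 * np.exp(-((pos[None, :] - centers[:, None]) ** 2)
                    / (2 * 0.05 ** 2)) + 0.1  # Hz, with a small baseline

# Spike counts observed in one window while the animal sits at bin 30
true_bin = 30
counts = rng.poisson(rates[:, true_bin] * tau)

# Log-posterior under independent Poisson likelihoods and a flat prior:
# log P(x | n) = sum_i [ n_i log(lambda_i(x) tau) - lambda_i(x) tau ] + const
log_post = counts @ np.log(rates * tau) - (rates * tau).sum(0)
decoded = int(np.argmax(log_post))
print("true bin:", true_bin, "decoded bin:", decoded)
```

Because the labels (positions) are clean, this decoder gives an unambiguous accuracy signal against which the mixed-selectivity pipeline can be calibrated before transfer.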
bioRxiv preprint, GitHub v1.0 release with reproducible Colab notebooks, and submission to a target journal — with the cascade calibrated to the Tool/Insight strategic decision (see notifications/pending/).
This project is run with explicit, machine-enforced commitments designed to preserve credibility:
- Strong baselines first. Every novel method is benchmarked against an established peer-reviewed alternative. The Pillow GLM coupling framework is the obligatory baseline for Phase 2 and 3.
- Pre-registration. Hypotheses, metrics, and decision thresholds are recorded on OSF.io before each phase begins. Statistical tests are specified in advance.
- Source vetting. Every reference cited in the knowledge base, manuscript, or correspondence undergoes a six-step vetting protocol (author affiliation, lab reputation, journal tier, citation pattern, replication, retraction-watch check). Tier 4–5 sources, including company press releases, are excluded by default — see reading/05_source_vetting.md.
- Non-destructive knowledge updates. Insights from new readings are appended to existing knowledge-base chapters as dated subsections; prior content is never overwritten. The full revision history lives in knowledge_base/CHANGELOG.md.
- Multi-level critical reviews. Out-of-the-box reviews at four scopes (experiment, module, phase, project-wide) are conducted on a fixed cadence and at trigger events, recorded in journal/critical_reviews/ with paired action entries in analysis_journal/.
- Honest ablations. All ablation tables report mean ± standard deviation across three or more seeds. Negative results — including cases where a pipeline step contributes negligibly — are reported in the manuscript rather than removed.
- Distributed cognition acknowledged. Following Hutchins (1995), the manuscript will explicitly state that decoding the neural component of cognition is one part of a larger system that includes tools, social interaction, and external scaffolding.
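The ablation-reporting convention can be made concrete with a small aggregation helper. The accuracy numbers below are made up purely to show the mean ± sd format used in the tables.

```python
import numpy as np

# Hypothetical per-seed accuracies (three seeds each), purely illustrative
ablations = {
    "full pipeline":         [0.76, 0.74, 0.78],
    "without phase binning": [0.71, 0.69, 0.73],
    "without contrastive":   [0.72, 0.74, 0.70],
}
for name, accs in ablations.items():
    a = np.asarray(accs, dtype=float)
    # ddof=1: sample standard deviation, as reported in the ablation tables
    print(f"{name:24s} {a.mean():.3f} ± {a.std(ddof=1):.3f} (n={a.size})")
```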
PLAN/ Phase plans (00-overview, 01-phase0 through 06-phase5),
risk register, parallel threads, metrics dashboard
knowledge_base/ 16 chapters of theoretical foundation + non-destructive CHANGELOG
reading/ Master queue of priority papers, textbook list,
source-vetting framework, paper notes, synthesis files
simulations/ Experimental code (exp_NNN_*) with MLflow tracking;
SIMULATIONS_LOG.md aggregates all runs
journal/ Daily logs, weekly reviews, phase checklists,
critical reviews (multi-level), templates
analysis_journal/ Post-review action records — KB updates, decisions,
notification creation
notifications/ Items requiring host-side intervention (install,
sign-in, send, decision)
outreach/ Cold-email templates and per-mentor drafts
skills/ Project-specific working instructions
GLOSSARY.md Plain-language definitions of technical terms
The project targets full reproducibility from a clean workstation:
# 1. Conda environment
brew install miniforge
conda init zsh && source ~/.zshrc
conda create -n neuraldecode python=3.11 -y
conda activate neuraldecode
# 2. Stack
pip install -r requirements.lock.txt # 193 pinned packages
# Includes: torch 2.11.0, torch-geometric 2.7.0, mne 1.12.1,
# mlflow 3.12.0, scikit-learn 1.8.0, caveclient 8.0.1
# 3. Verify
python -c "import torch, torch_geometric, mne, mlflow; \
print('OK', 'MPS:', torch.backends.mps.is_available())"
# Expected on Apple Silicon: OK MPS: True
# 4. MLflow tracking server (separate terminal)
bash simulations/_template/start_mlflow_server.sh
# → http://127.0.0.1:5000
# 5. MICrONS / CAVE access (one-time, requires Google account)
python -c "from caveclient import CAVEclient; \
CAVEclient(server_address='https://global.daf-apis.com').auth.get_new_token()"
# → follow returned URL, paste token into client.auth.save_token()
# → accept Terms of Service for microns_public
# 6. Run baseline experiment
cd simulations/exp_001_openworm_baseline
python run.py --task baseline

All experiments log parameters, metrics, and git commit hashes to MLflow automatically. Random seeds and architecture choices are preserved in config.yaml for each experiment.
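The logging pattern those runs follow can be sketched without the MLflow dependency. The function name, config keys, and placeholder metric below are hypothetical; the actual experiments record the same three pieces through mlflow.log_params / mlflow.log_metrics and run tags rather than printed JSON.

```python
import json
import random
import subprocess

def run_experiment(config):
    # Hypothetical stand-in for an exp_NNN run: capture params, metrics,
    # and the git commit hash, mirroring what MLflow records per run.
    random.seed(config["seed"])                      # reproducible runs
    try:
        git_hash = subprocess.check_output(
            ["git", "rev-parse", "HEAD"],
            text=True, stderr=subprocess.DEVNULL).strip()
    except (subprocess.CalledProcessError, FileNotFoundError):
        git_hash = "unknown"                         # e.g. outside a repo
    record = {
        "params": config,
        "metrics": {"edge_auc": round(random.random(), 4)},  # placeholder
        "tags": {"git_hash": git_hash},
    }
    print(json.dumps(record, indent=2))
    return record

run_experiment({"seed": 42, "arch": "GCN", "hidden_dim": 16})
```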
Phase 0 (Hygiene & Outreach) is in progress. As of the latest commit:
- Conda environment is reproducible (193 packages locked); Apple Silicon MPS acceleration verified.
- The first end-to-end simulation (exp_001_openworm_baseline) has completed on a placeholder graph across three architectures (GCN, GraphSAGE, GAT), demonstrating that the training, evaluation, and logging pipeline functions correctly. Numbers on placeholder data are at chance, as expected.
- MICrONS / CAVE programmatic access is verified across all twelve materialization versions (releases from June 2021 through March 2026) with full table inventory.
- MLflow tracking is active with run-level git-hash tagging.
- Eight foundational papers have been read at depth, with cross-paper synthesis recorded in reading/notes/.
- Two project-wide critical reviews have been completed; both classify the current trajectory as "yellow — refinements required, no pivot needed."
Detailed run-by-run results live in simulations/SIMULATIONS_LOG.md. Daily progress is tracked in journal/master_checklist.md.
A short selection of papers anchoring the methodological choices. The full vetted reading list (Tier 1–2 sources, ~70 entries) is in reading/02_papers_essential.md.
- Burgess, C. P. et al. (2018). Understanding disentangling in β-VAE. NIPS Workshop. arXiv:1804.03599.
- Cook, S. J. et al. (2019). Whole-animal connectomes of both Caenorhabditis elegans sexes. Nature, 571, 63–71.
- Hamilton, W. L., Ying, R., & Leskovec, J. (2017). Inductive representation learning on large graphs. NeurIPS. arXiv:1706.02216.
- Hutchins, E. (1995). Cognition in the Wild. MIT Press.
- Kipf, T. N., & Welling, M. (2016). Variational graph auto-encoders. NIPS Workshop on Bayesian Deep Learning. arXiv:1611.07308.
- Kipf, T. N., & Welling, M. (2017). Semi-supervised classification with graph convolutional networks. ICLR. arXiv:1609.02907.
- Libet, B., Gleason, C. A., Wright, E. W., & Pearl, D. K. (1983). Time of conscious intention to act in relation to onset of cerebral activity. Brain, 106, 623–642.
- Locatello, F. et al. (2019). Challenging common assumptions in the unsupervised learning of disentangled representations. ICML. arXiv:1811.12359.
- MICrONS Consortium et al. (2025). Functional connectomics spanning multiple areas of mouse visual cortex. Nature. (2021 preprint: bioRxiv doi:10.1101/2021.07.28.454025.)
- Pillow, J. W. et al. (2008). Spatio-temporal correlations and visual signalling in a complete neuronal population. Nature, 454, 995–999.
- Rigotti, M. et al. (2013). The importance of mixed selectivity in complex cognitive tasks. Nature, 497, 585–590.
- Schacter, D. L., Addis, D. R., & Buckner, R. L. (2007). Remembering the past to imagine the future: the prospective brain. Nature Reviews Neuroscience, 8, 657–661.
- Veličković, P. et al. (2018). Graph attention networks. ICLR. arXiv:1710.10903.
If this work informs your research, please cite the repository:
@misc{kamidenov2026neuraldecode,
  author = {Kamidenov, Sanzhar},
  title  = {NEURALDECODE: An open-source pipeline for neural decoding
            through structure-from-function inference on connectome data},
  year   = {2026},
  url    = {https://github.com/Observer7203/NEURALDECODE}
}

A CITATION.cff file will be added with the v1.0 release at the close of Phase 5.
Substantive contributions are welcome — particularly methodology critique, strong-baseline implementations, replication attempts on additional datasets, and improvements to co-registration tooling. See CONTRIBUTING.md for the workflow, knowledge-base update protocol, and source-vetting requirements. Issues are the right place to raise methodological concerns before opening a pull request.
Sanzhar Kamidenov — independent researcher, Almaty, Kazakhstan. Email: sanzharkamidenov@gmail.com. GitHub: Observer7203.
This project is conducted without institutional affiliation, lab funding, or graduate students. Realistic expectations are calibrated accordingly and recorded transparently in memory/project_assessment.md.
MIT — see LICENSE.