Give it a paper. It tells you whether the results actually reproduce.
ReproRun is a portable AI-agent skill, usable by any compatible AI coding assistant, that automates the painful path from "a paper claims X" to "X actually runs on my machine." Papers ship beautiful numbers; reproducing them usually dies in broken code, unbuildable environments, and version drift. ReproRun handles the whole pipeline end to end:
read paper → find code & data → build environment → smoke test → full run → compare measured numbers against the paper's claims.
ReproRun is designed to adapt itself to whatever it's handed, instead of assuming one fixed setup:
- Cross-OS: runs on Windows, macOS, and Linux
- Cross-language: full pipeline for Python; minimal path for R / MATLAB / Julia
- Cross-domain: single-cell biology, image ML, and more, with no per-domain rewiring
- Cross-mode: tool mode (`pip install` + write a calling script) or experiment mode (clone the repo & run its scripts), auto-selected per paper
- Find a paper from just its title: auto web-search for PDF & code repo
- Auto-diagnose environment bit-rot: numpy ABI clashes, torchvision API changes, deprecated pandas methods… detected and fixed automatically
- No guessing parameters: inspects function signatures after install
- 5-round dependency self-healing loop: classify error → targeted fix → re-verify, up to 5 rounds
- Paper isolation: every paper gets its own output namespace
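The self-healing loop in the list above can be pictured roughly as follows. This is a minimal sketch, not ReproRun's actual code: the names `classify_error`, `self_heal`, and `ERROR_PATTERNS`, the error labels, and the message substrings are all hypothetical and chosen only for illustration. The one detail taken from the text is the 5-round cap.

```python
from typing import Callable, Dict, List, Optional, Tuple

MAX_ROUNDS = 5  # from the text: up to 5 repair rounds

# Hypothetical mapping from a substring of an error message to an error class.
ERROR_PATTERNS: Dict[str, str] = {
    "numpy.dtype size changed": "numpy_abi",
    "No module named": "missing_dependency",
    "has no attribute 'append'": "pandas_append_removed",
}


def classify_error(message: str) -> str:
    """Return the first matching error class, or 'unknown'."""
    for needle, label in ERROR_PATTERNS.items():
        if needle in message:
            return label
    return "unknown"


def self_heal(
    verify: Callable[[], Optional[str]],
    fixes: Dict[str, Callable[[], object]],
    max_rounds: int = MAX_ROUNDS,
) -> Tuple[bool, List[str]]:
    """Run classify -> targeted fix -> re-verify for at most max_rounds rounds.

    verify() returns None when the environment works, else an error message.
    fixes maps an error class to a callable that applies the repair.
    Returns (healthy, list of error classes that were handled).
    """
    history: List[str] = []
    error = verify()
    for _ in range(max_rounds):
        if error is None:
            break
        label = classify_error(error)
        history.append(label)
        fix = fixes.get(label)
        if fix is None:  # no targeted fix known: stop instead of guessing
            break
        fix()
        error = verify()  # re-verify after every fix
    return error is None, history


if __name__ == "__main__":
    # Simulated environment with two queued problems (made-up messages).
    pending = ["No module named 'pooch'", "numpy.dtype size changed"]
    repair = lambda: pending.pop(0)
    ok, handled = self_heal(
        lambda: pending[0] if pending else None,
        {"missing_dependency": repair, "numpy_abi": repair},
    )
    print(ok, handled)
```

Stopping when no targeted fix is known, rather than retrying blindly, keeps the loop from burning rounds on an error it cannot classify.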
One orchestrator (SKILL.md) drives 6 specialized agents:
| Agent | Role |
|---|---|
| A · paper-reader | Extract the numerical claims to reproduce |
| B · resource-finder | Locate code repo & datasets |
| C · environment-builder | Build & repair the runtime (most complex) |
| D · smoke-tester | Quick smoke test to confirm it runs |
| E · full-runner | Full reproduction run |
| F · result-comparator | Compare measured vs. claimed, item by item |
ReproRun has been run end to end on real papers across domains. It doesn't just rubber-stamp "success": for every paper it returns an honest verdict (numbers reproduced, pipeline reproduced, or can't reproduce as-is), always with the root cause.
| Paper | Domain | Result |
|---|---|---|
| UMAP (McInnes 2018, JOSS) | dimensionality reduction | ✅ Numbers reproduced: 11/14 k-NN accuracy metrics match within ±0.01; MNIST & Fashion-MNIST confirmed to 3 decimals |
| scVelo (Bergen 2020, Nat Biotech) | single-cell | ✅ Pipeline reproduced: caught a numpy 2.x ABI bug causing 100% NaN, fixed by downgrading to 1.26.4 |
| Robust Stitching (Ruiz 2023, ICML) | image ML | ✅ Pipeline reproduced: repaired 5 bit-rot breakages (torchvision API, pandas `append`, missing deps) |
| Annotatability (Nitzan 2024, Nat Comp Sci) | single-cell | ✅ Pipeline reproduced: 6 API-debug rounds surfaced a missing `pooch` dependency |
| ScType (Ianevski 2022, Nat Comms) | single-cell (R) | ✅ Pipeline reproduced: R path validated after a version downgrade |
| Cropformer (Wang 2025, Plant Communications) | crop genomics | ⚠️ Partial: framework runs end to end, but the paper's metrics need real crop-genome data the repo does not ship |
6 papers · 1 full-metric reproduction · 4 pipeline reproductions · 1 honest partial · 18 verified skill improvements
Honest by design. UMAP landed at 78.6% metric match, just below our 80% "clean reproduce" bar, and ReproRun reports it as a data contradiction rather than rounding up. For Cropformer, the framework runs end to end, but the published numbers need real crop-genome data the repo never ships, so it's flagged Partial, not Pass.
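The arithmetic behind the UMAP figure is simple enough to sketch. The snippet below is an illustration only: `match_rate` and `is_clean_reproduce` are hypothetical names, and the metric values are placeholders rather than the paper's numbers. The ±0.01 tolerance, the 11-of-14 count, and the 80% bar are taken from the text.

```python
from typing import Sequence

TOLERANCE = 0.01            # from the text: metrics match within ±0.01
CLEAN_REPRODUCE_BAR = 0.80  # from the text: 80% "clean reproduce" bar


def match_rate(
    claimed: Sequence[float],
    measured: Sequence[float],
    tol: float = TOLERANCE,
) -> float:
    """Fraction of metrics whose measured value is within tol of the claim."""
    if not claimed or len(claimed) != len(measured):
        raise ValueError("need two non-empty sequences of equal length")
    # The tiny epsilon keeps float rounding from failing a borderline match.
    hits = sum(1 for c, m in zip(claimed, measured) if abs(c - m) <= tol + 1e-9)
    return hits / len(claimed)


def is_clean_reproduce(rate: float, bar: float = CLEAN_REPRODUCE_BAR) -> bool:
    """True only when the match rate reaches the bar; no rounding up."""
    return rate >= bar


if __name__ == "__main__":
    # Placeholder values: 14 metrics, 11 within tolerance, 3 clearly off.
    claimed = [0.90] * 14
    measured = [0.90] * 11 + [0.80] * 3
    rate = match_rate(claimed, measured)
    print(f"{rate:.1%}", is_clean_reproduce(rate))  # 78.6% False
```

Comparing the unrounded rate against the bar is what keeps 11/14 from being reported as a clean reproduce.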
Case study: Cropformer (⚠️ Partial reproduction)
Verified ✅
- Repo found & cloned (`jiekesen/Cropformer`; the paper's URL was wrong)
- Environment built: Python 3.10 + PyTorch 2.5.1 + CUDA 12.1
- Model architecture: Conv1d + 8-head self-attention (2.6M params)
- GPU inference on RTX 4090; pretrained weights load & run
- Training loop converges: loss 89,540 → 23,771
Could not reproduce ❌
- Paper metrics (PCC=0.92, …): repo ships only 10 random demo samples, not real crop data
- Classification task: `model_class.py` is missing key functions
- Nested cross-validation, MIC feature selection, 0–9 SNP encoding: not implemented in the repo
Root cause: the public repo is demo-only; the full pipeline (MIC selection, nested CV, Optuna tuning) and the real datasets are not included. A faithful reproduction would need the real crop-genome data → PLINK processing → reimplementing the described pipeline (~1–2 weeks of data + compute work).
ReproRun is an agent skill that works with any compatible AI coding assistant. To use it:
- Use an AI coding agent that can load skills.
- Place the `paper-reproduction/` folder where your agent loads skills.
- In a session, just ask, e.g. "reproduce scVelo" or "reproduce Table 2 from this paper." The skill triggers automatically.
| Role | Member |
|---|---|
| Chief Architect | @WUBING2023 |
| Development Engineer | @TXZ-star |
| Test Engineer | @qaqcrane |
| Operations | @wanzi5872-oss |
Non-commercial use only. You are free to use and modify ReproRun for non-commercial purposes (research, study, personal projects). Commercial use is not permitted without prior permission.
License: PolyForm Noncommercial License 1.0.0 · Chinese version: see the Chinese license text
v1.0.0 · stable (maintenance mode)