Decentralized, agent-driven scientific research — powered by competitive benchmarking, cryptographic attestation, and token incentives.
Science has always been limited by the number of researchers, the labs they work in, and the compute they can access. OpenResearch breaks that constraint. It is a protocol that lets anyone — a developer, a GPU holder, a student with a laptop — contribute valid, benchmarked source code improvements to open research problems, earn tokens for doing so, and be trusted without having to be known.
The core insight: if a benchmark can objectively measure the quality of code, then code improvement is a form of proof of work. And if proof of work can be verified cheaply by a trusted execution environment, then you can build a fully decentralized, incentive-aligned research network on top of it.
This is what OpenResearch does.
Andrej Karpathy's autoresearch (March 2026) demonstrated that an AI coding agent, given a research direction and a benchmark, can autonomously run experiments in a tight loop — committing only the changes that beat the current best result, and discarding the rest. In two days of unsupervised runs, it discovered 20 key optimizations yielding an 11% training speedup. Shopify's CEO ran it overnight and got a 19% gain.
The insight was simple and profound: the benchmark is the oracle. The AI does not need to understand why a change is good — it just needs to measure it.
OpenResearch takes this idea and asks: what if instead of one agent on one machine, you had ten thousand agents on ten thousand machines, all competing to find the best improvement, with economic skin in the game?
OpenResearch is not specific to ML. Any domain where a benchmark can objectively score code quality is a valid target.
High-signal domains today:
- ML efficiency — attention mechanisms, quantization, training loops, kernel fusion
- Open source libraries — numerical routines, parsing algorithms, compression codecs
- Bioinformatics — protein folding energy functions, sequence alignment algorithms
- Blockchain — consensus mechanism implementations, ZK proof generation speed
- Compilers — optimization passes, register allocation, instruction scheduling
The unifying property: there must be a deterministic, reproducible benchmark that can score a piece of code in bounded time on bounded hardware. If that exists, OpenResearch can run on it.
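To make that property concrete, here is a minimal sketch of the qualification check, with a hypothetical harness contract (a bench.py that prints a JSON score); the real harness shape is defined per project:

```python
import json
import subprocess

def run_benchmark(code_dir: str, timeout_s: int = 600) -> float:
    """Score `code_dir` in bounded time. Hypothetical contract: the project's
    bench.py prints {"score": <float>} on stdout and exits non-zero on failure."""
    out = subprocess.run(
        ["python3", "bench.py", "--code", code_dir],
        capture_output=True, text=True, timeout=timeout_s, check=True,
    )
    return json.loads(out.stdout)["score"]

def is_valid_target(code_dir: str, runs: int = 3) -> bool:
    """A repo qualifies only if repeated runs produce the exact same score."""
    scores = [run_benchmark(code_dir) for _ in range(runs)]
    return len(set(scores)) == 1
```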
┌──────────────────────────────────────────────────────────────────────┐
│ RESEARCHER (Project Creator) │
│ │
│ 1. Installs skills: npx skills add OpenResearchh/skill │
│ 2. Provides a GitHub repository URL │
│ 3. Agent reads + understands codebase, derives protocol.json │
│ 4. Agent runs repo in sandbox → establishes baseline benchmark score │
│ 5. Researcher reviews protocol + baseline, approves or refines │
│ 6. ProjectRegistry creates a project token and publishes the project │
└────────────────────────┬─────────────────────────────────────────────┘
│ Published to 0G Storage + on-chain registry
▼
┌──────────────────────────────────────────────────────────────────────┐
│ PROTOCOL REGISTRY │
│ - Experiment protocol / Statement of Purpose (immutable) │
│ - Benchmark suite (versioned, on-chain hash) │
│ - Current best code + score (mutable, updated on valid commits) │
│ - Token contract (bonding curve) │
└────────────────────────┬─────────────────────────────────────────────┘
│ Miners discover projects
▼
┌──────────────────────────────────────────────────────────────────────┐
│ MINER (Contributor Node) │
│ │
│ 1. Picks a project to mine │
│ 2. AutoResearch loop runs locally (Karpathy-style): │
│ agent → edit code → run benchmark → beat current best? → commit │
│ 3. Stakes project tokens as a bond │
│ 4. Submits proposal with stake, code hash, and benchmark proof │
└────────────────────────┬─────────────────────────────────────────────┘
│ Proposal submitted with stake
▼
┌──────────────────────────────────────────────────────────────────────┐
│ VALIDATOR NETWORK (TEE Nodes) │
│ │
│ 1. Reads submitted proposal + benchmark claim │
│ 2. Re-runs benchmark inside Trusted Execution Environment │
│ 3. Cryptographically attests: result matches or does not match │
│ 4. If valid → miner earns project tokens + stake returned │
│ If invalid → stake is slashed, redistributed to validators │
└──────────────────────────────────────────────────────────────────────┘
Users interact with the system through a skill installed into their existing AI coding agent:
# Install all OpenResearch skills from this repository
npx skills add OpenResearchh/skill
# Or install only the project creation skill
npx skills add OpenResearchh/skill --skill autoresearch-create
# Or install only the mining skill (Phase 2 — trials against a finalized protocol)
npx skills add OpenResearchh/skill --skill autoresearch-mine
# Or install only the verifier skill (Phase 2 — ProposalLedger claim / approve / reject)
npx skills add OpenResearchh/skill --skill autoresearch-validate

The skills are portable Agent Skills: each capability is a directory with a SKILL.md file plus any supporting resources. The skills CLI installs them into supported hosts such as Claude Code, Cursor, and Codex.
Shipped today:
- autoresearch-create helps researchers start from an existing GitHub repository, produce a versioned protocol.json plus program.md, run a baseline, then optionally publish an eligible project on-chain.
- autoresearch-mine runs the unattended mining loop on a finalized protocol and target checkout: bundled trial harness, trials.jsonl, optional GitHub PRs, optional AXL sidechat, and optional 0G ProjectRegistry frontier reads plus ProposalLedger.submit using contracts vendored under autoresearch-mine/contracts/ (miners do not need autoresearch-create installed at runtime).
- autoresearch-validate runs unattended verifier workflows against ProposalLedger: resolve miner artifacts by hash via a mandatory artifact index, rerun the bundled harness, apply deterministic static gates, then approve or reject using contracts vendored under autoresearch-validate/contracts/.

A planned sibling skill covers status dashboards.
Skills handle the conversational, LLM-assisted workflow where applicable; the validate scripts stay fully deterministic, since they sit on settlement paths.
A project starts with a real, existing GitHub repository — not a blank canvas or a description. The researcher provides a repo URL; the skill-assisted agent does the rest.
Researcher provides: "https://github.com/org/some-ml-library"
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ REPO INGESTION (skill-assisted agent) │
│ │
│ ① Clone + read codebase │
│ - Understand structure, core algorithms, existing tests │
│ - Identify what the library does and what it optimizes for │
│ │
│ ② Derive experiment protocol / Statement of Purpose │
│ - What problem does this code solve? │
│ - What are the natural axes of improvement? │
│ (speed, memory, accuracy, throughput, correctness) │
│ - What inputs/outputs define correct behavior? │
│ │
│ ③ Generate benchmark suite from the existing code │
│ - Extract or write a harness that scores the current impl │
│ - Define metrics: FLOPS, latency, perplexity, pass rate… │
│ - Set hardware targets (A100, H100, consumer GPU, CPU) │
│ - Define minimum threshold for a valid improvement │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ SANDBOX EXECUTION (baseline run) │
│ │
│ - Run the existing repo against the generated benchmark │
│ inside an isolated Docker container │
│ - Record the score — this becomes the immutable baseline │
│ - Verify the benchmark is deterministic across 3 runs │
│ - Surface any environment dependencies for miners to replicate │
└─────────────────────────────────────────────────────────────────┘
│
▼
Researcher reviews protocol + baseline score
Refines wording, adjusts metric weights, approves
│
▼
protocol + baseline score hashed and recorded on-chain
(immutable — benchmark spec and baseline cannot be
changed without forking to a new project)
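A sketch of why on-chain hashing enforces immutability; SHA-256 over canonical JSON is used here purely for illustration (the production scheme is whatever ProjectRegistry records, e.g. 0G storage roots):

```python
import hashlib
import json

def content_hash(obj) -> str:
    """Hash a canonical JSON encoding, so any edit to the protocol yields a new hash."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()
    return "0x" + hashlib.sha256(canonical).hexdigest()

protocol = {"metrics": {"latency_ms": 0.7, "accuracy": 0.3}, "min_improvement": 0.01}
published = content_hash(protocol)

protocol["metrics"]["latency_ms"] = 0.9        # creator tries to move the goalposts...
assert content_hash(protocol) != published     # ...and the mismatch is detectable on-chain
```

Any change to the benchmark spec therefore produces a different hash, which is by definition a different project: a fork, not an edit.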
Why an existing repo, not a blank prompt?
Starting from a real codebase gives the protocol credibility: the benchmarks are grounded in code that actually runs, the baseline score is a real measurement rather than a guess, and miners know exactly what they are improving. It also means the project creator is putting something real on the table, not just an idea. The sandbox run is the moment of truth: if the repo does not run cleanly in isolation, the creator must fix it before the project can be listed. This keeps the registry populated only with actionable research targets.
The immutability of the protocol is critical. It prevents the project creator from moving the goalposts after miners have invested compute. The benchmark is the contract.
Each project mints its own token via ProjectRegistry.createProject(...), which deploys one ProjectToken bonding-curve contract for that project.
Price
│ ╭─────
│ ╭───╯
│ ╭───╯
│ ╭───╯
│ ╭───╯
│ ──────────╯
└──────────────────────────────────── Supply
As more tokens are bought, price rises automatically.
As tokens are sold, price falls.
The curve is deterministic and transparent.
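A minimal sketch of such a curve, assuming the linear form suggested by the basePrice and slope parameters of createProject shown later (the authoritative math lives in the ProjectToken contract):

```python
def spot_price(supply: int, base_price: int, slope: int) -> int:
    """Linear bonding curve: spot price grows with circulating supply."""
    return base_price + slope * supply

def buy_cost(supply: int, amount: int, base_price: int, slope: int) -> int:
    """Cost of minting `amount` tokens, summing per-token spot prices."""
    return sum(spot_price(supply + i, base_price, slope) for i in range(amount))

# Early buyers pay less for the same number of tokens than late buyers:
assert buy_cost(0, 10, base_price=100, slope=5) < buy_cost(1_000, 10, base_price=100, slope=5)
```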
Miner rewards: Every accepted proposal that improves the benchmark score returns the miner's stake and mints the reward to the proposal's rewardRecipient. Rejected or expired proposals slash the miner's stake across the verifier-pool and burn paths.
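Settlement in one function, using the 50/50 slash split shown in the token-flywheel diagram below (the exact ratio and rounding live in ProposalLedger):

```python
def settle_stake(stake: int, accepted: bool, reward: int) -> tuple[int, int, int]:
    """Return (to_reward_recipient, to_verifier_pool, burned) for one proposal."""
    if accepted:
        return stake + reward, 0, 0        # stake returned + reward minted
    half = stake // 2                      # rejected/expired: stake is slashed
    return 0, half, stake - half           # split across verifier pool and burn
```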
The create skill bundles the deployed ABI artifacts and manifest under autoresearch-create/contracts/0g-galileo-testnet/.
| Field | Value |
|---|---|
| Chain | 0G Galileo testnet |
| Chain ID | 16602 |
| RPC | https://evmrpc-testnet.0g.ai |
| ProjectRegistry | 0xc84768e450534974C0DD5BAb7c1b695744124136 |
| ProposalLedger | 0x701db5f8Ed847651209A438695dfe5520adD6A5A |
| VerifierRegistry | 0x257974E406f206BfAEd3abB8D93C232e3226f032 |
Miners use the same addresses from a vendored copy under autoresearch-mine/contracts/0g-galileo-testnet/ (kept in sync with the create skill; no dependency on the create package at runtime).
Publishing an approved project calls:
ProjectRegistry.createProject(
protocolHash,
repoSnapshotHash,
benchmarkHash,
baselineAggregateScore,
baselineMetricsHash,
tokenName,
tokenSymbol,
basePrice,
slope,
minerPoolCap
)
The emitted ProjectCreated event returns the canonical projectId and project token address. Miners then buy that project token, approve ProposalLedger, and submit proposals with a separate rewardRecipient.
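A hedged web3.py (v7) sketch of that publish call; placeholder values, the ABI artifact path, and the ProjectCreated argument names are assumptions, and the create skill's publish flow remains the authoritative implementation:

```python
import json
import os
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("https://evmrpc-testnet.0g.ai"))   # 0G Galileo, chain id 16602
abi = json.load(open("autoresearch-create/contracts/0g-galileo-testnet/ProjectRegistry.json"))["abi"]
registry = w3.eth.contract(
    address=Web3.to_checksum_address("0xc84768e450534974C0DD5BAb7c1b695744124136"), abi=abi
)

# Placeholder inputs; in practice these come from the baseline run and 0G uploads.
protocol_hash = repo_snapshot_hash = benchmark_hash = baseline_metrics_hash = b"\x00" * 32
baseline_aggregate_score, base_price, slope, miner_pool_cap = 847, 10**15, 10**12, 1_000_000

acct = w3.eth.account.from_key(os.environ["PUBLISHER_KEY"])    # hypothetical env var
tx = registry.functions.createProject(
    protocol_hash, repo_snapshot_hash, benchmark_hash,
    baseline_aggregate_score, baseline_metrics_hash,
    "Example Project", "EXP", base_price, slope, miner_pool_cap,
).build_transaction({
    "from": acct.address,
    "nonce": w3.eth.get_transaction_count(acct.address),
    "chainId": 16602,
})
signed = acct.sign_transaction(tx)
receipt = w3.eth.wait_for_transaction_receipt(w3.eth.send_raw_transaction(signed.raw_transaction))

event = registry.events.ProjectCreated().process_receipt(receipt)[0]
project_id, token_address = event["args"]["projectId"], event["args"]["token"]
```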
Project artifacts live on 0G Storage for content-addressable storage, with on-chain root hashes for permanence and discoverability.
┌──────────────────────────────────────────────────────────────┐
│ ON-CHAIN REGISTRY │
│ │
│ project_id → { │
│ protocol_hash: "0xabc...", // 0G root of protocol │
│ current_best: { │
│ code_root: "0xdef...", // 0G root of code │
│ score: 0.847, // benchmark metric │
│ block: 19284756, // when it was verified │
│ miner: "0xDe3..." // who submitted it │
│ }, │
│ benchmark_root: "0x123...", // 0G root of bench suite │
│ token: "0xTok...", // bonding curve contract │
│ git_log: [...], // ordered list of roots │
│ } │
└──────────────────────────────────────────────────────────────┘
The current create skill uploads the protocol, repo snapshot, benchmark bundle, and baseline metrics artifact to 0G Storage and stores the resulting root hashes on-chain. It also writes storage_0g_galileo.json so miners can retrieve and verify the exact files.
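A sketch of the miner-side verification this enables; the manifest field names and client interface here are hypothetical, standing in for whatever storage_0g_galileo.json actually contains:

```python
import json

def verify_local_artifacts(manifest_path: str, storage, root_of) -> None:
    """Re-derive each artifact's root and compare it with the published manifest.
    `storage.download(root)` fetches bytes by content address; `root_of(data)`
    recomputes the 0G root hash for those bytes."""
    manifest = json.load(open(manifest_path))
    for name, root in manifest["roots"].items():   # protocol, repo snapshot, benchmark...
        data = storage.download(root)
        if root_of(data) != root:
            raise ValueError(f"artifact {name!r} does not match its published root")
```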
This is the engine. When a miner picks a project, the autoresearch-mine skill bootstraps a local AutoResearch loop against the published protocol and repo checkout (see autoresearch-mine/SKILL.md).
┌──────────────────────────────────────────────────────────────────┐
│ MINER'S LOCAL LOOP │
│ │
│ ① Pull current best code + protocol + benchmark from registry │
│ │
│ ② Start AutoResearch loop: │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ current_best_score = fetch_current_best() │ │
│ │ │ │
│ │ while True: │ │
│ │ hypothesis = agent.generate_hypothesis(sop, code) │ │
│ │ new_code = agent.implement(hypothesis, code) │ │
│ │ score = benchmark.run(new_code) │ │
│ │ │ │
│ │ if score > current_best_score: │ │
│ │ code = new_code # keep it │ │
│ │ current_best_score = score │ │
│ │ else: │ │
│ │ pass # discard it │ │
│ │ │ │
│ │ if score > network_best_score: │ │
│ │ prepare_submission() # ready to submit PR │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ ③ When miner is satisfied, stake tokens + submit PR │
└──────────────────────────────────────────────────────────────────┘
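The boxed pseudocode, expanded into a runnable shape; agent, benchmark, and registry are placeholders for the skill's real components (trial harness, frontier sync, submit scripts):

```python
def mining_loop(agent, benchmark, registry, sop, code, prepare_submission):
    """Hill-climb on the benchmark: keep only measured improvements,
    and flag any result that beats the network frontier."""
    best_local = benchmark.run(code)
    while True:
        hypothesis = agent.generate_hypothesis(sop, code)
        candidate = agent.implement(hypothesis, code)
        try:
            score = benchmark.run(candidate)
        except Exception:
            continue                              # broken candidates are discarded
        if score > best_local:
            code, best_local = candidate, score   # keep the improvement
            if score > registry.current_best():   # re-sync the frontier right before
                prepare_submission(code, score)   # staking, to avoid a stale race
```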
The competitive mechanic: The network publishes the current best score in real time. Miners are racing each other — submitting before your score advantage disappears is part of the strategy. If someone else submits a better result while you are still iterating, your submission will only be accepted if it still beats the new best.
AXL sidechat: Miners can optionally broadcast compact experiment notes over Gensyn AXL. These messages are written to .autoresearch/mine/sidechat.jsonl as advisory context for future hypotheses; they are not current-best state, proposal evidence, or verifier evidence.
Miner incentive design: Miners choose projects where:
- Token value × expected reward > cost of compute to find improvement (a back-of-envelope sketch follows this list)
- The current benchmark gap is large enough to make improvement tractable
- The protocol aligns with their agent's strengths (e.g., a CUDA expert targets GPU kernels)
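A back-of-envelope version of that first inequality; every number below is illustrative:

```python
def mining_ev(token_price_usd: float, reward_tokens: float, p_success: float,
              gpu_hours: float, gpu_usd_per_hour: float,
              stake_usd: float, p_slash: float) -> float:
    """Expected value of attempting one improvement on a project."""
    upside = p_success * reward_tokens * token_price_usd
    compute = gpu_hours * gpu_usd_per_hour
    slash_risk = p_slash * stake_usd
    return upside - compute - slash_risk

# A rational miner only attacks a project while the EV stays positive:
assert mining_ev(0.40, 500, p_success=0.25, gpu_hours=24,
                 gpu_usd_per_hour=1.10, stake_usd=20, p_slash=0.02) > 0
```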
Miners are untrusted. They could fabricate benchmark results. The validator network prevents this.
┌─────────────────────────────────────────────────────────────────────┐
│ VALIDATOR NODE (TEE) │
│ │
│ Hardware: Intel TDX / AMD SEV / AWS Nitro Enclaves │
│ │
│ On proposal submission: │
│ ┌───────────────────────────────────────────────────────────────┐ │
│ │ [Inside secure enclave — no external code can tamper] │ │
│ │ │ │
│ │ 1. Pull code from 0G Storage (verified by root hash) │ │
│ │ 2. Pull benchmark suite from 0G Storage (verified by root) │ │
│ │ 3. Run benchmark in isolated environment │ │
│ │ 4. Record result │ │
│ │ 5. Sign result with enclave's hardware-derived key │ │
│ │ 6. Publish signed attestation on-chain │ │
│ └───────────────────────────────────────────────────────────────┘ │
│ │
│ ProposalLedger gates review to VerifierRegistry allowlisted nodes │
│ If approved: stake returned, reward minted, best score updated │
│ If rejected/expired: stake slashed across verifier pool and burn │
└─────────────────────────────────────────────────────────────────────┘
Why TEE and not ZK proofs? Both are viable. TEEs (specifically Intel TDX and AMD SEV) are chosen as the practical first step because:
- They can run arbitrary code, including GPU benchmarks, without circuit compilation
- Verification is fast and cheap (milliseconds vs. hours for zkML on large models)
- Hardware attestation is already battle-tested in production (AWS, Azure confidential compute)
zkML (zero-knowledge proofs for ML) remains the long-term ideal for fully trustless verification, as it does not require trusting hardware manufacturers. The protocol is designed so zkML validators can be added as an alternative verification path once the tooling matures.
Validator economics: Verifiers must be allowlisted in VerifierRegistry. They claim reviews through ProposalLedger.claimReview(...), then approve or reject after rerunning the benchmark.
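A sketch of that settlement path; claimReview is named above, while the approve/reject signatures and proposal fields are assumptions (the vendored ABI under autoresearch-validate/contracts/ is authoritative):

```python
def settle_review(ledger, verifier, proposal, benchmark, artifact_index, send_tx):
    """Claim the review, deterministically rerun the benchmark, then settle.
    `send_tx` signs with the verifier keystore; `artifact_index` resolves
    on-chain hashes to downloadable bytes, as required by autoresearch-validate."""
    send_tx(ledger.functions.claimReview(proposal.id), verifier)

    code = artifact_index.fetch(proposal.code_hash)   # resolve artifacts by hash
    score = benchmark.run(code)                       # TEE / sandboxed rerun

    if score >= proposal.claimed_score:               # static gates omitted for brevity
        send_tx(ledger.functions.approve(proposal.id, score), verifier)
    else:
        send_tx(ledger.functions.reject(proposal.id), verifier)   # stake gets slashed
```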
| Component | What it does | Technology |
|---|---|---|
| Agent Skills | User-facing entry points for create, baseline, mine, validate (autoresearch-validate), and status flows | Agent Skills spec + skills CLI |
| Protocol Generator | Reads an existing GitHub repo, derives the research spec, and proposes a benchmark contract | Host coding agent LLM + skill resources |
| Sandbox Runner | Executes the repo + benchmark in an isolated container to produce a verified, deterministic baseline score | Docker / Firecracker |
| Token Contract | Bonding curve token per project with miner rewards pool | Solidity / EVM |
| Protocol Registry | On-chain index of projects, current best scores, and project token addresses | Solidity + 0G Storage |
| AutoResearch Loop | Local agent loop that iterates on code and keeps only improvements (autoresearch-mine + bundled harness) | Python + AI coding agent |
| AXL Sidechat | Optional miner-to-miner side conversation for experiment notes and warnings | Gensyn AXL raw HTTP |
| Proposal Submission | Packages improved code, benchmark claim, stake, and reward recipient into a transaction | CLI + smart contract |
| TEE Validators | Re-run benchmarks in secure hardware and attest results on-chain | Intel TDX / AMD SEV |
| zkML Path (future) | Cryptographic proof of benchmark execution without trusted hardware | EZKL / zkLLM |
Project created
│
▼
Bonding curve deployed ──────────────────────────────────────────┐
│ │
│ Creator buys initial tokens at curve price │
│ (signals confidence in the project) │
▼ │
Speculators / interested parties buy tokens │
(price rises → project gains visibility and capital) │
│ │
▼ │
Miner stakes tokens → submits proposal │
│ │
├── Validators attest TRUE │
│ │ │
│ ▼ │
│ Miner gets: stake back + token reward from miner pool │
│ Token price rises (supply constant, demand up) │
│ │
└── Validators attest FALSE │
│ │
▼ │
Stake slashed → 50/50 verifier pool and burn paths │
Token price unaffected (no new supply released) │
│
◄────────────────────────────────────────────────────────────┘
As miner pool empties, token scarcity increases
→ incentivizes early mining, rewards pioneers
# Install this repository's skills into supported agents
npx skills add OpenResearchh/skill
# Or install only the create skill
npx skills add OpenResearchh/skill --skill autoresearch-create
# Create a project from an existing GitHub repo
> create an OpenResearch project from https://github.com/your-org/your-repo
# The agent will clone or scan the repo, build a discovery bundle,
# ask the protocol questionnaire, write protocol.json, run a baseline,
# then ask whether to publish to the configured 0G Galileo registry.The create skill includes discovery prompts, schema, questionnaire, baseline runner, program.md renderer, wallet publish flow, and 0G Galileo deployment artifacts under autoresearch-create/.
Use autoresearch-mine with a finalized protocol.json and a checkout of meta.repo, or bootstrap directly from a 0G project id / ProjectToken address.

For on-chain mining, the first step is wallet readiness:
- Create an isolated mining keystore with python3 scripts/wallet.py init --id <id> (encrypted with a passphrase).
- Fund the printed address from the user's main wallet.
- Run check_wallet.py --wallet-id <id> so the skill confirms signing, gas, ProjectToken stake, allowance, and the native value needed to auto-buy missing stake.

The skill never asks for ARAH_PRIVATE_KEY; the keystore is decrypted only inside wallet.py, so the trial harness (which executes untrusted protocol code inside a podman/docker/bwrap sandbox) cannot reach the key. --reward-recipient should be the user's main wallet, so a compromised mining key only loses one trial's stake + gas. ARAH_STAKE is optional and defaults to 1 (ProjectToken has decimals() == 0; the contract only requires stake > 0).

The skill bundles the same sandboxed baseline harness as create (vendor/harness/run_baseline.sh + run_in_sandbox.sh), maintains .autoresearch/mine/ (trials.jsonl, network_state.json, sidechat.jsonl), can read the on-chain best score (sync_registry_frontier.py), can exchange AXL sidechat notes, can fetch project inputs from registry/storage (bootstrap_from_registry.py), and automatically creates an on-chain proposal transaction when a trial beats the freshly synced registry best (submit_trial_proposal.py → submit_proposal.py --auto-buy).

Install Python chain dependencies only if you use those scripts: pip install -r autoresearch-mine/requirements-chain.txt (prefer a venv). Environment variables and defaults are documented in autoresearch-mine/README.md and autoresearch-mine/SKILL.md.
npx skills add OpenResearchh/skill --skill autoresearch-mine

Use autoresearch-validate with an ARAH_ARTIFACT_INDEX (or URL) that maps each on-chain codeHash to downloadable code + benchmark log bytes, a verifier wallet keystore (python3 scripts/wallet.py init --id verifier-1), and the bundled Galileo contracts under autoresearch-validate/contracts/. The verifier address must be allowlisted via VerifierRegistry.isVerifier. Install Python chain dependencies if you use RPC scripts: pip install -r autoresearch-validate/requirements-chain.txt. See autoresearch-validate/README.md and autoresearch-validate/SKILL.md.
npx skills add OpenResearchh/skill --skill autoresearch-validate

autoresearch-create/     # Phase 1 — protocol authoring, baseline, publish
autoresearch-mine/ # Phase 2 — mining loop, optional 0G frontier, AXL sidechat + submit
autoresearch-validate/ # Phase 2 — verifier harness + ProposalLedger approve/reject
autoresearch-status/ # planned
| Project | What they do | How we differ |
|---|---|---|
| karpathy/autoresearch | Single-machine autonomous ML experimentation | We decentralize and incentivize it at network scale |
| Bittensor | Decentralized ML with subnet incentives | We focus on code improvement benchmarks, not model inference |
| Gensyn | Distributed ML training with proof-of-learning | We focus on research discovery, not training compute |
| Radicle | Decentralized git and code collaboration | We use similar code hosting primitives with research incentives |
| Nous Research | Distributed open-source model training on Solana | We are domain-agnostic and benchmark-driven, not model-specific |
Bittensor miners serve inference requests — the output is consumed and gone. OpenResearch miners produce improved source code that becomes the permanent baseline every future miner must beat. The network compounds; Bittensor just runs.
Bittensor validators score miners subjectively, which is why validator cartels exist. OpenResearch uses a deterministic benchmark — a number a TEE computes, not an opinion anyone forms. There is nothing to collude around.
MIT — all code contributions to projects on this protocol are open source by default.
Built on the shoulders of Andrej Karpathy's autoresearch, Bittensor's subnet economics, and the broader DeSci movement.