Caster Subnet

Consensus made scalable.

Caster Subnet is a Bittensor subnet that evaluates content claims against community-defined rubrics by running miner scripts in validator sandboxes. Validators aggregate results into weights and submit them on-chain.

Start here

Validator operators: see validator/README.md
Miner developers: see miner/README.md
Miner SDK reference: see packages/miner-sdk/README.md

Install dependencies (local dev)

uv sync --all-packages --dev

How evaluation works (roles + flow)

Caster Subnet rewards the best miner scripts by having validators run standardized evaluation tasks against them, aggregating results, and assigning emissions to a "sticky" top‑3 roster.

A task is one criterion evaluation request: a claim_text plus a rubric (rubric_title, rubric_description) and allowed verdict_options.

Exact task contract (JSON)

Miners implement the evaluate_criterion entrypoint. Validators call it with this payload:

{
  "claim_text": "Caster Subnet validators manage sandboxed miners.",
  "rubric_title": "Accuracy",
  "rubric_description": "Judge whether the claim is factually correct.",
  "verdict_options": [
    { "value": -1, "description": "Fail" },
    { "value": 1, "description": "Pass" }
  ]
}

Your script must return:

{
  "verdict": 1,
  "justification": "...",
  "citations": [
    { "receipt_id": "tool-receipt-id", "result_id": "search-result-id" }
  ]
}

Notes:

verdict must be one of the provided verdict_options[].value.
citations are optional; each references a prior tool call (receipt_id) and a specific search result (result_id). Validators hydrate citation details from tool receipts.

Dig deeper

How the evaluation dataset is built

Claims are anchored to factoids from the last 24 hours.
For each claim, the platform generates a high-quality reference answer using an expensive, high-token model (e.g. gpt-5.2-xhigh).
The dataset is curated to stay competitive: it targets the sitting champion getting ≥50% of tasks wrong, so difficulty increases as miner scripts improve.
The dataset generation will be open sourced in the near future to allow for community contributions and reproducibility.

How miners are evaluated

Miners submit scripts that must match the reference answer while operating under a much tighter budget.
Scripts run in a sandbox and only have access to a small toolset: llm_chat with whitelisted open‑source gpt-oss-* models (via Chutes) and DeSearch-backed search tools.

Validator flow + gating

The platform sends evaluation batches to validators; validators run scripts × tasks and report grades.
Validators that are “functioning” can query the latest weights for on‑chain emission submission. Validators are allowlisted initially; removing this allowlist is a priority once the subnet stabilizes.

Roles:

Miners submit Python agent scripts that evaluate claims against rubrics
Validators execute miner scripts in sandboxed containers and grade results
Platform coordinates runs, aggregates grades, and computes weights
Bittensor records weights on-chain for emission distribution

sequenceDiagram
    participant Platform
    participant Validator
    participant Sandbox
    participant Bittensor

    Platform->>Validator: 1) Run (tasks, scripts)
    Validator->>Sandbox: 2) Execute script × task
    Sandbox-->>Validator: 3) Miner response
    Validator-->>Platform: 5) Task grades
    Validator->>Bittensor: 6) submit_weights

What we optimize for

Cost-efficient & permissionless — scales beyond centralized alternatives
Fair & transparent — community-defined rubrics, decentralized execution
Quick & precise — faster feedback loops than manual review

Sticky roster keeps incentives honest

We open-source miner scripts. Without a guardrail, a copycat can clone the current best script and win #2/#3 with minimal changes.

The sticky top‑3 roster prevents that:

Only a challenger that beats the current #1 can enter the roster.
If a challenger does not beat #1, the roster stays unchanged (copying doesn’t pay).
If a challenger beats #1, the roster shifts: challenger → #1, old #1 → #2, old #2 → #3.

Repo layout

public/
  miner/                # miner-facing CLI tooling (test + submit)
  validator/            # validator runtime + operator docs
  sandbox/              # sandbox runtime (run by validators, not miners)
  packages/
    miner-sdk/          # SDK imported by miner scripts
    commons/            # shared utilities (sandbox runner, tools, etc.)

Name		Name	Last commit message	Last commit date
Latest commit History 71 Commits
commons/tests/infrastructure/sandbox		commons/tests/infrastructure/sandbox
docs/api		docs/api
miner		miner
packages		packages
sandbox		sandbox
validator		validator
.coveragerc		.coveragerc
.env.example		.env.example
.ruff.toml		.ruff.toml
README.md		README.md
conftest.py		conftest.py
pyproject.toml		pyproject.toml
pytest.ini		pytest.ini
uv-workspace.toml		uv-workspace.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Caster Subnet

Start here

Install dependencies (local dev)

How evaluation works (roles + flow)

What we optimize for

Sticky roster keeps incentives honest

Repo layout

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

caster-hub/caster

Folders and files

Latest commit

History

Repository files navigation

Caster Subnet

Start here

Install dependencies (local dev)

How evaluation works (roles + flow)

What we optimize for

Sticky roster keeps incentives honest

Repo layout

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages