Consensus made scalable.
Caster Subnet is a Bittensor subnet that evaluates content claims against community-defined rubrics by running miner scripts in validator sandboxes. Validators aggregate results into weights and submit them on-chain.
- Validator operators: see `validator/README.md`
- Miner developers: see `miner/README.md`
- Miner SDK reference: see `packages/miner-sdk/README.md`
Install dependencies:

```shell
uv sync --all-packages --dev
```

Caster Subnet rewards the best miner scripts by having validators run standardized evaluation tasks against them, aggregating results, and assigning emissions to a "sticky" top‑3 roster.
A task is one criterion evaluation request: a `claim_text` plus a rubric (`rubric_title`, `rubric_description`) and the allowed `verdict_options`.
Exact task contract (JSON)
Miners implement the `evaluate_criterion` entrypoint. Validators call it with this payload:
```json
{
  "claim_text": "Caster Subnet validators manage sandboxed miners.",
  "rubric_title": "Accuracy",
  "rubric_description": "Judge whether the claim is factually correct.",
  "verdict_options": [
    { "value": -1, "description": "Fail" },
    { "value": 1, "description": "Pass" }
  ]
}
```

Your script must return:
```json
{
  "verdict": 1,
  "justification": "...",
  "citations": [
    { "receipt_id": "tool-receipt-id", "result_id": "search-result-id" }
  ]
}
```

Notes:

- `verdict` must be one of the provided `verdict_options[].value`.
- `citations` are optional; each references a prior tool call (`receipt_id`) and a specific search result (`result_id`). Validators hydrate citation details from tool receipts.
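A minimal miner entrypoint matching this contract might look like the sketch below. The exact SDK signature and tool-calling API are not shown here, so treat the shape as illustrative; the placeholder decision logic stands in for real tool use.

```python
# Minimal sketch of a miner entrypoint for the task contract above.
# The real SDK signature may differ; names here are illustrative.

def evaluate_criterion(task: dict) -> dict:
    """Return a verdict drawn from the task's allowed verdict_options."""
    allowed = {opt["value"] for opt in task["verdict_options"]}

    # Placeholder decision logic: a real script would call sandbox tools
    # (llm_chat, search) and reason over the results before deciding.
    verdict = max(allowed)  # naively pick the most favorable allowed value

    return {
        "verdict": verdict,
        "justification": "Stub verdict; replace with real evaluation logic.",
        "citations": [],  # optional: reference prior tool-call receipts
    }

task = {
    "claim_text": "Caster Subnet validators manage sandboxed miners.",
    "rubric_title": "Accuracy",
    "rubric_description": "Judge whether the claim is factually correct.",
    "verdict_options": [
        {"value": -1, "description": "Fail"},
        {"value": 1, "description": "Pass"},
    ],
}
result = evaluate_criterion(task)
```

Whatever the real logic, the returned `verdict` must stay inside the task's `verdict_options` or the response is invalid.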
Dig deeper
How the evaluation dataset is built
- Claims are anchored to factoids from the last 24 hours.
- For each claim, the platform generates a high-quality reference answer using an expensive, high-token model (e.g. `gpt-5.2-xhigh`).
- The dataset is curated to stay competitive: it targets the sitting champion getting ≥50% of tasks wrong, so difficulty increases as miner scripts improve.
- Dataset generation will be open-sourced in the near future to allow community contributions and reproducibility.
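The curation target above can be sketched as a simple filter. The `curate` function and its selection rule are assumptions for illustration, not the platform's actual pipeline: it keeps every task the champion fails and pads with easier tasks only up to the target error rate.

```python
# Illustrative curation filter (not the platform's published pipeline):
# keep the dataset hard enough that the sitting champion gets at least
# `target_error` of the selected tasks wrong.

def curate(candidates, champion_is_correct, target_error=0.5):
    """Select candidate tasks so the champion's error rate >= target_error.

    candidates: list of task dicts
    champion_is_correct: callable task -> bool (champion's graded result)
    """
    graded = [(task, champion_is_correct(task)) for task in candidates]
    hard = [t for t, ok in graded if not ok]   # champion failed these
    easy = [t for t, ok in graded if ok]       # champion solved these
    # Admit easy tasks only while the overall error rate stays on target.
    max_easy = int(len(hard) / target_error) - len(hard) if hard else 0
    return hard + easy[:max_easy]

# Toy example: champion solves even-numbered tasks only.
cands = [{"id": i} for i in range(6)]
selected = curate(cands, lambda t: t["id"] % 2 == 0)
```

As miner scripts improve, fewer candidates land in the "hard" bucket, so a pipeline like this naturally drifts toward harder material.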
How miners are evaluated
- Miners submit scripts that must match the reference answer while operating under a much tighter budget.
- Scripts run in a sandbox and only have access to a small toolset: `llm_chat` with whitelisted open‑source `gpt-oss-*` models (via Chutes), plus DeSearch-backed search tools.
Validator flow + gating
- The platform sends evaluation batches to validators; validators run scripts × tasks and report grades.
- Validators that are “functioning” can query the latest weights for on‑chain emission submission. Validators are allowlisted initially; removing this allowlist is a priority once the subnet stabilizes.
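The grades-to-weights step can be sketched as follows. The `grades_to_weights` function and its scoring rule (mean accuracy, normalized across miners) are assumptions for illustration; the platform's actual aggregation is not specified here.

```python
# Hypothetical grade -> weight aggregation; the real platform logic is
# not published here, this only illustrates the shape of the step.

def grades_to_weights(grades):
    """grades: {miner_uid: [1.0 if correct else 0.0, ...]}

    Returns per-miner weights normalized to sum to 1.0.
    """
    # Mean accuracy per miner over its graded tasks.
    scores = {uid: sum(g) / len(g) for uid, g in grades.items() if g}
    total = sum(scores.values())
    if total == 0:
        return {uid: 0.0 for uid in scores}
    return {uid: s / total for uid, s in scores.items()}

weights = grades_to_weights({"m1": [1, 1, 0, 1], "m2": [1, 0, 0, 1]})
```

Here `m1` scores 0.75 and `m2` scores 0.5, so the normalized weights are 0.6 and 0.4; a functioning validator would then submit such weights on-chain.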
Roles:
- Miners submit Python agent scripts that evaluate claims against rubrics
- Validators execute miner scripts in sandboxed containers and grade results
- Platform coordinates runs, aggregates grades, and computes weights
- Bittensor records weights on-chain for emission distribution
```mermaid
sequenceDiagram
    participant Platform
    participant Validator
    participant Sandbox
    participant Bittensor
    Platform->>Validator: 1) Run (tasks, scripts)
    Validator->>Sandbox: 2) Execute script × task
    Sandbox-->>Validator: 3) Miner response
    Validator-->>Platform: 4) Task grades
    Validator->>Bittensor: 5) submit_weights
```
- Cost-efficient & permissionless — scales beyond centralized alternatives
- Fair & transparent — community-defined rubrics, decentralized execution
- Quick & precise — faster feedback loops than manual review
We open-source miner scripts. Without a guardrail, a copycat can clone the current best script and win #2/#3 with minimal changes.
The sticky top‑3 roster prevents that:
- Only a challenger that beats the current #1 can enter the roster.
- If a challenger does not beat #1, the roster stays unchanged (copying doesn’t pay).
- If a challenger beats #1, the roster shifts: challenger → #1, old #1 → #2, old #2 → #3.
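The roster update rule above is small enough to state as code. This is a sketch of the described behavior only; the function name and roster representation are assumptions.

```python
# Sketch of the sticky top-3 update rule described above (illustrative).

def update_roster(roster, challenger, beats_number_one):
    """roster: [first, second, third] miner ids; returns the new roster.

    A challenger enters only by beating the current #1; otherwise the
    roster is unchanged, so cloning the leader's script earns nothing.
    """
    if not beats_number_one:
        return roster                       # copying doesn't pay
    first, second, _ = roster
    return [challenger, first, second]      # old #3 drops out

roster = ["alice", "bob", "carol"]
unchanged = update_roster(roster, "copycat", beats_number_one=False)
shifted = update_roster(roster, "dave", beats_number_one=True)
```

Note that a failed challenge leaves the roster identical, while a successful one rotates everyone down a slot and evicts the old #3.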
```
public/
  miner/        # miner-facing CLI tooling (test + submit)
  validator/    # validator runtime + operator docs
  sandbox/      # sandbox runtime (run by validators, not miners)
packages/
  miner-sdk/    # SDK imported by miner scripts
  commons/      # shared utilities (sandbox runner, tools, etc.)
```