Agentic AI system to predict events in US federal courts — for example, whether a motion before a court of appeals or the Supreme Court will be granted or denied, the likely vote of each judge or justice, and a detailed prediction of the court's reasoning.
Status: early scaffold. The pipeline shape, data contract, and automation are in place; most feature work is done by AI coding agents via the label-driven workflows below.
No predictions have been published yet — the first target is the OT2026 long-conference cert release (see milestones).
Not legal advice. Outputs are experimental model predictions — they may be wrong, carry no affiliation with or endorsement by any court, and are not legal advice or a forecast you should rely on for any decision.
Predictions about how individual judges or justices may vote describe likely outcomes — they are not assertions of fact and not statements about how anyone should decide.
The project runs as a label-driven pipeline of GitHub Actions. Work is
represented as GitHub issues; applying a run:* label triggers the matching
workflow. When a stage needs to hand off, it opens (or labels) an issue to
trigger the next stage. Several stages delegate to agentic coding tools
(Claude Code and Codex), which branch, do the work, and open a pull request.
| Label | Workflow | Does | Engine |
|---|---|---|---|
| run:dev | run-dev | Normal development on the pipeline codebase | Claude Code |
| run:seed | run-seed | Ingest initial dockets from CourtListener into the corpus | Script |
| run:pull | run-pull | Refresh tracked dockets (also runs on a daily schedule) | Script (agent only if ambiguous) |
| run:reconcile | run-reconcile | Confirm a decided event's outcome.json from the docket when pull can't | Claude Code |
| run:predict | run-predict | Predict open events with multiple competing predictors (fan-out) | Claude Code + Codex |
| run:evaluate | run-evaluate | Score past predictions against realized outcomes (evaluator × predictor) | Claude Code + Codex |
Plus run-ops, a read-only daily health & cost dashboard that has no run:*
label — it runs on a schedule (or manual dispatch). See docs/pipeline.md.
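As a reading aid, the label-to-workflow routing in the table can be sketched as a plain lookup. This is only an illustration: the real dispatch happens in the GitHub Actions workflow triggers, and the names `LABEL_TO_WORKFLOW` and `workflows_for_labels` are hypothetical, not part of the codebase.

```python
# Hypothetical sketch of the label -> workflow routing listed in the table.
# The actual dispatch is done by GitHub Actions triggers, not by Python code.
LABEL_TO_WORKFLOW: dict[str, str] = {
    "run:dev": "run-dev",
    "run:seed": "run-seed",
    "run:pull": "run-pull",
    "run:reconcile": "run-reconcile",
    "run:predict": "run-predict",
    "run:evaluate": "run-evaluate",
}


def workflows_for_labels(labels: list[str]) -> list[str]:
    """Return the workflows that a list of issue labels would trigger.

    Labels that are not known run:* labels are ignored, which is also why
    run-ops (schedule or manual dispatch only) never appears here.
    """
    return [LABEL_TO_WORKFLOW[label] for label in labels if label in LABEL_TO_WORKFLOW]


print(workflows_for_labels(["bug", "run:pull", "run:predict"]))
# -> ['run-pull', 'run-predict']
```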
flowchart TD
seed["run:seed — seed dockets"] --> corpus[("corpus")]
pull["run:pull — refresh dockets<br/>(daily schedule)"] --> corpus
pull -->|"changed?"| predict["run:predict — predict open events<br/>(matrix over predictors)"]
predict --> ppr[/"pull requests"/]
predict -.->|"outcome lands via pull,<br/>or run:reconcile"| evaluate["run:evaluate — score every predictor"]
evaluate --> epr[/"pull requests"/]
Longer term, an automated-research harness (in the spirit of Anthropic's
automated alignment researchers)
proposes new predictor designs, registers them as new entries in the predictor
registry, and lets them compete — so run-predict tracks a growing field of
agents and run-evaluate is the tournament that ranks them.
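One way to picture the tournament is as a cross product: each registered evaluator scores each registered predictor. The sketch below only enumerates that matrix. The IDs are placeholders and `evaluation_matrix` is a hypothetical helper; the real registries live under config/ and their format is not shown here.

```python
from itertools import product

# Placeholder IDs for illustration; real entries come from the registries in config/.
PREDICTORS = ["predictor-a", "predictor-b"]
EVALUATORS = ["evaluator-x"]


def evaluation_matrix(evaluators: list[str], predictors: list[str]) -> list[tuple[str, str]]:
    """Every (evaluator, predictor) pair that one scoring pass would cover."""
    return list(product(evaluators, predictors))


for evaluator_id, predictor_id in evaluation_matrix(EVALUATORS, PREDICTORS):
    print(f"{evaluator_id} scores {predictor_id}")
```

Registering a new predictor adds one row per evaluator to this matrix, which is what lets the field grow without changing the scoring stage.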
State lives in two stores, split by kind. Raw facts — dockets, snapshots,
judges, case and event metadata — go into a packed corpus (SQLite under
DVC/S3), written identically by seed and pull. Derived artifacts are
versioned as files in git, organized case-centrically so everything we
conclude about a single predictable event lives together:
data/cases/<court_id>/<docket_id>/events/<event_id>/
    outcome.json                  # ground truth, once the event resolves
    predictions/<predictor_id>/<run_id>/
        prediction.json           # quantitative: granted 1/0, P(granted), votes
        reasoning.md              # qualitative: predicted reasoning
    evaluations/<evaluator_id>/<predictor_id>/<run_id>/
        evaluation.json
        evaluation.md
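The layout can be expressed as a few path helpers. This is a minimal sketch, assuming predictions/ and evaluations/ both sit directly under the event directory; the function names are hypothetical and the library's own path module may look different. The docket, event, predictor, and run IDs in the usage lines are made up.

```python
from pathlib import PurePosixPath

DATA_ROOT = PurePosixPath("data/cases")


def event_dir(court_id: str, docket_id: str, event_id: str) -> PurePosixPath:
    """Directory holding everything concluded about one predictable event."""
    return DATA_ROOT / court_id / docket_id / "events" / event_id


def prediction_dir(event: PurePosixPath, predictor_id: str, run_id: str) -> PurePosixPath:
    """Directory for one predictor run (prediction.json, reasoning.md)."""
    return event / "predictions" / predictor_id / run_id


def evaluation_dir(
    event: PurePosixPath, evaluator_id: str, predictor_id: str, run_id: str
) -> PurePosixPath:
    """Directory for one evaluator's score of one predictor run."""
    return event / "evaluations" / evaluator_id / predictor_id / run_id


# Illustrative IDs only.
event = event_dir("ca9", "12345", "evt-1")
print(prediction_dir(event, "predictor-a", "run-001") / "prediction.json")
print(evaluation_dir(event, "evaluator-x", "predictor-a", "run-001") / "evaluation.json")
```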
Every git artifact validates against a pydantic model in fedcourtsai.schemas
(exported to schemas/*.schema.json). See docs/data-model.md
for the rationale and docs/data-pipeline.md for the
corpus.
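To make the validation idea concrete, here is a rough stand-in for the quantitative part of prediction.json. The real models are pydantic models in fedcourtsai.schemas; this version uses a stdlib dataclass so it runs without dependencies, and every field name (`granted`, `p_granted`, `votes`) is a guess based on the layout comment, not the actual schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PredictionSketch:
    """Stand-in for prediction.json; field names are assumptions, not the real schema."""

    granted: bool  # the "granted 1/0" call
    p_granted: float  # P(granted), expected to be a probability in [0, 1]
    votes: dict[str, str] = field(default_factory=dict)  # judge or justice -> predicted vote

    def __post_init__(self) -> None:
        # Reject probabilities outside [0, 1], the kind of check a schema would enforce.
        if not 0.0 <= self.p_granted <= 1.0:
            raise ValueError(f"p_granted must be in [0, 1], got {self.p_granted}")


ok = PredictionSketch(granted=True, p_granted=0.7, votes={"Judge A": "grant"})
print(ok.p_granted)  # -> 0.7
```

For the authoritative field list, rely on the exported schemas/*.schema.json rather than this sketch.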
Requires uv. A devcontainer is included
(.devcontainer/) and is the recommended way to work in Codespaces.
uv sync # install deps into .venv
uv run fedcourts --help # CLI (full reference: docs/cli.md)
uv run fedcourts export-schemas
uv run fedcourts validate data
# the local gate CI also runs:
uv run ruff format --check .
uv run ruff check .
uv run mypy
uv run pytest

seed and pull are single-docket REST helpers that fetch one case from the
CourtListener REST API into the corpus through the shared ingestion core, so they
need a free API token. seed onboards a docket; pull refreshes one and reports
whether it changed:
export FEDCOURTS_COURTLISTENER_API_TOKEN=... # https://www.courtlistener.com/help/api/rest/
uv run fedcourts seed --court ca9 --docket <docket_id> # onboard one docket into the corpus
uv run fedcourts pull --court ca9 --docket <docket_id> # refresh one docket; report changes

The bulk of the historical data is loaded by seed-backfill, which is what the run-seed workflow
runs: deterministic, no-agent ingestion of CourtListener bulk data (no API
token, no API budget) into the same corpus through the same core. It loads
one chunk of the tracked courts per run against a resumable cursor
(config/seed-progress.yaml), chunked until complete:
uv run fedcourts seed-backfill --report seed-report.json # load the next bulk chunk

See docs/seed-backfill.md and
docs/data-pipeline.md.
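The resumable-cursor pattern behind seed-backfill can be sketched in a few lines. This is an illustration under stated assumptions: the in-memory `cursor` dict stands in for config/seed-progress.yaml (whose real format is not shown here), `load_next_chunk` and the `next_index` key are hypothetical, and the court IDs are examples only.

```python
def load_next_chunk(courts: list[str], cursor: dict[str, int], chunk_size: int) -> list[str]:
    """Return the next chunk of courts and advance the cursor in place.

    An empty list means the backfill is complete. Because progress lives in
    the cursor, an interrupted job resumes where the last run stopped.
    """
    start = cursor.get("next_index", 0)
    chunk = courts[start : start + chunk_size]
    cursor["next_index"] = start + len(chunk)
    return chunk


courts = ["ca1", "ca2", "ca9", "scotus"]  # illustrative court IDs
cursor: dict[str, int] = {}  # hypothetical stand-in for config/seed-progress.yaml

# Each loop iteration simulates one run of the workflow loading one chunk.
while chunk := load_next_chunk(courts, cursor, chunk_size=3):
    print(chunk)
```

Persisting the cursor between runs is what makes each run idempotent with respect to already-loaded chunks in this sketch.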
Start with AGENTS.md — it is the canonical instruction file and
defines the branch-and-PR workflow every agent follows. CLAUDE.md points to it.
src/fedcourtsai/ library: CourtListener client, schemas, paths, registry, CLI
config/ predictor & evaluator registries, tracking settings
data/ tracked cases (versioned)
schemas/ JSON Schema exported from the pydantic models
docs/ architecture, data model, pipeline, security
.github/workflows/ the label-driven pipeline + CI + workflow linting
.github/prompts/ engine-agnostic prompts used by both Claude Code and Codex
- Architecture
- Data model · Data pipeline (the corpus)
- Data sources, terms & PII
- Pipeline & labels
- CLI reference
- Seed-backfill
- Budget
- Milestones
- Security · setup runbook
- Agent workflow · Testing · Contributing
Court data comes from CourtListener, a project of
the Free Law Project — via the CourtListener REST API and the
quarterly bulk-data exports. A great deal of this project rests on their work;
please review and support it. Use of their data is governed by
CourtListener's terms (CC BY-ND 4.0 for
CourtListener's own content; the underlying federal records are public domain), with
attribution also recorded in the top-level NOTICE.
The derived corpus is not publicly republished — it stays in an access-gated store; only our model-generated judgments over those public records reach public git. We ingest only public-record dockets and never sealed or privileged material. See docs/data-sources.md for the full position on terms, redistribution, the API budget, and PII.
FedCourtsAI is independent and is not affiliated with or endorsed by the Free Law Project or any court. Court records are public records of the U.S. federal courts; the predictions and evaluations in this repository are model-generated and are not official court records.
MIT — see LICENSE.