Releases: dl1683/irys-stateful-swarms

ant-irys v1.0 — Full Benchmark Results

03 Jun 21:11

Full Harvey LAB Benchmark Outputs

Complete outputs from the full 1,251-task Harvey Legal Agent Benchmark run.

Metric               Result
-------------------  -------------------------
Tasks completed      1,251 / 1,251
Criteria pass rate   62,800 / 74,990 = 83.74%
Strict all-pass      222 / 1,251 = 17.75%
Cost per task        $1.30
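
The two percentages follow directly from the raw counts reported above. As a quick arithmetic check, the short sketch below recomputes them; the helper name `rate` is purely illustrative and is not part of the benchmark tooling.

```python
def rate(passed: int, total: int) -> float:
    """Return passed/total as a percentage, rounded to two decimals."""
    return round(100 * passed / total, 2)


# Counts copied from the results table in this release.
criteria_rate = rate(62_800, 74_990)  # criteria passed / criteria scored
strict_rate = rate(222, 1_251)        # tasks with every criterion passed / all tasks

print(f"Criteria pass rate: {criteria_rate}%")
print(f"Strict all-pass:    {strict_rate}%")
```

The gap between the two figures is expected: a task counts toward the strict rate only when every one of its criteria passes, so a single failed criterion removes the whole task from the strict count.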

Verification

Download any practice-area archive, extract it, and score the extracted directory with the Harvey LAB scorer:

python -m src.cli score <extracted_dir> --bench-root /path/to/harvey-labs

Each task directory contains:

  • output/ — the generated deliverables (docx, xlsx)
  • swarm/ — full blackboard state showing how the system reasoned (see README for walkthrough)
  • scores.json — per-criterion scoring results
  • metrics.json — token usage and cost breakdown
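
For a rough cross-check that does not involve the scorer, the per-task scores.json files can be aggregated into the same headline counts. The sketch below is only an illustration: the schema of scores.json is not documented in this release, so the `criteria` list and its boolean `passed` field are assumptions, as is the one-directory-per-task layout directly under the extracted root. Adjust the field names after inspecting a real file.

```python
import json
from pathlib import Path


def summarize(root: Path) -> dict:
    """Aggregate criteria results across task directories under root.

    ASSUMED (hypothetical) schema for each <task>/scores.json:
        {"criteria": [{"passed": true}, {"passed": false}, ...]}
    """
    tasks = passed = total = strict = 0
    for scores_file in sorted(root.glob("*/scores.json")):
        data = json.loads(scores_file.read_text(encoding="utf-8"))
        results = [bool(c["passed"]) for c in data["criteria"]]
        tasks += 1
        passed += sum(results)
        total += len(results)
        # A task is a strict pass only if it has criteria and all of them passed.
        strict += int(bool(results) and all(results))
    return {
        "tasks": tasks,
        "criteria_passed": passed,
        "criteria_total": total,
        "strict_all_pass": strict,
    }
```

Calling `summarize(Path("<extracted_dir>"))` on an extracted archive would return the counts for that practice area only; the totals in the results table cover all archives combined.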

Practice Area Archives

Archives will be uploaded as they are prepared. Each archive contains all tasks for one practice area.