Fast, parity-preserving evaluation for object detection, instance / panoptic / semantic segmentation, boundary IoU, OKS keypoints, and LVIS federated AP. Rust core, Python frontend, optional CLI.
pycocotools==2.0.11 is the de-facto reference for COCO evaluation —
slow, unmaintained, and full of edge-case quirks. Faster
reimplementations exist, but each silently fixes some quirks and not
others, so you discover the divergences empirically. vernier takes a
third path:
- Auditable parity. Every divergence from pycocotools is filed in the quirks survey under ADR-0002 as either strict (bit-equal output, even when vernier's implementation is structurally different) or corrected (opt-in opinionated fix). Strict is the default; corrected fixes are itemized, so you always know when your numbers diverge from a reference run. A drop-in shim (vernier.patch_pycocotools()) keeps existing pycocotools-based scripts working with one line.
- One toolkit instead of five. bbox / segm / boundary / keypoints AP, panoptic PQ, semantic mIoU, and LVIS federated AP all live behind one Python API and one CLI. Per-paradigm migration guides under docs/migrate/ show how to replace pycocotools, faster-coco-eval, panopticapi, lvis-api, and mmsegmentation one at a time.
- Rust core, Python frontend. The matching kernel is pure Rust with runtime SIMD dispatch; the FFI layer does data conversion only. The CLI ships as a static binary, so CI pipelines can call vernier without provisioning a Python interpreter.
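The matching kernel's core primitive is pairwise IoU between detections and ground truth. As a minimal illustration in plain Python — a conceptual sketch, not vernier's Rust kernel — assuming COCO-style [x, y, w, h] boxes:

```python
def bbox_iou(a, b):
    """IoU of two COCO-style [x, y, w, h] boxes."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    # Overlap extents clamp to zero when the boxes are disjoint.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

print(bbox_iou([0, 0, 2, 2], [1, 1, 2, 2]))  # 1/7 ≈ 0.1429
```

The real kernel vectorizes this over every detection–GT pair per image and category; the sketch only pins down the arithmetic a single pair reduces to.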
Pre-1.0; public API is unstable. See docs/adr/ for the
design decisions shaping it. Per-paradigm parity status:
| Paradigm / metric | Oracle | Parity tier | Open caveat |
|---|---|---|---|
| Instance bbox / segm / keypoints AP | pycocotools==2.0.11 | strict bit-equal | none |
| Instance boundary IoU | boundary-iou-api | strict bit-equal | none |
| Segm + boundary TIDE thresholds (t_b) | none yet | corrected-only | ADR-0022 still proposed; defaults extrapolated, not measured |
| Panoptic PQ | panopticapi (single-core path) | strict bit-equal | boundary=True raises NotImplementedError (ADR-0025 §Q3) |
| Semantic mIoU / FWIoU / pAcc / mAcc | mmseg.IoUMetric vendored at v1.2.2 (ADR-0036, still proposed) | strict bit-equal on the four per-class u64 marginals at val2017 scale | cityscapesScripts + ADE20K cross-impl bench externally blocked (ADR-0028); ADE20K-scale bench gated on license-cleared cache |
| LVIS federated AP | lvis-api (vendored at 031ac21f, ORACLE_LVIS_COMMIT_SHA) | strict bit-equal on the (T, R, K, A) precision tensor at full LVIS v1 val | bench paradigm wired; segm cell waits on evaluate_segm_grid_with_dataset |
Three-tier parity model: ADR-0002;
per-library comparison: docs/comparison.md.
| Workload | vernier median | Speedup vs alternatives |
|---|---|---|
| Instance — bbox AP (val2017) | 360 ms | 5.9× faster-coco-eval · 16.2× pycocotools |
| Instance — segm AP (val2017) | 968 ms | 3.7× faster-coco-eval · 7.1× pycocotools |
| Instance — boundary AP (val2017) | 3.1 s | 5.7× faster-coco-eval · 19.9× boundary-iou-api |
| Instance — keypoints AP (val2017, OKS) | 136 ms | 12.5× faster-coco-eval · 17.1× pycocotools |
| Panoptic — PQ (val2017) | 11.6 s | 3.04× panopticapi |
| Semantic — mIoU (val2017) | 5.1 s | 4.2× mmsegmentation |
| Instance — LVIS bbox AP (v1 val, perfect-DT) | 3.7 s | 56.9× lvis-api · 10× lower peak RSS (1.49 GiB vs 15.01 GiB) |
Median total-stage wall time on a KVM VPS (AMD EPYC-Milan, 4 cores ×
2 threads = 8 logical CPUs, x86_64 — not a bare-metal Milan box),
harness mode release (N=10 measurement reps + 2 warmup, randomised
impl order, 5% relative-IQR gate per impl), build profile = cargo
release defaults (opt-level=3, lto=thin, codegen-units=1, no
target-cpu) — same as the PyPI wheel. Full per-cell breakdown
(including IQRs), RSS, and methodology in
docs/benchmarks.md; per-library comparison of
when to pick which in docs/comparison.md.
Baselines pinned for these numbers —
pycocotools==2.0.11,
faster-coco-eval==1.7.2,
panopticapi @ 7bb4655,
boundary-iou-api @ 37d2558,
mmsegmentation @ c685fe6 (vendored),
lvis-api @ 031ac21
(PyPI lvis==0.5.3).
COCO and panoptic / semantic numbers were measured at HEAD 1fd5720bf56c;
the LVIS row was added at HEAD e9d9c4d71303 after the bench
paradigm landed. Each baseline is locked in its own uv-managed venv per
ADR-0017.
```shell
pip install vernier        # Python wheel
cargo add vernier-core     # Rust library
cargo install vernier-cli  # `vernier` CLI binary
```

Wheels ship for Linux x86_64 / aarch64 (glibc + musl), macOS x86_64 / arm64, and Windows x64. The umbrella vernier crate name on crates.io is held as a 0.0.0 placeholder; vernier-core is the real Rust entry point — see docs/engineering/registry-reservations.md.
One-shot — predictions already serialized to JSON (end-of-epoch checkpoint, CI gate, post-training inspection):
```python
from pathlib import Path

from vernier.instance import Bbox, CocoDataset, Evaluator

gt_bytes = Path("instances_val2017.json").read_bytes()
dt_bytes = Path("detections.json").read_bytes()

dataset = CocoDataset.from_json(gt_bytes)
summary = Evaluator(iou=Bbox()).evaluate(dataset, dt_bytes)
for line in summary.pretty_lines():
    print(line)
```

In a training loop — overlap eval with the next training step. The
matching kernel runs on a worker thread, so submit(...) returns
immediately and the training thread keeps moving. Passing a
CocoDataset reuses the parsed-once GT and its per-kernel
derivation cache across every epoch (ADR-0020):
```python
import json
from pathlib import Path

from vernier.instance import Bbox, CocoDataset, Evaluator

gt = CocoDataset.from_json(Path("instances_val2017.json").read_bytes())
evaluator = Evaluator(iou=Bbox())

with evaluator.background(gt) as bg:
    for images, _ in val_loader:
        detections = model(images)  # list[{image_id, category_id, bbox, score}]
        bg.submit(json.dumps(detections).encode())
    summary = bg.finalize()

print("AP =", summary.stats[0])
```

Both end in the same 12-line pycocotools-shaped Summary;
docs/tutorials/first-evaluation.md
walks each end-to-end.
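The detections payload in both snippets is ordinary COCO results JSON. A minimal sketch of what one serialized batch might look like — field names come from the loop comment above; the concrete ids, boxes, and scores here are made up for illustration:

```python
import json

# One COCO-format result record per detection: xywh box plus a confidence score.
detections = [
    {"image_id": 42, "category_id": 1, "bbox": [10.0, 20.0, 50.0, 80.0], "score": 0.93},
    {"image_id": 42, "category_id": 3, "bbox": [5.0, 5.0, 30.0, 30.0], "score": 0.41},
]

payload = json.dumps(detections).encode()  # the bytes a submit(...)-style call consumes
print(json.loads(payload)[0]["score"])  # 0.93
```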
Pick the submodule whose input shape matches your model's output — they have different data models, different matching rules, and different parity oracles:
- vernier.instance — detections with scores → bbox / segm / boundary / keypoints AP.
- vernier.panoptic — RGB-encoded panoptic PNGs + segments_info JSON → PQ.
- vernier.semantic — single-channel class-id label maps → mIoU / FWIoU / pAcc / mAcc.
See Three paradigms for when to use which.
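As a concrete illustration of the semantic paradigm's headline metric, here is a toy mIoU over flattened class-id label maps — a plain-Python sketch of the definition, not vernier's API or kernel:

```python
def miou(gt, pred, num_classes):
    """Mean IoU over flattened class-id label maps (toy reference)."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for g, p in zip(gt, pred) if g == c and p == c)
        union = sum(1 for g, p in zip(gt, pred) if g == c or p == c)
        if union:  # classes absent from both maps are skipped
            ious.append(inter / union)
    return sum(ious) / len(ious)

# Toy 2x2 maps, flattened: class 0 IoU = 1/2, class 1 IoU = 2/3.
print(miou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))  # ≈ 0.5833 (7/12)
```

FWIoU, pAcc, and mAcc are different reductions of the same four per-class marginals the parity table refers to.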
- Tutorials — docs/tutorials/
- Migration guides (from pycocotools, faster-coco-eval, panopticapi, lvis-api, mmsegmentation) — docs/migrate/
- How-to — docs/how-to/
- Reference — docs/reference/
- Design / ADRs — docs/adr/
- Comparison vs pycocotools / faster-coco-eval / panopticapi / boundary-iou-api / lvis-api / mmsegmentation — docs/comparison.md
Local checks: just lint && just test && just audit. The full
contributor workflow (ADR lifecycle, vendoring policy, code style) is
in CONTRIBUTING.md. Repository layout and
common just recipes are in CLAUDE.md.
Dual-licensed under Apache-2.0 or MIT at your option.
vernier vendors a small number of test-only reference implementations
to support parity testing. None of this code is included in published
wheels or linked into the Rust binary. See
THIRD_PARTY_NOTICES.md for the full
inventory and license attributions.