vernier


Fast, parity-preserving evaluation for object detection, instance / panoptic / semantic segmentation, boundary IoU, OKS keypoints, and LVIS federated evaluation. Rust core, Python frontend, optional CLI.

pycocotools==2.0.11 is the de-facto reference for COCO evaluation — slow, unmaintained, and full of edge-case quirks. Faster reimplementations exist, but each silently fixes some quirks and not others, so you discover the divergences empirically. vernier takes a third path:

  • Auditable parity. Every divergence from pycocotools is filed in the quirks survey under ADR-0002 as either strict (bit-equal output, even when vernier's implementation is structurally different) or corrected (opt-in opinionated fix). Strict is the default; corrected fixes are itemized so you always know when your numbers diverge from a reference run. A drop-in shim (vernier.patch_pycocotools()) keeps existing pycocotools-based scripts working with one line.
  • One toolkit instead of five. bbox / segm / boundary / keypoints AP, panoptic PQ, semantic mIoU, and LVIS federated all live behind one Python API and one CLI. Per-paradigm migration guides under docs/migrate/ show how to replace pycocotools, faster-coco-eval, panopticapi, lvis-api, and mmsegmentation one at a time.
  • Rust core, Python frontend. The matching kernel is pure Rust with runtime SIMD dispatch; the FFI layer is data conversion only. The CLI ships as a static binary, so CI pipelines call vernier without provisioning a Python interpreter.
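The drop-in shim is ordinary Python monkey-patching: a patch function rebinds the hot method on the legacy class so existing call sites transparently hit the fast path. The sketch below illustrates only the pattern; the classes and names in it are hypothetical stand-ins, not vernier's or pycocotools' real internals.

```python
# Illustrative sketch of the monkey-patching pattern behind a
# patch_pycocotools()-style shim. SlowEvaluator is a hypothetical
# stand-in for a legacy evaluator class (e.g. COCOeval).

class SlowEvaluator:
    """Legacy evaluator whose hot method we want to replace."""
    def evaluate(self):
        return "slow-path"

def fast_evaluate(self):
    # A faster backend with the same signature and return shape.
    return "fast-path"

def patch_slow_evaluator():
    """Rebind the hot method; existing scripts pick up the fast path unchanged."""
    SlowEvaluator.evaluate = fast_evaluate

patch_slow_evaluator()
print(SlowEvaluator().evaluate())  # → fast-path
```

The real shim works on the same principle: one call at import time, and every downstream `COCOeval`-shaped invocation routes into the Rust kernel.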

Status & validation

Pre-1.0; public API is unstable. See docs/adr/ for the design decisions shaping it. Per-paradigm parity status:

| Paradigm / metric | Oracle | Parity tier | Open caveat |
|---|---|---|---|
| Instance bbox / segm / keypoints AP | pycocotools==2.0.11 | strict bit-equal | none |
| Instance boundary IoU | boundary-iou-api | strict bit-equal | none |
| Segm + boundary TIDE thresholds (t_b) | none yet | corrected-only | ADR-0022 still proposed; defaults extrapolated, not measured |
| Panoptic PQ | panopticapi (single-core path) | strict bit-equal | `boundary=True` raises NotImplementedError (ADR-0025 §Q3) |
| Semantic mIoU / FWIoU / pAcc / mAcc | mmseg.IoUMetric vendored at v1.2.2 (ADR-0036, still proposed); cityscapesScripts + ADE20K cross-impl bench externally blocked | strict bit-equal on the four per-class u64 marginals at val2017 scale | ADR-0028; ADE20K-scale bench gated on license-cleared cache |
| LVIS federated AP | lvis-api (vendored at 031ac21f, ORACLE_LVIS_COMMIT_SHA) | strict bit-equal on the (T, R, K, A) precision tensor at full LVIS v1 val | bench paradigm wired; segm cell waits on evaluate_segm_grid_with_dataset |

Three-tier parity model: ADR-0002; per-library comparison: docs/comparison.md.
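For context on what "bit-equal" has to reproduce: COCO-style AP averages an interpolated precision over 101 fixed recall thresholds (0.00 to 1.00 in steps of 0.01), after taking the monotone precision envelope. A minimal standalone sketch of that interpolation step, on toy numbers (this is not vernier code):

```python
def coco_style_ap(recalls, precisions):
    """Average interpolated precision over 101 recall thresholds (0.00..1.00).

    `recalls` must be sorted ascending; the interpolated precision at
    recall r is the max precision at any recall >= r (the envelope).
    """
    # Build the monotone precision envelope from the right.
    envelope = precisions[:]
    for i in range(len(envelope) - 2, -1, -1):
        envelope[i] = max(envelope[i], envelope[i + 1])

    thresholds = [t / 100 for t in range(101)]
    total = 0.0
    for t in thresholds:
        # First operating point whose recall reaches the threshold; 0 past the curve.
        total += next((envelope[i] for i, r in enumerate(recalls) if r >= t), 0.0)
    return total / len(thresholds)

# Toy PR curve: perfect precision up to recall 0.5, then nothing.
print(coco_style_ap([0.25, 0.5], [1.0, 1.0]))  # 51 of 101 thresholds covered
```

Small floating-point and tie-breaking choices in exactly this kind of loop are where reimplementations silently diverge, which is why the strict tier pins bit-equality rather than "close enough".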

Benchmarks

| Workload | vernier median | Speedup vs alternatives |
|---|---|---|
| Instance — bbox AP (val2017) | 360 ms | 5.9× faster-coco-eval · 16.2× pycocotools |
| Instance — segm AP (val2017) | 968 ms | 3.7× faster-coco-eval · 7.1× pycocotools |
| Instance — boundary AP (val2017) | 3.1 s | 5.7× faster-coco-eval · 19.9× boundary-iou-api |
| Instance — keypoints AP (val2017, OKS) | 136 ms | 12.5× faster-coco-eval · 17.1× pycocotools |
| Panoptic — PQ (val2017) | 11.6 s | 3.04× panopticapi |
| Semantic — mIoU (val2017) | 5.1 s | 4.2× mmsegmentation |
| Instance — LVIS bbox AP (v1 val, perfect-DT) | 3.7 s | 56.9× lvis-api · 10× lower peak RSS (1.49 GiB vs 15.01 GiB) |

Median total-stage wall time on a KVM VPS (AMD EPYC-Milan, 4 cores × 2 threads = 8 logical CPUs, x86_64 — not a bare-metal Milan box), harness mode release (N=10 measurement reps + 2 warmup, randomised impl order, 5% relative-IQR gate per impl), build profile = cargo release defaults (opt-level=3, lto=thin, codegen-units=1, no target-cpu) — same as the PyPI wheel. Full per-cell breakdown (including IQRs), RSS, and methodology in docs/benchmarks.md; per-library comparison of when to pick which in docs/comparison.md.
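The 5% relative-IQR gate is a simple stability check: the interquartile range of the measured reps, divided by their median, must stay under 0.05 before a cell is accepted. A stdlib sketch of that gate (the harness's actual code is not shown here, and the sample numbers are invented):

```python
import statistics

def relative_iqr_gate(samples_ms, threshold=0.05):
    """Return (median, rel_iqr, passed): IQR / median must stay under threshold."""
    q1, q2, q3 = statistics.quantiles(samples_ms, n=4)  # quartile cut points
    rel_iqr = (q3 - q1) / q2
    return q2, rel_iqr, rel_iqr < threshold

# Ten hypothetical wall-time reps (ms) for one implementation:
reps = [361, 358, 360, 362, 359, 360, 361, 357, 363, 360]
median, rel_iqr, ok = relative_iqr_gate(reps)
print(median, round(rel_iqr, 4), ok)
```

A gate like this rejects cells whose reps are too noisy to summarize with a single median, which matters on shared KVM hosts where neighbor load can skew individual runs.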

Baselines pinned for these numbers: pycocotools==2.0.11, faster-coco-eval==1.7.2, panopticapi @ 7bb4655, boundary-iou-api @ 37d2558, mmsegmentation @ c685fe6 (vendored), lvis-api @ 031ac21 (PyPI lvis==0.5.3). COCO and panoptic / semantic numbers were measured at HEAD 1fd5720bf56c; the LVIS row was added at HEAD e9d9c4d71303 after the bench paradigm landed. Each baseline is locked in its own uv-managed venv per ADR-0017.

Install

```sh
pip install vernier                  # Python wheel
cargo add vernier-core               # Rust library
cargo install vernier-cli            # `vernier` CLI binary
```

Wheels ship for Linux x86_64 / aarch64 (glibc + musl), macOS x86_64 / arm64, and Windows x64. The umbrella vernier crate name on crates.io is held as a 0.0.0 placeholder; vernier-core is the real Rust entry point — see docs/engineering/registry-reservations.md.

60-second example

One-shot — predictions already serialized to JSON (end-of-epoch checkpoint, CI gate, post-training inspection):

```python
from pathlib import Path
from vernier.instance import Bbox, CocoDataset, Evaluator

gt_bytes = Path("instances_val2017.json").read_bytes()
dt_bytes = Path("detections.json").read_bytes()

dataset = CocoDataset.from_json(gt_bytes)
summary = Evaluator(iou=Bbox()).evaluate(dataset, dt_bytes)
for line in summary.pretty_lines():
    print(line)
```

In a training loop — overlap eval with the next training step. The matching kernel runs on a worker thread, so submit(...) returns immediately and the training thread keeps moving. Passing a CocoDataset reuses the parsed-once GT and its per-kernel derivation cache across every epoch (ADR-0020):

```python
import json
from pathlib import Path
from vernier.instance import Bbox, CocoDataset, Evaluator

gt = CocoDataset.from_json(Path("instances_val2017.json").read_bytes())
evaluator = Evaluator(iou=Bbox())
with evaluator.background(gt) as bg:
    for images, _ in val_loader:
        detections = model(images)  # list[{image_id, category_id, bbox, score}]
        bg.submit(json.dumps(detections).encode())
    summary = bg.finalize()
print("AP =", summary.stats[0])
```

Both paths end in the same 12-line, pycocotools-shaped Summary; docs/tutorials/first-evaluation.md walks through each end-to-end.
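Stripped of vernier specifics, the training-loop overlap is a plain producer/consumer hand-off: submit() queues work to a worker thread and returns immediately, and finalize() drains the queue and joins. A toy stdlib sketch of that shape (the class and its "work" below are invented for illustration, not vernier's API):

```python
from concurrent.futures import ThreadPoolExecutor

class BackgroundAccumulator:
    """Toy stand-in for a background evaluation session: submit() returns
    immediately, one worker thread does the work, finalize() joins."""

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=1)  # one worker keeps order
        self._futures = []

    def submit(self, payload):
        # Hypothetical "work": just count the items in the payload.
        self._futures.append(self._pool.submit(len, payload))

    def finalize(self):
        total = sum(f.result() for f in self._futures)  # blocks until all done
        self._pool.shutdown()
        return total

bg = BackgroundAccumulator()
for batch in ([1, 2, 3], [4, 5], [6]):
    bg.submit(batch)   # returns immediately; the worker counts in the background
print(bg.finalize())   # → 6
```

The payoff is the same in both the toy and the real evaluator: the submitting thread (here the training loop) never waits on the matching work, only on the final join.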

Three evaluation paradigms

Pick the submodule whose input shape matches your model's output — they have different data models, different matching rules, and different parity oracles:

  • vernier.instance — detections with scores → bbox / segm / boundary / keypoints AP.
  • vernier.panoptic — RGB-encoded panoptic PNGs + segments_info JSON → PQ.
  • vernier.semantic — single-channel class-id label maps → mIoU / FWIoU / pAcc / mAcc.

See Three paradigms for when to use which.
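The four semantic metrics share one input: a per-class confusion matrix over pixels. Under the standard definitions (which is all this sketch assumes — it is not vernier's kernel), they fall out of the matrix's diagonal, row sums, and column sums:

```python
def semantic_metrics(confusion):
    """mIoU / FWIoU / pAcc / mAcc from a KxK confusion matrix,
    where confusion[gt][pred] counts pixels (standard definitions;
    assumes every class appears at least once in the ground truth)."""
    k = len(confusion)
    gt_sums = [sum(confusion[c]) for c in range(k)]                       # GT pixels per class
    pred_sums = [sum(confusion[r][c] for r in range(k)) for c in range(k)]  # predicted pixels per class
    diag = [confusion[c][c] for c in range(k)]                            # correctly labeled pixels
    total = sum(gt_sums)

    ious = [diag[c] / (gt_sums[c] + pred_sums[c] - diag[c]) for c in range(k)]
    miou = sum(ious) / k                                    # mean IoU over classes
    fwiou = sum(gt_sums[c] / total * ious[c] for c in range(k))  # frequency-weighted IoU
    pacc = sum(diag) / total                                # overall pixel accuracy
    macc = sum(diag[c] / gt_sums[c] for c in range(k)) / k  # mean per-class accuracy
    return miou, fwiou, pacc, macc

# Two classes, 100 pixels total:
cm = [[40, 10],   # class 0: 40 correct, 10 mislabeled as class 1
      [5, 45]]    # class 1: 45 correct, 5 mislabeled as class 0
print(semantic_metrics(cm))
```

The "four per-class u64 marginals" in the parity table above are exactly these ingredients, which is why bit-equality on them pins every downstream metric at once.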

Documentation

Contributing

Local checks: just lint && just test && just audit. The full contributor workflow (ADR lifecycle, vendoring policy, code style) is in CONTRIBUTING.md. Repository layout and common just recipes are in CLAUDE.md.

License

Dual-licensed under Apache-2.0 or MIT at your option.

Third-party code

vernier vendors a small number of test-only reference implementations to support parity testing. None of this code is included in published wheels or linked into the Rust binary. See THIRD_PARTY_NOTICES.md for the full inventory and license attributions.
