Agent Context Evals v0.11.0

Status: public fresh-clone reproducibility receipt for the v0.10 local
machine-evidence path. No public benchmark claim.

What Shipped

Fresh-clone reproducibility receipt:
release/v0.11-fresh-clone-reproducibility/fresh-clone-receipt.json.
Manual release-operation generator:
scripts/run_fresh_clone_reproducibility.py.
CI-safe receipt gate:
scripts/check_v11_fresh_clone_receipt.py.
5-minute local run feedback request:
https://github.com/ctxgov/agent-context-evals/issues/22.

Reproduce

python3 scripts/check_v11_fresh_clone_receipt.py
python3 -m unittest tests.test_v11_fresh_clone_reproducibility -v
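
As an illustration of what a CI-safe receipt gate can look like, here is a minimal sketch that validates a receipt without any network access. It is not the logic of scripts/check_v11_fresh_clone_receipt.py. The receipt schema is not shown in these notes, so the field names ("ref", "steps", "command", "returncode") and both function names are hypothetical.

```python
import json

# Hypothetical field names: the real receipt schema is not shown in these notes.
REQUIRED_FIELDS = ("ref", "steps")


def check_receipt(receipt):
    """Return a list of problems; an empty list means the receipt passes."""
    problems = [
        f"missing field: {name}"
        for name in REQUIRED_FIELDS
        if name not in receipt
    ]
    # Every recorded step is assumed to carry the exit code of its command.
    for step in receipt.get("steps", []):
        if step.get("returncode") != 0:
            problems.append(f"step failed: {step.get('command')}")
    return problems


def check_receipt_file(path):
    """Load a JSON receipt from disk and check it; no network access needed."""
    with open(path, encoding="utf-8") as fh:
        return check_receipt(json.load(fh))
```

A gate of this shape reads only a checked-in file, which is what makes it safe to run in CI.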

To regenerate the receipt, run the explicit release-operation path. It uses
network access to clone the public repository into a temporary directory:

python3 scripts/run_fresh_clone_reproducibility.py --ref v0.10.0

The checked-in receipt records a fresh clone of the v0.10.0 public local
evidence path, followed by runs of:

python3 scripts/check_v10_saved_trace_readiness.py
python3 -m unittest discover -s tests -v
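
The fresh-clone procedure described above can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the contents of scripts/run_fresh_clone_reproducibility.py. The function name, the injectable run parameter, and the shape of the returned record are hypothetical, and the repository URL must be supplied by the caller.

```python
import subprocess
import tempfile
from pathlib import Path

# The two commands named in the release notes for the v0.10.0 evidence path.
COMMANDS = [
    ["python3", "scripts/check_v10_saved_trace_readiness.py"],
    ["python3", "-m", "unittest", "discover", "-s", "tests", "-v"],
]


def run_fresh_clone(repo_url, ref, run=subprocess.run):
    """Clone repo_url at ref into a temporary directory and run COMMANDS.

    Hypothetical helper: `run` is injectable so the flow can be exercised
    without network access. Returns a small dict summarising each step.
    """
    with tempfile.TemporaryDirectory() as tmp:
        clone_dir = Path(tmp) / "clone"
        # A shallow clone of a single tag or branch; this step needs network.
        run(
            ["git", "clone", "--depth", "1", "--branch", ref,
             repo_url, str(clone_dir)],
            check=True,
        )
        steps = []
        for cmd in COMMANDS:
            result = run(cmd, cwd=clone_dir)
            steps.append(
                {"command": " ".join(cmd), "returncode": result.returncode}
            )
    ok = all(step["returncode"] == 0 for step in steps)
    return {"ref": ref, "steps": steps, "ok": ok}
```

Cloning into a temporary directory keeps the run independent of any local working-tree state, which is the point of a fresh-clone check.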

Boundary

No public benchmark claim.
No security claim.
No provider/model call.
No adoption claim.
No human reviewer claim.
No package publication claim.
No stable protocol claim.
No hosted runtime or live adapter claim.

This release publishes a reproducibility receipt and a local receipt gate only.
It does not involve provider/model calls, package publication, hosted runtime
changes, target writes, reviewer outreach, adoption measurement, or public
benchmark reporting.
