Agent Context Evals v0.11.0 - Fresh-Clone Reproducibility

@ctxyao released this 08 Jun 07:20

Agent Context Evals v0.11.0

Status: public fresh-clone reproducibility receipt for the v0.10 local
machine-evidence path. No public benchmark claim.

What Shipped

  • Fresh-clone reproducibility receipt:
    release/v0.11-fresh-clone-reproducibility/fresh-clone-receipt.json.
  • Manual release-operation generator:
    scripts/run_fresh_clone_reproducibility.py.
  • CI-safe receipt gate:
    scripts/check_v11_fresh_clone_receipt.py.
  • 5-minute local run feedback request:
    https://github.com/ctxgov/agent-context-evals/issues/22.

Reproduce

python3 scripts/check_v11_fresh_clone_receipt.py
python3 -m unittest tests.test_v11_fresh_clone_reproducibility -v
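
As a rough illustration of what a CI-safe receipt gate can look like, the sketch below reads the checked-in JSON receipt and reports problems without touching the network. It is not the code in scripts/check_v11_fresh_clone_receipt.py. The receipt schema used here (a "ref" string and a "commands" list whose entries carry "argv" and "exit_code") is a hypothetical stand-in, as is the helper name check_receipt; only the receipt path comes from these notes.

```python
import json
from pathlib import Path

# Receipt path as given in these release notes.
RECEIPT_PATH = Path(
    "release/v0.11-fresh-clone-reproducibility/fresh-clone-receipt.json"
)


def check_receipt(receipt, expected_ref="v0.10.0"):
    """Return a list of problems; an empty list means the receipt passes.

    Hypothetical schema (an assumption, not the real receipt format):
        {"ref": str, "commands": [{"argv": [...], "exit_code": int}]}
    """
    problems = []
    if receipt.get("ref") != expected_ref:
        problems.append(
            f"ref is {receipt.get('ref')!r}, expected {expected_ref!r}"
        )
    commands = receipt.get("commands") or []
    if not commands:
        problems.append("no commands recorded")
    for entry in commands:
        if entry.get("exit_code") != 0:
            problems.append(f"non-zero exit for {entry.get('argv')}")
    return problems


if __name__ == "__main__":
    # Reads only the checked-in file, so it is safe to run in CI.
    data = json.loads(RECEIPT_PATH.read_text(encoding="utf-8"))
    issues = check_receipt(data)
    for issue in issues:
        print("FAIL:", issue)
    raise SystemExit(1 if issues else 0)
```

Returning a list of problems rather than raising on the first one lets a CI log show every failed condition in a single run.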

To regenerate the receipt, run the explicit release-operation path. It uses
network access to clone the public repository into a temporary directory:

python3 scripts/run_fresh_clone_reproducibility.py --ref v0.10.0

The checked-in receipt records a fresh clone of the v0.10.0 public local
evidence path and the following commands run against it:

  • python3 scripts/check_v10_saved_trace_readiness.py
  • python3 -m unittest discover -s tests -v
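
The fresh-clone flow described above can be sketched as follows. This is a minimal illustration, not the contents of scripts/run_fresh_clone_reproducibility.py: the repository URL is inferred from the issue link in these notes, the shallow-clone flags are a choice made for this sketch, and the names clone_command and run_fresh_clone are hypothetical. It does not write a receipt.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Assumption: repository URL inferred from the issue link in these notes.
REPO_URL = "https://github.com/ctxgov/agent-context-evals.git"

# The two commands listed above, run from the root of the clone.
CHECK_COMMANDS = [
    ["python3", "scripts/check_v10_saved_trace_readiness.py"],
    ["python3", "-m", "unittest", "discover", "-s", "tests", "-v"],
]


def clone_command(ref, dest):
    """Build a shallow `git clone` of a single tag or branch."""
    return ["git", "clone", "--depth", "1", "--branch", ref, REPO_URL, str(dest)]


def run_fresh_clone(ref):
    """Clone `ref` into a temporary directory and run each check there.

    Returns a list of (command, exit_code) pairs.
    Requires git on PATH and network access.
    """
    results = []
    with tempfile.TemporaryDirectory() as tmp:
        dest = Path(tmp) / "clone"
        # Fail fast if the clone itself does not succeed.
        subprocess.run(clone_command(ref, dest), check=True)
        for cmd in CHECK_COMMANDS:
            proc = subprocess.run(cmd, cwd=dest)
            results.append((cmd, proc.returncode))
    return results


if __name__ == "__main__":
    outcome = run_fresh_clone("v0.10.0")
    for cmd, code in outcome:
        print(code, " ".join(cmd))
    sys.exit(0 if all(code == 0 for _, code in outcome) else 1)
```

Cloning into a temporary directory keeps the run independent of any files or caches in the working copy, which is the point of a fresh-clone check.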

Boundary

  • No public benchmark claim.
  • No security claim.
  • No provider/model call.
  • No adoption claim.
  • No human reviewer claim.
  • No package publication claim.
  • No stable protocol claim.
  • No hosted runtime or live adapter claim.

This release publishes a reproducibility receipt and a local receipt gate only.
It does not make provider/model calls, publish packages, change hosted
runtimes, write to targets, contact reviewers, measure adoption, or report
public benchmarks.