Releases: acblabs/agent-assure

agent-assure v0.3.0

@github-actions github-actions released this 02 Jul 03:57
282208d

v0.3.0 prepares the packaging path for the adoption release. The package metadata now
targets PyPI/TestPyPI, the wheel build explicitly includes frozen schema
snapshots through schemas/v0.3.0 under agent_assure/schema_resources/, and
deterministic example suite resources are bundled under agent_assure.examples.

The persisted artifact schema_version remains 0.2.0; the v0.3.0 schema
directory is a frozen release snapshot for package inspection and release
gating, not a breaking artifact-shape change.

Release checks now inspect the built wheel for the required schema and example
paths, then install the local wheel into a clean virtual environment without
resolving agent-assure from a package index. The smoke test also asserts that
packaged example resources are visible through importlib.resources from the
installed wheel.
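
The wheel inspection step can be sketched roughly as follows, assuming the wheel is read as an ordinary zip archive and that the required paths are the two package directories named above. The function name missing_wheel_paths and the REQUIRED_PREFIXES constant are hypothetical and are not taken from the project's release-check code.

```python
import zipfile

# Assumption: these prefixes mirror the package directories named in the notes.
REQUIRED_PREFIXES = (
    "agent_assure/schema_resources/",
    "agent_assure/examples/",
)


def missing_wheel_paths(wheel_path, required=REQUIRED_PREFIXES):
    """Return the required prefixes that no file in the wheel falls under."""
    with zipfile.ZipFile(wheel_path) as wheel:
        names = wheel.namelist()
    return [
        prefix
        for prefix in required
        # A bare directory entry does not count; at least one file must exist.
        if not any(name.startswith(prefix) and name != prefix for name in names)
    ]
```

An empty list means every required prefix has at least one file in the archive. After installation, the standard-library call importlib.resources.files("agent_assure.examples") is one way to confirm that the same resources are visible from the installed package.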

The Trusted Publishing path supports manual TestPyPI uploads from a selected
candidate ref and tag-gated PyPI uploads from the release workflow. Final PyPI
publishing uses the package files already present in the release bundle
artifact, so it does not run a second package build. The release workflow blocks
when a tag such as v0.3.0 does not match project.version = "0.3.0" and
agent_assure.__version__ = "0.3.0".
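
The tag gate described above amounts to a three-way equality check. A minimal sketch, assuming the tag always carries a leading "v" and that both version strings have already been read from their sources; release_tag_matches is a hypothetical name, not the workflow's actual helper.

```python
def release_tag_matches(tag, project_version, module_version):
    """Return True only when tag 'vX.Y.Z' equals both declared versions."""
    # Assumption: release tags are the version string prefixed with "v".
    if not tag.startswith("v"):
        return False
    expected = tag[1:]
    return expected == project_version == module_version
```

With the values from these notes, release_tag_matches("v0.3.0", "0.3.0", "0.3.0") is True, and any mismatch in either version string makes it False.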

Top-level repository examples and bundled package example resources are checked
for parity to reduce fixture drift risk while the repository keeps both paths
for docs, tests, and installed-wheel demos.
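
One way to express such a parity check is to compare per-file digests of the two trees. The sketch below assumes byte-for-byte equality is the parity criterion and uses SHA-256; both are assumptions, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def tree_digests(root):
    """Map each file's POSIX-style relative path to its SHA-256 hex digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def parity_differences(repo_examples, packaged_examples):
    """Return sorted relative paths that are missing from or differ between the trees."""
    left = tree_digests(repo_examples)
    right = tree_digests(packaged_examples)
    return sorted(
        name for name in left.keys() | right.keys() if left.get(name) != right.get(name)
    )
```

An empty result means the two example trees contain the same files with the same bytes.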

OIDC publish and signing jobs use immutable third-party action SHAs, with
comments preserving the reviewed upstream tags for maintainability.

Package publishing jobs validate release-version input before use and recheck
downloaded package artifacts at the upload boundary. Final PyPI publishing also
replays the downloaded release bundle digests before staging upload files.
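
Digest replay at the upload boundary can be pictured as recomputing each artifact's digest and comparing it with the recorded value. This is a sketch under assumptions: the digest algorithm is taken to be SHA-256, the recorded digests are passed in as a plain name-to-hex mapping, and replay_digests is a hypothetical name rather than the project's replay command.

```python
import hashlib
from pathlib import Path


def replay_digests(directory, expected):
    """Return names whose file is missing or whose bytes do not match.

    `expected` maps file name -> recorded SHA-256 hex digest (assumed format).
    """
    directory = Path(directory)
    failures = []
    for name, digest in sorted(expected.items()):
        path = directory / name
        if not path.is_file():
            failures.append(name)
        elif hashlib.sha256(path.read_bytes()).hexdigest() != digest:
            failures.append(name)
    return failures
```

A publishing job built along these lines would stage upload files only when the returned list is empty.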

Sprint 5 hardening adds a frozen-schema parity check to the make release-check
target, verifies every frozen schema file in the built and installed wheel, pins
the remaining GitHub Actions references to reviewed SHAs, and moves artifact
selection for release-bundle cosign signing and verification out of workflow
shell snippets into a Python helper.

agent-assure v0.2.0

@github-actions github-actions released this 28 Jun 23:00

This release adds protocol-bound live evaluation while preserving the
deterministic fixture-mode assurance surface introduced in v0.1.0. Fixture-mode
commands remain offline and reproducible; live commands require an explicit
adapter configuration and a frozen live-protocol-record.

New live artifacts include live-protocol-record, live-evaluation-report,
live-comparison-report, live-drift-report, live-trajectory-report, and
emergency-process-record, with exported JSON Schemas under schemas/v0.2.0.
The v0.1 schema set remains available under schemas/v0.1.0 for historical
replay.

Live evaluation now supports repeated provider observations, cluster-aware
pass/outcome/reason-code/exclusion rates, pooled and cluster-mean interval
metadata, paired or fixed-reference comparisons, cost and latency summaries,
provider-version capture, budget and rate-limit accounting, incomplete-run
status, and static JSONL tests that keep the live path reproducible without
network access. Optional OpenAI-compatible live execution still requires
explicit network opt-in.
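
The difference between a pooled rate and a cluster-mean rate is easiest to see on a small example. The sketch below assumes a cluster is a group of repeated observations (for instance, repeated runs of one case) and that each observation is a boolean pass flag; the data layout and function names are illustrative, not the report schema.

```python
def pooled_pass_rate(clusters):
    """Passes divided by observations, ignoring cluster boundaries."""
    total = sum(len(obs) for obs in clusters.values())
    passed = sum(sum(obs) for obs in clusters.values())
    return passed / total


def cluster_mean_pass_rate(clusters):
    """Mean of per-cluster pass rates, so each cluster counts equally."""
    # Assumption: every cluster holds at least one observation.
    rates = [sum(obs) / len(obs) for obs in clusters.values()]
    return sum(rates) / len(rates)


observations = {
    "case-a": [True, True, True, True],  # 4 of 4 pass
    "case-b": [False, True],             # 1 of 2 pass
}
print(pooled_pass_rate(observations))        # 5 passes over 6 observations
print(cluster_mean_pass_rate(observations))  # mean of 1.0 and 0.5
```

Here the pooled rate is 5/6 while the cluster-mean rate is 0.75, because the larger cluster dominates the pooled figure but not the cluster mean.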

Advanced protocol-bound analysis can report rare-event upper bounds, observed
cluster-correlation summaries, Bonferroni-controlled endpoint families, and
paired exact or Monte Carlo randomization tests when the frozen protocol and
observed design meet the declared prerequisites. Cross-window drift monitoring
adds comparability checks, ordered trend and adjacent-step summaries,
dependence diagnostics, and EWMA governance-health or control-reliability
review signals. Trajectory reports derive privacy-filtered observable state
paths, transition profiles, sequence-invariant findings, history-dependent
checks, and operational event-process summaries from structured live artifacts.
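
Two of the techniques named above have compact textbook forms. The sketch below shows the standard EWMA recursion z_t = s * x_t + (1 - s) * z_(t-1) and the Bonferroni per-endpoint threshold alpha / m. The smoothing value, the choice to seed the recursion with the first observation, and the function names are assumptions for illustration, not the project's configuration.

```python
def ewma(values, smoothing=0.2, start=None):
    """Return the exponentially weighted moving average of `values`."""
    if not 0 < smoothing <= 1:
        raise ValueError("smoothing must be in (0, 1]")
    values = list(values)
    if not values:
        return []
    # Assumption: seed with the first observation unless a start is given.
    level = values[0] if start is None else start
    smoothed = []
    for value in values:
        level = smoothing * value + (1 - smoothing) * level
        smoothed.append(level)
    return smoothed


def bonferroni_threshold(alpha, family_size):
    """Per-endpoint significance threshold for a family of endpoints."""
    if family_size < 1:
        raise ValueError("family_size must be at least 1")
    return alpha / family_size
```

A review signal of this kind would typically compare the smoothed series against a declared limit; where that limit comes from is outside this sketch.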

Runtime support now includes an external-script live adapter backed by a
no-shell subprocess harness, declared environment passing, redacted emergency
process records, W3C trace-context propagation, privacy-filtered span plans,
and optional OpenTelemetry SDK or OTLP HTTP export.
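
A no-shell harness with declared environment passing can be sketched with the standard subprocess module. This is an illustration under assumptions, not the adapter's code: run_external_script, its parameters, and the default timeout are hypothetical.

```python
import os
import subprocess


def run_external_script(argv, declared_env, timeout_s=30.0):
    """Run argv without a shell, passing only declared environment variables."""
    # Only variables named in declared_env are forwarded to the child process.
    env = {name: os.environ[name] for name in declared_env if name in os.environ}
    completed = subprocess.run(
        list(argv),           # argument list, never a shell command string
        shell=False,
        env=env,
        capture_output=True,
        timeout=timeout_s,
        check=False,          # the caller decides how to treat non-zero exits
    )
    return completed.returncode, completed.stdout, completed.stderr
```

Passing the command as an argument list with shell=False means no shell ever interprets the arguments, and building env explicitly keeps undeclared variables such as credentials out of the child process.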

Final pre-tag security hardening makes live producer-supplied failing policy
results verdict-bearing, confines live prompt/JSONL/script/cwd paths to the
live config directory, requires HTTPS plus explicit host allowlisting for
non-default OpenAI-compatible endpoints, bounds external-script stdout/stderr
capture, and expands recursive persisted-artifact redaction for common secret
token patterns while preserving schema-owned structural identifiers. The bundled
fixture HMAC key is now accepted only for repository synthetic examples; other
fixture runs must provide an explicit key.
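
Two of these controls, path confinement and the HTTPS-plus-allowlist rule, can be illustrated with standard-library calls. The sketch assumes confinement means "resolves to a location inside the config directory" and that the allowlist is an exact host match; the function names and the example host are hypothetical.

```python
from pathlib import Path
from urllib.parse import urlsplit


def confine_path(config_dir, candidate):
    """Resolve candidate under config_dir, rejecting anything that escapes it."""
    base = Path(config_dir).resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    resolved = (base / candidate).resolve()
    if resolved != base and base not in resolved.parents:
        raise ValueError(f"path escapes config directory: {candidate}")
    return resolved


def endpoint_allowed(url, allowed_hosts):
    """Accept only HTTPS URLs whose host is explicitly allowlisted."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname in set(allowed_hosts)
```

For example, endpoint_allowed("https://llm.example.test/v1", ["llm.example.test"]) is True, while the same URL over http, or any host not on the list, is rejected.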

Pre-release hardening unified decimal rendering across protocol and report
calculations, preserved unclamped cost and latency comparison deltas, enforced
cumulative total and generated token budgets after live responses, exposed
rare-event Poisson upper bounds as one-sided artifacts, and constrained paired
randomization comparison protocols to the expectation pass-rate endpoint they
actually test.
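
For reference, the standard exact one-sided Poisson upper bound is the largest mean mu for which observing at most the observed number of events still has probability alpha. The sketch below finds it by bisection and divides by exposure. Whether the project uses this exact construction is an assumption; the function names and the default alpha are illustrative only.

```python
import math


def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu), summed term by term."""
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total


def poisson_upper_bound(events, exposure, alpha=0.05):
    """One-sided (1 - alpha) upper bound on the event rate per unit exposure."""
    if events < 0 or exposure <= 0 or not 0 < alpha < 1:
        raise ValueError("invalid events, exposure, or alpha")
    low, high = 0.0, 1.0
    # The CDF decreases as mu grows, so widen until it drops to alpha or below.
    while poisson_cdf(events, high) > alpha:
        high *= 2.0
    for _ in range(200):
        mid = (low + high) / 2.0
        if poisson_cdf(events, mid) > alpha:
            low = mid
        else:
            high = mid
    return high / exposure
```

With zero observed events the bound reduces to -ln(alpha) / exposure, which at alpha = 0.05 is the familiar "rule of three" (about 3 / exposure).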

Release assets include the evidence packet, release artifact manifest, digest
replay file, SBOM, source distribution, wheel, and keyless cosign bundles when
built by the release workflow. Replay cross-checks manifest-listed artifact
digests, including SBOM and distribution bytes, while stable projections keep
environment-bearing review artifacts reproducible.

Synthetic calibration and regression coverage is summarized in
docs/live_calibration.md. The v0.2 live reports are time-bound operational
evidence for declared protocols, data boundaries, provider/model
configurations, and execution windows. This release does not establish safety
assurance, validate clinical use, prove regulatory compliance, provide general
provider-quality evidence, or claim OpenTelemetry adoption.

agent-assure v0.1.0

@github-actions github-actions released this 27 Jun 17:40

This release introduces deterministic, fixture-mode assurance for AI agent
governance pipelines. It includes strict schemas, suite compilation, fixture
manifests, deterministic fixture runs, expectation evaluation, comparison
reports, CI gates, evidence packets, release digest replay, and
OpenTelemetry-aligned span-plan preview.

The flagship synthetic prior-authorization showcase demonstrates a candidate
that preserves the visible recommendation while losing a material evidence
link under equivalent fixtures. The minimal expense-approval example shows the
same method in a small non-healthcare setting.

Release assets include the evidence packet, release artifact manifest, digest
replay file, SBOM, Python source distribution, wheel, and keyless cosign bundles
for exact blob verification. Replay also cross-checks manifest-listed digests
against available local artifact bytes while retaining stable replay
projections for environment-bearing review artifacts.

This release does not evaluate live model quality, certify safety, validate
clinical use, prove regulatory compliance, or claim OpenTelemetry adoption.