Skip to content

v0.7.0

Choose a tag to compare

@github-actions github-actions released this 21 Jun 01:53

Changed

  • Deepened event-log append handling behind a transaction interface so session start, stop, provider import, manual markers, and filesystem watcher appends share one locked replay-and-append path.
  • Deepened Provider Evidence handling behind a typed module so Codex and Claude adapters construct the shared event-log shape in one place, while review, session confidence, and watch token baselines read provider commands, results, risk signals, labels, and token totals through one interface.
  • Refactored replay-safe evidence extraction into internal/evidence so reviewer replay and future verifier-facing replay can reuse deterministic event-derived summary, confidence, risk, gaps, and timeline logic without invoking git commands.
  • Added artifact-only receipt verification in internal/receipt so bundle and local verification share a single artifact-hash/signature validation path while local checks continue to include workspace diff parity validation.
  • Documented the production replay evaluator contract in README and docs/replay-evaluator-contract.md, covering verification, trust, quality gates, policy checks, privacy, claims, and outcome semantics.

Added

  • Added evaluator-loop replay implementation tracking (PLAN.md Step 0).

  • Added local replay signer trust policy support (PLAN.md Step 2): configuration-level trust.trusted_signer_key_ids, agentreceipt replay --trusted-signer-key-id, and deterministic trust status reporting (trust_status, signer_trusted, policy_valid).

  • Added replay evaluator scoring signals (PLAN.md Step 4): additive evaluator_signals counters for command activity, risk-relevant command classes, and changed-file category signals (read_command_count, network_command_count, changed_test_file_count, and related fields).

  • Added replay quality gate evidence (PLAN.md Step 5): top-level quality_gates summarizing command-classified quality checks (format/lint/tests/race_tests/typecheck/security/coverage/build/smoke/verify), failed_command_details for failed commands with redacted outputs and evidence, and command metadata (cwd, time) for richer verifier context.

  • Added replay patch semantic summaries (PLAN.md Step 6): top-level patch_summary with category counts, additions/deletions, semantic changed-file entries, Go symbol hints, and test/production relationship signals for final patch review.

  • Added replay policy checks and review focus prompts (PLAN.md Step 7): top-level policy_checks with deterministic pass/fail/warn/not_applicable/unknown statuses, and review_focus prompts synthesized from verification gaps, quality gates, patch summary, policy checks, and failed commands.

  • Added replay privacy reporting, claim confidence, and outcome classification (PLAN.md Step 8): top-level privacy redaction metadata, claims for verification/authenticity/trust/gates/policies/outcome, and outcome states for completed, completed_with_gaps, failed, abandoned, committed, and needs_human_review sessions.

  • Added replay implementation progress tracking (PROGRESS.md) and committed the first planning-control milestone for verifier-facing replay work.

  • Added replay evaluator characterization coverage to ensure replay output does not leak raw provider risk_signals.

  • Added verifier-facing replay report construction in internal/replay, including command pairing, command risk mapping, evidence gaps, risk-to-evidence references, and artifact hash metadata.

  • Added agentreceipt replay CLI command to emit machine-readable verifier JSON with required --session validation and JSON output mode.

  • Added portable replay bundle generation for agentreceipt replay via --bundle, including required artifact packaging, normalized Codex trace copying, and replay.json manifest emission.

  • Added smoke-level replay coverage for agentreceipt replay JSON and bundle outputs, plus validation that replay requires --session and emits machine-readable output without raw provider logs.

  • Added replay workflow documentation updates in README and PRD/TECH_SPEC for verifier-only usage, artifact requirements, explicit-session behavior, and privacy constraints.

  • Added replay acceptance coverage in internal/replay for tampered events.jsonl, manifest.json, receipt.json, and final.patch to keep replay verification invalidation behavior explicit.

  • Added component-level replay verification fields in verifier output (event_chain_valid, final_patch_hash_valid, manifest_hash_valid, receipt_hash_valid) plus stable signature failure context (signature_error_code) for actionable replay review.

  • Added factual replay contract and smoke assertions clarifying that agentreceipt replay reports evidence facts only; no policy recommendations or scoring.

  • Split replay verification output into explicit integrity/authenticity and outcome verdict signals (integrity_valid, authenticity_valid, authenticity_status, overall_verdict, component_results) to support evaluator-safe consumption without overloading valid.

  • Hardened signer portability for replay verification by ensuring embedded public-key metadata is treated as the canonical path for signature checks and by codifying legacy behavior when signer material is missing (legacy_missing_embedded_signer).

  • Fixed filesystem watcher shutdown robustness so stale or already-exited watcher processes no longer produce filesystem watcher did not stop cleanly.