[codex] Add evidence matrix to release packet #104

Merged
pengfei-threemoonslab merged 3 commits into main from codex/evidence-matrix-light
May 21, 2026


pengfei-threemoonslab commented May 21, 2026

Summary

  • Adds packet-only Evidence Matrix Light as a compact domain review summary derived from public report JSON.
  • Bumps the packet contract to v0.6 and updates schemas, docs, discovery surfaces, and packet goldens.
  • Renders the matrix in packet Markdown/HTML and keeps older packet JSON loads compatible with an unavailable-matrix note.
  • Review follow-up: documents intentional domain overlap, pins row-identity stability, and treats source-only rows as medium confidence because source presence is declared coverage, not runtime proof.
  • Action-surface choice: findings[].blocks_release is surfaced as an Action-surface policy source only for findings already classified by SHIP-ACTION-* or category == "action_surface"; non-action blockers do not cross-classify into that row.
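
The action-surface choice in the last bullet can be sketched as a small predicate. This is an illustrative reading of the rule, not the project's implementation: `check_id` is an assumed field name, the example check IDs are made up, and only `blocks_release`, `category`, and the `SHIP-ACTION-` prefix come from the summary above.

```python
from typing import Any, Mapping

ACTION_CHECK_PREFIX = "SHIP-ACTION-"  # prefix named in the summary above


def is_action_surface_policy_source(finding: Mapping[str, Any]) -> bool:
    """Return True when a blocking finding may feed the Action-surface row."""
    # `check_id` is a hypothetical field name used for illustration only.
    already_action = (
        str(finding.get("check_id", "")).startswith(ACTION_CHECK_PREFIX)
        or finding.get("category") == "action_surface"
    )
    # Non-action blockers never cross-classify into the row.
    return already_action and bool(finding.get("blocks_release"))


# Hypothetical findings covering the four interesting combinations.
findings = [
    {"check_id": "SHIP-ACTION-EXAMPLE", "category": "policy", "blocks_release": True},
    {"check_id": "SHIP-OTHER-EXAMPLE", "category": "action_surface", "blocks_release": True},
    {"check_id": "SHIP-OTHER-EXAMPLE", "category": "auth", "blocks_release": True},
    {"check_id": "SHIP-ACTION-EXAMPLE", "category": "action_surface", "blocks_release": False},
]
action_sources = [f for f in findings if is_action_surface_policy_source(f)]
```

Here only the first two findings qualify: the third blocks release but is not an action-surface finding, and the fourth is an action-surface finding that does not block release.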

Validation

  • python scripts/generate_schemas.py --check
  • git diff --check
  • python -m ruff check .
  • python -m pytest tests/test_evidence_packet.py -q
  • python -m pytest
  • PYTHONPATH=src python -m agents_shipgate self-check --json
  • PYTHONPATH=src python -m agents_shipgate contract --json

pengfei-threemoonslab marked this pull request as ready for review May 21, 2026 06:27
pengfei-threemoonslab merged commit 7101d74 into main May 21, 2026
1 check passed
pengfei-threemoonslab added a commit that referenced this pull request May 21, 2026
…, fingerprint-safe enrichment, packet v0.6 prose

Merges origin/main (PR #104 evidence_matrix) into reviewer-grade-provenance
and addresses the round-4 review.

Merge resolution:

- packet_schema_version stays "0.6". The bump now covers BOTH
  additive extensions on v0.5: PR #104's top-level evidence_matrix
  section AND PR #103's ReleaseDecisionItem.{source,
  policy_evidence_source} pointers. Schema comment in schemas/packet.py
  and the docs (STABILITY, agent-contract-current, INDEX, faq, AGENTS,
  README, SKILL) describe both.
- packet/json_packet.py upgrade chain merges both v0.5→v0.6
  upgrades (HEAD's bare bump for PR #103 + main's
  _upgrade_evidence_matrix_v06).
- docs/packet-schema.v0.6.json regenerated from the combined model.
- Sample goldens (report.{md,json}, packet.{md,json,html}) regenerated.
- llms-full.txt regenerated.
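
A minimal sketch of how two additive v0.5→v0.6 upgrades could share one version step. Only the name `_upgrade_evidence_matrix_v06` and the `packet_schema_version` key appear in this commit message; the other function names, the registry layout, and the shape of the unavailable-matrix placeholder are assumptions for illustration.

```python
import copy
from typing import Any, Callable, Dict, List, Tuple

Packet = Dict[str, Any]
UpgradeStep = Callable[[Packet], Packet]


def _upgrade_release_decision_pointers_v06(packet: Packet) -> Packet:
    # Bare bump (hypothetical name): the new pointer fields are optional,
    # so existing items need no rewrite.
    return packet


def _upgrade_evidence_matrix_v06(packet: Packet) -> Packet:
    # Older packets carry no matrix; say so explicitly instead of inventing rows.
    # The placeholder shape below is an assumption.
    packet.setdefault(
        "evidence_matrix",
        {"available": False, "note": "Evidence matrix unavailable for this packet."},
    )
    return packet


# One entry per source version: the target version plus every step that lands in it.
UPGRADES: Dict[str, Tuple[str, List[UpgradeStep]]] = {
    "0.5": ("0.6", [_upgrade_release_decision_pointers_v06, _upgrade_evidence_matrix_v06]),
}


def upgrade_packet(packet: Packet, target: str = "0.6") -> Packet:
    """Return an upgraded copy of `packet`; the input is left untouched."""
    upgraded = copy.deepcopy(packet)
    version = upgraded.get("packet_schema_version")
    while version != target:
        if version not in UPGRADES:
            raise ValueError(f"no upgrade path from packet version {version!r}")
        next_version, steps = UPGRADES[version]
        for step in steps:
            upgraded = step(upgraded)
        upgraded["packet_schema_version"] = next_version
        version = next_version
    return upgraded
```

Grouping both steps under a single version key keeps the bump atomic: a loaded packet is either fully v0.5 or fully v0.6, never halfway between the two extensions.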

P1 — Stop leaking line numbers into action-finding identity:

- cli/scan.py no longer enriches the INTERNAL action_surface_diff.
  evaluate_action_surface_policies serializes ActionSurfaceChange.model_dump()
  into finding evidence, and finding_fingerprint hashes evidence.
  Mutating the row before policy evaluation would leak (source: path:line)
  into baseline identity, so a tool that merely moved lines would churn fingerprints.
  The PUBLIC diff (rendered into report.json / packet) is still
  enriched separately from public_tools.
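
The P1 rule can be illustrated with a stand-in fingerprint. The hashing scheme, the row fields, and the helper `enrich_for_public_report` below are assumptions; the real `finding_fingerprint` may hash different inputs. The point is only that enriching a copy leaves the hashed evidence untouched.

```python
import copy
import hashlib
import json
from typing import Any, Dict


def finding_fingerprint(check_id: str, evidence: Dict[str, Any]) -> str:
    # Stand-in scheme: canonical JSON of the check id plus evidence, then SHA-256.
    payload = json.dumps(
        {"check_id": check_id, "evidence": evidence},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def enrich_for_public_report(row: Dict[str, Any], path: str, line: int) -> Dict[str, Any]:
    # Enrich a COPY so the internal row that feeds policy evaluation is never mutated.
    public_row = copy.deepcopy(row)
    public_row["source_path"] = path
    public_row["source_start_line"] = line
    return public_row


# Hypothetical internal row, as it would be serialized into finding evidence.
internal_row = {"tool": "refund_order", "reason": "new write-capable tool"}

before = finding_fingerprint("SHIP-ACTION-EXAMPLE", internal_row)
public_at_10 = enrich_for_public_report(internal_row, "tools/refunds.py", 10)
public_at_42 = enrich_for_public_report(internal_row, "tools/refunds.py", 42)
after = finding_fingerprint("SHIP-ACTION-EXAMPLE", internal_row)

# What would happen if the enriched rows were hashed instead.
leaky_10 = finding_fingerprint("SHIP-ACTION-EXAMPLE", public_at_10)
leaky_42 = finding_fingerprint("SHIP-ACTION-EXAMPLE", public_at_42)
```

`before` and `after` match because the internal row never changes, while `leaky_10` and `leaky_42` differ even though only the line number moved, which is the churn this change avoids.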

P2 — Structured source on every tool-surface diff change row:

- schemas/surfaces.py: ToolSurfaceToolChange, ToolSurfaceHighRiskEffectChange,
  ToolSurfaceControlChange, ToolSurfaceMetadataChange, and
  ActionSurfaceChange each gain optional source_path / source_start_line
  fields (default None, additive).
- enrich_tool_surface_diff_with_source now covers all four
  tool-surface row families (tools, high_risk_effects, controls,
  metadata_changes), not just controls.
- enrich_action_surface_diff_with_source now populates the structured
  fields instead of suffixing reason. ActionSurfaceChange.reason
  stays byte-stable so policy-finding fingerprints don't churn.
- Renderers that previously read source via tool_source_index keep
  working; the structured fields are an additional canonical surface
  for post-scan consumers reading report.json directly.
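
A rough sketch of the P2 shape: optional structured source fields that default to `None`, populated without touching `reason`. A frozen dataclass stands in for the project's model class, and the `tool` field, the `enrich_with_source` helper, and the index layout are assumptions; only `reason`, `source_path`, and `source_start_line` are named in this commit message.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ActionSurfaceChange:
    tool: str  # assumed field, used here as the lookup key
    reason: str
    # Additive fields: absent data stays None, so older producers remain valid.
    source_path: Optional[str] = None
    source_start_line: Optional[int] = None


def enrich_with_source(
    change: ActionSurfaceChange,
    source_index: Dict[str, Tuple[str, int]],
) -> ActionSurfaceChange:
    """Return a copy with structured source fields set; `reason` is untouched."""
    location = source_index.get(change.tool)
    if location is None:
        return change  # unknown tool: leave the row as-is
    path, line = location
    return replace(change, source_path=path, source_start_line=line)


# Hypothetical index mapping tool name to (path, start line).
index = {"refund_order": ("tools/refunds.py", 42)}

original = ActionSurfaceChange(tool="refund_order", reason="new write-capable tool")
enriched = enrich_with_source(original, index)
unknown = enrich_with_source(
    ActionSurfaceChange(tool="send_email", reason="scope widened"), index
)
```

Consumers reading the serialized row can then use `source_path` and `source_start_line` directly instead of parsing a suffix out of `reason`.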

Tests:

- test_enrich_action_surface_diff_populates_structured_source_fields
  replaces the earlier reason-suffix test.
- test_enrich_action_surface_diff_does_not_mutate_reason is a
  regression for the fingerprint-stability rule.
- Full suite: 1676 passed, 4 skipped.
- run_id + fingerprint identical across two scans of the same fixture
  with the structured source fields populated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>