Add scenario suggest CLI for static-to-dynamic bridge #46
Closed
pengfei-threemoonslab wants to merge 4 commits into main
Closes the loop between static findings (report.json) and dynamic adversarial validation: `agents-shipgate scenario suggest --from <report.json> --out <scenarios.yaml>` emits a deterministic YAML file with one scenario per (scenario_type, tool) pair, derived from report.findings + report.misalignments without any model calls or tool execution.

Why: scenario authoring was ad-hoc and unreproducible. Reviewers had to hand-translate findings into adversarial tests. This command makes the static-to-dynamic transition deterministic, traceable (source_findings links back into report.json), and CI-enforceable via --strict (exit 20 when any critical/high mapped finding has no covering scenario).

Implementation:
- report/scenario_export.py: pure derivation (per-tool fan-out, adversarial-goal templates per SuggestedScenarioType, deterministic yaml.safe_dump with explicit key ordering).
- cli/scenario.py: Typer shell with --from/--out/--min-severity/--json/--strict and a local _emit_input_error matching the apply_patches pattern (no circular import).
- capability_diff.py: extracted scenario_type_for_finding() so the in-report grouping and the standalone YAML export share one predicate and cannot drift.
- Coverage gate is honest: findings whose predicate yields None (e.g. SHIP-API-RETRY-WITHOUT-IDEMPOTENCY today) are out of scope, not silent failures. Test #9 documents this contract.

Adds:
- 13 tests covering golden YAML, determinism, six-category coverage, severity filtering, suppressed-finding skip, agent-level scenarios (tool=None), --strict exit 20, predicate-parity with the in-report grouping, and JSON/YAML envelope agreement.
- STABILITY.md and AGENTS.md entries documenting the YAML envelope shape ({scenarios, coverage_gaps?}) as part of the 0.x contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
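The derivation and the `--strict` gate described in this commit can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in `report/scenario_export.py`: the finding field names (`tools`, `suppressed`, `severity`), the `CHECK_TO_SCENARIO_TYPE` table, the `approval_bypass` type name, and the `slugify` helper are all hypothetical stand-ins.

```python
import re
from collections import defaultdict

# Hypothetical check-to-scenario-type table (an assumption for this sketch;
# the real mapping lives in capability_diff.py and is not shown in the PR).
CHECK_TO_SCENARIO_TYPE = {"SHIP-POLICY-APPROVAL-MISSING": "approval_bypass"}
EXIT_COVERAGE_GAP = 20  # exit code stated in the commit message


def scenario_type_for_finding(finding):
    """Shared predicate: None means the finding is out of scope."""
    return CHECK_TO_SCENARIO_TYPE.get(finding["check_id"])


def slugify(tool):
    # Hypothetical slug rule: lowercase, non-alphanumerics collapsed to "_".
    return re.sub(r"[^a-z0-9]+", "_", tool.lower()).strip("_")


def derive_scenarios(findings):
    """Fan findings out to one scenario per (scenario_type, tool) pair."""
    grouped = defaultdict(set)
    for finding in findings:
        scenario_type = scenario_type_for_finding(finding)
        if finding.get("suppressed") or scenario_type is None:
            continue
        # A finding with no tools yields one agent-level scenario (tool=None).
        for tool in finding.get("tools") or [None]:
            grouped[(scenario_type, tool)].add(finding["id"])
    # Sort so the output does not depend on the order of the input findings.
    ordered = sorted(grouped.items(), key=lambda kv: (kv[0][0], kv[0][1] or ""))
    return [
        {
            "id": f"{st}__{slugify(tool) if tool else 'agent'}",
            "scenario_type": st,
            "tool": tool,
            "source_findings": sorted(refs),
        }
        for (st, tool), refs in ordered
    ]


def strict_exit_code(findings, scenarios):
    """Return 20 if any critical/high mapped finding has no covering scenario."""
    covered = {ref for s in scenarios for ref in s["source_findings"]}
    gaps = [
        f["id"]
        for f in findings
        if f["severity"] in {"critical", "high"}
        and not f.get("suppressed")
        and scenario_type_for_finding(f) is not None
        and f["id"] not in covered
    ]
    return EXIT_COVERAGE_GAP if gaps else 0
```

Because unmapped findings are filtered by the same predicate in both functions, a high-severity finding with no scenario type never counts as a coverage gap in this sketch, which mirrors the "out of scope, not silent failure" contract above.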
Three follow-ups on PR #46:
- Slug collisions (e.g. tools `a.b` and `a/b` both slugging to `a_b`) no longer produce duplicate scenario IDs. `_disambiguate_ids()` detects collisions at composition time and appends a short deterministic hash of the original tool name only when needed, so the common case stays clean. Adds a regression test.
- `_index_findings()` now also indexes by `check_id`, matching `_finding_ref()`'s `id or fingerprint or check_id` fallback chain. Older/hand-built reports with `id=None, fingerprint=None, finding_refs=[check_id]` now resolve correctly instead of becoming silent strict-mode coverage gaps. Adds a regression test.
- Agent-mode `next_action` hint pointed to the wrong scan flags (`--json --out report.json`); corrected to the actual stable surface (`-c shipgate.yaml --format json --out agents-shipgate-reports`).

Golden YAML is unchanged: the disambiguator only fires on real collisions, which the support_refund_agent fixture doesn't have.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
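To illustrate the collision handling described in the first bullet, here is a minimal sketch, assuming a simple slug rule and an 8-character SHA-256 prefix as the "short deterministic hash". The function names, the slug rule, and the choice of hash are assumptions for illustration; the actual `_disambiguate_ids()` may differ.

```python
import hashlib
import re
from collections import Counter


def slugify(tool_name):
    # Hypothetical slug rule: lowercase, non-alphanumerics collapsed to "_".
    return re.sub(r"[^a-z0-9]+", "_", tool_name.lower()).strip("_")


def disambiguate_ids(scenario_type, tool_names):
    """Map each tool name to a scenario ID, hashing only on real collisions."""
    slugs = {name: slugify(name) for name in tool_names}
    counts = Counter(slugs.values())
    ids = {}
    for name in sorted(slugs):
        scenario_id = f"{scenario_type}__{slugs[name]}"
        if counts[slugs[name]] > 1:
            # Hash the ORIGINAL tool name so colliding slugs diverge, and the
            # suffix stays the same across runs and machines.
            digest = hashlib.sha256(name.encode("utf-8")).hexdigest()[:8]
            scenario_id = f"{scenario_id}__{digest}"
        ids[name] = scenario_id
    return ids
```

Tools whose slugs are unique keep the plain `<scenario_type>__<tool_slug>` form, so an existing golden file would only change if a real collision appears.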
The 0.x contract said `id` is exactly `<scenario_type>__<tool_slug>` or `<scenario_type>__agent`, which is no longer true after the collision-disambiguation fix in c81fe55. Note the optional `__<hash8>` suffix that's appended only when distinct tool names slug to the same value, so downstream parsers/validators know to expect it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The exporter intentionally supports finding_refs that fall back to `check_id` when both `id` and `fingerprint` are absent (c81fe55), which means `source_findings` may contain a check ID like `SHIP-POLICY-APPROVAL-MISSING`. The stability text only mentioned id/fingerprint, which would lead downstream consumers to reject valid YAML from older or hand-built reports.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
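The fallback chain behind this stability note can be sketched as below. It assumes findings are plain dicts with optional `id`, `fingerprint`, and `check_id` keys; the function names echo `_finding_ref()` and `_index_findings()` from the commits, but the bodies are illustrative guesses rather than the actual implementation.

```python
def finding_ref(finding):
    """Pick the reference a scenario stores in source_findings."""
    return finding.get("id") or finding.get("fingerprint") or finding.get("check_id")


def index_findings(findings):
    """Index each finding under every key the fallback chain could produce."""
    index = {}
    for finding in findings:
        for key in ("id", "fingerprint", "check_id"):
            value = finding.get(key)
            if value:
                # First finding wins when several share a check_id
                # (a simplification made for this sketch).
                index.setdefault(value, finding)
    return index
```

With this shape, a hand-built finding that carries only a `check_id` still resolves through the index, so a downstream validator would need to accept check IDs in `source_findings` as well as ids and fingerprints.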
Summary
- The `agents-shipgate scenario suggest --from <report.json> --out <scenarios.yaml>` command turns static findings into deterministic adversarial scenario suggestions, one row per (scenario_type, tool) pair, traceable back to source findings.
- `--strict` makes coverage holes a CI failure (exit 20).
- Extracted `scenario_type_for_finding()` from `capability_diff._diff_spec()` so the in-report grouping (`apply_capability_diff`) and the new YAML export share one predicate and cannot drift.

Type
- `report.json` schema unchanged

Verification
CI is authoritative for `python -m ruff check .`, `python -m compileall -q src tests`, and `python -m pytest`.

Additional local checks run:
- `pytest tests/test_scenario_suggest.py -v`: 13/13 passing
- `pytest -q` (full suite): green, no regressions
- `ruff check src/agents_shipgate tests/test_scenario_suggest.py`: clean
- `samples/support_refund_agent/expected/report.json` → 16 scenarios, `--strict` exit 0
- `samples/simple_openai_api_agent/expected/report.json` → 12 scenarios, `--strict` exit 0 (the two unmapped `SHIP-API-RETRY-WITHOUT-IDEMPOTENCY` high findings are correctly excluded from the predicate, not silently swallowed; covered by test #9)

Release-readiness notes
- `docs/checks.md` (no new check IDs; this command consumes existing findings)
- `STABILITY.md` (YAML envelope `{scenarios, coverage_gaps?}` documented as a 0.x stable contract; `report.json` schema unchanged)

Reviewer notes
- Findings for which `scenario_type_for_finding()` is `None` (e.g. `SHIP-API-RETRY-WITHOUT-IDEMPOTENCY`, currently in `MISSING_CONTROL_CHECKS` but absent from `CHECK_DIFF_MAP`) are explicitly outside the `--strict` predicate's scope. This is documented and tested rather than papered over.
- Adding `SHIP-API-RETRY-WITHOUT-IDEMPOTENCY` → `idempotency_retry` to `CHECK_DIFF_MAP` is a one-line follow-up, but it changes the baseline of which findings produce scenarios, so it is better in its own PR.
- The golden YAML is `tests/fixtures/scenario_suggest/support_refund_agent.expected.yaml`. If wording in `_ADVERSARIAL_GOAL_TEMPLATES` or `_scenario_text` changes, regenerate via `agents-shipgate scenario suggest --from samples/support_refund_agent/expected/report.json --out tests/fixtures/scenario_suggest/support_refund_agent.expected.yaml` and review the diff.

🤖 Generated with Claude Code