Add scenario suggest CLI for static-to-dynamic bridge by pengfei-threemoonslab · Pull Request #46 · ThreeMoonsLab/agents-shipgate

pengfei-threemoonslab · 2026-05-07T06:24:40Z

Summary

New agents-shipgate scenario suggest --from <report.json> --out <scenarios.yaml> command turns static findings into deterministic adversarial scenario suggestions, one row per (scenario_type, tool) pair, traceable back to source findings.
Closes the static-to-dynamic loop: every critical/high mapped finding gets at least one concrete sandbox scenario; --strict makes coverage holes a CI failure (exit 20).
Extracts scenario_type_for_finding() from capability_diff._diff_spec() so the in-report grouping (apply_capability_diff) and the new YAML export share one predicate and cannot drift.
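
A minimal sketch of the "one shared predicate" idea in the last bullet, assuming findings are dicts with a `check_id` key. The mapping contents, the `approval_bypass` label, and the helper names `group_in_report` and `exported_types` are hypothetical; only `scenario_type_for_finding` and `CHECK_DIFF_MAP` are names taken from this PR, and their real bodies are not shown here.

```python
from typing import Optional

# Hypothetical excerpt: the real CHECK_DIFF_MAP lives in capability_diff.py and
# its contents are not shown in this PR. "approval_bypass" is an invented label.
CHECK_DIFF_MAP = {
    "SHIP-POLICY-APPROVAL-MISSING": "approval_bypass",
}


def scenario_type_for_finding(finding: dict) -> Optional[str]:
    """Return the scenario type for a finding, or None when the check is unmapped."""
    return CHECK_DIFF_MAP.get(finding.get("check_id"))


def group_in_report(findings: list) -> dict:
    """In-report grouping (illustrative): check IDs bucketed by scenario type."""
    groups: dict = {}
    for finding in findings:
        scenario_type = scenario_type_for_finding(finding)
        if scenario_type is not None:
            groups.setdefault(scenario_type, []).append(finding["check_id"])
    return groups


def exported_types(findings: list) -> list:
    """Export side (illustrative): scenario types that would reach the YAML file."""
    types = {scenario_type_for_finding(f) for f in findings}
    types.discard(None)
    return sorted(types)
```

Because both consumers call the same function, a check that is unmapped for one is unmapped for the other, which is the drift the extraction is meant to rule out.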

Type

CLI or GitHub Action behavior
Report, schema, or SARIF output (new YAML export — report.json schema unchanged)
Documentation only (additive — STABILITY.md + AGENTS.md)

Verification

CI is authoritative for python -m ruff check ., python -m compileall -q src tests, and python -m pytest.

Additional local checks run:

pytest tests/test_scenario_suggest.py -v — 13/13 passing
pytest -q (full suite) — green, no regressions
ruff check src/agents_shipgate tests/test_scenario_suggest.py — clean
Smoke against samples/support_refund_agent/expected/report.json → 16 scenarios, --strict exit 0
Smoke against samples/simple_openai_api_agent/expected/report.json → 12 scenarios, --strict exit 0 (the two unmapped SHIP-API-RETRY-WITHOUT-IDEMPOTENCY high findings are correctly excluded from the predicate, not silently swallowed; covered by test #9)

Release-readiness notes

No user-code import added to default scan paths
No network access added to default scan paths
New or changed check IDs are documented in docs/checks.md (no new check IDs — this command consumes existing findings)
Report/schema changes are additive or documented in STABILITY.md (YAML envelope {scenarios, coverage_gaps?} documented as a 0.x stable contract; report.json schema unchanged)

Reviewer notes

The coverage predicate is honest: findings whose scenario_type_for_finding() is None (e.g. SHIP-API-RETRY-WITHOUT-IDEMPOTENCY, currently in MISSING_CONTROL_CHECKS but absent from CHECK_DIFF_MAP) are explicitly outside the --strict predicate's scope. This is documented and tested rather than papered over.
A latent gap surfaced by this work: adding SHIP-API-RETRY-WITHOUT-IDEMPOTENCY → idempotency_retry to CHECK_DIFF_MAP is a one-line follow-up, but it changes the baseline of which findings produce scenarios — better in its own PR.
The golden YAML lives at tests/fixtures/scenario_suggest/support_refund_agent.expected.yaml. If wording in _ADVERSARIAL_GOAL_TEMPLATES or _scenario_text changes, regenerate via agents-shipgate scenario suggest --from samples/support_refund_agent/expected/report.json --out tests/fixtures/scenario_suggest/support_refund_agent.expected.yaml and review the diff.
Exit code 20 reuses the existing strict-gate convention from STABILITY.md rather than inventing a new code.
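
The coverage rule described in these notes could look roughly like the sketch below. It assumes each finding is a dict with `id` and `severity` keys and each scenario carries a `source_findings` list; the function names and the `predicate` parameter are hypothetical, and the real gate may differ in detail.

```python
from typing import Callable, Optional

STRICT_EXIT_CODE = 20  # strict-gate exit code named in this PR
GATED_SEVERITIES = {"critical", "high"}


def coverage_gaps(
    findings: list,
    scenarios: list,
    predicate: Callable[[dict], Optional[str]],
) -> list:
    """Return IDs of critical/high mapped findings that no scenario references."""
    covered = {ref for scenario in scenarios for ref in scenario["source_findings"]}
    gaps = []
    for finding in findings:
        if finding["severity"] not in GATED_SEVERITIES:
            continue
        if predicate(finding) is None:
            # Unmapped checks are outside the predicate's scope, not failures.
            continue
        if finding["id"] not in covered:
            gaps.append(finding["id"])
    return gaps


def exit_code(gaps: list, strict: bool) -> int:
    """Fail with the strict-gate code only when --strict is set and gaps exist."""
    return STRICT_EXIT_CODE if strict and gaps else 0
```

Passing the predicate in as an argument keeps the sketch self-contained; in the PR the predicate is `scenario_type_for_finding()`.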

🤖 Generated with Claude Code

Closes the loop between static findings (report.json) and dynamic adversarial validation: `agents-shipgate scenario suggest --from <report.json> --out <scenarios.yaml>` emits a deterministic YAML file with one scenario per (scenario_type, tool) pair, derived from report.findings + report.misalignments without any model calls or tool execution.

Why: scenario authoring was ad-hoc and unreproducible. Reviewers had to hand-translate findings into adversarial tests. This command makes the static-to-dynamic transition deterministic, traceable (source_findings links back into report.json), and CI-enforceable via --strict (exit 20 when any critical/high mapped finding has no covering scenario).

Implementation:
- report/scenario_export.py: pure derivation (per-tool fan-out, adversarial-goal templates per SuggestedScenarioType, deterministic yaml.safe_dump with explicit key ordering).
- cli/scenario.py: Typer shell with --from/--out/--min-severity/--json/--strict and a local _emit_input_error matching the apply_patches pattern (no circular import).
- capability_diff.py: extracted scenario_type_for_finding() so the in-report grouping and the standalone YAML export share one predicate and cannot drift.
- Coverage gate is honest: findings whose predicate yields None (e.g. SHIP-API-RETRY-WITHOUT-IDEMPOTENCY today) are out of scope, not silent failures. Test #9 documents this contract.

Adds:
- 13 tests covering golden YAML, determinism, six-category coverage, severity filtering, suppressed-finding skip, agent-level scenarios (tool=None), --strict exit 20, predicate-parity with the in-report grouping, and JSON/YAML envelope agreement.
- STABILITY.md and AGENTS.md entries documenting the YAML envelope shape ({scenarios, coverage_gaps?}) as part of the 0.x contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
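
The per-tool fan-out described in this commit message can be sketched as follows. This is an illustration under assumptions, not the code in report/scenario_export.py: the `fan_out` name, the `id` and `tool` finding keys, and the row layout are hypothetical, and YAML serialization is left out so the sketch needs only the standard library.

```python
from collections import defaultdict
from typing import Callable, Optional


def fan_out(findings: list, predicate: Callable[[dict], Optional[str]]) -> list:
    """Build one row per (scenario_type, tool) pair, in a stable order."""
    buckets = defaultdict(list)
    for finding in findings:
        scenario_type = predicate(finding)
        if scenario_type is None:
            continue  # unmapped checks produce no scenario
        # tool may be None for agent-level findings
        buckets[(scenario_type, finding.get("tool"))].append(finding["id"])

    rows = []
    # Sort the pairs so repeated runs over the same report give the same output;
    # None tools sort as the empty string.
    for scenario_type, tool in sorted(buckets, key=lambda k: (k[0], k[1] or "")):
        rows.append(
            {
                "scenario_type": scenario_type,
                "tool": tool,
                "source_findings": sorted(buckets[(scenario_type, tool)]),
            }
        )
    return rows
```

Dicts preserve insertion order in Python, so building each row with keys in a fixed sequence is one way to get explicit key ordering before the rows are handed to a serializer.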

Three follow-ups on PR #46:
- Slug collisions (e.g. tools `a.b` and `a/b` both slugging to `a_b`) no longer produce duplicate scenario IDs. `_disambiguate_ids()` detects collisions at composition time and appends a short deterministic hash of the original tool name only when needed, so the common case stays clean. Adds a regression test.
- `_index_findings()` now also indexes by `check_id`, matching `_finding_ref()`'s `id or fingerprint or check_id` fallback chain. Older/hand-built reports with `id=None, fingerprint=None, finding_refs=[check_id]` now resolve correctly instead of becoming silent strict-mode coverage gaps. Adds a regression test.
- Agent-mode `next_action` hint pointed to the wrong scan flags (`--json --out report.json`); corrected to the actual stable surface (`-c shipgate.yaml --format json --out agents-shipgate-reports`).

Golden YAML is unchanged: the disambiguator only fires on real collisions, which the support_refund_agent fixture doesn't have.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
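
The collision handling in the first follow-up might look like the sketch below. The slug rule, the choice of SHA-256, and the function names are assumptions made for illustration; the PR only states that a short deterministic hash of the original tool name is appended as `__<hash8>` when distinct names slug to the same value.

```python
import hashlib
import re
from collections import defaultdict


def slug(tool: str) -> str:
    """Hypothetical slug rule: runs of non-alphanumerics collapse to '_'."""
    return re.sub(r"[^A-Za-z0-9]+", "_", tool).strip("_").lower()


def disambiguate_ids(scenario_type: str, tools: list) -> dict:
    """Map each distinct tool name to a scenario ID, hashing only on collision."""
    by_slug = defaultdict(list)
    for tool in sorted(set(tools)):  # dedupe so repeats are not seen as collisions
        by_slug[slug(tool)].append(tool)

    ids = {}
    for tool_slug, names in by_slug.items():
        for name in names:
            scenario_id = f"{scenario_type}__{tool_slug}"
            if len(names) > 1:
                # Assumed hash: first 8 hex chars of SHA-256 over the original name.
                digest = hashlib.sha256(name.encode("utf-8")).hexdigest()[:8]
                scenario_id = f"{scenario_id}__{digest}"
            ids[name] = scenario_id
    return ids
```

Hashing the original name rather than the slug is what makes the two colliding IDs differ, and leaving uncolliding names untouched matches the note that the golden YAML did not change.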

The 0.x contract said `id` is exactly `<scenario_type>__<tool_slug>` or `<scenario_type>__agent`, which is no longer true after the collision-disambiguation fix in c81fe55. Note the optional `__<hash8>` suffix that's appended only when distinct tool names slug to the same value, so downstream parsers/validators know to expect it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The exporter intentionally supports finding_refs that fall back to `check_id` when both `id` and `fingerprint` are absent (c81fe55), which means `source_findings` may contain a check ID like `SHIP-POLICY-APPROVAL-MISSING`. The stability text only mentioned id/fingerprint, which would lead downstream consumers to reject valid YAML from older or hand-built reports.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
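
For downstream consumers, the fallback chain described above implies that a `source_findings` entry may be an id, a fingerprint, or a check ID. A minimal sketch of a resolver that accepts all three, assuming findings are dicts with those optional keys (the function names here are illustrative and not the private helpers from the PR):

```python
from typing import Optional


def finding_ref(finding: dict) -> Optional[str]:
    """Preferred reference: id, then fingerprint, then check_id."""
    return finding.get("id") or finding.get("fingerprint") or finding.get("check_id")


def index_findings(findings: list) -> dict:
    """Index every finding under each reference form it carries."""
    index: dict = {}
    for finding in findings:
        for key in ("id", "fingerprint", "check_id"):
            value = finding.get(key)
            if value:
                # Several findings can share a check_id, so keep a list.
                index.setdefault(value, []).append(finding)
    return index
```

Indexing under every available key means a reference produced by the fallback chain always resolves, including for reports where `id` and `fingerprint` are both missing.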

pengfei-threemoonslab and others added 4 commits May 6, 2026 23:23

pengfei-threemoonslab closed this May 7, 2026

pengfei-threemoonslab deleted the claude/nervous-elbakyan-b1e17e branch May 7, 2026 06:59

pengfei-threemoonslab wants to merge 4 commits into main from claude/nervous-elbakyan-b1e17e
