fix(ci): make /security-review fail loudly when the model never runs #1482

Open
tejaskash wants to merge 2 commits into aws:main from tejaskash:fix/security-review-merge-base

Conversation

@tejaskash

Contributor

Summary

  • Fix the silent failure mode that produced a false "no high-confidence findings" comment on #1474 ("fix: resolve create harness Dockerfile paths from command cwd").
  • Root cause: the bundled /security-review skill's SessionStart hook runs `git diff --name-only origin/HEAD...` as its first command. actions/checkout@v6 with `ref: <fork-head-sha>` and `fetch-depth: 0` fetches the PR head's history but does not always fetch origin/<base>, so the diff fails with `fatal: no merge base`. The hook errors, the SDK exits cleanly with num_turns=0 and zero tokens, the inline-comment buffer is empty, and the workflow's summary step still posts "no findings."
  • Verified in run 27159048605: the SDK transcript shows `<local-command-stderr>fatal: origin/HEAD...HEAD: no merge base</local-command-stderr>` followed by `subtype: success, num_turns: 0`.

Changes

  1. Make the merge base actually exist. The "Prepare base ref for /security-review skill" step now explicitly runs `git fetch` for origin/<base> and then `git merge-base origin/<base> HEAD` as a sanity check. The step fails if the merge base can't be resolved, instead of letting the skill silently bail downstream.
  2. Catch silent skill bailouts in general. New Verify model actually ran step reads the action's execution_file (a JSON array; final element is the SDK result envelope) and requires num_turns > 0, output_tokens > 0, and is_error == false. If the model never productively ran, the step fails and the summary comment posts "the review did not actually analyze this PR (model took 0 turns — the skill likely failed during setup)" instead of "no high-confidence findings."
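The precondition in change 1 can be sketched as follows. This is a minimal Python illustration, not the workflow's actual shell step: the function names `run_git` and `ensure_merge_base`, the injectable `run` parameter, and the exact refspec are assumptions made for the example.

```python
import subprocess
from typing import Callable, Sequence

# Hypothetical runner type so the check can be exercised without a real clone.
Runner = Callable[[Sequence[str]], subprocess.CompletedProcess]


def run_git(args: Sequence[str]) -> subprocess.CompletedProcess:
    """Run a git command, capturing output instead of raising on failure."""
    return subprocess.run(["git", *args], capture_output=True, text=True)


def ensure_merge_base(base: str, run: Runner = run_git) -> str:
    """Fetch origin/<base>, then return its merge base with HEAD or raise."""
    # Assumed refspec: makes the remote-tracking ref exist locally even when
    # the checkout only fetched the PR head's history.
    refspec = f"+refs/heads/{base}:refs/remotes/origin/{base}"
    fetch = run(["fetch", "origin", refspec])
    if fetch.returncode != 0:
        raise RuntimeError(f"git fetch of {base} failed: {fetch.stderr.strip()}")

    # git merge-base exits non-zero and prints nothing when no common
    # ancestor is reachable, which is the condition behind the silent failure.
    result = run(["merge-base", f"origin/{base}", "HEAD"])
    sha = result.stdout.strip()
    if result.returncode != 0 or not sha:
        raise RuntimeError(f"no merge base between origin/{base} and HEAD")
    return sha
```

Raising here corresponds to failing the workflow step early, before the skill's hook has a chance to hit the same error out of sight.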

Together: change 1 fixes the immediate bug; change 2 ensures any future cause of "SDK exits clean without the model doing real work" no longer reports a false-clean review.
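The check in change 2 amounts to reading the last element of the transcript and testing three fields. Below is a hedged Python sketch of that logic (the PR itself uses a workflow step, not this code). The function name `model_ran`, the string return values, and the handling of malformed files are assumptions; the field names `num_turns`, `usage.output_tokens`, and `is_error` come from the description above.

```python
import json
from pathlib import Path


def model_ran(execution_file: str) -> str:
    """Return 'true', 'false', or 'unknown' for the transcript at execution_file."""
    path = Path(execution_file)
    if not path.is_file():
        return "unknown"  # no transcript was written at all

    try:
        messages = json.loads(path.read_text())
    except json.JSONDecodeError:
        return "false"  # assumption: an unreadable transcript counts as "did not run"

    # The transcript is expected to be a JSON array whose final element
    # is the SDK result envelope.
    if not isinstance(messages, list) or not messages or not isinstance(messages[-1], dict):
        return "false"

    result = messages[-1]
    num_turns = result.get("num_turns") or 0
    output_tokens = (result.get("usage") or {}).get("output_tokens") or 0
    is_error = bool(result.get("is_error", False))

    ok = num_turns > 0 and output_tokens > 0 and not is_error
    return "true" if ok else "false"
```

On the transcript from the failing run (`subtype: success`, `num_turns: 0`), this sketch returns `'false'`, which is the case the new step is meant to catch.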

Test plan

  • Re-run this workflow on a fork PR (e.g. #1474, "fix: resolve create harness Dockerfile paths from command cwd") and confirm the Prepare base ref step logs the merge base and the file count.
  • Confirm Verify model actually ran reports a sensible num_turns / output_tokens on a successful run.
  • Manually break it (delete the git fetch line locally) and confirm the workflow now fails with the explicit error and the summary comment posts the failure message rather than "no findings."

PR aws#1474 surfaced a silent failure mode: the bundled /security-review
skill's SessionStart hook runs `git diff --name-only origin/HEAD...` as
its first command. actions/checkout@v6 with `ref: <fork-head-sha>` and
fetch-depth: 0 fetches the PR head's history but does not always fetch
origin/<base>, so the diff resolves to `fatal: no merge base`. The hook
errors, the SDK exits cleanly with num_turns=0 and zero tokens, the
inline-comment buffer is empty, and the workflow falsely reports
"no high-confidence findings."

Two changes, both narrow:

1. Explicitly fetch origin/<base> before invoking the action and verify
   `git merge-base` resolves. Fail the step if it doesn't.
2. After the action runs, read its execution_file transcript and require
   num_turns > 0 / output_tokens > 0 / is_error == false. If the model
   never productively ran, fail the step and have the summary comment
   say so instead of pretending the review was clean.
@tejaskash tejaskash requested a review from a team June 8, 2026 20:31
@github-actions github-actions Bot added the size/s PR size: S label Jun 8, 2026
@github-actions github-actions Bot added the agentcore-harness-reviewing AgentCore Harness review in progress label Jun 8, 2026
@agentcore-devx-automation agentcore-devx-automation Bot added the claude-security-reviewing Claude Code /security-review in progress label Jun 8, 2026
@agentcore-devx-automation

Contributor

Claude Security Review: no high-confidence findings. (run)

@agentcore-devx-automation agentcore-devx-automation Bot removed the claude-security-reviewing Claude Code /security-review in progress label Jun 8, 2026
@github-actions

github-actions Bot commented Jun 8, 2026

Contributor

Package Tarball

aws-agentcore-0.18.0.tgz

How to install

gh release download pr-1482-tarball --repo aws/agentcore-cli --pattern "*.tgz" --dir /tmp/pr-tarball
npm install -g /tmp/pr-tarball/aws-agentcore-0.18.0.tgz

@agentcore-cli-automation agentcore-cli-automation left a comment

LGTM — this is a clean fix to the silent-failure bug.

Quick summary of what I checked:

  1. git fetch + merge-base sanity: confirmed the diagnosis. actions/checkout@v6 with an explicit ref: only fetches that ref's history; on fork PRs it doesn't necessarily land origin/<base> in the local clone, so git diff origin/HEAD... fails the SessionStart hook with "no merge base". The explicit refspec fetch + git merge-base precondition closes that gap and fails loudly when it can't resolve.
  2. execution_file shape: verified against anthropics/claude-code-action@v1 (base-action/src/run-claude-sdk.ts). The action writes the full SDK message array via writeExecutionFile(messages) on both the happy path and the SDK-error catch block, and the result envelope is the last type: 'result' message in the stream. jq '.[-1].num_turns // 0' is the right shape, and usage.output_tokens matches SDKResultSuccess / SDKResultError in @anthropic-ai/claude-agent-sdk's type definitions.
  3. Failure-mode flow: when model-ran exits 1, the implicit success() on Count buffered findings skips it, leaving FINDING_COUNT empty → defaults to 0 in the summary script. The if (modelRan === 'false') branch wins and posts the "did not actually analyze" message instead of "no findings". Remove claude-security-reviewing label runs under if: always(), so the label still gets cleaned up. ✓
  4. ran=unknown path: only triggers if claude-execution-output.json is missing entirely, which the action only does on catastrophic failure before the SDK loop — at which point steps.review.conclusion is almost certainly 'failure', and the summary correctly falls through to the "review run failed" branch rather than lying about findings.
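The branch order described in items 3 and 4 can be written out as a small decision function. This is an illustrative Python sketch, not the workflow's summary script: the function name and the returned branch labels are hypothetical, and the `findings` branch is an assumption about what happens when the count is non-zero.

```python
def summary_branch(model_ran: str, review_conclusion: str, finding_count: str) -> str:
    """Pick which summary comment to post, using hypothetical branch labels."""
    # An empty count means the counting step was skipped; treat it as 0.
    count = int(finding_count) if finding_count.strip() else 0

    # Checked first, so a model that never ran can't be reported as a clean review.
    if model_ran == "false":
        return "did-not-analyze"
    # Covers the ran=unknown case where the action itself failed.
    if review_conclusion == "failure":
        return "review-run-failed"
    if count > 0:
        return "findings"
    return "no-findings"
```

For example, `summary_branch("false", "success", "")` selects the did-not-analyze branch even though the review step reported success and the finding count is empty.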

No blocking issues found.

@github-actions github-actions Bot removed the agentcore-harness-reviewing AgentCore Harness review in progress label Jun 8, 2026
@tejaskash tejaskash deployed to e2e-testing June 8, 2026 21:53 — with GitHub Actions Active
@github-actions github-actions Bot added size/s PR size: S and removed size/s PR size: S labels Jun 8, 2026
@agentcore-devx-automation agentcore-devx-automation Bot added the claude-security-reviewing Claude Code /security-review in progress label Jun 8, 2026
@agentcore-devx-automation

Contributor

Claude Security Review: no high-confidence findings. (run)

@agentcore-devx-automation agentcore-devx-automation Bot removed the claude-security-reviewing Claude Code /security-review in progress label Jun 8, 2026

Labels

size/s PR size: S


2 participants