fix(ci): make /security-review fail loudly when the model never runs by tejaskash · Pull Request #1482 · aws/agentcore-cli

tejaskash · 2026-06-08T20:31:20Z

Summary

Fix the silent failure mode that produced a false "no high-confidence findings" comment on #1474 (fix: resolve create harness Dockerfile paths from command cwd).
Root cause: the bundled /security-review skill's SessionStart hook runs git diff --name-only origin/HEAD... as its first command. actions/checkout@v6 with ref: <fork-head-sha> and fetch-depth: 0 fetches the PR head's history but does not always fetch origin/<base>, so the diff fails with fatal: no merge base. The hook errors, the SDK exits cleanly with num_turns=0 and zero tokens, the inline-comment buffer stays empty, and the workflow's summary step still posts "no findings."
Verified in run 27159048605 — the SDK transcript shows <local-command-stderr>fatal: origin/HEAD...HEAD: no merge base</local-command-stderr> followed by subtype: success, num_turns: 0.
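
The failure above can be detected before the skill starts. The following is a minimal sketch, not the workflow's actual step: `resolve_merge_base` is a hypothetical helper, and the injectable `run` parameter exists only so the logic can be exercised without a real repository. It relies on `git merge-base` exiting non-zero when no common ancestor exists or a ref is unknown.

```python
import subprocess
from typing import Callable, Optional


def resolve_merge_base(
    base_ref: str,
    head: str = "HEAD",
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> Optional[str]:
    """Return the merge-base SHA of base_ref and head, or None if git finds none.

    Hypothetical helper for illustration; the name and signature are assumptions.
    """
    result = run(
        ["git", "merge-base", base_ref, head],
        capture_output=True,
        text=True,
        check=False,  # inspect the exit code ourselves instead of raising
    )
    # A non-zero exit means no common ancestor or an unresolvable ref.
    if result.returncode != 0:
        return None
    sha = result.stdout.strip()
    return sha or None
```

A caller would pass something like `resolve_merge_base("origin/main")` and abort the job when it returns `None`, rather than letting the review start against a history with no common ancestor.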

Changes

Make the merge base actually exist. The "Prepare base ref for /security-review skill" step now explicitly runs git fetch for origin/<base> and then git merge-base origin/<base> HEAD as a sanity check. The step fails if the merge base can't be resolved, instead of letting the skill silently bail downstream.
Catch silent skill bailouts in general. A new "Verify model actually ran" step reads the action's execution_file (a JSON array whose final element is the SDK result envelope) and requires num_turns > 0, output_tokens > 0, and is_error == false. If the model never productively ran, the step fails and the summary comment posts "the review did not actually analyze this PR (model took 0 turns — the skill likely failed during setup)" instead of "no high-confidence findings."

Together: change 1 fixes the immediate bug; change 2 ensures any future cause of "SDK exits clean without the model doing real work" no longer reports a false-clean review.
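
The check in change 2 can be sketched as follows. This is an illustration under stated assumptions, not the workflow's code: `model_actually_ran` and `check_execution_file` are hypothetical names, the field names (`num_turns`, `usage.output_tokens`, `is_error`) are taken from the description above, and treating a missing `is_error` as a failure is an assumption of this sketch.

```python
import json
from typing import Any


def _positive_int(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


def model_actually_ran(messages: list) -> bool:
    """Return True only if the final message shows real model work.

    Hypothetical helper: expects the parsed execution file, a JSON array
    whose last element is the SDK result envelope.
    """
    if not messages or not isinstance(messages[-1], dict):
        return False
    result = messages[-1]
    usage = result.get("usage")
    output_tokens = usage.get("output_tokens") if isinstance(usage, dict) else None
    return (
        _positive_int(result.get("num_turns"))
        and _positive_int(output_tokens)
        # Assumption: an absent is_error field counts as a failure.
        and result.get("is_error") is False
    )


def check_execution_file(path: str) -> bool:
    """Load the execution file from disk and apply the same check."""
    with open(path, encoding="utf-8") as fh:
        return model_actually_ran(json.load(fh))
```

With this shape, the transcript from the failing run (subtype: success, num_turns: 0) is rejected even though the SDK reported success.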

Test plan

Re-run this workflow on a fork PR (e.g. #1474, fix: resolve create harness Dockerfile paths from command cwd) and confirm the "Prepare base ref" step logs the merge base and the file count.
Confirm the "Verify model actually ran" step reports a sensible num_turns / output_tokens on a successful run.
Manually break it (delete the git fetch line locally) and confirm the workflow now fails with the explicit error and the summary comment posts the failure message rather than "no findings."

PR aws#1474 surfaced a silent failure mode: the bundled /security-review skill's SessionStart hook runs `git diff --name-only origin/HEAD...` as its first command. actions/checkout@v6 with `ref: <fork-head-sha>` and fetch-depth: 0 fetches the PR head's history but does not always fetch origin/<base>, so the diff resolves to `fatal: no merge base`. The hook errors, the SDK exits cleanly with num_turns=0 and zero tokens, the inline-comment buffer is empty, and the workflow falsely reports "no high-confidence findings."

Two changes, both narrow:

1. Explicitly fetch origin/<base> before invoking the action and verify `git merge-base` resolves. Fail the step if it doesn't.
2. After the action runs, read its execution_file transcript and require num_turns > 0 / output_tokens > 0 / is_error == false. If the model never productively ran, fail the step and have the summary comment say so instead of pretending the review was clean.

agentcore-devx-automation · 2026-06-08T20:32:07Z

Claude Security Review: no high-confidence findings. (run)

github-actions · 2026-06-08T20:33:29Z

Package Tarball

aws-agentcore-0.18.0.tgz

How to install

gh release download pr-1482-tarball --repo aws/agentcore-cli --pattern "*.tgz" --dir /tmp/pr-tarball
npm install -g /tmp/pr-tarball/aws-agentcore-0.18.0.tgz

agentcore-cli-automation

LGTM — this is a clean fix to the silent-failure bug.

Quick summary of what I checked:

git fetch + merge-base sanity: confirmed the diagnosis. actions/checkout@v6 with an explicit ref: only fetches that ref's history; on fork PRs it doesn't necessarily land origin/<base> in the local clone, so git diff origin/HEAD... fails the SessionStart hook with "no merge base". The explicit refspec fetch + git merge-base precondition closes that gap and fails loudly when it can't resolve.
execution_file shape: verified against anthropics/claude-code-action@v1 (base-action/src/run-claude-sdk.ts). The action writes the full SDK message array via writeExecutionFile(messages) on both the happy path and the SDK-error catch block, and the result envelope is the last type: 'result' message in the stream. jq '.[-1].num_turns // 0' is the right shape, and usage.output_tokens matches SDKResultSuccess / SDKResultError in @anthropic-ai/claude-agent-sdk's type definitions.
Failure-mode flow: when the model-ran step exits 1, the implicit success() condition on the "Count buffered findings" step skips that step, leaving FINDING_COUNT empty, which defaults to 0 in the summary script. The if (modelRan === 'false') branch then wins and posts the "did not actually analyze" message instead of "no findings". The "Remove claude-security-reviewing label" step runs under if: always(), so the label still gets cleaned up. ✓
ran=unknown path: only triggers if claude-execution-output.json is missing entirely, which only happens when the action fails catastrophically before the SDK loop. At that point steps.review.conclusion is almost certainly 'failure', and the summary correctly falls through to the "review run failed" branch rather than lying about findings.
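
The branching described in the last two points can be sketched as below. This is a simplified model, not the summary script itself: `pick_summary` and its return labels are hypothetical, and the order of the branches is an assumption inferred from the description above.

```python
def pick_summary(model_ran: str, review_conclusion: str, finding_count: str) -> str:
    """Choose which summary comment to post (hypothetical labels, assumed order)."""
    # An empty FINDING_COUNT (the counting step was skipped) defaults to 0.
    count = int(finding_count) if finding_count.strip() else 0

    # The model never productively ran: report that, never "no findings".
    if model_ran == "false":
        return "did-not-analyze"
    # Covers ran=unknown when the review step itself failed.
    if review_conclusion == "failure":
        return "review-run-failed"
    if count == 0:
        return "no-findings"
    return "findings"
```

For example, `pick_summary("false", "success", "")` selects the "did not actually analyze" label even though the finding count is empty, which is the case the original workflow reported as clean.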

No blocking issues found.

agentcore-devx-automation · 2026-06-08T21:53:49Z

Claude Security Review: no high-confidence findings. (run)

tejaskash requested a review from a team June 8, 2026 20:31

github-actions Bot added the size/s PR size: S label Jun 8, 2026

tejaskash temporarily deployed to e2e-testing June 8, 2026 20:31 — with GitHub Actions Inactive

github-actions Bot added the agentcore-harness-reviewing AgentCore Harness review in progress label Jun 8, 2026

agentcore-devx-automation Bot added the claude-security-reviewing Claude Code /security-review in progress label Jun 8, 2026

agentcore-devx-automation Bot removed the claude-security-reviewing Claude Code /security-review in progress label Jun 8, 2026

agentcore-cli-automation approved these changes Jun 8, 2026

github-actions Bot removed the agentcore-harness-reviewing AgentCore Harness review in progress label Jun 8, 2026

style: prettier

76ef2e4

tejaskash deployed to e2e-testing June 8, 2026 21:53 — with GitHub Actions Active

github-actions Bot added size/s PR size: S and removed size/s PR size: S labels Jun 8, 2026

agentcore-devx-automation Bot added the claude-security-reviewing Claude Code /security-review in progress label Jun 8, 2026

agentcore-devx-automation Bot removed the claude-security-reviewing Claude Code /security-review in progress label Jun 8, 2026

tejaskash wants to merge 2 commits into aws:main from tejaskash:fix/security-review-merge-base
