fix(ci): make /security-review fail loudly when the model never runs #1482
Open
tejaskash wants to merge 2 commits into
PR aws#1474 surfaced a silent failure mode: the bundled /security-review skill's SessionStart hook runs `git diff --name-only origin/HEAD...` as its first command. actions/checkout@v6 with `ref: <fork-head-sha>` and fetch-depth: 0 fetches the PR head's history but does not always fetch origin/<base>, so the diff resolves to `fatal: no merge base`. The hook errors, the SDK exits cleanly with num_turns=0 and zero tokens, the inline-comment buffer is empty, and the workflow falsely reports "no high-confidence findings."

Two changes, both narrow:

1. Explicitly fetch origin/<base> before invoking the action and verify `git merge-base` resolves. Fail the step if it doesn't.
2. After the action runs, read its execution_file transcript and require num_turns > 0 / output_tokens > 0 / is_error == false. If the model never productively ran, fail the step and have the summary comment say so instead of pretending the review was clean.
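For illustration, change 1 could look roughly like the sketch below. This is not the workflow's actual step: the function name `ensure_merge_base` and the way the base branch name is passed in are assumptions made for the example. It only shows the idea of fetching the base ref explicitly and refusing to continue when `git merge-base` cannot resolve.

```shell
# Hypothetical helper (not taken from the PR). $1 is the PR's base branch name.
# Prints the merge base and returns 0 when origin/<base> and HEAD share history;
# prints an error to stderr and returns 1 otherwise.
ensure_merge_base() {
  base_ref="$1"

  # Fetch the base branch into its remote-tracking ref explicitly, because
  # checking out a specific ref does not guarantee origin/<base> exists locally.
  if ! git fetch --quiet origin "+refs/heads/${base_ref}:refs/remotes/origin/${base_ref}"; then
    echo "error: could not fetch origin/${base_ref}" >&2
    return 1
  fi

  # git merge-base exits non-zero when the two commits have no common ancestor.
  if merge_base="$(git merge-base "origin/${base_ref}" HEAD 2>/dev/null)"; then
    echo "merge base: ${merge_base}"
    return 0
  fi

  echo "error: no merge base between origin/${base_ref} and HEAD" >&2
  return 1
}
```

In a workflow step this would be called as something like `ensure_merge_base "$BASE_REF" || exit 1`, where `BASE_REF` is a placeholder for however the job exposes the base branch name.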
Contributor
Claude Security Review: no high-confidence findings. (run)
Contributor
Package Tarball

How to install

gh release download pr-1482-tarball --repo aws/agentcore-cli --pattern "*.tgz" --dir /tmp/pr-tarball
npm install -g /tmp/pr-tarball/aws-agentcore-0.18.0.tgz
agentcore-cli-automation
approved these changes
Jun 8, 2026
LGTM — this is a clean fix to the silent-failure bug.
Quick summary of what I checked:
- `git fetch` + merge-base sanity: confirmed the diagnosis. `actions/checkout@v6` with an explicit `ref:` only fetches that ref's history; on fork PRs it doesn't necessarily land `origin/<base>` in the local clone, so `git diff origin/HEAD...` fails the SessionStart hook with "no merge base". The explicit refspec fetch + `git merge-base` precondition closes that gap and fails loudly when it can't resolve.
- `execution_file` shape: verified against `anthropics/claude-code-action@v1` (`base-action/src/run-claude-sdk.ts`). The action writes the full SDK message array via `writeExecutionFile(messages)` on both the happy path and the SDK-error catch block, and the result envelope is the last `type: 'result'` message in the stream. `jq '.[-1].num_turns // 0'` is the right shape, and `usage.output_tokens` matches `SDKResultSuccess`/`SDKResultError` in `@anthropic-ai/claude-agent-sdk`'s type definitions.
- Failure-mode flow: when `model-ran` exits 1, the implicit `success()` on `Count buffered findings` skips it, leaving `FINDING_COUNT` empty → defaults to `0` in the summary script. The `if (modelRan === 'false')` branch wins and posts the "did not actually analyze" message instead of "no findings". `Remove claude-security-reviewing label` runs under `if: always()`, so the label still gets cleaned up. ✓
- `ran=unknown` path: only triggers if `claude-execution-output.json` is missing entirely, which the action only does on catastrophic failure before the SDK loop. At that point `steps.review.conclusion` is almost certainly `'failure'`, and the summary correctly falls through to the "review run failed" branch rather than lying about findings.
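The branch order described in the failure-mode bullets can be sketched as a small decision function. This is only an illustration of the precedence, not the summary script itself (which the review quotes as JavaScript). The function name `summary_branch` and the short labels it prints are made up for the example; the real script posts full comment messages.

```shell
# Hypothetical helper showing which summary branch wins.
# $1: model-ran result ("true", "false" or "unknown")
# $2: conclusion of the review step (e.g. "success" or "failure")
# $3: buffered finding count; an empty value defaults to 0
summary_branch() {
  ran="$1"
  conclusion="$2"
  count="${3:-0}"

  if [ "$ran" = "false" ]; then
    # The model never productively ran: never report a clean review.
    echo "did-not-analyze"
  elif [ "$conclusion" = "failure" ]; then
    # Covers the ran=unknown case where the action itself failed.
    echo "run-failed"
  elif [ "$count" -eq 0 ]; then
    echo "no-findings"
  else
    echo "findings:${count}"
  fi
}
```

With an empty finding count and `ran=false`, `summary_branch false success ""` prints `did-not-analyze`, so the empty count can never surface as a false "no findings".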
No blocking issues found.
Summary
- The `/security-review` skill's SessionStart hook runs `git diff --name-only origin/HEAD...` as its first command. `actions/checkout@v6` with `ref: <fork-head-sha>` and `fetch-depth: 0` fetches the PR head's history but does not always fetch `origin/<base>`, so the diff resolves to `fatal: no merge base`. The hook errors, the SDK exits cleanly with `num_turns=0` and zero tokens, the inline-comment buffer is empty, and the workflow's summary step still posts "no findings."
- `<local-command-stderr>fatal: origin/HEAD...HEAD: no merge base</local-command-stderr>` followed by `subtype: success, num_turns: 0`.

Changes

- `Prepare base ref for /security-review skill` now explicitly `git fetch`es `origin/<base>` and runs `git merge-base origin/<base> HEAD` as a sanity check. It fails the step if the merge base can't be resolved, instead of letting the skill silently bail downstream.
- The `Verify model actually ran` step reads the action's `execution_file` (a JSON array; final element is the SDK result envelope) and requires `num_turns > 0`, `output_tokens > 0`, and `is_error == false`. If the model never productively ran, the step fails and the summary comment posts "the review did not actually analyze this PR (model took 0 turns — the skill likely failed during setup)" instead of "no high-confidence findings."

Together: change 1 fixes the immediate bug; change 2 ensures any future cause of "SDK exits clean without the model doing real work" no longer reports a false-clean review.
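A minimal sketch of the transcript check, assuming `jq` is available on the runner and that the last element of the JSON array is the result envelope with top-level `num_turns` and `is_error` fields and a nested `usage.output_tokens`. The function name `model_ran` and its three output values are illustrative and may not match the step's actual implementation.

```shell
# Hypothetical helper. $1 is the path to the action's execution file.
# Prints "true" if the model did real work, "false" if it did not,
# and "unknown" if the file does not exist.
model_ran() {
  file="$1"

  if [ ! -f "$file" ]; then
    echo "unknown"
    return 0
  fi

  # Missing fields fall back to 0 / false, so an empty or truncated
  # transcript evaluates to "false" rather than passing by accident.
  jq -r '
    .[-1] as $r
    | (($r.num_turns // 0) > 0
       and ($r.usage.output_tokens // 0) > 0
       and (($r.is_error // false) == false))
  ' "$file"
}
```

A step built on this could fail with `[ "$(model_ran "$EXECUTION_FILE")" = "true" ] || exit 1`, where `EXECUTION_FILE` is a placeholder for the path the action reports. The parentheses around each `//` fallback matter, because `//` binds more loosely than the comparison operators in jq.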
Test plan
- `Prepare base ref` step logs the merge base + file count.
- `Verify model actually ran` reports a sensible `num_turns` / `output_tokens` on a successful run.
- (... `git fetch` line locally) and confirm the workflow now fails with the explicit error and the summary comment posts the failure message rather than "no findings."