Merged
63 changes: 26 additions & 37 deletions .github/workflows/bot-ai-review.yml
@@ -168,50 +168,40 @@ jobs:
gh api repos/${{ github.repository }}/issues/${{ steps.metadata.outputs.pr_number }}/reactions \
-f content=eyes

- - name: Trigger Claude Quality Check
+ - name: Run Claude AI Quality Review
    if: steps.check.outputs.should_run == 'true' && steps.pr.outputs.skip != 'true' && steps.attempts.outputs.count != '3'
- uses: actions/github-script@v8
+ timeout-minutes: 30
+ uses: anthropics/claude-code-action@v1
Comment on lines +171 to +174
Copilot AI Dec 1, 2025
Missing step to increment the attempt label. The workflow reads the attempt count from PR labels (lines 136-152), but there's no step that adds the ai-attempt-X label to track that this attempt has been performed.

In the gen-update-plot.yml workflow, this is done at line 222:

gh pr edit $PR_NUMBER --remove-label "ai-rejected" --add-label "ai-attempt-${ATTEMPT}"

Without this step, the attempt counter will always remain at 0, causing the following issues:

  1. The AI review will always show "Attempt 0/3" (or "Attempt 1/3" if the off-by-one bug is fixed)
  2. The workflow will never reach the "Mark as failed after 3 attempts" step
  3. Multiple AI reviews could run in parallel on the same PR

Suggested fix: Add a new step before the AI review to add the attempt label:

- name: Update attempt label
  if: steps.check.outputs.should_run == 'true' && steps.pr.outputs.skip != 'true' && steps.attempts.outputs.count != '3'
  env:
    GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  run: |
    NEXT_ATTEMPT=$((${{ steps.attempts.outputs.count }} + 1))
    gh pr edit ${{ steps.metadata.outputs.pr_number }} --add-label "ai-attempt-${NEXT_ATTEMPT}"

  with:
- script: |
- const specId = '${{ steps.pr.outputs.spec_id }}';
- const library = '${{ steps.pr.outputs.library }}';
- const attempt = parseInt('${{ steps.attempts.outputs.count }}') + 1;
- const prNumber = ${{ steps.metadata.outputs.pr_number }};
- const subIssueNumber = '${{ steps.pr.outputs.sub_issue }}';
- const mainIssueNumber = '${{ steps.metadata.outputs.issue_number }}';
+ claude_code_oauth_token: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
+ claude_args: "--model opus"
+ prompt: |
+ ## Task: AI Quality Review for **${{ steps.pr.outputs.library }}** (Attempt ${{ steps.attempts.outputs.count }}/3)
Copilot AI Dec 1, 2025

The attempt count displayed in the prompt is off by one. The step "Check attempt count" (lines 136-152) returns the current attempt count (0, 1, 2, or 3), but the prompt should show the next attempt number being performed.

For example, when steps.attempts.outputs.count is "0", this is actually the first attempt, so the prompt should show "Attempt 1/3", not "Attempt 0/3".

The previous implementation (using github-script) handled this correctly by calculating parseInt('${{ steps.attempts.outputs.count }}') + 1.

Suggested fix:

## Task: AI Quality Review for **${{ steps.pr.outputs.library }}** (Attempt ${{ steps.attempts.outputs.count == '0' && '1' || steps.attempts.outputs.count == '1' && '2' || steps.attempts.outputs.count == '2' && '3' || steps.attempts.outputs.count }}/3)

Or better yet, add a new step before this one to calculate the next attempt number:

- name: Calculate next attempt
  id: next_attempt
  run: echo "number=$((${{ steps.attempts.outputs.count }} + 1))" >> $GITHUB_OUTPUT

Then use ${{ steps.next_attempt.outputs.number }} in the prompt.


- await github.rest.issues.createComment({
-   owner: context.repo.owner,
-   repo: context.repo.repo,
-   issue_number: prNumber,
-   body: `@claude

- ## Task: AI Quality Review for **${library}** (Attempt ${attempt}/3)

- Tests passed and preview images are ready. Evaluate if the **${library}** implementation matches the specification.
+ Tests passed and preview images are ready. Evaluate if the **${{ steps.pr.outputs.library }}** implementation matches the specification.

  ### Your Task

- 1. **Read the spec file**: \`specs/${specId}.md\`
+ 1. **Read the spec file**: `specs/${{ steps.pr.outputs.spec_id }}.md`
     - Note all quality criteria listed
     - Understand the expected visual output

- 2. **Read the ${library} implementation**:
-    - \`plots/${library}/*/${specId}/default.py\`
+ 2. **Read the ${{ steps.pr.outputs.library }} implementation**:
+    - `plots/${{ steps.pr.outputs.library }}/*/${{ steps.pr.outputs.spec_id }}/default.py`

  3. **Read library-specific rules**:
-    - \`prompts/library/${library}.md\`
+    - `prompts/library/${{ steps.pr.outputs.library }}.md`

- 4. **View the plot images** in \`plot_images/\` directory
+ 4. **View the plot images** in `plot_images/` directory
     - Use your vision capabilities to analyze each image
     - Compare with the spec requirements

- 5. **Evaluate against quality criteria** from \`prompts/quality-criteria.md\`
+ 5. **Evaluate against quality criteria** from `prompts/quality-criteria.md`

- 6. **Post your verdict to Sub-Issue #${subIssueNumber}** using this EXACT format:
+ 6. **Post your verdict to Sub-Issue #${{ steps.pr.outputs.sub_issue }}** using this EXACT format:

- \`\`\`markdown
- ## AI Review - Attempt ${attempt}/3
+ ```markdown
+ ## AI Review - Attempt ${{ steps.attempts.outputs.count }}/3
Copilot AI Dec 1, 2025

Same issue here: the attempt count is off by one. This line references steps.attempts.outputs.count, which holds the current attempt count (0, 1, or 2), but it should display the next attempt being performed (1, 2, or 3).

This inconsistency will cause confusion when the AI posts its review with an incorrect attempt number.
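As with the earlier comments, the fix is a one-line arithmetic step. A minimal sketch of the intended behavior (the variable names here are illustrative, not taken from the workflow):

```shell
# Illustrative sketch of the off-by-one fix: the label count holds completed
# attempts, so the attempt being performed is count + 1.
count=0                                  # e.g. steps.attempts.outputs.count
attempt=$((count + 1))                   # attempt number to display
echo "## AI Review - Attempt ${attempt}/3"
# prints "## AI Review - Attempt 1/3"
```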


  ### Quality Evaluation
  | Evaluator | Score | Verdict |
@@ -230,25 +220,24 @@ jobs:
  2. **CQ-002 PARTIAL**: Docstring missing return type

  ### AI Feedback for Next Attempt
- > Move legend outside plot area with \\\`bbox_to_anchor=(1.05, 1)\\\`
+ > Move legend outside plot area with `bbox_to_anchor=(1.05, 1)`
  > Add return type to docstring

  ### Verdict: APPROVED / REJECTED
- \`\`\`
+ ```

  7. **Take action based on result**:
     - **APPROVED** (score >= 85):
-      - Run: \`gh pr edit ${prNumber} --add-label ai-approved\`
-      - Run: \`gh issue edit ${subIssueNumber} --remove-label reviewing --add-label ai-approved\`
+      - Run: `gh pr edit ${{ steps.metadata.outputs.pr_number }} --add-label ai-approved`
+      - Run: `gh issue edit ${{ steps.pr.outputs.sub_issue }} --remove-label reviewing --add-label ai-approved`
     - **REJECTED** (score < 85):
-      - Run: \`gh pr edit ${prNumber} --add-label ai-rejected\`
-      - Run: \`gh issue edit ${subIssueNumber} --remove-label reviewing --add-label ai-rejected\`
+      - Run: `gh pr edit ${{ steps.metadata.outputs.pr_number }} --add-label ai-rejected`
+      - Run: `gh issue edit ${{ steps.pr.outputs.sub_issue }} --remove-label reviewing --add-label ai-rejected`

  **IMPORTANT:**
- - This is a **${library}-only** review - focus only on this library
- - Post feedback to **Sub-Issue #${subIssueNumber}**, NOT the main issue
- - Include the generated code in your review comment for documentation`
- });
+ - This is a **${{ steps.pr.outputs.library }}-only** review - focus only on this library
+ - Post feedback to **Sub-Issue #${{ steps.pr.outputs.sub_issue }}**, NOT the main issue
+ - Include the generated code in your review comment for documentation

- name: Mark as failed after 3 attempts
if: steps.check.outputs.should_run == 'true' && steps.pr.outputs.skip != 'true' && steps.attempts.outputs.count == '3'