feat(seaborn): implement marimekko-basic#5476
Conversation
AI Review - Attempt 1/3
Image Description
Score: 78/100
Visual Quality (27/30)
Design Excellence (10/20)
Spec Compliance (14/15)
Data Quality (14/15)
Code Quality (9/10)
Library Mastery (4/10)
Score Caps Applied
Strengths
Weaknesses
Issues Found
AI Feedback for Next Attempt
Verdict: REJECTED
Attempt 1/3 - fixes based on AI review
🔧 Repair Attempt 1/3: Applied fixes based on AI review feedback. Status: Repair completed, re-triggering review...
🔧 AI Review Produced No Score — Auto-Retrying: The Claude Code Action ran but didn't write …
❌ AI Review Failed (auto-retry exhausted): The AI review action completed but did not produce valid output files. Auto-retry already tried once.
❌ AI Review Failed (auto-retry exhausted): The AI review action completed but did not produce valid output files. Auto-retry already tried once.
…ompts (#5520)

## Summary

Three workflows (`impl-review.yml`, `spec-create.yml`, `report-validate.yml`) used shell-style `$VAR` inside `with: prompt: |` blocks of `claude-code-action`. That block is a YAML string handed to a Node/Bun action — **no shell ever runs** — so `$VAR` was sent to Claude as a literal placeholder instead of the actual value. Result: Claude couldn't reliably identify the PR / spec / library to review and silently produced no `quality_score.txt`, which the validate step turns into `ai-review-failed`.

## Symptoms observed today (2026-04-29)

5 stuck implementation PRs from 2026-04-27, all with `ai-review-failed` despite the prior fixes branch (#5410) and the audit branch (#5515) landing in between:

| PR | Branch | Pre-fix labels |
|----|--------|----------------|
| #5476 | seaborn/marimekko-basic | `ai-review-failed`, `quality:78` |
| #5480 | altair/marimekko-basic | `ai-review-failed`, `quality:82` |
| #5481 | letsplot/marimekko-basic | `ai-rejected`, `quality:76` |
| #5483 | plotnine/marimekko-basic | `ai-review-failed` |
| #5486 | plotly/line-basic | `ai-review-failed` |

Re-dispatching the review on each PR confirmed the bug: the run log of `Run AI Quality Review` shows the prompt being passed verbatim:

```
PROMPT: Read prompts/workflow-prompts/ai-quality-review.md and follow those instructions.

Variables for this run:
- LIBRARY: $LIBRARY        # ← literal, never expanded
- SPEC_ID: $SPEC_ID
- PR_NUMBER: $PR_NUMBER
- ATTEMPT: $ATTEMPT
```

Claude's review then either ran for ~20 s and exited with no `quality_score.txt` (4 PRs failed), or recovered by inferring the values from the working directory (1 PR succeeded with `quality:82`). The intermittent pattern is exactly what you'd expect from "the prompt is ambiguous and Claude has to guess from context."

## Root cause

Commit `252977cf3` ("chore: fix critical audit findings", 2026-04-28 22:46) routed several `${{ github.event.* }}` and step-output values through step-level `env:` and rewrote the in-prompt references as `$VAR`. That is the correct mitigation for `run:` shell steps and Python heredocs in the same workflows (and those changes stay in place). Inside `with: prompt: |` it is the wrong tool: the value is consumed by a JS action, not a shell, so there is no injection surface to mitigate and `$VAR` does not interpolate.

`spec-create.yml` and `report-validate.yml` carry the identical anti-pattern in their `prompt:` blocks. They haven't surfaced as failures yet only because no triggering issue has come in since 2026-04-28.

## The fix

Revert **only** the descriptive header lines of each `prompt:` block back to GitHub Actions expression syntax (`${{ ... }}`), which the runner substitutes into the YAML string before the action receives it. Keep:

- All `env:` blocks (harmless; lets future prompt content reference env vars if useful)
- All `$VAR` references inside **embedded bash code samples** in the prompt (e.g. `gh issue edit $ISSUE_NUMBER`). Those are executed by Claude's Bash tool, which inherits the step `env:` and expands them correctly — and rewriting them would re-enable the injection vector the audit was right to close.

```diff
 Variables for this run:
- - LIBRARY: $LIBRARY
- - SPEC_ID: $SPEC_ID
- - PR_NUMBER: $PR_NUMBER
- - ATTEMPT: $ATTEMPT
+ - LIBRARY: ${{ steps.pr.outputs.library }}
+ - SPEC_ID: ${{ steps.pr.outputs.specification_id }}
+ - PR_NUMBER: ${{ steps.pr.outputs.pr_number }}
+ - ATTEMPT: ${{ steps.attempts.outputs.display }}
```

(Analogous 8-line revert in `spec-create.yml` × 2 prompt blocks and a 4-line revert in `report-validate.yml`.) Diff total: **3 files, 16 ±**.

## Test plan

- [ ] After merge, re-dispatch `impl-review.yml` for the 4 stuck PRs (`gh workflow run impl-review.yml -f pr_number=<N>` for 5476, 5483, 5486; 5480 already got an 82 in the re-dispatch and should now stabilize)
- [ ] Verify each run's `Run AI Quality Review` step log shows real values (e.g. `- LIBRARY: plotly`) in the PROMPT echo, not `$LIBRARY`
- [ ] Verify `quality_score.txt` is produced and the `ai-review-failed` label is removed
- [ ] On the next `spec-request`-labeled issue, verify the spec-create prompt sees the issue title/body
- [ ] On the next `report-pending`-labeled issue, verify the report-validate prompt sees the issue title/body

🤖 Generated with [Claude Code](https://claude.com/claude-code)
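The distinction #5520 describes can be illustrated with a minimal, hypothetical workflow fragment (the action name and values are made up):

```yaml
jobs:
  demo:
    runs-on: ubuntu-latest
    env:
      LIBRARY: seaborn          # job-level env, visible to both steps
    steps:
      # A `run:` step executes in a shell, so $LIBRARY expands there.
      # This is the pattern the audit fix correctly applied to shell steps.
      - name: Shell step
        run: echo "LIBRARY is $LIBRARY"      # shell prints "LIBRARY is seaborn"

      # A `with:` input is a plain YAML string handed to the action's
      # JS entrypoint; no shell ever runs. Only ${{ ... }} expressions,
      # substituted by the runner before the action starts, are expanded.
      - name: Action prompt step
        uses: some-org/js-action@v1          # hypothetical JS action
        with:
          prompt: |
            - LIBRARY: $LIBRARY              # reaches the action literally
            - LIBRARY: ${{ env.LIBRARY }}    # runner substitutes "seaborn"
```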
AI Review - Attempt 2/3
Image Description
Score: 86/100
Visual Quality (29/30)
Design Excellence (13/20)
Spec Compliance (15/15)
Data Quality (15/15)
Code Quality (10/10)
Library Mastery (4/10)
Score Caps Applied
Strengths
Weaknesses
Issues Found
AI Feedback for Next Attempt
Verdict: REJECTED
## Summary

The 3 AI-approved implementation PRs from today (#5476, #5480, #5481) all hit `gh pr merge` failures with `the base branch policy prohibits the merge`. Root cause: the branch ruleset on `main` requires three status checks (`Run Linting`, `Run Tests`, `Run Frontend Tests`) — and impl PRs created by `impl-generate.yml` never get those checks.

## Why CI doesn't run on impl-PRs

`impl-generate.yml` (and `impl-repair.yml`, `impl-review.yml`) push commits to PR branches using `GITHUB_TOKEN`. By GitHub's anti-recursion design, pushes and PRs created with `GITHUB_TOKEN` do **not** trigger downstream `pull_request` or `workflow_run` events. Verified across all 5 stuck PRs:

| PR | Branch | `Run Linting` ever ran? |
|----|--------|-------------------------|
| #5476 | seaborn/marimekko-basic | yes (once, on a 04-27 impl-repair commit; newer score commits invalidated it) |
| #5480 | altair/marimekko-basic | no |
| #5481 | letsplot/marimekko-basic | no |
| #5483 | plotnine/marimekko-basic | no |
| #5486 | plotly/line-basic | no |

So the merge is gated on a check that structurally cannot complete.

## The fix

Add `--admin` to the `gh pr merge` call inside `impl-merge.yml`. This lets the pipeline complete autonomously without weakening `main`'s protection for human PRs.

```diff
+          # --admin bypasses the branch ruleset's required-status-check
+          # gate. Required because impl-generate.yml pushes via GITHUB_TOKEN,
+          # which by GitHub's anti-recursion design does not trigger
+          # downstream CI workflows (Run Linting / Run Tests / Run Frontend
+          # Tests), so impl PRs never get those checks. The pipeline already
+          # gates merge behind the AI quality review threshold.
           if gh pr merge "$PR_NUM" \
             --repo "$REPOSITORY" \
             --squash \
+            --admin \
             --delete-branch; then
```

The merge is still gated by:

- The AI quality threshold (cascading 90 / 80 / 70 / 60 / 50 across the initial review + 4 repair attempts)
- `impl-merge.yml`'s own pre-merge "Validate PR completeness" step
- The label-based trigger requiring `ai-approved`

So `--admin` only bypasses the structurally missing CI artifact, not the substantive review gates.

## Considered alternative

Push from `impl-generate` / `impl-repair` / `impl-review` via a PAT instead of `GITHUB_TOKEN` so CI triggers naturally. Cleaner long-term, but it needs a maintained secret and a broader review of which workflows touch which branches; deferred.

## Test plan

- [ ] After merge, dispatch `impl-merge.yml` (or trust the `ai-approved` label trigger) for the 3 stuck approved PRs (#5476, #5480, #5481)
- [ ] Verify the merge succeeds without retries on attempt 1
- [ ] Verify post-merge: metadata file created, GCS staging→production promotion done, `impl:{library}:done` label on the parent issue

🤖 Generated with [Claude Code](https://claude.com/claude-code)
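The cascading quality threshold mentioned above (90 / 80 / 70 / 60 / 50 across the initial review plus four repair attempts) can be sketched as a small helper. This is hypothetical illustration code; the real pipeline encodes the gate in its workflow files, not in a Python function:

```python
def approval_threshold(attempt: int) -> int:
    """Minimum AI review score required to approve on a given attempt.

    Sketch of the cascading gate described above (an assumption drawn
    only from the PR text): attempt 1 (initial review) needs 90, and
    each of the 4 repair attempts lowers the bar by 10 points, down
    to a floor of 50.
    """
    return max(90 - 10 * (attempt - 1), 50)


print(approval_threshold(1))  # 90
print(approval_threshold(5))  # 50
```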
…et (#5523)

## Summary

Follow-up to #5521 (which added `--admin` to `gh pr merge`). That change alone wasn't enough — verified just now: 3 dispatched merges (#5476, #5480, #5481) all failed identically with:

```
GraphQL: Repository rule violations found
3 of 3 required status checks are expected. (mergePullRequest)
```

## Why --admin alone didn't work

The `main` ruleset's bypass list contains only `RepositoryRole admin` (mode: `pull_request`). The default `GITHUB_TOKEN` runs as `github-actions[bot]` with the `write` role — not admin — so the API rejects the bypass.

```bash
gh api repos/MarkusNeusinger/anyplot/rulesets/10578859 --jq '.bypass_actors'
# [{"actor_id":5,"actor_type":"RepositoryRole","bypass_mode":"pull_request"}]
```

## The fix

Route **only the merge step** through a repo-admin PAT (`ADMIN_TOKEN`). All other steps in `impl-merge.yml` and the rest of the impl-* workflows keep using `GITHUB_TOKEN`. The bypass scope is therefore exactly one step, not the whole pipeline.

```diff
       - name: Merge PR to main (with retry)
         if: steps.check.outputs.should_run == 'true'
         env:
-          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          GH_TOKEN: ${{ secrets.ADMIN_TOKEN || secrets.GITHUB_TOKEN }}
           PR_NUM: ${{ steps.check.outputs.pr_number }}
           REPOSITORY: ${{ github.repository }}
+          HAS_ADMIN_TOKEN: ${{ secrets.ADMIN_TOKEN != '' }}
         run: |
+          if [ "$HAS_ADMIN_TOKEN" != "true" ]; then
+            echo "::warning::ADMIN_TOKEN secret is not set..."
+          fi
```

The fallback `secrets.ADMIN_TOKEN || secrets.GITHUB_TOKEN` and the warning preserve the previous behavior if `ADMIN_TOKEN` isn't set yet — the workflow still runs and fails with the same ruleset error as before, but the log says clearly what's missing instead of showing an opaque auth error.

## Required after merge

1. **Create PAT**: Settings → Developer settings → Personal access tokens → Fine-grained
   - Repository: `anyplot`
   - Permissions:
     - Contents: Read+Write
     - Pull requests: Read+Write
     - Administration: Read+Write
     - Metadata: Read
2. **Set secret**: Settings → Secrets and variables → Actions → New repository secret
   - Name: `ADMIN_TOKEN`
   - Value: the PAT

## Considered alternatives

| Option | Verdict |
|--------|---------|
| Add `github-actions[bot]` as a bypass actor on the ruleset | broader blast radius — *every* workflow run could bypass main |
| Remove the 3 required checks from the ruleset | weakens protection for human PRs too |
| Push from impl-generate via PAT so CI triggers naturally | cleanest semantically, but needs a PAT in 3 workflows + the same maintenance overhead |
| **Scope the PAT to the merge step only (this PR)** | smallest blast radius, matches the actual permission gap |

## Test plan

- [ ] Merge this PR
- [ ] Create the fine-grained PAT and add it as the `ADMIN_TOKEN` repo secret
- [ ] Re-dispatch `impl-merge.yml` for the 3 stuck approved PRs (#5476 seaborn, #5480 altair, #5481 letsplot)
- [ ] Verify each merges successfully on attempt 1 (no ruleset error in the run log)
- [ ] Verify metadata file created, GCS staging→production promotion done, parent issue gets the `impl:{library}:done` label

🤖 Generated with [Claude Code](https://claude.com/claude-code)
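Assembled from the diff above, the merge step after this change would look roughly like the following. This is a sketch, not the exact file: surrounding steps and the retry wrapper are elided, the warning text is abbreviated exactly as in the diff, and the merge command is taken from the #5521 change:

```yaml
- name: Merge PR to main (with retry)
  if: steps.check.outputs.should_run == 'true'
  env:
    GH_TOKEN: ${{ secrets.ADMIN_TOKEN || secrets.GITHUB_TOKEN }}  # PAT when set, fallback otherwise
    PR_NUM: ${{ steps.check.outputs.pr_number }}
    REPOSITORY: ${{ github.repository }}
    HAS_ADMIN_TOKEN: ${{ secrets.ADMIN_TOKEN != '' }}
  run: |
    if [ "$HAS_ADMIN_TOKEN" != "true" ]; then
      echo "::warning::ADMIN_TOKEN secret is not set..."
    fi
    gh pr merge "$PR_NUM" --repo "$REPOSITORY" --squash --admin --delete-branch
```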
Implementation: marimekko-basic - python/seaborn

Implements the python/seaborn version of marimekko-basic.

File: `plots/marimekko-basic/implementations/python/seaborn.py`
Parent Issue: #1002
🤖 impl-generate workflow
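The implementation file itself is not shown in this conversation. For orientation only: seaborn has no native marimekko plot, so a basic marimekko chart in the seaborn/matplotlib stack is typically built by hand from variable-width stacked bars. A minimal sketch with made-up example data (not the code in `seaborn.py`) might look like:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen; fine for CI/scripted use
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up example data: each category's total drives its bar width,
# and each bar is a 100% stack of segment shares.
categories = ["A", "B", "C"]
col_totals = np.array([40.0, 35.0, 25.0])
seg_shares = np.array([
    [0.5, 0.3, 0.2],   # shares within category A (sum to 1)
    [0.4, 0.4, 0.2],   # B
    [0.6, 0.1, 0.3],   # C
])

widths = col_totals / col_totals.sum()                   # variable bar widths
lefts = np.concatenate(([0.0], np.cumsum(widths)[:-1]))  # left edge of each bar

sns.set_theme(style="whitegrid")
fig, ax = plt.subplots(figsize=(8, 5))
palette = sns.color_palette("deep", seg_shares.shape[1])
for j in range(seg_shares.shape[1]):
    bottoms = seg_shares[:, :j].sum(axis=1)  # stack segments vertically
    ax.bar(lefts, seg_shares[:, j], width=widths, bottom=bottoms,
           align="edge", color=palette[j], edgecolor="white",
           label=f"segment {j + 1}")

ax.set_xlim(0, 1)
ax.set_ylim(0, 1)
ax.set_xticks(lefts + widths / 2)
ax.set_xticklabels(categories)
ax.legend()
fig.savefig("marimekko_sketch.png")
```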