
feat(seaborn): implement marimekko-basic #5476

Merged

MarkusNeusinger merged 7 commits into main from implementation/marimekko-basic/seaborn on Apr 29, 2026

Conversation

@github-actions
Contributor

Implementation: marimekko-basic - python/seaborn

Implements the python/seaborn version of marimekko-basic.

File: plots/marimekko-basic/implementations/python/seaborn.py

Parent Issue: #1002


🤖 impl-generate workflow

@claude
Contributor

claude Bot commented Apr 27, 2026

AI Review - Attempt 1/3

Image Description

Light render (plot-light.png): A Marimekko (mosaic) chart on a warm off-white #FAF8F1 background showing market share by region and product line. Five variable-width bars represent North America, Europe, Asia Pacific, Latin America, and Middle East — Asia Pacific is the widest (largest total revenue), Middle East the narrowest. Each bar is stacked with four product categories: Electronics (teal-green, Okabe-Ito #009E73 at bottom), Apparel (orange, #D55E00), Food & Beverage (blue, #0072B2), and Home Goods (pink-purple, #CC79A7). White segment borders provide clear visual separation. Dollar-value labels appear on larger segments (e.g., $85B for Asia Pacific Electronics). X-axis reads "Region (width ∝ total revenue)", Y-axis reads "Product Mix (%)" with percentage ticks 0–100%. A horizontal legend "Product Line" sits below the chart. Title is "marimekko-basic · seaborn · anyplot.ai" in bold. All text is clearly readable against the light background — no light-on-light issues. Legibility verdict: PASS.

Dark render (plot-dark.png): Identical chart on a warm near-black #1A1A17 background. All four data colors are identical to the light render (Okabe-Ito positions 1–4) confirming correct theme isolation. Title, axis labels, and tick labels render in light-colored text clearly visible against the dark background. Legend box has a dark fill with light text. No dark-on-dark failures observed. Grid lines and spine are subtly visible. Brand green (#009E73, Electronics) remains fully legible on the dark surface. Legibility verdict: PASS.

Both paragraphs required — both renders inspected. Critical note: The images show "anyplot.ai" branding and proper light/dark theming, but the implementation code at plots/marimekko-basic/implementations/python/seaborn.py still uses "pyplots.ai" branding, has no ANYPLOT_THEME env-var handling, and saves to plot.png rather than plot-{THEME}.png. The images appear to have been generated from a newer version of the code not yet committed to the repository.

Score: 78/100

| Category | Score | Max |
|----------|-------|-----|
| Visual Quality | 27 | 30 |
| Design Excellence | 10 | 20 |
| Spec Compliance | 14 | 15 |
| Data Quality | 14 | 15 |
| Code Quality | 9 | 10 |
| Library Mastery | 4 | 10 |
| Total | 78 | 100 |

Visual Quality (27/30)

  • VQ-01: Text Legibility (7/8) — Title 24pt, axis labels 20pt, tick labels 16pt all explicitly set and readable in both themes; value labels at 14pt and legend at 14pt are slightly below the recommended 16pt minimum
  • VQ-02: No Overlap (6/6) — No overlapping text; region labels well-centered, value labels correctly placed within segments
  • VQ-03: Element Visibility (6/6) — All Marimekko segments clearly visible with white edge borders for definition
  • VQ-04: Color Accessibility (2/2) — Okabe-Ito palette (CVD-safe); white edges provide additional segment definition
  • VQ-05: Layout & Canvas (3/4) — Chart fills canvas well; horizontal legend at bottom adds minor whitespace below chart
  • VQ-06: Axis Labels & Title (2/2) — "Region (width ∝ total revenue)" and "Product Mix (%)" are descriptive with context
  • VQ-07: Palette Compliance (1/2) — Images show Okabe-Ito first series (#009E73) and correct theme backgrounds; however, the code uses sns.color_palette("colorblind") (seaborn's colorblind palette, first color is blue #0173b2, not #009E73) rather than the explicit Okabe-Ito list, and contains no ANYPLOT_THEME logic — images appear to have come from a different code version

Design Excellence (10/20)

  • DE-01: Aesthetic Sophistication (4/8) — Clean, professional Marimekko with appropriate color palette and white segment borders; looks like a well-configured implementation without exceptional sophistication
  • DE-02: Visual Refinement (4/6) — Top/right spines removed via sns.despine(); y-axis-only grid applied; white segment edges add polish; grid is dashed rather than the preferred solid thin style
  • DE-03: Data Storytelling (2/6) — Data is displayed clearly with value labels but no visual emphasis guides the viewer to any particular insight; all regions treated equally

Spec Compliance (14/15)

  • SC-01: Plot Type (5/5) — Correct Marimekko/mosaic chart: bar widths proportional to region totals, segments proportional to product share within each region
  • SC-02: Required Features (4/4) — Variable-width bars, stacked proportional segments, value labels on larger segments, legend with product categories all present
  • SC-03: Data Mapping (3/3) — Regions correctly on x-axis with variable widths; product mix 0–100% on y-axis; all data visible
  • SC-04: Title & Legend (2/3) — Images show correct marimekko-basic · seaborn · anyplot.ai; code has outdated pyplots.ai branding; legend labels correct
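Since seaborn has no native Marimekko function, the geometry checked by SC-01 through SC-03 reduces to a small amount of matplotlib patch math: normalize each region's total to get a bar width, then stack each product's share to full height within the bar. A minimal sketch — only the Asia Pacific values are quoted from the review ($85B Electronics, $45B and $48B labels, with Home Goods derived so the quoted $203B total holds); the other regions' numbers are illustrative placeholders:

```python
# Marimekko geometry sketch with matplotlib patches (seaborn has no
# native mosaic function). Asia Pacific values are taken from the
# review; other regions are illustrative.
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches

revenue = {  # region -> revenue per product line, $B
    "North America": [60, 30, 25, 20],
    "Europe": [50, 35, 30, 15],
    "Asia Pacific": [85, 45, 48, 25],
    "Latin America": [25, 15, 12, 8],
    "Middle East": [18, 10, 10, 7],
}
colors = ["#009E73", "#D55E00", "#0072B2", "#CC79A7"]  # Okabe-Ito subset

totals = {r: sum(v) for r, v in revenue.items()}
grand = sum(totals.values())

fig, ax = plt.subplots(figsize=(8, 5))
x = 0.0
for region, values in revenue.items():
    width = totals[region] / grand           # bar width proportional to region total
    y = 0.0
    for value, color in zip(values, colors):
        height = value / totals[region]      # segment height proportional to share
        ax.add_patch(mpatches.Rectangle(
            (x, y), width, height,
            facecolor=color, edgecolor="white", linewidth=1.5))
        y += height
    x += width
ax.set_xlim(0, 1)
ax.set_ylim(0, 1)
```

Because widths and heights are both normalized, the bars tile the unit square exactly, which is what makes the white segment borders read as a clean mosaic.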

Data Quality (14/15)

  • DQ-01: Feature Coverage (5/6) — Shows 5 regions with meaningfully different widths and 4 product categories with varying proportions; variation is present but not extreme
  • DQ-02: Realistic Context (5/5) — Market share by region and product line: real-world, business-neutral, comprehensible scenario
  • DQ-03: Appropriate Scale (4/4) — Revenue in billions with realistic values; region-size differences are plausible

Code Quality (9/10)

  • CQ-01: KISS Structure (3/3) — Linear flow: imports → data → compute widths → draw patches → configure axes → save
  • CQ-02: Reproducibility (2/2) — np.random.seed(42) set (data is deterministic, seed is defensive)
  • CQ-03: Clean Imports (2/2) — All five imports (mpatches, plt, np, pd, sns) are used
  • CQ-04: Code Elegance (2/2) — Clean, readable, no over-engineering; acknowledges seaborn's Marimekko limitation honestly in comment
  • CQ-05: Output & API (0/1) — Saves to plot.png instead of plot-{THEME}.png; no ANYPLOT_THEME env-var handling; sns.set_style()/sns.set_context() are used instead of the current recommended sns.set_theme(style="ticks", rc={...}) pattern with theme-adaptive tokens

Library Mastery (4/10)

  • LM-01: Idiomatic Usage (3/5) — Uses seaborn appropriately for what it can do (set_style, set_context, color_palette, despine); since Marimekko requires matplotlib patches seaborn usage is necessarily limited; however does not use the recommended sns.set_theme() with theme-adaptive rc dict pattern
  • LM-02: Distinctive Features (1/5) — Seaborn used only for styling utilities; the chart itself is entirely matplotlib patches since no native seaborn Marimekko function exists; minimal distinctive seaborn contribution

Score Caps Applied

  • None applied — DE-01=4, DE-02=4 both exceed cap threshold of ≤2

Strengths

  • Correct Marimekko implementation: variable-width bars and proportional stacking are both accurate
  • White segment borders between colored rectangles provide crisp visual separation
  • Descriptive axis labels with context (proportional symbol, percentage unit)
  • Clean code structure with honest comment about seaborn's limitation
  • Realistic, neutral market-share scenario with varied data across regions

Weaknesses

  • Critical: Code has no ANYPLOT_THEME environment variable handling — both plot-light.png and plot-dark.png cannot be produced from this code; sns.set_theme(style="ticks", rc={...}) with PAGE_BG, INK, INK_SOFT tokens must be added per the seaborn library guide
  • Critical: Saves to plot.png instead of plot-{THEME}.png — the pipeline requires plot-light.png and plot-dark.png
  • Critical: Title uses outdated "pyplots.ai" branding; must be "anyplot.ai"
  • Palette uses sns.color_palette("colorblind") — first color is blue (#0173b2), not the required brand green (#009E73); must use the explicit OKABE_ITO = ['#009E73', '#D55E00', '#0072B2', '#CC79A7'] list
  • Value labels and legend font size set to 14pt; should be 16pt per style guide
  • Grid uses dashed linestyle (linestyle="--"); style guide prefers solid thin lines
  • Design Excellence is low: no visual hierarchy or emphasis to guide the viewer to the key insight

Issues Found

  1. CQ-05 / THEME CRITICAL: No ANYPLOT_THEME support, saves to wrong filename
    • Fix: Add THEME = os.getenv("ANYPLOT_THEME", "light"), compute PAGE_BG/ELEVATED_BG/INK/INK_SOFT, use sns.set_theme(style="ticks", rc={...}), save as plt.savefig(f'plot-{THEME}.png', ...)
  2. SC-04 / VQ-07 BRANDING: Title has "pyplots.ai" — must be "anyplot.ai"
    • Fix: ax.set_title("marimekko-basic · seaborn · anyplot.ai", ...)
  3. VQ-07 PALETTE: Uses sns.color_palette("colorblind") — first color is blue, not #009E73
    • Fix: OKABE_ITO = ['#009E73', '#D55E00', '#0072B2', '#CC79A7']; colors = OKABE_ITO
  4. VQ-01 / LEGEND SIZE: Value labels and legend fontsize=14 are below the 16pt minimum
    • Fix: Set fontsize=16 for legend and value labels
  5. DE-03 LOW: No visual hierarchy or storytelling emphasis
    • Fix: Consider subtle size variation or annotation highlighting the largest market or most dominant product category
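The theme fix in issue 1 can be sketched in a few lines. This is a hypothetical sketch, not the repository's actual code: the two page backgrounds (#FAF8F1 / #1A1A17) come from the review's image descriptions, while the INK/INK_SOFT hex values stand in for the seaborn library guide's tokens and are assumptions.

```python
# Hypothetical sketch of ANYPLOT_THEME handling per the review.
# PAGE_BG values are from the review; INK/INK_SOFT are assumed tokens.
import os

import matplotlib.pyplot as plt
import seaborn as sns

THEME = os.getenv("ANYPLOT_THEME", "light")
PAGE_BG = "#FAF8F1" if THEME == "light" else "#1A1A17"
INK = "#1A1A17" if THEME == "light" else "#F0EEE6"       # assumed ink token
INK_SOFT = "#6B6A63" if THEME == "light" else "#A8A69E"  # assumed soft-ink token

# Theme-adaptive chrome via the currently recommended seaborn API
sns.set_theme(style="ticks", rc={
    "figure.facecolor": PAGE_BG,
    "axes.facecolor": PAGE_BG,
    "text.color": INK,
    "axes.labelcolor": INK,
    "xtick.color": INK_SOFT,
    "ytick.color": INK_SOFT,
})

# Explicit Okabe-Ito subset (not sns.color_palette("colorblind"),
# whose first color is blue #0173b2)
OKABE_ITO = ["#009E73", "#D55E00", "#0072B2", "#CC79A7"]

fig, ax = plt.subplots(figsize=(8, 5))
# ... draw the Marimekko patches here ...
fig.savefig(f"plot-{THEME}.png", dpi=100)
```

Run once with ANYPLOT_THEME=light and once with ANYPLOT_THEME=dark: the same script then emits plot-light.png and plot-dark.png with flipped chrome and unchanged data colors.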

AI Feedback for Next Attempt

Three critical fixes required before any design improvements: (1) Add full theme support — read ANYPLOT_THEME, compute token variables (PAGE_BG, ELEVATED_BG, INK, INK_SOFT), apply via sns.set_theme(style="ticks", rc={...}), and save as plot-{THEME}.png. (2) Update title to "anyplot.ai". (3) Switch palette to explicit OKABE_ITO = ['#009E73', '#D55E00', '#0072B2', '#CC79A7']. After those: raise fontsize for value labels and legend to 16pt, change grid linestyle from dashed to solid, and add subtle emphasis (e.g., slightly bolder border or annotation on the largest segment) to improve storytelling.

Verdict: REJECTED

@github-actions github-actions Bot added the quality:78 (Quality score 78/100) and ai-rejected (Quality not OK, triggers update) labels on Apr 27, 2026
@github-actions github-actions Bot added the ai-attempt-1 (First repair attempt) label and removed ai-rejected (Quality not OK, triggers update) on Apr 27, 2026
Attempt 1/3 - fixes based on AI review
@github-actions
Contributor Author

🔧 Repair Attempt 1/3

Applied fixes based on AI review feedback.

Status: Repair completed, re-triggering review...


🤖 impl-repair

@github-actions
Contributor Author

🔧 AI Review Produced No Score — Auto-Retrying

The Claude Code Action ran but didn't write quality_score.txt. Auto-retrying review once...


🤖 impl-review

@github-actions
Contributor Author

❌ AI Review Failed (auto-retry exhausted)

The AI review action completed but did not produce valid output files. Auto-retry already tried once.

What happened:

  • The Claude Code Action ran
  • No quality_score.txt file was created

Manual rerun:

gh workflow run impl-review.yml -f pr_number=5476

🤖 impl-review

@github-actions github-actions Bot added the ai-review-failed (AI review action failed or timed out) label on Apr 27, 2026
@github-actions
Contributor Author

❌ AI Review Failed (auto-retry exhausted)

The AI review action completed but did not produce valid output files. Auto-retry already tried once.

What happened:

  • The Claude Code Action ran
  • No quality_score.txt file was created

Manual rerun:

gh workflow run impl-review.yml -f pr_number=5476

🤖 impl-review

MarkusNeusinger added a commit that referenced this pull request Apr 29, 2026
…ompts (#5520)

## Summary

Three workflows (`impl-review.yml`, `spec-create.yml`,
`report-validate.yml`) used shell-style `$VAR` inside `with: prompt: |`
blocks of `claude-code-action`. That block is a YAML string handed to a
Node/Bun action — **no shell ever runs**, so `$VAR` was sent to Claude
as a literal placeholder instead of the actual value. Result: Claude
couldn't reliably identify the PR / spec / library to review and
silently produced no `quality_score.txt`, which the validate step turns
into `ai-review-failed`.

## Symptoms observed today (2026-04-29)

5 stuck implementation PRs from 2026-04-27, all with `ai-review-failed`
despite the prior fixes branch (#5410) and the audit branch (#5515)
landing in between:

| PR | Branch | Pre-fix labels |
|----|--------|----------------|
| #5476 | seaborn/marimekko-basic | `ai-review-failed`, `quality:78` |
| #5480 | altair/marimekko-basic | `ai-review-failed`, `quality:82` |
| #5481 | letsplot/marimekko-basic | `ai-rejected`, `quality:76` |
| #5483 | plotnine/marimekko-basic | `ai-review-failed` |
| #5486 | plotly/line-basic | `ai-review-failed` |

Re-dispatching review on each confirmed the bug: the run log of `Run AI
Quality Review` shows the prompt being passed verbatim:

```
PROMPT: Read prompts/workflow-prompts/ai-quality-review.md and follow those instructions.

Variables for this run:
- LIBRARY: $LIBRARY    # ← literal, never expanded
- SPEC_ID: $SPEC_ID
- PR_NUMBER: $PR_NUMBER
- ATTEMPT: $ATTEMPT
```

Claude's review then either ran for ~20s and exited with no
`quality_score.txt` (4 PRs failed), or recovered by inferring values
from cwd (1 PR succeeded with `quality:82`). The intermittent pattern is
exactly what you'd expect from "the prompt is ambiguous and Claude has
to guess from context."

## Root cause

Commit `252977cf3` ("chore: fix critical audit findings", 2026-04-28
22:46) routed several `${{ github.event.* }}` and step-output values
through step-level `env:` and rewrote the in-prompt references as
`$VAR`. That is the correct mitigation for `run:` shell steps and Python
heredocs in the same workflows (and those changes stay in place). Inside
`with: prompt: |` it is the wrong tool: the value is consumed by a JS
action, not a shell, so there is no injection surface to mitigate and
`$VAR` does not interpolate.

`spec-create.yml` and `report-validate.yml` carry the identical
anti-pattern in their `prompt:` blocks. They haven't surfaced as
failures yet only because no triggering issue has come in since
2026-04-28.

## The fix

Revert **only** the descriptive header lines of each `prompt:` block
back to GitHub Actions Expression syntax (`${{ ... }}`), which the
runner substitutes into the YAML string before the action receives it.
Keep:

- All `env:` blocks (harmless; lets future prompt content reference env
vars if useful)
- All `$VAR` references inside **embedded bash code samples** in the
prompt (e.g. `gh issue edit $ISSUE_NUMBER`). Those are executed by
Claude's Bash tool which inherits the step `env:` and expands them
correctly — and rewriting them would re-enable the injection vector the
audit was right to close.

```diff
             Variables for this run:
-            - LIBRARY: $LIBRARY
-            - SPEC_ID: $SPEC_ID
-            - PR_NUMBER: $PR_NUMBER
-            - ATTEMPT: $ATTEMPT
+            - LIBRARY: ${{ steps.pr.outputs.library }}
+            - SPEC_ID: ${{ steps.pr.outputs.specification_id }}
+            - PR_NUMBER: ${{ steps.pr.outputs.pr_number }}
+            - ATTEMPT: ${{ steps.attempts.outputs.display }}
```

(analogous 8-line revert in `spec-create.yml` × 2 prompt blocks and
4-line revert in `report-validate.yml`).

Diff total: **3 files, 16 ±**.

## Test plan

- [ ] After merge, redispatch `impl-review.yml` for the 4 stuck PRs (`gh
workflow run impl-review.yml -f pr_number=<N>` for 5476, 5483, 5486;
5480 already got a 82 in the redispatch and should now stabilize)
- [ ] Verify each run's `Run AI Quality Review` step log shows real
values (e.g. `- LIBRARY: plotly`) in the PROMPT echo, not `$LIBRARY`
- [ ] Verify `quality_score.txt` is produced and `ai-review-failed`
label is removed
- [ ] On next `spec-request`-labeled issue, verify the spec-create
prompt sees the issue title/body
- [ ] On next `report-pending`-labeled issue, verify the report-validate
prompt sees the issue title/body

🤖 Generated with [Claude Code](https://claude.com/claude-code)
@claude
Contributor

claude Bot commented Apr 29, 2026

AI Review - Attempt 2/3

Image Description

Light render (plot-light.png): The plot displays a Marimekko (mosaic) chart on a warm off-white background (~#FAF8F1). Five variable-width vertical bars represent regions (North America, Europe, Asia Pacific, Latin America, Middle East) with widths proportional to their total revenue. Each bar is subdivided into four colored segments representing product lines: Electronics (green #009E73, bottom), Apparel (orange #D55E00), Food & Beverage (blue #0072B2), and Home Goods (pink #CC79A7, top). White borders separate segments cleanly. Dollar value labels (e.g., "$45B", "$48B") appear in bold white text on all segments above 12% height. The title "marimekko-basic · seaborn · anyplot.ai" is bold and dark at the top; below it sits an italic subtitle "Asia Pacific leads with $203B total revenue — Electronics is the top product line globally" in soft gray. Axis labels "Region (width ∝ total revenue)" (x) and "Product Mix (%)" (y) are clearly visible. A horizontal legend box sits below the chart. All text is readable against the light background.

Dark render (plot-dark.png): The same chart on a warm near-black background (~#1A1A17). Title, subtitle, axis labels, and tick labels all render in light-colored text, clearly visible against the dark surface. The elevated-dark legend box has dark fill with light text. All four Okabe-Ito data colors (Electronics=#009E73, Apparel=#D55E00, Food&Bev=#0072B2, HomeGoods=#CC79A7) are identical to the light render — only chrome flips. White segment edges remain effective on the dark background. No dark-on-dark failures observed. All text is readable in dark theme.

Both paragraphs required — both renders inspected.

Score: 86/100

| Category | Score | Max |
|----------|-------|-----|
| Visual Quality | 29 | 30 |
| Design Excellence | 13 | 20 |
| Spec Compliance | 15 | 15 |
| Data Quality | 15 | 15 |
| Code Quality | 10 | 10 |
| Library Mastery | 4 | 10 |
| Total | 86 | 100 |

Visual Quality (29/30)

  • VQ-01: Text Legibility (8/8) — All sizes explicitly set: title=24pt, axis labels=20pt, tick+value+legend labels=16pt; readable in both themes
  • VQ-02: No Overlap (5/6) — Mostly clean; "Latin America" and "Middle East" x-tick labels are close on their narrow bars (bars occupy ~11.5% and ~7.7% of width) — minor crowding but still legible
  • VQ-03: Element Visibility (6/6) — All segments clearly visible; white edges provide crisp separation
  • VQ-04: Color Accessibility (2/2) — Okabe-Ito palette is CVD-safe; white edges add additional contrast
  • VQ-05: Layout & Canvas (4/4) — Marimekko fills canvas well within the 50–80% target; balanced margins; compact legend at bottom
  • VQ-06: Axis Labels & Title (2/2) — "Region (width ∝ total revenue)" and "Product Mix (%)" are descriptive with units/context
  • VQ-07: Palette Compliance (2/2) — First series is #009E73; full Okabe-Ito order; light bg #FAF8F1 and dark bg #1A1A17 both correct; theme-adaptive chrome properly applied

Design Excellence (13/20)

  • DE-01: Aesthetic Sophistication (5/8) — Above a default configured library output: Okabe-Ito palette, italic subtitle, white segment edges, and proper text hierarchy. Does not yet reach "strong design" (6) — lacks exceptional sophistication such as regional total annotations on x-axis or visual emphasis on the dominant bar.
  • DE-02: Visual Refinement (4/6) — Top/right spines removed; solid y-axis-only grid at alpha=0.10; white edges polish segment boundaries; italic subtitle styling. Not all details are polished (e.g., no region totals on axis labels, no visual distinction for the key bar).
  • DE-03: Data Storytelling (4/6) — Subtitle explicitly calls out the main insight. Visual hierarchy is present (Asia Pacific bar is visibly widest). However, no further visual emphasis guides the viewer to the focal point — other bars are treated identically to the dominant one.

Spec Compliance (15/15)

  • SC-01: Plot Type (5/5) — Correct Marimekko: bar widths proportional to region totals, segment heights proportional to product share within each region
  • SC-02: Required Features (4/4) — Variable-width bars, stacked proportional segments, value labels on larger segments, color-coded legend all present
  • SC-03: Data Mapping (3/3) — Regions on x-axis with correct variable widths; product mix 0–100% on y-axis; all data visible
  • SC-04: Title & Legend (3/3) — Title format "marimekko-basic · seaborn · anyplot.ai" correct; "Product Line" legend with correct four labels

Data Quality (15/15)

  • DQ-01: Feature Coverage (6/6) — Five regions with meaningfully different widths (AP ~35% vs ME ~8%) and four product lines with varying proportional mixes across regions
  • DQ-02: Realistic Context (5/5) — Global market-share by region and product line: real-world, business-neutral, readily comprehensible scenario
  • DQ-03: Appropriate Scale (4/4) — Revenue in billions; Asia Pacific at $203B vs Middle East at $45B is geographically plausible

Code Quality (10/10)

  • CQ-01: KISS Structure (3/3) — Linear flow: imports → tokens → data → compute widths → draw patches → configure axes → save
  • CQ-02: Reproducibility (2/2) — np.random.seed(42) set; data is deterministic
  • CQ-03: Clean Imports (2/2) — All six imports (os, mpatches, plt, np, pd, sns) are actively used
  • CQ-04: Code Elegance (2/2) — Clean, readable; nested loops are appropriate for Marimekko geometry; comment about seaborn limitation is honest
  • CQ-05: Output & API (1/1) — Saves as plot-{THEME}.png; current sns.set_theme API with theme-adaptive rc dict

Library Mastery (4/10)

  • LM-01: Idiomatic Usage (3/5) — Uses seaborn's recommended sns.set_theme(rc={...}) pattern with full theme-adaptive tokens, and sns.despine(). Correct but majority of visualization work is matplotlib patches (seaborn has no native Marimekko).
  • LM-02: Distinctive Features (1/5) — Seaborn used only for theme/styling utilities. No seaborn plot functions (sns.barplot, sns.heatmap, etc.) are possible for this chart type. Minimal seaborn contribution beyond matplotlib.

Score Caps Applied

  • None

Strengths

  • All Attempt 1 regressions fixed: theme-adaptive chrome, plot-{THEME}.png output, correct Okabe-Ito palette starting at #009E73, correct branding, solid grid lines
  • Subtitle storytelling ("Asia Pacific leads with $203B") adds a clear narrative entry point
  • White segment borders between colored rectangles provide excellent visual separation
  • Perfect spec compliance and data quality: realistic market-share scenario with natural variation
  • Clean code with honest comment about seaborn's Marimekko limitation

Weaknesses

  • Design Excellence gap (13/20): DE-01 is 5/8 — needs visual hierarchy beyond the subtitle to guide viewer to the Asia Pacific focal point; consider adding regional total revenue to x-axis tick labels (e.g., "Asia Pacific\n$203B") so bar widths are self-describing; DE-03 is 4/6 — no visual emphasis differentiates the dominant bar
  • Library Mastery low (4/10): inherent to Marimekko requiring matplotlib patches — improve by maximizing seaborn-idiomatic pattern usage (e.g., use sns.set_palette(OKABE_ITO) before patching, expose any seaborn color utility that adds value)
  • "Latin America" / "Middle East" x-tick labels sit close on narrow bars — minor crowding risk

Issues Found

  1. DE-01/DE-03 LOW: No visual emphasis on the dominant Asia Pacific bar; all bars styled identically
    • Fix: Add regional totals to x-axis tick labels ("Asia Pacific\n$203B" etc.) — makes bar widths self-explaining and rewards the viewer for the chart's variable-width feature; optionally reduce alpha slightly on non-dominant bars to create a focal point
  2. LM-02 LOW: Only seaborn styling utilities used; actual chart is all matplotlib
    • Fix: Add sns.set_palette(OKABE_ITO) explicitly (shows awareness of seaborn's palette API); where feasible consider using sns.color_palette() to derive the patch colors rather than a raw list
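The tick-label fix from issue 1 might look like the following sketch. Only the Asia Pacific ($203B) and Middle East ($45B) totals are quoted from the review; the other totals and the bar-center positions are placeholders for values computed during bar layout:

```python
# Sketch: self-describing x-tick labels with regional totals, plus a
# bold label on the dominant region. Centers and most totals are
# placeholders, not the PR's computed values.
import matplotlib.pyplot as plt

regions = ["North America", "Europe", "Asia Pacific",
           "Latin America", "Middle East"]
totals_b = [135, 130, 203, 60, 45]          # $B; only 203 and 45 from review
centers = [0.12, 0.35, 0.62, 0.85, 0.96]    # placeholder bar centers

fig, ax = plt.subplots(figsize=(8, 5))
ax.set_xlim(0, 1)
ax.set_xticks(centers)
ax.set_xticklabels(
    [f"{r}\n${t}B" for r, t in zip(regions, totals_b)], fontsize=16)
for label in ax.get_xticklabels():
    if label.get_text().startswith("Asia Pacific"):
        label.set_fontweight("bold")  # focal-point emphasis
```

Multi-line tick labels make the variable-width encoding self-documenting, so the axis footnote carries less weight.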

AI Feedback for Next Attempt

This is a strong Attempt 2 — all critical Attempt 1 regressions are fixed and the implementation is technically solid. To reach 90+, focus on Design Excellence: (1) add regional total revenue to each x-axis tick label (e.g., multi-line "Asia Pacific\n$203B") so the variable-width encoding is self-documented and the chart reads independently without the axis footnote; (2) visually emphasize the Asia Pacific bar as the focal point — a bold tick label, a subtle brightness difference, or an annotation arrow — to complement the subtitle rather than leaving all bars identical; (3) optionally refine the legend: move it back to the right side (outside the axes) to reclaim vertical space and use the full canvas height for the Marimekko. Library Mastery (4/10) is structurally limited by seaborn's lack of a native Marimekko, but score it as high as possible by using sns.set_palette(OKABE_ITO) and sns.color_palette() to derive patch colors idiomatically.
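The Library Mastery suggestion — derive patch colors through seaborn's palette API rather than consuming a raw hex list directly — can be sketched as:

```python
# Sketch: register the Okabe-Ito subset through seaborn's palette API,
# per the LM-01/LM-02 suggestion. The hex list is from the review.
import seaborn as sns

OKABE_ITO = ["#009E73", "#D55E00", "#0072B2", "#CC79A7"]
sns.set_palette(OKABE_ITO)              # make it the session default
colors = sns.color_palette(OKABE_ITO)   # normalized RGB tuples for the patch loop
```

sns.color_palette accepts a list of hex strings and returns normalized RGB tuples, so the existing patch-drawing loop needs no changes.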

Verdict: REJECTED

@github-actions github-actions Bot added the quality:86 (Quality score 86/100) and ai-approved (Quality OK, ready for merge) labels on Apr 29, 2026
MarkusNeusinger added a commit that referenced this pull request Apr 29, 2026
)

## Summary

The 3 AI-approved implementation PRs from today (#5476, #5480, #5481)
all hit `gh pr merge` failures with `the base branch policy prohibits
the merge`. Root cause: the branch ruleset on `main` requires three
status checks (`Run Linting`, `Run Tests`, `Run Frontend Tests`) — and
impl-PRs created by `impl-generate.yml` never get those checks.

## Why CI doesn't run on impl-PRs

`impl-generate.yml` (and `impl-repair.yml`, `impl-review.yml`) push
commits to PR branches using `GITHUB_TOKEN`. By GitHub's anti-recursion
design, pushes / PRs created with `GITHUB_TOKEN` do **not** trigger
downstream `pull_request` or `workflow_run` events. Verified across all
5 stuck PRs:

| PR | Branch | `Run Linting` ever ran? |
|----|--------|--------------------------|
| #5476 | seaborn/marimekko-basic | yes (once, on a 04-27 impl-repair commit; newer score commits invalidated it) |
| #5480 | altair/marimekko-basic | no |
| #5481 | letsplot/marimekko-basic | no |
| #5483 | plotnine/marimekko-basic | no |
| #5486 | plotly/line-basic | no |

So the merge is gated on a check that structurally cannot complete.

## The fix

Add `--admin` to the `gh pr merge` call inside `impl-merge.yml`. This
lets the pipeline complete autonomously without weakening main's
protection for human PRs.

```diff
+            # --admin bypasses the branch ruleset's required-status-check
+            # gate. Required because impl-generate.yml pushes via GITHUB_TOKEN,
+            # which by GitHub's anti-recursion design does not trigger
+            # downstream CI workflows (Run Linting / Run Tests / Run Frontend
+            # Tests), so impl PRs never get those checks. The pipeline already
+            # gates merge behind the AI quality review threshold.
             if gh pr merge "$PR_NUM" \
               --repo "$REPOSITORY" \
               --squash \
+              --admin \
               --delete-branch; then
```

The merge is still gated by:
- AI quality threshold (cascading 90 / 80 / 70 / 60 / 50 across initial
review + 4 repair attempts)
- `impl-merge.yml`'s own pre-merge "Validate PR completeness" step
- The label-based trigger requiring `ai-approved`

So `--admin` only bypasses the structurally-missing CI artifact, not the
substantive review gates.

## Considered alternative

Push from `impl-generate` / `impl-repair` / `impl-review` via a PAT
instead of `GITHUB_TOKEN` so CI triggers naturally. Cleaner long-term
but needs a maintained secret and a broader review of which workflows
touch which branches; deferred.

## Test plan

- [ ] After merge, dispatch `impl-merge.yml` (or trust the `ai-approved`
label trigger) for the 3 stuck approved PRs (#5476, #5480, #5481)
- [ ] Verify merge succeeds without retries on attempt 1
- [ ] Verify post-merge: metadata file created, GCS staging→production
promotion done, `impl:{library}:done` label on parent issue

🤖 Generated with [Claude Code](https://claude.com/claude-code)
MarkusNeusinger added a commit that referenced this pull request Apr 29, 2026
…et (#5523)

## Summary

Follow-up to #5521 (which added `--admin` to `gh pr merge`). That change
alone wasn't enough — verified just now: 3 dispatched merges (#5476,
#5480, #5481) all failed identically with:

```
GraphQL: Repository rule violations found
3 of 3 required status checks are expected.
(mergePullRequest)
```

## Why --admin alone didn't work

The `main` ruleset's bypass list contains only `RepositoryRole admin`
(mode: `pull_request`). Default `GITHUB_TOKEN` runs as
`github-actions[bot]` with `write` role — not admin — so the API rejects
the bypass.

```bash
gh api repos/MarkusNeusinger/anyplot/rulesets/10578859 --jq '.bypass_actors'
# [{"actor_id":5,"actor_type":"RepositoryRole","bypass_mode":"pull_request"}]
```

## The fix

Route **only the merge step** through a repo-admin PAT (`ADMIN_TOKEN`).
All other steps in `impl-merge.yml` and the rest of the impl-* workflows
keep using `GITHUB_TOKEN`. Bypass scope is therefore exactly one step,
not the whole pipeline.

```diff
       - name: Merge PR to main (with retry)
         if: steps.check.outputs.should_run == 'true'
         env:
-          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          GH_TOKEN: ${{ secrets.ADMIN_TOKEN || secrets.GITHUB_TOKEN }}
           PR_NUM: ${{ steps.check.outputs.pr_number }}
           REPOSITORY: ${{ github.repository }}
+          HAS_ADMIN_TOKEN: ${{ secrets.ADMIN_TOKEN != '' }}
         run: |
+          if [ "$HAS_ADMIN_TOKEN" != "true" ]; then
+            echo "::warning::ADMIN_TOKEN secret is not set..."
+          fi
```

The fallback `secrets.ADMIN_TOKEN || secrets.GITHUB_TOKEN` and the
warning preserve the previous behavior if `ADMIN_TOKEN` isn't set yet —
workflow still runs, fails with the same ruleset error as before, but
the log says clearly what's missing instead of an opaque auth error.

## Required after merge

1. **Create PAT**: Settings → Developer settings → Personal access
tokens → Fine-grained
   - Repository: `anyplot`
   - Permissions:
     - Contents: Read+Write
     - Pull requests: Read+Write
     - Administration: Read+Write
     - Metadata: Read
2. **Set secret**: Settings → Secrets and variables → Actions → New
repository secret
   - Name: `ADMIN_TOKEN`
   - Value: the PAT

## Considered alternatives

| Option | Verdict |
|--------|---------|
| Add `github-actions[bot]` as bypass actor on ruleset | broader blast radius — *every* workflow run could bypass main |
| Remove the 3 required checks from ruleset | weakens protection for human PRs too |
| Push from impl-generate via PAT so CI triggers naturally | cleanest semantically but needs PAT in 3 workflows + same maintenance overhead |
| **Scope PAT to merge step only (this PR)** | smallest blast radius, matches the actual permission gap |

## Test plan

- [ ] Merge this PR
- [ ] Create the fine-grained PAT and add as `ADMIN_TOKEN` repo secret
- [ ] Re-dispatch `impl-merge.yml` for the 3 stuck approved PRs (#5476
seaborn, #5480 altair, #5481 letsplot)
- [ ] Verify each merges successfully on attempt 1 (no ruleset error in
run log)
- [ ] Verify metadata file created, GCS staging→production promotion
done, parent issue gets `impl:{library}:done` label

🤖 Generated with [Claude Code](https://claude.com/claude-code)
@MarkusNeusinger MarkusNeusinger merged commit 942f33b into main Apr 29, 2026
3 checks passed
@MarkusNeusinger MarkusNeusinger deleted the implementation/marimekko-basic/seaborn branch April 29, 2026 11:20

Labels

  • ai-approved — Quality OK, ready for merge
  • ai-attempt-1 — First repair attempt
  • ai-review-failed — AI review action failed or timed out
  • quality:78 — Quality score 78/100
  • quality:86 — Quality score 86/100
