feat(plotnine): implement strip-basic by github-actions[bot] · Pull Request #5674 · MarkusNeusinger/anyplot

github-actions · 2026-05-04T20:54:18Z

Implementation: `strip-basic` - python/plotnine

Implements the python/plotnine version of strip-basic.

File: plots/strip-basic/implementations/python/plotnine.py

Parent Issue: #975

🤖 impl-generate workflow

claude · 2026-05-04T20:59:57Z

AI Review - Attempt 1/3

Image Description

Light render (plot-light.png): Warm off-white background (#FAF8F1) — correct theme surface. Title "strip-basic · plotnine · anyplot.ai" in dark ink, clearly readable. Y-axis "Response Time (seconds)" and x-axis "Treatment Group" both visible in dark ink; tick labels in softer INK_SOFT. Subtle horizontal and vertical grid lines; panel has a thin rectangular border on all 4 sides. Four groups ordered alphabetically: Drug A (teal/green #009E73), Drug B (orange #D55E00), Drug C (blue #0072B2), Placebo (pink #CC79A7). ~40 points per group with horizontal jitter (width=0.25) and alpha=0.65. Placebo sits notably higher (25–70 s range), Drug C lowest (13–35 s range). No legend shown (correct — x-axis labels the categories). All text is dark and clearly readable against the warm off-white background. Legibility verdict: PASS.

Dark render (plot-dark.png): Warm near-black background (#1A1A17) — correct dark theme surface. Title and axis labels flip to light text (#F0EFE8), tick labels to #B8B7B0 — all clearly readable. Grid lines adapt to a slightly brighter appearance but remain subtle. Data colors are identical to the light render (Drug A green, Drug B orange, Drug C blue, Placebo pink) — confirming Okabe-Ito positions 1–4 are theme-independent. Point positions and distributions are the same. Brand green #009E73 remains clearly visible on the dark surface. No dark-on-dark failures detected. Legibility verdict: PASS.

Score: 88/100

Category	Score	Max
Visual Quality	30	30
Design Excellence	10	20
Spec Compliance	15	15
Data Quality	15	15
Code Quality	10	10
Library Mastery	8	10
Total	88	100

Visual Quality (30/30)

VQ-01: Text Legibility (8/8) — All font sizes explicitly set: title=24, labels=20, ticks=16. Readable in both themes.
VQ-02: No Overlap (6/6) — No overlapping text or data elements. Jitter spreads points cleanly.
VQ-03: Element Visibility (6/6) — size=4 with alpha=0.65 appropriate for ~40 points per group.
VQ-04: Color Accessibility (2/2) — Okabe-Ito is CVD-safe; alpha=0.65 provides good contrast.
VQ-05: Layout & Canvas (4/4) — 16:9 at dpi=300, canvas well-utilized, balanced margins.
VQ-06: Axis Labels & Title (2/2) — "Response Time (seconds)" with units; "Treatment Group" descriptive.
VQ-07: Palette Compliance (2/2) — First series (Drug A) is #009E73. Okabe-Ito positions 1–4 in order. Backgrounds #FAF8F1/#1A1A17. Chrome flips correctly between themes.

Design Excellence (10/20)

DE-01: Aesthetic Sophistication (4/8) — Well-configured library default. Clean and correct but not exceptional; no emphasis technique (mean lines, ordering by effect) elevates it beyond standard output.
DE-02: Visual Refinement (4/6) — theme_minimal() applied, grid lines subtly set (alpha=0.20 major, 0.08 minor), legend removed. However, panel_border is not explicitly blanked — a full rectangular frame is visible in both renders. Style guide calls for an L-shaped frame.
DE-03: Data Storytelling (2/6) — Categories ordered alphabetically (Drug A, B, C, Placebo) rather than by effect size. The story — that drugs reduce response time vs. placebo — is present in the data but not surfaced visually. No mean reference lines or ordering to guide the narrative.

Spec Compliance (15/15)

SC-01: Plot Type (5/5) — Correct strip plot with individual points and horizontal jitter.
SC-02: Required Features (4/4) — position_jitter(width=0.25), alpha=0.65, categories on x-axis, continuous values on y-axis.
SC-03: Data Mapping (3/3) — Correct axes, all data visible.
SC-04: Title & Legend (3/3) — Title "strip-basic · plotnine · anyplot.ai" correct. No legend (appropriate).

Data Quality (15/15)

DQ-01: Feature Coverage (6/6) — Individual points, jitter, varying distributions, different group sizes shown.
DQ-02: Realistic Context (5/5) — Clinical drug trial (response times in seconds). Real-world, neutral.
DQ-03: Appropriate Scale (4/4) — Placebo mean=45 s, Drug C mean=25 s. Realistic drug efficacy differences.

Code Quality (10/10)

CQ-01: KISS Structure (3/3) — Linear: imports → tokens → data → plot → save. No functions/classes.
CQ-02: Reproducibility (2/2) — np.random.seed(42) and position_jitter(random_state=42) both set.
CQ-03: Clean Imports (2/2) — All imports used; no extraneous packages.
CQ-04: Code Elegance (2/2) — Clean, Pythonic, appropriate complexity, no fake UI.
CQ-05: Output & API (1/1) — Saves as plot-{THEME}.png. No deprecated API usage.
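
The double seeding noted in CQ-02 can be illustrated without rendering anything. The sketch below is a stand-in under stated assumptions: it mimics horizontal jitter with a NumPy `RandomState` instead of calling plotnine, `make_jitter` is a hypothetical helper that is not part of the reviewed file, and the normal-distribution parameters are illustrative only.

```python
import numpy as np


def make_jitter(n, width=0.25, random_state=42):
    """Hypothetical helper: uniform horizontal offsets in [-width, width]."""
    rng = np.random.RandomState(random_state)
    return rng.uniform(-width, width, size=n)


# Seeding the global generator makes the simulated data repeatable.
np.random.seed(42)
first = np.random.normal(loc=45, scale=10, size=40)
np.random.seed(42)
second = np.random.normal(loc=45, scale=10, size=40)

# A fixed random_state makes the jitter offsets repeatable as well.
same_data = np.array_equal(first, second)
same_jitter = np.array_equal(make_jitter(40), make_jitter(40))
```

Both flags should come out true, which is the property the reviewer rewards: two runs of the script place every point in the same position.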

Library Mastery (8/10)

LM-01: Idiomatic Usage (5/5) — Fully idiomatic grammar of graphics: ggplot + geom_point + scale_color_manual + labs + theme layered with +. position_jitter() used correctly as a position object.
LM-02: Distinctive Features (3/5) — position_jitter(random_state=42) as a position modifier is distinctly plotnine/ggplot2. Overall approach is solid but straightforward — could be closely mirrored in letsplot or other grammar-of-graphics libraries without significant adaptation.

Score Caps Applied

None — all category-specific caps passed (DE-01=4>2, DE-02=4>2; no boring-cap trigger).

Strengths

Perfect visual quality: all font sizes explicitly set, no overlap, correct palette, both themes pass readability checks
Correct Okabe-Ito palette with first series (#009E73) for Drug A; chrome correctly adaptive across light and dark
Excellent data quality: realistic clinical trial context, plausible values, 38–45 points per group
Perfect code quality: KISS structure, np.random.seed(42) + position_jitter(random_state=42), clean imports
Smart legend suppression — x-axis labels make the legend redundant
Fully idiomatic plotnine grammar-of-graphics approach

Weaknesses

DE-03 LOW: Categories ordered alphabetically — reorder by mean response time descending (Placebo first as highest, Drug C last as lowest) so the viewer immediately reads the drug efficacy gradient from left to right
DE-01 MODERATE: Lacks an emphasis layer; adding per-group mean reference lines (geom_crossbar or a second geom_point layer for means) would push this from "well-configured default" to design-excellent
DE-02 MODERATE: panel_border not explicitly blanked — add panel_border=element_blank() to the theme call to remove the full rectangular frame and achieve the L-shaped (or borderless) spine style the guide recommends; also consider removing the vertical major grid lines since x-axis is categorical

Issues Found

DE-03 LOW (2/6): Alphabetical category ordering hides the drug-efficacy story
- Fix: Sort categories by mean response time descending so the viewer reads improvement from left to right (e.g., use pd.Categorical with explicit categories=["Placebo","Drug A","Drug B","Drug C"] order)
DE-01 MODERATE (4/8): No visual emphasis layer to guide interpretation
- Fix: Add a per-group mean marker (larger dot or crossbar) as a second geom layer to anchor the viewer's eye on the central tendency differences
DE-02 MODERATE (4/6): Full rectangular panel border in both renders
- Fix: Add panel_border=element_blank() to theme() call; optionally panel_grid_major_x=element_blank() to remove vertical grid lines on a categorical axis
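
The DE-03 fix can be sketched as follows. The column names `group` and `response_time` and the toy values are assumptions for illustration, not the PR's dataset. The order is derived from the group means rather than hard-coded, which should give the same result as the explicit list in the fix.

```python
import pandas as pd

# Toy data (hypothetical values, not the PR's dataset).
df = pd.DataFrame({
    "group": ["Drug A", "Drug A", "Drug B", "Drug B",
              "Drug C", "Drug C", "Placebo", "Placebo"],
    "response_time": [30.0, 34.0, 27.0, 29.0, 24.0, 26.0, 44.0, 46.0],
})

# Derive the category order from the data: highest mean first.
order = (
    df.groupby("group")["response_time"]
    .mean()
    .sort_values(ascending=False)
    .index.tolist()
)

# An ordered categorical makes the x-axis follow this order
# instead of the default alphabetical one.
df["group"] = pd.Categorical(df["group"], categories=order, ordered=True)
```

Deriving the order keeps the plot correct if the simulated means change, at the cost of being less explicit than a literal list.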

AI Feedback for Next Attempt

Three design improvements are needed to reach 90+: (1) Reorder categories by mean response time — use pd.Categorical with explicit order ["Placebo","Drug A","Drug B","Drug C"] so the efficacy gradient reads left-to-right naturally; (2) Add per-group mean markers — a second geom_point layer with group means at size=8–10 and a contrasting outline, or geom_crossbar, to give viewers an anchor and clear visual hierarchy; (3) Remove panel border — add panel_border=element_blank() and panel_grid_major_x=element_blank() to achieve the L-shaped/clean spine style. These three changes would lift DE-03 from 2→5, DE-01 from 4→6, and DE-02 from 4→6, pushing the total well above 90.

Verdict: REJECTED

github-actions · 2026-05-04T21:06:43Z

🔧 Repair Attempt 1/4

Applied fixes based on AI review feedback.

Status: Repair completed, re-triggering review...

🤖 impl-repair

claude · 2026-05-04T21:11:54Z

AI Review - Attempt 2/3

Image Description

Light render (plot-light.png): The plot renders on a warm off-white (#FAF8F1) background. Four treatment groups (Placebo, Drug A, Drug B, Drug C) are displayed as jittered scatter columns. The Placebo group uses the brand green (#009E73), Drug A uses orange (#D55E00), Drug B uses blue (#0072B2), and Drug C uses reddish-purple (#CC79A7) — correct Okabe-Ito order. A thin horizontal mean reference line crosses each group at approximately the group mean, adding a helpful comparison anchor. The title "strip-basic · plotnine · anyplot.ai" appears at the top in dark text; axis labels "Treatment Group" (x) and "Response Time (seconds)" (y) are clearly readable. Tick labels on both axes are legible. The grid is subtle. No legend is shown (appropriate since x-axis already labels the groups). All text is readable against the light background — legibility PASS.

Dark render (plot-dark.png): The same plot renders on a near-black (#1A1A17) background. All four group colors are identical to the light render (green, orange, blue, pink-purple), confirming that only chrome flips. The title, axis labels, and tick labels are rendered in light text and are clearly readable against the dark background. The mean reference lines appear as light-colored horizontal marks. No dark-on-dark text failures observed. Grid lines are subtle and visible. Legibility PASS.

Score: 85/100

Category	Score	Max
Visual Quality	28	30
Design Excellence	11	20
Spec Compliance	15	15
Data Quality	15	15
Code Quality	10	10
Library Mastery	6	10
Total	85	100

Visual Quality (28/30)

VQ-01: Text Legibility (7/8) — All sizes explicitly set (title=24, axis_title=20, axis_text=16). Both themes render text clearly. Minor: title not explicitly set bold/weight.
VQ-02: No Overlap (5/6) — Minor intra-group overlap expected for strip plots; jitter width=0.25 handles it well.
VQ-03: Element Visibility (6/6) — size=4, alpha=0.65 is well-adapted for 38–45 points per group.
VQ-04: Color Accessibility (2/2) — Okabe-Ito palette, CVD-safe, no red-green-only signal.
VQ-05: Layout & Canvas (4/4) — 16:9 format, balanced margins, nothing cut off.
VQ-06: Axis Labels & Title (2/2) — Y-axis has units "(seconds)", descriptive x-axis label.
VQ-07: Palette Compliance (2/2) — Placebo (first) = #009E73; multi-series follows Okabe-Ito order. Light bg = #FAF8F1, dark bg = #1A1A17. Chrome tokens (INK, INK_SOFT) properly thread through all text elements in both themes.
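
The color shift between the two attempts follows from assigning palette slots by category position. A small sketch, in which `assign_colors` is a hypothetical helper and the hex values are the Okabe-Ito positions 1 to 4 listed in the review:

```python
# Okabe-Ito positions 1-4 as listed in the review.
OKABE_ITO = ["#009E73", "#D55E00", "#0072B2", "#CC79A7"]

alphabetical = ["Drug A", "Drug B", "Drug C", "Placebo"]  # attempt 1 order
by_mean = ["Placebo", "Drug A", "Drug B", "Drug C"]       # attempt 2 order


def assign_colors(categories):
    """Hypothetical helper: pair each category with the palette slot at its position."""
    return dict(zip(categories, OKABE_ITO))


attempt_1 = assign_colors(alphabetical)
attempt_2 = assign_colors(by_mean)
```

Under this scheme the brand green moves from Drug A to Placebo once the categories are reordered, which matches the two image descriptions.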

Design Excellence (11/20)

DE-01: Aesthetic Sophistication (5/8) — Above the default: custom Okabe-Ito via scale_color_manual, legend intentionally hidden (correct for this plot type), mean reference lines, alpha tuning. Not quite publication-ready — no typographic weight variation, no annotation, standard layout.
DE-02: Visual Refinement (3/6) — theme_minimal() removes top/right spines; grid uses alpha=0.20 (major) and alpha=0.08 (minor), which is nicely subtle. Missing: explicit axis_line or panel_border token — relies on theme_minimal defaults.
DE-03: Data Storytelling (3/6) — Descending means from Placebo → Drug C tell a clear story of treatment effectiveness. Mean lines help anchor the comparison. Color coding reinforces group identity. However, no focal point or additional emphasis element to guide the viewer immediately.

Spec Compliance (15/15)

SC-01: Plot Type (5/5) — Correct strip plot with horizontal jitter.
SC-02: Required Features (4/4) — Jitter width=0.25 (within spec 0.1–0.3); transparency alpha=0.65 (within spec 0.5–0.7); mean reference lines present per spec recommendation.
SC-03: Data Mapping (3/3) — Treatment groups on x, response time on y; all data within view.
SC-04: Title & Legend (3/3) — Title is "strip-basic · plotnine · anyplot.ai" exactly; no legend needed (categories are the x-axis labels).

Data Quality (15/15)

DQ-01: Feature Coverage (6/6) — Shows individual points, jitter spread, four groups with clearly different distributions and means.
DQ-02: Realistic Context (5/5) — Drug treatment response-time study is realistic, scientifically neutral, and comprehensible.
DQ-03: Appropriate Scale (4/4) — Response times 5–80 seconds with np.clip; 38–45 observations per group; different group means (45, 32, 28, 25) are plausible for a clinical context.
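
A data-generation sketch consistent with DQ-03. The group means and the 5 to 80 second clip come from the review; the standard deviations, the exact per-group sizes within 38 to 45, and the column names are assumptions.

```python
import numpy as np
import pandas as pd

np.random.seed(42)

# Means follow the review; sd and n are assumed values.
groups = {
    "Placebo": {"mean": 45, "sd": 10, "n": 45},
    "Drug A": {"mean": 32, "sd": 8, "n": 42},
    "Drug B": {"mean": 28, "sd": 7, "n": 40},
    "Drug C": {"mean": 25, "sd": 6, "n": 38},
}

frames = []
for name, spec in groups.items():
    values = np.random.normal(spec["mean"], spec["sd"], spec["n"])
    # Clip to the 5-80 second range mentioned in the review.
    values = np.clip(values, 5, 80)
    frames.append(pd.DataFrame({"group": name, "response_time": values}))

df = pd.concat(frames, ignore_index=True)
```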

Code Quality (10/10)

CQ-01: KISS Structure (3/3) — Linear: theme tokens → data generation → plot composition → save. No functions or classes.
CQ-02: Reproducibility (2/2) — np.random.seed(42) for data; position_jitter(random_state=42) for jitter.
CQ-03: Clean Imports (2/2) — All imported names from plotnine are used in the plot composition.
CQ-04: Code Elegance (2/2) — Clean ggplot grammar composition; no over-engineering; no fake UI.
CQ-05: Output & API (1/1) — Saves as plot-{THEME}.png with dpi=300.
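
The "theme tokens" step and the `plot-{THEME}.png` naming can be sketched in plain Python. The backgrounds and the dark-theme text colors are taken from the image descriptions; the light-theme `INK` and `INK_SOFT` values are placeholders because the review does not state them, and `theme_tokens`, `output_name`, and the key `PAGE_BG` are hypothetical names.

```python
def theme_tokens(theme):
    """Hypothetical helper returning chrome tokens for 'light' or 'dark'."""
    if theme == "dark":
        return {
            "PAGE_BG": "#1A1A17",
            "INK": "#F0EFE8",
            "INK_SOFT": "#B8B7B0",
        }
    return {
        "PAGE_BG": "#FAF8F1",
        # Placeholders: the review does not give the light-theme ink values.
        "INK": "#1A1A17",
        "INK_SOFT": "#4A4A44",
    }


def output_name(theme):
    """File name pattern reported under CQ-05."""
    return f"plot-{theme}.png"


light = theme_tokens("light")
dark = theme_tokens("dark")
```

Only these chrome tokens change between renders; the data palette stays fixed, which is what both image descriptions confirm.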

Library Mastery (6/10)

LM-01: Idiomatic Usage (4/5) — Correct grammar-of-graphics composition; position_jitter via the + operator; scale_color_manual for categorical color; theme() for all chrome customization. Solid plotnine idiom.
LM-02: Distinctive Features (2/5) — position_jitter with random_state is plotnine-specific; Okabe-Ito via scale_color_manual is idiomatic. Beyond that, no particularly distinctive plotnine features (e.g., stat_summary, facet_wrap, or coord_flip) are leveraged.

Score Caps Applied

None. DE-01=5 > 2, DE-02=3 > 2 — cap condition not met.

Strengths

Perfect palette compliance: Okabe-Ito colors in canonical order, correct backgrounds, and INK/INK_SOFT tokens properly applied to all chrome elements in both themes
Reproducible output via double seeding (np.random.seed + position_jitter random_state)
Legend intentionally hidden — correct design decision for a strip plot where the x-axis already identifies groups
Mean reference lines add a useful comparison anchor beyond bare point clouds
Data tells a coherent clinical story (descending treatment effectiveness) with realistic proportions

Weaknesses

DE-02: No explicit axis_line or panel_border color token — relies on theme_minimal() defaults; for full dark-theme fidelity, axis lines should use INK_SOFT
DE-01/DE-03: No visual hierarchy beyond color — no size variation, no emphasis element to direct the viewer's eye to the key insight (Drug C has the fastest response)
LM-02: Only standard geom_point + position_jitter used; no plotnine-distinctive layer such as stat_summary for the mean line or facet_wrap for an alternative layout

Issues Found

DE-02 LOW: Grid alpha at 0.20 is acceptable but style guide suggests ~0.10 for major grid and 0.05 for minor; axis lines from theme_minimal are not overridden with INK_SOFT token.
- Fix: Add axis_line=element_line(color=INK_SOFT, size=0.5) and lower grid alpha to 0.10/0.05.
LM-02 LOW: Mean reference lines visible in rendered output but the corresponding plotnine layer (e.g., stat_summary with geom_crossbar/geom_errorbar) is not explicit in the current code.
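- Fix: Explicitly add + stat_summary(fun_y=np.mean, fun_ymin=np.mean, fun_ymax=np.mean, geom='crossbar', width=0.4, color=INK, size=0.3) so the feature is intentional and idiomatic. plotnine expects callables for fun_y, and the crossbar geom also requires ymin and ymax, so all three are set to the mean to draw a single horizontal line.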

AI Feedback for Next Attempt

To push from 85 to 90+: (1) Make mean reference lines explicit with stat_summary — this is a distinctive plotnine feature and would raise LM-02; (2) add axis_line=element_line(color=INK_SOFT) to properly thread the INK_SOFT token through axis lines in both themes; (3) lower major grid alpha to ~0.10 per style guide. Optionally, add a subtle size or alpha gradient to emphasize the outlier points in the Placebo group for better DE-03 storytelling.

Verdict: APPROVED

chore(plotnine): add metadata for strip-basic

593671b

github-actions Bot mentioned this pull request May 4, 2026

[strip-basic] Basic Strip Plot (Jittered Points) #975

Closed

github-actions Bot added the quality:88 (Quality score: 88/100) and ai-rejected (Quality not OK, triggers update) labels on May 4, 2026

chore(plotnine): update quality score 88 and review feedback for stri…

6e13ddc

…p-basic

github-actions Bot added the ai-attempt-1 (First repair attempt) label and removed the ai-rejected (Quality not OK, triggers update) label on May 4, 2026

github-actions Bot added the quality:85 (Quality score 85/100) and ai-approved (Quality OK, ready for merge) labels on May 4, 2026

github-actions Bot and others added 2 commits May 4, 2026 21:13

chore(plotnine): update quality score 85 and review feedback for stri…

2ddd229

…p-basic

Merge branch 'main' into implementation/strip-basic/plotnine

8cd2518

MarkusNeusinger merged commit 5335c66 into main May 4, 2026
3 checks passed

MarkusNeusinger deleted the implementation/strip-basic/plotnine branch May 4, 2026 21:13

MarkusNeusinger merged 4 commits into main from implementation/strip-basic/plotnine
