feat(plotnine): implement strip-basic #5674

Merged
MarkusNeusinger merged 4 commits into main from implementation/strip-basic/plotnine
May 4, 2026

github-actions Bot commented May 4, 2026

Implementation: strip-basic - python/plotnine

Implements the python/plotnine version of strip-basic.

File: plots/strip-basic/implementations/python/plotnine.py

Parent Issue: #975


🤖 impl-generate workflow

claude Bot commented May 4, 2026

AI Review - Attempt 1/3

Image Description

Light render (plot-light.png): Warm off-white background (#FAF8F1) — correct theme surface. Title "strip-basic · plotnine · anyplot.ai" in dark ink, clearly readable. Y-axis "Response Time (seconds)" and x-axis "Treatment Group" both visible in dark ink; tick labels in softer INK_SOFT. Subtle horizontal and vertical grid lines; panel has a thin rectangular border on all 4 sides. Four groups ordered alphabetically: Drug A (teal/green #009E73), Drug B (orange #D55E00), Drug C (blue #0072B2), Placebo (pink #CC79A7). ~40 points per group with horizontal jitter (width=0.25) and alpha=0.65. Placebo sits notably higher (25–70 s range), Drug C lowest (13–35 s range). No legend shown (correct — x-axis labels the categories). All text is dark and clearly readable against the warm off-white background. Legibility verdict: PASS.

Dark render (plot-dark.png): Warm near-black background (#1A1A17) — correct dark theme surface. Title and axis labels flip to light text (#F0EFE8), tick labels to #B8B7B0 — all clearly readable. Grid lines adapt to a slightly brighter appearance but remain subtle. Data colors are identical to the light render (Drug A green, Drug B orange, Drug C blue, Placebo pink) — confirming Okabe-Ito positions 1–4 are theme-independent. Point positions and distributions are the same. Brand green #009E73 remains clearly visible on the dark surface. No dark-on-dark failures detected. Legibility verdict: PASS.

Score: 88/100

| Category | Score | Max |
| --- | --- | --- |
| Visual Quality | 30 | 30 |
| Design Excellence | 10 | 20 |
| Spec Compliance | 15 | 15 |
| Data Quality | 15 | 15 |
| Code Quality | 10 | 10 |
| Library Mastery | 8 | 10 |
| Total | 88 | 100 |
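
The total is the plain sum of the six category scores. A quick arithmetic check, with the values copied from the table above (the dictionary name is illustrative only):

```python
# Category scores from the Attempt 1 review table, stored as (score, max).
attempt_1 = {
    "Visual Quality": (30, 30),
    "Design Excellence": (10, 20),
    "Spec Compliance": (15, 15),
    "Data Quality": (15, 15),
    "Code Quality": (10, 10),
    "Library Mastery": (8, 10),
}

# Sum the awarded points and the available points separately.
total = sum(score for score, _ in attempt_1.values())
maximum = sum(cap for _, cap in attempt_1.values())

print(f"{total}/{maximum}")
```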

Visual Quality (30/30)

  • VQ-01: Text Legibility (8/8) — All font sizes explicitly set: title=24, labels=20, ticks=16. Readable in both themes.
  • VQ-02: No Overlap (6/6) — No overlapping text or data elements. Jitter spreads points cleanly.
  • VQ-03: Element Visibility (6/6) — size=4 with alpha=0.65 appropriate for ~40 points per group.
  • VQ-04: Color Accessibility (2/2) — Okabe-Ito is CVD-safe; alpha=0.65 provides good contrast.
  • VQ-05: Layout & Canvas (4/4) — 16:9 at dpi=300, canvas well-utilized, balanced margins.
  • VQ-06: Axis Labels & Title (2/2) — "Response Time (seconds)" with units; "Treatment Group" descriptive.
  • VQ-07: Palette Compliance (2/2) — First series (Drug A) is #009E73. Okabe-Ito positions 1–4 in order. Backgrounds #FAF8F1/#1A1A17. Chrome flips correctly between themes.
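
One way to picture the "chrome flips, data colors stay" behavior is a per-theme token table next to a single shared palette. This is a minimal sketch, not the merged implementation: the backgrounds, the dark-theme text colors, and the four data colors are the values quoted in this review, while the light-theme INK and INK_SOFT values and the names `TOKENS`, `PAGE_BG`, and `chrome_for` are assumptions made for illustration.

```python
# Chrome tokens per theme. Backgrounds and the dark-theme text colors are
# quoted in the review; the light-theme INK / INK_SOFT values are placeholders
# (assumptions), because the review does not state them.
TOKENS = {
    "light": {"PAGE_BG": "#FAF8F1", "INK": "#1A1A17", "INK_SOFT": "#4A4A44"},
    "dark": {"PAGE_BG": "#1A1A17", "INK": "#F0EFE8", "INK_SOFT": "#B8B7B0"},
}

# Data colors in the order listed in the review; shared by both themes.
OKABE_ITO = ["#009E73", "#D55E00", "#0072B2", "#CC79A7"]


def chrome_for(theme_name: str) -> dict:
    """Return the chrome tokens for a theme; data colors are not part of it."""
    return TOKENS[theme_name]


for name in ("light", "dark"):
    tokens = chrome_for(name)
    # The output file name follows the plot-{THEME}.png pattern from the review.
    print(name, tokens["PAGE_BG"], tokens["INK"], f"plot-{name}.png")
```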

Design Excellence (10/20)

  • DE-01: Aesthetic Sophistication (4/8) — Well-configured library default. Clean and correct but not exceptional; no emphasis technique (mean lines, ordering by effect) elevates it beyond standard output.
  • DE-02: Visual Refinement (4/6) — theme_minimal() applied, grid lines subtly set (alpha=0.20 major, 0.08 minor), legend removed. However, panel_border is not explicitly blanked — a full rectangular frame is visible in both renders. Style guide calls for an L-shaped frame.
  • DE-03: Data Storytelling (2/6) — Categories ordered alphabetically (Drug A, B, C, Placebo) rather than by effect size. The story — that drugs reduce response time vs. placebo — is present in the data but not surfaced visually. No mean reference lines or ordering to guide the narrative.

Spec Compliance (15/15)

  • SC-01: Plot Type (5/5) — Correct strip plot with individual points and horizontal jitter.
  • SC-02: Required Features (4/4) — position_jitter(width=0.25), alpha=0.65, categories on x-axis, continuous values on y-axis.
  • SC-03: Data Mapping (3/3) — Correct axes, all data visible.
  • SC-04: Title & Legend (3/3) — Title "strip-basic · plotnine · anyplot.ai" correct. No legend (appropriate).

Data Quality (15/15)

  • DQ-01: Feature Coverage (6/6) — Individual points, jitter, varying distributions, different group sizes shown.
  • DQ-02: Realistic Context (5/5) — Clinical drug trial (response times in seconds). Real-world, neutral.
  • DQ-03: Appropriate Scale (4/4) — Placebo mean=45 s, Drug C mean=25 s. Realistic drug efficacy differences.

Code Quality (10/10)

  • CQ-01: KISS Structure (3/3) — Linear: imports → tokens → data → plot → save. No functions/classes.
  • CQ-02: Reproducibility (2/2) — np.random.seed(42) and position_jitter(random_state=42) both set.
  • CQ-03: Clean Imports (2/2) — All imports used; no extraneous packages.
  • CQ-04: Code Elegance (2/2) — Clean, Pythonic, appropriate complexity, no fake UI.
  • CQ-05: Output & API (1/1) — Saves as plot-{THEME}.png. No deprecated API usage.
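
CQ-02 credits two separate seeds. The sketch below shows what the first one buys: re-seeding NumPy's global generator reproduces the same draws. The standard deviation and sample size are assumptions; only the seed and the mean of 45 come from the review. The `random_state=42` passed to `position_jitter` would play the same role for the horizontal offsets.

```python
import numpy as np


def draw_group(mean: float, sd: float, n: int) -> np.ndarray:
    """Draw n values from a normal distribution using NumPy's global RNG."""
    return np.random.normal(mean, sd, n)


np.random.seed(42)
first = draw_group(45, 10, 40)  # sd=10 and n=40 are illustrative assumptions

np.random.seed(42)
second = draw_group(45, 10, 40)

# Same seed, same call sequence: the two arrays match element for element.
print(np.array_equal(first, second))
```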

Library Mastery (8/10)

  • LM-01: Idiomatic Usage (5/5) — Fully idiomatic grammar of graphics: ggplot + geom_point + scale_color_manual + labs + theme layered with +. position_jitter() used correctly as a position object.
  • LM-02: Distinctive Features (3/5) — position_jitter(random_state=42) as a position modifier is distinctly plotnine/ggplot2. Overall approach is solid but straightforward — could be closely mirrored in letsplot or other grammar-of-graphics libraries without significant adaptation.

Score Caps Applied

  • None — all category-specific caps passed (DE-01=4>2, DE-02=4>2; no boring-cap trigger).

Strengths

  • Perfect visual quality: all font sizes explicitly set, no overlap, correct palette, both themes pass readability checks
  • Correct Okabe-Ito palette with first series (#009E73) for Drug A; chrome correctly adaptive across light and dark
  • Excellent data quality: realistic clinical trial context, plausible values, 38–45 points per group
  • Perfect code quality: KISS structure, np.random.seed(42) + position_jitter(random_state=42), clean imports
  • Smart legend suppression — x-axis labels make the legend redundant
  • Fully idiomatic plotnine grammar-of-graphics approach

Weaknesses

  • DE-03 LOW: Categories ordered alphabetically — reorder by mean response time descending (Placebo first as highest, Drug C last as lowest) so the viewer immediately reads the drug efficacy gradient from left to right
  • DE-01 MODERATE: Lacks an emphasis layer; adding per-group mean reference lines (geom_crossbar or a second geom_point layer for means) would push this from "well-configured default" to design-excellent
  • DE-02 MODERATE: panel_border not explicitly blanked — add panel_border=element_blank() to the theme call to remove the full rectangular frame and achieve the L-shaped (or borderless) spine style the guide recommends; also consider removing the vertical major grid lines since x-axis is categorical

Issues Found

  1. DE-03 LOW (2/6): Alphabetical category ordering hides the drug-efficacy story
    • Fix: Sort categories by mean response time descending so the viewer reads improvement from left to right (e.g., use pd.Categorical with explicit categories=["Placebo","Drug A","Drug B","Drug C"] order)
  2. DE-01 MODERATE (4/8): No visual emphasis layer to guide interpretation
    • Fix: Add a per-group mean marker (larger dot or crossbar) as a second geom layer to anchor the viewer's eye on the central tendency differences
  3. DE-02 MODERATE (4/6): Full rectangular panel border in both renders
    • Fix: Add panel_border=element_blank() to theme() call; optionally panel_grid_major_x=element_blank() to remove vertical grid lines on a categorical axis
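
The reordering in issue 1 can be sketched as follows. The means for Placebo (45) and Drug C (25) are stated in the review; assigning 32 to Drug A and 28 to Drug B is inferred from the suggested order, and in a real script the means would be computed from the data rather than hard-coded.

```python
import pandas as pd

# Group means as reported in the review (Drug A / Drug B values are inferred).
group_means = {"Drug A": 32, "Drug B": 28, "Drug C": 25, "Placebo": 45}

# Highest mean first, so the efficacy gradient reads left to right.
order = sorted(group_means, key=group_means.get, reverse=True)

# A tiny stand-in frame; the categorical dtype fixes the x-axis order.
df = pd.DataFrame({"group": ["Drug A", "Placebo", "Drug C", "Drug B"]})
df["group"] = pd.Categorical(df["group"], categories=order)

print(list(df["group"].cat.categories))
```

plotnine draws discrete axes in category order, so no further axis configuration should be needed once the column is categorical.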

AI Feedback for Next Attempt

Three design improvements are needed to reach 90+: (1) Reorder categories by mean response time — use pd.Categorical with explicit order ["Placebo","Drug A","Drug B","Drug C"] so the efficacy gradient reads left-to-right naturally; (2) Add per-group mean markers — a second geom_point layer with group means at size=8–10 and a contrasting outline, or geom_crossbar, to give viewers an anchor and clear visual hierarchy; (3) Remove panel border — add panel_border=element_blank() and panel_grid_major_x=element_blank() to achieve the L-shaped/clean spine style. These three changes would lift DE-03 from 2→5, DE-01 from 4→6, and DE-02 from 4→6, pushing the total well above 90.

Verdict: REJECTED

github-actions Bot added the quality:88 (Quality score: 88/100) and ai-rejected (Quality not OK, triggers update) labels May 4, 2026
github-actions Bot added the ai-attempt-1 (First repair attempt) label and removed the ai-rejected (Quality not OK, triggers update) label May 4, 2026
github-actions Bot commented May 4, 2026

🔧 Repair Attempt 1/4

Applied fixes based on AI review feedback.

Status: Repair completed, re-triggering review...


🤖 impl-repair

claude Bot commented May 4, 2026

AI Review - Attempt 2/3

Image Description

Light render (plot-light.png): The plot renders on a warm off-white (#FAF8F1) background. Four treatment groups (Placebo, Drug A, Drug B, Drug C) are displayed as jittered scatter columns. The Placebo group uses the brand green (#009E73), Drug A uses orange (#D55E00), Drug B uses blue (#0072B2), and Drug C uses reddish-purple (#CC79A7) — correct Okabe-Ito order. A thin horizontal mean reference line crosses each group at approximately the group mean, adding a helpful comparison anchor. The title "strip-basic · plotnine · anyplot.ai" appears at the top in dark text; axis labels "Treatment Group" (x) and "Response Time (seconds)" (y) are clearly readable. Tick labels on both axes are legible. The grid is subtle. No legend is shown (appropriate since x-axis already labels the groups). All text is readable against the light background — legibility PASS.

Dark render (plot-dark.png): The same plot renders on a near-black (#1A1A17) background. All four group colors are identical to the light render (green, orange, blue, pink-purple), confirming that only chrome flips. The title, axis labels, and tick labels are rendered in light text and are clearly readable against the dark background. The mean reference lines appear as light-colored horizontal marks. No dark-on-dark text failures observed. Grid lines are subtle and visible. Legibility PASS.

Both paragraphs are required. A review that only describes one render is invalid.

Score: 85/100

| Category | Score | Max |
| --- | --- | --- |
| Visual Quality | 28 | 30 |
| Design Excellence | 11 | 20 |
| Spec Compliance | 15 | 15 |
| Data Quality | 15 | 15 |
| Code Quality | 10 | 10 |
| Library Mastery | 6 | 10 |
| Total | 85 | 100 |

Visual Quality (28/30)

  • VQ-01: Text Legibility (7/8) — All sizes explicitly set (title=24, axis_title=20, axis_text=16). Both themes render text clearly. Minor: title not explicitly set bold/weight.
  • VQ-02: No Overlap (5/6) — Minor intra-group overlap expected for strip plots; jitter width=0.25 handles it well.
  • VQ-03: Element Visibility (6/6) — size=4, alpha=0.65 is well-adapted for 38–45 points per group.
  • VQ-04: Color Accessibility (2/2) — Okabe-Ito palette, CVD-safe, no red-green-only signal.
  • VQ-05: Layout & Canvas (4/4) — 16:9 format, balanced margins, nothing cut off.
  • VQ-06: Axis Labels & Title (2/2) — Y-axis has units "(seconds)", descriptive x-axis label.
  • VQ-07: Palette Compliance (2/2) — Placebo (first) = #009E73; multi-series follows Okabe-Ito order. Light bg = #FAF8F1, dark bg = #1A1A17. Chrome tokens (INK, INK_SOFT) properly thread through all text elements in both themes.

Design Excellence (11/20)

  • DE-01: Aesthetic Sophistication (5/8) — Above the default: custom Okabe-Ito via scale_color_manual, legend intentionally hidden (correct for this plot type), mean reference lines, alpha tuning. Not quite publication-ready — no typographic weight variation, no annotation, standard layout.
  • DE-02: Visual Refinement (3/6) — theme_minimal() removes top/right spines; grid uses alpha=0.20 (major) and alpha=0.08 (minor), which is nicely subtle. Missing: explicit axis_line or panel_border token — relies on theme_minimal defaults.
  • DE-03: Data Storytelling (3/6) — Descending means from Placebo → Drug C tell a clear story of treatment effectiveness. Mean lines help anchor the comparison. Color coding reinforces group identity. However, no focal point or additional emphasis element to guide the viewer immediately.

Spec Compliance (15/15)

  • SC-01: Plot Type (5/5) — Correct strip plot with horizontal jitter.
  • SC-02: Required Features (4/4) — Jitter width=0.25 (within spec 0.1–0.3); transparency alpha=0.65 (within spec 0.5–0.7); mean reference lines present per spec recommendation.
  • SC-03: Data Mapping (3/3) — Treatment groups on x, response time on y; all data within view.
  • SC-04: Title & Legend (3/3) — Title is "strip-basic · plotnine · anyplot.ai" exactly; no legend needed (categories are the x-axis labels).

Data Quality (15/15)

  • DQ-01: Feature Coverage (6/6) — Shows individual points, jitter spread, four groups with clearly different distributions and means.
  • DQ-02: Realistic Context (5/5) — Drug treatment response-time study is realistic, scientifically neutral, and comprehensible.
  • DQ-03: Appropriate Scale (4/4) — Response times 5–80 seconds with np.clip; 38–45 observations per group; different group means (45, 32, 28, 25) are plausible for a clinical context.
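
DQ-03 describes values clipped to 5–80 seconds with 38–45 observations per group. A minimal sketch of data generation along those lines follows; the group means are from the review, while the exact sizes and the standard deviation are assumptions.

```python
import numpy as np

np.random.seed(42)

# (mean, n) per group. Means come from the review; the sizes are picked from
# the stated 38-45 range and the standard deviation is an assumption.
GROUPS = {"Placebo": (45, 45), "Drug A": (32, 40), "Drug B": (28, 42), "Drug C": (25, 38)}
SD = 9.0

# np.clip pins any draw outside [5, 80] to the nearest bound.
samples = {
    name: np.clip(np.random.normal(mean, SD, n), 5, 80)
    for name, (mean, n) in GROUPS.items()
}

for name, values in samples.items():
    print(f"{name}: n={values.size}, min={values.min():.1f}, max={values.max():.1f}")
```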

Code Quality (10/10)

  • CQ-01: KISS Structure (3/3) — Linear: theme tokens → data generation → plot composition → save. No functions or classes.
  • CQ-02: Reproducibility (2/2) — np.random.seed(42) for data; position_jitter(random_state=42) for jitter.
  • CQ-03: Clean Imports (2/2) — All imported names from plotnine are used in the plot composition.
  • CQ-04: Code Elegance (2/2) — Clean ggplot grammar composition; no over-engineering; no fake UI.
  • CQ-05: Output & API (1/1) — Saves as plot-{THEME}.png with dpi=300.

Library Mastery (6/10)

  • LM-01: Idiomatic Usage (4/5) — Correct grammar-of-graphics composition; position_jitter via the + operator; scale_color_manual for categorical color; theme() for all chrome customization. Solid plotnine idiom.
  • LM-02: Distinctive Features (2/5) — position_jitter with random_state is plotnine-specific; Okabe-Ito via scale_color_manual is idiomatic. Beyond that, no particularly distinctive plotnine features (e.g., stat_summary, facet_wrap, or coord_flip) are leveraged.

Score Caps Applied

  • None. DE-01=5 > 2, DE-02=3 > 2 — cap condition not met.

Strengths

  • Perfect palette compliance: Okabe-Ito colors in canonical order, correct backgrounds, and INK/INK_SOFT tokens properly applied to all chrome elements in both themes
  • Reproducible output via double seeding (np.random.seed + position_jitter random_state)
  • Legend intentionally hidden — correct design decision for a strip plot where the x-axis already identifies groups
  • Mean reference lines add a useful comparison anchor beyond bare point clouds
  • Data tells a coherent clinical story (descending treatment effectiveness) with realistic proportions

Weaknesses

  • DE-02: No explicit axis_line or panel_border color token — relies on theme_minimal() defaults; for full dark-theme fidelity, axis lines should use INK_SOFT
  • DE-01/DE-03: No visual hierarchy beyond color — no size variation, no emphasis element to direct the viewer's eye to the key insight (Drug C has the fastest response)
  • LM-02: Only standard geom_point + position_jitter used; no plotnine-distinctive layer such as stat_summary for the mean line or facet_wrap for an alternative layout

Issues Found

  1. DE-02 LOW: Grid alpha at 0.20 is acceptable but style guide suggests ~0.10 for major grid and 0.05 for minor; axis lines from theme_minimal are not overridden with INK_SOFT token.
    • Fix: Add axis_line=element_line(color=INK_SOFT, size=0.5) and lower grid alpha to 0.10/0.05.
  2. LM-02 LOW: Mean reference lines visible in rendered output but the corresponding plotnine layer (e.g., stat_summary with geom_crossbar/geom_errorbar) is not explicit in the current code.
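    • Fix: Explicitly add + stat_summary(fun_y=np.mean, fun_ymin=np.mean, fun_ymax=np.mean, geom='crossbar', width=0.4, color=INK, size=0.3, fill='none') so the feature is intentional and idiomatic. plotnine's fun_y expects a callable rather than the string 'mean', and a crossbar also needs ymin and ymax.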

AI Feedback for Next Attempt

To push from 85 to 90+: (1) Make mean reference lines explicit with stat_summary — this is a distinctive plotnine feature and would raise LM-02; (2) add axis_line=element_line(color=INK_SOFT) to properly thread the INK_SOFT token through axis lines in both themes; (3) lower major grid alpha to ~0.10 per style guide. Optionally, add a subtle size or alpha gradient to emphasize the outlier points in the Placebo group for better DE-03 storytelling.

Verdict: APPROVED

github-actions Bot added the quality:85 (Quality score 85/100) and ai-approved (Quality OK, ready for merge) labels May 4, 2026
@MarkusNeusinger MarkusNeusinger merged commit 5335c66 into main May 4, 2026
3 checks passed
@MarkusNeusinger MarkusNeusinger deleted the implementation/strip-basic/plotnine branch May 4, 2026 21:13
