
feat(bokeh): implement errorbar-basic #5386

Merged
github-actions[bot] merged 5 commits into main from implementation/errorbar-basic/bokeh on Apr 25, 2026

Conversation

@github-actions
Contributor

Implementation: errorbar-basic - python/bokeh

Implements the python/bokeh version of errorbar-basic.

File: plots/errorbar-basic/implementations/python/bokeh.py

Parent Issue: #973


🤖 impl-generate workflow

@claude
Contributor

claude Bot commented Apr 25, 2026

AI Review - Attempt 1/3

Image Description

Light render (plot-light.png): The plot shows six experimental groups (Control, Treatment A–E) on a warm off-white #FAF8F1 background. Data is rendered in brand green #009E73 throughout: circular markers (size 28) with a light-colored border, and Whisker error bars with TeeHead caps (line_width=5, TeeHead size=40). The asymmetric errors are clearly visible — Treatment C notably has a large lower error bar extending to ~29. The y-axis spans 0–56 with only a subtle y-axis grid (10% alpha), leaving roughly 40% of the vertical chart area empty below the data (which starts at ~23). All text is readable: title at top-left in dark ink, axis labels and tick labels in INK_SOFT. No overlap. Legibility verdict: PASS.

Dark render (plot-dark.png): The same plot on a warm near-black #1A1A17 background. The brand green #009E73 data color is identical to the light render — all six groups and error bars use the same green. Chrome flips correctly: title appears in off-white (#F0EFE8), axis labels and tick labels in #B8B7B0. The subtle y-grid remains visible at low opacity. The marker border now uses PAGE_BG = #1A1A17, appearing as a dark ring — this blends into the background slightly but the marker is still distinguishable. No dark-on-dark text issues observed. Legibility verdict: PASS.


Score: 84/100

Category            Score  Max
Visual Quality         28   30
Design Excellence      10   20
Spec Compliance        15   15
Data Quality           14   15
Code Quality           10   10
Library Mastery         7   10
Total                  84  100

Visual Quality (28/30)

  • VQ-01: Text Legibility (8/8) — All sizes explicitly set (36pt title, 32pt axis labels, 24pt tick labels); perfectly readable in both renders
  • VQ-02: No Overlap (6/6) — Six categories well-spaced, no text collisions
  • VQ-03: Element Visibility (6/6) — Markers (size=28) and Whisker bars (line_width=5, TeeHead size=40) are prominent and clear
  • VQ-04: Color Accessibility (2/2) — CVD-safe Okabe-Ito green throughout
  • VQ-05: Layout & Canvas (2/4) — y-axis starts at 0 but data begins at ~23; ~40% of vertical chart area is empty dead space. Right margin also generous. Internal space could be reclaimed by setting y_range.start closer to data minimum.
  • VQ-06: Axis Labels & Title (2/2) — "Experimental Group" and "Response Value (units)" — descriptive with units
  • VQ-07: Palette Compliance (2/2) — #009E73 as sole series color; #FAF8F1 / #1A1A17 backgrounds correct; full theme-adaptive chrome via INK/INK_SOFT tokens

Design Excellence (10/20)

  • DE-01: Aesthetic Sophistication (4/8) — Clean and correct but not exceptional; single brand-green palette with no color variation across groups; looks like a well-configured library default
  • DE-02: Visual Refinement (4/6) — X-grid removed, y-grid at 10% alpha, minor ticks removed, outline_line_color=None removes figure border; good refinements. Top/right spines absent (Bokeh default), left/bottom axes styled with INK_SOFT tokens.
  • DE-03: Data Storytelling (2/6) — Data is displayed clearly but no visual hierarchy or emphasis; asymmetric errors are present but no focal point guides the viewer (e.g., Treatment D as maximum or Treatment C as highest variability)

Spec Compliance (15/15)

  • SC-01: Plot Type (5/5) — Correct error bar plot with central markers and bidirectional error bars
  • SC-02: Required Features (4/4) — Error bars with TeeHead caps ✓, consistent error bar widths ✓, asymmetric errors demonstrated ✓, categorical x-axis ✓
  • SC-03: Data Mapping (3/3) — Categorical x, numeric y (means), asymmetric upper/lower computed correctly from source arrays
  • SC-04: Title & Legend (3/3) — "errorbar-basic · bokeh · anyplot.ai" ✓; no legend appropriate for single series

Data Quality (14/15)

  • DQ-01: Feature Coverage (5/6) — Shows error bars with caps, mean markers, asymmetric errors, 6-group variety, and realistic error magnitude variation. Missing: multi-series comparison (optional per spec, but would add coverage)
  • DQ-02: Realistic Context (5/5) — Experimental groups (Control, Treatment A–E) with Response Value — scientific research context, neutral and plausible
  • DQ-03: Appropriate Scale (4/4) — Values 25–48 with errors 2–7 — realistic and well-proportioned

Code Quality (10/10)

  • CQ-01: KISS Structure (3/3) — Flat script: imports → tokens → data → figure → style → save
  • CQ-02: Reproducibility (2/2) — np.random.seed(42) set; data is effectively hardcoded (deterministic)
  • CQ-03: Clean Imports (2/2) — All imports used; minimal and appropriate
  • CQ-04: Code Elegance (2/2) — Clean, Pythonic, appropriate complexity; no fake functionality
  • CQ-05: Output & API (1/1) — plot-{THEME}.png + plot-{THEME}.html both saved; current Bokeh API

Library Mastery (7/10)

  • LM-01: Idiomatic Usage (4/5) — ColumnDataSource used throughout, categorical x_range correctly defined, Whisker annotation is the idiomatic Bokeh pattern for error bars; toolbar_location=None for clean export
  • LM-02: Distinctive Features (3/5) — Whisker + TeeHead is a genuinely Bokeh-specific combination (not easily replicated in other libraries without custom drawing); uses Bokeh's annotation system appropriately

Score Caps Applied

  • None

Strengths

  • Full theme-adaptive chrome: every text, grid, and axis element uses INK/INK_SOFT tokens with correct light/dark values
  • Idiomatic use of Bokeh's Whisker + TeeHead annotation for capped error bars — the right tool for this task
  • Asymmetric error bars demonstrated well, showing distributional awareness
  • Perfectly clean code structure with all sizing explicit and above minimums

Weaknesses

  • Y-axis starts at 0 when data begins at ~23 — set y_range.start to something like float(min(lower) * 0.85) to eliminate the ~40% dead vertical space
  • No design differentiation across the six groups — consider varying marker shapes or using a subtle color gradient across groups using Okabe-Ito positions to add visual interest
  • No visual focal point or storytelling — Treatment D is the maximum and Treatment C has the highest variability (large lower error), but neither is highlighted

Issues Found

  1. VQ-05 LOW: Y-axis starts at 0 but minimum data value (with lower error) is ~23.2 — creates large empty space in bottom 40% of chart.
    • Fix: p.y_range.start = float(min(lower) * 0.85) to reclaim vertical space
  2. DE-01 MODERATE: Single-color design with no variation across groups. All six groups rendered identically in green.
    • Fix: Use Okabe-Ito multi-color (one color per group) to distinguish groups visually and add sophistication
  3. DE-03 LOW: No visual hierarchy. Treatment D (highest mean) and Treatment C (highest lower-error) are hidden in the crowd.
    • Fix: Assign distinct Okabe-Ito colors per group to naturally create visual hierarchy; or add a subtle annotation at the max/min point
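
The y-range fix in issue 1 can be sketched as follows. This is a minimal illustration, not the code in bokeh.py: the means and error values are hypothetical, chosen only so that the lowest error-bar end sits near 23.2 as described in the review, and the 5% headroom at the top is an assumption.

```python
# Hypothetical values for illustration only; not the data from bokeh.py.
means = [25.0, 32.0, 38.0, 30.0, 48.0, 41.0]
err_lower = [1.8, 3.0, 2.5, 6.5, 4.0, 3.2]
err_upper = [2.2, 3.5, 4.0, 2.8, 5.0, 3.0]

# Absolute positions of the lower and upper ends of each error bar.
lower = [m - e for m, e in zip(means, err_lower)]
upper = [m + e for m, e in zip(means, err_upper)]

# Start the axis at 85% of the lowest error-bar end (the reviewer's
# suggestion) and leave a little headroom above the highest end.
y_start = float(min(lower) * 0.85)
y_end = float(max(upper) * 1.05)  # the 1.05 factor is an assumption

print(f"y-range: {y_start:.2f} to {y_end:.2f}")
```

With a Bokeh figure `p`, the computed bounds would then be applied with `p.y_range.start = y_start` and `p.y_range.end = y_end`.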

AI Feedback for Next Attempt

Three improvements will push this into the 90+ range: (1) Fix the y-range — set y_range.start to ~85% of the minimum lower bound to eliminate dead vertical space. (2) Apply Okabe-Ito colors per group (OKABE_ITO[0] through OKABE_ITO[5]) — this turns a monochrome plot into a visually rich multi-color one and significantly raises DE-01. (3) Add a legend (required once colors differ per group) and optionally a brief annotation calling out the group with highest variability (Treatment C). Together these address VQ-05, DE-01, and DE-03.
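
A per-group color scheme like the one suggested above could look roughly like this. It is a sketch under stated assumptions: the numeric values, the canvas size, and the column names (`group`, `mean`, `lower`, `upper`) are hypothetical, and the six hex codes are the ones named in the second review. The sizes (28, 5, 40) are the ones the first review reports.

```python
from bokeh.models import ColumnDataSource, TeeHead, Whisker
from bokeh.plotting import figure

# Colors as listed in the review's light-render description.
OKABE_ITO = ["#009E73", "#D55E00", "#0072B2", "#CC79A7", "#E69F00", "#56B4E9"]

# Hypothetical data for illustration only.
groups = ["Control", "Treatment A", "Treatment B",
          "Treatment C", "Treatment D", "Treatment E"]
means = [25.0, 32.0, 38.0, 30.0, 48.0, 41.0]
lower = [23.2, 29.0, 35.5, 23.5, 44.0, 37.8]
upper = [27.2, 35.5, 42.0, 32.8, 53.0, 44.0]

p = figure(x_range=groups, width=800, height=450, toolbar_location=None)

# One ColumnDataSource per group so each marker and whisker gets its own color.
for group, mean, lo, hi, color in zip(groups, means, lower, upper, OKABE_ITO):
    source = ColumnDataSource(
        data={"group": [group], "mean": [mean], "lower": [lo], "upper": [hi]}
    )
    whisker = Whisker(
        base="group", lower="lower", upper="upper", source=source,
        line_color=color, line_width=5,
        upper_head=TeeHead(size=40, line_color=color),
        lower_head=TeeHead(size=40, line_color=color),
    )
    p.add_layout(whisker)
    p.scatter(x="group", y="mean", source=source, size=28, color=color)
```

Because the categorical x-axis already names every group, the colors here act as visual identity rather than as a key that needs decoding.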

Verdict: REJECTED

@github-actions bot added labels quality:84 (Quality score 84/100) and ai-rejected (Quality not OK, triggers update) on Apr 25, 2026
@github-actions bot added label ai-attempt-1 (First repair attempt) and removed label ai-rejected (Quality not OK, triggers update) on Apr 25, 2026
@github-actions
Contributor Author

🔧 Repair Attempt 1/3

Applied fixes based on AI review feedback.

Status: Repair completed, re-triggering review...


🤖 impl-repair

@claude
Contributor

claude Bot commented Apr 25, 2026

AI Review - Attempt 2/3

Image Description

Light render (plot-light.png): The plot renders on a warm off-white #FAF8F1 background. Six experimental groups (Control, Treatment A–E) are shown as scatter points with error bars, each in its own Okabe-Ito color: Control in brand green #009E73, Treatment A in vermillion #D55E00, Treatment B in blue #0072B2, Treatment C in reddish purple #CC79A7, Treatment D in orange #E69F00, Treatment E in sky blue #56B4E9. Error bars use Bokeh's Whisker model with TeeHead caps visible at both ends. Asymmetric errors are clearly shown — Treatment C's error bar extends ~6.5 units downward vs ~2.8 upward. A "highest variability" italic annotation appears near Treatment C's lower extent. Title "errorbar-basic · bokeh · anyplot.ai" is in dark ink at top-left. Y-axis has a subtle grid; no x-grid. All text is clearly readable against the light background. Legibility verdict: PASS.

Dark render (plot-dark.png): The same plot on near-black #1A1A17 background. All six data colors are identical to the light render — the Okabe-Ito palette is consistent across themes as required. Title, axis labels ("Experimental Group", "Response Value (units)"), and tick labels all render in light tones (#F0EFE8 / #B8B7B0), clearly readable against the dark surface. No dark-on-dark failures observed. Grid lines and axis lines use theme-appropriate muted values. Legibility verdict: PASS.


Score: 88/100

Category            Score  Max
Visual Quality         28   30
Design Excellence      13   20
Spec Compliance        15   15
Data Quality           14   15
Code Quality           10   10
Library Mastery         8   10
Total                  88  100

Visual Quality (28/30)

  • VQ-01: Text Legibility (8/8) — Title 36pt, axis labels 32pt, ticks 24pt, all explicitly set; readable in both themes
  • VQ-02: No Overlap (6/6) — Six well-spaced categories, no overlapping labels or elements
  • VQ-03: Element Visibility (5/6) — Markers visible; Whisker line_width=5 and TeeHead size=40 render slightly thin at full 4800×2700; caps visible but not prominent
  • VQ-04: Color Accessibility (2/2) — Full Okabe-Ito palette, CVD-safe, no red-green sole signal
  • VQ-05: Layout & Canvas (3/4) — y-range trimmed with 15% padding; minor excess vertical space at top; horizontal spread appropriate for 6 categories
  • VQ-06: Axis Labels & Title (2/2) — "Experimental Group" and "Response Value (units)" both descriptive with unit qualifier
  • VQ-07: Palette Compliance (2/2) — First series #009E73 ✓; Okabe-Ito canonical order ✓; light #FAF8F1 / dark #1A1A17 backgrounds ✓; chrome adapts correctly in both renders ✓

Design Excellence (13/20)

  • DE-01: Aesthetic Sophistication (5/8) — Per-group color-coded error bars and asymmetric errors are above generic defaults; intentional color hierarchy adds polish; falls short of publication-ready sophistication
  • DE-02: Visual Refinement (4/6) — Y-grid only at 10% alpha, outline removed, minor ticks hidden, axis line colors customized; clear refinement beyond defaults
  • DE-03: Data Storytelling (4/6) — "highest variability" annotation actively guides the viewer; asymmetric errors signal skewed uncertainty; color coding gives each group a visual identity

Spec Compliance (15/15)

  • SC-01: Plot Type (5/5) — Error bar plot with TeeHead caps on both ends ✓
  • SC-02: Required Features (4/4) — Categorical x-axis, central value markers, caps, asymmetric errors, 6 groups ✓
  • SC-03: Data Mapping (3/3) — Experimental groups on x, response values on y, all data visible ✓
  • SC-04: Title & Legend (3/3) — "errorbar-basic · bokeh · anyplot.ai" exact ✓; legend omitted correctly (x-labels identify groups) ✓

Data Quality (14/15)

  • DQ-01: Feature Coverage (6/6) — Varying means, different error magnitudes, asymmetric errors, annotation highlight ✓
  • DQ-02: Realistic Context (4/5) — Clinical/lab trial context with control and treatment arms is plausible; "Response Value (units)" unit label is slightly generic
  • DQ-03: Appropriate Scale (4/4) — Means 25–48 with errors 2–6.5 realistic for biological measurements ✓

Code Quality (10/10)

  • CQ-01: KISS Structure (3/3) — Imports → Data → Plot → Style → Save; no functions or classes ✓
  • CQ-02: Reproducibility (2/2) — np.random.seed(42)
  • CQ-03: Clean Imports (2/2) — All imported symbols used ✓
  • CQ-04: Code Elegance (2/2) — Pythonic zip(..., strict=True), clean per-group loop, no over-engineering ✓
  • CQ-05: Output & API (1/1) — Saves plot-{THEME}.png and plot-{THEME}.html

Library Mastery (8/10)

  • LM-01: Idiomatic Usage (5/5) — Whisker + TeeHead for error bars, ColumnDataSource per group, Label layout annotation, categorical x_range — all idiomatic Bokeh ✓
  • LM-02: Distinctive Features (3/5) — Whisker/TeeHead and Label are Bokeh-native; per-group ColumnDataSource for individual coloring is a Bokeh-specific pattern; interactive features (HoverTool, tooltips) that distinguish Bokeh from static libraries are absent

Score Caps Applied

  • None — no cap conditions triggered

Strengths

  • Full theme adaptation: warm off-white / near-black backgrounds, all chrome (text, grid, axes) flip correctly — both renders pass readability checks
  • Idiomatic Bokeh Whisker+TeeHead error bars with per-group ColumnDataSource enabling individual colors — a Bokeh-distinctive pattern not easily replicated
  • Asymmetric errors (Treatment C: -6.5/+2.8) showcase a meaningful feature of error bar plots
  • "highest variability" annotation adds light data storytelling without crowding
  • Perfect code quality: seed set, KISS structure, all imports used, strict zip

Weaknesses

  • Error bar line_width=5 and TeeHead size=40 render noticeably thin at full 4800×2700 — increase to line_width=8–10 and TeeHead size=60+ for visual prominence
  • Bokeh's interactive strength (HoverTool showing exact mean and CI values) is unused — adding a tooltip would use a feature unique to Bokeh that no static library can match
  • DE-01 lacks the extra polish step: consider adding a reference line (e.g., control mean as a horizontal dashed baseline) or subtle y-axis minor styling to push beyond "well-configured defaults"
  • DQ-02 unit label "Response Value (units)" — use a concrete domain unit (e.g., "Tumor Volume (mm³)" or "Yield (mg/L)") to boost realistic context score

Issues Found

  1. LM-02 PARTIAL: HoverTool absent — Bokeh's key differentiator vs. static libraries is interactive tooltips
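    • Fix: Add HoverTool(tooltips=[("Group", "@categories"), ("Mean", "@means{0.0}"), ("Upper CI", "@upper{0.0}"), ("Lower CI", "@lower{0.0}")]) — this alone raises LM-02 to 4-5. (Bokeh's default tooltip formatter takes numeral-style patterns such as {0.0}; a printf-style pattern would be written {%0.1f} with formatters={"@means": "printf"}.)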
  2. VQ-03 THIN: Whisker lines and TeeHead caps slightly thin at full resolution
    • Fix: Increase line_width on Whisker from 5 → 9, TeeHead size from 40 → 70, scatter size from 28 → 32
  3. DQ-02 GENERIC UNIT: "Response Value (units)" is abstract
    • Fix: Replace with a concrete measurement unit appropriate to the clinical/lab context
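
The HoverTool fix in issue 1 might be wired up as in the sketch below. The column names follow the reviewer's suggestion (`categories`, `means`, `upper`, `lower`), but the data values are hypothetical, and a single shared ColumnDataSource is used for brevity where the reviewed script reportedly uses one per group. The `{0.0}` pattern is the numeral-style format understood by Bokeh's default tooltip formatter. Tooltips only take effect in the saved HTML output; the PNG export is static.

```python
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

# Hypothetical data for illustration only.
source = ColumnDataSource(data={
    "categories": ["Control", "Treatment A", "Treatment B"],
    "means": [25.0, 32.0, 38.0],
    "lower": [23.2, 29.0, 35.5],
    "upper": [27.2, 35.5, 42.0],
})

p = figure(x_range=list(source.data["categories"]),
           width=800, height=450, toolbar_location=None)
markers = p.scatter(x="categories", y="means", source=source,
                    size=32, color="#009E73")

# Attach the tooltip to the marker renderer only, so hovering over
# other glyphs or empty space shows nothing.
hover = HoverTool(
    renderers=[markers],
    tooltips=[
        ("Group", "@categories"),
        ("Mean", "@means{0.0}"),
        ("Upper CI", "@upper{0.0}"),
        ("Lower CI", "@lower{0.0}"),
    ],
)
p.add_tools(hover)
```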

AI Feedback for Next Attempt

Add a HoverTool with mean and CI tooltips — this is the single highest-leverage change (raises LM-02 and demonstrates what makes Bokeh unique). Increase Whisker line_width to 9 and TeeHead size to 70 for visual prominence at full canvas size. Replace "Response Value (units)" with a domain-specific unit label like "Cell Viability (%)" or "Enzyme Activity (U/mL)". Consider adding a horizontal reference line at the Control mean as a subtle visual anchor — this elevates DE-01 into publication territory without adding clutter.
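
The horizontal reference line at the Control mean could be drawn with Bokeh's Span annotation, as in this sketch. The Control mean of 25.0, the gray line color, and the line width are hypothetical placeholders; in the reviewed script the color would presumably come from the theme tokens such as INK_SOFT.

```python
from bokeh.models import Span
from bokeh.plotting import figure

# Hypothetical values for illustration only.
groups = ["Control", "Treatment A", "Treatment B"]
means = [25.0, 32.0, 38.0]
control_mean = means[0]

p = figure(x_range=groups, width=800, height=450, toolbar_location=None)
p.scatter(x=groups, y=means, size=32, color="#009E73")

# dimension="width" makes the span run horizontally across the plot
# at the given y location.
baseline = Span(
    location=control_mean,
    dimension="width",
    line_color="gray",   # placeholder; a theme token would go here
    line_dash="dashed",
    line_width=3,
    line_alpha=0.6,
)
p.add_layout(baseline)
```

Keeping the line dashed and partly transparent lets it act as an anchor for comparison without competing with the error bars.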

Verdict: REJECTED

@github-actions bot added labels quality:88 (Quality score: 88/100) and ai-approved (Quality OK, ready for merge) on Apr 25, 2026
@github-actions bot merged commit a87a80a into main on Apr 25, 2026
3 checks passed
@github-actions bot deleted the implementation/errorbar-basic/bokeh branch on April 25, 2026 09:17

Labels

ai-approved Quality OK, ready for merge ai-attempt-1 First repair attempt quality:84 Quality score 84/100 quality:88 Quality score: 88/100
