
feat(highcharts): implement shap-waterfall#5902

Merged
MarkusNeusinger merged 3 commits into main from implementation/shap-waterfall/highcharts
May 7, 2026

Conversation

@github-actions
Contributor

@github-actions github-actions Bot commented May 7, 2026

Implementation: shap-waterfall - python/highcharts

Implements the python/highcharts version of shap-waterfall.

File: plots/shap-waterfall/implementations/python/highcharts.py

Parent Issue: #5237


🤖 impl-generate workflow

@claude
Contributor

claude Bot commented May 7, 2026

AI Review - Attempt 1/3

Image Description

Light render (plot-light.png): The chart renders on a warm off-white #FAF8F1 background. A horizontal waterfall chart shows credit-default risk attribution: a long green (#009E73) baseline bar spans from 0 to 0.35, followed by 10 feature SHAP bars (orange #D55E00 for positive contributions, blue #0072B2 for negative), and a final green prediction bar at 0.20. Bars are ordered from largest-to-smallest absolute SHAP magnitude (Credit Score –0.180 at top, Savings Balance –0.020 at bottom). Data labels beside each bar show signed SHAP values. Two dashed reference lines mark the baseline (0.35) and prediction (0.20). Title "Credit Default Risk · shap-waterfall · highcharts · anyplot.ai" is dark and readable. Feature labels on the left y-axis, tick labels on the x-axis, and the y-axis title "Probability of Default" are all clearly readable. All text is legible against the light background. Legibility verdict: PASS — however the x-axis has 52 tick marks at 0.01 intervals (–0.02 to 0.50) which are dense, and the right ~30% of the chart is mostly empty since data peaks near 0.35 while the axis extends to 0.50.

Dark render (plot-dark.png): Identical layout on a warm near-black #1A1A17 background. The baseline and prediction bars remain #009E73 (unchanged from light — correct). Positive bars stay #D55E00 and negative bars #0072B2 — data colors are identical across themes. Title text renders in light cream #F0EFE8; feature labels and tick labels appear in light gray #B8B7B0; data labels beside bars are light-colored. No dark-on-dark failures detected. Grid lines are subtle (10% opacity white rule). Legibility verdict: PASS — all chrome (text, grid, background) flips correctly to dark-theme tokens; data colors are unchanged.

Both paragraphs are required. A review that only describes one render is invalid.

Score: 86/100

| Category | Score | Max |
|---|---|---|
| Visual Quality | 27 | 30 |
| Design Excellence | 13 | 20 |
| Spec Compliance | 14 | 15 |
| Data Quality | 15 | 15 |
| Code Quality | 9 | 10 |
| Library Mastery | 8 | 10 |
| Total | 86 | 100 |

Visual Quality (27/30)

  • VQ-01: Text Legibility (7/8) — all sizes explicitly set (title 28px, axis labels 22px, tick labels 18–20px, data labels 18px); reference line annotation labels are 16px (below the 18px minimum — raise to 18px)
  • VQ-02: No Overlap (5/6) — 52 x-axis ticks at 0.01 intervals are very dense; no true overlap at 4800px but crowded; widen tick interval to 0.05
  • VQ-03: Element Visibility (6/6) — all bars clearly visible, colors well-contrasted in both themes
  • VQ-04: Color Accessibility (2/2) — Okabe-Ito palette is CVD-safe; orange/blue distinction is clear without relying on hue alone
  • VQ-05: Layout & Canvas (3/4) — plot fills ~65% of canvas; right side has excessive empty space (axis extends to 0.50, data peaks near 0.35); reduce max to ~0.42
  • VQ-06: Axis Labels & Title (2/2) — "Probability of Default" and "Feature" are descriptive
  • VQ-07: Palette Compliance (2/2) — #009E73 baseline/prediction, #D55E00 positive SHAP, #0072B2 negative SHAP (Okabe-Ito order); #FAF8F1 / #1A1A17 backgrounds; all chrome theme-correct

Design Excellence (13/20)

  • DE-01: Aesthetic Sophistication (5/8) — semantic color mapping (green = anchor, orange = risk up, blue = risk down) is intentional and professional; slightly above well-configured default but not publication-ready
  • DE-02: Visual Refinement (4/6) — borderWidth: 0 on bars, subtle 10% opacity grid, generous explicit margins (340px left, 220px right), legend disabled; good refinement
  • DE-03: Data Storytelling (4/6) — features sorted by absolute SHAP magnitude creates clear visual hierarchy; semantic color coding tells the positive/negative story; reference lines anchor baseline vs prediction; lacks a subtitle or annotation explaining the credit-default context to a non-expert viewer

Spec Compliance (14/15)

  • SC-01: Plot Type (5/5) — native Highcharts waterfall series with inverted: true for horizontal orientation; connector lines present via lineWidth: 2
  • SC-02: Required Features (4/4) — features ordered by |SHAP|, cumulative waterfall stacking, signed color encoding, baseline and prediction reference lines with labels, numeric SHAP data labels, horizontal layout with features on y-axis
  • SC-03: Data Mapping (3/3) — x-axis is probability space, y-axis is features, data flows correctly from E[f(x)]=0.35 to f(x)=0.20
  • SC-04: Title & Legend (2/3) — title is Credit Default Risk · shap-waterfall · highcharts · anyplot.ai; spec requires shap-waterfall · highcharts · anyplot.ai (extra descriptive prefix deviates from format); legend correctly disabled for single series

Data Quality (15/15)

  • DQ-01: Feature Coverage (6/6) — shows both positive and negative SHAP values, wide range of magnitudes (±0.18 to ±0.02), baseline and prediction bars, covers all waterfall aspects
  • DQ-02: Realistic Context (5/5) — credit scoring loan application is a canonical SHAP use case; features (Credit Score, Debt-to-Income, Annual Income, Loan Amount, Employment Years, Payment History…) are authentic credit model features; neutral business domain
  • DQ-03: Appropriate Scale (4/4) — BASE_VALUE=0.35 (35% default probability), FINAL_VALUE=0.20; SHAP values sum correctly (0.35 + (–0.15) = 0.20 ✓); magnitudes realistic for a credit model
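
The additivity check behind DQ-03 can be sketched in a few lines of Python. Only the baseline, the prediction, and the Credit Score and Savings Balance values are stated outright in this review; the Debt-to-Income and Annual Income values are inferred from the order of the data labels, and the remaining six magnitudes are hypothetical placeholders chosen to match the reported signs, ordering, and total. They are not the values in highcharts.py.

```python
import math

BASE_VALUE = 0.35   # E[f(x)], as reported in the review
FINAL_VALUE = 0.20  # f(x), as reported in the review

# (feature, SHAP value), ordered by absolute magnitude.
shap_values = [
    ("Credit Score", -0.180),      # stated in the review
    ("Debt-to-Income", 0.150),     # inferred from data label order
    ("Annual Income", -0.120),     # inferred from data label order
    ("Loan Amount", 0.100),        # hypothetical
    ("Employment Years", -0.080),  # hypothetical
    ("Payment History", -0.060),   # hypothetical
    ("Open Accounts", 0.050),      # hypothetical
    ("Credit Inquiries", 0.040),   # hypothetical
    ("Credit Age", -0.030),        # hypothetical
    ("Savings Balance", -0.020),   # stated in the review
]

# SHAP additivity: baseline plus all contributions equals the prediction.
total = sum(value for _, value in shap_values)
prediction = BASE_VALUE + total
is_additive = math.isclose(prediction, FINAL_VALUE, abs_tol=1e-9)

# The spec also asks for features ordered by |SHAP|, largest first.
magnitudes = [abs(value) for _, value in shap_values]
is_sorted = magnitudes == sorted(magnitudes, reverse=True)

print(f"sum of SHAP values: {total:+.3f}")
print(f"additive: {is_additive}, sorted by |SHAP|: {is_sorted}")
```

Comparing with `math.isclose` rather than `==` avoids a false failure from floating-point rounding in the sum.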

Code Quality (9/10)

  • CQ-01: KISS Structure (2/3) — download_js() helper function defined; KISS requires flat Imports → Data → Plot → Save with no functions/classes
  • CQ-02: Reproducibility (2/2) — all data is hardcoded; fully deterministic
  • CQ-03: Clean Imports (2/2) — all imports are used
  • CQ-04: Code Elegance (2/2) — JSON + placeholder string-replace for JS functions is a pragmatic solution; CDP screenshot is the correct approach for full-resolution 4800×2700 capture; CDN fallback logic is well-structured
  • CQ-05: Output & API (1/1) — saves plot-{THEME}.png and plot-{THEME}.html; current Highcharts 11 API
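
The "JSON + placeholder string-replace" approach noted under CQ-04 could look roughly like the sketch below. The token name and the exact formatter body are assumptions made for illustration; the review only confirms that a placeholder is swapped for a JS function and that `Highcharts.numberFormat` is used in the data labels.

```python
import json

# Hypothetical placeholder token; any string that cannot occur in the data works.
FORMATTER_TOKEN = "__DATA_LABEL_FORMATTER__"

# JavaScript source for the data label formatter (illustrative body).
formatter_js = (
    "function () { return (this.y > 0 ? '+' : '') + "
    "Highcharts.numberFormat(this.y, 3); }"
)

options = {
    "plotOptions": {
        "waterfall": {
            "dataLabels": {"enabled": True, "formatter": FORMATTER_TOKEN}
        }
    }
}

# json.dumps can only emit the formatter as a quoted string, so the quoted
# token is replaced with raw JavaScript after serialization.
options_json = json.dumps(options)
options_js = options_json.replace(f'"{FORMATTER_TOKEN}"', formatter_js)

print(options_js)
```

The result is a JavaScript object literal rather than valid JSON, so it is meant to be embedded in the generated HTML, not parsed back with `json.loads`.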

Library Mastery (8/10)

  • LM-01: Idiomatic Usage (4/5) — native waterfall series type, isSum: true for summary bar, inverted: true, plotLines with labels, dataLabels with Highcharts.numberFormat — idiomatic and correct
  • LM-02: Distinctive Features (4/5) — isSum: true (Highcharts-specific waterfall feature for running totals), highcharts-more.js waterfall module, Page.captureScreenshot CDP command for exact-dimension PNG capture; these features are specific to Highcharts + Selenium combination
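
As a rough illustration of the `isSum: true` pattern credited under LM-01 and LM-02, the point list for a Highcharts waterfall series can be built like this. The sketch uses only three of the ten features, and the variable names are hypothetical; the colors and row labels are the ones named in this review.

```python
import json

POSITIVE = "#D55E00"  # SHAP value pushes the prediction up
NEGATIVE = "#0072B2"  # SHAP value pushes the prediction down
ANCHOR = "#009E73"    # baseline and prediction bars

BASE_VALUE = 0.35

# Subset of features for illustration only.
features = [
    ("Credit Score", -0.180),
    ("Debt-to-Income", 0.150),
    ("Annual Income", -0.120),
]

# First point is the baseline; each feature point is a signed delta.
points = [{"name": "E[f(x)] Baseline", "y": BASE_VALUE, "color": ANCHOR}]
for name, value in features:
    color = POSITIVE if value > 0 else NEGATIVE
    points.append({"name": name, "y": value, "color": color})

# The final point carries no y: isSum asks Highcharts for the running total.
points.append({"name": "f(x) Prediction", "isSum": True, "color": ANCHOR})

# lineWidth sets the width of the connector lines between bars.
series = [{"type": "waterfall", "data": points, "lineWidth": 2}]
print(json.dumps(series, indent=2))
```

Letting the library compute the total keeps the prediction bar consistent with the feature deltas instead of relying on a second hardcoded number.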

Score Caps Applied

  • None — no score caps triggered

Strengths

  • Native Highcharts waterfall series with isSum: true for the prediction bar — idiomatic and correct
  • Semantic Okabe-Ito color assignment (green = anchor bars, orange = positive SHAP, blue = negative SHAP) clearly communicates the plot's meaning
  • Fully theme-adaptive chrome — all text, grid, and background tokens flip correctly between light and dark without any dark-on-dark failures
  • Realistic credit-scoring domain with internally consistent SHAP values that sum to the correct prediction
  • CDP screenshot technique delivers precise 4800×2700 PNG without window-size tricks

Weaknesses

  • Title format must be shap-waterfall · highcharts · anyplot.ai — remove the "Credit Default Risk ·" prefix that precedes the spec-id
  • download_js() helper function violates KISS; inline the CDN download logic directly
  • X-axis has 52 tick marks at 0.01 intervals (–0.02 to 0.50) — replace with 0.05 tick intervals via tickInterval: 0.05 to reduce density
  • Y-axis max: 0.50 wastes ~30% of canvas width; reduce to ~0.42 to tighten the plot area
  • Reference line annotation font size is 16px; raise to 18px to meet minimum legibility standard
  • DE: no subtitle or contextual framing for non-expert viewers; consider adding a subtitle like "Single loan applicant — features sorted by |SHAP value|"

Issues Found

  1. SC-04 TITLE FORMAT: Credit Default Risk · shap-waterfall · highcharts · anyplot.ai should be shap-waterfall · highcharts · anyplot.ai
    • Fix: Remove the Credit Default Risk · prefix from the title text
  2. CQ-01 FUNCTION: download_js() helper function violates KISS structure
    • Fix: Inline the CDN download loop directly where highcharts_js and highcharts_more_js are assigned
  3. VQ-02 / VQ-05 AXIS: X-axis from –0.02 to 0.50 at 0.01 intervals creates 52 ticks and wastes right-side canvas
    • Fix: Set "tickInterval": 0.05 on yAxis (note: inverted chart, so probability is yAxis in Highcharts before inversion) and "max": 0.42
  4. VQ-01 ANNOTATION SIZE: Reference line labels at 16px are below the 18px floor
    • Fix: Change "fontSize": "16px" in plotLines label styles to "fontSize": "18px"
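
Fixes 1, 3, and 4 are plain option changes and might look like the following in the Python options dict. This is a sketch that assumes the options are built as a dict and serialized with `json.dumps`; the plot line `width` is a hypothetical value, and any keys not mentioned in the review are omitted.

```python
import json

chart_options = {
    "chart": {"type": "waterfall", "inverted": True},
    # Fix 1: title is exactly {spec-id} · {library} · anyplot.ai
    "title": {"text": "shap-waterfall · highcharts · anyplot.ai"},
    # With inverted: True the probability axis is still configured as yAxis.
    "yAxis": {
        "title": {"text": "Probability of Default"},
        "tickInterval": 0.05,  # Fix 3: fewer ticks
        "max": 0.42,           # Fix 3: less empty space on the right
        "plotLines": [
            {
                "value": 0.35,
                "dashStyle": "Dash",
                "width": 2,  # hypothetical
                # Fix 4: label raised from 16px to 18px
                "label": {"text": "Baseline 0.35", "style": {"fontSize": "18px"}},
            },
            {
                "value": 0.20,
                "dashStyle": "Dash",
                "width": 2,  # hypothetical
                "label": {"text": "Prediction 0.20", "style": {"fontSize": "18px"}},
            },
        ],
    },
}

# ensure_ascii=False keeps the middle dot in the title readable.
print(json.dumps(chart_options, indent=2, ensure_ascii=False))
```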

AI Feedback for Next Attempt

Fix four concrete issues: (1) title must be exactly shap-waterfall · highcharts · anyplot.ai; (2) inline the JS download loop — no helper functions; (3) set tickInterval: 0.05 and max: 0.42 on the probability axis to remove tick density and wasted right-side space; (4) raise plotLine annotation font size from 16px to 18px. For design improvement, consider adding a subtitle that frames the scenario (e.g., "Individual loan applicant — features ranked by |SHAP value|") and explore slightly larger bar heights (reduce groupPadding) to make the waterfall fill the vertical space more prominently.

Verdict: REJECTED

@github-actions github-actions Bot added the quality:86 (Quality score 86/100), ai-rejected (Quality not OK, triggers update), and ai-attempt-1 (First repair attempt) labels and removed the ai-rejected label May 7, 2026
@github-actions
Contributor Author

github-actions Bot commented May 7, 2026

🔧 Repair Attempt 1/4

Applied fixes based on AI review feedback.

Status: Repair completed, re-triggering review...


🤖 impl-repair

@claude
Contributor

claude Bot commented May 7, 2026

AI Review - Attempt 2/3

Image Description

Light render (plot-light.png): The chart renders on a warm off-white #FAF8F1 background. A horizontal SHAP waterfall chart is displayed with 12 rows: baseline bar (green #009E73), 10 feature bars (orange #D55E00 for positive SHAP, blue #0072B2 for negative SHAP), and a final prediction bar (green #009E73). Feature names are shown on the left y-axis (E[f(x)] Baseline, Credit Score, Debt-to-Income, Annual Income, Loan Amount, Employment Years, Payment History, Open Accounts, Credit Inquiries, Credit Age, Savings Balance, f(x) Prediction). The x-axis is labeled "Probability of Default" with tick values from -0.05 to 0.45. SHAP value labels (e.g., -0.180, +0.150, -0.120) are displayed beside each bar. Two reference lines mark Baseline 0.35 (dashed) and Prediction 0.20 (dotted). All title, axis label, tick label, and data label text is clearly readable as dark ink on the light surface. Legibility verdict: PASS.

Dark render (plot-dark.png): The same chart renders on a warm near-black #1A1A17 background. Layout, data, and structure are identical to the light render. Data bar colors are unchanged — green baseline/prediction, orange positive, blue negative — confirming Okabe-Ito positions 1–3 are theme-invariant. Chrome elements flip correctly: feature labels appear as light gray #B8B7B0 against the dark background, axis titles are light #F0EFE8, and grid lines are very subtle. The reference line labels ("Prediction 0.20" in brand green, "Baseline 0.35" in soft gray) are legible. No dark-on-dark text failures detected. Legibility verdict: PASS.

Both paragraphs are required. A review that only describes one render is invalid.

Score: 86/100

| Category | Score | Max |
|---|---|---|
| Visual Quality | 28 | 30 |
| Design Excellence | 13 | 20 |
| Spec Compliance | 14 | 15 |
| Data Quality | 15 | 15 |
| Code Quality | 9 | 10 |
| Library Mastery | 7 | 10 |
| Total | 86 | 100 |
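
A quick way to cross-check the two score breakdowns reported in this thread is to sum the category scores for each attempt. The dictionary names are hypothetical; the numbers are copied from the two review tables.

```python
MAX_SCORES = {
    "Visual Quality": 30,
    "Design Excellence": 20,
    "Spec Compliance": 15,
    "Data Quality": 15,
    "Code Quality": 10,
    "Library Mastery": 10,
}

attempt_1 = {
    "Visual Quality": 27,
    "Design Excellence": 13,
    "Spec Compliance": 14,
    "Data Quality": 15,
    "Code Quality": 9,
    "Library Mastery": 8,
}

attempt_2 = {
    "Visual Quality": 28,
    "Design Excellence": 13,
    "Spec Compliance": 14,
    "Data Quality": 15,
    "Code Quality": 9,
    "Library Mastery": 7,
}

totals = {}
for label, scores in (("attempt 1", attempt_1), ("attempt 2", attempt_2)):
    # No category may exceed its maximum.
    assert all(scores[name] <= MAX_SCORES[name] for name in MAX_SCORES)
    totals[label] = sum(scores.values())
    print(f"{label}: {totals[label]}/{sum(MAX_SCORES.values())}")

# Categories whose score changed between the two reviews.
changed = {
    name: (attempt_1[name], attempt_2[name])
    for name in MAX_SCORES
    if attempt_1[name] != attempt_2[name]
}
print(changed)
```

The totals match because Visual Quality gained one point while Library Mastery lost one.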

Visual Quality (28/30)

  • VQ-01: Text Legibility (7/8) — All sizes explicitly set (title 28px, axis 22px, ticks 18-20px, data labels 18px); data labels slightly small at this canvas size, 20-22px would be better
  • VQ-02: No Overlap (6/6) — No collisions in either render
  • VQ-03: Element Visibility (6/6) — All bars sized and padded well, clearly distinct
  • VQ-04: Color Accessibility (2/2) — Orange/blue are high-contrast and CVD-safe; green reference bars unambiguous
  • VQ-05: Layout & Canvas (3/4) — Chart fills canvas well, but yAxis.max: 0.50 extends 15+ percentage points beyond the Baseline 0.35 reference line, wasting canvas on the right
  • VQ-06: Axis Labels & Title (2/2) — "Probability of Default" (x-axis) and "Feature" (y-axis) are descriptive
  • VQ-07: Palette Compliance (2/2) — Okabe-Ito positions 1-3 used correctly; backgrounds #FAF8F1/#1A1A17; chrome tokens applied to all elements in both themes

Design Excellence (13/20)

  • DE-01: Aesthetic Sophistication (5/8) — Above defaults: semantic color coding for positive/negative direction, brand green for reference bars, clean fontFamily, left-aligned title with explicit positioning; not yet publication-level
  • DE-02: Visual Refinement (4/6) — borderWidth: 0 removes bar outlines, legend disabled, tooltip disabled, subtle GRID token applied; top/right axes still present (Highcharts default frame)
  • DE-03: Data Storytelling (4/6) — Color instantly signals contribution direction (warm orange = risk up, cool blue = risk down, green = anchor points); magnitude-ordered features guide the eye to highest-impact variables; clear visual flow from baseline to prediction

Spec Compliance (14/15)

  • SC-01: Plot Type (5/5) — Native Highcharts waterfall type with inverted: true — correct horizontal waterfall
  • SC-02: Required Features (4/4) — Cumulative bars, positive/negative color coding, base value bar, prediction bar, numeric SHAP labels, horizontal layout, reference lines, features sorted by |SHAP| magnitude, native waterfall connector lines
  • SC-03: Data Mapping (3/3) — Features on y-axis, probability of default on x-axis, all 10 features shown
  • SC-04: Title & Legend (2/3) — Title is "Credit Default Risk · shap-waterfall · highcharts · anyplot.ai" — the "Credit Default Risk · " prefix is non-standard; required format is {spec-id} · {library} · anyplot.ai = "shap-waterfall · highcharts · anyplot.ai"

Data Quality (15/15)

  • DQ-01: Feature Coverage (6/6) — Both positive (Debt-to-Income, Loan Amount, Open Accounts, Credit Inquiries) and negative (Credit Score, Annual Income, Employment Years, Payment History, Credit Age, Savings Balance) contributors shown; baseline and prediction bookend bars complete the picture
  • DQ-02: Realistic Context (5/5) — Credit default risk model for a single loan application; all feature names are domain-appropriate; neutral business scenario with no controversial content
  • DQ-03: Appropriate Scale (4/4) — Base value 0.35 (35% average default probability) realistic for a credit model; SHAP values ranging ±0.18 appropriate for probability-scale output; final prediction 0.20 is plausible

Code Quality (9/10)

  • CQ-01: KISS Structure (2/3) — download_js() helper function defined; KISS requires flat script with no functions/classes
  • CQ-02: Reproducibility (2/2) — All data hardcoded; fully deterministic
  • CQ-03: Clean Imports (2/2) — All 9 imports are used
  • CQ-04: Code Elegance (2/2) — JS formatter injection via string-replace is pragmatic for Highcharts Python integration; multi-CDN fallback is justified; no fake UI
  • CQ-05: Output & API (1/1) — Saves plot-{THEME}.png and plot-{THEME}.html correctly

Library Mastery (7/10)

  • LM-01: Idiomatic Usage (3/5) — Uses native Highcharts waterfall type correctly; however, bypasses highcharts_core Python API entirely in favour of raw JSON + string manipulation — the Python library's Chart/HighchartsOptions API is not used
  • LM-02: Distinctive Features (4/5) — isSum: true for the prediction total bar (Highcharts-specific waterfall feature), inverted: true for horizontal orientation, plotLines with styled labels for reference lines, CDP-based full-resolution screenshot capture — all Highcharts-distinctive

Score Caps Applied

  • None

Strengths

  • Native Highcharts waterfall chart type with isSum flag correctly models the cumulative SHAP attribution chain from baseline to final prediction
  • Semantic Okabe-Ito color assignment (orange = positive/risk-up, blue = negative/risk-down, green = anchors) creates an immediate, intuitive visual narrative that needs no legend
  • Complete theme-adaptive chrome: all INK, INK_SOFT, GRID, and PAGE_BG tokens applied to every axis, label, and background element in both renders
  • Multi-CDN fallback download logic with retry ensures robust CI execution even when a CDN is unreachable
  • Dual plotLines (baseline dashed, prediction dotted) with branded label styling fulfil the spec's "labeled reference lines" requirement

Weaknesses

  • Title prefix "Credit Default Risk · " is non-standard; must be exactly shap-waterfall · highcharts · anyplot.ai
  • yAxis.max: 0.50 wastes ~30% of canvas width to the right of the Baseline 0.35 line; trim to ~0.40–0.42
  • Data label fontSize 18px is slightly small for a 4800×2700 canvas; 20–22px would improve legibility
  • download_js() helper function violates KISS; inline the CDN logic or simplify to a single URL
  • Raw JSON construction bypasses highcharts_core Python API; LM-01 capped at 3/5

Issues Found

  1. SC-04 TITLE: "Credit Default Risk · shap-waterfall · highcharts · anyplot.ai" has a non-standard prefix
    • Fix: Change title text to "shap-waterfall · highcharts · anyplot.ai"
  2. VQ-05 CANVAS: "max": 0.50 on yAxis extends well past data range, leaving large empty area right of Baseline line
    • Fix: Set "max": 0.42 to reduce wasted space while preserving the reference lines

AI Feedback for Next Attempt

Fix the two concrete issues: (1) Strip the "Credit Default Risk · " prefix from the title — use exactly "shap-waterfall · highcharts · anyplot.ai". (2) Lower yAxis max from 0.50 to ~0.42 so the chart fills the canvas more efficiently. Optionally bump data label fontSize from 18px to 20-22px and inline the CDN download logic to remove the helper function.

Verdict: APPROVED

@github-actions github-actions Bot added the ai-approved (Quality OK, ready for merge) label May 7, 2026
@MarkusNeusinger MarkusNeusinger enabled auto-merge (squash) May 7, 2026 20:18
MarkusNeusinger added a commit that referenced this pull request May 7, 2026
)

## Summary
First run of `auto-update-pr-branches.yml` after #5957 found 0 BEHIND
PRs even though three were stuck behind main (#5916, #5870, #5902). Two
issues:

1. **Timing.** The workflow runs ~4s after the push to main, but GitHub
recomputes `mergeStateStatus` and the cached PR head SHA
asynchronously. Right after the push the field is still UNKNOWN and the
cached head can be stale → `update-branch` returns *expected head sha
didn't match current head ref*. Add a 30s sleep at the start.
2. **Over-strict filter.** The script only iterated PRs where
`mergeStateStatus == "BEHIND"`, skipping UNKNOWN candidates, which are
exactly the ones we wanted to fix. Drop the filter: after a push to main, every
open auto-merge PR is behind, and `update-branch` is a no-op when the
head is already up-to-date.
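
The filter change in item 2 can be illustrated with a small Python sketch. The workflow script itself is not shown in this thread, so this is not its actual code; the field names `mergeStateStatus` and `autoMergeRequest` are assumed to follow the JSON fields of `gh pr list --json`, the sample PR records are hypothetical, and it is an assumption that the old script also restricted itself to auto-merge PRs.

```python
# Hypothetical snapshot of open PRs taken a few seconds after a push to main.
open_prs = [
    {"number": 101, "mergeStateStatus": "UNKNOWN",
     "autoMergeRequest": {"mergeMethod": "SQUASH"}},
    {"number": 102, "mergeStateStatus": "BEHIND",
     "autoMergeRequest": {"mergeMethod": "SQUASH"}},
    {"number": 103, "mergeStateStatus": "UNKNOWN",
     "autoMergeRequest": None},  # auto-merge not enabled
]

# Old behavior: only PRs already reported as BEHIND were updated,
# so PRs whose status was still being recomputed were skipped.
old_candidates = [
    pr["number"]
    for pr in open_prs
    if pr["autoMergeRequest"] and pr["mergeStateStatus"] == "BEHIND"
]

# New behavior: every open auto-merge PR is a candidate, whatever its status.
new_candidates = [pr["number"] for pr in open_prs if pr["autoMergeRequest"]]

print("old:", old_candidates)
print("new:", new_candidates)
```

Dropping the status check is safe under the commit's stated premise that `update-branch` does nothing when the head is already current.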

Also:
- Bump permissions to `contents: write` (update-branch creates a merge
commit on the head ref).
- Drop `--silent` and capture stderr so the actual GitHub error lands
in the log.

Verified manually: calling `PUT /pulls/{num}/update-branch` from the
CLI on #5916 and #5870 worked and they auto-merged within seconds. The
422 on #5902 was a real history-divergence conflict (4 ahead / 58 behind
/ merge_base differs), which is a separate problem.

## Test plan
- [ ] After this merges, push something to main and confirm the workflow
finds N>0 PRs (where N is open auto-merge PRs).
- [ ] Confirm any genuinely stuck PR (conflict) gets a clear error in
the log instead of `likely conflict or stale ref`.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@MarkusNeusinger MarkusNeusinger force-pushed the implementation/shap-waterfall/highcharts branch from e479b98 to 221519c May 7, 2026 20:24
@MarkusNeusinger MarkusNeusinger merged commit 2a2c9d8 into main May 7, 2026
6 checks passed
@MarkusNeusinger MarkusNeusinger deleted the implementation/shap-waterfall/highcharts branch May 7, 2026 20:25

Labels

ai-approved (Quality OK, ready for merge), ai-attempt-1 (First repair attempt), quality:86 (Quality score 86/100)
