
feat(plotnine): implement line-win-probability #5098

Merged
github-actions[bot] merged 7 commits into main from implementation/line-win-probability/plotnine
Mar 20, 2026

@github-actions
Contributor

Implementation: line-win-probability - plotnine

Implements the plotnine version of line-win-probability.

File: plots/line-win-probability/implementations/plotnine.py

Parent Issue: #4418


🤖 impl-generate workflow

@claude
Contributor

claude Bot commented Mar 20, 2026

AI Review - Attempt 1/3

Image Description

The plot shows a win probability chart for an NFL game between the Eagles (home) and Cowboys (away). The x-axis displays game progression with labels at Kickoff, Q2, Halftime, Q4, and Final, separated by subtle dotted vertical lines. The y-axis shows Home Win Probability from 0% to 100% with a dashed horizontal reference line at 50%. A dark line traces the win probability across ~130 plays. The area above the 50% line is filled with a muted teal (#004C54, Eagles) and below with a muted rose (#8B1A1A, Cowboys), both at reduced opacity. White-filled circle markers with bold text labels annotate scoring events (FG Home, TD Away, TD Home, FG Away). A boxed annotation in the lower right reads "Final: Eagles 24 – Cowboys 17". The legend at top shows both teams. The title reads "line-win-probability · plotnine · pyplots.ai".

Score: 86/100

| Category | Score | Max |
|---|---|---|
| Visual Quality | 26 | 30 |
| Design Excellence | 14 | 20 |
| Spec Compliance | 15 | 15 |
| Data Quality | 15 | 15 |
| Code Quality | 10 | 10 |
| Library Mastery | 6 | 10 |
| Total | 86 | 100 |

Visual Quality (26/30)

  • VQ-01: Text Legibility (7/8) - All font sizes explicitly set (title=24, axis_title=20, axis_text=16, legend_text=16). geom_text size=7 is slightly small for annotation labels at high resolution.
  • VQ-02: No Overlap (4/6) - Some annotation labels are crowded near each other, particularly "TD Home"/"FG Away" near the end of the game and around the Q4 area where multiple events cluster.
  • VQ-03: Element Visibility (5/6) - Line (size=1.2) and points (size=3) are visible. Ribbon fills at alpha=0.4 are appropriate but could be slightly more saturated for stronger visual impact.
  • VQ-04: Color Accessibility (4/4) - Teal and rose are distinguishable for all forms of color blindness.
  • VQ-05: Layout & Canvas (4/4) - Good 16:9 proportions, plot fills canvas well with balanced margins.
  • VQ-06: Axis Labels & Title (2/2) - "Home Win Probability" and "Game Progression" are descriptive; percentage shown on y-axis ticks.

Design Excellence (14/20)

  • DE-01: Aesthetic Sophistication (5/8) - Custom team-inspired colors (teal, dark rose), muted alpha for fills, dark line for the probability trace. Clearly above defaults with intentional color choices, but not quite FiveThirtyEight-level polish.
  • DE-02: Visual Refinement (4/6) - theme_minimal removes spines, x-grid removed, minor grid removed, y-grid is subtle (#e0e0e0). Quarter dividers are appropriately dotted. Good attention to detail.
  • DE-03: Data Storytelling (5/6) - Excellent narrative: a back-and-forth game with multiple lead changes, scoring event annotations guide the viewer through key moments, final score provides closure. The viewer immediately understands the game flow.

Spec Compliance (15/15)

  • SC-01: Plot Type (5/5) - Correct line/area chart showing win probability evolution over game time.
  • SC-02: Required Features (4/4) - All spec features present: 50% reference line, team-colored area fills above/below 50%, scoring event annotations, final score annotation, quarter markers.
  • SC-03: Data Mapping (3/3) - X = play number (game progression), Y = win probability. Correctly mapped.
  • SC-04: Title & Legend (3/3) - Title follows {spec-id} · {library} · pyplots.ai format. Legend labels "Home (Eagles)" and "Away (Cowboys)" are clear.
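The team-colored fills credited under SC-02 can be built by clamping the probability curve at 0.5 in both directions and handing each clamped column to its own ribbon layer. The following is a minimal sketch of that data preparation only; the column names `play`, `win_prob`, `upper`, and `lower` and the six sample values are assumptions for illustration, not taken from plotnine.py.

```python
import numpy as np
import pandas as pd

# Hypothetical home win probabilities for six plays (illustrative values only).
df = pd.DataFrame({
    "play": [1, 2, 3, 4, 5, 6],
    "win_prob": [0.50, 0.62, 0.55, 0.41, 0.38, 0.57],
})

# Home ribbon spans 0.5 up to the curve; it has zero height while the curve is below 0.5.
df["upper"] = np.maximum(df["win_prob"], 0.5)

# Away ribbon spans the curve up to 0.5; it has zero height while the curve is above 0.5.
df["lower"] = np.minimum(df["win_prob"], 0.5)

print(df)
```

With these columns, one ribbon would run from 0.5 to `upper` in the home color and a second from `lower` to 0.5 in the away color.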

Data Quality (15/15)

  • DQ-01: Feature Coverage (6/6) - Shows multiple momentum shifts, both teams leading at different points, various scoring event types (TDs, FGs for both teams), game ending with home team winning.
  • DQ-02: Realistic Context (5/5) - NFL Eagles vs Cowboys game context. Neutral sports topic with realistic scoring events.
  • DQ-03: Appropriate Scale (4/4) - Win probability ranges appropriately (clipped at 4-96%, ending ~78%). Scoring event magnitudes are realistic for NFL win probability models.

Code Quality (10/10)

  • CQ-01: KISS Structure (3/3) - Clean Imports → Data → Plot → Save structure with no functions or classes.
  • CQ-02: Reproducibility (2/2) - np.random.seed(42) set.
  • CQ-03: Clean Imports (2/2) - All imports are used.
  • CQ-04: Code Elegance (2/2) - Well-structured, appropriate complexity. Data generation with scoring events is clean and readable.
  • CQ-05: Output & API (1/1) - Saves as plot.png with dpi=300, current API.

Library Mastery (6/10)

  • LM-01: Idiomatic Usage (4/5) - Good grammar of graphics usage: layered geom_ribbon + geom_line + geom_point + geom_text composition, proper aes() mapping, scale_fill_manual, theme customization. Could use more advanced plotnine patterns.
  • LM-02: Distinctive Features (2/5) - Uses plotnine-specific layered grammar with geom_ribbon fill aesthetics and scale_fill_manual, but the approach could be replicated in most plotting libraries without much change.

Score Caps Applied

  • None

Strengths

  • Excellent spec compliance — all required features (50% line, team color fills, event annotations, quarter markers, final score) are present
  • Strong data storytelling with a realistic back-and-forth NFL game narrative that immediately engages the viewer
  • Clean, well-structured code with appropriate complexity and full reproducibility
  • Good color choices (teal/rose) that are accessible and evocative of team identities

Weaknesses

  • Annotation label overlap in areas with clustered scoring events (near end of game)
  • Annotation text size (size=7) could be slightly larger for better legibility at high resolution
  • Ribbon fill alpha could be increased slightly for stronger visual impact
  • Could leverage more distinctive plotnine features (e.g., stat layers, coord transforms)

Issues Found

  1. VQ-02 MODERATE: Annotation labels cluster and nearly overlap near Q4/Final where multiple events occur close together
    • Fix: Increase label offset or stagger label positions for closely-spaced events
  2. LM-02 LOW: Generic library usage that doesn't showcase plotnine-specific capabilities
    • Fix: Consider using plotnine-specific features like after_stat(), position adjustments, or coord_* transforms
  3. DE-01 MODERATE: Good but not publication-quality design — needs final polish
    • Fix: Consider slightly bolder fill colors, refined annotation styling, or a more distinctive color palette

AI Feedback for Next Attempt

Focus on three areas: (1) Fix annotation overlap by staggering label positions for closely-spaced events or using leader lines, (2) Increase annotation text size from 7 to 8-9 for better legibility, (3) Slightly increase ribbon alpha from 0.4 to 0.5 for stronger visual impact. These changes should push the score above 90.
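One way to read the "stagger label positions" suggestion is to flip a label to the opposite side of the line whenever its event falls within a few plays of the previous one. The sketch below assumes sorted play numbers; the function name `stagger_offsets`, the `min_gap` and `base` values, and the event positions are all hypothetical and not drawn from the implementation.

```python
def stagger_offsets(plays, min_gap=8, base=0.06):
    """Return one vertical label offset per event, flipping sides for close neighbors."""
    offsets = []
    for i, play in enumerate(plays):
        if i > 0 and play - plays[i - 1] < min_gap:
            # Too close to the previous event: place this label on the opposite side.
            offsets.append(-offsets[-1])
        else:
            # Enough room: use the default offset above the marker.
            offsets.append(base)
    return offsets


# Hypothetical scoring-event play numbers, sorted ascending.
event_plays = [12, 30, 34, 70, 108, 112, 122]
offsets = stagger_offsets(event_plays)
print(offsets)
```

The offsets would then be added to each event's win probability to get the label's y position.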

Verdict: REJECTED

github-actions bot added the quality:86 (Quality score 86/100) and ai-rejected (Quality not OK, triggers update) labels on Mar 20, 2026
github-actions bot added the ai-attempt-1 (First repair attempt) label and removed the ai-rejected (Quality not OK, triggers update) label on Mar 20, 2026
@github-actions
Contributor Author

🔧 Repair Attempt 1/3

Applied fixes based on AI review feedback.

Status: Repair completed, re-triggering review...


🤖 impl-repair

@claude
Contributor

claude Bot commented Mar 20, 2026

AI Review - Attempt 2/3

Image Description

The plot displays a win probability chart for an NFL game between the Eagles (home, teal #004C54) and Cowboys (away, muted dark red #8B1A1A). The title "line-win-probability · plotnine · pyplots.ai" is bold at the top. A legend at the top identifies both teams. The y-axis shows "Home Win Probability" from 0% to 100% with a dashed horizontal reference line at 50%. The area above 50% is filled with teal and below with dark red, both at alpha=0.4. A dark line (size=1.2) traces the win probability. White-filled circle markers denote scoring events (FG Home, TD Away, TD Home, FG Away) with bold text annotations. Dotted vertical lines mark quarter boundaries labeled Kickoff, Q2, Halftime, Q4, Final on the x-axis. A boxed annotation in the bottom-right reads "Final: Eagles 24 – Cowboys 17". The layout uses theme_minimal with subtle horizontal grid lines only.

Score: 87/100

| Category | Score | Max |
|---|---|---|
| Visual Quality | 27 | 30 |
| Design Excellence | 14 | 20 |
| Spec Compliance | 15 | 15 |
| Data Quality | 15 | 15 |
| Code Quality | 10 | 10 |
| Library Mastery | 6 | 10 |
| Total | 87 | 100 |

Visual Quality (27/30)

  • VQ-01: Text Legibility (8/8) — All font sizes explicitly set: title=24, axis_title=20, axis_text=16, legend_text=16. All text clearly readable.
  • VQ-02: No Overlap (5/6) — Annotation labels in the Q4-Final area (FG Home, TD Home, FG Away) are positioned close together; readable but tight spacing.
  • VQ-03: Element Visibility (5/6) — Line size=1.2 and point size=3 are visible. Ribbon alpha=0.4 is appropriate. Markers could be slightly larger for better emphasis.
  • VQ-04: Color Accessibility (4/4) — Teal (#004C54) vs dark red (#8B1A1A) are distinguishable for all forms of color vision deficiency.
  • VQ-05: Layout & Canvas (3/4) — Right edge is tight; "FG Away" annotation and "Final" x-axis label are very close to the boundary. Some content nearly cut off.
  • VQ-06: Axis Labels & Title (2/2) — "Home Win Probability" with percentage formatting, "Game Progression" with descriptive quarter labels.

Design Excellence (14/20)

  • DE-01: Aesthetic Sophistication (6/8) — Custom team colors (not defaults), white-filled event markers, styled final score annotation box, intentional color hierarchy. Clearly above library defaults.
  • DE-02: Visual Refinement (4/6) — theme_minimal removes spines, x-grid removed, subtle y-grid (#e0e0e0), minor grid removed. Good refinement but not perfect — could use more whitespace and further polish.
  • DE-03: Data Storytelling (4/6) — Game narrative is clear through annotations, momentum swings visible via colored areas. Visual hierarchy guides the viewer through the game. Could be elevated with stronger emphasis on the decisive moment.

Spec Compliance (15/15)

  • SC-01: Plot Type (5/5) — Correct win probability line chart with area fills above/below 50%.
  • SC-02: Required Features (4/4) — All spec requirements met: 50% reference line ✓, area fills with team colors ✓, scoring event annotations ✓, final score display ✓, quarter markers ✓.
  • SC-03: Data Mapping (3/3) — X=game progression (play number), Y=win probability (0-100%). Correctly mapped.
  • SC-04: Title & Legend (3/3) — Title follows exact format. Legend shows team names correctly.

Data Quality (15/15)

  • DQ-01: Feature Coverage (6/6) — Multiple scoring events from both teams (TDs, FGs), momentum swings in both directions, close game that resolves to home win.
  • DQ-02: Realistic Context (5/5) — Eagles vs Cowboys NFL game with realistic scoring events and plausible final score (24-17). Sports domain is neutral.
  • DQ-03: Appropriate Scale (4/4) — Win probability starts at 0.50, scoring shifts are realistic (0.10-0.22 for TDs/FGs), clipped to 0.04-0.96 range, converges appropriately at game end.
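The data shape described under DQ-03 (start at 0.50, discrete scoring shifts, clipping to 0.04-0.96) can be approximated with a seeded random walk. This is only a sketch under stated assumptions: the event positions, the shift sizes, and the 0.015 noise scale are hypothetical, and the actual plotnine.py may generate its series differently.

```python
import numpy as np

np.random.seed(42)  # the seed the review reports under CQ-02

n_plays = 130

# Hypothetical scoring events: play index -> shift in home win probability.
scoring_shifts = {15: 0.10, 40: -0.18, 62: 0.20, 95: -0.12, 110: 0.22}

# Small per-play noise (assumed scale), with the kickoff pinned at exactly 0.50.
steps = np.random.normal(0.0, 0.015, n_plays)
steps[0] = 0.0
for play, shift in scoring_shifts.items():
    steps[play] += shift

# Accumulate the steps and clip to the 0.04-0.96 range the review mentions.
win_prob = np.clip(0.5 + np.cumsum(steps), 0.04, 0.96)
print(win_prob[:5])
```

Clipping after accumulation keeps the series in range but lets the underlying walk drift past the bounds; clipping at every step would be an equally plausible design.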

Code Quality (10/10)

  • CQ-01: KISS Structure (3/3) — Clean Imports → Data → Plot → Save structure. No functions or classes.
  • CQ-02: Reproducibility (2/2) — np.random.seed(42) set.
  • CQ-03: Clean Imports (2/2) — All imports used, no extras.
  • CQ-04: Code Elegance (2/2) — Appropriate complexity for simulating a game scenario. Clean data generation and plotting logic.
  • CQ-05: Output & API (1/1) — Saves as plot.png with dpi=300. Current plotnine API used.

Library Mastery (6/10)

  • LM-01: Idiomatic Usage (4/5) — Proper grammar of graphics approach: ggplot() + multiple geom layers, aes mappings, scale functions, theme customization. Well-structured layer composition.
  • LM-02: Distinctive Features (2/5) — Uses geom_ribbon with fill aesthetic mapping and scale_fill_manual, which leverages the grammar of graphics paradigm. However, the approach is not distinctive to plotnine: the same pattern carries over to ggplot2 and Lets-Plot.

Score Caps Applied

  • None

Strengths

  • Excellent spec compliance — all required features implemented perfectly (50% line, area fills, annotations, quarter markers, final score)
  • Strong data storytelling with realistic NFL game scenario and well-placed scoring event annotations
  • Clean, well-structured code with proper seed and idiomatic plotnine grammar of graphics patterns
  • Good color choices with team-appropriate teal and dark red that are colorblind-safe

Weaknesses

  • Right edge is tight — "FG Away" annotation and "Final" x-axis label nearly at boundary
  • Annotation labels in Q4-Final area are close together and could benefit from smarter positioning
  • Library mastery could be elevated with more distinctive plotnine features

Issues Found

  1. VQ-05 MODERATE: Right margin too tight — "FG Away" annotation near play 122 and "Final" x-axis label are very close to the plot boundary
    • Fix: Add right-side padding via scale_x_continuous expand parameter or adjust figure margins
  2. LM-02 LOW: Generic grammar of graphics usage without distinctive plotnine features
    • Fix: Consider using plotnine-specific features like stat layers, after_stat, or more sophisticated theme element customization
  3. DE-02 MODERATE: Visual refinement is good but not perfect — could push further toward publication quality
    • Fix: Add more generous margins, consider font family customization, refine annotation styling

AI Feedback for Next Attempt

Focus on three areas: (1) Fix right-edge clipping by expanding x-axis limits or adjusting margins so no annotations are cut off. (2) Improve annotation label positioning in dense areas — consider alternating label positions (above/below) more strategically or adjusting nudge values. (3) Push library mastery by using plotnine-specific features like stat transformations or more sophisticated theme element customization to demonstrate distinctive usage.

Verdict: REJECTED

github-actions bot added the quality:87 (Quality score 87/100) and ai-rejected (Quality not OK, triggers update) labels on Mar 20, 2026
github-actions bot added the ai-attempt-2 (Second repair attempt) label and removed the ai-rejected (Quality not OK, triggers update) label on Mar 20, 2026
@github-actions
Contributor Author

🔧 Repair Attempt 2/3

Applied fixes based on AI review feedback.

Status: Repair completed, re-triggering review...


🤖 impl-repair

@claude
Contributor

claude Bot commented Mar 20, 2026

AI Review - Attempt 3/3

Image Description

The plot displays a win probability chart for an NFL game between the Philadelphia Eagles (Home) and Dallas Cowboys (Away). A dark line traces the win probability from Kickoff through Q2, Halftime, Q4, to Final. The area above the 50% dashed reference line is filled with a muted teal (#004C54, Eagles) and the area below with a muted rose/dark red (#8B1A1A, Cowboys). Nine scoring events are annotated with white-filled circle markers and bold text labels (FG Home, TD Away, TD Home, FG Away, etc.). A subtle golden highlight rectangle marks the decisive Q4 moment around the "TD Home" play at ~play 110. Dotted vertical lines mark quarter boundaries. The final score "Final: Eagles 24 – Cowboys 17" appears in a labeled box at the bottom-left. The title reads "line-win-probability · plotnine · pyplots.ai" in bold at the top. The legend at top-center shows both team colors. Y-axis runs from 0% to 100% with the label "Home Win Probability"; X-axis shows "Game Progression" with custom labels (Kickoff, Q2, Halftime, Q4, Final). The overall layout is clean with minimal grid lines (subtle horizontal only) and generous whitespace.

Score: 92/100

| Category | Score | Max |
|---|---|---|
| Visual Quality | 27 | 30 |
| Design Excellence | 16 | 20 |
| Spec Compliance | 15 | 15 |
| Data Quality | 15 | 15 |
| Code Quality | 10 | 10 |
| Library Mastery | 9 | 10 |
| Total | 92 | 100 |

Visual Quality (27/30)

  • VQ-01: Text Legibility (7/8) — Font sizes explicitly set (title=24, axis_title=18, axis_text=16, legend_text=15). Event labels at size=7 are slightly small relative to the large canvas but remain readable.
  • VQ-02: No Overlap (5/6) — Minor crowding in Q3 area where "FG Away" and "TD Home" labels are in proximity. Staggered offset logic helps but doesn't fully eliminate label proximity in dense regions.
  • VQ-03: Element Visibility (5/6) — Line at size=1.2 is clearly visible; markers at size=4 with white fill and stroke stand out well. Ribbon alpha=0.35 is adequate but could be slightly more saturated for stronger visual impact.
  • VQ-04: Color Accessibility (4/4) — Teal vs dark red is colorblind-safe with strong luminance contrast.
  • VQ-05: Layout & Canvas (4/4) — 16:9 aspect ratio with good canvas utilization, balanced margins, no cut-off elements.
  • VQ-06: Axis Labels & Title (2/2) — "Home Win Probability" and "Game Progression" are descriptive. Percentages shown on y-axis ticks.

Design Excellence (16/20)

  • DE-01: Aesthetic Sophistication (6/8) — Custom team colors with intentional palette, golden highlight zone for the decisive moment, refined typography hierarchy. Clearly above library defaults with purposeful design choices.
  • DE-02: Visual Refinement (5/6) — theme_minimal base, x-grid removed, subtle y-grid (#e8e8e8, 0.3), dotted quarter lines, legend with custom background. Well-polished.
  • DE-03: Data Storytelling (5/6) — The golden highlight zone draws attention to the decisive Q4 moment. Team-colored fills convey momentum at a glance. Event annotations mark key turning points. Final score box anchors the outcome. The viewer immediately grasps the game narrative.

Spec Compliance (15/15)

  • SC-01: Plot Type (5/5) — Correct win probability line chart with team-colored area fills.
  • SC-02: Required Features (4/4) — All spec features present: 50% reference line, team-colored fills above/below 50%, scoring event annotations, final score display, quarter/period markers.
  • SC-03: Data Mapping (3/3) — X maps to game progression (play number), Y maps to win probability (0–1). Axes show full range.
  • SC-04: Title & Legend (3/3) — Title follows exact format "line-win-probability · plotnine · pyplots.ai". Legend labels "Eagles (Home)" and "Cowboys (Away)" match the data.

Data Quality (15/15)

  • DQ-01: Feature Coverage (6/6) — Shows scoring events for both teams (FG and TD), momentum swings in both directions, a competitive mid-game, and a decisive late-game surge. Demonstrates all aspects of win probability dynamics.
  • DQ-02: Realistic Context (5/5) — NFL game between Eagles and Cowboys with realistic scoring events (field goals, touchdowns). Sports context is neutral and universally relatable.
  • DQ-03: Appropriate Scale (4/4) — 130 plays (realistic NFL game length), probabilities clipped to 0.04–0.96, scoring shifts of plausible magnitudes, final score 24–17 is realistic.

Code Quality (10/10)

  • CQ-01: KISS Structure (3/3) — Clean linear flow: imports → data generation → plot construction → save. No functions or classes.
  • CQ-02: Reproducibility (2/2) — np.random.seed(42) set at the top.
  • CQ-03: Clean Imports (2/2) — All imports are used in the implementation.
  • CQ-04: Code Elegance (2/2) — Appropriate complexity for the visualization. Smart annotation positioning with staggered offsets is well-implemented. No over-engineering or fake functionality.
  • CQ-05: Output & API (1/1) — Saves as plot.png with dpi=300. Current plotnine API used throughout.

Library Mastery (9/10)

  • LM-01: Idiomatic Usage (5/5) — Excellent grammar of graphics layering: ggplot() + multiple geom layers with proper aes mapping, scale functions, coord_cartesian, labs, theme composition. Textbook plotnine/ggplot2 style.
  • LM-02: Distinctive Features (4/5) — Uses plotnine-distinctive features: geom_ribbon for area fills, geom_segment for quarter lines (noted as plotnine-idiomatic vs geom_vline), scale_alpha_identity, annotate("label") with label_padding, element_blank/element_line/element_rect for fine-grained theme control. Layer composition pattern is distinctively ggplot2-based.

Score Caps Applied

  • None — No score caps triggered.

Strengths

  • Excellent data storytelling with the golden highlight zone marking the decisive Q4 moment
  • Strong plotnine idiom: multi-layer grammar of graphics composition with geom_ribbon, geom_segment, geom_text, and annotate
  • Complete spec compliance with all required features (50% reference, team fills, event annotations, quarter markers, final score)
  • Realistic NFL game data with plausible scoring events and momentum shifts for both teams
  • Well-refined visual design: minimal chrome, subtle grid, custom legend background, intentional color palette

Weaknesses

  • Event annotation labels (size=7) are slightly small relative to other text elements on the large canvas
  • Minor label crowding in Q3 region where multiple scoring events occur close together

Issues Found

None critical — minor visual polish items noted above do not warrant point deductions beyond what's already reflected.

AI Feedback for Next Attempt

This implementation is strong across all categories. The decisive moment highlight, team-colored ribbon fills, and clean grammar of graphics composition make this a polished win probability visualization. Minor improvements would be slightly larger event labels and additional anti-crowding logic for dense annotation regions, but these are cosmetic.

Verdict: APPROVED
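The three score tables can be cross-checked by summing the category scores and comparing each total with an approval threshold. The per-category numbers below are copied from the reviews; the threshold of 90 is an assumption inferred from the Attempt 1 feedback ("should push the score above 90") and is not stated explicitly anywhere in this thread.

```python
# Category scores per attempt, in the order:
# Visual Quality, Design Excellence, Spec Compliance, Data Quality, Code Quality, Library Mastery.
attempts = {
    1: [26, 14, 15, 15, 10, 6],
    2: [27, 14, 15, 15, 10, 6],
    3: [27, 16, 15, 15, 10, 9],
}

APPROVAL_THRESHOLD = 90  # assumed cutoff, not confirmed by the workflow

totals = {n: sum(scores) for n, scores in attempts.items()}
verdicts = {
    n: "APPROVED" if total >= APPROVAL_THRESHOLD else "REJECTED"
    for n, total in totals.items()
}

for n in sorted(attempts):
    print(f"Attempt {n}: {totals[n]}/100 -> {verdicts[n]}")
```

Under that assumed cutoff, the computed verdicts match the ones posted in the three reviews.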

github-actions bot added the quality:92 (Quality score 92/100) and ai-approved (Quality OK, ready for merge) labels on Mar 20, 2026
github-actions bot merged commit d8bf434 into main on Mar 20, 2026
3 checks passed
github-actions bot deleted the implementation/line-win-probability/plotnine branch on March 20, 2026 12:55

Labels

  • ai-approved: Quality OK, ready for merge
  • ai-attempt-1: First repair attempt
  • ai-attempt-2: Second repair attempt
  • quality:86: Quality score 86/100
  • quality:87: Quality score 87/100
  • quality:92: Quality score 92/100
