feat(matplotlib): implement pp-basic #4869

Merged
github-actions[bot] merged 6 commits into main from implementation/pp-basic/matplotlib on Mar 15, 2026
Conversation

@github-actions
Contributor

Implementation: pp-basic - matplotlib

Implements the matplotlib version of pp-basic.

File: plots/pp-basic/implementations/matplotlib.py

Parent Issue: #4587


🤖 impl-generate workflow

@claude
Contributor

claude bot commented Mar 15, 2026

AI Review - Attempt 1/3

Image Description

The plot displays a P-P (Probability-Probability) plot with 200 blue scatter points (#306998, Python Blue) with white edges (alpha=0.7), plotted against a gray dashed 45-degree reference line. The x-axis is labeled "Theoretical Cumulative Probability" and the y-axis "Empirical Cumulative Probability", both ranging from 0.0 to 1.0. The title reads "pp-basic · matplotlib · pyplots.ai" in medium weight. The plot uses a square aspect ratio (12×12 figure). Top and right spines are removed. Subtle grid lines (alpha=0.2) appear on both axes. The data points follow the diagonal closely but show a slight S-shaped deviation in the upper range (around 0.7–0.9), consistent with the mixture of two normal distributions used to generate the data.

Score: 75/100

Category            Score  Max
Visual Quality         25   30
Design Excellence       9   20
Spec Compliance        15   15
Data Quality           12   15
Code Quality            9   10
Library Mastery         5   10
Total                  75  100

Visual Quality (25/30)

  • VQ-01: Text Legibility (7/8) — Font sizes explicitly set (title 24, labels 20, ticks 16). All text clearly readable.
  • VQ-02: No Overlap (5/6) — Some scatter point overlap in dense regions (lower-left, upper-right), inherent to data density but exacerbated by oversized markers.
  • VQ-03: Element Visibility (4/6) — Markers at s=180 are too large for 200 data points. Guidelines recommend s=50–100 for 100–300 points. Oversized markers cause unnecessary visual clutter and obscure the reference line.
  • VQ-04: Color Accessibility (4/4) — Single-series Python Blue, colorblind-safe, good contrast.
  • VQ-05: Layout & Canvas (3/4) — Square aspect ratio is correct per spec. 12×12 at 300dpi = 3600×3600 matching allowed square format. Good utilization.
  • VQ-06: Axis Labels & Title (2/2) — Descriptive labels, probability is unitless. Title follows required format.

Design Excellence (9/20)

  • DE-01: Aesthetic Sophistication (4/8) — Well-configured default with Python Blue and white edges. Clean but not exceptional. No custom palette needed for single series.
  • DE-02: Visual Refinement (3/6) — Top/right spines removed, subtle grid on both axes. Some refinement visible beyond defaults.
  • DE-03: Data Storytelling (2/6) — Data shows deviation from diagonal but no visual emphasis or hierarchy draws attention to the deviation pattern. Viewer must find the insight themselves.

Spec Compliance (15/15)

  • SC-01: Plot Type (5/5) — Correct P-P plot comparing empirical vs theoretical CDF.
  • SC-02: Required Features (4/4) — 45-degree reference line, both axes 0–1, square aspect ratio, empirical CDF computed as i/(n+1), theoretical CDF evaluated against fitted normal.
  • SC-03: Data Mapping (3/3) — Theoretical CDF on x-axis, empirical on y-axis, all data visible.
  • SC-04: Title & Legend (3/3) — Title follows {spec-id} · {library} · pyplots.ai format. No legend needed for single series.
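
The P-P coordinates described in SC-02 can be sketched as follows. This is a minimal illustration, not the code from plots/pp-basic/implementations/matplotlib.py: the function name `pp_coordinates` is hypothetical, and fitting the normal with the sample mean and the `ddof=1` standard deviation is an assumption, since the review only says "fitted normal".

```python
import math

import numpy as np


def pp_coordinates(sample):
    """Return (theoretical, empirical) cumulative probabilities for a P-P plot."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size

    # Empirical CDF with the i/(n+1) plotting position named in the review.
    empirical = np.arange(1, n + 1) / (n + 1)

    # Assumption: the reference normal is fitted by sample mean and std (ddof=1).
    mu = x.mean()
    sigma = x.std(ddof=1)

    # Normal CDF: Phi(z) = 0.5 * (1 + erf(z / sqrt(2))).
    z = (x - mu) / (sigma * math.sqrt(2.0))
    theoretical = 0.5 * (1.0 + np.vectorize(math.erf, otypes=[float])(z))
    return theoretical, empirical
```

Plotting `theoretical` on the x-axis against `empirical` on the y-axis gives the mapping credited in SC-03.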

Data Quality (12/15)

  • DQ-01: Feature Coverage (5/6) — Mixture distribution creates visible deviation from diagonal, demonstrating the diagnostic purpose. Could be slightly more dramatic to better illustrate the S-shaped pattern mentioned in the spec.
  • DQ-02: Realistic Context (3/5) — Data is plausible (mixture of normals simulating contaminated sample) but generic — no real-world context or domain framing beyond abstract "observed" data.
  • DQ-03: Appropriate Scale (4/4) — 200 data points within spec's 50–500 range. Sensible parameter values.

Code Quality (9/10)

  • CQ-01: KISS Structure (3/3) — Linear script: imports → data → plot → save. No functions or classes.
  • CQ-02: Reproducibility (2/2) — np.random.seed(42) set.
  • CQ-03: Clean Imports (2/2) — All imports used (matplotlib.pyplot, numpy, math.erfc, math.sqrt).
  • CQ-04: Code Elegance (1/2) — Using from math import erfc, sqrt with a list comprehension for the CDF calculation is slightly inelegant. A vectorized approach using scipy.stats.norm.cdf or scipy.special.erf (NumPy itself does not provide erf) would be cleaner and more Pythonic.
  • CQ-05: Output & API (1/1) — Saves as plot.png with dpi=300, bbox_inches='tight'. Current API.

Library Mastery (5/10)

  • LM-01: Idiomatic Usage (4/5) — Uses ax methods correctly, fig/ax pattern, spine manipulation, grid styling, tight_layout. Good idiomatic matplotlib.
  • LM-02: Distinctive Features (1/5) — Generic scatter + line plot. No distinctive matplotlib features leveraged (e.g., custom transforms, path effects, fill_between for confidence bands).

Score Caps Applied

  • None

Strengths

  • Perfect spec compliance — all required P-P plot features present with correct data mapping
  • Clean, readable code structure following KISS principle
  • Appropriate use of square aspect ratio as specified
  • Good color choice with Python Blue and white marker edges

Weaknesses

  • Marker size (s=180) is too large for 200 data points, causing visual clutter and overlap
  • No data storytelling — the S-shaped deviation pattern is present but not emphasized
  • Generic data context — no real-world domain framing
  • Manual CDF calculation using math.erfc instead of vectorized numpy/scipy approach
  • No distinctive matplotlib features used

Issues Found

  1. VQ-03 MEDIUM: Markers oversized at s=180 for 200 points
    • Fix: Reduce to s=60–80 with alpha=0.6 for better density visualization
  2. DE-03 LOW: No visual hierarchy or storytelling
    • Fix: Consider using fill_between to shade the area between points and the diagonal, or use color gradient along the diagonal to highlight deviation regions
  3. DQ-02 MEDIUM: Generic data context
    • Fix: Frame data in a real-world context (e.g., manufacturing quality measurements, lab test results)
  4. LM-02 LOW: No distinctive matplotlib features
    • Fix: Leverage features like fill_between for confidence bands, path effects, or custom tick formatting
  5. CQ-04 LOW: Non-vectorized CDF calculation
    • Fix: Use scipy.stats.norm.cdf or numpy's vectorized operations instead of list comprehension with math.erfc
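
To make issue 5 concrete, the sketch below contrasts a list-comprehension CDF built on math.erfc with an array-based version. Both function names are hypothetical and neither is taken from the PR. Note that `np.vectorize` is only a convenience wrapper around a Python loop; if SciPy is available, `scipy.stats.norm.cdf(x, loc=mu, scale=sigma)` computes the same values with true vectorization.

```python
import math

import numpy as np


def normal_cdf_loop(values, mu, sigma):
    """List-comprehension style, similar in spirit to what the review criticises."""
    return [0.5 * math.erfc(-(v - mu) / (sigma * math.sqrt(2.0))) for v in values]


# Wrap math.erf so it accepts arrays (a Python loop under the hood).
_erf = np.vectorize(math.erf, otypes=[float])


def normal_cdf_array(values, mu, sigma):
    """Array in, array out: Phi(x) = 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))."""
    z = (np.asarray(values, dtype=float) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + _erf(z))
```

The two forms agree numerically because erfc(-t) = 1 + erf(t).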

AI Feedback for Next Attempt

Reduce marker size to s=60–80 for better density handling. Add visual storytelling through fill_between or color encoding to highlight deviation from the diagonal. Frame the data in a real-world context (e.g., manufacturing QC measurements vs. normal specification). Use vectorized CDF computation. Consider adding a subtle confidence band around the reference line to enhance both design excellence and library mastery.
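
One way to draw the suggested confidence band is sketched below. The review does not say how a band should be computed, so the pointwise normal-approximation formula p ± 1.96·sqrt(p(1−p)/n) is an assumption, as are the function name `pointwise_band`, the fill color, and the figure size.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def pointwise_band(n, z=1.96, num=201):
    """Assumed band: binomial standard error around the diagonal, clipped to [0, 1]."""
    p = np.linspace(0.0, 1.0, num)
    half_width = z * np.sqrt(p * (1.0 - p) / n)
    lower = np.clip(p - half_width, 0.0, 1.0)
    upper = np.clip(p + half_width, 0.0, 1.0)
    return p, lower, upper


p, lower, upper = pointwise_band(n=200)

fig, ax = plt.subplots(figsize=(6, 6))
ax.fill_between(p, lower, upper, color="#306998", alpha=0.15, label="95% confidence band")
ax.plot([0, 1], [0, 1], linestyle="--", color="gray", label="Perfect normal fit")
ax.set_xlim(0, 1)
ax.set_ylim(0, 1)
ax.set_aspect("equal")
ax.legend(loc="lower right")
```

With n=200 the band is widest at p=0.5 (about ±0.07) and collapses to zero at both ends.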

Verdict: REJECTED

github-actions bot added the quality:75 (Quality score 75/100) and ai-rejected (Quality not OK, triggers update) labels on Mar 15, 2026
github-actions bot added the ai-attempt-1 (First repair attempt) label and removed the ai-rejected (Quality not OK, triggers update) label on Mar 15, 2026
Attempt 1/3 - fixes based on AI review
@github-actions
Contributor Author

🔧 Repair Attempt 1/3

Applied fixes based on AI review feedback.

Status: Repair completed, re-triggering review...


🤖 impl-repair

@claude
Contributor

claude bot commented Mar 15, 2026

AI Review - Attempt 2/3

Image Description

The plot displays a P-P (Probability-Probability) plot on a square canvas with both axes ranging from 0 to 1. The x-axis is labeled "Theoretical Cumulative Probability (Normal)" and the y-axis "Empirical Cumulative Probability." The title reads "pp-basic · matplotlib · pyplots.ai" with an italic subtitle "Bolt tensile strength (MPa) vs. normal distribution — quality control diagnostic." Data points are colored in two tones: Python Blue (#306998) for points near the diagonal and coral red (#C44E52) for points deviating more than 0.03 from the reference line. A dashed gray 45-degree reference line with a subtle stroke path effect runs diagonally. A light blue 95% confidence band surrounds the reference line. An annotation with a curved arrow points to the deviation region, reading "Heavier upper tail (secondary supplier batch)." The legend in the lower right shows "95% confidence band" and "Perfect normal fit." Top and right spines are removed; remaining spines are thinned and colored gray. Both major and minor grid lines are present at very low alpha. The S-shaped departure from the diagonal in the upper portion is clearly visible, telling the story of a mixture distribution.

Score: 90/100

Category            Score  Max
Visual Quality         28   30
Design Excellence      16   20
Spec Compliance        15   15
Data Quality           15   15
Code Quality            9   10
Library Mastery         7   10
Total                  90  100

Visual Quality (28/30)

  • VQ-01: Text Legibility (8/8) — All font sizes explicitly set: title 24pt, axis labels 20pt, ticks 16pt, legend 14pt, annotation 14pt
  • VQ-02: No Overlap (6/6) — No overlapping text; annotation well-positioned away from data
  • VQ-03: Element Visibility (5/6) — s=70 for 200 points is within guideline range (50-100) but on the smaller side; alpha=0.65 appropriate
  • VQ-04: Color Accessibility (3/4) — Blue vs coral/red is generally colorblind-safe but not optimal for all types of color vision deficiency
  • VQ-05: Layout & Canvas (4/4) — Square aspect ratio well-utilized, balanced margins, nothing cut off
  • VQ-06: Axis Labels & Title (2/2) — Descriptive labels with distribution context specified

Design Excellence (16/20)

  • DE-01: Aesthetic Sophistication (6/8) — Strong design: custom two-color scheme encoding deviation magnitude, path effects on reference line for visual depth, confidence band, italic domain subtitle. Clearly above defaults.
  • DE-02: Visual Refinement (5/6) — Top/right spines removed, remaining spines thinned and colored #555, subtle major+minor grid at low alpha, generous whitespace. Very polished.
  • DE-03: Data Storytelling (5/6) — Excellent storytelling: color-coded deviations immediately highlight distributional departure, annotation explains the cause (secondary supplier batch), subtitle provides domain context. Viewer immediately understands the insight.
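
The two-tone encoding credited above can be sketched as a simple mask on the distance from the diagonal. The 0.03 threshold and both hex colors come from the image description; the function name `deviation_colors` and the use of the vertical distance |empirical − theoretical| are assumptions about how "deviating from the reference line" is measured.

```python
import numpy as np

NEAR_COLOR = "#306998"  # Python Blue, points close to the diagonal
FAR_COLOR = "#C44E52"   # coral red, points deviating more than the threshold


def deviation_colors(theoretical, empirical, threshold=0.03):
    """Return one color per point based on |empirical - theoretical|."""
    theoretical = np.asarray(theoretical, dtype=float)
    empirical = np.asarray(empirical, dtype=float)
    deviates = np.abs(empirical - theoretical) > threshold
    return np.where(deviates, FAR_COLOR, NEAR_COLOR)
```

The returned array can be passed directly as `c=` to `ax.scatter`, so a single scatter call draws both groups.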

Spec Compliance (15/15)

  • SC-01: Plot Type (5/5) — Correct P-P plot: empirical CDF vs theoretical CDF
  • SC-02: Required Features (4/4) — All spec features: 45-degree reference line, square aspect ratio, sorted data with i/(n+1) plotting position, theoretical CDF evaluation, visible S-shaped deviation
  • SC-03: Data Mapping (3/3) — X=theoretical CDF, Y=empirical CDF, both axes 0-1
  • SC-04: Title & Legend (3/3) — Title format "pp-basic · matplotlib · pyplots.ai" correct; legend labels meaningful and accurate

Data Quality (15/15)

  • DQ-01: Feature Coverage (6/6) — Mixture distribution creates clear S-shaped departure from diagonal, demonstrating exactly what P-P plots detect. Shows both conforming and deviating regions.
  • DQ-02: Realistic Context (5/5) — Bolt tensile strength (MPa) for quality control is a real, neutral engineering scenario directly matching spec applications
  • DQ-03: Appropriate Scale (4/4) — 840/910 MPa with σ=28-35 are realistic values for steel bolt tensile strength
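
For readers who want to reproduce data of this shape, here is a minimal sketch of a two-component mixture. Only the seed (42), the total of 200 points, the means 840 and 910 MPa, and the σ range 28 to 35 appear in the review; the 160/40 split and the pairing of each σ with each mean are assumptions, so the exact values will differ from the merged implementation.

```python
import numpy as np

np.random.seed(42)  # same seeding call the review reports

# Assumption: 80% of bolts come from the main supplier, 20% from the secondary one.
n_main, n_secondary = 160, 40

# Assumption: sigma=28 pairs with the 840 MPa batch, sigma=35 with the 910 MPa batch.
main_batch = np.random.normal(loc=840.0, scale=28.0, size=n_main)
secondary_batch = np.random.normal(loc=910.0, scale=35.0, size=n_secondary)

# Combined tensile strength sample in MPa.
strength_mpa = np.concatenate([main_batch, secondary_batch])
```

Because the secondary batch sits above the main one, the combined sample has a heavier upper tail than a single fitted normal, which is the pattern the annotation in the plot describes.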

Code Quality (9/10)

  • CQ-01: KISS Structure (2/3) — Linear flow but sys.path manipulation adds boilerplate complexity; vectorized normal_cdf is borderline function definition
  • CQ-02: Reproducibility (2/2) — np.random.seed(42) set
  • CQ-03: Clean Imports (2/2) — All imports used: os/sys for path hack, math for erf/sqrt, patheffects, pyplot, ticker, numpy
  • CQ-04: Code Elegance (2/2) — Clean, appropriate complexity; no fake UI or over-engineering
  • CQ-05: Output & API (1/1) — Saves as plot.png, dpi=300, bbox_inches='tight'

Library Mastery (7/10)

  • LM-01: Idiomatic Usage (4/5) — Consistently uses Axes methods, mticker for tick control, fill_between for confidence band. Solid idiomatic matplotlib.
  • LM-02: Distinctive Features (3/5) — Uses matplotlib.patheffects (pe.Stroke + pe.Normal) for reference line depth, mticker.MultipleLocator for precise tick placement — these are distinctive matplotlib features
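
The two features named in LM-02 can be combined as in the sketch below. The API calls (`pe.Stroke`, `pe.Normal`, `mticker.MultipleLocator`) are real matplotlib; the line widths, the white stroke color, and the 0.2/0.1 tick spacing are illustrative assumptions rather than values taken from the PR.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.patheffects as pe
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker

fig, ax = plt.subplots(figsize=(6, 6))

# Dashed 45-degree reference line with a wider stroke drawn underneath it.
(ref_line,) = ax.plot([0, 1], [0, 1], linestyle="--", color="gray", linewidth=2)
ref_line.set_path_effects([pe.Stroke(linewidth=4, foreground="white"), pe.Normal()])

# Explicit tick spacing on both axes (assumed: major every 0.2, minor every 0.1).
for axis in (ax.xaxis, ax.yaxis):
    axis.set_major_locator(mticker.MultipleLocator(0.2))
    axis.set_minor_locator(mticker.MultipleLocator(0.1))

# Spine and grid styling in the spirit of the image description.
for side in ("top", "right"):
    ax.spines[side].set_visible(False)
ax.grid(True, which="major", alpha=0.2)
ax.grid(True, which="minor", alpha=0.1)

ax.set_xlim(0, 1)
ax.set_ylim(0, 1)
ax.set_aspect("equal")
```

`pe.Stroke` draws the outline first and `pe.Normal` then redraws the line itself on top, which is what gives the reference line its sense of depth.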

Score Caps Applied

  • None — no caps triggered

Strengths

  • Excellent data storytelling through color-coded deviations and domain-contextual annotation
  • Strong visual refinement: path effects, thinned/colored spines, dual-level grid, confidence band
  • Perfect spec compliance with all required P-P plot features including square aspect ratio and i/(n+1) formula
  • Realistic manufacturing quality control scenario with plausible bolt tensile strength values
  • 95% confidence band adds genuine statistical value beyond spec requirements

Weaknesses

  • Marker size (s=70) could be slightly larger for better visibility at 200 points
  • Blue/red color pairing could be improved for deuteranopia accessibility

Issues Found

  • None critical

AI Feedback for Next Attempt

N/A — implementation meets quality threshold.

Verdict: APPROVED

github-actions bot added the quality:90 (Quality score 90/100) and ai-approved (Quality OK, ready for merge) labels on Mar 15, 2026
github-actions bot merged commit 002adfa into main on Mar 15, 2026
github-actions bot deleted the implementation/pp-basic/matplotlib branch on March 15, 2026 20:35