The plot consists of two vertically stacked subplots. The main upper subplot shows three calibration curves against a dashed black diagonal reference line representing perfect calibration. The "Well-Calibrated" model (blue line with circle markers) follows closely along the diagonal. The "Overconfident" model (yellow line with square markers) shows a steep S-curve pattern, jumping sharply from 0 to 1 around the 0.4-0.6 probability range. The "Underconfident" model (pink/magenta line with triangle markers) shows a flatter curve. Each model displays its Brier score in the legend (0.101, 0.020, 0.181 respectively). The lower subplot shows a histogram of predicted probability distributions for all three models, clearly showing that the overconfident model clusters predictions near 0 and 1, while the underconfident model clusters near 0.5, and the well-calibrated model has a more spread distribution. All text is clearly readable, colors are distinct and colorblind-friendly, and the layout is well-balanced.
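The three prediction patterns described above (well-calibrated, overconfident, underconfident) could be simulated along these lines. This is a hedged sketch, not the reviewed script: the latent-score model, the noise level, and the sharpening factor of 12 are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2000
# True labels with ~35% positive rate, matching the review's description.
y = (rng.random(n) < 0.35).astype(int)

# A hypothetical latent score loosely correlated with the label.
score = np.clip(0.35 + 0.3 * (y - 0.35) + rng.normal(0, 0.2, n), 0.01, 0.99)

# Well-calibrated: use the score directly.
p_good = score
# Overconfident: sharpen toward 0 and 1 (the steep S-curve around 0.5).
p_over = 1 / (1 + np.exp(-12 * (score - 0.5)))
# Underconfident: compress toward 0.5 (the flatter curve).
p_under = 0.5 + 0.4 * (score - 0.5)
```

With this construction the overconfident model's predictions cluster near 0 and 1 and the underconfident model's stay inside roughly (0.3, 0.7), reproducing the histogram behavior the review describes.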
Quality Score: 93/100
Criteria Checklist
Visual Quality (38/40 pts)
VQ-01: Text Legibility (10/10) - Title at 24pt, axis labels at 20pt, tick labels at 16pt, legend at 16pt - all perfectly readable
VQ-02: No Overlap (8/8) - No overlapping text or elements anywhere
VQ-03: Element Visibility (7/8) - Markers at size 12 with linewidth 3 are clearly visible; could be slightly larger but acceptable
VQ-04: Color Accessibility (5/5) - Blue, yellow, and pink/magenta are distinguishable for colorblind users
VQ-05: Layout Balance (5/5) - Two-subplot layout with 3:1 height ratio uses canvas effectively
VQ-06: Axis Labels (1/2) - Labels are descriptive ("Mean Predicted Probability", "Fraction of Positives", "Count") but lack units
VQ-07: Grid & Legend (2/2) - Grid at alpha=0.3 with dashed style is subtle, legends well-placed
Spec Compliance (25/25 pts)
SC-01: Plot Type (8/8) - Correct calibration/reliability diagram with diagonal reference
SC-02: Data Mapping (5/5) - X-axis shows mean predicted probability, Y-axis shows fraction of positives
SC-03: Required Features (5/5) - Has diagonal reference line, 10 bins, Brier scores displayed, histogram subplot for prediction distribution, multiple model comparison with distinct colors and legend
SC-04: Data Range (3/3) - Both axes range from 0 to 1 as appropriate for probabilities
SC-05: Legend Accuracy (2/2) - Legends correctly identify each model with Brier scores
SC-06: Title Format (2/2) - Uses exact format "calibration-curve · matplotlib · pyplots.ai"
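The binned calibration curve and Brier score that SC-03 and SC-05 check for can be computed in plain numpy. This is a hypothetical reconstruction of that logic, not the reviewed code; the function names are illustrative.

```python
import numpy as np

def calibration_points(y_true, y_prob, n_bins=10):
    """Bin predictions into n_bins equal-width bins on [0, 1] and return
    (mean predicted probability, fraction of positives) per non-empty bin."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Interior edges only, so bin indices run 0 .. n_bins - 1.
    idx = np.digitize(y_prob, edges[1:-1])
    mean_pred, frac_pos = [], []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            mean_pred.append(y_prob[mask].mean())
            frac_pos.append(y_true[mask].mean())
    return np.array(mean_pred), np.array(frac_pos)

def brier_score(y_true, y_prob):
    # Mean squared error between predicted probability and binary outcome.
    return np.mean((y_prob - y_true) ** 2)
```

For a perfectly calibrated model the returned points lie on the diagonal, and the Brier score is 0 only for perfect hard predictions.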
Data Quality (18/20 pts)
DQ-01: Feature Coverage (7/8) - Shows well-calibrated, overconfident, and underconfident models demonstrating key calibration patterns; histogram clearly shows distribution differences
DQ-02: Realistic Context (6/7) - Simulated classifier outputs are plausible; using 35% positive rate is realistic for imbalanced classification
DQ-03: Appropriate Scale (5/5) - 2000 samples, probabilities correctly bounded 0-1, Brier scores in realistic range
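One way to sanity-check that the reported Brier scores (0.020-0.181) are "in realistic range" for a 35% positive rate is to compare against the constant baseline predictor, whose expected Brier score is p(1 - p) = 0.35 × 0.65 = 0.2275. The sketch below is an assumption-level check, not part of the reviewed code.

```python
import numpy as np

rng = np.random.default_rng(0)
base_rate = 0.35
y = (rng.random(2000) < base_rate).astype(int)

# A constant prediction at the base rate ("climatology") is the natural
# reference: any useful model should score below ~0.2275 here.
brier_const = np.mean((base_rate - y) ** 2)
```

All three reported scores fall below this baseline, which is consistent with the reviewer's judgment that they are realistic.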
Code Quality (10/10 pts)
CQ-01: KISS Structure (3/3) - Follows imports → data → plot → save structure, no functions or classes
CQ-03: Clean Imports (2/2) - Only imports matplotlib.pyplot and numpy, both used
CQ-04: No Deprecated API (1/1) - All APIs are current
CQ-05: Output Correct (1/1) - Saves as 'plot.png'
Library Features (2/5 pts)
LF-01: Uses distinctive library features (2/5) - Uses matplotlib correctly with subplots and gridspec_kw for height ratios, but doesn't leverage more distinctive matplotlib features like fill_between for confidence intervals or custom tick formatting
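The suggested improvements (fill_between confidence bands plus the gridspec_kw height ratios the code already uses) might look like the following. All data values here are illustrative placeholders, not the reviewed implementation.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script just writes a file
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical calibration points with a per-bin uncertainty estimate.
mean_pred = np.linspace(0.05, 0.95, 10)
frac_pos = mean_pred + 0.03 * np.sin(6 * mean_pred)
err = 0.04 * np.ones_like(mean_pred)

fig, (ax_cal, ax_hist) = plt.subplots(
    2, 1, figsize=(8, 8), gridspec_kw={"height_ratios": [3, 1]}
)
ax_cal.plot([0, 1], [0, 1], "k--", label="Perfect calibration")
ax_cal.plot(mean_pred, frac_pos, "o-", label="Model")
# Shaded confidence band around the curve via fill_between.
ax_cal.fill_between(mean_pred, frac_pos - err, frac_pos + err, alpha=0.2)
ax_cal.set_ylabel("Fraction of Positives")
ax_cal.legend()

ax_hist.hist(np.random.default_rng(1).beta(2, 2, 2000), bins=20)
ax_hist.set_xlabel("Mean Predicted Probability")
ax_hist.set_ylabel("Count")
fig.savefig("plot.png")
```

The `height_ratios` keyword gives the 3:1 split the review praises under VQ-05, and the band makes bin-level uncertainty visible without extra axes.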
Strengths
Excellent multi-model comparison showing well-calibrated, overconfident, and underconfident classifiers
Includes histogram subplot as suggested in spec for showing prediction distributions
Brier scores integrated into legend for quick comparison
Clean separation of calibration curve calculation logic
Colorblind-friendly palette with distinct marker shapes for each model
Professional layout with appropriate subplot height ratios
Weaknesses
Axis labels are descriptive but lack additional context (e.g., could note explicitly that both axes show unitless probabilities on a 0-1 scale)
Could use more distinctive matplotlib features like fill_between for confidence bands
Implementation: calibration-curve · matplotlib — implements the matplotlib version of calibration-curve.
File: plots/calibration-curve/implementations/matplotlib.py (impl-generate workflow)