Merged
28 changes: 28 additions & 0 deletions plots/learning-curve-basic/specification.md
@@ -0,0 +1,28 @@
# learning-curve-basic: Model Learning Curve

## Description

A learning curve plots model performance (training and validation scores) as a function of training set size. It helps diagnose the bias-variance tradeoff, decide whether collecting more data is likely to improve model performance, and guide model selection. The plot typically shows two lines, each with a shaded band representing variability across cross-validation folds.

## Applications

- Diagnosing underfitting (high bias) when both training and validation scores are low
- Diagnosing overfitting (high variance) when training score is high but validation score is low with a large gap
- Determining if collecting more training data would improve model performance
- Comparing learning characteristics across different model architectures
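
The first two diagnoses can be sketched as a simple rule applied to the per-fold scores at the largest training size. This is only an illustration: the function name `diagnose` and both thresholds (`low_score`, `large_gap`) are assumptions, not part of the specification, and the rule assumes a metric where higher is better (such as accuracy).

```python
from statistics import mean


def diagnose(train_scores, validation_scores, low_score=0.7, large_gap=0.1):
    """Classify a learning curve from the per-fold scores at the largest size.

    Assumes higher scores are better. `low_score` and `large_gap` are
    illustrative thresholds chosen for this sketch, not recommended values.
    """
    train = mean(train_scores)
    validation = mean(validation_scores)
    gap = train - validation

    # Training score well above validation score: high variance.
    if gap > large_gap:
        return "overfitting (high variance)"
    # Both scores low and close together: high bias.
    if train < low_score and validation < low_score:
        return "underfitting (high bias)"
    return "no clear problem"
```

In practice the thresholds depend on the metric and the task, so a rule like this is best treated as a reading aid for the plot rather than a replacement for it.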

## Data

- `train_sizes` (numeric) - Array of training set sizes used for evaluation
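- `train_scores` (numeric) - Training scores at each sample size (2D: sizes × folds, the layout scikit-learn's `learning_curve` returns)
- `validation_scores` (numeric) - Validation scores at each sample size (2D: sizes × folds, the layout scikit-learn's `learning_curve` returns)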
- Size: 5-20 different training set sizes, typically with 5-10 cross-validation folds
- Example: Scikit-learn's `learning_curve` function output
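
As a minimal sketch of how the 2D score arrays reduce to a plotted line and band, the helper below computes the per-size mean and a ±1 standard deviation band using only the standard library. The name `summarize_scores` is hypothetical. It assumes one row per training size and one column per fold (the shape scikit-learn's `learning_curve` returns); transpose the input first if the data is stored the other way round.

```python
from statistics import mean, pstdev


def summarize_scores(scores):
    """Return (means, lower, upper), one value per training size.

    `scores` is a list of rows: one row per training size, each row
    holding the score from every cross-validation fold.
    """
    means = [mean(row) for row in scores]
    # Population standard deviation, matching NumPy's default for np.std.
    stds = [pstdev(row) for row in scores]
    lower = [m - s for m, s in zip(means, stds)]
    upper = [m + s for m, s in zip(means, stds)]
    return means, lower, upper


# Two training sizes, two folds each (made-up numbers for illustration).
means, lower, upper = summarize_scores([[0.5, 0.7], [0.8, 0.8]])
```

The `means` list gives the line for each curve, while `lower` and `upper` bound the shaded region; the same helper applies to both `train_scores` and `validation_scores`.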

## Notes

- Use shaded regions to show confidence bands (e.g., ±1 standard deviation across folds)
- Clearly label the y-axis with the metric being evaluated (accuracy, F1, MSE, etc.)
- Include a legend distinguishing training from validation curves
- X-axis should show actual sample sizes or percentages of total training data
- Consider using distinct colors (e.g., blue for training, orange for validation) for clarity
30 changes: 30 additions & 0 deletions plots/learning-curve-basic/specification.yaml
@@ -0,0 +1,30 @@
# Specification-level metadata for learning-curve-basic
# Auto-synced to PostgreSQL on push to main

spec_id: learning-curve-basic
title: Model Learning Curve

# Specification tracking
created: 2025-12-26T17:28:31Z
updated: 2025-12-26T17:28:31Z
issue: 2275
suggested: MarkusNeusinger

# Classification tags (applies to all library implementations)
# See docs/concepts/tagging-system.md for detailed guidelines
tags:
  plot_type:
    - line
    - learning-curve
  data_type:
    - numeric
    - continuous
  domain:
    - machine-learning
    - statistics
    - data-science
  features:
    - basic
    - confidence-band
    - comparison
    - diagnostic