**Programmer:** python_scripts (Abhijith Warrier)

**PYTHON SCRIPT TO _PLOT LEARNING CURVES TO DETECT UNDERFITTING & OVERFITTING_. 🐍📈🤖**

This script demonstrates how **learning curves** help you visualize whether a model is **underfitting, overfitting, or just right**.
By tracking training and validation scores as the training set size increases, you can assess whether your model generalizes well.

### 📦 Import Required Libraries

We’ll use scikit-learn’s learning_curve utility and matplotlib for visualization.

In [None]:
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import learning_curve
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

### 🧩 Load Dataset and Initialize Model

We’ll use the Iris dataset and a Logistic Regression model for simplicity.

In [None]:
# Load the Iris dataset
data = load_iris()
X, y = data.data, data.target

# Initialize the model
model = LogisticRegression(max_iter=200)

### 📊 Generate Learning Curve Data

The learning_curve function computes training and validation scores for increasing training set sizes.

In [None]:
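# Score the model at 10 training-set sizes (10% to 100%) using 5-fold cross-validation.
# shuffle=True matters here: Iris is ordered by class, so unshuffled small subsets can
# contain a single class and fail to fit. random_state only takes effect when shuffle=True.
train_sizes, train_scores, test_scores = learning_curve(
    model, X, y, cv=5, scoring='accuracy',
    train_sizes=np.linspace(0.1, 1.0, 10), shuffle=True, random_state=42
)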
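
To make the return values concrete, here is a minimal, self-contained sketch that repeats the call and inspects what comes back. The variable names (`sizes`, `tr_scores`, `val_scores`) are chosen for illustration, and `shuffle=True` is an assumption of this sketch, added so that small subsets of the class-ordered Iris data contain more than one class.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X_demo, y_demo = load_iris(return_X_y=True)

# Illustrative names; shuffle=True is an assumption of this sketch
sizes, tr_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=200), X_demo, y_demo, cv=5, scoring='accuracy',
    train_sizes=np.linspace(0.1, 1.0, 10), shuffle=True, random_state=42
)

# Fractions are converted to absolute sample counts of the training folds
print(sizes)

# One row per training-set size, one column per cross-validation fold
print(tr_scores.shape)
print(val_scores.shape)
```

The fractions passed in `train_sizes` are relative to the training portion of each fold, not to the full dataset, so the largest size is smaller than the total number of samples.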

### 📈 Compute Mean and Standard Deviation

We’ll average the scores across cross-validation folds to get one curve per score type, and use the standard deviation to show how much the folds disagree.

In [None]:
# Scores have shape (n_train_sizes, n_folds); axis=1 aggregates across folds
train_mean = np.mean(train_scores, axis=1)
train_std = np.std(train_scores, axis=1)
test_mean = np.mean(test_scores, axis=1)
test_std = np.std(test_scores, axis=1)
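
As a small illustration of what `axis=1` does, the sketch below uses a made-up score matrix (two training sizes, three folds). The numbers are invented for the example and are not results from the Iris run.

```python
import numpy as np

# Hypothetical scores: rows are training-set sizes, columns are CV folds
toy_scores = np.array([
    [0.90, 1.00, 0.80],   # folds disagree at the smaller size
    [0.95, 0.95, 0.95],   # folds agree at the larger size
])

# Collapse the fold axis: one mean and one std per training-set size
toy_mean = toy_scores.mean(axis=1)
toy_std = toy_scores.std(axis=1)

print(toy_mean.shape, toy_std.shape)
print(toy_mean)
print(toy_std)
```

A row whose folds agree exactly has a standard deviation of zero, which is why the shaded band in the plot narrows where the folds are consistent.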

### 🎨 Plot the Learning Curves

We visualize how the training and validation accuracy evolve with increasing data.

In [None]:
plt.figure(figsize=(8, 6))
plt.plot(train_sizes, train_mean, 'o-', color='teal', label='Training score')
plt.plot(train_sizes, test_mean, 'o-', color='orange', label='Cross-validation score')

# Add shading for standard deviation
plt.fill_between(train_sizes, train_mean - train_std, train_mean + train_std, color='teal', alpha=0.2)
plt.fill_between(train_sizes, test_mean - test_std, test_mean + test_std, color='orange', alpha=0.2)

plt.title('Learning Curves (Logistic Regression)')
plt.xlabel('Training Set Size')
plt.ylabel('Accuracy Score')
plt.legend(loc='best')
plt.grid(True)
# Save the figure before plt.show(); savefig writes the current figure to disk
plt.savefig("/Users/abhijith/Downloads/AIwP-MOE-Output.png")
plt.show()

### 🧠 Interpretation
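- If both curves converge at low accuracy → Underfitting (model too simple; more data alone is unlikely to help).
- If the training score stays high while the validation score remains well below it → Overfitting (model too complex; more data or stronger regularization may help).
- If both converge at high accuracy with a small gap → Good Fit (balanced model).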
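
The rules above can be sketched as a rough helper that labels a model from its final mean scores. The function `diagnose_fit` and its thresholds (`good_score`, `max_gap`) are hypothetical and chosen only for illustration; sensible cutoffs depend on the task and the metric.

```python
def diagnose_fit(train_score, val_score, good_score=0.9, max_gap=0.05):
    """Label a model from its final mean training and validation scores.

    Hypothetical helper: the thresholds are illustrative, not standard values.
    """
    gap = train_score - val_score
    if gap > max_gap:
        # Training score clearly above validation score
        return "overfitting"
    if val_score < good_score:
        # Curves are close together but both are low
        return "underfitting"
    return "good fit"


print(diagnose_fit(0.70, 0.68))
print(diagnose_fit(0.99, 0.80))
print(diagnose_fit(0.97, 0.96))
```

With the arrays computed earlier, one possible call is `diagnose_fit(train_mean[-1], test_mean[-1])`, which looks at the scores for the largest training set. Treat the label as a prompt to inspect the plot, not as a verdict.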