# 🧠 Model Evaluation in Scikit-Learn: Accuracy, Precision, Recall, and Beyond

This notebook is a **practical guide** to evaluating models in **scikit-learn**, covering both **classification** and **regression** metrics.
It’s designed for aspiring **ML engineers**, emphasizing concepts that transfer to other frameworks (TensorFlow, PyTorch, etc.).

---

## 📘 Table of Contents
1. Why Model Evaluation Matters  
2. Classification Metrics  
3. Regression Metrics  
4. Classification Example – Logistic Regression  
5. Regression Example – Linear Regression  
6. ML Engineering Takeaways  

---


## 🧩 Classification Example: Logistic Regression on the Iris Dataset

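We'll train a simple logistic regression model on a binary subset of Iris (classes 0 and 1 only; these two classes are linearly separable, so expect near-perfect scores) and evaluate it using: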
- **Accuracy**
- **Precision**
- **Recall**
- **F1-score**
- **Confusion Matrix**
- **ROC Curve and AUC**
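
Accuracy, precision, recall, and F1 from the list above can all be derived from the four confusion-matrix counts. The sketch below is a minimal illustration, not part of the Iris example: `y_true` and `y_hat` are hypothetical labels invented to make the arithmetic easy to follow.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

# Hypothetical labels, chosen only to illustrate the arithmetic
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_hat = np.array([1, 1, 1, 0, 1, 0, 0, 0, 0, 0])

# For binary labels {0, 1}, ravel() yields the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)          # share of all predictions that are correct
precision = tp / (tp + fp)                          # share of predicted positives that are real
recall = tp / (tp + fn)                             # share of real positives that were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall

print(f"Accuracy={accuracy:.2f} Precision={precision:.2f} Recall={recall:.2f} F1={f1:.2f}")

# The hand-computed values should agree with scikit-learn's functions
print(np.isclose(precision, precision_score(y_true, y_hat)))
print(np.isclose(recall, recall_score(y_true, y_hat)))
print(np.isclose(f1, f1_score(y_true, y_hat)))
```

With these labels the counts are TP = 3, FP = 1, FN = 1, and TN = 5, so accuracy is 0.80 and precision, recall, and F1 are all 0.75.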


In [None]:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    precision_score,
    recall_score,
    f1_score,
    confusion_matrix,
    classification_report,
    roc_curve,
    roc_auc_score,
)
import matplotlib.pyplot as plt

# Load and prepare data
iris = load_iris()
X, y = iris.data, iris.target

# Keep only classes 0 (setosa) and 1 (versicolor) for binary classification
mask = y != 2
X, y = X[mask], y[mask]

# Split (stratify so both classes keep the same proportions in train and test)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Model
model = LogisticRegression()
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)[:, 1]

# Metrics
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall:", recall_score(y_test, y_pred))
print("F1-score:", f1_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))

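# ROC curve and AUC (both use predicted probabilities, not hard labels)
fpr, tpr, _ = roc_curve(y_test, y_proba)
auc = roc_auc_score(y_test, y_proba)
print(f"ROC AUC: {auc:.3f}")
plt.plot(fpr, tpr, label=f'ROC curve (AUC = {auc:.3f})')
plt.plot([0, 1], [0, 1], 'k--', label='Chance')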
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.legend()
plt.show()
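
The single train/test split above yields only one estimate per metric. As a minimal sketch, the same model can be scored with stratified k-fold cross-validation; the choice of 5 folds and the seed of 42 are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate

# Same binary subset as the example above: classes 0 and 1 only
X, y = load_iris(return_X_y=True)
mask = y != 2
X, y = X[mask], y[mask]

metric_names = ["accuracy", "precision", "recall", "f1", "roc_auc"]

# 5 stratified folds; shuffling with a fixed seed keeps the result repeatable
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_validate(LogisticRegression(), X, y, cv=cv, scoring=metric_names)

# Report the mean and spread of each metric across the folds
for name in metric_names:
    fold_scores = scores[f"test_{name}"]
    print(f"{name}: mean={fold_scores.mean():.3f} std={fold_scores.std():.3f}")
```

The standard deviation across folds gives a rough sense of how much a metric depends on which samples land in the test set.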


## 📊 Regression Example: Predicting House Prices

Here we’ll train a **linear regression model** on a small synthetic dataset (price as a noisy linear function of square footage) and evaluate it using metrics like:
- **Mean Absolute Error (MAE)**
- **Mean Squared Error (MSE)**
- **Root Mean Squared Error (RMSE)**
- **R² (coefficient of determination)**
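
The four metrics listed above follow directly from the prediction errors. The sketch below computes them from their definitions with NumPy; the arrays and the helper `regression_metrics` are hypothetical, introduced only to show the arithmetic and how a single large miss affects RMSE more than MAE.

```python
import numpy as np

# Hypothetical targets and predictions, chosen only to illustrate the arithmetic
y_true = np.array([100.0, 200.0, 300.0, 400.0])
y_hat = np.array([110.0, 190.0, 310.0, 390.0])

def regression_metrics(y_true, y_hat):
    """Return MAE, MSE, RMSE, and R² computed directly from their definitions."""
    errors = y_true - y_hat
    mae = np.mean(np.abs(errors))
    mse = np.mean(errors ** 2)
    rmse = np.sqrt(mse)
    ss_res = np.sum(errors ** 2)                     # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mae, mse, rmse, r2

mae, mse, rmse, r2 = regression_metrics(y_true, y_hat)
print(f"MAE={mae:.2f} MSE={mse:.2f} RMSE={rmse:.2f} R2={r2:.3f}")

# Replace one prediction with a large miss (error of 110 instead of 10)
y_outlier = y_hat.copy()
y_outlier[-1] = 290.0
mae_o, mse_o, rmse_o, r2_o = regression_metrics(y_true, y_outlier)
print(f"With one large error: MAE={mae_o:.2f} RMSE={rmse_o:.2f}")
```

In the first case every error has magnitude 10, so MAE and RMSE are both 10. After the single large miss, MAE rises to 35 while RMSE rises to about 55.68, because squaring gives the large error more weight.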


In [None]:

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Mock dataset: price = 200 * sqft plus uniform integer noise in [20000, 50000)
np.random.seed(42)
sqft = np.random.randint(800, 3500, 100)
price = sqft * 200 + np.random.randint(20000, 50000, 100)
data = pd.DataFrame({'SquareFeet': sqft, 'Price': price})

# Split: 80% train, 20% test
X_train, X_test, y_train, y_test = train_test_split(
    data[['SquareFeet']], data['Price'], test_size=0.2, random_state=42
)

# Model
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Metrics
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)

print(f"MAE: {mae:.2f}")
print(f"MSE: {mse:.2f}")
print(f"RMSE: {rmse:.2f}")
print(f"R²: {r2:.3f}")
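
Summary numbers can hide systematic errors, so a residual plot is a useful companion to the metrics above. The following is a minimal, self-contained sketch that regenerates synthetic data with the same recipe as the example (a seeded `RandomState` is an assumption made here to keep the block independent) and plots residuals against predictions.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data: price = 200 * sqft plus uniform integer noise
rng = np.random.RandomState(42)
sqft = rng.randint(800, 3500, 100)
price = sqft * 200 + rng.randint(20000, 50000, 100)

X_train, X_test, y_train, y_test = train_test_split(
    sqft.reshape(-1, 1), price, test_size=0.2, random_state=42
)
model = LinearRegression().fit(X_train, y_train)

# Residual = actual - predicted, evaluated on the held-out test set
y_pred = model.predict(X_test)
residuals = y_test - y_pred

# A well-specified model leaves residuals scattered around zero with no trend
plt.scatter(y_pred, residuals)
plt.axhline(0, color="k", linestyle="--")
plt.xlabel("Predicted price")
plt.ylabel("Residual (actual - predicted)")
plt.title("Residuals vs. Predictions")
plt.show()
```

A curve or funnel shape in this plot would suggest a missing nonlinear term or non-constant error variance, neither of which a single RMSE value would reveal.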


## 💡 General ML Engineering Takeaways

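- Choose metrics aligned with **business goals**: favor recall when false negatives are costly (e.g., medical screening) and precision when false positives are costly (e.g., a spam filter that must not flag legitimate email).
- For **imbalanced data**, accuracy can be **misleading**, because a model that always predicts the majority class still scores high; report precision, recall, F1, or AUC instead.
- Use **cross-validation** for more reliable metric estimates than a single train/test split provides.
- In regression, RMSE penalizes **larger errors** more than MAE because errors are squared before averaging.
- Visualize residuals (regression) and ROC curves (classification) to **interpret performance** beyond a single number.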
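
The point about imbalanced data can be made concrete with a baseline that ignores its inputs. The sketch below uses hypothetical labels (95 negatives, 5 positives) and scikit-learn's `DummyClassifier`; the 95/5 ratio is an assumption chosen for illustration.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical imbalanced labels: 95 negatives, 5 positives
y_true = np.array([0] * 95 + [1] * 5)
X = np.zeros((100, 1))  # features are irrelevant to this baseline

# Baseline that always predicts the majority class
baseline = DummyClassifier(strategy="most_frequent").fit(X, y_true)
y_hat = baseline.predict(X)

print("Accuracy:", accuracy_score(y_true, y_hat))  # 0.95
print("Recall:", recall_score(y_true, y_hat))      # 0.0
```

The baseline reaches 95% accuracy without finding a single positive case, which is why recall or a similar metric belongs next to accuracy whenever classes are unevenly distributed.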

---

| Problem Type | Key Metrics | When to Use |
|---------------|-------------|--------------|
| Classification | Accuracy, Precision, Recall, F1, ROC-AUC | Discrete labels |
| Regression | MAE, MSE, RMSE, R² | Continuous targets |

---
