# 📘 ML Learning Journey: Model Evaluation & Selection

## R², Adjusted R², Generalization & Model Selection

---

## 1. Classification vs Regression Metrics (Quick Context)

### Classification Metrics
- **Accuracy** works well for **balanced datasets**
- For **imbalanced datasets**, accuracy can be misleading: always predicting the majority class scores high while missing every minority case

Better metrics for imbalanced data:
- Precision
- Recall
- F1-score
- Balanced Accuracy

**F1 Score**
- Harmonic mean of Precision and Recall
- Drops sharply when either precision or recall is low
- Weights precision and recall equally, and ignores true negatives
- Useful when both error types matter
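
As a minimal sketch of the harmonic-mean behaviour described above, the snippet below computes precision, recall, and F1 from confusion-matrix counts. The function name `precision_recall_f1` and the counts are hypothetical, chosen only for illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Harmonic mean: dominated by the smaller of the two values.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 8 true positives, 2 false positives, 8 false negatives.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
# precision=0.800 recall=0.500 f1=0.615
```

The arithmetic mean of 0.8 and 0.5 would be 0.65; the harmonic mean is lower because it is pulled toward the weaker value.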

---

## 2. Regression Metrics Overview

Regression problems deal with **continuous target variables**.

### Common Regression Metrics
- **MAE (Mean Absolute Error)**  
  Average absolute difference between actual and predicted values

- **MSE (Mean Squared Error)**  
  Average squared difference; penalizes large errors more heavily

- **RMSE (Root Mean Squared Error)**  
  Square root of MSE; same units as the target variable, so easier to interpret
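
The three metrics can be sketched in plain Python as follows. The helper names and the toy values are assumptions made for illustration; the last prediction is deliberately far off to show how a single large error affects MSE more than MAE.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    return math.sqrt(mse(y_true, y_pred))


# Hypothetical values: errors are 1, 0, -1, -4.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 13.0]

print(f"MAE={mae(y_true, y_pred):.3f}")    # MAE=1.500
print(f"MSE={mse(y_true, y_pred):.3f}")    # MSE=4.500
print(f"RMSE={rmse(y_true, y_pred):.3f}")  # RMSE=2.121
```

The single error of 4 contributes 16 of the 18 squared-error units, which is why MSE and RMSE react to it much more strongly than MAE does.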

---

## 3. Why R² (R-Squared)?

Error-based metrics alone do not show how much of the variability in the data a model accounts for.

**R² measures the proportion of variance in the target variable explained by the model.**

### Interpretation
- R² = 1 → Perfect predictions (RSS = 0)
- R² = 0 → No better than the mean model
- R² < 0 → Worse than the mean model

---

## 4. Baseline: Mean Model

Before evaluating a regression model, consider a **mean model**:

\[
\hat{y} = \bar{y}
\]

This acts as a baseline for comparison.

---

## 5. Error Decomposition

### Total Sum of Squares (TSS)
\[
TSS = \sum (y_i - \bar{y})^2
\]

### Residual Sum of Squares (RSS)
\[
RSS = \sum (y_i - \hat{y}_i)^2
\]

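### Explained Sum of Squares (ESS)
\[
ESS = TSS - RSS
\]

For least squares with an intercept, evaluated on the training data, this equals \(\sum (\hat{y}_i - \bar{y})^2\).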
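
The decomposition above can be checked numerically. The sketch below fits a one-feature least-squares line in plain Python and computes TSS, RSS, and the explained part both ways. The helper `fit_line` and the toy data are assumptions for illustration; the identity TSS − RSS = Σ(ŷᵢ − ȳ)² is only guaranteed for least squares with an intercept on its own training data.

```python
def fit_line(x, y):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


# Hypothetical toy data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

slope, intercept = fit_line(x, y)
y_hat = [slope * xi + intercept for xi in x]
y_mean = sum(y) / len(y)

tss = sum((yi - y_mean) ** 2 for yi in y)               # error of the mean model
rss = sum((yi - pi) ** 2 for yi, pi in zip(y, y_hat))   # error of the fitted line
ess = sum((pi - y_mean) ** 2 for pi in y_hat)           # explained part, computed directly

print(f"TSS={tss:.2f} RSS={rss:.2f} TSS-RSS={tss - rss:.2f} ESS={ess:.2f}")
# TSS=6.00 RSS=2.40 TSS-RSS=3.60 ESS=3.60
```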

---

## 6. R² Formula

\[
R^2 = 1 - \frac{RSS}{TSS}
\]

### Meaning
- Measures the fraction of variance explained by the model
- Example:  
  R² = 0.8 → 80% of variance explained
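
A minimal sketch of the formula, assuming a hypothetical helper named `r_squared` and made-up values. It also confirms that predicting the mean everywhere gives a score of 0.

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - RSS / TSS, with TSS measured around the mean of y_true."""
    y_mean = sum(y_true) / len(y_true)
    tss = sum((y - y_mean) ** 2 for y in y_true)
    if tss == 0:
        raise ValueError("R^2 is undefined when the target is constant")
    rss = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 1 - rss / tss


# Hypothetical values: TSS = 40, RSS = 3.
y_true = [10.0, 12.0, 14.0, 16.0, 18.0]
y_pred = [11.0, 12.0, 13.0, 17.0, 18.0]

print(f"{r_squared(y_true, y_pred):.3f}")      # 0.925
print(f"{r_squared(y_true, [14.0] * 5):.3f}")  # 0.000 (the mean model)
```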

---

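## 7. Limitation of R²

- On the training data, R² **never decreases** when more features are added to a least-squares model
- This holds even if the new features are pure noise
- Selecting models by R² alone therefore rewards complexity and encourages **overfitting**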

---

## 8. Adjusted R²

Adjusted R² corrects R² for the number of features, so a feature that does not improve the fit enough lowers the score.

### Formula
\[
\text{Adjusted } R^2 = 1 - (1 - R^2)\frac{n - 1}{n - k - 1}
\]

Where:
- `n` = number of observations
- `k` = number of features
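
As a worked example of the formula, the sketch below compares two hypothetical models fitted to the same 30 observations. The function name and all the R² values are invented for illustration; they are not results from any real dataset.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for n observations and k features."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1 observations")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


n = 30
# Hypothetical model A: 3 features, R^2 = 0.800
# Hypothetical model B: 8 features, R^2 = 0.805 (five extra features, tiny gain)
adj_a = adjusted_r_squared(0.800, n, k=3)
adj_b = adjusted_r_squared(0.805, n, k=8)

print(f"A: adjusted R^2 = {adj_a:.3f}")  # A: adjusted R^2 = 0.777
print(f"B: adjusted R^2 = {adj_b:.3f}")  # B: adjusted R^2 = 0.731
```

Plain R² ranks model B higher (0.805 vs 0.800), while adjusted R² ranks model A higher because the small gain does not offset five extra features.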

---

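## 9. Why Adjusted R² is Better

- Increases only if the new feature reduces RSS by more than the lost degree of freedom costs
- Decreases if the feature adds little beyond noise
- Useful for **comparing models** with different numbers of features fitted to the same data

**Rule of Thumb**
- Use R² for explanation
- Use Adjusted R² for model selection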

---

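## 10. When Can R² Be Negative?

- When the model performs worse than the mean model, i.e. RSS > TSS
- Typically seen on held-out data or for models fitted without an intercept; least squares with an intercept cannot score below 0 on its own training data
- Often due to:
  - Poor feature selection
  - Wrong model choice
  - Non-linear data with linear model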
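
To make the negative case concrete, here is a small sketch with invented numbers in which the predictions follow the opposite trend to the targets, so RSS exceeds TSS.

```python
# Hypothetical targets and predictions from a model that learned the wrong slope.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [5.0, 4.0, 3.0, 2.0, 1.0]

y_mean = sum(y_true) / len(y_true)
tss = sum((y - y_mean) ** 2 for y in y_true)             # 10.0
rss = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # 40.0
r2 = 1 - rss / tss

print(r2)  # -3.0
```

Predicting the mean (3.0) for every point would give RSS = TSS and a score of 0, so this model is strictly worse than the baseline.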

---

## 11. Model Evaluation Flow



Features (X)  
↓  
Model  
↓  
Predictions (ŷ)  
↓  
Compare ŷ vs y  
↓  
Metrics (R², Adj R², MAE, RMSE)  


---

## 12. Training vs Model

### Training Pipeline

#### Data → ML Algorithm → Model


- An algorithm, its hyperparameters, and the training data together determine the fitted model
- Different hyperparameters → different models

---

## 13. Hyperparameter Tuning

Goal:
- Find optimal hyperparameters
- Maximize performance on data not used for fitting
- Avoid overfitting

Example (KNN):
- k (number of neighbours)
- distance metric
- weights (uniform or distance-based)
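
A minimal sketch of a grid search over two of these KNN hyperparameters, written in plain Python so it runs without any library. Everything here is an assumption for illustration: the toy data, the helper names, and the choice to fix the distance metric to the absolute difference on a single feature. Each candidate is scored on a held-out validation set, not on the training data.

```python
from itertools import product


def knn_predict(train, query, k, weights):
    """Predict for one query point. `train` is a list of (x, y) pairs with scalar x."""
    neighbours = sorted(train, key=lambda pair: abs(pair[0] - query))[:k]
    if weights == "uniform":
        return sum(y for _, y in neighbours) / len(neighbours)
    # Inverse-distance weighting; the small constant avoids division by zero.
    w = [1.0 / (abs(x - query) + 1e-9) for x, _ in neighbours]
    return sum(wi * y for wi, (_, y) in zip(w, neighbours)) / sum(w)


def validation_mse(train, valid, k, weights):
    """Mean squared error on the validation set for one hyperparameter setting."""
    errors = [(y - knn_predict(train, x, k, weights)) ** 2 for x, y in valid]
    return sum(errors) / len(errors)


# Hypothetical toy data: y is roughly 2x with a small deterministic wiggle.
data = [(float(x), 2.0 * x + (1.0 if x % 2 else -1.0)) for x in range(20)]
train = [p for i, p in enumerate(data) if i % 4 != 0]  # 15 points for fitting
valid = [p for i, p in enumerate(data) if i % 4 == 0]  # 5 points held out

grid = {"k": [1, 3, 5], "weights": ["uniform", "distance"]}
results = {
    (k, w): validation_mse(train, valid, k, w)
    for k, w in product(grid["k"], grid["weights"])
}

best = min(results, key=results.get)
print("best (k, weights):", best, "validation MSE:", round(results[best], 3))
```

A single validation split is used to keep the sketch short; in practice the same loop is usually wrapped in cross validation so the choice does not depend on one particular split.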

---

## 14. Memorization vs Generalization

- **Memorization**
  - Very good performance on training data
  - Poor performance on unseen data

- **Generalization**
  - Good performance on both train and unseen data
  - True goal of Machine Learning

---

## 15. Testing Generalization

### Cross Validation
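- Split the data into k folds; train on k-1 folds and evaluate on the held-out fold, rotating so each fold is used for evaluation once
- Average the fold scores to estimate performance on unseen data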
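
The fold rotation can be sketched in plain Python as below. The helper names are hypothetical, the data are invented, and the "model" is simply the training-fold mean so the example stays short. The split is not shuffled here; real data are usually shuffled first unless their order matters, as in time series.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, test_idx) pairs; each sample lands in exactly one test fold."""
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)  # spread the remainder over the first folds
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def cross_validated_mse(y, n_folds):
    """Average held-out MSE of a mean model fitted on each training split."""
    fold_scores = []
    for train_idx, test_idx in k_fold_indices(len(y), n_folds):
        train_mean = sum(y[i] for i in train_idx) / len(train_idx)
        fold_mse = sum((y[i] - train_mean) ** 2 for i in test_idx) / len(test_idx)
        fold_scores.append(fold_mse)
    return sum(fold_scores) / len(fold_scores)


# Hypothetical target values.
y = [3.0, 5.0, 4.0, 6.0, 8.0, 7.0]

for train_idx, test_idx in k_fold_indices(len(y), n_folds=3):
    print("train:", train_idx, "test:", test_idx)
print("cross-validated MSE of the mean model:", cross_validated_mse(y, n_folds=3))
```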

### Learning Curves
- Compare training vs validation error
- Detect overfitting and underfitting

---

## 16. Statistical Perspective

Machine Learning follows **inferential thinking**:

1. Draw a representative sample
2. Analyze the sample
3. Extract patterns
4. Generalize the findings to the population, with a stated level of confidence

---

## ✅ Key Takeaways

- R² explains variance
- Adjusted R² helps in feature selection
- More features ≠ better model
- Model selection is as important as accuracy
- Generalization is the ultimate goal
