## Model Evaluation Metrics – Classification & Regression (Deep Dive)

- This notebook focuses on a **detailed and practical understanding of evaluation metrics** used in **classification and regression problems**, with emphasis on **metric selection based on problem context**.


## Learning Objective

The goal of this notebook is to:
- Understand how different evaluation metrics work
- Learn **when** and **why** to choose a specific metric
- Analyze model performance beyond accuracy
- Connect evaluation metrics with real-world decision making

## Classification Metrics Recap

Before diving deeper, we briefly revisit core classification metrics:

- **Precision** – Correctly predicted positive samples out of all predicted positives: TP / (TP + FP)
- **Recall** – Correctly predicted positive samples out of all actual positives: TP / (TP + FN)
- **F1 Score** – Harmonic mean of Precision and Recall: 2 · P · R / (P + R)

These metrics are especially important when class distribution is **imbalanced**.
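To make the three definitions concrete, here is a minimal pure-Python sketch that counts TP, FP and FN and derives Precision, Recall and F1. It assumes binary labels encoded as 1 (positive) and 0 (negative), uses hypothetical toy labels, and returns 0.0 when a denominator is zero (a convention chosen here). In practice, `sklearn.metrics.precision_score`, `recall_score` and `f1_score` compute the same quantities.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels encoded as 1/0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision(y_true, y_pred):
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    # Of everything predicted positive, how much was actually positive?
    return tp / (tp + fp) if tp + fp else 0.0


def recall(y_true, y_pred):
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    # Of everything actually positive, how much did the model catch?
    return tp / (tp + fn) if tp + fn else 0.0


def f1(y_true, y_pred):
    p, r = precision(y_true, y_pred), recall(y_true, y_pred)
    # Harmonic mean: it stays low if either precision or recall is low.
    return 2 * p * r / (p + r) if p + r else 0.0


# Hypothetical toy labels, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]  # TP=2, FP=1, FN=2, TN=5

# precision = 2/3, recall = 1/2, F1 = 4/7 (about 0.571)
print(precision(y_true, y_pred), recall(y_true, y_pred), f1(y_true, y_pred))
```

The F1 value (about 0.571) sits below the arithmetic mean of 2/3 and 1/2, which is the harmonic mean pulling the score toward the weaker of the two metrics.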


## Cost of Misclassification (Key Concept)

Choosing an evaluation metric depends on the **cost of incorrect predictions**:

- **False Positive (FP)**: Predicting positive when it is actually negative
- **False Negative (FN)**: Predicting negative when it is actually positive

### Metric selection based on cost:
- FP cost high → Focus on **Precision**
- FN cost high → Focus on **Recall**
- Both FP & FN important → Use **F1 Score**

Metric choice should always align with the **objective of the ML problem**.
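One way to see why the preferred metric depends on error costs is to attach a cost to each FP and FN and compare two models. The sketch below is illustrative only: the two prediction lists and the cost values are hypothetical, and the total cost is simply `cost_fp * FP + cost_fn * FN`.

```python
def misclassification_cost(y_true, y_pred, cost_fp, cost_fn):
    """Total cost = cost_fp * (#false positives) + cost_fn * (#false negatives)."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return cost_fp * fp + cost_fn * fn


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
cautious = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # FP=0, FN=2 (high precision, lower recall)
eager = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]     # FP=3, FN=0 (high recall, lower precision)

# Hypothetical scenario A: a false positive is 10x as costly as a false negative.
print(misclassification_cost(y_true, cautious, cost_fp=10, cost_fn=1))  # 2
print(misclassification_cost(y_true, eager, cost_fp=10, cost_fn=1))     # 30

# Hypothetical scenario B: a false negative is 10x as costly as a false positive.
print(misclassification_cost(y_true, cautious, cost_fp=1, cost_fn=10))  # 20
print(misclassification_cost(y_true, eager, cost_fp=1, cost_fn=10))     # 3
```

The same two models swap places depending on which error is expensive: the high-precision model wins when FP is costly, the high-recall model wins when FN is costly.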

## Accuracy: When It Works & When It Fails

Accuracy is a suitable metric mainly when **classes are roughly balanced** and false positives and false negatives carry similar costs.

### Balanced Dataset:
- All classes have nearly equal proportions
- Accuracy reasonably represents model performance

### Imbalanced Dataset:
- Accuracy can be misleading
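- A model can score well on the majority class while failing on the minority class (e.g., if 95% of samples are negative, always predicting "negative" gives 95% accuracy but catches no positives)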
- Accuracy alone is **not sufficient**
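The failure mode can be reproduced in a few lines. This sketch assumes a hypothetical dataset with 95 negatives and 5 positives and a "model" that ignores its input and always predicts the majority class.

```python
# Hypothetical imbalanced dataset: 95 negatives (0), 5 positives (1).
y_true = [0] * 95 + [1] * 5
# A trivial "model" that always predicts the majority class.
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
positives_caught = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
recall = positives_caught / sum(y_true)

print(f"accuracy = {accuracy:.2f}")  # accuracy = 0.95
print(f"recall   = {recall:.2f}")    # recall   = 0.00
```

A 95% accuracy here says nothing useful: the model never identifies a single positive sample.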


## Balanced Accuracy

Balanced Accuracy is a modified accuracy metric used for **imbalanced datasets**.

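For binary classification, it is defined as the average of:
- True Positive Rate (Recall)
- True Negative Rate (Specificity)

For multiclass problems, it is the average of the recall computed for each class. Because every class contributes equally to that average, balanced accuracy gives equal importance to all classes and provides a more reliable performance measure for imbalanced data.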
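A minimal sketch of balanced accuracy as the mean of per-class recall, written in pure Python (for two classes this reduces to (TPR + TNR) / 2). Classes are taken from `y_true`, and the labels are hypothetical. scikit-learn's `sklearn.metrics.balanced_accuracy_score` uses the same per-class-recall definition.

```python
def balanced_accuracy(y_true, y_pred):
    """Average of per-class recall over the classes present in y_true."""
    per_class_recall = []
    for c in sorted(set(y_true)):
        # Indices of samples whose true label is c.
        idx = [i for i, t in enumerate(y_true) if t == c]
        # Recall for class c: fraction of those samples predicted as c.
        hits = sum(1 for i in idx if y_pred[i] == c)
        per_class_recall.append(hits / len(idx))
    return sum(per_class_recall) / len(per_class_recall)


# Hypothetical 95/5 split with an always-negative predictor.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
print(balanced_accuracy(y_true, y_pred))  # 0.5  (TNR = 1.0, TPR = 0.0)
```

Where plain accuracy reports 0.95 for this predictor, balanced accuracy reports 0.5, the score of random guessing on two classes.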

## F1 Score and Fβ Score

### F1 Score
- Harmonic mean of Precision and Recall
- Suitable when both FP and FN are equally important

### Fβ Score
- Generalized version of F1 Score
- β controls the importance of Recall vs Precision

Interpretation:
- β < 1 → Precision is more important
- β = 1 → Precision and Recall equally important
- β > 1 → Recall is more important

The value of β depends on the **problem objective**.
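The standard formula is $F_\beta = (1 + \beta^2)\,\dfrac{P \cdot R}{\beta^2 P + R}$, which reduces to F1 at β = 1. The sketch below evaluates it in pure Python for a hypothetical model with high precision (0.9) and modest recall (0.5); `sklearn.metrics.fbeta_score(y_true, y_pred, beta=...)` computes the same score directly from labels.

```python
def fbeta(precision, recall, beta):
    """F-beta score from precision and recall.

    beta > 1 weights recall more heavily; beta < 1 weights precision more heavily.
    """
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical model: high precision, modest recall.
p, r = 0.9, 0.5
for beta in (0.5, 1.0, 2.0):
    print(f"beta={beta}: F = {fbeta(p, r, beta):.3f}")
# beta=0.5: F = 0.776
# beta=1.0: F = 0.643
# beta=2.0: F = 0.549
```

For this precision-heavy model, the score drops as β grows, because larger β shifts weight toward the weaker recall.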

## Baseline Model Concept

A baseline model provides a **minimum benchmark** for model performance.

Examples:
- Classification → Predicting the majority class (mode)
- Regression → Predicting the mean value

Any trained model should **outperform the baseline model** on the chosen metric, evaluated on the same data.
If a model performs worse than the baseline, it is considered **not useful**.
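Both baselines fit in a few lines of pure Python. In this sketch, the baseline is "fitted" on hypothetical training labels and scored on hypothetical test labels, so any real model can be compared against the same numbers. scikit-learn ships equivalents as `sklearn.dummy.DummyClassifier(strategy="most_frequent")` and `sklearn.dummy.DummyRegressor(strategy="mean")`.

```python
from collections import Counter


def majority_class_baseline(y_train, n):
    """Predict the most frequent training label (the mode) for n samples."""
    mode = Counter(y_train).most_common(1)[0][0]
    return [mode] * n


def mean_baseline(y_train, n):
    """Predict the training mean for n samples."""
    mu = sum(y_train) / len(y_train)
    return [mu] * n


# Classification: hypothetical labels.
y_train_cls = [0, 0, 0, 1, 0, 1]
y_test_cls = [0, 1, 0, 0]
base_cls = majority_class_baseline(y_train_cls, len(y_test_cls))
baseline_accuracy = sum(t == p for t, p in zip(y_test_cls, base_cls)) / len(y_test_cls)

# Regression: hypothetical targets.
y_train_reg = [10.0, 12.0, 14.0]
y_test_reg = [11.0, 13.0]
base_reg = mean_baseline(y_train_reg, len(y_test_reg))
baseline_mae = sum(abs(t - p) for t, p in zip(y_test_reg, base_reg)) / len(y_test_reg)

print(baseline_accuracy, baseline_mae)  # 0.75 1.0
```

A trained classifier on this data would need accuracy above 0.75, and a trained regressor would need MAE below 1.0, to be worth keeping.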

## Regression Metrics Introduction

In regression problems, the target variable is **numeric**.
Model performance is evaluated using **error-based metrics**.

Each prediction is compared directly with its true value:
- Predicted values (ŷ)
- Actual values (y)

For each sample, the difference y − ŷ is its prediction error (residual).

All regression metrics are derived from **prediction errors**.

## Common Regression Metrics

- **MAE (Mean Absolute Error)**  
  Average absolute difference between predicted and actual values

- **MSE (Mean Squared Error)**  
  Average of squared prediction errors

- **RMSE (Root Mean Squared Error)**  
  Square root of MSE, expressed in the same units as the target; like MSE, it penalizes large errors more heavily than MAE does

These metrics help quantify how far predictions are from actual values.
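The difference between MAE and RMSE shows up when errors are uneven. This pure-Python sketch uses two hypothetical prediction sets with the same total absolute error: one spreads it evenly, the other concentrates it in a single sample. scikit-learn provides `sklearn.metrics.mean_absolute_error` and `mean_squared_error` for real use.

```python
import math


def mae(y, y_hat):
    """Mean Absolute Error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def mse(y, y_hat):
    """Mean Squared Error (in squared target units)."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


def rmse(y, y_hat):
    """Root Mean Squared Error (in target units)."""
    return math.sqrt(mse(y, y_hat))


y = [3.0, 5.0, 7.0, 9.0]
even = [4.0, 6.0, 8.0, 10.0]  # every prediction off by 1
spiky = [3.0, 5.0, 7.0, 13.0]  # three exact, one off by 4

print(mae(y, even), mse(y, even), rmse(y, even))     # 1.0 1.0 1.0
print(mae(y, spiky), mse(y, spiky), rmse(y, spiky))  # 1.0 4.0 2.0
```

MAE cannot tell the two cases apart, while squaring makes the single large miss dominate MSE and RMSE.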


## R² and Explained Variance

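R² measures how much of the variance in the target variable is explained by the regression model.

- Compares model performance against a baseline that always predicts the mean
- Indicates the proportion of variance explained by the model: 1 is a perfect fit, 0 is no better than the mean baseline, and negative values are worse than it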

### Error Decomposition:
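- Total Sum of Squares: $TSS = \sum_i (y_i - \bar{y})^2$
- Explained Sum of Squares: $ESS = \sum_i (\hat{y}_i - \bar{y})^2$
- Residual Sum of Squares: $RSS = \sum_i (y_i - \hat{y}_i)^2$

The general definition is

$$R^2 = 1 - \frac{RSS}{TSS}$$

For ordinary least squares with an intercept, evaluated on its own training data, $TSS = ESS + RSS$, so this is equivalent to $R^2 = ESS / TSS$. Outside that case (for example, on test data), the decomposition does not hold and $1 - RSS/TSS$ should be used; R² can then be negative when the model is worse than predicting the mean.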
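The sketch below computes R² in pure Python via the general form 1 − RSS/TSS (which coincides with ESS/TSS only for least-squares fits with an intercept on their training data). The targets and the three prediction sets are hypothetical; `sklearn.metrics.r2_score` uses the same 1 − RSS/TSS definition.

```python
def r_squared(y, y_hat):
    """R^2 = 1 - RSS / TSS, measured against the mean of y."""
    y_bar = sum(y) / len(y)
    tss = sum((a - y_bar) ** 2 for a in y)                # total variation around the mean
    rss = sum((a - b) ** 2 for a, b in zip(y, y_hat))     # variation left unexplained
    return 1 - rss / tss


y = [2.0, 4.0, 6.0, 8.0]            # mean = 5, TSS = 20
good = [2.5, 3.5, 6.5, 7.5]         # RSS = 1
mean_only = [5.0, 5.0, 5.0, 5.0]    # the mean baseline, RSS = 20
bad = [8.0, 6.0, 4.0, 2.0]          # trend reversed, RSS = 80

print(r_squared(y, good))       # 0.95
print(r_squared(y, mean_only))  # 0.0
print(r_squared(y, bad))        # -3.0
```

The mean baseline scores exactly 0, which is why R² reads as "improvement over predicting the mean", and a model worse than that baseline goes negative.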

## Key Takeaways

- Metric selection depends on **problem context**, not convenience
- Accuracy is unreliable for imbalanced datasets
- Precision, Recall, and F-scores provide better insight
- Models must always outperform a baseline
- Regression metrics focus on prediction errors and explained variance

## Conclusion

This notebook builds a **strong evaluation mindset**, emphasizing:
- Business-driven metric selection
- Proper evaluation for imbalanced data
- Understanding both classification and regression metrics

Revisiting metrics at a deeper level strengthens overall ML model reliability.