# 📊 Performance Metrics in Regression Model

Performance metrics help assess how well a machine learning model performs. The choice of metric depends on the **problem type**—whether it's classification, regression, or clustering. Below is a breakdown of key metrics used in different types of problems, with **formulas**, **definitions**, and **advantages/disadvantages**.

---

## 🔹 **2. Regression Metrics**
Used when the output is **continuous** (e.g., predicting temperature, sales, house prices).

### a. **Mean Absolute Error (MAE)**
- **Definition**: Average of absolute errors  
  $$
  MAE = \frac{1}{n} \sum_{i=1}^n |y_i - \hat{y}_i|
  $$
  - $n$ = number of observations  
  - $y_i$ = actual value  
  - $\hat{y}_i$ = predicted value
- **Advantages**:
  - Easy to interpret.
  - **Same unit as target**.
- **Disadvantages**:
  - Weights all errors linearly, so large errors are not penalized more heavily than small ones.
- **Best Use Case**: When you want a **simple error metric**.
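
As a minimal sketch of the MAE formula above, the function below averages the absolute differences in plain Python. The function name and the house-price figures are made up for illustration.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical house prices (in thousands)
actual = [200.0, 250.0, 300.0, 400.0]
predicted = [210.0, 240.0, 330.0, 380.0]

mae = mean_absolute_error(actual, predicted)
print(mae)  # 17.5, i.e. predictions are off by 17.5 thousand on average
```

The result is in the same unit as the target, which is why MAE is easy to read.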

---

### b. **Mean Squared Error (MSE)**
- **Definition**: Average of squared errors  
  $$
  MSE = \frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2
  $$
  - Symbols as defined for MAE above
- **Advantages**:
  - Penalizes **large errors** more.
- **Disadvantages**:
  - Expressed in squared units of the target, so harder to interpret directly.
- **Best Use Case**: When you want to **penalize large deviations**.
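
The sketch below applies the MSE formula to a small made-up dataset and shows how much of the total one large error accounts for. The function name and the numbers are illustrative assumptions.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical values; the absolute errors are 10, 10, 30 and 20
actual = [200.0, 250.0, 300.0, 400.0]
predicted = [210.0, 240.0, 330.0, 380.0]

mse = mean_squared_error(actual, predicted)
print(mse)  # 375.0

# Share of the squared-error total that comes from the single largest error
squared_errors = [(y - p) ** 2 for y, p in zip(actual, predicted)]
largest_share = max(squared_errors) / sum(squared_errors)
print(largest_share)  # 0.6
```

The error of 30 is about 43% of the summed absolute errors but 60% of the summed squared errors, which is what "penalizes large errors more" means in practice.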

---

### c. **Root Mean Squared Error (RMSE)**
- **Definition**: Square root of MSE  
  $$
  RMSE = \sqrt{MSE}
  $$
- **Advantages**:
  - **Same unit as target**.
  - Penalizes large errors more than MAE does, while staying in readable units.
- **Disadvantages**:
  - Sensitive to **outliers**.
- **Best Use Case**: For general-purpose **regression evaluation**.
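
To illustrate the outlier sensitivity mentioned above, this sketch computes RMSE for two made-up prediction sets that differ in a single value. The names and data are assumptions chosen for the example.

```python
import math


def root_mean_squared_error(y_true, y_pred):
    """Square root of the mean squared error."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


actual = [10.0, 20.0, 30.0, 40.0]
pred_even = [12.0, 22.0, 32.0, 42.0]     # every prediction is off by 2
pred_outlier = [12.0, 22.0, 32.0, 50.0]  # last prediction is off by 10

rmse_even = root_mean_squared_error(actual, pred_even)
rmse_outlier = root_mean_squared_error(actual, pred_outlier)
print(rmse_even)                # 2.0
print(round(rmse_outlier, 3))   # 5.292
```

The MAE of the second set is 4.0, so the single large miss pulls RMSE further from the typical error than it pulls MAE.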

---

### d. **R² Score (Coefficient of Determination)**
- **Definition**: Proportion of variance explained by the model  
  $$
  R^2 = 1 - \frac{SS_{res}}{SS_{tot}}
  $$
  - $SS_{res} = \sum_{i=1}^n (y_i - \hat{y}_i)^2$ → residual sum of squares  
  - $SS_{tot} = \sum_{i=1}^n (y_i - \bar{y})^2$ → total sum of squares  
  - $\bar{y}$ = mean of actual values
- **Advantages**:
  - Measures how well the model explains the data.
- **Disadvantages**:
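  - On training data it never decreases when predictors are added to a linear model, so it can look good for **overfitted** models; it can also be misleading for **nonlinear** models.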
- **Best Use Case**: When evaluating **goodness of fit**.
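
The following sketch evaluates the R² formula directly from $SS_{res}$ and $SS_{tot}$. The function name and data are illustrative assumptions. It also shows that R² is 0 for a model that always predicts the mean, and negative for a model that does worse than that.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1 - ss_res / ss_tot


actual = [3.0, 5.0, 7.0, 9.0]

r2_good = r2_score(actual, [2.5, 5.5, 6.5, 9.5])  # close predictions
r2_mean = r2_score(actual, [6.0, 6.0, 6.0, 6.0])  # always predicts the mean
r2_bad = r2_score(actual, [9.0, 7.0, 5.0, 3.0])   # reversed trend

print(round(r2_good, 2))  # 0.95
print(r2_mean)            # 0.0
print(r2_bad)             # -3.0
```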

---

### e. **Adjusted R² Score**
- **Definition**: R² adjusted for the number of predictors  
  $$
  \text{Adjusted } R^2 = 1 - \left( \frac{(1 - R^2)(n - 1)}{n - k - 1} \right)
  $$
  - $n$ = number of observations  
  - $k$ = number of independent predictors
- **Advantages**:
  - Penalizes irrelevant features.
  - Adjusts for **model complexity**.
- **Disadvantages**:
  - Slightly more complex to compute.
- **Best Use Case**: Comparing models with **multiple features**.
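
A short sketch of the adjustment, assuming R² has already been computed. The function name and the example values of $R^2$, $n$ and $k$ are made up; the point is that the same R² is discounted more as the number of predictors grows.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k predictors."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Same raw R^2 of 0.90 on 50 observations, with 5 versus 20 predictors
adj_few = adjusted_r2(0.90, n=50, k=5)
adj_many = adjusted_r2(0.90, n=50, k=20)

print(round(adj_few, 4))   # 0.8886
print(round(adj_many, 4))  # 0.831
```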

---

## 🔹 **3. Clustering Metrics**
Used in **unsupervised learning** (e.g., grouping customers or documents).

### a. **Silhouette Score**
- **Definition**: Measures how similar a point is to its own cluster compared to other clusters  
  $$
  \text{Silhouette} \in [-1, 1]
  $$
- **Advantages**:
  - Intuitive: closer to 1 is better.
- **Disadvantages**:
  - Computationally expensive for large datasets.
- **Best Use Case**: Evaluating **cohesion** and **separation** in clustering.
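
The notes above give only the range of the score. The sketch below uses the standard per-point definition $s(i) = (b - a) / \max(a, b)$, where $a$ is the mean distance to the other points in the same cluster and $b$ is the mean distance to the nearest other cluster. It is restricted to 1-D points to stay short, and the function name and data are illustrative assumptions.

```python
def silhouette_score_1d(points, labels):
    """Mean silhouette coefficient for 1-D points (distance = absolute difference)."""
    cluster_ids = set(labels)
    if len(cluster_ids) < 2:
        raise ValueError("need at least two clusters")
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        # a: mean distance to the other members of the point's own cluster
        own = [abs(p - q) for j, (q, l) in enumerate(zip(points, labels))
               if l == lab and j != i]
        if not own:
            scores.append(0.0)  # convention: a singleton cluster scores 0
            continue
        a = sum(own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(abs(p - q) for q, l in zip(points, labels) if l == other)
            / labels.count(other)
            for other in cluster_ids if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Two well-separated groups on a line
score = silhouette_score_1d([1.0, 2.0, 10.0, 11.0], [0, 0, 1, 1])
print(round(score, 3))  # 0.889
```

The nested loops compare every pair of points, which is the quadratic cost behind the "computationally expensive" disadvantage.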

---

### b. **Davies-Bouldin Index**
- **Definition**: Measures average similarity between each cluster and the most similar one.
- **Advantages**:
  - **Lower scores** indicate better clustering.
- **Disadvantages**:
  - Doesn't handle **irregular shapes** of clusters well.
- **Best Use Case**: For comparing clustering **tightness and separation**.
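
The definition above can be made concrete with the usual formulation: for each cluster, take the worst ratio of (sum of the two within-cluster scatters) to (distance between the two centroids), then average over clusters. The sketch below assumes 1-D points and distinct centroids; the function name and data are illustrative.

```python
def davies_bouldin_1d(points, labels):
    """Davies-Bouldin index for 1-D points (lower is better)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    # Centroid and mean point-to-centroid distance (scatter) per cluster
    centroid = {lab: sum(ps) / len(ps) for lab, ps in clusters.items()}
    scatter = {
        lab: sum(abs(p - centroid[lab]) for p in ps) / len(ps)
        for lab, ps in clusters.items()
    }
    worst_ratios = []
    for i in clusters:
        # Similarity of cluster i to its most similar other cluster
        ratios = [
            (scatter[i] + scatter[j]) / abs(centroid[i] - centroid[j])
            for j in clusters if j != i
        ]
        worst_ratios.append(max(ratios))
    return sum(worst_ratios) / len(worst_ratios)


tight = davies_bouldin_1d([1.0, 2.0, 10.0, 11.0], [0, 0, 1, 1])
loose = davies_bouldin_1d([1.0, 5.0, 6.0, 11.0], [0, 0, 1, 1])
print(round(tight, 3))  # 0.111
print(round(loose, 3))  # 0.818
```

The compact, well-separated grouping gets the lower (better) score.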

---

### c. **Adjusted Rand Index (ARI)**
- **Definition**: Compares predicted clusters against ground truth.
- **Advantages**:
  - **Corrected for chance**: random cluster assignments score close to 0, while identical partitions score 1.
- **Disadvantages**:
  - Requires **true labels**.
- **Best Use Case**: When you have **ground truth labels** for clustering.
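
A sketch of ARI using the standard pair-counting formula over the contingency table of true and predicted labels. The function name and label lists are illustrative assumptions. Note that the score ignores how clusters are numbered.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index computed from pair counts."""
    n = len(labels_true)
    if n != len(labels_pred) or n < 2:
        raise ValueError("need two label lists of equal length >= 2")
    # Pairs that are together in both partitions, in the true one, in the predicted one
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / comb(n, 2)  # agreement expected by chance
    maximum = (sum_true + sum_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both partitions are a single cluster
    return (sum_cells - expected) / (maximum - expected)


truth = [0, 0, 1, 1]
same_up_to_renaming = adjusted_rand_index(truth, [1, 1, 0, 0])
split_wrongly = adjusted_rand_index(truth, [0, 1, 0, 1])
print(same_up_to_renaming)  # 1.0
print(split_wrongly)        # -0.5
```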

---

## 🔹 Summary Table

| Problem Type     | Metric               | Formula                                          | Advantages                                    | Disadvantages                              | Best Use Case                             |
|------------------|----------------------|--------------------------------------------------|-----------------------------------------------|--------------------------------------------|--------------------------------------------|
| Classification   | Accuracy             | $ \frac{TP + TN}{TP + TN + FP + FN} $            | Easy to interpret                             | Misleading on imbalanced data              | Balanced datasets                         |
|                  | Precision            | $ \frac{TP}{TP + FP} $                           | Focus on correct positives                    | Ignores false negatives                    | Minimize false positives (e.g., spam)     |
|                  | Recall               | $ \frac{TP}{TP + FN} $                           | Captures all actual positives                 | Ignores false positives                    | Minimize false negatives (e.g., medical)  |
|                  | F1-Score             | $ 2 \times \frac{Precision \times Recall}{Precision + Recall} $ | Balances precision & recall | Hard to interpret intuitively              | Imbalanced datasets                       |
|                  | ROC-AUC              | -                                                | Evaluates without threshold                  | Misleading on imbalanced data              | Probabilistic classifiers                 |
| Regression       | MAE                  | $ \frac{1}{n} \sum_{i=1}^n |y_i - \hat{y}_i| $ | Simple and interpretable                      | Doesn’t penalize large errors              | Simple error analysis                     |
|                  | MSE                  | $ \frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2 $ | Penalizes large errors                        | Less interpretable                         | Model optimization                        |
|                  | RMSE                 | $ \sqrt{MSE} $                                   | Same unit as target, penalizes large errors   | Sensitive to outliers                      | General-purpose regression metric         |
|                  | R² Score             | $ 1 - \frac{SS_{res}}{SS_{tot}} $                | Measures explained variance                   | Can overstate fit as predictors are added  | Fit evaluation                            |
|                  | Adjusted R² Score    | $ 1 - \left( \frac{(1 - R^2)(n - 1)}{n - k - 1} \right) $ | Penalizes irrelevant features                 | Slightly complex                           | Feature selection and model comparison    |
| Clustering       | Silhouette Score     | $ \in [-1, 1] $                                  | No labels required, easy to interpret         | Slow for large datasets                    | Evaluating cohesion/separation            |
|                  | Davies-Bouldin Index | -                                                | Simple, efficient                             | May fail on irregular clusters             | Model selection in clustering             |
|                  | Adjusted Rand Index  | -                                                | Accounts for random labeling                  | Needs true labels                          | Comparing to true labels in clustering    |

---

# 📉 Performance Metrics in Regression Models

Regression metrics are used when the model’s **output is continuous** (e.g., house price, temperature, stock value). These metrics help measure **how far the model's predictions are from actual values**.

---

### 🔹 Common Use-Cases of Regression:
- Predicting house prices
- Forecasting sales or demand
- Estimating crop yields
- Predicting blood sugar levels

---

### 📊 Metric Comparison Table

| Metric               | Best Use Case                              | Range            | Formula Complexity |
|----------------------|--------------------------------------------|------------------|---------------------|
| MAE                 | Average error size                          | 0 to ∞           | Low                 |
| MSE                 | Penalises large errors                      | 0 to ∞           | Low                 |
| RMSE                | Same unit as target, penalises large errors | 0 to ∞           | Medium              |
| R² (R-squared)      | Proportion of variance explained            | -∞ to 1          | Medium              |
| Adjusted R²         | Corrects R² for multiple features           | -∞ to 1          | Medium              |
| MAPE                | Error relative to actual value (%)          | 0 to ∞           | Medium              |
| MSLE                | Relative error; penalises under-prediction more | 0 to ∞       | Medium              |

---

## 🔹 Regression Metrics Summary Table

| Metric               | Formula                                                                                   | Advantages                                          | Disadvantages                                         | Best Use Case                                               |
|----------------------|-------------------------------------------------------------------------------------------|-----------------------------------------------------|-------------------------------------------------------|-------------------------------------------------------------|
| **MAE**              | $ \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert $                              | Easy to interpret, equal weight to all errors        | Doesn’t penalise large errors                          | General regression tasks                                     |
| **MSE**              | $ \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $                                         | Emphasises large errors                             | Sensitive to outliers                                 | When large errors are more problematic                      |
| **RMSE**             | $ \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} $                                 | Same units as actual values                         | Same sensitivity to outliers as MSE                   | House price, medical dosage prediction                      |
| **R² Score**         | $ 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2} $                           | Explains variance, intuitive                        | Can be misleading with few data points                | Model explanation and feature evaluation                    |
| **Adjusted R²**      | $ 1 - (1 - R^2)\frac{n - 1}{n - p - 1} $                                                  | Adjusts R² for number of predictors                 | Harder to interpret than R²; can be negative          | Multiple linear regression                                  |
| **MAPE**             | $ \frac{100\%}{n} \sum_{i=1}^{n} \left\lvert \frac{y_i - \hat{y}_i}{y_i} \right\rvert $   | Easy to interpret as percentage                     | Undefined when $y_i = 0$                              | Business forecasts, demand prediction                       |
| **MSLE**             | $ \frac{1}{n} \sum_{i=1}^{n} \left( \log(1 + y_i) - \log(1 + \hat{y}_i) \right)^2 $       | Measures relative error, so large targets do not dominate | Can't handle negative values                     | Growth rate prediction, exponential trends                  |

---

### a. **MAE (Mean Absolute Error)**

- **Definition**:  
  Measures the average magnitude of errors in a set of predictions, without considering direction (positive/negative).

- **Formula**:  
  $$
  \text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|
  $$

- **Pros**:
  - Intuitive and simple.
  - Equal weight to all errors.

- **Cons**:
  - Doesn’t highlight large errors.

- **Use Case**:  
  Basic regression tasks like estimating sales where equal weight to all mistakes is acceptable.

---

### b. **MSE (Mean Squared Error)**

- **Definition**:  
  Average of the squared prediction errors. Squaring gives larger errors more weight.

- **Formula**:  
  $$
  \text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
  $$

- **Pros**:
  - Penalizes large errors → useful when big mistakes are costly.

- **Cons**:
  - Not in the same unit as the target.
  - Sensitive to outliers.

- **Use Case**:  
  Model tuning, where penalising large errors (like in energy consumption) is key.

---

### c. **RMSE (Root Mean Squared Error)**

- **Definition**:  
  Square root of MSE — brings the error to the same unit as the original output.

- **Formula**:  
  $$
  \text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
  $$

- **Pros**:
  - Interpretable in same units as the target.
  - Highlights large errors.

- **Cons**:
  - Still sensitive to outliers.

- **Use Case**:  
  Projects involving physical quantities (e.g., predicting house area in square feet).

---

### d. **R² (Coefficient of Determination)**

- **Definition**:  
  Proportion of the variance in the dependent variable that is predictable from the independent variable(s).

- **Formula**:  
  $$
  R^2 = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2}
  $$

- **Pros**:
  - Indicates how well the model fits.
  - Common and interpretable.

- **Cons**:
  - Can be misleading with few observations.
  - Doesn’t indicate error size.

- **Use Case**:  
  When explaining variance is crucial (e.g., effect of education on income).

---

### e. **Adjusted R²**

- **Definition**:  
  Adjusts R² for the number of predictors $p$ given $n$ observations. It discourages overfitting by penalising features that add little explanatory power.

- **Formula**:  
  $$
  \text{Adjusted } R^2 = 1 - (1 - R^2) \cdot \frac{n - 1}{n - p - 1}
  $$

- **Pros**:
  - Reliable for models with many features.

- **Cons**:
  - Complex to compute and interpret.

- **Use Case**:  
  Multivariate regression models (e.g., predicting house price from multiple features).

---

### f. **MAPE (Mean Absolute Percentage Error)**

- **Definition**:  
  Measures average absolute percent error between actual and predicted values.

- **Formula**:  
  $$
  \text{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|
  $$

- **Pros**:
  - Expresses error as a percentage → easy to interpret.

- **Cons**:
  - Undefined when actual value is 0.
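  - Asymmetric: for a positive target and non-negative predictions, an under-prediction can cost at most 100% while an over-prediction is unbounded, so MAPE favours models that under-forecast.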

- **Use Case**:  
  Forecasting revenue or sales in business analytics.
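
A minimal sketch of the MAPE formula above, including a guard for the zero-actual case listed under Cons. The function name and sales figures are made up for illustration.

```python
def mean_absolute_percentage_error(y_true, y_pred):
    """Mean absolute percentage error, returned in percent."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(y == 0 for y in y_true):
        raise ValueError("MAPE is undefined when an actual value is 0")
    total = sum(abs((y - p) / y) for y, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


# Hypothetical weekly sales: every forecast is off by 10% of the actual value
actual = [100.0, 200.0, 50.0]
forecast = [110.0, 180.0, 45.0]

mape = mean_absolute_percentage_error(actual, forecast)
print(round(mape, 2))  # 10.0
```

The absolute errors here are 10, 20 and 5 units, yet each counts the same because MAPE scales every error by its actual value.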

---

### g. **MSLE (Mean Squared Logarithmic Error)**

- **Definition**:  
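  Average squared difference between the logarithms of the actual and predicted values, so it reflects relative rather than absolute error. It penalizes under-predictions more than over-predictions of the same absolute size. Suitable when the target values vary over several orders of magnitude.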

- **Formula**:  
  $$
  \text{MSLE} = \frac{1}{n} \sum_{i=1}^{n} \left( \log(1 + y_i) - \log(1 + \hat{y}_i) \right)^2
  $$

- **Pros**:
  - Better for comparing relative error.
  - Less sensitive to large differences in higher ranges.

- **Cons**:
  - Cannot be used with negative values.

- **Use Case**:  
  Growth modeling, e.g., predicting user adoption over time.
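
The sketch below evaluates the MSLE formula above for one under-prediction and one over-prediction of the same absolute size. With this formula the under-prediction produces the larger error. The function name and values are illustrative assumptions.

```python
import math


def mean_squared_log_error(y_true, y_pred):
    """Mean of squared differences between log(1 + actual) and log(1 + predicted)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(v < 0 for v in list(y_true) + list(y_pred)):
        raise ValueError("this sketch only accepts non-negative values")
    total = sum((math.log1p(y) - math.log1p(p)) ** 2 for y, p in zip(y_true, y_pred))
    return total / len(y_true)


# Actual value 100, predictions off by 50 in either direction
under = mean_squared_log_error([100.0], [50.0])
over = mean_squared_log_error([100.0], [150.0])

print(round(under, 3))  # 0.467
print(round(over, 3))   # 0.162
```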

---


## **🔸 Types of Regression**

### 1. **Linear Regression**

Linear regression is used to model the relationship between a dependent variable and one or more independent variables by fitting a linear equation to observed data.

- **Formula**: $ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + ... + \beta_n X_n $
  
- **Best Use Case**: When the dependent variable has a linear relationship with the independent variables.

- **Advantages**:
  - Simple to implement.
  - Easy to interpret coefficients.
  - Works well for small to medium datasets.

- **Disadvantages**:
  - Assumes a linear relationship.
  - Sensitive to outliers.

---

### 2. **Multiple Linear Regression**

Multiple linear regression extends linear regression to predict a dependent variable based on two or more independent variables.

- **Formula**: $ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + ... + \beta_n X_n $

- **Best Use Case**: Predicting a single dependent variable when there are multiple predictors.

- **Advantages**:
  - Can handle multiple predictors.
  - More accurate than simple linear regression with multiple influencing factors.

- **Disadvantages**:
  - Prone to overfitting.
  - Needs large datasets to work well.

---

### 3. **Polynomial Regression**

Polynomial regression is a form of regression analysis where the relationship between the independent and dependent variables is modeled as an nth degree polynomial.

- **Formula**: $ Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3 + ... + \beta_n X^n $

- **Best Use Case**: When data shows a non-linear relationship that can be approximated by a polynomial.

- **Advantages**:
  - Can model more complex relationships than linear regression.
  - Flexible for curved data patterns.

- **Disadvantages**:
  - High degree polynomials can lead to overfitting.
  - Complex and difficult to interpret.

---

### 4. **Ridge Regression (L2 Regularization)**

Ridge regression adds a penalty term (L2 regularization) to the least squares objective function to prevent overfitting by shrinking the coefficients.

- **Formula**: $ \hat{\beta} = \arg\min_{\beta} \left( \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{m} \beta_j^2 \right) $

- **Best Use Case**: When dealing with multicollinearity or high-dimensional data.

- **Advantages**:
  - Helps reduce model complexity.
  - Works well when predictors are highly correlated.

- **Disadvantages**:
  - May underfit if lambda (penalty term) is too large.

---

### 5. **Lasso Regression (L1 Regularization)**

Lasso regression is similar to ridge regression but uses L1 regularization, which can force some coefficient estimates to zero, effectively performing feature selection.

- **Formula**: $ \hat{\beta} = \arg\min_{\beta} \left( \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{m} |\beta_j| \right) $

- **Best Use Case**: When there is a need for feature selection or dealing with sparse data.

- **Advantages**:
  - Helps with feature selection.
  - Can handle high-dimensional datasets.

- **Disadvantages**:
  - Can underfit when lambda is too large, because too many coefficients are driven to zero.
  - Difficult to choose the right value for lambda.

---

### 6. **Elastic Net Regression**

Elastic Net is a combination of Lasso and Ridge regression, which balances L1 and L2 regularization. It is particularly useful when there are correlations between features.

- **Formula**: $ \hat{\beta} = \arg\min_{\beta} \left( \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \left( \alpha \sum_{j=1}^{m} |\beta_j| + \frac{1 - \alpha}{2} \sum_{j=1}^{m} \beta_j^2 \right) \right) $

- **Best Use Case**: When dealing with highly correlated features or a mix of regularization needs.

- **Advantages**:
  - Combines the benefits of both Ridge and Lasso.
  - Flexible and robust for many types of regression problems.

- **Disadvantages**:
  - Needs careful tuning of both regularization parameters ($\lambda$ for overall strength, $\alpha$ for the L1/L2 mix).

---

## **📊 Regression Metrics Summary Table**

| Metric                | Formula                                                                                             | Advantages                                                | Disadvantages                                               | Best Use Case                                               |
|-----------------------|-----------------------------------------------------------------------------------------------------|-----------------------------------------------------------|-------------------------------------------------------------|-------------------------------------------------------------|
| **Mean Absolute Error (MAE)** | $ \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert $                               | Simple, easy to interpret                                  | Does not emphasize large errors                              | General regression tasks                                    |
| **Mean Squared Error (MSE)** | $ \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y_i})^2 $                                                | Penalizes larger errors more than MAE                     | Sensitive to outliers                                        | Tasks with a focus on minimizing large errors                |
| **Root Mean Squared Error (RMSE)** | $ \sqrt{ \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y_i})^2 } $                                   | Easy to interpret (in same units as target)               | Sensitive to outliers                                        | Tasks where large errors should be penalized more            |
| **R-Squared (R²)**    | $ 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y_i})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} $                | Measures goodness of fit, easy to interpret               | Doesn’t work well with non-linear models                    | Evaluating how well a model fits the data                    |
| **Adjusted R-Squared**| $ 1 - (1 - R^2) \cdot \frac{n-1}{n-p-1} $                                                           | Adjusts for the number of predictors, handles overfitting | Still sensitive to outliers                                  | Linear regression with multiple predictors                   |
| **Mean Absolute Percentage Error (MAPE)** | $ \frac{100\%}{n} \sum_{i=1}^{n} \left\lvert \frac{y_i - \hat{y}_i}{y_i} \right\rvert $ | Interpretable percentage errors                           | Undefined when an actual value is zero, misleading when near zero | Forecasting, time series                                   |
| **Huber Loss**        | $ \frac{1}{n} \sum_{i=1}^{n} L_\delta(r_i) $ with $ r_i = y_i - \hat{y}_i $, where $ L_\delta(r) = \tfrac{1}{2} r^2 $ if $ \lvert r \rvert \le \delta $, else $ \delta \left( \lvert r \rvert - \tfrac{1}{2}\delta \right) $ | Robust to outliers, combines MAE and MSE                   | Harder to interpret, requires choosing $\delta$              | Regression tasks with outliers                               |

---
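
Huber loss appears only in the table above, so here is a small sketch of its standard piecewise definition: quadratic for residuals up to a threshold $\delta$, linear beyond it. The function name, the default $\delta = 1$ and the data are illustrative assumptions.

```python
def huber_loss(y_true, y_pred, delta=1.0):
    """Mean Huber loss: squared for small residuals, linear for large ones."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(y_true, y_pred):
        r = abs(y - p)
        if r <= delta:
            total += 0.5 * r * r              # MSE-like region
        else:
            total += delta * (r - 0.5 * delta)  # MAE-like region
    return total / len(y_true)


actual = [10.0, 20.0, 30.0, 40.0]
predicted = [10.5, 19.5, 30.0, 50.0]  # last prediction is an outlier miss of 10

loss = huber_loss(actual, predicted, delta=1.0)
mse = sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual)
print(loss)  # 2.4375
print(mse)   # 25.125
```

The outlier contributes 9.5 to the Huber total but 100 to the squared-error total, which is the robustness the table refers to.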

## **📊 Regression Metrics Comparison Table**

| Metric                | Best Use Case                             | Range       | Formula Complexity |
|-----------------------|-------------------------------------------|-------------|---------------------|
| **Mean Absolute Error (MAE)** | Typical error size, all errors weighted equally | 0 to ∞      | Low                 |
| **Mean Squared Error (MSE)** | Minimizing large errors                 | 0 to ∞      | Low                 |
| **Root Mean Squared Error (RMSE)** | Tasks where larger errors should be penalized | 0 to ∞      | Low                 |
| **R-Squared (R²)**    | Evaluating goodness of fit                | -∞ to 1     | Low                 |
| **Adjusted R-Squared**| Model evaluation with multiple predictors | -∞ to 1     | Medium              |
| **Mean Absolute Percentage Error (MAPE)** | Time series, forecasting | 0 to ∞      | Medium              |
| **Huber Loss**        | Regression with outliers                  | 0 to ∞      | Medium              |

---
