## Regression Metrics:

Regression metrics are statistical measures used to evaluate the performance of a regression model. They help quantify how well a model predicts continuous outcomes by comparing the predicted values with the actual target values. Each metric captures different aspects of model performance.

### 1. **Mean Absolute Error (MAE)**  
**Formula**:  
$$
MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|
$$  
**Where**:  
- $ y_i $: Actual value for the $ i $-th observation  
- $ \hat{y}_i $: Predicted value for the $ i $-th observation  
- $ n $: Total number of observations  

**Explanation**:  
- MAE calculates the average absolute difference between predicted and actual values.  
- It measures the average magnitude of errors in predictions, irrespective of direction (positive or negative).  
- **Units**: The same as the target variable.  

**Characteristics**:  
- MAE is **more robust to outliers** than squared-error metrics such as MSE because it does not square the errors.  
- It provides an intuitive sense of average error magnitude.

**Example**:  
If your target variable represents house prices in thousands, and MAE is 5, it means your model's predictions are, on average, off by $5,000.
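
The formula can be checked by hand with a minimal sketch. The function name and the house prices below are made up for illustration, with prices in thousands of dollars:

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(y - y_hat) for y, y_hat in zip(y_true, y_pred)) / len(y_true)

# Hypothetical house prices, in thousands of dollars.
actual = [200, 250, 310, 180]
predicted = [205, 245, 300, 180]

# Absolute errors are 5, 5, 10 and 0, so the mean is 20 / 4.
mae = mean_absolute_error(actual, predicted)
print(mae)  # 5.0
```

Over-predictions and under-predictions contribute equally here, which is what "irrespective of direction" means in practice.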



### 2. **Mean Squared Error (MSE)**  
**Formula**:  
$$
MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
$$  
**Explanation**:  
- MSE computes the average squared difference between predicted and actual values.  
- Squaring the errors penalizes larger errors more heavily than smaller ones.  

**Units**:  
- The square of the target variable’s unit (e.g., if the target is in thousands of dollars, MSE is in $ \text{thousands}^2 $).

**Characteristics**:  
- **Sensitive to outliers**: Larger errors contribute disproportionately due to squaring.  
- Useful for situations where larger errors are more significant than smaller ones.  

**Example**:  
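An MSE of 25 (in thousands of dollars squared) corresponds to an RMSE of 5, that is, a typical error of about $5,000. The squared units make the MSE value itself hard to interpret directly, so it is mostly used to compare models on the same data.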
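
To see how squaring changes the picture, the sketch below compares two hypothetical sets of predictions that have the same MAE (5) but very different MSE. The function is written by hand for illustration and the numbers are invented:

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between actual and predicted values."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / len(y_true)

# Hypothetical house prices, in thousands of dollars.
actual = [200, 250, 310, 180]
evenly_off = [205, 245, 305, 175]    # every prediction is off by 5
one_big_miss = [200, 250, 310, 160]  # three exact predictions, one off by 20

mse_even = mean_squared_error(actual, evenly_off)   # (25 + 25 + 25 + 25) / 4
mse_big = mean_squared_error(actual, one_big_miss)  # (0 + 0 + 0 + 400) / 4
print(mse_even, mse_big)  # 25.0 100.0
```

Both prediction sets miss by 20 in total, but concentrating the error in one observation quadruples the MSE.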



### 3. **Root Mean Squared Error (RMSE)**  
**Formula**:  
$$
RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
$$  
**Explanation**:  
- RMSE is simply the square root of MSE.  
- It gives the error in the same unit as the target variable, making it more interpretable than MSE.  

**Units**:  
- Same as the target variable.  

**Characteristics**:  
- RMSE is **sensitive to outliers**, like MSE, due to squaring before averaging.  
- Offers a balance between penalizing larger errors and interpretability.  

**Example**:  
If house prices have an RMSE of 5 (in thousands), a typical prediction error is on the order of $5,000, with large misses weighted more heavily than they would be in MAE.
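
As a rough illustration, the sketch below computes RMSE next to MAE on invented data (prices in thousands of dollars). Both helper functions are written by hand for this example:

```python
import math

def root_mean_squared_error(y_true, y_pred):
    """Square root of the mean squared error, in the target's own units."""
    mse = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)

def mean_absolute_error(y_true, y_pred):
    """Average absolute error, shown for comparison."""
    return sum(abs(y - y_hat) for y, y_hat in zip(y_true, y_pred)) / len(y_true)

# Hypothetical data: three exact predictions and one miss of 10.
actual = [120, 340, 275, 410]
predicted = [120, 340, 275, 400]

rmse = root_mean_squared_error(actual, predicted)  # sqrt(100 / 4)
mae = mean_absolute_error(actual, predicted)       # 10 / 4
print(rmse, mae)  # 5.0 2.5
```

RMSE is never smaller than MAE on the same predictions, and the gap widens as the errors become more uneven.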

### Comparison Between MAE, MSE, and RMSE:
| **Metric** | **Key Feature** | **Sensitivity to Outliers** | **Units** |
|------------|-----------------|----------------------------|-----------|
| MAE        | Average absolute error | Less sensitive           | Same as target |
| MSE        | Average squared error  | Highly sensitive         | Square of target units |
| RMSE       | Square root of MSE     | Highly sensitive         | Same as target |



### When to Use Which Metric:
1. **MAE**: Use when you want the average magnitude of errors, with each error weighted in proportion to its size and regardless of direction. It’s best when occasional large errors are not critical.  
2. **MSE**: Use when you want to emphasize larger errors in your model evaluation. Suitable for datasets where big errors are significantly worse.  
3. **RMSE**: A compromise between interpretability (units) and penalizing larger errors. Widely used in practice for general regression problems.

---

## $ R^2 $ Score and Adjusted $ R^2 $ Score:

The $ R^2 $ score and Adjusted $ R^2 $ score are metrics used to evaluate the performance of a regression model. They measure how well the independent variables (features) explain the variability of the dependent variable (target). These metrics are particularly useful for understanding the fit of a model in relation to the data.



### **1. $ R^2 $ Score (Coefficient of Determination)**  

#### **Formula**:  
$$
R^2 = 1 - \frac{SS_{res}}{SS_{tot}}
$$  

Where:  
- $ SS_{res} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $: Residual sum of squares (unexplained variation).  
- $ SS_{tot} = \sum_{i=1}^{n} (y_i - \bar{y})^2 $: Total sum of squares (total variation in the target).  
- $ y_i $: Actual target value for the $ i $-th observation.  
- $ \hat{y}_i $: Predicted target value for the $ i $-th observation.  
- $ \bar{y} $: Mean of the actual target values.  
- $ n $: Number of observations.



#### **Interpretation**:  
- $ R^2 $ represents the proportion of the variance in the target variable ($ y $) that is explained by the features ($ X $).  
- $ R^2 $ is at most $ 1 $. For ordinary least squares with an intercept, evaluated on the training data, it lies between $ 0 $ and $ 1 $; on held-out data, or for other kinds of models, it can be negative:  
  - $ R^2 = 0 $: The model explains none of the variability; it is as good as guessing the mean ($ \bar{y} $).  
  - $ R^2 = 1 $: The model explains all the variability perfectly.  
  - $ R^2 < 0 $: The model performs worse than always predicting the mean.
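
The three cases above can be reproduced directly from the formula. This is a minimal hand-written sketch with toy values chosen for illustration:

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when the target is constant")
    return 1 - ss_res / ss_tot

actual = [1.0, 2.0, 3.0, 4.0, 5.0]  # toy target values, mean is 3.0

r2_perfect = r2_score(actual, [1.0, 2.0, 3.0, 4.0, 5.0])   # SS_res = 0
r2_mean = r2_score(actual, [3.0, 3.0, 3.0, 3.0, 3.0])      # SS_res = SS_tot = 10
r2_reversed = r2_score(actual, [5.0, 4.0, 3.0, 2.0, 1.0])  # SS_res = 40
print(r2_perfect, r2_mean, r2_reversed)  # 1.0 0.0 -3.0
```

The last case shows that nothing in the formula stops $ R^2 $ from going below zero: it only requires predictions whose squared error exceeds that of the mean.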



#### **Advantages of $ R^2 $**:  
- Provides a quick way to assess how well the regression model fits the data.  
- Easy to interpret: A higher $ R^2 $ means the model explains more of the variance in the data it was evaluated on.

#### **Disadvantages**:  
- For least-squares models evaluated on the training data, $ R^2 $ never decreases when features are added, even if those features have no real predictive power.  
- It does not account for the number of features used, so it can reward overfitted models.
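
The first point can be illustrated with the smallest possible case: going from no features (always predicting the mean) to one feature made of pure random noise. The sketch below is an assumption-laden toy, using synthetic data, a fixed seed, and a hand-written one-feature least-squares fit as a stand-in for the general multi-feature case:

```python
import random

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot

def fit_simple_ols(x, y):
    """Least-squares intercept and slope for the model y ~ a + b * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

random.seed(0)  # fixed seed so the run is repeatable
n = 30
y = [random.gauss(0, 1) for _ in range(n)]      # synthetic target
noise = [random.gauss(0, 1) for _ in range(n)]  # feature unrelated to y

# Model with no features: always predict the mean of y.
mean_y = sum(y) / n
r2_no_features = r2_score(y, [mean_y] * n)

# Model with one pure-noise feature, fitted by least squares on the same data.
intercept, slope = fit_simple_ols(noise, y)
r2_noise_feature = r2_score(y, [intercept + slope * v for v in noise])

print(r2_no_features, r2_noise_feature)
```

The second value is not below the first, even though the feature carries no information about the target. The fit simply exploits whatever chance correlation exists in the sample.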



### **2. Adjusted $ R^2 $ Score**  

To address the issue of $ R^2 $'s sensitivity to the number of features, the Adjusted $ R^2 $ score was introduced.

#### **Formula**:  
$$
\text{Adjusted } R^2 = 1 - \left( \frac{(1 - R^2)(n - 1)}{n - p - 1} \right)
$$

Where:  
- $ n $: Total number of observations (data points).  
- $ p $: Number of independent variables (features) in the model.  
- $ R^2 $: Standard $ R^2 $ score.
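
A direct translation of the formula is sketched below. The function name and the sample values ($ R^2 = 0.85 $, $ n = 100 $) are chosen only for illustration:

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p features."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than features plus one")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Same R^2 and sample size, increasing number of features.
adj_p0 = adjusted_r2(0.85, n=100, p=0)    # no penalty when p = 0
adj_p10 = adjusted_r2(0.85, n=100, p=10)
adj_p20 = adjusted_r2(0.85, n=100, p=20)

print(round(adj_p0, 3), round(adj_p10, 3), round(adj_p20, 3))  # 0.85 0.833 0.812
```

For a fixed $ R^2 $, the adjusted score shrinks as $ p $ grows, and the penalty becomes harsher as $ p $ approaches $ n $.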



#### **Interpretation**:  
- Adjusted $ R^2 $ adjusts the $ R^2 $ score based on the number of features.  
- It penalizes adding features that do not improve the model and rewards features that contribute significantly to the model's performance.  
- Unlike $ R^2 $, Adjusted $ R^2 $ can decrease if unnecessary features are added to the model.

#### **Key Properties**:  
- When $ p = 0 $ (an intercept-only model that always predicts $ \bar{y} $), Adjusted $ R^2 = R^2 = 0 $.  
- If a new feature reduces the residual error by enough to offset the lost degree of freedom, Adjusted $ R^2 $ increases.  
- If the reduction is too small, or there is none, Adjusted $ R^2 $ decreases.
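
As a sketch of how large an improvement is needed, write $ R^2_p $ and $ R^2_{p+1} $ (notation introduced here for convenience) for the scores before and after adding one feature, and assume $ n > p + 2 $. Substituting both into the Adjusted $ R^2 $ formula, the adjusted score rises exactly when:

$$
\frac{(1 - R^2_{p+1})(n - 1)}{n - p - 2} < \frac{(1 - R^2_p)(n - 1)}{n - p - 1}
\quad\Longleftrightarrow\quad
R^2_{p+1} - R^2_p > \frac{1 - R^2_p}{n - p - 1}
$$

For instance, with the illustrative values $ n = 100 $, $ p = 10 $ and $ R^2_p = 0.85 $, the gain in $ R^2 $ must exceed $ 0.15 / 89 \approx 0.0017 $ for Adjusted $ R^2 $ to go up.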

### **Comparison Between $ R^2 $ and Adjusted $ R^2 $:**

| Metric          | **Sensitivity to Feature Count** | **Interpretation**                | **Use Case**                          |
|------------------|----------------------------------|------------------------------------|----------------------------------------|
| $ R^2 $        | Never decreases as features are added | Total variance explained           | Quick assessment of model fit.         |
| Adjusted $ R^2 $ | Penalizes unnecessary features | Variance explained with penalty    | For comparing models with different features. |


### **Example for $ R^2 $ and Adjusted $ R^2 $:**

1. **Scenario**: Predicting house prices with 10 features on, say, 100 observations:  
   - $ R^2 = 0.850 $ (85% of the variance in prices is explained by the model).  
   - Adding a feature that is unrelated to house prices still raises $ R^2 $ slightly, to 0.851.  

2. **Adjusted $ R^2 $**:  
   - Because the gain is smaller than the penalty for the extra feature, Adjusted $ R^2 $ decreases (from about 0.833 to about 0.832), signaling that the feature adds complexity without adding real explanatory power.



### **When to Use Which?**

1. Use **$ R^2 $**:  
   - For a single model with a fixed set of features.  
   - When the focus is solely on variance explanation.  

2. Use **Adjusted $ R^2 $**:  
   - To compare models with different numbers of features.  
   - When you want to ensure that only meaningful features are included.  

---