## Regression Metrics

* MAE 
* MSE
* RMSE
* R2 Score
* Adjusted R2 Score

## 1. MAE : Mean Absolute Error

MAE is a simple metric: it is the average of the absolute differences between the actual and predicted values.

A smaller MAE indicates a better fit of the model to the data. An MAE of 0 means that the model makes perfect predictions (which is practically unlikely unless you’re overfitting your model).


![Modulus Graph](../images/MAE_formula.png)

The mean absolute error is the average of the absolute differences between the actual and predicted values in the dataset. It measures the average magnitude of the residuals, ignoring their sign.
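
As a minimal sketch of the definition above, MAE can be computed with plain Python. The function name `mean_absolute_error` and the sample values are illustrative assumptions, not part of the original notes.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all samples."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical values chosen only for illustration.
actual = [3.0, 5.0, 2.5, 7.0]
predicted = [2.5, 5.0, 4.0, 8.0]

# Absolute errors are 0.5, 0.0, 1.5 and 1.0, so the mean is 3.0 / 4.
mae = mean_absolute_error(actual, predicted)
print(mae)  # 0.75
```

The result is in the same units as the target, which is why MAE reads naturally as "the typical size of an error".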

* Pros of MAE:

    - Easy to Understand and Calculate: MAE is simple to understand and calculate. It provides a straightforward way to represent average error.

    - Less Sensitive to Outliers: Since MAE doesn’t square the residuals, it is less sensitive to outliers compared to MSE and RMSE. This makes it a better choice when a few extreme values should not dominate the evaluation.

* Cons of MAE:

    - No Emphasis on Large Errors: While being less sensitive to outliers can be an advantage, it can also be a disadvantage when large errors are particularly undesirable.
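    - Not Differentiable at Zero: The absolute value function has no derivative at the point where the residual is exactly zero, whereas the squared error used by MSE is differentiable everywhere. This makes MAE less convenient as a loss function for algorithms that rely on gradient-based optimization.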


## 2. MSE : Mean Squared Error

MSE is a measure of prediction error. Specifically, it quantifies the average squared difference between the actual and predicted values.

Mean Squared Error is the average of the squared differences between the original and predicted values in the dataset. When the residuals average to zero, it equals the variance of the residuals.


<img src="../images/MSE_formula.jpg" alt="MSE Formula" style="width: 500px; height: 200px;">

* Pros of MSE:
    - Emphasizes larger errors: By squaring the residuals, MSE places heavier weight on larger errors. This can be beneficial when larger errors are particularly undesirable.
    - Differentiability: The squared error is differentiable everywhere, which makes MSE more tractable for optimization in machine learning algorithms.

* Cons of MSE:
    - Sensitive to outliers: Because MSE squares the residuals, it can be highly sensitive to outliers. A single outlier can potentially have a large effect on the MSE.
    - Scale-dependent: MSE is scale-dependent, which means you cannot compare the MSEs of different variables that are on different scales.
    - Not directly interpretable: The units of MSE are the square of the units of the target variable (for example, dollars squared for a price). This makes it harder to interpret in a business setting.
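
The outlier sensitivity described above can be illustrated with a small sketch that compares MAE and MSE on the same data, with and without one large error. The helper names `mae` and `mse` and all values are assumptions made up for this example.

```python
def mae(actual, predicted):
    """Mean of the absolute residuals."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean of the squared residuals."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


actual = [10.0, 12.0, 14.0, 16.0, 18.0]
clean = [11.0, 11.0, 15.0, 15.0, 19.0]         # every prediction is off by 1
with_outlier = [11.0, 11.0, 15.0, 15.0, 28.0]  # last prediction is off by 10

mae_clean, mse_clean = mae(actual, clean), mse(actual, clean)
mae_out, mse_out = mae(actual, with_outlier), mse(actual, with_outlier)

print(mae_clean, mse_clean)  # 1.0 1.0
print(mae_out, mse_out)      # MAE rises to about 2.8, MSE to about 20.8
```

One bad prediction roughly triples the MAE here but multiplies the MSE by about twenty, because the error of 10 contributes 100 to the squared sum.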


## 3. RMSE : Root Mean Squared Error

RMSE is a good measure to use when you care more about penalizing large errors. By squaring the errors before averaging them, RMSE gives higher weight to large errors. This means that RMSE is most useful in contexts where large errors are particularly undesirable.

Like MSE, RMSE is commonly used in regression analysis and forecasting where the aim is often to minimize large errors. It’s also a handy metric to use when you want to explain the performance of a model in a more interpretable way, since its units are the same as the target variable.


![RMSE Formula](../images/RMSE_formula.png)


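* Advantages:
    - The squared error inside the root is differentiable, so RMSE works well with gradient-based optimization.
    - It keeps the units of the target variable instead of changing them to (unit)^2.

* Disadvantage:
    - It is not robust to outliers.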
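
A minimal sketch of RMSE, assuming hypothetical prices in dollars, shows how taking the square root brings the error back to the units of the target. The function name `root_mean_squared_error` is an illustrative choice.

```python
import math


def root_mean_squared_error(actual, predicted):
    """Square root of the mean squared residual."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)


# Hypothetical prices in dollars; only the last prediction is wrong, by 40.
actual = [200.0, 250.0, 300.0, 350.0]
predicted = [200.0, 250.0, 300.0, 390.0]

# MSE is 1600 / 4 = 400 (dollars squared); RMSE is 20 (dollars).
rmse = root_mean_squared_error(actual, predicted)
print(rmse)  # 20.0
```

The MAE for the same data is 10, so RMSE is twice as large. The gap comes from the single large error, which RMSE weights more heavily.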



## 4. R2 Score 

The R2 score (R-Squared, also called the coefficient of determination) is a metric for evaluating the goodness of fit of a regression model.

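For a least-squares model with an intercept, evaluated on the data it was fitted to, the value of R² lies between 0 and 1. A value of 1 means the model perfectly predicts the dependent variable using the independent variable(s). A value of 0 means the model does no better than always predicting the mean of the dependent variable. On new data, or for a model that fits worse than the mean, R² can be negative.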

![R2 Score Formula](../images/R2_Score_Formula.png)

Where:

* SSR (Sum of Squared Residuals) is the sum of the squares of the residuals. The residuals are the difference between the actual values of the dependent variable and the predicted values from the regression model.

* SST (Total Sum of Squares) is the sum of the squared differences between the actual values of the dependent variable and the mean value of the dependent variable.

An R-Squared of 100% (or 1) indicates that all changes in the dependent variable are completely explained by changes in the independent variable(s). In other words, our model perfectly fits the data.

On the other hand, an R-Squared of 0% indicates that the dependent variable cannot be predicted from the independent variable(s) at all.
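
The SSR and SST definitions above translate directly into code. The sketch below assumes the usual form R² = 1 - SSR / SST; the function name `r2_score` and the sample values are illustrative. It also includes a deliberately poor set of predictions to show that the same formula can give a value below 0 when the model is worse than simply predicting the mean.

```python
def r2_score(actual, predicted):
    """R2 = 1 - SSR / SST, using the definitions of SSR and SST above."""
    mean_actual = sum(actual) / len(actual)
    ssr = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean_actual) ** 2 for a in actual)
    if sst == 0:
        raise ValueError("R2 is undefined when all actual values are equal")
    return 1 - ssr / sst


actual = [1.0, 2.0, 3.0, 4.0, 5.0]        # mean is 3, SST is 10
close_fit = [1.0, 2.0, 3.0, 4.0, 6.0]     # SSR is 1
mean_only = [3.0, 3.0, 3.0, 3.0, 3.0]     # SSR equals SST
reversed_fit = [5.0, 4.0, 3.0, 2.0, 1.0]  # SSR is 40

r2_close = r2_score(actual, close_fit)
r2_mean = r2_score(actual, mean_only)
r2_reversed = r2_score(actual, reversed_fit)

print(r2_close)     # about 0.9
print(r2_mean)      # 0.0
print(r2_reversed)  # -3.0
```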

* Disadvantages :
    - *It Only Measures Explained Variance* : 
    R-Squared does not tell us if the chosen model is good or bad, and it doesn’t convey the reliability of the model. It only quantifies the amount of variability in the target variable that’s accounted for by the predictors in the model.

    - *Sensitive to Unnecessary Features* : 
    On the training data, the R-Squared of a least-squares model will either stay the same or increase with the addition of more variables, even if those variables are only weakly associated with the response. This can lead to overfitting, especially when dealing with many features.
    - *Not Suitable for Comparing Different Datasets* : 
    R-Squared is not a good metric to compare model performances across different datasets. Because it measures the proportion of variance, its value can vary significantly with varying variances of different datasets.
    - *Lower Performance with Non-linear Data* : 
    While R-Squared is a good measure for linear regression, it doesn’t perform as well when dealing with non-linear data patterns.
    
## 5. Adjusted R2 Score

Like R-Squared, it provides a measure of the proportion of the total variance in the dependent variable that is explained by the independent variables. However, it also takes into account the number of predictors used, adding a penalty for model complexity.

![Adjusted R2 Score Formula](../images/Adjusted_R2_Score_Formula.png)

Like R-Squared, Adjusted R-Squared is usually a decimal between 0 and 1 and is often expressed as a percentage, although it can be negative when the model explains very little variance relative to the number of predictors it uses. An Adjusted R-Squared of 100% indicates that all changes in the dependent variable are completely accounted for by the independent variables in the model. A score of 0% indicates the model explains none of the variability of the response data around its mean.

However, unlike R-Squared, Adjusted R-Squared takes into account the number of predictors in the model. For example, if you have two models with the same R-Squared but different numbers of predictors, the model with fewer predictors will have a higher Adjusted R-Squared, reflecting the fact that it achieved the same goodness of fit with fewer predictors.
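
The comparison in the previous paragraph can be checked numerically. This sketch assumes the commonly used formula Adjusted R² = 1 - (1 - R²)(n - 1) / (n - p - 1), where n is the number of samples and p the number of predictors; the image above is assumed to show the same expression. The function name `adjusted_r2` and the numbers are hypothetical.

```python
def adjusted_r2(r2, n_samples, n_predictors):
    """Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1)."""
    dof = n_samples - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more samples than predictors plus one")
    return 1 - (1 - r2) * (n_samples - 1) / dof


# Two hypothetical models with the same R2 on 50 samples.
r2 = 0.80
few_predictors = adjusted_r2(r2, n_samples=50, n_predictors=3)
many_predictors = adjusted_r2(r2, n_samples=50, n_predictors=10)

print(round(few_predictors, 4))   # 0.787
print(round(many_predictors, 4))  # 0.7487
```

Both models have the same R-Squared, but the model with 10 predictors is penalized more than the one with 3.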

![All Formula](../images/All_formula_list.png)


### Difference between R-Squared and Adjusted R-Squared
While R-Squared and Adjusted R-Squared both provide measures of how well the model fits the data, there’s a crucial difference between them: R-Squared gives every variable credit for explaining variation in the dependent variable, whether or not it is truly informative, while Adjusted R-Squared adds a penalty for unnecessary complexity in the model.

The problem with R-Squared is that it tends to overestimate the performance of the model as more variables are added, even if those variables are only weakly associated with the response. This can lead to overfitting and misleadingly high R-Squared values.

Adjusted R-Squared overcomes this issue by decreasing the value when unnecessary predictors are included in the model. This makes Adjusted R-Squared a more robust measure for evaluating the overall quality of the regression model, especially when comparing models with a different number of predictors.


## Comparative Analysis

### Comparison of the Different Metrics

Each metric that we have discussed so far — R-Squared, Adjusted R-Squared, MSE, RMSE, and MAE — offers a unique perspective on the performance of a regression model. Let’s compare them:

> R-Squared and Adjusted R-Squared: 

* Both of these metrics provide a measure of how much of the variability in the dependent variable is explained by the independent variables. 

* However, they don’t give us any information on the absolute size of the error. Adjusted R-Squared has an added benefit of taking into account the number of predictors in the model, which helps to avoid overfitting.

> MSE and RMSE: 
* Both MSE and RMSE give more weight to larger errors by squaring the residuals. They are useful when large errors are particularly undesirable. 
* The key difference between them is that RMSE is in the same units as the dependent variable, making it easier to interpret. 
* Both MSE and RMSE can be heavily influenced by outliers.

> MAE: 
* Unlike MSE and RMSE, MAE weights each error in proportion to its size by taking the absolute value of the residuals, so large errors get no extra penalty. It provides a clear representation of the average error and is less sensitive to outliers.
* However, it doesn’t put as much emphasis on large errors.
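
One way to see the differences in units is to compute the metrics twice on the same hypothetical data, once in thousands of dollars and once in dollars. The sketch below uses an illustrative helper, `metrics`, and made-up prices. MAE and RMSE scale with the unit, MSE scales with its square, and R-Squared does not change.

```python
import math


def metrics(actual, predicted):
    """Return MAE, MSE, RMSE and R2 for one set of predictions."""
    n = len(actual)
    residuals = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(r) for r in residuals) / n
    mse = sum(r * r for r in residuals) / n
    mean_actual = sum(actual) / n
    sst = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - (n * mse) / sst  # n * mse is the sum of squared residuals
    return {"mae": mae, "mse": mse, "rmse": math.sqrt(mse), "r2": r2}


# Hypothetical house prices in thousands of dollars.
actual_k = [200.0, 250.0, 300.0, 350.0]
predicted_k = [210.0, 240.0, 310.0, 340.0]

# The same prices expressed in dollars.
actual_usd = [v * 1000 for v in actual_k]
predicted_usd = [v * 1000 for v in predicted_k]

in_k = metrics(actual_k, predicted_k)
in_usd = metrics(actual_usd, predicted_usd)

for name in ("mae", "mse", "rmse", "r2"):
    print(name, in_k[name], in_usd[name])
```

Here MAE and RMSE grow by a factor of 1,000, MSE by a factor of 1,000,000, and R-Squared stays at about 0.968. For this reason, error metrics are not comparable across targets on different scales.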



> In general, no one metric is “the best” in all situations. The choice of which metric to use depends on the specific context, the presence of outliers, the importance of larger errors, and the needs of the stakeholders involved.