# Introduction
https://www.analyticsvidhya.com/blog/2021/05/know-the-best-evaluation-metrics-for-your-regression-model/  
Regression is one type of supervised machine learning. In this tutorial, we will discuss various metrics for evaluating regression models and how to implement them using the scikit-learn library. 


# Table of Contents
- Regression 
- Why we require Evaluation Metrics  
- Mean Absolute Error(MAE)  
- Mean Squared Error(MSE)  
- RMSE  
- RMSLE  
- R squared  
- Adjusted R Squared  

## Regression
Regression is a type of machine learning that helps in finding the relationship between independent and dependent variables.  

In simple words, regression is a machine learning problem where we have to predict continuous values such as price, rating, or fees.

## Why We require Evaluation Metrics?
Many beginners, and even some practitioners, pay little attention to model performance. The goal is a well-generalized model. A machine learning model that reaches close to 100 per cent accuracy on its training data has most likely memorized that data, which leads to the concepts of overfitting and underfitting.  

Good accuracy on the training data is necessary, but it is just as important to get reliable results on unseen data; otherwise the model is of no use.  

So, to build and deploy a generalized model, we need to evaluate it on different metrics, which helps us optimize its performance, fine-tune it, and obtain a better result.  

If one metric were perfect, there would be no need for multiple metrics. Each evaluation metric has benefits and disadvantages, and different metrics suit different kinds of datasets, so it is worth understanding all of them.  

## Dataset
To demonstrate each evaluation metric with the scikit-learn library, we will use the placement dataset, a simple linear dataset that looks something like this.
![Screenshot%202021-06-11%20160813.png](attachment:Screenshot%202021-06-11%20160813.png)
Now we apply linear regression to this dataset. After that, we will study each evaluation metric and check it on our linear regression model.  

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=2)
lr = LinearRegression()
lr.fit(X_train, y_train)
y_pred = lr.predict(X_test)

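
The cell above assumes that `X` and `y` already hold the placement data, and the loading step is not shown. As a stand-in, here is a minimal self-contained sketch that builds a synthetic linear dataset and fits the same model. The 200 rows, the value ranges, the slope, and the noise level are all assumptions made for illustration, not properties of the real dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the placement dataset: one feature, one target.
rng = np.random.default_rng(0)
X = rng.uniform(5.0, 10.0, size=(200, 1))             # assumed feature range
y = 0.6 * X[:, 0] - 1.0 + rng.normal(0.0, 0.3, 200)   # assumed linear target plus noise

# Same split and model as in the tutorial cell.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=2
)

lr = LinearRegression()
lr.fit(X_train, y_train)
y_pred = lr.predict(X_test)

print(X_train.shape, X_test.shape, y_pred.shape)
```

With 200 rows and `test_size=0.2`, the test set has 40 rows, which matches the `n=40` used in the adjusted R squared example later on.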

# METRICS

## Mean Absolute Error(MAE)
<span class="mark">MAE is a very simple metric that calculates the average absolute difference between actual and predicted values.</span>  

To understand it better, take an example: you have input data and output data, and you use linear regression, which draws a best-fit line.  

Now you have <span class="mark">to find the MAE of your model. The mistake made by the model on one observation is known as an error. The difference between the actual value and the predicted value, with its sign dropped, is the absolute error, and we need the mean of the absolute errors over the complete dataset.</span>  

So, sum all the absolute errors and divide by the total number of observations: this is the MAE. We aim for the minimum MAE because it is a loss.  
![Screenshot%202021-06-11%20161109.png](attachment:Screenshot%202021-06-11%20161109.png)
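
As a small hand-worked illustration of the steps above, the sketch below computes MAE with NumPy on four made-up value pairs. The numbers are hypothetical and are not taken from the placement dataset.

```python
import numpy as np

# Hypothetical actual and predicted values, for illustration only.
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_hat = np.array([2.5, 5.0, 4.0, 8.0])

abs_errors = np.abs(y_true - y_hat)   # [0.5, 0.0, 1.5, 1.0]
mae = abs_errors.mean()               # (0.5 + 0.0 + 1.5 + 1.0) / 4 = 0.75

print("MAE", mae)
```
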
**Advantages of MAE** 

- The MAE you get is in the same unit as the output variable.
- It is more robust to outliers than MSE and RMSE.

**Disadvantages of MAE**

- <span class="mark">The graph of MAE is not differentiable at zero, so gradient-based optimizers such as gradient descent cannot use it directly and need a workaround such as subgradients.</span>  

**from sklearn.metrics import mean_absolute_error  
print("MAE",mean_absolute_error(y_test,y_pred))**   
To overcome this disadvantage of MAE, the next metric is MSE.

## Mean Squared Error(MSE)
MSE is one of the most widely used metrics, and it is a small change from mean absolute error: it is the average of the squared differences between actual and predicted values.  

So, above we took the absolute difference, and here we take the squared difference.  

What does the MSE actually represent? It represents the average squared distance between actual and predicted values. We square the differences so that positive and negative errors do not cancel each other out.  
![Screenshot%202021-06-11%20161613.png](attachment:Screenshot%202021-06-11%20161613.png)
**Advantages of MSE** 

- <span class="mark">The graph of MSE is differentiable, so you can easily use it as a loss function.</span>

**Disadvantages of MSE**  

- The value you get after calculating MSE is in the squared unit of the output. For example, if the output variable is in metres (m), the MSE is in metres squared.  
- If you have outliers in the dataset, MSE penalizes them the most, and the calculated MSE becomes much larger. So, in short, <span class="mark">it is not robust to outliers, which was an advantage of MAE.</span>
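
To see the outlier effect in numbers, here is a small sketch on hypothetical values: five predictions that are each off by 1, and then the same predictions with a single error of 10. The helper names `mae` and `mse` are defined here for illustration and are not scikit-learn functions.

```python
import numpy as np

def mae(y_true, y_hat):
    """Mean of the absolute errors."""
    return np.mean(np.abs(y_true - y_hat))

def mse(y_true, y_hat):
    """Mean of the squared errors."""
    return np.mean((y_true - y_hat) ** 2)

y_true = np.array([10.0, 12.0, 11.0, 13.0, 12.0])
y_hat = np.array([11.0, 11.0, 12.0, 12.0, 13.0])   # every error has size 1

y_hat_outlier = y_hat.copy()
y_hat_outlier[-1] = 22.0                            # one error of size 10

print("clean:   MAE", mae(y_true, y_hat), "MSE", mse(y_true, y_hat))
print("outlier: MAE", mae(y_true, y_hat_outlier), "MSE", mse(y_true, y_hat_outlier))
```

One bad prediction raises MAE from 1.0 to 2.8, but it raises MSE from 1.0 to 20.8, because the error of 10 contributes 100 to the sum of squares.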

**from sklearn.metrics import mean_squared_error   
print("MSE",mean_squared_error(y_test,y_pred))**

## Root Mean Squared Error(RMSE)
RMSE is the most popular evaluation metric used in regression problems. It assumes that the errors are unbiased and follow a normal distribution. Here are the key points to consider about RMSE:  

- The ‘square root’ lets this metric show large deviations on the scale of the target.
- The ‘squared’ nature of this metric prevents positive and negative error values from cancelling each other out. In other words, the metric reflects the plausible magnitude of the error term.
- It avoids the use of absolute error values, which are awkward in mathematical calculations.
- When we have more samples, reconstructing the error distribution using RMSE is considered to be more reliable.
- <span class="mark">RMSE is highly affected by outlier values. Hence, make sure you have dealt with outliers in your data set before using this metric.</span>
- Compared to mean absolute error, RMSE gives higher weight to large errors and punishes them more.

The RMSE metric is given by:  
![Screenshot%202021-06-11%20161948.png](attachment:Screenshot%202021-06-11%20161948.png)

**Advantages of RMSE**
- The output value you get is in the same unit as the required output variable which makes interpretation of loss easy.

**Disadvantages of RMSE**
- It is less robust to outliers than MAE.
- To compute RMSE we have to apply the NumPy square root function to the MSE.

**import numpy as np  
print("RMSE",np.sqrt(mean_squared_error(y_test,y_pred)))**  

RMSE is one of the most commonly used evaluation metrics, and it is often the preferred metric when working with deep learning techniques.  
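
Here is a minimal NumPy sketch of the same calculation on hypothetical values. It also shows that RMSE is never smaller than MAE, because squaring gives extra weight to the larger errors. Recent scikit-learn releases (1.4 and later, to the best of my knowledge) also provide a dedicated `root_mean_squared_error` function, while `np.sqrt` over the MSE works on any version.

```python
import numpy as np

# Hypothetical actual and predicted values, for illustration only.
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_hat = np.array([2.5, 5.0, 4.0, 8.0])

errors = y_true - y_hat               # [0.5, 0.0, -1.5, -1.0]
mse = np.mean(errors ** 2)            # in the squared unit of the target
rmse = np.sqrt(mse)                   # back in the unit of the target
mae = np.mean(np.abs(errors))

print("MSE", mse, "RMSE", rmse, "MAE", mae)
```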



## Root Mean Squared Log Error(RMSLE)
RMSLE takes the log of the actual and predicted values before the error is computed, which compresses the scale of the error. The metric is very helpful when the target has not been scaled and varies over a large range.  

Note that RMSLE is not the log of the RMSE. The log is applied to each value, usually as log(1 + x) so that zeros are allowed, and only then are the differences squared, averaged, and square-rooted.  

In scikit-learn, RMSLE is the NumPy square root of `mean_squared_log_error`, which expects non-negative actual and predicted values.  

**from sklearn.metrics import mean_squared_log_error  
print("RMSLE",np.sqrt(mean_squared_log_error(y_test,y_pred)))**  

It is a simple metric that is used in many Machine Learning competitions.   

In the case of root mean squared logarithmic error, we take the log of the predictions and of the actual values. So, basically, what changes is the quantity we are measuring: the relative difference rather than the absolute one. RMSLE is usually used when we don’t want to penalize huge differences between the predicted and the actual values when both are huge numbers.  
![Screenshot%202021-06-11%20162259.png](attachment:Screenshot%202021-06-11%20162259.png)
- If both predicted and actual values are small: RMSE and RMSLE are approximately the same.
- If either the predicted or the actual value is big: RMSE > RMSLE
- If both predicted and actual values are big: RMSE > RMSLE (RMSLE becomes almost negligible in comparison)
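
The small and big cases above can be checked with a short sketch. It assumes the common log(1 + x) form of RMSLE, which is also what scikit-learn's `mean_squared_log_error` uses. The helper functions `rmse` and `rmsle` and all the numbers are made up for illustration.

```python
import numpy as np

def rmse(y_true, y_hat):
    """Root mean squared error."""
    return np.sqrt(np.mean((y_hat - y_true) ** 2))

def rmsle(y_true, y_hat):
    """Root mean squared log error; log1p(x) = log(1 + x), inputs must be >= 0."""
    return np.sqrt(np.mean((np.log1p(y_hat) - np.log1p(y_true)) ** 2))

# Small values: the two metrics are of similar size.
small_true, small_hat = np.array([0.10, 0.20]), np.array([0.12, 0.18])

# Big values with comparable relative errors: RMSE is large, RMSLE stays small.
big_true, big_hat = np.array([1000.0, 2000.0]), np.array([1200.0, 1800.0])

for name, t, h in [("small", small_true, small_hat), ("big", big_true, big_hat)]:
    print(name, "RMSE", rmse(t, h), "RMSLE", rmsle(t, h))
```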

## R Squared (R2)
<span class="mark">R2 score is a metric that tells the performance of your model,</span> not the loss in an absolute sense: it tells you how well your model performed.  

In contrast, MAE and MSE depend on the context, as we have seen, whereas the R2 score is independent of context.  

So, with the help of R squared we have a baseline model to compare against, which none of the other metrics provides. It plays the same role as the threshold in classification problems, which is commonly fixed at 0.5. Basically, R squared calculates how much better the regression line is than a mean line.  

Hence, R squared is also known as the coefficient of determination, or sometimes as goodness of fit.  
![Screenshot%202021-06-11%20164021.png](attachment:Screenshot%202021-06-11%20164021.png)
Now, how do you interpret the R2 score? Suppose the R2 score is zero. Then the squared error of the regression line divided by the squared error of the mean line equals 1, so 1 - 1 is zero. In this case both lines overlap, which means the model does no better than always predicting the mean: it is not able to take advantage of the input columns. A model that does worse than the mean line gets a negative score.  

The second case is when the R2 score is 1. This means the division term is zero, which happens when the regression line does not make any mistake: it is perfect. In the real world, this practically never happens.  

So we can conclude that as our regression line moves towards perfection, the R2 score moves towards one, and the model performance improves.  

The normal case is when the R2 score is between zero and one, like 0.8, which means your model is able to explain 80 per cent of the variance of the data.  

**from sklearn.metrics import r2_score  
r2 = r2_score(y_test,y_pred)  
print(r2)**


We learned that when the RMSE decreases, the model’s performance improves. But these values alone are not intuitive.    

In the case of a classification problem, if the model has an accuracy of 0.8, we can gauge how good it is against a random model, which has an accuracy of 0.5 on a balanced two-class problem. So the random model can be treated as a benchmark. But when we talk about the RMSE metric, we do not have a benchmark to compare against.  

This is where we can use the R-Squared metric. The formula for R-Squared is as follows:  
![Screenshot%202021-06-11%20164227.png](attachment:Screenshot%202021-06-11%20164227.png)
MSE(model): Mean Squared Error of the predictions against the actual values  

MSE(baseline): Mean Squared Error of  mean prediction against the actual values  

In other words, it measures how good our regression model is compared to a very simple model that just predicts the mean value of the target from the train set.  
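
Here is a minimal sketch of this ratio on hypothetical values. It assumes the baseline predicts the mean of the values being evaluated, which is how scikit-learn's `r2_score` defines it, so the result should agree with `r2_score` on the same arrays. The second set of predictions is deliberately worse than the mean line, to show that the score can drop below zero.

```python
import numpy as np

# Hypothetical actual values and two sets of predictions.
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_hat = np.array([2.5, 5.0, 4.0, 8.0])     # reasonable predictions
y_bad = np.array([7.0, 2.5, 5.0, 3.0])     # worse than predicting the mean

def r2_from_mse(y_true, y_pred):
    """R squared as 1 - MSE(model) / MSE(baseline), baseline = mean of y_true."""
    mse_model = np.mean((y_true - y_pred) ** 2)
    mse_baseline = np.mean((y_true - y_true.mean()) ** 2)
    return 1.0 - mse_model / mse_baseline

r2 = r2_from_mse(y_true, y_hat)
r2_bad = r2_from_mse(y_true, y_bad)

print("R2 (reasonable model):", r2)
print("R2 (bad model):", r2_bad)
```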

## Adjusted R Squared
The disadvantage of the R2 score is that when new features are added to the data, the R2 score on the training data increases or stays constant but never decreases, because a least-squares fit with an extra feature can only explain the same amount of variance or more.  

The problem is that when we add an irrelevant feature to the dataset, R2 can still increase, which is misleading.   

Hence, to control this situation, Adjusted R Squared came into existence.  
![Screenshot%202021-06-11%20164559.png](attachment:Screenshot%202021-06-11%20164559.png)
Here n is the number of observations and k is the number of features. As k increases by adding features, the denominator n-k-1 decreases while n-1 remains constant. For an irrelevant feature, the R2 score remains constant or increases only slightly, so the complete term (1-R2)(n-1)/(n-k-1) increases, and when we subtract it from one the resulting score decreases.  

If we add a relevant feature, the R2 score increases and 1-R2 decreases heavily. The denominator also decreases, but the drop in 1-R2 outweighs it, so the complete term decreases, and on subtracting from one the score increases.  

**n=40  
k=2  
adj_r2_score = 1 - ((1-r2)*(n-1)/(n-k-1))  
print(adj_r2_score)**      
Hence, this metric becomes one of the most important metrics to use during the evaluation of the model.  
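
The effect of k described above can be checked with plain arithmetic. This sketch wraps the formula in a hypothetical helper, `adjusted_r2`, and holds R2 fixed at an assumed value of 0.8 while the number of features grows.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R squared for n observations and k features."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than features plus one")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

n = 40          # number of observations, as in the cell above
r2_fixed = 0.8  # assumed R2 that does not improve as features are added

for k in (1, 2, 5, 10):
    print(k, round(adjusted_r2(r2_fixed, n, k), 4))
```

With R2 held constant, every extra feature lowers the adjusted score, which is exactly the penalty for features that add nothing.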

## What is the difference between R squared and adjusted R squared?

- The R squared value, also known as the coefficient of determination, is a statistical performance measure for a regression model. It usually lies between 0 and 1 (it can be negative when the model fits worse than the mean line), and it should be as high as possible. 
- It gives the proportion of variance of a dependent variable (y) that is explained by an independent variable (x) or variables (x1,x2...) in the regression model. 
- The difference between R squared and adjusted R squared is that R squared treats every independent variable as if it affected the result of the model, whereas adjusted R squared penalizes independent variables that do not actually improve the performance of the model. 
- When multiple linear regression models are built, say with the forward addition method, R squared keeps increasing (or at least never decreases) each time an independent variable is added, but adjusted R squared only increases when the variable actually affects the dependent variable. 
- If a variable is non-significant, R squared will still not decrease, but adjusted R squared will typically decrease at that point.
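
As an illustration of the last two points, the sketch below fits ordinary least squares twice on synthetic data, once with a single informative feature and once with an extra pure-noise feature, and reports both scores on the training data. Everything here is an assumption for demonstration purposes: the data is generated, `r2_ols` and `adjusted_r2` are hypothetical helpers, and `np.linalg.lstsq` stands in for `LinearRegression` to keep the example to NumPy only.

```python
import numpy as np

def r2_ols(X, y):
    """Training R squared of an ordinary least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])        # add intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    return 1.0 - (residuals @ residuals) / np.sum((y - y.mean()) ** 2)

def adjusted_r2(r2, n, k):
    """Adjusted R squared for n observations and k features."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

rng = np.random.default_rng(0)
n = 40
x = rng.uniform(5.0, 10.0, size=n)                    # informative feature
y = 0.6 * x - 1.0 + rng.normal(0.0, 0.3, size=n)      # assumed linear target
noise = rng.normal(size=n)                            # feature unrelated to y

X1 = x.reshape(-1, 1)                 # k = 1
X2 = np.column_stack([x, noise])      # k = 2, second column is irrelevant

r2_1, r2_2 = r2_ols(X1, y), r2_ols(X2, y)
print("R2:         ", r2_1, "->", r2_2)
print("Adjusted R2:", adjusted_r2(r2_1, n, 1), "->", adjusted_r2(r2_2, n, 2))
```

R squared cannot go down when the noise column is added, while adjusted R squared is free to fall. The exact numbers depend on the random seed.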