<a href="https://colab.research.google.com/github/Rioba-Ian/Statistics/blob/main/Evaluating_regression_model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In this notebook we evaluate a supervised machine learning model, linear regression, using common regression metrics.

We shall encounter terms such as **cost, cost function and gradient descent.**

# Cost and Cost Function
Cost is easiest to explain with an example. Suppose our model predicts that a house should cost 300,000 when we know its real price is 160,000. The difference, 140,000, is the cost of that prediction.

Cost is how far the line is from the real data. The best line is the one that is least off from the real data. In machine learning, cost functions are used to estimate how badly the model has performed. Simply put, a cost function measures how wrong the model is about the relationship between x and y.

More precisely, the cost function quantifies the error between the predicted values and the expected values and presents it as a single real number. Depending on the problem, the cost function is either minimized or maximized. When it is minimized, the value it returns is called a loss or an error, and the aim is to find the parameters that make that number as small as possible.

When it is maximized, the value it returns is called a reward, and the goal is to find the parameter values that make that number as large as possible.

$$
\begin{equation}
\text{minimize}\ \frac{1}{n}\sum_{i=1}^{n} (\text{pred}_i - y_i)^2
\end{equation}$$

The difference between the true value and predictions is the residual.
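
To make the idea concrete, here is a minimal sketch of a cost function for a straight line, assuming the prediction has the form `pred = m * x + b`. The function name `cost` and the two candidate lines are illustrative choices, and the data is the same small set the notebook uses further down.

```python
import numpy as np

# Small data set (the x1 and y values used later in this notebook).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.0, 6.2, 8.0, 9.0])

def cost(m, b, x, y):
    """Mean of the squared residuals for the line pred = m * x + b."""
    pred = m * x + b
    residuals = pred - y
    return np.mean(residuals ** 2)

# Two candidate lines: one close to the data, one far from it.
good_line = cost(1.78, 0.52, x, y)
bad_line = cost(0.5, 0.0, x, y)
print(good_line, bad_line)
```

The line that hugs the data gets a cost of roughly 0.094, while the poor line gets roughly 22.4, so a single number is enough to rank the two.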

In [3]:
from IPython.display import Latex, Math, Image

# Mean Squared Error (MSE)
MSE is the average of the squared differences between the true targets and the values predicted by the regressor. Because the differences are squared, large errors are penalized (given more weight for deviating from the objective) far more heavily than small ones, so a few large misses can make the model look worse than it is on most of the data.

$$
\begin{equation}
MSE = \frac{1}{n}\sum (y-\hat{y})^2
\end{equation}
$$
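
The effect of squaring can be seen in a short sketch. The helper `mse` and the numbers below are made up for illustration: both sets of predictions have the same total absolute error (2.0), but one spreads it evenly and the other concentrates it in a single point.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: average of the squared differences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

y_true = [3.0, 5.0, 7.0, 9.0]          # illustrative targets
small_errors = [3.5, 4.5, 7.5, 8.5]    # every prediction is off by 0.5
one_big_error = [3.0, 5.0, 7.0, 11.0]  # a single prediction is off by 2.0

mse_small = mse(y_true, small_errors)
mse_big = mse(y_true, one_big_error)
print(mse_small)  # 0.25
print(mse_big)    # 1.0
```

The single large miss is scored four times worse than the four small ones, even though the total absolute error is the same.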

# Root Mean Squared Error (RMSE)
RMSE is just the square root of the MSE, which brings the error back to the same units as the target.
Because the errors are squared before they are averaged, large errors carry a high penalty. RMSE is therefore useful when large errors are especially undesirable.


$$
\begin{equation}
RMSE = \sqrt{\frac{\sum_{i=1}^{n} (y_i-\hat{y}_i)^2}{n}}
\end{equation}
$$
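
A minimal sketch of the computation, using made-up house prices in which every prediction misses by exactly 10,000. The helper name `rmse` is an illustrative choice.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared difference."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Hypothetical prices; each prediction is off by 10,000.
y_true = [160_000, 210_000, 305_000]
y_pred = [150_000, 220_000, 295_000]

error = rmse(y_true, y_pred)
print(error)  # 10000.0
```

The result is in the same units as the prices, which is what makes RMSE easier to read than MSE (here the MSE would be 100,000,000).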

# Mean Absolute Error (MAE)
MAE is the average of the absolute differences between the target values and the values predicted by the model. It does not penalize large errors as strongly as the MSE does, so it is less suitable for use cases where we want to pay more attention to outliers.


$$
\begin{equation}
MAE = \frac{1}{n}\sum |y-\hat{y}|
\end{equation}
$$
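
The difference in how MAE and RMSE treat outliers can be sketched with made-up numbers. The helpers `mae` and `rmse` are illustrative; one set of predictions is off by 1 everywhere, the other is perfect except for a single miss of 10.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error: average of the absolute differences."""
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    """Root mean squared error, for comparison."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

y_true = np.array([10.0, 12.0, 14.0, 16.0])
pred_even = np.array([11.0, 13.0, 15.0, 17.0])     # off by 1 everywhere
pred_outlier = np.array([10.0, 12.0, 14.0, 26.0])  # one miss of 10

mae_even, rmse_even = mae(y_true, pred_even), rmse(y_true, pred_even)
mae_out, rmse_out = mae(y_true, pred_outlier), rmse(y_true, pred_outlier)
print(mae_even, rmse_even)  # 1.0 1.0
print(mae_out, rmse_out)    # 2.5 5.0
```

With evenly spread errors the two metrics agree. With one outlier the MAE rises to 2.5 while the RMSE rises to 5.0, so RMSE reacts twice as strongly to the single large miss.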

# R-Squared (Coefficient of Determination)
R-squared is a goodness-of-fit measure for linear regression. It shows how well the predicted values fit the original values, as the share of the variance in y that the model explains. The higher the value, the better the fit:


$$
\begin{equation}
R^2 = 1- \frac{\sum (y-\hat{y})^2}{\sum (y-\bar{y})^2}
\end{equation}
$$

Note: R-squared can be negative, which means the model does worse than a baseline that always predicts the mean of y.
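
A small sketch shows where the zero point and the negative values come from. The helper `r_squared` is an illustrative name, and the "reversed" predictions are a deliberately bad model invented for the example.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """1 minus (residual sum of squares / total sum of squares)."""
    ss_residual = np.sum((y_true - y_pred) ** 2)
    ss_total = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_residual / ss_total

y = np.array([2.1, 4.0, 6.2, 8.0, 9.0])

# Baseline: always predict the mean of y.
mean_model = np.full_like(y, y.mean())
# Deliberately bad model: predictions in the opposite order.
reversed_model = y[::-1]

r2_mean = r_squared(y, mean_model)
r2_bad = r_squared(y, reversed_model)
print(r2_mean, r2_bad)
```

The mean model scores 0, and the reversed predictions score well below 0, because their residual sum of squares is larger than the total sum of squares.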

## Important question: What to use when?
<img src="https://miro.medium.com/max/630/1*8VM2PELQ-oeM0O3ya7BIyQ.png">

Most of the material here draws on the article linked below.

The article is <a href="https://medium.com/usf-msds/choosing-the-right-metric-for-machine-learning-models-part-1-a99d7d7414e4">here.</a>

# The Adjusted R-Squared
It is the same as R-squared, except that it adjusts for the number of terms in the model when measuring how well the model fits.

$$
\begin{equation}
R_{adj}^2 = 1 - \left[\frac{(1-R^2)(n-1)}{n-k-1}\right]
\end{equation}
$$

Here n is the number of observations and k is the number of predictors. The adjusted R-squared will always be less than or equal to R-squared. It accounts for the marginal improvement contributed by each additional term, so it increases when a new term is useful and decreases when it is not.
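
As a worked check of the formula, take n as the number of observations and k as the number of predictors, and plug in the R-squared values the notebook reports further down for the one-feature and two-feature models (n = 5 in both cases, values rounded to six decimals):

$$
\begin{aligned}
k=1:\quad R_{adj}^2 &= 1 - \frac{(1-0.985444)(5-1)}{5-1-1} = 1 - \frac{0.014556 \times 4}{3} \approx 0.980592 \\
k=2:\quad R_{adj}^2 &= 1 - \frac{(1-0.987476)(5-1)}{5-2-1} = 1 - \frac{0.012524 \times 4}{2} \approx 0.974952
\end{aligned}
$$

The second model has the higher R-squared but the lower adjusted R-squared, because the extra predictor costs more than it contributes.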

In [4]:
import numpy as np
import pandas as pd
from sklearn import datasets, linear_model

In [12]:
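def metrics(m, X, y):
    """Return (R-squared, adjusted R-squared) of fitted model m on X, y."""
    yhat = m.predict(X)
    print(yhat)  # show the predictions
    ss_residual = sum((y - yhat)**2)      # variation the model fails to explain
    ss_total = sum((y - np.mean(y))**2)   # total variation around the mean of y
    r_squared = 1 - ss_residual / ss_total
    n, k = len(y), X.shape[1]             # observations, predictors
    adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
    return r_squared, adj_r_squared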

In [9]:
df = pd.DataFrame({"x1": [1,2,3,4,5], "x2": [2.1, 4, 6.1, 8, 10.1]})
y = np.array([2.1, 4, 6.2, 8,9])


In [13]:
model1 = linear_model.LinearRegression()
model1.fit(df.drop("x2", axis=1), y)
metrics(model1, df.drop("x2", axis=1), y)

[2.3  4.08 5.86 7.64 9.42]


(0.9854441403334162, 0.9805921871112216)

In [14]:
model2 = linear_model.LinearRegression()
model2.fit(df, y)
metrics(model2, df, y)

[2.20666667 4.22       5.76666667 7.78       9.32666667]


(0.9874761549307456, 0.9749523098614912)

In [15]:
model3 = linear_model.LinearRegression()
model3.fit(df, y)
metrics(model3, df, y)

[2.20666667 4.22       5.76666667 7.78       9.32666667]


(0.9874761549307456, 0.9749523098614912)

The table below shows that, even though we add no new information going from case 1 to case 2, R-squared still increases, whereas the adjusted R-squared decreases to correct for this, penalizing model2 for its larger number of variables.
<img src="https://miro.medium.com/max/630/1*C-i3nKPtHl_mkfTFgX2IQg.png">
Therefore we can say that adjusted R-squared is more informative than RMSE here, because RMSE on its own does not tell us how bad a model is relative to a baseline.

Remember the misconception that R-squared always lies between 0 and 1: it can go all the way to negative values. Don't forget it.

Let's get into something more rigorous: the gradient. I am still learning about it.

# Gradient

Gradient means slope. A higher gradient is a steeper slope.

## Gradient Descent 
The cost function only tells us how good or bad our values are. Gradient descent updates the values of m (the slope) and b (the intercept) in order to reduce the cost function, that is, to minimize the RMSE. The idea is to start with random m and b values and then iteratively update them until the cost reaches a minimum.

In gradient descent, you take the current value of m and subtract a small multiple of the derivative of the cost at that point, which moves you down the slope. Do this many times and you will reach the bottom.
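
The procedure can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production optimizer: it minimizes the MSE (which has the same minimum as the RMSE, since the square root is monotonic), it starts from m = 0 and b = 0 rather than random values so the result is reproducible, and the `learning_rate` and `n_steps` values are hand-picked for this tiny data set. The data is the x1 and y used earlier in the notebook.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.0, 6.2, 8.0, 9.0])

def gradient_descent(x, y, learning_rate=0.05, n_steps=2000):
    """Fit pred = m * x + b by repeatedly stepping against the MSE gradient."""
    m, b = 0.0, 0.0  # fixed starting point (assumption; random values also work)
    n = len(x)
    for _ in range(n_steps):
        error = (m * x + b) - y
        # Partial derivatives of the MSE with respect to m and b.
        grad_m = (2.0 / n) * np.sum(error * x)
        grad_b = (2.0 / n) * np.sum(error)
        # Step downhill: subtract a small multiple of each derivative.
        m -= learning_rate * grad_m
        b -= learning_rate * grad_b
    return m, b

m, b = gradient_descent(x, y)
print(m, b)
```

The result lands close to m = 1.78 and b = 0.52, which is the line behind the one-feature predictions printed earlier (2.3, 4.08, 5.86, 7.64, 9.42). A learning rate that is too large makes the updates overshoot and diverge, so it usually needs some tuning.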