## TO-DO LIST

---

> * Understand the regression algorithms

> * Understand the regression evaluation metrics

> * Prepare/transform the dataset to try to improve the performance

## Introduction

---

The goal of linear regression is to model the relationship between one or multiple features and a continuous target variable. Regression analysis is a subcategory of supervised machine learning. In contrast to classification, another subcategory of supervised learning, regression analysis aims to predict outputs on a continuous scale rather than categorical class labels.

<center><img src="15.jpg"></center>


## Simple linear regression

---

The goal of simple (univariate) linear regression is to model the relationship between a single feature (explanatory variable x) and a continuous-valued response (target variable y). The equation of a linear model with one explanatory variable is defined as follows:
<center><img src="lr.jpg"></center>

Here, the weight $$ {W}_{0} $$ represents the y-axis intercept and $$ {W}_{1} $$ is the weight coefficient of the explanatory variable. Our goal is to learn the weights of the linear equation that describe the relationship between the explanatory (independent) variable and the target variable. The learned weights can then be used to predict the responses for new values of the explanatory variable that were not part of the training dataset. Based on the linear equation defined above, linear regression can be understood as finding the best-fitting straight line through the sample points, as shown in the following figure:
<center><img src="scat.jpg"></center>

This best-fitting line is also called the regression line, and the vertical lines from the regression line to the sample points are the so-called offsets or residuals - the errors of our prediction.
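As a minimal sketch of what fitting the regression line and computing its residuals can look like, the snippet below uses the standard closed-form least-squares solution for one feature. The sample points and the helper name `fit_simple_linear_regression` are hypothetical and chosen only for illustration.

```python
def fit_simple_linear_regression(x, y):
    """Return (w0, w1) minimising the squared residuals of y ~ w0 + w1 * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    w1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    w0 = y_mean - w1 * x_mean
    return w0, w1


# Hypothetical sample points (not taken from the figures above).
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.1, 5.9, 8.0]

w0, w1 = fit_simple_linear_regression(x, y)

# Residuals: vertical offsets between each sample point and the line.
residuals = [yi - (w0 + w1 * xi) for xi, yi in zip(x, y)]

print("intercept:", w0, "slope:", w1)
print("residuals:", residuals)
```

With an intercept in the model, the least-squares residuals sum to (numerically) zero, which is a quick sanity check on the fit.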

## Multiple linear regression

---

The special case of linear regression with one explanatory variable that we introduced in the previous subsection is called simple linear regression. Of course, we can also generalize the linear regression model to multiple explanatory variables; this process is called multiple linear regression.
<center><img src="mlr.jpg"></center>

Here, $$ {W}_{0} $$ is the y-axis intercept with $$ {X}_{0} = 1 $$.

The following figure shows what the two-dimensional fitted hyperplane of a multiple linear regression model with two features could look like:
<center><img src="mm.jpg"></center>
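The role of the constant column $$ {X}_{0} = 1 $$ can be illustrated with a short NumPy sketch. The data below is hypothetical, generated without noise from known weights so the result is easy to check, and `np.linalg.lstsq` is used here simply as one convenient least-squares solver.

```python
import numpy as np

# Hypothetical data generated from y = 1 + 2*x1 + 3*x2 (no noise).
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

# Prepend the constant column X0 = 1 so that W0 acts as the intercept.
X_design = np.column_stack([np.ones(len(X)), X])

# Least-squares solution for the weight vector [W0, W1, W2].
w, *_ = np.linalg.lstsq(X_design, y, rcond=None)

# Predictions lie on the fitted plane.
y_hat = X_design @ w
print("weights:", w)
```

Because the data is noise-free, the recovered weights should match the generating values 1, 2 and 3 up to floating-point error.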

## How do we find the best fit line?

---

We define a **cost function** as:

<center><img src="cf.jpg"></center>

This is also known as the **Sum of Squared Errors (SSE)**; note that the worked examples below divide the sum by 2. Let's build some intuition for the cost function using a small example:

<center><img src="1.jpg"></center>

For these points, the best-fit line is clearly $$ Y = X $$, which means $$ {W}_{0} = 0 $$ and $$ {W}_{1} = 1 $$:

<center><img src="2.jpg"></center>

If we calculate the cost function for $$ {W}_{0} = 0 $$ and $$ {W}_{1} = 1 $$, the value is $$ J(w) = ((1 - 1) ^ 2 + (2 - 2) ^ 2 + (3 - 3) ^ 2)/2 = 0 $$.

But if we choose $$ {W}_{0} = 0 $$ and $$ {W}_{1} = 0.5 $$ instead, the fitted line looks like this:

<center><img src="3.JPG"></center>

In that case the cost function value is $$ J(w) = ((1 - 0.5) ^ 2 + (2 - 1) ^ 2 + (3 - 1.5) ^ 2)/2 = 1.75 $$.

So overall, the relationship between the weights (including the intercept) and the cost function looks like this:

<center><img src="cp.jpg"></center>
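The two worked values can be reproduced with a few lines of Python. This is a sketch that assumes the cost is half the sum of squared errors, as in the calculations above, and uses the three sample points (1, 1), (2, 2), (3, 3) implied by them; the function name `cost` is just an illustrative choice.

```python
def cost(w0, w1, xs, ys):
    """Half the sum of squared errors for the line y = w0 + w1 * x."""
    return sum((y - (w0 + w1 * x)) ** 2 for x, y in zip(xs, ys)) / 2


# Sample points from the worked example.
xs = [1, 2, 3]
ys = [1, 2, 3]

print(cost(0, 1, xs, ys))    # 0.0
print(cost(0, 0.5, xs, ys))  # 1.75

# Sweeping W1 with W0 fixed at 0 traces out the bowl-shaped cost curve.
sweep = {w1: cost(0, w1, xs, ys) for w1 in [0.0, 0.5, 1.0, 1.5, 2.0]}
for w1, j in sweep.items():
    print(w1, j)
```

The sweep is symmetric around $$ {W}_{1} = 1 $$, where the cost reaches its minimum of 0.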

## Now, how do we find the minimum point?

---

We can use a technique called **Gradient Descent (GD)**:

<center><font color="red">Repeat until convergence</font></center>

<center><img src="4.jpg"></center>

<center><img src="5.jpg"></center>
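The "repeat until convergence" loop can be sketched in plain Python as follows. This is a minimal illustration that assumes the half-sum-of-squared-errors cost used in the worked examples, so the gradients are the (negated) sums of the errors; the exact scaling in the update rule shown in the figures may differ. The learning rate, tolerance and function name are illustrative assumptions.

```python
def gradient_descent(xs, ys, learning_rate=0.05, max_iters=5000, tol=1e-6):
    """Minimise J(w) = 1/2 * sum((y - (w0 + w1*x))^2) by gradient descent."""
    w0, w1 = 0.0, 0.0
    for _ in range(max_iters):
        # Errors of the current line on every sample point.
        errors = [y - (w0 + w1 * x) for x, y in zip(xs, ys)]
        # Partial derivatives of J with respect to w0 and w1.
        grad_w0 = -sum(errors)
        grad_w1 = -sum(e * x for e, x in zip(errors, xs))
        # Convergence: stop once the gradient is (almost) zero.
        if max(abs(grad_w0), abs(grad_w1)) < tol:
            break
        # Update both weights simultaneously, stepping against the gradient.
        w0 -= learning_rate * grad_w0
        w1 -= learning_rate * grad_w1
    return w0, w1


# Same three sample points as in the cost-function example.
w0, w1 = gradient_descent([1, 2, 3], [1, 2, 3])
print(w0, w1)
```

On these points the weights should approach the intercept 0 and slope 1 of the line $$ Y = X $$.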

---

## Learning rate

---

The learning rate is a tuning parameter in an optimization algorithm that determines the step size at each iteration while moving toward a minimum of a loss function.

## How do we decide the learning rate?

<center><img src="6.jpg"></center>

<center><img src="7.jpg"></center>
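The effect of the learning rate can be seen by running a fixed number of gradient descent steps with different step sizes. The sketch below reuses the three sample points and the half-sum-of-squared-errors cost from the earlier example; the specific learning rates (0.001, 0.05, 0.13) are illustrative assumptions chosen to show a too-small, a reasonable, and a too-large value for this particular toy problem.

```python
def cost(w0, w1, xs, ys):
    """Half the sum of squared errors for the line y = w0 + w1 * x."""
    return sum((y - (w0 + w1 * x)) ** 2 for x, y in zip(xs, ys)) / 2


def cost_after_steps(xs, ys, learning_rate, n_steps=50):
    """Run n_steps of gradient descent from (0, 0) and return the final cost."""
    w0, w1 = 0.0, 0.0
    for _ in range(n_steps):
        errors = [y - (w0 + w1 * x) for x, y in zip(xs, ys)]
        step_w0 = learning_rate * sum(errors)
        step_w1 = learning_rate * sum(e * x for e, x in zip(errors, xs))
        w0, w1 = w0 + step_w0, w1 + step_w1
    return cost(w0, w1, xs, ys)


xs, ys = [1, 2, 3], [1, 2, 3]
initial_cost = cost(0.0, 0.0, xs, ys)

final_costs = {lr: cost_after_steps(xs, ys, lr) for lr in (0.001, 0.05, 0.13)}
print("initial cost:", initial_cost)
for lr, j in final_costs.items():
    print("learning rate", lr, "-> cost after 50 steps:", j)
```

A very small rate makes slow progress, a moderate rate gets close to the minimum, and a rate that is too large overshoots on every step so the cost grows instead of shrinking.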

## <font color="blue">Other ways of finding the best fit line: LIBLINEAR (SCIKIT-LEARN), RANSAC</font>

## Evaluation Metrics

---

> **Mean Absolute Error (MAE)**

MAE is a simple metric: it is the average of the absolute differences between the actual and predicted values.

For example, suppose you have input and output data and fit a linear regression model, which draws a best-fit line through the data.

For each observation, the error made by the model is the difference between the actual value and the predicted value, and its magnitude is the absolute error. MAE summarizes these absolute errors over the complete dataset.

So, sum all the absolute errors and divide by the total number of observations; the result is the MAE. Because it is a loss, we aim for the lowest MAE possible.


<center><img src="8.jpg"></center>
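The steps described above can be written out by hand in a few lines. The actual and predicted values below are hypothetical and only serve to make the arithmetic visible.

```python
# Hypothetical actual and predicted values.
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

# Absolute error for each observation.
abs_errors = [abs(t - p) for t, p in zip(y_true, y_pred)]

# MAE: sum of the absolute errors divided by the number of observations.
mae = sum(abs_errors) / len(abs_errors)

print(abs_errors)  # [0.5, 0.0, 1.5, 1.0]
print(mae)         # 0.75
```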

> **Mean Squared Error (MSE)**

MSE is one of the most widely used metrics and is a small variation on mean absolute error: it is the average of the squared differences between the actual and predicted values.

So, where MAE takes the absolute difference, MSE takes the squared difference.

What does MSE actually represent? It represents the average squared distance between the actual and predicted values. Squaring prevents positive and negative errors from cancelling each other out, and it also penalizes large errors more heavily than small ones.

<center><img src="9.jpg"></center>
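To see why the errors are squared, it helps to compare the plain mean of the signed errors with the MSE on the same data. The values below are hypothetical, picked so that the signed errors cancel exactly.

```python
# Hypothetical values where positive and negative errors cancel out.
y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [3.0, 3.0, 7.0, 7.0]

errors = [t - p for t, p in zip(y_true, y_pred)]

# Mean of the signed errors: misleadingly zero despite every prediction being off.
mean_error = sum(errors) / len(errors)

# MSE: mean of the squared errors, so nothing cancels.
mse = sum(e ** 2 for e in errors) / len(errors)

print(errors)      # [-1.0, 1.0, -1.0, 1.0]
print(mean_error)  # 0.0
print(mse)         # 1.0
```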

> **Root Mean Squared Error (RMSE)**

As the name suggests, RMSE is simply the square root of the mean squared error, which puts the error back in the same units as the target variable.

<center><img src="10.jpg"></center>
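A short hand-rolled sketch of the same idea, using only the standard library and hypothetical values:

```python
import math

# Hypothetical actual and predicted values.
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

# Mean squared error first ...
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# ... then its square root gives the RMSE.
rmse = math.sqrt(mse)

print(mse)   # 0.875
print(rmse)
```

On this data the RMSE is larger than the MAE of 0.75, because the squaring step gives the larger errors more weight.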


> **Coefficient of Determination (R-squared)**

The R2 score measures how well your model performs, rather than reporting a loss in absolute terms.

MAE and MSE depend on the scale of the target variable, as we have seen, whereas the R2 score is independent of that scale.

So, R-squared gives us a baseline model to compare against, which none of the other metrics provides. This is similar to classification problems, where a threshold fixed at 0.5 serves as the reference point. In essence, R-squared measures how much better the regression line is than a mean line, that is, a model that always predicts the mean of the target.

Hence, R-squared is also known as the Coefficient of Determination, or sometimes as Goodness of Fit.

<center><img src="11.jpg"></center>

where
<center><img src="12.jpg"></center>
and
<center><img src="13.jpg"></center>
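The comparison against the mean line can be made concrete with a small sketch. It assumes the usual definition of R-squared as one minus the ratio of the residual sum of squares to the total sum of squares around the mean; the helper name `r2` and the data are hypothetical.

```python
def r2(y_true, y_pred):
    """R-squared: 1 - (squared error of the model) / (squared error of the mean line)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical actual and predicted values.
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

model_score = r2(y_true, y_pred)

# Baseline: always predict the mean of the target (the "mean line").
mean_line = [sum(y_true) / len(y_true)] * len(y_true)
baseline_score = r2(y_true, mean_line)

print(model_score)
print(baseline_score)  # 0.0
```

A score of 0 means the model is no better than the mean line, 1 means a perfect fit, and a negative score means the model is worse than simply predicting the mean.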

> **Adjusted Coefficient of Determination (Adjusted R-squared)**

The disadvantage of the R2 score is that when new features are added to the data, the R2 score on the training data increases or stays the same but never decreases, because an additional feature can never make the least-squares fit to the training data worse.

The problem is that this also holds for irrelevant features: adding one can still raise R2, which misleadingly suggests a better model.

Hence, to address this, Adjusted R-squared was introduced. It penalizes the score for the number of features used.

<center><img src="14.jpg"></center>
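To illustrate the penalty, the sketch below applies the standard adjustment, with n the number of records and k the number of independent variables, to the same R2 value for two different feature counts. The helper name `adjusted_r2` and the numbers are hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n records and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical scenario: the same R2 of 0.80 on 50 records,
# obtained once with 5 features and once with 10 features.
r2_value = 0.80
n_records = 50

adj_5 = adjusted_r2(r2_value, n_records, k=5)
adj_10 = adjusted_r2(r2_value, n_records, k=10)

print(adj_5)
print(adj_10)
```

For the same R2, the model with more features receives the lower adjusted score, so extra features only pay off if they raise R2 by enough to offset the penalty.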

In [None]:
# from sklearn.metrics import mean_absolute_error
# mean_absolute_error(<actual target>, <predictions>)

In [None]:
# from sklearn.metrics import mean_squared_error
# mean_squared_error(<actual target>, <predictions>)

In [None]:
# import numpy as np
# from sklearn.metrics import mean_squared_error
# np.sqrt(mean_squared_error(<actual target>, <predictions>))

In [None]:
# from sklearn.metrics import r2_score
# r2_score(<actual target>, <predictions>)

In [None]:
# from sklearn.metrics import r2_score
# n=<number_of_records>
# k=<number_of_indep_variables>
# r2 = r2_score(<actual target>, <predictions>)
# adj_r2_score = 1 - ((1-r2)*(n-1)/(n-k-1))
# print(adj_r2_score)

## Coming up next

---

> * Key assumptions to test before fitting linear regression

> * Transformation requirements for linear regression

> * Application & evaluation