# Linear Regression: Beginner Theory

This notebook is fairly mathematical, since the goal is to understand this machine learning algorithm well. We'll start with simple linear regression, where we predict one variable from another, and then look at multiple linear regression, where we predict one output variable from several input variables.

## The main concept is to make our predictions as close to the actual values as possible. 

So how can we make this happen? 

We want to measure how far our predictions are from the actual values. Simply subtracting and summing won't work: some predictions are higher than the actual value and some are lower, so the positive and negative differences can cancel out and give a total close to zero even when the individual predictions are poor.

To deal with this, we square each difference before adding them up.

$$ E = \sum_{i=1}^{N}{(y_i - \hat{y_i})^2} $$

In the equation above, $ y_i $ represents our actual output values and $ \hat{y_i} $ represents our prediction values. 
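To make the cancellation problem concrete, here is a minimal sketch using NumPy. The arrays `y_actual` and `y_predicted` are made-up toy values chosen for illustration, not data from this notebook.

```python
import numpy as np

# Hypothetical toy data, invented only to illustrate the idea.
y_actual = np.array([3.0, 5.0, 7.0, 9.0])
y_predicted = np.array([4.0, 4.0, 8.0, 8.0])

residuals = y_actual - y_predicted       # [-1.  1. -1.  1.]

raw_sum = residuals.sum()                # positive and negative errors cancel
squared_error = (residuals ** 2).sum()   # E from the equation above

print(raw_sum)        # 0.0
print(squared_error)  # 4.0
```

Every prediction here is off by one, yet the plain sum of differences is zero. The squared error is 4, which reflects the misses.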

A perfect linear regression model is one where every $ \hat{y_i} $ equals its $ y_i $ . In practice this is almost never achievable, but we can adjust the model's parameters to make this error as small as possible.

So let's look at a simple linear regression model.

From high school, you are probably used to $ y = mx + b $ . Linear regression works exactly the same way, with two parameters: the slope $ m $ and the bias term (intercept) $ b $ . In formal machine learning notation, the regression model is written as:

$$ \hat{y_i} = \beta_1 x_i + \beta_0   $$  

In the above equation, $ \beta_1 $ is the same as $ m $ and $ \beta_0 $ is the same as the intercept $ b $ . 
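As a small sketch of this model in code, the function below computes $ \hat{y_i} $ for a batch of inputs. The name `predict` and the parameter values are assumptions chosen for illustration.

```python
import numpy as np

def predict(x, beta_1, beta_0):
    """Return y_hat = beta_1 * x + beta_0 for each input in x."""
    return beta_1 * np.asarray(x, dtype=float) + beta_0

# Hypothetical inputs and parameters (slope 2, intercept 1).
x = np.array([0.0, 1.0, 2.0, 3.0])
y_hat = predict(x, beta_1=2.0, beta_0=1.0)

print(y_hat)  # [1. 3. 5. 7.]
```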

You have probably heard the term "line of best fit" in some classes. The line of best fit is the regression equation whose parameters make the error as small as possible. So how do we calculate it? With one input variable, it takes some multivariable calculus; with multiple input variables, it also takes linear algebra.

In this notebook, I won't go through the entire derivation, but I'll show you the important steps to get to the line of best fit.

## Step 1:

Substitute the regression model for $ \hat{y_i} $ in the error (cost) function. 

$$ E = \sum_{i=1}^{N}{(y_i - \hat{y_i})^2} \quad \rightarrow \quad E = \sum_{i=1}^{N}{(y_i - (\beta_1 x_i + \beta_0))^2} $$
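After the substitution, the error depends only on the two parameters, since the data is fixed. The sketch below evaluates it for a few candidate pairs. The function name `squared_error` and the toy data are assumptions for illustration.

```python
import numpy as np

def squared_error(x, y, beta_1, beta_0):
    """E(beta_1, beta_0) = sum over i of (y_i - (beta_1 * x_i + beta_0))^2."""
    y_hat = beta_1 * x + beta_0
    return float(np.sum((y - y_hat) ** 2))

# Hypothetical data that lies exactly on the line y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Try a few candidate (beta_1, beta_0) pairs and compare their errors.
for beta_1, beta_0 in [(2.0, 1.0), (2.0, 0.0), (1.0, 1.0)]:
    print(beta_1, beta_0, squared_error(x, y, beta_1, beta_0))
# 2.0 1.0 0.0
# 2.0 0.0 4.0
# 1.0 1.0 14.0
```

The pair that matches the true line gives zero error, and the error grows as the parameters move away from it.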

## Step 2:

Take the partial derivatives of the error function with respect to $ \beta_1 $ and $ \beta_0 $ , set each one to zero (the error is smallest where both derivatives vanish), and rearrange:

$$ \frac{\partial{E}}{\partial{\beta_1}} = \sum_{i=1}^{N}{2(y_i - (\beta_1 x_i + \beta_0))(-x_i)} = 0 \rightarrow \beta_1 \sum_{i=1}^{N}{{x_i}^2} + \beta_0 \sum_{i=1}^{N}{x_i} = \sum_{i=1}^{N}{x_i y_i} $$ 

$$ \frac{\partial{E}}{\partial{\beta_0}} = \sum_{i=1}^{N}{2(y_i - (\beta_1 x_i + \beta_0))(-1)} = 0 \rightarrow \beta_1 \sum_{i=1}^{N}{x_i} + \beta_0 N = \sum_{i=1}^{N}{y_i} $$ 
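The two rearranged equations form a 2×2 linear system in $ \beta_1 $ and $ \beta_0 $ . Here is one possible way to solve it numerically with `np.linalg.solve`. The data is a made-up toy example, and this is a sketch rather than the only way to finish the derivation.

```python
import numpy as np

# Hypothetical toy data, roughly following y = 2x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.2, 4.1, 6.2, 7.9, 10.1])
N = len(x)

# Coefficients and right-hand side of the two equations above:
#   beta_1 * sum(x^2) + beta_0 * sum(x) = sum(x * y)
#   beta_1 * sum(x)   + beta_0 * N      = sum(y)
A = np.array([[np.sum(x ** 2), np.sum(x)],
              [np.sum(x),      N]])
b = np.array([np.sum(x * y), np.sum(y)])

beta_1, beta_0 = np.linalg.solve(A, b)

# Sanity check: both partial derivatives should be (numerically) zero here.
residuals = y - (beta_1 * x + beta_0)
dE_dbeta_1 = np.sum(2 * residuals * (-x))
dE_dbeta_0 = np.sum(2 * residuals * (-1))

print(beta_1, beta_0)
print(dE_dbeta_1, dE_dbeta_0)
```

As a cross-check, `np.polyfit(x, y, 1)` fits a degree-1 polynomial by least squares and should return the same slope and intercept, up to floating-point error.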

