# Regression

## Simple Regression 

Simple regression involves just two variables: an explanatory variable ($x$) and a response variable ($y$). This type of regression can be easily illustrated with a scatterplot.

### Correlation Coefficient

<b>Correlation Coefficient (r)</b> - the strength and direction of a linear relationship. $r \in [-1,1]$

The boundaries for the strength of correlation depend on the field. General guidelines:  
- Strong: $0.7 \leq |r| \leq 1.0$   
- Moderate: $0.3 \leq |r| < 0.7$   
- Weak: $0.0 \leq |r| < 0.3$   


$$r = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}) } { {\sqrt{\sum_{i=1}^n (x_i - \bar{x})^2}} {\sqrt{\sum_{i=1}^n (y_i - \bar{y})^2}}} $$
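To make the formula concrete, here is a minimal sketch that computes $r$ term by term and compares it with `np.corrcoef`. The helper name `pearson_r` and the data values are made up for illustration; they are not from the quizzes dataset.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient computed directly from the formula."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations of x from its mean
    dy = y - y.mean()  # deviations of y from its mean
    return np.sum(dx * dy) / (np.sqrt(np.sum(dx ** 2)) * np.sqrt(np.sum(dy ** 2)))

# Made-up illustrative data (assumption, not real measurements)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

r_manual = pearson_r(x, y)
r_numpy = np.corrcoef(x, y)[0, 1]  # off-diagonal entry of the 2x2 correlation matrix
print(r_manual, r_numpy)
```

Both values should agree up to floating-point rounding.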

<b>Important:</b> $r=0$ DOES NOT necessarily mean that there is no relationship at all. It only means that there is no <b>linear</b> relationship, because the correlation coefficient captures only linear relationships.
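A quick way to see this is a perfectly deterministic but non-linear relationship. The sketch below uses synthetic data, $y = x^2$ on a grid symmetric around zero (an assumption chosen for illustration), where $y$ is fully determined by $x$ and yet the correlation comes out essentially zero.

```python
import numpy as np

# Synthetic data: x is symmetric around 0 and y depends on x exactly (y = x^2)
x = np.linspace(-3, 3, 61)
y = x ** 2

r = np.corrcoef(x, y)[0, 1]
print(r)  # very close to zero, up to floating-point error
```

The positive and negative halves of the parabola cancel out in the numerator of $r$, so the linear measure sees nothing even though the dependence is exact.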

### Example in Python

In [4]:
import pandas as pd
import numpy as np

In [10]:
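# Load the quiz data from an Excel file in the working directory
df = pd.read_excel('quizzes-data-1.xlsx')
# Keep only rows where both Temp and Sales are present
df = df[df.Temp.notnull() & df.Sales.notnull()]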

In [11]:
np.corrcoef(df.Temp, df.Sales)

array([[1.        , 0.95902026],
       [0.95902026, 1.        ]])

### Regression Line

A line is commonly identified by an intercept and a slope.

The **intercept** is defined as the **predicted value of the response when the x-variable is zero**.

The **slope** is defined as the **predicted change in the response for every one unit increase in the x-variable**.

The regression line is defined as follows: 

$$\hat{y} = b_0 + b_1 x$$

$b_0, b_1$ are sample statistics (estimates computed from the data), whereas $\beta_0, \beta_1$ are the actual population parameters. Likewise, $\hat{y}$ is the predicted value, whereas $y$ is the actual observed value.
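The definitions of intercept and slope can be checked with a tiny sketch. The coefficients `b0 = 2.0` and `b1 = 0.5` and the helper `predict` are hypothetical, picked only to illustrate the interpretation.

```python
def predict(x, b0, b1):
    """Predicted response y_hat = b0 + b1 * x."""
    return b0 + b1 * x

# Hypothetical coefficients, chosen only for illustration
b0, b1 = 2.0, 0.5

at_zero = predict(0.0, b0, b1)                         # prediction when x is zero
change = predict(11.0, b0, b1) - predict(10.0, b0, b1)  # effect of a one-unit increase in x

print(at_zero, change)
```

The prediction at $x = 0$ equals the intercept, and the difference between predictions one unit apart equals the slope, regardless of where on the line you start.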

### Least Squares Algorithm 
Goal: minimize the sum of the squared vertical distances between the points and the line. The objective function looks like this:  

$$E = \sum_{i=1}^n (y_i-\hat{y}_i)^2$$

Other loss functions are possible, but this one is the easiest to work with because its derivative is easy to take, which is necessary for finding the minimum. 
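As a rough illustration of the objective, the sketch below evaluates $E$ for two candidate lines on made-up data. The helper name `sse` and the data are assumptions for illustration; the line that follows the points more closely gets the smaller value.

```python
import numpy as np

def sse(x, y, b0, b1):
    """Sum of squared vertical distances between the points and the line b0 + b1 * x."""
    y_hat = b0 + b1 * x
    return np.sum((y - y_hat) ** 2)

# Made-up illustrative data (assumption)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

e_good = sse(x, y, b0=0.0, b1=2.0)  # candidate line close to the data
e_bad = sse(x, y, b0=0.0, b1=1.0)   # candidate line with too small a slope

print(e_good, e_bad)
```

Least squares picks, among all possible $(b_0, b_1)$ pairs, the one for which this quantity is smallest.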

#### Derivation

We can define our $\hat{y}_i$ as:

$$\hat{y}_i = x_i^T b,$$

where $x_i$ is the vector $(1, x_i')$ and $x_i'$ is the original scalar observation (I'm turning the original scalar into a vector so that we can pack the prediction into a dot product). By default, all vectors are column vectors. Likewise, $b$ is the vector $(b_0, b_1)$. Check that we get the same result after these arrangements: 

$$x_i^T b = [1; x_i']^T [b_0; b_1] = b_0 + b_1 x_i' $$

So we can rewrite our objective function as: 

$$E(b) = \sum_{i=1}^{n} (y_i - x_i^T b)^2$$ 

A sum of squares is the dot product of a vector with itself, so we can further rewrite it as: 

$$E(b) = \sum_{i=1}^{n} (y_i - x_i^T b)^2 = (y-Xb)^T (y-Xb),$$

where $X$ is an $n \times 2$ matrix whose $i$-th row is $x_i^T$, so its first column is all 1s. When we multiply this matrix by the vector $b$, we get the vector of predictions $\hat{y}$. Now we can minimize this function, but first we will expand it: 

$$E(b) = (y-Xb)^T (y-Xb) = y^T y - y^T X b - b^T X^T y + b^T X^T X b$$

Here, it's important to notice that $y^T X b = b^T X^T y$: both sides are scalars, and a scalar equals its own transpose. So we can now write: 

$$E(b) = y^T y - 2 b^T X^T y + b^T X^T X b $$

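And now we will take the gradient with respect to $b$ and set it to $\vec{0}$, using $\nabla_b (b^T X^T y) = X^T y$ and $\nabla_b (b^T X^T X b) = 2 X^T X b$ (the latter holds because $X^T X$ is symmetric):  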

$$\nabla{E} = - 2 X^T y + 2 X^T X b = 0 $$

And now we can find the b vector as follows: 

$$X^T y = X^T X b$$

$$b = (X^T X)^{-1} X^T y $$  
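The closed-form solution can be sketched in NumPy as follows. The data are made up for illustration, and `np.polyfit` is used only as an independent cross-check. The sketch solves the linear system $X^T X b = X^T y$ with `np.linalg.solve` and also forms the explicit inverse to mirror the formula.

```python
import numpy as np

# Made-up illustrative data (assumption)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Design matrix: a column of ones (intercept) next to the x values
X = np.column_stack([np.ones_like(x), x])

# Solve X^T X b = X^T y directly
b = np.linalg.solve(X.T @ X, X.T @ y)

# Literal translation of b = (X^T X)^{-1} X^T y, for comparison
b_inv = np.linalg.inv(X.T @ X) @ X.T @ y

# The gradient -2 X^T y + 2 X^T X b should vanish at the solution
grad = -2 * X.T @ y + 2 * X.T @ X @ b

# Independent check: polyfit returns coefficients from highest degree down
slope, intercept = np.polyfit(x, y, 1)

print(b, b_inv, grad, (intercept, slope))
```

Solving the system is generally preferred over forming the inverse explicitly, since it tends to be more numerically stable, but both give the same $b$ on well-conditioned data like this.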

The only possible problem here is that the matrix $X^T X$ might not be invertible, for example when the columns of $X$ are linearly dependent. In this case there are special techniques that work around the problem. Typically, the pseudoinverse is used. 
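A minimal sketch of the pseudoinverse route, assuming a deliberately degenerate, made-up dataset in which every $x$ value is the same. The two columns of $X$ are then proportional, $X^T X$ is singular, and `np.linalg.pinv` still returns a least-squares solution (the one with the smallest norm).

```python
import numpy as np

# Degenerate made-up data: all x values identical, so the columns of X are proportional
x = np.array([2.0, 2.0, 2.0])
y = np.array([1.0, 2.0, 3.0])

X = np.column_stack([np.ones_like(x), x])

rank = np.linalg.matrix_rank(X)  # less than 2, so X^T X cannot be inverted

# Minimum-norm least-squares solution via the Moore-Penrose pseudoinverse
b = np.linalg.pinv(X) @ y
y_hat = X @ b

print(rank, b, y_hat)
```

With no variation in $x$ there is nothing to estimate a slope from, so the best the fit can do is predict the mean of $y$ for every point.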