### Simple Linear Regression

`y = b0 + b1 * x1`

- y  => dependent variable
- x1 => independent variable
- b0 => constant
- b1 => coefficient

The best-fit line is the one that minimizes the sum of squared residuals. This method is called **"ordinary least squares"**.

`MIN(SUM((Y - Y_PREDICTED)^2))`
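
As a minimal sketch of the minimization above, the one-variable case has a closed-form solution: `b1` is the covariance of `x` and `y` divided by the variance of `x`, and `b0` follows from the means. The function names and the data are made up for illustration.

```python
def fit_simple_ols(xs, ys):
    """Return (b0, b1) minimizing SUM((y - (b0 + b1*x))^2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b0 = mean_y - b1 * mean_x
    return b0, b1


def sum_squared_residuals(xs, ys, b0, b1):
    """The quantity that ordinary least squares minimizes."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Made-up data, roughly y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_simple_ols(xs, ys)
best = sum_squared_residuals(xs, ys, b0, b1)
```

Any other line, for example one with a slightly different slope, gives a larger `sum_squared_residuals` on the same data.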

### Multi-Linear Regression

`y = b0 + b1*x1 + b2*x2 + ... + bn*xn`

**Assumptions of linear regression (check each one before applying it to the data)**

1. Linearity
2. Homoscedasticity
3. Multivariate normality
4. Independence of errors
5. Lack of multicollinearity

**Dummy variable trap**

Let's say that you have a categorical variable (e.g. US state names) that takes two values in the dataset.

You can encode this categorical variable as dummy variables, one 0/1 column per category, but you should not include both columns, because the second one duplicates the information in the first.


`y = b0 + b1*x1 + b2*x2 + b3*x3 + b4*D1` (the dummy variables start at `D1`)

We don't include `b5*D2`, because `D2 = 1 - D1`. When one independent variable can be predicted from the others, that's called multicollinearity.

So always omit one dummy variable for each categorical variable.
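
The encoding can be sketched in plain Python as follows. `encode_dummies` is a hypothetical helper written for this note, not a library function, and the state names are example data.

```python
def encode_dummies(values, drop_first=True):
    """Build one 0/1 column per category.

    With drop_first=True the first category (alphabetically) becomes the
    baseline and gets no column, which avoids the dummy variable trap.
    """
    categories = sorted(set(values))
    kept = categories[1:] if drop_first else categories
    rows = [[1 if value == category else 0 for category in kept]
            for value in values]
    return kept, rows


states = ["New York", "California", "California", "New York"]

# Safe encoding: a single column D1 ("New York"); "California" is the baseline.
kept, rows = encode_dummies(states)

# Full encoding, shown only to illustrate the redundancy D2 = 1 - D1.
all_columns, full_rows = encode_dummies(states, drop_first=False)
redundant = all(d1 == 1 - d2 for d1, d2 in full_rows)
```

Here `kept` is `["New York"]`, `rows` is `[[1], [0], [0], [1]]`, and `redundant` is `True`.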

#### Statistical Significance (intuition)

You assume that you live in an `H_0` universe (called the null hypothesis) and that there is an alternative one, `H_1`.

Then you run the experiment and reject the null hypothesis if the P-value is below a threshold chosen in advance (commonly 5%). The P-value is the probability of observing a result at least as extreme as yours, assuming `H_0` is true. The threshold is called the significance level.

In a nutshell, statistical significance is the point at which, in intuitive human terms, you become uneasy about the null hypothesis being true.
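
As a small worked example (a hypothetical coin-toss experiment, not taken from these notes): `H_0` says the coin is fair, and the P-value is the probability, under `H_0`, of a result at least as far from half heads as the one observed.

```python
from math import comb


def two_sided_binomial_p_value(heads, flips):
    """P(result at least as far from flips/2 as observed | coin is fair)."""
    observed_gap = abs(heads - flips / 2)
    # Count every outcome that is at least as extreme as the observed one.
    extreme_outcomes = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - flips / 2) >= observed_gap
    )
    return extreme_outcomes / 2 ** flips


significance_level = 0.05

p_nine = two_sided_binomial_p_value(9, 10)    # 22 / 1024
p_seven = two_sided_binomial_p_value(7, 10)   # 352 / 1024

reject_nine = p_nine < significance_level     # True: 9 heads is suspicious
reject_seven = p_seven < significance_level   # False: 7 heads is unremarkable
```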

#### Building a model

**Five methods to discard (or not 😏) independent variables in order to build a better model**

1. All-in (because prior domain knowledge says so, or because you are required to for some other reason)
2. Backward Elimination
    - 2.1 Select a significance level to stay in the model (e.g. SL = 0.05).
    - 2.2 Fit the full model with all possible predictors.
    - 2.3 Consider the predictor with the highest P-value. If `P > SL`, go to step 2.4, otherwise go to step 2.6.
    - 2.4 Remove the predictor.
    - 2.5 Fit the model without this variable. Go to step 2.3.
    - 2.6 FIN.
3. Forward Selection
    - 3.1 Select significance level to enter the model (e.g. SL = 0.05).
    - 3.2 Fit all simple regression models `y ~ x_n`. Select the one with the lowest P-value.
    - 3.3 Keep this variable and fit all possible models with one extra predictor added to the one(s) you already have.
    - 3.4 Consider the predictor with the lowest P-value. If `P < SL`, go to step 3.3, otherwise go to step 3.5.
    - 3.5 FIN. Keep the previous model.
4. Bidirectional Elimination
    - 4.1 Select a significance level to enter and to stay in the model e.g.: SLENTER = 0.05, SLSTAY = 0.05.
    - 4.2 Perform the next step of Forward selection (new variables must have: `P < SLENTER` to enter).
    - 4.3 Perform ALL steps of Backward Elimination (old variables must have: `P < SLSTAY` to stay). Go to Step 4.2 or 4.4.
    - 4.4 No new variables can enter and no old variables can exit.
    - 4.5 FIN.
5. Score Comparison - all possible models.
    - 5.1 Select a criterion of goodness of fit.
    - 5.2 Construct all possible regression models: `2^n - 1` total combinations.
    - 5.3 Select the one with the best criterion.
    - 5.4 FIN.
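
For method 5, the `2^n - 1` count can be checked by enumerating every non-empty subset of predictors. This is only a sketch of the enumeration step; the predictor names are placeholders, and fitting and scoring each subset is left out.

```python
from itertools import combinations


def all_predictor_subsets(predictors):
    """Yield every non-empty subset of the predictors (2^n - 1 in total)."""
    for size in range(1, len(predictors) + 1):
        yield from combinations(predictors, size)


predictors = ["x1", "x2", "x3"]
subsets = list(all_predictor_subsets(predictors))
# 2^3 - 1 = 7 candidate models, from ("x1",) up to ("x1", "x2", "x3").
```

The count doubles with every extra predictor, so this approach gets expensive quickly: 20 predictors already mean 1,048,575 candidate models.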


2), 3) and 4) are called stepwise regression.
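
The control flow of Backward Elimination (steps 2.1 to 2.6) can be sketched as a loop. The P-values have to come from a fitted model, so this sketch takes a caller-supplied function `fit_p_values`, a hypothetical hook that would wrap whatever regression library you use. The `fake_fit` stub below returns fixed, made-up P-values only to make the example runnable.

```python
def backward_elimination(predictors, fit_p_values, sl=0.05):
    """Drop the predictor with the highest P-value until none exceeds sl.

    fit_p_values(names) must fit the model on the given predictors and
    return a dict {name: p_value}. It is supplied by the caller.
    """
    remaining = list(predictors)
    while remaining:
        p_values = fit_p_values(remaining)       # steps 2.2 and 2.5: (re)fit
        worst = max(remaining, key=lambda name: p_values[name])  # step 2.3
        if p_values[worst] <= sl:
            break                                # step 2.6: FIN
        remaining.remove(worst)                  # step 2.4: remove it
    return remaining


def fake_fit(names):
    # Stub with fixed P-values. In a real model the P-values change
    # every time the model is refitted on fewer predictors.
    table = {"x1": 0.01, "x2": 0.60, "x3": 0.20, "x4": 0.03}
    return {name: table[name] for name in names}


selected = backward_elimination(["x1", "x2", "x3", "x4"], fake_fit)
# x2 is removed first, then x3; x1 and x4 stay.
```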

#### Important

Multiple linear regression does not require feature scaling, because each coefficient compensates for the scale of its feature.
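
A quick way to see this, sketched here with a single feature for brevity and made-up data: multiplying a feature by 1000 divides its coefficient by 1000, and the predictions stay the same.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
          / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - b1 * mean_x, b1


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]            # exactly y = 1 + 2x

b0, b1 = fit_line(xs, ys)            # b0 = 1, b1 = 2

# Same feature expressed in units that are 1000 times smaller.
xs_big = [1000 * x for x in xs]
b0_big, b1_big = fit_line(xs_big, ys)   # b0 stays 1, b1 becomes 0.002

predictions = [b0 + b1 * x for x in xs]
predictions_big = [b0_big + b1_big * x for x in xs_big]
```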

### Polynomial Regression

`y = b0 + b1*x1 + b2*x1^2 + ... + bn*x1^n`

Polynomial regression is just a special case of multiple linear regression.

It is still called "linear" because linearity refers to the coefficients, not to `x1`: the function is a linear combination of the coefficients `b0 ... bn`, even though it is nonlinear in `x1`.
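
To make that concrete, here is a minimal sketch that fits a polynomial by expanding `x1` into the features `1, x1, x1^2, ...` and then solving the same normal equations used for multiple linear regression. The helper names are made up for this note, and the solver is a plain Gauss-Jordan elimination that is fine for small examples but not numerically robust.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_polynomial(xs, ys, degree):
    """Return [b0, b1, ..., b_degree] by least squares."""
    # Each x becomes the feature row [1, x, x^2, ..., x^degree].
    rows = [[x ** k for k in range(degree + 1)] for x in xs]
    size = degree + 1
    # Normal equations: (X^T X) b = X^T y, linear in the coefficients b.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(size)]
           for i in range(size)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(size)]
    return solve_linear_system(xtx, xty)


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]   # generated from known coefficients

coeffs = fit_polynomial(xs, ys, degree=2)   # approximately [1, 2, 3]
```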

### Support Vector Regression (SVR)

Instead of fitting the data with a single line, we fit a tube that extends a distance epsilon on each side of the line, which tolerates a certain error margin. Points inside the tube incur no penalty. The points on or outside the tube are the support vectors, because they determine the shape and location of the tube.

![SVR intuition](./assets/SVR%20intuition.png)
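
The tube corresponds to the epsilon-insensitive loss commonly used to describe SVR: errors smaller than epsilon cost nothing, and larger errors are penalized only by how far they extend beyond the tube. A minimal sketch, with made-up numbers:

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the tube, linear in the distance beyond it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


epsilon = 0.1

inside = epsilon_insensitive_loss(3.0, 3.05, epsilon)   # within the tube: 0.0
outside = epsilon_insensitive_loss(3.0, 3.5, epsilon)   # 0.5 off, 0.4 beyond the tube
```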

### Regression Decision Tree

The algorithm splits the dataset until further splits no longer add information (a criterion related to information entropy), then maps each leaf to the average of the targets (Y) that fall into it.
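
A single split can be sketched as follows. This example assumes one numeric feature and scores a split by the total squared error around each side's mean, which is a common choice for regression trees but an assumption here, since the notes only mention information entropy. The data is made up.

```python
def mean(values):
    return sum(values) / len(values)


def squared_error(values):
    """Sum of squared deviations from the mean of the group."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, left_mean, right_mean), or None if no split exists."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = squared_error(left) + squared_error(right)
        if best is None or cost < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            # Each leaf predicts the average target of its points.
            best = (cost, threshold, mean(left), mean(right))
    return None if best is None else best[1:]


xs = [1, 2, 3, 10, 11, 12]
ys = [5.0, 6.0, 7.0, 20.0, 21.0, 22.0]

threshold, left_mean, right_mean = best_split(xs, ys)
# Splits between x = 3 and x = 10; the leaves predict 6.0 and 21.0.
```

A full tree would apply `best_split` recursively to each side until a stopping rule is met.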