# Multiple Linear Regression

---

## Multiple Linear Regression Equation

$$ \hat{y} = b_{0} + b_{1} X_{1} + b_{2} X_{2} + \dots + b_{n} X_{n} = \sum_{i=0}^{n} b_{i} X_{i}, \quad X_{0} = 1 $$

This equation extends the Simple Linear Regression equation. Again, $\hat{y}$ is the predicted value of the dependent variable; $X_{i}$ are the independent variables (the predictors); $b_{0}$ is the y-intercept; and $b_{i}$ are the slope coefficients, one per predictor.
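As a minimal sketch of how the equation is evaluated for a single observation, the snippet below computes $\hat{y}$ from a list of coefficients and a list of feature values. The function name `predict` and the numbers are illustrative assumptions, not taken from any dataset.

```python
def predict(coefficients, features):
    """Return y_hat = b_0 + b_1*X_1 + ... + b_n*X_n.

    coefficients: [b_0, b_1, ..., b_n]
    features:     [X_1, ..., X_n] for one observation
    """
    if len(coefficients) != len(features) + 1:
        raise ValueError("expected one coefficient per feature plus an intercept")
    # Prepend X_0 = 1 so the intercept b_0 fits into the same sum.
    extended = [1.0] + list(features)
    return sum(b * x for b, x in zip(coefficients, extended))


# Illustrative values only: b_0 = 1, b_1 = 2, b_2 = -0.5.
y_hat = predict([1.0, 2.0, -0.5], [3.0, 4.0])  # 1 + 2*3 - 0.5*4 = 5.0
```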

Since every predictor has its own coefficient, which absorbs the scale of that predictor, there is no need to apply feature scaling when we use a Multiple Linear Regression model.
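A short derivation of why, for an ordinary least-squares fit, assuming the predictor $X_{1}$ is rescaled by an arbitrary constant $c \neq 0$ (the symbols $c$, $X_{1}'$ and $b_{1}'$ are introduced here only for illustration):

$$ X_{1}' = c X_{1} \;\implies\; b_{1}' = \frac{b_{1}}{c}, \qquad b_{1}' X_{1}' = \frac{b_{1}}{c} \, c X_{1} = b_{1} X_{1} $$

The fitted coefficient shrinks or grows by exactly the factor needed to cancel the change of units, so $\hat{y}$ stays the same.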

### Assumptions of Linear Regression

You cannot blindly apply linear regression to every dataset; we need to make sure that our data is fit for it. That is why we check the assumptions of linear regression.

1. **Linearity**: We want to make sure there is a linear relationship between Y and each X.

2. **Homoscedasticity**: This means that the errors (residuals) have equal variance across the whole range of the predictors.

3. **Multivariate Normality**: Normality of the error distribution. If you look along the regression line, you want to see your data points normally distributed around it.

4. **Independence**: Includes "no autocorrelation". We don't want to see any kind of pattern in our data. If we do, it means that some rows in our data are not independent and are affecting one another.

5. **Lack of Multicollinearity**: Predictors are not correlated with each other.

6. **The Outlier Check** (Extra Check): We may want to remove the outliers before building a linear regression.

![image.png](https://sds-platform-private.s3-us-east-2.amazonaws.com/uploads/B97-Header-Image.jpg)
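As a first, rough check of the multicollinearity assumption, one can look at the pairwise correlation between predictors. The sketch below computes the Pearson correlation coefficient by hand; the helper name `pearson` and the toy columns are assumptions made up for illustration, and pairwise correlation will not catch every form of multicollinearity.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy predictors, made up for illustration: x2 is an exact multiple of x1
# (perfectly collinear), x3 is not.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 4.0, 6.0, 8.0, 10.0]
x3 = [5.0, 1.0, 4.0, 2.0, 3.0]

r_12 = pearson(x1, x2)  # at (or extremely close to) 1: same information twice
r_13 = pearson(x1, x3)  # a much weaker linear relationship
```

A correlation near +1 or -1 between two predictors suggests that one of them is redundant and is a candidate for removal.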

---

## Statistical Significance

### Coin Toss

In a coin toss we have two possible situations:

$ H_{0} $: The first assumption is that we live in a fair universe, with a fair coin. This we will call our Null Hypothesis.

$ H_{1} $: The other possibility is that we are living in an unfair universe, with an unfair coin. This will be our Alternative Hypothesis.

We want to understand which situation we are dealing with. To do this, we first assume that $H_{0}$ is true, and then we see whether our experiment contradicts that hypothesis.

We toss a coin six times, and we always get tails. Under $H_{0}$ that is an event with a probability of about 1.6% ($0.5^{6}$). As we keep going with the experiment and keep getting the same result, the probability drops further and we start to feel suspicious about the coin.
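The probabilities in this experiment can be reproduced directly: with a fair coin, a run of $n$ tails has probability $0.5^{n}$. A minimal sketch (the function name is an assumption chosen for illustration):

```python
def probability_all_tails(tosses):
    """Probability of getting tails on every one of `tosses` fair coin flips."""
    return 0.5 ** tosses


# Print how the probability shrinks as the run of tails gets longer.
for tosses in range(1, 7):
    print(f"{tosses} tails in a row: {probability_all_tails(tosses):.6f}")
```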

### P-Value

The probability of seeing this result, given a universe where the null hypothesis is true, is the **P-Value**, and it keeps dropping with every toss. On the other hand, if we lived in a universe where the null hypothesis is false (say, the coin always lands on tails), that probability would have stayed the same throughout the experiment.

The uneasy feeling we get when something keeps happening even though its probability keeps getting lower relates to statistical significance. Just around where we start to get that uneasy feeling is where statistical significance begins, conventionally at $ \alpha = 0.05 $. When the P-Value drops below this threshold, we decide to reject the hypothesis that we started with, and we reject it with 95% confidence.
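To make the threshold concrete, the sketch below finds the first toss at which the probability of an all-tails run under the null hypothesis falls below the significance level. It assumes the conventional level of 0.05 and uses the one-sided probability of the run, as in the coin example; the names `ALPHA` and `first_significant_toss` are illustrative.

```python
ALPHA = 0.05  # conventional significance level (assumed here)


def first_significant_toss(alpha=ALPHA):
    """Return the number of consecutive tails at which the probability
    under the null hypothesis (fair coin) first drops below alpha."""
    tosses = 1
    while 0.5 ** tosses >= alpha:
        tosses += 1
    return tosses


# 0.5**4 = 0.0625 is still above 0.05, while 0.5**5 = 0.03125 is below it.
n = first_significant_toss()
```

With these assumptions the fifth consecutive tail is the point where we would reject the fair-coin hypothesis.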

---

## Building a Model (Step-by-Step)

### Decide what to keep and what to throw

We cannot always use all our available data to build our model; there will be times when we have to decide which independent variables are relevant. The reasons for this are:

1. We might be "feeding" our model unnecessary data, which would lead it to make questionable predictions.

2. We will have to explain what it means that the variables we chose predict the behavior of the dependent variable.

### Method 1: All-in

You throw in all your variables. You do this when you have prior knowledge that the variables you are going to use are the right ones, so no selection is needed; when you have no choice because you are required to use them; or when you are preparing for Backward Elimination.

### Method 2: Backward Elimination

STEP 1: Select a significance level to stay in the model (e.g. SL = 0.05).

STEP 2: Fit the full model with all the possible predictors.

STEP 3: Consider the predictor with the highest P-Value. If P > SL, go to STEP 4, otherwise go to FIN.

STEP 4: Remove the predictor.

STEP 5: Fit model without this variable. Go back to STEP 3.

FIN: Model is ready.
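The steps above can be sketched as a small loop. This is only an outline of the control flow: `fit_p_values` is a hypothetical callable, supplied by the caller, that fits the model on the given predictors and returns their P-values.

```python
def backward_elimination(predictors, fit_p_values, sl=0.05):
    """Return the predictors that survive backward elimination.

    predictors:   iterable of predictor names
    fit_p_values: hypothetical callable; given a list of predictor names, it
                  fits the model on them and returns {name: p_value}
    sl:           significance level to stay in the model (STEP 1)
    """
    remaining = list(predictors)  # STEP 2: start from the full model
    while remaining:
        p_values = fit_p_values(remaining)  # fit (STEP 2) or refit (STEP 5)
        # STEP 3: consider the predictor with the highest P-value.
        worst = max(remaining, key=lambda name: p_values[name])
        if p_values[worst] <= sl:
            break  # FIN: every remaining predictor is significant
        remaining.remove(worst)  # STEP 4: remove it and loop again
    return remaining
```

In practice `fit_p_values` would wrap a regression library, for example by reading the `pvalues` attribute of a fitted statsmodels OLS result; that wiring is left out here.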

### Method 3: Forward Selection

STEP 1: Select a significance level to enter the model (e.g. SL = 0.05).

STEP 2: Fit all simple regression models $ y \sim x_{n} $. Select the one with the lowest P-Value.

STEP 3: Keep this variable and fit all possible models with one extra predictor added to the one(s) you already have.

STEP 4: Select the predictor with the <u>lowest</u> P-Value. If P < SL, go to STEP 3, otherwise go to FIN.

FIN: Keep the previous model.
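A sketch of the same procedure in code. `p_value_when_added` is a hypothetical callable supplied by the caller. As a small simplifying assumption, the entry threshold is applied from the first predictor onward, whereas STEP 2 as written always keeps the best single predictor.

```python
def forward_selection(candidates, p_value_when_added, sl=0.05):
    """Return the predictors chosen by forward selection, in order of entry.

    candidates:         iterable of predictor names
    p_value_when_added: hypothetical callable; given (selected, candidate), it
                        fits the model on selected + [candidate] and returns
                        the P-value of the candidate's coefficient
    sl:                 significance level to enter the model (STEP 1)
    """
    selected = []
    remaining = list(candidates)
    while remaining:
        # STEP 2 / STEP 3: try every model with one extra predictor.
        p_values = {c: p_value_when_added(selected, c) for c in remaining}
        # STEP 4: pick the candidate with the lowest P-value.
        best = min(remaining, key=lambda name: p_values[name])
        if p_values[best] >= sl:
            break  # FIN: keep the previous model
        selected.append(best)
        remaining.remove(best)
    return selected
```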

### Method 4: Bidirectional Elimination

STEP 1: Select a significance level to enter and to stay in the model (e.g. SLENTER = 0.05, SLSTAY = 0.05).

STEP 2: Perform the next step of the Forward Selection (new variables must have P < SLENTER to enter).

STEP 3: Perform ALL the steps of the Backward Elimination (old variables must have P < SLSTAY to stay).

Go back to STEP 2 and repeat until no new variables can enter and no old variables can exit.

FIN: Model is ready.

### Method 5: Score Comparison

STEP 1: Select a criterion of goodness of fit (e.g. Akaike criterion).

STEP 2: Construct all possible regression models: with $n$ predictors there are $2^{n}-1$ total combinations.

STEP 3: Select the one with the best criterion.

STEP 4: Model is ready.
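A sketch of the exhaustive search, assuming a criterion where lower is better (as with the Akaike criterion). `score` is a hypothetical callable, supplied by the caller, that fits the model on a subset of predictors and returns the criterion value.

```python
from itertools import combinations


def best_subset(predictors, score):
    """Score every non-empty subset of predictors and return the best one.

    score: hypothetical callable; given a tuple of predictor names, it fits
           the model and returns a criterion value where lower is better.
    Returns (best_subset, all_scores).
    """
    predictors = list(predictors)
    all_scores = {}
    # STEP 2: build all 2**n - 1 non-empty subsets, grouped by size.
    for size in range(1, len(predictors) + 1):
        for subset in combinations(predictors, size):
            all_scores[subset] = score(subset)
    # STEP 3: select the subset with the best (lowest) criterion.
    best = min(all_scores, key=all_scores.get)
    return best, all_scores
```

The number of models doubles with every extra predictor (10 predictors already give 1023 models), so this approach is only practical when the number of candidate predictors is small.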

---

## Additional Reading

*The Application of Multiple Linear Regression and Artificial Neural Network Models for Yield Prediction of Very Early Potato Cultivars before Harvest*, Magdalena Piekutowska et al. (2021). Link: https://www.mdpi.com/2073-4395/11/5/885.