Linear regression relies on several assumptions to ensure the validity of the model and the accuracy of the estimates. 
Here are the key assumptions of linear regression:

1. **Linearity**: 
The relationship between the dependent variable \( y \) and the independent variables \( x_1, x_2, \ldots, x_n \) is linear. This means that a one-unit change in an independent variable changes the expected value of \( y \) by a constant amount (its coefficient), regardless of the variable's starting value.

2. **Independence of errors**: 
The errors (residuals) in the model are independent of each other. In other words, the error term for one observation should not be correlated with the error term for another observation.

3. **Homoscedasticity**: 
The variance of the errors is constant across all levels of the independent variables. This implies that the spread of the residuals should remain the same regardless of the values of the predictors.

4. **Normality of errors**: 
The errors (residuals) follow a normal distribution. This assumption matters for statistical inference (confidence intervals and hypothesis tests on the coefficients), especially in small samples.

5. **No perfect multicollinearity**: 
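There should be no exact linear relationships between the independent variables. Under perfect multicollinearity the coefficients cannot be uniquely estimated, and high (but imperfect) multicollinearity makes the estimates unstable, with large standard errors.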

6. **No influential outliers**: 
Outliers or influential data points should not unduly influence the estimates of the regression coefficients. 

7. **No autocorrelation**: 
The errors should not exhibit autocorrelation, meaning that they are not correlated with each other over time or across observations.

8. **Additivity**: 
The effect of changes in the independent variables on the dependent variable is additive. This means that the effect of changing one predictor variable is independent of the values of other predictor variables.

Violations of these assumptions can lead to biased or inefficient estimates, unreliable standard errors, and inaccurate predictions. Therefore, it's essential to assess these assumptions when fitting a linear regression model and to use diagnostic techniques to check for potential violations. Techniques such as transformations, robust regression, or generalized linear models can then be used to address violations of these assumptions.
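As one example of such a diagnostic, the Durbin-Watson statistic checks the residuals for first-order autocorrelation (assumption 7). The sketch below is a minimal pure-Python version; the function name `durbin_watson` and the sample residuals are illustrative assumptions. Values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences over the sum of squares."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


# Hypothetical residuals that alternate in sign (a pattern of negative autocorrelation).
alternating = [1.0, -1.0, 1.0, -1.0]
print(durbin_watson(alternating))  # 3.0
```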

<img src="linear-assumption.png" width="650">

<img src ="multicollinearity.png" width="450">

**Measuring Multicollinearity: Variance Inflation Factors (VIF)**

The variance inflation factor (VIF) is a quick measure of how much the variance (and hence the standard error) of a coefficient estimate is inflated because its variable is correlated with the other independent variables. When significant multicollinearity exists, the VIF will be very large for the variables involved. Once these variables are identified, they can be removed or combined to resolve the multicollinearity issue.

**Formula and Calculation of VIF**

The formula for VIF is:

\[
\text{VIF}_i = \frac{1}{1 - R_i^2}
\]

where \( R_i^2 \) is the unadjusted coefficient of determination from regressing the \( i \)th independent variable on the remaining ones.

The \( R^2 \) of each of these auxiliary regressions is calculated using:

<img src="R-Squared-1.png">
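Assuming the image shows the standard definition \( R^2 = 1 - SS_{\text{res}} / SS_{\text{tot}} \), a minimal pure-Python version might look like this. The function name `r_squared` and the sample values are illustrative.

```python
def r_squared(y, fitted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual sum of squares
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)  # total sum of squares around the mean
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and fitted values.
y = [2.0, 4.0, 6.0, 8.0]
fitted = [2.0, 4.0, 6.0, 10.0]
print(r_squared(y, fitted))  # 0.8
```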

**What Can VIF Tell You?**

When \( R_i^2 = 0 \), both the VIF and the tolerance (\( 1 - R_i^2 \), the reciprocal of the VIF) equal 1. The \( i \)th independent variable is then uncorrelated with the remaining ones, meaning that multicollinearity does not exist.[1]

[1] CFI. "Variance Inflation Factor."
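The relationships among \( R_i^2 \), the VIF, and tolerance can be written as three one-line helpers. This is a minimal sketch; the function names are illustrative, and an \( R_i^2 \) of exactly 1 (perfect multicollinearity) is mapped to an infinite VIF.

```python
import math


def vif_from_r2(r2):
    """VIF_i = 1 / (1 - R_i^2); infinite when R_i^2 == 1 (perfect multicollinearity)."""
    if not 0.0 <= r2 <= 1.0:
        raise ValueError("R^2 of an auxiliary OLS regression lies in [0, 1]")
    return math.inf if r2 == 1.0 else 1.0 / (1.0 - r2)


def tolerance_from_r2(r2):
    """Tolerance_i = 1 - R_i^2, the reciprocal of the VIF."""
    return 1.0 - r2


def r2_from_vif(vif):
    """Invert the formula: R_i^2 = 1 - 1 / VIF_i."""
    return 1.0 - 1.0 / vif


print(vif_from_r2(0.0), tolerance_from_r2(0.0))  # 1.0 1.0
print(r2_from_vif(10.0))
```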


In general terms:

- VIF equal to 1: variables are not correlated
- VIF between 1 and 5: variables are moderately correlated
- VIF greater than 5: variables are highly correlated [2]
The higher the VIF, the more likely it is that multicollinearity exists and that further investigation is required. When VIF is higher than 10, there is significant multicollinearity that needs to be corrected.[1]
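The rules of thumb above can be encoded as a small lookup. The band labels follow the text; the function name `describe_vif` is illustrative, and the cut-offs are conventions rather than hard limits.

```python
def describe_vif(vif):
    """Map a VIF value to the rule-of-thumb bands described in the text."""
    if vif < 1.0:
        raise ValueError("a VIF is never below 1")
    if vif == 1.0:
        return "not correlated"
    if vif <= 5.0:
        return "moderately correlated"
    if vif <= 10.0:
        return "highly correlated"
    return "highly correlated; significant multicollinearity to correct"


for value in (1.0, 3.2, 6.67, 20.0):
    print(value, describe_vif(value))
```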


Suppose we have a multiple linear regression model with three independent variables: \( x_1 \), \( x_2 \), and \( x_3 \). We want to calculate the variance inflation factor (VIF) for each of these variables to check for multicollinearity.

Here's an example dataset:

```
x1   x2   x3
3    5    7
4    6    8
5    7    9
6    8    10
```
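Before computing any VIFs, it is worth checking this toy dataset for exact linear relationships. The sketch below uses the column values from the table; the helper `constant_offset` is illustrative. It shows that \( x_2 - x_1 \) and \( x_3 - x_1 \) are constant, so the three predictors are perfectly collinear.

```python
x1 = [3, 4, 5, 6]
x2 = [5, 6, 7, 8]
x3 = [7, 8, 9, 10]


def constant_offset(a, b):
    """Return c if b == a + c on every row, else None."""
    offsets = {bi - ai for ai, bi in zip(a, b)}
    return offsets.pop() if len(offsets) == 1 else None


print(constant_offset(x1, x2))  # 2 (x2 = x1 + 2 on every row)
print(constant_offset(x1, x3))  # 4 (x3 = x1 + 4 on every row)
```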

To calculate the VIF for each independent variable:

1. **Fit a regression model for each variable**: We fit a regression model for each independent variable, using all other independent variables as predictors.

2. **Calculate the VIF**: The VIF for each variable is calculated using the formula:

   \[
   \text{VIF}_i = \frac{1}{1 - R_i^2}
   \]

   where \( R_i^2 \) is the coefficient of determination \( R^2 \) of the regression model for variable \( i \).

Let's calculate the VIF for each variable:

1. For \( x_1 \):
   - Fit a regression model: \( x_1 = \beta_0 + \beta_1 x_2 + \beta_2 x_3 + \varepsilon \)
   - Calculate \( R_1^2 \)
   - Calculate the VIF for \( x_1 \)

2. For \( x_2 \):
   - Fit a regression model: \( x_2 = \beta_0 + \beta_1 x_1 + \beta_2 x_3 + \varepsilon \)
   - Calculate \( R_2^2 \)
   - Calculate the VIF for \( x_2 \)

3. For \( x_3 \):
   - Fit a regression model: \( x_3 = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon \)
   - Calculate \( R_3^2 \)
   - Calculate the VIF for \( x_3 \)
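The three steps above can be sketched in pure Python: for each column, regress it on an intercept plus the remaining columns by ordinary least squares (solving the normal equations), take that regression's \( R^2 \), and apply the VIF formula. This is a minimal sketch, not a library implementation; all function names are illustrative, and the small dataset is hypothetical, chosen so the answers are easy to check by hand (\( x_1 \) and \( x_2 \) are correlated, \( x_3 \) is uncorrelated with both). If elimination meets a pivot smaller than 1e-12, a sign of (near-)perfect collinearity, the sketch raises `ValueError` instead of dividing by zero.

```python
def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular normal equations: predictors are (nearly) perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def auxiliary_r2(target, others):
    """R^2 from regressing `target` on an intercept plus the `others` columns (OLS)."""
    design = [[1.0] + [col[i] for col in others] for i in range(len(target))]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * t for row, t in zip(design, target)) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * v for bj, v in zip(beta, row)) for row in design]
    mean_t = sum(target) / len(target)
    ss_res = sum((t - f) ** 2 for t, f in zip(target, fitted))
    ss_tot = sum((t - mean_t) ** 2 for t in target)
    return 1.0 - ss_res / ss_tot


def vifs(columns):
    """VIF_i = 1 / (1 - R_i^2) for every column, using the other columns as predictors."""
    result = []
    for i, target in enumerate(columns):
        others = [col for j, col in enumerate(columns) if j != i]
        result.append(1.0 / (1.0 - auxiliary_r2(target, others)))
    return result


# Hypothetical, mean-zero data: x1 = x2 + (a component orthogonal to x2 and x3).
x1 = [2.0, 0.0, 0.0, -2.0]
x2 = [1.0, 1.0, -1.0, -1.0]
x3 = [1.0, -1.0, -1.0, 1.0]
print(vifs([x1, x2, x3]))  # [2.0, 2.0, 1.0]
```

In practice, statsmodels provides `variance_inflation_factor` in `statsmodels.stats.outliers_influence`; it regresses the chosen column on the other columns exactly as passed, so add a constant column first if the auxiliary regressions should include an intercept.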

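For illustration, let's assume the regression models give the following \( R^2 \) values. (They do not come from the toy dataset above: there \( x_2 = x_1 + 2 \) and \( x_3 = x_1 + 4 \) exactly, so every \( R_i^2 \) would equal 1 and every VIF would be infinite.)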

- \( R_1^2 = 0.90 \) for \( x_1 \)
- \( R_2^2 = 0.85 \) for \( x_2 \)
- \( R_3^2 = 0.95 \) for \( x_3 \)

Now we can calculate the VIF for each variable:

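1. For \( x_1 \):
   \( \text{VIF}_{x_1} = \frac{1}{1 - R_1^2} = \frac{1}{1 - 0.90} = \frac{1}{0.10} = 10 \)

2. For \( x_2 \):
   \( \text{VIF}_{x_2} = \frac{1}{1 - R_2^2} = \frac{1}{1 - 0.85} = \frac{1}{0.15} \approx 6.67 \)

3. For \( x_3 \):
   \( \text{VIF}_{x_3} = \frac{1}{1 - R_3^2} = \frac{1}{1 - 0.95} = \frac{1}{0.05} = 20 \)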
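The arithmetic above can be reproduced in a few lines, using the assumed \( R^2 \) values from the example (they are illustrative inputs, not the output of a fit).

```python
# Assumed R^2 values of the auxiliary regressions, taken from the example above.
r2 = {"x1": 0.90, "x2": 0.85, "x3": 0.95}

# VIF_i = 1 / (1 - R_i^2), rounded to two decimals for display.
vif = {name: round(1.0 / (1.0 - value), 2) for name, value in r2.items()}
print(vif)  # {'x1': 10.0, 'x2': 6.67, 'x3': 20.0}
```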

Interpretation:
- \( \text{VIF}_{x_1} = 10 \): the variance of the estimated coefficient of \( x_1 \) is 10 times larger than it would be if \( x_1 \) were uncorrelated with the other predictors.
- \( \text{VIF}_{x_2} \approx 6.67 \): the variance of the estimated coefficient of \( x_2 \) is inflated by a factor of approximately 6.67 due to multicollinearity.
- \( \text{VIF}_{x_3} = 20 \): the variance of the estimated coefficient of \( x_3 \) is inflated by a factor of 20 due to multicollinearity.

**High VIF values (typically above 10) indicate multicollinearity between the independent variables.** A VIF of 10 corresponds to \( R_i^2 = 0.90 \): the other independent variables explain about 90% of the variance in \( x_i \).