## Multicollinearity
Multicollinearity occurs when two or more features (independent variables) are highly correlated with each other.

### Problems
- It becomes difficult to tell how each feature affects the result: because correlated features carry similar information, their individual contributions are hard to separate.
- It can contribute to overfitting, since redundant features add parameters without adding new information.
- Unstable coefficients: when independent features are highly correlated, even small changes in the data can make the coefficients fluctuate a lot.
- The model is less reliable because the standard errors of the coefficient estimates are inflated.
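
The instability described above can be illustrated with a small simulation. This is a minimal sketch on synthetic data, not the notebook's dataset: the helper `coef_spread`, the correlation values, and the true coefficients (3 and 2) are all assumptions chosen for illustration. It refits ordinary least squares on bootstrap resamples and measures how much the two coefficients move.

```python
import numpy as np

def coef_spread(rho, n_samples=200, n_repeats=200, seed=0):
    """Std. dev. of the two OLS slopes across bootstrap resamples.

    x1 and x2 both have unit variance and correlation close to `rho`.
    """
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n_samples)
    x2 = rho * x1 + np.sqrt(1.0 - rho**2) * rng.normal(size=n_samples)
    y = 3.0 * x1 + 2.0 * x2 + rng.normal(size=n_samples)

    # Design matrix with an intercept column
    X = np.column_stack([np.ones(n_samples), x1, x2])

    coefs = []
    for _ in range(n_repeats):
        # Resample rows with replacement and refit
        idx = rng.integers(0, n_samples, size=n_samples)
        beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        coefs.append(beta[1:])  # keep only the two slopes
    return np.std(coefs, axis=0)

spread_collinear = coef_spread(rho=0.99)
spread_independent = coef_spread(rho=0.0)
print("spread with rho=0.99:", spread_collinear)
print("spread with rho=0.00:", spread_independent)
```

With highly correlated features the slopes should vary several times more from resample to resample, even though the data-generating process is the same.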

### How to detect?
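- VIF (variance inflation factor): for feature j, `VIF_j = 1 / (1 - R_j^2)`, where `R_j^2` comes from regressing feature j on all the other features. Common rules of thumb:
    - VIF = 1 -> feature is uncorrelated with the other features
    - VIF <= 5 -> generally acceptable
    - VIF > 5 -> multicollinearity
    - VIF > 10 -> problematic!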

- Correlation matrix:
    - |corr| > 0.7 or 0.8 between two features -> multicollinearity
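
Both checks can be sketched with NumPy alone. The example below is an illustration on synthetic data: the helper `vif_from_definition` and the columns `a`, `b`, `c` are hypothetical names, not part of the notebook. It computes the correlation matrix and then the VIF directly from its definition, `1 / (1 - R^2)`, using a regression that includes an intercept.

```python
import numpy as np

def vif_from_definition(X):
    """VIF_j = 1 / (1 - R_j^2), regressing column j on the others plus an intercept."""
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        r2 = 1.0 - resid.var() / target.var()
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs

rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = 0.95 * a + np.sqrt(1 - 0.95**2) * rng.normal(size=500)  # strongly correlated with a
c = rng.normal(size=500)                                    # unrelated to a and b
X = np.column_stack([a, b, c])

corr = np.corrcoef(X, rowvar=False)  # 3x3 correlation matrix
vifs = vif_from_definition(X)
print(np.round(corr, 2))
print(np.round(vifs, 2))
```

On a pandas DataFrame, `DataFrame.corr()` gives the same correlation matrix. The pair `a`, `b` should be flagged by both checks, while `c` should have a VIF close to 1.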

### Solution

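- Reduce redundant variables: if two or more features show multicollinearity, drop one of them (the less important one).
- Combine variables: merge the multicollinear variables into a single feature and remove the individual ones.
- Regularization: Ridge regression shrinks the coefficients of correlated features; Lasso can shrink the coefficients of less important features all the way to zero, effectively removing them.
- Get more data: more data reduces overfitting and stabilizes the coefficient estimates, and may weaken the correlation between features if the new samples cover a wider range of feature combinations.
- PCA (principal component analysis): replace the correlated features with uncorrelated principal components.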
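
As an illustration of the regularization option, here is a minimal closed-form ridge sketch in NumPy on synthetic data. The helper `ridge_fit` and the penalty value `alpha=10.0` are assumptions for this example; in practice a library implementation such as scikit-learn's `Ridge` or `Lasso` would normally be used. Setting `alpha=0` gives ordinary least squares for comparison.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge on standardized features: (X'X + alpha*I)^-1 X'y."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each feature
    yc = y - y.mean()                          # center the target
    p = Xs.shape[1]
    return np.linalg.solve(Xs.T @ Xs + alpha * np.eye(p), Xs.T @ yc)

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = x1 + 0.05 * rng.normal(size=100)          # nearly a copy of x1
y = 3.0 * x1 + 2.0 * x2 + rng.normal(size=100)
X = np.column_stack([x1, x2])

beta_ols = ridge_fit(X, y, alpha=0.0)     # no penalty: plain least squares
beta_ridge = ridge_fit(X, y, alpha=10.0)  # penalized fit
print("OLS coefficients:  ", beta_ols)
print("ridge coefficients:", beta_ridge)
```

The penalty mostly shrinks the poorly determined difference between the two correlated coefficients, so the ridge estimates end up closer to each other than the unpenalized ones.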

In [2]:
# Compute the VIF (variance inflation factor) for each feature
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.stats.outliers_influence import variance_inflation_factor

In [7]:
data = pd.read_csv('BMI.csv')

data.head()

| Unnamed: 0 | Gender | Height | Weight | Index |
|---|---|---|---|---|
| 0 | Male | 174 | 96 | 4 |
| 1 | Male | 189 | 87 | 2 |
| 2 | Female | 185 | 110 | 4 |
| 3 | Female | 195 | 104 | 3 |
| 4 | Male | 149 | 61 | 3 |


In [8]:

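# Encode Gender numerically so it can enter the VIF calculation
data['Gender'] = data['Gender'].map({'Male': 0, 'Female': 1})

# Features to check for multicollinearity
X = data[['Gender', 'Height', 'Weight']]

# One VIF per feature: column i is regressed on the remaining columns
vif_data = pd.DataFrame()
vif_data["feature"] = X.columns
vif_data["VIF"] = [variance_inflation_factor(X.values, i)
                   for i in range(len(X.columns))]
print(vif_data)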

  feature        VIF
0  Gender   2.028864
1  Height  11.623103
2  Weight  10.688377


Features with VIF > 10 (here, Height and Weight) can be removed or combined where possible.
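
One caveat when reading these numbers: statsmodels' `variance_inflation_factor` uses the columns exactly as passed and does not add an intercept itself, so features with large means can show a high VIF even when they are unrelated. The sketch below illustrates this on synthetic, independently drawn height-like and weight-like columns (not `BMI.csv`); the helper `vif` and the chosen means and spreads are assumptions for illustration. With statsmodels, an intercept column can be added using `statsmodels.api.add_constant` before computing the VIFs.

```python
import numpy as np

def vif(X, j, add_intercept):
    """VIF of column j; without an intercept the R^2 is the uncentered one."""
    target = X[:, j]
    others = np.delete(X, j, axis=1)
    if add_intercept:
        others = np.column_stack([np.ones(len(X)), others])
        total = np.sum((target - target.mean()) ** 2)  # centered total sum of squares
    else:
        total = np.sum(target ** 2)                    # uncentered total sum of squares
    beta, *_ = np.linalg.lstsq(others, target, rcond=None)
    resid = target - others @ beta
    r2 = 1.0 - np.sum(resid ** 2) / total
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(0)
height = rng.normal(170, 15, size=500)  # synthetic, large mean
weight = rng.normal(100, 30, size=500)  # drawn independently of height
X = np.column_stack([height, weight])

vif_no_const = [vif(X, j, add_intercept=False) for j in range(2)]
vif_const = [vif(X, j, add_intercept=True) for j in range(2)]
print("without intercept:", np.round(vif_no_const, 2))
print("with intercept:   ", np.round(vif_const, 2))
```

If the VIFs drop sharply once an intercept is included, the high values were mostly an artifact of uncentered columns rather than evidence that the features are redundant.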