## Multiple Regression
This notebook builds a multiple linear regression model.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.model_selection import train_test_split

## Problem
You have data on 50 companies. Your task is to analyze it for a venture capital firm to judge what kind of company the firm should be interested in, and in particular which kind of spend yields the most profit.

In [2]:
dataset= pd.read_csv("MultipleRegression.csv")
dataset.head()

Unnamed: 0,R&D Spend,Administration,Marketing Spend,State,Profit
0,165349.2,136897.8,471784.1,New York,192261.83
1,162597.7,151377.59,443898.53,California,191792.06
2,153441.51,101145.55,407934.54,Florida,191050.39
3,144372.41,118671.85,383199.62,New York,182901.99
4,142107.34,91391.77,366168.42,Florida,166187.94


## Assumptions of a Linear Regression model 

1. Linearity 
2. Homoscedasticity
3. Multivariate Normality 
4. Independence of errors 
5. Lack of multicollinearity

-------------------------------------------

#### Dummy Variable Trap

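We do not include all dummy variables. When the model has an intercept, the dummy columns of one categorical feature always sum to 1, so one of them is redundant and perfectly collinear with the others. For a feature with n categories, keep n - 1 dummy columns.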
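The redundancy can be seen directly on a toy example. The sketch below uses a made-up list of states (not the notebook's CSV) and only NumPy; the variable names are illustrative.

```python
import numpy as np

# Hypothetical toy data: three states, one-hot encoded by hand.
states = np.array(["New York", "California", "Florida", "New York", "Florida"])
categories = np.unique(states)  # sorted category labels
dummies = (states[:, None] == categories).astype(float)

intercept = np.ones((len(states), 1))

# With every dummy kept, the dummy columns sum to the intercept column,
# so the design matrix has 4 columns but only rank 3.
X_all = np.hstack([intercept, dummies])
rank_all = np.linalg.matrix_rank(X_all)

# Dropping one dummy (the first category becomes the baseline) restores full rank.
X_dropped = np.hstack([intercept, dummies[:, 1:]])
rank_dropped = np.linalg.matrix_rank(X_dropped)

print(X_all.shape[1], rank_all)          # 4 columns, rank 3
print(X_dropped.shape[1], rank_dropped)  # 3 columns, rank 3
```

A rank-deficient design matrix means the coefficients are not uniquely determined, which is why one dummy column is dropped.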

#### Why don't we use all variables?

Some are garbage variables that have no predictive power, and every variable we keep is one more thing we have to explain.

#### How to select models 
1. **All in** - Throw in all variables. Do this when you have to, or when you are preparing for backward elimination.


2. **Backward elimination** - Choose a significance level (SL) for a variable to stay in the model. Fit the model with all predictors and calculate the p-values. If the highest p-value is greater than SL, remove that predictor, refit the whole model, and repeat. Stop when the highest remaining p-value is below SL.


3. **Forward selection** - Works like backward elimination in the opposite direction. Choose a significance level to enter the model, fit every single-predictor model, and keep the predictor with the lowest p-value. Keep adding one predictor at a time while the best new predictor has a p-value below SL. When it does not, keep the previous model.

![FS](ForwardSelection.png)

4. **Bidirectional elimination** - 

![BD](Bidirectional.png)

5. **Score comparison** - Create all possible models and compare them on a chosen criterion.

We use backward elimination because it is fast.

**Stepwise regression** - Methods 2, 3, and 4 are together called stepwise regression.
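To make the cost of score comparison concrete, here is a minimal sketch that fits every non-empty subset of predictors and compares them. It assumes adjusted R-squared as the criterion, which the notebook does not specify. The helper names `adjusted_r2` and `best_subset` and the synthetic data are illustrative only.

```python
from itertools import combinations

import numpy as np


def adjusted_r2(X, y):
    """Fit OLS with an intercept and return the adjusted R-squared."""
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    ss_res = residuals @ residuals
    ss_tot = ((y - y.mean()) ** 2).sum()
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def best_subset(X, y):
    """Try every non-empty subset of columns; return (best_columns, best_score)."""
    best_cols, best_score = None, -np.inf
    for k in range(1, X.shape[1] + 1):
        for cols in combinations(range(X.shape[1]), k):
            score = adjusted_r2(X[:, list(cols)], y)
            if score > best_score:
                best_cols, best_score = cols, score
    return best_cols, best_score


# Synthetic data (an assumption for illustration): only column 0 drives y.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
y_demo = 3.0 * X_demo[:, 0] + rng.normal(scale=0.1, size=50)

cols, score = best_subset(X_demo, y_demo)
print(cols, round(score, 3))
```

With p predictors this fits 2^p - 1 models, which is why the stepwise methods are preferred once p grows.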

In [3]:
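# Features: every column except the last. Target: Profit (column 4).
X= dataset.iloc[:,:-1].values
Y=dataset.iloc[:,4].values
# OneHotEncoder(categorical_features=[3]) was removed in scikit-learn 0.22.
# ColumnTransformer encodes the State column (index 3) directly, so LabelEncoder is not needed.
from sklearn.compose import ColumnTransformer
ct= ColumnTransformer([("state", OneHotEncoder(), [3])], remainder="passthrough", sparse_threshold=0)
X=ct.fit_transform(X).astype(float)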
pd.DataFrame(X).head()

Unnamed: 0,0,1,2,3,4,5
0,0.0,0.0,1.0,165349.2,136897.8,471784.1
1,1.0,0.0,0.0,162597.7,151377.59,443898.53
2,0.0,1.0,0.0,153441.51,101145.55,407934.54
3,0.0,0.0,1.0,144372.41,118671.85,383199.62
4,0.0,1.0,0.0,142107.34,91391.77,366168.42


In [4]:
# Avoid the dummy variable trap by dropping the first dummy column
X=X[:,1:]
pd.DataFrame(X).head()

Unnamed: 0,0,1,2,3,4
0,0.0,1.0,165349.2,136897.8,471784.1
1,0.0,0.0,162597.7,151377.59,443898.53
2,1.0,0.0,153441.51,101145.55,407934.54
3,0.0,1.0,144372.41,118671.85,383199.62
4,1.0,0.0,142107.34,91391.77,366168.42


In [5]:
X_train, X_test, Y_train, Y_test= train_test_split(X,Y,test_size= 0.2, random_state=0)

In [6]:
from sklearn.linear_model import LinearRegression
regressor= LinearRegression()
regressor.fit(X_train, Y_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)

In [7]:
regressor.predict(X_test)

array([103015.20159796, 132582.27760815, 132447.73845175,  71976.09851258,
       178537.48221056, 116161.24230166,  67851.69209676,  98791.73374687,
       113969.43533013, 167921.06569551])

In [8]:
Y_test

array([103282.38, 144259.4 , 146121.95,  77798.83, 191050.39, 105008.31,
        81229.06,  97483.56, 110352.25, 166187.94])

In [9]:
regressor.score(X_test, Y_test)

0.9347068473282446

In [10]:
regressor.coef_

array([-9.59284160e+02,  6.99369053e+02,  7.73467193e-01,  3.28845975e-02,
        3.66100259e-02])

In [11]:
# Backward Elimination 
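import statsmodels.api as sm  # the array-based OLS class is exposed by statsmodels.api
# statsmodels' OLS does not add an intercept on its own, so we prepend a column of 1s to X.
# It acts as x0 in y = b0*x0 + b1*x1 + ... + bn*xn, which makes b0 the constant term.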
X=np.append(arr=np.ones((50,1)).astype(int), values=X, axis=1)
pd.DataFrame(X).head()

Unnamed: 0,0,1,2,3,4,5
0,1.0,0.0,1.0,165349.2,136897.8,471784.1
1,1.0,0.0,0.0,162597.7,151377.59,443898.53
2,1.0,1.0,0.0,153441.51,101145.55,407934.54
3,1.0,0.0,1.0,144372.41,118671.85,383199.62
4,1.0,1.0,0.0,142107.34,91391.77,366168.42
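
The effect of the column of ones can be checked on a small example. This sketch uses synthetic data (y = 5 + 2x exactly, an assumption for illustration) and NumPy's least-squares solver rather than statsmodels.

```python
import numpy as np

# Synthetic data (assumed for illustration): y = 5 + 2*x exactly.
x = np.arange(10, dtype=float).reshape(-1, 1)
y = 5.0 + 2.0 * x[:, 0]

# Without a column of ones the fit is forced through the origin.
slope_only, *_ = np.linalg.lstsq(x, y, rcond=None)

# With the column of ones, its coefficient is the intercept b0.
X_const = np.append(arr=np.ones((10, 1)), values=x, axis=1)
coefs, *_ = np.linalg.lstsq(X_const, y, rcond=None)

print(slope_only)  # a single slope, biased because no intercept was allowed
print(coefs)       # approximately [5., 2.]
```

`sm.add_constant(X)` from `statsmodels.api` builds the same kind of matrix and is a common alternative to `np.append`.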


In [16]:
X_opt= X[:,[0,1,2,3,4,5]]
SL= 0.05
regresoor_ols= sm.OLS(endog= Y, exog= X_opt).fit() # Ordinary Least Squares
#endog- dependent variable
#exog- independent variable 

In [17]:
regresoor_ols.summary()

0,1,2,3
Dep. Variable:,y,R-squared:,0.951
Model:,OLS,Adj. R-squared:,0.945
Method:,Least Squares,F-statistic:,169.9
Date:,"Sun, 30 Dec 2018",Prob (F-statistic):,1.34e-27
Time:,00:52:37,Log-Likelihood:,-525.38
No. Observations:,50,AIC:,1063.0
Df Residuals:,44,BIC:,1074.0
Df Model:,5,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,5.013e+04,6884.820,7.281,0.000,3.62e+04,6.4e+04
x1,198.7888,3371.007,0.059,0.953,-6595.030,6992.607
x2,-41.8870,3256.039,-0.013,0.990,-6604.003,6520.229
x3,0.8060,0.046,17.369,0.000,0.712,0.900
x4,-0.0270,0.052,-0.517,0.608,-0.132,0.078
x5,0.0270,0.017,1.574,0.123,-0.008,0.062

0,1,2,3
Omnibus:,14.782,Durbin-Watson:,1.283
Prob(Omnibus):,0.001,Jarque-Bera (JB):,21.266
Skew:,-0.948,Prob(JB):,2.41e-05
Kurtosis:,5.572,Cond. No.,1450000.0


In [18]:
# x2 has the highest p-value (0.990 > SL), so remove column 2 of X and refit
X_opt= X[:,[0,1,3,4,5]]
SL= 0.05
regresoor_ols= sm.OLS(endog= Y, exog= X_opt).fit()
regresoor_ols.summary()

0,1,2,3
Dep. Variable:,y,R-squared:,0.951
Model:,OLS,Adj. R-squared:,0.946
Method:,Least Squares,F-statistic:,217.2
Date:,"Sun, 30 Dec 2018",Prob (F-statistic):,8.49e-29
Time:,00:54:32,Log-Likelihood:,-525.38
No. Observations:,50,AIC:,1061.0
Df Residuals:,45,BIC:,1070.0
Df Model:,4,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,5.011e+04,6647.870,7.537,0.000,3.67e+04,6.35e+04
x1,220.1585,2900.536,0.076,0.940,-5621.821,6062.138
x2,0.8060,0.046,17.606,0.000,0.714,0.898
x3,-0.0270,0.052,-0.523,0.604,-0.131,0.077
x4,0.0270,0.017,1.592,0.118,-0.007,0.061

0,1,2,3
Omnibus:,14.758,Durbin-Watson:,1.282
Prob(Omnibus):,0.001,Jarque-Bera (JB):,21.172
Skew:,-0.948,Prob(JB):,2.53e-05
Kurtosis:,5.563,Cond. No.,1400000.0


In [19]:
# x1 has the highest p-value (0.940 > SL), so remove column 1 of X and refit
X_opt= X[:,[0,3,4,5]]
SL= 0.05
regresoor_ols= sm.OLS(endog= Y, exog= X_opt).fit()
regresoor_ols.summary()

0,1,2,3
Dep. Variable:,y,R-squared:,0.951
Model:,OLS,Adj. R-squared:,0.948
Method:,Least Squares,F-statistic:,296.0
Date:,"Sun, 30 Dec 2018",Prob (F-statistic):,4.53e-30
Time:,00:56:37,Log-Likelihood:,-525.39
No. Observations:,50,AIC:,1059.0
Df Residuals:,46,BIC:,1066.0
Df Model:,3,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,5.012e+04,6572.353,7.626,0.000,3.69e+04,6.34e+04
x1,0.8057,0.045,17.846,0.000,0.715,0.897
x2,-0.0268,0.051,-0.526,0.602,-0.130,0.076
x3,0.0272,0.016,1.655,0.105,-0.006,0.060

0,1,2,3
Omnibus:,14.838,Durbin-Watson:,1.282
Prob(Omnibus):,0.001,Jarque-Bera (JB):,21.442
Skew:,-0.949,Prob(JB):,2.21e-05
Kurtosis:,5.586,Cond. No.,1400000.0


In [20]:
# x2 (column 4 of X) has the highest p-value (0.602 > SL), so remove it and refit
X_opt= X[:,[0,3,5]]
SL= 0.05
regresoor_ols= sm.OLS(endog= Y, exog= X_opt).fit()
regresoor_ols.summary()

0,1,2,3
Dep. Variable:,y,R-squared:,0.95
Model:,OLS,Adj. R-squared:,0.948
Method:,Least Squares,F-statistic:,450.8
Date:,"Sun, 30 Dec 2018",Prob (F-statistic):,2.16e-31
Time:,00:57:28,Log-Likelihood:,-525.54
No. Observations:,50,AIC:,1057.0
Df Residuals:,47,BIC:,1063.0
Df Model:,2,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,4.698e+04,2689.933,17.464,0.000,4.16e+04,5.24e+04
x1,0.7966,0.041,19.266,0.000,0.713,0.880
x2,0.0299,0.016,1.927,0.060,-0.001,0.061

0,1,2,3
Omnibus:,14.677,Durbin-Watson:,1.257
Prob(Omnibus):,0.001,Jarque-Bera (JB):,21.161
Skew:,-0.939,Prob(JB):,2.54e-05
Kurtosis:,5.575,Cond. No.,532000.0


In [21]:
# x2 (column 5 of X) has the highest p-value (0.060 > SL), so remove it and refit
X_opt= X[:,[0,3]]
SL= 0.05
regresoor_ols= sm.OLS(endog= Y, exog= X_opt).fit()
regresoor_ols.summary()

0,1,2,3
Dep. Variable:,y,R-squared:,0.947
Model:,OLS,Adj. R-squared:,0.945
Method:,Least Squares,F-statistic:,849.8
Date:,"Sun, 30 Dec 2018",Prob (F-statistic):,3.50e-32
Time:,01:15:03,Log-Likelihood:,-527.44
No. Observations:,50,AIC:,1059.0
Df Residuals:,48,BIC:,1063.0
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,4.903e+04,2537.897,19.320,0.000,4.39e+04,5.41e+04
x1,0.8543,0.029,29.151,0.000,0.795,0.913

0,1,2,3
Omnibus:,13.727,Durbin-Watson:,1.116
Prob(Omnibus):,0.001,Jarque-Bera (JB):,18.536
Skew:,-0.911,Prob(JB):,9.44e-05
Kurtosis:,5.361,Cond. No.,165000.0


#### This is a very slow way to implement backward elimination, but we have done it by hand for the purpose of the course. A wiser approach would be to use a function.