## Assumptions of Linear Regression
- Linearity
- Homoscedasticity
- Multivariate Normality
- Independence of Errors
- Lack of multicollinearity


## 5 methods of building models
- All-in
- Stepwise Regression
  - Backward Elimination
  - Forward Selection
  - Bidirectional Elimination
- Score Comparison

## Backward Elimination
1. Select a significance level (SL), for example 0.05
2. Fit the full model with all possible predictors
3. Consider the predictor with the highest p-value. If P > SL, go to step 4. Otherwise, the model is done.
4. Remove that predictor
5. Fit the model without this variable, then go back to step 3
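
The steps above can be sketched as a short loop. This is a minimal sketch on synthetic data, not the notebook's own procedure: `ols_pvalues` and `backward_elimination` are hypothetical helper names, the 0.05 threshold is an assumed default, and the intercept column (index 0) is never considered for removal, which is also an assumption. The p-values are computed with NumPy and SciPy using the usual two-sided t-test on each coefficient.

```python
import numpy as np
from scipy import stats

def ols_pvalues(X, y):
    """Two-sided t-test p-value of every OLS coefficient.

    X must already contain the intercept column and have full column rank.
    """
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)               # residual variance
    std_err = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return 2 * stats.t.sf(np.abs(beta / std_err), df=n - k)

def backward_elimination(X, y, sl=0.05):
    """Return the column indices of X kept after backward elimination."""
    cols = list(range(X.shape[1]))                 # step 2: start from the full model
    while len(cols) > 1:
        p = ols_pvalues(X[:, cols], y)
        worst = int(np.argmax(p[1:])) + 1          # step 3: highest p-value, intercept excluded
        if p[worst] <= sl:                         # nothing above the threshold: done
            break
        del cols[worst]                            # steps 4-5: drop it; the loop refits
    return cols

# Synthetic data (not the startups dataset): only the first predictor drives y.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 3))
y_demo = 3 + 2 * features[:, 0] + rng.normal(size=200)
X_demo = np.hstack([np.ones((200, 1)), features])  # column 0 is the intercept
kept = backward_elimination(X_demo, y_demo)
print(kept)
```

The returned list holds column indices of the original matrix, so `X_demo[:, kept]` is the reduced design matrix.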

## Forward Selection
1. Start with one-variable (simple) linear regressions: fit one per predictor and keep the predictor with the lowest p-value
2. Fit all models that add one more variable to the current model, and take the candidate with the lowest p-value
3. Consider that candidate. If P < significance level, add it to the model and go back to the previous step. Otherwise:
  - stop and keep the previous model
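
The forward procedure can be sketched in the same spirit. This is an illustrative sketch under stated assumptions: `ols_pvalues` and `forward_selection` are hypothetical helper names, the data is synthetic, the 0.05 threshold is an assumed default, and each candidate is judged by the p-value of its own coefficient when added to the current model.

```python
import numpy as np
from scipy import stats

def ols_pvalues(X, y):
    """Two-sided t-test p-value of every OLS coefficient (X includes the intercept)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)
    std_err = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return 2 * stats.t.sf(np.abs(beta / std_err), df=n - k)

def forward_selection(X, y, sl=0.05):
    """Return predictor indices in the order they entered the model.

    X holds predictors only; the intercept column is added internally.
    """
    ones = np.ones((X.shape[0], 1))
    selected, remaining = [], list(range(X.shape[1]))
    while remaining:
        # p-value of each candidate's coefficient when added to the current model
        trial = {}
        for j in remaining:
            design = np.hstack([ones, X[:, selected + [j]]])
            trial[j] = ols_pvalues(design, y)[-1]
        best = min(trial, key=trial.get)
        if trial[best] >= sl:                      # no candidate is significant: stop
            break
        selected.append(best)                      # keep it and try to add another
        remaining.remove(best)
    return selected

# Synthetic data: only predictor 0 has a real effect on y.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = 1 + 5 * X_demo[:, 0] + rng.normal(size=100)
selected = forward_selection(X_demo, y_demo)
print(selected)
```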

## Bidirectional Elimination

1. Select a significance level to enter and to stay
2. Perform the 

In [3]:
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('50_Startups.csv')
dataset.head()


,R&D Spend,Administration,Marketing Spend,State,Profit
0,165349.2,136897.8,471784.1,New York,192261.83
1,162597.7,151377.59,443898.53,California,191792.06
2,153441.51,101145.55,407934.54,Florida,191050.39
3,144372.41,118671.85,383199.62,New York,182901.99
4,142107.34,91391.77,366168.42,Florida,166187.94


In [4]:
X = dataset.iloc[:,:-1].values
y = dataset.iloc[:,4].values

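#We need to encode the categorical variable (State, column 3) as dummy variables

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
# OneHotEncoder's categorical_features argument is gone from newer scikit-learn releases;
# ColumnTransformer picks the column to encode and passes the others through unchanged
ct = ColumnTransformer([('state', OneHotEncoder(), [3])], remainder = 'passthrough')
# The dummy columns come first, followed by the three numeric columns
X = np.array(ct.fit_transform(X), dtype = float)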
X[0:10,:]


array([[  0.00000000e+00,   0.00000000e+00,   1.00000000e+00,
          1.65349200e+05,   1.36897800e+05,   4.71784100e+05],
       [  1.00000000e+00,   0.00000000e+00,   0.00000000e+00,
          1.62597700e+05,   1.51377590e+05,   4.43898530e+05],
       [  0.00000000e+00,   1.00000000e+00,   0.00000000e+00,
          1.53441510e+05,   1.01145550e+05,   4.07934540e+05],
       [  0.00000000e+00,   0.00000000e+00,   1.00000000e+00,
          1.44372410e+05,   1.18671850e+05,   3.83199620e+05],
       [  0.00000000e+00,   1.00000000e+00,   0.00000000e+00,
          1.42107340e+05,   9.13917700e+04,   3.66168420e+05],
       [  0.00000000e+00,   0.00000000e+00,   1.00000000e+00,
          1.31876900e+05,   9.98147100e+04,   3.62861360e+05],
       [  1.00000000e+00,   0.00000000e+00,   0.00000000e+00,
          1.34615460e+05,   1.47198870e+05,   1.27716820e+05],
       [  0.00000000e+00,   1.00000000e+00,   0.00000000e+00,
          1.30298130e+05,   1.45530060e+05,   3.23876680e+05],


In [5]:
X = X[:,1:]


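### Avoiding the dummy variable trap
We removed the first column (the first dummy variable column), so the three states are now represented by two dummy columns. Strictly speaking, we didn't need to do that for this step, because scikit-learn's LinearRegression still fits and predicts when the redundant column is left in.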
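
To see why one dummy column is redundant, here is a small illustrative check using only NumPy. The six state labels are made-up sample values, not rows from the dataset. With an intercept column, the three dummy columns always sum to the column of ones, so the design matrix loses one rank; dropping one dummy restores full column rank.

```python
import numpy as np

# Made-up sample of state labels, for illustration only
states = np.array(["New York", "California", "Florida",
                   "New York", "Florida", "California"])
categories = np.unique(states)                     # sorted: California, Florida, New York

# One dummy column per category
dummies = (states[:, None] == categories[None, :]).astype(float)
ones = np.ones((len(states), 1))

full = np.hstack([ones, dummies])                  # intercept + all 3 dummies
reduced = np.hstack([ones, dummies[:, 1:]])        # intercept + 2 dummies (first one dropped)

print(full.shape[1], np.linalg.matrix_rank(full))        # 4 columns, but rank 3
print(reduced.shape[1], np.linalg.matrix_rank(reduced))  # 3 columns, rank 3
```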

In [6]:
# train_test_split lives in sklearn.model_selection (sklearn.cross_validation has been removed)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train,y_test = train_test_split(X,y,test_size = 0.2, random_state = 0)



In [7]:
#Let's do the actual regression now
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)

In [8]:
y_pred = regressor.predict(X_test)

In [9]:
X.shape

(50, 5)

### So, we found a prediction, but did we find the best one?

In [11]:
# sm.OLS is provided by statsmodels.api (statsmodels.formula.api is the formula interface)
import statsmodels.api as sm
X = np.append(arr = np.ones((50,1)).astype(int), values = X, axis = 1)
X[:10,:]

array([[  1.00000000e+00,   1.00000000e+00,   0.00000000e+00,
          1.00000000e+00,   1.65349200e+05,   1.36897800e+05,
          4.71784100e+05],
       [  1.00000000e+00,   1.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   1.62597700e+05,   1.51377590e+05,
          4.43898530e+05],
       [  1.00000000e+00,   1.00000000e+00,   1.00000000e+00,
          0.00000000e+00,   1.53441510e+05,   1.01145550e+05,
          4.07934540e+05],
       [  1.00000000e+00,   1.00000000e+00,   0.00000000e+00,
          1.00000000e+00,   1.44372410e+05,   1.18671850e+05,
          3.83199620e+05],
       [  1.00000000e+00,   1.00000000e+00,   1.00000000e+00,
          0.00000000e+00,   1.42107340e+05,   9.13917700e+04,
          3.66168420e+05],
       [  1.00000000e+00,   1.00000000e+00,   0.00000000e+00,
          1.00000000e+00,   1.31876900e+05,   9.98147100e+04,
          3.62861360e+05],
       [  1.00000000e+00,   1.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   1.34

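This may look backwards, but because we want the column of ones at the beginning, we append our feature matrix to the column of ones rather than the other way round. The ones are needed because statsmodels' OLS does not add an intercept on its own. Note that the array printed above begins with two columns of ones, which suggests this cell was executed twice; X should contain only one.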
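
Because a notebook cell can be re-run by accident, one way to make this step safe to repeat is to check for an existing column of ones first. The helper below is a hypothetical sketch (the name `add_intercept` is not part of the original notebook) and assumes a first column made up entirely of ones is already the intercept. statsmodels also ships `sm.add_constant`, which serves the same purpose.

```python
import numpy as np

def add_intercept(X):
    """Prepend a column of ones unless the first column is already all ones."""
    X = np.asarray(X, dtype=float)
    if X.shape[1] > 0 and np.all(X[:, 0] == 1.0):
        return X                                   # intercept already present
    return np.append(arr=np.ones((X.shape[0], 1)), values=X, axis=1)

features = np.arange(6).reshape(3, 2)              # toy matrix: 3 rows, 2 predictors
once = add_intercept(features)
twice = add_intercept(once)                        # second call leaves the matrix unchanged
print(once.shape, twice.shape)
```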

In [14]:
X_opt = X[:,[0,1,2,3,4,5]]

regressor_OLS = sm.OLS(endog =y, exog = X_opt).fit()
regressor_OLS.summary()

0,1,2,3
Dep. Variable:,y,R-squared:,0.948
Model:,OLS,Adj. R-squared:,0.943
Method:,Least Squares,F-statistic:,205.0
Date:,"Fri, 01 Dec 2017",Prob (F-statistic):,2.9e-28
Time:,10:16:01,Log-Likelihood:,-526.75
No. Observations:,50,AIC:,1064.0
Df Residuals:,45,BIC:,1073.0
Df Model:,4,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,2.73e+04,3185.530,8.571,0.000,2.09e+04,3.37e+04
x1,2.73e+04,3185.530,8.571,0.000,2.09e+04,3.37e+04
x2,1091.1075,3377.087,0.323,0.748,-5710.695,7892.910
x3,-39.3434,3309.047,-0.012,0.991,-6704.106,6625.420
x4,0.8609,0.031,27.665,0.000,0.798,0.924
x5,-0.0527,0.050,-1.045,0.301,-0.154,0.049

0,1,2,3
Omnibus:,14.275,Durbin-Watson:,1.197
Prob(Omnibus):,0.001,Jarque-Bera (JB):,19.26
Skew:,-0.953,Prob(JB):,6.57e-05
Kurtosis:,5.369,Cond. No.,7.08e+17


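The highest p-value in this table is 0.991, and it belongs to x3 (column index 3), not index 2. The const and x1 rows are identical because X starts with two columns of ones, which shifts every predictor one position to the right. The next cell removes index 2 (x2, p = 0.748) first; index 3 is removed in the cell after it.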

In [15]:
X_opt = X[:,[0,1,3,4,5]]

regressor_OLS = sm.OLS(endog =y, exog = X_opt).fit()
regressor_OLS.summary()

0,1,2,3
Dep. Variable:,y,R-squared:,0.948
Model:,OLS,Adj. R-squared:,0.944
Method:,Least Squares,F-statistic:,278.7
Date:,"Fri, 01 Dec 2017",Prob (F-statistic):,1.68e-29
Time:,10:17:35,Log-Likelihood:,-526.81
No. Observations:,50,AIC:,1062.0
Df Residuals:,46,BIC:,1069.0
Df Model:,3,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,2.753e+04,3072.973,8.960,0.000,2.13e+04,3.37e+04
x1,2.753e+04,3072.973,8.960,0.000,2.13e+04,3.37e+04
x2,-573.7029,2838.043,-0.202,0.841,-6286.386,5138.981
x3,0.8624,0.030,28.282,0.000,0.801,0.924
x4,-0.0530,0.050,-1.063,0.294,-0.154,0.047

0,1,2,3
Omnibus:,14.902,Durbin-Watson:,1.199
Prob(Omnibus):,0.001,Jarque-Bera (JB):,21.212
Skew:,-0.964,Prob(JB):,2.48e-05
Kurtosis:,5.543,Cond. No.,1.37e+17


In [16]:
X_opt = X[:,[0,1,4,5]]

regressor_OLS = sm.OLS(endog =y, exog = X_opt).fit()
regressor_OLS.summary()

0,1,2,3
Dep. Variable:,y,R-squared:,0.948
Model:,OLS,Adj. R-squared:,0.946
Method:,Least Squares,F-statistic:,426.8
Date:,"Fri, 01 Dec 2017",Prob (F-statistic):,7.29e-31
Time:,10:18:34,Log-Likelihood:,-526.83
No. Observations:,50,AIC:,1060.0
Df Residuals:,47,BIC:,1065.0
Df Model:,2,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,2.744e+04,3008.359,9.122,0.000,2.14e+04,3.35e+04
x1,2.744e+04,3008.359,9.122,0.000,2.14e+04,3.35e+04
x2,0.8621,0.030,28.589,0.000,0.801,0.923
x3,-0.0530,0.049,-1.073,0.289,-0.152,0.046

0,1,2,3
Omnibus:,14.678,Durbin-Watson:,1.189
Prob(Omnibus):,0.001,Jarque-Bera (JB):,20.449
Skew:,-0.961,Prob(JB):,3.63e-05
Kurtosis:,5.474,Cond. No.,3.51e+17


In [17]:
X_opt = X[:,[0,1,4]]

regressor_OLS = sm.OLS(endog =y, exog = X_opt).fit()
regressor_OLS.summary()

0,1,2,3
Dep. Variable:,y,R-squared:,0.947
Model:,OLS,Adj. R-squared:,0.945
Method:,Least Squares,F-statistic:,849.8
Date:,"Fri, 01 Dec 2017",Prob (F-statistic):,3.5e-32
Time:,10:18:54,Log-Likelihood:,-527.44
No. Observations:,50,AIC:,1059.0
Df Residuals:,48,BIC:,1063.0
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,2.452e+04,1268.948,19.320,0.000,2.2e+04,2.71e+04
x1,2.452e+04,1268.948,19.320,0.000,2.2e+04,2.71e+04
x2,0.8543,0.029,29.151,0.000,0.795,0.913

0,1,2,3
Omnibus:,13.727,Durbin-Watson:,1.116
Prob(Omnibus):,0.001,Jarque-Bera (JB):,18.536
Skew:,-0.911,Prob(JB):,9.44e-05
Kurtosis:,5.361,Cond. No.,1.06e+17
