### Multiple Linear Regression Introduction

In this notebook (and the quizzes that follow), you will create three simple linear regression models and one multiple linear regression model to predict home price.

Let's get started by importing the necessary libraries and reading in the data you will be using.

In [1]:
import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv('02-Dataset/house_prices.csv')
df.head()

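| | house_id | neighborhood | area | bedrooms | bathrooms | style | price |
|---|---|---|---|---|---|---|---|
| 0 | 1112 | B | 1188 | 3 | 2 | ranch | 598291 |
| 1 | 491 | B | 3512 | 5 | 3 | victorian | 1744259 |
| 2 | 5952 | B | 1134 | 3 | 2 | ranch | 571669 |
| 3 | 3525 | A | 1940 | 4 | 2 | ranch | 493675 |
| 4 | 5108 | B | 2208 | 6 | 4 | victorian | 1101539 |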


`1.` Using statsmodels, fit three separate simple linear regression models to predict price: one using **area**, one using **bedrooms**, and one using **bathrooms**. Include an intercept in each of the three models.

Use the results from each of your models to answer the first two quiz questions below.

$$\text{price} = f(\text{area}, \text{bedrooms}, \text{bathrooms})$$

In [2]:
# Adding the intercept
df['intercept'] = 1

# Specify the model: price regressed on an intercept and area.
lm = sm.OLS(df['price'], df[['intercept', 'area']])

# Calculating the coefficients.
results = lm.fit()

# Printing the summary.
results.summary()

| Dep. Variable: | price | R-squared: | 0.678 |
|---|---|---|---|
| Model: | OLS | Adj. R-squared: | 0.678 |
| Method: | Least Squares | F-statistic: | 12690.0 |
| Date: | Thu, 10 Jan 2019 | Prob (F-statistic): | 0.0 |
| Time: | 19:03:14 | Log-Likelihood: | -84517.0 |
| No. Observations: | 6028 | AIC: | 169000.0 |
| Df Residuals: | 6026 | BIC: | 169100.0 |
| Df Model: | 1 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| intercept | 9587.8878 | 7637.479 | 1.255 | 0.209 | -5384.303 | 2.46e+04 |
| area | 348.4664 | 3.093 | 112.662 | 0.000 | 342.403 | 354.530 |

| Omnibus: | 368.609 | Durbin-Watson: | 2.007 |
|---|---|---|---|
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 349.279 |
| Skew: | 0.534 | Prob(JB): | 1.43e-76 |
| Kurtosis: | 2.499 | Cond. No. | 4930.0 |
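To make the `coef` column concrete, here is a small worked sketch that plugs the area-model coefficients from the table above into the fitted line. The helper `predict_price` is hypothetical, and the area of 1,500 is just an illustrative input in whatever units the dataset's `area` column uses. Note also that the intercept's p-value (0.209) and confidence interval (which contains 0) mean it is not statistically distinguishable from zero.

```python
# Coefficients copied from the area-only model summary above.
intercept = 9587.8878
slope_area = 348.4664

def predict_price(area):
    """Predicted price from the fitted line: intercept + slope * area (hypothetical helper)."""
    return intercept + slope_area * area

# Predicted price for an illustrative home with area 1500.
print(round(predict_price(1500), 2))  # 532287.49

# One extra unit of area raises the prediction by the slope, about 348.47.
print(round(predict_price(1501) - predict_price(1500), 4))  # 348.4664
```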


In [3]:
# Adding the intercept
df['intercept'] = 1

# Specify the model: price regressed on an intercept and bedrooms.
lm = sm.OLS(df['price'], df[['intercept', 'bedrooms']])

# Calculating the coefficients.
results = lm.fit()

# Printing the summary.
results.summary()

| Dep. Variable: | price | R-squared: | 0.553 |
|---|---|---|---|
| Model: | OLS | Adj. R-squared: | 0.553 |
| Method: | Least Squares | F-statistic: | 7446.0 |
| Date: | Thu, 10 Jan 2019 | Prob (F-statistic): | 0.0 |
| Time: | 19:03:14 | Log-Likelihood: | -85509.0 |
| No. Observations: | 6028 | AIC: | 171000.0 |
| Df Residuals: | 6026 | BIC: | 171000.0 |
| Df Model: | 1 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| intercept | -9.485e+04 | 1.08e+04 | -8.762 | 0.000 | -1.16e+05 | -7.36e+04 |
| bedrooms | 2.284e+05 | 2646.744 | 86.289 | 0.000 | 2.23e+05 | 2.34e+05 |

| Omnibus: | 967.118 | Durbin-Watson: | 2.014 |
|---|---|---|---|
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 1599.431 |
| Skew: | 1.074 | Prob(JB): | 0.0 |
| Kurtosis: | 4.325 | Cond. No. | 10.3 |


In [4]:
# Adding the intercept
df['intercept'] = 1

# Specify the model: price regressed on an intercept and bathrooms.
lm = sm.OLS(df['price'], df[['intercept', 'bathrooms']])

# Calculating the coefficients.
results = lm.fit()

# Printing the summary.
results.summary()

| Dep. Variable: | price | R-squared: | 0.541 |
|---|---|---|---|
| Model: | OLS | Adj. R-squared: | 0.541 |
| Method: | Least Squares | F-statistic: | 7116.0 |
| Date: | Thu, 10 Jan 2019 | Prob (F-statistic): | 0.0 |
| Time: | 19:03:14 | Log-Likelihood: | -85583.0 |
| No. Observations: | 6028 | AIC: | 171200.0 |
| Df Residuals: | 6026 | BIC: | 171200.0 |
| Df Model: | 1 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| intercept | 4.314e+04 | 9587.189 | 4.500 | 0.000 | 2.43e+04 | 6.19e+04 |
| bathrooms | 3.295e+05 | 3905.540 | 84.358 | 0.000 | 3.22e+05 | 3.37e+05 |

| Omnibus: | 915.429 | Durbin-Watson: | 2.003 |
|---|---|---|---|
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 1537.531 |
| Skew: | 1.01 | Prob(JB): | 0.0 |
| Kurtosis: | 4.428 | Cond. No. | 5.84 |
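The three cells above repeat one pattern: add an intercept column, regress price on a single predictor, and read off the slope and R-squared (0.678 for area, 0.553 for bedrooms, 0.541 for bathrooms). The sketch below expresses that pattern as a loop using plain NumPy least squares instead of statsmodels. It runs on synthetic data, not the notebook's CSV, so its printed numbers will not match the summaries; the helper `fit_simple_ols` and the data-generating choices are assumptions made for illustration.

```python
import numpy as np

def fit_simple_ols(x, y):
    """Fit y = b0 + b1 * x by least squares and return (b0, b1, r_squared)."""
    X = np.column_stack([np.ones(len(x)), x])  # intercept column + one predictor
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r_squared = 1 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
    return coef[0], coef[1], r_squared

# Synthetic stand-in for the house data (NOT the notebook's dataset):
# price depends on area only, and bedrooms/bathrooms are noisy proxies for area.
rng = np.random.default_rng(0)
n = 500
area = rng.uniform(800, 4000, n)
bedrooms = np.maximum(1, np.round(area / 700 + rng.normal(0, 0.5, n)))
bathrooms = np.maximum(1, np.round(area / 1200 + rng.normal(0, 0.4, n)))
price = 10_000 + 350 * area + rng.normal(0, 150_000, n)

# One simple regression per predictor, each with its own intercept.
fits = {}
for name, x in [("area", area), ("bedrooms", bedrooms), ("bathrooms", bathrooms)]:
    fits[name] = fit_simple_ols(x, price)
    b0, b1, r2 = fits[name]
    print(f"{name:>9}: intercept={b0:,.0f}  slope={b1:,.1f}  R^2={r2:.3f}")
```

With statsmodels, the same loop could build each model as `sm.OLS(df['price'], sm.add_constant(df[[name]]))` and read `.params` and `.rsquared` from the fitted results.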


`2.` Now that you have looked at the results of the simple linear regression models, fit a multiple linear regression model that uses all three variables at the same time. You will still want an intercept in this model.

In [5]:
# Adding the intercept
df['intercept'] = 1

# Specify the model: price regressed on an intercept plus area, bathrooms, and bedrooms.
lm = sm.OLS(df['price'], df[['intercept', 'area', 'bathrooms', 'bedrooms']])

# Calculating the coefficients.
results = lm.fit()

# Printing the summary.
results.summary()

| Dep. Variable: | price | R-squared: | 0.678 |
|---|---|---|---|
| Model: | OLS | Adj. R-squared: | 0.678 |
| Method: | Least Squares | F-statistic: | 4230.0 |
| Date: | Thu, 10 Jan 2019 | Prob (F-statistic): | 0.0 |
| Time: | 19:03:14 | Log-Likelihood: | -84517.0 |
| No. Observations: | 6028 | AIC: | 169000.0 |
| Df Residuals: | 6024 | BIC: | 169100.0 |
| Df Model: | 3 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| intercept | 1.007e+04 | 1.04e+04 | 0.972 | 0.331 | -1.02e+04 | 3.04e+04 |
| area | 345.9110 | 7.227 | 47.863 | 0.000 | 331.743 | 360.079 |
| bathrooms | 7345.3917 | 1.43e+04 | 0.515 | 0.607 | -2.06e+04 | 3.53e+04 |
| bedrooms | -2925.8063 | 1.03e+04 | -0.285 | 0.775 | -2.3e+04 | 1.72e+04 |

| Omnibus: | 367.658 | Durbin-Watson: | 2.007 |
|---|---|---|---|
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 350.116 |
| Skew: | 0.536 | Prob(JB): | 9.4e-77 |
| Kurtosis: | 2.503 | Cond. No. | 11600.0 |
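In the summary above, the **area** coefficient barely moves (348.47 on its own, 345.91 here), while **bedrooms** and **bathrooms** go from highly significant on their own to p-values of 0.775 and 0.607, with standard errors roughly four times larger (bedrooms: 2,646.7 alone vs about 10,300 here) and a negative sign for bedrooms. A common explanation for this pattern is multicollinearity: if bedrooms and bathrooms mostly track area, the model cannot separate their individual effects. The sketch below computes variance inflation factors, VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing predictor j on the other predictors. It uses synthetic data and a hypothetical helper `variance_inflation_factors`; it does not reproduce the notebook's numbers.

```python
import numpy as np

def r_squared(X, y):
    """R-squared of a least-squares fit of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1 - (resid @ resid) / ((y - y.mean()) ** 2).sum()

def variance_inflation_factors(predictors):
    """VIF_j = 1 / (1 - R^2_j), regressing column j on the other columns plus an intercept."""
    n, k = predictors.shape
    vifs = []
    for j in range(k):
        others = np.delete(predictors, j, axis=1)
        X = np.column_stack([np.ones(n), others])
        vifs.append(1.0 / (1.0 - r_squared(X, predictors[:, j])))
    return np.array(vifs)

# Synthetic predictors (NOT the notebook's data): bedrooms and bathrooms are driven by area.
rng = np.random.default_rng(1)
n = 1000
area = rng.uniform(800, 4000, n)
bedrooms = area / 700 + rng.normal(0, 0.3, n)
bathrooms = area / 1200 + rng.normal(0, 0.2, n)
noise_feature = rng.normal(0, 1, n)  # unrelated to the other three

names = ["area", "bedrooms", "bathrooms", "noise_feature"]
vifs = variance_inflation_factors(np.column_stack([area, bedrooms, bathrooms, noise_feature]))
for name, v in zip(names, vifs):
    print(f"{name:>13}: VIF = {v:.1f}")
```

The three correlated predictors get large VIFs while the unrelated column stays near 1; a common rule of thumb flags values above roughly 5 to 10. statsmodels also provides `variance_inflation_factor` in `statsmodels.stats.outliers_influence` for the same calculation.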


`3.` Along with **area**, **bedrooms**, and **bathrooms**, you might also want to use **style** to predict price. Try adding it to your multiple linear regression model. What happens? Use the final quiz below to provide your answer.

In [6]:
# Adding the intercept
df['intercept'] = 1

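# Including 'style' raises an error: the column holds text labels (object dtype),
# which statsmodels cannot convert into a numeric design matrix.
#lm = sm.OLS(df['price'], df[['intercept', 'area', 'bathrooms', 'bedrooms', 'style']])

# Fall back to the numeric predictors so the cell runs; the summary matches step 2.
lm = sm.OLS(df['price'], df[['intercept', 'area', 'bathrooms', 'bedrooms']])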

# Calculating the coefficients.
results = lm.fit()

# Printing the summary.
results.summary()

| Dep. Variable: | price | R-squared: | 0.678 |
|---|---|---|---|
| Model: | OLS | Adj. R-squared: | 0.678 |
| Method: | Least Squares | F-statistic: | 4230.0 |
| Date: | Thu, 10 Jan 2019 | Prob (F-statistic): | 0.0 |
| Time: | 19:03:14 | Log-Likelihood: | -84517.0 |
| No. Observations: | 6028 | AIC: | 169000.0 |
| Df Residuals: | 6024 | BIC: | 169100.0 |
| Df Model: | 3 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| intercept | 1.007e+04 | 1.04e+04 | 0.972 | 0.331 | -1.02e+04 | 3.04e+04 |
| area | 345.9110 | 7.227 | 47.863 | 0.000 | 331.743 | 360.079 |
| bathrooms | 7345.3917 | 1.43e+04 | 0.515 | 0.607 | -2.06e+04 | 3.53e+04 |
| bedrooms | -2925.8063 | 1.03e+04 | -0.285 | 0.775 | -2.3e+04 | 1.72e+04 |

| Omnibus: | 367.658 | Durbin-Watson: | 2.007 |
|---|---|---|---|
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 350.116 |
| Skew: | 0.536 | Prob(JB): | 9.4e-77 |
| Kurtosis: | 2.503 | Cond. No. | 11600.0 |
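The reason **style** cannot go straight into the model is that least squares needs numeric columns, and `style` holds text labels. A usual remedy is dummy (one-hot) encoding: one 0/1 column per level, with one level dropped as the baseline so the dummies are not perfectly collinear with the intercept. The sketch below shows both the failure and the encoding in plain NumPy on a toy array; the labels mimic the values visible in `df.head()` plus an assumed extra level, `lodge`, and are not read from the dataset.

```python
import numpy as np

# Toy column mimicking `style` (the real column's levels may differ).
style = np.array(["ranch", "victorian", "ranch", "ranch", "victorian", "lodge"], dtype=object)

# Text labels cannot be cast to the floats a design matrix needs.
try:
    style.astype(float)
except ValueError as err:
    print("Cannot use raw labels:", err)

# One-hot encoding: np.unique sorts the levels and maps each row to a level index.
levels, codes = np.unique(style, return_inverse=True)
dummies = (codes[:, None] == np.arange(len(levels))).astype(float)

# Drop the first (alphabetical) level as the baseline to avoid collinearity with the intercept.
baseline, kept_levels = levels[0], levels[1:]
X_style = dummies[:, 1:]
print("baseline:", baseline, "| dummy columns:", list(kept_levels))
print(X_style)
```

In pandas, `pd.get_dummies(df['style'], drop_first=True)` produces the equivalent columns; recent pandas versions return boolean dummies, so an `.astype(float)` may be needed before passing them to `sm.OLS`.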


## Calculating the `coef` values by hand

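With the design matrix $X$ (the intercept column plus the predictors) and the response vector $y$ (price), the least-squares coefficients are

$$ \beta = (X'X)^{-1}X'y $$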
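This formula comes from minimizing the sum of squared residuals. Setting its gradient with respect to $\beta$ to zero gives the normal equations, which can be solved for $\beta$ whenever $X'X$ is invertible, that is, when no column of $X$ is an exact linear combination of the others:

$$
\begin{aligned}
\text{SSE}(\beta) &= (y - X\beta)'(y - X\beta) \\
\frac{\partial\, \text{SSE}}{\partial \beta} &= -2X'(y - X\beta) = 0 \\
X'X\beta &= X'y \\
\beta &= (X'X)^{-1}X'y
\end{aligned}
$$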

In [19]:
# Creating the X matrix.
X = df[['intercept', 'area', 'bathrooms', 'bedrooms']]

# Creating the X transpose.
X_t = np.transpose(X)

# Creating the y vector.
y = df['price']

$$ \beta = \underbrace{(X'X)^{-1}}_{A}X'y $$

In [31]:
# A = (X'X)^{-1}
A = np.linalg.inv(np.dot(X_t,X))

$$ \beta = \underbrace{AX'}_{B}y $$

In [32]:
# B = A X'
B = np.dot(A,X_t)

$$ \beta = By $$

In [33]:
# beta = B y; should match the `coef` column of the multiple-regression summary
np.dot(B, y)

array([10072.10704673,   345.91101884,  7345.3917137 , -2925.80632467])
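These values reproduce the `coef` column of the multiple-regression summary (1.007e+04, 345.9110, 7345.3917, -2925.8063). Forming an explicit inverse works for a teaching example, but numerical practice generally prefers solving the normal equations directly, or running least squares on $X$ itself, which avoids squaring the condition number of $X$ when forming $X'X$; that matters more when the condition number is large, as the summary's Cond. No. of 11600 suggests. The sketch below compares the three routes on synthetic data (not the notebook's dataset; `true_beta` is an arbitrary assumed choice).

```python
import numpy as np

# Synthetic design matrix: intercept column plus three predictors (NOT the notebook's data).
rng = np.random.default_rng(42)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
true_beta = np.array([10.0, 2.0, -1.5, 0.5])
y = X @ true_beta + rng.normal(scale=0.1, size=n)

# 1) Explicit inverse, as in the cells above: beta = (X'X)^{-1} X'y
beta_inv = np.linalg.inv(X.T @ X) @ X.T @ y

# 2) Solve the normal equations X'X beta = X'y without forming the inverse
beta_solve = np.linalg.solve(X.T @ X, X.T @ y)

# 3) Least squares directly on X (SVD-based, never forms X'X)
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_inv, beta_solve, beta_lstsq, sep="\n")
```

On well-conditioned data like this, all three agree to floating-point precision and recover `true_beta` closely.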