# Multiple Linear Regression

With simple linear regression, we had
$$\hat{y} = b_0 + b_1x_1$$

With multiple linear regression, we extend simple linear regression using both quantitative and categorical explanatory ($x$) variables to predict a quantitative response.

$$\hat{y} = b_0 + b_1x_1 + b_2x_2 + \dots + b_mx_m$$
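
As a minimal sketch of what this equation computes, the prediction for a single observation is the intercept plus the dot product of the remaining coefficients with that observation's values. The coefficients and values below are made up for illustration; they are not fitted from any data.

```python
import numpy as np

# Hypothetical coefficients: b_0 (intercept) followed by b_1, b_2, b_3
b = np.array([10000.0, 350.0, -3000.0, 7000.0])

# Hypothetical observation: values of x_1, x_2, x_3
x = np.array([1200.0, 3.0, 2.0])

# y_hat = b_0 + b_1*x_1 + b_2*x_2 + b_3*x_3
y_hat = b[0] + b[1:] @ x
print(y_hat)
```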

### Example - House Prices

`1.` Using statsmodels, fit three individual simple linear regression models to predict price.  You should have a model that uses **area**, another using **bedrooms**, and a final one using **bathrooms**.

In [1]:
import pandas as pd
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
df = pd.read_csv('./Data/house_prices.csv')

# Add intercept column
df['intercept'] = 1

df.head()

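| | house_id | neighborhood | area | bedrooms | bathrooms | style | price | intercept |
|---|---|---|---|---|---|---|---|---|
| 0 | 1112 | B | 1188 | 3 | 2 | ranch | 598291 | 1 |
| 1 | 491 | B | 3512 | 5 | 3 | victorian | 1744259 | 1 |
| 2 | 5952 | B | 1134 | 3 | 2 | ranch | 571669 | 1 |
| 3 | 3525 | A | 1940 | 4 | 2 | ranch | 493675 | 1 |
| 4 | 5108 | B | 2208 | 6 | 4 | victorian | 1101539 | 1 |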


##### Simple Linear Regression - Area

In [3]:
# Create linear regression model
lm = sm.OLS(df['price'], df[['intercept', 'area']])
results = lm.fit()
results.summary()

| | | | |
|---|---|---|---|
| Dep. Variable: | price | R-squared: | 0.678 |
| Model: | OLS | Adj. R-squared: | 0.678 |
| Method: | Least Squares | F-statistic: | 12690.0 |
| Date: | Fri, 12 Jan 2024 | Prob (F-statistic): | 0.0 |
| Time: | 20:07:05 | Log-Likelihood: | -84517.0 |
| No. Observations: | 6028 | AIC: | 169000.0 |
| Df Residuals: | 6026 | BIC: | 169100.0 |
| Df Model: | 1 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| intercept | 9587.8878 | 7637.479 | 1.255 | 0.209 | -5384.303 | 2.46e+04 |
| area | 348.4664 | 3.093 | 112.662 | 0.000 | 342.403 | 354.530 |

| | | | |
|---|---|---|---|
| Omnibus: | 368.609 | Durbin-Watson: | 2.007 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 349.279 |
| Skew: | 0.534 | Prob(JB): | 1.43e-76 |
| Kurtosis: | 2.499 | Cond. No. | 4930.0 |


##### Simple Linear Regression - Bedrooms

In [4]:
# Create linear regression model
lm = sm.OLS(df['price'], df[['intercept', 'bedrooms']])
results = lm.fit()
results.summary()

| | | | |
|---|---|---|---|
| Dep. Variable: | price | R-squared: | 0.553 |
| Model: | OLS | Adj. R-squared: | 0.553 |
| Method: | Least Squares | F-statistic: | 7446.0 |
| Date: | Fri, 12 Jan 2024 | Prob (F-statistic): | 0.0 |
| Time: | 20:07:05 | Log-Likelihood: | -85509.0 |
| No. Observations: | 6028 | AIC: | 171000.0 |
| Df Residuals: | 6026 | BIC: | 171000.0 |
| Df Model: | 1 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| intercept | -9.485e+04 | 1.08e+04 | -8.762 | 0.000 | -1.16e+05 | -7.36e+04 |
| bedrooms | 2.284e+05 | 2646.744 | 86.289 | 0.000 | 2.23e+05 | 2.34e+05 |

| | | | |
|---|---|---|---|
| Omnibus: | 967.118 | Durbin-Watson: | 2.014 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 1599.431 |
| Skew: | 1.074 | Prob(JB): | 0.0 |
| Kurtosis: | 4.325 | Cond. No. | 10.3 |


##### Simple Linear Regression - Bathrooms

In [5]:
# Create linear regression model
lm = sm.OLS(df['price'], df[['intercept', 'bathrooms']])
results = lm.fit()
results.summary()

| | | | |
|---|---|---|---|
| Dep. Variable: | price | R-squared: | 0.541 |
| Model: | OLS | Adj. R-squared: | 0.541 |
| Method: | Least Squares | F-statistic: | 7116.0 |
| Date: | Fri, 12 Jan 2024 | Prob (F-statistic): | 0.0 |
| Time: | 20:07:05 | Log-Likelihood: | -85583.0 |
| No. Observations: | 6028 | AIC: | 171200.0 |
| Df Residuals: | 6026 | BIC: | 171200.0 |
| Df Model: | 1 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| intercept | 4.314e+04 | 9587.189 | 4.500 | 0.000 | 2.43e+04 | 6.19e+04 |
| bathrooms | 3.295e+05 | 3905.540 | 84.358 | 0.000 | 3.22e+05 | 3.37e+05 |

| | | | |
|---|---|---|---|
| Omnibus: | 915.429 | Durbin-Watson: | 2.003 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 1537.531 |
| Skew: | 1.01 | Prob(JB): | 0.0 |
| Kurtosis: | 4.428 | Cond. No. | 5.84 |


Thus, each of our simple linear regression models, using *area*, *bedrooms*, and *bathrooms* respectively, yields a statistically significant slope (every slope's p-value is reported as 0.000).

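However, if we sum the three R-squared values (0.678 + 0.553 + 0.541 = 1.772), we obtain a value greater than 1. If the explanatory variables were uncorrelated with one another, their individual R-squared values would add up to the R-squared of the combined model, which cannot exceed 1. Thus, we can conclude that at least two of these variables are correlated with each other.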
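
The effect can be reproduced on synthetic data. The sketch below does not use the house price data; it builds two strongly correlated predictors and relies on the fact that, for a simple linear regression with an intercept, R-squared equals the squared Pearson correlation between the predictor and the response. On the real data, `df[['area', 'bedrooms', 'bathrooms']].corr()` would show the pairwise correlations directly.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Synthetic, strongly correlated predictors (not the house price data)
x1 = rng.normal(size=n)
x2 = x1 + 0.3 * rng.normal(size=n)
y = 2.0 * x1 + 0.5 * rng.normal(size=n)

def r_squared(x, y):
    """R-squared of a simple linear regression of y on x (with intercept)."""
    return np.corrcoef(x, y)[0, 1] ** 2

r2_x1 = r_squared(x1, y)
r2_x2 = r_squared(x2, y)
corr_x1_x2 = np.corrcoef(x1, x2)[0, 1]

print("corr(x1, x2):", corr_x1_x2)
print("sum of simple R-squared values:", r2_x1 + r2_x2)
```

Because `x2` carries almost the same information as `x1`, both simple models explain most of the variance in `y`, and the two R-squared values sum to more than 1.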

`2.` Now that you have looked at the results from the simple linear regression models, let's try a multiple linear regression model using all three of these variables at the same time.

##### Multiple Linear Regression - Quantitative Variables

In [6]:
# Create linear regression model
lm = sm.OLS(df['price'], df[['intercept', 'bathrooms', 'bedrooms', 'area']])
results = lm.fit()
results.summary()

| | | | |
|---|---|---|---|
| Dep. Variable: | price | R-squared: | 0.678 |
| Model: | OLS | Adj. R-squared: | 0.678 |
| Method: | Least Squares | F-statistic: | 4230.0 |
| Date: | Fri, 12 Jan 2024 | Prob (F-statistic): | 0.0 |
| Time: | 20:07:05 | Log-Likelihood: | -84517.0 |
| No. Observations: | 6028 | AIC: | 169000.0 |
| Df Residuals: | 6024 | BIC: | 169100.0 |
| Df Model: | 3 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| intercept | 1.007e+04 | 1.04e+04 | 0.972 | 0.331 | -1.02e+04 | 3.04e+04 |
| bathrooms | 7345.3917 | 1.43e+04 | 0.515 | 0.607 | -2.06e+04 | 3.53e+04 |
| bedrooms | -2925.8063 | 1.03e+04 | -0.285 | 0.775 | -2.3e+04 | 1.72e+04 |
| area | 345.9110 | 7.227 | 47.863 | 0.000 | 331.743 | 360.079 |

| | | | |
|---|---|---|---|
| Omnibus: | 367.658 | Durbin-Watson: | 2.007 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 350.116 |
| Skew: | 0.536 | Prob(JB): | 9.4e-77 |
| Kurtosis: | 2.503 | Cond. No. | 11600.0 |


Using our multiple linear regression model, we find that only *area* is statistically significant (p = 0.000). *Bathrooms* (p = 0.607) and *bedrooms* (p = 0.775) are not, even though each was significant in its own simple model. We need more tools to determine why this is. Hint: [Multicollinearity](https://en.wikipedia.org/wiki/Multicollinearity).
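
One common diagnostic for multicollinearity is the variance inflation factor (VIF), defined for predictor $j$ as $1/(1-R_j^2)$, where $R_j^2$ comes from regressing predictor $j$ on all the other predictors. The sketch below computes it with NumPy on synthetic stand-ins for the three predictors; the data and the helper `vif` are assumptions for illustration, not results from the house price data. statsmodels also ships a `variance_inflation_factor` function in `statsmodels.stats.outliers_influence` that could be applied to the real design matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Synthetic stand-ins for correlated predictors (not the house price data)
area = rng.normal(2000, 500, size=n)
bedrooms = area / 500 + rng.normal(0, 0.5, size=n)
bathrooms = area / 800 + rng.normal(0, 0.4, size=n)
X = np.column_stack([area, bedrooms, bathrooms])

def vif(X, j):
    """Variance inflation factor of column j: 1 / (1 - R_j^2)."""
    target = X[:, j]
    others = np.delete(X, j, axis=1)
    # Regress column j on the remaining columns plus an intercept
    A = np.column_stack([np.ones(len(target)), others])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    resid = target - A @ coef
    r2 = 1 - resid @ resid / np.sum((target - target.mean()) ** 2)
    return 1 / (1 - r2)

vifs = [vif(X, j) for j in range(X.shape[1])]
print(dict(zip(["area", "bedrooms", "bathrooms"], vifs)))
```

A VIF of 1 means a predictor is uncorrelated with the others; larger values mean its coefficient's standard error is inflated because the other predictors carry much of the same information.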

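We can also see that the R-squared value (0.678) is less than 1 and, to three decimal places, identical to that of the model using *area* alone, so adding *bedrooms* and *bathrooms* explains essentially no additional variance in price.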

`3.` Along with using the **area**, **bedrooms**, and **bathrooms** you might also want to use **style** to predict the price.  Try adding this to your multiple linear regression model.  What happens?

In [7]:
# Create linear regression model
lm = sm.OLS(df['price'], df[['intercept', 'bathrooms', 'bedrooms', 'area', 'style']])
results = lm.fit()
results.summary()

ValueError: Pandas data cast to numpy dtype of object. Check input data with np.asarray(data). The types seen wereNone and intercept     int64
bathrooms     int64
bedrooms      int64
area          int64
style        object
dtype: object. The data was
0        598291
1       1744259
2        571669
3        493675
4       1101539
         ...   
6023     385420
6024     890627
6025     760829
6026     575515
6027     844747
Name: price, Length: 6028, dtype: int64
and
       intercept  bathrooms  bedrooms  area      style
0             1          2         3  1188      ranch
1             1          3         5  3512  victorian
2             1          2         3  1134      ranch
3             1          2         4  1940      ranch
4             1          4         6  2208  victorian
...         ...        ...       ...   ...        ...
6023          1          0         0   757      lodge
6024          1          3         5  3540  victorian
6025          1          1         2  1518      lodge
6026          1          2         4  2270      ranch
6027          1          3         5  3355  victorian

[6028 rows x 5 columns]
before. After,
[ 598291 1744259  571669 ...  760829  575515  844747]
[[1 2 3 1188 'ranch']
 [1 3 5 3512 'victorian']
 [1 2 3 1134 'ranch']
 ...
 [1 1 2 1518 'lodge']
 [1 2 4 2270 'ranch']
 [1 3 5 3355 'victorian']].
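
The error is raised because the **style** column holds strings (dtype `object`), and the design matrix passed to `sm.OLS` must be numeric. One common way around this, sketched below, is to encode the categorical column as 0/1 dummy variables and leave one level out as the baseline. The small `toy` frame reuses a few rows from the outputs above, and the choice of `victorian` as the baseline is an arbitrary assumption for illustration.

```python
import pandas as pd

# A few rows copied from the outputs shown above (illustration only)
toy = pd.DataFrame({
    "area": [1188, 3512, 1134, 1940, 2208, 1518],
    "style": ["ranch", "victorian", "ranch", "ranch", "victorian", "lodge"],
})

# One 0/1 column per style; dtype=int keeps the columns numeric
dummies = pd.get_dummies(toy["style"], dtype=int)

# Leave out one level ('victorian' here) so it acts as the baseline
toy = toy.join(dummies[["lodge", "ranch"]])
toy["intercept"] = 1

# Every column in this design matrix is now numeric
X = toy[["intercept", "area", "lodge", "ranch"]]
print(X.dtypes)
```

A matrix built this way could then be passed to `sm.OLS` in the same way as in the earlier cells. Keeping all three dummy columns together with the intercept would make the columns linearly dependent, which is why one level is dropped.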

### Computing the Solution to Multiple Linear Regression

Credit: [Criteria for Estimates](https://web.stanford.edu/~mrosenfe/soc_meth_proj3/matrix_OLS_NYU_notes.pdf)

Our *estimates* of the population parameters are referred to as $\hat{\beta}$. Recall that the criterion we use for obtaining our estimates is to find the estimator $\hat{\beta}$ that minimizes the sum of squared residuals ($\sum{e_i^2}$ in scalar notation).

The vector of residuals $\mathbf{e}$ is given by:
$$\mathbf{e} = \mathbf{y} - \mathbf{X}\hat{\mathbf{\beta}}$$

The sum of squared residuals (RSS) is $\mathbf{e}^T\mathbf{e}$:
$$\begin{bmatrix}e_1&e_2&\dots&e_n\end{bmatrix}_{1\times n}\begin{bmatrix}e_1\\ e_2\\\vdots\\ e_n\end{bmatrix}_{n\times1} = \begin{bmatrix}e_1\times e_1 + e_2\times e_2 + \dots + e_n\times e_n\end{bmatrix}_{1\times1}$$

Thus, we can write
$$
\begin{equation}
    \label{eq:RSS}
    \begin{aligned}
        \mathbf{e}^T\mathbf{e}\;=&\;(\mathbf{y}-\mathbf{X}\hat{\mathbf{\beta}})^T(\mathbf{y}-\mathbf{X}\hat{\mathbf{\beta}}) \\
        =&\;\mathbf{y}^T\mathbf{y}-\hat{\mathbf{\beta}}^T\mathbf{X}^T\mathbf{y}-\mathbf{y}^T\mathbf{X}\hat{\mathbf{\beta}}+\hat{\mathbf{\beta}}^T\mathbf{X}^T\mathbf{X}\hat{\mathbf{\beta}}\\
        =&\;\mathbf{y}^T\mathbf{y} - 2\hat{\mathbf{\beta}}^T\mathbf{X}^T\mathbf{y}+\hat{\mathbf{\beta}}^T\mathbf{X}^T\mathbf{X}\hat{\mathbf{\beta}}
    \end{aligned}
\end{equation}
$$

where we use the fact that the transpose of a scalar is the scalar, i.e., $\mathbf{y}^T\mathbf{X}\hat{\mathbf{\beta}}=(\mathbf{y}^T\mathbf{X}\hat{\mathbf{\beta}})^T=\hat{\mathbf{\beta}}^T\mathbf{X}^T\mathbf{y}$.
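
The expansion can be checked numerically. The sketch below uses a small random problem (the sizes and the random values are arbitrary assumptions) and compares $\mathbf{e}^T\mathbf{e}$ computed directly from the residuals with the three-term expression on the last line of the derivation.

```python
import numpy as np

rng = np.random.default_rng(2)

# Small random problem: n = 6 observations, 3 columns (intercept included)
X = np.column_stack([np.ones(6), rng.normal(size=(6, 2))])
y = rng.normal(size=6)
beta_hat = rng.normal(size=3)  # any candidate coefficient vector

# Left-hand side: e^T e with e = y - X beta_hat
e = y - X @ beta_hat
rss_direct = e @ e

# Right-hand side: y^T y - 2 beta_hat^T X^T y + beta_hat^T X^T X beta_hat
rss_expanded = y @ y - 2 * beta_hat @ X.T @ y + beta_hat @ X.T @ X @ beta_hat

# The two cross terms are the same scalar, which is why they combine
cross_a = y @ X @ beta_hat
cross_b = beta_hat @ X.T @ y

print(np.isclose(rss_direct, rss_expanded))
print(np.isclose(cross_a, cross_b))
```

The identity holds for any $\hat{\mathbf{\beta}}$, not only the minimizer, which is why a random candidate vector is enough for the check.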

To find the $\hat{\mathbf{\beta}}$ that minimizes the RSS, we take the derivative of Eq. $\eqref{eq:RSS}$