### Introduction to Multiple Linear Regression

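In this notebook and the quiz questions below, you will build several simple linear regression models and a multiple linear regression model to predict house prices.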

First, import the necessary libraries and read in the data you will use, then get started on the tasks.

In [1]:
import numpy as np
import pandas as pd
import statsmodels.api as sms

df = pd.read_csv('house_prices.csv')
df.head()

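| | house_id | neighborhood | area | bedrooms | bathrooms | style | price |
|---|---|---|---|---|---|---|---|
| 0 | 1112 | B | 1188 | 3 | 2 | ranch | 598291 |
| 1 | 491 | B | 3512 | 5 | 3 | victorian | 1744259 |
| 2 | 5952 | B | 1134 | 3 | 2 | ranch | 571669 |
| 3 | 3525 | A | 1940 | 4 | 2 | ranch | 493675 |
| 4 | 5108 | B | 2208 | 6 | 4 | victorian | 1101539 |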


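1. Using statsmodels, fit three separate simple linear regression models to predict price: one using **area**, one using **bedrooms**, and one using **bathrooms**. Include an intercept in each model. Use the results of each model to answer the first two quiz questions below.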


In [3]:
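df['intercept'] = 1  # column of ones for the intercept term
# Note: only 'area' is passed as a predictor, so this fit has no intercept
# and the summary below describes a regression through the origin.
# Passing df[['intercept', 'area']] instead would include the intercept.
lr_area = sms.OLS(df['price'], df['area'])
result = lr_area.fit()
result.summary()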

| | | | |
|---|---|---|---|
| Dep. Variable: | price | R-squared: | 0.895 |
| Model: | OLS | Adj. R-squared: | 0.895 |
| Method: | Least Squares | F-statistic: | 51520.0 |
| Date: | Sat, 29 Feb 2020 | Prob (F-statistic): | 0.0 |
| Time: | 21:29:06 | Log-Likelihood: | -84518.0 |
| No. Observations: | 6028 | AIC: | 169000.0 |
| Df Residuals: | 6027 | BIC: | 169000.0 |
| Df Model: | 1 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| area | 351.8265 | 1.550 | 226.980 | 0.000 | 348.788 | 354.865 |

| | | | |
|---|---|---|---|
| Omnibus: | 361.722 | Durbin-Watson: | 2.007 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 315.288 |
| Skew: | 0.49 | Prob(JB): | 3.44e-69 |
| Kurtosis: | 2.458 | Cond. No. | 1.0 |
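
The R-squared of 0.895 reported above comes from a model fitted on `area` alone, with no constant term. For such a model statsmodels reports the uncentered R-squared, which measures the residuals against the sum of squared `y` values rather than against the variation of `y` around its mean, so it is not directly comparable with the R-squared of a model that has an intercept. The following is a minimal sketch of the difference using NumPy only; the data are synthetic and the coefficients are made up for illustration, not taken from `house_prices.csv`.

```python
import numpy as np

# Synthetic data (assumption): an "area"-like predictor and a price-like
# response with a non-zero baseline.
rng = np.random.default_rng(0)
x = rng.uniform(1000, 4000, size=500)
y = 200_000 + 300 * x + rng.normal(0, 100_000, size=500)

# Least-squares fit of y = b * x, i.e. a line forced through the origin.
b = (x @ y) / (x @ x)
ssr = np.sum((y - b * x) ** 2)

# Uncentered R^2: the definition used when the model has no constant.
r2_uncentered = 1 - ssr / np.sum(y ** 2)
# Centered R^2: the usual definition, relative to variation around the mean.
r2_centered = 1 - ssr / np.sum((y - y.mean()) ** 2)

print(f"uncentered R^2: {r2_uncentered:.3f}")
print(f"centered R^2:   {r2_centered:.3f}")
```

Because the sum of squared `y` values is never smaller than the sum of squared deviations from the mean, the uncentered value is always at least as large as the centered one for the same residuals.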


In [5]:
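# Only 'bedrooms' is passed, so this fit has no intercept;
# use df[['intercept', 'bedrooms']] to include one.
lr_bed = sms.OLS(df['price'], df['bedrooms'])
result = lr_bed.fit()
result.summary()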

| | | | |
|---|---|---|---|
| Dep. Variable: | price | R-squared: | 0.853 |
| Model: | OLS | Adj. R-squared: | 0.853 |
| Method: | Least Squares | F-statistic: | 34880.0 |
| Date: | Sat, 29 Feb 2020 | Prob (F-statistic): | 0.0 |
| Time: | 21:30:05 | Log-Likelihood: | -85547.0 |
| No. Observations: | 6028 | AIC: | 171100.0 |
| Df Residuals: | 6027 | BIC: | 171100.0 |
| Df Model: | 1 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| bedrooms | 2.073e+05 | 1110.004 | 186.759 | 0.000 | 2.05e+05 | 2.09e+05 |

| | | | |
|---|---|---|---|
| Omnibus: | 1275.663 | Durbin-Watson: | 2.009 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 2509.103 |
| Skew: | 1.276 | Prob(JB): | 0.0 |
| Kurtosis: | 4.866 | Cond. No. | 1.0 |


In [6]:
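# Only 'bathrooms' is passed, so this fit has no intercept;
# use df[['intercept', 'bathrooms']] to include one.
lr_bath = sms.OLS(df['price'], df['bathrooms'])
result = lr_bath.fit()
result.summary()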

| | | | |
|---|---|---|---|
| Dep. Variable: | price | R-squared: | 0.85 |
| Model: | OLS | Adj. R-squared: | 0.85 |
| Method: | Least Squares | F-statistic: | 34250.0 |
| Date: | Sat, 29 Feb 2020 | Prob (F-statistic): | 0.0 |
| Time: | 21:31:25 | Log-Likelihood: | -85593.0 |
| No. Observations: | 6028 | AIC: | 171200.0 |
| Df Residuals: | 6027 | BIC: | 171200.0 |
| Df Model: | 1 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| bathrooms | 3.449e+05 | 1863.685 | 185.071 | 0.000 | 3.41e+05 | 3.49e+05 |

| | | | |
|---|---|---|---|
| Omnibus: | 746.297 | Durbin-Watson: | 2.001 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 1138.986 |
| Skew: | 0.894 | Prob(JB): | 4.70e-248 |
| Kurtosis: | 4.158 | Cond. No. | 1.0 |


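2. Now that you have seen the results of the simple linear regression models, fit a multiple linear regression model that uses all three variables at once. Include an intercept in this model.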
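
Written out, the model requested in this step has the following form. The symbols $\beta_0, \dots, \beta_3$ and $\varepsilon_i$ are notation introduced here for illustration; $\beta_0$ is the coefficient on the `intercept` column of ones, and $i$ indexes the houses.

$$
\text{price}_i = \beta_0 + \beta_1\,\text{area}_i + \beta_2\,\text{bedrooms}_i + \beta_3\,\text{bathrooms}_i + \varepsilon_i
$$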

In [7]:
df.head()

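| | house_id | neighborhood | area | bedrooms | bathrooms | style | price | intercept |
|---|---|---|---|---|---|---|---|---|
| 0 | 1112 | B | 1188 | 3 | 2 | ranch | 598291 | 1 |
| 1 | 491 | B | 3512 | 5 | 3 | victorian | 1744259 | 1 |
| 2 | 5952 | B | 1134 | 3 | 2 | ranch | 571669 | 1 |
| 3 | 3525 | A | 1940 | 4 | 2 | ranch | 493675 | 1 |
| 4 | 5108 | B | 2208 | 6 | 4 | victorian | 1101539 | 1 |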


In [9]:
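# The 'intercept' column is not in the predictor list, so this fit has no intercept;
# use df[['intercept', 'area', 'bedrooms', 'bathrooms']] to include one.
mlr = sms.OLS(df['price'], df[['area', 'bedrooms', 'bathrooms']])
result = mlr.fit()
result.summary()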

| | | | |
|---|---|---|---|
| Dep. Variable: | price | R-squared: | 0.895 |
| Model: | OLS | Adj. R-squared: | 0.895 |
| Method: | Least Squares | F-statistic: | 17170.0 |
| Date: | Sat, 29 Feb 2020 | Prob (F-statistic): | 0.0 |
| Time: | 21:32:55 | Log-Likelihood: | -84517.0 |
| No. Observations: | 6028 | AIC: | 169000.0 |
| Df Residuals: | 6025 | BIC: | 169100.0 |
| Df Model: | 3 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| area | 344.9076 | 7.153 | 48.218 | 0.000 | 330.885 | 358.930 |
| bedrooms | 3333.9846 | 7981.383 | 0.418 | 0.676 | -1.23e+04 | 1.9e+04 |
| bathrooms | 1586.0714 | 1.3e+04 | 0.122 | 0.903 | -2.39e+04 | 2.7e+04 |

| | | | |
|---|---|---|---|
| Omnibus: | 362.518 | Durbin-Watson: | 2.008 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 331.73 |
| Skew: | 0.513 | Prob(JB): | 9.24e-73 |
| Kurtosis: | 2.482 | Cond. No. | 9540.0 |


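3. In addition to **area**, **bedrooms**, and **bathrooms**, you might also want to use **style** to predict house price. Try adding it to your multiple linear regression model. What happens? Give your answer in the final quiz question below.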

In [10]:
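# 'style' holds strings such as 'ranch' and 'victorian', so the predictors
# cannot be cast to a numeric array and the fit fails with the error below.
mlr = sms.OLS(df['price'], df[['area', 'bedrooms', 'bathrooms', 'style']])
result = mlr.fit()
result.summary()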

ValueError: Pandas data cast to numpy dtype of object. Check input data with np.asarray(data).