### Interpreting Coefficients
It is important not only to be able to fit complex linear models, but also to know which of their coefficients you can interpret.

In some cases, you would drop a variable from a linear regression model because its coefficient is not statistically significant. That is not the aim of this notebook. The aim here is strictly to ensure you are comfortable interpreting coefficients when they are interpretable at all.

In [6]:
import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv('./house_prices.csv')
df.head()

Unnamed: 0,house_id,neighborhood,area,bedrooms,bathrooms,style,price
0,1112,B,1188,3,2,ranch,598291
1,491,B,3512,5,3,victorian,1744259
2,5952,B,1134,3,2,ranch,571669
3,3525,A,1940,4,2,ranch,493675
4,5108,B,2208,6,4,victorian,1101539


We will be fitting a number of different models to this dataset throughout this notebook. For each model, there is a quiz question that will allow you to match the interpretations of the model coefficients to the corresponding values. If there is no 'nice' interpretation, this is also an option!

### Model 1
1. For the first model, predict price using neighborhood, style, and the area of the home. Use the output to match the correct values to the corresponding interpretation in quiz 1 below. Don't forget an intercept! You will also need to build your dummy variables, and remember to drop one dummy column for each categorical variable when you fit your linear model. It may be 

In [7]:
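# Constant column so that the model estimates an intercept
df['intercept'] = 1
# One 0/1 column per level; dtype=int keeps them numeric in newer pandas versions
df[['neig_a', 'neig_b', 'neig_c']] = pd.get_dummies(df['neighborhood'], dtype=int)
df[['is_lodge', 'is_ranch', 'is_victorian']] = pd.get_dummies(df['style'], dtype=int)
df.head()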

Unnamed: 0,house_id,neighborhood,area,bedrooms,bathrooms,style,price,intercept,neig_a,neig_b,neig_c,is_lodge,is_ranch,is_victorian
0,1112,B,1188,3,2,ranch,598291,1,0,1,0,0,1,0
1,491,B,3512,5,3,victorian,1744259,1,0,1,0,0,0,1
2,5952,B,1134,3,2,ranch,571669,1,0,1,0,0,1,0
3,3525,A,1940,4,2,ranch,493675,1,1,0,0,0,1,0
4,5108,B,2208,6,4,victorian,1101539,1,0,1,0,0,0,1
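
The reason for dropping one dummy column per categorical variable can be checked numerically. The sketch below is illustrative only: it uses a made-up six-row `toy` frame rather than `house_prices.csv`, and the names `X_full` and `X_dropped` are hypothetical. With an intercept and all three neighborhood dummies, the dummy columns sum to the intercept column, so the design matrix loses a rank and the coefficients are not uniquely determined.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the neighborhood column (values are illustrative).
toy = pd.DataFrame({'neighborhood': ['A', 'B', 'C', 'B', 'A', 'C']})

# Full set of dummies: one 0/1 column per level (A, B, C).
dummies = pd.get_dummies(toy['neighborhood'], dtype=int)

# Design matrix with an intercept AND all three dummy columns.
X_full = np.column_stack([np.ones(len(toy)), dummies.to_numpy()])

# Design matrix with the C column dropped, so C becomes the baseline.
X_dropped = np.column_stack([np.ones(len(toy)), dummies[['A', 'B']].to_numpy()])

# Rank lower than the column count means the columns are linearly dependent.
rank_full = np.linalg.matrix_rank(X_full)
rank_dropped = np.linalg.matrix_rank(X_dropped)

print(X_full.shape[1], rank_full)        # 4 columns, rank 3
print(X_dropped.shape[1], rank_dropped)  # 3 columns, rank 3
```

Once one level is dropped, each remaining dummy coefficient is read as a difference from that dropped (baseline) level.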


In [5]:
# neig_c and is_victorian are left out, so neighborhood C and victorian are the baselines
lm = sm.OLS(df['price'], df[['intercept', 'neig_a', 'neig_b', 'is_lodge', 'is_ranch', 'area']])
results = lm.fit()
results.summary()

Dep. Variable:,price,R-squared:,0.919
Model:,OLS,Adj. R-squared:,0.919
Method:,Least Squares,F-statistic:,13720.0
Date:,"Mon, 10 May 2021",Prob (F-statistic):,0.0
Time:,14:20:48,Log-Likelihood:,-80348.0
No. Observations:,6028,AIC:,160700.0
Df Residuals:,6022,BIC:,160700.0
Df Model:,5,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
intercept,-2.046e+05,7699.704,-26.569,0.000,-2.2e+05,-1.89e+05
neig_a,-194.2464,4965.459,-0.039,0.969,-9928.324,9539.832
neig_b,5.243e+05,4687.484,111.844,0.000,5.15e+05,5.33e+05
is_lodge,6262.7365,6893.293,0.909,0.364,-7250.586,1.98e+04
is_ranch,4288.0333,5367.032,0.799,0.424,-6233.270,1.48e+04
area,348.7375,2.205,158.177,0.000,344.415,353.060

Omnibus:,114.369,Durbin-Watson:,2.002
Prob(Omnibus):,0.0,Jarque-Bera (JB):,139.082
Skew:,0.271,Prob(JB):,6.29e-31
Kurtosis:,3.509,Cond. No.,13500.0


### Interpretation:
- The predicted difference in price between a home in neighborhood A and a home in neighborhood C, holding all other variables constant, is 194.25 (the coefficient is negative, so neighborhood A is lower by that amount).
- For every one-unit increase in the area of a home, we predict the price of the home to increase by 348.74, holding all other variables constant.
- The predicted difference in price between a victorian and a lodge home, holding all other variables constant, is 6262.74 (lodge is more expensive by that amount).
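
These readings can be checked by plugging the fitted coefficients into the regression equation. The following is a minimal sketch, not part of the original notebook: `COEF` and `predict_price` are hypothetical names, and the values are copied from the Model 1 summary table, where the intercept and `neig_b` are shown rounded to four significant figures.

```python
# Coefficients copied from the Model 1 summary table (intercept and neig_b
# are only available rounded, as printed in the summary).
COEF = {
    'intercept': -2.046e+05,
    'neig_a': -194.2464,
    'neig_b': 5.243e+05,
    'is_lodge': 6262.7365,
    'is_ranch': 4288.0333,
    'area': 348.7375,
}

def predict_price(neighborhood, style, area):
    """Predicted price; neighborhood C and victorian style are the baselines."""
    return (COEF['intercept']
            + COEF['neig_a'] * (neighborhood == 'A')
            + COEF['neig_b'] * (neighborhood == 'B')
            + COEF['is_lodge'] * (style == 'lodge')
            + COEF['is_ranch'] * (style == 'ranch')
            + COEF['area'] * area)

# Change one variable at a time while holding the others fixed.
a_vs_c = predict_price('A', 'victorian', 2000) - predict_price('C', 'victorian', 2000)
per_unit_area = predict_price('C', 'victorian', 2001) - predict_price('C', 'victorian', 2000)
lodge_vs_victorian = predict_price('C', 'lodge', 2000) - predict_price('C', 'victorian', 2000)

print(round(a_vs_c, 2), round(per_unit_area, 2), round(lodge_vs_victorian, 2))
```

Because the model has no interaction terms, each difference equals the corresponding coefficient regardless of the values at which the other variables are held.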