### Interpreting Coefficients

It is important not only to be able to fit complex linear models, but also to know which of their coefficients you can interpret.

In this notebook, you will fit a few different models and use the quizzes below to match the appropriate interpretations to your coefficients when possible.

In practice, some coefficients in your linear regression models would be dropped for lack of statistical significance. That is not the concern here: **this notebook is strictly to ensure you are comfortable interpreting coefficients when they are interpretable at all**.

In [1]:
import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv('./house_prices.csv')
df.head()

Unnamed: 0,house_id,neighborhood,area,bedrooms,bathrooms,style,price
0,1112,B,1188,3,2,ranch,598291
1,491,B,3512,5,3,victorian,1744259
2,5952,B,1134,3,2,ranch,571669
3,3525,A,1940,4,2,ranch,493675
4,5108,B,2208,6,4,victorian,1101539


We will be fitting a number of different models to this dataset throughout this notebook. For each model, there is a quiz question that asks you to match interpretations of the model coefficients to the corresponding values. If a coefficient has no 'nice' interpretation, that is also an answer option!

### Model 1

`1.` For the first model, fit a model to predict `price` using `neighborhood`, `style`, and the `area` of the home. Use the output to match the correct values to the corresponding interpretation in quiz 1 below. Don't forget an intercept! You will also need to build your dummy variables, and remember to drop one column from each set of dummies when you fit your linear model. It may be easiest to connect your interpretations to the values in the first quiz by using neighborhood **C** and home style **lodge** as the baselines.
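To see why one dummy column has to be dropped, here is a minimal sketch on a tiny made-up table (the `toy` frame and its values are assumptions for illustration, not rows from `house_prices.csv`). With an intercept in the model, the dummy columns for all levels of a category sum to the intercept column, so the design matrix loses a rank and the coefficients are no longer uniquely determined.

```python
import numpy as np
import pandas as pd

# Tiny made-up sample (hypothetical values, not taken from house_prices.csv).
toy = pd.DataFrame({
    "neighborhood": ["A", "B", "C", "A", "B", "C"],
    "area": [1000, 1500, 1200, 1800, 2100, 900],
})

# One 0/1 column per neighborhood level.
dummies = pd.get_dummies(toy["neighborhood"], dtype=int)
ones = np.ones(len(toy))
area = toy["area"].to_numpy()

# Intercept + area + all three dummies: A + B + C equals the intercept column,
# so one column is redundant.
full = np.column_stack([ones, area, dummies[["A", "B", "C"]].to_numpy()])
rank_full = np.linalg.matrix_rank(full)

# Leaving out the baseline level C removes the redundancy.
reduced = np.column_stack([ones, area, dummies[["A", "B"]].to_numpy()])
rank_reduced = np.linalg.matrix_rank(reduced)

print(full.shape[1], rank_full)        # number of columns vs. rank
print(reduced.shape[1], rank_reduced)  # number of columns vs. rank
```

In the full matrix the rank is one less than the number of columns; in the reduced matrix the two agree. The dropped level becomes the baseline that the remaining dummy coefficients are compared against.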

In [3]:
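# One 0/1 column per level; dtype=int keeps them numeric on newer pandas, where the default is bool
dummy_hood = pd.get_dummies(df['neighborhood'], dtype=int)
dummy_style = pd.get_dummies(df['style'], dtype=int)
df2 = df.join(dummy_hood.join(dummy_style))
df2['intercept'] = 1  # constant column so that OLS fits an intercept
df2.head()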

Unnamed: 0,house_id,neighborhood,area,bedrooms,bathrooms,style,price,A,B,C,lodge,ranch,victorian,intercept
0,1112,B,1188,3,2,ranch,598291,0,1,0,0,1,0,1
1,491,B,3512,5,3,victorian,1744259,0,1,0,0,0,1,1
2,5952,B,1134,3,2,ranch,571669,0,1,0,0,1,0,1
3,3525,A,1940,4,2,ranch,493675,1,0,0,0,1,0,1
4,5108,B,2208,6,4,victorian,1101539,0,1,0,0,0,1,1


In [8]:
# Baselines: neighborhood C and style lodge (their dummy columns are left out)
mod = sm.OLS(df2['price'], df2[['area', 'A', 'B', 'ranch', 'victorian', 'intercept']])
res = mod.fit()
print(res.summary())

OLS Regression Results                            
Dep. Variable:                  price   R-squared:                       0.919
Model:                            OLS   Adj. R-squared:                  0.919
Method:                 Least Squares   F-statistic:                 1.372e+04
Date:                Sat, 18 Apr 2020   Prob (F-statistic):               0.00
Time:                        16:30:06   Log-Likelihood:                -80348.
No. Observations:                6028   AIC:                         1.607e+05
Df Residuals:                    6022   BIC:                         1.607e+05
Df Model:                           5                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
area         348.7375      2.205    158.177      0.000     344.415     353.060
A
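
As a hedged illustration of how to read the `area` row above: with neighborhood and style held fixed, each additional unit of `area` is associated with a predicted price increase equal to the coefficient. The sketch below only rescales the numbers printed in that row; the helper name `predicted_price_change` is made up for this example.

```python
# Values copied from the `area` row of the Model 1 summary above.
area_coef = 348.7375
area_ci = (344.415, 353.060)

def predicted_price_change(delta_area, coef=area_coef):
    """Predicted change in price for a change in area,
    holding neighborhood and style fixed."""
    return coef * delta_area

# Predicted price difference between two otherwise identical homes
# that differ by 100 units of area, with the rescaled 95% interval.
change_100 = predicted_price_change(100)
ci_100 = tuple(bound * 100 for bound in area_ci)

print(round(change_100, 2))
print(ci_100)
```

This reading applies because `area` enters Model 1 only as a single linear term.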

### Model 2

`2.` Now let's try a second model for predicting price.  This time, use `area` and `area squared` to predict price.  Also use the `style` of the home, but not `neighborhood` this time. You will again need to use your dummy variables, and add an intercept to the model. Use the results of your model to answer quiz questions 2 and 3.
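Once `area squared` is in the model, the `area` coefficient no longer has a single "per unit" reading, because the slope of predicted price with respect to area depends on the area itself. A minimal sketch, using made-up coefficients `b0`, `b1`, and `b2` (assumptions for illustration, not the fitted values):

```python
# Hypothetical coefficients for illustration only; not the fitted values.
b0, b1, b2 = 10_000.0, 300.0, 0.02

def predict(area):
    """Quadratic model: price = b0 + b1*area + b2*area**2."""
    return b0 + b1 * area + b2 * area ** 2

def marginal_effect(area):
    """Derivative of predicted price with respect to area: b1 + 2*b2*area."""
    return b1 + 2 * b2 * area

# The effect of one more unit of area differs between a small and a large home.
for area in (1000, 3000):
    one_unit_change = predict(area + 1) - predict(area)
    print(area, marginal_effect(area), round(one_unit_change, 2))
```

Because you cannot change `area` while holding `area_squared` fixed, neither coefficient can be read on its own as "the change in price per unit of area".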

In [7]:
df2['area_squared'] = df2['area']**2

In [9]:
# Linear and squared area terms plus style dummies (baseline: lodge); no neighborhood
mod = sm.OLS(df2['price'], df2[['area', 'area_squared', 'ranch', 'victorian', 'intercept']])
res = mod.fit()
print(res.summary())

OLS Regression Results                            
Dep. Variable:                  price   R-squared:                       0.678
Model:                            OLS   Adj. R-squared:                  0.678
Method:                 Least Squares   F-statistic:                     3173.
Date:                Sat, 18 Apr 2020   Prob (F-statistic):               0.00
Time:                        16:30:20   Log-Likelihood:                -84516.
No. Observations:                6028   AIC:                         1.690e+05
Df Residuals:                    6023   BIC:                         1.691e+05
Df Model:                           4                                         
Covariance Type:            nonrobust                                         
                   coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------
area           334.0146     13.525     24.696      0.000     307.501     360

In [10]:
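# Same terms as the previous model, with the neighborhood dummies A and B added back (baseline: C)
mod = sm.OLS(df2['price'], df2[['A', 'B', 'area', 'area_squared', 'ranch', 'victorian', 'intercept']])
res = mod.fit()
print(res.summary())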

OLS Regression Results                            
Dep. Variable:                  price   R-squared:                       0.919
Model:                            OLS   Adj. R-squared:                  0.919
Method:                 Least Squares   F-statistic:                 1.144e+04
Date:                Sat, 18 Apr 2020   Prob (F-statistic):               0.00
Time:                        16:31:31   Log-Likelihood:                -80345.
No. Observations:                6028   AIC:                         1.607e+05
Df Residuals:                    6021   BIC:                         1.608e+05
Df Model:                           6                                         
Covariance Type:            nonrobust                                         
                   coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------
A             -248.3127   4963.602     -0.050      0.960   -9978.750    9482