## Predicting House Prices with Linear Regression

We will fit a **linear regression** model to this data and check whether all of these attributes are significant. There are two main implementations of linear regression in Python:
- statsmodels
- sklearn


### Using statsmodels

In [76]:
import statsmodels.api as sm

In [77]:
X = sm.add_constant(X) # adding a constant

In [79]:
# OLS is Ordinary Least Squares
lin_reg = sm.OLS(y,X)

In [81]:
type(lin_reg)

statsmodels.regression.linear_model.OLS

In [82]:
model = lin_reg.fit()
print_model = model.summary()
print(print_model)

                            OLS Regression Results                            
Dep. Variable:              SalePrice   R-squared:                       0.835
Model:                            OLS   Adj. R-squared:                  0.834
Method:                 Least Squares   F-statistic:                     732.0
Date:                Wed, 27 Apr 2022   Prob (F-statistic):               0.00
Time:                        17:22:13   Log-Likelihood:                -17206.
No. Observations:                1458   AIC:                         3.443e+04
Df Residuals:                    1447   BIC:                         3.449e+04
Df Model:                          10                                         
Covariance Type:            nonrobust                                         
                   coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------
const        -8.992e+05   8.93e+04    -10.069   

### Using sklearn

In [83]:
from sklearn.linear_model import LinearRegression

In [84]:
regressor = LinearRegression()
regressor.fit(X, y)

LinearRegression()

In [85]:
print(regressor.coef_)

[     0.           5507.54189138    392.2863556   14466.78601472
    920.78618122     42.13854481     66.85496149 -11218.59562134
  11469.89475761   9314.43585305   1078.19597724]


The above are *beta coefficients*, listed in the same order as the columns of `X`. The first one is 0 because `X` still contains the constant column we added for `statsmodels`. `LinearRegression` fits its own intercept by default, so this column carries no information in `sklearn` and we could have dropped it before fitting.
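To see why the constant column gets a zero coefficient, here is a minimal sketch on synthetic data (the arrays `features` and `target` are made up for illustration and are not the house-price data). It fits `LinearRegression` once with a column of ones and once without, and compares the results.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data, assumed purely for illustration
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 3))
target = 5.0 + features @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

# Design matrix with a leading column of ones, like the one sm.add_constant produces
with_const = np.column_stack([np.ones(len(features)), features])

model_with_const = LinearRegression().fit(with_const, target)
model_without = LinearRegression().fit(features, target)

# The constant column's coefficient is (numerically) zero;
# the intercept is stored separately in intercept_
print(model_with_const.coef_)
print(model_with_const.intercept_)
print(model_without.coef_)
print(model_without.intercept_)
```

In both fits the intercept ends up in `intercept_`, and the remaining coefficients agree, so dropping the constant column before calling `fit` should not change the model.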

`statsmodels` provides a more complete and organized view of the results. Note that the *p-values* are missing from the `sklearn` output.

To view the R-squared value:

In [86]:
regressor.score(X,y)

0.8349492071391901
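For a regressor, `score` returns the coefficient of determination, $R^2 = 1 - SS_{res} / SS_{tot}$. As a rough check, the sketch below recomputes it by hand on synthetic data (assumed for illustration only) and compares it with the value from `score`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data, assumed purely for illustration
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 3))
target = features @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=200)

model = LinearRegression().fit(features, target)
predictions = model.predict(features)

ss_res = np.sum((target - predictions) ** 2)    # residual sum of squares
ss_tot = np.sum((target - target.mean()) ** 2)  # total sum of squares
r2_manual = 1.0 - ss_res / ss_tot

print(r2_manual)
print(model.score(features, target))
```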

The advantage of the `sklearn` implementation is that it follows the same interface (`fit`, `predict`, `score`) as the other models in the library.
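As a small illustration of that consistency, the following sketch runs the same `fit` and `score` calls over three different regressors. The data is synthetic, and the choice of `Ridge` and `DecisionTreeRegressor` as comparison models is an assumption made only for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.tree import DecisionTreeRegressor

# Synthetic data, assumed purely for illustration
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 3))
target = features @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

# Three different models, chosen only to show the shared interface
estimators = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "tree": DecisionTreeRegressor(max_depth=3, random_state=0),
}

scores = {}
for name, estimator in estimators.items():
    estimator.fit(features, target)                       # identical call for every model
    scores[name] = estimator.score(features, target)      # R-squared on the training data

print(scores)
```

Because every estimator exposes the same methods, swapping one model for another usually means changing a single line.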