### Polynomial regression
A simple linear regression can be extended by constructing polynomial features from the input features.

#### Equation
In the standard linear regression case, a model for two-dimensional data might look like this:
$$\hat{y}(w, x) = w_0 + w_1 x_1 + w_2 x_2$$
If we want to fit a paraboloid to the data instead of a plane, we can combine the features in second-order polynomials, so that the model looks like this:
$$\hat{y}(w, x) = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_1 x_2 + w_4 x_1^2 + w_5 x_2^2$$
The (sometimes surprising) observation is that this is still a linear model. To see this, imagine creating a new set of features:
$$z = [x_1, x_2, x_1 x_2, x_1^2, x_2^2]$$
With this re-labeling of the data, our problem can be written as a model that is linear in the weights $w$:
$$\hat{y}(w, z) = w_0 + w_1 z_1 + w_2 z_2 + w_3 z_3 + w_4 z_4 + w_5 z_5$$
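
To make the re-labeling concrete, the minimal sketch below builds the feature vector $z$ by hand with NumPy and fits it with ordinary least squares. The synthetic data, the weights in `w_true`, and the helper name `quadratic_features` are assumptions made up for illustration; they are not taken from the house price data used later.

```python
import numpy as np

def quadratic_features(X):
    """Map columns [x1, x2] to z = [x1, x2, x1*x2, x1^2, x2^2]."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1 * x2, x1**2, x2**2])

# Hypothetical noise-free data generated from a known paraboloid.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
w_true = np.array([1.0, 2.0, -3.0, 0.5, 4.0, -1.5])  # w_0 ... w_5

Z = quadratic_features(X)
# Prepend a column of ones so that w_0 is fitted as part of the linear system.
A = np.column_stack([np.ones(len(Z)), Z])
y = A @ w_true

# Ordinary least squares on z: the model is quadratic in x but linear in w.
w_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(w_hat, 3))
```

Because the targets were generated without noise, the fitted weights should match `w_true` up to floating-point error. With real data the fit would only be approximate.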

### 1. Data Loading

In [1]:
import pandas as pd

# Feature tables for the training and test splits.
X_train = pd.read_csv('data/house_prices/X_train.csv')
X_test = pd.read_csv('data/house_prices/X_test.csv')
# The target files are read without a header row.
y_train = pd.read_csv('data/house_prices/y_train.csv', header=None)
y_test = pd.read_csv('data/house_prices/y_test.csv', header=None)
X_train.head(5)

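| Unnamed: 0 | 1stFlrSF | 2ndFlrSF | 3SsnPorch | BedroomAbvGr | BldgType | BsmtCond | BsmtExposure | BsmtFinSF1 | BsmtFinSF2 | BsmtFinType1 | ... | SaleType | ScreenPorch | Street | TotRmsAbvGrd | TotalBsmtSF | Utilities | WoodDeckSF | YearBuilt | YearRemodAdd | YrSold |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1054 | 0 | 0 | 3 | 0 | 4 | 1 | 763 | 0 | 2 | ... | 8 | 0 | 1 | 6 | 936 | 0 | 120 | 1963 | 1963 | 2010 |
| 1 | 1120 | 0 | 0 | 3 | 0 | 4 | 4 | 206 | 0 | 0 | ... | 6 | 0 | 1 | 6 | 1120 | 0 | 0 | 2007 | 2007 | 2007 |
| 2 | 1616 | 0 | 0 | 3 | 0 | 4 | 0 | 0 | 0 | 6 | ... | 8 | 0 | 1 | 7 | 1616 | 0 | 208 | 2005 | 2005 | 2006 |
| 3 | 1073 | 0 | 0 | 3 | 0 | 4 | 4 | 836 | 0 | 0 | ... | 8 | 0 | 1 | 6 | 1073 | 0 | 0 | 1965 | 1965 | 2007 |
| 4 | 1389 | 0 | 0 | 2 | 0 | 4 | 0 | 1071 | 123 | 0 | ... | 8 | 0 | 1 | 6 | 1389 | 0 | 240 | 1974 | 1975 | 2006 |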


### 2. Polynomial Regression

In [2]:
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

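# Expand the inputs into all polynomial terms up to degree 2.
polynomial_features = PolynomialFeatures(degree=2)
X_train_poly = polynomial_features.fit_transform(X_train)
# Reuse the transformer fitted on the training data; do not refit on the test set.
X_test_poly = polynomial_features.transform(X_test)

# Ordinary least squares on the expanded features.
regressor = LinearRegression()
regressor.fit(X_train_poly, y_train)
y_pred = regressor.predict(X_test_poly)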
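
One way to keep the training and test transformations consistent is to wrap both steps in a scikit-learn pipeline, so that the feature expansion is fitted only when `fit` is called and is merely applied when `predict` is called. The sketch below is an alternative arrangement rather than this notebook's own code, and it uses small synthetic arrays (`X_demo` and `y_demo` are hypothetical stand-ins, not the house price data).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data: a noise-free quadratic function of two features.
rng = np.random.default_rng(0)
X_demo = rng.uniform(-1.0, 1.0, size=(100, 2))
y_demo = 1.0 + 2.0 * X_demo[:, 0] - X_demo[:, 1] ** 2

# Simple split: first 80 rows for fitting, last 20 rows held out.
X_fit, X_new = X_demo[:80], X_demo[80:]
y_fit, y_new = y_demo[:80], y_demo[80:]

# fit() fits PolynomialFeatures and LinearRegression on the training rows only;
# predict() reuses the fitted expansion on the new rows.
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X_fit, y_fit)
y_new_pred = model.predict(X_new)

print(model.score(X_new, y_new))  # R squared on the held-out rows
```

Since the synthetic target is an exact degree-2 polynomial, the held-out score should be very close to 1. That is a property of this toy data, not a prediction about the house price model.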

In [3]:
import math
from sklearn.metrics import r2_score
from sklearn.metrics import mean_squared_error

r2_variance_weighted = r2_score(y_test, y_pred, multioutput='variance_weighted')
r2_uniform_average = r2_score(y_test, y_pred, multioutput='uniform_average')
print('R squared:{:.2f}'.format(r2_uniform_average))
mse = mean_squared_error(y_test, y_pred)
rmse = math.sqrt(mse)
print('root mean square error: {:.2f}'.format(rmse))

R squared:-1226.11
root mean square error: 3001446.96
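
An R squared value below zero is possible. The score is defined as

$$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$$

so it becomes negative whenever the predictions have a larger squared error than a constant prediction of the mean of the targets. The sketch below illustrates this with plain NumPy on made-up numbers (`y_true_demo`, `y_pred_demo`, and the helper `r2_and_rmse` are hypothetical and unrelated to the values printed above).

```python
import numpy as np

def r2_and_rmse(y_true, y_pred):
    """Return (R squared, RMSE) for 1-D arrays of targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # spread around the mean
    r2 = 1.0 - ss_res / ss_tot
    rmse = np.sqrt(ss_res / len(y_true))
    return r2, rmse

y_true_demo = [1.0, 2.0, 3.0, 4.0]
y_pred_demo = [4.0, 3.0, 2.0, 1.0]  # deliberately worse than predicting the mean

r2, rmse = r2_and_rmse(y_true_demo, y_pred_demo)
print('R squared: {:.2f}'.format(r2))
print('root mean square error: {:.2f}'.format(rmse))
```

In this toy case the residual sum of squares is 20 and the total sum of squares is 5, so R squared is 1 - 20/5 = -3.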
