# Polynomial Regression

Polynomial regression can be treated as a form of linear regression. Whether a regression model is linear or non-linear is determined by whether it is linear in the regression coefficients, not by whether it is linear in the independent variables.
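
To make this concrete, here is a short worked form for a single input variable. The symbols $w_k$, $z_k$ and $d$ are notation introduced here for illustration and do not appear elsewhere in this notebook.

$$
y = w_0 + w_1 x + w_2 x^2 + \dots + w_d x^d
$$

Substituting $z_k = x^k$ gives

$$
y = w_0 + w_1 z_1 + w_2 z_2 + \dots + w_d z_d
$$

which is an ordinary linear regression on the transformed features $z_1, \dots, z_d$. The model is non-linear in $x$ but linear in every coefficient $w_k$.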

In [1]:
# Use PolynomialFeatures to expand the input features into polynomial features.

from sklearn.preprocessing import PolynomialFeatures
import numpy as np

# Transform [x1, x2] into [1, x1, x2, x1^2, x1x2, x2^2]
X = np.arange(4).reshape(2, 2)
print('Degree-1 features:\n', X)

# degree=2 generates every polynomial term up to degree 2
poly = PolynomialFeatures(degree=2)
poly.fit(X)
poly_ftr = poly.transform(X)
print('Transformed degree-2 polynomial features:\n', poly_ftr)

Degree-1 features:
 [[0 1]
 [2 3]]
Transformed degree-2 polynomial features:
 [[1. 0. 1. 0. 0. 1.]
 [1. 2. 3. 4. 6. 9.]]


Each input row \[x1, x2\] is expanded to \[1, x1, x2, x1^2, x1x2, x2^2\]. When \[x1, x2\] = \[0, 1\], the result is \[1, 0, 1, 0, 0, 1\], and when \[x1, x2\] = \[2, 3\], the result is \[1, 2, 3, 4, 6, 9\].

### Example \#1

In [2]:
# Generate sample targets from the cubic polynomial y = 1 + 2*x1 + 3*x1^2 + 4*x2^3

def polynomial_func(X):
    y = 1 + 2*X[:,0] + 3*X[:,0]**2 + 4*X[:,1]**3
    return y

X = np.arange(4).reshape(2, 2)
print('Degree-1 features:\n', X)
y = polynomial_func(X)
print('Cubic polynomial target values:\n', y)

Degree-1 features:
 [[0 1]
 [2 3]]
Cubic polynomial target values:
 [  5 125]


In [4]:
# Expand the inputs into degree-3 polynomial features
poly_ftr = PolynomialFeatures(degree=3).fit_transform(X)
print('Degree-3 polynomial features:\n', poly_ftr)

# Fit a LinearRegression model on the expanded features and inspect the coefficient of each feature
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(poly_ftr, y)
print('Polynomial regression coefficients\n', np.round(model.coef_, 2))
print('Polynomial regression coefficient shape:', model.coef_.shape)

Degree-3 polynomial features:
 [[ 1.  0.  1.  0.  0.  1.  0.  0.  0.  1.]
 [ 1.  2.  3.  4.  6.  9.  8. 12. 18. 27.]]
Polynomial regression coefficients
 [0.   0.18 0.18 0.36 0.54 0.72 0.72 1.08 1.62 2.34]
Polynomial regression coefficient shape: (10,)
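
The fitted coefficients above differ from the generating polynomial (intercept 1, then 2 for x1, 3 for x1^2 and 4 for x2^3) because two samples cannot pin down ten coefficients: infinitely many coefficient vectors fit the two points exactly. The sketch below repeats the fit with more data. The 100 uniformly random samples and the names `X_many`, `y_many` and `model_many` are assumptions made here for illustration.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

def polynomial_func(X):
    return 1 + 2*X[:, 0] + 3*X[:, 0]**2 + 4*X[:, 1]**3

# Assumption: 100 random samples in [-1, 1]^2, far more than the 10 features
rng = np.random.default_rng(0)
X_many = rng.uniform(-1, 1, size=(100, 2))
y_many = polynomial_func(X_many)

poly_many = PolynomialFeatures(degree=3).fit_transform(X_many)
model_many = LinearRegression().fit(poly_many, y_many)

# The constant term is reported in intercept_, so the bias column gets coefficient 0
print('Intercept:', np.round(model_many.intercept_, 2))
print('Coefficients:', np.round(model_many.coef_, 2))
```

With enough noise-free samples, the intercept comes out close to 1 and the only non-zero coefficients are those of x1, x1^2 and x2^3.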


### Example \#2 (Same, but using a Pipeline)

In [7]:
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
import numpy as np

def polynomial_func(X):
    y = 1 + 2*X[:,0] + 3*X[:,0]**2 + 4*X[:,1]**3
    return y

# Create Pipeline
model = Pipeline([('poly', PolynomialFeatures(degree=3)),
                 ('linear', LinearRegression())])
X = np.arange(4).reshape(2, 2)
y = polynomial_func(X)

model = model.fit(X, y)

print('Polynomial regression coefficients\n', np.round(model.named_steps['linear'].coef_, 2))

Polynomial regression coefficients
 [0.   0.18 0.18 0.36 0.54 0.72 0.72 1.08 1.62 2.34]


## Polynomial Regression's Overfitting and Underfitting Problem

Although a high-degree polynomial regression can model very complex relationships, raising the degree increases the risk of overfitting the training dataset, which can degrade the evaluation metrics for predictions on the test dataset.

![Example of overfitting/underfitting](https://scikit-learn.org/stable/_images/sphx_glr_plot_underfitting_overfitting_001.png)

[Source: scikit-learn example "Underfitting vs. Overfitting"](https://scikit-learn.org/stable/auto_examples/model_selection/plot_underfitting_overfitting.html)


The graphs show that when the degree is too low (degree 1), the model underfits, and when the degree is too high (degree 15), the model overfits. An intermediate degree (degree 4) follows the true function closely.
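
The same effect can be measured numerically. The sketch below is modelled on the linked scikit-learn example and assumes its setup: 30 noisy samples of cos(1.5πx), with degrees 1, 4 and 15 compared by 10-fold cross-validation. The `results` dictionary is a name introduced here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# Assumed data: noisy samples of cos(1.5 * pi * x) on [0, 1]
rng = np.random.RandomState(0)
n_samples = 30
X = np.sort(rng.rand(n_samples))
y = np.cos(1.5 * np.pi * X) + rng.randn(n_samples) * 0.1
X_col = X[:, np.newaxis]  # scikit-learn expects a 2-D feature array

results = {}
for degree in (1, 4, 15):
    pipe = Pipeline([('poly', PolynomialFeatures(degree=degree, include_bias=False)),
                     ('linear', LinearRegression())])
    # Held-out error, estimated with 10-fold cross-validation
    cv_mse = -cross_val_score(pipe, X_col, y,
                              scoring='neg_mean_squared_error', cv=10).mean()
    # Training error, measured on the same data the model was fitted on
    pipe.fit(X_col, y)
    train_mse = mean_squared_error(y, pipe.predict(X_col))
    results[degree] = (train_mse, cv_mse)
    print(f'degree={degree:2d}  train MSE={train_mse:.4f}  CV MSE={cv_mse:.4g}')
```

Training error keeps shrinking as the degree grows, while the cross-validated error is lowest for the intermediate degree. That gap between training and held-out error is the signature of overfitting.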

## Bias-Variance Trade-off

There is usually a trade-off between bias and variance: lowering one tends to raise the other. Underfitting corresponds to high bias and low variance, and overfitting corresponds to low bias and high variance.

![Bias/Variance](https://www.cs.cornell.edu/courses/cs4780/2018fa/lectures/images/bias_variance/bullseye.png)
Illustration of bias and variance

![Bias-Variance Tradeoff](http://scott.fortmann-roe.com/docs/docs/BiasVariance/biasvariance.png)
Tradeoff between bias and variance

Hence, the key to this trade-off is to find the model complexity at which the total error is minimised.
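
Bias and variance can be estimated by simulation when the true function is known. The sketch below is illustrative only. It assumes a true function cos(1.5πx), Gaussian noise with standard deviation 0.1, a fixed grid of 30 training inputs and 200 resampled training sets; all of these settings and names are choices made here. It uses NumPy's `Polynomial.fit` in place of the scikit-learn pipeline to keep the loop short.

```python
import numpy as np
from numpy.polynomial import Polynomial

def true_func(x):
    return np.cos(1.5 * np.pi * x)

rng = np.random.default_rng(0)
noise_sd = 0.1                      # assumed noise level
x_train = np.linspace(0, 1, 30)     # fixed training inputs
x_test = np.linspace(0, 1, 100)     # points where predictions are compared
n_repeats = 200                     # number of simulated training sets

decomposition = {}
for degree in (1, 4, 12):
    preds = np.empty((n_repeats, x_test.size))
    for r in range(n_repeats):
        # Draw a fresh noisy training set and refit the polynomial
        y_train = true_func(x_train) + rng.normal(0, noise_sd, x_train.size)
        preds[r] = Polynomial.fit(x_train, y_train, degree)(x_test)
    # Squared bias: how far the average prediction is from the truth
    bias_sq = np.mean((preds.mean(axis=0) - true_func(x_test)) ** 2)
    # Variance: how much predictions change from one training set to another
    variance = np.mean(preds.var(axis=0))
    decomposition[degree] = (bias_sq, variance)
    print(f'degree={degree:2d}  bias^2={bias_sq:.5f}  variance={variance:.5f}  '
          f'sum={bias_sq + variance:.5f}')
```

The expected squared error is the sum of squared bias, variance and the irreducible noise `noise_sd**2`. In this setup the squared bias falls and the variance rises as the degree increases, so the sum is smallest at some intermediate complexity.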