# Polynomial Regression

Implementing polynomial regression with scikit-learn is very similar to linear regression. There is only one extra step: you need to transform the array of inputs to include non-linear terms such as 𝑥².
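Written out for a single input and degree 2, the model being fitted can be sketched as follows. The symbols $b_0$, $b_1$, $b_2$ (intercept and coefficients) are notation assumed here for illustration:

$$
f(x) = b_0 + b_1 x + b_2 x^2
$$

The function is non-linear in 𝑥 but linear in the unknowns $b_0$, $b_1$, $b_2$, which is why ordinary linear regression can estimate them once 𝑥² is supplied as an extra input column.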

## Step 1: Import packages and classes

In [1]:
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

## Step 2 (a): Provide data

In [3]:
x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([15, 11, 2, 8, 25, 32])

## Step 2 (b): Transform input data

As you’ve seen earlier, polynomial regression requires 𝑥² (and perhaps higher-order terms) as additional features. For that reason, you should transform the input array x so that it contains additional column(s) with the values of 𝑥² (and possibly more features).

In [4]:
transformer = PolynomialFeatures(degree=2, include_bias=False)

Before applying transformer, you need to fit it with .fit():

In [5]:
transformer.fit(x)

PolynomialFeatures(degree=2, include_bias=False, interaction_only=False,
                   order='C')

Once transformer is fitted, it’s ready to create a new, modified input. You apply .transform() to do that:

In [6]:
x_ = transformer.transform(x)

You can also use .fit_transform() to replace the three previous statements with only one:

In [8]:
x_ = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)
print(x_)

[[   5.   25.]
 [  15.  225.]
 [  25.  625.]
 [  35. 1225.]
 [  45. 2025.]
 [  55. 3025.]]
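For a single input column, the degree-2 transformation without the bias column amounts to placing x and x² side by side. The sketch below rebuilds that array with plain NumPy and compares it with the output of PolynomialFeatures. The name `x_manual` is introduced here purely for illustration.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))

# Build the two columns [x, x**2] by hand (x_manual is an illustrative name)
x_manual = np.hstack([x, x ** 2]).astype(float)

# The same transformation done by scikit-learn
x_ = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)

# Check that both arrays hold the same values
print(np.allclose(x_manual, x_))
```

With more than one input column, PolynomialFeatures also generates cross terms between columns, so this manual shortcut only covers the single-feature case used here.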


## Step 3: Create a model and fit it

In [9]:
model = LinearRegression().fit(x_, y)

## Step 4: Get results

In [11]:
r_sq = model.score(x_, y)
print('coefficient of determination:', r_sq)
print('intercept:', model.intercept_)
print('coefficients:', model.coef_)

coefficient of determination: 0.8908516262498564
intercept: 21.372321428571425
coefficients: [-1.32357143  0.02839286]
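As a cross-check that does not involve scikit-learn, the same least-squares quadratic can be fitted with `np.polyfit`. This is only a sketch of an alternative route, not part of the workflow above, and the names `b0`, `b1`, `b2` are chosen here for illustration.

```python
import numpy as np

x = np.array([5, 15, 25, 35, 45, 55])
y = np.array([15, 11, 2, 8, 25, 32])

# np.polyfit returns coefficients from the highest power down: [b2, b1, b0]
b2, b1, b0 = np.polyfit(x, y, deg=2)

print('intercept:', b0)
print('coefficients:', [b1, b2])
```

Note that `np.polyfit` expects a one-dimensional x, so the `.reshape((-1, 1))` used for scikit-learn is omitted. The values should agree with `model.intercept_` and `model.coef_` up to floating-point rounding.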


You can obtain the same fit with different transformation and regression arguments:

In [13]:
x_ = PolynomialFeatures(degree=2, include_bias=True).fit_transform(x)
x_

array([[1.000e+00, 5.000e+00, 2.500e+01],
       [1.000e+00, 1.500e+01, 2.250e+02],
       [1.000e+00, 2.500e+01, 6.250e+02],
       [1.000e+00, 3.500e+01, 1.225e+03],
       [1.000e+00, 4.500e+01, 2.025e+03],
       [1.000e+00, 5.500e+01, 3.025e+03]])

The first column of x_ contains ones, the second holds the values of x, and the third holds the squares of x.

The intercept is already included with the leftmost column of ones, and you don’t need to include it again when creating the instance of LinearRegression. Thus, you can provide fit_intercept=False. This is how the next statement looks:

In [17]:
model = LinearRegression(fit_intercept=False).fit(x_, y)

The variable model is again fitted on the new input array x_, so x_ (not x) must be passed as the first argument to .fit().

This approach yields the following results, which match the previous case except that the intercept now appears as the first element of .coef_ while .intercept_ is zero:

In [18]:
r_sq = model.score(x_, y)
print('coefficient of determination:', r_sq)
print('intercept:', model.intercept_)
print('coefficients:', model.coef_)

coefficient of determination: 0.8908516262498565
intercept: 0.0
coefficients: [21.37232143 -1.32357143  0.02839286]
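To make the equivalence of the two setups concrete, the following sketch fits both variants on the same data and compares their parameters. The names `model_a`, `model_b`, `x_no_bias`, `x_bias`, and `params_a` are introduced here for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([15, 11, 2, 8, 25, 32])

# Variant A: no bias column, LinearRegression estimates the intercept itself
x_no_bias = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)
model_a = LinearRegression().fit(x_no_bias, y)

# Variant B: bias column of ones, intercept fitting switched off
x_bias = PolynomialFeatures(degree=2, include_bias=True).fit_transform(x)
model_b = LinearRegression(fit_intercept=False).fit(x_bias, y)

# Put variant A's intercept in front of its coefficients and compare
params_a = np.concatenate(([model_a.intercept_], model_a.coef_))
print(np.allclose(params_a, model_b.coef_))
```

Both variants describe the same polynomial; they differ only in where the constant term is stored.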


## Step 5: Predict response

In [19]:
y_pred = model.predict(x_)
print('predicted response:', y_pred, sep='\n')

predicted response:
[15.46428571  7.90714286  6.02857143  9.82857143 19.30714286 34.46428571]


As you can see, prediction works almost the same way as in linear regression. It just requires the modified input x_ instead of the original x.
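Because every prediction needs the transformed input, it can be convenient to bundle the transformation and the regression into one object. The sketch below uses scikit-learn's `make_pipeline` for that, which is one possible approach rather than the method used above. The inputs in `x_new` are hypothetical values chosen only to show the call.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([15, 11, 2, 8, 25, 32])

# The pipeline applies PolynomialFeatures before LinearRegression on fit and predict
pipe = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
)
pipe.fit(x, y)

# Predict for the original inputs; no manual transformation is needed
print('predicted response:', pipe.predict(x), sep='\n')

# Predict for new, hypothetical inputs (same two-dimensional shape as x)
x_new = np.array([20, 40]).reshape((-1, 1))
print('new predictions:', pipe.predict(x_new), sep='\n')
```

The pipeline receives the original x directly, which avoids accidentally passing untransformed data to the regression model.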