# Polynomial features

Polynomial features are new features created by raising existing features to a power.

For example, if a dataset has one input feature X, a polynomial feature is a new feature (column) whose values are calculated by squaring the values in X, i.e. X^2. This process can be repeated for each input variable in the dataset, creating a transformed version of each.
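As a minimal sketch of that idea, the squared column can be built by hand with NumPy. The array values and the names `x_demo`, `x_squared` and `x_aug` are illustrative assumptions and are not used in the cells that follow.

```python
import numpy as np

# Illustrative single input feature (values chosen only for this example)
x_demo = np.arange(1, 6)

# Polynomial feature of degree 2: square every value of the input feature
x_squared = x_demo ** 2

# Place the original and the squared feature side by side as two columns
x_aug = np.column_stack([x_demo, x_squared])
print(x_aug)
```

Each row of `x_aug` holds one sample: the original value in the first column and its square in the second.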

As such, polynomial features are a type of feature engineering, i.e. the creation of new input features based on the existing features.

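The “*degree*” of the polynomial controls how many features are added. For a single input variable, a degree of 3 adds two new variables, X^2 and X^3. With several input variables the count grows faster, because scikit-learn's `PolynomialFeatures` also generates products of different variables (interaction terms). Typically a small degree is used, such as 2 or 3.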
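The effect of the degree on the number of columns can be checked directly. The sketch below is only an illustration: the arrays `X_one` and `X_two` and the `counts` dictionary are made-up names, and `include_bias=False` is set so that the constant column of ones is not counted.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Illustrative data: one dataset with a single feature, one with two features
X_one = np.arange(1, 6).reshape(-1, 1)      # shape (5, 1)
X_two = np.arange(1, 11).reshape(-1, 2)     # shape (5, 2)

counts = {}
for degree in (2, 3):
    # include_bias=False drops the constant column of ones
    poly = PolynomialFeatures(degree=degree, include_bias=False)
    counts[("one", degree)] = poly.fit_transform(X_one).shape[1]
    counts[("two", degree)] = poly.fit_transform(X_two).shape[1]

print(counts)
```

With one input feature the column count equals the degree. With two input features it rises to 5 at degree 2 and 9 at degree 3, which is one reason a small degree is usually preferred.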

In [1]:
import numpy as np
import pandas as pd

# Generating Features and Labels

In [2]:
X = np.arange(1,11)
X

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [3]:
# Labels: each y is the matching X plus 100, an exactly linear relationship
y = np.arange(101, 111)
y

array([101, 102, 103, 104, 105, 106, 107, 108, 109, 110])

In [4]:
# scikit-learn expects a 2-D array of shape (n_samples, n_features)
X = X.reshape(-1, 1)

In [6]:
X

array([[ 1],
       [ 2],
       [ 3],
       [ 4],
       [ 5],
       [ 6],
       [ 7],
       [ 8],
       [ 9],
       [10]])

# Feature Engineering by creating Polynomial Features

In [13]:
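from sklearn.preprocessing import PolynomialFeatures

# degree=3 yields the columns 1, X, X^2 and X^3
# (the leading column of ones comes from the default include_bias=True)
poly_feat = PolynomialFeatures(degree=3)
X_features = poly_feat.fit_transform(X)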

In [14]:
X_features

array([[   1.,    1.,    1.,    1.],
       [   1.,    2.,    4.,    8.],
       [   1.,    3.,    9.,   27.],
       [   1.,    4.,   16.,   64.],
       [   1.,    5.,   25.,  125.],
       [   1.,    6.,   36.,  216.],
       [   1.,    7.,   49.,  343.],
       [   1.,    8.,   64.,  512.],
       [   1.,    9.,   81.,  729.],
       [   1.,   10.,  100., 1000.]])

# Data Splitting and Scaling

In [15]:
from sklearn.model_selection import train_test_split

In [16]:
# Hold out 20% of the rows (2 of 10) for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X_features, y, test_size=0.2, random_state=42)

In [17]:
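from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
# Learn each column's mean and standard deviation from the training rows only
X_train_Scaled = sc.fit_transform(X_train)
# Reuse the training statistics on the test rows to avoid information leakage
X_test_Scaled = sc.transform(X_test)
# Note: the constant bias column has zero variance, so it is scaled to all zeros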

In [18]:
X_train_Scaled

array([[ 0.        ,  0.18569534, -0.04897021, -0.22239992],
       [ 0.        , -1.67125804, -1.19160846, -0.9054854 ],
       [ 0.        ,  0.92847669,  0.86514039,  0.71803403],
       [ 0.        , -0.92847669, -0.930434  , -0.82287971],
       [ 0.        ,  1.67125804,  2.04042545,  2.2684792 ],
       [ 0.        , -0.18569534, -0.40808509, -0.51151982],
       [ 0.        , -0.55708601, -0.70190635, -0.70532547],
       [ 0.        ,  0.55708601,  0.37543828,  0.18109708]])

In [19]:
X_test_Scaled

array([[ 0.        ,  1.29986737,  1.42013611,  1.40747379],
       [ 0.        , -1.29986737, -1.09366804, -0.8832454 ]])