# Polynomial Regression

This method deals with models where the dependent variable $y$ is assumed to obey a *linear polynomial* relationship with the independent variable $x_1$,

\begin{equation}
    y = b_0 + b_1 x_1 + b_2 x_1^2 + \ldots + b_n x_1^n.
\end{equation}

Note that here *linear* refers to how the coefficients $b_i$ enter the model: $y$ is a linear combination of the $b_i$, even though it is nonlinear in $x_1$.
Linear polynomial regression is useful in modelling disease spread and numerous other systems.
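
Because the model is linear in the coefficients, fitting it reduces to an ordinary least-squares problem on a matrix whose columns are the powers of $x_1$. The following is a minimal sketch of that idea using NumPy only. The data are synthetic and generated from an assumed quadratic, so they are not part of the dataset used in this notebook.

```python
import numpy as np

# Synthetic, noise-free data from an assumed quadratic: y = 2 + 3x + 0.5x^2
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 + 3.0 * x + 0.5 * x**2

# Design matrix with columns x^0, x^1, x^2
A = np.vander(x, N=3, increasing=True)

# Solve the linear least-squares problem A @ b ~ y for the coefficients b
b, *_ = np.linalg.lstsq(A, y, rcond=None)
print(b)
```

Since the data contain no noise, the recovered `b` should match the coefficients used to generate `y` up to floating-point rounding.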

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

dataset = pd.read_csv('Position_Salaries.csv')
dataset

| | Position | Level | Salary |
|---|---|---|---|
| 0 | Business Analyst | 1 | 45000 |
| 1 | Junior Consultant | 2 | 50000 |
| 2 | Senior Consultant | 3 | 60000 |
| 3 | Manager | 4 | 80000 |
| 4 | Country Manager | 5 | 110000 |
| 5 | Region Manager | 6 | 150000 |
| 6 | Partner | 7 | 200000 |
| 7 | Senior Partner | 8 | 300000 |
| 8 | C-level | 9 | 500000 |
| 9 | CEO | 10 | 1000000 |


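We want `X` to be a matrix of features, even though it has only a single column, so we select it with the slice `1:2` rather than the index `1`. Remember that the upper limit in `1:2` is not included. As we only have 10 entries, we don't have enough data to split into a training and a test set. We also skip feature scaling: ordinary least squares, which `LinearRegression` uses, gives the same predictions whether or not the features are rescaled.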

In [2]:
X = dataset.iloc[:, 1:2].values
y = dataset.iloc[:, 2].values
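
To illustrate why the slice `1:2` is used instead of the plain index `1`, here is a small sketch. The frame `demo` is a hypothetical three-row stand-in with the same column layout as the dataset, built inline so that the example does not depend on the CSV file.

```python
import pandas as pd

# Hypothetical three-row frame mirroring the dataset's column layout
demo = pd.DataFrame({
    "Position": ["Business Analyst", "Junior Consultant", "Senior Consultant"],
    "Level": [1, 2, 3],
    "Salary": [45000, 50000, 60000],
})

X_2d = demo.iloc[:, 1:2].values  # slice keeps a column axis: 2-D array
x_1d = demo.iloc[:, 1].values    # integer index drops it: 1-D array

print(X_2d.shape, x_1d.shape)
```

scikit-learn estimators expect the feature array to be two-dimensional (samples by features), which is what the slice provides.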

## Fitting the Linear Regression to the dataset

As a reference model, we will first fit the data with a linear model.

In [3]:
from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
lin_reg.fit(X, y)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
         normalize=False)

## Fitting the Polynomial Regression to the dataset

We transform `X` to include polynomial terms up to the power $n=2$.

In [7]:
from sklearn.preprocessing import PolynomialFeatures
poly_reg = PolynomialFeatures(degree = 2) 
X_poly = poly_reg.fit_transform(X)
X_poly

array([[  1.,   1.,   1.],
       [  1.,   2.,   4.],
       [  1.,   3.,   9.],
       [  1.,   4.,  16.],
       [  1.,   5.,  25.],
       [  1.,   6.,  36.],
       [  1.,   7.,  49.],
       [  1.,   8.,  64.],
       [  1.,   9.,  81.],
       [  1.,  10., 100.]])

Note that the column of ones for $x^0$ was added automatically. Let's now transform `X` to include terms up to the power $n=4$.

In [8]:
poly_reg = PolynomialFeatures(degree = 4) 
X_poly = poly_reg.fit_transform(X)
X_poly

array([[1.000e+00, 1.000e+00, 1.000e+00, 1.000e+00, 1.000e+00],
       [1.000e+00, 2.000e+00, 4.000e+00, 8.000e+00, 1.600e+01],
       [1.000e+00, 3.000e+00, 9.000e+00, 2.700e+01, 8.100e+01],
       [1.000e+00, 4.000e+00, 1.600e+01, 6.400e+01, 2.560e+02],
       [1.000e+00, 5.000e+00, 2.500e+01, 1.250e+02, 6.250e+02],
       [1.000e+00, 6.000e+00, 3.600e+01, 2.160e+02, 1.296e+03],
       [1.000e+00, 7.000e+00, 4.900e+01, 3.430e+02, 2.401e+03],
       [1.000e+00, 8.000e+00, 6.400e+01, 5.120e+02, 4.096e+03],
       [1.000e+00, 9.000e+00, 8.100e+01, 7.290e+02, 6.561e+03],
       [1.000e+00, 1.000e+01, 1.000e+02, 1.000e+03, 1.000e+04]])
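
The column of ones noted earlier is controlled by the `include_bias` parameter of `PolynomialFeatures`. Since `LinearRegression` already fits an intercept by default (`fit_intercept=True` in the output above), the extra column is redundant here, though keeping it does no harm. The sketch below rebuilds the `Level` column as a NumPy array, which is an assumption made only to keep the example self-contained, and compares the two settings.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Levels 1..10 as a single-column feature matrix (stand-in for X)
X_demo = np.arange(1, 11).reshape(-1, 1)

# Default: includes the x^0 column of ones
with_bias = PolynomialFeatures(degree=2).fit_transform(X_demo)

# include_bias=False: only the x^1 and x^2 columns
without_bias = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X_demo)

print(with_bias.shape, without_bias.shape)
```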

In [None]:
# Fit a second linear model, this time on the polynomial features
lin_reg_2 = LinearRegression()
lin_reg_2.fit(X_poly, y)
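
As a rough check of the two models, the sketch below refits both on the `Level` and `Salary` values from the table shown earlier and compares their $R^2$ scores on the training data. The arrays are typed in by hand rather than read from the CSV, and the query level `6.5` is a hypothetical input chosen only for illustration. The main point is that new inputs must pass through the same polynomial transform before being given to `lin_reg_2`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Level and Salary columns copied from the table above
X = np.arange(1, 11).reshape(-1, 1)
y = np.array([45000, 50000, 60000, 80000, 110000,
              150000, 200000, 300000, 500000, 1000000])

# Reference model: straight line
lin_reg = LinearRegression().fit(X, y)

# Polynomial model: linear regression on powers of X up to degree 4
poly_reg = PolynomialFeatures(degree=4)
X_poly = poly_reg.fit_transform(X)
lin_reg_2 = LinearRegression().fit(X_poly, y)

# A new input (hypothetical level) needs the same transform before predict
X_new = np.array([[6.5]])
pred_lin = lin_reg.predict(X_new)
pred_poly = lin_reg_2.predict(poly_reg.transform(X_new))

# R^2 on the training data; there is no held-out test set with 10 rows
r2_lin = lin_reg.score(X, y)
r2_poly = lin_reg_2.score(X_poly, y)
print(pred_lin, pred_poly, r2_lin, r2_poly)
```

Because these scores are computed on the same 10 points used for fitting, a higher $R^2$ for the polynomial model shows a closer fit to this data, not necessarily better predictions on unseen data.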