# Polynomial Regression Notebook

#### *Author: Kunyu He*
#### *University of Chicago, CAPP'20*

In [24]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

%matplotlib notebook

### Load Data

In [4]:
salary = pd.read_csv("Position_Salaries.csv")
salary.head()

| | Position | Level | Salary |
|---|---|---|---|
| 0 | Business Analyst | 1 | 45000 |
| 1 | Junior Consultant | 2 | 50000 |
| 2 | Senior Consultant | 3 | 60000 |
| 3 | Manager | 4 | 80000 |
| 4 | Country Manager | 5 | 110000 |


### Data Cleaning

In [5]:
salary.isnull().sum()

Position    0
Level       0
Salary      0
dtype: int64

No values are missing in any column.

### Feature Selection

In [11]:
# Slice with 1:2 (rather than 1) so that X stays two-dimensional, as scikit-learn expects
X = salary.iloc[:, 1:2].values
X.shape

(10, 1)

In [12]:
y = salary.Salary.values
y.shape

(10,)

### Model Training

With only ten observations, we fit the models on the whole data set and hold nothing out for testing.

#### Simple linear regression (SLR) as the benchmark

In [15]:
lin_reg = LinearRegression().fit(X, y)

#### Polynomial Regression

In [20]:
poly = PolynomialFeatures(degree=2)
X_poly2 = poly.fit_transform(X)
X_poly2

array([[  1.,   1.,   1.],
       [  1.,   2.,   4.],
       [  1.,   3.,   9.],
       [  1.,   4.,  16.],
       [  1.,   5.,  25.],
       [  1.,   6.,  36.],
       [  1.,   7.,  49.],
       [  1.,   8.,  64.],
       [  1.,   9.,  81.],
       [  1.,  10., 100.]])

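Notice that `PolynomialFeatures` has inserted the constant term for us: the first column of ones. `LinearRegression` also fits its own intercept by default, so this column is redundant but does not change the fitted values.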
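To see what the constant column does in practice, here is a minimal sketch that builds the degree-2 features with and without it and compares the fitted values. The arrays `X_demo` and `y_demo` are illustrative names that hold only the five rows shown by `salary.head()`, not the full data set.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Illustrative data (assumption): only the five rows printed by salary.head().
X_demo = np.array([[1], [2], [3], [4], [5]])
y_demo = np.array([45000, 50000, 60000, 80000, 110000])

# Default behaviour: a leading column of ones is added.
X_with_bias = PolynomialFeatures(degree=2).fit_transform(X_demo)
# include_bias=False drops that column and keeps only x and x^2.
X_no_bias = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X_demo)

# LinearRegression fits an intercept by default in both cases.
reg_with_bias = LinearRegression().fit(X_with_bias, y_demo)
reg_no_bias = LinearRegression().fit(X_no_bias, y_demo)

print(X_with_bias.shape, X_no_bias.shape)  # (5, 3) (5, 2)

# Both models span the same functions of x, so their fitted values agree.
same_fit = np.allclose(
    reg_with_bias.predict(X_with_bias), reg_no_bias.predict(X_no_bias)
)
print(same_fit)  # True
```

Either form works with `LinearRegression`; dropping the column only removes the duplicate constant term.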

In [34]:
poly2_reg = LinearRegression().fit(X_poly2, y)

### Model Evaluation

In [35]:
plt.scatter(X, y, color="red")
plt.plot(X, lin_reg.predict(X), color="blue")

plt.title("Salary against Position Level (with SLR)")
plt.xlabel("Position Level")
plt.ylabel("Salary ($)")
plt.show()


In [42]:
plt.figure()  # new figure, so this plot is not drawn onto the previous one
plt.scatter(X, y, color="red")
plt.plot(X, poly2_reg.predict(X_poly2), color="blue")

plt.title("Salary against Position Level (with polynomial of degree 2)")
plt.xlabel("Position Level")
plt.ylabel("Salary ($)")
plt.show()
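The fitted curve is drawn only through the observed levels, so it shows up as connected line segments. As a sketch of one way to get a smoother curve, we can evaluate the model on a denser grid of levels. The names `X_demo`, `y_demo`, `poly_demo`, `reg_demo`, `X_grid` and `y_grid` are illustrative, and the data is limited to the five rows shown by `salary.head()`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Illustrative data (assumption): only the five rows printed by salary.head().
X_demo = np.array([[1], [2], [3], [4], [5]])
y_demo = np.array([45000, 50000, 60000, 80000, 110000])

poly_demo = PolynomialFeatures(degree=2)
reg_demo = LinearRegression().fit(poly_demo.fit_transform(X_demo), y_demo)

# 100 evenly spaced levels between the smallest and largest observed level,
# reshaped into a column because scikit-learn expects a 2-D feature array.
X_grid = np.linspace(X_demo.min(), X_demo.max(), 100).reshape(-1, 1)

# Reuse the already-fitted transformer: transform(), not fit_transform().
y_grid = reg_demo.predict(poly_demo.transform(X_grid))

print(X_grid.shape, y_grid.shape)  # (100, 1) (100,)
```

Passing `X_grid` and `y_grid` to `plt.plot` in place of the observed points would draw the smoother curve.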


In [44]:
poly = PolynomialFeatures(degree=3)
X_poly3 = poly.fit_transform(X)
poly3_reg = LinearRegression().fit(X_poly3, y)

In [45]:
plt.figure()  # new figure, so this plot is not drawn onto the previous one
plt.scatter(X, y, color="red")
plt.plot(X, poly3_reg.predict(X_poly3), color="blue")

plt.title("Salary against Position Level (with polynomial of degree 3)")
plt.xlabel("Position Level")
plt.ylabel("Salary ($)")
plt.show()


In [47]:
poly = PolynomialFeatures(degree=4)
X_poly4 = poly.fit_transform(X)
poly4_reg = LinearRegression().fit(X_poly4, y)

In [48]:
plt.figure()  # new figure, so this plot is not drawn onto the previous one
plt.scatter(X, y, color="red")
plt.plot(X, poly4_reg.predict(X_poly4), color="blue")

plt.title("Salary against Position Level (with polynomial of degree 4)")
plt.xlabel("Position Level")
plt.ylabel("Salary ($)")
plt.show()
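The plots compare the fits visually. As a rough numeric complement, the sketch below computes the training R² for degrees 1 to 4. It is not part of the original analysis: `X_demo`, `y_demo` and `train_r2` are illustrative names, and the data is limited to the five rows shown by `salary.head()`, so the scores will differ from those on the full data set.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.preprocessing import PolynomialFeatures

# Illustrative data (assumption): only the five rows printed by salary.head().
X_demo = np.array([[1], [2], [3], [4], [5]])
y_demo = np.array([45000, 50000, 60000, 80000, 110000])

train_r2 = {}
for degree in range(1, 5):
    # Degree 1 is the simple linear regression benchmark.
    X_deg = PolynomialFeatures(degree=degree).fit_transform(X_demo)
    model = LinearRegression().fit(X_deg, y_demo)
    # R^2 on the same data the model was fitted to (no held-out set).
    train_r2[degree] = r2_score(y_demo, model.predict(X_deg))

for degree, score in train_r2.items():
    print(f"degree {degree}: training R^2 = {score:.4f}")
```

Training R² cannot decrease as the degree grows, because each higher-degree model contains the lower-degree ones. A rising score here therefore shows a closer fit to the training points, not better predictions on new data.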
