# Supervised Learning Algorithms: Linear vs Polynomial Regression

*In this template, only the **data input** and the **input/target variables** need to be specified (see the "Data Input and Variables" section for instructions). No other section needs to be adjusted. The data input example uses a .csv file from an IBM Box web repository.*

## 1. Libraries

*Run to import the required libraries.*

In [1]:
%matplotlib notebook
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

## 2. Data Input and Variables

*Define the data input and the input (X) and target (y) variables, then run the code. Do not rename **['df', 'X', 'y']**; the later sections depend on these names.*

In [2]:
### Data Input
# df = 

### Defining Variables  
# X = 
# y = 

### Data Input Example 
df = pd.read_csv('https://ibm.box.com/shared/static/q6iiqb1pd7wo8r3q28jvgsrprzezjqk3.csv')

X = df[['horsepower']]
y = df['price']
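As an illustration of the variable definitions above, the following sketch selects more than one input column and drops rows with missing values first. The small DataFrame and its column names are hypothetical stand-ins, not the real data set, and the `demo_` names are chosen so that `df`, `X`, and `y` are left untouched.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data; the values are illustrative only.
demo_df = pd.DataFrame({
    'horsepower': [111.0, 154.0, np.nan, 102.0, 115.0],
    'curb-weight': [2548, 2823, 2337, 2824, 2507],
    'price': [13495.0, 16500.0, 13950.0, 17450.0, 15250.0],
})

# Drop rows that have a missing value in any column the model will use.
cols = ['horsepower', 'curb-weight', 'price']
demo_df = demo_df.dropna(subset=cols)

# Double brackets keep the inputs two-dimensional (a DataFrame);
# single brackets give a one-dimensional Series for the target.
demo_X = demo_df[['horsepower', 'curb-weight']]
demo_y = demo_df['price']

print(demo_X.shape, demo_y.shape)
```

The double brackets matter even with a single input column, which is why the example cell uses `df[['horsepower']]` rather than `df['horsepower']`.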

## 3. The Models

### 3.1. Linear Regression

*Run to build the Linear Regression model.*

In [4]:
from sklearn.linear_model import LinearRegression

# Split into training and test sets (default split: 75% train, 25% test)
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                   random_state=0)

# Fit the linear regression model on the training set
linreg = LinearRegression().fit(X_train, y_train)

# Coefficient, intercept, and R-squared on the training and test sets
print('linear model coeff (w): {}'
     .format(linreg.coef_))
print('linear model intercept (b): {:.3f}'
     .format(linreg.intercept_))
print('R-squared score (training): {:.3f}'
     .format(linreg.score(X_train, y_train)))
print('R-squared score (test): {:.3f}'
     .format(linreg.score(X_test, y_test)))

linear model coeff (w): [157.09522969]
linear model intercept (b): -3574.121
R-squared score (training): 0.623
R-squared score (test): 0.666
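To make the printed quantities concrete, the sketch below reproduces a prediction and the R-squared score by hand. It runs on synthetic data generated for illustration (the slope, intercept, and noise level are assumptions, not values from the real data set) and uses `demo_` names to avoid overwriting the notebook variables.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic one-feature data; all constants here are illustrative assumptions.
rng = np.random.default_rng(0)
demo_X = rng.uniform(50, 250, size=(40, 1))
demo_y = 150.0 * demo_X[:, 0] - 3000.0 + rng.normal(0, 2000, size=40)

demo_model = LinearRegression().fit(demo_X, demo_y)

# Prediction by hand: y_hat = w * x + b
y_hat = demo_model.coef_[0] * demo_X[:, 0] + demo_model.intercept_

# R-squared by hand: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((demo_y - y_hat) ** 2)
ss_tot = np.sum((demo_y - demo_y.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

print(np.isclose(r2_manual, demo_model.score(demo_X, demo_y)))
```

Read the same way, the output above describes the line price ≈ 157.095 × horsepower − 3574.121, which explains about 62% of the price variance on the training set and about 67% on the test set.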


### 3.2. Polynomial Regression

*Run to build the Polynomial Regression model.*

In [6]:
from sklearn.preprocessing import PolynomialFeatures

# Expand the original input with polynomial features up to degree 2.
# For a single input column x, this yields the columns [1, x, x^2].
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)

# Split into training and test sets (same random_state as in 3.1)
X_train, X_test, y_train, y_test = train_test_split(X_poly, y,
                                                   random_state=0)
# Fit a linear regression on the polynomial features
linreg = LinearRegression().fit(X_train, y_train)

# Coefficients, intercept, and R-squared on the training and test sets
print('(poly deg 2) linear model coeff (w):\n{}'
     .format(linreg.coef_))
print('(poly deg 2) linear model intercept (b): {:.3f}'
     .format(linreg.intercept_))
print('(poly deg 2) R-squared score (training): {:.3f}'
     .format(linreg.score(X_train, y_train)))
print('(poly deg 2) R-squared score (test): {:.3f}\n'
     .format(linreg.score(X_test, y_test)))

(poly deg 2) linear model coeff (w):
[0.00000000e+00 1.40923904e+02 6.48999886e-02]
(poly deg 2) linear model intercept (b): -2683.607
(poly deg 2) R-squared score (training): 0.623
(poly deg 2) R-squared score (test): 0.670
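The three coefficients in the output above correspond to the three columns that `PolynomialFeatures(degree=2)` produces for a single input. The sketch below checks this on a tiny hand-made array (the values are arbitrary) by building the same columns manually.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Tiny illustrative input with one feature column.
demo_X = np.array([[2.0], [3.0], [5.0]])

# Degree-2 expansion: a constant column, the feature, and its square.
expanded = PolynomialFeatures(degree=2).fit_transform(demo_X)
manual = np.column_stack([np.ones(3), demo_X[:, 0], demo_X[:, 0] ** 2])
print(np.allclose(expanded, manual))

# include_bias=False omits the constant column, leaving only [x, x^2].
no_bias = PolynomialFeatures(degree=2, include_bias=False).fit_transform(demo_X)
print(no_bias.shape)
```

The constant column duplicates the intercept that `LinearRegression` already fits, which is consistent with the first coefficient being 0 in the output above. Passing `include_bias=False` would be one way to drop it.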



### 3.3. Polynomial Regression with Regularization

*Run to build the Polynomial Regression model with a regularization penalty.*

In [7]:
from sklearn.linear_model import Ridge

# Adding many polynomial features often leads to overfitting, so polynomial
# features are commonly combined with a regularized regression such as ridge.
# Ridge() uses the default penalty strength alpha=1.0.
# X_poly comes from section 3.2, so run that cell first.

X_train, X_test, y_train, y_test = train_test_split(X_poly, y,
                                                   random_state=0)
linreg = Ridge().fit(X_train, y_train)

# Coefficients, intercept, and R-squared on the training and test sets
print('(poly deg 2 + ridge) linear model coeff (w):\n{}'
     .format(linreg.coef_))
print('(poly deg 2 + ridge) linear model intercept (b): {:.3f}'
     .format(linreg.intercept_))
print('(poly deg 2 + ridge) R-squared score (training): {:.3f}'
     .format(linreg.score(X_train, y_train)))
print('(poly deg 2 + ridge) R-squared score (test): {:.3f}'
     .format(linreg.score(X_test, y_test)))

(poly deg 2 + ridge) linear model coeff (w):
[0.00000000e+00 1.40908895e+02 6.49573697e-02]
(poly deg 2 + ridge) linear model intercept (b): -2682.747
(poly deg 2 + ridge) R-squared score (training): 0.623
(poly deg 2 + ridge) R-squared score (test): 0.670
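The ridge results above are almost identical to the unregularized ones. One plausible reason is that the ridge penalty acts on the raw coefficient sizes, so its effect depends on the scale of the features, and the template imports `StandardScaler` without using it. The sketch below is one possible variant, not the template's method: it standardizes the polynomial features inside a pipeline and compares a few penalty strengths. The data are synthetic and the `alpha` values are arbitrary assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic one-feature data; all constants here are illustrative assumptions.
rng = np.random.default_rng(0)
demo_X = rng.uniform(50, 250, size=(80, 1))
demo_y = (150.0 * demo_X[:, 0] + 0.05 * demo_X[:, 0] ** 2
          + rng.normal(0, 2000, size=80))

Xtr, Xte, ytr, yte = train_test_split(demo_X, demo_y, random_state=0)

coef_norms = {}
test_scores = {}
for alpha in (0.01, 1.0, 100.0):
    # Expand, standardize, then fit ridge; the scaler is fit on training data only.
    model = make_pipeline(
        PolynomialFeatures(degree=2, include_bias=False),
        StandardScaler(),
        Ridge(alpha=alpha),
    )
    model.fit(Xtr, ytr)
    coef_norms[alpha] = np.linalg.norm(model.named_steps['ridge'].coef_)
    test_scores[alpha] = model.score(Xte, yte)

for alpha in coef_norms:
    print(alpha, coef_norms[alpha], test_scores[alpha])
```

Larger `alpha` values shrink the coefficients more strongly. Whether that helps the test score depends on the data, so `alpha` is normally chosen by validation rather than fixed in advance.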
