## Quiz 1
The file poly_reg_data.csv contains data generated for one predictor feature ('Var_X') and one outcome feature ('Var_Y'), following a non-linear trend. Use sklearn's PolynomialFeatures class to expand the predictor feature column into multiple columns of polynomial features.

1. Load in the data
  * The data is in the file called 'poly_reg_data.csv'. Note that this data has a header line.
  * Make sure that you've split out the data into the predictor feature in X and outcome feature in y.
  * For X, make sure it is in a 2-d array of 20 rows by 1 column. You might need to use NumPy's reshape function to accomplish this.
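
The reshaping requirement above can be sketched on a tiny frame. This is only an illustration: `demo` holds three rows copied from the table in this quiz rather than the full file, and the variable names are made up for the example.

```python
import pandas as pd

# Three rows copied from the quiz table; the real file has 20 rows.
demo = pd.DataFrame({"Var_X": [-0.33532, 0.02160, -1.19438],
                     "Var_Y": [6.66854, 3.86398, 5.16161]})

x_1d = demo["Var_X"].to_numpy()              # single brackets give a 1-D array, shape (3,)
x_2d_reshape = x_1d.reshape(-1, 1)           # -1 lets NumPy infer the number of rows
x_2d_brackets = demo[["Var_X"]].to_numpy()   # double brackets keep the data 2-D

print(x_1d.shape, x_2d_reshape.shape, x_2d_brackets.shape)
```

Both 2-D forms hold the same values; sklearn estimators and transformers expect X in this rows-by-columns layout.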

In [3]:
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

train_data = pd.read_csv("poly_reg_data.csv")
print(train_data)

      Var_X     Var_Y
0  -0.33532   6.66854
1   0.02160   3.86398
2  -1.19438   5.16161
3  -0.65046   8.43823
4  -0.28001   5.57201
5   1.93258 -11.13270
6   1.22620  -5.31226
7   0.74727  -4.63725
8   3.32853   3.80650
9   2.87457  -6.06084
10 -1.48662   7.22328
11  0.37629   2.38887
12  1.43918  -7.13415
13  0.24183   2.00412
14 -2.79140   4.29794
15  1.08176  -5.86553
16  2.81555  -5.20711
17  0.54924  -3.52863
18  2.36449 -10.16202
19 -1.01925   5.31123


In [45]:
# Select Var_X with double brackets so it stays 2-D (20 rows x 1 column),
# then convert it to a NumPy array for fit_transform()
X = train_data[['Var_X']].to_numpy()
print(X)
print(X.shape)

# Alternative approach
print(train_data['Var_X'].values.reshape(-1, 1))

[[-0.33532]
 [ 0.0216 ]
 [-1.19438]
 [-0.65046]
 [-0.28001]
 [ 1.93258]
 [ 1.2262 ]
 [ 0.74727]
 [ 3.32853]
 [ 2.87457]
 [-1.48662]
 [ 0.37629]
 [ 1.43918]
 [ 0.24183]
 [-2.7914 ]
 [ 1.08176]
 [ 2.81555]
 [ 0.54924]
 [ 2.36449]
 [-1.01925]]
(20, 1)
[[-0.33532]
 [ 0.0216 ]
 [-1.19438]
 [-0.65046]
 [-0.28001]
 [ 1.93258]
 [ 1.2262 ]
 [ 0.74727]
 [ 3.32853]
 [ 2.87457]
 [-1.48662]
 [ 0.37629]
 [ 1.43918]
 [ 0.24183]
 [-2.7914 ]
 [ 1.08176]
 [ 2.81555]
 [ 0.54924]
 [ 2.36449]
 [-1.01925]]


In [42]:
y = train_data[['Var_Y']].to_numpy()
print(y)
print(y.shape)

[[  6.66854]
 [  3.86398]
 [  5.16161]
 [  8.43823]
 [  5.57201]
 [-11.1327 ]
 [ -5.31226]
 [ -4.63725]
 [  3.8065 ]
 [ -6.06084]
 [  7.22328]
 [  2.38887]
 [ -7.13415]
 [  2.00412]
 [  4.29794]
 [ -5.86553]
 [ -5.20711]
 [ -3.52863]
 [-10.16202]
 [  5.31123]]
(20, 1)


2. Create polynomial features
  * Create an instance of sklearn's PolynomialFeatures class and assign it to the variable poly_feat. Pay attention to how to set the degree of features, since that will be how the exercise is evaluated.
  * Create the polynomial features by using the PolynomialFeatures object's .fit_transform() method. The "fit" side of the method considers how many features are needed in the output, and the "transform" side applies those considerations to the data provided to the method as an argument. Assign the new feature matrix to the X_poly variable.
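
To make the "fit" and "transform" sides concrete, here is a small sketch on made-up inputs (`X_demo` is not the quiz data). With a single input column and the default `include_bias=True`, degree d produces d + 1 columns: the constant 1, then x, x^2, up to x^d.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two made-up inputs, chosen so the powers are easy to check by hand.
X_demo = np.array([[2.0], [3.0]])

poly = PolynomialFeatures(degree=3)
X_demo_poly = poly.fit_transform(X_demo)

# Each row is [1, x, x^2, x^3]: [1, 2, 4, 8] and [1, 3, 9, 27]
print(X_demo_poly)
# Number of output columns determined during fit
print(poly.n_output_features_)
```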

In [19]:
poly_feat = PolynomialFeatures(degree=3) # degree defaults to 2
X_poly = poly_feat.fit_transform(X)
print(X_poly.shape)

(20, 4)


3. Build a polynomial regression model
  * Create a polynomial regression model by combining sklearn's LinearRegression class with the polynomial features. Assign the fit model to poly_model.

In [28]:
poly_model = LinearRegression()
poly_model.fit(X_poly, y)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

In [53]:
# Predict a value
# - Build the same polynomial features for the value to be predicted
#   (transform() reuses the already-fitted poly_feat; no refit is needed)
sample = poly_feat.transform([[2.875]])
# - Use the degree-3 features to make the prediction
poly_model.predict(sample)

array([[-5.81934351]])
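
As an optional alternative, the two steps can be chained with sklearn's `make_pipeline`, so the polynomial expansion is applied automatically at both fit and predict time. This is a sketch on synthetic, noise-free data (not the quiz data), and the names `x_syn`, `y_syn`, and `cubic_pipeline` are made up for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic, noise-free cubic (NOT the quiz data): y = 1 + 2x - x^3
x_syn = np.linspace(-3, 3, 20).reshape(-1, 1)
y_syn = 1 + 2 * x_syn.ravel() - x_syn.ravel() ** 3

# The pipeline expands features, then fits the linear model on them.
cubic_pipeline = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
cubic_pipeline.fit(x_syn, y_syn)

# Raw inputs go straight in; the pipeline builds the polynomial features itself.
pred = cubic_pipeline.predict([[2.0]])
print(pred)  # close to -3, since 1 + 2*2 - 2**3 = -3
```

With a pipeline there is no separate `X_poly` to keep in sync, which removes the risk of predicting on features built with a different degree than the model was trained on.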