## Codio Exercise 8.2: Using `PolynomialFeatures`

**Estimated time: 60 minutes**

**Total Points: 20 Points**


This activity focuses on using the scikit-learn transformer `PolynomialFeatures`.  As seen in video 8.4, once the transformer is fit, its `.get_feature_names_out()` method supplies appropriate column names for building a DataFrame from the transformed data.  You will create second and third degree polynomial features from a single column, create quadratic features from two columns, and convert each result to a pandas DataFrame.

## Index:

 - [Problem 1](#Problem-1)
 - [Problem 2](#Problem-2)
 - [Problem 3](#Problem-3)
 - [Problem 4](#Problem-4)

In [None]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures

### The Data

Again, the automobile dataset is used.  You will build the additional features using the `horsepower` column of the data.  

In [None]:
auto = pd.read_csv('data/auto.csv')

In [None]:
auto.head()

[Back to top](#Index:) 

## Problem 1

### Creating Quadratic Features

**5 Points**

Using the `horsepower` column, instantiate, fit, and transform a `PolynomialFeatures` transformer, using the `degree` argument to create second degree polynomial features.  Assign your transformer to the variable `pfeatures` and the transformed features (a NumPy array) to `quad_features` below.  Note that `PolynomialFeatures` expects 2-dimensional input; one way to provide this is to select the single column with double brackets, as in `auto[[colname]]`.
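The difference between single and double brackets is easy to check on a tiny frame. This sketch uses a hypothetical three-row DataFrame in place of `auto`; it shows that `df["horsepower"]` is a 1-D Series, which `fit_transform` rejects with a `ValueError`, while `df[["horsepower"]]` is a 2-D DataFrame that it accepts.

```python
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical stand-in for the auto data: a few sample horsepower values
df = pd.DataFrame({"horsepower": [130.0, 165.0, 150.0]})

print(df["horsepower"].shape)    # (3,): a 1-D Series
print(df[["horsepower"]].shape)  # (3, 1): a 2-D DataFrame

# Passing the 1-D Series fails input validation
try:
    PolynomialFeatures(degree=2).fit_transform(df["horsepower"])
    one_d_rejected = False
except ValueError:
    one_d_rejected = True

# Passing the 2-D selection works and returns a NumPy array
pfeatures = PolynomialFeatures(degree=2)
quad_features = pfeatures.fit_transform(df[["horsepower"]])
print(type(quad_features), quad_features.shape)  # columns: bias, x, x^2
```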

In [None]:
### GRADED

pfeatures = ''
quad_features = ''

### BEGIN SOLUTION
pfeatures = PolynomialFeatures(degree=2)
quad_features = pfeatures.fit_transform(auto[['horsepower']])
### END SOLUTION

# Answer check
print(type(quad_features))

In [None]:
### BEGIN HIDDEN TESTS
pfeatures_ = PolynomialFeatures()
quad_features_ = pfeatures_.fit_transform(auto[['horsepower']])
#
#
#
np.testing.assert_array_equal(quad_features, quad_features_, err_msg=f'Arrays are not equal, expected {quad_features_}')
assert type(pfeatures) == type(pfeatures_)
### END HIDDEN TESTS

[Back to top](#Index:) 

## Problem 2

### Creating the DataFrame

**5 Points**

Using the transformed array, create a DataFrame of the transformed data.  As in the lectures, use the `get_feature_names_out()` method of your fit transformer `pfeatures` from above to supply the column names.  Drop the bias term (the constant column named `1`) so that only the columns `horsepower` and `horsepower^2` remain.  Assign your response as a DataFrame to the variable `poly_features_df` below.

In [None]:
### GRADED

poly_features_df = ''

### BEGIN SOLUTION
pfeatures = PolynomialFeatures()
quad_features = pfeatures.fit_transform(auto[['horsepower']])
poly_features_df = pd.DataFrame(quad_features, columns=pfeatures.get_feature_names_out()).iloc[:, 1:]
### END SOLUTION

# Answer check
print(poly_features_df.shape)
poly_features_df.head()

In [None]:
### BEGIN HIDDEN TESTS
pfeatures_ = PolynomialFeatures()
quad_features_ = pfeatures_.fit_transform(auto[['horsepower']])
poly_features_df_ = pd.DataFrame(quad_features_, columns=pfeatures_.get_feature_names_out()).iloc[:, 1:]
#
#
#
assert poly_features_df.shape == poly_features_df_.shape
pd.testing.assert_frame_equal(poly_features_df, poly_features_df_)
### END HIDDEN TESTS

[Back to top](#Index:) 

## Problem 3

### DataFrame with Cubic Features

**5 Points**

Now, use a transformer to create a DataFrame with features for a third degree (cubic) polynomial model by setting `degree=3` in your transformer.  As before, drop the bias term so that your final DataFrame has shape `(392, 3)` and the feature names `horsepower`, `horsepower^2`, and `horsepower^3`.  Assign your result as a DataFrame to `cubic_features_df` below.

In [None]:
### GRADED

cubic_features_df = ''

### BEGIN SOLUTION
pfeatures = PolynomialFeatures(degree=3)
cubic_features = pfeatures.fit_transform(auto[['horsepower']])
cubic_features_df = pd.DataFrame(cubic_features, columns=pfeatures.get_feature_names_out()).iloc[:, 1:]
### END SOLUTION

# Answer check
print(cubic_features_df.shape)
cubic_features_df.head()

In [None]:
### BEGIN HIDDEN TESTS
pfeatures_ = PolynomialFeatures(degree = 3)
cubic_features_ = pfeatures_.fit_transform(auto[['horsepower']])
cubic_features_df_ = pd.DataFrame(cubic_features_, columns=pfeatures_.get_feature_names_out()).iloc[:, 1:]
#
#
#
assert cubic_features_df.shape == cubic_features_df_.shape
pd.testing.assert_frame_equal(cubic_features_df, cubic_features_df_)
### END HIDDEN TESTS

[Back to top](#Index:) 

## Problem 4

### Experimenting with Multiple Features

**5 Points**

Now, experiment with building polynomial features for multiple columns.  Specifically, use a transformer to create a DataFrame of quadratic features (`degree=2`) for the columns `horsepower` and `weight`.  Drop the bias term as before and examine the column names.  Note the new column `horsepower weight`: an interaction term, the product of the two input columns.  Assign your transformed data as a DataFrame to `two_feature_poly_df` below.

In [None]:
### GRADED

two_feature_poly_df = ''

### BEGIN SOLUTION
pfeatures = PolynomialFeatures(degree=2)
two_features = pfeatures.fit_transform(auto[['horsepower', 'weight']])
two_feature_poly_df = pd.DataFrame(two_features, columns=pfeatures.get_feature_names_out()).iloc[:, 1:]
### END SOLUTION

# Answer check
print(two_feature_poly_df.shape)
two_feature_poly_df.head()

In [None]:
### BEGIN HIDDEN TESTS
pfeatures_ = PolynomialFeatures(degree = 2)
two_features_ = pfeatures_.fit_transform(auto[['horsepower', 'weight']])
two_feature_poly_df_ = pd.DataFrame(two_features_, columns=pfeatures_.get_feature_names_out()).iloc[:, 1:]
#
#
#
assert two_feature_poly_df.shape == two_feature_poly_df_.shape
assert set(list(two_feature_poly_df.columns)) == set(list(two_feature_poly_df_.columns))
### END HIDDEN TESTS

#### Summary

Now that you have the hang of using `PolynomialFeatures`, you will combine the transformer with an estimator using scikit-learn's pipeline utilities.  As demonstrated in the videos, a pipeline is a convenient abstraction that combines the data transformations and the model in a single object.  This is especially useful when making predictions on new data points, because the same transformations are applied automatically before the model predicts.
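As a preview of that pattern, here is a minimal sketch on synthetic data: an exact quadratic, not the auto dataset. It uses `make_pipeline` with `LinearRegression`. The choice of estimator is an assumption for illustration, and the videos may build the pipeline differently. New inputs only need the raw `horsepower` column, because the pipeline generates the polynomial terms itself.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (not the auto dataset): y is an exact quadratic in horsepower
X = pd.DataFrame({"horsepower": np.linspace(50, 200, 20)})
y = 3.0 + 0.5 * X["horsepower"] + 0.01 * X["horsepower"] ** 2

# Transformer and estimator are fit together in one call
model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                      LinearRegression())
model.fit(X, y)

# New points only need the raw column; the pipeline adds horsepower^2 itself
X_new = pd.DataFrame({"horsepower": [100.0, 150.0]})
print(model.predict(X_new))
```

Since the synthetic target is an exact quadratic, the predictions should be close to 153 and 303.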