For this project we will create an algorithm that takes in a pandas dataframe, creates polynomial (higher-degree) features from a numeric column, and returns a new dataframe with the polynomial features.
Given a numerical feature, we may want to create higher-degree features. So given a value x, we may want its powers x, x^2, ..., x^p up to some chosen degree p.
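As a quick illustration of the idea, here is a minimal sketch using NumPy. The value `x` and the degree `p` are arbitrary choices made for this example; the higher-degree features are simply the successive powers of the value.

```python
import numpy as np

x = 2.0  # an arbitrary example value (assumption for illustration)
p = 4    # an arbitrary highest degree (assumption for illustration)

# Powers x^1, x^2, ..., x^p of the single value x.
powers = np.array([x**i for i in range(1, p + 1)])
print(powers.tolist())  # [2.0, 4.0, 8.0, 16.0]
```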
Now, given a dataframe, we want to do the same for an entire numeric column.
import numpy as np
import pandas as pd

# We want to construct a function that takes in a dataframe
# and returns a new one with the specified polynomial features:
# (df, column_name, highest degree we want) |---> new df with poly features
def create_poly_from_df(df, column_name, p):
    # Work on a copy so the input is never modified.
    df_copy = df.copy()
    # With p == 1 there are no new features to add.
    if p == 1:
        return df_copy
    # Isolate the chosen feature. A Series is itself the feature,
    # so it must not be indexed by column_name.
    is_series = isinstance(df, pd.Series)
    data_col = (df if is_series else df[column_name]).to_numpy().copy()
    # Run a loop that creates each polynomial feature and its column name.
    features = []
    new_feature_names = []
    for i in range(1, p + 1):
        features.append(data_col**i)  # the degree-i feature
        if i == 1:
            new_feature_names.append(column_name)
        else:
            new_feature_names.append(column_name + f'^{i}')
    # Dataframe of polynomial features, sharing the index of the input.
    poly_df = pd.DataFrame(np.array(features).T,
                           columns=new_feature_names,
                           index=df_copy.index)
    if is_series:
        return poly_df
    # Drop the original feature because it is already included in poly_df.
    df_copy = df_copy.drop(column_name, axis=1)
    # Concatenate the old dataframe with the new one.
    return pd.concat([df_copy, poly_df], axis=1)
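The loop over degrees can also be replaced by NumPy broadcasting. The sketch below is one possible variant, not the function used in the rest of this project: the name `poly_features_broadcast` and the toy dataframe are hypothetical and exist only for illustration. It assumes `df` is a dataframe (not a Series) and that the chosen column is numeric.

```python
import numpy as np
import pandas as pd

def poly_features_broadcast(df, column_name, p):
    """Hypothetical variant of create_poly_from_df built on broadcasting."""
    x = df[column_name].to_numpy()
    degrees = np.arange(1, p + 1)
    # x[:, None] has shape (n, 1) and degrees has shape (p,),
    # so the result has shape (n, p): column j holds x**(j + 1).
    powers = x[:, None] ** degrees
    names = [column_name] + [f'{column_name}^{i}' for i in range(2, p + 1)]
    poly_df = pd.DataFrame(powers, columns=names, index=df.index)
    # Same column layout as above: remaining columns first, then the powers.
    return pd.concat([df.drop(columns=column_name), poly_df], axis=1)

# A toy dataframe made up for this example.
toy = pd.DataFrame({'x': [1.0, 2.0, 3.0], 'label': ['a', 'b', 'c']})
print(poly_features_broadcast(toy, 'x', 3))
```

Broadcasting computes all powers in a single array operation, which avoids building a Python list of arrays and transposing it afterwards.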
We can apply our function to the iris dataset's sepal_length feature.
iris = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
iris.head()
| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
create_poly_from_df(iris,'sepal_length',3).head()
| | sepal_width | petal_length | petal_width | species | sepal_length | sepal_length^2 | sepal_length^3 |
|---|---|---|---|---|---|---|---|
| 0 | 3.5 | 1.4 | 0.2 | setosa | 5.1 | 26.01 | 132.651 |
| 1 | 3.0 | 1.4 | 0.2 | setosa | 4.9 | 24.01 | 117.649 |
| 2 | 3.2 | 1.3 | 0.2 | setosa | 4.7 | 22.09 | 103.823 |
| 3 | 3.1 | 1.5 | 0.2 | setosa | 4.6 | 21.16 | 97.336 |
| 4 | 3.6 | 1.4 | 0.2 | setosa | 5.0 | 25.00 | 125.000 |
We see our new dataframe has sepal_length, sepal_length squared, and sepal_length cubed.
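The displayed values can be checked by hand. The short sketch below copies the first five rows from the table above and compares them against the powers computed directly with NumPy; the array names are chosen for this example only.

```python
import numpy as np

# First five sepal_length values and their displayed powers, copied from the table.
sepal_length = np.array([5.1, 4.9, 4.7, 4.6, 5.0])
squared = np.array([26.01, 24.01, 22.09, 21.16, 25.00])
cubed = np.array([132.651, 117.649, 103.823, 97.336, 125.000])

# np.allclose tolerates floating-point rounding in the displayed values.
print(np.allclose(sepal_length**2, squared))  # True
print(np.allclose(sepal_length**3, cubed))    # True
```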
We can create higher-degree features in the same way. Here we pass a dataframe containing only sepal_length and ask for powers up to degree 6.
create_poly_from_df(iris[['sepal_length']],'sepal_length',6).head()
| | sepal_length | sepal_length^2 | sepal_length^3 | sepal_length^4 | sepal_length^5 | sepal_length^6 |
|---|---|---|---|---|---|---|
| 0 | 5.1 | 26.01 | 132.651 | 676.5201 | 3450.25251 | 17596.287801 |
| 1 | 4.9 | 24.01 | 117.649 | 576.4801 | 2824.75249 | 13841.287201 |
| 2 | 4.7 | 22.09 | 103.823 | 487.9681 | 2293.45007 | 10779.215329 |
| 3 | 4.6 | 21.16 | 97.336 | 447.7456 | 2059.62976 | 9474.296896 |
| 4 | 5.0 | 25.00 | 125.000 | 625.0000 | 3125.00000 | 15625.000000 |