# Feature Engineering

Feature engineering is the process of applying domain-specific knowledge to derive new variables (features) from the available data, for use in subsequent steps.

Loosely speaking, we'll be adding extra columns, e.g. for categorisation or for the benefit of certain ML models.

In [None]:
import pandas as pd

In [None]:
df = pd.read_csv("../data/CompleteUG_clean.csv")
df.head(10)

In [None]:
df.shape

## Example - work out the quartile of the Overall Score

In [None]:
# Create a function to work out the quartile. This could be done a number of ways, but as an illustration...

def my_quantiles(df, col):
    # Cut-off values; quartile 1 is the top 25% of values
    top_cut = df[col].quantile(.75)
    median_cut = df[col].quantile(.5)
    bottom_cut = df[col].quantile(.25)
    # Nested function (a closure) that evaluates a given value against the cut-offs above
    def get_quantile(val):
        if val > top_cut: return 1
        if val > median_cut: return 2
        if val > bottom_cut: return 3
        return 4
    return get_quantile

# Assign "prepped" function to a variable
Overall_Score_Quartile = my_quantiles(df, "Overall_Score")

# Create a new column by applying this function to each score
df["Score_Quartile"] = df["Overall_Score"].apply(Overall_Score_Quartile)
df.head(50)
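
As one of the other ways mentioned in the comment above, pandas' `pd.qcut` can bin a column into quartiles directly. The sketch below uses a small made-up frame (`demo_df` and its scores are illustrative, not taken from the CSV) and labels the top quartile 1 to match the function above. Values that fall exactly on a cut-off may be binned differently from the strict `>` comparison used in `my_quantiles`.

```python
import pandas as pd

# Hypothetical scores, for illustration only
demo_df = pd.DataFrame({"Overall_Score": [10, 20, 30, 40, 50, 60, 70, 80]})

# qcut splits the values into 4 equal-sized bins, lowest values first.
# Labels are given in that same order, so the lowest bin gets 4 and the highest gets 1.
demo_df["Score_Quartile"] = pd.qcut(
    demo_df["Overall_Score"], q=4, labels=[4, 3, 2, 1]
).astype(int)

demo_df
```

This avoids writing the comparison logic by hand, at the cost of less control over how boundary values are treated.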

## Example - adding categories as columns

In [None]:
used_cars_df = pd.DataFrame([
    {"Make": "Mercedes", "Model": "C-Class", "Value": 10000},
    {"Make": "Mercedes", "Model": "E-Class", "Value": 15000},
    {"Make": "BMW", "Model": "X3", "Value": 20000},
    {"Make": "Audi", "Model": "Q5", "Value": 25000},
    {"Make": "VW", "Model": "Golf", "Value": 5000},
])
used_cars_df

In [None]:
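# Turn categories into a matrix of indicator columns, one per category (one-hot encoding).
# Depending on the pandas version, these hold 1s and 0s or True/False (pandas 2.0+ defaults to booleans).

# Many models, especially in scikit-learn, need categorical data in this numeric form to work properly.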

pd.get_dummies(used_cars_df["Make"])

In [None]:
pd.get_dummies(used_cars_df["Make"], prefix="Make")

In [None]:
used_cars_df = used_cars_df.join(pd.get_dummies(used_cars_df["Make"], prefix="Make"))

# (join() matches rows index-to-index)

In [None]:
used_cars_df
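
The encode-then-join steps above can also be done in a single call by passing the whole DataFrame to `pd.get_dummies` with the `columns` argument. The sketch below rebuilds the same used-car data under a new, illustrative name (`cars_df`) so it runs on its own; `dtype=int` is passed to get 1s and 0s regardless of pandas version. Unlike the join approach, this form drops the original `Make` column.

```python
import pandas as pd

# Same illustrative data as above, rebuilt so this cell is self-contained
cars_df = pd.DataFrame([
    {"Make": "Mercedes", "Model": "C-Class", "Value": 10000},
    {"Make": "Mercedes", "Model": "E-Class", "Value": 15000},
    {"Make": "BMW", "Model": "X3", "Value": 20000},
    {"Make": "Audi", "Model": "Q5", "Value": 25000},
    {"Make": "VW", "Model": "Golf", "Value": 5000},
])

# Encode only the "Make" column; the other columns pass through unchanged.
# The original "Make" column is replaced by the Make_* indicator columns.
encoded_df = pd.get_dummies(cars_df, columns=["Make"], prefix="Make", dtype=int)

encoded_df
```

Whether to keep the original column (as the join does) or drop it (as here) depends on whether later steps still need the category as text.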

## Further Reading

### Automated Feature Tools

Featuretools is a library that helps automate the creation of features with minimal effort.

https://github.com/Featuretools/featuretools
