# Target Encoding

Target encoding, like one-hot or label encoding, converts a categorical feature into numbers. Unlike those methods, it uses the target to do so, which makes it a supervised feature engineering technique.

In [1]:
# Import modules
import pandas as pd

In [2]:
# Load data
autos = pd.read_csv(r'C:\Developer\scratch-pad-python\Datasets\Automobile_data.csv')

A target encoding replaces a feature's categories with a number derived from the target, for example by replacing each vehicle's make with the average price of that make.
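As a minimal sketch of the idea, the toy frame below (hypothetical data, not taken from the automobile dataset) replaces each category with the mean of the target for that category. The column names `make`, `price`, and `make_encoded` mirror the ones used later in this notebook; the values are invented for illustration.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only
toy = pd.DataFrame({
    "make": ["a", "a", "b", "b", "b"],
    "price": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# Each row receives the mean price of its own make
toy["make_encoded"] = toy.groupby("make")["price"].transform("mean")

print(toy)
```

Rows with make `a` receive 15.0 and rows with make `b` receive 40.0, so the category is now expressed on the same scale as the target.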

In [6]:
autos.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 205 entries, 0 to 204
Data columns (total 26 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   symboling          205 non-null    int64  
 1   normalized-losses  205 non-null    object 
 2   make               205 non-null    object 
 3   fuel-type          205 non-null    object 
 4   aspiration         205 non-null    object 
 5   num-of-doors       205 non-null    object 
 6   body-style         205 non-null    object 
 7   drive-wheels       205 non-null    object 
 8   engine-location    205 non-null    object 
 9   wheel-base         205 non-null    float64
 10  length             205 non-null    float64
 11  width              205 non-null    float64
 12  height             205 non-null    float64
 13  curb-weight        205 non-null    int64  
 14  engine-type        205 non-null    object 
 15  num-of-cylinders   205 non-null    object 
 16  engine-size        205 non

In [5]:
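# Convert the price column to a numeric type
def convert_to_numeric(data_element):
    """Return data_element as a float, or 0 if it cannot be parsed."""
    try:
        numeric_data = float(data_element)

    except (TypeError, ValueError):
        # Non-numeric entries become 0, a placeholder for a missing price
        numeric_data = 0

    return numeric_data

autos['price'] = autos['price'].apply(convert_to_numeric)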
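The same conversion can likely be expressed without a helper function by using `pd.to_numeric`. The sketch below runs on a small hypothetical series rather than the real `price` column, and it assumes that missing prices are stored as a non-numeric string such as `"?"`. Unlike the cell above, it marks unparseable entries as `NaN` instead of 0.

```python
import pandas as pd

# Hypothetical stand-in for the raw price column; "?" is an assumed
# marker for a missing value
raw_price = pd.Series(["13495", "16500", "?", "17450"])

# errors="coerce" turns unparseable strings into NaN instead of raising
price = pd.to_numeric(raw_price, errors="coerce")

print(price)
```

Using `NaN` rather than 0 keeps missing prices distinguishable from genuine zero values in later steps.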

In [11]:
# Replace zero placeholders with the column mean
def populate_zero_with_mean(column_name, value):
    """Return the column mean if value is 0, otherwise return value unchanged."""
    if value == 0:
        # Note: this mean is computed with the zero placeholders still in the column
        value = autos[column_name].mean()

    return value

autos['price'] = autos['price'].apply(lambda value: populate_zero_with_mean(column_name = 'price', value = value))
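One thing to be aware of in the cell above is that the zero placeholders are part of the column when the mean is taken, which pulls the fill value down. A possible alternative, sketched here on hypothetical data, is to keep missing prices as `NaN` and use `fillna`, since `Series.mean()` skips `NaN` by default.

```python
import numpy as np
import pandas as pd

# Hypothetical prices with one missing entry
price = pd.Series([10.0, 20.0, np.nan, 30.0])

# Mean with the missing value treated as 0: (10 + 20 + 0 + 30) / 4
mean_with_zero = price.fillna(0).mean()

# Mean that ignores the missing value: (10 + 20 + 30) / 3
mean_without_missing = price.mean()

# Fill the gap with the mean of the observed prices only
filled = price.fillna(mean_without_missing)

print(mean_with_zero, mean_without_missing)
print(filled.tolist())
```

Which fill value is appropriate depends on the analysis; this sketch only illustrates how the two means differ.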

In [12]:
# Target-encode make using each make's mean price
autos['make_encoded'] = autos.groupby('make')['price'].transform('mean')

autos[['make', 'price', 'make_encoded']].head(10)

          make         price  make_encoded
0  alfa-romero  13495.000000  15498.333333
1  alfa-romero  16500.000000  15498.333333
2  alfa-romero  16500.000000  15498.333333
3         audi  13950.000000  17157.775610
4         audi  17450.000000  17157.775610
5         audi  15250.000000  17157.775610
6         audi  17710.000000  17157.775610
7         audi  18920.000000  17157.775610
8         audi  23875.000000  17157.775610
9         audi  12949.429268  17157.775610
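The `transform('mean')` call above encodes the rows it was computed from. To reuse the encoding on rows that were not part of that computation, one option is to store the per-make means and map them onto the new rows. The sketch below uses hypothetical `train` and `new` frames, and the fallback to the overall mean price for unseen makes is an assumption rather than something this notebook prescribes.

```python
import pandas as pd

# Hypothetical data: rows used to learn the encoding, and rows to encode
train = pd.DataFrame({
    "make": ["audi", "audi", "bmw"],
    "price": [10.0, 20.0, 40.0],
})
new = pd.DataFrame({"make": ["bmw", "audi", "volvo"]})

# Mean price per make, learned from the training rows only
make_means = train.groupby("make")["price"].mean()

# Assumed fallback for makes that never appear in the training rows
global_mean = train["price"].mean()

# Map each make to its learned mean, then fill unseen makes with the fallback
new["make_encoded"] = new["make"].map(make_means).fillna(global_mean)

print(new)
```

Here `bmw` maps to 40.0, `audi` to 15.0, and the unseen `volvo` to the overall mean of the training prices.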
