# Target Guided Ordinal Encoding
Target Guided Ordinal Encoding is a technique for encoding a categorical variable based on its relationship with the target variable. It is useful when a categorical variable has a large number of unique categories and we want to use it as a feature in a machine learning model.

In Target Guided Ordinal Encoding, we replace each category with a numerical value derived from the mean or median of the target variable for that category. Categories with a higher typical target value receive a higher encoded value, so the encoded feature has a monotonic relationship with the per-category target average, which can improve the predictive power of the model.
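
Written out for the mean variant, with $y_i$ the target of row $i$, $x_i$ its category, and $n_c$ the number of rows in category $c$ (this notation is introduced here for illustration only):

```latex
\[
\operatorname{enc}(c) = \frac{1}{n_c} \sum_{i \,:\, x_i = c} y_i
\]
```

For instance, a category whose rows have target values 200 and 180 is encoded as (200 + 180) / 2 = 190.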

In [2]:
# Create a simple dataframe
import pandas as pd
df = pd.DataFrame({
                    'city' : ['New York', 'London', 'Paris', 'Tokyo', 'New York', 'Paris'],
                    'price' : [200, 150, 300, 250, 180, 320]
                 })
df

,city,price
0,New York,200
1,London,150
2,Paris,300
3,Tokyo,250
4,New York,180
5,Paris,320


Here, city is the categorical feature and price (a numeric column) is the target variable.

In [4]:
# mean price per city, as a {city: mean price} dictionary
mean_price = df.groupby('city')['price'].mean().to_dict()
mean_price

{'London': 150.0, 'New York': 190.0, 'Paris': 310.0, 'Tokyo': 250.0}
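
The same mapping can be built from the median instead of the mean, which is less sensitive to outliers within a category. A minimal sketch, re-creating the example frame so that it runs on its own; the name `median_price` is illustrative and not part of the original notebook:

```python
import pandas as pd

df = pd.DataFrame({
    'city': ['New York', 'London', 'Paris', 'Tokyo', 'New York', 'Paris'],
    'price': [200, 150, 300, 250, 180, 320],
})

# Median of the target per category (illustrative name, not from the notebook)
median_price = df.groupby('city')['price'].median().to_dict()
print(median_price)
```

With at most two rows per city in this small frame, the medians coincide with the means; the two would differ once a category has three or more rows with skewed prices.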

In [6]:
# adding encoded column
df['city_encoded'] = df['city'].map(mean_price)
df

,city,price,city_encoded
0,New York,200,190.0
1,London,150,150.0
2,Paris,300,310.0
3,Tokyo,250,250.0
4,New York,180,190.0
5,Paris,320,310.0
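
The cells above keep the raw mean as the encoded value. A common variant, and the one the word "ordinal" suggests, sorts the categories by their mean target and assigns consecutive integers instead. A minimal sketch under that assumption; `ordered_cities`, `city_rank`, and the `city_rank` column are illustrative names, not part of the original notebook:

```python
import pandas as pd

df = pd.DataFrame({
    'city': ['New York', 'London', 'Paris', 'Tokyo', 'New York', 'Paris'],
    'price': [200, 150, 300, 250, 180, 320],
})

# Cities ordered from the lowest mean price to the highest
ordered_cities = df.groupby('city')['price'].mean().sort_values().index

# Rank 1 goes to the city with the lowest mean price, 2 to the next, and so on
city_rank = {city: rank for rank, city in enumerate(ordered_cities, start=1)}

df['city_rank'] = df['city'].map(city_rank)
print(city_rank)
```

The ranks keep the ordering of the categories but discard the size of the gaps between their means, which may or may not be what a given model needs.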


In [7]:
# For training, 'city_encoded' is the feature used in place of 'city', and 'price' is the target.
# Each encoded value identifies one city here, e.g. 190.0 corresponds to New York.
df[['price', 'city_encoded']]

,price,city_encoded
0,200,190.0
1,150,150.0
2,300,310.0
3,250,250.0
4,180,190.0
5,320,310.0
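
Because the encoding is computed from the target, building it on every row lets target information leak into the feature. One way to avoid this is to compute the mapping on the training rows only and reuse it elsewhere. A minimal sketch, assuming a hypothetical train/test split of the same data and a fallback to the overall training mean for categories not seen during training (both the split and the fallback are assumptions, not part of the original notebook):

```python
import pandas as pd

# Hypothetical training split
train = pd.DataFrame({
    'city': ['New York', 'London', 'Paris', 'New York', 'Paris'],
    'price': [200, 150, 300, 180, 320],
})
# Hypothetical unseen rows; 'Tokyo' does not occur in the training split
test = pd.DataFrame({'city': ['Paris', 'Tokyo']})

# Mapping and fallback are computed from the training rows only
mean_price = train.groupby('city')['price'].mean()
global_mean = train['price'].mean()

# Unseen categories map to NaN, which is then replaced by the fallback
test['city_encoded'] = test['city'].map(mean_price).fillna(global_mean)
print(test)
```

Other fallbacks (a sentinel value, or dropping the row) are equally possible; the overall mean is used here only because it keeps the column numeric.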


# Internal Assignment

In [9]:
import seaborn as sns
df = sns.load_dataset('tips')
df.head()

,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2
1,10.34,1.66,Male,No,Sun,Dinner,3
2,21.01,3.5,Male,No,Sun,Dinner,3
3,23.68,3.31,Male,No,Sun,Dinner,2
4,24.59,3.61,Female,No,Sun,Dinner,4


Encode the time column using the mean total_bill of each category.

In [10]:
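# mean total bill per meal time, as a {time: mean total_bill} dictionary
mean_bill = df.groupby('time')['total_bill'].mean().to_dict()
mean_bill

{'Lunch': 17.168676470588235, 'Dinner': 20.79715909090909}

In [11]:
# adding encoded column; 'time' is a categorical column in tips, so cast the mapped values to float
df['time_encoded'] = df['time'].map(mean_bill).astype(float)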
df

,total_bill,tip,sex,smoker,day,time,size,time_encoded
0,16.99,1.01,Female,No,Sun,Dinner,2,20.797159
1,10.34,1.66,Male,No,Sun,Dinner,3,20.797159
2,21.01,3.50,Male,No,Sun,Dinner,3,20.797159
3,23.68,3.31,Male,No,Sun,Dinner,2,20.797159
4,24.59,3.61,Female,No,Sun,Dinner,4,20.797159
...,...,...,...,...,...,...,...,...
239,29.03,5.92,Male,No,Sat,Dinner,3,20.797159
240,27.18,2.00,Female,Yes,Sat,Dinner,2,20.797159
241,22.67,2.00,Male,Yes,Sat,Dinner,2,20.797159
242,17.82,1.75,Male,No,Sat,Dinner,2,20.797159


In [12]:
df[['time', 'time_encoded']]

,time,time_encoded
0,Dinner,20.797159
1,Dinner,20.797159
2,Dinner,20.797159
3,Dinner,20.797159
4,Dinner,20.797159
...,...,...
239,Dinner,20.797159
240,Dinner,20.797159
241,Dinner,20.797159
242,Dinner,20.797159
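
The rows displayed above all happen to be Dinner, so the Lunch value is not visible in the output. A quick check is to list one row per category. The sketch below does this on a small hand-made frame (the values are made up for illustration and are not taken from tips) and wraps the encoding in a hypothetical helper, `target_mean_encode`, that is not part of the original notebook:

```python
import pandas as pd

def target_mean_encode(frame, column, target):
    """Return, for each row, the mean of `target` over that row's category in `column`."""
    # transform('mean') broadcasts each group's mean back onto the original rows
    return frame.groupby(column)[target].transform('mean')

# Made-up data for illustration only
sample = pd.DataFrame({
    'time': ['Lunch', 'Dinner', 'Lunch', 'Dinner'],
    'total_bill': [10.0, 20.0, 14.0, 30.0],
})
sample['time_encoded'] = target_mean_encode(sample, 'time', 'total_bill')

# One row per category shows that both categories were mapped
print(sample.drop_duplicates('time')[['time', 'time_encoded']])
```

On the tips frame, the equivalent call would be `target_mean_encode(df, 'time', 'total_bill')`, which skips the intermediate dictionary used in the cells above.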
