# Target Guided Ordinal Encoding

Target guided ordinal encoding (also known as target encoding with ordinal mapping) encodes a categorical variable using the relationship between each category and the target variable. It is particularly useful in supervised learning tasks where the target is numeric (regression) or a class label (classification).

In target guided ordinal encoding, we compute the mean or median of the target variable for each category, then replace each category with a numerical value derived from that statistic: either the statistic itself or, in the ordinal form, the category's rank when the categories are sorted by it. This creates a monotonic relationship between the encoded variable and the target variable, which can improve the predictive power of our model.

In [1]:
import pandas as pd

In [2]:
df = pd.DataFrame({
    'city':['New York','London','Paris','Tokyo','New York','Paris'],
    'price':[200,150,300,250,180,320]
})

In [4]:
df.head()  # price is the output (target) feature

,city,price
0,New York,200
1,London,150
2,Paris,300
3,Tokyo,250
4,New York,180


In [9]:
## calculate the mean price of each city
mean_price=df.groupby('city')['price'].mean().to_dict()

In [10]:
mean_price

{'London': 150.0, 'New York': 190.0, 'Paris': 310.0, 'Tokyo': 250.0}

In [12]:
## replace each city with its mean price
df['city_encoded']=df['city'].map(mean_price)

In [13]:
df.head()

,city,price,city_encoded
0,New York,200,190.0
1,London,150,150.0
2,Paris,300,310.0
3,Tokyo,250,250.0
4,New York,180,190.0
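
The `city_encoded` column above still holds raw mean prices. To get an ordinal code instead, one option is to sort the cities by mean price and use each city's position in that order. The following is a minimal sketch that rebuilds the same small DataFrame so it runs on its own; the names `city_rank` and `city_ordinal` are illustrative and not part of the original notebook.

```python
import pandas as pd

df = pd.DataFrame({
    'city': ['New York', 'London', 'Paris', 'Tokyo', 'New York', 'Paris'],
    'price': [200, 150, 300, 250, 180, 320]
})

# Mean price per city, sorted from cheapest to most expensive
mean_price = df.groupby('city')['price'].mean().sort_values()

# Position in the sorted index becomes the ordinal code (0 = lowest mean)
city_rank = {city: rank for rank, city in enumerate(mean_price.index)}

# Replace each city with its rank (hypothetical column name)
df['city_ordinal'] = df['city'].map(city_rank)

print(city_rank)
# {'London': 0, 'New York': 1, 'Tokyo': 2, 'Paris': 3}
```

The ranks keep only the order of the cities by mean price and discard the size of the gaps between them.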


In [14]:
import seaborn as sns 

In [15]:
df = sns.load_dataset('tips')

In [16]:
df.head()

,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2
1,10.34,1.66,Male,No,Sun,Dinner,3
2,21.01,3.5,Male,No,Sun,Dinner,3
3,23.68,3.31,Male,No,Sun,Dinner,2
4,24.59,3.61,Female,No,Sun,Dinner,4


In [18]:
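## mean total_bill for each value of 'time'
## ('time' is a categorical column; observed=False includes all of its categories)
time_mean=df.groupby('time', observed=False)['total_bill'].mean().to_dict()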


In [19]:
df['time_encoded']=df['time'].map(time_mean)

In [20]:
df.head()

,total_bill,tip,sex,smoker,day,time,size,time_encoded
0,16.99,1.01,Female,No,Sun,Dinner,2,20.797159
1,10.34,1.66,Male,No,Sun,Dinner,3,20.797159
2,21.01,3.5,Male,No,Sun,Dinner,3,20.797159
3,23.68,3.31,Male,No,Sun,Dinner,2,20.797159
4,24.59,3.61,Female,No,Sun,Dinner,4,20.797159


In [21]:
time_mean

{'Lunch': 17.168676470588235, 'Dinner': 20.79715909090909}

In [22]:
df = pd.DataFrame({
    'City': ['A', 'A', 'B', 'C'],
    'Price': [100, 300, 200, 400]
})

# Compute mean prices
mean_price = df.groupby('City')['Price'].mean()

# Rank cities based on mean price
ranked_cities = mean_price.sort_values().index

# Map cities to ranks (ordinal encoding)
df['City_encoded'] = df['City'].map({city: rank for rank, city in enumerate(ranked_cities)})

print(df)


  City  Price  City_encoded
0    A    100             0
1    A    300             0
2    B    200             1
3    C    400             2


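Here, cities are ranked by their mean price and assigned ordinal values:

- The city with the lowest mean price gets rank 0.
- The city with the highest mean price gets rank 2.

Note that A and B both have a mean price of 200 in this data, so which of them receives rank 0 and which receives rank 1 depends only on how the sort breaks the tie.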

`enumerate(ranked_cities)`: the `enumerate()` function takes an iterable (here, the index of city names sorted by mean price) and yields an `(index, value)` pair for each position. Because the cities are already sorted, the index serves as the rank and the value is the city name, so the dict comprehension builds a city → rank mapping.
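
Because `enumerate` hands out consecutive positions, two categories with the same mean still end up with different codes; in the example above, A and B both average 200 yet are encoded as 0 and 1. If tied categories should share a code, one alternative is pandas' dense ranking. This is a sketch of that variation, not the method used in the cell above, and `dense_rank` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['A', 'A', 'B', 'C'],
    'Price': [100, 300, 200, 400]
})

# Mean price per city: A = 200, B = 200, C = 400
mean_price = df.groupby('City')['Price'].mean()

# Dense ranking: equal means get the same rank, with no gaps between ranks.
# rank() starts at 1, so subtract 1 to start the codes at 0.
dense_rank = mean_price.rank(method='dense').astype(int) - 1

# Mapping with a Series looks each city up in the Series index
df['City_encoded'] = df['City'].map(dense_rank)

print(df)
```

With this variation A and B both map to 0 and C maps to 1, so the encoding no longer depends on how the sort orders tied means.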

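Mean encoding follows the same steps but stops at the per-category means, as in the first two examples, where each category is replaced by its mean target value. Target guided ordinal encoding goes one step further: it ranks the categories by those means and uses the rank as the encoded value.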
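
The whole procedure can be wrapped in a small helper that accepts either the mean or the median as the ranking statistic. This is a minimal sketch under stated assumptions: the function name `target_guided_ordinal`, its `agg` parameter, and the output column names are hypothetical, and ties are resolved with dense ranking rather than `enumerate`.

```python
import pandas as pd

def target_guided_ordinal(df, column, target, agg='mean'):
    """Return (encoded Series, mapping) for `column`, ranked by `agg` of `target`."""
    # Per-category statistic of the target, e.g. 'mean' or 'median'
    stats = df.groupby(column)[target].agg(agg)
    # Dense rank so tied categories share a code; codes start at 0
    mapping = (stats.rank(method='dense').astype(int) - 1).to_dict()
    return df[column].map(mapping), mapping

df = pd.DataFrame({
    'city': ['New York', 'London', 'Paris', 'Tokyo', 'New York', 'Paris'],
    'price': [200, 150, 300, 250, 180, 320]
})

# Rank cities by mean price, then by median price
df['city_mean_rank'], mean_mapping = target_guided_ordinal(df, 'city', 'price')
df['city_median_rank'], median_mapping = target_guided_ordinal(
    df, 'city', 'price', agg='median'
)

print(mean_mapping)
print(df)
```

In a real model, the mapping would normally be computed on the training split only and then applied to validation or test data, so that target information does not leak into the evaluation.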