Target-guided ordinal encoding is a technique used in machine learning to convert a categorical variable into ordinal integers by ranking its categories according to a statistic of the target variable, such as the mean target value within each category. Categories associated with higher target values receive higher labels (or lower ones, if the order is reversed), so the encoding reflects how each category relates to the outcome. This can be especially useful for categorical features that have a strong influence on the target variable.

Here's a step-by-step explanation of target-guided ordinal encoding:

1. **Calculate a Target Statistic per Category**: For each unique category in the categorical variable, calculate a statistic (e.g., the mean, the median, or another measure of central tendency) of the target variable (the variable you want to predict) over the rows that belong to that category. This statistic summarizes the outcome associated with each category.

2. **Order Categories**: Sort the categories by the calculated statistic, in ascending or descending order depending on whether higher labels should correspond to higher or lower values of the statistic.

3. **Assign Ordinal Labels**: Assign consecutive integer labels to the sorted categories. With ascending order, the category with the lowest calculated statistic receives the lowest label and the category with the highest statistic receives the highest label; descending order reverses this.

4. **Map Labels to the Original Data**: Replace the original categorical values in the dataset with the assigned ordinal labels.
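
The four steps above can be sketched as a small helper. This is a minimal illustration rather than a definitive implementation: the function name `target_guided_ordinal_map`, its parameters, and the toy `size`/`sales` data are all hypothetical, and labels are assumed to start at 1.

```python
import pandas as pd

def target_guided_ordinal_map(df, column, target, statistic="mean", ascending=True):
    """Return a {category: ordinal label} dict for `column` (hypothetical helper)."""
    # Step 1: one statistic of the target per category (e.g. "mean" or "median")
    stats = df.groupby(column)[target].agg(statistic)
    # Step 2: order the categories by that statistic
    ordered = stats.sort_values(ascending=ascending).index
    # Step 3: assign labels 1..k in sorted order
    return {category: label for label, category in enumerate(ordered, start=1)}

# Hypothetical toy data, used only for illustration
toy = pd.DataFrame({
    "size": ["S", "M", "L", "M", "S", "L"],
    "sales": [10, 20, 40, 30, 14, 50],
})

mapping = target_guided_ordinal_map(toy, "size", "sales", statistic="median")

# Step 4: replace the categories with their ordinal labels
toy["size_encoded"] = toy["size"].map(mapping)
print(mapping)
```

Passing `ascending=False` would give the category with the highest statistic the label 1 instead. Categories with identical statistics still receive distinct labels in this sketch, in whatever order the sort leaves them.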


In [1]:
import pandas as pd

In [2]:
df = pd.DataFrame({
    'city':['Newyork','London','Paris','Tokyo','Newyork','Paris'],
    'price': [200,150,300,250,180,320]
})

In [3]:
df

      city  price
0  Newyork    200
1   London    150
2    Paris    300
3    Tokyo    250
4  Newyork    180
5    Paris    320


In [6]:
# Step 1: mean of the target (price) for each city
df.groupby('city')['price'].mean()

city
London     150.0
Newyork    190.0
Paris      310.0
Tokyo      250.0
Name: price, dtype: float64

In [10]:
# Store the per-city mean price as a {city: mean} lookup dictionary
mean_price = df.groupby('city')['price'].mean().to_dict()
mean_price

{'London': 150.0, 'Newyork': 190.0, 'Paris': 310.0, 'Tokyo': 250.0}

In [11]:
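# Replace each city with its mean price.
# Note: this stores the mean itself, not an ordinal rank of the cities.
df['encoded_city'] = df['city'].map(mean_price)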

In [12]:
df

      city  price  encoded_city
0  Newyork    200         190.0
1   London    150         150.0
2    Paris    300         310.0
3    Tokyo    250         250.0
4  Newyork    180         190.0
5    Paris    320         310.0
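
The cells above map each city to its mean price, which covers steps 1 and 4 but leaves out the sorting and labelling described in steps 2 and 3. A minimal sketch of how ordinal labels could be derived from the same data follows. It assumes ascending order and labels starting at 1, and the column name `city_ordinal` is introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'city': ['Newyork', 'London', 'Paris', 'Tokyo', 'Newyork', 'Paris'],
    'price': [200, 150, 300, 250, 180, 320]
})

# Step 1: mean price per city
mean_price = df.groupby('city')['price'].mean()

# Steps 2 and 3: rank the means in ascending order.
# Dense ranking gives consecutive integers and the same label to tied categories.
ordinal_map = mean_price.rank(method='dense').astype(int).to_dict()

# Step 4: replace each city with its ordinal label
df['city_ordinal'] = df['city'].map(ordinal_map)
print(ordinal_map)
```

With these means, London (150.0) gets label 1, Newyork (190.0) label 2, Tokyo (250.0) label 3, and Paris (310.0) label 4.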
