In [3]:
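import pandas as pd

# Toy data: a categorical feature (city) and a numeric target (price).
# Note: 'NewYork' is spelled differently from 'New York', so it becomes its own category.
df = pd.DataFrame({
    'city': ['New York', 'London', 'Paris', 'Tokyo', 'NewYork', 'Paris'],
    'price': [200, 150, 300, 250, 180, 320],
})
df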

       city  price
0  New York    200
1    London    150
2     Paris    300
3     Tokyo    250
4   NewYork    180
5     Paris    320


In [6]:
# Lookup table: mean price (the target) for each city (the category)
mean_price = df.groupby('city')['price'].mean().to_dict()
mean_price

{'London': 150.0,
 'New York': 200.0,
 'NewYork': 180.0,
 'Paris': 310.0,
 'Tokyo': 250.0}

In [9]:
# Replace each city with its mean price
df['city_encoded'] = df['city'].map(mean_price)
df

       city  price  city_encoded
0  New York    200         200.0
1    London    150         150.0
2     Paris    300         310.0
3     Tokyo    250         250.0
4   NewYork    180         180.0
5     Paris    320         310.0
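The two steps above (`groupby(...).mean().to_dict()` followed by `map`) can also be written as a single `groupby(...).transform('mean')` call, which broadcasts each group's mean back onto the original rows. A minimal sketch on the same toy data:

```python
import pandas as pd

df = pd.DataFrame({
    'city': ['New York', 'London', 'Paris', 'Tokyo', 'NewYork', 'Paris'],
    'price': [200, 150, 300, 250, 180, 320],
})

# transform('mean') computes the mean price per city and aligns it
# with every row, so no intermediate dict is needed.
df['city_encoded'] = df.groupby('city')['price'].transform('mean')
print(df)
```

As with the `map` version, each row's encoding includes that row's own price, so this in-sample form is for illustration rather than for training a model.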


## Target-Guided Encoding Summary

**What is Target-Guided Encoding?**
- Target-guided encoding uses the target variable to encode a categorical feature
- Each category is replaced by an aggregate statistic of the target computed over the rows in that category (above: the mean `price` per `city`)
- The encoded column carries the category-target relationship directly into a single numeric feature
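In practice the statistics are learned from training data and then applied to new data, where some categories may not have been seen. A minimal sketch, assuming the global training mean as the fallback for unseen categories; the helper names `fit_target_mean` and `apply_target_mean` and the `'Berlin'` row are hypothetical, for illustration only:

```python
import pandas as pd


def fit_target_mean(train, col, target):
    """Learn per-category target means (and a global fallback) from training data only."""
    category_means = train.groupby(col)[target].mean().to_dict()
    global_mean = float(train[target].mean())
    return category_means, global_mean


def apply_target_mean(df, col, category_means, global_mean):
    """Map categories to learned means; unseen categories get the global mean."""
    return df[col].map(category_means).fillna(global_mean)


train = pd.DataFrame({
    'city': ['New York', 'London', 'Paris', 'Tokyo', 'Paris'],
    'price': [200, 150, 300, 250, 320],
})
test = pd.DataFrame({'city': ['Paris', 'Berlin']})  # 'Berlin' never appears in train

means, fallback = fit_target_mean(train, 'city', 'price')
test['city_encoded'] = apply_target_mean(test, 'city', means, fallback)
print(test)
```

Without the `fillna`, `map` would return `NaN` for `'Berlin'`, because it has no entry in the lookup dict.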

**Key Benefits:**
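1. Captures the relationship between a categorical feature and the target in one numeric column
2. Can improve model performance over encodings that ignore the target, though the gain depends on the dataset and should be confirmed with validation
3. Scales to high-cardinality features: it adds one column regardless of the number of categories, whereas one-hot encoding adds one column per category
4. Orders categories by their target statistic, so the encoded feature is monotonically related to the per-category target mean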
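Benefits 3 and 4 can be checked on the toy data. A minimal sketch comparing the column count of one-hot encoding (`pd.get_dummies`) with mean encoding, and showing that sorting by the encoded value sorts the categories by mean price:

```python
import pandas as pd

df = pd.DataFrame({
    'city': ['New York', 'London', 'Paris', 'Tokyo', 'NewYork', 'Paris'],
    'price': [200, 150, 300, 250, 180, 320],
})

# One-hot encoding adds one column per distinct category.
one_hot = pd.get_dummies(df['city'], prefix='city')

# Mean (target-guided) encoding adds exactly one numeric column.
city_means = df.groupby('city')['price'].mean()
df['city_encoded'] = df['city'].map(city_means)

print('one-hot columns:', one_hot.shape[1])
print('mean-encoded columns:', df[['city_encoded']].shape[1])

# Categories ordered by encoded value, i.e. by mean price.
city_order = city_means.sort_values().index.tolist()
print(city_order)
```

With only five distinct spellings the difference is small, but with thousands of categories one-hot encoding would add thousands of columns while mean encoding still adds one.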

**Common Aggregation Methods:**
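- Mean of the target for each category (for a binary 0/1 target, this is the positive rate)
- Median of the target (less sensitive to outliers than the mean)
- Mode of the target (for categorical or discrete targets)
- Count of rows per category (frequency encoding; note that this uses only the category counts, not the target values)
- Standard deviation of the target (undefined for categories with a single row)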
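Several of these statistics can be computed in one `groupby(...).agg(...)` pass and mapped back as separate features. A minimal sketch on the toy data (mode is left out because `price` is continuous here):

```python
import pandas as pd

df = pd.DataFrame({
    'city': ['New York', 'London', 'Paris', 'Tokyo', 'NewYork', 'Paris'],
    'price': [200, 150, 300, 250, 180, 320],
})

# Per-city statistics of the target in a single pass.
stats = df.groupby('city')['price'].agg(['mean', 'median', 'std', 'count'])

# One encoded column per statistic, e.g. city_mean, city_std.
for stat in stats.columns:
    df[f'city_{stat}'] = df['city'].map(stats[stat])

print(df)
```

Because pandas uses the sample standard deviation (`ddof=1`) by default, every city with a single row gets `NaN` in `city_std`; only Paris has two rows and a defined value.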

**When to Use:**
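- High-cardinality categorical variables, where one-hot encoding would create too many columns
- When the category-target relationship is expected to be informative
- Regression problems with categorical features (and binary classification, using the mean of a 0/1 target)
- When one-hot or label encoding is impractical or performs poorly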

**Best Practices:**
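- Compute encodings out of fold (K-fold) so that no row's encoding includes its own target; plain in-sample means, as in the cells above, leak the target and encourage overfitting
- Fit the encoding on training data only, and give categories unseen during training a fallback value such as the global mean
- Smooth the estimates for rare categories by blending the category mean with the global mean
- In time series data, compute each row's encoding only from earlier observations to avoid look-ahead leakage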
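The practices above can be sketched with plain pandas and NumPy. This is a minimal illustration, not a library API: the helper names (`smoothed_means`, `out_of_fold_encode`, `expanding_past_mean`), the smoothing weight `m=10`, the random fold assignment, and treating the row order of the toy `df` as time order are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def smoothed_means(train, col, target, m=10.0):
    """Blend each category's target mean with the global mean.

    m is an assumed smoothing weight: a category with n rows gets
    (n * category_mean + m * global_mean) / (n + m).
    """
    global_mean = train[target].mean()
    stats = train.groupby(col)[target].agg(['mean', 'count'])
    smoothed = (stats['count'] * stats['mean'] + m * global_mean) / (stats['count'] + m)
    return smoothed, global_mean


def out_of_fold_encode(df, col, target, n_splits=3, m=10.0, seed=0):
    """Encode each row with statistics learned only from the other folds."""
    rng = np.random.default_rng(seed)
    # Shuffle row positions, then deal them round-robin into n_splits folds.
    fold_ids = rng.permutation(len(df)) % n_splits
    encoded = pd.Series(np.nan, index=df.index)
    for fold in range(n_splits):
        in_fold = fold_ids == fold
        means, global_mean = smoothed_means(df[~in_fold], col, target, m)
        # Categories absent from the other folds fall back to their global mean.
        encoded.loc[in_fold] = df.loc[in_fold, col].map(means).fillna(global_mean).to_numpy()
    return encoded


def expanding_past_mean(df, col, target):
    """Time series variant: mean of earlier targets in the same category.

    Assumes df is already sorted by time; the first row of each category is NaN.
    """
    return df.groupby(col)[target].transform(lambda s: s.shift().expanding().mean())


df = pd.DataFrame({
    'city': ['New York', 'London', 'Paris', 'Tokyo', 'NewYork', 'Paris'],
    'price': [200, 150, 300, 250, 180, 320],
})
df['city_oof'] = out_of_fold_encode(df, 'city', 'price', n_splits=3)
df['city_past_mean'] = expanding_past_mean(df, 'city', 'price')
print(df)
```

Smoothing pulls categories with few rows toward the global mean, while categories with many rows stay close to their own mean; larger `m` means stronger pulling.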
