## Target Guided Ordinal Encoding
Target guided ordinal encoding encodes a categorical variable using its relationship with the target variable. It is most useful when a categorical feature has a large number of unique categories, where one-hot encoding would add one column per category.

In Target Guided Ordinal Encoding, we replace each category with a numerical value derived from the mean or median of the target variable for that category: either the statistic itself, or the category's rank when the categories are sorted by that statistic. The examples below use the mean itself. Because the encoded value rises with the category's average target, the encoded feature is monotonic in the per-category target mean, which can improve the predictive power of our model.
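
The description above can be read two ways: substitute the statistic itself, or substitute the category's rank under that statistic. The following is a minimal sketch of the rank-based reading, assuming the same toy city/price data used in the cells below. The names `city_means`, `city_rank`, and `city_ordinal` are illustrative and not part of the original notebook.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["New York", "London", "Paris", "Tokyo", "New York", "Paris"],
    "price": [200, 150, 300, 250, 180, 320],
})

# Mean target per category, sorted from lowest to highest.
city_means = df.groupby("city")["price"].mean().sort_values()

# Assign ranks 1..k in that order (illustrative name: city_rank).
city_rank = {city: rank for rank, city in enumerate(city_means.index, start=1)}

# Replace each city with its rank.
df["city_ordinal"] = df["city"].map(city_rank)
```

With ranks, the spacing between categories is uniform, so the encoding keeps only the order of the category means and not their magnitudes.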

In [66]:
import pandas as pd

# create a sample dataframe with a categorical variable and a target variable
df = pd.DataFrame({
    'city': ['New York', 'London', 'Paris', 'Tokyo', 'New York', 'Paris'],
    'price': [200, 150, 300, 250, 180, 320]
})

In [67]:
df

       city  price
0  New York    200
1    London    150
2     Paris    300
3     Tokyo    250
4  New York    180
5     Paris    320


In [68]:
list(df.groupby("city")["price"])

[('London',
  1    150
  Name: price, dtype: int64),
 ('New York',
  0    200
  4    180
  Name: price, dtype: int64),
 ('Paris',
  2    300
  5    320
  Name: price, dtype: int64),
 ('Tokyo',
  3    250
  Name: price, dtype: int64)]

In [69]:
list(df.groupby("city")["price"])[0]

('London',
 1    150
 Name: price, dtype: int64)

In [70]:
list(df.groupby("city")["price"])[0][1]

1    150
Name: price, dtype: int64
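
The cells above index into `list(df.groupby(...))` to show that each element is a `(group key, values)` tuple. As a hedged sketch of the same idea, the GroupBy object can be iterated directly without building the list. The dictionary name `manual_means` is an assumption introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["New York", "London", "Paris", "Tokyo", "New York", "Paris"],
    "price": [200, 150, 300, 250, 180, 320],
})

# Iterating a GroupBy yields (group key, sub-Series) pairs,
# the same tuples that list(...) materialises.
manual_means = {}
for city, prices in df.groupby("city")["price"]:
    manual_means[city] = prices.mean()
```

This loop computes by hand what `.mean()` on the GroupBy does in one call.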

In [71]:
df.groupby("city")["price"].mean()

city
London      150.0
New York    190.0
Paris       310.0
Tokyo       250.0
Name: price, dtype: float64
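
The introduction mentions the median as an alternative to the mean. In the sample data every city has at most two rows, so the two statistics coincide. The sketch below adds one hypothetical outlier row (Paris, 900), which is not in the original data, purely to show where they differ.

```python
import pandas as pd

# Same toy data plus one hypothetical outlier row (Paris, 900) added for illustration.
df = pd.DataFrame({
    "city": ["New York", "London", "Paris", "Tokyo", "New York", "Paris", "Paris"],
    "price": [200, 150, 300, 250, 180, 320, 900],
})

grouped = df.groupby("city")["price"]
mean_price = grouped.mean().to_dict()      # pulled upward by the outlier
median_price = grouped.median().to_dict()  # unaffected by the single outlier

df["city_mean_encoded"] = df["city"].map(mean_price)
df["city_median_encoded"] = df["city"].map(median_price)
```

The median may be the safer choice when the target has heavy tails within a category, at the cost of ignoring how large the extreme values are.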

In [72]:
list(df.groupby("city")["price"].mean())  # list() keeps only the mean values; the city labels are dropped

[150.0, 190.0, 310.0, 250.0]

In [73]:
mean_price = dict(df.groupby("city")["price"].mean())  # dict() keeps the city -> mean price pairs

In [74]:
mean_price

{'London': 150.0, 'New York': 190.0, 'Paris': 310.0, 'Tokyo': 250.0}

In [75]:
df['city_encoded'] = df['city'].map(mean_price)

In [76]:
df

       city  price  city_encoded
0  New York    200         190.0
1    London    150         150.0
2     Paris    300         310.0
3     Tokyo    250         250.0
4  New York    180         190.0
5     Paris    320         310.0
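
One consequence of encoding with `map` is that a category absent from the dictionary becomes `NaN`. The sketch below shows one possible fallback, filling with the overall mean of the target. The city "Berlin" and the frame `new` are hypothetical, and the choice of fallback is an assumption rather than something the notebook prescribes.

```python
import pandas as pd

train = pd.DataFrame({
    "city": ["New York", "London", "Paris", "Tokyo", "New York", "Paris"],
    "price": [200, 150, 300, 250, 180, 320],
})
mean_price = train.groupby("city")["price"].mean().to_dict()

# Hypothetical new rows: "Berlin" never appeared when the mapping was built.
new = pd.DataFrame({"city": ["Paris", "Berlin"]})

encoded = new["city"].map(mean_price)  # Berlin maps to NaN
global_mean = train["price"].mean()    # fallback value (an assumption)
new["city_encoded"] = encoded.fillna(global_mean)
```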


In [77]:
import seaborn as sns

In [78]:
df = sns.load_dataset("tips")

In [79]:
df.head()

   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4


In [80]:
list(df.groupby("time"))

[('Lunch',
       total_bill   tip     sex smoker   day   time  size
  77        27.20  4.00    Male     No  Thur  Lunch     4
  78        22.76  3.00    Male     No  Thur  Lunch     2
  79        17.29  2.71    Male     No  Thur  Lunch     2
  80        19.44  3.00    Male    Yes  Thur  Lunch     2
  81        16.66  3.40    Male     No  Thur  Lunch     2
  ..          ...   ...     ...    ...   ...    ...   ...
  222        8.58  1.92    Male    Yes   Fri  Lunch     1
  223       15.98  3.00  Female     No   Fri  Lunch     3
  224       13.42  1.58    Male    Yes   Fri  Lunch     2
  225       16.27  2.50  Female    Yes   Fri  Lunch     2
  226       10.09  2.00  Female    Yes   Fri  Lunch     2
  
  [68 rows x 7 columns]),
 ('Dinner',
       total_bill   tip     sex smoker   day    time  size
  0         16.99  1.01  Female     No   Sun  Dinner     2
  1         10.34  1.66    Male     No   Sun  Dinner     3
  2         21.01  3.50    Male     No   Sun  Dinner     3
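  3         23.68  3.31    Male     No   Sun  Dinner     2
  4         24.59  3.61  Female     No   Sun  Dinner     4
  ..          ...   ...     ...    ...   ...     ...   ...
  239       29.03  5.92    Male     No   Sat  Dinner     3
  240       27.18  2.00  Female    Yes   Sat  Dinner     2
  241       22.67  2.00    Male    Yes   Sat  Dinner     2
  242       17.82  1.75    Male     No   Sat  Dinner     2
  243       18.78  3.00  Female     No  Thur  Dinner     2
  
  [176 rows x 7 columns])]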

In [81]:
# Now take a single (group name, DataFrame) tuple out of the list

In [82]:
list(df.groupby("time"))[0]

('Lunch',
      total_bill   tip     sex smoker   day   time  size
 77        27.20  4.00    Male     No  Thur  Lunch     4
 78        22.76  3.00    Male     No  Thur  Lunch     2
 79        17.29  2.71    Male     No  Thur  Lunch     2
 80        19.44  3.00    Male    Yes  Thur  Lunch     2
 81        16.66  3.40    Male     No  Thur  Lunch     2
 ..          ...   ...     ...    ...   ...    ...   ...
 222        8.58  1.92    Male    Yes   Fri  Lunch     1
 223       15.98  3.00  Female     No   Fri  Lunch     3
 224       13.42  1.58    Male    Yes   Fri  Lunch     2
 225       16.27  2.50  Female    Yes   Fri  Lunch     2
 226       10.09  2.00  Female    Yes   Fri  Lunch     2
 
 [68 rows x 7 columns])

In [83]:
list(df.groupby("time"))[1]

('Dinner',
      total_bill   tip     sex smoker   day    time  size
 0         16.99  1.01  Female     No   Sun  Dinner     2
 1         10.34  1.66    Male     No   Sun  Dinner     3
 2         21.01  3.50    Male     No   Sun  Dinner     3
 3         23.68  3.31    Male     No   Sun  Dinner     2
 4         24.59  3.61  Female     No   Sun  Dinner     4
 ..          ...   ...     ...    ...   ...     ...   ...
 239       29.03  5.92    Male     No   Sat  Dinner     3
 240       27.18  2.00  Female    Yes   Sat  Dinner     2
 241       22.67  2.00    Male    Yes   Sat  Dinner     2
 242       17.82  1.75    Male     No   Sat  Dinner     2
 243       18.78  3.00  Female     No  Thur  Dinner     2
 
 [176 rows x 7 columns])

In [84]:
list(df.groupby("time")["total_bill"])  # two (name, Series) tuples, one per time value, holding only total_bill

[('Lunch',
  77     27.20
  78     22.76
  79     17.29
  80     19.44
  81     16.66
         ...  
  222     8.58
  223    15.98
  224    13.42
  225    16.27
  226    10.09
  Name: total_bill, Length: 68, dtype: float64),
 ('Dinner',
  0      16.99
  1      10.34
  2      21.01
  3      23.68
  4      24.59
         ...  
  239    29.03
  240    27.18
  241    22.67
  242    17.82
  243    18.78
  Name: total_bill, Length: 176, dtype: float64)]

In [85]:
df.groupby("time")["total_bill"].mean()  # mean total_bill for each time value

time
Lunch     17.168676
Dinner    20.797159
Name: total_bill, dtype: float64

In [86]:
df.groupby("time")["total_bill"].mean().to_dict()  # the same means as a {time: mean} dict

{'Lunch': 17.168676470588235, 'Dinner': 20.79715909090909}

In [87]:
mean_values = dict(df.groupby("time")["total_bill"].mean())  # equivalent to .to_dict() above

In [88]:
mean_values

{'Lunch': 17.168676470588235, 'Dinner': 20.79715909090909}

In [89]:
df['time_changed'] = df['time'].map(mean_values)

In [91]:
df.head()

   total_bill   tip     sex smoker  day    time  size  time_changed
0       16.99  1.01  Female     No  Sun  Dinner     2     20.797159
1       10.34  1.66    Male     No  Sun  Dinner     3     20.797159
2       21.01  3.50    Male     No  Sun  Dinner     3     20.797159
3       23.68  3.31    Male     No  Sun  Dinner     2     20.797159
4       24.59  3.61  Female     No  Sun  Dinner     4     20.797159
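
The group, dict, and map steps above can likely be collapsed into a single call with `GroupBy.transform`, which returns the group statistic aligned to the original rows. The sketch below uses a small hand-built frame, `tips_like`, containing four `total_bill`/`time` pairs copied from the outputs above instead of the full dataset, so it runs without downloading anything. The frame name is an assumption made for illustration.

```python
import pandas as pd

# Four rows copied from the tips outputs above (rows 0, 1, 77, 78).
tips_like = pd.DataFrame({
    "total_bill": [16.99, 10.34, 27.20, 22.76],
    "time": ["Dinner", "Dinner", "Lunch", "Lunch"],
})

# transform("mean") broadcasts each group's mean back onto its rows.
tips_like["time_changed"] = (
    tips_like.groupby("time")["total_bill"].transform("mean")
)
```

The explicit dictionary is still worth keeping when the same mapping has to be applied later to other data, since `transform` only encodes the frame it was computed on.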
