## Label Encoding 
Label encoding and ordinal encoding are two techniques used to encode categorical data as numerical data.

Label encoding assigns a unique integer to each category of a variable. The integers are usually assigned in alphabetical order or by category frequency; scikit-learn's `LabelEncoder`, used in the cells below, sorts the categories and numbers them from 0. For example, a categorical variable "color" with three possible values (red, green, blue) could be label-encoded as follows (the numbering here is illustrative):

1. Red: 1
2. Green: 2
3. Blue: 3
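
The numbering in the list above is assigned by hand. As a minimal sketch of the frequency-based alternative mentioned earlier, the snippet below numbers categories from most to least frequent using pandas `value_counts`. The `colors` series and the `freq_labels` name are illustrative assumptions, not part of the notebook's data.

```python
import pandas as pd

# Hypothetical sample with unequal frequencies: red x3, green x2, blue x1
colors = pd.Series(['red', 'green', 'red', 'blue', 'green', 'red'])

# value_counts() sorts categories from most to least frequent
counts = colors.value_counts()

# Number the categories 1, 2, 3, ... in that order
freq_labels = {category: label for label, category in enumerate(counts.index, start=1)}

encoded = colors.map(freq_labels)
print(freq_labels)
print(encoded.tolist())
```

With tied frequencies the order among the tied categories is not meaningful, so a tie-breaking rule (for example alphabetical order) would be needed in practice.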

In [2]:
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df=pd.DataFrame({
    'color':['red','green','blue','red','blue','green']
})
df.head()


Unnamed: 0,color
0,red
1,green
2,blue
3,red
4,blue


In [3]:
lbl_encoder=LabelEncoder()
# LabelEncoder expects 1-D input, so pass the column as a Series (df['color']), not a DataFrame (df[['color']])
lbl_encoder.fit_transform(df['color'])


array([2, 1, 0, 2, 0, 1])

In [6]:
lbl_encoder.transform(['red'])


array([2])

In [7]:
lbl_encoder.transform(['blue'])

array([0])

In [8]:
lbl_encoder.transform(['green'])

array([1])
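
The three lookups above can also be read off the fitted encoder in one step. The sketch below rebuilds the same color data so that it runs on its own, then uses the `classes_` attribute and `inverse_transform` of `LabelEncoder` to show the full mapping; the variable names are illustrative.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

colors = pd.Series(['red', 'green', 'blue', 'red', 'blue', 'green'])

encoder = LabelEncoder()
codes = encoder.fit_transform(colors)

# classes_ holds the categories in sorted order; each position is that category's label
mapping = {category: code for code, category in enumerate(encoder.classes_)}
print(mapping)

# inverse_transform turns the integer codes back into the original categories
decoded = encoder.inverse_transform(codes)
print(list(decoded))
```

Because the labels follow sorted order, they carry no meaning beyond identity: blue is not "less than" red in any useful sense.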

## Ordinal Encoding
Ordinal encoding is used when the categories have an intrinsic order or ranking that the encoding should preserve. Each category is assigned a numerical value based on its position in that order. For example, a categorical variable "education level" with four possible values (high school, college, graduate, post-graduate) can be ordinal-encoded as follows:

1. High school: 1
2. College: 2
3. Graduate: 3
4. Post-graduate: 4

In [11]:
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder
df=pd.DataFrame({
    'size':['small','medium','large','medium','small','large']
})

In [12]:
df

Unnamed: 0,size
0,small
1,medium
2,large
3,medium
4,small
5,large


In [14]:
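# List the categories from lowest to highest rank; without `categories`, OrdinalEncoder sorts them alphabetically
encoder=OrdinalEncoder(categories=[['small','medium','large']])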

In [15]:
encoder.fit_transform(df[['size']])

array([[0.],
       [1.],
       [2.],
       [1.],
       [0.],
       [2.]])

In [17]:
encoder.transform([['small']])

array([[0.]])
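
The education-level example from the start of this section can be sketched the same way. The `edu` DataFrame below is made-up illustration data, and 1 is added to the encoder output only so that the ranks match the 1 to 4 numbering used in the text (`OrdinalEncoder` itself starts at 0).

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Categories listed from lowest to highest rank
levels = ['high school', 'college', 'graduate', 'post-graduate']

# Hypothetical sample data
edu = pd.DataFrame({
    'education': ['college', 'high school', 'post-graduate', 'graduate', 'college']
})

edu_encoder = OrdinalEncoder(categories=[levels])

# fit_transform returns a 2-D float array; flatten it and shift from 0-based to 1-based ranks
ranks = edu_encoder.fit_transform(edu[['education']]).ravel().astype(int) + 1
edu['education_rank'] = ranks
print(edu)
```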

## Target Guided Ordinal Encoding 
Target guided ordinal encoding encodes a categorical variable based on its relationship with the target variable. It is useful when a categorical variable has a large number of unique categories and we want to use it as a feature in a machine learning model.

In target guided ordinal encoding, each category is replaced with a numerical value based on the mean or median of the target variable for that category. Because the encoded values follow the per-category target statistic, the encoded feature is monotonically related to it, which can improve the predictive power of the model.

In [3]:
import pandas as pd
df=pd.DataFrame({
    'city':['new_york','london','paris','tokya','new_york','paris'],
    'price':[200,150,300,250,180,320]
})
df

Unnamed: 0,city,price
0,new_york,200
1,london,150
2,paris,300
3,tokya,250
4,new_york,180
5,paris,320


In [5]:
mean_price=df.groupby('city')['price'].mean().to_dict()

In [6]:
mean_price

{'london': 150.0, 'new_york': 190.0, 'paris': 310.0, 'tokya': 250.0}

In [7]:
df['encoded_city']=df['city'].map(mean_price)

In [8]:
df

Unnamed: 0,city,price,encoded_city
0,new_york,200,190.0
1,london,150,150.0
2,paris,300,310.0
3,tokya,250,250.0
4,new_york,180,190.0
5,paris,320,310.0


In [9]:
df[['price','encoded_city']]

Unnamed: 0,price,encoded_city
0,200,190.0
1,150,150.0
2,300,310.0
3,250,250.0
4,180,190.0
5,320,310.0
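
The cells above use the mean price itself as the encoded value. A common variant, shown here as a sketch rather than as the notebook's method, turns those means into integer ranks so that the result is a true ordinal code. The names `df_rank`, `ordered`, and `city_rank` are illustrative; the data is the same as above.

```python
import pandas as pd

df_rank = pd.DataFrame({
    'city': ['new_york', 'london', 'paris', 'tokya', 'new_york', 'paris'],
    'price': [200, 150, 300, 250, 180, 320],
})

# Order the cities by their mean price, lowest first
ordered = df_rank.groupby('city')['price'].mean().sort_values().index

# Assign rank 1 to the cheapest city, 2 to the next, and so on
city_rank = {city: rank for rank, city in enumerate(ordered, start=1)}

df_rank['city_rank'] = df_rank['city'].map(city_rank)
print(city_rank)
print(df_rank)
```

Swapping `.mean()` for `.median()` gives the median-based version described in the text. In a real modeling workflow the statistics would normally be computed on the training split only, since they are derived from the target.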


## Internal Assignments

In [1]:
import seaborn as sns
df=sns.load_dataset('tips')

In [2]:
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2
1,10.34,1.66,Male,No,Sun,Dinner,3
2,21.01,3.5,Male,No,Sun,Dinner,3
3,23.68,3.31,Male,No,Sun,Dinner,2
4,24.59,3.61,Female,No,Sun,Dinner,4


In [8]:
mean_time=df.groupby('time')['total_bill'].mean().to_dict()

In [9]:
df['encoded_time']=df['time'].map(mean_time)

In [10]:
df[['encoded_time','total_bill']]

Unnamed: 0,encoded_time,total_bill
0,20.797159,16.99
1,20.797159,10.34
2,20.797159,21.01
3,20.797159,23.68
4,20.797159,24.59
...,...,...
239,20.797159,29.03
240,20.797159,27.18
241,20.797159,22.67
242,20.797159,17.82
