# Label Encoding
- Label encoding maps each unique category (or label) in a categorical feature to a unique integer. For example, the fruits Apple, Banana, and Cherry could be encoded as 0, 1, and 2.

- Label encoding is a common data preprocessing technique, especially in machine learning, for converting categorical data into integers while keeping a one-to-one mapping between each integer and its original label.
- It's particularly useful when working with algorithms that require numerical input.
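
To make the idea concrete, here is a minimal hand-rolled sketch of the mapping in plain Python. The helper `label_encode` is hypothetical and written only for illustration; it is not part of pandas or scikit-learn, and it assumes the categories can be sorted.

```python
# Hypothetical helper for illustration only; not a library function.
def label_encode(values):
    # Sort the distinct categories so the mapping is deterministic.
    categories = sorted(set(values))
    # Each category gets the integer equal to its position in the sorted list.
    mapping = {category: code for code, category in enumerate(categories)}
    return [mapping[value] for value in values], mapping


codes, mapping = label_encode(['Apple', 'Banana', 'Cherry', 'Apple', 'Cherry', 'Banana'])
print(codes)    # [0, 1, 2, 0, 2, 1]
print(mapping)  # {'Apple': 0, 'Banana': 1, 'Cherry': 2}
```

Repeated categories always receive the same integer, which is the property the rest of this notebook relies on.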

In [2]:
import pandas as pd
from sklearn.preprocessing import LabelEncoder

In [5]:
# Example dataset
data = {'Fruit': ['Apple', 'Banana', 'Cherry', 'Apple', 'Cherry', 'Banana']}
df = pd.DataFrame(data)
df.head()

| | Fruit |
|---|---|
| 0 | Apple |
| 1 | Banana |
| 2 | Cherry |
| 3 | Apple |
| 4 | Cherry |


In [7]:
label = LabelEncoder()        # LabelEncoder maps each distinct label to an integer code
df["en_Fruit"] = label.fit_transform(df["Fruit"])
df

| | Fruit | en_Fruit |
|---|---|---|
| 0 | Apple | 0 |
| 1 | Banana | 1 |
| 2 | Cherry | 2 |
| 3 | Apple | 0 |
| 4 | Cherry | 2 |
| 5 | Banana | 1 |
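
The codes in the output above follow the sorted order of the labels. A fitted `LabelEncoder` stores the sorted unique labels in its `classes_` attribute, so a label's position in that array is its code. The sketch below rebuilds the mapping from `classes_`; the names `encoder` and `mapping` are illustrative and not from the original cells.

```python
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
encoder.fit(['Apple', 'Banana', 'Cherry', 'Apple', 'Cherry', 'Banana'])

# classes_ holds the sorted unique labels; each label's index is its code.
mapping = {str(label): code for code, label in enumerate(encoder.classes_)}
print(mapping)  # {'Apple': 0, 'Banana': 1, 'Cherry': 2}
```

Because the codes come from sort order, they carry no meaning beyond identity: Cherry is not "greater than" Apple in any useful sense.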


In [8]:
# Convert the encoded values back to their original labels
original_labels = label.inverse_transform(df['en_Fruit'])
print(original_labels)

['Apple' 'Banana' 'Cherry' 'Apple' 'Cherry' 'Banana']
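
As a quick sanity check, the round trip can be verified directly, and the fitted encoder can decode codes in any order. This is a small self-contained sketch; `fruits`, `round_trip_ok`, and `subset` are illustrative names rather than variables from the cells above.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
fruits = np.array(['Apple', 'Banana', 'Cherry', 'Apple', 'Cherry', 'Banana'])
codes = encoder.fit_transform(fruits)

# Decoding the codes should reproduce the input exactly.
decoded = encoder.inverse_transform(codes)
round_trip_ok = bool((decoded == fruits).all())

# The fitted encoder can also decode an arbitrary selection of codes.
subset = encoder.inverse_transform([2, 0])
print(round_trip_ok, list(subset))
```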


### Real-World Dataset

In [32]:
ds = pd.read_csv("Sales_data.csv")
ds.head()

| | Group | Customer_Segment | Sales_Before | Sales_After | Customer_Satisfaction_Before | Customer_Satisfaction_After | Purchase_Made |
|---|---|---|---|---|---|---|---|
| 0 | Control | High Value | 240.548359 | 300.007568 | 74.684767 | | No |
| 1 | Treatment | High Value | 246.862114 | 381.337555 | 100.0 | 100.0 | Yes |
| 2 | Control | High Value | 156.978084 | 179.330464 | 98.780735 | 100.0 | No |
| 3 | Control | Medium Value | 192.126708 | 229.278031 | 49.333766 | 39.811841 | Yes |
| 4 | | High Value | 229.685623 | | 83.974852 | 87.738591 | Yes |


In [33]:
ds["Customer_Segment"].unique()

array(['High Value', 'Medium Value', nan, 'Low Value'], dtype=object)

In [34]:
ds["Customer_Segment_Encoded"] = label.fit_transform(ds["Customer_Segment"])  # store the codes in a new column; fit_transform refits label on this column
ds

| | Group | Customer_Segment | Sales_Before | Sales_After | Customer_Satisfaction_Before | Customer_Satisfaction_After | Purchase_Made | Customer_Segment_Encoded |
|---|---|---|---|---|---|---|---|---|
| 0 | Control | High Value | 240.548359 | 300.007568 | 74.684767 | | No | 0 |
| 1 | Treatment | High Value | 246.862114 | 381.337555 | 100.000000 | 100.000000 | Yes | 0 |
| 2 | Control | High Value | 156.978084 | 179.330464 | 98.780735 | 100.000000 | No | 0 |
| 3 | Control | Medium Value | 192.126708 | 229.278031 | 49.333766 | 39.811841 | Yes | 2 |
| 4 | | High Value | 229.685623 | | 83.974852 | 87.738591 | Yes | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9995 | Treatment | | 259.695935 | 415.181694 | 88.438776 | 98.418593 | | 3 |
| 9996 | Control | High Value | 186.488285 | 216.225457 | 92.261537 | 100.000000 | | 0 |
| 9997 | Treatment | Low Value | 208.107142 | 322.893351 | 55.915870 | | No | 1 |
| 9998 | Treatment | Medium Value | | 431.974901 | 66.082462 | 81.274030 | No | 2 |
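
In the output above, the row with a missing `Customer_Segment` receives code 3, so a missing value is treated like a fourth category. If that is not wanted, one option is to encode only the rows that have a label and leave the rest as missing. The sketch below assumes a small hypothetical stand-in for the column, since the real file is not reproduced here; `segments`, `known`, and `encoded` are illustrative names.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for the Customer_Segment column.
segments = pd.Series(['High Value', 'Medium Value', np.nan, 'Low Value'],
                     name='Customer_Segment')

encoder = LabelEncoder()
known = segments.notna()

# Start with an all-missing result, then fill in codes only where a label exists.
encoded = pd.Series(np.nan, index=segments.index, name='Customer_Segment_Encoded')
encoded.loc[known] = encoder.fit_transform(segments[known])
print(encoded.tolist())
```

The result is a float column because NaN cannot be stored in a plain integer column; whether to keep missing values this way or give them their own code depends on the model that consumes the data.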


In [35]:
ds["Customer_Segment"] = label.fit_transform(ds["Customer_Segment"])  # overwrite the original column with its integer codes
ds

| | Group | Customer_Segment | Sales_Before | Sales_After | Customer_Satisfaction_Before | Customer_Satisfaction_After | Purchase_Made | Customer_Segment_Encoded |
|---|---|---|---|---|---|---|---|---|
| 0 | Control | 0 | 240.548359 | 300.007568 | 74.684767 | | No | 0 |
| 1 | Treatment | 0 | 246.862114 | 381.337555 | 100.000000 | 100.000000 | Yes | 0 |
| 2 | Control | 0 | 156.978084 | 179.330464 | 98.780735 | 100.000000 | No | 0 |
| 3 | Control | 2 | 192.126708 | 229.278031 | 49.333766 | 39.811841 | Yes | 2 |
| 4 | | 0 | 229.685623 | | 83.974852 | 87.738591 | Yes | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9995 | Treatment | 3 | 259.695935 | 415.181694 | 88.438776 | 98.418593 | | 3 |
| 9996 | Control | 0 | 186.488285 | 216.225457 | 92.261537 | 100.000000 | | 0 |
| 9997 | Treatment | 1 | 208.107142 | 322.893351 | 55.915870 | | No | 1 |
| 9998 | Treatment | 2 | | 431.974901 | 66.082462 | 81.274030 | No | 2 |
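
Overwriting a column discards the original strings, and each call to `fit_transform` replaces whatever mapping the encoder held before. When several columns are encoded in place, keeping one encoder per column makes it possible to recover the labels later. The following is a sketch under that assumption, using a hypothetical three-row miniature of the sales data; the `encoders` dictionary is an illustrative name.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical miniature of the sales data; the real file is not reproduced here.
ds = pd.DataFrame({
    'Group': ['Control', 'Treatment', 'Control'],
    'Customer_Segment': ['High Value', 'Medium Value', 'Low Value'],
})

# One encoder per column, so each mapping can be inverted independently.
encoders = {}
for column in ['Group', 'Customer_Segment']:
    encoders[column] = LabelEncoder()
    ds[column] = encoders[column].fit_transform(ds[column])

# Recover the original labels for one column from its own encoder.
restored = encoders['Customer_Segment'].inverse_transform(ds['Customer_Segment'])
print(list(restored))
```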
