### 🔖 Label Encoding

**Label Encoding** converts categorical data into numerical values by assigning each unique category an integer label. It suits ordinal data, where the categories have a meaningful order, provided the integers are assigned in that same order. For nominal data without an inherent order, the integers imply a ranking that does not exist, which can lead a predictive model to treat one category as "greater than" another.

For example, consider a dataset with a 'Size' feature:

| Size   |
|--------|
| Small  |
| Medium |
| Large  |

Applying label encoding:

| Size   | Encoded Size |
|--------|--------------|
| Small  | 0            |
| Medium | 1            |
| Large  | 2            |

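In this case, the encoded values (0, 1, 2) reflect the inherent order of the 'Size' categories. This mapping has to be specified explicitly: scikit-learn's `LabelEncoder` assigns integers in sorted (alphabetical) order, which would give Large = 0, Medium = 1, Small = 2.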
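As a minimal sketch of the difference, the snippet below builds the 'Size' column from the table, applies an explicit mapping, and compares it with what `LabelEncoder` produces. The `size_order` dictionary and the column name `LabelEncoder Size` are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

sizes = pd.DataFrame({"Size": ["Small", "Medium", "Large"]})

# Explicit mapping (assumed here to match the table): the codes follow the real order
size_order = {"Small": 0, "Medium": 1, "Large": 2}
sizes["Encoded Size"] = sizes["Size"].map(size_order)

# For comparison: LabelEncoder sorts the classes, so the codes follow alphabetical order
sizes["LabelEncoder Size"] = LabelEncoder().fit_transform(sizes["Size"])

print(sizes)
```

With the explicit mapping, Small < Medium < Large holds for the codes; with `LabelEncoder`, the order is reversed because "Large" sorts first.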

**Note:** When dealing with nominal data (categories without a meaningful order), it's advisable to use alternative encoding methods like **One-Hot Encoding** to prevent the introduction of spurious ordinal relationships.
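One way to one-hot encode a nominal column is `pandas.get_dummies`, sketched below. The `animals` frame and its values are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical nominal feature: the categories have no meaningful order
animals = pd.DataFrame({"Animal": ["cow", "cat", "dog", "cat"]})

# One indicator column per category; dtype=int gives 0/1 instead of booleans
one_hot = pd.get_dummies(animals["Animal"], prefix="Animal", dtype=int)

print(one_hot)
```

Each row has exactly one 1, so no category is numerically larger than another.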

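For more details on label encoding, refer to the [scikit-learn documentation](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html). That documentation describes `LabelEncoder` as intended for target values (`y`) rather than input features; it is applied to feature columns here for illustration.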


In [1]:
import pandas as pd

In [2]:
df = pd.DataFrame(
    {
        "Name": ['cow', 'cat', 'dog', 'black', 'small', 'jack']
    }
)
df

|   | Name  |
|---|-------|
| 0 | cow   |
| 1 | cat   |
| 2 | dog   |
| 3 | black |
| 4 | small |
| 5 | jack  |


In [3]:
from sklearn.preprocessing import LabelEncoder

In [5]:
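le = LabelEncoder()
# Pass a 1-D Series (df['Name']), not a one-column DataFrame (df[['Name']]);
# the 2-D input makes scikit-learn emit a DataConversionWarning.
df['en_Name'] = le.fit_transform(df['Name'])
df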


|   | Name  | en_Name |
|---|-------|---------|
| 0 | cow   | 2       |
| 1 | cat   | 1       |
| 2 | dog   | 3       |
| 3 | black | 0       |
| 4 | small | 5       |
| 5 | jack  | 4       |
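The codes above follow the sorted order of the names (black, cat, cow, dog, jack, small). A short sketch of how to inspect that mapping and reverse it, using the fitted encoder's `classes_` attribute and `inverse_transform`; the variable names `mapping` and `decoded` are illustrative.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({"Name": ["cow", "cat", "dog", "black", "small", "jack"]})

le = LabelEncoder()
df["en_Name"] = le.fit_transform(df["Name"])

# classes_ holds the sorted unique values; each value's position is its code
mapping = {label: code for code, label in enumerate(le.classes_)}
print(mapping)

# inverse_transform recovers the original strings from the integer codes
decoded = le.inverse_transform(df["en_Name"])
print(list(decoded))
```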


#### Real Example Using a Dataset

In [6]:
dataset = pd.read_excel("Financial Sample.xlsx")
dataset.head(3)

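|   | Segment | Country | Product | Discount Band | Units Sold | Manufacturing Price | Sale Price | Gross Sales | Discounts | Sales | COGS | Profit | Date | Month Number | Month Name | Year |
|---|---------|---------|---------|---------------|------------|---------------------|------------|-------------|-----------|-------|------|--------|------|--------------|------------|------|
| 0 | Government | Canada | Carretera | | 1618.5 | 3 | 20 | 32370.0 | 0.0 | 32370.0 | 16185.0 | 16185.0 | 2014-01-01 | 1 | January | 2014 |
| 1 | Government | Germany | Carretera | | 1321.0 | 3 | 20 | 26420.0 | 0.0 | 26420.0 | 13210.0 | 13210.0 | 2014-01-01 | 1 | January | 2014 |
| 2 | Midmarket | France | Carretera | | 2178.0 | 3 | 15 | 32670.0 | 0.0 | 32670.0 | 21780.0 | 10890.0 | 2014-06-01 | 6 | June | 2014 |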


In [8]:
la = LabelEncoder()
dataset['en_segment'] = la.fit_transform(dataset['Segment'])
dataset.head(10)

|   | Segment | Country | Product | Discount Band | Units Sold | Manufacturing Price | Sale Price | Gross Sales | Discounts | Sales | COGS | Profit | Date | Month Number | Month Name | Year | en_segment |
|---|---------|---------|---------|---------------|------------|---------------------|------------|-------------|-----------|-------|------|--------|------|--------------|------------|------|------------|
| 0 | Government | Canada | Carretera | | 1618.5 | 3 | 20 | 32370.0 | 0.0 | 32370.0 | 16185.0 | 16185.0 | 2014-01-01 | 1 | January | 2014 | 2 |
| 1 | Government | Germany | Carretera | | 1321.0 | 3 | 20 | 26420.0 | 0.0 | 26420.0 | 13210.0 | 13210.0 | 2014-01-01 | 1 | January | 2014 | 2 |
| 2 | Midmarket | France | Carretera | | 2178.0 | 3 | 15 | 32670.0 | 0.0 | 32670.0 | 21780.0 | 10890.0 | 2014-06-01 | 6 | June | 2014 | 3 |
| 3 | Midmarket | Germany | Carretera | | 888.0 | 3 | 15 | 13320.0 | 0.0 | 13320.0 | 8880.0 | 4440.0 | 2014-06-01 | 6 | June | 2014 | 3 |
| 4 | Midmarket | Mexico | Carretera | | 2470.0 | 3 | 15 | 37050.0 | 0.0 | 37050.0 | 24700.0 | 12350.0 | 2014-06-01 | 6 | June | 2014 | 3 |
| 5 | Government | Germany | Carretera | | 1513.0 | 3 | 350 | 529550.0 | 0.0 | 529550.0 | 393380.0 | 136170.0 | 2014-12-01 | 12 | December | 2014 | 2 |
| 6 | Midmarket | Germany | Montana | | 921.0 | 5 | 15 | 13815.0 | 0.0 | 13815.0 | 9210.0 | 4605.0 | 2014-03-01 | 3 | March | 2014 | 3 |
| 7 | Channel Partners | Canada | Montana | | 2518.0 | 5 | 12 | 30216.0 | 0.0 | 30216.0 | 7554.0 | 22662.0 | 2014-06-01 | 6 | June | 2014 | 0 |
| 8 | Government | France | Montana | | 1899.0 | 5 | 20 | 37980.0 | 0.0 | 37980.0 | 18990.0 | 18990.0 | 2014-06-01 | 6 | June | 2014 | 2 |
| 9 | Channel Partners | Germany | Montana | | 1545.0 | 5 | 12 | 18540.0 | 0.0 | 18540.0 | 4635.0 | 13905.0 | 2014-06-01 | 6 | June | 2014 | 0 |
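The same pattern extends to several categorical columns by fitting one encoder per column. The sketch below is an assumption-laden stand-in: since "Financial Sample.xlsx" may not be at hand, it uses a small hypothetical frame (`sample`) with a few Segment and Country values copied from the output above, and the `encoders` dictionary is an illustrative name.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for the spreadsheet, with values copied from the output above
sample = pd.DataFrame({
    "Segment": ["Government", "Midmarket", "Channel Partners", "Government"],
    "Country": ["Canada", "France", "Germany", "Mexico"],
})

# Keep one fitted encoder per column so each mapping can be inspected or reversed later
encoders = {}
for column in ["Segment", "Country"]:
    encoder = LabelEncoder()
    sample[f"en_{column.lower()}"] = encoder.fit_transform(sample[column])
    encoders[column] = encoder

print(sample)
print(list(encoders["Segment"].classes_))
```

The Segment codes in this stand-in differ from those in the full dataset (Government is 1 here, 2 above) because the codes depend on which categories are present when the encoder is fitted.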
