# Label Encoding
Label encoding and ordinal encoding are two techniques used to encode categorical data as numerical data.

Label encoding assigns a unique integer label to each category of a variable. The labels are usually assigned in alphabetical order (this is what scikit-learn's `LabelEncoder` does, since it sorts the categories) or by category frequency. For example, a categorical variable "color" with three possible values (red, green, blue) can be label encoded in alphabetical order as follows:

1. Red: 2
2. Green: 1
3. Blue: 0
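
The alphabetical assignment above can be sketched in plain Python before reaching for a library. This is only an illustration of the idea, not how scikit-learn is implemented; the names `colors`, `mapping`, and `encoded` are made up for this example.

```python
# Minimal sketch of label encoding by sorted (alphabetical) order.
colors = ["red", "blue", "green", "red", "blue"]

# Sort the unique categories, then number them starting from 0.
mapping = {category: code for code, category in enumerate(sorted(set(colors)))}

# Replace every value with its integer label.
encoded = [mapping[c] for c in colors]

print(mapping)  # {'blue': 0, 'green': 1, 'red': 2}
print(encoded)  # [2, 0, 1, 2, 0]
```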

In [2]:
import pandas as pd
from sklearn.preprocessing import LabelEncoder

In [3]:
# Create a simple dataframe
df = pd.DataFrame({
                      'color' : ['red', 'blue', 'green', 'red', 'blue']
                  })
df.head()

|   | color |
|---|-------|
| 0 | red   |
| 1 | blue  |
| 2 | green |
| 3 | red   |
| 4 | blue  |


In [6]:
# Create an instance of label encoder
lbl_encoder = LabelEncoder()
# Fit and transform; LabelEncoder expects a 1D input, so pass the column as a Series
lbl_encoder.fit_transform(df['color'])


array([2, 0, 1, 2, 0])

In [7]:
# Look up the label of each category; transform expects a 1D list
lbl_encoder.transform(['red']), lbl_encoder.transform(['blue']), lbl_encoder.transform(['green'])


(array([2]), array([0]), array([1]))
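
To see which integer belongs to which category without transforming each value by hand, the fitted encoder can be inspected. The sketch below assumes scikit-learn is installed and uses the same color values as above; `classes_` and `inverse_transform` are part of `LabelEncoder`, while the variable names are illustrative.

```python
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
codes = encoder.fit_transform(["red", "blue", "green", "red", "blue"])

# classes_ stores the categories in sorted order; the position is the label.
learned = encoder.classes_.tolist()

# inverse_transform maps the integer labels back to the original categories.
decoded = encoder.inverse_transform(codes).tolist()

print(learned)  # ['blue', 'green', 'red']
print(decoded)  # ['red', 'blue', 'green', 'red', 'blue']
```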

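The problem with label encoding is that the integers it assigns imply an order that does not exist in the data.
1. Red: 2
2. Green: 1
3. Blue: 0

A machine learning model may read these values as a ranking and assume that red is greater than green, which is greater than blue, even though colors have no natural order.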
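
A small sketch of why this matters: once colors are integers, numeric operations that a model might rely on produce results that have no meaning for the categories. The dictionary `codes` simply restates the labels listed above.

```python
# Labels from the color example above.
codes = {"blue": 0, "green": 1, "red": 2}

# Comparisons now succeed, although "red > green" means nothing for colors.
red_greater_than_green = codes["red"] > codes["green"]

# Averaging blue and red lands exactly on green's label.
midpoint = (codes["blue"] + codes["red"]) / 2
looks_like_green = midpoint == codes["green"]

print(red_greater_than_green)  # True
print(looks_like_green)        # True
```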

# Ordinal Encoding
Ordinal encoding is used for categorical data that has an intrinsic order or ranking. Each category is assigned a numerical value based on its position in that order. For example, a categorical variable "education level" with four possible values (high school, college, graduate, post-graduate) can be ordinal encoded as follows:

1. High school: 1
2. College: 2
3. Graduate: 3
4. Post-graduate: 4
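
The ranking above can be sketched as an explicit mapping in plain Python. The variable names and the `samples` list are made up for illustration, and starting the ranks at 1 is a choice that mirrors the list above; scikit-learn's `OrdinalEncoder` starts at 0 instead.

```python
# Categories listed from lowest to highest rank.
education_order = ["high school", "college", "graduate", "post-graduate"]

# Assign each level its position in the order, starting from 1.
rank = {level: position for position, level in enumerate(education_order, start=1)}

# Hypothetical sample values to encode.
samples = ["college", "high school", "post-graduate", "graduate"]
encoded = [rank[level] for level in samples]

print(rank)     # {'high school': 1, 'college': 2, 'graduate': 3, 'post-graduate': 4}
print(encoded)  # [2, 1, 4, 3]
```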

In [8]:
from sklearn.preprocessing import OrdinalEncoder

In [10]:
# Creating a simple dataframe
df = pd.DataFrame({
                      'size' : ['small', 'medium', 'large', 'medium', 'large']
                  })
df

|   | size   |
|---|--------|
| 0 | small  |
| 1 | medium |
| 2 | large  |
| 3 | medium |
| 4 | large  |


In [12]:
# Creating an instance of OrdinalEncoder
encoder = OrdinalEncoder(categories=[['small', 'medium', 'large']]) # Assign ranks
# Fit and Transform
encoder.fit_transform(df[['size']])

array([[0.],
       [1.],
       [2.],
       [1.],
       [2.]])

In [13]:
# Wrap the encoded array in a DataFrame, using the encoder's output feature names as column names
encoded_df = pd.DataFrame(encoder.fit_transform(df[['size']]), columns=encoder.get_feature_names_out())

In [14]:
encoded_df

|   | size |
|---|------|
| 0 | 0.0  |
| 1 | 1.0  |
| 2 | 2.0  |
| 3 | 1.0  |
| 4 | 2.0  |


In [15]:
# Concatenate the original and encoded dataframes side by side (both columns are named 'size')
pd.concat([df, encoded_df], axis=1)

|   | size   | size |
|---|--------|------|
| 0 | small  | 0.0  |
| 1 | medium | 1.0  |
| 2 | large  | 2.0  |
| 3 | medium | 1.0  |
| 4 | large  | 2.0  |
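
Concatenating this way leaves two columns with the same name. One possible alternative, sketched below under the assumption that pandas and scikit-learn are installed, is to write the encoded values into a new column of the original dataframe. The column name `size_encoded` is a hypothetical choice for this example.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

df = pd.DataFrame({"size": ["small", "medium", "large", "medium", "large"]})
encoder = OrdinalEncoder(categories=[["small", "medium", "large"]])

# fit_transform returns a 2D float array of shape (n_samples, 1); ravel() flattens it.
df["size_encoded"] = encoder.fit_transform(df[["size"]]).ravel()

# inverse_transform recovers the original categories from the encoded values.
restored = encoder.inverse_transform(df[["size_encoded"]].to_numpy()).ravel().tolist()

print(df)
print(restored)  # ['small', 'medium', 'large', 'medium', 'large']
```

Keeping the encoded values in a distinctly named column makes later selections such as `df["size_encoded"]` unambiguous.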
