# Label And Ordinal Encoding

Label encoding and ordinal encoding are two techniques used to encode categorical data as numerical data.

# 1) Label Encoding

Label encoding involves assigning a unique numerical label to each category in the variable. The labels are usually assigned in alphabetical order or based on the frequency of the categories. For example, if we have a categorical variable "color" with three possible values (red, green, blue), we can represent it using label encoding as follows (the particular integers here are arbitrary; scikit-learn's `LabelEncoder`, used below, sorts the categories and starts counting at 0):

1. Red: 1
2. Green: 2
3. Blue: 3

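Label encoding does not take any order among the categories into account; the integers are only identifiers. It is commonly applied to nominal data (categories without a meaningful order, e.g., Color or City), although the resulting integers can be misread as ordered values, as discussed under the disadvantage below.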

In [41]:
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# LabelEncoder : Encode target labels with value between 0 and n_classes-1...

In [42]:
## creating a simple dataframe..

df = pd.DataFrame({
    'color' : ['red', 'blue', 'green', 'green', 'red', 'blue']
})

df

,color
0,red
1,blue
2,green
3,green
4,red
5,blue


In [43]:
lbl_encoder = LabelEncoder()

In [44]:
## pass the column as a 1-D Series; a 2-D DataFrame (df[['color']]) triggers a DataConversionWarning
lbl_encoder.fit_transform(df['color'])

array([2, 0, 1, 1, 2, 0])

In [45]:
## lbl_encoder.transform([...]) returns the integer assigned to each colour passed in

lbl_encoder.transform(['red'])  ## the value assigned to red

array([2])

In [46]:
lbl_encoder.transform(['blue'])

array([0])

In [47]:
lbl_encoder.transform(['green'])

array([1])
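Instead of querying one colour at a time, the whole mapping can be read from the fitted encoder. The following is a minimal sketch, assuming the same colour values as above; `mapping` and `decoded` are illustrative names, not part of the original notebook.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({'color': ['red', 'blue', 'green', 'green', 'red', 'blue']})

lbl_encoder = LabelEncoder()
codes = lbl_encoder.fit_transform(df['color'])

# classes_ holds the categories in sorted order; each category's index is its label
mapping = {str(label): code for code, label in enumerate(lbl_encoder.classes_)}
print(mapping)

# inverse_transform maps integer labels back to the original categories
decoded = lbl_encoder.inverse_transform([2, 0, 1])
print(list(decoded))
```

Because the categories are sorted before numbering, the labels depend only on the set of values seen during `fit`, not on the order in which they appear in the column.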

### Disadvantage of Label Encoding

When we assign integer labels to nominal data, the numbers carry an order that the categories do not have. Here, with the colours red, blue and green, red is encoded as 2, so a model might treat red as greater than blue (0) or green (1). This should not happen, because nominal values are a type of categorical data where the categories have no inherent order or ranking: each category is unique, but there are no greater-than or less-than relationships between them.

In short, the disadvantage of label encoding is that the numeric values it assigns to nominal data can mislead a machine learning model into interpreting the categories as ordinal.
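The spurious order can be made concrete with a small sketch. It assumes the three colours used earlier; `code_of` and the comparison variables are illustrative names. The comparisons are valid arithmetic on the encoded integers, yet they say nothing meaningful about the colours themselves.

```python
from sklearn.preprocessing import LabelEncoder

colors = ['red', 'blue', 'green']
codes = LabelEncoder().fit_transform(colors)

# Look up the integer assigned to each colour
code_of = dict(zip(colors, codes.tolist()))
print(code_of)

# Comparisons the encoding makes possible but that are meaningless for colours
red_gt_blue = code_of['red'] > code_of['blue']
dist_red_blue = abs(code_of['red'] - code_of['blue'])
dist_green_blue = abs(code_of['green'] - code_of['blue'])
print(red_gt_blue, dist_red_blue, dist_green_blue)
```

A model that uses magnitudes or distances would see red as "twice as far" from blue as green is, purely because of alphabetical sorting.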

# 2) Ordinal Encoding

Ordinal encoding is used to encode categorical data that has an intrinsic order or ranking. In this technique, each category is assigned a numerical value based on its position in the order. For example, if we have a categorical variable "education level" with four possible values (high school, college, graduate, post-graduate), we can represent it using ordinal encoding as follows:

1. High school: 1
2. College: 2
3. Graduate: 3
4. Post-graduate: 4
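The mapping listed above can be sketched by hand with a plain dictionary and pandas `Series.map`. The sample values in `education` are made up for illustration, and `order` is a hypothetical name; note that this sketch starts at 1 to match the list, whereas scikit-learn's `OrdinalEncoder` starts at 0.

```python
import pandas as pd

# Hypothetical sample data for illustration
education = pd.Series(['college', 'high school', 'post-graduate', 'graduate', 'college'])

# Explicit rank for each category, lowest to highest
order = {'high school': 1, 'college': 2, 'graduate': 3, 'post-graduate': 4}

encoded = education.map(order)
print(encoded.tolist())
```

Here the integers are meaningful: a larger value corresponds to a higher education level, so comparisons between encoded values reflect the real ranking.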

In [48]:
from sklearn.preprocessing import OrdinalEncoder

In [49]:
# create a sample dataframe with an ordinal variable

df = pd.DataFrame({
    'size': ['small', 'medium', 'large', 'medium', 'small', 'large']
})

In [50]:
## creating an instance of OrdinalEncoder and then performing fit_transform

encoder = OrdinalEncoder(categories=[['small','medium','large']])
# 'categories' Parameter : Specifies the order of categories for each feature. Ensures the categories are encoded in the correct order (small < medium < large)...

encoder.fit_transform(df)

array([[0.],
       [1.],
       [2.],
       [1.],
       [0.],
       [2.]])

In [51]:
(encoder.transform([['small']]),encoder.transform([['medium']]),encoder.transform([['large']]))



(array([[0.]]), array([[1.]]), array([[2.]]))
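To see why the `categories` parameter matters, the sketch below compares an encoder created without it against one with the explicit order. It assumes the same `size` column as above; `default_encoder` and `ordered_encoder` are illustrative names. Without `categories`, the encoder sorts the values, which for strings means alphabetical order.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

df = pd.DataFrame({'size': ['small', 'medium', 'large', 'medium', 'small', 'large']})

# No explicit order: categories are sorted, so 'large' comes before 'small'
default_encoder = OrdinalEncoder()
default_codes = default_encoder.fit_transform(df)
print(list(default_encoder.categories_[0]))
print(default_codes.ravel().tolist())

# Explicit order: small < medium < large
ordered_encoder = OrdinalEncoder(categories=[['small', 'medium', 'large']])
ordered_codes = ordered_encoder.fit_transform(df)
print(ordered_codes.ravel().tolist())
```

With the default, `large` receives the smallest code and `small` the largest, which reverses the intended ranking, so passing the order explicitly is the safer choice for ordinal features.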