### Label Encoding

Label encoding and ordinal encoding are two techniques used to encode categorical data as numerical data.   

Label encoding assigns a unique integer label to each category in the variable.  
Labels are typically assigned in alphabetical (sorted) order, as scikit-learn's `LabelEncoder` does, or by category frequency.  
For example, if we have a categorical variable 'color' with three possible values (red, green, blue), one possible label encoding is shown below (the exact numbers depend on the ordering rule used):  
1. Red: 1
2. Green: 2
3. Blue: 3
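
The two ordering rules mentioned above can be sketched by hand with plain pandas. This is only an illustration: the sample data and the names `alphabetical` and `by_frequency` are made up for this example, and the labels start at 0 rather than 1.

```python
import pandas as pd

# Hypothetical sample data, chosen so every category has a different frequency
colors = pd.Series(['red', 'blue', 'green', 'green', 'red', 'green'])

# Rule 1: alphabetical order (sort the unique categories, then number them)
alphabetical = {cat: i for i, cat in enumerate(sorted(colors.unique()))}

# Rule 2: frequency order (most frequent category gets the smallest label)
by_frequency = {cat: i for i, cat in enumerate(colors.value_counts().index)}

# Apply either mapping to the column
encoded_alpha = colors.map(alphabetical)
encoded_freq = colors.map(by_frequency)

print(alphabetical)
print(by_frequency)
```

Either mapping is a valid label encoding; what matters is that the same mapping is reused for any new data.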

In [7]:
import pandas as pd

In [8]:
df = pd.DataFrame({
    'color':['red','blue','green','green','red']
})

In [9]:
df.head()

| | color |
|---|---|
| 0 | red |
| 1 | blue |
| 2 | green |
| 3 | green |
| 4 | red |


In [10]:
from sklearn.preprocessing import LabelEncoder

In [11]:
lbl_encoder = LabelEncoder()

In [None]:
# LabelEncoder expects a 1-D array of labels, so pass the column as a Series
lbl_encoder.fit_transform(df['color'])
# Blue has been assigned 0, green 1 and red 2 based on alphabetical order

array([2, 0, 1, 1, 2])

In [13]:
lbl_encoder.transform(['red'])

array([2])

In [14]:
lbl_encoder.transform(['green'])

array([1])

In [16]:
lbl_encoder.transform(['blue'])

array([0])
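
To check which integer belongs to which category without calling `transform` once per value, a fitted `LabelEncoder` exposes the learned categories in `classes_` and can map codes back with `inverse_transform`. The snippet below is a small self-contained sketch that refits an encoder on the same colors; the variable names are illustrative.

```python
from sklearn.preprocessing import LabelEncoder

colors = ['red', 'blue', 'green', 'green', 'red']

enc = LabelEncoder()
codes = enc.fit_transform(colors)

# classes_ holds the categories in sorted order; the position is the label
classes = list(enc.classes_)

# inverse_transform maps integer labels back to the original categories
decoded = list(enc.inverse_transform(codes))

print(classes)
print(decoded)
```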

In [None]:
## In this case, a machine learning model might interpret the codes as an ordering
## (red greater than green, green greater than blue), even though colors have no natural rank.
## When the categories do have a meaningful order, we use ordinal encoding
## to assign the ranks on purpose.

### Ordinal Encoding

Ordinal encoding is used for categorical data that has an intrinsic order or ranking.  
In this technique, each category is assigned a numerical value based on its position in the order.  
For example, if we have a categorical variable "education level" with four possible values (high school, college, graduate, post-graduate), we can represent it using ordinal encoding as follows:  
1. High School: 1
2. College: 2
3. Graduate: 3
4. Post-Graduate: 4
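
The mapping in the list above can be sketched with an ordered pandas `Categorical`. This is one possible approach, not the only one; the sample values in `education` are invented for the illustration, and 1 is added because pandas category codes start at 0.

```python
import pandas as pd

# Order of the categories, from lowest to highest rank
levels = ['high school', 'college', 'graduate', 'post-graduate']

# Hypothetical sample column
education = pd.Series(['college', 'high school', 'post-graduate', 'graduate', 'college'])

# An ordered Categorical stores each value as its position in `levels`
cat = pd.Categorical(education, categories=levels, ordered=True)

# Codes start at 0; add 1 to match the 1-to-4 numbering in the list above
ranks = cat.codes + 1

print(ranks.tolist())
```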

In [18]:
from sklearn.preprocessing import OrdinalEncoder

In [19]:
df = pd.DataFrame({
    'size':['small','medium','large','medium','small','large']
})

In [20]:
df

| | size |
|---|---|
| 0 | small |
| 1 | medium |
| 2 | large |
| 3 | medium |
| 4 | small |
| 5 | large |


In [None]:
## Create an instance of OrdinalEncoder and then perform fit_transform
## The categories parameter specifies the ranking order, from lowest to highest
encoder = OrdinalEncoder(categories=[['small','medium','large']])

In [None]:
encoder.fit_transform(df[['size']])
# Small - 0, medium - 1, large - 2

array([[0.],
       [1.],
       [2.],
       [1.],
       [0.],
       [2.]])

In [25]:
encoder.transform([['large']])



array([[2.]])
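
A fitted `OrdinalEncoder` can also decode values with `inverse_transform`, and it can be told how to treat categories it has not seen. The sketch below assumes scikit-learn 0.24 or newer, where the `handle_unknown='use_encoded_value'` and `unknown_value` parameters are available; the `'extra-large'` value and the choice of -1 as the placeholder are assumptions made for the example.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

df = pd.DataFrame({
    'size': ['small', 'medium', 'large', 'medium', 'small', 'large']
})

# Unseen categories are encoded as -1 instead of raising an error
encoder = OrdinalEncoder(
    categories=[['small', 'medium', 'large']],
    handle_unknown='use_encoded_value',
    unknown_value=-1,
)
encoder.fit(df[['size']])

# 'extra-large' was not listed in categories, so it maps to -1
new_data = pd.DataFrame({'size': ['large', 'extra-large']})
new_codes = encoder.transform(new_data)

# Map known codes back to their category names
decoded = encoder.inverse_transform([[0.0], [2.0]])

print(new_codes)
print(decoded)
```

Without `handle_unknown`, the default behaviour is to raise an error when `transform` meets a category that was not seen during `fit`.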