# Ordinal Encoding & Label Encoding

Encoding categorical data is a crucial step in preparing data for machine learning models, as many algorithms require numerical input. Two common methods for encoding categorical data are Ordinal Encoding and Label Encoding.

1. **Ordinal Encoding:**
   - **When to use:** Ordinal Encoding is suitable when the categorical data has an inherent order or hierarchy. In other words, the categories have a clear ranking.
   - **Where to apply:** Feature (input) columns.
   - **How it works:** Assigns a unique numerical value to each category based on its order or ranking. The assigned values maintain the ordinal relationship between categories.
   - **Example:** Suppose you have a "Size" feature with categories "Small," "Medium," and "Large." You might assign numerical values like 1, 2, and 3, respectively, preserving the order.

   ```text
   Size:     Small   Medium   Large
   Ordinal:  1       2        3
   ```

2. **Label Encoding:**
   - **When to use:** Label Encoding is suitable for turning the class labels of a target column into integers. The integers serve only as class identifiers, so no order among the categories is needed. For nominal feature columns, integer codes can suggest an order that does not exist, so one-hot encoding is usually the safer choice there.
   - **Where to apply:** The target column.
   - **How it works:** Assigns a unique integer to each category. scikit-learn's `LabelEncoder` sorts the categories and numbers them from 0, so the assigned values carry no ranking information.
   - **Example:** Suppose you have a "Color" column with categories "Red," "Green," and "Blue." Sorted alphabetically, they receive the labels Blue = 0, Green = 1, and Red = 2.

   ```text
   Color:  Blue   Green   Red
   Label:  0      1       2
   ```

Choose the encoding method based on whether the categories have a meaningful order and on whether the column is a feature or the target.
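
The two mappings described above can be sketched in plain Python before reaching for a library. This is only an illustration: the helper names `encode_ordinal` and `encode_labels` are hypothetical, and numbering the labels from 0 in sorted order is an assumption chosen to mirror what scikit-learn's `LabelEncoder` does.

```python
def encode_ordinal(values, ordered_categories):
    """Map each value to its rank in an explicitly ordered category list."""
    rank = {category: i for i, category in enumerate(ordered_categories)}
    return [rank[value] for value in values]


def encode_labels(values):
    """Map each value to an integer id (sorted order, starting at 0)."""
    ids = {category: i for i, category in enumerate(sorted(set(values)))}
    return [ids[value] for value in values]


sizes = ["Medium", "Small", "Large"]
colors = ["Red", "Green", "Blue"]

# The caller supplies the ranking for ordinal data
size_codes = encode_ordinal(sizes, ["Small", "Medium", "Large"])
# No ranking is supplied for labels; the ids are just identifiers
color_codes = encode_labels(colors)

print(size_codes)   # [1, 0, 2]
print(color_codes)  # [2, 1, 0]
```

The key difference is who decides the numbering: for ordinal data the order is passed in explicitly, while for labels it falls out of sorting and has no meaning.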

In Python, you can use libraries such as scikit-learn to perform Ordinal Encoding and Label Encoding. Here's a simple example using scikit-learn:

In [1]:
from sklearn.preprocessing import OrdinalEncoder, LabelEncoder

# Ordinal Encoding
ordinal_encoder = OrdinalEncoder(categories=[['Small', 'Medium', 'Large']])
ordinal_encoded_size = ordinal_encoder.fit_transform([['Medium'], ['Small'], ['Large']])
print("Ordinal Encoded Size:", ordinal_encoded_size)

# Label Encoding
label_encoder = LabelEncoder()
label_encoded_color = label_encoder.fit_transform(['Red', 'Green', 'Blue'])
print("Label Encoded Color:", label_encoded_color)


Ordinal Encoded Size: [[1.]
 [0.]
 [2.]]
Label Encoded Color: [2 1 0]
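
The label output `[2 1 0]` follows from `LabelEncoder` sorting the classes before numbering them, so `Blue` is 0, `Green` is 1, and `Red` is 2. A short sketch, reusing the same toy values, shows one way to inspect the learned mapping and to decode the integers again with `inverse_transform`:

```python
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

label_encoder = LabelEncoder()
codes = label_encoder.fit_transform(['Red', 'Green', 'Blue'])

# classes_ is sorted; a class's position in it is its integer code
print(label_encoder.classes_)
print(label_encoder.inverse_transform(codes))

ordinal_encoder = OrdinalEncoder(categories=[['Small', 'Medium', 'Large']])
encoded = ordinal_encoder.fit_transform([['Medium'], ['Small'], ['Large']])

# inverse_transform maps the float codes back to the original categories
decoded = ordinal_encoder.inverse_transform(encoded)
print(decoded)
```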


In [2]:
import pandas as pd
import numpy as np

In [3]:
df = pd.read_csv('customer.csv')

In [4]:
df.sample(5)

Unnamed: 0,age,gender,review,education,purchased
31,22,Female,Poor,School,Yes
46,64,Female,Poor,PG,No
18,19,Male,Good,School,No
29,83,Female,Average,UG,Yes
35,74,Male,Poor,School,Yes


In [5]:
# gender is a nominal column: use OneHotEncoder
# review is an ordinal column: use OrdinalEncoder
# education is an ordinal column: use OrdinalEncoder
# purchased is the nominal target column: use LabelEncoder
# Keep only review, education, and purchased (drops age and gender)
df = df.iloc[:,2:]
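
The comments above assign a different encoder to each column. One way to apply several encoders together is scikit-learn's `ColumnTransformer`. The sketch below is an illustration only: it uses a small hand-made DataFrame (hypothetical rows, not taken from `customer.csv`) and keeps the `gender` column, which this notebook drops.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical rows that mimic the layout of the customer data
demo = pd.DataFrame({
    'gender': ['Female', 'Male', 'Female'],
    'review': ['Poor', 'Good', 'Average'],
    'education': ['School', 'PG', 'UG'],
})

preprocess = ColumnTransformer(
    transformers=[
        # nominal feature: one binary column per category
        ('gender', OneHotEncoder(), ['gender']),
        # ordinal features: categories listed from lowest to highest
        ('ranked', OrdinalEncoder(categories=[['Poor', 'Average', 'Good'],
                                              ['School', 'UG', 'PG']]),
         ['review', 'education']),
    ],
    sparse_threshold=0,  # always return a dense NumPy array
)

encoded_demo = preprocess.fit_transform(demo)
print(encoded_demo)
```

The first two output columns are the one-hot columns for `Female` and `Male`; the last two hold the ordinal codes for `review` and `education`.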

In [6]:
df.head()

Unnamed: 0,review,education,purchased
0,Average,School,No
1,Poor,UG,No
2,Good,PG,No
3,Good,PG,No
4,Average,UG,No


In [7]:
x = df.iloc[:, :-1]
y = df['purchased']

In [8]:
from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(x,y,test_size=0.2,
                                               random_state=0)

In [9]:
xtrain

Unnamed: 0,review,education
33,Good,PG
35,Poor,School
26,Poor,PG
34,Average,School
18,Good,School
7,Poor,School
14,Poor,PG
45,Poor,PG
48,Good,UG
29,Average,UG


In [10]:
# OrdinalEncoder: one category list per column, ordered from lowest to highest
oe = OrdinalEncoder(categories=[['Poor','Average','Good'],
                               ['School','UG','PG']])


In [11]:
# Fit on the training split only, then reuse the fitted encoder on the test split
xtrain = oe.fit_transform(xtrain)
xtest = oe.transform(xtest)
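
An encoder is normally fitted on the training split only and then reused, unchanged, on the test split, so that both splits share one mapping. The sketch below illustrates this with hypothetical rows. `handle_unknown='use_encoded_value'` and `unknown_value=-1` are optional settings (available in scikit-learn 0.24 and later), used here so that a value missing from the declared categories is encoded as -1 rather than raising an error.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical train/test rows; 'Excellent' is not a declared category
train_demo = pd.DataFrame({'review': ['Poor', 'Good', 'Average']})
test_demo = pd.DataFrame({'review': ['Good', 'Excellent']})

encoder = OrdinalEncoder(
    categories=[['Poor', 'Average', 'Good']],
    handle_unknown='use_encoded_value',
    unknown_value=-1,
)

train_codes = encoder.fit_transform(train_demo)  # learn the mapping here
test_codes = encoder.transform(test_demo)        # reuse it, do not refit

print(test_codes)
```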

In [12]:
xtest

array([[0., 0.],
       [2., 1.],
       [2., 1.],
       [2., 2.],
       [2., 2.],
       [0., 2.],
       [2., 0.],
       [0., 0.],
       [0., 2.],
       [1., 1.]])

In [13]:
xtrain

array([[2., 2.],
       [0., 0.],
       [0., 2.],
       [1., 0.],
       [2., 0.],
       [0., 0.],
       [0., 2.],
       [0., 2.],
       [2., 1.],
       [1., 1.],
       [0., 1.],
       [1., 1.],
       [1., 1.],
       [0., 1.],
       [2., 2.],
       [1., 0.],
       [0., 2.],
       [1., 1.],
       [1., 0.],
       [2., 0.],
       [1., 0.],
       [0., 1.],
       [2., 0.],
       [2., 1.],
       [0., 1.],
       [0., 0.],
       [1., 2.],
       [1., 2.],
       [2., 0.],
       [2., 0.],
       [2., 1.],
       [1., 2.],
       [0., 2.],
       [2., 1.],
       [0., 2.],
       [0., 2.],
       [2., 2.],
       [1., 0.],
       [2., 2.],
       [1., 1.]])

In [14]:
oe.categories_

[array(['Poor', 'Average', 'Good'], dtype=object),
 array(['School', 'UG', 'PG'], dtype=object)]
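
`fit_transform` returns a plain NumPy array, so the column names and row index of the original DataFrame are lost. If they are needed later, one option is to wrap the result back into a DataFrame. The sketch uses hypothetical rows and a hypothetical index; `oe_demo` is a separate encoder created only for this example.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical rows with a non-default index, as after a shuffled split
features = pd.DataFrame(
    {'review': ['Good', 'Poor'], 'education': ['PG', 'School']},
    index=[33, 35],
)

oe_demo = OrdinalEncoder(categories=[['Poor', 'Average', 'Good'],
                                     ['School', 'UG', 'PG']])
codes = oe_demo.fit_transform(features)

# Restore the original column names and index on the encoded values
encoded_df = pd.DataFrame(codes, columns=features.columns, index=features.index)

# categories_[i][code] recovers the label for column i
review_labels = oe_demo.categories_[0]
print(encoded_df)
print(review_labels[int(encoded_df.loc[33, 'review'])])
```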

In [15]:
# LabelEncoder
# ONLY USE FOR TARGET COLUMNS
le = LabelEncoder()

In [16]:
# Fit on the training labels only, then reuse the same mapping for the test labels
ytrain = le.fit_transform(ytrain)
ytest = le.transform(ytest)
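
Refitting `LabelEncoder` on the test labels can silently change the mapping when the test split happens to miss a class. The following sketch, with hypothetical labels, contrasts refitting with reusing the encoder fitted on the training labels:

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical labels: the test split contains only 'Yes'
ytrain_demo = ['No', 'Yes', 'No', 'Yes']
ytest_demo = ['Yes', 'Yes']

le_demo = LabelEncoder()
le_demo.fit(ytrain_demo)                 # classes_ becomes ['No', 'Yes']
reused = le_demo.transform(ytest_demo)   # 'Yes' keeps the code 1

# A fresh encoder fitted on the test labels sees a single class
refit = LabelEncoder().fit_transform(ytest_demo)  # 'Yes' becomes code 0

print(reused, refit)
```

With the refitted encoder, `Yes` would be scored as class 0, which disagrees with the mapping the model was trained on.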

In [17]:
le.classes_

array(['No', 'Yes'], dtype=object)

In [18]:
ytest

array([0, 1, 1, 1, 0, 0, 0, 1, 1, 0])

In [19]:
ytrain

array([1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1,
       0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0])