# Ordinal Encoding

📘 What Is Ordinal Data?  
Ordinal data is a type of categorical data where the order or rank matters, but the exact difference between the values is not known or not meaningful.  

🧠 Simple Definition:  
Ordinal data represents categories that have a natural order, but the intervals between the categories are not necessarily equal.  
  
✅ Examples of Ordinal Data:

| Example | Categories |
| --- | --- |
| Education Level | High School < Bachelor’s < Master’s < PhD |
| Survey Ratings | Poor < Fair < Good < Very Good < Excellent |
| Shirt Sizes | Small < Medium < Large < XL |
| Customer Satisfaction | 1 (Very Dissatisfied) to 5 (Very Satisfied) |
  
❌ Not Ordinal (for contrast):  
"Red", "Blue", "Green" (no inherent order) → Nominal  

10°C, 20°C, 30°C (with meaningful numerical intervals) → Interval  

5kg, 10kg, 20kg (ordered & has meaningful ratios) → Ratio  

💡 Key Characteristics:

| Feature | Ordinal Data |
| --- | --- |
| Ordered? | ✅ Yes |
| Equal intervals? | ❌ Not guaranteed |
| Mathematical operations? | ❌ Only comparisons like <, > |
| Can compute mean? | ❌ Not recommended |
| Can compute median or mode? | ✅ Yes |
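
The characteristics above can be tried out with a pandas ordered categorical. This is only a minimal sketch: the `ratings` values are made up for illustration, and the median is taken as the middle element of the sorted values, which assumes an odd number of observations.

```python
import pandas as pd

# Hypothetical survey answers; the level order is stated explicitly.
levels = ["Poor", "Fair", "Good", "Very Good", "Excellent"]
ratings = pd.Series(
    pd.Categorical(
        ["Good", "Poor", "Excellent", "Good", "Fair"],
        categories=levels,
        ordered=True,
    )
)

# Comparisons are meaningful because the categories are ordered.
above_fair = ratings > "Fair"

# Order-based statistics: maximum, mode, and median (middle sorted value).
highest = ratings.max()
most_common = ratings.mode().iloc[0]
median_label = ratings.sort_values().iloc[len(ratings) // 2]

print(above_fair.tolist())
print(highest, most_common, median_label)
```

A mean is deliberately left out: averaging the labels would assume the gap between "Poor" and "Fair" equals the gap between "Good" and "Very Good", which ordinal data does not guarantee.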



Two methods for ordinal encoding:
1) scikit-learn's `OrdinalEncoder`
2) pandas `map` function with a dictionary

In [1]:
import pandas as pd

In [18]:
df = pd.DataFrame({'Size': ['s','m','l','xl','s','m','l','s','s','l','xl','m']})
df

Unnamed: 0,Size
0,s
1,m
2,l
3,xl
4,s
5,m
6,l
7,s
8,s
9,l


In [19]:
df['Size'].unique()

array(['s', 'm', 'l', 'xl'], dtype=object)

In [20]:
# For encoding, we need to decide the order beforehand (note: pass it as a 2D list, one inner list of categories per column)
ord_data = [['s','m','l','xl']]
ord_data

[['s', 'm', 'l', 'xl']]

In [10]:
from sklearn.preprocessing import OrdinalEncoder

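If you inspect the signature with Shift+Tab, you can see `categories='auto'`, which means the categories are inferred from the data and sorted, so string values end up in alphabetical order.  
We therefore need to pass `ord_data` (i.e. our own order) through the `categories` parameter.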
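
To see why the default is a problem for sizes, the sketch below compares `categories='auto'` with an explicit order on a small, made-up frame (the variable names are illustrative and not part of the notebook).

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

sizes = pd.DataFrame({"Size": ["s", "m", "l", "xl"]})

# Default: categories are learned from the data and sorted alphabetically.
auto_encoder = OrdinalEncoder()
auto_codes = auto_encoder.fit_transform(sizes[["Size"]]).ravel().tolist()
auto_categories = auto_encoder.categories_[0].tolist()

# Explicit order: one inner list per encoded column.
explicit_encoder = OrdinalEncoder(categories=[["s", "m", "l", "xl"]])
explicit_codes = explicit_encoder.fit_transform(sizes[["Size"]]).ravel().tolist()

print(auto_categories)   # learned order
print(auto_codes)        # codes for s, m, l, xl under the learned order
print(explicit_codes)    # codes for s, m, l, xl under our order
```

With the default, `l` receives the smallest code and `s` sits above `m`, so the numbers no longer reflect the size ranking.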

In [25]:
oe = OrdinalEncoder(categories=ord_data)
oe.fit(df[['Size']])

In [27]:
df['Ecoding'] = oe.transform(df[['Size']])
df

Unnamed: 0,Size,Ecoding
0,s,0.0
1,m,1.0
2,l,2.0
3,xl,3.0
4,s,0.0
5,m,1.0
6,l,2.0
7,s,0.0
8,s,0.0
9,l,2.0
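
The encoded column above is floating point because `OrdinalEncoder` defaults to `float64` output. The following sketch, which assumes a reasonably recent scikit-learn (0.24 or later for `handle_unknown`), shows integer output, a placeholder code for unseen values, and decoding back to labels. The `"xxl"` value and the `-1` placeholder are illustrative choices, not part of the notebook.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

size_order = [["s", "m", "l", "xl"]]
train = pd.DataFrame({"Size": ["s", "m", "l", "xl", "s"]})

encoder = OrdinalEncoder(
    categories=size_order,
    dtype=int,                           # integer codes instead of floats
    handle_unknown="use_encoded_value",  # do not raise on unseen labels
    unknown_value=-1,                    # code assigned to unseen labels
)
encoder.fit(train[["Size"]])

# "xxl" was never declared as a category, so it maps to -1.
new_rows = pd.DataFrame({"Size": ["m", "xxl", "xl"]})
codes = encoder.transform(new_rows[["Size"]]).ravel().tolist()

# inverse_transform maps codes back to the original labels.
decoded = encoder.inverse_transform([[0], [2]]).ravel().tolist()

print(codes)
print(decoded)
```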


**Using Map Function**

In [28]:
ord_data1 = {'s':0, 'm':1, 'l':2, 'xl':3}

In [32]:
df['encoded through map'] = df['Size'].map(ord_data1)

In [33]:
df

Unnamed: 0,Size,Ecoding,encoded through map
0,s,0.0,0
1,m,1.0,1
2,l,2.0,2
3,xl,3.0,3
4,s,0.0,0
5,m,1.0,1
6,l,2.0,2
7,s,0.0,0
8,s,0.0,0
9,l,2.0,2
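
One thing to watch with `map`: any value missing from the dictionary becomes `NaN` without raising an error. The sketch below uses a made-up `"xxl"` entry to show this, and also shows an ordered pandas `Categorical` as a possible alternative, where unlisted values get the code `-1`.

```python
import pandas as pd

size_map = {"s": 0, "m": 1, "l": 2, "xl": 3}
sizes = pd.Series(["s", "xl", "xxl", "m"])  # "xxl" is not in the mapping

# map() leaves unmapped values as NaN, which also turns the column into floats.
mapped = sizes.map(size_map)
unmapped_labels = sizes[mapped.isna()].tolist()

# Alternative: an ordered Categorical; values outside `categories` get code -1.
cat_codes = pd.Categorical(
    sizes, categories=["s", "m", "l", "xl"], ordered=True
).codes.tolist()

print(mapped.tolist())
print(unmapped_labels)
print(cat_codes)
```

Checking `mapped.isna()` right after mapping is a cheap way to catch typos or new categories before they reach a model.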
