### 🔖 Ordinal Encoding

**Ordinal Encoding** is a technique used to convert categorical data into numerical values by assigning a unique integer to each category, preserving the inherent order of the categories. This method is particularly useful for ordinal data, where the categories have a meaningful sequence.

#### Example:

Consider a dataset with an 'Education Level' feature:

| Education Level |
|-----------------|
| High School     |
| Bachelor's      |
| Master's        |
| Doctorate       |

Applying ordinal encoding:

| Education Level | Encoded Value |
|-----------------|---------------|
| High School     | 0             |
| Bachelor's      | 1             |
| Master's        | 2             |
| Doctorate       | 3             |

In this encoding, the order is maintained: High School < Bachelor's < Master's < Doctorate.
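The table above can be reproduced in a few lines. This is a minimal sketch using an ordered pandas categorical; the sample rows and the names `levels` and `edu` are illustrative assumptions, not part of the original dataset.

```python
import pandas as pd

# Hypothetical sample rows; the ranking mirrors the table above
levels = ["High School", "Bachelor's", "Master's", "Doctorate"]
edu = pd.DataFrame(
    {"Education Level": ["Master's", "High School", "Doctorate", "Bachelor's"]}
)

# An ordered categorical stores the ranking explicitly
edu_cat = pd.Categorical(edu["Education Level"], categories=levels, ordered=True)

# .codes gives each value's position in `levels` (0 = lowest rank)
edu["Encoded Value"] = edu_cat.codes
print(edu)
```

Because the category list is passed explicitly, the integer codes follow the intended ranking rather than the order in which the values happen to appear.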


In [1]:
# importing required packages
import pandas as pd

In [3]:
# preparing the data
df = pd.DataFrame({'Size': ['s', 'm', 'l', 'xl', '2xl', '3xl', '4xl', '5xl',
                            's', 'm', 'l', 'xl', '2xl', '3xl', '4xl', '5xl']})
df.head(3)

|   | Size |
|---|------|
| 0 | s    |
| 1 | m    |
| 2 | l    |


In [4]:
# define the category order, smallest to largest
# OrdinalEncoder expects a list of lists: one category list per feature
ord_data = [['s', 'm', 'l', 'xl', '2xl', '3xl', '4xl', '5xl']]

## Using scikit-learn to perform Ordinal Encoding

In [5]:
from sklearn.preprocessing import OrdinalEncoder

In [6]:
# if `categories` is not set, the encoder sorts the unique values itself
# (alphabetical order for strings), which would not match the size order
ordinal_encoding = OrdinalEncoder(categories=ord_data)
ordinal_encoding.fit(df[['Size']])

In [7]:
df['Size_EN'] = ordinal_encoding.transform(df[['Size']])
df

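|   | Size | Size_EN |
|---|------|---------|
| 0 | s    | 0.0     |
| 1 | m    | 1.0     |
| 2 | l    | 2.0     |
| 3 | xl   | 3.0     |
| 4 | 2xl  | 4.0     |
| 5 | 3xl  | 5.0     |
| 6 | 4xl  | 6.0     |
| 7 | 5xl  | 7.0     |
| 8 | s    | 0.0     |
| 9 | m    | 1.0     |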
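To see why the explicit `categories` argument matters, the sketch below fits an encoder without it on a small, made-up subset of sizes (the names `sizes`, `default_enc`, and `default_codes` are illustrative). The encoder falls back to sorted order, which does not match the real size ranking.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Small illustrative sample, not the full DataFrame used above
sizes = pd.DataFrame({"Size": ["s", "m", "l", "xl", "2xl"]})

# No `categories` argument: the encoder sorts the unique values itself
default_enc = OrdinalEncoder()
default_codes = default_enc.fit_transform(sizes[["Size"]]).ravel()

# Learned order is lexicographic: '2xl' comes first, 'xl' last
print(default_enc.categories_[0])
print(default_codes)
```

Here `'2xl'` receives the lowest code and `'s'` lands above `'m'` and `'l'`, so a model would see a ranking that contradicts the actual sizes.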


## Using map function

In [10]:
# map each size to an integer; any strictly increasing values preserve the order
ord_data2 = {'s': 10, 'm': 11, 'l': 12, 'xl': 13, '2xl': 14, '3xl': 15, '4xl': 16, '5xl': 17}

In [11]:
df['Size_en_map'] = df['Size'].map(ord_data2)
df

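|   | Size | Size_EN | Size_en_map |
|---|------|---------|-------------|
| 0 | s    | 0.0     | 10          |
| 1 | m    | 1.0     | 11          |
| 2 | l    | 2.0     | 12          |
| 3 | xl   | 3.0     | 13          |
| 4 | 2xl  | 4.0     | 14          |
| 5 | 3xl  | 5.0     | 15          |
| 6 | 4xl  | 6.0     | 16          |
| 7 | 5xl  | 7.0     | 17          |
| 8 | s    | 0.0     | 10          |
| 9 | m    | 1.0     | 11          |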
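One thing to watch with `map`: any value missing from the dictionary is silently turned into `NaN`. The following sketch, with a deliberately incomplete hypothetical mapping (`size_rank`), shows one way to spot such values before they reach a model.

```python
import pandas as pd

# Hypothetical mapping that deliberately omits "xxl"
size_rank = {"s": 0, "m": 1, "l": 2}
sizes = pd.Series(["s", "m", "l", "xxl"])

encoded = sizes.map(size_rank)

# Values without a dictionary entry come back as NaN
unmapped = sizes[encoded.isna()].unique().tolist()
print(encoded)
print("Unmapped categories:", unmapped)
```

Listing the unmapped categories makes it easy to extend the dictionary instead of discovering missing values later.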


## Using real data

In [12]:
# importing the data
dataset = pd.read_excel("Financial Sample.xlsx")
dataset.head(3)

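|   | Segment | Country | Product | Discount Band | Units Sold | Manufacturing Price | Sale Price | Gross Sales | Discounts | Sales | COGS | Profit | Date | Month Number | Month Name | Year |
|---|---------|---------|---------|---------------|------------|---------------------|------------|-------------|-----------|-------|------|--------|------|--------------|------------|------|
| 0 | Government | Canada | Carretera | | 1618.5 | 3 | 20 | 32370.0 | 0.0 | 32370.0 | 16185.0 | 16185.0 | 2014-01-01 | 1 | January | 2014 |
| 1 | Government | Germany | Carretera | | 1321.0 | 3 | 20 | 26420.0 | 0.0 | 26420.0 | 13210.0 | 13210.0 | 2014-01-01 | 1 | January | 2014 |
| 2 | Midmarket | France | Carretera | | 2178.0 | 3 | 15 | 32670.0 | 0.0 | 32670.0 | 21780.0 | 10890.0 | 2014-06-01 | 6 | June | 2014 |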


In [None]:
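# if the column contains nulls, fill them with the mode
# (assign the result back; calling fillna(inplace=True) on a selected column is deprecated in recent pandas)
dataset["Segment"] = dataset["Segment"].fillna(dataset["Segment"].mode()[0])

# get the unique values of the Segment column, wrapped in a list for OrdinalEncoder
# note: unique() returns values in order of first appearance, not a meaningful ranking
en_data_ord = [dataset["Segment"].unique()]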
en_data_ord


[array(['Government', 'Midmarket', 'Channel Partners', 'Enterprise',
        'Small Business'], dtype=object)]

In [17]:
on = OrdinalEncoder(categories=en_data_ord)
on.fit(dataset[["Segment"]])

In [19]:
dataset['Segment_oen'] = on.transform(dataset[["Segment"]])
dataset.head(10)

|   | Segment | Country | Product | Discount Band | Units Sold | Manufacturing Price | Sale Price | Gross Sales | Discounts | Sales | COGS | Profit | Date | Month Number | Month Name | Year | Segment_oen |
|---|---------|---------|---------|---------------|------------|---------------------|------------|-------------|-----------|-------|------|--------|------|--------------|------------|------|-------------|
| 0 | Government | Canada | Carretera | | 1618.5 | 3 | 20 | 32370.0 | 0.0 | 32370.0 | 16185.0 | 16185.0 | 2014-01-01 | 1 | January | 2014 | 0.0 |
| 1 | Government | Germany | Carretera | | 1321.0 | 3 | 20 | 26420.0 | 0.0 | 26420.0 | 13210.0 | 13210.0 | 2014-01-01 | 1 | January | 2014 | 0.0 |
| 2 | Midmarket | France | Carretera | | 2178.0 | 3 | 15 | 32670.0 | 0.0 | 32670.0 | 21780.0 | 10890.0 | 2014-06-01 | 6 | June | 2014 | 1.0 |
| 3 | Midmarket | Germany | Carretera | | 888.0 | 3 | 15 | 13320.0 | 0.0 | 13320.0 | 8880.0 | 4440.0 | 2014-06-01 | 6 | June | 2014 | 1.0 |
| 4 | Midmarket | Mexico | Carretera | | 2470.0 | 3 | 15 | 37050.0 | 0.0 | 37050.0 | 24700.0 | 12350.0 | 2014-06-01 | 6 | June | 2014 | 1.0 |
| 5 | Government | Germany | Carretera | | 1513.0 | 3 | 350 | 529550.0 | 0.0 | 529550.0 | 393380.0 | 136170.0 | 2014-12-01 | 12 | December | 2014 | 0.0 |
| 6 | Midmarket | Germany | Montana | | 921.0 | 5 | 15 | 13815.0 | 0.0 | 13815.0 | 9210.0 | 4605.0 | 2014-03-01 | 3 | March | 2014 | 1.0 |
| 7 | Channel Partners | Canada | Montana | | 2518.0 | 5 | 12 | 30216.0 | 0.0 | 30216.0 | 7554.0 | 22662.0 | 2014-06-01 | 6 | June | 2014 | 2.0 |
| 8 | Government | France | Montana | | 1899.0 | 5 | 20 | 37980.0 | 0.0 | 37980.0 | 18990.0 | 18990.0 | 2014-06-01 | 6 | June | 2014 | 0.0 |
| 9 | Channel Partners | Germany | Montana | | 1545.0 | 5 | 12 | 18540.0 | 0.0 | 18540.0 | 4635.0 | 13905.0 | 2014-06-01 | 6 | June | 2014 | 2.0 |
