## Ordinal numbering encoding

**Ordinal categorical variables**

Categorical variables whose categories can be meaningfully ordered are called ordinal. For example:

- A student's grade in an exam (A, B, C or Fail).
- Days of the week, which can be treated as ordinal with Monday = 1 and Sunday = 7.
- Educational level, with the categories Elementary school, High school, College graduate and PhD ranked from 1 to 4.

When the categorical variable is ordinal, the most straightforward approach is to replace each label with an integer that reflects its position in the order.
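
As a minimal sketch of this replacement, the educational level example from the list above can be mapped to the ranks 1 to 4 with a plain dictionary. The sample values and the names `education` and `education_map` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data, invented only to illustrate the mapping
education = pd.Series(
    ['High school', 'PhD', 'Elementary school', 'College graduate', 'High school']
)

# Rank each label by its position in the natural order
education_map = {
    'Elementary school': 1,
    'High school': 2,
    'College graduate': 3,
    'PhD': 4,
}

# Replace every label with its rank
education_ordinal = education.map(education_map)
print(education_ordinal.tolist())
```

Note that `map` returns NaN for any label that is missing from the dictionary, so it is worth checking for missing values after the replacement.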

### Advantages

- Keeps the semantic information of the variable (human-readable content)
- Straightforward

### Disadvantage

- Does not add information that is valuable for machine learning

I will simulate some data below to demonstrate this technique.

In [1]:
import pandas as pd
import datetime

In [2]:
# create a variable with dates, and from that extract the weekday
# I create a list of the last 30 days, counting back from today,
# and then transform it into a dataframe

base = datetime.datetime.today()
date_list = [base - datetime.timedelta(days=x) for x in range(0, 30)]
df = pd.DataFrame(date_list)
df.columns = ['day']
df

Unnamed: 0,day
0,2020-04-15 16:48:53.029012
1,2020-04-14 16:48:53.029012
2,2020-04-13 16:48:53.029012
3,2020-04-12 16:48:53.029012
4,2020-04-11 16:48:53.029012
5,2020-04-10 16:48:53.029012
6,2020-04-09 16:48:53.029012
7,2020-04-08 16:48:53.029012
8,2020-04-07 16:48:53.029012
9,2020-04-06 16:48:53.029012


In [3]:
# extract the weekday name
# (dt.weekday_name was removed in pandas 1.0; dt.day_name() replaces it)

df['day_of_week'] = df['day'].dt.day_name()
df.head()

Unnamed: 0,day,day_of_week
0,2020-04-15 16:48:53.029012,Wednesday
1,2020-04-14 16:48:53.029012,Tuesday
2,2020-04-13 16:48:53.029012,Monday
3,2020-04-12 16:48:53.029012,Sunday
4,2020-04-11 16:48:53.029012,Saturday


In [4]:
# Engineer categorical variable by ordinal number replacement

weekday_map = {'Monday':1,
               'Tuesday':2,
               'Wednesday':3,
               'Thursday':4,
               'Friday':5,
               'Saturday':6,
               'Sunday':7
}

df['day_ordinal'] = df.day_of_week.map(weekday_map)
df.head(10)

Unnamed: 0,day,day_of_week,day_ordinal
0,2020-04-15 16:48:53.029012,Wednesday,3
1,2020-04-14 16:48:53.029012,Tuesday,2
2,2020-04-13 16:48:53.029012,Monday,1
3,2020-04-12 16:48:53.029012,Sunday,7
4,2020-04-11 16:48:53.029012,Saturday,6
5,2020-04-10 16:48:53.029012,Friday,5
6,2020-04-09 16:48:53.029012,Thursday,4
7,2020-04-08 16:48:53.029012,Wednesday,3
8,2020-04-07 16:48:53.029012,Tuesday,2
9,2020-04-06 16:48:53.029012,Monday,1
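
An alternative to the hand-written dictionary is an ordered pandas categorical, whose integer codes follow the category order you declare. The sketch below is only an illustration under that assumption: the sample values and the names `weekday_order`, `days` and `days_cat` are not part of the notebook above.

```python
import pandas as pd

# The order of this list defines the ranking of the categories
weekday_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
                 'Friday', 'Saturday', 'Sunday']

# Hypothetical sample of weekday names
days = pd.Series(['Wednesday', 'Tuesday', 'Monday', 'Sunday'])

# Declare the variable as an ordered categorical
days_cat = pd.Series(pd.Categorical(days, categories=weekday_order, ordered=True))

# Codes start at 0, so add 1 to reproduce Monday = 1 ... Sunday = 7
day_ordinal = days_cat.cat.codes + 1
print(day_ordinal.tolist())
```

Labels that are not listed in `weekday_order` get the code -1, and therefore 0 after the shift, so unexpected values remain easy to spot.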


We can now use the variable day_ordinal in sklearn to build machine learning models.
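
If the encoding should live inside a scikit-learn workflow, `OrdinalEncoder` can perform the same replacement when it is given the category order explicitly. This is a sketch rather than part of the original notebook: the frame `df_demo` and its values are hypothetical, and the encoder produces 0-based floats, so 1 is added here only to match the Monday = 1 convention used above.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# The order of this list defines the ranking of the categories
weekday_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
                 'Friday', 'Saturday', 'Sunday']

# Hypothetical sample frame with a weekday column
df_demo = pd.DataFrame({'day_of_week': ['Wednesday', 'Tuesday', 'Monday', 'Sunday']})

# Without `categories`, the encoder would sort the labels alphabetically
encoder = OrdinalEncoder(categories=[weekday_order])

# fit_transform expects 2D input and returns a 2D float array
encoded = encoder.fit_transform(df_demo[['day_of_week']])
df_demo['day_ordinal'] = encoded.ravel() + 1
print(df_demo)
```

Passing the order explicitly matters: the default alphabetical ordering would place Friday before Monday and lose the meaning of the ranks.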