## Ordinal numbering encoding

**Ordinal categorical variables**

Categorical variables whose categories can be meaningfully ordered are called ordinal. For example:

- Student's grade in an exam (A, B, C or Fail).
- Days of the week, which can be treated as ordinal with Monday = 1 and Sunday = 7.
- Educational level, with the categories Elementary school, High school, College graduate and PhD ranked from 1 to 4.

When the categorical variable is ordinal, the most straightforward approach is to replace each label with a number that reflects its position in the order.
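As a minimal sketch of this replacement, the educational level example above could be encoded as follows. The sample values and the names `education`, `levels` and `rank` are hypothetical and only serve the illustration; the order of the list is what defines the ranking.

```python
import pandas as pd

# Hypothetical sample of educational levels
education = pd.Series(
    ['High school', 'PhD', 'Elementary school', 'College graduate']
)

# The position in this list defines the rank, starting at 1
levels = ['Elementary school', 'High school', 'College graduate', 'PhD']
rank = {label: i for i, label in enumerate(levels, start=1)}

# Replace each label with its rank
education_ordinal = education.map(rank)
print(education_ordinal.tolist())  # [2, 4, 1, 3]
```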

### Advantages

- Keeps the semantic information of the variable (human-readable content)
- Straightforward

### Disadvantage

- Adds no new information for a machine learning model: the numbers only restate the order of the labels

I will simulate some data below to demonstrate this encoding.

In [1]:
import pandas as pd
import datetime

In [2]:
%cd ../data_set/
# create a variable with dates, and from that extract the weekday
# I create a list of 30 consecutive dates, going back one day at a time from today,
# and then transform it into a dataframe

base = datetime.datetime.today()
date_list = [base - datetime.timedelta(days=x) for x in range(0, 30)]
df = pd.DataFrame(date_list)
df.columns = ['day']
df

/home/pat/Desktop/Udemy_FeatureEngineering/data_set


| | day |
|---|---|
| 0 | 2019-03-17 22:19:23.605826 |
| 1 | 2019-03-16 22:19:23.605826 |
| 2 | 2019-03-15 22:19:23.605826 |
| 3 | 2019-03-14 22:19:23.605826 |
| 4 | 2019-03-13 22:19:23.605826 |
| 5 | 2019-03-12 22:19:23.605826 |
| 6 | 2019-03-11 22:19:23.605826 |
| 7 | 2019-03-10 22:19:23.605826 |
| 8 | 2019-03-09 22:19:23.605826 |
| 9 | 2019-03-08 22:19:23.605826 |


In [3]:
# extract the weekday name
# (dt.weekday_name was removed in pandas 1.0; dt.day_name() replaces it)

df['day_of_week'] = df['day'].dt.day_name()
df.head()

| | day | day_of_week |
|---|---|---|
| 0 | 2019-03-17 22:19:23.605826 | Sunday |
| 1 | 2019-03-16 22:19:23.605826 | Saturday |
| 2 | 2019-03-15 22:19:23.605826 | Friday |
| 3 | 2019-03-14 22:19:23.605826 | Thursday |
| 4 | 2019-03-13 22:19:23.605826 | Wednesday |


In [4]:
# Engineer categorical variable by ordinal number replacement

weekday_map = {'Monday':1,
               'Tuesday':2,
               'Wednesday':3,
               'Thursday':4,
               'Friday':5,
               'Saturday':6,
               'Sunday':7
}

df['day_ordinal'] = df.day_of_week.map(weekday_map)
df.head(10)

| | day | day_of_week | day_ordinal |
|---|---|---|---|
| 0 | 2019-03-17 22:19:23.605826 | Sunday | 7 |
| 1 | 2019-03-16 22:19:23.605826 | Saturday | 6 |
| 2 | 2019-03-15 22:19:23.605826 | Friday | 5 |
| 3 | 2019-03-14 22:19:23.605826 | Thursday | 4 |
| 4 | 2019-03-13 22:19:23.605826 | Wednesday | 3 |
| 5 | 2019-03-12 22:19:23.605826 | Tuesday | 2 |
| 6 | 2019-03-11 22:19:23.605826 | Monday | 1 |
| 7 | 2019-03-10 22:19:23.605826 | Sunday | 7 |
| 8 | 2019-03-09 22:19:23.605826 | Saturday | 6 |
| 9 | 2019-03-08 22:19:23.605826 | Friday | 5 |
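One thing worth checking after a dictionary replacement is that every label was actually mapped, because `Series.map` returns NaN for any label missing from the dictionary. The sketch below is one possible sanity check, not part of the original exercise; the names `dates`, `names`, `ordinal` and `unmapped`, the fixed start date and the label 'Funday' are hypothetical. It also compares the result against pandas' own `dt.dayofweek`, which counts Monday as 0.

```python
import pandas as pd

weekday_map = {'Monday': 1, 'Tuesday': 2, 'Wednesday': 3, 'Thursday': 4,
               'Friday': 5, 'Saturday': 6, 'Sunday': 7}

# Hypothetical sample: seven consecutive dates starting on a Monday
dates = pd.Series(pd.date_range('2019-03-11', periods=7, freq='D'))
names = dates.dt.day_name()
ordinal = names.map(weekday_map)

# Every label should have found a number in the dictionary
assert ordinal.notna().all()

# dt.dayofweek uses Monday = 0, so adding 1 should reproduce the mapping
assert (ordinal == dates.dt.dayofweek + 1).all()

# A label that is missing from the dictionary silently becomes NaN
unmapped = pd.Series(['Monday', 'Funday']).map(weekday_map)
print(unmapped.isna().tolist())  # [False, True]
```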


We can now use the variable day_ordinal in sklearn to build machine learning models.
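If the encoding should live inside a scikit-learn workflow, one alternative to the dictionary above is `OrdinalEncoder` with an explicit category order. This is only a sketch under assumptions: the small dataframe `X` and the names `days` and `encoder` are hypothetical, and the encoder numbers categories from 0, so 1 is added here to match the Monday = 1 convention used in this section.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# The order of this list defines the encoding (Monday first)
days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
        'Friday', 'Saturday', 'Sunday']

# Hypothetical sample data
X = pd.DataFrame({'day_of_week': ['Sunday', 'Monday', 'Friday']})

# One list of categories per encoded column
encoder = OrdinalEncoder(categories=[days])
codes = encoder.fit_transform(X[['day_of_week']])  # 0-based float array

# Shift to 1-based integers to match the dictionary mapping
X['day_ordinal'] = codes[:, 0].astype(int) + 1
print(X['day_ordinal'].tolist())  # [7, 1, 5]
```

Without the `categories` argument the encoder would sort the labels alphabetically, which would not preserve the weekday order.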