# Types of Categorical Data
Not all categories are created equal. We generally have two types:

Nominal: These are categories with no inherent order. Ex: "Outlook" (sunny, overcast, rainy) is nominal. There’s no natural ranking between these weather conditions.

Ordinal: These categories have a meaningful order. Ex: "Temperature" (Low, High, Extreme) is ordinal. There’s a clear progression from coldest to hottest.
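
To make the distinction concrete, here is a minimal sketch of how pandas can record whether a categorical column is ordered. It uses the Temperature labels that appear in the dataset below (Low, High, Extreme); the variable names `outlook`, `temperature`, `hotter_than_low`, and `hottest` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Nominal: categories without an order (ordered=False is the default).
outlook = pd.Series(
    pd.Categorical(["sunny", "rainy", "overcast"],
                   categories=["sunny", "overcast", "rainy"])
)

# Ordinal: the categories list defines the order, lowest first.
temperature = pd.Series(
    pd.Categorical(["High", "Low", "Extreme"],
                   categories=["Low", "High", "Extreme"],
                   ordered=True)
)

# Order-aware operations are only meaningful for the ordinal column.
hotter_than_low = temperature > "Low"   # element-wise comparison
hottest = temperature.max()

print(outlook.cat.ordered, temperature.cat.ordered)
print(hotter_than_low.tolist())
print(hottest)
```

Declaring the order up front lets pandas compare and sort the values correctly, whereas the same comparison on the unordered `outlook` column would raise a `TypeError`.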

Reference:
https://towardsdatascience.com/encoding-categorical-data-explained-a-visual-guide-with-code-example-for-beginners-b169ac4193ae/

In [3]:
import pandas as pd
import numpy as np

In [4]:
# The Dataset

data = {
    'Date': ['03-25', '03-26', '03-27', '03-28', '03-29', '03-30', '03-31', '04-01', '04-02', '04-03', '04-04', '04-05'],
    'Weekday': ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri'],
    'Month': ['Mar', 'Mar', 'Mar', 'Mar', 'Mar', 'Mar', 'Mar', 'Apr', 'Apr', 'Apr', 'Apr', 'Apr'],
    'Temperature': ['High', 'Low', 'High', 'Extreme', 'Low', 'High', 'High', 'Low', 'High', 'Extreme', 'High', 'Low'],
    'Humidity': ['Dry', 'Humid', 'Dry', 'Dry', 'Humid', 'Humid', 'Dry', 'Humid', 'Dry', 'Dry', 'Humid', 'Dry'],
    'Wind': ['No', 'Yes', 'Yes', 'Yes', 'No', 'No', 'Yes', 'No', 'Yes', 'Yes', 'No', 'Yes'],
    'Outlook': ['sunny', 'rainy', 'overcast', 'sunny', 'rainy', 'overcast', 'sunny', 'rainy', 'sunny', 'overcast', 'sunny', 'rainy'],
    'Crowdedness': [85, 30, 65, 45, 25, 90, 95, 35, 70, 50, 80, 45]
}

In [6]:
df = pd.DataFrame(data)

In [7]:
df

Unnamed: 0,Date,Weekday,Month,Temperature,Humidity,Wind,Outlook,Crowdedness
0,03-25,Mon,Mar,High,Dry,No,sunny,85
1,03-26,Tue,Mar,Low,Humid,Yes,rainy,30
2,03-27,Wed,Mar,High,Dry,Yes,overcast,65
3,03-28,Thu,Mar,Extreme,Dry,Yes,sunny,45
4,03-29,Fri,Mar,Low,Humid,No,rainy,25
5,03-30,Sat,Mar,High,Humid,No,overcast,90
6,03-31,Sun,Mar,High,Dry,Yes,sunny,95
7,04-01,Mon,Apr,Low,Humid,No,rainy,35
8,04-02,Tue,Apr,High,Dry,Yes,sunny,70
9,04-03,Wed,Apr,Extreme,Dry,Yes,overcast,50


# Method 1: Label Encoding


Label Encoding assigns a unique integer to each category in a categorical variable.

In [8]:
# Label encoding for Weekday: pd.factorize assigns integers in order of first appearance
df["Weekday_label"] = pd.factorize(df['Weekday'])[0]

In [9]:
df

Unnamed: 0,Date,Weekday,Month,Temperature,Humidity,Wind,Outlook,Crowdedness,Weekday_label
0,03-25,Mon,Mar,High,Dry,No,sunny,85,0
1,03-26,Tue,Mar,Low,Humid,Yes,rainy,30,1
2,03-27,Wed,Mar,High,Dry,Yes,overcast,65,2
3,03-28,Thu,Mar,Extreme,Dry,Yes,sunny,45,3
4,03-29,Fri,Mar,Low,Humid,No,rainy,25,4
5,03-30,Sat,Mar,High,Humid,No,overcast,90,5
6,03-31,Sun,Mar,High,Dry,Yes,sunny,95,6
7,04-01,Mon,Apr,Low,Humid,No,rainy,35,0
8,04-02,Tue,Apr,High,Dry,Yes,sunny,70,1
9,04-03,Wed,Apr,Extreme,Dry,Yes,overcast,50,2
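
The codes above depend on the order in which labels first appear. The sketch below keeps the second return value of `pd.factorize` so the integers can be decoded again, and shows how `sort=True` changes the assignment. The standalone `weekday` series and the names `code_to_label`, `decoded`, and `sorted_codes` are illustrative.

```python
import pandas as pd

weekday = pd.Series(['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun', 'Mon'])

# factorize returns the integer codes and the unique labels in first-seen order.
codes, uniques = pd.factorize(weekday)

# Lookup table from integer code back to label.
code_to_label = dict(enumerate(uniques))

# Decode the integers to recover the original labels.
decoded = uniques.take(codes)

# With sort=True the labels are sorted alphabetically before codes are assigned,
# so 'Fri' becomes 0 and 'Mon' becomes 1.
sorted_codes, sorted_uniques = pd.factorize(weekday, sort=True)

print(code_to_label)
print(list(decoded))
print(sorted_codes.tolist())
```

Either way, the integers are arbitrary identifiers: the same label can receive a different code depending on row order or sorting, so the mapping should be stored alongside the encoded data.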


# Method 2: One-Hot Encoding

One-Hot Encoding creates a new binary column for each category in a categorical variable.

In [10]:
df = pd.get_dummies(df, columns=['Outlook'], prefix="Outlook", dtype=int)

In [11]:
df

Unnamed: 0,Date,Weekday,Month,Temperature,Humidity,Wind,Crowdedness,Weekday_label,Outlook_overcast,Outlook_rainy,Outlook_sunny
0,03-25,Mon,Mar,High,Dry,No,85,0,0,0,1
1,03-26,Tue,Mar,Low,Humid,Yes,30,1,0,1,0
2,03-27,Wed,Mar,High,Dry,Yes,65,2,1,0,0
3,03-28,Thu,Mar,Extreme,Dry,Yes,45,3,0,0,1
4,03-29,Fri,Mar,Low,Humid,No,25,4,0,1,0
5,03-30,Sat,Mar,High,Humid,No,90,5,1,0,0
6,03-31,Sun,Mar,High,Dry,Yes,95,6,0,0,1
7,04-01,Mon,Apr,Low,Humid,No,35,0,0,1,0
8,04-02,Tue,Apr,High,Dry,Yes,70,1,0,0,1
9,04-03,Wed,Apr,Extreme,Dry,Yes,50,2,1,0,0
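
Because exactly one of the three Outlook columns is 1 in every row, any one column can be inferred from the other two. For linear models this redundancy is often removed with the `drop_first=True` option of `pd.get_dummies`. A minimal sketch on a small stand-in frame (the names `outlook`, `full`, and `reduced` are illustrative):

```python
import pandas as pd

outlook = pd.DataFrame({'Outlook': ['sunny', 'rainy', 'overcast', 'sunny']})

# Full one-hot encoding: one column per category.
full = pd.get_dummies(outlook, columns=['Outlook'], prefix='Outlook', dtype=int)

# Reduced encoding: the first category (alphabetically 'overcast') is dropped
# and is represented implicitly by all remaining columns being 0.
reduced = pd.get_dummies(outlook, columns=['Outlook'], prefix='Outlook',
                         dtype=int, drop_first=True)

print(full.columns.tolist())
print(reduced.columns.tolist())
print(full.sum(axis=1).tolist())  # every row sums to 1
```

Tree-based models usually do not need this step, so whether to drop a column is a modelling choice rather than a rule.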


# Method 3: Binary Encoding

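Binary Encoding represents each category as a binary number, using one 0/1 column per bit. For a two-category column such as 'Wind' (No/Yes), a single 0/1 column is enough.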

In [12]:
df.columns

Index(['Date', 'Weekday', 'Month', 'Temperature', 'Humidity', 'Wind',
       'Crowdedness', 'Weekday_label', 'Outlook_overcast', 'Outlook_rainy',
       'Outlook_sunny'],
      dtype='object')

In [13]:
df['wind_binary_encoding'] = (df['Wind'] == 'Yes').astype(int)

In [14]:
df

Unnamed: 0,Date,Weekday,Month,Temperature,Humidity,Wind,Crowdedness,Weekday_label,Outlook_overcast,Outlook_rainy,Outlook_sunny,wind_binary_encoding
0,03-25,Mon,Mar,High,Dry,No,85,0,0,0,1,0
1,03-26,Tue,Mar,Low,Humid,Yes,30,1,0,1,0,1
2,03-27,Wed,Mar,High,Dry,Yes,65,2,1,0,0,1
3,03-28,Thu,Mar,Extreme,Dry,Yes,45,3,0,0,1,1
4,03-29,Fri,Mar,Low,Humid,No,25,4,0,1,0,0
5,03-30,Sat,Mar,High,Humid,No,90,5,1,0,0,0
6,03-31,Sun,Mar,High,Dry,Yes,95,6,0,0,1,1
7,04-01,Mon,Apr,Low,Humid,No,35,0,0,1,0,0
8,04-02,Tue,Apr,High,Dry,Yes,70,1,0,0,1,1
9,04-03,Wed,Apr,Extreme,Dry,Yes,50,2,1,0,0,1
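
For a column with more than two categories, binary encoding first assigns each category an integer and then writes that integer in base 2, one column per bit. The following is a hand-rolled sketch of that idea applied to Weekday, not the implementation of any particular library (dedicated encoders may number categories differently). The column names `Weekday_bin_0` to `Weekday_bin_2` are made up for illustration.

```python
import numpy as np
import pandas as pd

weekday = pd.Series(['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun', 'Mon'],
                    name='Weekday')

# Step 1: integer code per category, in order of first appearance.
codes, uniques = pd.factorize(weekday)

# Step 2: number of bits needed to tell the categories apart (7 categories -> 3 bits).
n_bits = max(1, int(np.ceil(np.log2(len(uniques)))))

# Step 3: extract each bit, most significant first.
shifts = np.arange(n_bits - 1, -1, -1)
bits = (codes[:, None] >> shifts) & 1

weekday_binary = pd.DataFrame(
    bits,
    columns=[f'Weekday_bin_{i}' for i in range(n_bits)],
    index=weekday.index,
)
print(pd.concat([weekday, weekday_binary], axis=1))
```

Seven weekdays fit into three columns here, compared with seven columns under one-hot encoding, at the cost of columns that are harder to interpret individually.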


# Method 4: Target Encoding
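Target Encoding replaces each category with the mean of the target variable for that category. Because the encoding uses the target itself, in a real modelling workflow the means should be computed on the training data only to avoid leaking target information into evaluation.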

In [15]:
df['Humidity_target_encoding'] = df.groupby('Humidity')['Crowdedness'].transform('mean')

In [16]:
df

Unnamed: 0,Date,Weekday,Month,Temperature,Humidity,Wind,Crowdedness,Weekday_label,Outlook_overcast,Outlook_rainy,Outlook_sunny,wind_binary_encoding,Humidity_target_encoding
0,03-25,Mon,Mar,High,Dry,No,85,0,0,0,1,0,65.0
1,03-26,Tue,Mar,Low,Humid,Yes,30,1,0,1,0,1,52.0
2,03-27,Wed,Mar,High,Dry,Yes,65,2,1,0,0,1,65.0
3,03-28,Thu,Mar,Extreme,Dry,Yes,45,3,0,0,1,1,65.0
4,03-29,Fri,Mar,Low,Humid,No,25,4,0,1,0,0,52.0
5,03-30,Sat,Mar,High,Humid,No,90,5,1,0,0,0,52.0
6,03-31,Sun,Mar,High,Dry,Yes,95,6,0,0,1,1,65.0
7,04-01,Mon,Apr,Low,Humid,No,35,0,0,1,0,0,52.0
8,04-02,Tue,Apr,High,Dry,Yes,70,1,0,0,1,1,65.0
9,04-03,Wed,Apr,Extreme,Dry,Yes,50,2,1,0,0,1,65.0
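
With few rows per category, a plain category mean can be noisy. One common variation, sketched here as an assumption rather than as part of the original notebook, blends each category mean with the global mean. The smoothing weight `m = 5` is an arbitrary illustrative value, and `df_demo`, `stats`, and `smoothed` are hypothetical names.

```python
import pandas as pd

df_demo = pd.DataFrame({
    'Humidity': ['Dry', 'Humid', 'Dry', 'Dry', 'Humid', 'Humid',
                 'Dry', 'Humid', 'Dry', 'Dry', 'Humid', 'Dry'],
    'Crowdedness': [85, 30, 65, 45, 25, 90, 95, 35, 70, 50, 80, 45],
})

m = 5  # smoothing strength (illustrative): larger values pull harder toward the global mean
global_mean = df_demo['Crowdedness'].mean()

# Per-category mean and row count of the target.
stats = df_demo.groupby('Humidity')['Crowdedness'].agg(['mean', 'count'])

# Weighted blend of the category mean and the global mean.
smoothed = (stats['count'] * stats['mean'] + m * global_mean) / (stats['count'] + m)

df_demo['Humidity_target_smoothed'] = df_demo['Humidity'].map(smoothed)
print(stats)
print(smoothed)
```

Each smoothed value falls between its category mean and the global mean, and categories with fewer rows are pulled closer to the global mean.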


# Method 5: Ordinal Encoding

Ordinal Encoding assigns ordered integers to ordinal categories based on their inherent order.

In [17]:
temp_order = {'Low': 1, 'High': 2, 'Extreme': 3}
df['Temperature_ordinal_encoding'] = df['Temperature'].map(temp_order)

In [18]:
df

Unnamed: 0,Date,Weekday,Month,Temperature,Humidity,Wind,Crowdedness,Weekday_label,Outlook_overcast,Outlook_rainy,Outlook_sunny,wind_binary_encoding,Humidity_target_encoding,Temperature_ordinal_encoding
0,03-25,Mon,Mar,High,Dry,No,85,0,0,0,1,0,65.0,2
1,03-26,Tue,Mar,Low,Humid,Yes,30,1,0,1,0,1,52.0,1
2,03-27,Wed,Mar,High,Dry,Yes,65,2,1,0,0,1,65.0,2
3,03-28,Thu,Mar,Extreme,Dry,Yes,45,3,0,0,1,1,65.0,3
4,03-29,Fri,Mar,Low,Humid,No,25,4,0,1,0,0,52.0,1
5,03-30,Sat,Mar,High,Humid,No,90,5,1,0,0,0,52.0,2
6,03-31,Sun,Mar,High,Dry,Yes,95,6,0,0,1,1,65.0,2
7,04-01,Mon,Apr,Low,Humid,No,35,0,0,1,0,0,52.0,1
8,04-02,Tue,Apr,High,Dry,Yes,70,1,0,0,1,1,65.0,2
9,04-03,Wed,Apr,Extreme,Dry,Yes,50,2,1,0,0,1,65.0,3
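
One thing to watch with a dictionary-based mapping: `Series.map` returns `NaN` for any label that is missing from the dictionary, without raising an error. A small check like the following can surface such labels early. The extra label `'Mild'` is invented purely to trigger the case, and `encoded` and `unmapped` are illustrative names.

```python
import pandas as pd

temp_order = {'Low': 1, 'High': 2, 'Extreme': 3}

# 'Mild' is deliberately absent from temp_order.
temperature = pd.Series(['High', 'Low', 'Extreme', 'Mild'])

encoded = temperature.map(temp_order)

# Labels that could not be mapped show up as NaN in the encoded column.
unmapped = temperature[encoded.isna()].unique().tolist()

print(encoded.tolist())
print(unmapped)
```

Checking `unmapped` before training makes it easier to decide whether to extend the mapping or handle the unknown label explicitly.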


# Conclusion 

1. Label Encoding: Turned our ‘Weekday’ into numbers, making Monday 0 and Sunday 6 – simple but potentially misleading, since a model may read the integers as magnitudes.
2. One-Hot Encoding: Gave ‘Outlook’ its own columns, letting ‘sunny’, ‘overcast’, and ‘rainy’ stand independently.
3. Binary Encoding: Mapped our ‘Wind’ (No/Yes) to a single 0/1 column, saving space without losing information.
4. Target Encoding: Replaced ‘Humidity’ categories with their average ‘Crowdedness’, capturing the relationship between the category and the target.
5. Ordinal Encoding: Respected the natural order of ‘Temperature’, from ‘Low’ to ‘Extreme’.