
### Categorical data

`Nominal` vs `Ordinal`

- Nominal data has no order or ranking between the categories.
For example, colors: `Yellow`, `Red`, `Green`.
- Ordinal data has an order or ranking between the categories.
For example, `Cold(1)` < `Warm(2)` < `Hot(3)`.
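
pandas can record this distinction directly in the column's dtype. The sketch below is an illustration using pandas' categorical dtype, not part of the original notebook; the `colors` and `temps` Series are made-up sample data. An unordered categorical only knows which values are allowed, while an ordered one also knows the ranking, so comparisons and `max()` respect `Cold < Warm < Hot`.

```python
import pandas as pd

# Nominal: a plain (unordered) categorical, so comparisons like colors > 'Red' raise a TypeError
colors = pd.Series(['Yellow', 'Red', 'Green'], dtype='category')

# Ordinal: an ordered categorical with the explicit ranking Cold < Warm < Hot
temperature_type = pd.CategoricalDtype(categories=['Cold', 'Warm', 'Hot'], ordered=True)
temps = pd.Series(['Hot', 'Cold', 'Warm', 'Cold'], dtype=temperature_type)

print(colors.cat.ordered)          # False
print(temps.cat.ordered)           # True
print((temps > 'Cold').tolist())   # [True, False, True, False]
print(temps.cat.codes.tolist())    # [2, 0, 1, 0]  (0-based ranks)
print(temps.max())                 # Hot
```

Note that pandas numbers the categories from 0, so `Cold`, `Warm` and `Hot` get codes 0, 1 and 2 rather than the 1, 2 and 3 used above. The ordering is the same.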

```python
import numpy as np
import pandas as pd
```

```python
df = pd.DataFrame(
    data={
        'Breakfast': ['Every day',
                      'Never',
                      'Rarely',
                      'Most days',
                      'Never',
                      ]
    }
)
df
```

|   | Breakfast |
|---|-----------|
| 0 | Every day |
| 1 | Never |
| 2 | Rarely |
| 3 | Most days |
| 4 | Never |


```python
from sklearn.preprocessing import LabelEncoder

label_encoder = LabelEncoder()

label_encoder.fit_transform(df['Breakfast'])
```

```
array([0, 2, 3, 1, 2])
```

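`LabelEncoder` chooses the labels for us: it sorts the unique categories alphabetically and numbers them 0, 1, 2, ... in that order, so `Every day` = 0, `Most days` = 1, `Never` = 2 and `Rarely` = 3.

This is acceptable for `nominal` categories, but it does not preserve the order or ranking of `ordinal` categories. Here, for example, `Never` (2) ends up ranked above `Most days` (1).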
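
The mapping a fitted `LabelEncoder` used can be checked through its `classes_` attribute, which holds the sorted unique categories; a category's position in `classes_` is its integer label. The sketch below rebuilds the same `Breakfast` values as a standalone Series so it runs on its own.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

breakfast = pd.Series(['Every day', 'Never', 'Rarely', 'Most days', 'Never'])

label_encoder = LabelEncoder()
codes = label_encoder.fit_transform(breakfast)

# classes_ lists the unique categories in sorted (alphabetical) order;
# each category's index in this array is the label it was given
print(list(label_encoder.classes_))  # ['Every day', 'Most days', 'Never', 'Rarely']
print(codes.tolist())                # [0, 2, 3, 1, 2]

# inverse_transform maps integer labels back to the original strings
print(label_encoder.inverse_transform([0, 3]).tolist())  # ['Every day', 'Rarely']
```

Because the order comes from string sorting, the labels follow the spelling of the categories, not how often breakfast is eaten.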

```python
df['Breakfast_encoded'] = label_encoder.fit_transform(df['Breakfast'])

df
```

|   | Breakfast | Breakfast_encoded |
|---|-----------|-------------------|
| 0 | Every day | 0 |
| 1 | Never | 2 |
| 2 | Rarely | 3 |
| 3 | Most days | 1 |
| 4 | Never | 2 |


If the categorical data is `ordinal`, we can assign custom labels that keep the order of the categories.

One way to do this is to define a `dictionary` from each category to its rank and pass it to `Series.map`.

```python
breakfast_dict = {
    'Never': 1,
    'Rarely': 2,
    'Most days': 3,
    'Every day': 4,
}

df['Breakfast_ordinal'] = df.Breakfast.map(breakfast_dict)
df
```

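|   | Breakfast | Breakfast_encoded | Breakfast_ordinal |
|---|-----------|-------------------|-------------------|
| 0 | Every day | 0 | 4 |
| 1 | Never | 2 | 1 |
| 2 | Rarely | 3 | 2 |
| 3 | Most days | 1 | 3 |
| 4 | Never | 2 | 1 |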
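
As an alternative to the dictionary (not what this notebook uses), scikit-learn's `OrdinalEncoder` accepts an explicit `categories` order, one list per column, from lowest to highest rank. The sketch below is a minimal illustration under that assumption. It also shows a practical difference: `Series.map` silently returns `NaN` for a value missing from the dictionary, whereas `OrdinalEncoder` raises an error on unknown categories by default (`handle_unknown='error'`).

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

df = pd.DataFrame({'Breakfast': ['Every day', 'Never', 'Rarely', 'Most days', 'Never']})

# Categories listed from lowest to highest rank; one inner list per encoded column
breakfast_order = ['Never', 'Rarely', 'Most days', 'Every day']
ordinal_encoder = OrdinalEncoder(categories=[breakfast_order])

# OrdinalEncoder expects 2-D input, hence df[['Breakfast']] rather than df['Breakfast']
encoded = ordinal_encoder.fit_transform(df[['Breakfast']])
print(encoded.ravel().tolist())  # [3.0, 0.0, 1.0, 2.0, 0.0]

# For comparison: Series.map yields NaN for a value that is not a dictionary key
partial = pd.Series(['Never', 'Sometimes']).map({'Never': 1})
print(partial.tolist())  # [1.0, nan]
```

The encoder's ranks start at 0 and are floats, so adding 1 reproduces the `Breakfast_ordinal` column built with the dictionary.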
