# Ordinal Encoder
**Categorical data**

Types of categorical data:
- Nominal (covered in the previous presentation: one-hot encoder)
- Ordinal

**Ordinal categorical data** are a type of categorical data where the categories have a natural order or ranking, but the intervals between the categories are not necessarily equal or meaningful. This type of data is used when there is a relative ranking or hierarchy among the categories, but arithmetic operations like addition or subtraction do not make sense.

#### Examples:
1. **Education Level**: High school < Bachelor's < Master's < PhD  
2. **Survey Ratings**: Strongly Disagree < Disagree < Neutral < Agree < Strongly Agree  
3. **Socioeconomic Class**: Low < Middle < High  

#### Characteristics:
- **Order matters**: Categories have a meaningful order.
- **No equal intervals**: The distance between the categories is not uniform or quantifiable.
- **Non-numeric or numeric labels**: Categories may be represented by words or numbers, but the numbers are just labels (e.g., 1 = Poor, 2 = Fair, 3 = Good).

Ordinal data is often analyzed using non-parametric statistical methods, as it does not meet the assumptions of interval or ratio data.
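To make the "order matters, intervals do not" idea concrete, here is a minimal sketch using an ordered pandas `Categorical`. The sample values and the variable names (`levels`, `education`, `above_bachelors`) are made up for illustration. The sketch shows that ranking operations such as `min`, `max`, and comparisons are meaningful for ordinal data, while nothing is implied about the distance between levels.

```python
import pandas as pd

# Hypothetical sample; the order of `levels` defines the ranking
levels = ["High school", "Bachelor's", "Master's", "PhD"]
education = pd.Series(
    pd.Categorical(
        ["Master's", "High school", "PhD", "Bachelor's"],
        categories=levels,
        ordered=True,
    )
)

# Order-based operations are well defined for ordered categories
print(education.min(), "|", education.max())

# Comparisons follow the declared ranking, not alphabetical order
above_bachelors = (education > "Bachelor's").tolist()
print(above_bachelors)
```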

## Ordinal Encoding

https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OrdinalEncoder.html

__Convert an ordinal column (or any categorical column with a meaningful order) into numerical values. It assigns each category in the ordinal column a unique integer, maintaining the natural order of the categories.__

Here is an example:



### Import Libraries

In [1]:
from sklearn.preprocessing import OrdinalEncoder
import pandas as pd




### Read data into a dataframe df

In [2]:
# Example dataset with ordinal categorical feature
data = {'feature': ['Low', 'Medium', 'High', 'Medium', 'Low', 'High']}

df = pd.DataFrame(data)
df

|   | feature |
|---|---------|
| 0 | Low     |
| 1 | Medium  |
| 2 | High    |
| 3 | Medium  |
| 4 | Low     |
| 5 | High    |


In [6]:
df.shape

(6, 1)

### Encode an ordinal feature

In [3]:
# Define the transformer instance
ordinal_encoder = OrdinalEncoder(categories=[['Low', 'Medium', 'High']])

The `categories` parameter specifies the exact order of the categories for each feature. The encoder assigns integers following that order, so here Low = 0, Medium = 1, and High = 2.
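To see why the explicit order matters, the sketch below compares an encoder built with default settings against one built with an explicit `categories` list. It rebuilds the same small dataset so it can run on its own. The names `default_encoder` and `explicit_encoder` are illustrative only. With the default `categories='auto'`, scikit-learn sorts the unique values, so string categories end up in alphabetical order rather than in their natural ranking.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

df = pd.DataFrame({'feature': ['Low', 'Medium', 'High', 'Medium', 'Low', 'High']})

# Default: categories are inferred from the data and sorted (alphabetically for strings)
default_encoder = OrdinalEncoder()
default_codes = default_encoder.fit_transform(df[['feature']])
print(default_encoder.categories_)   # order: High, Low, Medium
print(default_codes.ravel())         # Low -> 1, Medium -> 2, High -> 0

# Explicit: the integer codes follow the order we provide
explicit_encoder = OrdinalEncoder(categories=[['Low', 'Medium', 'High']])
explicit_codes = explicit_encoder.fit_transform(df[['feature']])
print(explicit_codes.ravel())        # Low -> 0, Medium -> 1, High -> 2
```

Without the explicit list, "High" would receive the smallest code, which breaks the Low < Medium < High ranking that the encoding is supposed to preserve.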

#### "Fitting" the ordinal encoder
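During the fit, the ordinal_encoder transformer determines the categories of each feature. Because `categories` was passed explicitly here, the encoder uses that list (and checks that the data contains no other values) instead of inferring the unique values from the data. The categories are then stored in the `categories_` attribute of the *ordinal_encoder* object.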

In [4]:
ordinal_encoder.fit(df[['feature']])            

In [5]:
ordinal_encoder.categories_

[array(['Low', 'Medium', 'High'], dtype=object)]

#### Transforming the columns
Only during the transformation is each category converted to a value that represents its position in the order.

In [7]:
t = ordinal_encoder.transform(df[['feature']])
print(t.shape)
print()
t

(6, 1)



array([[0.],
       [1.],
       [2.],
       [1.],
       [0.],
       [2.]])

In [8]:
# format output as a DataFrame
encoded_feature = pd.DataFrame(t, columns=ordinal_encoder.get_feature_names_out())
encoded_feature.head()

|   | feature |
|---|---------|
| 0 | 0.0     |
| 1 | 1.0     |
| 2 | 2.0     |
| 3 | 1.0     |
| 4 | 0.0     |
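The encoded values above are floats because `OrdinalEncoder` defaults to `dtype=np.float64`. As an optional follow-up, the sketch below requests integer codes through the `dtype` parameter and maps the codes back to the original labels with `inverse_transform`. This step is not part of the walkthrough above; the variable names (`int_encoder`, `codes`, `decoded`) are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

df = pd.DataFrame({'feature': ['Low', 'Medium', 'High', 'Medium', 'Low', 'High']})

# Ask for integer codes instead of the default float64
int_encoder = OrdinalEncoder(categories=[['Low', 'Medium', 'High']], dtype=np.int64)

# fit_transform combines the fit and transform steps in one call
codes = int_encoder.fit_transform(df[['feature']])
print(codes.ravel())

# inverse_transform maps the integer codes back to the category labels
decoded = int_encoder.inverse_transform(codes)
print(decoded.ravel())
```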
