# Encoding Categorical Data - Ordinal Encoding


Ordinal encoding converts categorical variables that have an inherent order or hierarchy into numerical values. It assigns each distinct category an integer label, and the order of the labels mirrors the order of the categories. This encoding is useful when a categorical variable has a meaningful order but is not already numeric.

For example, consider a variable "education level" with the categories "High School," "Bachelor's," "Master's," and "Ph.D." Ordinal encoding could assign the labels 1, 2, 3, and 4 to these categories, respectively, reflecting increasing educational attainment. (scikit-learn's `OrdinalEncoder` numbers categories starting from 0 instead.)
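The mapping described above can be written by hand. A minimal sketch, assuming pandas and a dictionary named `education_order` (a hypothetical name chosen for this illustration):

```python
import pandas as pd

# Hypothetical explicit mapping from each education level to its rank (1 = lowest)
education_order = {"High School": 1, "Bachelor's": 2, "Master's": 3, "Ph.D.": 4}

levels = pd.Series(["High School", "Bachelor's", "Master's", "Ph.D.", "Bachelor's"])

# Series.map replaces each category with its rank; a level missing from the dict would become NaN
ranks = levels.map(education_order)
print(ranks.tolist())  # [1, 2, 3, 4, 2]
```

Writing the order out explicitly keeps it visible and under your control, rather than leaving it to whatever order a tool infers.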

The advantage of ordinal encoding is that it lets statistical models use the order of the categories through a single numeric column. However, a model that treats the codes as ordinary numbers implicitly assumes equal spacing between adjacent categories: the step from "High School" to "Bachelor's" counts the same as the step from "Master's" to "Ph.D." That assumption may not hold. If the intervals between categories are not uniform, or the data has no clear order, alternative encoding methods such as one-hot encoding or target encoding may be more appropriate.
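To make the equal-spacing assumption concrete, the sketch below compares ordinal codes with a one-hot encoding of the same levels. It is an illustration only, assuming pandas; `pd.get_dummies` stands in for one-hot encoding here, and target encoding is not shown.

```python
import pandas as pd

levels = pd.Series(["High School", "Bachelor's", "Master's", "Ph.D."], name="Education")

# Ordinal codes: consecutive levels are always exactly 1 apart
ordinal = levels.map({"High School": 0, "Bachelor's": 1, "Master's": 2, "Ph.D.": 3})
gaps = ordinal.diff().dropna().tolist()
print(gaps)  # [1.0, 1.0, 1.0]

# One-hot codes: one binary column per level, so no order or spacing is implied
one_hot = pd.get_dummies(levels, dtype=int)
print(one_hot)
```

The trade-off is width: one-hot encoding needs one column per category, while ordinal encoding needs only one column in total.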

```python
# Import necessary libraries
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder
```

```python
# Create a sample dataset
data = pd.DataFrame({
    "Education": ["High School", "Bachelor's", "Master's", "Ph.D.", "Bachelor's", "Master's"]
})
```

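```python
# Initialize the ordinal encoder.
# With the default categories='auto', the encoder sorts the categories it finds
# (alphabetically for strings) instead of using their educational order.
encoder = OrdinalEncoder()
```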

```python
# Fit the encoder and transform the data in one step
encoded_data = encoder.fit_transform(data)
```

```python
# Convert the encoded NumPy array back to a DataFrame
encoded_df = pd.DataFrame(encoded_data, columns=data.columns)
```

```python
# Print the original and encoded data
print("Original Data:")
print(data)
print("\nEncoded Data:")
print(encoded_df)
```

```
Original Data:
     Education
0  High School
1   Bachelor's
2     Master's
3        Ph.D.
4   Bachelor's
5     Master's

Encoded Data:
   Education
0        1.0
1        0.0
2        2.0
3        3.0
4        0.0
5        2.0
```
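The integers in this output come from the category order the encoder learned during fitting, which scikit-learn exposes through the fitted `categories_` attribute. A short sketch that refits the same sample data and lists the learned mapping:

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

data = pd.DataFrame({
    "Education": ["High School", "Bachelor's", "Master's", "Ph.D.", "Bachelor's", "Master's"]
})

encoder = OrdinalEncoder()
encoder.fit(data)

# categories_ holds one array per column; the category at position i is encoded as i
for code, category in enumerate(encoder.categories_[0]):
    print(code, category)
# 0 Bachelor's
# 1 High School
# 2 Master's
# 3 Ph.D.
```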


In this example, the dataset has a single categorical column, "Education", that holds different education levels. We use the `OrdinalEncoder` from scikit-learn to perform the ordinal encoding.

We first import pandas and `OrdinalEncoder`, then build the sample dataset as a pandas DataFrame whose "Education" column contains the education levels.

Next, we initialize the `OrdinalEncoder` and call its `fit_transform` method, which learns the categories from the data and encodes them in a single step.

Finally, we wrap the encoded NumPy array in a DataFrame for readability and print both the original and the encoded data.

Note that the encoded values do not follow educational attainment: "Bachelor's" became 0.0 and "High School" became 1.0. With the default `categories='auto'`, `OrdinalEncoder` sorts the categories it finds, which for these strings is alphabetical order (Bachelor's, High School, Master's, Ph.D.). To preserve the intended order, the categories must be passed explicitly through the `categories` parameter.
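A minimal sketch that encodes the levels in their educational order by passing `categories` explicitly. It assumes scikit-learn 0.24 or later (needed for `handle_unknown="use_encoded_value"`); the names `education_levels` and `new_data` and the unseen level "Associate's" are hypothetical, chosen only for illustration.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

data = pd.DataFrame({
    "Education": ["High School", "Bachelor's", "Master's", "Ph.D.", "Bachelor's", "Master's"]
})

# Hypothetical list, one per column, written from lowest to highest level
education_levels = ["High School", "Bachelor's", "Master's", "Ph.D."]
encoder = OrdinalEncoder(
    categories=[education_levels],
    handle_unknown="use_encoded_value",  # encode unseen levels as unknown_value instead of raising
    unknown_value=-1,
)

codes = encoder.fit_transform(data)
encoded_df = pd.DataFrame(codes, columns=data.columns)
print(encoded_df["Education"].tolist())  # [0.0, 1.0, 2.0, 3.0, 1.0, 2.0]

# A level that was not in the categories list is encoded as -1
new_data = pd.DataFrame({"Education": ["Ph.D.", "Associate's"]})
print(encoder.transform(new_data)[:, 0].tolist())  # [3.0, -1.0]

# inverse_transform maps the codes back to the original labels
print(encoder.inverse_transform(codes)[:, 0].tolist())
```

Because the order now comes from `education_levels`, adding a new level later means editing that list; the encoder will not infer where it belongs.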