Encoding is useful when your attributes contain categorical, textual data. Many machine learning models and algorithms expect numeric input and cannot work with text directly, so these values need to be encoded into a numeric form first.

## Ordinal Encoding

Use it when there's an inherent order or ranking in the categorical data, such as:
- ratings: [good, better, best]
- sizes: [small, medium, large]
- difficulties: [easy, medium, hard]

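Ordinal encoding converts each category of a feature into an integer code that follows the category order.
For example, suppose you have a column named `size` that can take the values ['small', 'medium', 'large'].
Ordinal encoding then assigns integers in that order:
- small - 0
- medium - 1
- large - 2

Note that scikit-learn's `OrdinalEncoder` returns these codes as floating-point numbers by default, which is why the outputs below show `0.`, `1.` and so on.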
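Conceptually, this is just a lookup from each category to its position in an ordered list. A minimal sketch in plain pandas (not scikit-learn), assuming a hypothetical `size` column with a few sample values:

```python
import pandas as pd

# Hypothetical sample data for the `size` example above
sizes = pd.DataFrame({"size": ["medium", "small", "large", "small"]})

# The order of this list defines the ranking: list position = integer code
size_order = ["small", "medium", "large"]
size_to_code = {label: code for code, label in enumerate(size_order)}

# Replace each label with its code via a dictionary lookup
sizes["size_code"] = sizes["size"].map(size_to_code)
print(sizes["size_code"].tolist())  # [1, 0, 2, 0]
```

With `Series.map`, any value missing from the dictionary would come out as `NaN`.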

In [1]:
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

In [2]:
# a small sample DataFrame of products and their ratings
df = pd.DataFrame({
    "product_id": [1, 2, 3, 4, 5, 6, 7],
    "product_name": ["Phone A", "Laptop B", "Camera C", "Phone D", "TV E", "Camera F", "TV G"],
    "rating": ["poor", "average", "good", "excellent", "average", "excellent", "poor"],
})
df

|   | product_id | product_name | rating |
|---|---|---|---|
| 0 | 1 | Phone A | poor |
| 1 | 2 | Laptop B | average |
| 2 | 3 | Camera C | good |
| 3 | 4 | Phone D | excellent |
| 4 | 5 | TV E | average |
| 5 | 6 | Camera F | excellent |
| 6 | 7 | TV G | poor |


In [66]:
df['rating'].unique()

array(['poor', 'average', 'good', 'excellent'], dtype=object)
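`unique()` returns values in order of appearance, which happens to match the ranking here but generally won't. `OrdinalEncoder` does not use that order either: with its default `categories='auto'` it sorts the unique values of each column, which for strings means alphabetical order. The sketch below rebuilds the ratings column so it runs on its own and compares the default with an explicit order. `categories` takes one list per encoded column, hence the nested `[ratings]`.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

ratings_df = pd.DataFrame(
    {"rating": ["poor", "average", "good", "excellent", "average", "excellent", "poor"]}
)

# Default categories="auto": sorted unique values, i.e. alphabetical for strings
auto_oe = OrdinalEncoder()
auto_codes = auto_oe.fit_transform(ratings_df[["rating"]])
print(auto_oe.categories_[0])  # ['average' 'excellent' 'good' 'poor']
print(auto_codes.ravel())      # [3. 0. 2. 1. 0. 1. 3.]

# Explicit order: one list per column, from lowest to highest rank
ratings = ["poor", "average", "good", "excellent"]
ordered_oe = OrdinalEncoder(categories=[ratings])
ordered_codes = ordered_oe.fit_transform(ratings_df[["rating"]])
print(ordered_codes.ravel())   # [0. 1. 2. 3. 1. 3. 0.]
```

With the default, "poor" gets the highest code, so the numeric order no longer reflects the ranking.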

In [67]:
ratings = ['poor', 'average', 'good', 'excellent']

In [68]:
# create an OrdinalEncoder object
oe = OrdinalEncoder(categories=[ratings])

# the categories parameter specifies the order in which integer codes are assigned;
# it takes one list per encoded column, hence the nested [ratings]
# codes run from 0 to n-1 in list order, so poor is 0, average is 1, and so on


In [69]:
# apply the encoding to the rating column; the double brackets pass a 2D DataFrame, which the encoder expects
# fit_transform() fits the encoder to the data, then transforms it in one step
oe.fit_transform(df[['rating']])
# each of the ratings is assigned a numeric value: 0 - poor, 1 - average, 2 - good, 3 - excellent

array([[0.],
       [1.],
       [2.],
       [3.],
       [1.],
       [3.],
       [0.]])
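Because a fitted encoder stores the category order in `categories_`, it can also map codes back to labels with `inverse_transform`, which helps when reading model output. A minimal sketch that fits its own encoder (named `rating_encoder` here to avoid clobbering `oe`) so it runs independently:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

ratings = ["poor", "average", "good", "excellent"]
rating_encoder = OrdinalEncoder(categories=[ratings])
rating_encoder.fit(pd.DataFrame({"rating": ["poor", "good", "excellent"]}))

# Codes must be 2D: one row per sample, one column per encoded feature
codes = np.array([[3.0], [0.0], [1.0]])
labels = rating_encoder.inverse_transform(codes)
print(labels.ravel())  # ['excellent' 'poor' 'average']
```

Code `1.0` decodes to "average" even though that value was absent from the fitting data, because the explicit `categories` list defines every code.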

In [70]:
# replace the text ratings in the original DataFrame with their numeric codes
df['rating'] = oe.fit_transform(df[['rating']])
df

|   | product_id | product_name | rating |
|---|---|---|---|
| 0 | 1 | Phone A | 0.0 |
| 1 | 2 | Laptop B | 1.0 |
| 2 | 3 | Camera C | 2.0 |
| 3 | 4 | Phone D | 3.0 |
| 4 | 5 | TV E | 1.0 |
| 5 | 6 | Camera F | 3.0 |
| 6 | 7 | TV G | 0.0 |
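The encoded column is stored as floats because `OrdinalEncoder` returns `float64` by default. If integer codes are preferred, and new data may contain ratings the encoder never saw, the `dtype`, `handle_unknown` and `unknown_value` parameters can be combined. A sketch, assuming scikit-learn 0.24 or later (where `handle_unknown='use_encoded_value'` was introduced) and a hypothetical unseen rating `'terrible'`:

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

ratings = ["poor", "average", "good", "excellent"]
train = pd.DataFrame({"rating": ["poor", "average", "good", "excellent", "average"]})
# "terrible" is a hypothetical value that never appears in the training data
new = pd.DataFrame({"rating": ["excellent", "terrible", "poor"]})

# assumes scikit-learn >= 0.24 for handle_unknown="use_encoded_value"
encoder = OrdinalEncoder(
    categories=[ratings],
    dtype=int,                           # return integer codes instead of float64
    handle_unknown="use_encoded_value",  # don't raise on unseen categories...
    unknown_value=-1,                    # ...encode them as -1 instead
)
encoder.fit(train)

# ravel() flattens the (n, 1) result into a 1D column
new["rating_code"] = encoder.transform(new[["rating"]]).ravel()
print(new["rating_code"].tolist())  # [3, -1, 0]
```

The unknown code must not collide with a real one, which is why a negative value is a common choice.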
