# **Ordinal Encoder**

An ordinal encoder converts categorical data into numerical data by assigning a unique integer to each category. This is useful when a model or calculation needs numeric input but the data holds text labels.

It is a common preprocessing step in data science projects.
Ordinal encoding is most appropriate when the categorical variable has an inherent order or ranking, such as Small < Medium < Large, because the assigned integers can then preserve that order.
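
To make the idea concrete, here is a minimal pure-Python sketch of what ordinal encoding does. The `ordinal_encode` helper is hypothetical (it is not part of any library) and assumes the category order is supplied explicitly.

```python
def ordinal_encode(values, ordered_categories):
    """Map each value to its position in ordered_categories (hypothetical helper)."""
    # Build a lookup table: category -> integer rank
    rank = {category: i for i, category in enumerate(ordered_categories)}
    return [rank[value] for value in values]

encoded = ordinal_encode(['Small', 'Large', 'Medium', 'Small'],
                         ordered_categories=['Small', 'Medium', 'Large'])
print(encoded)  # [0, 2, 1, 0]
```

Because the integers follow the list order, comparisons such as "Large is greater than Small" carry over to the encoded values.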


In [27]:
import pandas as pd

d = {
    'sales': [100000, 222000, 1000000, 522000, 111111, 222222,
              1111111, 20000, 75000, 90000, 1000000, 10000],
    'city': ['Tampa', 'Tampa', 'Orlando', 'Jacksonville', 'Miami', 'Jacksonville',
             'Miami', 'Miami', 'Orlando', 'Orlando', 'Orlando', 'Orlando'],
    'size': ['Small', 'Medium', 'Large', 'Large', 'Small', 'Medium',
             'Large', 'Small', 'Medium', 'Medium', 'Medium', 'Small'],
}
display(d)

{'sales': [100000,
  222000,
  1000000,
  522000,
  111111,
  222222,
  1111111,
  20000,
  75000,
  90000,
  1000000,
  10000],
 'city': ['Tampa',
  'Tampa',
  'Orlando',
  'Jacksonville',
  'Miami',
  'Jacksonville',
  'Miami',
  'Miami',
  'Orlando',
  'Orlando',
  'Orlando',
  'Orlando'],
 'size': ['Small',
  'Medium',
  'Large',
  'Large',
  'Small',
  'Medium',
  'Large',
  'Small',
  'Medium',
  'Medium',
  'Medium',
  'Small']}

In [28]:
df = pd.DataFrame(data=d)
df.head()

| | sales | city | size |
|---|---|---|---|
| 0 | 100000 | Tampa | Small |
| 1 | 222000 | Tampa | Medium |
| 2 | 1000000 | Orlando | Large |
| 3 | 522000 | Jacksonville | Large |
| 4 | 111111 | Miami | Small |


In [29]:
display(df['size'].unique())
display(df['city'].unique())

array(['Small', 'Medium', 'Large'], dtype=object)

array(['Tampa', 'Orlando', 'Jacksonville', 'Miami'], dtype=object)

In [30]:
sizes = ['Small', 'Medium', 'Large']
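
The explicit list matters because, when no `categories` argument is given, scikit-learn's `OrdinalEncoder` infers the categories from the data and sorts them, which for strings means alphabetical order. The sketch below uses a small stand-in DataFrame (`demo`, an assumed name) to show the difference.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

demo = pd.DataFrame({'size': ['Small', 'Medium', 'Large']})

# Default: categories are inferred and sorted, so 'Large' < 'Medium' < 'Small'
default_enc = OrdinalEncoder()
default_codes = default_enc.fit_transform(demo[['size']]).ravel().tolist()
print(default_enc.categories_[0].tolist())  # ['Large', 'Medium', 'Small']
print(default_codes)                        # [2.0, 1.0, 0.0]

# Explicit order: codes follow the intended ranking
ordered_enc = OrdinalEncoder(categories=[['Small', 'Medium', 'Large']])
ordered_codes = ordered_enc.fit_transform(demo[['size']]).ravel().tolist()
print(ordered_codes)                        # [0.0, 1.0, 2.0]
```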

In [31]:
from sklearn.preprocessing import OrdinalEncoder

# OrdinalEncoder converts categorical features into ordinal integers.
# Passing `categories` fixes the order explicitly: each category is encoded
# as its position in the list (Small -> 0, Medium -> 1, Large -> 2).
enc = OrdinalEncoder(categories=[sizes])
# Fit the encoder and transform the 'size' column.
# The double brackets select a 2D DataFrame, which the encoder expects.
enc.fit_transform(df[['size']])

array([[0.],
       [1.],
       [2.],
       [2.],
       [0.],
       [1.],
       [2.],
       [0.],
       [1.],
       [1.],
       [1.],
       [0.]])
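
The output above is floating point because that is the encoder's default `dtype`. As a hedged sketch, the `dtype` parameter can request integers instead, and `inverse_transform` maps codes back to the original labels. The `demo` DataFrame and the `int_enc` name are illustrative stand-ins, not part of the notebook.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

sizes = ['Small', 'Medium', 'Large']
demo = pd.DataFrame({'size': ['Small', 'Medium', 'Large', 'Large']})

# Request integer codes instead of the default float64
int_enc = OrdinalEncoder(categories=[sizes], dtype=int)
codes = int_enc.fit_transform(demo[['size']])
print(codes.ravel().tolist())   # [0, 1, 2, 2]

# Recover the original labels from the codes
labels = int_enc.inverse_transform(codes)
print(labels.ravel().tolist())  # ['Small', 'Medium', 'Large', 'Large']
```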

In [32]:
# Overwrite the 'size' column with its encoded values (floats by default)
df['size'] = enc.fit_transform(df[['size']])
df.head(11)

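| | sales | city | size |
|---|---|---|---|
| 0 | 100000 | Tampa | 0.0 |
| 1 | 222000 | Tampa | 1.0 |
| 2 | 1000000 | Orlando | 2.0 |
| 3 | 522000 | Jacksonville | 2.0 |
| 4 | 111111 | Miami | 0.0 |
| 5 | 222222 | Jacksonville | 1.0 |
| 6 | 1111111 | Miami | 2.0 |
| 7 | 20000 | Miami | 0.0 |
| 8 | 75000 | Orlando | 1.0 |
| 9 | 90000 | Orlando | 1.0 |
| 10 | 1000000 | Orlando | 1.0 |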
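
Because the cell above overwrites `size`, running it a second time would raise an error: the column then holds floats that are not in the `sizes` list. One alternative, sketched below with an assumed column name `size_encoded` and a shortened stand-in DataFrame, is to write the codes to a new column and keep the original labels.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

sizes = ['Small', 'Medium', 'Large']
demo = pd.DataFrame({'city': ['Tampa', 'Orlando', 'Miami'],
                     'size': ['Small', 'Large', 'Medium']})

enc = OrdinalEncoder(categories=[sizes])
# ravel() flattens the (n, 1) result into a 1D array for column assignment
demo['size_encoded'] = enc.fit_transform(demo[['size']]).ravel()
print(demo['size_encoded'].tolist())  # [0.0, 2.0, 1.0]

# The original labels are untouched, so the cell can be re-run safely
print(demo['size'].tolist())          # ['Small', 'Large', 'Medium']
```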


In [33]:
df.head(10)

| | sales | city | size |
|---|---|---|---|
| 0 | 100000 | Tampa | 0.0 |
| 1 | 222000 | Tampa | 1.0 |
| 2 | 1000000 | Orlando | 2.0 |
| 3 | 522000 | Jacksonville | 2.0 |
| 4 | 111111 | Miami | 0.0 |
| 5 | 222222 | Jacksonville | 1.0 |
| 6 | 1111111 | Miami | 2.0 |
| 7 | 20000 | Miami | 0.0 |
| 8 | 75000 | Orlando | 1.0 |
| 9 | 90000 | Orlando | 1.0 |
