## Ordinal Encoding 

#### One-Hot Encoding is for nominal data; Ordinal Encoding is for ordinal data

##### When the feature columns (X) contain ordinal data, use OrdinalEncoder; when the target variable (y) is categorical, use LabelEncoder

| Aspect                    | Label Encoding              | Ordinal Encoding |
| ------------------------- | --------------------------- | ---------------- |
| Order meaningful          | No                          | Yes              |
| Mapping based on          | Sorted (alphabetical) order | Defined order    |
| Safe for nominal features | ❌                           | ❌                |
| Common use                | Target labels               | Ordered features |
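
The difference in the "Mapping based on" row can be seen in a small sketch. The `reviews` array and the `le_demo` / `oe_demo` names are made up for illustration and are not part of the notebook's data.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Hypothetical review values, used only for this comparison
reviews = np.array(["Poor", "Average", "Good", "Poor"])

# LabelEncoder sorts the classes, so the codes follow alphabetical order,
# not quality order: Average=0, Good=1, Poor=2
le_demo = LabelEncoder()
label_codes = le_demo.fit_transform(reviews)

# OrdinalEncoder with an explicit category list keeps the intended rank:
# Poor=0, Average=1, Good=2. It expects 2-D input (samples x features).
oe_demo = OrdinalEncoder(categories=[["Poor", "Average", "Good"]])
ordinal_codes = oe_demo.fit_transform(reviews.reshape(-1, 1)).ravel()

print(label_codes)    # [2 0 1 2]
print(ordinal_codes)  # [0. 1. 2. 0.]
```

With label encoding, "Poor" ends up with the largest code, which is why it should not be used on an ordered feature.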


### Ordinal Encoding 

######  Example 

In [1]:
import numpy as np 
import pandas as pd

In [2]:
df = pd.read_csv("customer.csv")

In [3]:
df.sample(6)

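|    | age | gender | review  | education | purchased |
| -- | --- | ------ | ------- | --------- | --------- |
| 25 | 57  | Female | Good    | School    | No        |
| 44 | 77  | Female | Average | UG        | No        |
| 36 | 34  | Female | Good    | UG        | Yes       |
| 45 | 61  | Male   | Poor    | PG        | Yes       |
| 11 | 74  | Male   | Good    | UG        | Yes       |
| 20 | 57  | Female | Average | School    | Yes       |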


In [4]:
print(df["gender"].value_counts())
print(df["review"].value_counts())
print(df["purchased"].value_counts())
print(df["education"].value_counts())

gender
Female    29
Male      21
Name: count, dtype: int64
review
Poor       18
Good       18
Average    14
Name: count, dtype: int64
purchased
No     26
Yes    24
Name: count, dtype: int64
education
PG        18
School    16
UG        16
Name: count, dtype: int64


| Column        | Type               | Encoding         |
| ------------- | ------------------ | ---------------- |
| age           | Numerical          | None             |
| gender        | Nominal            | One-Hot          |
| review        | Ordinal            | Ordinal Encoding |
| education     | Ordinal            | Ordinal Encoding |
| purchased (y) | Categorical target | Label Encoding   |
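
One possible way to apply the per-column plan from this table in a single step is scikit-learn's `ColumnTransformer`. This is only a sketch: the `toy` frame below is hypothetical data with the same column names as `customer.csv`, and the notebook itself goes on to encode only the two ordinal columns.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical rows with the same feature columns as customer.csv
toy = pd.DataFrame({
    "age": [57, 34, 61],
    "gender": ["Female", "Female", "Male"],
    "review": ["Good", "Average", "Poor"],
    "education": ["School", "UG", "PG"],
})

ct = ColumnTransformer(
    transformers=[
        # Nominal column: one binary column per category
        ("gender_ohe", OneHotEncoder(), ["gender"]),
        # Ordinal columns: integer codes in the stated rank order
        ("ordinal", OrdinalEncoder(categories=[
            ["Poor", "Average", "Good"],
            ["School", "UG", "PG"],
        ]), ["review", "education"]),
    ],
    remainder="passthrough",  # numerical age is left unchanged
    sparse_threshold=0,       # always return a dense array
)

# Output column order: gender_Female, gender_Male, review, education, age
encoded = ct.fit_transform(toy)
print(encoded)
```

The target column `purchased` is not part of this transformer; it would still be encoded separately with `LabelEncoder`.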


In [5]:
df = df.iloc[:, 2:]  # keep review, education, purchased (drops age and gender)

In [6]:
df.head()

|    | review  | education | purchased |
| -- | ------- | --------- | --------- |
| 0  | Average | School    | No        |
| 1  | Poor    | UG        | No        |
| 2  | Good    | PG        | No        |
| 3  | Good    | PG        | No        |
| 4  | Average | UG        | No        |


In [7]:
X = df[["review", "education"]]
y = df["purchased"]

In [8]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    random_state=42,
    stratify=y
)
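
The effect of `stratify=y` can be checked with a small sketch. The labels below are synthetic and only reuse the 26 "No" / 24 "Yes" counts reported by `value_counts()`; `X_demo` is a placeholder feature column, not the real data.

```python
from collections import Counter

import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels with the same class counts as the purchased column
y_demo = np.array(["No"] * 26 + ["Yes"] * 24)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)  # placeholder features

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo,
    y_demo,
    test_size=0.2,
    random_state=42,
    stratify=y_demo,  # keep the No/Yes ratio similar in both splits
)

train_counts = Counter(y_tr.tolist())
test_counts = Counter(y_te.tolist())
print(train_counts)
print(test_counts)
```

Both splits should show class proportions close to the original 52% / 48%, which matters for small datasets like this one.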

In [10]:
X_train

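|    | review  | education |
| -- | ------- | --------- |
| 16 | Poor    | UG        |
| 13 | Average | School    |
| 47 | Good    | PG        |
| 27 | Poor    | PG        |
| 2  | Good    | PG        |
| 40 | Good    | School    |
| 42 | Good    | PG        |
| 21 | Average | PG        |
| 31 | Poor    | School    |
| 36 | Good    | UG        |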


In [11]:
from sklearn.preprocessing import OrdinalEncoder

In [12]:
# Category lists are given in ascending rank, one list per column of X
oe = OrdinalEncoder(
    categories=[
        ['Poor', 'Average', 'Good'],  # review
        ['School', 'UG', 'PG'],       # education
    ]
)

In [13]:
oe.fit(X_train)

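| Parameter             | Value                                               |
| --------------------- | --------------------------------------------------- |
| categories            | `[['Poor', 'Average', ...], ['School', 'UG', ...]]` |
| dtype                 | `<class 'numpy.float64'>`                           |
| handle_unknown        | `'error'`                                           |
| unknown_value         |                                                     |
| encoded_missing_value |                                                     |
| min_frequency         |                                                     |
| max_categories        |                                                     |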


In [14]:
X_train = oe.transform(X_train)
X_test = oe.transform(X_test)
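
Because the encoder is fitted with `handle_unknown='error'`, a category that is not in the supplied lists makes `transform` fail. The sketch below shows that behaviour and one alternative setting; the value "Excellent" and the `strict` / `tolerant` names are hypothetical, and `-1` is just one possible choice for `unknown_value`.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

order = [["Poor", "Average", "Good"]]
train_demo = pd.DataFrame({"review": ["Poor", "Average", "Good"]})
# "Excellent" is a made-up category that the encoder never saw
new_demo = pd.DataFrame({"review": ["Good", "Excellent"]})

# Default behaviour: unseen categories raise a ValueError at transform time
strict = OrdinalEncoder(categories=order)
strict.fit(train_demo)
try:
    strict.transform(new_demo)
    raised = False
except ValueError:
    raised = True

# Alternative: map every unseen category to a sentinel code
tolerant = OrdinalEncoder(
    categories=order,
    handle_unknown="use_encoded_value",
    unknown_value=-1,
)
tolerant.fit(train_demo)
codes = tolerant.transform(new_demo)

print(raised)  # True
print(codes)   # [[ 2.] [-1.]]
```

Whether a sentinel code is appropriate depends on the model; failing loudly is often the safer default while exploring data.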

In [15]:
X_train

array([[0., 1.],
       [1., 0.],
       [2., 2.],
       [0., 2.],
       [2., 2.],
       [2., 0.],
       [2., 2.],
       [1., 2.],
       [0., 0.],
       [2., 1.],
       [2., 1.],
       [1., 2.],
       [0., 0.],
       [0., 2.],
       [1., 0.],
       [0., 2.],
       [0., 2.],
       [0., 2.],
       [2., 0.],
       [0., 1.],
       [2., 1.],
       [2., 2.],
       [2., 1.],
       [0., 2.],
       [0., 2.],
       [1., 1.],
       [1., 1.],
       [1., 0.],
       [1., 1.],
       [0., 1.],
       [2., 2.],
       [2., 0.],
       [2., 0.],
       [1., 1.],
       [2., 1.],
       [0., 0.],
       [1., 0.],
       [1., 1.],
       [2., 2.],
       [0., 1.]])

In [16]:
oe.categories_

[array(['Poor', 'Average', 'Good'], dtype=object),
 array(['School', 'UG', 'PG'], dtype=object)]
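
The position of each value in `categories_` is the code it receives, so the fitted encoder can also decode. A minimal sketch, using a hypothetical two-row frame (`sample`) and a separate encoder (`oe_demo`) so it runs on its own:

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

oe_demo = OrdinalEncoder(
    categories=[["Poor", "Average", "Good"], ["School", "UG", "PG"]]
)

# Hypothetical rows with the same two feature columns
sample = pd.DataFrame({"review": ["Poor", "Good"], "education": ["UG", "PG"]})

codes = oe_demo.fit_transform(sample)       # [[0., 1.], [2., 2.]]
decoded = oe_demo.inverse_transform(codes)  # back to the original strings

# Explicit category -> code lookup for the first column (review)
review_map = {cat: i for i, cat in enumerate(oe_demo.categories_[0])}

print(decoded)
print(review_map)  # {'Poor': 0, 'Average': 1, 'Good': 2}
```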

## Label Encoding

###### Always remember: LabelEncoder is for the target variable (y), not for the feature columns (X)

In [17]:
from sklearn.preprocessing import LabelEncoder

In [18]:
le = LabelEncoder()

In [19]:
le.fit(y_train)

In [20]:
le.classes_

array(['No', 'Yes'], dtype=object)

In [22]:
y_train = le.transform(y_train)
y_test = le.transform(y_test)

In [23]:
y_train

array([1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1,
       1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1])
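
Integer predictions from a model trained on the encoded target can be mapped back to the original labels with `inverse_transform`. A small sketch, where `pred` is a made-up array standing in for model output and `le_demo` is a separate encoder fitted on hypothetical labels:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

le_demo = LabelEncoder()
le_demo.fit(["No", "Yes", "No"])  # classes_ becomes ['No', 'Yes']

# Hypothetical integer predictions from some classifier
pred = np.array([1, 0, 1])

labels = le_demo.inverse_transform(pred)
print(labels)  # ['Yes' 'No' 'Yes']
```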