# Ordinal Encoding
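- used on input features (X) whose categories have a natural order, not on the output label
- you provide the hierarchy (order) of the categories, and the encoder assigns each category an integer according to its position in that order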
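
As a minimal sketch of the idea above, the snippet below encodes a single made-up column. The `size` column and its values are hypothetical and are not part of the customer dataset; only `OrdinalEncoder` and its `categories` parameter come from scikit-learn.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy column, made up for illustration only.
toy = pd.DataFrame({'size': ['Medium', 'Small', 'Large', 'Small']})

# The order of this list defines the hierarchy: Small < Medium < Large.
toy_encoder = OrdinalEncoder(categories=[['Small', 'Medium', 'Large']])
toy_encoded = toy_encoder.fit_transform(toy)

# Each category is replaced by its position in the list above.
print(toy_encoded.ravel())  # [1. 0. 2. 0.]
```

If `categories` is left at its default, the encoder sorts the categories alphabetically, which usually does not match the intended hierarchy.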

## Data collection

In [1]:
import pandas as pd

In [2]:
df = pd.read_csv('../Datasets/customer.csv')

In [3]:
df.sample(5)

Unnamed: 0,age,gender,review,education,purchased
6,18,Male,Good,School,No
37,94,Male,Average,PG,Yes
20,57,Female,Average,School,Yes
33,89,Female,Good,PG,Yes
31,22,Female,Poor,School,Yes


In [4]:
df.shape

(50, 5)

In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   age        50 non-null     int64 
 1   gender     50 non-null     object
 2   review     50 non-null     object
 3   education  50 non-null     object
 4   purchased  50 non-null     object
dtypes: int64(1), object(4)
memory usage: 2.1+ KB


## Initial observation
- Columns gender, review, education and purchased are categorical
- gender is nominal (no natural order), so it can be transformed using one-hot encoding (OHE)
- review and education are ordinal, so they can be transformed using ordinal encoding
- purchased, being the output label, can be transformed using label encoding
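
This notebook drops gender and only encodes review and education. For reference, one possible way to apply both encoders from the observations above in a single step is scikit-learn's `ColumnTransformer`. This is a sketch under assumptions: the three rows below are made up, and the step names `'gender_ohe'` and `'ordinal'` are arbitrary labels chosen for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical rows that reuse the column names of the customer dataset.
toy = pd.DataFrame({
    'gender': ['Male', 'Female', 'Female'],
    'review': ['Good', 'Poor', 'Average'],
    'education': ['PG', 'School', 'UG'],
})

transformer = ColumnTransformer(
    transformers=[
        # Nominal column: one output column per category (Female, Male).
        ('gender_ohe', OneHotEncoder(), ['gender']),
        # Ordinal columns: integer codes that follow the given order.
        ('ordinal',
         OrdinalEncoder(categories=[['Poor', 'Average', 'Good'],
                                    ['School', 'UG', 'PG']]),
         ['review', 'education']),
    ],
    sparse_threshold=0,  # always return a dense NumPy array
)

toy_encoded = transformer.fit_transform(toy)
# Column order: gender_Female, gender_Male, review, education
print(toy_encoded)
```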

In [6]:
# Drop age and gender; keep only review, education and purchased
df = df.iloc[:, 2:]

In [7]:
df.sample(5)

Unnamed: 0,review,education,purchased
34,Average,School,No
22,Poor,PG,Yes
7,Poor,School,Yes
20,Average,School,Yes
31,Poor,School,Yes


## Train Test Split

In [8]:
from sklearn.model_selection import train_test_split

In [9]:
# X: review and education (first two columns); y: purchased (last column)
X_train, X_test, y_train, y_test = train_test_split(
    df.iloc[:, 0:2], df.iloc[:, -1], test_size=0.2, random_state=0
)

In [10]:
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

(40, 2) (10, 2) (40,) (10,)


## Applying OrdinalEncoder

In [11]:
from sklearn.preprocessing import OrdinalEncoder

In [12]:
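# One list per column, in column order (review, education); each list runs from lowest to highest rank
encoder = OrdinalEncoder(categories=[['Poor', 'Average', 'Good'], ['School', 'UG', 'PG']])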

In [13]:
encoder.fit(X_train)
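
After fitting, the learned order can be checked through the encoder's `categories_` attribute. The sketch below uses three made-up rows in place of `X_train`, so it runs without the CSV file; the names `toy_train` and `toy_enc` are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical stand-in for X_train.
toy_train = pd.DataFrame({
    'review': ['Good', 'Poor', 'Average'],
    'education': ['PG', 'School', 'UG'],
})

toy_enc = OrdinalEncoder(categories=[['Poor', 'Average', 'Good'],
                                     ['School', 'UG', 'PG']])
toy_enc.fit(toy_train)

# categories_ holds one array per column; the position in the array is the code.
for col, cats in zip(toy_train.columns, toy_enc.categories_):
    print(col, {cat: code for code, cat in enumerate(cats)})
```

With the default settings, a value that is missing from the supplied lists raises a `ValueError`, so a typo in a category name shows up immediately.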

In [14]:
X_train.head()

Unnamed: 0,review,education
33,Good,PG
35,Poor,School
26,Poor,PG
34,Average,School
18,Good,School


In [15]:
X_train_transform = encoder.transform(X_train)
X_test_transform = encoder.transform(X_test)

In [16]:
X_train_transform = pd.DataFrame(X_train_transform,columns = X_train.columns)

In [17]:
X_train_transform.head()

Unnamed: 0,review,education
0,2.0,2.0
1,0.0,0.0
2,0.0,2.0
3,1.0,0.0
4,2.0,0.0
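
Note that wrapping the transformed array in `pd.DataFrame` as in cell In [16] resets the row labels to 0, 1, 2, and so on. If the original labels matter, one option is to pass `index=` explicitly, and `inverse_transform` can be used to map the codes back to the category names. The sketch below reuses the first three rows of `X_train.head()` as hard-coded toy data; the variable names are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# First three rows of X_train.head(), hard-coded so the example is self-contained.
toy_X = pd.DataFrame(
    {'review': ['Good', 'Poor', 'Poor'], 'education': ['PG', 'School', 'PG']},
    index=[33, 35, 26],
)

toy_enc = OrdinalEncoder(categories=[['Poor', 'Average', 'Good'],
                                     ['School', 'UG', 'PG']])
toy_enc.fit(toy_X)

# Keep the original row labels by passing index= explicitly.
toy_X_encoded = pd.DataFrame(
    toy_enc.transform(toy_X), columns=toy_X.columns, index=toy_X.index
)
print(toy_X_encoded)

# Map the integer codes back to the original category names.
toy_X_decoded = toy_enc.inverse_transform(toy_X_encoded.to_numpy())
print(toy_X_decoded)
```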


## Applying LabelEncoder on y

In [18]:
from sklearn.preprocessing import LabelEncoder

In [19]:
le = LabelEncoder()

In [20]:
le.fit(y_train)

In [21]:
y_train_transform = le.transform(y_train)
y_test_transform = le.transform(y_test)

In [22]:
y_train_transform = pd.DataFrame(y_train_transform)

In [23]:
y_train_transform.columns = ['Purchased']

In [24]:
y_train_transform.head()

Unnamed: 0,Purchased
0,1
1,1
2,0
3,0
4,0
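
Unlike `OrdinalEncoder`, `LabelEncoder` takes no category order: it sorts the classes, so here 'No' becomes 0 and 'Yes' becomes 1. A small sketch with made-up labels (the names `toy_y`, `toy_le` and `mapping` are hypothetical) shows how to read the mapping from `classes_`:

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical labels with the same two classes as the purchased column.
toy_y = ['Yes', 'No', 'Yes', 'No']

toy_le = LabelEncoder()
toy_le.fit(toy_y)

# classes_ is sorted; the position of each class is its integer code.
mapping = {str(label): code for code, label in enumerate(toy_le.classes_)}
print(mapping)  # {'No': 0, 'Yes': 1}

toy_y_encoded = toy_le.transform(toy_y)
print(toy_y_encoded)  # [1 0 1 0]
```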
