### Notebook for Ordinal Encoding and Label Encoding.

In this notebook, we will learn how to handle <b style = "color:orange">ordinal categorical columns</b> and transform them into <b style = "color:orange">numeric columns</b> before feeding the data into our ML model.

In [129]:
import numpy as np
import pandas as pd

In [130]:
"""
This dataset has 5 columns: age, gender, review, education and purchased. Here review and education are ordinal
features, and purchased is the target column. We will work only with these three categorical columns, applying
ordinal encoding to review and education and label encoding to purchased.

FYI, the purchased column records whether a customer bought the products we recommended to them after an
earlier purchase.
"""

df = pd.read_csv('customer.csv')
df.sample(5)

Unnamed: 0,age,gender,review,education,purchased
40,39,Male,Good,School,No
15,75,Male,Poor,UG,No
20,57,Female,Average,School,Yes
1,68,Female,Poor,UG,No
43,27,Male,Poor,PG,No


In [131]:
df = df.iloc[:,2:]

In [132]:
df.sample(5)

Unnamed: 0,review,education,purchased
40,Good,School,No
37,Average,PG,Yes
28,Poor,School,No
31,Poor,School,Yes
36,Good,UG,Yes


### <b style = "color:green">Ordinal encoding</b>

Ordinal encoding converts each category label into an <b style = "color:orange">integer value</b>, assigned so that the encoded values preserve the order of the labels.
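As a small illustration of why the order has to be stated explicitly, the sketch below encodes a toy `review` column twice: once with an explicit category order and once with the default, which sorts the categories alphabetically. The `toy` dataframe is a made-up example, not data from `customer.csv`.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy data, used only for illustration
toy = pd.DataFrame({"review": ["Good", "Poor", "Average", "Poor"]})

# Explicit order: Poor -> 0, Average -> 1, Good -> 2
explicit = OrdinalEncoder(categories=[["Poor", "Average", "Good"]])
explicit_codes = explicit.fit_transform(toy).ravel().tolist()
print(explicit_codes)  # [2.0, 0.0, 1.0, 0.0]

# Default order is alphabetical: Average -> 0, Good -> 1, Poor -> 2
default = OrdinalEncoder()
default_codes = default.fit_transform(toy).ravel().tolist()
print(default_codes)  # [1.0, 2.0, 0.0, 2.0]
```

With the default ordering, Poor ends up with the highest value, which no longer reflects the ranking of the reviews.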

In [133]:
"""
Before any feature transformation, we first split the data into training and test sets. We then fit the encoder
on the training data only and use it to transform both the training and the test data.
"""
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(df.iloc[:,:2],df.iloc[:,-1],test_size=0.2,random_state=2)

In [134]:
X_train.head()

Unnamed: 0,review,education
24,Average,PG
48,Good,UG
17,Poor,UG
12,Poor,School
27,Poor,PG


In [135]:
from sklearn.preprocessing import OrdinalEncoder

In [136]:
"""
Create an object of the OrdinalEncoder class. We pass one list of categories per column, each ordered from the
lowest to the highest rank, so that every category gets its integer value by position. The lists themselves must
be given in the same order as the columns of our dataframe (review first, then education).

"""

oe = OrdinalEncoder(categories=[['Poor','Average','Good'],['School','UG','PG']])

### <b style = "color:green">fit_transform() method</b>

In [137]:
"""
Using the fit_transform() method, we can fit (learn the mapping) and transform our data in one go.
"""
#X_train = oe.fit_transform(X_train)
#X_train

In [138]:
oe.fit(X_train)

OrdinalEncoder(categories=[['Poor', 'Average', 'Good'], ['School', 'UG', 'PG']])

In [139]:
oe.categories_

[array(['Poor', 'Average', 'Good'], dtype=object),
 array(['School', 'UG', 'PG'], dtype=object)]

In [140]:
X_train = oe.transform(X_train)
X_test = oe.transform(X_test)
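The following sketch, on small made-up frames standing in for `X_train` and `X_test`, suggests that `fit_transform()` on the training data gives the same result as `fit()` followed by `transform()`, and that the test data is only ever transformed with the mapping learned from the training data. The names `train`, `test`, `two_step` and `one_step` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical stand-ins for X_train and X_test
train = pd.DataFrame({"review": ["Poor", "Good", "Average"],
                      "education": ["UG", "School", "PG"]})
test = pd.DataFrame({"review": ["Good", "Poor"],
                     "education": ["PG", "PG"]})

categories = [["Poor", "Average", "Good"], ["School", "UG", "PG"]]

# Two-step version: learn the mapping, then apply it
two_step = OrdinalEncoder(categories=categories)
two_step.fit(train)
train_a = two_step.transform(train)

# One-step version on the training data
one_step = OrdinalEncoder(categories=categories)
train_b = one_step.fit_transform(train)

same = np.array_equal(train_a, train_b)
print(same)  # True

# The test data is transformed with the already fitted encoder, never re-fitted
test_codes = two_step.transform(test).tolist()
print(test_codes)  # [[2.0, 2.0], [0.0, 2.0]]
```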

In [141]:
X_train

array([[1., 2.],
       [2., 1.],
       [0., 1.],
       [0., 0.],
       [0., 2.],
       [2., 2.],
       [0., 1.],
       [2., 2.],
       [2., 0.],
       [0., 2.],
       [1., 1.],
       [1., 0.],
       [1., 1.],
       [0., 1.],
       [1., 1.],
       [0., 0.],
       [2., 2.],
       [0., 2.],
       [1., 2.],
       [2., 1.],
       [1., 1.],
       [2., 0.],
       [2., 2.],
       [1., 0.],
       [0., 2.],
       [2., 0.],
       [1., 2.],
       [2., 2.],
       [0., 0.],
       [1., 0.],
       [0., 0.],
       [2., 1.],
       [2., 1.],
       [2., 0.],
       [0., 2.],
       [0., 2.],
       [1., 1.],
       [0., 2.],
       [0., 1.],
       [2., 0.]])

In [142]:
X_test

array([[2., 1.],
       [2., 2.],
       [0., 0.],
       [2., 1.],
       [1., 0.],
       [1., 0.],
       [1., 1.],
       [0., 2.],
       [0., 2.],
       [2., 0.]])
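To check what the encoded numbers stand for, a fitted `OrdinalEncoder` can map them back to the original labels with `inverse_transform()`. The sketch below uses a made-up two-row frame (`sample`) rather than the notebook's data.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical two-row sample with the same columns as the notebook
sample = pd.DataFrame({"review": ["Good", "Poor"],
                       "education": ["UG", "School"]})

enc = OrdinalEncoder(categories=[["Poor", "Average", "Good"],
                                 ["School", "UG", "PG"]])
codes = enc.fit_transform(sample)
print(codes.tolist())  # [[2.0, 1.0], [0.0, 0.0]]

# Map the integer codes back to the original category labels
decoded = enc.inverse_transform(codes)
print(decoded.tolist())  # [['Good', 'UG'], ['Poor', 'School']]
```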

### <b style = "color:green">Label Encoding</b>

In Label Encoding, we encode target labels with values between <b style = "color:orange">0 and n_classes-1.</b> This transformer should be used to encode target values, i.e. <b style = "color:orange">y, and not the input X.</b>
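As a minimal sketch of the 0 to n_classes-1 range, the example below encodes a made-up three-class target. The `labels` list is hypothetical; `LabelEncoder` sorts the distinct classes and uses each class's position in that sorted list as its code.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical target with three classes
labels = ["dog", "cat", "bird", "cat"]

demo = LabelEncoder()
codes = demo.fit_transform(labels)

# Classes are stored in sorted order; the index of a class is its code
print(demo.classes_.tolist())  # ['bird', 'cat', 'dog']
print(codes.tolist())          # [2, 1, 0, 1]
```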

In [143]:
y_train.head(3)

24    Yes
48    Yes
17    Yes
Name: purchased, dtype: object

In [144]:
y_test.head(3)

36    Yes
47    Yes
28     No
Name: purchased, dtype: object

In [105]:
from sklearn.preprocessing import LabelEncoder

In [145]:
"""
The LabelEncoder class doesn't need any parameters when we create an object of it. It assigns integer values to
our target categories automatically, in sorted order of the class labels.
"""
le = LabelEncoder()

In [146]:
#y_train = le.fit_transform(y_train)

In [147]:
le.fit(y_train)

LabelEncoder()

In [148]:
y_train = le.transform(y_train)
y_test = le.transform(y_test)

In [149]:
y_train

array([1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0,
       0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0])
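To interpret the 0s and 1s, the codes can be decoded again. The sketch below uses a short made-up target (`target`) in place of the `purchased` column; because the classes are sorted, `No` should map to 0 and `Yes` to 1.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for the purchased column
target = ["Yes", "No", "Yes"]

le_demo = LabelEncoder().fit(target)
print(le_demo.classes_.tolist())  # ['No', 'Yes']

# Decode integer codes back into the original labels
decoded = le_demo.inverse_transform([1, 0]).tolist()
print(decoded)  # ['Yes', 'No']
```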