## Data Encoding:

1. Nominal/OHE Encoding
2. Label and Ordinal Encoding
3. Target Guided Ordinal Encoding

---

### 1. Nominal or One-Hot Encoding (OHE):

One-hot encoding is a technique for representing categorical data as numerical data, which is the form most ML algorithms require. Each category is represented as a binary vector in which each bit corresponds to one unique category, so exactly one bit is 1 and all others are 0. For example, a categorical variable "colour" with 3 possible values (Red, Green, Blue) can be one-hot encoded as follows:

1. Red = [1,0,0]
2. Green = [0,1,0]
3. Blue = [0,0,1]
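The mapping above can be sketched in plain Python. `one_hot` is a hypothetical helper written only for illustration (it is not a library function), and it assumes the category order is fixed up front as Red, Green, Blue, which reproduces the three vectors listed.

```python
# one_hot() is a hypothetical illustration helper, not part of any library.
def one_hot(values, categories):
    """Return one binary vector per value, with a 1 at that value's category index."""
    index = {cat: i for i, cat in enumerate(categories)}
    vectors = []
    for v in values:
        vec = [0] * len(categories)   # start with all zeros
        vec[index[v]] = 1             # set the bit for this category (KeyError if unknown)
        vectors.append(vec)
    return vectors

categories = ["Red", "Green", "Blue"]  # assumed fixed order
for color, vec in zip(categories, one_hot(categories, categories)):
    print(color, "=", vec)
# Red = [1, 0, 0]
# Green = [0, 1, 0]
# Blue = [0, 0, 1]
```

The fixed category order matters: the position of the 1 is only meaningful if training data and new data are encoded with the same order, which is what a fitted encoder object keeps track of.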

In [1]:
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

In [2]:
# create a simple dataframe
df = pd.DataFrame(
    {
        'color':['red','green','blue','green','red','blue']
    }
)
df.head()

|   | color |
|---|-------|
| 0 | red   |
| 1 | green |
| 2 | blue  |
| 3 | green |
| 4 | red   |


In [7]:
# create an instance of ohe
ohe = OneHotEncoder()

# perform fit and transform
encoded = ohe.fit_transform(df[['color']]).toarray()
encoded

array([[0., 0., 1.],
       [0., 1., 0.],
       [1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.],
       [1., 0., 0.]])

In [8]:
# wrap the array in a DataFrame, using the column names generated by the encoder
encoded_df = pd.DataFrame(encoded, columns=ohe.get_feature_names_out())
encoded_df.head()

|   | color_blue | color_green | color_red |
|---|------------|-------------|-----------|
| 0 | 0.0        | 0.0         | 1.0       |
| 1 | 0.0        | 1.0         | 0.0       |
| 2 | 1.0        | 0.0         | 0.0       |
| 3 | 0.0        | 1.0         | 0.0       |
| 4 | 0.0        | 0.0         | 1.0       |


In [9]:
# place the encoded columns next to the original column for comparison
pd.concat([encoded_df, df], axis=1)

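|   | color_blue | color_green | color_red | color |
|---|------------|-------------|-----------|-------|
| 0 | 0.0        | 0.0         | 1.0       | red   |
| 1 | 0.0        | 1.0         | 0.0       | green |
| 2 | 1.0        | 0.0         | 0.0       | blue  |
| 3 | 0.0        | 1.0         | 0.0       | green |
| 4 | 0.0        | 0.0         | 1.0       | red   |
| 5 | 1.0        | 0.0         | 0.0       | blue  |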


In [None]:
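# encode new data with the already-fitted encoder
import numpy as np
color = np.random.choice(["red", "green", "blue"])
print(color)
# pass a DataFrame with the same column name used during fit,
# otherwise scikit-learn warns that X does not have valid feature names
ohe.transform(pd.DataFrame({'color': [color]})).toarray()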

red

array([[0., 0., 1.]])