### Data Encoding

#### 1. Nominal/OHE Encoding
#### 2. Label and Ordinal Encoding
#### 3. Target Guided Ordinal Encoding

### Nominal/OHE Encoding

#### One-hot encoding (OHE), also known as nominal encoding, is a technique used to represent categorical data as numerical data, which is more suitable for most machine learning algorithms. In this technique, each category is represented as a binary vector in which each position corresponds to one unique category and exactly one position is set to 1. For example, if we have a categorical variable "color" with three possible values (red, green, blue), we can represent it with one-hot encoding as follows:

#### 1. Red: [1, 0, 0]
#### 2. Green: [0, 1, 0]
#### 3. Blue: [0, 0, 1]
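
The mapping above can be sketched by hand before reaching for a library. This is only a minimal illustration, assuming the category order red, green, blue from the example; the names `categories`, `position`, and `one_hot` are hypothetical helpers and not part of any library.

```python
import numpy as np

# Category order taken from the example above (an assumption of this sketch)
categories = ["red", "green", "blue"]

# Map each category to its position in the vector
position = {name: i for i, name in enumerate(categories)}

def one_hot(value):
    """Return a binary vector with a single 1 at the category's position."""
    vector = np.zeros(len(categories), dtype=int)
    vector[position[value]] = 1
    return vector

for name in categories:
    print(name, one_hot(name))
```

The column order is a convention. scikit-learn's `OneHotEncoder` sorts categories alphabetically by default, so its columns come out as blue, green, red rather than the order used here.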


In [2]:
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
import numpy as np

In [6]:
## create a simple dataframe
df = pd.DataFrame({
    'color': ['red', 'green', 'blue', 'blue', 'red']})

In [7]:
df.head()

Unnamed: 0,color
0,red
1,green
2,blue
3,blue
4,red


In [19]:
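## create an instance of OneHotEncoder

encoder = OneHotEncoder()

In [20]:
## fit the encoder and transform the column
## (fit_transform returns a sparse matrix by default; .toarray() makes it a dense NumPy array)

encoded = encoder.fit_transform(df[['color']]).toarray()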

In [30]:
encoded_df = pd.DataFrame(encoded, columns = encoder.get_feature_names_out())

In [31]:
encoded_df

Unnamed: 0,color_blue,color_green,color_red
0,0.0,0.0,1.0
1,0.0,1.0,0.0
2,1.0,0.0,0.0
3,1.0,0.0,0.0
4,0.0,0.0,1.0


In [26]:
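## for new data: wrap the value in a DataFrame with the same column name used
## during fit, so scikit-learn does not warn about missing feature names

encoder.transform(pd.DataFrame({'color': ['green']})).toarray()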



array([[0., 1., 0.]])

In [29]:
encoder.get_feature_names_out()

array(['color_blue', 'color_green', 'color_red'], dtype=object)

In [32]:
pd.concat([df, encoded_df], axis = 1)

Unnamed: 0,color,color_blue,color_green,color_red
0,red,0.0,0.0,1.0
1,green,0.0,1.0,0.0
2,blue,1.0,0.0,0.0
3,blue,1.0,0.0,0.0
4,red,0.0,0.0,1.0
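
For quick exploration, pandas can produce a similar table in one step with `pd.get_dummies`. This is offered as an alternative sketch, not as the method used in this notebook. Unlike `OneHotEncoder`, it does not remember the categories, so it is less convenient for encoding new data later. The names `colors_df` and `dummies` are made up for illustration.

```python
import pandas as pd

# Same toy data as above
colors_df = pd.DataFrame({"color": ["red", "green", "blue", "blue", "red"]})

# Replace the 'color' column with one indicator column per category;
# dtype=float matches the 0.0/1.0 values produced by OneHotEncoder
dummies = pd.get_dummies(colors_df, columns=["color"], dtype=float)
print(dummies)
```

Note that `get_dummies` drops the original `color` column, whereas the `pd.concat` approach keeps it alongside the encoded columns.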


In [33]:
import seaborn as sns

tips_df = sns.load_dataset('tips')

In [34]:
tips_df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2
1,10.34,1.66,Male,No,Sun,Dinner,3
2,21.01,3.5,Male,No,Sun,Dinner,3
3,23.68,3.31,Male,No,Sun,Dinner,2
4,24.59,3.61,Female,No,Sun,Dinner,4


In [37]:
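## fit the encoder on the categorical columns and transform them
## (this refits `encoder`, replacing the categories learned from `color`)

tips_encoded = encoder.fit_transform(tips_df[['sex', 'smoker', 'day', 'time']]).toarray()

## build a DataFrame using the generated column names
tips_encoded_df = pd.DataFrame(tips_encoded, columns = encoder.get_feature_names_out())

## append the encoded columns to the original data
pd.concat([tips_df, tips_encoded_df], axis = 1)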



Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,sex_Female,sex_Male,smoker_No,smoker_Yes,day_Fri,day_Sat,day_Sun,day_Thur,time_Dinner,time_Lunch
0,16.99,1.01,Female,No,Sun,Dinner,2,1.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0
1,10.34,1.66,Male,No,Sun,Dinner,3,0.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0
2,21.01,3.50,Male,No,Sun,Dinner,3,0.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0
3,23.68,3.31,Male,No,Sun,Dinner,2,0.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0
4,24.59,3.61,Female,No,Sun,Dinner,4,1.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
239,29.03,5.92,Male,No,Sat,Dinner,3,0.0,1.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0
240,27.18,2.00,Female,Yes,Sat,Dinner,2,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0
241,22.67,2.00,Male,Yes,Sat,Dinner,2,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0
242,17.82,1.75,Male,No,Sat,Dinner,2,0.0,1.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0
