# Feature Encoding / Data Encoding
- `Feature Encoding` converts categorical features into numerical form so that algorithms can use them
- `Data Encoding` is the broader term: it converts data of any kind into numerical form


## Types of Encoding
- 1. Label Encoding
- 2. One Hot Encoding
- 3. Ordinal Encoding
- 4. Count Encoding
- 5. Binary Encoding

## Why do we need encoding?
- Encoding is a way to represent data so that a computer can work with it.
- Most machine learning algorithms expect numerical input, so encoders are used to make categorical data usable and to simplify and improve the algorithm's performance.
- An encoder is a function that maps each category to a number (or to a vector of numbers).
- Computers handle data in numerical form more easily than text labels.
- Numerical data is generally faster to load, process, and store than text.
- Encoding represents the different categories in a simple, consistent way.

***Python libraries used in feature encoding***

In [34]:
#Import the libraries:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [35]:
#Import the sklearn libraries:
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, OrdinalEncoder

In [36]:
#Load the tips dataset with seaborn:
df = sns.load_dataset('tips')

In [37]:
#Check the first rows of the dataset:
df.head()

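|  | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.5 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |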


In [38]:
#Value Counts:
df['time'].value_counts()

time
Dinner    176
Lunch      68
Name: count, dtype: int64

In [39]:
#Another column value counts:
df['sex'].value_counts()

sex
Male      157
Female     87
Name: count, dtype: int64

### Label Encoder
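- It assigns an integer label to each distinct category in a column
- It is used to convert the labels into numbers
- It uses the integers 0, 1, 2, 3, ... (from 0 to the number of classes minus 1), assigned in sorted order of the class names, so `Dinner` becomes 0 and `Lunch` becomes 1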
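
The mapping that the label encoder learns can be inspected directly. The snippet below is a minimal sketch on a small hand-made list (the `times` values are illustrative, not taken from the tips dataset); it shows that the classes are sorted before integers are assigned, and that `inverse_transform` recovers the original labels.

```python
from sklearn.preprocessing import LabelEncoder

# Illustrative data (an assumption for this sketch, not the tips dataset)
times = ["Dinner", "Lunch", "Dinner", "Lunch", "Dinner"]

le = LabelEncoder()
codes = le.fit_transform(times)

# classes_ holds the sorted unique labels; the position of a label is its code
mapping = {str(label): int(code) for code, label in enumerate(le.classes_)}
print(mapping)

# inverse_transform maps the integer codes back to the original labels
decoded = le.inverse_transform(codes)
print(list(decoded))
```

Note that scikit-learn documents `LabelEncoder` as a tool for encoding target labels; for input features, `OrdinalEncoder` or `OneHotEncoder` are usually the closer fit.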

In [40]:
#Create the LabelEncoder instance:
le = LabelEncoder()
#Fit the encoder on the 'time' column and transform it:
df['time_encode'] = le.fit_transform(df['time'])

In [41]:
#Check the first rows, including the new time_encode column:
df.head()

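|  | total_bill | tip | sex | smoker | day | time | size | time_encode |
|---|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 | 0 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 | 0 |
| 2 | 21.01 | 3.5 | Male | No | Sun | Dinner | 3 | 0 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 | 0 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 | 0 |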


In [42]:
#Check the original (unencoded) time column for comparison:
df['time'].head()

0    Dinner
1    Dinner
2    Dinner
3    Dinner
4    Dinner
Name: time, dtype: category
Categories (2, object): ['Lunch', 'Dinner']

In [43]:
#Check the sex column (not encoded yet):
df['sex'].head()

0    Female
1      Male
2      Male
3      Male
4    Female
Name: sex, dtype: category
Categories (2, object): ['Male', 'Female']

### Ordinal Encoding
- It encodes categorical data according to a specified order
- It is used to convert ordered categorical data into numerical data
- It converts the categories into 0, 1, 2, 3, ... following that order
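
The role of the explicit category order can be sketched on a tiny DataFrame. The `days` frame below is made up for illustration; the point is that the integer assigned to each value is its position in the `categories` list, not its alphabetical rank.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Illustrative data (an assumption for this sketch)
days = pd.DataFrame({"day": ["Sun", "Thur", "Sat", "Fri", "Sun"]})

# The order of this list defines the codes: Thur=0, Fri=1, Sat=2, Sun=3
oe = OrdinalEncoder(categories=[["Thur", "Fri", "Sat", "Sun"]])

# fit_transform returns a 2-D float array; ravel() flattens it to one column
days["day_ordinal"] = oe.fit_transform(days[["day"]]).ravel()
print(days)
```

If `categories` is omitted, the encoder falls back to sorted order, which would place `Fri` first and `Thur` last and lose the weekday ordering.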

In [44]:
#Check the column:
df['day'].value_counts()

day
Sat     87
Sun     76
Thur    62
Fri     19
Name: count, dtype: int64

In [45]:
#Create the OrdinalEncoder with an explicit order for the day column:
oe = OrdinalEncoder(categories=[['Thur', 'Fri', 'Sat', 'Sun']])
#Fit the encoder and transform the column:
df['ordinal_day'] = oe.fit_transform(df[['day']])

In [46]:
#Check the first rows of the encoded column:
df['ordinal_day'].head()

0    3.0
1    3.0
2    3.0
3    3.0
4    3.0
Name: ordinal_day, dtype: float64

In [47]:
df.head()

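|  | total_bill | tip | sex | smoker | day | time | size | time_encode | ordinal_day |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 | 0 | 3.0 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 | 0 | 3.0 |
| 2 | 21.01 | 3.5 | Male | No | Sun | Dinner | 3 | 0 | 3.0 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 | 0 | 3.0 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 | 0 | 3.0 |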


### One-hot Encoder
- It is used to convert categorical data into numeric form
- It creates one binary (0/1) column per category
- In each row, exactly one of these columns is 1 (the "hot" one) and the rest are 0
- Taken together, the encodings of the distinct categories form the rows of an identity matrix


In [48]:
#Create the OneHotEncoder instance:
ohe = OneHotEncoder()
#Fit the encoder, transform the day column, and convert the sparse result to a dense array:
ohe.fit_transform(df[['day']]).toarray()

array([[0., 0., 1., 0.],
       [0., 0., 1., 0.],
       [0., 0., 1., 0.],
       [0., 0., 1., 0.],
       [0., 0., 1., 0.],
       [0., 0., 1., 0.],
       [0., 0., 1., 0.],
       [0., 0., 1., 0.],
       [0., 0., 1., 0.],
       [0., 0., 1., 0.],
       [0., 0., 1., 0.],
       [0., 0., 1., 0.],
       [0., 0., 1., 0.],
       [0., 0., 1., 0.],
       [0., 0., 1., 0.],
       [0., 0., 1., 0.],
       [0., 0., 1., 0.],
       [0., 0., 1., 0.],
       [0., 0., 1., 0.],
       [0., 1., 0., 0.],
       [0., 1., 0., 0.],
       [0., 1., 0., 0.],
       [0., 1., 0., 0.],
       [0., 1., 0., 0.],
       [0., 1., 0., 0.],
       [0., 1., 0., 0.],
       [0., 1., 0., 0.],
       [0., 1., 0., 0.],
       [0., 1., 0., 0.],
       [0., 1., 0., 0.],
       [0., 1., 0., 0.],
       [0., 1., 0., 0.],
       [0., 1., 0., 0.],
       [0., 1., 0., 0.],
       [0., 1., 0., 0.],
       [0., 1., 0., 0.],
       [0., 1., 0., 0.],
       [0., 1., 0., 0.],
       [0., 1., 0., 0.],
       [0., 1., 0., 0.],


In [61]:
#Load the dataset of titanic with seaborn
df = sns.load_dataset('titanic')

In [50]:
#Check the head:
df.head()

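|  | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.25 | S | Third | man | True | NaN | Southampton | no | False |
| 1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | First | woman | False | C | Cherbourg | yes | False |
| 2 | 1 | 3 | female | 26.0 | 0 | 0 | 7.925 | S | Third | woman | False | NaN | Southampton | yes | True |
| 3 | 1 | 1 | female | 35.0 | 1 | 0 | 53.1 | S | First | woman | False | C | Southampton | yes | False |
| 4 | 0 | 3 | male | 35.0 | 0 | 0 | 8.05 | S | Third | man | True | NaN | Southampton | no | True |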


In [51]:
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Assuming 'df' is your original DataFrame
one_hot_encoder = OneHotEncoder()
embarked_onehot = one_hot_encoder.fit_transform(df[['embarked']])

# Convert the sparse matrix to a dense NumPy array
embarked_onehot_array = embarked_onehot.toarray()

# Create a new DataFrame from the one-hot encoded data
embarked_onehot_df = pd.DataFrame(embarked_onehot_array, columns=one_hot_encoder.get_feature_names_out())

# Concatenate the original DataFrame with the one-hot encoded DataFrame
df = pd.concat([df, embarked_onehot_df], axis=1)

In [52]:
#Check the head:
df.head()

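|  | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone | embarked_C | embarked_Q | embarked_S | embarked_nan |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.25 | S | Third | man | True | NaN | Southampton | no | False | 0.0 | 0.0 | 1.0 | 0.0 |
| 1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | First | woman | False | C | Cherbourg | yes | False | 1.0 | 0.0 | 0.0 | 0.0 |
| 2 | 1 | 3 | female | 26.0 | 0 | 0 | 7.925 | S | Third | woman | False | NaN | Southampton | yes | True | 0.0 | 0.0 | 1.0 | 0.0 |
| 3 | 1 | 1 | female | 35.0 | 1 | 0 | 53.1 | S | First | woman | False | C | Southampton | yes | False | 0.0 | 0.0 | 1.0 | 0.0 |
| 4 | 0 | 3 | male | 35.0 | 0 | 0 | 8.05 | S | Third | man | True | NaN | Southampton | no | True | 0.0 | 0.0 | 1.0 | 0.0 |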


### Binary Encoding
Binary encoding first assigns each category an integer and then writes that integer in
binary, using one column per binary digit. It represents the categories in a compact and
efficient manner: n categories need only about log2(n) columns, compared with n columns for
one-hot encoding, which makes it useful for features with many distinct categories.
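
To make the idea concrete, here is a hand-rolled sketch of binary encoding using only pandas. It is an illustration of the technique, not the implementation used by any library; the helper name `binary_encode` and the choice to number categories from 1 in order of first appearance are assumptions made for this example.

```python
import pandas as pd

def binary_encode(series: pd.Series, prefix: str) -> pd.DataFrame:
    """Hypothetical helper: integer-code each category, then split into bits."""
    # Number the categories 1, 2, 3, ... in order of first appearance
    categories = pd.unique(series)
    codes = {cat: i + 1 for i, cat in enumerate(categories)}
    # Number of binary digits needed to write the largest code
    n_bits = max(1, len(categories).bit_length())
    ints = series.map(codes)
    columns = {}
    for bit in range(n_bits):
        shift = n_bits - 1 - bit  # most significant bit first
        columns[f"{prefix}_{bit}"] = (ints // (2 ** shift)) % 2
    return pd.DataFrame(columns, index=series.index)

# Illustrative data (an assumption for this sketch)
who = pd.Series(["man", "woman", "woman", "child", "man"], name="who")
encoded = binary_encode(who, "who")
print(encoded)
```

With three categories, two bit columns are enough, whereas one-hot encoding would need three.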

***import the library***

In [53]:
#Import BinaryEncoder from the third-party category_encoders package (not part of sklearn itself):
from category_encoders import BinaryEncoder

In [54]:
#Check the who column head.
df['who'].head()

0      man
1    woman
2    woman
3    woman
4      man
Name: who, dtype: object

In [55]:
#Create the BinaryEncoder instance:
binary_encoder = BinaryEncoder()
#Fit the encoder and transform the who column:
df_binary = binary_encoder.fit_transform(df['who'])

In [56]:
#Check the encoded columns:
df_binary

|  | who_0 | who_1 |
|---|---|---|
| 0 | 0 | 1 |
| 1 | 1 | 0 |
| 2 | 1 | 0 |
| 3 | 1 | 0 |
| 4 | 0 | 1 |
| ... | ... | ... |
| 886 | 0 | 1 |
| 887 | 1 | 0 |
| 888 | 1 | 0 |
| 889 | 0 | 1 |


# Feature Encoding with Pandas
Pandas methods used for feature encoding:
- Get dummies (`pd.get_dummies`)
- Binning (for example with `pd.cut`)
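
Binning is listed here without a demonstration, so the following is a minimal sketch of one way to do it with `pd.cut`. The ages, bin edges, and group labels are all assumptions chosen for illustration; they are not taken from the titanic dataset.

```python
import pandas as pd

# Illustrative data and bin edges (assumptions for this sketch)
ages = pd.Series([5, 15, 30, 70], name="age")

# Each interval is open on the left and closed on the right: (0, 12], (12, 18], ...
age_group = pd.cut(
    ages,
    bins=[0, 12, 18, 60, 120],
    labels=["child", "teen", "adult", "senior"],
)

# The categorical codes give an ordinal integer encoding of the bins
age_code = age_group.cat.codes
print(pd.DataFrame({"age": ages, "age_group": age_group, "age_code": age_code}))
```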

***Import the library***

In [57]:
#Import the library:
import pandas as pd

In [62]:
#Apply pd.get_dummies to the who column of the titanic dataset:
get_dummies = pd.get_dummies(df, columns=['who'])
#Check the first rows of the result:
get_dummies.head()

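|  | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | adult_male | deck | embark_town | alive | alone | who_child | who_man | who_woman |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.25 | S | Third | True | NaN | Southampton | no | False | False | True | False |
| 1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | First | False | C | Cherbourg | yes | False | False | False | True |
| 2 | 1 | 3 | female | 26.0 | 0 | 0 | 7.925 | S | Third | False | NaN | Southampton | yes | True | False | False | True |
| 3 | 1 | 1 | female | 35.0 | 1 | 0 | 53.1 | S | First | False | C | Southampton | yes | False | False | False | True |
| 4 | 0 | 3 | male | 35.0 | 0 | 0 | 8.05 | S | Third | True | NaN | Southampton | no | True | False | True | False |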
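
Count encoding appears in the list of encoding types but is not demonstrated in this notebook. One simple way to sketch it with pandas alone is to replace each category with the number of times it occurs, using `value_counts()` and `map()`. The small `days` frame below is made up for illustration.

```python
import pandas as pd

# Illustrative data (an assumption for this sketch)
days = pd.DataFrame({"day": ["Sat", "Sun", "Sat", "Fri", "Sat", "Sun"]})

# Count how often each category occurs
counts = days["day"].value_counts()

# Replace each category with its frequency
days["day_count"] = days["day"].map(counts)
print(days)
```

Two categories with the same frequency receive the same code, so count encoding is most useful when the frequency itself is informative.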
