## **Feature Encoding**
Feature encoding is the process of transforming categorical features into numeric features. This is necessary because most machine learning algorithms expect numeric input and cannot work with raw string categories. There are several ways to encode categorical features, and each has its own trade-offs. In this notebook, we explore some of the most popular methods:

Label encoding<br>
Ordinal encoding<br>
One-hot encoding<br>
Binary encoding<br>
Manual encoding

This YouTube video lecture can help you understand it better:

Feature Encoding in Python | Learn One-Hot, Label Encoding & More!
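
As a quick illustration of the last two items in the list, here is a minimal sketch of manual and binary encoding in plain pandas. The toy DataFrame, the `smoker_map` dictionary, the `day_order` list, and the `day_bin_*` column names are all assumptions made up for this example. The binary part is hand-rolled, so dedicated encoder libraries may number the categories differently.

```python
import pandas as pd

# Toy data standing in for two categorical columns (illustrative values).
toy = pd.DataFrame({"smoker": ["No", "Yes", "No"],
                    "day": ["Thur", "Fri", "Sun"]})

# Manual encoding: an explicit, hand-written mapping from label to number.
smoker_map = {"No": 0, "Yes": 1}
toy["smoker_manual"] = toy["smoker"].map(smoker_map)

# Binary encoding: give each category an integer code, then spread the
# binary digits of that code across columns (about log2(k) columns for
# k categories, instead of k columns as in one-hot encoding).
day_order = ["Thur", "Fri", "Sat", "Sun"]            # assumed category list
codes = toy["day"].map({d: i for i, d in enumerate(day_order)})
n_bits = max(1, (len(day_order) - 1).bit_length())   # 4 categories -> 2 bits

for bit in range(n_bits):
    shift = n_bits - 1 - bit                         # day_bin_0 is the high bit
    toy[f"day_bin_{bit}"] = (codes // 2**shift) % 2

print(toy)
```

With four categories, two binary columns are enough: `Thur` becomes `00`, `Fri` becomes `01`, and `Sun` becomes `11`.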

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns


In [3]:
df = sns.load_dataset("tips")
df.head()

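| | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.5 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |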


## **Label Encoding**

In [4]:
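from sklearn.preprocessing import LabelEncoder, OneHotEncoder, OrdinalEncoder

# LabelEncoder assigns integer codes in sorted label order
# ('Dinner' -> 0, 'Lunch' -> 1). scikit-learn intends it for target labels;
# for input features, OrdinalEncoder is the documented choice.
le = LabelEncoder()
df['encoded_time'] = le.fit_transform(df['time'])
df.head()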

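| | total_bill | tip | sex | smoker | day | time | size | encoded_time |
|---|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 | 0 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 | 0 |
| 2 | 21.01 | 3.5 | Male | No | Sun | Dinner | 3 | 0 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 | 0 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 | 0 |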
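
To check which integer was assigned to which label, the fitted encoder can be inspected. The sketch below uses a small stand-in list instead of the tips data, and the names `le_demo`, `mapping`, and `restored` are hypothetical, chosen only for this illustration.

```python
from sklearn.preprocessing import LabelEncoder

# Small stand-in for the 'time' column (illustrative values).
times = ["Dinner", "Lunch", "Dinner", "Lunch"]

le_demo = LabelEncoder()
codes = le_demo.fit_transform(times)

# classes_ is sorted; the position of each class is its integer code.
mapping = {label: code for code, label in enumerate(le_demo.classes_)}
print(mapping)

# inverse_transform recovers the original labels from the codes.
restored = le_demo.inverse_transform(codes)
print(list(restored))
```

Because the codes follow alphabetical order, they carry no meaningful ranking of the categories.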


## **Ordinal Encoding**

In [6]:
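# Pass the categories explicitly so the codes follow weekday order
# (Thur -> 0, Fri -> 1, Sat -> 2, Sun -> 3) rather than alphabetical order.
oe = OrdinalEncoder(categories=[['Thur', 'Fri', 'Sat', 'Sun']])
# fit_transform returns an (n, 1) array; ravel() flattens it to one column.
df['encoded_day'] = oe.fit_transform(df[['day']]).ravel()
df.head()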

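| | total_bill | tip | sex | smoker | day | time | size | encoded_time | encoded_day |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 | 0 | 3.0 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 | 0 | 3.0 |
| 2 | 21.01 | 3.5 | Male | No | Sun | Dinner | 3 | 0 | 3.0 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 | 0 | 3.0 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 | 0 | 3.0 |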
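
The effect of the `categories` argument is easiest to see side by side. This is a minimal sketch on a four-row stand-in DataFrame (not the tips data); the variable names `days`, `default_codes`, and `ordered_codes` are assumptions for illustration.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# One row per weekday so every category appears once (illustrative data).
days = pd.DataFrame({"day": ["Sun", "Thur", "Sat", "Fri"]})

# Without `categories`, the encoder sorts the values alphabetically:
# Fri -> 0, Sat -> 1, Sun -> 2, Thur -> 3.
default_codes = OrdinalEncoder().fit_transform(days).ravel()

# With an explicit list, the codes follow the order we provide:
# Thur -> 0, Fri -> 1, Sat -> 2, Sun -> 3.
ordered = OrdinalEncoder(categories=[["Thur", "Fri", "Sat", "Sun"]])
ordered_codes = ordered.fit_transform(days).ravel()

print(default_codes)
print(ordered_codes)
```

Only the second encoding preserves the weekday order, which is the property ordinal encoding is meant to capture.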


## **One-Hot Encoding**

In [10]:
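# sparse_output=False returns a dense NumPy array (scikit-learn 1.2+;
# earlier versions used the name `sparse`).
ohe = OneHotEncoder(sparse_output=False)
encoded = ohe.fit_transform(df[['sex']])
encoded_df = pd.DataFrame(encoded, columns=ohe.get_feature_names_out(['sex']))
# Rows line up because df still has its default 0..n-1 index.
df = pd.concat([df, encoded_df], axis=1)
df.head()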

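| | total_bill | tip | sex | smoker | day | time | size | encoded_time | encoded_day | sex_Female | sex_Male |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 | 0 | 3.0 | 1.0 | 0.0 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 | 0 | 3.0 | 0.0 | 1.0 |
| 2 | 21.01 | 3.5 | Male | No | Sun | Dinner | 3 | 0 | 3.0 | 0.0 | 1.0 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 | 0 | 3.0 | 0.0 | 1.0 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 | 0 | 3.0 | 1.0 | 0.0 |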


## **Dummies**

In [11]:
# get_dummies replaces 'day' with one indicator column per category.
df = pd.get_dummies(df, columns=['day'])
df

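| | total_bill | tip | sex | smoker | time | size | encoded_time | encoded_day | sex_Female | sex_Male | day_Thur | day_Fri | day_Sat | day_Sun |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Dinner | 2 | 0 | 3.0 | 1.0 | 0.0 | False | False | False | True |
| 1 | 10.34 | 1.66 | Male | No | Dinner | 3 | 0 | 3.0 | 0.0 | 1.0 | False | False | False | True |
| 2 | 21.01 | 3.50 | Male | No | Dinner | 3 | 0 | 3.0 | 0.0 | 1.0 | False | False | False | True |
| 3 | 23.68 | 3.31 | Male | No | Dinner | 2 | 0 | 3.0 | 0.0 | 1.0 | False | False | False | True |
| 4 | 24.59 | 3.61 | Female | No | Dinner | 4 | 0 | 3.0 | 1.0 | 0.0 | False | False | False | True |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 239 | 29.03 | 5.92 | Male | No | Dinner | 3 | 0 | 2.0 | 0.0 | 1.0 | False | False | True | False |
| 240 | 27.18 | 2.00 | Female | Yes | Dinner | 2 | 0 | 2.0 | 1.0 | 0.0 | False | False | True | False |
| 241 | 22.67 | 2.00 | Male | Yes | Dinner | 2 | 0 | 2.0 | 0.0 | 1.0 | False | False | True | False |
| 242 | 17.82 | 1.75 | Male | No | Dinner | 2 | 0 | 2.0 | 0.0 | 1.0 | False | False | True | False |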
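
The indicator columns above are redundant in the sense that any one of them can be inferred from the others. A common variation, sketched here on a toy DataFrame, is to drop the first category with `drop_first=True` and request integer output with `dtype=int`. The toy data and the name `dummies` are assumptions for illustration, and the column is declared as a pandas `Categorical` so that the category order is explicit.

```python
import pandas as pd

# A categorical column with an explicit category order (illustrative data).
toy = pd.DataFrame({
    "day": pd.Categorical(["Sun", "Sat", "Thur"],
                          categories=["Thur", "Fri", "Sat", "Sun"])
})

# drop_first=True omits the first category ('Thur'), which is then
# represented by a row of all zeros; dtype=int gives 0/1 instead of booleans.
dummies = pd.get_dummies(toy, columns=["day"], drop_first=True, dtype=int)
print(dummies)
```

Dropping one column avoids perfectly collinear features, which matters mainly for linear models; tree-based models are usually indifferent to it.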
