# Feature Encoding

<h2 style="font-family: 'poppins'; font-weight: bold;">👨‍💻Author: Abid Hussain</h2>

[![GitHub](https://img.shields.io/badge/GitHub-Profile-blue?style=for-the-badge&logo=github)](https://github.com/abid4850) 
[![Kaggle](https://img.shields.io/badge/Kaggle-Profile-blue?style=for-the-badge&logo=kaggle)](https://www.kaggle.com/abidhussai512) 
[![LinkedIn](https://img.shields.io/badge/LinkedIn-Profile-blue?style=for-the-badge&logo=linkedin)](https://www.linkedin.com/in/abid-hussain-101846339/)  
[![Facebook](https://img.shields.io/badge/Facebook-Profile-blue?style=for-the-badge&logo=facebook)](https://www.facebook.com/profile.php?id=abid.hussainnoul) 
[![Twitter/X](https://img.shields.io/badge/Twitter-Profile-blue?style=for-the-badge&logo=twitter)](https://twitter.com/AbidHussai76533) 
[![Instagram](https://img.shields.io/badge/Instagram-Profile-blue?style=for-the-badge&logo=instagram)](https://www.instagram.com/abidhussainnoul/)

Feature encoding is the process of transforming categorical features into numeric features. This is necessary because many machine learning algorithms, including most scikit-learn estimators, accept only numeric input. There are many ways to encode categorical features, and each method has its own advantages and disadvantages. In this notebook, we explore some of the most popular methods for encoding categorical features:

1. Label encoding
2. Ordinal encoding
3. One-hot encoding
4. Binary encoding
5. Manual encoding

This YouTube video lecture may help you understand these methods better:

Feature Encoding in Python | Learn One-Hot, Label Encoding & More!
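To make the "numeric input only" point concrete, here is a minimal sketch using a small hypothetical toy dataset (not the tips data). It assumes scikit-learn's `LinearRegression`, which raises a `ValueError` when given raw strings and fits normally once the same column is mapped to integers.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: one raw categorical column and a numeric target.
X_raw = pd.DataFrame({"time": ["Dinner", "Lunch", "Dinner", "Lunch"]})
y = [20.0, 12.0, 24.0, 10.0]

try:
    LinearRegression().fit(X_raw, y)
    raw_error = None
except ValueError as exc:
    # scikit-learn cannot convert the strings "Dinner"/"Lunch" to floats.
    raw_error = exc

# After a simple numeric encoding, the same model fits without complaint.
X_encoded = X_raw["time"].map({"Dinner": 0, "Lunch": 1}).to_frame()
model = LinearRegression().fit(X_encoded, y)

print(type(raw_error).__name__)
print(model.coef_)
```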

```python
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
```

```python
# Load the tips dataset that ships with seaborn
df = sns.load_dataset('tips')
df.head()
```

```python
# Count how often each category appears in the 'time' column
df['time'].value_counts()
```

## 1. Label Encoding

```python
# Label-encode the 'time' column with scikit-learn's LabelEncoder
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, OrdinalEncoder
le = LabelEncoder()
df['encoded_time'] = le.fit_transform(df['time'])
df.head()
```

```python
df['encoded_time'].value_counts()
```
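`LabelEncoder` assigns codes 0 to n-1 in sorted order of the classes (alphabetical for strings), so the numbers carry no meaningful ranking. The scikit-learn documentation describes it as intended for target labels (`y`); `OrdinalEncoder` is its counterpart for feature columns. A minimal sketch on a hypothetical list of meal times shows the learned classes and how to map codes back:

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample of the 'time' values
meal_times = ["Dinner", "Lunch", "Dinner", "Lunch", "Dinner"]

le = LabelEncoder()
codes = le.fit_transform(meal_times)

print(le.classes_)                  # classes in sorted order: index = code
print(codes)                        # integer code for each input value
print(le.inverse_transform([0, 1])) # decode integers back to labels
```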

```python
df['day'].value_counts()
```

## 2. Ordinal Encoding

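```python
# Ordinal-encode the 'day' column using an explicit category order
oe = OrdinalEncoder(categories=[['Thur', 'Fri', 'Sat', 'Sun']])
# fit_transform returns a 2-D array of shape (n, 1); ravel() flattens it to one column
df['encoded_day'] = oe.fit_transform(df[['day']]).ravel()
df.head()
```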

```python
df['encoded_day'].value_counts()
```
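By default, `OrdinalEncoder` raises an error if `transform` sees a category that was not present during `fit`. The sketch below, on hypothetical toy data, assumes scikit-learn 0.24 or newer, where `handle_unknown="use_encoded_value"` together with `unknown_value` maps unseen categories to a chosen code (here `-1`) instead.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical training data and new data containing an unseen day
train = pd.DataFrame({"day": ["Thur", "Fri", "Sat", "Sun", "Sat"]})
new = pd.DataFrame({"day": ["Sun", "Mon"]})  # "Mon" never appears during fit

oe = OrdinalEncoder(
    categories=[["Thur", "Fri", "Sat", "Sun"]],  # explicit order: Thur=0 ... Sun=3
    handle_unknown="use_encoded_value",
    unknown_value=-1,                             # code used for unseen categories
)
train_codes = oe.fit_transform(train).ravel()
new_codes = oe.transform(new).ravel()

print(train_codes)
print(new_codes)
```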


## 3. One-Hot Encoding

```python
# One-hot encode the 'sex' column
ohe = OneHotEncoder()
# fit_transform returns a sparse matrix; toarray() converts it to a dense NumPy array
encoded_sex = ohe.fit_transform(df[['sex']]).toarray()
```

```python
# Load the Titanic dataset for a second one-hot encoding example
titanic = sns.load_dataset('titanic')
titanic.head()
```

```python
# One-hot encode the 'embarked' column and append the new columns to the DataFrame
# (reload first so re-running this cell does not append duplicate columns)
titanic = sns.load_dataset('titanic')

onehot_encoder = OneHotEncoder()
embarked_onehot = onehot_encoder.fit_transform(titanic[['embarked']]).toarray()
embarked_onehot_df = pd.DataFrame(embarked_onehot, columns=onehot_encoder.get_feature_names_out(['embarked']))
titanic = pd.concat([titanic.reset_index(drop=True), embarked_onehot_df.reset_index(drop=True)], axis=1)
titanic.head()
```

```python
titanic['embarked'].value_counts()
# titanic['embarked'].isnull().sum()
```

## 4. Binary Encoding

```python
# !pip install category_encoders
```

```python
df = sns.load_dataset('tips')
df.head()
```

```python
df['day'].value_counts()
```

```python
from category_encoders import BinaryEncoder

# Binary-encode the 'day' column (4 categories), whose counts were inspected above
binary_encoder = BinaryEncoder()
df_binary = binary_encoder.fit_transform(df[['day']])
df_binary.head()
```
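Binary encoding first gives each category an integer code and then writes that code in base 2, one column per bit, so k categories need only about log2(k) columns instead of k one-hot columns. The sketch below is a hand-rolled illustration of that idea with a hypothetical `binary_encode` helper (not part of any library); it assigns codes from 0 in sorted order and assumes no missing values, so its exact column layout may differ from `category_encoders.BinaryEncoder`.

```python
import numpy as np
import pandas as pd


def binary_encode(values: pd.Series, prefix: str) -> pd.DataFrame:
    """Hypothetical helper: category -> integer code -> one column per bit."""
    categories = sorted(values.unique())                    # fixed, sorted order
    code_of = {cat: i for i, cat in enumerate(categories)}  # category -> 0..k-1
    codes = np.array([code_of[v] for v in values], dtype=np.int64)

    # Number of bits needed to represent the largest code (at least one bit)
    n_bits = max(1, (len(categories) - 1).bit_length())

    columns = {}
    for b in range(n_bits):
        shift = n_bits - 1 - b                              # most significant bit first
        columns[f"{prefix}_{b}"] = (codes >> shift) & 1
    return pd.DataFrame(columns, index=values.index)


# Hypothetical sample of the 'day' column
days = pd.Series(["Thur", "Fri", "Sat", "Sun", "Thur"], name="day")
encoded = binary_encode(days, prefix="day")
print(encoded)
```

With four days, two bit columns suffice (Fri=00, Sat=01, Sun=10, Thur=11), whereas one-hot encoding would need four columns.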

## The following assignments may help you practice
1. Assignment: How many types of feature encoding are there?
2. Assignment: When should you use each type of feature encoding?

```python
# Use pandas for feature encoding
df = sns.load_dataset('tips')
df.head()
```

```python
# One-hot encode the 'day' column with pandas get_dummies
df_dummies = pd.get_dummies(df, columns=['day'])
df_dummies.head()
```
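`pd.get_dummies` accepts options that are often useful: `drop_first=True` drops one dummy per column (its information is implied by the others, which avoids perfectly collinear columns in linear models), and `dtype=int` yields 0/1 integers instead of booleans. A minimal sketch on a hypothetical four-row DataFrame:

```python
import pandas as pd

# Hypothetical toy data with one categorical and one numeric column
df_small = pd.DataFrame({
    "day": ["Thur", "Fri", "Sat", "Sun"],
    "tip": [1.0, 2.5, 3.0, 2.0],
})

# Non-encoded columns are kept; 'day' is replaced by dummy columns.
# With plain strings, categories are sorted, so 'day_Fri' is the one dropped.
dummies = pd.get_dummies(df_small, columns=["day"], drop_first=True, dtype=int)
print(dummies)
```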

```python
df['day'].value_counts()
```

## 5. Manual Encoding

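```python
# Manual encoding using pandas: map each day to a chosen integer
df = sns.load_dataset('tips')
# 'day' is a categorical column, so map() may return a categorical dtype;
# astype(int) turns the result into a plain integer column
df['day_encoded'] = df['day'].map({'Thur': 0, 'Fri': 1, 'Sat': 2, 'Sun': 3}).astype(int)
df.head()
```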
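With manual encoding, any value missing from the mapping dictionary silently becomes `NaN`. A minimal sketch on a hypothetical Series (containing a day that is not in the mapping) shows one way to detect such gaps before using the encoded column:

```python
import pandas as pd

day_order = {"Thur": 0, "Fri": 1, "Sat": 2, "Sun": 3}
days = pd.Series(["Thur", "Sun", "Mon"])  # hypothetical data; "Mon" has no code

encoded = days.map(day_order)             # unmapped values become NaN
unmapped = days[encoded.isna()].tolist()  # which raw values were not covered

if unmapped:
    print(f"Values without a code: {unmapped}")
```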

# 👨‍💻Author: Dr. Muhammad Aammar Tufail