# **Feature Encoding**

### **Introduction**
>In machine learning, feature encoding is the process of converting categorical or other non-numeric data into a numerical format that machine learning algorithms can use. Features are the input variables used to make predictions, and they can be numerical or categorical. Because many algorithms accept only numerical input, feature encoding is essential for handling categorical data.

### **Importance of feature encoding in machine learning**


> - **Algorithm Compatibility**: Many machine learning algorithms, such as linear regression or support vector machines, work with numerical input. Feature encoding ensures that these algorithms can handle categorical data effectively.

> - **Improved Performance**: The choice of encoding affects model performance. An encoding that matches the structure of the data (for example, preserving order only where the categories are genuinely ordered) helps the model capture underlying patterns more accurately.

> - **Consistency in Data Representation**: Feature encoding provides a consistent numerical representation of data, making it easier to compare, analyze, and process information.
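
The compatibility point above can be illustrated with a minimal, library-free sketch. It only shows that a raw category string cannot be turned into a number directly, and how a hand-rolled indicator encoding gets around that. The `colors` list and the variable names are made up for illustration.

```python
# Illustrative data (an assumption, not taken from a real dataset)
colors = ["Red", "Green", "Blue", "Red"]

# Numeric algorithms ultimately need numbers; a category string
# cannot be converted to a float directly.
try:
    float(colors[0])
    convertible = True
except ValueError:
    convertible = False

# Hand-rolled one-hot encoding: one 0/1 indicator per distinct category
categories = sorted(set(colors))
one_hot = [
    [1 if value == category else 0 for category in categories]
    for value in colors
]

print(convertible)
print(categories)
print(one_hot)
```

In practice a library encoder does this work, as the sections below show.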



## **Types of feature encoding**

![img](feature_encoding.png)


## **1. One-Hot Encoding**

```python
import pandas as pd

# Sample data
data = {'Color': ['Red', 'Green', 'Blue', 'Red']}
df = pd.DataFrame(data)
print(df)

# One-Hot Encoding: one indicator column per distinct category
encoded_data = pd.get_dummies(df, columns=['Color'])
print(encoded_data)
```

```text
   Color
0    Red
1  Green
2   Blue
3    Red
   Color_Blue  Color_Green  Color_Red
0       False        False       True
1       False         True      False
2        True        False      False
3       False        False       True
```


## **2. Label Encoding**

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Sample data
data = {'Animal': ['Dog', 'Cat', 'Bird', 'Dog', 'Bird']}
df = pd.DataFrame(data)
print(df)

# Label Encoding: each distinct label gets an integer code,
# assigned in sorted order (Bird=0, Cat=1, Dog=2)
label_encoder = LabelEncoder()
df['Animal_encoded'] = label_encoder.fit_transform(df['Animal'])
print(df)
```

```text
  Animal
0    Dog
1    Cat
2   Bird
3    Dog
4   Bird
  Animal  Animal_encoded
0    Dog               2
1    Cat               1
2   Bird               0
3    Dog               2
4   Bird               0
```
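
The codes above follow the sorted order of the labels, which is why `Bird` maps to 0 and `Dog` to 2. That order is alphabetical, not meaningful. The scikit-learn documentation also describes `LabelEncoder` as intended for target values (`y`) and not for input features. A short sketch of how to inspect the learned mapping and reverse it, using an illustrative `animals` list:

```python
from sklearn.preprocessing import LabelEncoder

# Illustrative labels, mirroring the sample data above
animals = ['Dog', 'Cat', 'Bird', 'Dog', 'Bird']

label_encoder = LabelEncoder()
codes = label_encoder.fit_transform(animals)

# classes_ holds the sorted labels; the position of each label is its code
mapping = {str(label): code for code, label in enumerate(label_encoder.classes_)}
print(mapping)

# inverse_transform maps the integer codes back to the original labels
decoded = label_encoder.inverse_transform(codes)
print(list(decoded))
```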


## **3. Ordinal Encoding**

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Sample data
data = {'Size': ['Small', 'Medium', 'Large', 'Medium']}
df = pd.DataFrame(data)
print(df)

# Ordinal Encoding: `categories` states the order explicitly (Small < Medium < Large)
ordinal_encoder = OrdinalEncoder(categories=[['Small', 'Medium', 'Large']])
# fit_transform expects a 2D input and returns a 2D array; ravel() flattens it to one column
df['Size_encoded'] = ordinal_encoder.fit_transform(df[['Size']]).ravel()
print(df)
```

```text
     Size
0   Small
1  Medium
2   Large
3  Medium
     Size  Size_encoded
0   Small           0.0
1  Medium           1.0
2   Large           2.0
3  Medium           1.0
```
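
The `categories` argument matters here. Without it, `OrdinalEncoder` sorts the categories, so the codes would follow alphabetical order instead of the intended size order. The comparison below is a small sketch of that difference; the variable names are illustrative.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

df = pd.DataFrame({'Size': ['Small', 'Medium', 'Large', 'Medium']})

# Default: categories are sorted alphabetically (Large=0, Medium=1, Small=2)
default_codes = OrdinalEncoder().fit_transform(df[['Size']]).ravel()

# Explicit: codes follow the intended order Small < Medium < Large
explicit_encoder = OrdinalEncoder(categories=[['Small', 'Medium', 'Large']])
explicit_codes = explicit_encoder.fit_transform(df[['Size']]).ravel()

print(default_codes.tolist())
print(explicit_codes.tolist())
```

With the default ordering, `Small` receives the largest code, which reverses the meaning a model would read from the numbers.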
