# One Hot Encoding vs Label Encoding

One Hot Encoding and Label Encoding are preprocessing techniques for converting categorical data into numerical format.

Most machine learning algorithms require numerical input, so categorical variables must be encoded before a model can learn patterns and relationships from them.

### Label Encoding: Converting Categories into Numbers

Label Encoding assigns each unique category an integer value.

Take customer feedback with the ratings **"Poor," "Fair," "Good," "Very Good," and "Excellent"**. Machine learning models operate on numbers rather than strings, so each rating is replaced by an integer.

For example:

Poor → 1

Fair → 2

Good → 3

Very Good → 4

Excellent → 5

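With this mapping, the feedback list ['Poor', 'Fair', 'Good', 'Very Good', 'Excellent'] becomes [1, 2, 3, 4, 5], which a machine learning model can use directly. Note that scikit-learn's `LabelEncoder`, used in the code below, numbers the classes from 0 in alphabetical order, so its output differs from this hand-made mapping.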

![image.png](attachment:image.png)

In [None]:
from sklearn.preprocessing import LabelEncoder

# Example list of customer feedback
feedback = ['Poor', 'Fair', 'Good', 'Very Good', 'Excellent']

# Create LabelEncoder object
label_encoder = LabelEncoder()

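# Fit and transform the feedback list.
# LabelEncoder sorts the classes alphabetically and numbers them from 0,
# so the codes do not follow the Poor-to-Excellent ranking.
encoded_feedback = label_encoder.fit_transform(feedback)

print("Original Feedback:", feedback)
print("Encoded Feedback:", encoded_feedback)  # [3 1 2 4 0]
print("Learned classes:", label_encoder.classes_)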
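
Because `LabelEncoder` orders classes alphabetically, it does not reproduce the Poor-to-Excellent ranking described above. One possible way to keep that ranking is scikit-learn's `OrdinalEncoder` with an explicit category order. The sketch below assumes the ranking from the feedback example; the variable names `rating_order` and `feedback_column` are illustrative only.

```python
from sklearn.preprocessing import OrdinalEncoder

# Assumed ranking, lowest to highest (taken from the feedback example)
rating_order = ['Poor', 'Fair', 'Good', 'Very Good', 'Excellent']

# OrdinalEncoder expects 2D input: one row per sample, one column per feature
feedback_column = [['Good'], ['Poor'], ['Excellent'], ['Fair'], ['Very Good']]

# Passing the categories explicitly fixes the integer assigned to each rating
ordinal_encoder = OrdinalEncoder(categories=[rating_order])
ordered_codes = ordinal_encoder.fit_transform(feedback_column).ravel().astype(int)

# Codes start at 0: Poor=0, Fair=1, Good=2, Very Good=3, Excellent=4
print(ordered_codes)  # [2 0 4 1 3]
```

The codes start at 0 rather than 1, but they preserve the ranking, which is what matters for a model that reads the column as ordered.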


### One-Hot Encoding: Giving Each Category Its Own Column

One-Hot Encoding is a method to convert categorical data into numbers, but in a different way than label encoding.

Instead of assigning one number to each category, it creates a new binary column (with values 0 or 1) for every category.

For example, if you have fruits like "Apple" and "Pear," One-Hot Encoding creates two new columns:

- Apple
- Pear

For each row:

- If the fruit is "Apple," the Apple column will have 1 and Pear will have 0.

- If the fruit is "Pear," the Pear column will have 1 and Apple will have 0.

This way, the data becomes clear and machine-readable without assuming any order between categories.

![image.png](attachment:image.png)

In [None]:
import pandas as pd

# Example data
data = {'Fruit': ['Apple', 'Pear', 'Apple', 'Pear', 'Apple']}

# Create a DataFrame
df = pd.DataFrame(data)

# One-Hot Encoding using pandas; dtype=int gives 0/1 columns
# (recent pandas versions return True/False by default)
encoded_df = pd.get_dummies(df, columns=['Fruit'], dtype=int)

print(encoded_df)

#### Comparing Label Encoding and One-Hot Encoding



##### Key Differences:
- Interpretation of Categories:

    - Label Encoding assigns integers, which carry an implied order and magnitude. For nominal categories (no natural order), a model may treat that order as real.

    - One-Hot Encoding creates separate binary columns for each category, making it clearer for the model and preventing any misinterpretation of order.

- Memory Consumption:

    - Label Encoding is memory efficient as it only uses a single column to represent all categories.

    - One-Hot Encoding increases the number of columns, which could lead to higher memory usage, especially for datasets with many categories.

- Performance with Machine Learning Algorithms:

    - Label Encoding can hurt models that treat the integer as a magnitude (such as linear models or distance-based methods), because it imposes an artificial relationship between categories.

    - One-Hot Encoding suits those same models, since no category is ranked above another. Tree-based models (e.g., decision trees, random forests) split on thresholds and are often less sensitive to the choice of encoding.

- Use Case:

    - Label Encoding is preferred when there is a natural ordinal relationship between the categories (e.g., low, medium, high), provided the integers are assigned in that order.

    - One-Hot Encoding is best used for nominal categories (e.g., color, weather, type of animal) where no order exists between categories.
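
The memory difference in the list above can be illustrated with a small sketch. It assumes a made-up `city` column with 1,000 distinct values over 10,000 rows, and uses `pd.factorize` as a stand-in for integer label encoding.

```python
import pandas as pd

# Hypothetical column: 10,000 rows drawn from 1,000 distinct category values
cities = pd.Series([f"city_{i % 1000}" for i in range(10_000)], name="city")

# Integer codes: a single column regardless of the number of categories
codes, uniques = pd.factorize(cities)

# One-hot: one column per distinct category
one_hot = pd.get_dummies(cities, dtype="uint8")

print("Integer-coded shape:", codes.shape)  # (10000,)
print("One-hot shape:", one_hot.shape)      # (10000, 1000)
print("One-hot memory (bytes):", one_hot.memory_usage(index=False).sum())
```

For columns with this many categories, a sparse representation is a common way to keep one-hot encoding affordable.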

#### Conclusion:
- Label Encoding is simpler and more memory-efficient, but on non-ordinal categorical data it introduces an artificial order between categories that some models will misread.

- One-Hot Encoding is more suitable for nominal data, as it prevents any misinterpretation of the category relationship, but it adds one feature per category and therefore requires more memory.

![image.png](attachment:image.png)