### One-Hot Encoding

One-hot encoding creates one binary column per category. For each instance, the column matching its category is set to 1 and every other column is set to 0. For example, with the columns ordered Red, Green, Blue:

* Red -> [1, 0, 0]
* Green -> [0, 1, 0]
* Blue -> [0, 0, 1]
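
The mapping above can be reproduced with a few lines of plain Python. This is only a minimal sketch to show the mechanics: the helper name `one_hot` is hypothetical, and the column order is fixed by hand to Red, Green, Blue so that it matches the example.

```python
def one_hot(value, categories):
    """Return a list with 1 at the position of `value` and 0 elsewhere."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == category else 0 for category in categories]


# Column order is chosen by hand here (an assumption) to match the mapping above.
categories = ["Red", "Green", "Blue"]

encoded = {color: one_hot(color, categories) for color in categories}
for color, vector in encoded.items():
    print(color, "->", vector)
```

Libraries usually pick the column order themselves, so the position of the 1 for a given category can differ from this hand-written order.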

#### Pros

1. Avoids the problem of introducing ordinal relationships between categories.
2. Suitable for nominal data where categories do not have an inherent order.

#### Cons

Can lead to a significant increase in the dimensionality of the dataset, especially with high cardinality features.
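
To make the growth in dimensionality concrete, the sketch below counts how many columns one-hot encoding would produce for a small made-up dataset. The helper `one_hot_width` and the data are hypothetical and serve only as an illustration: one feature has 3 categories, the other has a distinct value in every row.

```python
def one_hot_width(rows):
    """Count the columns produced by one-hot encoding every feature in `rows`."""
    n_features = len(rows[0])
    # Each feature contributes one column per distinct value it takes.
    return sum(len({row[i] for row in rows}) for i in range(n_features))


# Hypothetical data: a 3-value color feature and a user ID that is unique per row.
rows = [(color, f"user_{i}") for i, color in enumerate(["Red", "Green", "Blue"] * 100)]

original_width = len(rows[0])
encoded_width = one_hot_width(rows)
print(original_width, "->", encoded_width)  # 2 -> 303
```

The color feature adds only 3 columns, while the high-cardinality ID feature adds 300, one per distinct value.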

#### When to use

Ideally for categorical features with fewer than 10 categories (up to 50 categories can be considered).
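
This rule of thumb could be turned into a quick cardinality check before encoding. The sketch below is one possible way to do it. The function name `one_hot_suitability` and the returned labels are hypothetical, and the thresholds of 10 and 50 are simply the guideline values stated above, not hard limits.

```python
def one_hot_suitability(values, preferred_max=10, hard_max=50):
    """Classify a categorical feature by its number of distinct categories."""
    n_categories = len(set(values))
    if n_categories < preferred_max:
        return "good fit"
    if n_categories <= hard_max:
        return "acceptable"
    return "consider another encoding"


colors = ["Red", "Green", "Blue", "Green", "Red", "Blue"]
print(one_hot_suitability(colors))  # good fit
```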

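```python
from sklearn.preprocessing import OneHotEncoder

# Sample data: one feature (column), six samples (rows)
colors = [['Red'], ['Green'], ['Blue'], ['Green'], ['Red'], ['Blue']]

# Initialize the one-hot encoder so that it returns a dense array.
# `sparse_output` replaced the older `sparse` parameter in scikit-learn 1.2.
one_hot_encoder = OneHotEncoder(sparse_output=False)

# Fit and transform the data.
# Columns follow the sorted category order: Blue, Green, Red.
encoded_colors = one_hot_encoder.fit_transform(colors)

print(encoded_colors)
```

Output:

```
[[0. 0. 1.]
 [0. 1. 0.]
 [1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]
 [1. 0. 0.]]
```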


