### 1. **Label Encoding**

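**Label Encoding** converts categorical labels into integers, assigning a unique number to each category. scikit-learn's `LabelEncoder` numbers the categories in sorted (alphabetical) order, not by any meaning they carry.

**Use Case**: Encoding target labels (`y`), or categories where the assigned integers do not need to reflect a particular order. For ordinal data with a meaningful order, Ordinal Encoding (section 3) lets you specify that order explicitly.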

**Example**:

Imagine you have a feature called `color` with values `['red', 'blue', 'green']`. Label Encoding will map these values to integers.

In [1]:
from sklearn.preprocessing import LabelEncoder

# Sample data
colors = ['red', 'blue', 'green', 'blue', 'red']

# Initialize the LabelEncoder
label_encoder = LabelEncoder()

# Fit and transform the data
encoded_colors = label_encoder.fit_transform(colors)

print(f"Original colors: {colors}")
print(f"Encoded colors: {encoded_colors}")

Original colors: ['red', 'blue', 'green', 'blue', 'red']
Encoded colors: [2 0 1 0 2]
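
The integers above follow the sorted order of the category names (`blue`, `green`, `red`), not any inherent ranking. As a minimal sketch on the same sample data, the learned mapping can be inspected through `classes_` and reversed with `inverse_transform`. The variable names `mapping` and `decoded` are illustrative.

```python
from sklearn.preprocessing import LabelEncoder

colors = ['red', 'blue', 'green', 'blue', 'red']

label_encoder = LabelEncoder()
encoded_colors = label_encoder.fit_transform(colors)

# classes_ holds the categories in sorted order; each index is the assigned integer
mapping = {str(label): code for code, label in enumerate(label_encoder.classes_)}
print(f"Mapping: {mapping}")

# inverse_transform recovers the original labels from the integers
decoded = label_encoder.inverse_transform(encoded_colors)
print(f"Decoded colors: {list(decoded)}")
```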


### 2. **One-Hot Encoding**

**One-Hot Encoding** converts categorical values into binary vectors. Each category is represented by a binary vector where only one element is `1` and the rest are `0`.

**Use Case**: Suitable for nominal data where categories do not have a meaningful order.

**Example**:

For the same `color` feature, One-Hot Encoding will create a vector for each color.

In [2]:
from sklearn.preprocessing import OneHotEncoder
import pandas as pd

# Sample data
colors = [['red'], ['blue'], ['green'], ['blue'], ['red']]

# Initialize the OneHotEncoder
# (`sparse_output` replaced the older `sparse` argument in scikit-learn 1.2)
one_hot_encoder = OneHotEncoder(sparse_output=False)

# Fit and transform the data
encoded_colors = one_hot_encoder.fit_transform(colors)

# Convert to DataFrame for easier visualization
encoded_df = pd.DataFrame(encoded_colors, columns=one_hot_encoder.get_feature_names_out(['color']))

print(f"Original colors: {colors}")
print(f"One-Hot Encoded colors:\n{encoded_df}")

Original colors: [['red'], ['blue'], ['green'], ['blue'], ['red']]
One-Hot Encoded colors:
   color_blue  color_green  color_red
0         0.0          0.0        1.0
1         1.0          0.0        0.0
2         0.0          1.0        0.0
3         1.0          0.0        0.0
4         0.0          0.0        1.0
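
One practical question the example above leaves open is what happens when a category appears at prediction time that was not seen during fitting. The sketch below uses a hypothetical unseen value `'yellow'` and sets `handle_unknown='ignore'`, which makes the encoder emit an all-zero row instead of raising an error. It keeps the default sparse output and converts it with `toarray()`.

```python
from sklearn.preprocessing import OneHotEncoder

# Categories seen during fitting
train_colors = [['red'], ['blue'], ['green']]

# 'ignore' encodes unknown categories as all zeros instead of raising an error
encoder = OneHotEncoder(handle_unknown='ignore')
encoder.fit(train_colors)

# transform returns a sparse matrix by default; toarray() makes it dense
known = encoder.transform([['green']]).toarray()
unseen = encoder.transform([['yellow']]).toarray()

print(f"Known category 'green':   {known}")
print(f"Unseen category 'yellow': {unseen}")
```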




### 3. **Ordinal Encoding**

**Ordinal Encoding** is used for ordinal categorical features where the categories have a meaningful order. Each category is mapped to an integer representing its rank.

**Use Case**: Suitable for ordinal data where the order of categories matters.

**Example**:

Consider a feature `education_level` with values `['highschool', 'bachelor', 'master', 'phd']`.

In [7]:
from sklearn.preprocessing import OrdinalEncoder

# Sample data
education_levels = [['bachelor'], ['master'], ['highschool'], ['phd'], ['bachelor']]

# Define the order of categories
categories = [['highschool', 'bachelor', 'master', 'phd']]

# Initialize the OrdinalEncoder
ordinal_encoder = OrdinalEncoder(categories=categories)

# Fit and transform the data
encoded_levels = ordinal_encoder.fit_transform(education_levels)

print(f"Original education levels: {education_levels}")
print(f"Ordinal Encoded education levels:\n{encoded_levels}")

Original education levels: [['bachelor'], ['master'], ['highschool'], ['phd'], ['bachelor']]
Ordinal Encoded education levels:
[[1.]
 [2.]
 [0.]
 [3.]
 [1.]]
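
Passing `categories` is what makes the integers reflect the intended ranking. As a sketch of what happens otherwise, the same data is encoded below without an explicit order. The encoder then falls back to sorted category names, so `bachelor` ranks below `highschool`. The names `default_encoder` and `default_codes` are illustrative.

```python
from sklearn.preprocessing import OrdinalEncoder

education_levels = [['bachelor'], ['master'], ['highschool'], ['phd'], ['bachelor']]

# No `categories` argument: the order is inferred by sorting the values
default_encoder = OrdinalEncoder()
default_codes = default_encoder.fit_transform(education_levels)

print(f"Inferred category order: {default_encoder.categories_[0]}")
print(f"Codes with inferred order: {default_codes.ravel()}")
```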


### 5. **MultiLabelBinarizer**

**MultiLabelBinarizer** is used to encode multi-label data. It produces one binary column per distinct label, as One-Hot Encoding does, but a row can contain several `1`s because each sample may carry multiple labels.

**Use Case**: When you have a feature with multiple labels per instance (e.g., genres of a movie).

**Example**:

Let's say you have a list of favorite colors per person, where each person can have more than one favorite color.


In [12]:
from sklearn.preprocessing import MultiLabelBinarizer

# Sample data: a list of colors (multiple labels per sample)
favorite_colors = [['red', 'blue'], ['blue'], ['green', 'red'], ['green'], ['blue', 'green']]

# Initialize the MultiLabelBinarizer
mlb = MultiLabelBinarizer()

# Fit and transform the data
encoded_colors = mlb.fit_transform(favorite_colors)

# Get the class labels (columns)
classes = mlb.classes_

print(f"Original favorite colors: {favorite_colors}")
print(f"Encoded favorite colors:\n{encoded_colors}")
print(f"Class labels: {classes}")

Original favorite colors: [['red', 'blue'], ['blue'], ['green', 'red'], ['green'], ['blue', 'green']]
Encoded favorite colors:
[[1 0 1]
 [1 0 0]
 [0 1 1]
 [0 1 0]
 [1 1 0]]
Class labels: ['blue' 'green' 'red']
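
The binary matrix can also be mapped back to label sets. The sketch below, on the same sample data, uses `inverse_transform`, which returns one tuple of labels per row. Labels come back in `classes_` order, so the order within a tuple may differ from the input.

```python
from sklearn.preprocessing import MultiLabelBinarizer

favorite_colors = [['red', 'blue'], ['blue'], ['green', 'red'], ['green'], ['blue', 'green']]

mlb = MultiLabelBinarizer()
encoded_colors = mlb.fit_transform(favorite_colors)

# Each row of the binary matrix is turned back into a tuple of labels
decoded = mlb.inverse_transform(encoded_colors)
print(f"Decoded favorite colors: {decoded}")
```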


### 6. **LabelBinarizer**

**LabelBinarizer** is similar to One-Hot Encoding, but it takes a 1-D array with a single label per sample and is intended for target labels. It also supports converting the binary matrix back to labels with `inverse_transform`.

**Use Case**: When you want to binarize a single-label categorical feature.


**Explanation**:
- With three or more classes it behaves like One-Hot Encoding; with exactly two classes it returns a single `0`/`1` column instead of two.
- In the example, each category (`red`, `blue`, `green`) is represented as a binary vector.



In [13]:
from sklearn.preprocessing import LabelBinarizer

# Sample data: single label per instance
colors = ['red', 'blue', 'green', 'blue', 'red']

# Initialize the LabelBinarizer
lb = LabelBinarizer()

# Fit and transform the data
encoded_colors = lb.fit_transform(colors)

# Get the class labels (columns)
classes = lb.classes_

print(f"Original colors: {colors}")
print(f"Encoded colors:\n{encoded_colors}")
print(f"Class labels: {classes}")

Original colors: ['red', 'blue', 'green', 'blue', 'red']
Encoded colors:
[[0 0 1]
 [1 0 0]
 [0 1 0]
 [1 0 0]
 [0 0 1]]
Class labels: ['blue' 'green' 'red']
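
The two-class case behaves differently from the three-color example. The sketch below uses hypothetical `yes`/`no` answers to show that `LabelBinarizer` then produces a single column, and that `inverse_transform` recovers the original labels.

```python
from sklearn.preprocessing import LabelBinarizer

# Hypothetical binary data: only two distinct labels
answers = ['yes', 'no', 'yes', 'no']

lb = LabelBinarizer()
encoded_answers = lb.fit_transform(answers)

# With two classes the result has one column, not two
print(f"Class labels: {lb.classes_}")
print(f"Encoded shape: {encoded_answers.shape}")
print(f"Encoded answers:\n{encoded_answers}")

# Convert the binary column back to the original labels
decoded = lb.inverse_transform(encoded_answers)
print(f"Decoded answers: {list(decoded)}")
```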


### Summary of All Techniques:

- **LabelEncoder**: Converts categorical labels into integers.
- **OneHotEncoder**: Converts categorical values into binary vectors (suitable for non-ordinal data).
- **OrdinalEncoder**: Converts ordinal categorical values into integers (suitable for ordered categories).
- **TargetEncoder**: Encodes categorical features based on their relationship with the target variable.
- **MultiLabelBinarizer**: Binarizes multi-label data (multiple labels per instance).
- **LabelBinarizer**: Binarizes single-label categorical data (like a simplified One-Hot Encoding).
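
Target encoding is the one technique in this summary without a worked example. The sketch below illustrates the basic idea with plain pandas: each category is replaced by the mean of the target for that category. The `purchased` column and its values are made up for illustration. This is a simplification, since library implementations such as `TargetEncoder` in `sklearn.preprocessing` (scikit-learn 1.3+) also apply smoothing and cross-fitting to limit target leakage.

```python
import pandas as pd

# Hypothetical data: a categorical feature and a binary target
df = pd.DataFrame({
    'color': ['red', 'blue', 'green', 'blue', 'red', 'green'],
    'purchased': [1, 0, 1, 1, 1, 0],
})

# Mean of the target for each category
category_means = df.groupby('color')['purchased'].mean()

# Replace each category with its target mean (no smoothing, no cross-fitting)
df['color_encoded'] = df['color'].map(category_means)

print(category_means)
print(df)
```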