## Q1. What is the difference between Ordinal Encoding and Label Encoding? Provide an example of when you might choose one over the other.

Ordinal Encoding and Label Encoding both convert categorical variables into integers, but they differ in how they treat the relationship between the categories.

Ordinal Encoding:

Ordinal Encoding assigns a unique integer to each category in a way that preserves the order or rank of the categories. It is suitable for categorical variables whose categories have a meaningful order or hierarchy. For example, ordinal encoding might assign the labels {"Low": 0, "Medium": 1, "High": 2} to a variable indicating the level of risk.

Label Encoding:

Label Encoding also assigns a unique integer to each category, but it does not take any inherent order or hierarchy into account. In scikit-learn, LabelEncoder is designed for target labels and numbers the classes in sorted (alphabetical) order. It is appropriate for categorical variables that have no meaningful order or hierarchy. For example, label encoding might assign the labels {"Blue": 0, "Green": 1, "Red": 2} to a variable indicating colors.

Here's an example illustrating the difference between Ordinal Encoding and Label Encoding:


```python
from sklearn.preprocessing import OrdinalEncoder, LabelEncoder

# Example data: car sizes with ordinal relationship
car_sizes = ["Compact", "Midsize", "Full-size", "Compact", "Midsize"]

# Ordinal Encoding: the order is given explicitly through `categories`
# (OrdinalEncoder expects 2D input, hence one single-item list per sample)
ordinal_encoder = OrdinalEncoder(categories=[["Compact", "Midsize", "Full-size"]])
ordinal_encoded = ordinal_encoder.fit_transform([[size] for size in car_sizes])

print("Ordinal Encoded car sizes:", ordinal_encoded.ravel())

# Label Encoding: classes are numbered in sorted (alphabetical) order
label_encoder = LabelEncoder()
label_encoded = label_encoder.fit_transform(car_sizes)

print("Label Encoded car sizes:", label_encoded)
```
Output:

```
Ordinal Encoded car sizes: [0. 1. 2. 0. 1.]
Label Encoded car sizes: [0 2 1 0 2]
```

In this example:

Ordinal Encoding follows the order passed in the categories argument ("Compact" < "Midsize" < "Full-size"), so larger cars receive larger codes.

Label Encoding ignores that order. LabelEncoder sorts the classes alphabetically ("Compact" = 0, "Full-size" = 1, "Midsize" = 2), so "Midsize" receives a higher code than "Full-size" and the codes no longer reflect size.

When to choose one over the other:

Use Ordinal Encoding when the categorical variable has an inherent order or hierarchy that needs to be preserved. For example, levels of education ("High School" < "Bachelor's" < "Master's" < "PhD").

Use Label Encoding when there is no meaningful order among the categories and they can be treated as nominal. For example, types of fruit ("Apple", "Banana", "Orange"). Because the integer codes still look ordered to a model, this is best suited to target labels or to tree-based models, which are less sensitive to the arbitrary numbering.
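
To make the choice concrete, here is a minimal sketch using the education levels mentioned above. The sample list `education` and the variable names are illustrative assumptions, not part of the original exercise. It shows that LabelEncoder's alphabetical numbering does not follow the educational order, while OrdinalEncoder follows the order it is given.

```python
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Hypothetical sample values for an ordered feature
education = ["Bachelor's", "High School", "PhD", "Master's", "High School"]

# The intended order, lowest to highest
order = ["High School", "Bachelor's", "Master's", "PhD"]

# Ordinal Encoding: codes follow the order listed in `order`
ordinal_encoder = OrdinalEncoder(categories=[order])
ordinal_codes = ordinal_encoder.fit_transform(
    [[level] for level in education]
).ravel()

# Label Encoding: codes follow the alphabetical order of the classes
label_encoder = LabelEncoder()
label_codes = label_encoder.fit_transform(education)

print("Ordinal codes:", ordinal_codes)
print("LabelEncoder classes:", list(label_encoder.classes_))
print("Label codes:", label_codes)
```

With LabelEncoder, "Bachelor's" sorts before "High School" and therefore gets the lower code, which is the reverse of the intended ranking. For an ordered feature, that is the reason to prefer Ordinal Encoding with an explicit category list.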

## Q2. Explain how Target Guided Ordinal Encoding works and provide an example of when you might use it in a machine learning project.

Target Guided Ordinal Encoding converts a categorical feature into integers by ranking its categories according to a statistic of the target variable. It works in four steps:

1. Calculate the mean, median, or mode of the target variable: For each category in the categorical variable, compute the chosen statistic of the target within that category. This step captures the relationship between the category and the target.

2. Order categories by the target statistic: Sort the categories by that statistic, from lowest to highest.

3. Assign numeric labels: Give the categories ordinal labels that follow this order, so that categories with a lower mean (or median/mode) of the target receive lower labels and categories with a higher one receive higher labels.

4. Encode the categorical variable: Replace the original categorical variable with the assigned numeric labels.
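
The steps above can be sketched by hand with pandas. This is a minimal illustration under stated assumptions: the `contract` feature, its values, and the `churn` column are hypothetical toy data, and the mean is used as the target statistic.

```python
import pandas as pd

# Hypothetical toy data: a nominal feature and a binary target
df = pd.DataFrame({
    "contract": ["monthly", "yearly", "monthly", "two_year",
                 "yearly", "monthly", "two_year", "yearly"],
    "churn":    [1, 0, 1, 0, 1, 1, 0, 0],
})

# Step 1: mean of the target within each category
category_means = df.groupby("contract")["churn"].mean()

# Step 2: order the categories by that mean, lowest first
ordered_categories = category_means.sort_values().index

# Step 3: assign 0, 1, 2, ... following that order
mapping = {category: rank for rank, category in enumerate(ordered_categories)}

# Step 4: replace each category with its numeric label
df["contract_encoded"] = df["contract"].map(mapping)

print(mapping)
print(df)
```

In a real project the mapping would be computed on the training split only and then applied to validation and test data, so that target values from held-out rows do not influence the encoding.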

Here's an example scenario where you might use Target Guided Ordinal Encoding in a machine learning project:

Suppose you're working on a classification task to predict customer churn at a telecom company. One of the features is "Tenure", the length of time a customer has been with the company, grouped into bands such as short, medium, and long. You believe that there is a monotonic relationship between tenure and the likelihood of churn: customers who have been with the company longer are less likely to churn.

In this scenario, you can use Target Guided Ordinal Encoding to encode the "Tenure" feature based on its relationship with the target variable (churn). The tenure bands are ordered by the average churn rate within each band, computed on the training data only so that the test set's targets do not leak into the encoding. Bands with lower churn rates are assigned lower numeric labels, while bands with higher churn rates are assigned higher numeric labels. This encoding captures the monotonic relationship between tenure and churn, allowing the model to learn effectively from this feature.

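```python
import pandas as pd
from sklearn.model_selection import train_test_split
from feature_engine.encoding import OrdinalEncoder

# Sample data
data = {
    'Tenure': [2, 5, 8, 1, 10, 3, 7, 6, 4, 9],
    'Churn': [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]  # 0: No churn, 1: Churn
}

df = pd.DataFrame(data)

# Bin the numeric tenure into bands so the feature is categorical.
# feature_engine's OrdinalEncoder only encodes object/categorical columns by default,
# and raw numeric values would leave the test set with unseen categories.
# The band edges below are illustrative assumptions.
df['TenureBand'] = pd.cut(
    df['Tenure'], bins=[0, 3, 7, 10], labels=['short', 'medium', 'long']
).astype(object)

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(df[['TenureBand']], df['Churn'], test_size=0.2, random_state=42)

# Apply Target Guided Ordinal Encoding: bands are ordered by mean churn in the training set
encoder = OrdinalEncoder(encoding_method='ordered', variables=['TenureBand'])
X_train_encoded = encoder.fit_transform(X_train, y_train)
X_test_encoded = encoder.transform(X_test)

print("Learned mapping:", encoder.encoder_dict_)

print("\nEncoded Training Data:")
print(X_train_encoded)

print("\nEncoded Testing Data:")
print(X_test_encoded)
```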
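
Fitting on the training set and only transforming the test set raises the question of what happens to a category that never appeared during fitting. The sketch below reproduces the fit/transform split by hand with pandas. The data, the column name `tenure_band`, and the choice of `-1` as a marker for unseen categories are all assumptions made for illustration, not behavior of any particular library.

```python
import pandas as pd

# Hypothetical training data with a binary target
train = pd.DataFrame({
    "tenure_band": ["short", "short", "medium", "long", "medium", "long"],
    "churn":       [1, 1, 1, 0, 0, 0],
})

# Hypothetical test data, including a band that is absent from training
test = pd.DataFrame({"tenure_band": ["long", "short", "very_long"]})

# "Fit": learn the ordering from the training data only
means = train.groupby("tenure_band")["churn"].mean().sort_values()
mapping = {band: rank for rank, band in enumerate(means.index)}

# "Transform": apply the learned mapping; unseen bands become NaN,
# which this sketch replaces with -1 (an assumed sentinel value)
test_encoded = test["tenure_band"].map(mapping).fillna(-1).astype(int)

print(mapping)
print(test_encoded.tolist())
```

Whether a sentinel, a missing value, or an error is the right response to an unseen category depends on the model and the project, so this choice is worth making explicitly.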