# Label Encoding

Label encoding assigns a unique integer to each category in a feature, converting the categorical values into integer codes.

**Advantages:**

- Can preserve the order of ordinal categories, which is useful for certain algorithms, as long as the assigned integers follow that order.
- Saves memory, as a single column is enough to represent all categories.

**Disadvantages:**

- When the categories have no inherent order, the assigned integers can mislead algorithms into assuming one (for example, that `red` > `green`), which may lead to incorrect results.

**Use Cases:**

- Suitable for ordinal categorical variables where the order matters, such as low, medium and high.

In [4]:
from sklearn.preprocessing import LabelEncoder

# Sample categorical data
data = ['red', 'green', 'blue', 'blue', 'green']

# Initialize LabelEncoder
label_encoder = LabelEncoder()

# Fit the label encoder and transform the categories
encoded_data = label_encoder.fit_transform(data)

print("Original data:", data)
print("Encoded data:", encoded_data)


Original data: ['red', 'green', 'blue', 'blue', 'green']
Encoded data: [2 1 0 0 1]
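
The output above shows that `LabelEncoder` numbers the categories in sorted (alphabetical) order, which will not generally match a meaningful ranking. Its scikit-learn documentation also describes it as intended for target labels rather than input features. For an ordinal feature, one possible approach is scikit-learn's `OrdinalEncoder` with an explicit category order. The following is a minimal sketch under that assumption; the `sizes` data is hypothetical and not part of the example above.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Hypothetical ordinal feature (illustrative data only)
sizes = ['low', 'high', 'medium', 'low', 'high']

# LabelEncoder sorts the classes alphabetically: high=0, low=1, medium=2
label_encoder = LabelEncoder()
alphabetical_codes = label_encoder.fit_transform(sizes)
print("Classes (sorted):", label_encoder.classes_.tolist())
print("LabelEncoder codes:", alphabetical_codes)

# OrdinalEncoder accepts an explicit order: low=0, medium=1, high=2
# It expects a 2D input (rows x features), hence the reshape
ordinal_encoder = OrdinalEncoder(categories=[['low', 'medium', 'high']])
ordered_codes = ordinal_encoder.fit_transform(
    np.array(sizes).reshape(-1, 1)
).ravel()
print("OrdinalEncoder codes:", ordered_codes)
```

With the explicit order, `low` < `medium` < `high` is reflected in the integer codes, whereas the alphabetical codes place `high` before `low`.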


# One-Hot Encoding

One-hot encoding creates a binary dummy variable for each category; each variable represents one category and takes the value 0 or 1.

**Advantages:**

- Treats each category as a separate feature, so no category is numerically larger than or closer to another; this matters for algorithms that interpret numeric magnitude.
- Avoids imposing arbitrary ordinality on categorical variables.

**Disadvantages:**

- Can lead to the curse of dimensionality, especially when the categorical feature has a large number of unique categories.
- Increases the size of the dataset significantly, as it creates a new binary column for each category.

**Use Cases:**

- Preferred for nominal categorical variables where there is no inherent order among categories.

In [7]:
from sklearn.preprocessing import OneHotEncoder
import pandas as pd

# Sample categorical data
data = ['red', 'green', 'blue', 'blue', 'green']

# Converting data to DataFrame format
df = pd.DataFrame(data, columns=['Color'])

# Initialize OneHotEncoder
onehot_encoder = OneHotEncoder()

# Fitting and transforming one-hot encoder on data
onehot_encoded = onehot_encoder.fit_transform(df[['Color']]).toarray()

# Converting one-hot encoded data back to DataFrame
onehot_df = pd.DataFrame(onehot_encoded, columns=onehot_encoder.get_feature_names_out(['Color']))

# Concatenating original DataFrame with one-hot encoded DataFrame
df_encoded = pd.concat([df, onehot_df], axis=1)

print("Original DataFrame:")
print(df)
print("\nOne-Hot Encoded DataFrame:")
print(df_encoded)


Original DataFrame:
   Color
0    red
1  green
2   blue
3   blue
4  green

One-Hot Encoded DataFrame:
   Color  Color_blue  Color_green  Color_red
0    red         0.0          0.0        1.0
1  green         0.0          1.0        0.0
2   blue         1.0          0.0        0.0
3   blue         1.0          0.0        0.0
4  green         0.0          1.0        0.0
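
The dimensionality concern listed under Disadvantages can be illustrated with a feature that has many unique values. The sketch below uses a made-up column of 200 distinct IDs (an assumption for illustration, not data from this notebook) and relies on `OneHotEncoder` returning a SciPy sparse matrix by default, which stores only the non-zero entries.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical high-cardinality feature: 200 made-up IDs, each repeated 5 times
ids = np.array([f"id_{i}" for i in range(200)])
column = np.repeat(ids, 5).reshape(-1, 1)  # 1,000 rows, 1 feature

# By default the encoder returns a SciPy sparse matrix
encoder = OneHotEncoder()
encoded = encoder.fit_transform(column)

n_rows, n_cols = encoded.shape
print("Encoded shape:", encoded.shape)         # one column per unique category
print("Stored non-zero values:", encoded.nnz)  # exactly one per row
print("Cells in dense form:", n_rows * n_cols)
```

A single input column becomes 200 columns here. Keeping the result sparse, rather than calling `.toarray()`, avoids materializing the many zero cells when the number of categories is large.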
