# Label Encoding
Label encoding assigns each unique category value an integer code.
It is straightforward but introduces a new problem: the model might infer a natural
ordering among the categories that was never intended. For example, mapping
["red", "blue", "green"] to [0, 1, 2] implies red < blue < green.
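The ordering problem can be made concrete with a small sketch. The color list and the variable names below are illustrative assumptions, not part of the cells that follow; the point is only that integer codes compare and average like ordinary numbers, even though the categories themselves have no such relationship.

```python
# Hypothetical color categories, encoded in the order they are listed
colors = ["red", "blue", "green"]
color_to_int = {color: index for index, color in enumerate(colors)}
print(color_to_int)

# The codes behave like ordinary numbers, so comparisons "work" ...
red_less_than_green = color_to_int["red"] < color_to_int["green"]
print("red < green:", red_less_than_green)

# ... and so does arithmetic, although the result has no real meaning:
# the average of "red" and "green" lands exactly on the code for "blue".
mean_code = (color_to_int["red"] + color_to_int["green"]) / 2
print("mean of red and green:", mean_code)
```

A model that treats these codes as magnitudes can pick up on such relationships, which is why label encoding is usually reserved for targets or for categories that really are ordered.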

In [2]:
# List of unique classes
classes = ['ClassA', 'ClassB', 'ClassC', 'ClassD']  # Define the categories/classes

# List of instances to encode
instances = ['ClassA', 'ClassB', 'ClassC', 'ClassC', 'ClassD',
             'ClassD', 'ClassA', 'ClassB', 'ClassC', 'ClassD',
             'ClassA', 'ClassB']  # The data you want to encode using the class mapping

# Step 1: Create a mapping of label to integer
# Use enumerate to assign a unique integer to each class
label_to_int = {label: index for index, label in enumerate(classes)}

# Step 2: Encode instances into integers
# Replace each class label in the instances list with its corresponding integer
encoded_labels = [label_to_int[label] for label in instances]

# Step 3: Create a reverse mapping (integer to label)
# Create a dictionary to map integers back to their original class labels
int_to_label = {index: label for label, index in label_to_int.items()}

# Step 4: Decode integers back to labels
# Replace each integer in encoded_labels with its corresponding class label
decoded_labels = [int_to_label[index] for index in encoded_labels]

# Print results
print("Encoded labels:", encoded_labels)  # Display the encoded numerical labels
print("Decoded labels:", decoded_labels)  # Verify that decoding matches the original instances


Encoded labels: [0, 1, 2, 2, 3, 3, 0, 1, 2, 3, 0, 1]
Decoded labels: ['ClassA', 'ClassB', 'ClassC', 'ClassC', 'ClassD', 'ClassD', 'ClassA', 'ClassB', 'ClassC', 'ClassD', 'ClassA', 'ClassB']
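One limitation of the dictionary approach above is that `label_to_int[label]` raises a `KeyError` for any label that was not in `classes`. A minimal sketch of one way to handle this, assuming a sentinel value of `-1` for unknown labels (the `UNKNOWN` constant, `new_instances`, and the label `'ClassE'` are hypothetical and only for illustration):

```python
# Same mapping as in the cell above
classes = ['ClassA', 'ClassB', 'ClassC', 'ClassD']
label_to_int = {label: index for index, label in enumerate(classes)}

# Hypothetical sentinel for labels that were not part of the mapping
UNKNOWN = -1

# 'ClassE' is not in classes, so a plain lookup would raise KeyError
new_instances = ['ClassB', 'ClassE', 'ClassA']

# dict.get returns the sentinel instead of raising for missing keys
encoded_new = [label_to_int.get(label, UNKNOWN) for label in new_instances]
print("Encoded with sentinel:", encoded_new)
```

Whether a sentinel, an error, or a dedicated "other" class is the right choice depends on the downstream model; this is only one option.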


# Sklearn - Label Encoder

In [3]:
from sklearn.preprocessing import LabelEncoder

In [5]:
label_encoder = LabelEncoder()
encoded_labels = label_encoder.fit_transform(instances)

print('Encoded labels: ',encoded_labels)

Encoded labels:  [0 1 2 2 3 3 0 1 2 3 0 1]


In [6]:
original_labels = label_encoder.inverse_transform(encoded_labels)
print('Encoded labels: ',encoded_labels)
print('Original labels: ',original_labels)

Encoded labels:  [0 1 2 2 3 3 0 1 2 3 0 1]
Original labels:  ['ClassA' 'ClassB' 'ClassC' 'ClassC' 'ClassD' 'ClassD' 'ClassA' 'ClassB'
 'ClassC' 'ClassD' 'ClassA' 'ClassB']
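The codes above match the manual mapping because the labels happen to be in alphabetical order. `LabelEncoder` sorts the unique labels it sees during `fit`, and the learned order is available in its `classes_` attribute. The sketch below uses a hypothetical, deliberately shuffled list (`shuffled_instances`) to show this; it assumes scikit-learn is installed.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical input where the first label seen is not the alphabetically first
shuffled_instances = ['ClassD', 'ClassB', 'ClassA', 'ClassC', 'ClassB']

encoder = LabelEncoder()
shuffled_codes = encoder.fit_transform(shuffled_instances)

# classes_ holds the sorted unique labels; a label's position is its code
learned_classes = list(encoder.classes_)
print('Learned classes:', learned_classes)
print('Encoded labels: ', list(shuffled_codes))

# Labels not seen during fit are rejected with a ValueError
try:
    encoder.transform(['ClassE'])
    unseen_rejected = False
except ValueError:
    unseen_rejected = True
print('Unseen label rejected:', unseen_rejected)
```

So `'ClassD'` still receives code 3 even though it appears first, which differs from the `enumerate`-based mapping, where the code depends on the order of the `classes` list.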
