## Machine Learning 38: Inverse Encoding / Transform / Decoding of Categorical Variables

### What is Inverse Encoding or Inverse Transform?

**Inverse encoding** (also called **inverse transform** or **decoding**) is the process of **converting encoded categorical data back into its original categorical labels**.

When we encode categorical variables (like “Red”, “Blue”, “Green”) into numerical or binary formats for machine learning models, we sometimes need to **revert those encoded values back to their original categories** for interpretation or presentation.
This reverse process is known as **inverse encoding**.


### Why is it Necessary After Encoding?

Inverse encoding is necessary because most machine learning models accept only **numerical inputs**, not text or categorical strings, so their predictions come back in encoded form.
To interpret the model’s output, evaluate results, or visualize predictions, we need to **map the numeric predictions back to their original categorical labels**.

For example:

* Encoded: `[0, 1, 2]`
* Original labels: `["Low", "Medium", "High"]`

After prediction, you’ll want to **decode 0 → "Low"**, etc., to make results meaningful to humans.
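The mapping above can be sketched with scikit-learn's `OrdinalEncoder`. This is only an illustration: the `predicted_codes` array is a made-up stand-in for model output, and the category order is passed explicitly because scikit-learn would otherwise sort the labels alphabetically (High, Low, Medium), which would not give 0 → "Low".

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Explicit category order so that Low -> 0, Medium -> 1, High -> 2.
# Without `categories=...`, scikit-learn sorts the labels alphabetically.
encoder = OrdinalEncoder(categories=[["Low", "Medium", "High"]])
encoder.fit(np.array([["Low"], ["Medium"], ["High"]]))

# Hypothetical integer codes, standing in for a model's predictions.
predicted_codes = np.array([[0], [2], [1]])

# inverse_transform maps each code back to its label.
decoded = encoder.inverse_transform(predicted_codes).ravel()
print(decoded.tolist())  # ['Low', 'High', 'Medium']
```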

## **Relation to Encoding**

### Relation with Common Encoding Techniques

| Encoding Technique   | Description                                                                          | Inverse Transform Meaning                                          |
| -------------------- | ------------------------------------------------------------------------------------ | ------------------------------------------------------------------ |
| **Label Encoding**   | Converts categories into numeric integers (e.g., A→0, B→1, C→2).                     | Converts numbers back into category labels.                        |
| **One-Hot Encoding** | Converts each category into a binary vector (e.g., A→[1,0,0], B→[0,1,0], C→[0,0,1]). | Converts binary vectors back to their original category names.     |
| **Ordinal Encoding** | Assigns integer values based on an ordered scale (e.g., Low→1, Medium→2, High→3).    | Converts these integers back into the original ordered categories. |
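As a rough sketch of the table, the three encoders can be applied to the same toy column (`sizes` is a hypothetical example, not part of the Titanic data) and reversed with `inverse_transform`. Note that scikit-learn's integer codes start at 0, so the ordinal mapping here is Low→0, Medium→1, High→2 rather than 1 to 3.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, OrdinalEncoder

sizes = np.array(["Low", "High", "Medium", "Low"])  # hypothetical toy column

# Label encoding: 1-D input; classes are sorted alphabetically
# (High=0, Low=1, Medium=2).
le = LabelEncoder()
codes = le.fit_transform(sizes)
label_roundtrip = le.inverse_transform(codes)

# One-hot encoding: 2-D input; one binary column per category.
# .toarray() converts the default sparse result to a dense array.
ohe = OneHotEncoder()
onehot = ohe.fit_transform(sizes.reshape(-1, 1)).toarray()
onehot_roundtrip = ohe.inverse_transform(onehot).ravel()

# Ordinal encoding: 2-D input; the order is given explicitly.
oe = OrdinalEncoder(categories=[["Low", "Medium", "High"]])
ordinal = oe.fit_transform(sizes.reshape(-1, 1))
ordinal_roundtrip = oe.inverse_transform(ordinal).ravel()

# Each round trip should reproduce the original labels.
print(list(label_roundtrip), list(onehot_roundtrip), list(ordinal_roundtrip))
```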

### When and Why We Might Need to Reverse Encodings

We use inverse encoding when:

1. **Interpreting Model Predictions:**
   E.g., converting predicted encoded labels back to category names.
2. **Presenting Results or Reports:**
   E.g., showing “Yes” / “No” instead of 1 / 0 in a confusion matrix.
3. **Post-Processing Pipelines:**
   E.g., when combining predictions with the original dataset for analysis.
4. **Debugging or Validation:**
   E.g., verifying that the encode-decode round trip preserves the original categories.
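The first use case, interpreting model predictions, might look like the following minimal sketch. The data is a made-up toy set (one numeric feature, a "yes"/"no" target), and `le_target` is an illustrative name; the idea is simply to keep the fitted encoder around so predictions can be decoded later.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: one numeric feature and a string target.
X = np.array([[1.0], [2.0], [8.0], [9.0]])
y = np.array(["no", "no", "yes", "yes"])

# Encode the target and keep the fitted encoder for decoding later.
le_target = LabelEncoder()
y_encoded = le_target.fit_transform(y)  # no -> 0, yes -> 1

model = DecisionTreeClassifier(random_state=0).fit(X, y_encoded)

# The model predicts integer codes; inverse_transform restores the labels.
pred_codes = model.predict(np.array([[1.5], [8.5]]))
pred_labels = le_target.inverse_transform(pred_codes)
print(pred_labels.tolist())  # ['no', 'yes']
```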



In [1]:
# import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, OrdinalEncoder

In [2]:
# load the data
df = sns.load_dataset('titanic')
df.head()

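|   | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
| - | -------- | ------ | --- | --- | ----- | ----- | ---- | -------- | ----- | --- | ---------- | ---- | ----------- | ----- | ----- |
| 0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.25 | S | Third | man | True | NaN | Southampton | no | False |
| 1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | First | woman | False | C | Cherbourg | yes | False |
| 2 | 1 | 3 | female | 26.0 | 0 | 0 | 7.925 | S | Third | woman | False | NaN | Southampton | yes | True |
| 3 | 1 | 1 | female | 35.0 | 1 | 0 | 53.1 | S | First | woman | False | C | Southampton | yes | False |
| 4 | 0 | 3 | male | 35.0 | 0 | 0 | 8.05 | S | Third | man | True | NaN | Southampton | no | True |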


In [3]:
# impute missing values for age, embarked and embark_town
# (assign the result back instead of calling fillna(..., inplace=True) on a column,
# which pandas warns will stop working in pandas 3.0)
df['age'] = df['age'].fillna(df['age'].median())
df['embarked'] = df['embarked'].fillna(df['embarked'].mode()[0])
df['embark_town'] = df['embark_town'].fillna(df['embark_town'].mode()[0])

In [4]:
# drop deck column
df.drop('deck', axis=1, inplace=True)

In [5]:
# encode all categorical variables and object variables

le_sex = LabelEncoder()
le_embarked = LabelEncoder()
le_class = LabelEncoder()
le_who = LabelEncoder()
le_embark_town = LabelEncoder()
le_alive = LabelEncoder()

In [6]:
df['sex'] = le_sex.fit_transform(df['sex'])
df['embarked'] = le_embarked.fit_transform(df['embarked'])
df['class'] = le_class.fit_transform(df['class'])
df['who'] = le_who.fit_transform(df['who'])
df['embark_town'] = le_embark_town.fit_transform(df['embark_town'])
df['alive'] = le_alive.fit_transform(df['alive'])
df.head()

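|   | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | embark_town | alive | alone |
| - | -------- | ------ | --- | --- | ----- | ----- | ---- | -------- | ----- | --- | ---------- | ----------- | ----- | ----- |
| 0 | 0 | 3 | 1 | 22.0 | 1 | 0 | 7.25 | 2 | 2 | 1 | True | 2 | 0 | False |
| 1 | 1 | 1 | 0 | 38.0 | 1 | 0 | 71.2833 | 0 | 0 | 2 | False | 0 | 1 | False |
| 2 | 1 | 3 | 0 | 26.0 | 0 | 0 | 7.925 | 2 | 2 | 2 | False | 2 | 1 | True |
| 3 | 1 | 1 | 0 | 35.0 | 1 | 0 | 53.1 | 2 | 0 | 2 | False | 2 | 1 | False |
| 4 | 0 | 3 | 1 | 35.0 | 0 | 0 | 8.05 | 2 | 2 | 1 | True | 2 | 0 | True |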


In [7]:
# inverse transform the data

df['sex'] = le_sex.inverse_transform(df['sex'])
df['embarked'] = le_embarked.inverse_transform(df['embarked'])
df['class'] = le_class.inverse_transform(df['class'])
df['who'] = le_who.inverse_transform(df['who'])
df['embark_town'] = le_embark_town.inverse_transform(df['embark_town'])
df['alive'] = le_alive.inverse_transform(df['alive'])
df.head()

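|   | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | embark_town | alive | alone |
| - | -------- | ------ | --- | --- | ----- | ----- | ---- | -------- | ----- | --- | ---------- | ----------- | ----- | ----- |
| 0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.25 | S | Third | man | True | Southampton | no | False |
| 1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | First | woman | False | Cherbourg | yes | False |
| 2 | 1 | 3 | female | 26.0 | 0 | 0 | 7.925 | S | Third | woman | False | Southampton | yes | True |
| 3 | 1 | 1 | female | 35.0 | 1 | 0 | 53.1 | S | First | woman | False | Southampton | yes | False |
| 4 | 0 | 3 | male | 35.0 | 0 | 0 | 8.05 | S | Third | man | True | Southampton | no | True |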
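The encode/decode steps above use one named encoder per column. A more compact variant, sketched here on a small made-up DataFrame (`toy` and `encoders` are illustrative names, not from the notebook), keeps the fitted encoders in a dictionary keyed by column name so the same loop can drive both directions.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy frame standing in for the Titanic columns.
toy = pd.DataFrame({
    "sex": ["male", "female", "female"],
    "embarked": ["S", "C", "S"],
})
original = toy.copy()

# Fit one LabelEncoder per column and remember it by column name.
encoders = {}
for col in ["sex", "embarked"]:
    encoders[col] = LabelEncoder()
    toy[col] = encoders[col].fit_transform(toy[col])

encoded = toy.copy()  # snapshot of the integer-coded frame

# Decode each column with the encoder that was fitted on it.
for col, le in encoders.items():
    toy[col] = le.inverse_transform(toy[col])

# The round trip should restore the original labels.
print(toy["sex"].tolist() == original["sex"].tolist())
```

Each column must be decoded with the encoder fitted on that same column, since the integer codes are only meaningful relative to that encoder's `classes_`.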


## One-Hot Encoding

In [8]:
from sklearn.preprocessing import OneHotEncoder

In [9]:
df = sns.load_dataset('titanic')

In [10]:
cat_columns = ['sex', 'embarked']

In [11]:
# Use 'sparse_output' (scikit-learn 1.2+), which replaces the deprecated 'sparse' argument; False returns a dense array
encoder = OneHotEncoder(sparse_output=False)
encoded_df = pd.DataFrame(encoder.fit_transform(df[cat_columns]))

In [12]:
# concatenate the dataframes 
df = pd.concat([df, encoded_df], axis=1)
# df.drop(cat_columns, axis=1, inplace=True)
df.head()

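|   | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | ... | deck | embark_town | alive | alone | 0 | 1 | 2 | 3 | 4 | 5 |
| - | -------- | ------ | --- | --- | ----- | ----- | ---- | -------- | ----- | --- | --- | ---- | ----------- | ----- | ----- | - | - | - | - | - | - |
| 0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.25 | S | Third | man | ... | NaN | Southampton | no | False | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | First | woman | ... | C | Cherbourg | yes | False | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 2 | 1 | 3 | female | 26.0 | 0 | 0 | 7.925 | S | Third | woman | ... | NaN | Southampton | yes | True | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 3 | 1 | 1 | female | 35.0 | 1 | 0 | 53.1 | S | First | woman | ... | C | Southampton | yes | False | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 4 | 0 | 3 | male | 35.0 | 0 | 0 | 8.05 | S | Third | man | ... | NaN | Southampton | no | True | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 |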


In [13]:
# We need to extract the original categories for each encoded column
original_categories = {col: encoder.categories_[i] for i, col in enumerate(cat_columns)}

In [14]:
# Manual creation of feature names
feature_names = []
for i, col in enumerate(cat_columns):
    for category in encoder.categories_[i]:
        feature_names.append(f"{col}_{category}")

In [15]:
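# Rename the encoded columns in place. Passing columns=feature_names to
# pd.DataFrame(encoded_df, ...) would select columns by those names and yield all-NaN.
# Note: df was concatenated earlier, so it still shows the integer names 0-5.
encoded_df.columns = feature_names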

df.head()


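|   | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | ... | deck | embark_town | alive | alone | 0 | 1 | 2 | 3 | 4 | 5 |
| - | -------- | ------ | --- | --- | ----- | ----- | ---- | -------- | ----- | --- | --- | ---- | ----------- | ----- | ----- | - | - | - | - | - | - |
| 0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.25 | S | Third | man | ... | NaN | Southampton | no | False | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | First | woman | ... | C | Cherbourg | yes | False | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 2 | 1 | 3 | female | 26.0 | 0 | 0 | 7.925 | S | Third | woman | ... | NaN | Southampton | yes | True | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 3 | 1 | 1 | female | 35.0 | 1 | 0 | 53.1 | S | First | woman | ... | C | Southampton | yes | False | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 4 | 0 | 3 | male | 35.0 | 0 | 0 | 8.05 | S | Third | man | ... | NaN | Southampton | no | True | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 |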
