In [3]:
import numpy as np 
import pandas as pd
import re

# Extension Processing and Categorization in WDI Indicator Data

## Objective
The goal of this notebook is to process and categorize the **extensions** found in the WDI (World Development Indicators) dataset. Extensions provide additional details such as **gender, demographic groups, education level, and geographic location**. By extracting these extensions into separate columns (or a single column), we aim to improve the hierarchical organization of the data for analysis and visualization.

## Challenges Encountered When Analysing the Data
1. **Missing Extensions**: Some indicators in the WDI dataset lack extensions, making it difficult to determine additional attributes for certain data points.  
2. **Ambiguous Extensions**: Certain extensions, such as **"FE"**, do not always correspond to a single category (e.g., "FE" may indicate **Female** in some cases but represent something else in other contexts). This requires additional context or mapping strategies to ensure accurate classification.  
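
As a rough illustration of the first challenge, the sketch below splits an indicator code on dots and treats everything after the first three segments as extensions. The three-segment base, the helper name `split_extensions`, and the sample codes are assumptions made for illustration; they are not taken from this notebook's data, and the real base length may vary by indicator.

```python
def split_extensions(code, base_segments=3):
    """Return the dot-separated segments after the assumed base of the code.

    Assumption: the first `base_segments` segments identify the indicator
    itself, and any remaining segments are extensions.
    """
    parts = code.strip().upper().split(".")
    return parts[base_segments:]


# Hypothetical sample codes, used only to exercise the helper.
sample_codes = ["SP.POP.TOTL", "SP.POP.TOTL.FE.ZS", "SE.PRM.ENRR.FE"]

# Map each code to its list of extension segments (possibly empty).
extensions = {code: split_extensions(code) for code in sample_codes}

# Codes with no extension at all (challenge 1).
missing = [code for code, ext in extensions.items() if not ext]
```

Keeping the position of each segment, rather than searching the whole code for a substring such as "FE", is one way to reduce the ambiguity described in the second challenge, since a match in the base part of the code is never treated as an extension.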




In [None]:
# Maps each extension category to the two-character codes assigned to it.
codes_dict = {
    "Gender": ["FE", "MA", "FM", "MF", "WF", "WH"],
    "Age": ["14", "YG", "OL", "C3"],
    "Degree": ["BA", "DO", "MS"],
    "Geography": ["R1", "R2", "R3", "R4", "R5", "R6", "RU", "UR"],
    "Time": ["5Y", "DY", "FY", "MO", "YR"],
    "Population": ["P1", "P2", "P3", "P5", "P6", "SP"],
}
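
One possible way to use this dictionary is to invert it into a code-to-category lookup and then label the extension segments of an indicator. The sketch below repeats the dictionary so that it runs on its own. The names `code_to_categories`, `ambiguous`, and `categorize` are hypothetical helpers, not part of the original notebook.

```python
from collections import defaultdict

codes_dict = {
    "Gender": ["FE", "MA", "FM", "MF", "WF", "WH"],
    "Age": ["14", "YG", "OL", "C3"],
    "Degree": ["BA", "DO", "MS"],
    "Geography": ["R1", "R2", "R3", "R4", "R5", "R6", "RU", "UR"],
    "Time": ["5Y", "DY", "FY", "MO", "YR"],
    "Population": ["P1", "P2", "P3", "P5", "P6", "SP"],
}

# Invert the mapping: code -> list of categories that contain it.
code_to_categories = defaultdict(list)
for category, codes in codes_dict.items():
    for code in codes:
        code_to_categories[code].append(category)

# Codes listed under more than one category would need manual review.
ambiguous = {c: cats for c, cats in code_to_categories.items() if len(cats) > 1}


def categorize(extensions):
    """Group a list of extension segments by category; unknown codes are skipped."""
    result = {}
    for ext in extensions:
        for category in code_to_categories.get(ext, []):
            result.setdefault(category, []).append(ext)
    return result


labels = categorize(["FE", "ZS"])   # "ZS" is not in codes_dict, so it is ignored
mixed = categorize(["RU", "MA"])
```

Checking `ambiguous` only catches codes that appear under several categories in the dictionary itself. It does not catch a code such as "FE" carrying a different meaning in a different indicator, which still needs context from the indicator name or description.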