# Data load
Data is retrieved from the OECD database through the OECD Data API (SDMX REST interface), documented here: https://gitlab.algobank.oecd.org/public-documentation/dotstat-migration/-/raw/main/OECD_Data_API_documentation.pdf
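The API queries in this notebook follow the SDMX REST pattern `https://sdmx.oecd.org/public/rest/data/{agency},{dataflow},{version}/{key}`. The key is a dot-separated list with one position per dimension of the dataflow. An empty position acts as a wildcard, several codes can be combined with `+`, and `all` requests every series. Below is a minimal sketch of building such a key. It assumes you already know how many dimensions the dataflow has and in what order. The helper name `build_sdmx_key` is hypothetical and is not part of any OECD library.

```python
from typing import Sequence, Union

# A dimension value: None = wildcard, str = single code, list = several codes
DimValue = Union[None, str, Sequence[str]]


def build_sdmx_key(values: Sequence[DimValue]) -> str:
    """Join per-dimension codes into an SDMX REST key (hypothetical helper).

    None leaves the position empty (wildcard); a list of codes is joined
    with '+' so that any of them matches.
    """
    parts = []
    for value in values:
        if value is None:
            parts.append("")
        elif isinstance(value, str):  # check str first: a str is also a Sequence
            parts.append(value)
        else:
            parts.append("+".join(value))
    return ".".join(parts)


# Reproduces the key of the "representation" query: "A" and "AUT" in the
# first two positions, the remaining five dimensions left as wildcards.
print(build_sdmx_key(["A", "AUT", None, None, None, None, None]))  # A.AUT.....
```

The `wage_gap` key `......_T` works the same way: six wildcards followed by `_T` in the seventh position.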
## Documentation (OECD Data Explorer views)

- Wage gap: https://data-explorer.oecd.org/vis?lc=en&df[ds]=dsDisseminateFinalDMZ&df[id]=DSD_EARNINGS%40AGE_WAGE_GAP&df[ag]=OECD.ELS.SAE&df[vs]=1.0&dq=.S_PA_WP....MEAN._T&pd=2000%2C&to[TIME_PERIOD]=false
- SIGI: https://data-explorer.oecd.org/vis?tm=DF_SIGI_2023&pg=0&snb=1&vw=tb&df[ds]=dsDisseminateFinalDMZ&df[id]=DSD_SIGI%40DF_SIGI_2023&df[ag]=OECD.DEV.NPG&df[vs]=&lo=5&lom=LASTNPERIODS&dq=..&ly[rw]=REF_AREA&ly[cl]=MEASURE&to[TIME_PERIOD]=false
- Representation: https://data-explorer.oecd.org/vis?lc=en&pg=0&snb=1&vw=ov&df[ds]=dsDisseminateFinalDMZ&df[id]=DSD_GOV%40DF_GOV_EMPPS_REP_2023&df[ag]=OECD.GOV.GIP&df[vs]=&pd=2007%2C&dq=A.AUT.....&ly[rw]=MEASURE%2CSECTOR%2CUNIT_MEASURE&ly[cl]=TIME_PERIOD&to[TIME_PERIOD]=false
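The Data Explorer links carry the same identifiers an API query needs: `df[ag]` (agency), `df[id]` (dataflow), `df[vs]` (version), `dq` (key) and `pd` (period range). The sketch below turns such a link into an SDMX REST URL. It assumes this parameter mapping holds for every link, and `explorer_to_api_url` is a hypothetical helper. Applied to the Representation link, it reproduces the `representation` query URL used in the loading cell. The Wage gap link, however, points to `DSD_EARNINGS@AGE_WAGE_GAP` with a different key, while the `wage_gap` query uses `DSD_EARNINGS@GENDER_WAGE_GAP`. A link and a query are therefore not always interchangeable.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

API_BASE = "https://sdmx.oecd.org/public/rest/data"


def explorer_to_api_url(explorer_url: str, fmt: str = "csvfilewithlabels") -> str:
    """Turn an OECD Data Explorer link into an SDMX REST data URL (hypothetical helper).

    Assumes df[ag] = agency, df[id] = dataflow, df[vs] = version,
    dq = key and pd = "start,end" period range.
    """
    query = parse_qs(urlsplit(explorer_url).query)  # blank values such as df[vs]= are dropped

    def first(name: str, default: str = "") -> str:
        # parse_qs returns a list per parameter; keep the first value
        return query.get(name, [default])[0]

    agency, dataflow, version = first("df[ag]"), first("df[id]"), first("df[vs]")
    key = first("dq") or "all"  # no key means "all series"
    params = {}
    start = first("pd").split(",")[0]  # "2007," -> "2007"
    if start:
        params["startPeriod"] = start
    params["dimensionAtObservation"] = "AllDimensions"
    params["format"] = fmt
    return f"{API_BASE}/{agency},{dataflow},{version}/{key}?{urlencode(params)}"
```

Usage: `explorer_to_api_url(link)` for any of the three links above. Its output can be pasted into the `datasets` dictionary.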

In [14]:
from io import StringIO

import pandas as pd
import requests

# Request headers for the OECD SDMX API. Assumption: the original notebook
# defined `headers` in an earlier cell; a generic User-Agent is used here.
headers = {"User-Agent": "Mozilla/5.0"}

# Define URLs for the datasets (SDMX REST queries returning CSV with labels)
datasets = {
    "wage_gap": "https://sdmx.oecd.org/public/rest/data/OECD.ELS.SAE,DSD_EARNINGS@GENDER_WAGE_GAP,/......_T?startPeriod=2005&dimensionAtObservation=AllDimensions&format=csvfilewithlabels",
    "SIGI": "https://sdmx.oecd.org/public/rest/data/OECD.DEV.NPG,DSD_SIGI@DF_SIGI_2023,/all?startPeriod=2019&dimensionAtObservation=AllDimensions&format=csvfilewithlabels",
    "representation": "https://sdmx.oecd.org/public/rest/data/OECD.GOV.GIP,DSD_GOV@DF_GOV_EMPPS_REP_2023,/A.AUT.....?startPeriod=2007&dimensionAtObservation=AllDimensions&format=csvfilewithlabels"
}

# Dictionary to store the loaded dataframes
dataframes = {}

# Loop through the datasets and load them
for name, url in datasets.items():
    response = requests.get(url, headers=headers, timeout=60)
    if response.status_code == 200:
        # Parse the CSV text returned by the API into a DataFrame
        dataframes[name] = pd.read_csv(StringIO(response.text))
        print(f"Loaded dataset: {name}")
        print(dataframes[name].head())
    else:
        print(f"Failed to retrieve data for {name}: {response.status_code}")

Loaded dataset: wage_gap
  STRUCTURE                                    STRUCTURE_ID  STRUCTURE_NAME  \
0  DATAFLOW  OECD.ELS.SAE:DSD_EARNINGS@GENDER_WAGE_GAP(1.0)             NaN   
1  DATAFLOW  OECD.ELS.SAE:DSD_EARNINGS@GENDER_WAGE_GAP(1.0)             NaN   
2  DATAFLOW  OECD.ELS.SAE:DSD_EARNINGS@GENDER_WAGE_GAP(1.0)             NaN   
3  DATAFLOW  OECD.ELS.SAE:DSD_EARNINGS@GENDER_WAGE_GAP(1.0)             NaN   
4  DATAFLOW  OECD.ELS.SAE:DSD_EARNINGS@GENDER_WAGE_GAP(1.0)             NaN   

  ACTION REF_AREA  Unnamed: 5 MEASURE  Unnamed: 7   UNIT_MEASURE  Unnamed: 9  \
0      I      IND         NaN     GWP         NaN  PT_WG_SAL_M_D         NaN   
1      I      IND         NaN     GWP         NaN  PT_WG_SAL_M_D         NaN   
2      I      AUS         NaN     GWP         NaN  PT_WG_SAL_M_D         NaN   
3      I      AUS         NaN     GWP         NaN  PT_WG_SAL_M_D         NaN   
4      I      AUS         NaN     GWP         NaN  PT_WG_SAL_M_D         NaN   

   ...  OBS_VALUE  

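In the `head()` preview, `STRUCTURE_NAME` and the `Unnamed: …` columns next to each code column contain only `NaN`. They appear to be label columns of the `csvfilewithlabels` format that came back empty. The sketch below drops every all-`NaN` column with `DataFrame.dropna(axis=1, how="all")`. It assumes those columns are empty across the whole dataset and not just in the first five rows. The helper name `drop_empty_columns` and the toy frame are hypothetical and only illustrate the behavior.

```python
import numpy as np
import pandas as pd


def drop_empty_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df without columns whose values are all missing."""
    cleaned = df.dropna(axis=1, how="all")
    dropped = [col for col in df.columns if col not in cleaned.columns]
    print(f"Dropped {len(dropped)} empty column(s): {dropped}")
    return cleaned


# Toy frame mimicking the preview: a code column, an empty label column,
# and placeholder observation values (not real OECD data)
toy = pd.DataFrame({
    "REF_AREA": ["IND", "AUS"],
    "Unnamed: 5": [np.nan, np.nan],
    "OBS_VALUE": [1.0, 2.0],
})
print(drop_empty_columns(toy).columns.tolist())  # ['REF_AREA', 'OBS_VALUE']
```

Applied to the notebook, this would be `dataframes = {name: drop_empty_columns(df) for name, df in dataframes.items()}`. Because `dropna` returns a new frame, the raw downloads stay untouched if the dictionary is rebuilt under a different name.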
In [12]:
for name, df in dataframes.items():
    print(f"Dimensions of {name}: {df.shape}")

Dimensions of wage_gap: (1834, 30)
Dimensions of SIGI: (940, 16)
Dimensions of representation: (69, 30)
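The shapes count rows and columns but say nothing about what the rows cover. A quick complement is to count distinct reference areas and the span of time periods per dataset. This sketch assumes each frame has the `REF_AREA` column seen in the preview. It also assumes an SDMX-style `TIME_PERIOD` column, which the truncated output above does not show. `summarize_coverage` is a hypothetical helper.

```python
import pandas as pd


def summarize_coverage(frames: dict[str, pd.DataFrame]) -> pd.DataFrame:
    """One row per dataset: row count, distinct areas, first and last period."""
    rows = []
    for name, df in frames.items():
        # Compare periods as strings; "YYYY" and "YYYY-Qn" sort chronologically
        periods = df["TIME_PERIOD"].astype(str)
        rows.append({
            "dataset": name,
            "rows": len(df),
            "areas": df["REF_AREA"].nunique(),
            "first_period": periods.min(),
            "last_period": periods.max(),
        })
    return pd.DataFrame(rows).set_index("dataset")


# Toy input with placeholder values, not real OECD data
toy = {"example": pd.DataFrame({"REF_AREA": ["AUS", "AUS", "IND"],
                                "TIME_PERIOD": [2019, 2021, 2020]})}
print(summarize_coverage(toy))
```

In the notebook the call would be `summarize_coverage(dataframes)`.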
