01 Data cleaning: HDR
===

The Human Development Reports (HDR) publish the Gender Inequality Index (GII), which combines several indicators, with values going back to 1990. This fits our gender-related questions about politics, jobs and university, and gives us a unique opportunity for a reality check later on.
However, the data has to be collected through REST services. Here we combine all of the requests into one convenient CSV file.
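
To give a sense of how many requests this involves, the following sketch enumerates every country, year and indicator combination used in this notebook. The three lists are copied from the download cell further down; the names `country_codes` and `request_grid` are illustrative and do not appear in the original code.

```python
from itertools import product

# Values copied from the download cell in this notebook
years = range(1990, 2022)
country_codes = ['DEU', 'CHN', 'TUR', 'NGA', 'PER', 'IRN', 'USA', 'SRB', 'ZAF', 'PAK']
indicators = ['gii', 'gii_rank', 'lfpr_f', 'lfpr_m', 'pr_f', 'pr_m', 'se_f', 'se_m']

# Hypothetical helper structure: one tuple per request that has to be sent
request_grid = list(product(country_codes, years, indicators))

print(len(request_grid))   # 10 countries * 32 years * 8 indicators = 2560
print(request_grid[0])     # ('DEU', 1990, 'gii')
```

Since every combination is a separate HTTP call, the download takes a while, which is one reason to store the result in a single CSV file and reuse it.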

# Downloads
We need to call the API described at https://hdr.undp.org/data-center/documentation-and-downloads. The available country codes, years and indicators are listed in the file linked as "All composite indices and components time series (1990-2021) Metadata" on the same page.
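
As a minimal sketch of how one such call can be addressed, the snippet below builds the query URL for a single country, year and indicator. The endpoint and the parameter names `country`, `year` and `indicator` are taken from the download cell in this notebook; the helper `build_hdr_url` and the constant `BASE_URL` are hypothetical names introduced only for illustration.

```python
from urllib.parse import urlencode

# Endpoint as used in the download cell of this notebook
BASE_URL = 'https://api.hdrdata.org/CountryIndicators/filter'

def build_hdr_url(code, year, indicator):
    """Hypothetical helper: build the filter URL for one country, year and indicator."""
    params = {'country': code, 'year': year, 'indicator': indicator}
    # urlencode converts the values to strings and escapes them where needed
    return BASE_URL + '?' + urlencode(params)

url = build_hdr_url('DEU', 1990, 'gii')
print(url)
```

Using `urlencode` instead of manual string concatenation keeps the URL valid even if a value ever contains characters that need escaping.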

# References
UNDP (United Nations Development Programme). 2021. 2021 Global Multidimensional Poverty Index (MPI): Unmasking disparities by ethnicity, caste and gender. New York.

In [15]:
# Download the GII indicators per country and year from the HDR API
import requests
import pandas as pd

years = range(1990,2022,1)
countryCodes = ['DEU','CHN','TUR','NGA','PER','IRN','USA','SRB','ZAF','PAK']
list10Countries = ['Germany','China','Turkey','Nigeria','Peru','Iran','United States','Serbia','South Africa','Pakistan']
indicators = ['gii', 'gii_rank', 'lfpr_f', 'lfpr_m', 'pr_f', 'pr_m', 'se_f', 'se_m']
# Indicator codes and their meaning:
#   gii      - Gender Inequality Index (value)
#   gii_rank - GII Rank
#   lfpr_f   - Labour force participation rate, female (% ages 15 and older)
#   lfpr_m   - Labour force participation rate, male (% ages 15 and older)
#   pr_f     - Share of seats in parliament, female (% held by women)
#   pr_m     - Share of seats in parliament, male (% held by men)
#   se_f     - Population with at least some secondary education, female (% ages 25 and older)
#   se_m     - Population with at least some secondary education, male (% ages 25 and older)
# Further GII components that are not requested here:
#   mmr      - Maternal Mortality Ratio (deaths per 100,000 live births)
#   abr      - Adolescent Birth Rate (births per 1,000 women ages 15-19)
# One column per requested indicator, plus country code and year
dfHDR = pd.DataFrame(columns=['code', 'year'] + indicators)

for idCode, code in enumerate(countryCodes):
    for idYear, year in enumerate(years):
        #api = 'https://api.hdrdata.org/CountryIndicators/'+code+'/'+str(year)
        newRow = [code, year]
        for indicator in indicators:
            # One request per country, year and indicator
            query = 'country='+code+'&year='+str(year)+'&indicator='+indicator
            api = 'https://api.hdrdata.org/CountryIndicators/filter?'+query
            r = requests.get(api, auth=('user', 'pass'))
            print('  === status code ===  '+str(r.status_code)+'     '+query)
            data = r.json()  # parse the response body once and reuse it
            print(data)
            if data == []:
                # Empty list: no value for this combination, store a placeholder string
                newRow.append('NAN')
            else:
                newRow.append(data[0]['value'])
        # One row per (country, year) pair, indexed consecutively
        dfHDR.loc[idCode*len(years)+idYear] = newRow

dfHDR.to_csv('../data/HDR.csv')

  === status code ===  200     country=DEU&year=1990&indicator=gii
[{'country': 'DEU - Germany', 'index': 'GII - Gender Inequality Index', 'indicator': 'gii - Gender Inequality Index (value)', 'year': '1990', 'value': '0.183'}]
  === status code ===  200     country=DEU&year=1990&indicator=gii_rank
[]
  === status code ===  200     country=DEU&year=1990&indicator=lfpr_f
[{'country': 'DEU - Germany', 'index': 'GII - Gender Inequality Index', 'indicator': 'lfpr_f - Labour force participation rate, female (% ages 15 and older)', 'year': '1990', 'value': '45.352'}]
  === status code ===  200     country=DEU&year=1990&indicator=lfpr_m
[{'country': 'DEU - Germany', 'index': 'GII - Gender Inequality Index', 'indicator': 'lfpr_m - Labour force participation rate, male (% ages 15 and older)', 'year': '1990', 'value': '71.957'}]
  === status code ===  200     country=DEU&year=1990&indicator=pr_f
[{'country': 'DEU - Germany', 'index': 'GII - Gender Inequality Index', 'indicator': 'pr_f - Share of