UNData API Exercise
In this exercise, you'll redo the data-gathering phase of the UNData Exploration project by using APIs instead of downloading CSV files.

You'll make use of the World Bank Indicators API. Note that this API does not require an API key. Before attempting the exercise, it would be a good idea to skim through the Documentation page and to check out the Basic Call Structure article.
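
The call structure used throughout this exercise is a path (country, then indicator) followed by a query string. Below is a minimal sketch of assembling that URL from its parts with the standard library, which can make the parameters easier to change. The helper name `build_indicator_url` is hypothetical and not part of the exercise or the API.

```python
from urllib.parse import urlencode

BASE_URL = 'https://api.worldbank.org/v2'

def build_indicator_url(indicator, country='all', **params):
    """Return the indicator URL with its query parameters appended."""
    # format=json is always sent; extra parameters such as per_page follow it
    query = urlencode({'format': 'json', **params})
    return f'{BASE_URL}/country/{country}/indicator/{indicator}?{query}'

url = build_indicator_url('NY.GDP.PCAP.PP.KD', per_page=1000)
print(url)
```

This prints the same endpoint string that the notebook builds by hand for the GDP indicator.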

In [226]:
# Import libraries
import pandas as pd
import requests

1. Use the API to get all available data for the GDP per capita, PPP (constant 2017 international $) indicator. Hint: this indicator has code "NY.GDP.PCAP.PP.KD". In the response shown in this notebook, the API labels this series "constant 2021 international $". Adjust the query parameters so that you can retrieve all available rows. Convert the results to a DataFrame.

In [227]:
# Set API endpoint
indicator = 'NY.GDP.PCAP.PP.KD'
# Use &per_page=1000 to bring back 1000 rows per page instead of the default 50 rows to reduce API requests from 346 to 18
endpoint = f'https://api.worldbank.org/v2/country/all/indicator/{indicator}?format=json&per_page=1000'

# Request to API using GET
response = requests.get(endpoint)

# Check response
print(response)

<Response [200]>


In [228]:
# Save json response in a variable
gdpdata = response.json()

# Look at what gdpdata variable contains
print('API response (gdpdata)')
print(type(gdpdata))         # <class 'list'>
print(len(gdpdata))          # 2 (metadata + actual data)
print(gdpdata[0])            # metadata: page info, total rows
print(gdpdata[1][:2])        # first 2 rows of actual data

API response (gdpdata)
<class 'list'>
2
{'page': 1, 'pages': 18, 'per_page': 1000, 'total': 17290, 'sourceid': '2', 'lastupdated': '2025-10-07'}
[{'indicator': {'id': 'NY.GDP.PCAP.PP.KD', 'value': 'GDP per capita, PPP (constant 2021 international $)'}, 'country': {'id': 'ZH', 'value': 'Africa Eastern and Southern'}, 'countryiso3code': 'AFE', 'date': '2024', 'value': 3968.96375122681, 'unit': '', 'obs_status': '', 'decimal': 0}, {'indicator': {'id': 'NY.GDP.PCAP.PP.KD', 'value': 'GDP per capita, PPP (constant 2021 international $)'}, 'country': {'id': 'ZH', 'value': 'Africa Eastern and Southern'}, 'countryiso3code': 'AFE', 'date': '2023', 'value': 3948.14272105098, 'unit': '', 'obs_status': '', 'decimal': 0}]
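
The first element of the response is paging metadata, and the page count follows from `total` and `per_page`. Below is a small sketch that recomputes it, using the metadata dictionary printed above. The helper name `pages_needed` is hypothetical.

```python
import math

# Metadata element copied from the first page of the response above
meta = {'page': 1, 'pages': 18, 'per_page': 1000, 'total': 17290,
        'sourceid': '2', 'lastupdated': '2025-10-07'}

def pages_needed(total, per_page):
    """Number of requests needed to retrieve `total` rows."""
    return math.ceil(total / per_page)

pages_at_1000 = pages_needed(meta['total'], meta['per_page'])
pages_at_50 = pages_needed(meta['total'], 50)  # 50 is the default page size noted earlier
print(pages_at_1000, pages_at_50)  # 18 346
```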


In [229]:
# Get total number of pages
total_pages = gdpdata[0]['pages']
print(f'Total pages to fetch: {total_pages}')

Total pages to fetch: 18


In [230]:
# Create list to store all results
all_gdp_data = []

# Loop through all pages and collect data
for page in range(1, total_pages + 1):
    page_response = requests.get(f'{endpoint}&page={page}')
    page_data = page_response.json()
    all_gdp_data += page_data[1]

    # Progress print: every 3 pages and on the last page
    if page % 3 == 0 or page == total_pages:
        print(f'Fetching page {page} of {total_pages}...')

# Final confirmation
print('All pages fetched successfully!')
print(f"Total records = {gdpdata[0]['total']}")
print(f'Records collected = {len(all_gdp_data)}')

Fetching page 3 of 18...
Fetching page 6 of 18...
Fetching page 9 of 18...
Fetching page 12 of 18...
Fetching page 15 of 18...
Fetching page 18 of 18...
All pages fetched successfully!
Total records = 17290
Records collected = 17290
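
The same loop is needed again for the second indicator, so it may be worth wrapping it in a function. Below is a sketch under stated assumptions: `fetch_all_pages` is a hypothetical helper, and the page source is passed in as a callable so the logic can be tried without network access. The fake pages are made-up stand-ins for API responses.

```python
def fetch_all_pages(get_page):
    """Collect the records from every page.

    get_page(page) must return the parsed JSON for that page:
    a two-item list of [metadata, records].
    """
    meta, records = get_page(1)
    all_records = list(records or [])
    # Page 1 is already in hand, so continue from page 2
    for page in range(2, meta['pages'] + 1):
        _, records = get_page(page)
        all_records.extend(records or [])
    # Compare against the total reported by the API instead of a hard-coded number
    if len(all_records) != meta['total']:
        raise ValueError(f"expected {meta['total']} records, got {len(all_records)}")
    return all_records

# Made-up stand-in for the API: 5 records split over 3 pages
fake_pages = {
    1: [{'pages': 3, 'total': 5}, [{'n': 1}, {'n': 2}]],
    2: [{'pages': 3, 'total': 5}, [{'n': 3}, {'n': 4}]],
    3: [{'pages': 3, 'total': 5}, [{'n': 5}]],
}

records = fetch_all_pages(fake_pages.__getitem__)
print(len(records))  # 5

# With the notebook's objects, the callable could be:
# lambda page: requests.get(f'{endpoint}&page={page}').json()
```

Unlike the loop above, this version does not request page 1 a second time.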


In [231]:
# Build a DataFrame directly from the list of records
gdp_df = pd.DataFrame(all_gdp_data)
print('DataFrame created from all records')

print('Shape:', gdp_df.shape)
print(gdp_df.info())
gdp_df.head()

DataFrame created from all records
Shape: (17290, 8)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17290 entries, 0 to 17289
Data columns (total 8 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   indicator        17290 non-null  object 
 1   country          17290 non-null  object 
 2   countryiso3code  17290 non-null  object 
 3   date             17290 non-null  object 
 4   value            8461 non-null   float64
 5   unit             17290 non-null  object 
 6   obs_status       17290 non-null  object 
 7   decimal          17290 non-null  int64  
dtypes: float64(1), int64(1), object(6)
memory usage: 1.1+ MB
None


Unnamed: 0,indicator,country,countryiso3code,date,value,unit,obs_status,decimal
0,"{'id': 'NY.GDP.PCAP.PP.KD', 'value': 'GDP per ...","{'id': 'ZH', 'value': 'Africa Eastern and Sout...",AFE,2024,3968.963751,,,0
1,"{'id': 'NY.GDP.PCAP.PP.KD', 'value': 'GDP per ...","{'id': 'ZH', 'value': 'Africa Eastern and Sout...",AFE,2023,3948.142721,,,0
2,"{'id': 'NY.GDP.PCAP.PP.KD', 'value': 'GDP per ...","{'id': 'ZH', 'value': 'Africa Eastern and Sout...",AFE,2022,3974.244214,,,0
3,"{'id': 'NY.GDP.PCAP.PP.KD', 'value': 'GDP per ...","{'id': 'ZH', 'value': 'Africa Eastern and Sout...",AFE,2021,3933.580905,,,0
4,"{'id': 'NY.GDP.PCAP.PP.KD', 'value': 'GDP per ...","{'id': 'ZH', 'value': 'Africa Eastern and Sout...",AFE,2020,3861.068816,,,0


In [232]:
# Flatten the nested 'indicator' and 'country' dictionaries into separate columns
gdp_df = pd.json_normalize(all_gdp_data)

# Check DataFrame
print('DataFrame after normalization')
gdp_df

DataFrame after normalization


Unnamed: 0,countryiso3code,date,value,unit,obs_status,decimal,indicator.id,indicator.value,country.id,country.value
0,AFE,2024,3968.963751,,,0,NY.GDP.PCAP.PP.KD,"GDP per capita, PPP (constant 2021 internation...",ZH,Africa Eastern and Southern
1,AFE,2023,3948.142721,,,0,NY.GDP.PCAP.PP.KD,"GDP per capita, PPP (constant 2021 internation...",ZH,Africa Eastern and Southern
2,AFE,2022,3974.244214,,,0,NY.GDP.PCAP.PP.KD,"GDP per capita, PPP (constant 2021 internation...",ZH,Africa Eastern and Southern
3,AFE,2021,3933.580905,,,0,NY.GDP.PCAP.PP.KD,"GDP per capita, PPP (constant 2021 internation...",ZH,Africa Eastern and Southern
4,AFE,2020,3861.068816,,,0,NY.GDP.PCAP.PP.KD,"GDP per capita, PPP (constant 2021 internation...",ZH,Africa Eastern and Southern
...,...,...,...,...,...,...,...,...,...,...
17285,ZWE,1964,,,,0,NY.GDP.PCAP.PP.KD,"GDP per capita, PPP (constant 2021 internation...",ZW,Zimbabwe
17286,ZWE,1963,,,,0,NY.GDP.PCAP.PP.KD,"GDP per capita, PPP (constant 2021 internation...",ZW,Zimbabwe
17287,ZWE,1962,,,,0,NY.GDP.PCAP.PP.KD,"GDP per capita, PPP (constant 2021 internation...",ZW,Zimbabwe
17288,ZWE,1961,,,,0,NY.GDP.PCAP.PP.KD,"GDP per capita, PPP (constant 2021 internation...",ZW,Zimbabwe
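
The two outputs above differ because `pd.DataFrame` keeps each nested dictionary as a single cell, while `pd.json_normalize` expands it into dotted columns. Below is a small sketch of the difference, using the two records printed from the API response earlier.

```python
import pandas as pd

# Two records copied from the API response shown earlier
records = [
    {'indicator': {'id': 'NY.GDP.PCAP.PP.KD',
                   'value': 'GDP per capita, PPP (constant 2021 international $)'},
     'country': {'id': 'ZH', 'value': 'Africa Eastern and Southern'},
     'countryiso3code': 'AFE', 'date': '2024', 'value': 3968.96375122681,
     'unit': '', 'obs_status': '', 'decimal': 0},
    {'indicator': {'id': 'NY.GDP.PCAP.PP.KD',
                   'value': 'GDP per capita, PPP (constant 2021 international $)'},
     'country': {'id': 'ZH', 'value': 'Africa Eastern and Southern'},
     'countryiso3code': 'AFE', 'date': '2023', 'value': 3948.14272105098,
     'unit': '', 'obs_status': '', 'decimal': 0},
]

raw = pd.DataFrame(records)          # nested dicts stay inside the cells
flat = pd.json_normalize(records)    # nested dicts become 'parent.child' columns

print(type(raw.loc[0, 'country']))   # <class 'dict'>
print(flat.loc[0, 'country.value'])  # Africa Eastern and Southern
new_columns = sorted(set(flat.columns) - set(raw.columns))
print(new_columns)
```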


2. Now, use the API to get all available data for Life expectancy at birth, total (years). This indicator has code "SP.DYN.LE00.IN". Again, convert the results to a DataFrame.

In [233]:
# Set API endpoint for life expectancy at birth, total years
indicator = 'SP.DYN.LE00.IN'
# Use per_page=1000 to bring back 1000 rows per page instead of the default 50 rows to reduce API requests from 346 to 18
endpoint = f'https://api.worldbank.org/v2/country/all/indicator/{indicator}?format=json&per_page=1000'

# Request to API using GET
response = requests.get(endpoint)

# Check response
print(response)

<Response [200]>


In [234]:
# Save json response in variable
lifedata = response.json()

# Look at what lifedata variable contains
print('API response (lifedata)')
print(type(lifedata))         # <class 'list'>
print(len(lifedata))          # 2 (metadata + actual data)
print(lifedata[0])            # metadata: page info, total rows
print(lifedata[1][:2])        # first 2 rows of actual data

API response (lifedata)
<class 'list'>
2
{'page': 1, 'pages': 18, 'per_page': 1000, 'total': 17290, 'sourceid': '2', 'lastupdated': '2025-10-07'}
[{'indicator': {'id': 'SP.DYN.LE00.IN', 'value': 'Life expectancy at birth, total (years)'}, 'country': {'id': 'ZH', 'value': 'Africa Eastern and Southern'}, 'countryiso3code': 'AFE', 'date': '2024', 'value': None, 'unit': '', 'obs_status': '', 'decimal': 0}, {'indicator': {'id': 'SP.DYN.LE00.IN', 'value': 'Life expectancy at birth, total (years)'}, 'country': {'id': 'ZH', 'value': 'Africa Eastern and Southern'}, 'countryiso3code': 'AFE', 'date': '2023', 'value': 65.146290855385, 'unit': '', 'obs_status': '', 'decimal': 0}]


In [235]:
# Get total number of pages
total_pages = lifedata[0]['pages']
print(f'Total pages to fetch: {total_pages}')

Total pages to fetch: 18


In [236]:
# Create list to store all results
all_life_data = []

# Loop through all pages and collect data
for page in range(1, total_pages + 1):
    page_response = requests.get(f'{endpoint}&page={page}')
    page_data = page_response.json()
    all_life_data += page_data[1]

    # Progress print: every 3 pages and on the last page
    if page % 3 == 0 or page == total_pages:
        print(f'Fetching page {page} of {total_pages}...')

# Final confirmation
print('All pages fetched successfully!')
print(f"Total records = {lifedata[0]['total']}")
print(f'Records collected = {len(all_life_data)}')

Fetching page 3 of 18...
Fetching page 6 of 18...
Fetching page 9 of 18...
Fetching page 12 of 18...
Fetching page 15 of 18...
Fetching page 18 of 18...
All pages fetched successfully!
Total records = 17290
Records collected = 17290


In [237]:
# Build a DataFrame directly from the list of records
life_df = pd.DataFrame(all_life_data)
print('DataFrame created from all records')

print('Shape:', life_df.shape)
print(life_df.info())
life_df.head()

DataFrame created from all records
Shape: (17290, 8)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17290 entries, 0 to 17289
Data columns (total 8 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   indicator        17290 non-null  object 
 1   country          17290 non-null  object 
 2   countryiso3code  17290 non-null  object 
 3   date             17290 non-null  object 
 4   value            16926 non-null  float64
 5   unit             17290 non-null  object 
 6   obs_status       17290 non-null  object 
 7   decimal          17290 non-null  int64  
dtypes: float64(1), int64(1), object(6)
memory usage: 1.1+ MB
None


Unnamed: 0,indicator,country,countryiso3code,date,value,unit,obs_status,decimal
0,"{'id': 'SP.DYN.LE00.IN', 'value': 'Life expect...","{'id': 'ZH', 'value': 'Africa Eastern and Sout...",AFE,2024,,,,0
1,"{'id': 'SP.DYN.LE00.IN', 'value': 'Life expect...","{'id': 'ZH', 'value': 'Africa Eastern and Sout...",AFE,2023,65.146291,,,0
2,"{'id': 'SP.DYN.LE00.IN', 'value': 'Life expect...","{'id': 'ZH', 'value': 'Africa Eastern and Sout...",AFE,2022,64.48702,,,0
3,"{'id': 'SP.DYN.LE00.IN', 'value': 'Life expect...","{'id': 'ZH', 'value': 'Africa Eastern and Sout...",AFE,2021,62.979999,,,0
4,"{'id': 'SP.DYN.LE00.IN', 'value': 'Life expect...","{'id': 'ZH', 'value': 'Africa Eastern and Sout...",AFE,2020,63.766484,,,0


In [238]:
# Flatten the nested 'indicator' and 'country' dictionaries into separate columns
life_df = pd.json_normalize(all_life_data)

# Check DataFrame
print('DataFrame after normalization')
life_df

DataFrame after normalization


Unnamed: 0,countryiso3code,date,value,unit,obs_status,decimal,indicator.id,indicator.value,country.id,country.value
0,AFE,2024,,,,0,SP.DYN.LE00.IN,"Life expectancy at birth, total (years)",ZH,Africa Eastern and Southern
1,AFE,2023,65.146291,,,0,SP.DYN.LE00.IN,"Life expectancy at birth, total (years)",ZH,Africa Eastern and Southern
2,AFE,2022,64.487020,,,0,SP.DYN.LE00.IN,"Life expectancy at birth, total (years)",ZH,Africa Eastern and Southern
3,AFE,2021,62.979999,,,0,SP.DYN.LE00.IN,"Life expectancy at birth, total (years)",ZH,Africa Eastern and Southern
4,AFE,2020,63.766484,,,0,SP.DYN.LE00.IN,"Life expectancy at birth, total (years)",ZH,Africa Eastern and Southern
...,...,...,...,...,...,...,...,...,...,...
17285,ZWE,1964,55.431000,,,0,SP.DYN.LE00.IN,"Life expectancy at birth, total (years)",ZW,Zimbabwe
17286,ZWE,1963,54.942000,,,0,SP.DYN.LE00.IN,"Life expectancy at birth, total (years)",ZW,Zimbabwe
17287,ZWE,1962,54.453000,,,0,SP.DYN.LE00.IN,"Life expectancy at birth, total (years)",ZW,Zimbabwe
17288,ZWE,1961,53.966000,,,0,SP.DYN.LE00.IN,"Life expectancy at birth, total (years)",ZW,Zimbabwe
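
The `info()` output earlier lists `date` with dtype `object`, meaning the years arrive as strings, and `value` contains missing entries. Below is a sketch of one possible cleanup, using three rows copied from the output above. Whether to convert years or drop missing values at this stage is a choice the exercise leaves open, and the `year` column name is only illustrative.

```python
import pandas as pd

# Three rows copied from the normalized life expectancy output above
sample = pd.DataFrame({
    'countryiso3code': ['AFE', 'AFE', 'ZWE'],
    'date': ['2024', '2023', '1961'],
    'value': [None, 65.146291, 53.966],
})

# Years are strings in the API response; convert them for numeric comparisons
sample['year'] = sample['date'].astype(int)

# Keep only rows that actually carry a measurement
with_values = sample.dropna(subset=['value'])
print(sample['year'].tolist())  # [2024, 2023, 1961]
print(len(with_values))         # 2
```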


3. Merge the two results DataFrames together. You may want to rename or drop columns prior to merging.

In [254]:
# Check columns in GDP DataFrame
print('GDP DataFrame columns:')
print(gdp_df.columns.tolist())

# Check columns in Life Expectancy DataFrame
print('\nLife Expectancy DataFrame columns:')
print(life_df.columns.tolist())

# Preview first few rows of each
print('\nGDP DataFrame head:')
print(gdp_df.head())

print('\nLife Expectancy DataFrame head:')
print(life_df.head())

GDP DataFrame columns:
['countryiso3code', 'date', 'value', 'unit', 'obs_status', 'decimal', 'indicator.id', 'indicator.value', 'country.id', 'country.value']

Life Expectancy DataFrame columns:
['countryiso3code', 'date', 'value', 'unit', 'obs_status', 'decimal', 'indicator.id', 'indicator.value', 'country.id', 'country.value']

GDP DataFrame head:
  countryiso3code  date        value unit obs_status  decimal  \
0             AFE  2024  3968.963751                        0   
1             AFE  2023  3948.142721                        0   
2             AFE  2022  3974.244214                        0   
3             AFE  2021  3933.580905                        0   
4             AFE  2020  3861.068816                        0   

        indicator.id                                    indicator.value  \
0  NY.GDP.PCAP.PP.KD  GDP per capita, PPP (constant 2021 internation...   
1  NY.GDP.PCAP.PP.KD  GDP per capita, PPP (constant 2021 internation...   
2  NY.GDP.PCAP.PP.KD  GDP per ca

In [255]:
# Rename 'value' to distinguish the indicators
gdp_df_col = gdp_df.rename(columns={'value': 'GDP_per_capita'})
life_df_col = life_df.rename(columns={'value': 'Life_expectancy'})

In [256]:
# Keep only necessary columns
gdp_df_final = gdp_df_col[['country.value', 'countryiso3code', 'date', 'GDP_per_capita']]
life_df_final = life_df_col[['countryiso3code', 'date', 'Life_expectancy']]

In [257]:
# Check gdp_df
print('GDP DataFrame:')
gdp_df_final

GDP DataFrame:


Unnamed: 0,country.value,countryiso3code,date,GDP_per_capita
0,Africa Eastern and Southern,AFE,2024,3968.963751
1,Africa Eastern and Southern,AFE,2023,3948.142721
2,Africa Eastern and Southern,AFE,2022,3974.244214
3,Africa Eastern and Southern,AFE,2021,3933.580905
4,Africa Eastern and Southern,AFE,2020,3861.068816
...,...,...,...,...
17285,Zimbabwe,ZWE,1964,
17286,Zimbabwe,ZWE,1963,
17287,Zimbabwe,ZWE,1962,
17288,Zimbabwe,ZWE,1961,


In [258]:
# Check life_df
print('Life Expectancy DataFrame:')
life_df_final

Life Expectancy DataFrame:


Unnamed: 0,countryiso3code,date,Life_expectancy
0,AFE,2024,
1,AFE,2023,65.146291
2,AFE,2022,64.487020
3,AFE,2021,62.979999
4,AFE,2020,63.766484
...,...,...,...
17285,ZWE,1964,55.431000
17286,ZWE,1963,54.942000
17287,ZWE,1962,54.453000
17288,ZWE,1961,53.966000


In [259]:
# Count the rows an inner merge on (countryiso3code, date) will produce.
# If the key pair were unique in both frames, this could not exceed 17290.
common_rows = pd.merge(
    gdp_df[['countryiso3code', 'date']],  # GDP key columns
    life_df[['countryiso3code', 'date']],  # Life Expectancy key columns
    on=['countryiso3code', 'date'],
    how='inner'
)

print(f"Number of rows that will be in the inner merge: {len(common_rows)}")

Number of rows that will be in the inner merge: 18590
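
A count of 18590 is larger than either input (17290 rows each), so some (`countryiso3code`, `date`) pairs must appear more than once. One explanation consistent with these numbers, offered as an assumption and not verified against the data here: 17290 rows over 65 years is 266 entities, and if 5 of them shared the same code (for example an empty string), each year would yield 5 × 5 = 25 matches instead of 5, adding 20 × 65 = 1300 rows. The sketch below reproduces the effect on made-up toy frames and shows one way to find the offending rows.

```python
import pandas as pd

keys = ['countryiso3code', 'date']

# Made-up toy frames: two aggregates share an empty code, one country has its own
left = pd.DataFrame({'countryiso3code': ['', '', 'ZWE'],
                     'date': ['2024', '2024', '2024'],
                     'GDP_per_capita': [1.0, 2.0, 3.0]})
right = pd.DataFrame({'countryiso3code': ['', '', 'ZWE'],
                      'date': ['2024', '2024', '2024'],
                      'Life_expectancy': [60.0, 61.0, 62.0]})

# Rows whose key pair occurs more than once in the same frame
dupes = left[left.duplicated(subset=keys, keep=False)]
print(len(dupes))     # 2

# The shared key matches 2 x 2 times, plus 1 match for ZWE
inflated = pd.merge(left, right, on=keys, how='inner')
print(len(inflated))  # 5
```

Running the same `duplicated` check on `gdp_df` and `life_df` would show which rows, if any, share a key.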


In [260]:
# Merge the 2 dataframes
merged_df = pd.merge(gdp_df_final, life_df_final, on=['countryiso3code', 'date'], how='inner')

# Check the result
print(f"Total rows in merged DataFrame: {len(merged_df)}")
merged_df

Total rows in merged DataFrame: 18590


Unnamed: 0,country.value,countryiso3code,date,GDP_per_capita,Life_expectancy
0,Africa Eastern and Southern,AFE,2024,3968.963751,
1,Africa Eastern and Southern,AFE,2023,3948.142721,65.146291
2,Africa Eastern and Southern,AFE,2022,3974.244214,64.487020
3,Africa Eastern and Southern,AFE,2021,3933.580905,62.979999
4,Africa Eastern and Southern,AFE,2020,3861.068816,63.766484
...,...,...,...,...,...
18585,Zimbabwe,ZWE,1964,,55.431000
18586,Zimbabwe,ZWE,1963,,54.942000
18587,Zimbabwe,ZWE,1962,,54.453000
18588,Zimbabwe,ZWE,1961,,53.966000
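
Because the merged frame has more rows than either input, it may be worth asking pandas to enforce key uniqueness. The sketch below uses the `validate` argument of `pd.merge` on made-up toy frames, then merges on `country.id` plus `date` instead. That `country.id` is unique per entity in the real data is an assumption to check, and the ids `A1` and `A2` are invented for the example.

```python
import pandas as pd

# Made-up toy frames: two aggregates with an empty ISO3 code but distinct ids
left = pd.DataFrame({'country.id': ['A1', 'A2', 'ZW'],
                     'countryiso3code': ['', '', 'ZWE'],
                     'date': ['2024'] * 3,
                     'GDP_per_capita': [1.0, 2.0, 3.0]})
right = pd.DataFrame({'country.id': ['A1', 'A2', 'ZW'],
                      'countryiso3code': ['', '', 'ZWE'],
                      'date': ['2024'] * 3,
                      'Life_expectancy': [60.0, 61.0, 62.0]})

# validate='one_to_one' raises MergeError when a key repeats on either side
try:
    pd.merge(left, right, on=['countryiso3code', 'date'], validate='one_to_one')
    merge_failed = False
except pd.errors.MergeError:
    merge_failed = True

# Merging on a key that is unique per entity keeps one row per entity and year
safe = pd.merge(left, right.drop(columns='countryiso3code'),
                on=['country.id', 'date'], how='inner', validate='one_to_one')
print(merge_failed, len(safe))  # True 3
```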


4. You can also get more information about the available countries (region, capital city, income level classification, etc.) by using the Country API. Use this API to pull in all available data. Merge this with your other datasets. Use this to now remove the rows that correspond to regions and not countries.
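
Below is a possible starting point for this step, as a sketch under stated assumptions. The Country API URL in the comment and the field names (`id`, `iso2Code`, `region`, `incomeLevel`, `capitalCity`, and the `'Aggregates'` region label) are written from memory of the API and should be checked against a real response. The country records are hand-written samples, and the two merged rows are copied from the output above.

```python
import pandas as pd

# Assumed call, following the same pattern as the indicator requests:
# requests.get('https://api.worldbank.org/v2/country?format=json&per_page=500').json()[1]
# Hand-written sample records in the shape that call is assumed to return
country_records = [
    {'id': 'AFE', 'iso2Code': 'ZH', 'name': 'Africa Eastern and Southern',
     'region': {'id': 'NA', 'value': 'Aggregates'},
     'incomeLevel': {'id': 'NA', 'value': 'Aggregates'}, 'capitalCity': ''},
    {'id': 'ZWE', 'iso2Code': 'ZW', 'name': 'Zimbabwe',
     'region': {'id': 'SSF', 'value': 'Sub-Saharan Africa'},
     'incomeLevel': {'id': 'LMC', 'value': 'Lower middle income'},
     'capitalCity': 'Harare'},
]
countries = pd.json_normalize(country_records).rename(columns={'id': 'countryiso3code'})

# Two rows copied from the merged output above
merged_sample = pd.DataFrame({
    'country.value': ['Africa Eastern and Southern', 'Zimbabwe'],
    'countryiso3code': ['AFE', 'ZWE'],
    'date': ['2023', '1961'],
    'GDP_per_capita': [3948.142721, None],
    'Life_expectancy': [65.146291, 53.966],
})

# Attach region, income level and capital city to every indicator row
with_info = merged_sample.merge(
    countries[['countryiso3code', 'region.value', 'incomeLevel.value', 'capitalCity']],
    on='countryiso3code', how='left')

# Rows whose region is 'Aggregates' describe regions or groups, not countries
countries_only = with_info[with_info['region.value'] != 'Aggregates']
print(countries_only['country.value'].tolist())  # ['Zimbabwe']
```

With a left merge, indicator rows that find no match in the country table keep a missing `region.value` and therefore pass this filter, so it may be worth checking for unmatched codes before dropping the aggregates.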