Use the API to get all available data for the GDP per capita, PPP (constant 2017 international $) indicator. Hint: this indicator has code "NY.GDP.PCAP.PP.KD". Adjust the query parameters so that you can retrieve all available rows. Convert the results to a DataFrame.

In [44]:
import requests

In [45]:
import pandas as pd

In [46]:
url = "http://api.worldbank.org/v2/country/all/indicator/NY.GDP.PCAP.PP.KD"

In [47]:
base_params = {
    "format": "json",
    "per_page": 500,
    "page": 1
}

This cell makes the first API call and reads the total number of pages from the response metadata. base_params is a dictionary of query parameters that requests adds to the URL: format=json asks for a JSON response, per_page=500 sets the page size, and page=1 requests the first page.
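To see roughly what URL this request produces, the sketch below builds the query string with the standard library's urllib.parse.urlencode. It assumes requests encodes simple parameters the same way (it does for plain strings and numbers like these, but special characters could be encoded differently), so treat it as an illustration rather than the exact URL requests sends.

```python
from urllib.parse import urlencode

# Same URL and parameters as in the notebook cells above
url = "http://api.worldbank.org/v2/country/all/indicator/NY.GDP.PCAP.PP.KD"
base_params = {"format": "json", "per_page": 500, "page": 1}

# urlencode joins the dictionary into key=value pairs separated by "&"
full_url = f"{url}?{urlencode(base_params)}"
print(full_url)
```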

In [7]:
response = requests.get(url, params=base_params)
data = response.json()
total_pages = data[0]['pages']
print("Total number of pages:", total_pages)

Total number of pages: 35


The API returns a two-element list: data[0] holds the metadata for the request and data[1] holds the records themselves. data[0]['pages'] looks up the 'pages' key in that metadata, which gives the total number of pages.
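The page count is consistent with the number of records retrieved later in this notebook (17,024 at 500 per page). The sketch below assumes the API computes 'pages' as the record total divided by per_page, rounded up; the metadata also appears to carry a 'total' key with the record count, which is worth checking against your own response.

```python
import math

# Figures from this notebook: 17,024 records in total, 500 records per page
total_records = 17024
per_page = 500

# Every page except possibly the last is full, so round the division up
pages = math.ceil(total_records / per_page)

# The last page only holds the leftover records
last_page_size = total_records - (pages - 1) * per_page

print(pages)           # 35
print(last_page_size)  # 24
```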

In [9]:
all_data = []

In [10]:
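for page in range(1, total_pages + 1):
    # Request one page of up to 500 records
    params = {
        "format": "json",
        "per_page": 500,
        "page": page
    }
    response = requests.get(url, params=params)
    response.raise_for_status()  # stop early if the request failed
    data = response.json()
    # data[1] is the list of records on this page
    all_data.extend(data[1])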

range() generates the page numbers from 1 to total_pages (35 here); the + 1 is needed because range() excludes its upper bound. On each pass, params requests one page of up to 500 records. data[1] is the list of records on that page, and extend() adds each of those records to all_data individually. Using extend() instead of append() keeps all_data a flat list of records rather than a list of per-page lists.
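A small illustration of the difference between append() and extend(), using two made-up pages shaped like data[1] (the records are hypothetical and trimmed to a single field):

```python
# Two hypothetical pages of records, shaped like data[1] from the API
page_1 = [{"date": "2023"}, {"date": "2022"}]
page_2 = [{"date": "2021"}]

nested = []
flat = []
for page_records in (page_1, page_2):
    nested.append(page_records)  # adds the whole page as ONE element
    flat.extend(page_records)    # adds each record as its own element

print(len(nested))  # 2 (a list of two lists)
print(len(flat))    # 3 (a list of three records)
```

Only the flat version can be passed straight to pd.DataFrame() to get one row per record.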

In [12]:
df = pd.DataFrame(all_data)
print("Total records retrieved:", len(df))

Total records retrieved: 17024


The string "Total records retrieved:" is a label that print() displays before the number of rows in df.

In [14]:
df.head()

,indicator,country,countryiso3code,date,value,unit,obs_status,decimal
0,"{'id': 'NY.GDP.PCAP.PP.KD', 'value': 'GDP per ...","{'id': 'ZH', 'value': 'Africa Eastern and Sout...",AFE,2023,3967.860937,,,0
1,"{'id': 'NY.GDP.PCAP.PP.KD', 'value': 'GDP per ...","{'id': 'ZH', 'value': 'Africa Eastern and Sout...",AFE,2022,3974.803045,,,0
2,"{'id': 'NY.GDP.PCAP.PP.KD', 'value': 'GDP per ...","{'id': 'ZH', 'value': 'Africa Eastern and Sout...",AFE,2021,3934.287267,,,0
3,"{'id': 'NY.GDP.PCAP.PP.KD', 'value': 'GDP per ...","{'id': 'ZH', 'value': 'Africa Eastern and Sout...",AFE,2020,3861.111238,,,0
4,"{'id': 'NY.GDP.PCAP.PP.KD', 'value': 'GDP per ...","{'id': 'ZH', 'value': 'Africa Eastern and Sout...",AFE,2019,4073.880522,,,0
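The indicator and country columns above hold dictionaries rather than plain text. One way to pull a single field out of them is Series.map() with a small helper. The helper name dict_value and the two sample rows are hypothetical (the country names and codes come from the Country API preview later in this notebook); this is a sketch, not part of the original workflow.

```python
import pandas as pd

INDICATOR = {"id": "NY.GDP.PCAP.PP.KD",
             "value": "GDP per capita, PPP (constant 2017 international $)"}

# Hypothetical rows shaped like the API records above (value column omitted)
sample = pd.DataFrame([
    {"indicator": INDICATOR, "country": {"id": "AW", "value": "Aruba"},
     "countryiso3code": "ABW", "date": "2023"},
    {"indicator": INDICATOR, "country": {"id": "AF", "value": "Afghanistan"},
     "countryiso3code": "AFG", "date": "2023"},
])

def dict_value(cell, key="value"):
    """Return cell[key] when cell is a dict, otherwise None."""
    return cell.get(key) if isinstance(cell, dict) else None

# Extract plain-text columns from the dictionary columns
sample["country_name"] = sample["country"].map(dict_value)
sample["indicator_id"] = sample["indicator"].map(lambda d: dict_value(d, "id"))

print(sample[["countryiso3code", "country_name", "indicator_id"]])
```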


2. Use the API to get all available data for the Life expectancy at birth, total (years) indicator (code "SP.DYN.LE00.IN").

In [16]:
url = "http://api.worldbank.org/v2/country/all/indicator/SP.DYN.LE00.IN"

In [17]:
data_params = {
    "format": "json",
    "per_page": 500,
    "page": 1
}

In [18]:
# Send the request with the query parameters; requests returns a Response object
response = requests.get(url, params=data_params)
# Parse the JSON body into Python objects: a list of [metadata, records]
life_expectancy = response.json()
# life_expectancy[0] is the metadata, which includes the total number of pages
all_pages = life_expectancy[0]['pages']
print("Number of pages:", all_pages)

Number of pages: 35


In [19]:
le_data = []

In [20]:
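for page in range(1, all_pages + 1):
    data_params["page"] = page  # reuse the same parameters, changing only the page number
    response = requests.get(url, params=data_params)
    life_expectancy = response.json()
    le_data.extend(life_expectancy[1])  # add this page's records to the flat list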
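The download loop is now written twice, once per indicator. A sketch of a reusable helper is below. The function name fetch_indicator and its get_json parameter are hypothetical: get_json(url, params) is assumed to return the parsed [metadata, records] list, which keeps the helper independent of any HTTP library. Unlike the loops above, it reuses the first response instead of requesting page 1 twice.

```python
import pandas as pd

BASE_URL = "http://api.worldbank.org/v2/country/all/indicator/"

def fetch_indicator(indicator_code, get_json, per_page=500):
    """Download every page for one indicator and return a DataFrame.

    get_json(url, params) must return the parsed JSON [metadata, records].
    """
    url = BASE_URL + indicator_code
    params = {"format": "json", "per_page": per_page, "page": 1}

    # The first response gives both the page count and the first page of records
    metadata, records = get_json(url, params)
    all_records = list(records or [])

    # Fetch the remaining pages, if any
    for page in range(2, metadata["pages"] + 1):
        params["page"] = page
        all_records.extend(get_json(url, params)[1] or [])

    return pd.DataFrame(all_records)
```

With requests, usage might look like `el_df = fetch_indicator("SP.DYN.LE00.IN", lambda u, p: requests.get(u, params=p).json())`.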

In [21]:
el_df = pd.DataFrame(le_data)

In [22]:
el_df.head()

,indicator,country,countryiso3code,date,value,unit,obs_status,decimal
0,"{'id': 'SP.DYN.LE00.IN', 'value': 'Life expect...","{'id': 'ZH', 'value': 'Africa Eastern and Sout...",AFE,2023,,,,0
1,"{'id': 'SP.DYN.LE00.IN', 'value': 'Life expect...","{'id': 'ZH', 'value': 'Africa Eastern and Sout...",AFE,2022,62.888463,,,0
2,"{'id': 'SP.DYN.LE00.IN', 'value': 'Life expect...","{'id': 'ZH', 'value': 'Africa Eastern and Sout...",AFE,2021,62.449093,,,0
3,"{'id': 'SP.DYN.LE00.IN', 'value': 'Life expect...","{'id': 'ZH', 'value': 'Africa Eastern and Sout...",AFE,2020,63.309794,,,0
4,"{'id': 'SP.DYN.LE00.IN', 'value': 'Life expect...","{'id': 'ZH', 'value': 'Africa Eastern and Sout...",AFE,2019,63.754752,,,0
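The 2023 row above has no life expectancy value. The API appears to return null for missing observations, which pandas stores as NaN. A sketch of counting and dropping those rows, using three values from the table above in a hypothetical mini DataFrame:

```python
import pandas as pd

# Values from the preview above; None stands in for the API's null
sample = pd.DataFrame({
    "date": ["2023", "2022", "2021"],
    "value": [None, 62.888463, 62.449093],
})

# isna() marks missing entries; sum() counts the True values
missing = sample["value"].isna().sum()

# dropna(subset=...) keeps only rows where "value" is present
complete = sample.dropna(subset=["value"])

print(missing)        # 1
print(len(complete))  # 2
```

Whether to drop these rows or keep them (for example with an outer merge) depends on the later analysis.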


In [23]:
print("Total life expectancy records:", len(el_df))

Total life expectancy records: 17024


3. Merge the two result DataFrames together. You may want to rename or drop columns prior to merging.

In [25]:
df = df.rename(columns={'indicator': 'gdp_indicator'})
print("Updated GDP DataFrame columns:")
print(df.columns)

Updated GDP DataFrame columns:
Index(['gdp_indicator', 'country', 'countryiso3code', 'date', 'value', 'unit',
       'obs_status', 'decimal'],
      dtype='object')


rename() renames columns. Its columns argument is a dictionary, {'indicator': 'gdp_indicator'}, that maps each existing column name to its new name. rename() returns a new DataFrame instead of changing df in place, so I assign the result back to df to keep the change.
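A minimal sketch showing that one rename() call can handle several columns, and that the original DataFrame is left unchanged unless the result is assigned back. The one-row DataFrame named gdp is hypothetical.

```python
import pandas as pd

# Hypothetical one-row DataFrame with the original API column names
gdp = pd.DataFrame({"indicator": ["NY.GDP.PCAP.PP.KD"], "value": [3967.860937]})

# One call renames several columns; rename() returns a NEW DataFrame
renamed = gdp.rename(columns={"indicator": "gdp_indicator", "value": "gdp_value"})

print(list(gdp.columns))      # ['indicator', 'value'] (original unchanged)
print(list(renamed.columns))  # ['gdp_indicator', 'gdp_value']
```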

In [27]:
el_df = el_df.rename(columns={'indicator': 'life_expectancy_indicator'})
print("Updated life_expectancy DataFrame columns:")
print(el_df.columns)

Updated life_expectancy DataFrame columns:
Index(['life_expectancy_indicator', 'country', 'countryiso3code', 'date',
       'value', 'unit', 'obs_status', 'decimal'],
      dtype='object')


In [71]:
df = df.rename(columns={'value': 'gdp_value'})

In [73]:
el_df = el_df.rename(columns={'value': 'life_expectancy_value'})

In [75]:
df = df.drop(columns=['unit', 'obs_status'], errors='ignore')

In [81]:
el_df = el_df.drop(columns=['unit', 'obs_status'], errors='ignore')

In [83]:
el_df.head()

,life_expectancy_indicator,country,countryiso3code,date,life_expectancy_value,decimal
0,"{'id': 'SP.DYN.LE00.IN', 'value': 'Life expect...","{'id': 'ZH', 'value': 'Africa Eastern and Sout...",AFE,2023,,0
1,"{'id': 'SP.DYN.LE00.IN', 'value': 'Life expect...","{'id': 'ZH', 'value': 'Africa Eastern and Sout...",AFE,2022,62.888463,0
2,"{'id': 'SP.DYN.LE00.IN', 'value': 'Life expect...","{'id': 'ZH', 'value': 'Africa Eastern and Sout...",AFE,2021,62.449093,0
3,"{'id': 'SP.DYN.LE00.IN', 'value': 'Life expect...","{'id': 'ZH', 'value': 'Africa Eastern and Sout...",AFE,2020,63.309794,0
4,"{'id': 'SP.DYN.LE00.IN', 'value': 'Life expect...","{'id': 'ZH', 'value': 'Africa Eastern and Sout...",AFE,2019,63.754752,0


In [334]:
df['country'] = df['country'].astype(str)
df['countryiso3code'] = df['countryiso3code'].astype(str)
df['date'] = df['date'].astype(str)

In [336]:
el_df['country'] = el_df['country'].astype(str)
el_df['countryiso3code'] = el_df['countryiso3code'].astype(str)
el_df['date'] = el_df['date'].astype(str)

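These casts make sure the three key columns have the same type (str) in both DataFrames before merging, so rows match whenever their values are equal. The API already returns date as a string, so that cast is a safeguard. The country column holds dictionaries, which are unhashable and cannot be used as merge keys directly; converting them to strings makes the column usable as a key.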
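A sketch of an alternative, not what this notebook does: merge on countryiso3code and date alone, which are already plain strings, so the dictionary column does not need converting. validate="one_to_one" asks pandas to raise an error if a key pair appears more than once on either side. This assumes each (countryiso3code, date) pair is unique in both DataFrames; if some rows had an empty country code, that assumption could fail and validate would flag it. The two small DataFrames reuse values from the tables above.

```python
import pandas as pd

# Values from the previews above (AFE, 2023 and 2022)
gdp = pd.DataFrame({
    "countryiso3code": ["AFE", "AFE"],
    "date": ["2023", "2022"],
    "gdp_value": [3967.860937, 3974.803045],
})
life = pd.DataFrame({
    "countryiso3code": ["AFE", "AFE"],
    "date": ["2023", "2022"],
    "life_expectancy_value": [None, 62.888463],
})

# Merge on plain-string keys; raise if either side has duplicate key pairs
merged = pd.merge(gdp, life, on=["countryiso3code", "date"],
                  how="inner", validate="one_to_one")
print(merged)
```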

In [338]:
merged_df = pd.merge(df, el_df, on=['country', 'countryiso3code', 'date'], how='inner')

In [109]:
merged_df.head()

,gdp_indicator,country,countryiso3code,date,gdp_value,decimal_x,life_expectancy_indicator,life_expectancy_value,decimal_y
0,"{'id': 'NY.GDP.PCAP.PP.KD', 'value': 'GDP per ...","{'id': 'ZH', 'value': 'Africa Eastern and Sout...",AFE,2023,3967.860937,0,"{'id': 'SP.DYN.LE00.IN', 'value': 'Life expect...",,0
1,"{'id': 'NY.GDP.PCAP.PP.KD', 'value': 'GDP per ...","{'id': 'ZH', 'value': 'Africa Eastern and Sout...",AFE,2022,3974.803045,0,"{'id': 'SP.DYN.LE00.IN', 'value': 'Life expect...",62.888463,0
2,"{'id': 'NY.GDP.PCAP.PP.KD', 'value': 'GDP per ...","{'id': 'ZH', 'value': 'Africa Eastern and Sout...",AFE,2021,3934.287267,0,"{'id': 'SP.DYN.LE00.IN', 'value': 'Life expect...",62.449093,0
3,"{'id': 'NY.GDP.PCAP.PP.KD', 'value': 'GDP per ...","{'id': 'ZH', 'value': 'Africa Eastern and Sout...",AFE,2020,3861.111238,0,"{'id': 'SP.DYN.LE00.IN', 'value': 'Life expect...",63.309794,0
4,"{'id': 'NY.GDP.PCAP.PP.KD', 'value': 'GDP per ...","{'id': 'ZH', 'value': 'Africa Eastern and Sout...",AFE,2019,4073.880522,0,"{'id': 'SP.DYN.LE00.IN', 'value': 'Life expect...",63.754752,0
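The decimal_x and decimal_y columns appear because both DataFrames still had a decimal column that was not a merge key; by default pandas adds the suffixes _x and _y to overlapping columns. The sketch below, with hypothetical one-row DataFrames, shows the default, custom names via the suffixes parameter, and dropping the column from one side before merging.

```python
import pandas as pd

# Hypothetical frames that share a non-key column named "decimal"
left = pd.DataFrame({"key": ["AFE"], "decimal": [0]})
right = pd.DataFrame({"key": ["AFE"], "decimal": [0]})

default = pd.merge(left, right, on="key")
named = pd.merge(left, right, on="key", suffixes=("_gdp", "_life_expectancy"))
dropped = pd.merge(left, right.drop(columns=["decimal"]), on="key")

print(list(default.columns))  # ['key', 'decimal_x', 'decimal_y']
print(list(named.columns))    # ['key', 'decimal_gdp', 'decimal_life_expectancy']
print(list(dropped.columns))  # ['key', 'decimal']
```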


4. You can also get more information about the available countries (region, capital city, income level classification, etc.) by using the Country API. Use this API to pull in all available data. Merge this with your other datasets. Use this to now remove the rows that correspond to regions and not countries.

In [275]:
url = "https://api.worldbank.org/v2/country"

In [283]:
base_params = {
    "format": "json",
    "per_page": 500,  
    "page": 1  
}

In [285]:
response = requests.get(url, params=base_params)
data = response.json()

In [287]:
country_data_df = pd.DataFrame(data[1])

In [301]:
# Drop the 'region' and 'adminregion' columns if they exist.
# Note: this removes columns only; it does not remove any rows.
country_data_df = country_data_df.drop(columns=['region', 'adminregion'], errors='ignore')
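Removing aggregate rows has to happen while the region column is still present. As far as I understand the Country API, aggregates (regions, income groups and similar) carry a region entry whose value is "Aggregates"; verify this against your own response before relying on it. A sketch with hypothetical, simplified rows (only id and region kept); the helper name is_aggregate is also hypothetical:

```python
import pandas as pd

# Hypothetical, simplified rows shaped like Country API records
countries = pd.DataFrame([
    {"id": "ABW", "region": {"value": "Latin America & Caribbean"}},
    {"id": "AFE", "region": {"value": "Aggregates"}},
    {"id": "AFG", "region": {"value": "South Asia"}},
])

def is_aggregate(region):
    """True when the region entry marks the row as an aggregate."""
    return isinstance(region, dict) and region.get("value") == "Aggregates"

# Keep only rows that are NOT aggregates
countries_only = countries[~countries["region"].map(is_aggregate)]
print(countries_only["id"].tolist())  # ['ABW', 'AFG']
```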


In [303]:
print("Total records after dropping regions:", len(country_data_df))

Total records after dropping regions: 217


In [307]:
country_data_df.head()

,id,iso2Code,name,incomeLevel,lendingType,capitalCity,longitude,latitude
0,ABW,AW,Aruba,"{'id': 'HIC', 'iso2code': 'XD', 'value': 'High...","{'id': 'LNX', 'iso2code': 'XX', 'value': 'Not ...",Oranjestad,-70.0167,12.5167
2,AFG,AF,Afghanistan,"{'id': 'LIC', 'iso2code': 'XM', 'value': 'Low ...","{'id': 'IDX', 'iso2code': 'XI', 'value': 'IDA'}",Kabul,69.1761,34.5228
5,AGO,AO,Angola,"{'id': 'LMC', 'iso2code': 'XN', 'value': 'Lowe...","{'id': 'IBD', 'iso2code': 'XF', 'value': 'IBRD'}",Luanda,13.242,-8.81155
6,ALB,AL,Albania,"{'id': 'UMC', 'iso2code': 'XT', 'value': 'Uppe...","{'id': 'IBD', 'iso2code': 'XF', 'value': 'IBRD'}",Tirane,19.8172,41.3317
7,AND,AD,Andorra,"{'id': 'HIC', 'iso2code': 'XD', 'value': 'High...","{'id': 'LNX', 'iso2code': 'XX', 'value': 'Not ...",Andorra la Vella,1.5218,42.5075


I tried really hard to find a way to merge country_data_df with the other data, but I was unsuccessful today. I need to take a break and rethink this tomorrow. My brain is tired.
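One possible approach, sketched here rather than taken from this notebook: merged_df identifies countries by countryiso3code, while country_data_df uses id for the same three-letter code, so the merge needs left_on and right_on. Assuming country_data_df contains only countries (as its 217 rows after dropping regions suggest), an inner merge also removes the aggregate rows from merged_df, because codes such as AFE have no match. The small DataFrames below are hypothetical stand-ins built from values in the previews.

```python
import pandas as pd

# Hypothetical stand-in for merged_df (one aggregate row, two countries)
indicator_data = pd.DataFrame({
    "countryiso3code": ["AFE", "ABW", "AFG"],
    "date": ["2022", "2022", "2022"],
})

# Hypothetical stand-in for country_data_df (countries only)
country_info = pd.DataFrame({
    "id": ["ABW", "AFG"],
    "name": ["Aruba", "Afghanistan"],
    "capitalCity": ["Oranjestad", "Kabul"],
})

# Match countryiso3code on the left with id on the right;
# how="inner" keeps only codes present in both, dropping aggregates like AFE
combined = pd.merge(indicator_data, country_info,
                    left_on="countryiso3code", right_on="id", how="inner")
combined = combined.drop(columns=["id"])  # the code is already in countryiso3code

print(combined)
```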