<img src="https://www.uc3m.es/ss/Satellite?blobcol=urldata&blobkey=id&blobtable=MungoBlobs&blobwhere=1371573952659">


---


# **WEB ANALYTICS COURSE 4 - SEMESTER 2**
# **BACHELOR IN DATA SCIENCE AND ENGINEERING**

# **LAB 1 APIs - WORLD BANK**


## Group Members
* Ángela María Durán Pinto: 100472766
* Alejandro Leonardo García Navarro: 100472710
* Melania Guerra Ulloa: 100457522
* Francisco Javier Molina Tirado: 100456560

# 0. LAB PREPARATION

Students have to complete the following tasks before attending the lab:

1. **Read and study the API documentation to get an initial understanding of how the World Bank API works. The following links point to the relevant documentation:**
- https://datahelpdesk.worldbank.org/knowledgebase/articles/898581-api-basic-call-structures
- https://datahelpdesk.worldbank.org/knowledgebase/topics/125589-developer-information
- https://datahelpdesk.worldbank.org/knowledgebase/articles/889392-about-the-indicators-api-documentation

2. **The key elements of the World Bank API are the "indicators". The link below offers a search tool that simplifies finding them. Once you have selected an indicator, its code appears in the URL bar of the browser.**

- https://data.worldbank.org/indicator?tab=featured
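
The basic call structure described in the articles above can be sketched as a small URL builder. This is a minimal illustration rather than part of the official documentation: the helper name `build_indicator_url` is hypothetical, and `SP.POP.TOTL` (total population) and `ESP` are used only as example codes.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.worldbank.org/v2"

def build_indicator_url(country, indicator, **params):
    """Build an Indicators API URL (hypothetical helper, for illustration only)."""
    # Ask for JSON unless the caller overrides it; the API answers in XML by default.
    query = {"format": "json", **params}
    return f"{BASE_URL}/country/{country}/indicator/{indicator}?{urlencode(query)}"

url = build_indicator_url("ESP", "SP.POP.TOTL", date=2022)
print(url)
```

Extra query parameters such as `date` or `per_page` are passed as keyword arguments, which keeps the country and indicator codes separate from the query string.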

# **1. INTRODUCTION**

* The goal of this lab is to gain experience with a widely used API, the World Bank API, which provides a large amount of information on country indicators covering the economy, health, education, agriculture, etc.

* The lab includes 5 milestones that will drive the student through the use of several indicators.  

* The lab will be done in groups of 2-3 students.

* The lab will take two complete consecutive sessions (4 hours). Students are expected to complete the 5 milestones proposed in the lab within these 2 sessions.

* **The final mark will be computed as a function of the number of milestones successfully completed.**

* **Each group should also upload their lab notebook in the corresponding task in Aula Global.**

* Upon completing all the milestones, students should call the professor, who will check the correctness of the solution. Partial milestone checks may be allowed in some cases.

# 2. **MILESTONES**

In this section we describe the milestones one by one and leave space for students to implement the code that completes each task.

**NOTE: Unless otherwise stated, all the milestones have to deliver information about countries. Therefore, you should not consider regions or any other aggregated information in your analysis.**
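
One possible way to apply this restriction is sketched below. It assumes that each entry returned by the `/country` endpoint carries a `region` object and that aggregates are labelled with the region id `NA` ("Aggregates"). The sample entries are trimmed-down illustrations, not full API responses, and `is_country` is a hypothetical helper name.

```python
def is_country(entry):
    """Return True for real countries and False for aggregates."""
    # Assumption: aggregates (regions, income groups, "World") use region id "NA".
    return entry["region"]["id"] != "NA"

# Illustrative entries only; real responses contain many more fields.
sample_entries = [
    {"id": "ESP", "name": "Spain", "region": {"id": "ECS", "value": "Europe & Central Asia"}},
    {"id": "WLD", "name": "World", "region": {"id": "NA", "value": "Aggregates"}},
    {"id": "HIC", "name": "High income", "region": {"id": "NA", "value": "Aggregates"}},
]

countries_only = [entry["name"] for entry in sample_entries if is_country(entry)]
print(countries_only)
```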

# **2.1. MILESTONE 1: POPULATION**:
Retrieve the 2022 population of every country and show the Top 10 and Bottom 10 countries within the World Bank database.



In [None]:
!pip install wbgapi

In [None]:
import wbgapi as wb
indicators = wb.series.info()
print(indicators)

In [None]:
import requests
import pandas as pd

# Get the list of all countries. Aggregates (regions, income groups) have an
# empty capital city, so we keep only the entries that have one.
# per_page=400 avoids pagination: the API returns 50 entries per page by default.
country_url = "https://api.worldbank.org/v2/country?format=json&per_page=400"
response = requests.get(country_url)
response.raise_for_status()
countries_data = response.json()[1]
country_codes = {country['id']: country['name'] for country in countries_data if country['capitalCity']}

# Retrieve population data for each country for 2022
population_data = []
for code in country_codes:
    population_url = f"https://api.worldbank.org/v2/country/{code}/indicator/SP.POP.TOTL?date=2022&format=json"
    payload = requests.get(population_url).json()
    # Skip countries for which the API returns no data block
    if len(payload) < 2 or not payload[1]:
        continue
    population_data.append({
        'country_name': country_codes[code],
        'population': payload[1][0]['value']
    })

# Convert the population data to a DataFrame, drop countries without a 2022
# value (they would otherwise end up in the bottom 10) and sort it
df_population = (
    pd.DataFrame(population_data)
    .dropna(subset=['population'])
    .sort_values(by='population', ascending=False)
)

In [None]:
# Get the top 10 and bottom 10 countries by population
top_10_countries = df_population.head(10)
bottom_10_countries = df_population.tail(10)

# Print the results
print("Top 10 Countries by Population (2022):")
print(top_10_countries[['country_name', 'population']])

print("\nBottom 10 Countries by Population (2022):")
print(bottom_10_countries[['country_name', 'population']])

# **2.2. MILESTONE 2: WOMEN Vs. MEN POPULATION**:
Obtain the % of men and women for each country and compute the difference between them using the formula %women - %men. Display:

1- The number of countries with more women than men.

2- The number of countries with more men than women.

3- The 10 countries with more women compared to men (ten countries with the largest positive value of the previous metric).

4- The 10 countries with more men compared to women (ten countries with the largest negative value of the previous metric).

**Note**: You can either use the indicators that give the absolute number of men and women in the World Bank API and compute the % and the difference for each country yourself, or use the indicators that give the % directly.
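
If the absolute counts are used, the percentages and the difference can be derived as in the following minimal sketch. The function name `gender_shares` is hypothetical and the input figures are made up for illustration.

```python
def gender_shares(female_count, male_count):
    """Return (%women, %men, %women - %men), or None if a count is missing."""
    if female_count is None or male_count is None:
        return None
    total = female_count + male_count
    if total == 0:
        return None
    pct_women = 100 * female_count / total
    pct_men = 100 * male_count / total
    return pct_women, pct_men, pct_women - pct_men

# Made-up counts: 51 women and 49 men
print(gender_shares(51, 49))
```

Returning `None` for missing counts keeps countries without data out of the later counts and rankings instead of raising an error.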



In [None]:
# Build a DataFrame with the female and male share of the population for each country
def latest_value(code, indicator):
    """Most recent non-empty value of an indicator for a country, or None."""
    # mrnev=1 asks the API for the most recent non-empty value only
    url = f"https://api.worldbank.org/v2/country/{code}/indicator/{indicator}?format=json&mrnev=1"
    payload = requests.get(url).json()
    # The data block is missing or empty when the country has no data
    if len(payload) < 2 or not payload[1]:
        return None
    return payload[1][0]['value']

gender_data = []
for code, name in country_codes.items():
    gender_data.append({
        'country_name': name,
        'female_percentage': latest_value(code, 'SP.POP.TOTL.FE.ZS'),
        'male_percentage': latest_value(code, 'SP.POP.TOTL.MA.ZS')
    })

df_gender = pd.DataFrame(gender_data)

# Calculate the difference
df_gender['percentage_difference'] = df_gender['female_percentage'] - df_gender['male_percentage']
df_gender

In [None]:
## 1. The number of countries with more women than men
more_women_than_men_count = (df_gender['percentage_difference'] > 0).sum()

## 2. The number of countries with more men than women
more_men_than_women_count = (df_gender['percentage_difference'] < 0).sum()

## 3. The 10 countries with more women compared to men (ten countries with the largest positive value of the previous metric)
top_10_women_countries = df_gender.nlargest(10, 'percentage_difference')

## 4. The 10 countries with more men compared to women (ten countries with the largest negative value of the previous metric)
top_10_men_countries = df_gender.nsmallest(10, 'percentage_difference')

In [None]:
# Print the results
print(f"Number of countries with more women than men: {more_women_than_men_count}")
print(f"Number of countries with more men than women: {more_men_than_women_count}")

print("\nTop 10 Countries with More Women Compared to Men:")
print(top_10_women_countries[['country_name', 'female_percentage', 'male_percentage', 'percentage_difference']])

print("\nTop 10 Countries with More Men Compared to Women:")
print(top_10_men_countries[['country_name', 'female_percentage', 'male_percentage', 'percentage_difference']])

# **2.3. MILESTONE 3: GDP PER CAPITA BY INCOME-LEVEL GROUP**:

Compute the average percentage increase/decrease in GDP per capita (in US dollars) over the following two periods: 2000-2022 and 2010-2022. Do this for each of the following income groups: low-income economies, lower-middle-income economies, middle-income economies, upper-middle-income economies and high-income economies. The following link provides information on the different country aggregations defined by the World Bank.

https://datahelpdesk.worldbank.org/knowledgebase/articles/906519-world-bank-country-and-lending-groups

You should compute the %GDP increase as follows. Given country A with a GDP per capita of \$20000 in 2000 and \$30000 in 2022, the increase/decrease is:

%GDP increase = 100*(30000-20000)/20000 = 50%.
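
The formula above can be sketched in Python as follows. The function names are hypothetical and the figures are made up. The text leaves open whether the average is taken over the countries of each group or read from the group's aggregate series; this sketch assumes the former and skips pairs with missing data.

```python
def pct_change(start, end):
    """Percentage change from start to end, or None if it cannot be computed."""
    if start is None or end is None or start == 0:
        return None
    return 100 * (end - start) / start

def average_pct_change(pairs):
    """Average the percentage change over (start, end) pairs, skipping gaps."""
    changes = [c for c in (pct_change(s, e) for s, e in pairs) if c is not None]
    return sum(changes) / len(changes) if changes else None

# Worked example from the text: $20000 in 2000 and $30000 in 2022
print(pct_change(20000, 30000))

# Made-up group of three countries, one of them without a starting value
print(average_pct_change([(20000, 30000), (1000, 1500), (None, 500)]))
```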


In [None]:
#SOLUTION MILESTONE 3

# **2.4. MILESTONE 4: TOP 5 COUNTRIES BY GDP INCREASE PER INCOME GROUP**

For each of the income groups included in Milestone 3 and the period 2010-2022, list the Top 5 countries in terms of %GDP per capita increase, along with the value.

**NOTE**: Do not consider countries for which data is missing for 2010, 2022, or both.
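
A minimal sketch of the ranking step, assuming the data has already been gathered as `(group, country, gdp_2010, gdp_2022)` tuples. The tuple layout, the function name `top_n_by_group` and the figures are illustrative assumptions, not part of the lab statement.

```python
def top_n_by_group(records, n=5):
    """Rank countries by % GDP per capita increase within each income group."""
    by_group = {}
    for group, country, start, end in records:
        # Skip countries that lack data for either year, as the note requires.
        if start is None or end is None or start == 0:
            continue
        change = 100 * (end - start) / start
        by_group.setdefault(group, []).append((country, change))
    return {
        group: sorted(rows, key=lambda row: row[1], reverse=True)[:n]
        for group, rows in by_group.items()
    }

# Made-up figures for illustration
records = [
    ("High income", "A", 100.0, 150.0),
    ("High income", "B", 100.0, 120.0),
    ("High income", "C", None, 300.0),
    ("Low income", "D", 50.0, 100.0),
]
ranking = top_n_by_group(records, n=1)
print(ranking)
```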

In [None]:
#SOLUTION MILESTONE 4

# **2.5. MILESTONE 5: CO2 EMISSIONS PER CAPITA**

Retrieve the most recent non-empty value of CO2 emissions per capita (metric tons per capita) for all countries. Display the 30 countries with the highest CO2 emissions per capita, along with the value and the year it refers to.

**NOTE**: You may not look up the year manually and hard-code it in your query for this milestone.
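
One way to avoid hard-coding the year is to pick the latest entry with a value from each country's series. The sketch below assumes each row is a dict with a `date` (year as a string) and a `value` key, as in the JSON returned by the API. The function name and the sample figures are made up for illustration.

```python
def most_recent_non_empty(rows):
    """Return (year, value) of the latest row that has a value, or None."""
    filled = [row for row in rows if row["value"] is not None]
    if not filled:
        return None
    latest = max(filled, key=lambda row: int(row["date"]))
    return int(latest["date"]), latest["value"]

# Made-up series: the two most recent years have not been published yet
sample = [
    {"date": "2022", "value": None},
    {"date": "2021", "value": None},
    {"date": "2020", "value": 5.1},
    {"date": "2019", "value": 5.4},
]
print(most_recent_non_empty(sample))
```

The API documentation linked in Section 0 also describes an `mrnev` query parameter that requests the most recent non-empty values on the server side, which may be a simpler alternative.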


In [None]:
#SOLUTION MILESTONE 5