# APIs

Instead of downloading World Bank data as a CSV file, you're going to download the data using the World Bank APIs. The purpose of this exercise is to gain experience with another way of extracting data.

API stands for application programming interface. APIs provide a standardized way for two applications to talk to each other. In this case, the applications communicating with each other are the server application where the World Bank stores data and your Jupyter notebook.

If you wanted to pull data directly from the World Bank’s server, you’d have to know what database system the World Bank was using. You’d also need permission to log in directly to the server, which would be a security risk for the World Bank. And if the World Bank ever migrated its data to a new system, you would have to rewrite all of your code again. The API allows you to execute code on the World Bank server without getting direct access.

# Before there were APIs

Before there were APIs, there was web scraping. People would download HTML directly from a website and then parse the results programmatically. This practice is in a legal grey area. One reason that APIs became popular was so that companies could provide data to users and discourage web scraping.

Here are a few articles about the legality of web scraping.

* [QVC Can't Stop Web Scraping](https://www.forbes.com/sites/ericgoldman/2015/03/24/qvc-cant-stop-web-scraping/#120db59b3ca3)
* [Quora - Legality of Web Scraping](https://www.quora.com/What-is-the-legality-of-web-scraping)

All sorts of companies have public-facing APIs, including Facebook, Twitter, Google and Pinterest. You can pull data from these companies to create your own applications.

In this notebook, you’ll get practice using Python to pull data from the World Bank indicators API.

Here are links to information about the World Bank indicators and projects APIs if you want to learn more:
* [World Bank Indicators API](https://datahelpdesk.worldbank.org/knowledgebase/articles/898581-api-basic-call-structure)
* [World Bank Projects API](http://search.worldbank.org/api/v2/projects)

# Using APIs

In general, you access APIs via the web using a web address. Within the web address, you specify the data that you want. To know how to format the web address, you need to read an API's documentation. Some APIs also require that you send login credentials as part of your request. The World Bank APIs are public and do not require login credentials.

The Python requests library makes working with APIs relatively simple.

# Example Indicators API

Run the code example below to request data from the World Bank Indicators API. According to the documentation, you format your request URL like so:

`http://api.worldbank.org/v2/countries/` + list of country abbreviations separated by ; + `/indicators/` + indicator name + `?` + options

where options can include
* per_page - number of records to return per page
* page - which page to return, e.g. with 5000 records and 100 records per page there are 50 pages to choose from
* date - filter by dates
* format - json or xml

and a few other options that you can read about [here](https://datahelpdesk.worldbank.org/knowledgebase/articles/898581-api-basic-call-structure).
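The URL pattern above can also be assembled in code rather than typed by hand. The following is a minimal sketch using only the standard library; `build_indicator_url` and `BASE_URL` are hypothetical names introduced for illustration, and the base address is copied from the pattern above.

```python
from urllib.parse import urlencode

# Base address copied from the URL pattern in this notebook
BASE_URL = "http://api.worldbank.org/v2"


def build_indicator_url(countries, indicator, **options):
    """Assemble an Indicators API URL from its parts.

    Hypothetical helper for illustration; not part of any library.
    """
    # Country abbreviations are separated by ';'
    country_part = ";".join(countries)
    url = f"{BASE_URL}/countries/{country_part}/indicators/{indicator}"
    if options:
        # safe=":" leaves the colon in date ranges such as 1995:2001 unescaped
        url += "?" + urlencode(options, safe=":")
    return url


url = build_indicator_url(["br", "cn", "us", "de"], "SP.POP.TOTL",
                          format="json", per_page=1000)
print(url)
```

Keeping the options as keyword arguments makes it easy to add or drop `per_page`, `page`, `date` or `format` without editing a long string.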

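```python
import requests
import pandas as pd

# Total population (SP.POP.TOTL) for Brazil, China, the United States and Germany
url = 'http://api.worldbank.org/v2/countries/br;cn;us;de/indicators/SP.POP.TOTL/?format=json&per_page=1000'

# Send the request and parse the JSON body into Python objects
r = requests.get(url)
r.json()
```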

[{'page': 1,
  'pages': 1,
  'per_page': 1000,
  'lastupdated': '2018-11-14',
  'total': 232},
 [{'indicator': {'id': 'SP.POP.TOTL', 'value': 'Population, total'},
   'country': {'id': 'BR', 'value': 'Brazil'},
   'countryiso3code': 'BRA',
   'date': '2017',
   'value': 209288278,
   'unit': '',
   'obs_status': '',
   'decimal': 0},
  {'indicator': {'id': 'SP.POP.TOTL', 'value': 'Population, total'},
   'country': {'id': 'BR', 'value': 'Brazil'},
   'countryiso3code': 'BRA',
   'date': '2016',
   'value': 207652865,
   'unit': '',
   'obs_status': '',
   'decimal': 0},
  {'indicator': {'id': 'SP.POP.TOTL', 'value': 'Population, total'},
   'country': {'id': 'BR', 'value': 'Brazil'},
   'countryiso3code': 'BRA',
   'date': '2015',
   'value': 205962108,
   'unit': '',
   'obs_status': '',
   'decimal': 0},
  {'indicator': {'id': 'SP.POP.TOTL', 'value': 'Population, total'},
   'country': {'id': 'BR', 'value': 'Brazil'},
   'countryiso3code': 'BRA',
   'date': '2014',
   'value': 204213

This JSON data isn't quite ready for a pandas data frame. Notice that the JSON response is a list with two entries. The first entry is
```
{'page': 1,
  'pages': 1,
  'per_page': 1000,
  'lastupdated': '2018-11-14',
  'total': 232}
```

That first entry is metadata about the results. For example, it says that there is one page returned with 232 results.
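When a query returns more than one page, the `pages` field in the metadata tells you how many requests are needed. Below is a minimal sketch of that loop. `fetch_all_records` and `fetch_page` are hypothetical names, and `fake_fetch_page` is a stand-in for a real request so that the example runs offline. It assumes each page has the same two-entry shape described above.

```python
def fetch_all_records(fetch_page):
    """Collect the records from every page of a paged response.

    fetch_page is a hypothetical callable: given a page number, it returns
    the parsed JSON for that page as a two-entry list [metadata, records].
    """
    records = []
    page = 1
    while True:
        metadata, page_records = fetch_page(page)
        # Defensive: treat a missing record list as empty
        records.extend(page_records or [])
        # 'pages' is the total number of pages reported in the metadata
        if page >= metadata["pages"]:
            break
        page += 1
    return records


# Stand-in data so the sketch runs without a network: 5 records, 2 per page
fake_data = [{"date": str(year)} for year in range(2017, 2012, -1)]


def fake_fetch_page(page, per_page=2):
    start = (page - 1) * per_page
    pages = -(-len(fake_data) // per_page)  # ceiling division
    metadata = {"page": page, "pages": pages,
                "per_page": per_page, "total": len(fake_data)}
    return [metadata, fake_data[start:start + per_page]]


all_records = fetch_all_records(fake_fetch_page)
print(len(all_records))
```

With the `requests` library, `fetch_page` could be a small function that calls `requests.get(url, params={'page': page}).json()`, using the `page` option listed earlier.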

The second entry is another list containing the data. This data would need some cleaning to be used in a pandas data frame. That would happen later in the transformation step of an ETL pipeline. Run the cell below to read the results into a dataframe and see what happens.

```python
# Convert the JSON records into a dataframe.
# pd.read_json() is not needed here because r.json() already returns
# Python objects rather than a file or a string containing JSON.
pd.DataFrame(r.json()[1])
```

| | country | countryiso3code | date | decimal | indicator | obs_status | unit | value |
|---|---|---|---|---|---|---|---|---|
| 0 | {'id': 'BR', 'value': 'Brazil'} | BRA | 2017 | 0 | {'id': 'SP.POP.TOTL', 'value': 'Population, to... | | | 209288278 |
| 1 | {'id': 'BR', 'value': 'Brazil'} | BRA | 2016 | 0 | {'id': 'SP.POP.TOTL', 'value': 'Population, to... | | | 207652865 |
| 2 | {'id': 'BR', 'value': 'Brazil'} | BRA | 2015 | 0 | {'id': 'SP.POP.TOTL', 'value': 'Population, to... | | | 205962108 |
| 3 | {'id': 'BR', 'value': 'Brazil'} | BRA | 2014 | 0 | {'id': 'SP.POP.TOTL', 'value': 'Population, to... | | | 204213133 |
| 4 | {'id': 'BR', 'value': 'Brazil'} | BRA | 2013 | 0 | {'id': 'SP.POP.TOTL', 'value': 'Population, to... | | | 202408632 |
| 5 | {'id': 'BR', 'value': 'Brazil'} | BRA | 2012 | 0 | {'id': 'SP.POP.TOTL', 'value': 'Population, to... | | | 200560983 |
| 6 | {'id': 'BR', 'value': 'Brazil'} | BRA | 2011 | 0 | {'id': 'SP.POP.TOTL', 'value': 'Population, to... | | | 198686688 |
| 7 | {'id': 'BR', 'value': 'Brazil'} | BRA | 2010 | 0 | {'id': 'SP.POP.TOTL', 'value': 'Population, to... | | | 196796269 |
| 8 | {'id': 'BR', 'value': 'Brazil'} | BRA | 2009 | 0 | {'id': 'SP.POP.TOTL', 'value': 'Population, to... | | | 194895996 |
| 9 | {'id': 'BR', 'value': 'Brazil'} | BRA | 2008 | 0 | {'id': 'SP.POP.TOTL', 'value': 'Population, to... | | | 192979029 |


There are some issues with this dataframe. The country and indicator variables don't look particularly useful in their current form. Again, dealing with those issues would come in the transformation phase of a pipeline, which comes later in the lesson.
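As a preview of what that transformation might look like, here is a minimal sketch that expands the nested `country` and `indicator` dictionaries into separate fields. `flatten_record` is a hypothetical helper, and the `<key>_<subkey>` naming is an arbitrary choice made for this example, not something the API or pandas prescribes.

```python
def flatten_record(record):
    """Return a copy of one API record with nested dicts expanded.

    Hypothetical helper for illustration; not part of any library.
    """
    flat = {}
    for key, value in record.items():
        if isinstance(value, dict):
            # e.g. 'country': {'id': 'BR', ...} becomes 'country_id': 'BR', ...
            for sub_key, sub_value in value.items():
                flat[f"{key}_{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat


# One record copied from the response shown earlier in this notebook
sample = {'indicator': {'id': 'SP.POP.TOTL', 'value': 'Population, total'},
          'country': {'id': 'BR', 'value': 'Brazil'},
          'countryiso3code': 'BRA',
          'date': '2017',
          'value': 209288278,
          'unit': '',
          'obs_status': '',
          'decimal': 0}

print(flatten_record(sample))
```

Passing `[flatten_record(rec) for rec in r.json()[1]]` to `pd.DataFrame` would then give one plain column per field. pandas also provides `pd.json_normalize`, which performs a similar flattening.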

# Exercise Indicators API

Use the Indicators API to request rural population data for Switzerland in the years 1995 through 2001. Here are a few helpful resources:
* [documentation, including how to filter by year](https://datahelpdesk.worldbank.org/knowledgebase/articles/898581-api-basic-call-structure)
* [2-character iso country codes](https://www.nationsonline.org/oneworld/country_code_list.htm)
* [search box for World Bank indicators](https://data.worldbank.org)

To find the indicator code, first search for the indicator here: https://data.worldbank.org

Click on the indicator name. The indicator code is in the URL. For example, the indicator code for total population is SP.POP.TOTL, which you can see in the link [https://data.worldbank.org/indicator/SP.POP.TOTL](https://data.worldbank.org/indicator/SP.POP.TOTL).
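If you prefer to extract the code programmatically, the following sketch takes the last path segment of an indicator page URL. `indicator_code_from_url` is a hypothetical helper, and it assumes the code is always the final segment of the path, as in the example link.

```python
from urllib.parse import urlparse


def indicator_code_from_url(page_url):
    """Return the last path segment of an indicator page URL.

    Hypothetical helper for illustration; not part of any library.
    """
    # urlparse separates the path from any query string or fragment
    path = urlparse(page_url).path
    return path.rstrip("/").split("/")[-1]


print(indicator_code_from_url("https://data.worldbank.org/indicator/SP.POP.TOTL"))
```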

```python
# get the url ready
url = 'http://api.worldbank.org/v2/countries/ch/indicators/SP.RUR.TOTL/?format=json&date=1995:2001'

# send the request
r = requests.get(url)

# output the json using the json method like in the previous example
r.json()
```

[{'page': 1,
  'pages': 1,
  'per_page': 50,
  'lastupdated': '2018-11-14',
  'total': 7},
 [{'indicator': {'id': 'SP.RUR.TOTL', 'value': 'Rural population'},
   'country': {'id': 'CH', 'value': 'Switzerland'},
   'countryiso3code': 'CHE',
   'date': '2001',
   'value': 1924949,
   'unit': '',
   'obs_status': '',
   'decimal': 0},
  {'indicator': {'id': 'SP.RUR.TOTL', 'value': 'Rural population'},
   'country': {'id': 'CH', 'value': 'Switzerland'},
   'countryiso3code': 'CHE',
   'date': '2000',
   'value': 1912232,
   'unit': '',
   'obs_status': '',
   'decimal': 0},
  {'indicator': {'id': 'SP.RUR.TOTL', 'value': 'Rural population'},
   'country': {'id': 'CH', 'value': 'Switzerland'},
   'countryiso3code': 'CHE',
   'date': '1999',
   'value': 1897587,
   'unit': '',
   'obs_status': '',
   'decimal': 0},
  {'indicator': {'id': 'SP.RUR.TOTL', 'value': 'Rural population'},
   'country': {'id': 'CH', 'value': 'Switzerland'},
   'countryiso3code': 'CHE',
   'date': '1998',
   'value': 