# Exploring APIs and data structures with Jupyter notebooks

Recently a colleague shared a very useful technique for exploring Web APIs with me: [Jupyter notebooks](https://jupyter.org/).

Previously I used Bash scripts and [curl](https://curl.haxx.se/) for tasks like this, while other colleagues preferred GUI tools like [Postman](https://www.postman.com/).

Jupyter has some advantages:

* You can use powerful Python libraries
  * [Requests](https://requests.readthedocs.io/en/master/) library for HTTP requests
  * [Pandas](https://pandas.pydata.org/) library for data analysis
* You get documentation to share with your co-workers (and your future self)
  * GitHub will [render Jupyter notebooks](https://help.github.com/en/github/managing-files-in-a-repository/working-with-jupyter-notebook-files-on-github) as static HTML
  * You can include images
  
## Setting up

To get started, install the JupyterLab package:

> pip install jupyterlab

Depending on when you read this, you might have to check whether `pip` belongs to a Python 3.x installation or still to the [legacy Python 2.7](https://pythonclock.org/) one. On my machine I had to use `pip3`. If that's the case for you, the Python executable is most likely named `python3` as well.
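
If you are unsure which interpreter a command points to, a quick check from within Python itself can help. The following is a minimal sketch using only the standard library; run it with whichever executable (`python` or `python3`) you plan to use for Jupyter:

```python
import sys

# Path of the interpreter that is running this snippet
print(sys.executable)

# Major and minor version as a string, e.g. "3.8"
version = '{}.{}'.format(sys.version_info.major, sys.version_info.minor)
print(version)

# True if this is a Python 3 interpreter
is_python3 = sys.version_info.major >= 3
print(is_python3)
```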

Next you can start Jupyter:

> python -m jupyterlab

## Getting started with Requests

The first library I want to introduce is [Requests](https://requests.readthedocs.io/en/master/), Python's de facto standard HTTP library. It is a third-party package, not part of the standard library.

In [1]:
import requests

Let's request something simple to try out Requests:

In [20]:
response = requests.request('GET', 'http://httpbin.org/json')
response.status_code

200
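
When exploring an unfamiliar API, it can help to fail loudly on error responses instead of inspecting `status_code` by hand. The helper below is only a sketch: the name `get_json` and the 10 second default timeout are assumptions made for illustration, while `requests.get`, `raise_for_status()` and `json()` are regular Requests calls.

```python
import requests

def get_json(url, timeout=10, **kwargs):
    """Fetch `url` and return the decoded JSON body.

    Hypothetical helper: the name and the default timeout are assumptions.
    """
    response = requests.get(url, timeout=timeout, **kwargs)
    # Raises requests.HTTPError for 4xx and 5xx responses
    response.raise_for_status()
    return response.json()
```

Calling `get_json('http://httpbin.org/json')` should return the decoded JSON document of the request above, assuming the service is reachable.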

In [21]:
response.headers

{'Date': 'Mon, 24 Feb 2020 05:23:02 GMT', 'Content-Type': 'application/json', 'Content-Length': '429', 'Connection': 'keep-alive', 'Server': 'gunicorn/19.9.0', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Credentials': 'true'}
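
The headers object looks like a plain dictionary, but Requests returns a case-insensitive mapping, so lookups work regardless of how the server capitalised the header name. A small offline sketch using `requests.structures.CaseInsensitiveDict` with two of the header values from the output above:

```python
from requests.structures import CaseInsensitiveDict

# Same kind of mapping that `response.headers` returns
headers = CaseInsensitiveDict({
    'Content-Type': 'application/json',
    'Content-Length': '429',
})

# Lookups ignore the case of the header name
print(headers['content-type'])
print(headers.get('CONTENT-LENGTH'))
```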

In [28]:
print(response.json())

{'slideshow': {'author': 'Yours Truly', 'date': 'date of publication', 'slides': [{'title': 'Wake up to WonderWidgets!', 'type': 'all'}, {'items': ['Why <em>WonderWidgets</em> are great', 'Who <em>buys</em> WonderWidgets'], 'title': 'Overview', 'type': 'all'}], 'title': 'Sample Slide Show'}}


The output can be made a little prettier using the [`json.dumps` function](https://docs.python.org/3/library/json.html):

In [30]:
import json
print(json.dumps(response.json(), indent=2, sort_keys=True))

{
  "slideshow": {
    "author": "Yours Truly",
    "date": "date of publication",
    "slides": [
      {
        "title": "Wake up to WonderWidgets!",
        "type": "all"
      },
      {
        "items": [
          "Why <em>WonderWidgets</em> are great",
          "Who <em>buys</em> WonderWidgets"
        ],
        "title": "Overview",
        "type": "all"
      }
    ],
    "title": "Sample Slide Show"
  }
}


You can still get the curl version of your request by using the [curlify](https://github.com/ofw/curlify) package:

> pip install curlify

In [38]:
import curlify
print(curlify.to_curl(response.request))

curl -X GET -H 'Accept: */*' -H 'Accept-Encoding: gzip, deflate' -H 'Connection: keep-alive' -H 'User-Agent: python-requests/2.22.0' http://httpbin.org/json


## Using Pandas to explore JSON documents

In [40]:
import pandas
from pandas import json_normalize

In [41]:
import os

def mapbox(url):
    # The access token is read from the MAPBOX_TOKEN environment variable
    params = {'access_token': os.environ['MAPBOX_TOKEN']}
    r = requests.request('GET', 'https://api.mapbox.com' + url, params=params)
    return r.json()

coord = '13.4034,52.542' # Berlin

res = mapbox('/geocoding/v5/mapbox.places/{}.json'.format(coord))
df = json_normalize(res['features'])
df.shape

(5, 16)
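
The `params` dictionary is what ends up as the query string of the request. To see the final URL without sending anything (and without needing a real token), you can prepare the request by hand. This is a sketch: `YOUR_TOKEN` is a placeholder, not a working credential.

```python
import requests

# Build the request without sending it; the token is a placeholder
req = requests.Request(
    'GET',
    'https://api.mapbox.com/geocoding/v5/mapbox.places/13.4034,52.542.json',
    params={'access_token': 'YOUR_TOKEN'},
)
prepared = req.prepare()

# The prepared request carries the fully encoded URL
print(prepared.method)
print(prepared.url)
```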

In [8]:
df.columns

Index(['id', 'type', 'place_type', 'relevance', 'text', 'place_name', 'center',
       'context', 'properties.landmark', 'properties.address',
       'properties.category', 'geometry.coordinates', 'geometry.type', 'bbox',
       'properties.short_code', 'properties.wikidata'],
      dtype='object')

In [68]:
df[['place_name', 'place_type', 'properties.wikidata']]

| | place_name | place_type | properties.wikidata |
|---|---|---|---|
| 0 | Basketballplatz Mauerpark, Bernauer Straße 51,... | [poi] | |
| 1 | Prenzlauer Berg, 10437, Berlin, Germany | [locality] | |
| 2 | 10437, Berlin, Germany | [postcode] | |
| 3 | Berlin, Germany | [region, place] | Q64 |
| 4 | Germany | [country] | Q183 |
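
The dotted column names such as `properties.wikidata` come from `json_normalize` flattening nested objects. A minimal offline sketch with made-up records (the values below are illustrative, not actual Mapbox output) shows the effect:

```python
from pandas import json_normalize

# Hand-written sample records, loosely shaped like geocoding features
features = [
    {'text': 'Berlin', 'properties': {'wikidata': 'Q64'},
     'geometry': {'type': 'Point'}},
    {'text': '10437', 'geometry': {'type': 'Point'}},
]

df = json_normalize(features)

# Nested keys are joined with '.', missing values become NaN
print(sorted(df.columns))
print(df['properties.wikidata'].isna().tolist())
```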


In [42]:
def wikidata(entry):
    res = requests.request('GET', 'https://www.wikidata.org/wiki/Special:EntityData/{}.json'.format(entry))
    return res.json()['entities'][entry]

df = json_normalize(wikidata('Q64'))
df.T

| | 0 |
|---|---|
| pageid | 190 |
| ns | 0 |
| title | Q64 |
| lastrevid | 1121078619 |
| modified | 2020-02-22T17:04:18Z |
| ... | ... |
| sitelinks.zhwikivoyage.url | https://zh.wikivoyage.org/wiki/%E6%9F%8F%E6%9E%97 |
| sitelinks.zuwiki.site | zuwiki |
| sitelinks.zuwiki.title | IBerlini |
| sitelinks.zuwiki.badges | [] |
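
Fully flattening a Wikidata entity produces a very long list of fields in the transposed view. One way to keep the overview manageable is the `max_level` argument of `json_normalize`, which stops flattening at a given depth. The entity below is a heavily shortened, hand-written stand-in for the real response, so treat it as an assumption about the shape rather than actual data:

```python
from pandas import json_normalize

# Shortened stand-in for a Wikidata entity (not the full response)
entity = {
    'title': 'Q64',
    'labels': {'en': {'language': 'en', 'value': 'Berlin'}},
    'sitelinks': {'zuwiki': {'site': 'zuwiki', 'title': 'IBerlini'}},
}

# Default: flatten every level of nesting
full = json_normalize(entity)
print(sorted(full.columns))

# max_level=1: flatten only one level, deeper objects stay as dicts
shallow = json_normalize(entity, max_level=1)
print(sorted(shallow.columns))
```

With the shallow frame, a cell such as `shallow.loc[0, 'labels.en']` still holds a dictionary that you can inspect or normalize separately.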
