# 3.13.31 APIs and JSON

### What is an API

An API, short for **Application Programming Interface**, is a software interface that **allows two applications to talk to each other**. Unlike a user interface, which connects a computer to a person, an application programming interface connects computers or pieces of software to each other. Although it is not meant to be used directly by the end user, data analysts and data scientists often call and test an API directly before integrating it into their program or analysis. 

<img src="img/what-is-api.png" width="600">

### API Analogy

Imagine that you own a store and you're on holiday at the beach. Although you should be relaxing, you're concerned about your business, so you ask a friend to call the store at the end of every day, find out how many customers made a purchase, and write that number on a blackboard next to your beach bed. 

This would be a real-world analogy of how an API works: 

- the two computers or pieces of software are the blackboard on your end (usually a browser, web app or script) and the store on the other side (usually a web server/database)
- your friend is the intermediary who is doing the work of the API in connecting the two: 
    - he sends a request
    - waits for a result
    - then writes the result on the board
- notice that you are not physically making the request: you have instructed your friend, just like you write instructions for your program 

### API example 

You could write a script that asks the end user for a city name and, using a Google Maps API, returns the latitude and longitude coordinates of that city. Let's see how this process would work: 

- The end user starts the program, which asks for a city name; the user types it into the prompt box
- The program receives the city name as input and, via the Google Maps API, sends a request to Google's web server
- Google's web server receives the request and queries its database to find the relevant result
- Meanwhile, the API is waiting for this result 
- Once the web server returns the result, the API sends it back to the requesting program (the one you wrote) 
- The resulting latitude and longitude coordinates are then shown to the end user 
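
The flow above can be sketched in a few lines of Python. This is only an illustration of the request/response pattern: `fake_geocoding_server` and its tiny lookup table are hypothetical stand-ins for Google's web server (no real Google Maps API call is made), and the coordinates are approximate.

```python
# Hypothetical stand-in for the remote web server and its database.
# A real API would be reached over the network; here it is a plain function.
CITY_COORDINATES = {
    "milan": (45.46, 9.19),   # approximate latitude, longitude
    "rome": (41.90, 12.50),
}


def fake_geocoding_server(city_name):
    """Play the role of the server: look up the city and build a response."""
    coordinates = CITY_COORDINATES.get(city_name.strip().lower())
    if coordinates is None:
        return {"status": "NOT_FOUND", "result": None}
    latitude, longitude = coordinates
    return {"status": "OK", "result": {"lat": latitude, "lng": longitude}}


def get_coordinates(city_name):
    """Play the role of your program: send the request, wait, read the result."""
    response = fake_geocoding_server(city_name)  # request goes out, response comes back
    if response["status"] != "OK":
        return None
    return response["result"]["lat"], response["result"]["lng"]


print(get_coordinates("Milan"))
print(get_coordinates("Atlantis"))
```

Note that the calling program never touches the lookup table directly: it only sees the response, which is the whole point of having an interface in between.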

Although there are [different types of APIs](https://en.wikipedia.org/wiki/API#Usage), we will focus on Web APIs. These are programs that use an internet address (like a URL) to provide access to their services. 

### Your first API call

Let's look at a practical case. The city of Milan has a rich [Open Data portal](https://dati.comune.milano.it/) where you can access its data in several ways, namely by: 
- downloading a file
- retrieving the data via an API call

When we say "API call", we mean that we are **sending a request** via the API (just as we sent our friend to phone the store in the previous analogy). 

For example, [at this page](https://dati.comune.milano.it/dataset/ds1573-gas-erogato-a2a-totale-giornaliero) we can find the **daily total gas delivered by A2A** (an Italian energy provider) in cubic meters. As you can see, there are **two data formats** available, csv and json. 

If we click on the csv option, we are redirected to [this page](https://dati.comune.milano.it/dataset/ds1573-gas-erogato-a2a-totale-giornaliero/resource/dbd7417a-41e8-4097-a1c4-67094983bab1), where we can **see the data in a table**. From there we can either download the data to our computer in a chosen format (csv, tsv, json or xml) via the red "Download" button, or **retrieve the data using their API** via the green "Data API" button. 

If you **click on the green button** that says `Data API`, a pop-up will be opened and you'll be able to see the URL that can be used to perform the API call *(note: this is also known as the [API endpoint](https://kinsta.com/knowledgebase/api-endpoint/#understanding-api-endpoints))* as well as some examples to get you started. For instance, if you **copy the following URL and paste it in your browser**, you will see the result of the API call in your browser, showing you the first 5 records of the data you were trying to retrieve ([click here](https://dati.comune.milano.it/api/3/action/datastore_search?resource_id=dbd7417a-41e8-4097-a1c4-67094983bab1&limit=5) to try it yourself!):

    https://dati.comune.milano.it/api/3/action/datastore_search?resource_id=dbd7417a-41e8-4097-a1c4-67094983bab1&limit=5
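
It may help to see how this URL is put together. The following is a minimal sketch using only Python's standard library: it splits the URL into the endpoint (the part before the `?`) and the query parameters (the `key=value` pairs after it), then rebuilds it with a different `limit`. The value `10` is just an illustrative choice.

```python
from urllib.parse import urlsplit, parse_qs, urlencode

url = (
    "https://dati.comune.milano.it/api/3/action/datastore_search"
    "?resource_id=dbd7417a-41e8-4097-a1c4-67094983bab1&limit=5"
)

parts = urlsplit(url)
endpoint = f"{parts.scheme}://{parts.netloc}{parts.path}"
params = parse_qs(parts.query)

print(endpoint)   # the address of the service
print(params)     # the query parameters, as a dictionary of lists

# Change one parameter and rebuild the URL
new_query = urlencode({"resource_id": params["resource_id"][0], "limit": 10})
new_url = f"{endpoint}?{new_query}"
print(new_url)
```

So `resource_id` identifies which dataset we want, while `limit` controls how many records come back.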

Now, before we proceed any further, there are **a couple of observations** to be made: 
1. The result we're obtaining looks a little messy... what are all those curly brackets?
2. Didn't we say that APIs are not meant to be used directly by the end user? 

Let's start by addressing the first point. 

### JSON

The strange, messy format of the result from our API call is called **JSON**, which is an acronym that stands for **JavaScript Object Notation**. 

According to [Wikipedia](https://en.wikipedia.org/wiki/JSON), *it is an open standard file format and data interchange format that uses human-readable text to store and transmit data objects consisting of attribute–value pairs and arrays*. 

Let's unpack the previous sentence: 
- it's a **file format** (just like a .csv file)
- it **stores data**
- the data is stored in **attribute-value** pairs

So, it's just another way of storing data, like a csv or a text file, but it stores it using a method that should be familiar to you: the attribute-value pair works just like the key-value pairs from Python's dictionaries that we have seen at the beginning of this Module. 
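
To make the parallel with dictionaries concrete, here is a small sketch using Python's built-in `json` module. The JSON text and its values are made up for illustration.

```python
import json

# A small JSON document, written as a Python string (made-up values)
json_text = '{"city": "Milano", "population_known": true, "districts": [1, 2, 3], "mayor": null}'

data = json.loads(json_text)      # JSON text -> Python objects
print(type(data))                 # a dictionary
print(data["city"])               # access by key, as with any dictionary
print(data["districts"][0])       # JSON arrays become Python lists

# JSON true/false/null become Python True/False/None
print(data["population_known"], data["mayor"])

back_to_text = json.dumps(data)   # Python objects -> JSON text
print(back_to_text)
```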

Now, the Wikipedia definition also said that JSON is human-readable and, looking at the API result, one could understandably argue against that statement. However, if you copy that result and paste it into a **JSON formatter** like [this one](https://jsonformatter.org/), you will see that **its readability improves** quite dramatically. Check out the example below to get a better idea of this particular data format: 

<img src="img/json.png" width="900">
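
The same kind of formatting can also be done locally, without an online tool. As a minimal sketch, Python's built-in `json` module can re-indent a compact JSON string; the string below is a made-up example, only loosely shaped like an API result.

```python
import json

# A compact, single-line JSON string (made-up values)
raw = '{"success": true, "result": {"fields": [{"id": "Data", "type": "date"}], "total": 1}}'

# Parse the text, then write it back with two spaces of indentation per level
pretty = json.dumps(json.loads(raw), indent=2)
print(pretty)
```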

Although to a human it is not as readable as a csv file, its **nested hierarchical structure** allows you to include all kinds of metadata next to the data itself. It takes time and practice to get used to this data format, so don't get discouraged if you're finding it harder than expected. 

Check out [this article](https://realpython.com/python-json/) to find out more about JSON data structures in relation to Python. 

Let's now address the second observation from the previous section: APIs are meant to be used by a program, not to be typed in the address bar of a web browser. In the next section we'll look at how we can **make an API call in Python**. 

### API calls in Python

#### City of Milan API

Now we have all the necessary pieces to make our first API call in Python. To do that, we'll import the `requests` [library](https://requests.readthedocs.io/en/latest/), whose goal is to make **HTTP requests in Python** simpler and more human-friendly (as the tagline in the library's logo also states). 

In [None]:
import numpy as np
import pandas as pd

%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns

import requests

In [None]:
# save the URL string (without the limit=5 parameter) to a "url" object
url = 'https://dati.comune.milano.it/api/3/action/datastore_search?resource_id=dbd7417a-41e8-4097-a1c4-67094983bab1'
#url = 'https://dati.comune.milano.it/api/3/action/datastore_search?resource_id=dbd7417a-41e8-4097-a1c4-67094983bab1&limit=1000'

The **GET** [HTTP method](https://www.w3schools.com/tags/ref_httpmethods.asp) allows you to request data from a specified resource or server. In the `requests` library this can be achieved via the `get()` function: 

In [None]:
# make the HTTP request
r = requests.get(url)
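
As an alternative to writing the query string by hand, `requests` accepts the parameters as a dictionary through its `params` argument. The sketch below builds the request without sending it, so that the resulting URL can be inspected; the `limit` value of 5 and the 10-second `timeout` in the commented line are illustrative choices.

```python
import requests

endpoint = "https://dati.comune.milano.it/api/3/action/datastore_search"
params = {
    "resource_id": "dbd7417a-41e8-4097-a1c4-67094983bab1",
    "limit": 5,
}

# Build the request without sending it, to inspect the final URL
prepared = requests.Request("GET", endpoint, params=params).prepare()
print(prepared.url)

# Sending it would look like this (left commented out: it needs network access)
# r = requests.get(endpoint, params=params, timeout=10)
```

Keeping the parameters in a dictionary makes it easier to change one of them later without editing a long URL string.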

Every time you send a request, you expect some data in return. Together with your data, **you will also receive a status code** telling you whether the request was successful. Always check the status of your request via the `.status_code` attribute and **verify the meaning** of that status. A status of 200 means that the request was successful; check [this page](https://en.wikipedia.org/wiki/List_of_HTTP_status_codes#1xx_informational_response) for a complete list of status codes. 

In [None]:
# check the status code of your HTTP request
r.status_code
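
If you don't remember what a status code means, Python's standard library can tell you. This is a small sketch using `http.HTTPStatus`; the helper `is_success` is a hypothetical name introduced here for illustration.

```python
from http import HTTPStatus

# Look up the standard reason phrase for a few common status codes
for code in (200, 404, 500):
    status = HTTPStatus(code)
    print(code, status.phrase)


def is_success(code):
    """Return True for 2xx status codes, which signal a successful request."""
    return 200 <= code < 300


print(is_success(200), is_success(404))
```

With `requests`, you can also call `r.raise_for_status()`, which raises an exception when the response carries a 4xx or 5xx status code.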

If you're dealing with JSON data, you can **explore the contents of your result** using the `.json()` method: 

In [None]:
r.json()

We can verify that this JSON object is in fact interpreted by Python as a dictionary: 

In [None]:
type(r.json())

And, since it is a dictionary, it can be accessed as one by specifying its keys via the [ ] operator: 

In [None]:
r.json()['result']['fields']

In the cell above we indexed into the dictionary until we reached the `fields` key, whose value is a list of dictionaries. Therefore, we can loop through each dictionary in this list and print the value stored under its `id` key, which is a column name of the data that we retrieved using the API: 

In [None]:
for el in r.json()['result']['fields']: 
    print(el['id'])
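
A small refinement, sketched here on a hypothetical, shortened response: since every call to `r.json()` parses the response body again, it is common to store the parsed result in a variable once (for example `data = r.json()`) and reuse it. The same loop can also be written as a list comprehension. The field types below are made up.

```python
# Hypothetical, shortened response with a "fields" list shaped like the one above
data = {
    "success": True,
    "result": {
        "fields": [
            {"id": "Data", "type": "text"},
            {"id": "Totale (Smc)", "type": "numeric"},
        ],
    },
}

fields = data["result"]["fields"]                  # navigate once, reuse the result
column_names = [field["id"] for field in fields]   # same loop, as a list comprehension
print(column_names)
```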

Luckily, we don't need to loop through nested dictionaries to get the data in a useful format. If we access the `records` key, we can see that its value is a list of dictionaries (one for each row returned), each containing a set of key-value pairs (one for each column in the dataset). 

In [None]:
r.json()['result']['records'][0:3]

In [None]:
len(r.json()['result']['records'])

Thanks to `pandas`, we can build a DataFrame from this list of dictionaries using its `DataFrame.from_dict()` method: 

In [None]:
df = pd.DataFrame.from_dict(r.json()['result']['records'])
df
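
To see what happens in this step without calling the API, here is a minimal sketch on hypothetical records (made-up values, same list-of-dictionaries shape). It passes the list straight to the `pd.DataFrame` constructor, which also accepts a list of dictionaries.

```python
import pandas as pd

# Hypothetical records in the same list-of-dictionaries shape (made-up values)
records = [
    {"Data": "2021-01-01", "Totale (Smc)": 100},
    {"Data": "2021-01-02", "Totale (Smc)": 120},
    {"Data": "2021-01-03", "Totale (Smc)": 90},
]

sample_df = pd.DataFrame(records)   # one row per dictionary, one column per key
print(sample_df)
print(sample_df.shape)
```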

Let's quickly convert the `Data` column (Italian for "date") to a datetime type and plot the time series using `seaborn`: 

In [None]:
df['Data'] = pd.to_datetime(df['Data'], format="%Y-%m-%d")
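
The `format` string uses the standard strftime directives. As a quick sketch with the standard library's `datetime` module (the date strings are made up), this is how `"%Y-%m-%d"` is read:

```python
from datetime import datetime

# %Y = four-digit year, %m = two-digit month, %d = two-digit day
parsed = datetime.strptime("2021-03-15", "%Y-%m-%d")
print(parsed.year, parsed.month, parsed.day)

# A string that does not match the format raises a ValueError
try:
    datetime.strptime("15/03/2021", "%Y-%m-%d")
    mismatch_detected = False
except ValueError as error:
    mismatch_detected = True
    print("Could not parse:", error)
```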

In [None]:
sns.set(rc={'figure.figsize':(13, 7)})   # change the figure size
sns.lineplot(x="Data", y="Totale (Smc)", data=df)
plt.show()

#### More public APIs

At [this web page](https://mixedanalytics.com/blog/list-actually-free-open-no-auth-needed-apis/), you can find a long list of public APIs, meaning that you can use them without signing up, creating an account or authenticating to the service in order to access the underlying data. To keep things simple and give you more practice with APIs, let's pick one from that list, for instance the [Binance API](https://binance-docs.github.io/apidocs/spot/en/#introduction) example, which shows the endpoint to get some statistics on the [24hr Ticker Price Change](https://binance-docs.github.io/apidocs/spot/en/#24hr-ticker-price-change-statistics):

In [None]:
url_binance = 'https://api2.binance.com/api/v3/ticker/24hr'
bnc = requests.get(url_binance)

In [None]:
bnc.status_code

In [None]:
bnc.json()[0:3]

In [None]:
bnc_df = pd.DataFrame.from_dict(bnc.json())
bnc_df

In [None]:
bnc_df.info()

In [None]:
bnc_df['priceChangePercent'] = pd.to_numeric(bnc_df['priceChangePercent'])
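
The conversion above matters if the column holds text rather than numbers, which is the likely reason for the `pd.to_numeric` call (an assumption: check the `info()` output to confirm the column type). Here is a small sketch, with made-up values, of what goes wrong when numbers stored as strings are sorted:

```python
# Made-up percentage changes, stored as strings
changes = ["9.5", "10.2", "-3.1"]

as_text = sorted(changes, reverse=True)                # compares character by character
as_numbers = sorted(changes, key=float, reverse=True)  # compares numeric values

print(as_text)
print(as_numbers)
```

In text order `"9.5"` comes before `"10.2"` because the character `9` is greater than `1`, so a "top 25" computed on strings could be wrong.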

In [None]:
bnc_df_top25 = bnc_df.sort_values('priceChangePercent', ascending=False).head(25)
bnc_df_top25.head()

In [None]:
# horizontal bar chart of the 25 symbols with the largest 24hr price change
sns.barplot(x='priceChangePercent',
            y="symbol", 
            data=bnc_df_top25, 
            order=bnc_df_top25.symbol, 
            color='c')
plt.show()