# Using REST APIs as data sources

* Data is everywhere and is generated constantly
* The number of available data sources is enormous
* Datasets are large and can be used in many ways

* We can build interesting things using data made available by third parties:
    - https://developer.walmartlabs.com/docs
    - https://developer.spotify.com/documentation/web-api/
    - https://earthquake.usgs.gov/fdsnws/event/1/
    
    
We will take a brief look at how to consume data from REST APIs, focusing mainly on **JSON**.


### What is an API?

An **Application Programming Interface** (API) defines the methods through which one software program interacts with another.

In this lecture, we deal with REST APIs, a type of web service that sends data over a network.

When we want to receive data from a web service, we make a `request` to that service. When the server receives the request, it sends back a `response`.

![request.png](request.png)

### Requests

We do not have to write the low-level code for making requests in Python ourselves. Instead, we import the `requests` module.

In [17]:
import requests

There are different types of requests. 

In our case we will use a `GET` request, which retrieves data from the server.

A response from the API contains 2 things (among others): 
* response code
* response data

To make a request, we use:

In [18]:
response = requests.get('http://www.nau.edu/')
type(response)

requests.models.Response

`requests.get(URL)` returns a `Response` object, which provides, among other things, the response code.

In [19]:
response.status_code

200

The most common codes are:
* 200: Everything went okay, and the result has been returned (if any).
* 301: The server is redirecting you to a different endpoint. This can happen when a company switches domain names, or an endpoint name is changed.
* 400: The server thinks you made a bad request. This can happen when you don’t send along the right data, among other things.
* 401: The server thinks you’re not authenticated. Many APIs require login credentials, so this happens when you don’t send the right credentials to access an API.
* 403: The resource you’re trying to access is forbidden: you don’t have the right permissions to see it.
* 404: The resource you tried to access wasn’t found on the server.
* 503: The server is not ready to handle the request.

More details about status codes can be found in the [MDN HTTP status reference](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status).
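As a rough sketch of how these codes might be handled in a script, the helper below groups a status code into its broad class using only the standard library. The function name `describe_status` is hypothetical and is not part of `requests`; in the notebook it could be called with `response.status_code`.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short label such as '404 Not Found (client error)'.

    Hypothetical helper for illustration; raises ValueError for unknown codes.
    """
    phrase = HTTPStatus(code).phrase  # standard reason phrase for the code
    if 200 <= code < 300:
        kind = "success"
    elif 300 <= code < 400:
        kind = "redirection"
    elif 400 <= code < 500:
        kind = "client error"
    elif 500 <= code < 600:
        kind = "server error"
    else:
        kind = "informational"
    return f"{code} {phrase} ({kind})"


# Try the helper on some of the codes listed above
for code in (200, 301, 404, 503):
    print(describe_status(code))
```

The `requests` module also provides `response.raise_for_status()`, which raises an exception for 4xx and 5xx codes instead of requiring a manual check.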

### What about getting the data?

First, read the documentation! Every time you use an API, read its documentation to understand how to call it, how its responses are structured, and what limits apply.

We will use the [Open Notify API](http://api.open-notify.org/), which gives access to data about the International Space Station.

APIs usually provide multiple endpoints, each one a URL through which we can interact with the service.

Let's try a request and see how it goes:

In [20]:
response = requests.get("http://api.open-notify.org/astros.json")
print(response.status_code)

200


Now we can see the data...

In [21]:
type(response.content)

bytes

In [22]:
response.text

'{"number": 10, "people": [{"name": "Oleg Artemyev", "craft": "ISS"}, {"name": "Denis Matveev", "craft": "ISS"}, {"name": "Sergey Korsakov", "craft": "ISS"}, {"name": "Kjell Lindgren", "craft": "ISS"}, {"name": "Bob Hines", "craft": "ISS"}, {"name": "Samantha Cristoforetti", "craft": "ISS"}, {"name": "Jessica Watkins", "craft": "ISS"}, {"name": "Cai Xuzhe", "craft": "Tiangong"}, {"name": "Chen Dong", "craft": "Tiangong"}, {"name": "Liu Yang", "craft": "Tiangong"}], "message": "success"}'

In [23]:
response.json()

{'number': 10,
 'people': [{'name': 'Oleg Artemyev', 'craft': 'ISS'},
  {'name': 'Denis Matveev', 'craft': 'ISS'},
  {'name': 'Sergey Korsakov', 'craft': 'ISS'},
  {'name': 'Kjell Lindgren', 'craft': 'ISS'},
  {'name': 'Bob Hines', 'craft': 'ISS'},
  {'name': 'Samantha Cristoforetti', 'craft': 'ISS'},
  {'name': 'Jessica Watkins', 'craft': 'ISS'},
  {'name': 'Cai Xuzhe', 'craft': 'Tiangong'},
  {'name': 'Chen Dong', 'craft': 'Tiangong'},
  {'name': 'Liu Yang', 'craft': 'Tiangong'}],
 'message': 'success'}
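The three views of the response shown above are related: `content` holds the raw bytes, `text` is those bytes decoded into a string, and `json()` parses that string into Python objects. The sketch below reproduces these steps by hand on a shortened, hard-coded copy of the payload, so it runs without a network connection. It assumes the body is UTF-8 encoded, and the `number` field was adjusted to match the shortened list.

```python
import json

# Shortened, hand-edited copy of the astros.json payload as raw bytes
# (stands in for response.content)
raw_bytes = (
    b'{"number": 2, "people": ['
    b'{"name": "Bob Hines", "craft": "ISS"}, '
    b'{"name": "Liu Yang", "craft": "Tiangong"}], '
    b'"message": "success"}'
)

text = raw_bytes.decode("utf-8")  # comparable to response.text
data = json.loads(text)           # comparable to response.json()

# Show the type at each stage, then read one value from the parsed result
print(type(raw_bytes).__name__, type(text).__name__, type(data).__name__)
print(data["people"][0]["name"])
```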

### Working with JSON 
JSON stands for JavaScript Object Notation. It is a way to encode data structures that ensures that they are easily readable. 

JSON output looks like a Python structure made of *dictionaries, lists, strings* and *integers*. And once parsed, that is exactly what it is.

But how do we use it? We already did in the last command: `response.json()` parsed the JSON text into Python objects.


In [24]:
import json

The `json` module has two main functions:

* `json.dumps()` — Takes in a Python object and converts (dumps) to a string.
* `json.loads()` — Takes a JSON string and converts (loads) to a Python object.

`dumps()` is particularly useful because we can use it to format the JSON, making the output easier to read.

In [25]:
json_response = response.json()
formatted_json = json.dumps(json_response, sort_keys=True, indent=3)

print(formatted_json)

{
   "message": "success",
   "number": 10,
   "people": [
      {
         "craft": "ISS",
         "name": "Oleg Artemyev"
      },
      {
         "craft": "ISS",
         "name": "Denis Matveev"
      },
      {
         "craft": "ISS",
         "name": "Sergey Korsakov"
      },
      {
         "craft": "ISS",
         "name": "Kjell Lindgren"
      },
      {
         "craft": "ISS",
         "name": "Bob Hines"
      },
      {
         "craft": "ISS",
         "name": "Samantha Cristoforetti"
      },
      {
         "craft": "ISS",
         "name": "Jessica Watkins"
      },
      {
         "craft": "Tiangong",
         "name": "Cai Xuzhe"
      },
      {
         "craft": "Tiangong",
         "name": "Chen Dong"
      },
      {
         "craft": "Tiangong",
         "name": "Liu Yang"
      }
   ]
}
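Because the parsed result is ordinary Python data, it can be explored with regular dictionary and list operations. As a small illustration, the sketch below counts the people on each craft with `collections.Counter`. The `sample` dictionary is a hand-copied subset of the output above (with `number` adjusted accordingly), standing in for `response.json()`.

```python
from collections import Counter

# Hand-copied subset of the parsed response, used here instead of a live request
sample = {
    "message": "success",
    "number": 3,
    "people": [
        {"craft": "ISS", "name": "Oleg Artemyev"},
        {"craft": "ISS", "name": "Samantha Cristoforetti"},
        {"craft": "Tiangong", "name": "Chen Dong"},
    ],
}

# Each entry in "people" is a dictionary; count how often each craft appears
people_per_craft = Counter(person["craft"] for person in sample["people"])

for craft, count in people_per_craft.items():
    print(f"{craft}: {count}")
```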


### REST API with Query Parameters
In some cases, it is possible to pass parameters to filter the output of the API.

The https://earthquake.usgs.gov/fdsnws/event/1/query endpoint returns the earthquakes that match a set of parameters, such as time, location, and magnitude.
More information is available here:
https://earthquake.usgs.gov/fdsnws/event/1/#parameters  

In the example below, we show the earthquakes in January 2022 (`starttime` and `endtime`), with magnitude between 6 and 7 (`minmagnitude` and `maxmagnitude`).



In [None]:
import requests
import json
from datetime import datetime

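# Query parameters are appended to the endpoint URL after the "?"
response = requests.get("https://earthquake.usgs.gov/fdsnws/event/1/query?format=geojson&starttime=2022-01-01&endtime=2022-01-31&maxmagnitude=7&minmagnitude=6")
json_response = response.json()
# Pretty-print the GeoJSON response, keeping the original key order
formatted_json = json.dumps(json_response, sort_keys=False, indent=2)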

print(formatted_json)
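Long query strings are easier to read and change when the parameters are kept in a dictionary. The sketch below builds the same kind of URL with the standard library's `urlencode`, so it runs without a network connection. The names `BASE_URL`, `params`, and `query_url` are chosen here for illustration.

```python
from urllib.parse import urlencode

BASE_URL = "https://earthquake.usgs.gov/fdsnws/event/1/query"

# One entry per query parameter used in the example above
params = {
    "format": "geojson",
    "starttime": "2022-01-01",
    "endtime": "2022-01-31",
    "minmagnitude": 6,
    "maxmagnitude": 7,
}

# urlencode joins the pairs as key=value separated by "&"
query_url = BASE_URL + "?" + urlencode(params)
print(query_url)

# With the requests module, the dictionary can be passed directly instead:
#     response = requests.get(BASE_URL, params=params)
```

Keeping the parameters in a dictionary also makes it straightforward to change a single filter, such as the date range, without editing the URL by hand.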

#### Getting what we need...
Now, we will *print the place, time, and magnitude of each earthquake*. The `time` field is a timestamp in milliseconds since the Unix epoch.

Here is the part of the response we use:
```json
"features": [
    {
      "type": "Feature",
      "properties": {
        "mag": 6.2,
        "place": "66 km E of Hualien City, Taiwan",
        "time": 1641203195767,
...
```
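The `time` value in this excerpt is easier to read once converted to a date. A minimal sketch, assuming the value is in milliseconds since the Unix epoch in UTC; the helper name `epoch_ms_to_utc` is hypothetical.

```python
from datetime import datetime, timezone


def epoch_ms_to_utc(ms):
    """Convert milliseconds since the Unix epoch to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


# "time" value taken from the excerpt above
quake_time = epoch_ms_to_utc(1641203195767)
print(quake_time.strftime("%Y-%m-%d %H:%M:%S UTC"))
```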



In [28]:
max_magnitude = 0

for earthquake in json_response["features"]:
    properties = earthquake["properties"]
    magnitude = properties["mag"]
    print("----")
    print("Place:  " + properties["place"])
    # "time" is given in milliseconds since the Unix epoch
    print("Time:  " + str(properties["time"]))
    print("Mag:  " + str(magnitude))
    # Keep track of the largest magnitude seen so far
    if magnitude > max_magnitude:
        max_magnitude = magnitude

print("\nMaximum magnitude: " + str(max_magnitude))
    

----
Place:  Kermadec Islands region
Time:  1643424399588
Mag:  6.5
----
Place:  281 km SW of Arenas, Panama
Time:  1643368449293
Mag:  6
----
Place:  220 km WNW of Pangai, Tonga
Time:  1643265605543
Mag:  6.2
----
Place:  South Sandwich Islands region
Time:  1643073873513
Mag:  6
----
Place:  71 km S of Unalaska, Alaska
Time:  1642828624896
Mag:  6.2
----
Place:  232 km SE of Sarangani, Philippines
Time:  1642818373323
Mag:  6
----
Place:  27 km SSE of Saiki, Japan
Time:  1642781317210
Mag:  6.3
----
Place:  74 km WSW of Panguna, Papua New Guinea
Time:  1642337528025
Mag:  6.1
----
Place:  80 km SW of Labuan, Indonesia
Time:  1642151141461
Mag:  6.6
----
Place:  53 km SE of Nikolski, Alaska
Time:  1641904772454
Mag:  6.6
----
Place:  100 km SE of Nikolski, Alaska
Time:  1641900943674
Mag:  6.8
----
Place:  48 km WNW of Pólis, Cyprus
Time:  1641863268064
Mag:  6.6
----
Place:  south of the Kermadec Islands
Time:  1641773190834
Mag:  6.2
----
Place:  northern Qinghai, China
Time:  16415