# HTTP Requests
---
This notebook introduces the `requests` library, which we can use to retrieve data from the web.

Some data resources are provided online as **APIs** (Application Programming Interfaces), which can be accessed using a protocol called HTTP (HyperText Transfer Protocol). These interfaces may allow us to retrieve information in various ways, for example by asking for a particular record or making a search.

In this notebook we will look at an API from the [US Geological Survey](https://www.usgs.gov/programs/earthquake-hazards) that provides continually updated information about earthquakes around the world.

Firstly, take a look at the web form for searching manually. Go to https://earthquake.usgs.gov/earthquakes/search/

Notice all of the different search options and try some searches out.

The results are presented as a list and an interactive map.

<img src='../resources/earthquakes.png'>

---
## Making a request
But what if we want to retrieve the data themselves? Here is an example using `requests`:

In [1]:
import requests

# An endpoint is a URL that accepts requests
endpoint = "https://earthquake.usgs.gov/fdsnws/event/1/query?"

# The parameters for our search, in the form of a Python dictionary
ps = { "format": "csv", 
       "starttime": "2023-09-27", 
       "endtime": "2023-09-28",
       "minmagnitude": 4.5  
     }

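# Send the GET request; the timeout stops the call from waiting indefinitely
response = requests.get(endpoint, params=ps, timeout=30)

# Raise an exception if the server returned an error status (4xx or 5xx)
response.raise_for_status()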

print(response.text)


time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError,depthError,magError,magNst,status,locationSource,magSource
2023-09-27T23:59:04.387Z,4.9975,-82.7707,10,4.5,mb,22,170,3.435,0.92,us,us6000lb8x,2023-12-09T22:00:40.040Z,"south of Panama",earthquake,8.23,1.936,0.204,7,reviewed,us,us
2023-09-27T22:32:59.631Z,-2.7226,128.7449,23.65,4.7,mb,61,51,2.13,0.62,us,us6000lb8b,2023-12-09T22:00:40.040Z,"68 km NNW of Masohi, Indonesia",earthquake,5.93,4.696,0.078,49,reviewed,us,us
2023-09-27T21:27:14.946Z,-61.3586,154.8163,10,4.5,mb,15,139,17.011,0.82,us,us6000lcg7,2023-12-09T22:00:42.040Z,"Balleny Islands region",earthquake,7.45,1.969,0.15,13,reviewed,us,us
2023-09-27T18:00:40.270Z,-18.8999,-178.0139,613.904,4.8,mb,104,44,3.475,0.81,us,us6000lb63,2023-12-09T22:00:40.040Z,"296 km ESE of Levuka, Fiji",earthquake,10.69,5.921,0.046,147,reviewed,us,us
2023-09-27T17:47:21.596Z,0.8622,93.1129,10,5.5,mww,101,25,4.483,0.68,us,us6000lb60,2023-12-09T22:00:40.040

An *endpoint* is a particular URL (web address) that will accept a request in a particular format and return a *response*. 

The USGS endpoint can return data in CSV format. The body of the response is available as a string through the `.text` attribute of the response object.

The parameter options for this endpoint are explained in the [USGS API documentation](https://earthquake.usgs.gov/fdsnws/event/1/?ref=springboard).
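
To see what `requests` does with the `params` dictionary, the sketch below builds the same kind of query string by hand with the standard library's `urllib.parse.urlencode`. It only illustrates the shape of the resulting URL and is not the code that `requests` runs internally; the names `base`, `query` and `url` are chosen here for the example.

```python
from urllib.parse import urlencode

# Same endpoint as above, written without the trailing "?" (we add it ourselves)
base = "https://earthquake.usgs.gov/fdsnws/event/1/query"

# Same search parameters as in the request above
ps = {
    "format": "csv",
    "starttime": "2023-09-27",
    "endtime": "2023-09-28",
    "minmagnitude": 4.5,
}

# urlencode turns the dictionary into "key=value" pairs joined by "&"
query = urlencode(ps)
url = base + "?" + query

print(url)
```

After a real request, `response.url` shows the URL that `requests` actually sent, which can be compared with the one printed here.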

---
## Converting to a DataFrame
If we want to create a DataFrame from this CSV text, we can do the following:

In [None]:
import pandas as pd
import io

# make a string buffer from the CSV text
buf = io.StringIO(response.text) 

# pandas can now read the data from the buffer
data = pd.read_csv(buf)          

data
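
`pd.read_csv` expects a file path or a file-like object, which is why the text is wrapped in `io.StringIO` first. As a minimal sketch of the same idea without a network call, the block below wraps the header and first row of the output shown earlier in a buffer and reads them with the standard library's `csv` module, used here in place of pandas purely for illustration. Unlike pandas, `csv` leaves every value as a string.

```python
import csv
import io

# Header and first data row copied from the CSV output shown earlier
csv_text = (
    "time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,"
    "place,type,horizontalError,depthError,magError,magNst,status,"
    "locationSource,magSource\n"
    "2023-09-27T23:59:04.387Z,4.9975,-82.7707,10,4.5,mb,22,170,3.435,0.92,us,"
    'us6000lb8x,2023-12-09T22:00:40.040Z,"south of Panama",earthquake,8.23,'
    "1.936,0.204,7,reviewed,us,us\n"
)

# StringIO makes the string behave like an open text file
buf = io.StringIO(csv_text)

# Any reader that accepts a file object can now consume the buffer
rows = list(csv.DictReader(buf))

print(rows[0]["place"], rows[0]["mag"])
```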

---
## JSON

We are getting comfortable with CSV format, but there are many other data formats that are in common use.

When working with APIs, we often encounter the **JSON** (JavaScript Object Notation) format. 

The USGS endpoint can return responses in [GeoJSON](https://en.wikipedia.org/wiki/GeoJSON), a particular flavour of JSON which is used to describe geolocated objects.

In [None]:
endpoint = "https://earthquake.usgs.gov/fdsnws/event/1/query?"

ps = { "format": "geojson", 
       "starttime": "2023-09-27", 
       "endtime": "2023-09-28",
       "minmagnitude": 4.8  
     }

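# Send the GET request; the timeout stops the call from waiting indefinitely
response = requests.get(endpoint, params=ps, timeout=30)

# Raise an exception if the server returned an error status (4xx or 5xx)
response.raise_for_status()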

print(response.text)


Notice that JSON data consists of nested objects (marked by **{ }**, which Python reads as dictionaries) and arrays (marked by **\[ ]**, which Python reads as lists).
It is a very flexible format, and can describe data structures that are more complex than tables.


If you need to work with JSON data, the built-in `json` module is very handy.

In [None]:
import json

# parse the JSON text into nested Python dictionaries and lists
data_dict = json.loads(response.text)
data_dict
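
To see how JSON text maps onto Python types without making a request, here is a minimal sketch that parses a short hand-written string. The string is a made-up stand-in that is far smaller than a real response, and the variable names `text` and `parsed` are chosen for this example.

```python
import json

# Hand-written JSON text, not actual API output
text = '{"type": "FeatureCollection", "features": [{"properties": {"mag": 4.5, "place": "south of Panama"}}]}'

parsed = json.loads(text)

# A JSON object becomes a dict, a JSON array becomes a list
print(type(parsed).__name__)
print(type(parsed["features"]).__name__)

# Numbers and strings become the matching Python values
first = parsed["features"][0]["properties"]
print(first["mag"], first["place"])
```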

`data_dict` is now an ordinary Python dictionary, which we can manipulate as we like.
For example, to print the magnitude of every earthquake in the response:

In [None]:
for eq in data_dict['features']:
    print(eq['properties']['mag'])
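
The same loop can collect several fields at once. The sketch below runs on a small hand-written dictionary, `sample`, that mimics the layout of the GeoJSON response (a `features` list whose items carry `properties` and `geometry`). The values come from the first CSV row shown earlier. The assumption that `coordinates` holds longitude, latitude and depth in that order should be checked against the USGS documentation.

```python
# Hand-written stand-in for data_dict, trimmed to the fields used below
sample = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"mag": 4.5, "place": "south of Panama"},
            "geometry": {"type": "Point", "coordinates": [-82.7707, 4.9975, 10]},
        }
    ],
}

rows = []
for eq in sample["features"]:
    props = eq["properties"]
    # Assumed order: longitude, latitude, depth
    lon, lat, depth = eq["geometry"]["coordinates"]
    rows.append({
        "place": props["place"],
        "mag": props["mag"],
        "latitude": lat,
        "longitude": lon,
        "depth": depth,
    })

print(rows)
```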

If needed, we can also convert a JSON dictionary to a pandas DataFrame using [`json_normalize`](https://pandas.pydata.org/docs/reference/api/pandas.json_normalize.html#pandas.json_normalize):

In [None]:
# the 'record_path' argument tells pandas where to find the list of records (table rows) within the dict
pd.json_normalize(data_dict, record_path=['features'])
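
`json_normalize` flattens nested dictionaries into columns whose names join the nested keys with a dot, such as `properties.mag`. The function below, `flatten`, is a hypothetical helper written for this notebook to illustrate that naming scheme on a single hand-written feature; it is not how pandas implements it.

```python
def flatten(record, prefix=""):
    """Flatten nested dictionaries into one level with dotted keys."""
    flat = {}
    for key, value in record.items():
        name = prefix + key
        if isinstance(value, dict):
            # Recurse into nested dictionaries, extending the dotted prefix
            flat.update(flatten(value, prefix=name + "."))
        else:
            # Lists and plain values are kept as they are
            flat[name] = value
    return flat


# Hand-written feature in the same layout as the GeoJSON response
feature = {
    "type": "Feature",
    "properties": {"mag": 4.5, "place": "south of Panama"},
    "geometry": {"type": "Point", "coordinates": [-82.7707, 4.9975, 10]},
}

print(flatten(feature))
```

Each key of the flattened dictionary corresponds to one column name, and the list under `geometry.coordinates` stays a single value rather than being split into separate columns.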