# Making Pandas DataFrames from API Requests
In this example, we will use the U.S. Geological Survey's API to grab a JSON object of earthquake data and convert it to a `pandas.DataFrame`.

USGS API: https://earthquake.usgs.gov/fdsnws/event/1/

### Get Data from API

In [None]:
import datetime as dt
import pandas as pd
import requests

yesterday = dt.date.today() - dt.timedelta(days=1)
api = 'https://earthquake.usgs.gov/fdsnws/event/1/query'
payload = {
    'format': 'geojson',
    'starttime': yesterday - dt.timedelta(days=30),
    'endtime': yesterday
}
response = requests.get(api, params=payload)

# let's make sure the request was OK
response.status_code

200

A status code of 200 means the request succeeded, so we can pull the data out of the response. Since we asked the API for a JSON payload, we can parse it into Python objects with the response's `json()` method.
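If you want to see exactly which URL `requests` builds from `payload`, for example to compare it with the `url` field the API echoes back in its metadata, you can prepare the request without sending it. This is a minimal sketch. The fixed end date is an assumption chosen only so the output is reproducible.

```python
import datetime as dt
import requests

api = 'https://earthquake.usgs.gov/fdsnws/event/1/query'
end = dt.date(2024, 9, 11)  # fixed date (assumption) so the URL is reproducible
payload = {
    'format': 'geojson',
    'starttime': end - dt.timedelta(days=30),
    'endtime': end,
}

# Build the request without sending it; requests converts the date
# objects to their str() form (YYYY-MM-DD) in the query string.
prepared = requests.Request('GET', api, params=payload).prepare()
print(prepared.url)
# https://earthquake.usgs.gov/fdsnws/event/1/query?format=geojson&starttime=2024-08-12&endtime=2024-09-11
```

In real code you may also want to pass a `timeout=` to `requests.get()` and call `response.raise_for_status()`, which raises an `HTTPError` for 4xx/5xx responses instead of leaving the status check to you.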

### Isolate the Data from the JSON Response
We need to inspect the structure of the response data to find where our data is.

In [None]:
earthquake_json = response.json()
earthquake_json.keys()

dict_keys(['type', 'metadata', 'features', 'bbox'])
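A quick way to see what each top-level key holds is to map it to the type of its value. The dictionary below is a hypothetical, trimmed stand-in for `earthquake_json` with the same top-level keys, and `describe` is a hypothetical helper name; on the real response you would call `describe(earthquake_json)`.

```python
def describe(obj):
    """Map each top-level key of a parsed JSON object to its value's type name."""
    return {key: type(value).__name__ for key, value in obj.items()}

# hypothetical, trimmed stand-in for response.json()
sample = {
    'type': 'FeatureCollection',
    'metadata': {'count': 1, 'status': 200},
    'features': [{'type': 'Feature', 'properties': {'mag': 1.8}}],
    'bbox': [-114.9876, 37.2084, 0, -114.9876, 37.2084, 0],
}
print(describe(sample))
# {'type': 'str', 'metadata': 'dict', 'features': 'list', 'bbox': 'list'}
```

The `list` under `features` is the part that maps onto dataframe rows; `metadata` is a single dict describing the request.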

The USGS API provides information about our request in the `metadata` key. Note that your result will differ from the one shown here even if you use the same date range, because the API includes a timestamp for when the data was pulled:

In [None]:
earthquake_json['metadata']

{'generated': 1726118919000,
 'url': 'https://earthquake.usgs.gov/fdsnws/event/1/query?format=geojson&starttime=2024-08-12&endtime=2024-09-11',
 'title': 'USGS Earthquakes',
 'status': 200,
 'api': '1.14.1',
 'count': 9774}
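The `generated` value appears to be a Unix timestamp in milliseconds (the event `time` and `updated` fields use the same style of value). Assuming that, pandas can turn it into a readable UTC timestamp:

```python
import pandas as pd

generated_ms = 1726118919000  # value copied from the metadata output above
# unit='ms' interprets the integer as milliseconds since the Unix epoch
generated = pd.to_datetime(generated_ms, unit='ms', utc=True)
print(generated)  # 2024-09-12 05:28:39+00:00
```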

Each element in the JSON array `features` is a row of data for our dataframe.

In [None]:
type(earthquake_json['features'])

list
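Since every element of `features` becomes one row, a cheap sanity check is that the number of features matches the `count` reported in `metadata`. This sketch assumes the two should agree for a single response; the helper `check_feature_count` and the sample dictionary are hypothetical.

```python
def check_feature_count(geojson):
    """Return True if len(features) equals metadata['count']."""
    return len(geojson['features']) == geojson['metadata']['count']

# hypothetical two-feature response
sample = {
    'metadata': {'count': 2},
    'features': [
        {'properties': {'mag': 1.8}},
        {'properties': {'mag': 5.0}},
    ],
}
print(check_feature_count(sample))  # True
```

On the real data, this would be `check_feature_count(earthquake_json)`.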

Your data will be different depending on the date you run this.

In [None]:
earthquake_json['features'][0]

{'type': 'Feature',
 'properties': {'mag': 1.8,
  'place': '23 km SE of Alamo, Nevada',
  'time': 1726012794250,
  'updated': 1726016471537,
  'tz': None,
  'url': 'https://earthquake.usgs.gov/earthquakes/eventpage/nn00884519',
  'detail': 'https://earthquake.usgs.gov/fdsnws/event/1/query?eventid=nn00884519&format=geojson',
  'felt': 1,
  'cdi': 2,
  'mmi': None,
  'alert': None,
  'status': 'reviewed',
  'tsunami': 0,
  'sig': 50,
  'net': 'nn',
  'code': '00884519',
  'ids': ',nn00884519,',
  'sources': ',nn,',
  'types': ',dyfi,origin,phase-data,',
  'nst': 8,
  'dmin': 0.205,
  'rms': 0.1706,
  'gap': 107.93,
  'magType': 'ml',
  'type': 'earthquake',
  'title': 'M 1.8 - 23 km SE of Alamo, Nevada'},
 'geometry': {'type': 'Point', 'coordinates': [-114.9876, 37.2084, 0]},
 'id': 'nn00884519'}
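Note that the location lives in `geometry`, not `properties`, so the conversion step that follows (which keeps only `properties`) drops it. GeoJSON orders point coordinates as longitude, latitude, then an optional third value; for USGS events that third value is depth, which we assume here is in kilometers (verify against the USGS documentation). A sketch that copies the coordinates into each row, using a hypothetical helper `feature_to_row` on a trimmed copy of the feature above:

```python
import pandas as pd

def feature_to_row(feature):
    """Flatten one GeoJSON feature into its properties plus coordinates.

    Assumes every feature is a Point with exactly three coordinates.
    """
    lon, lat, depth = feature['geometry']['coordinates']
    return {**feature['properties'], 'longitude': lon, 'latitude': lat, 'depth': depth}

# trimmed copy of the first feature shown above
feature = {
    'type': 'Feature',
    'properties': {'mag': 1.8, 'place': '23 km SE of Alamo, Nevada'},
    'geometry': {'type': 'Point', 'coordinates': [-114.9876, 37.2084, 0]},
    'id': 'nn00884519',
}
df_geo = pd.DataFrame([feature_to_row(feature)])
print(df_geo[['mag', 'longitude', 'latitude', 'depth']])
```

With the real response this would be `pd.DataFrame([feature_to_row(q) for q in earthquake_json['features']])`.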

### Convert to DataFrame
We need to grab the `properties` section out of every entry in the `features` JSON array to create our dataframe.

In [None]:
earthquake_properties_data = [
    quake['properties'] for quake in earthquake_json['features']
]
df = pd.DataFrame(earthquake_properties_data)
df.head()

|  | mag | place | time | updated | tz | url | detail | felt | cdi | mmi | ... | ids | sources | types | nst | dmin | rms | gap | magType | type | title |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.8 | 23 km SE of Alamo, Nevada | 1726012794250 | 1726016471537 |  | https://earthquake.usgs.gov/earthquakes/eventp... | https://earthquake.usgs.gov/fdsnws/event/1/que... | 1.0 | 2.0 |  | ... | ,nn00884519, | ,nn, | ,dyfi,origin,phase-data, | 8.0 | 0.205 | 0.1706 | 107.93 | ml | earthquake | M 1.8 - 23 km SE of Alamo, Nevada |
| 1 | 5.0 | South Sandwich Islands region | 1726012642799 | 1726013717040 |  | https://earthquake.usgs.gov/earthquakes/eventp... | https://earthquake.usgs.gov/fdsnws/event/1/que... |  |  |  | ... | ,us7000ncz7, | ,us, | ,origin,phase-data, | 33.0 | 6.11 | 0.99 | 122.0 | mb | earthquake | M 5.0 - South Sandwich Islands region |
| 2 | 0.19 | 10 km NW of The Geysers, CA | 1726012614740 | 1726014263814 |  | https://earthquake.usgs.gov/earthquakes/eventp... | https://earthquake.usgs.gov/fdsnws/event/1/que... |  |  |  | ... | ,nc75060261, | ,nc, | ,nearby-cities,origin,phase-data,scitech-link, | 8.0 | 0.01501 | 0.01 | 116.0 | md | earthquake | M 0.2 - 10 km NW of The Geysers, CA |
| 3 | 1.07 | 5 km NNW of Lytle Creek, CA | 1726012423840 | 1726085242912 |  | https://earthquake.usgs.gov/earthquakes/eventp... | https://earthquake.usgs.gov/fdsnws/event/1/que... |  |  |  | ... | ,ci40730351, | ,ci, | ,nearby-cities,origin,phase-data,scitech-link, | 32.0 | 0.07425 | 0.19 | 41.0 | ml | earthquake | M 1.1 - 5 km NNW of Lytle Creek, CA |
| 4 | 5.7 | South Sandwich Islands region | 1726012239102 | 1726098861674 |  | https://earthquake.usgs.gov/earthquakes/eventp... | https://earthquake.usgs.gov/fdsnws/event/1/que... |  |  | 3.247 | ... | ,at00sjmfkj,us7000ncz6, | ,at,us, | ,internal-origin,losspager,origin,phase-data,s... | 80.0 | 6.112 | 1.0 | 47.0 | mb | earthquake | M 5.7 - South Sandwich Islands region |
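As an alternative to the list comprehension, `pandas.json_normalize` can flatten each feature, nested dictionaries included, in one call; nested keys become dotted column names such as `properties.mag`. The `time` column holds epoch milliseconds, so `pd.to_datetime(..., unit='ms')` makes it readable. A sketch on a hypothetical two-feature list whose values are trimmed copies of the first two rows above:

```python
import pandas as pd

features = [  # trimmed copies of the first two rows shown above
    {'id': 'nn00884519', 'properties': {'mag': 1.8, 'time': 1726012794250}},
    {'id': 'us7000ncz7', 'properties': {'mag': 5.0, 'time': 1726012642799}},
]

# nested dicts are flattened into 'properties.mag', 'properties.time', ...
flat = pd.json_normalize(features)

# convert epoch milliseconds to timezone-aware UTC timestamps
flat['properties.time'] = pd.to_datetime(flat['properties.time'], unit='ms', utc=True)
print(flat)
```

Unlike keeping only `properties`, this also retains top-level fields such as `id`, and would keep `geometry` fields if they were present.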


### (Optional) Write Data to CSV

In [None]:
df.to_csv('earthquakes.csv', index=False)
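Passing `index=False` matters when the file is read back: if the index is written, `pd.read_csv` finds an extra column with an empty header and names it `Unnamed: 0`. This sketch uses an in-memory buffer (`io.StringIO`) instead of `earthquakes.csv` so it doesn't touch the filesystem, and a tiny hypothetical frame in place of `df`:

```python
import io
import pandas as pd

small = pd.DataFrame({'mag': [1.8, 5.0], 'magType': ['ml', 'mb']})

# with the index written, the unnamed index column comes back as 'Unnamed: 0'
with_index = io.StringIO()
small.to_csv(with_index)
with_index.seek(0)
print(pd.read_csv(with_index).columns.tolist())  # ['Unnamed: 0', 'mag', 'magType']

# with index=False, the columns round-trip unchanged
without_index = io.StringIO()
small.to_csv(without_index, index=False)
without_index.seek(0)
print(pd.read_csv(without_index).columns.tolist())  # ['mag', 'magType']
```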
