# Web Scraping Intro, Part 2: APIs

An Application Programming Interface, or **API**, is a structured way to retrieve data from a website. Using an API is usually safer and easier than web scraping, since what you get back is already in a structured, machine-readable format (often JSON or CSV) rather than an HTML page that has to be parsed. Many organizations provide APIs, including:
- Government organizations ([US Government](https://www.data.gov/developers/apis))
- Large companies ([Twitter API](https://developer.twitter.com/en/docs))
- News organizations ([NYT API](https://developer.nytimes.com/))
- And [many more](https://github.com/public-apis/public-apis)
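To make the "usable format" point concrete: many APIs return JSON, which Python's built-in `json` module turns directly into dictionaries and lists. The response body below is invented for illustration and does not come from any of the APIs listed above.

```python
import json

# A hypothetical API response body; the field names are made up for this example
body = '{"results": [{"zip": "37217", "injuries": 2}, {"zip": "37203", "injuries": 0}]}'

# json.loads parses the text into ordinary Python dicts and lists (no HTML parsing needed)
data = json.loads(body)

# Keep the ZIP codes of records with at least one injury
injured = [row["zip"] for row in data["results"] if row["injuries"] > 0]
print(injured)
```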

If you search Google for "how to use an api in python", you will find many articles walking through the process. Working with APIs is well documented and a useful skill to have.

We can use the `requests` library to retrieve data from an API.

In [None]:
import requests

## Using the data.nashville.gov API

The Nashville Open Data Portal provides an API for retrieving data.

Let's look at the traffic accidents data: https://data.nashville.gov/Police/Traffic-Accidents/6v6w-hpcw

Notice the **API** button in the upper right corner of the dataset page. data.nashville.gov makes many of its datasets available through the Socrata Open Data API (SODA).

Click the button, choose the **CSV** endpoint, and copy the URL:

In [None]:
url = 'https://data.nashville.gov/resource/6v6w-hpcw.csv?'
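The endpoint URL follows a simple pattern: the portal's domain, `/resource/`, the dataset identifier (`6v6w-hpcw`, which also appears in the dataset page's address), and a file extension that selects the format. The helper below is a hypothetical convenience function, not part of any library, and it deliberately accepts only the two formats (`csv` and `json`) assumed here; check a dataset's API page for what it actually offers.

```python
def soda_endpoint(domain: str, dataset_id: str, fmt: str = "csv") -> str:
    """Build a SODA resource URL (hypothetical helper for illustration)."""
    # Restrict to the formats this sketch assumes; other formats are not checked here
    if fmt not in {"csv", "json"}:
        raise ValueError(f"unsupported format: {fmt}")
    return f"https://{domain}/resource/{dataset_id}.{fmt}"


print(soda_endpoint("data.nashville.gov", "6v6w-hpcw"))
```

The trailing `?` in the copied URL is harmless: when `requests` later appends parameters, it produces the same query string with or without it.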

We can send a GET request to this URL to fetch the associated CSV data.

In [None]:
r = requests.get(url)
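In practice it is worth adding a timeout and checking that the request succeeded before using `r.text`. The sketch below wraps both in a hypothetical `fetch_text` helper. The `timeout` argument and `raise_for_status()` are standard `requests` features; the helper name, the 30-second default, and the injectable `get` parameter are our own choices.

```python
import requests


def fetch_text(url, params=None, timeout=30, get=requests.get):
    """Fetch a URL and return its body as text (hypothetical helper).

    `get` defaults to requests.get; it is a parameter only so the function
    can be exercised without network access.
    """
    # Without a timeout, a stalled server can make the call hang indefinitely
    response = get(url, params=params, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of silently
    # returning an error page as if it were CSV data
    response.raise_for_status()
    return response.text
```

Usage would look like `text = fetch_text('https://data.nashville.gov/resource/6v6w-hpcw.csv')`.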

Let's see what is returned.

In [None]:
print(r.text[:1000])

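The response body is a string formatted like a CSV file. To convert it to a DataFrame, we can wrap it in `StringIO` (a class from the standard `io` module), which makes the string behave like a file that `pd.read_csv` can read.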

In [None]:
import pandas as pd
from io import StringIO

In [None]:
crashes = pd.read_csv(StringIO(r.text))
crashes.head()
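`StringIO` is needed because `pd.read_csv` expects a file path or a file-like object, not a raw string of CSV text. The small example below shows the same idea without any network request; the CSV text is made up and its column names are only loosely modeled on the crashes data. It also passes `dtype={"zip": str}` so ZIP codes stay as text instead of being parsed as integers.

```python
from io import StringIO

import pandas as pd

# Invented sample text standing in for r.text
csv_text = "zip,number_of_injuries\n37217,1\n37203,0\n"

# StringIO wraps the string in an in-memory file object that read_csv can consume;
# dtype keeps the zip column as strings (so leading zeros would not be lost)
df = pd.read_csv(StringIO(csv_text), dtype={"zip": str})
print(df.shape)
```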

Another option is to pass the URL itself to `read_csv`. Note that pandas then makes its own request, so the data is downloaded a second time.

In [None]:
crashes = pd.read_csv(r.url)
crashes.head()

Finally, we can save the text as a csv file and then read it back in using pandas:

In [None]:
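# Write the response text to disk; an explicit encoding avoids platform-dependent defaults
with open('crashes.csv', 'w', encoding='utf-8') as fi:
    fi.write(r.text)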

In [None]:
crashes = pd.read_csv('crashes.csv')
crashes.head()

We can also request only a subset of the data. One way to do this is to add query parameters to the URL.

For example, suppose we want to find hit-and-run crashes that happened near Nashville Software School (ZIP code 37217). We can encode these filters as a dictionary that maps column names to the values we want.

See the [documentation](https://dev.socrata.com/foundry/data.nashville.gov/6v6w-hpcw) for more information about what parameters you can pass in.

In [None]:
payload = {
    'zip': '37217',
    'hit_and_run': 'True'
}

Then pass this dictionary to `requests.get` using the `params` argument.

In [None]:
r = requests.get(url=url, params=payload)

To inspect the resulting URL, check the response's `url` attribute. You can see how the parameters we created are appended to the original URL as a query string.

In [None]:
print(r.url)
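You can also see exactly how `requests` encodes a parameter dictionary without sending anything, by building a `requests.Request` and calling `.prepare()`. This is handy for checking a query before hitting the server. The variable names below are new, so `url` and `payload` from the cells above are left untouched.

```python
import requests

base_url = "https://data.nashville.gov/resource/6v6w-hpcw.csv"
example_payload = {"zip": "37217", "hit_and_run": "True"}

# prepare() performs the same URL encoding as requests.get, but sends nothing
prepared = requests.Request("GET", base_url, params=example_payload).prepare()
print(prepared.url)
# https://data.nashville.gov/resource/6v6w-hpcw.csv?zip=37217&hit_and_run=True
```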

In [None]:
crashes = pd.read_csv(StringIO(r.text))
crashes.head()

Simple parameters like these only match exact values, which limits how we can filter our request. For more complicated queries, check out [SoQL](https://dev.socrata.com/docs/queries/), the Socrata Query Language, which has many similarities to SQL.

A request using SoQL might look like this:

In [None]:
url = "https://data.nashville.gov/resource/6v6w-hpcw.csv?$where=date_and_time between '2019-01-10T12:00:00' and '2020-01-10T14:00:00' AND number_of_injuries > 0&$limit=2000"
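Typing spaces and quotes straight into the URL works because `requests` percent-encodes them, but the query is easier to read and edit as a parameter dictionary, with SoQL clauses such as `$where` and `$limit` as keys. The sketch below rebuilds the same query that way and prepares it without sending, so you can check the encoded URL. It uses new variable names (`base_url`, `soql_params`), so `url` above is unchanged.

```python
from urllib.parse import parse_qs, urlsplit

import requests

base_url = "https://data.nashville.gov/resource/6v6w-hpcw.csv"
soql_params = {
    # SoQL clauses are ordinary query parameters whose names start with "$"
    "$where": (
        "date_and_time between '2019-01-10T12:00:00' and '2020-01-10T14:00:00' "
        "AND number_of_injuries > 0"
    ),
    "$limit": 2000,
}

# Build (but do not send) the request to see how requests encodes the clauses
prepared = requests.Request("GET", base_url, params=soql_params).prepare()
print(prepared.url)

# Decoding the query string recovers the original clauses
decoded = parse_qs(urlsplit(prepared.url).query)
print(decoded["$limit"])
```

To actually send it, call `requests.get(base_url, params=soql_params)`.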

In [None]:
r = requests.get(url)

In [None]:
crashes = pd.read_csv(StringIO(r.text))
crashes.head()
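`$limit` caps how many rows come back, and SODA also returns only a limited number of rows by default, so large result sets need several requests. One common approach combines `$limit` with `$offset` and a stable `$order`. The function below is a hypothetical sketch of that loop: the page size, the use of `:id` (a Socrata system field) for ordering, the `max_pages` safety cap, and the stop condition (a page shorter than the page size) are our assumptions, so check the SoQL documentation before relying on it.

```python
from io import StringIO

import pandas as pd
import requests


def fetch_all_pages(base_url, where=None, page_size=1000, max_pages=100, get=requests.get):
    """Page through a SODA CSV endpoint with $limit/$offset (hypothetical helper).

    `get` defaults to requests.get and is a parameter only so the loop can be
    tested without network access.
    """
    frames = []
    for page in range(max_pages):
        # Assumption: ordering by :id keeps page boundaries stable between requests
        params = {"$limit": page_size, "$offset": page * page_size, "$order": ":id"}
        if where is not None:
            params["$where"] = where
        response = get(base_url, params=params, timeout=30)
        response.raise_for_status()
        if not response.text.strip():
            break  # an empty body means there is nothing left to read
        frame = pd.read_csv(StringIO(response.text))
        frames.append(frame)
        # A page shorter than page_size means we have reached the last rows
        if len(frame) < page_size:
            break
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

For example, `fetch_all_pages('https://data.nashville.gov/resource/6v6w-hpcw.csv', where='number_of_injuries > 0')` would request pages of 1,000 rows until a short page comes back.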