# Class 6: APIs

## APIs

- They are very powerful
- Can be used from any programming language
- Not expecting you to use them in your Final Project

## APIs, conceptually

![Diagram showing how online payments work: Expedia talks to Delta, Delta talks to Stripe, Stripe talks to Visa, and Visa talks to Chase](extras/img/apis_conceptually/payments.png)

![Diagram showing how notifications flow through systems](extras/img/apis_conceptually/notifications.png)

![Diagram showing relationship between human languages, programming languages, and APIs](extras/img/apis_conceptually/languages.png)

interactions between systems ↔️

## Ways to get data

Method | How it happens | Pros | Cons
--- | :--- | :--- | :---
**Bulk** | Download, someone hands you a flash drive, etc. | Fast, one-time transfer | Can be large
**Scraping** | Data only available through a web site, PDF, or doc | You can turn anything into data | Tedious; fragile
**APIs** | If the organization makes one available | Usually allows some filtering; can always pull latest-and-greatest | Requires a network connection for every call; higher barrier to entry (reading documentation); subject to availability and performance of the API

## Scraping

Common tools:

- [Beautiful Soup package](https://realpython.com/beautiful-soup-web-scraper-python/)
- [pandas' `read_html()`](https://pandas.pydata.org/docs/user_guide/io.html#html)

_Please pray to the Demo Gods that these all work and there's no profanity_

Pull table from [Wikipedia's list of countries by area](https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_area#Countries_and_dependencies_by_area):

```python
import pandas as pd

# read_html() returns a list of every <table> on the page whose text matches "Rank"
tables = pd.read_html(
    "https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_area",
    match="Rank",
)
countries = tables[0]
countries
```

| | Rank | Country / Dependency | Total in km2 (mi2) | Land in km2 (mi2) | Water in km2 (mi2) | % water | Notes |
|---|---|---|---|---|---|---|---|
| 0 | | World | 510,072,000 (196,940,000) | 148,940,000 (57,510,000) | 361,132,000 (139,434,000) | 70.8 | |
| 1 | 1 | Russia | 17,098,246 (6,601,670) | 16,378,410 (6,323,740) | 719,836 (277,930) | 4.21 | The largest country in the world, which spans ... |
| 2 | | Antarctica | 14,200,000 (5,500,000) | 14,200,000 (5,500,000) | 0 (0) | 0 | 13,916,000 km2 (5,373,000 sq mi) (98%) of the ... |
| 3 | 2 | Canada[Note 1] | 9,984,670 (3,855,100) | 9,093,507 (3,511,023) | 891,163 (344,080) | 8.93 | Largest English- and largest French-speaking c... |
| 4 | 3 or 4[Note 4] | China | 9,596,961 (3,705,407)[7] | 9,326,410 (3,600,950) | 270,550 (104,460) | 2.82 | Largest country entirely in Asia, and second-l... |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 260 | | Ashmore and Cartier Islands (Australia) | 5 (1.9) | 5 (1.9) | 0 (0) | 0 | |
| 261 | | Spratly Islands (disputed) | (< 1.9) | (< 1.9) | 0 (0) | 0 | |
| 262 | | Coral Sea Islands (Australia) | (< 1.2) | (< 1.2) | 0 (0) | 0 | |
| 263 | 194 | Monaco | 2.02 (0.78) | 2.02 (0.78) | 0 (0) | 0 | European microstate. Smallest country with a c... |
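Scraped tables usually need cleanup before you can do math on them. In the table above, the `Total in km2 (mi2)` column mixes square kilometers, square miles in parentheses, and footnote markers like `[7]`. Here's a minimal sketch of one way to pull out just the km² number using the standard library's `re` module; `parse_km2` is a hypothetical helper, and the patterns assume cells look like the ones shown above.

```python
import re


def parse_km2(cell):
    """Return the km2 figure from a cell like '9,596,961 (3,705,407)[7]', or None."""
    # Drop footnote markers such as "[7]" or "[Note 1]"
    cell = re.sub(r"\[[^\]]*\]", "", str(cell))
    # Grab the leading number (digits, thousands commas, optional decimals) before the "(mi2)" part
    match = re.match(r"\s*(\d[\d,]*(?:\.\d+)?)", cell)
    if match is None:
        return None  # e.g. "(< 1.9)" has no leading km2 value
    return float(match.group(1).replace(",", ""))


print(parse_km2("9,596,961 (3,705,407)[7]"))  # China
print(parse_km2("2.02 (0.78)"))  # Monaco
print(parse_km2("(< 1.9)"))  # Spratly Islands
```

With the DataFrame from above, something like `countries["Total in km2 (mi2)"].map(parse_km2)` would give a numeric column, and the same footnote-stripping `re.sub()` could clean up names like `Canada[Note 1]`. Expect to adjust the patterns whenever Wikipedia's formatting changes; that's what makes scraping fragile.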


### Data is only available if it's available

## API calls in the wild

1. Go to [Candidates page on fec.gov](https://www.fec.gov/data/candidates/?has_raised_funds=true&is_active_candidate=true).
1. Right-click and choose `Inspect`.
   - [More info about opening Developer Tools in various browsers.](https://developer.mozilla.org/en-US/docs/Learn/Common_questions/What_are_browser_developer_tools#how_to_open_the_devtools_in_your_browser)
1. Go to the `Network` tab and reload.
1. Filter to `XHR` (labeled `Fetch/XHR` in some browsers).
1. Click the API call.

We only see this because the tables on [fec.gov](https://fec.gov) are [rendered client-side](https://www.solutelabs.com/blog/client-side-vs-server-side-rendering-what-to-choose-when) using their JSON API. That won't be the case for all tables on all sites.

### Parts of a URL

![URL structure](https://developer.mozilla.org/en-US/docs/Learn/Common_questions/Web_mechanics/What_is_a_URL/mdn-url-all.png)

[source](https://developer.mozilla.org/en-US/docs/Learn/Common_questions/What_is_a_URL#basics_anatomy_of_a_url)

For APIs:

- Often split into "base URL" + "endpoint"
- Endpoints are like function names: they represent the information you are retrieving or the action you are taking
- Parameters are like function arguments:
   - They allow options to be specified
   - Some are required, some are optional
   - They will differ from one endpoint/function to another
- Anchors (the `#...` part) generally aren't used
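To see those pieces concretely, Python's standard `urllib.parse` module can take a URL apart. This sketch treats `https://api.open.fec.gov/v1` as the FEC base URL (an assumption based on the URL used later in this class), and `split_api_url` is a hypothetical helper name.

```python
from urllib.parse import parse_qsl, urlsplit


def split_api_url(url, base_url):
    """Split an API URL into (endpoint, parameters), given its base URL."""
    parts = urlsplit(url)
    # Rebuild the URL without the query string or fragment (anchor), which is ignored
    without_query = f"{parts.scheme}://{parts.netloc}{parts.path}"
    if not without_query.startswith(base_url):
        raise ValueError(f"{url} doesn't start with {base_url}")
    endpoint = without_query[len(base_url):]
    # parse_qsl decodes "+" and %-escapes; dict() keeps only the last value of a repeated parameter
    params = dict(parse_qsl(parts.query))
    return endpoint, params


url = "https://api.open.fec.gov/v1/candidates/?api_key=DEMO_KEY&q=Jimmy+McMillan&sort=-first_file_date"
endpoint, params = split_api_url(url, "https://api.open.fec.gov/v1")
print(endpoint)  # the "function name"
print(params)  # the "arguments"
```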

### API documentation

[FEC API](https://api.open.fec.gov/developers/)

### Try it out

1. Visit https://www.fec.gov/data/candidates/
1. [Open Developer Tools](https://developer.mozilla.org/en-US/docs/Learn/Common_questions/What_are_browser_developer_tools#how_to_open_the_devtools_in_your_browser).
1. In the Network tab's request list, right-click the API call.
1. Click `Open in New Tab`.
1. Replace the API key with `DEMO_KEY`.

## API calls from Python

Usually one of two ways:

- A software development kit (SDK) like [sodapy](https://pypi.org/project/sodapy/)
   - Abstracts the details away
   - Not available for all APIs
   - May have limitations
- [The `requests` package](https://docs.python-requests.org/) (nothing to do with 311 requests)
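Whichever way you go, the request ends up as a URL with parameters. As a rough illustration of what `requests` does with a `params` dict, the standard library's `urllib.parse.urlencode()` builds the same kind of query string; the exact URL `requests` sends may differ in small details.

```python
from urllib.parse import urlencode

base_url = "https://api.open.fec.gov/v1"
endpoint = "/candidates/"
params = {
    "api_key": "DEMO_KEY",
    "q": "Jimmy McMillan",
    "sort": "-first_file_date",
}

# urlencode() joins key=value pairs with "&"; spaces become "+" and other special characters are %-escaped
url = base_url + endpoint + "?" + urlencode(params)
print(url)
```

Pasting that URL into a browser should return the same kind of JSON as the `requests.get()` call below.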

```python
import requests

params = {
    "api_key": "DEMO_KEY",
    "q": "Jimmy McMillan",
    "sort": "-first_file_date",
}
response = requests.get("https://api.open.fec.gov/v1/candidates/", params=params)
data = response.json()
data
```

```
{'api_version': '1.0',
 'results': [{'candidate_id': 'P60016805',
   'name': 'MCMILLAN, JIMMY "RENT IS TOO DAMN HIGH',
   'has_raised_funds': False,
   'election_districts': ['00'],
   'district': '00',
   'last_file_date': '2015-10-13',
   'last_f2_date': '2015-10-13',
   'state': 'US',
   'election_years': [2016],
   'incumbent_challenge': 'O',
   'federal_funds_flag': False,
   'candidate_status': 'N',
   'active_through': 2016,
   'office_full': 'President',
   'district_number': 0,
   'incumbent_challenge_full': 'Open seat',
   'inactive_election_years': None,
   'party': 'REP',
   'office': 'P',
   'candidate_inactive': False,
   'cycles': [2016, 2018],
   'flags': 'P60016805',
   'load_date': '2018-02-17T09:16:20+00:00',
   'first_file_date': '2015-10-13',
   'party_full': 'REPUBLICAN PARTY'},
  {'candidate_id': 'P60003290',
   'name': 'MCMILLAN, JIMMY (AKA) JAMES ',
   'has_raised_funds': False,
   'election_districts': ['00', '00'],
   'district': '00',
   'last_file_date': '2
```

### Retrieving nested data

```python
data["results"][0]["name"]
```

```
'MCMILLAN, JIMMY "RENT IS TOO DAMN HIGH'
```
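Each `[...]` goes one level deeper: `data` is a dict, `data["results"]` is a list, `data["results"][0]` is a dict, and so on. This sketch uses a hand-trimmed copy of the response above (most fields dropped, and `election_years` deliberately left off the second record) to show pulling one field out of every result, and using `.get()` for fields that might be missing.

```python
# Hand-trimmed copy of the FEC response above; the second record's
# election_years is intentionally omitted to simulate a missing field
data = {
    "api_version": "1.0",
    "results": [
        {
            "candidate_id": "P60016805",
            "name": 'MCMILLAN, JIMMY "RENT IS TOO DAMN HIGH',
            "election_years": [2016],
        },
        {
            "candidate_id": "P60003290",
            "name": "MCMILLAN, JIMMY (AKA) JAMES ",
        },
    ],
}

# dict -> list -> dict -> list: one lookup per level
first_year = data["results"][0]["election_years"][0]

# Loop over the list to get one field from every result
names = [result["name"] for result in data["results"]]

# .get() returns a default instead of raising a KeyError when a key is missing
years = [result.get("election_years", []) for result in data["results"]]

print(first_year)
print(names)
print(years)
```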

### In-class exercise

[Geocode](https://en.wikipedia.org/wiki/Address_geocoding) an address from Python using the [Nominatim API](https://nominatim.org/release-docs/develop/api/Search/). Print out the latitude and longitude. Any address is fine:

- Your own
- This building
- etc.

### Reading into a DataFrame

```python
params = {
    "api_key": "DEMO_KEY",
    "has_raised_funds": "true",
}
response = requests.get("https://api.open.fec.gov/v1/candidates/", params=params)
data = response.json()

pd.DataFrame(data["results"])
```

| | active_through | flags | incumbent_challenge_full | last_file_date | load_date | cycles | inactive_election_years | office | election_districts | candidate_id | ... | candidate_status | incumbent_challenge | first_file_date | party_full | office_full | name | district_number | has_raised_funds | last_f2_date | election_years |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2022 | H2CO07170 | Open seat | 2022-08-10 | 2022-08-10T21:11:12+00:00 | [2022, 2024] | | H | [07] | H2CO07170 | ... | C | O | 2021-12-27 | REPUBLICAN PARTY | House | AADLAND, ERIK | 7 | True | 2022-08-10 | [2022] |
| 1 | 2022 | H2UT03280 | Challenger | 2022-03-21 | 2022-04-13T21:10:09+00:00 | [2022] | | H | [03] | H2UT03280 | ... | C | C | 2020-03-24 | REPUBLICAN PARTY | House | AALDERS, TIM | 3 | True | 2022-03-21 | [2022] |
| 2 | 2018 | S2UT00229 | Open seat | 2018-04-23 | 2019-03-27T16:02:41+00:00 | [2012, 2014, 2016, 2018, 2020] | | S | [00, 00] | S2UT00229 | ... | P | O | 2012-02-08 | CONSTITUTION PARTY | Senate | AALDERS, TIMOTHY NOEL | 0 | True | 2018-04-23 | [2012, 2018] |
| 3 | 2020 | H0TX22260 | Open seat | 2019-10-17 | 2020-03-18T21:13:37+00:00 | [2020] | | H | [22] | H0TX22260 | ... | C | O | 2019-10-17 | REPUBLICAN PARTY | House | AALOORI, BANGAR REDDY | 22 | True | 2019-10-17 | [2020] |
| 4 | 1978 | H6PA16106 | | 1978-07-05 | 2002-03-30T00:00:00+00:00 | [1976, 1978, 1980] | | H | [16, 16] | H6PA16106 | ... | P | | 1976-04-12 | REPUBLICAN PARTY | House | AAMODT, NORMAN O. | 16 | True | 1978-07-05 | [1976, 1978] |
| 5 | 2012 | H2CA01110 | Challenger | 2012-02-22 | 2013-04-26T09:04:30+00:00 | [2012, 2014, 2016] | | H | [01] | H2CA01110 | ... | P | C | 2012-02-22 | REPUBLICAN PARTY | House | AANESTAD, SAMUEL | 1 | True | 2012-02-22 | [2012] |
| 6 | 2018 | H8CO06237 | Challenger | 2017-04-26 | 2017-08-01T20:57:28+00:00 | [2018] | | H | [06] | H8CO06237 | ... | C | C | 2017-04-26 | DEMOCRATIC PARTY | House | AARESTAD, DAVID | 6 | True | 2017-04-26 | [2018] |
| 7 | 2008 | P80002926 | Open seat | 2007-03-13 | 2016-11-17T06:10:48+00:00 | [2006, 2008, 2010, 2012, 2014, 2016] | | P | [00] | P80002926 | ... | N | O | 2005-10-12 | DEMOCRATIC PARTY | President | AARON, LAURA DAVIS | 0 | True | 2007-03-13 | [2008] |
| 8 | 2024 | H2CA30291 | Challenger | 2022-07-15 | 2023-01-12T22:24:01+00:00 | [2022, 2024] | | H | [32, 32] | H2CA30291 | ... | N | C | 2021-01-16 | DEMOCRATIC PARTY | House | AAZAMI, SHERVIN | 32 | True | 2022-07-15 | [2022, 2024] |
| 9 | 2022 | H2MN07162 | Challenger | 2022-06-06 | 2022-07-25T23:03:23+00:00 | [2022, 2024] | | H | [07] | H2MN07162 | ... | C | C | 2022-06-06 | DEMOCRATIC-FARMER-LABOR | House | ABAHSAIN, JILL | 7 | True | 2022-06-06 | [2022] |


## Back to 311 data

From [NYC Open Data Portal dataset page](https://data.cityofnewyork.us/Social-Services/311-Service-Requests-from-2010-to-Present/erm2-nwe9/data), click `Export` -> `SODA API` -> `API Docs`.

### Most open data sites have APIs

Often built on platforms that provide them, e.g.

- [NYC Open Data Portal](https://opendata.cityofnewyork.us/) built on [Socrata](https://dev.socrata.com/)
- [data.gov built on CKAN](https://www.data.gov/developers/apis)
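Because the platform provides the API, URLs tend to follow the same pattern on every site built on it. A minimal sketch: the Socrata pattern matches the 311 URL used later in this class, while the CKAN one assumes CKAN's standard action API (`/api/3/action/package_search`) and that data.gov's catalog is served from `catalog.data.gov`. Both function names are hypothetical.

```python
from urllib.parse import urlencode


def socrata_resource_url(domain, dataset_id):
    """JSON endpoint for one dataset on a Socrata-powered site."""
    return f"https://{domain}/resource/{dataset_id}.json"


def ckan_search_url(domain, query):
    """CKAN action API call that searches a portal's datasets by keyword."""
    return f"https://{domain}/api/3/action/package_search?" + urlencode({"q": query})


print(socrata_resource_url("data.cityofnewyork.us", "erm2-nwe9"))
print(ckan_search_url("catalog.data.gov", "311"))
```

Swapping in a different domain or dataset ID is often all it takes to point the same code at another portal on the same platform, though each dataset's columns will differ.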

### Example: 311 requests from the last week

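```python
from datetime import datetime, timedelta

# Current time in UTC, as a naive datetime. utcnow() is deprecated as of Python 3.12;
# datetime.now(timezone.utc) replaces it, but returns a timezone-aware value whose
# isoformat() ends in "+00:00"
now = datetime.utcnow()
now
```

```
datetime.datetime(2023, 2, 27, 21, 35, 54, 503885)
```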

```python
start = now - timedelta(weeks=1)
start
```

```
datetime.datetime(2023, 2, 20, 21, 35, 54, 503885)
```

```python
start.isoformat()
```

```
'2023-02-20T21:35:54.503885'
```

Using the [Socrata query language (SoQL)](https://dev.socrata.com/docs/queries/):

```python
data_id = "erm2-nwe9"  # the 311 dataset's ID, from its URL
params = {
    # SoQL filter: only requests created in the past week
    "$where": f"created_date between '{start.isoformat()}' and '{now.isoformat()}'",
}

url = f"https://data.cityofnewyork.us/resource/{data_id}.json"
response = requests.get(url, params=params)
data = response.json()

data
```

```
[{'unique_key': '56897747',
  'created_date': '2023-02-26T02:00:25.000',
  'agency': 'NYPD',
  'agency_name': 'New York City Police Department',
  'complaint_type': 'Noise - Residential',
  'descriptor': 'Loud Music/Party',
  'location_type': 'Residential Building/House',
  'incident_zip': '10314',
  'incident_address': '23 SHERADEN AVENUE',
  'street_name': 'SHERADEN AVENUE',
  'cross_street_1': 'VICTORY BOULEVARD',
  'cross_street_2': 'PURDY AVENUE',
  'intersection_street_1': 'VICTORY BOULEVARD',
  'intersection_street_2': 'PURDY AVENUE',
  'address_type': 'ADDRESS',
  'city': 'STATEN ISLAND',
  'landmark': 'SHERADEN AVENUE',
  'status': 'In Progress',
  'resolution_action_updated_date': '2023-02-26T02:37:46.000',
  'community_board': '01 STATEN ISLAND',
  'bbl': '5007360017',
  'borough': 'STATEN ISLAND',
  'x_coordinate_state_plane': '945833',
  'y_coordinate_state_plane': '161990',
  'open_data_channel_type': 'ONLINE',
  'park_facility_name': 'Unspecified',
  'park_borough': 'STA
```
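The `isoformat()` strings in the query above carry microseconds, while the `created_date` values that come back have milliseconds (e.g. `'2023-02-26T02:00:25.000'`). The query evidently worked as-is, but if you'd like the filter to match the response format, `isoformat(timespec="milliseconds")` drops the extra digits. A minimal sketch; `soql_between` is a hypothetical helper, and the fixed timestamp is the `now` value shown above so the output is reproducible.

```python
from datetime import datetime, timedelta


def soql_between(column, start, end):
    """Build a SoQL "between" filter, with timestamps trimmed to milliseconds."""
    start_str = start.isoformat(timespec="milliseconds")  # truncates, doesn't round
    end_str = end.isoformat(timespec="milliseconds")
    return f"{column} between '{start_str}' and '{end_str}'"


now = datetime(2023, 2, 27, 21, 35, 54, 503885)  # the `now` value shown above
params = {"$where": soql_between("created_date", now - timedelta(weeks=1), now)}
print(params["$where"])
```

The same helper would work for other timestamp columns, such as `closed_date`.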

Like the FEC, Socrata uses its own API to populate the tables you see when browsing data on sites it powers.

**At-home exercise:** Try filtering a table on the [NYC Open Data Portal](https://data.cityofnewyork.us/), and find the API calls it makes.

```python
pd.DataFrame(data)
```

| | unique_key | created_date | agency | agency_name | complaint_type | descriptor | location_type | incident_zip | incident_address | street_name | ... | longitude | location | taxi_pick_up_location | resolution_description | closed_date | facility_type | bridge_highway_name | bridge_highway_direction | road_ramp | bridge_highway_segment |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 56897747 | 2023-02-26T02:00:25.000 | NYPD | New York City Police Department | Noise - Residential | Loud Music/Party | Residential Building/House | 10314 | 23 SHERADEN AVENUE | SHERADEN AVENUE | ... | -74.13836486427032 | {'latitude': '40.61121823631895', 'longitude':... | | | | | | | | |
| 1 | 56895713 | 2023-02-26T02:00:04.000 | NYPD | New York City Police Department | Abandoned Vehicle | With License Plate | Street/Sidewalk | 11362 | 61 AVENUE | 61 AVENUE | ... | -73.73676093186535 | {'latitude': '40.75574899915731', 'longitude':... | | | | | | | | |
| 2 | 56898587 | 2023-02-26T01:59:53.000 | NYPD | New York City Police Department | Noise - Residential | Loud Music/Party | Residential Building/House | 10472 | 1030 BOYNTON AVENUE | BOYNTON AVENUE | ... | -73.87737784124917 | {'latitude': '40.8250227912721', 'longitude': ... | | | | | | | | |
| 3 | 56892636 | 2023-02-26T01:59:48.000 | NYPD | New York City Police Department | Noise - Vehicle | Car/Truck Music | Street/Sidewalk | 11231 | 554 COURT STREET | COURT STREET | ... | -73.99946284015022 | {'latitude': '40.674991502606964', 'longitude'... | | | | | | | | |
| 4 | 56898940 | 2023-02-26T01:59:17.000 | NYPD | New York City Police Department | Noise - Commercial | Loud Music/Party | Store/Commercial | 11206 | 162 THROOP AVENUE | THROOP AVENUE | ... | -73.94423377144169 | {'latitude': '40.69959035300927', 'longitude':... | | | | | | | | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 995 | 56892628 | 2023-02-25T22:52:38.000 | NYPD | New York City Police Department | Noise - Commercial | Loud Music/Party | Club/Bar/Restaurant | 11206 | 920 BROADWAY | BROADWAY | ... | -73.93617180672432 | {'latitude': '40.697486417308184', 'longitude'... | | The Police Department responded to the complai... | 2023-02-26T00:10:07.000 | | | | | |
| 996 | 56893195 | 2023-02-25T22:52:14.000 | NYPD | New York City Police Department | Illegal Parking | Commercial Overnight Parking | Street/Sidewalk | 10025 | 12 WEST 108 STREET | WEST 108 STREET | ... | -73.95961594433278 | {'latitude': '40.799568726743885', 'longitude'... | | The Police Department responded to the complai... | 2023-02-25T23:01:23.000 | | | | | |
| 997 | 56899760 | 2023-02-25T22:51:46.000 | HPD | Department of Housing Preservation and Develop... | HEAT/HOT WATER | ENTIRE BUILDING | RESIDENTIAL BUILDING | 11364 | 224-24 UNION TURNPIKE | UNION TURNPIKE | ... | -73.74628284882944 | {'latitude': '40.735824187950136', 'longitude'... | | The following complaint conditions are still o... | | | | | | |
| 998 | 56897188 | 2023-02-25T22:51:38.000 | NYPD | New York City Police Department | Noise - Commercial | Loud Music/Party | Club/Bar/Restaurant | 11423 | 184-11 JAMAICA AVENUE | JAMAICA AVENUE | ... | -73.77560768130365 | {'latitude': '40.70923611783929', 'longitude':... | | The Police Department responded to the complai... | 2023-02-25T23:19:19.000 | | | | | |


Coincidence there were exactly 1,000 results?

### Pagination

- Most APIs limit the number of results returned.
- [Socrata defaults to 1,000.](https://dev.socrata.com/docs/queries/limit.html)
- Need to use a loop with parameters like [`$limit`](https://dev.socrata.com/docs/queries/limit.html)+[`$offset`](https://dev.socrata.com/docs/queries/offset.html) (Socrata) or `page`+`per_page` ([FEC](https://api.open.fec.gov/developers/))
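   - Collect each page's results in a list and build the DataFrame once at the end, or combine per-page DataFrames with [`pd.concat()`](https://pandas.pydata.org/docs/reference/api/pandas.concat.html) (`DataFrame.append()` was removed in pandas 2.0)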
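Here's a minimal sketch of a `$limit`/`$offset` loop. To keep it runnable without a network connection, `fetch_all` takes a function that fetches one page, and `fake_fetch_page` stands in for the API with made-up rows; both names are hypothetical. The commented-out version shows how it might look against Socrata with `requests`. The `$order` parameter there is an assumption based on Socrata's paging docs recommending a stable sort order, so pages don't overlap or skip rows.

```python
def fetch_all(fetch_page, page_size=1000):
    """Keep requesting pages until one comes back short, then return every row."""
    rows = []
    offset = 0
    while True:
        page = fetch_page(limit=page_size, offset=offset)
        rows.extend(page)
        if len(page) < page_size:
            # A short (or empty) page means there's nothing left
            return rows
        offset += page_size


# Made-up data standing in for the API, so this runs offline
FAKE_ROWS = [{"unique_key": str(i)} for i in range(2500)]


def fake_fetch_page(limit, offset):
    return FAKE_ROWS[offset : offset + limit]


# Against Socrata, fetch_page might look something like this (not run here;
# where_clause and url are placeholders):
#
# def fetch_page(limit, offset):
#     params = {
#         "$where": where_clause,
#         "$order": ":id",
#         "$limit": limit,
#         "$offset": offset,
#     }
#     return requests.get(url, params=params).json()

rows = fetch_all(fake_fetch_page)
print(len(rows))
```

Building the DataFrame once at the end, e.g. `pd.DataFrame(rows)`, replaces the old row-by-row `append()` pattern. For page-numbered APIs like the FEC's `page`/`per_page`, roughly the same loop works by incrementing a page number instead of an offset.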

## Things are going to differ by API

- Endpoints
- Supported parameters
- Response structure
   - [`json_normalize()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.json_normalize.html) can help
- Quality of documentation
- Helpfulness of errors
- Size/helpfulness of community

Gotta read and experiment.
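To get a feel for what `json_normalize()` does, here's a minimal standard-library sketch of the same idea: nested dicts become columns with dotted names (pandas' default separator is `.`), and lists are left alone. This is not pandas' implementation; `flatten` is a hypothetical name, and the record is made up, shaped like a 311 row with its nested `location` field.

```python
def flatten(record, prefix="", sep="."):
    """Flatten nested dicts into one level, joining keys with `sep`."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))  # recurse into nested dicts
        else:
            flat[name] = value  # lists and scalars are kept as-is
    return flat


# Made-up record, shaped like a 311 row with a nested "location"
record = {
    "unique_key": "1",
    "location": {"latitude": "40.7", "longitude": "-74.0"},
    "tags": ["a", "b"],
}
print(flatten(record))
```

With pandas, `pd.json_normalize(data)` on the 311 results should similarly give you columns like `location.latitude` instead of one column of dicts.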

## [Final Project Submission and Peer grading](https://python-public-policy.afeld.me/en/{{school_slug}}/final_project.html#submission)

## [Course evaluations](https://m.albert.nyu.edu/app/student/nyuCrseEval/crseEval/1228/6312/GP/10)

Please complete before signing off! They are:

- Totally anonymous
- Not visible to me until grades are released
- A big help. Some things I've taken from past evaluations:
    - Making assignments more rigorous
    - Students are hungry for more
    - People like the in-class exercises

## [Time Out](https://bulletin.columbia.edu/sipa/teaching-guide/#supportservicestext)

You should have gotten an email; please take a moment now to RSVP.

Thanks to the {{assistant_name}}s!

## Thank you!

Keep in touch:

- [Email](https://python-public-policy.afeld.me/en/{{school_slug}}/syllabus.html#instructor-information)
- [@aidanfeldman](https://twitter.com/aidanfeldman/) on Twitter