# Class 5: APIs

## APIs, conceptually

![Diagram showing how online payments work: Expedia talks to Delta, Delta talks to Stripe, Stripe talks to Visa, and Visa talks to Chase](extras/img/apis_conceptually/payments.png)

![Diagram showing how notifications flow through systems](extras/img/apis_conceptually/notifications.png)

![Diagram showing relationship between human languages, programming languages, and APIs](extras/img/apis_conceptually/languages.png)

APIs are how systems interact with each other ↔️

## Ways to get data

Method | How it happens | Pros | Cons
--- | :--- | :--- | :---
**Bulk** | Download, someone hands you a flash drive, etc. | Fast, one-time transfer | Can be large
**Scraping** | Data only available through a web site, PDF, or doc | You can turn anything into data | Tedious; fragile
**APIs** | Only if the organization makes one available | Usually allows some filtering; can always pull the latest data | Requires a network connection for every call; higher barrier to entry (reading documentation); subject to the API's availability and performance

## Scraping

Common tools:

- [Beautiful Soup package](https://realpython.com/beautiful-soup-web-scraper-python/)
- [pandas' `read_html()`](https://pandas.pydata.org/docs/user_guide/io.html#html)

_Please pray to the Demo Gods that these all work and there's no profanity_

Pull table from [Wikipedia's list of countries and dependencies by population](https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population):

In [1]:
import pandas as pd

tables = pd.read_html(
    "https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population",
    match="Rank",
)
pop = tables[0]
pop

Unnamed: 0_level_0,Rank,Country / Dependency,Population,Population,Date,Source (official or from the United Nations),Notes
Unnamed: 0_level_1,Rank,Country / Dependency,Numbers,% of the world,Date,Source (official or from the United Nations),Notes
0,–,World,7996836000,100%,7 Dec 2022,UN projection[3],
1,1,China,1412600000,,31 Dec 2021,Official estimate[4],The population figure refers to mainland China...
2,2,India,1375586000,,1 Mar 2022,Official projection[5],The figure includes the population of the Indi...
3,3,United States,333329956,,5 Dec 2022,Population clock[6],The figure includes the 50 states and the Dist...
4,4,Indonesia,275773800,,1 Jul 2022,Official estimate[7],
...,...,...,...,...,...,...,...
237,–,Tokelau (New Zealand),1647,,1 Jan 2019,2019 Census [208],
238,–,Niue (New Zealand),1549,,1 Jul 2021,National annual projection[95],
239,195,Vatican City,825,,1 Feb 2019,Monthly national estimate[209],The total population of 825 consisted of 453 r...
240,–,Cocos (Keeling) Islands (Australia),593,,30 Jun 2020,2021 Census[210],


### Data is only available if it's available

## API calls in the wild

1. Go to [Candidates page on fec.gov](https://www.fec.gov/data/candidates/?has_raised_funds=true&is_active_candidate=true).
1. Right click and `Inspect`.
   - [More info about opening Developer Tools in various browsers.](https://developer.mozilla.org/en-US/docs/Learn/Common_questions/What_are_browser_developer_tools#how_to_open_the_devtools_in_your_browser)
1. Go to the `Network` tab and reload.
1. Filter to `XHR`.
1. Click the API call.

### Parts of a URL

![URL structure](https://developer.mozilla.org/en-US/docs/Learn/Common_questions/What_is_a_URL/mdn-url-all.png)

[source](https://developer.mozilla.org/en-US/docs/Learn/Common_questions/What_is_a_URL#basics_anatomy_of_a_url)

For APIs:

- Often split into "base URL" + "endpoint"
- Anchors aren't relevant
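
To make the URL parts concrete, here is a minimal sketch that pulls apart an FEC API URL with Python's standard library. The `#results` anchor is added purely for illustration, and treating `/v1` as part of the base URL is an assumption about where to draw the line; documentation for different APIs splits it differently.

```python
from urllib.parse import parse_qs, urlsplit

# An FEC API URL; the "#results" anchor is made up for illustration.
url = "https://api.open.fec.gov/v1/candidates/?api_key=DEMO_KEY&q=Jimmy+McMillan#results"

parts = urlsplit(url)
print(parts.scheme)    # the protocol
print(parts.netloc)    # the domain name
print(parts.path)      # the path to the resource
print(parts.query)     # the parameters, still one string
print(parts.fragment)  # the anchor, which is never sent to the server

# The query string as a dictionary; each value is a list because keys can repeat.
query = parse_qs(parts.query)
print(query)

# One possible (assumed) split into "base URL" + "endpoint".
base_url = f"{parts.scheme}://{parts.netloc}/v1"
endpoint = "/candidates/"
print(base_url + endpoint)
```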

### API documentation

[FEC API](https://api.open.fec.gov/developers/)

### Try it out

1. Visit https://www.fec.gov/data/candidates/
1. In the Network tab's request list, right-click the API call.
1. Click `Open in New Tab`.
1. Replace the API key with `DEMO_KEY`.
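
The last step can also be done in code. This is a rough sketch using only the standard library; `with_api_key` is a hypothetical helper name and the `copied` URL is a made-up stand-in for whatever you copied from the Network tab.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_api_key(url, api_key="DEMO_KEY"):
    """Return `url` with its api_key query parameter replaced (or added)."""
    parts = urlsplit(url)
    # Keep every other parameter, including repeated and blank ones.
    pairs = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "api_key"
    ]
    pairs.append(("api_key", api_key))
    return urlunsplit(parts._replace(query=urlencode(pairs)))


# Made-up example of a copied URL; the original key is a placeholder.
copied = "https://api.open.fec.gov/v1/candidates/?api_key=SOME_OTHER_KEY&has_raised_funds=true"
demo_url = with_api_key(copied)
print(demo_url)
```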

## API calls from Python

Usually one of two ways:

- A software development kit (SDK) like [sodapy](https://pypi.org/project/sodapy/)
   - Abstracts the details away
   - Not available for all APIs
   - May have limitations
- [The `requests` package](https://docs.python-requests.org/) (nothing to do with 311 requests)

In [2]:
import requests

params = {"api_key": "DEMO_KEY", "q": "Jimmy McMillan", "sort": "-first_file_date"}
response = requests.get("https://api.open.fec.gov/v1/candidates/", params=params)
data = response.json()
data

{'api_version': '1.0',
 'results': [{'last_file_date': '2015-10-13',
   'election_districts': ['00'],
   'candidate_status': 'N',
   'district': '00',
   'office_full': 'President',
   'flags': 'P60016805',
   'first_file_date': '2015-10-13',
   'inactive_election_years': None,
   'district_number': 0,
   'party_full': 'REPUBLICAN PARTY',
   'office': 'P',
   'state': 'US',
   'name': 'MCMILLAN, JIMMY "RENT IS TOO DAMN HIGH',
   'party': 'REP',
   'load_date': '2018-02-17T09:16:20+00:00',
   'candidate_id': 'P60016805',
   'incumbent_challenge_full': 'Open seat',
   'active_through': 2016,
   'candidate_inactive': False,
   'federal_funds_flag': False,
   'last_f2_date': '2015-10-13',
   'cycles': [2016, 2018],
   'incumbent_challenge': 'O',
   'has_raised_funds': False,
   'election_years': [2016]},
  {'last_file_date': '2011-02-07',
   'election_districts': ['00', '00'],
   'candidate_status': 'N',
   'district': '00',
   'office_full': 'President',
   'flags': 'P60003290',
   'first

### Retrieving nested data

In [3]:
data["results"][0]["name"]

'MCMILLAN, JIMMY "RENT IS TOO DAMN HIGH'
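
The same pattern extends to any depth: index dictionaries by key and lists by position. The sketch below uses a trimmed-down stand-in for the response above (only three keys of the first result are kept) so it runs without a network call; `nickname` is a made-up key used to show what happens when a field is missing.

```python
# Trimmed-down stand-in for the FEC response above; most keys are omitted.
data = {
    "api_version": "1.0",
    "results": [
        {
            "name": 'MCMILLAN, JIMMY "RENT IS TOO DAMN HIGH',
            "party_full": "REPUBLICAN PARTY",
            "cycles": [2016, 2018],
        },
    ],
}

first = data["results"][0]   # list inside a dict, so index by position
print(first["name"])
print(first["cycles"][-1])   # last item of a list nested inside the record

# .get() returns a default instead of raising KeyError for a missing key.
print(first.get("nickname", "unknown"))

# Pull one field out of every result.
names = [candidate["name"] for candidate in data["results"]]
print(names)
```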

### [In-class exercise](hw_5.ipynb)

### Reading into a DataFrame

In [4]:
params = {"api_key": "DEMO_KEY", "has_raised_funds": "true"}
response = requests.get("https://api.open.fec.gov/v1/candidates/", params=params)
data = response.json()

pd.DataFrame(data["results"])

Unnamed: 0,candidate_status,incumbent_challenge_full,incumbent_challenge,flags,party_full,candidate_id,district,inactive_election_years,first_file_date,federal_funds_flag,...,cycles,office_full,load_date,state,election_years,candidate_inactive,name,has_raised_funds,office,active_through
0,C,Open seat,O,H2CO07170,REPUBLICAN PARTY,H2CO07170,7,,2021-12-27,False,...,[2022],House,2022-08-10T21:11:12+00:00,CO,[2022],False,"AADLAND, ERIK",True,H,2022
1,C,Challenger,C,H2UT03280,REPUBLICAN PARTY,H2UT03280,3,,2020-03-24,False,...,[2022],House,2022-04-13T21:10:09+00:00,UT,[2022],False,"AALDERS, TIM",True,H,2022
2,P,Open seat,O,S2UT00229,CONSTITUTION PARTY,S2UT00229,0,,2012-02-08,False,...,"[2012, 2014, 2016, 2018, 2020]",Senate,2019-03-27T16:02:41+00:00,UT,"[2012, 2018]",False,"AALDERS, TIMOTHY NOEL",True,S,2018
3,C,Open seat,O,H0TX22260,REPUBLICAN PARTY,H0TX22260,22,,2019-10-17,False,...,[2020],House,2020-03-18T21:13:37+00:00,TX,[2020],False,"AALOORI, BANGAR REDDY",True,H,2020
4,P,,,H6PA16106,REPUBLICAN PARTY,H6PA16106,16,,1976-04-12,False,...,"[1976, 1978, 1980]",House,2002-03-30T00:00:00+00:00,PA,"[1976, 1978]",False,"AAMODT, NORMAN O.",True,H,1978
5,P,Challenger,C,H2CA01110,REPUBLICAN PARTY,H2CA01110,1,,2012-02-22,False,...,"[2012, 2014, 2016]",House,2013-04-26T09:04:30+00:00,CA,[2012],False,"AANESTAD, SAMUEL",True,H,2012
6,C,Challenger,C,H8CO06237,DEMOCRATIC PARTY,H8CO06237,6,,2017-04-26,False,...,[2018],House,2017-08-01T20:57:28+00:00,CO,[2018],False,"AARESTAD, DAVID",True,H,2018
7,N,Open seat,O,P80002926,DEMOCRATIC PARTY,P80002926,0,,2005-10-12,False,...,"[2006, 2008, 2010, 2012, 2014, 2016]",President,2016-11-17T06:10:48+00:00,US,[2008],False,"AARON, LAURA DAVIS",True,P,2008
8,C,Challenger,C,H2CA30291,DEMOCRATIC PARTY,H2CA30291,32,,2021-01-16,False,...,[2022],House,2022-07-14T21:11:42+00:00,CA,[2022],False,"AAZAMI, SHERVIN",True,H,2022
9,C,Challenger,C,H2MN07162,DEMOCRATIC-FARMER-LABOR,H2MN07162,7,,2022-06-06,False,...,[2022],House,2022-07-25T23:03:23+00:00,MN,[2022],False,"ABAHSAIN, JILL",True,H,2022


## Back to 311 data

From [NYC Open Data Portal dataset page](https://data.cityofnewyork.us/Social-Services/311-Service-Requests-from-2010-to-Present/erm2-nwe9/data), click `Export` -> `SODA API` -> `API Docs`.

### Most open data sites have APIs

Often built on platforms that provide them, e.g.

- [NYC Open Data Portal](https://opendata.cityofnewyork.us/) built on [Socrata](https://dev.socrata.com/)
- [data.gov built on CKAN](https://www.data.gov/developers/apis)

### Example: 311 requests from the last week

In [5]:
from datetime import datetime, timedelta

now = datetime.utcnow()
now

datetime.datetime(2022, 12, 7, 21, 47, 54, 114679)

In [6]:
start = now - timedelta(weeks=1)
start

datetime.datetime(2022, 11, 30, 21, 47, 54, 114679)

In [7]:
start.isoformat()

'2022-11-30T21:47:54.114679'
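
The three cells above can be folded into one small function. This is only a sketch: `last_week_window` is a hypothetical name, and dropping the microseconds with `timespec="seconds"` is a readability choice rather than something the API is known to require. A fixed timestamp is used so the result is reproducible.

```python
from datetime import datetime, timedelta


def last_week_window(now):
    """Return (start, end) ISO 8601 strings for the week ending at `now`."""
    start = now - timedelta(weeks=1)
    # timespec="seconds" truncates the microseconds from the output.
    return start.isoformat(timespec="seconds"), now.isoformat(timespec="seconds")


# Same moment as the notebook output above, hard-coded for reproducibility.
fixed_now = datetime(2022, 12, 7, 21, 47, 54, 114679)
start_str, end_str = last_week_window(fixed_now)
print(start_str, end_str)
```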

Using the [Socrata query language (SoQL)](https://dev.socrata.com/docs/queries/):

In [8]:
data_id = "erm2-nwe9"
params = {"$where": f"created_date between '{start.isoformat()}' and '{now.isoformat()}'"}

url = f"https://data.cityofnewyork.us/resource/{data_id}.json"
response = requests.get(url, params=params)
data = response.json()

data

[{'unique_key': '56175237',
  'created_date': '2022-12-06T12:00:00.000',
  'closed_date': '2022-12-06T12:00:00.000',
  'agency': 'DSNY',
  'agency_name': 'Department of Sanitation',
  'complaint_type': 'Derelict Vehicles',
  'descriptor': 'Derelict Vehicles',
  'location_type': 'Street',
  'incident_zip': '11420',
  'incident_address': '135-28 125 STREET',
  'street_name': '125 STREET',
  'cross_street_1': '135 AVENUE',
  'cross_street_2': '149 AVENUE',
  'address_type': 'ADDRESS',
  'city': 'SOUTH OZONE PARK',
  'status': 'Closed',
  'resolution_action_updated_date': '2022-12-06T12:00:00.000',
  'community_board': '10 QUEENS',
  'bbl': '4118580048',
  'borough': 'QUEENS',
  'x_coordinate_state_plane': '1035695',
  'y_coordinate_state_plane': '183106',
  'open_data_channel_type': 'PHONE',
  'park_facility_name': 'Unspecified',
  'park_borough': 'QUEENS',
  'latitude': '40.66911094248772',
  'longitude': '-73.81455186093125',
  'location': {'latitude': '40.66911094248772',
   'longitude

In [9]:
pd.DataFrame(data)

Unnamed: 0,unique_key,created_date,closed_date,agency,agency_name,complaint_type,descriptor,location_type,incident_zip,incident_address,...,resolution_description,intersection_street_1,intersection_street_2,landmark,bridge_highway_name,bridge_highway_direction,bridge_highway_segment,facility_type,taxi_pick_up_location,taxi_company_borough
0,56175237,2022-12-06T12:00:00.000,2022-12-06T12:00:00.000,DSNY,Department of Sanitation,Derelict Vehicles,Derelict Vehicles,Street,11420,135-28 125 STREET,...,,,,,,,,,,
1,56173592,2022-12-06T12:00:00.000,,DSNY,Department of Sanitation,Derelict Vehicles,Derelict Vehicles,Street,11249,759 WYTHE AVENUE,...,The Department of Sanitation is in the process...,,,,,,,,,
2,56173497,2022-12-06T12:00:00.000,,DSNY,Department of Sanitation,Derelict Vehicles,Derelict Vehicles,Street,11373,41-19 WARREN STREET,...,The Department of Sanitation is in the process...,,,,,,,,,
3,56174756,2022-12-06T12:00:00.000,,DSNY,Department of Sanitation,Derelict Vehicles,Derelict Vehicles,Street,11249,759 WYTHE AVENUE,...,The Department of Sanitation is in the process...,,,,,,,,,
4,56172714,2022-12-06T03:06:41.000,,EDC,Economic Development Corporation,Noise - Helicopter,Other,Above Address,11213,161 UTICA AVENUE,...,,ST MARKS AVENUE,PROSPECT PLACE,UTICA AVENUE,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,56172994,2022-12-05T21:37:26.000,2022-12-05T22:06:06.000,NYPD,New York City Police Department,Illegal Parking,Posted Parking Sign Violation,Street/Sidewalk,11372,94-06 34 AVENUE,...,The Police Department responded to the complai...,94 STREET,JUNCTION BOULEVARD,34 AVENUE,,,,,,
996,56171610,2022-12-05T21:37:07.000,,DCA,Department of Consumer Affairs,Consumer Complaint,Garage/Parking Lot,Business,10065,155 EAST 68 STREET,...,,LEXINGTON AVENUE,3 AVENUE,EAST 68 STREET,,,,,,
997,56170925,2022-12-05T21:36:54.000,2022-12-05T21:40:33.000,NYPD,New York City Police Department,Blocked Driveway,No Access,Street/Sidewalk,11221,168 WEIRFIELD STREET,...,Your request can not be processed at this time...,EVERGREEN AVENUE,CENTRAL AVENUE,WEIRFIELD STREET,,,,,,
998,56173213,2022-12-05T21:36:33.000,2022-12-05T21:51:58.000,NYPD,New York City Police Department,Noise - Residential,Loud Music/Party,Residential Building/House,10456,1465 WASHINGTON AVENUE,...,The Police Department responded to the complai...,ST PAULS PLACE,EAST 171 STREET,WASHINGTON AVENUE,,,,,,


Is it a coincidence that there were exactly 1,000 results?

### Pagination

- Most APIs limit the number of results returned.
- [Socrata defaults to 1,000.](https://dev.socrata.com/docs/queries/limit.html)
- Need to use a loop with parameters like [`$limit`](https://dev.socrata.com/docs/queries/limit.html)+[`$offset`](https://dev.socrata.com/docs/queries/offset.html) (Socrata) or `page`+`per_page` ([FEC](https://api.open.fec.gov/developers/))
   - Combine the pages into one DataFrame with [`pd.concat()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html) (`DataFrame.append()` was removed in pandas 2.0)
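
A pagination loop might look roughly like the sketch below. `fetch_all` and `fetch_page` are hypothetical names, and the fake API stands in for a real one so the loop can be run offline. The stopping rule (a page shorter than the limit means there is nothing left) is a common convention, not something every API guarantees.

```python
def fetch_all(fetch_page, limit=1000):
    """Collect records page by page until a short (or empty) page comes back.

    `fetch_page(limit, offset)` is a hypothetical callable returning a list of
    records, e.g. a thin wrapper around `requests.get(...).json()`.
    """
    records = []
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        records.extend(page)
        if len(page) < limit:
            break  # fewer rows than requested, so this was the last page
        offset += limit
    return records


# Stand-in for a real API: 2,500 fake rows served in slices.
fake_rows = [{"unique_key": str(i)} for i in range(2500)]


def fake_fetch_page(limit, offset):
    return fake_rows[offset : offset + limit]


all_rows = fetch_all(fake_fetch_page, limit=1000)
print(len(all_rows))  # 2500
```

For Socrata, `fetch_page` would presumably pass `{"$limit": limit, "$offset": offset}` (likely alongside an `$order` so pages don't shift between calls) as `params` to `requests.get()`. Here the records are collected in a plain list and can be handed to `pd.DataFrame()` once at the end.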

## Things are going to differ by API

- Endpoints
- Supported parameters
- Response structure
   - [`json_normalize()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.json_normalize.html) can help
- Quality of documentation
- Helpfulness of errors
- Size/helpfulness of community

Gotta read and experiment.
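
As a small illustration of the `json_normalize()` point above, here is a sketch with a trimmed-down 311 record. Only a few keys are kept, and the nested `location` values are copied from the record's top-level `latitude`/`longitude` fields as an assumption, since the notebook output is cut off at that point.

```python
import pandas as pd

# Trimmed-down 311 record; the nested values are assumed to match the
# top-level latitude/longitude shown in the response.
records = [
    {
        "unique_key": "56175237",
        "agency": "DSNY",
        "location": {
            "latitude": "40.66911094248772",
            "longitude": "-73.81455186093125",
        },
    }
]

# pd.DataFrame() would leave "location" as a single column of dictionaries.
nested = pd.DataFrame(records)
print(nested.columns.tolist())

# json_normalize() flattens nested dictionaries into dotted column names.
flat = pd.json_normalize(records)
print(flat.columns.tolist())
```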

## Homework 6

In the real/ideal world, you start with a specific question and find data to answer it:

![project flow](extras/img/projectflow.png)

_Source: [Big Data and Social Science](https://textbook.coleridgeinitiative.org/chap-intro.html#the-structure-of-the-book)_

The data you need often doesn't exist, or is hard (or impossible) to find or access:

![project flow](extras/img/projectflow_amended.png)

[Homework 6](hw_6.ipynb)

## No homework/resubmissions will be accepted after Wednesday 12/21 at 6:45pm ET

In other words, Homework 6 cannot be late.

## Lecture 6

[The Joys (and Woes) of the Craft of Software Engineering](https://cs.calvin.edu/courses/cs/262/kvlinden/references/brooksJoysAndWoes.html)