# **NYU Wagner - Python Coding for Public Policy**
# Class 5: APIs

## Ways to get data

Method | How it happens | Pros | Cons
--- | :--- | :--- | :---
**Bulk** | Download, someone hands you a flash drive, etc. | Fast, one-time transfer | Can be large
**Scraping** | Data only available through a web site, PDF, or doc | You can turn anything into data | Tedious; fragile
**APIs** | If the organization makes one available | Usually allows some filtering; can always pull latest-and-greatest | Requires network connection for every pull; higher barrier to entry (reading documentation); subject to availability and performance of API

## Scraping

Common tools:

- [Beautiful Soup package](https://realpython.com/beautiful-soup-web-scraper-python/)
- [pandas' `read_html()`](https://pandas.pydata.org/docs/user_guide/io.html#html)

In [1]:
import pandas as pd

tables = pd.read_html('https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population', match='Rank')
pop = tables[0]
pop

| | Rank | Country / Dependency | Region | Population | Percentage of the world | Date | Source (official or from the United Nations) | Notes |
| --- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 0 | – | World | | 7941814000 | 100% | 7 Apr 2022 | UN projection[1] | |
| 1 | 1 | China | Asia | 1412600000 | | 31 Dec 2021 | National annual estimate[2] | The population figure refers to mainland China... |
| 2 | 2 | India | Asia | 1375019925 | | 7 Apr 2022 | National population clock[3] | The figure includes the population of Jammu an... |
| 3 | 3 | United States | Americas | 332607072 | | 7 Apr 2022 | National population clock[4] | The figure includes the 50 states and the Dist... |
| 4 | 4 | Indonesia | Asia[b] | 272248500 | | 1 Jul 2021 | National annual estimate[5] | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 237 | – | Niue (New Zealand) | Oceania | 1549 | | 1 Jul 2021 | National annual projection[92] | |
| 238 | – | Tokelau (New Zealand) | Oceania | 1501 | | 1 Jul 2021 | National annual projection[92] | |
| 239 | 195 | Vatican City | Europe | 825 | | 1 Feb 2019 | Monthly national estimate[196] | The total population of 825 consisted of 453 r... |
| 240 | – | Cocos (Keeling) Islands (Australia) | Oceania | 573 | | 30 Jun 2020 | National annual estimate[195] | |


### Data is only available if it's available

## API calls in the wild

1. Go to [Candidates page on fec.gov](https://www.fec.gov/data/candidates/?has_raised_funds=true&is_active_candidate=true).
1. Right-click anywhere on the page and choose `Inspect`.
   - [More info about opening Developer Tools in various browsers.](https://developer.mozilla.org/en-US/docs/Learn/Common_questions/What_are_browser_developer_tools#how_to_open_the_devtools_in_your_browser)
1. Go to the `Network` tab and reload.
1. Filter to `XHR`.
1. Click the API call.

### Parts of a URL

![URL structure](https://developer.mozilla.org/en-US/docs/Learn/Common_questions/What_is_a_URL/mdn-url-all.png)

[source](https://developer.mozilla.org/en-US/docs/Learn/Common_questions/What_is_a_URL#basics_anatomy_of_a_url)

For APIs:

- Often split into "base URL" + "endpoint"
- Anchors aren't relevant
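
A minimal sketch of the same anatomy using Python's standard library. The base URL and endpoint are the FEC ones used later in this lecture; the query parameters are copied from the fec.gov page link above and are only assumed to carry over to the API, and the `#results` anchor is made up to show where a fragment would end up.

```python
from urllib.parse import urlsplit, parse_qs

# "Base URL" + "endpoint" (+ query string + a made-up anchor for illustration)
base_url = 'https://api.open.fec.gov/v1'
endpoint = '/candidates/'
query = 'has_raised_funds=true&is_active_candidate=true'
url = f'{base_url}{endpoint}?{query}#results'

parts = urlsplit(url)
print(parts.scheme)    # https
print(parts.netloc)    # api.open.fec.gov
print(parts.path)      # /v1/candidates/
print(parts.fragment)  # results (anchors stay in the client; they are not sent to the server)

# The query string becomes a dict of lists, since a parameter may repeat
params = parse_qs(parts.query)
print(params)
```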

### API documentation

[FEC API](https://api.open.fec.gov/developers/)

### Try it out

1. Visit https://www.fec.gov/data/candidates/
1. In the Network tab's request list, right-click the API call.
1. Click `Open in New Tab`.
1. Replace the API key with `DEMO_KEY`.
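
The last step can also be done in code rather than by hand. This is only a sketch: `with_api_key` is a hypothetical helper, and the captured URL below is a stand-in (the `sort=name` parameter and the `KEY_FROM_BROWSER` value are placeholders, not what the browser actually sends).

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def with_api_key(url, api_key='DEMO_KEY'):
    """Return `url` with its `api_key` query parameter set to `api_key`."""
    parts = urlsplit(url)
    # Keep every other parameter, dropping any existing api_key
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != 'api_key']
    query.append(('api_key', api_key))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder for a URL copied from the Network tab
captured = 'https://api.open.fec.gov/v1/candidates/?sort=name&api_key=KEY_FROM_BROWSER'
print(with_api_key(captured))
```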

## API calls from Python

Usually one of two ways:

- A software development kit (SDK)
   - Only if the API provider offers one
   - Abstracts the details away
   - May have limitations
- [The `requests` package](https://docs.python-requests.org/)

In [2]:
import requests

params = {
    'api_key': 'DEMO_KEY',
    'q': 'Jimmy McMillan',
    'sort': '-first_file_date'
}
response = requests.get('https://api.open.fec.gov/v1/candidates/', params=params)
data = response.json()
data

{'api_version': '1.0',
 'pagination': {'page': 1, 'pages': 1, 'per_page': 20, 'count': 2},
 'results': [{'load_date': '2018-02-17T09:16:20+00:00',
   'party_full': 'REPUBLICAN PARTY',
   'has_raised_funds': False,
   'party': 'REP',
   'active_through': 2016,
   'cycles': [2016, 2018],
   'incumbent_challenge_full': 'Open seat',
   'candidate_id': 'P60016805',
   'candidate_inactive': False,
   'federal_funds_flag': False,
   'inactive_election_years': None,
   'election_districts': ['00'],
   'candidate_status': 'N',
   'last_f2_date': '2015-10-13',
   'name': 'MCMILLAN, JIMMY "RENT IS TOO DAMN HIGH',
   'office': 'P',
   'state': 'US',
   'incumbent_challenge': 'O',
   'office_full': 'President',
   'first_file_date': '2015-10-13',
   'flags': 'P60016805',
   'district': '00',
   'district_number': 0,
   'last_file_date': '2015-10-13',
   'election_years': [2016]},
  {'load_date': '2021-12-08T06:50:50+00:00',
   'party_full': 'REPUBLICAN PARTY',
   'has_raised_funds': False,
   'part

### Retrieving nested data

In [3]:
data['results'][0]['name']

'MCMILLAN, JIMMY "RENT IS TOO DAMN HIGH'
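
The same indexing pattern works at every level of the response. A minimal sketch, using a hand-trimmed copy of the response above (only a few fields of the first result are kept, so `count` no longer matches the length of `results`):

```python
# Hand-trimmed copy of the FEC response shown earlier
sample = {
    'api_version': '1.0',
    'pagination': {'page': 1, 'pages': 1, 'per_page': 20, 'count': 2},
    'results': [
        {
            'name': 'MCMILLAN, JIMMY "RENT IS TOO DAMN HIGH',
            'party_full': 'REPUBLICAN PARTY',
            'office_full': 'President',
            'election_years': [2016],
        },
    ],
}

# dict -> dict -> value
total = sample['pagination']['count']

# dict -> list -> dict -> list -> value
first = sample['results'][0]
first_year = first['election_years'][0]

# Loop over the list to pull the same fields from every result
summaries = [(c['name'], c['office_full']) for c in sample['results']]

# .get() returns None for a missing key instead of raising KeyError
missing = first.get('no_such_field')

print(total, first_year, summaries, missing)
```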

### Reading into a DataFrame

In [4]:
params = {'api_key': 'DEMO_KEY'}
response = requests.get('https://api.open.fec.gov/v1/candidates/', params=params)
data = response.json()

pd.DataFrame(data['results'])

Unnamed: 0,candidate_status,election_years,candidate_inactive,office_full,name,first_file_date,party,incumbent_challenge_full,federal_funds_flag,last_file_date,...,candidate_id,load_date,office,district_number,incumbent_challenge,election_districts,party_full,district,cycles,flags
0,N,[2024],False,President,"12-INCH COCK PLEASE, CAN YOU SUCK MY",2021-04-28,PAF,,False,2021-05-05,...,P40006033,2021-12-08T06:50:50+00:00,P,0,,[00],PEACE AND FREEDOM,0,[2022],P40006033
1,N,[2020],False,President,"753, JO",2019-04-23,NNE,Challenger,False,2019-04-23,...,P00011569,2021-12-08T06:50:50+00:00,P,0,C,[00],NONE,0,"[2020, 2022]",P00011569
2,N,[2004],False,President,"AABBATTE, MICHAEL THOMAS WITORT",2002-01-30,IND,Challenger,False,2002-01-30,...,P40002172,2002-04-12T00:00:00+00:00,P,0,C,[00],INDEPENDENT,0,"[2002, 2004]",P40002172
3,C,[2022],False,Senate,"AADLAND, ERIK",2021-06-04,REP,Challenger,False,2021-06-04,...,S2CO00175,2021-12-08T06:50:51+00:00,S,0,C,[00],REPUBLICAN PARTY,0,[2022],S2CO00175
4,N,[2022],False,House,"AADLAND, ERIK",2021-12-27,REP,Open seat,False,2021-12-27,...,H2CO07170,2022-01-13T01:40:22+00:00,H,7,O,[07],REPUBLICAN PARTY,7,[2022],H2CO07170
5,N,[2022],False,House,"AALDERS, TIM",2020-03-24,REP,Challenger,False,2022-03-21,...,H2UT03280,2022-03-23T21:10:32+00:00,H,3,C,[03],REPUBLICAN PARTY,3,[2022],H2UT03280
6,P,"[2012, 2018]",False,Senate,"AALDERS, TIMOTHY NOEL",2012-02-08,CON,Open seat,False,2018-04-23,...,S2UT00229,2019-03-27T16:02:41+00:00,S,0,O,"[00, 00]",CONSTITUTION PARTY,0,"[2012, 2014, 2016, 2018, 2020]",S2UT00229
7,C,[2020],False,House,"AALOORI, BANGAR REDDY",2019-10-17,REP,Open seat,False,2019-10-17,...,H0TX22260,2020-03-18T21:13:37+00:00,H,22,O,[22],REPUBLICAN PARTY,22,[2020],H0TX22260
8,P,"[1976, 1978]",False,House,"AAMODT, NORMAN O.",1976-04-12,REP,,False,1978-07-05,...,H6PA16106,2002-03-30T00:00:00+00:00,H,16,,"[16, 16]",REPUBLICAN PARTY,16,"[1976, 1978, 1980]",H6PA16106
9,P,[2012],False,House,"AANESTAD, SAMUEL",2012-02-22,REP,Challenger,False,2012-02-22,...,H2CA01110,2013-04-26T09:04:30+00:00,H,1,C,[01],REPUBLICAN PARTY,1,"[2012, 2014, 2016]",H2CA01110


## Back to 311 data

From the [NYC Open Data Portal dataset page](https://data.cityofnewyork.us/Social-Services/311-Service-Requests-from-2010-to-Present/erm2-nwe9/data), click `Export` -> `SODA API` -> `API Docs`.

### Most open data sites have APIs

Often built on platforms that provide them, e.g.

- [NYC Open Data Portal](https://opendata.cityofnewyork.us/) built on [Socrata](https://dev.socrata.com/)
- [data.gov built on CKAN](https://www.data.gov/developers/apis)

### Example: 311 requests from the last week

In [5]:
from datetime import datetime, timedelta

now = datetime.utcnow()
now

datetime.datetime(2022, 4, 7, 22, 16, 9, 584741)

In [6]:
start = now - timedelta(weeks=1)
start

datetime.datetime(2022, 3, 31, 22, 16, 9, 584741)

In [7]:
start.isoformat()

'2022-03-31T22:16:09.584741'

Using the [Socrata query language (SoQL)](https://dev.socrata.com/docs/queries/):

In [8]:
data_id = 'erm2-nwe9'
params = {
    '$where': f"created_date between '{start.isoformat()}' and '{now.isoformat()}'"
}

url = f'https://data.cityofnewyork.us/resource/{data_id}.json'
response = requests.get(url, params=params)
data = response.json()

data

[{'unique_key': '53794758',
  'created_date': '2022-03-31T22:16:30.000',
  'closed_date': '2022-03-31T22:53:46.000',
  'agency': 'NYPD',
  'agency_name': 'New York City Police Department',
  'complaint_type': 'Noise - Residential',
  'descriptor': 'Loud Music/Party',
  'location_type': 'Residential Building/House',
  'incident_zip': '11236',
  'incident_address': '2045 ROCKAWAY PARKWAY',
  'street_name': 'ROCKAWAY PARKWAY',
  'cross_street_1': 'SEAVIEW AVENUE',
  'cross_street_2': 'SKIDMORE AVENUE',
  'intersection_street_1': 'SEAVIEW AVENUE',
  'intersection_street_2': 'SKIDMORE AVENUE',
  'address_type': 'ADDRESS',
  'city': 'BROOKLYN',
  'landmark': 'ROCKAWAY PARKWAY',
  'status': 'Closed',
  'resolution_description': 'The Police Department responded to the complaint and took action to fix the condition.',
  'resolution_action_updated_date': '2022-03-31T22:53:53.000',
  'community_board': '18 BROOKLYN',
  'bbl': '3083290225',
  'borough': 'BROOKLYN',
  'x_coordinate_state_plane': '1
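
The date arithmetic and the `$where` clause above can be wrapped into one function so the window is easy to change. A sketch only: `last_week_where` is a hypothetical helper, and it takes `now` as an argument so the result can be checked against a fixed timestamp. Dropping microseconds with `timespec='seconds'` is a cosmetic choice, not something the API is known to require.

```python
from datetime import datetime, timedelta


def last_week_where(now, column='created_date'):
    """Build a SoQL `between` clause covering the week that ends at `now`."""
    start = now - timedelta(weeks=1)
    start_text = start.isoformat(timespec='seconds')
    end_text = now.isoformat(timespec='seconds')
    return f"{column} between '{start_text}' and '{end_text}'"


# Same timestamp as the cell output earlier in this section
fixed_now = datetime(2022, 4, 7, 22, 16, 9, 584741)
where_params = {'$where': last_week_where(fixed_now)}
print(where_params['$where'])
```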

In [9]:
pd.DataFrame(data)

Unnamed: 0,unique_key,created_date,closed_date,agency,agency_name,complaint_type,descriptor,location_type,incident_zip,incident_address,...,park_facility_name,park_borough,latitude,longitude,location,facility_type,taxi_pick_up_location,bridge_highway_name,bridge_highway_direction,bridge_highway_segment
0,53794758,2022-03-31T22:16:30.000,2022-03-31T22:53:46.000,NYPD,New York City Police Department,Noise - Residential,Loud Music/Party,Residential Building/House,11236,2045 ROCKAWAY PARKWAY,...,Unspecified,BROOKLYN,40.63258253067065,-73.88830929722229,"{'latitude': '40.63258253067065', 'longitude':...",,,,,
1,53791502,2022-03-31T22:16:31.000,2022-03-31T22:55:15.000,NYPD,New York City Police Department,Noise - Street/Sidewalk,Loud Music/Party,Street/Sidewalk,10038,1 PLATT STREET,...,Unspecified,MANHATTAN,40.7077665991556,-74.00669796168562,"{'latitude': '40.7077665991556', 'longitude': ...",,,,,
2,53792535,2022-03-31T22:16:40.000,2022-03-31T22:18:23.000,NYPD,New York City Police Department,Noise - Commercial,Loud Music/Party,Store/Commercial,11102,28-10 ASTORIA BOULEVARD,...,Unspecified,QUEENS,40.77072323102476,-73.920566566059,"{'latitude': '40.77072323102476', 'longitude':...",,,,,
3,53797190,2022-03-31T22:16:43.000,2022-03-31T22:55:15.000,NYPD,New York City Police Department,Drug Activity,Use Indoor,Lobby,10026,1350 5 AVENUE,...,Unspecified,MANHATTAN,40.798891506207724,-73.94776239432711,"{'latitude': '40.798891506207724', 'longitude'...",,,,,
4,53798838,2022-03-31T22:16:59.000,2022-03-31T22:43:51.000,NYPD,New York City Police Department,Illegal Parking,Commercial Overnight Parking,Street/Sidewalk,10302,261 TRANTOR PLACE,...,Unspecified,STATEN ISLAND,40.62714905349053,-74.14502652835174,"{'latitude': '40.62714905349053', 'longitude':...",,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,53807141,2022-04-01T03:36:06.000,,HPD,Department of Housing Preservation and Develop...,PAINT/PLASTER,WALL,RESIDENTIAL BUILDING,10451,428 EAST 162 STREET,...,Unspecified,BRONX,40.82450319994843,-73.91262920143505,"{'latitude': '40.82450319994843', 'longitude':...",,,,,
996,53807198,2022-04-01T03:36:06.000,,HPD,Department of Housing Preservation and Develop...,WATER LEAK,HEAVY FLOW,RESIDENTIAL BUILDING,10451,428 EAST 162 STREET,...,Unspecified,BRONX,40.82450319994843,-73.91262920143505,"{'latitude': '40.82450319994843', 'longitude':...",,,,,
997,53806430,2022-04-01T03:37:27.000,2022-04-01T04:06:00.000,DHS,Department of Homeless Services,Homeless Person Assistance,,Subway,,,...,Unspecified,MANHATTAN,40.798627809256374,-73.94161893619888,"{'latitude': '40.798627809256374', 'longitude'...",,,6,,Other
998,53807994,2022-04-01T03:38:30.000,2022-04-01T05:06:12.000,NYPD,New York City Police Department,Noise - Residential,Banging/Pounding,Residential Building/House,11212,23 NEW LOTS AVENUE,...,Unspecified,BROOKLYN,40.6569429914104,-73.90404377512105,"{'latitude': '40.6569429914104', 'longitude': ...",,,,,


Is it a coincidence that there were exactly 1,000 results?

### Pagination

- Most APIs limit the number of results returned.
- [Socrata defaults to 1,000.](https://dev.socrata.com/docs/queries/limit.html)
- Need to use a loop with parameters like [`$limit`](https://dev.socrata.com/docs/queries/limit.html)+[`$offset`](https://dev.socrata.com/docs/queries/offset.html) (Socrata) or `page`+`per_page` ([FEC](https://api.open.fec.gov/developers/))
   - Collect each page, then combine them with [`pd.concat()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html) (`DataFrame.append()` was removed in pandas 2.0)
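
A rough sketch of the limit/offset loop. To keep it runnable without a network connection, the API is replaced by a fake in-memory dataset; `fetch_all`, `fake_get_page`, and `FAKE_ROWS` are all hypothetical names, not part of Socrata or `requests`.

```python
def fetch_all(get_page, page_size=1000):
    """Collect rows page by page until a short (or empty) page comes back.

    `get_page(limit, offset)` must return a list of rows.
    """
    rows = []
    offset = 0
    while True:
        page = get_page(limit=page_size, offset=offset)
        rows.extend(page)
        if len(page) < page_size:
            # A short page means there is nothing left to request
            break
        offset += page_size
    return rows


# Stand-in for an API: 2,500 fake rows served in slices
FAKE_ROWS = [{'unique_key': str(i)} for i in range(2500)]


def fake_get_page(limit, offset):
    return FAKE_ROWS[offset:offset + limit]


all_rows = fetch_all(fake_get_page, page_size=1000)
print(len(all_rows))  # 2500
```

For a real Socrata dataset, `get_page` would presumably wrap `requests.get(url, params={'$limit': limit, '$offset': offset, ...}).json()`, likely with an `$order` parameter so the pages come back in a stable order. The combined list can then go straight into `pd.DataFrame(all_rows)`.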

## Things are going to differ by API

- Endpoints
- Supported parameters
- Response structure
   - [`json_normalize()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.json_normalize.html) can help
- Quality of documentation
- Helpfulness of errors
- Size/helpfulness of community

Gotta read and experiment.
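
As one example of a response structure that needs extra handling, the 311 records above carry a nested `location` dictionary. A small sketch of what `json_normalize()` does with it, using two records trimmed by hand from the output above (the real `location` value may contain more keys than the two kept here):

```python
import pandas as pd

# Hand-trimmed 311 records; only a few fields are kept
records = [
    {
        'unique_key': '53794758',
        'complaint_type': 'Noise - Residential',
        'location': {'latitude': '40.63258253067065', 'longitude': '-73.88830929722229'},
    },
    {
        'unique_key': '53791502',
        'complaint_type': 'Noise - Street/Sidewalk',
        'location': {'latitude': '40.7077665991556', 'longitude': '-74.00669796168562'},
    },
]

# pd.DataFrame() leaves each nested dict as a single object in one column
nested = pd.DataFrame(records)
print(nested.columns.tolist())

# json_normalize() expands nested keys into dotted column names
flat = pd.json_normalize(records)
print(flat.columns.tolist())
```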

## [Homework 5](https://padmgp-4506-spring.rcnyu.org/user-redirect/notebooks/class_materials/hw_5.ipynb)

## Homework 6

In the real/ideal world, you start with a specific question and find data to answer it:

![project flow](extras/img/projectflow.png)

_Source: [Big Data and Social Science](https://textbook.coleridgeinitiative.org/chap-intro.html#the-structure-of-the-book)_

The data you need often doesn't exist, or is hard (or impossible) to find or access:

![project flow](extras/img/projectflow_amended.png)

[Homework 6](https://padmgp-4506-spring.rcnyu.org/user-redirect/notebooks/class_materials/hw_6.ipynb)

## No homework/resubmissions will be accepted after Sunday 5/15 at 6:45pm ET

In other words, homework 6 cannot be late.

## Lecture 6