<a href="https://colab.research.google.com/github/afeld/python-public-policy/blob/main/lecture_5.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **NYU Wagner - Python Coding for Public Policy**
# Class 5: APIs

But first, [a Twitter thread showing the power of data analysis](https://twitter.com/kate_ptrv/status/1332398737604431874).

In [1]:
%%html
<blockquote class="twitter-tweet"><p lang="en" dir="ltr">I couldn’t just walk past this Tweet, so here is some fun <a href="https://twitter.com/hashtag/dataviz?src=hash&amp;ref_src=twsrc%5Etfw">#dataviz</a><br><br>Scented candles: An unexpected victim of the COVID-19 pandemic 1/n <a href="https://t.co/xEmCTQn9sA">https://t.co/xEmCTQn9sA</a> <a href="https://t.co/tVecEiX5Jc">pic.twitter.com/tVecEiX5Jc</a></p>&mdash; Kate Petrova (@kate_ptrv) <a href="https://twitter.com/kate_ptrv/status/1332398737604431874?ref_src=twsrc%5Etfw">November 27, 2020</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script> 

## Ways to get data

Method | How it happens | Pros | Cons
--- | :--- | :--- | :---
**Bulk** | Download, someone hands you a flash drive, etc. | Fast, one-time transfer | Can be large
**Scraping** | Data only available through a web site, PDF, or doc | You can turn anything into data | Tedious; fragile
**APIs** | If the organization makes one available | Usually allows some filtering; can always pull latest-and-greatest | Requires network connection for every pull; higher barrier to entry (reading documentation); subject to availability and performance of API

### Scraping

Common tools:

- [Beautiful Soup package](https://realpython.com/beautiful-soup-web-scraper-python/)
- [pandas' `read_html()`](https://pandas.pydata.org/docs/user_guide/io.html#html)

In [2]:
import pandas as pd

tables = pd.read_html('https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population', match='Rank')
pop = tables[0]
pop

Unnamed: 0,Rank,Country(or dependent territory),Population,% of world,Date,Source(official or United Nations)
0,1,China[b],1407722000,,29 Apr 2021,National population clock[3]
1,2,India[c],1376318536,,29 Apr 2021,National population clock[4]
2,3,United States[d],331449281,,1 Apr 2020,2020 census result[5]
3,4,Indonesia,271350000,,31 Dec 2020,National annual estimate[6]
4,5,Pakistan[e],225200000,,1 Jul 2021,UN projection[2]
...,...,...,...,...,...,...
237,–,Tokelau (NZ),1501,,1 Jul 2021,National annual projection[92]
238,195,Vatican City[ab],825,,1 Feb 2019,Monthly national estimate[197]
239,–,Cocos (Keeling) Islands (Australia),573,,30 Jun 2020,National annual estimate[196]
240,–,Pitcairn Islands (UK),40,,1 Jan 2021,National annual estimate[198]


## Data is only available if it's available

## API calls in the wild

1. Go to [Candidates page on fec.gov](https://www.fec.gov/data/candidates/?has_raised_funds=true&is_active_candidate=true).
1. Right click and `Inspect`.
   - [More info about opening Developer Tools in various browsers.](https://developer.mozilla.org/en-US/docs/Learn/Common_questions/What_are_browser_developer_tools#how_to_open_the_devtools_in_your_browser)
1. Go to the `Network` tab and reload.
1. Filter to `XHR`.
1. Click the API call.

### Parts of a URL

![URL structure](https://developer.mozilla.org/en-US/docs/Learn/Common_questions/What_is_a_URL/mdn-url-all.png)

[source](https://developer.mozilla.org/en-US/docs/Learn/Common_questions/What_is_a_URL#basics_anatomy_of_a_url)

For APIs:

- Often split into "base URL" + "endpoint"
- Anchors aren't relevant
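
To make the parts concrete, here is a small sketch that takes apart the FEC candidates page URL from earlier using Python's standard `urllib.parse` module. The `#results` anchor is made up for illustration, and the variable names are arbitrary.

```python
from urllib.parse import parse_qs, urlsplit

# The candidates page URL from above, plus a made-up "#results" anchor
url = "https://www.fec.gov/data/candidates/?has_raised_funds=true&is_active_candidate=true#results"

parts = urlsplit(url)
print(parts.scheme)    # protocol
print(parts.netloc)    # domain
print(parts.path)      # path to the resource
print(parts.query)     # raw query string (the parameters)
print(parts.fragment)  # anchor; browsers don't send this part to the server

# parse_qs() turns the query string into a dict of parameter -> list of values
query_params = parse_qs(parts.query)
print(query_params)
```

Since the anchor never reaches the server, it can't affect what an API returns, which is why it can be ignored when working with APIs.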

### API documentation

[FEC API](https://api.open.fec.gov/developers/)

### Try it out

1. In the Network tab's request list, right-click the API call.
1. Click `Open in New Tab`.
1. Replace the API key with `DEMO_KEY`.

## API calls from Python

Usually one of two ways:

- A software development kit (SDK)
   - Only if the API provider offers one
   - Abstracts the details away
   - May have limitations
- [The `requests` package](https://docs.python-requests.org/)
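
With `requests`, the `params` dictionary becomes the URL's query string. The sketch below builds a request without sending it, so the final URL can be inspected offline. The parameter values match the FEC call in the next cell; `prepared` and `sent_params` are just illustrative names.

```python
from urllib.parse import parse_qs, urlsplit

import requests

params = {
    "api_key": "DEMO_KEY",
    "q": "Jimmy McMillan",
    "sort": "-first_file_date",
}

# Build the request without sending it, to see the URL that would be called
request = requests.Request("GET", "https://api.open.fec.gov/v1/candidates/", params=params)
prepared = request.prepare()
print(prepared.url)

# Parsing the query string gets back the original parameters
sent_params = parse_qs(urlsplit(prepared.url).query)
print(sent_params)
```

This is handy for debugging: paste the printed URL into a browser to compare against what the Network tab showed.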

In [3]:
import requests

params = {
    'api_key': 'DEMO_KEY',
    'q': 'Jimmy McMillan',
    'sort': '-first_file_date'
}
response = requests.get('https://api.open.fec.gov/v1/candidates/', params=params)
data = response.json()
data

{'api_version': '1.0',
 'pagination': {'count': 2, 'page': 1, 'pages': 1, 'per_page': 20},
 'results': [{'district': '00',
   'state': 'US',
   'election_years': [2016],
   'load_date': '2018-02-17T09:16:20+00:00',
   'has_raised_funds': False,
   'name': 'MCMILLAN, JIMMY "RENT IS TOO DAMN HIGH',
   'office_full': 'President',
   'first_file_date': '2015-10-13',
   'last_f2_date': '2015-10-13',
   'last_file_date': '2015-10-13',
   'candidate_status': 'N',
   'candidate_id': 'P60016805',
   'federal_funds_flag': False,
   'candidate_inactive': False,
   'party_full': 'REPUBLICAN PARTY',
   'district_number': 0,
   'incumbent_challenge_full': 'Open seat',
   'active_through': 2016,
   'cycles': [2016, 2018],
   'inactive_election_years': None,
   'flags': 'P60016805',
   'party': 'REP',
   'election_districts': ['00'],
   'incumbent_challenge': 'O',
   'office': 'P'},
  {'district': '00',
   'state': 'US',
   'election_years': [1996, 2012],
   'load_date': '2021-04-07T08:02:01+00:00',
 

### Retrieving nested data

In [4]:
data['results'][0]['name']

'MCMILLAN, JIMMY "RENT IS TOO DAMN HIGH'
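
The same indexing works in a loop. Here is a sketch using a hand-trimmed copy of the response above (one result, only a few keys kept), so it runs without calling the API.

```python
# Hand-trimmed sample shaped like the FEC response above (not the full response)
data = {
    "api_version": "1.0",
    "pagination": {"count": 2, "page": 1, "pages": 1, "per_page": 20},
    "results": [
        {
            "name": 'MCMILLAN, JIMMY "RENT IS TOO DAMN HIGH',
            "office_full": "President",
            "election_years": [2016],
        },
    ],
}

# Dictionaries are indexed by key, lists by position
first = data["results"][0]
print(first["name"])
print(first["election_years"][0])

# Loop over the list to pull one field out of every result
names = [candidate["name"] for candidate in data["results"]]
print(names)

# .get() returns None instead of raising a KeyError when a key is missing
print(first.get("party_full"))
```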

### Reading into a DataFrame

In [5]:
params = {'api_key': 'DEMO_KEY'}
response = requests.get('https://api.open.fec.gov/v1/candidates/', params=params)
data = response.json()

pd.DataFrame(data['results'])

Unnamed: 0,last_f2_date,party_full,cycles,election_years,last_file_date,candidate_inactive,first_file_date,name,candidate_id,district_number,...,active_through,inactive_election_years,election_districts,load_date,office,incumbent_challenge_full,office_full,candidate_status,has_raised_funds,federal_funds_flag
0,2019-04-23,NONE,"[2020, 2022]",[2020],2019-04-23,False,2019-04-23,"753, JO",P00011569,0,...,2020,,[00],2021-04-07T08:02:01+00:00,P,Challenger,President,N,False,False
1,2002-01-30,INDEPENDENT,"[2002, 2004]",[2004],2002-01-30,False,2002-01-30,"AABBATTE, MICHAEL THOMAS WITORT",P40002172,0,...,2004,,[00],2002-04-12T00:00:00+00:00,P,Challenger,President,N,False,False
2,2020-03-24,REPUBLICAN PARTY,[2020],[2020],2020-03-24,False,2020-03-24,"AADLER, TIM",H0UT03227,3,...,2020,,[03],2020-05-05T21:11:57+00:00,H,Challenger,House,C,True,False
3,,INDEPENDENT AMERICAN PARTY,[2014],[2014],,False,,"AALDERS, TIM",H4UT04052,4,...,2014,,[04],2014-03-22T21:40:34+00:00,H,Open seat,House,N,False,False
4,2018-04-23,CONSTITUTION PARTY,"[2012, 2014, 2016, 2018, 2020]","[2012, 2018]",2018-04-23,False,2012-02-08,"AALDERS, TIMOTHY NOEL",S2UT00229,0,...,2018,,"[00, 00]",2019-03-27T16:02:41+00:00,S,Open seat,Senate,P,True,False
5,2019-10-17,REPUBLICAN PARTY,[2020],[2020],2019-10-17,False,2019-10-17,"AALOORI, BANGAR REDDY",H0TX22260,22,...,2020,,[22],2020-03-18T21:13:37+00:00,H,Open seat,House,C,True,False
6,1978-07-05,REPUBLICAN PARTY,"[1976, 1978, 1980]","[1976, 1978]",1978-07-05,False,1976-04-12,"AAMODT, NORMAN O.",H6PA16106,16,...,1978,,"[16, 16]",2002-03-30T00:00:00+00:00,H,,House,P,True,False
7,2012-02-22,REPUBLICAN PARTY,"[2012, 2014, 2016]",[2012],2012-02-22,False,2012-02-22,"AANESTAD, SAMUEL",H2CA01110,1,...,2012,,[01],2013-04-26T09:04:30+00:00,H,Challenger,House,P,True,False
8,2017-04-26,DEMOCRATIC PARTY,[2018],[2018],2017-04-26,False,2017-04-26,"AARESTAD, DAVID",H8CO06237,6,...,2018,,[06],2017-08-01T20:57:28+00:00,H,Challenger,House,C,True,False
9,,LIBERTARIAN PARTY,[1998],[1998],,False,,"AAROE, KEN",H8CA18053,18,...,1998,,[18],2002-04-03T00:00:00+00:00,H,Challenger,House,N,False,False


## Back to 311 data

From the [NYC Open Data Portal dataset page](https://data.cityofnewyork.us/Social-Services/311-Service-Requests-from-2010-to-Present/erm2-nwe9/data), click `Export` -> `SODA API` -> `API Docs`.

### Most open data sites have APIs

Often built on platforms that provide them, e.g.

- [NYC Open Data Portal](https://opendata.cityofnewyork.us/) built on [Socrata](https://dev.socrata.com/)
- [data.gov built on CKAN](https://www.data.gov/developers/apis)

In [6]:
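from datetime import datetime, timedelta, timezone

# datetime.utcnow() is deprecated as of Python 3.12; this gives the same naive UTC value
now = datetime.now(timezone.utc).replace(tzinfo=None)
now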

datetime.datetime(2021, 4, 29, 7, 28, 59, 899266)

In [7]:
start = now - timedelta(weeks=1)
start

datetime.datetime(2021, 4, 22, 7, 28, 59, 899266)

In [10]:
start.isoformat()

'2021-04-22T07:28:59.899266'

311 requests from the past week, using the [Socrata query language (SoQL)](https://dev.socrata.com/docs/queries/):

In [8]:
data_id = 'erm2-nwe9'
params = {
    '$where': f"created_date between '{start.isoformat()}' and '{now.isoformat()}'"
}

url = f'https://data.cityofnewyork.us/resource/{data_id}.json'
response = requests.get(url, params=params)
data = response.json()

data

[{'unique_key': '50349126',
  'created_date': '2021-04-22T07:29:00.000',
  'closed_date': '2021-04-22T08:11:56.000',
  'agency': 'NYPD',
  'agency_name': 'New York City Police Department',
  'complaint_type': 'Noise - Residential',
  'descriptor': 'Banging/Pounding',
  'location_type': 'Residential Building/House',
  'incident_zip': '11209',
  'incident_address': '537 OVINGTON AVENUE',
  'street_name': 'OVINGTON AVENUE',
  'cross_street_1': '5 AVENUE',
  'cross_street_2': '6 AVENUE',
  'intersection_street_1': '5 AVENUE',
  'intersection_street_2': '6 AVENUE',
  'city': 'BROOKLYN',
  'landmark': 'OVINGTON AVENUE',
  'status': 'Closed',
  'resolution_description': 'The Police Department responded to the complaint and determined that police action was not necessary.',
  'resolution_action_updated_date': '2021-04-22T12:12:00.000',
  'community_board': '10 BROOKLYN',
  'bbl': '3058740072',
  'borough': 'BROOKLYN',
  'x_coordinate_state_plane': '978808',
  'y_coordinate_state_plane': '16945

In [9]:
pd.DataFrame(data)

Unnamed: 0,unique_key,created_date,closed_date,agency,agency_name,complaint_type,descriptor,location_type,incident_zip,incident_address,...,latitude,longitude,location,address_type,facility_type,taxi_pick_up_location,bridge_highway_name,bridge_highway_direction,road_ramp,bridge_highway_segment
0,50349126,2021-04-22T07:29:00.000,2021-04-22T08:11:56.000,NYPD,New York City Police Department,Noise - Residential,Banging/Pounding,Residential Building/House,11209,537 OVINGTON AVENUE,...,40.63177873278014,-74.01960624896257,"{'latitude': '40.63177873278014', 'longitude':...",,,,,,,
1,50344407,2021-04-22T07:29:29.000,2021-04-22T08:11:10.000,NYPD,New York City Police Department,Illegal Parking,Posted Parking Sign Violation,Street/Sidewalk,10029,315 EAST 113 STREET,...,40.79475884397608,-73.93846219115683,"{'latitude': '40.79475884397608', 'longitude':...",,,,,,,
2,50347091,2021-04-22T07:29:30.000,,HPD,Department of Housing Preservation and Develop...,ELECTRIC,LIGHTING,RESIDENTIAL BUILDING,11208,2 ELTON STREET,...,40.683597158171494,-73.88561702648977,"{'latitude': '40.683597158171494', 'longitude'...",ADDRESS,,,,,,
3,50347668,2021-04-22T07:29:30.000,2021-04-22T08:26:37.000,NYPD,New York City Police Department,Blocked Driveway,No Access,Street/Sidewalk,11369,23-31 JACKSON MILL ROAD,...,40.7671183392016,-73.8758815739708,"{'latitude': '40.7671183392016', 'longitude': ...",,,,,,,
4,50342786,2021-04-22T07:29:30.000,2021-04-24T14:37:10.000,HPD,Department of Housing Preservation and Develop...,PAINT/PLASTER,CEILING,RESIDENTIAL BUILDING,10465,2668 MILES AVENUE,...,40.81596653565409,-73.82520125032069,"{'latitude': '40.81596653565409', 'longitude':...",ADDRESS,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,50378858,2021-04-22T09:52:31.000,,DOT,Department of Transportation,Street Condition,Pothole,,11205,109 WALWORTH STREET,...,40.695653370442685,-73.95446790347272,"{'latitude': '40.695653370442685', 'longitude'...",ADDRESS,,,,,,
996,50349766,2021-04-22T09:53:00.000,,DOT,Department of Transportation,Street Light Condition,Street Light Out,,11357,,...,40.7841952512934,-73.80735233224561,"{'latitude': '40.7841952512934', 'longitude': ...",INTERSECTION,,,,,,
997,50345687,2021-04-22T09:53:06.000,2021-04-22T20:25:14.000,HPD,Department of Housing Preservation and Develop...,HEAT/HOT WATER,ENTIRE BUILDING,RESIDENTIAL BUILDING,10031,509 WEST 135 STREET,...,40.81918540590982,-73.95276841145517,"{'latitude': '40.81918540590982', 'longitude':...",ADDRESS,,,,,,
998,50344136,2021-04-22T09:53:28.000,2021-04-22T11:14:18.000,NYPD,New York City Police Department,Illegal Parking,Double Parked Blocking Traffic,,11233,383 MACON STREET,...,40.6826180852291,-73.93633386408524,"{'latitude': '40.6826180852291', 'longitude': ...",,,,,,,


Coincidence there were exactly 1,000 results?

## Pagination

- Most APIs limit the number of results returned.
- [Socrata defaults to 1,000.](https://dev.socrata.com/docs/queries/limit.html)
- Need to use a loop with parameters like [`$limit`](https://dev.socrata.com/docs/queries/limit.html)+[`$offset`](https://dev.socrata.com/docs/queries/offset.html) (Socrata) or `page`+`per_page` ([FEC](https://api.open.fec.gov/developers/))
   - Combine the pages with [`pd.concat()`](https://pandas.pydata.org/docs/reference/api/pandas.concat.html) (`DataFrame.append()` was removed in pandas 2.0)
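
A minimal sketch of such a loop for Socrata-style `$limit`+`$offset` paging. The function names (`fetch_socrata_page`, `fetch_all`), the `max_pages` safety cap, and the `$order` value are choices made for this example, not anything the API requires. Pages are combined with `pd.concat()`, since `DataFrame.append()` is no longer available in pandas 2.0+. The demo at the bottom uses fake rows so it runs without a network connection.

```python
import pandas as pd
import requests


def fetch_socrata_page(url, limit, offset):
    """Request one page of rows from a Socrata endpoint."""
    # "$order" keeps rows in a stable order between pages (":id" is an assumption here)
    params = {"$limit": limit, "$offset": offset, "$order": ":id"}
    response = requests.get(url, params=params)
    response.raise_for_status()  # stop on HTTP errors instead of parsing an error body
    return response.json()


def fetch_all(get_page, limit=1000, max_pages=10):
    """Call get_page(limit, offset) until a page comes back short or empty."""
    pages = []
    for page_num in range(max_pages):  # max_pages caps the total number of requests
        rows = get_page(limit, page_num * limit)
        if rows:
            pages.append(pd.DataFrame(rows))
        if len(rows) < limit:
            break  # a short page means there's nothing left
    if not pages:
        return pd.DataFrame()
    return pd.concat(pages, ignore_index=True)


# Offline demo: a stand-in for the API that serves 25 fake rows, 10 at a time
fake_rows = [{"unique_key": str(i)} for i in range(25)]
demo = fetch_all(lambda limit, offset: fake_rows[offset:offset + limit], limit=10)
print(len(demo))
```

To try it against the real 311 dataset, something like `fetch_all(lambda limit, offset: fetch_socrata_page(url, limit, offset))` should work, with `url` set to the `resource/erm2-nwe9.json` address used above. Add a `$where` filter to `params` to avoid pulling far more rows than you need.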

## Things are going to differ by API

- Endpoints
- Supported parameters
- Response structure
   - [`json_normalize()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.json_normalize.html) can help
- Quality of documentation
- Helpfulness of errors
- Size/helpfulness of community

Gotta read and experiment.
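
As an example of the `json_normalize()` point above, here is a sketch with a hand-written sample shaped like the 311 records, where `location` is a nested dictionary. The values are illustrative, not real data.

```python
import pandas as pd

# Hand-written sample shaped like the 311 records above (values are illustrative)
records = [
    {
        "unique_key": "1",
        "complaint_type": "Noise - Residential",
        "location": {"latitude": "40.63", "longitude": "-74.02"},
    },
    {
        "unique_key": "2",
        "complaint_type": "Illegal Parking",
        "location": {"latitude": "40.79", "longitude": "-73.94"},
    },
]

# DataFrame() keeps each nested dictionary as a single cell
nested = pd.DataFrame(records)
print(nested.columns.tolist())

# json_normalize() gives each nested key its own column, named parent.child
flat = pd.json_normalize(records)
print(flat.columns.tolist())
```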

## [Homework 5 + 6](https://colab.research.google.com/github/afeld/python-public-policy/blob/main/hw_5.ipynb)

## Lecture 6