<a href="https://colab.research.google.com/github/afeld/python-public-policy/blob/main/lecture_5.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **NYU Wagner - Python Coding for Public Policy**
# Class 5: APIs

But first, [a Twitter thread showing the power of data analysis](https://twitter.com/kate_ptrv/status/1332398737604431874).

In [40]:
%%html
<blockquote class="twitter-tweet"><p lang="en" dir="ltr">I couldn’t just walk past this Tweet, so here is some fun <a href="https://twitter.com/hashtag/dataviz?src=hash&amp;ref_src=twsrc%5Etfw">#dataviz</a><br><br>Scented candles: An unexpected victim of the COVID-19 pandemic 1/n <a href="https://t.co/xEmCTQn9sA">https://t.co/xEmCTQn9sA</a> <a href="https://t.co/tVecEiX5Jc">pic.twitter.com/tVecEiX5Jc</a></p>&mdash; Kate Petrova (@kate_ptrv) <a href="https://twitter.com/kate_ptrv/status/1332398737604431874?ref_src=twsrc%5Etfw">November 27, 2020</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script> 

- Bulk data vs. scraping vs. APIs
   - BeautifulSoup
   - `read_html()`
- HTTP requests
- Inspect HTTP requests on site
   - [fec.gov](https://www.fec.gov/data/candidates/?has_raised_funds=true&is_active_candidate=true)
   - [Creative Commons search](https://search.creativecommons.org/search?q=puppies)
- Enrich data
   - [Geocode](https://geosearch.planninglabs.nyc/docs/) missing lat/lng
- [Filter 311 data](https://data.cityofnewyork.us/resource/erm2-nwe9.json?$where=created_date%20%3E%20%272021-04-25T00:00:00.000%27)
- JSON to DataFrame
- Plot noise complaints

## Ways to get data

Method | How it happens | Pros | Cons
--- | :--- | :--- | :---
**Bulk** | Download, someone hands you a flash drive, etc. | Fast, one-time transfer | Can be large
**Scraping** | Data is only available through a website, PDF, or document | You can turn anything into data | Tedious; fragile (breaks when the source layout changes)
**APIs** | The organization makes one available | Usually allows some filtering; can always pull the latest data | Requires a network connection for every pull; higher barrier to entry (reading documentation)

### Scraping

Common tools:

- [Beautiful Soup package](https://realpython.com/beautiful-soup-web-scraper-python/)
- [pandas' `read_html()`](https://pandas.pydata.org/docs/user_guide/io.html#html)

In [52]:
import pandas as pd

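# read_html() returns a list of DataFrames, one per table on the page containing the text 'Rank'
tables = pd.read_html('https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population', match='Rank')
pop = tables[0]  # keep the first matching table
pop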

| | Rank | Country (or dependent territory) | Population | % of world | Date | Source (official or United Nations) |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | China[b] | 1407722000 | | 29 Apr 2021 | National population clock[3] |
| 1 | 2 | India[c] | 1376318536 | | 29 Apr 2021 | National population clock[4] |
| 2 | 3 | United States[d] | 331449281 | | 1 Apr 2020 | 2020 census result[5] |
| 3 | 4 | Indonesia | 271350000 | | 31 Dec 2020 | National annual estimate[6] |
| 4 | 5 | Pakistan[e] | 225200000 | | 1 Jul 2021 | UN projection[2] |
| ... | ... | ... | ... | ... | ... | ... |
| 237 | – | Tokelau (NZ) | 1501 | | 1 Jul 2021 | National annual projection[92] |
| 238 | 195 | Vatican City[ab] | 825 | | 1 Feb 2019 | Monthly national estimate[197] |
| 239 | – | Cocos (Keeling) Islands (Australia) | 573 | | 30 Jun 2020 | National annual estimate[196] |
| 240 | – | Pitcairn Islands (UK) | 40 | | 1 Jan 2021 | National annual estimate[198] |
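
Scraped tables usually need some cleanup before analysis. In the output above, several country names carry Wikipedia footnote markers such as `[b]`. Here is a minimal sketch of stripping them with a regular expression, using a few rows copied from that output. The shortened column name `Country` and the variable name `pop_sample` are assumptions made for illustration.

```python
import pandas as pd

# A few rows copied from the scraped table above, with a shortened column name.
pop_sample = pd.DataFrame(
    {
        "Country": ["China[b]", "India[c]", "United States[d]", "Indonesia"],
        "Population": [1407722000, 1376318536, 331449281, 271350000],
    }
)

# Remove a trailing bracketed footnote marker such as "[b]" from each name.
pop_sample["Country"] = pop_sample["Country"].str.replace(
    r"\[\w+\]$", "", regex=True
)

print(pop_sample["Country"].tolist())
```

The same pattern should also handle multi-letter markers like `[ab]`, but check the result against the page, since Wikipedia's layout changes over time.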


## Data is only available if it's available

## API calls in the wild

1. Go to the [Candidates page on fec.gov](https://www.fec.gov/data/candidates/?has_raised_funds=true&is_active_candidate=true).
1. Right-click anywhere on the page and choose `Inspect`.
   - [More info about opening Developer Tools in various browsers.](https://developer.mozilla.org/en-US/docs/Learn/Common_questions/What_are_browser_developer_tools#how_to_open_the_devtools_in_your_browser)
1. Go to the `Network` tab and reload.
1. Filter to `XHR`.
1. Click the API call.

### Parts of a URL

![URL structure](https://developer.mozilla.org/en-US/docs/Learn/Common_questions/What_is_a_URL/mdn-url-all.png)

[source](https://developer.mozilla.org/en-US/docs/Learn/Common_questions/What_is_a_URL#basics_anatomy_of_a_url)

For APIs:

- Often split into "base URL" + "endpoint"
- Anchors aren't relevant
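
To make the pieces concrete, here is a small sketch that uses Python's standard `urllib.parse` module to take apart the fec.gov page URL from earlier, then builds an API URL from a base URL, an endpoint, and query parameters. Splitting the FEC API address into `base_url` and `endpoint` this way is one reasonable choice for illustration, not something the API requires.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Split the fec.gov page URL into its parts.
url = "https://www.fec.gov/data/candidates/?has_raised_funds=true&is_active_candidate=true"
parts = urlsplit(url)
print(parts.scheme)           # the protocol
print(parts.netloc)           # the domain
print(parts.path)             # the path to the resource
print(parse_qs(parts.query))  # the query parameters, as a dict of lists
print(repr(parts.fragment))   # the anchor (empty here)

# Build an API URL from a base URL, an endpoint, and query parameters.
base_url = "https://api.open.fec.gov/v1"
endpoint = "/candidates/"
params = {"api_key": "DEMO_KEY", "sort": "-first_file_date"}
api_url = f"{base_url}{endpoint}?{urlencode(params)}"
print(api_url)
```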

### API documentation

[FEC API](https://api.open.fec.gov/developers/)

### Try it out

1. In the Network tab's request list, right-click the API call.
1. Click `Open in New Tab`.
1. Replace the API key with `DEMO_KEY`.
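
The last step can also be done in code. The sketch below swaps the `api_key` query parameter of a copied URL for `DEMO_KEY` while leaving every other parameter alone. The helper name `with_demo_key`, the example URL, and its `SOME_KEY` and `sort=name` values are hypothetical placeholders, not values taken from the site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_demo_key(url):
    """Return url with the value of any api_key query parameter set to DEMO_KEY."""
    parts = urlsplit(url)
    # Keep parameters as a list of pairs so repeated parameters survive.
    pairs = [
        (key, "DEMO_KEY" if key == "api_key" else value)
        for key, value in parse_qsl(parts.query)
    ]
    return urlunsplit(parts._replace(query=urlencode(pairs)))


# Hypothetical copied URL; SOME_KEY stands in for whatever key the site sends.
copied = "https://api.open.fec.gov/v1/candidates/?sort=name&api_key=SOME_KEY"
print(with_demo_key(copied))
```

If the URL has no `api_key` parameter, this helper returns it with the same parameters and does not add one.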

## API calls from Python

Usually one of two ways:

- A software development kit (SDK)
   - Only if the API provider offers one
   - Abstracts the details away
   - May have limitations
- [The `requests` package](https://docs.python-requests.org/)

In [61]:
import requests

params = {
    'api_key': 'DEMO_KEY',
    'q': 'Jimmy McMillan',
    'sort': '-first_file_date'
}
response = requests.get('https://api.open.fec.gov/v1/candidates/', params=params)
data = response.json()
data

{'api_version': '1.0',
 'pagination': {'count': 2, 'page': 1, 'pages': 1, 'per_page': 20},
 'results': [{'district': '00',
   'state': 'US',
   'election_years': [2016],
   'load_date': '2018-02-17T09:16:20+00:00',
   'has_raised_funds': False,
   'name': 'MCMILLAN, JIMMY "RENT IS TOO DAMN HIGH',
   'office_full': 'President',
   'first_file_date': '2015-10-13',
   'last_f2_date': '2015-10-13',
   'last_file_date': '2015-10-13',
   'candidate_status': 'N',
   'candidate_id': 'P60016805',
   'federal_funds_flag': False,
   'candidate_inactive': False,
   'party_full': 'REPUBLICAN PARTY',
   'district_number': 0,
   'incumbent_challenge_full': 'Open seat',
   'active_through': 2016,
   'cycles': [2016, 2018],
   'inactive_election_years': None,
   'flags': 'P60016805',
   'party': 'REP',
   'election_districts': ['00'],
   'incumbent_challenge': 'O',
   'office': 'P'},
  {'district': '00',
   'state': 'US',
   'election_years': [1996, 2012],
   'load_date': '2021-04-07T08:02:01+00:00',
 

In [62]:
data['results'][0]['name']

'MCMILLAN, JIMMY "RENT IS TOO DAMN HIGH'
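
The `pagination` block in the response above reports how many pages of results exist, and a single request returns only one page. Here is a rough sketch of collecting every page, assuming each response has the same `pagination` and `results` keys shown above. The names `fetch_all_results`, `get_page`, and `fake_get_page` are hypothetical, and the fake data stands in for a real API call so the example runs without a network connection.

```python
def fetch_all_results(get_page, max_pages=10):
    """Collect the 'results' lists from every page of a paginated API.

    get_page is any function that takes a page number and returns the parsed
    JSON for that page (a dict with 'pagination' and 'results' keys).
    max_pages is a safety limit so a mistake cannot loop forever.
    """
    results = []
    page = 1
    while page <= max_pages:
        data = get_page(page)
        results.extend(data["results"])
        # Stop once the last page reported by the API has been read.
        if page >= data["pagination"]["pages"]:
            break
        page += 1
    return results


def fake_get_page(page):
    """Stand-in for a real request: two pages of placeholder records."""
    names_by_page = {1: ["A", "B"], 2: ["C"]}
    return {
        "pagination": {"count": 3, "page": page, "pages": 2, "per_page": 2},
        "results": [{"name": name} for name in names_by_page[page]],
    }


all_results = fetch_all_results(fake_get_page)
print([record["name"] for record in all_results])
```

With `requests`, `get_page` could wrap a call that adds the page number to the query parameters. Check the API documentation for the exact parameter name (assumed here to be something like `page`) and for rate limits, which matter when using `DEMO_KEY`.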

In [65]:
params = {'api_key': 'DEMO_KEY'}
response = requests.get('https://api.open.fec.gov/v1/candidates/', params=params)
data = response.json()

pd.DataFrame.from_records(data['results'])

| | party_full | district_number | federal_funds_flag | has_raised_funds | candidate_status | election_districts | candidate_inactive | last_file_date | office_full | inactive_election_years | name | incumbent_challenge | incumbent_challenge_full | first_file_date | active_through | party | cycles | load_date | state | last_f2_date | election_years | district | flags | office | candidate_id |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | NONE | 0 | False | False | N | [00] | False | 2019-04-23 | President | | 753, JO | C | Challenger | 2019-04-23 | 2020 | NNE | [2020, 2022] | 2021-04-07T08:02:01+00:00 | US | 2019-04-23 | [2020] | 0 | P00011569 | P | P00011569 |
| 1 | INDEPENDENT | 0 | False | False | N | [00] | False | 2002-01-30 | President | | AABBATTE, MICHAEL THOMAS WITORT | C | Challenger | 2002-01-30 | 2004 | IND | [2002, 2004] | 2002-04-12T00:00:00+00:00 | US | 2002-01-30 | [2004] | 0 | P40002172 | P | P40002172 |
| 2 | REPUBLICAN PARTY | 3 | False | True | C | [03] | False | 2020-03-24 | House | | AADLER, TIM | C | Challenger | 2020-03-24 | 2020 | REP | [2020] | 2020-05-05T21:11:57+00:00 | UT | 2020-03-24 | [2020] | 3 | H0UT03227 | H | H0UT03227 |
| 3 | INDEPENDENT AMERICAN PARTY | 4 | False | False | N | [04] | False | | House | | AALDERS, TIM | O | Open seat | | 2014 | IAP | [2014] | 2014-03-22T21:40:34+00:00 | UT | | [2014] | 4 | H4UT04052 | H | H4UT04052 |
| 4 | CONSTITUTION PARTY | 0 | False | True | P | [00, 00] | False | 2018-04-23 | Senate | | AALDERS, TIMOTHY NOEL | O | Open seat | 2012-02-08 | 2018 | CON | [2012, 2014, 2016, 2018, 2020] | 2019-03-27T16:02:41+00:00 | UT | 2018-04-23 | [2012, 2018] | 0 | S2UT00229 | S | S2UT00229 |
| 5 | REPUBLICAN PARTY | 22 | False | True | C | [22] | False | 2019-10-17 | House | | AALOORI, BANGAR REDDY | O | Open seat | 2019-10-17 | 2020 | REP | [2020] | 2020-03-18T21:13:37+00:00 | TX | 2019-10-17 | [2020] | 22 | H0TX22260 | H | H0TX22260 |
| 6 | REPUBLICAN PARTY | 16 | False | True | P | [16, 16] | False | 1978-07-05 | House | | AAMODT, NORMAN O. | | | 1976-04-12 | 1978 | REP | [1976, 1978, 1980] | 2002-03-30T00:00:00+00:00 | PA | 1978-07-05 | [1976, 1978] | 16 | H6PA16106 | H | H6PA16106 |
| 7 | REPUBLICAN PARTY | 1 | False | True | P | [01] | False | 2012-02-22 | House | | AANESTAD, SAMUEL | C | Challenger | 2012-02-22 | 2012 | REP | [2012, 2014, 2016] | 2013-04-26T09:04:30+00:00 | CA | 2012-02-22 | [2012] | 1 | H2CA01110 | H | H2CA01110 |
| 8 | DEMOCRATIC PARTY | 6 | False | True | C | [06] | False | 2017-04-26 | House | | AARESTAD, DAVID | C | Challenger | 2017-04-26 | 2018 | DEM | [2018] | 2017-08-01T20:57:28+00:00 | CO | 2017-04-26 | [2018] | 6 | H8CO06237 | H | H8CO06237 |
| 9 | LIBERTARIAN PARTY | 18 | False | False | N | [18] | False | | House | | AAROE, KEN | C | Challenger | | 1998 | LIB | [1998] | 2002-04-03T00:00:00+00:00 | CA | | [1998] | 18 | H8CA18053 | H | H8CA18053 |
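
A DataFrame built straight from JSON keeps dates as strings and lists as lists, so a little conversion usually follows. Here is a minimal sketch using three records trimmed to four fields, with values copied from the output above. The variable names `records` and `candidates` and the new `num_cycles` column are assumptions for illustration.

```python
import pandas as pd

# Three records copied from the output above, trimmed to four fields.
records = [
    {"name": "753, JO", "office_full": "President",
     "first_file_date": "2019-04-23", "cycles": [2020, 2022]},
    {"name": "AABBATTE, MICHAEL THOMAS WITORT", "office_full": "President",
     "first_file_date": "2002-01-30", "cycles": [2002, 2004]},
    {"name": "AADLER, TIM", "office_full": "House",
     "first_file_date": "2020-03-24", "cycles": [2020]},
]
candidates = pd.DataFrame.from_records(records)

# Dates arrive as strings; convert them so they sort and filter as dates.
candidates["first_file_date"] = pd.to_datetime(candidates["first_file_date"])

# Each 'cycles' cell holds a Python list; count its entries per candidate.
candidates["num_cycles"] = candidates["cycles"].apply(len)

print(candidates.dtypes)
print(candidates[["name", "first_file_date", "num_cycles"]])
```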


## [Homework 5 + 6](https://colab.research.google.com/github/afeld/python-public-policy/blob/main/hw_5.ipynb)

## Lecture 6