<a href="https://colab.research.google.com/github/afeld/python-public-policy/blob/main/lecture_5.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **NYU Wagner - Python Coding for Public Policy**
# Class 5: APIs

But first, [a Twitter thread showing the power of data analysis](https://twitter.com/kate_ptrv/status/1332398737604431874).

In [40]:
%%html
<blockquote class="twitter-tweet"><p lang="en" dir="ltr">I couldn’t just walk past this Tweet, so here is some fun <a href="https://twitter.com/hashtag/dataviz?src=hash&amp;ref_src=twsrc%5Etfw">#dataviz</a><br><br>Scented candles: An unexpected victim of the COVID-19 pandemic 1/n <a href="https://t.co/xEmCTQn9sA">https://t.co/xEmCTQn9sA</a> <a href="https://t.co/tVecEiX5Jc">pic.twitter.com/tVecEiX5Jc</a></p>&mdash; Kate Petrova (@kate_ptrv) <a href="https://twitter.com/kate_ptrv/status/1332398737604431874?ref_src=twsrc%5Etfw">November 27, 2020</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script> 

- Bulk data vs. scraping vs. APIs
   - BeautifulSoup
   - `pandas.read_html()`
- HTTP requests
- Inspect HTTP requests on site
   - [fec.gov](https://www.fec.gov/data/candidates/?has_raised_funds=true&is_active_candidate=true)
   - [Creative Commons search](https://search.creativecommons.org/search?q=puppies)
- Enrich data
   - [Geocode](https://geosearch.planninglabs.nyc/docs/) missing lat/lng
- [Filter 311 data](https://data.cityofnewyork.us/resource/erm2-nwe9.json?$where=created_date%20%3E%20%272021-04-25T00:00:00.000%27)
- JSON to DataFrame
- Plot noise complaints

## Ways to get data

Method | How it happens | Pros | Cons
--- | :--- | :--- | :---
In bulk | You download it, someone hands it to you, etc. | Fast, one-time transfer | Can be large
Scraping | Data is only available through a web site, PDF, or doc | With enough effort, you can turn anything into data | Fragile
APIs | Only if the organization makes one available | Usually allows some filtering; can always pull latest-and-greatest | Requires a network connection for every pull; higher barrier to entry (reading documentation)

## Data is only available if it's available

## API calls in the wild

1. Go to [Candidates page on fec.gov](https://www.fec.gov/data/candidates/?has_raised_funds=true&is_active_candidate=true).
1. Right-click the page and choose `Inspect`.
   - [More info about opening Developer Tools in various browsers.](https://developer.mozilla.org/en-US/docs/Learn/Common_questions/What_are_browser_developer_tools#how_to_open_the_devtools_in_your_browser)
1. Go to the `Network` tab and reload.
1. Filter to `XHR`.
1. Click the API call.

### Parts of a URL

![URL structure](https://developer.mozilla.org/en-US/docs/Learn/Common_questions/What_is_a_URL/mdn-url-all.png)

[source](https://developer.mozilla.org/en-US/docs/Learn/Common_questions/What_is_a_URL#basics_anatomy_of_a_url)

For APIs:

- Often split into "base URL" + "endpoint"
- Anchors aren't relevant
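
As a rough illustration of the anatomy above, Python's standard `urllib.parse` module can split a URL into its parts. The example uses the FEC API URL from later in this lecture with a shortened query string, plus a made-up `#results` anchor added purely for demonstration. Treating `https://api.open.fec.gov/v1` as the "base URL" and `/candidates/` as the "endpoint" is an assumption about where to draw the line, not something the URL itself specifies.

```python
from urllib.parse import urlsplit, parse_qs

# FEC API URL with a shortened query string; the "#results" anchor is made up for illustration.
url = "https://api.open.fec.gov/v1/candidates/?api_key=DEMO_KEY&per_page=30&page=1#results"

parts = urlsplit(url)
print(parts.scheme)    # https
print(parts.netloc)    # api.open.fec.gov
print(parts.path)      # /v1/candidates/
print(parts.query)     # api_key=DEMO_KEY&per_page=30&page=1
print(parts.fragment)  # results

# Query parameters as a dict; each value is a list because a key can repeat.
query = parse_qs(parts.query)
print(query["per_page"])  # ['30']

# One possible "base URL" + "endpoint" split (an assumption, see above).
base_url = f"{parts.scheme}://{parts.netloc}/v1"
endpoint = parts.path[len("/v1"):]
print(base_url, endpoint)  # https://api.open.fec.gov/v1 /candidates/
```

The fragment (anchor) is handled by the browser and is not sent to the server, which is why it doesn't matter for API calls.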

### API documentation

[FEC API](https://api.open.fec.gov/developers/)

### Try it out

1. In the Network tab's request list, right-click the API call.
1. Click `Open in New Tab`.
1. Replace the API key with `DEMO_KEY`.
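
The same key swap can be done in code. The sketch below uses only the standard library. The URL is a shortened version of the FEC request, `REAL_KEY_FROM_BROWSER` is a placeholder standing in for whatever key appears in the Network tab, and `with_api_key` is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def with_api_key(url, api_key):
    """Return `url` with the value of its existing `api_key` parameter replaced."""
    parts = urlsplit(url)
    # Keep every parameter in its original order, changing only the key's value.
    params = [
        (name, api_key if name == "api_key" else value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(params)))


# Placeholder URL: the key shown here is not a real key.
copied_url = "https://api.open.fec.gov/v1/candidates/?api_key=REAL_KEY_FROM_BROWSER&per_page=30&page=1"
demo_url = with_api_key(copied_url, "DEMO_KEY")
print(demo_url)
# https://api.open.fec.gov/v1/candidates/?api_key=DEMO_KEY&per_page=30&page=1
```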

## API calls from Python

Usually one of two ways:

- A software development kit (SDK)
- [The `requests` package](https://docs.python-requests.org/)

In [41]:
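import requests

# Request the same URL the browser used, with the API key replaced by DEMO_KEY.
response = requests.get('https://api.open.fec.gov/v1/candidates/?api_key=DEMO_KEY&sort_hide_null=false&sort_nulls_last=true&is_active_candidate=true&has_raised_funds=true&sort=-first_file_date&per_page=30&page=1')
# Parse the JSON response body into Python dictionaries and lists.
response.json()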

{'api_version': '1.0',
 'pagination': {'per_page': 30, 'pages': 766, 'page': 1, 'count': 22970},
 'results': [{'office': 'H',
   'candidate_status': 'C',
   'candidate_inactive': False,
   'candidate_id': 'H2TX06343',
   'cycles': [2022],
   'inactive_election_years': None,
   'office_full': 'House',
   'election_districts': ['06'],
   'last_file_date': '2021-04-20',
   'party': 'REP',
   'active_through': 2021,
   'party_full': 'REPUBLICAN PARTY',
   'district': '06',
   'incumbent_challenge': 'O',
   'last_f2_date': '2021-04-20',
   'has_raised_funds': True,
   'name': 'KIM, SERY MS.',
   'flags': 'H2TX06343',
   'federal_funds_flag': False,
   'load_date': '2021-04-26T21:28:01+00:00',
   'district_number': 6,
   'incumbent_challenge_full': 'Open seat',
   'first_file_date': '2021-04-20',
   'state': 'TX',
   'election_years': [2021]},
  {'office': 'H',
   'candidate_status': 'C',
   'candidate_inactive': False,
   'candidate_id': 'H2OH16069',
   'cycles': [2022],
   'inactive_electi
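
The `pagination` block in the response says how much data sits behind this one page. Here is a minimal sketch using the values shown in the output above. The `response_data` dictionary is hand-copied and trimmed from that output; it is not a live API call.

```python
import math

# Trimmed copy of the response shown above; only a few fields are kept.
response_data = {
    "api_version": "1.0",
    "pagination": {"per_page": 30, "pages": 766, "page": 1, "count": 22970},
    "results": [
        {"name": "KIM, SERY MS.", "party": "REP", "state": "TX", "office_full": "House"},
    ],
}

pagination = response_data["pagination"]

# The number of pages is the record count divided by the page size, rounded up.
expected_pages = math.ceil(pagination["count"] / pagination["per_page"])
print(expected_pages == pagination["pages"])  # True

# Each entry in "results" is a dictionary describing one candidate.
for candidate in response_data["results"]:
    print(candidate["name"], candidate["state"], candidate["party"])
```

One way to collect every candidate would be to repeat the request with `page` set to 2, 3, and so on, up to the value of `pages`.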

In [42]:
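# The same request, with the query parameters passed as a dictionary
# instead of being written into the URL by hand.
params = {
    'api_key': 'DEMO_KEY',
    'sort_hide_null': False,
    'sort_nulls_last': True,
    'is_active_candidate': True,
    'has_raised_funds': True,
    'sort': '-first_file_date',
    'per_page': 30,
    'page': 1
}
# requests builds the query string from the params dictionary.
response = requests.get('https://api.open.fec.gov/v1/candidates/', params=params)
response.json()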

{'api_version': '1.0',
 'pagination': {'count': 22970, 'per_page': 30, 'page': 1, 'pages': 766},
 'results': [{'name': 'KIM, SERY MS.',
   'party': 'REP',
   'state': 'TX',
   'party_full': 'REPUBLICAN PARTY',
   'office': 'H',
   'office_full': 'House',
   'election_years': [2021],
   'first_file_date': '2021-04-20',
   'inactive_election_years': None,
   'active_through': 2021,
   'candidate_inactive': False,
   'has_raised_funds': True,
   'candidate_status': 'C',
   'flags': 'H2TX06343',
   'incumbent_challenge_full': 'Open seat',
   'election_districts': ['06'],
   'district': '06',
   'load_date': '2021-04-26T21:28:01+00:00',
   'last_f2_date': '2021-04-20',
   'candidate_id': 'H2TX06343',
   'cycles': [2022],
   'district_number': 6,
   'incumbent_challenge': 'O',
   'federal_funds_flag': False,
   'last_file_date': '2021-04-20'},
  {'name': 'SCHULZ, JONAH',
   'party': 'REP',
   'state': 'OH',
   'party_full': 'REPUBLICAN PARTY',
   'office': 'H',
   'office_full': 'House',
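
To see roughly what `params=` does, the standard library's `urlencode` can build a query string from the same dictionary. This is a sketch of the idea, not necessarily the exact code path `requests` uses internally. Note that Python's `True` and `False` come out capitalized, unlike the lowercase `true`/`false` in the original URL. The output above suggests the FEC API accepted them anyway, but passing the strings `'true'` and `'false'` would match the browser's URL more closely.

```python
from urllib.parse import urlencode

params = {
    'api_key': 'DEMO_KEY',
    'sort_hide_null': False,
    'sort_nulls_last': True,
    'is_active_candidate': True,
    'has_raised_funds': True,
    'sort': '-first_file_date',
    'per_page': 30,
    'page': 1
}

# Each value is converted with str(), so booleans become 'True' / 'False'.
query_string = urlencode(params)
print('https://api.open.fec.gov/v1/candidates/?' + query_string)
```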
   

## [Homework 5 + 6](https://colab.research.google.com/github/afeld/python-public-policy/blob/main/hw_5.ipynb)

## Lecture 6