# APIs with Keys

In [None]:
from requests import get 

import pandas as pd 
import numpy as np

# API Keys
Data providers often don't want to give just anybody access to their APIs. To control access and track usage, they may require an API key. An API key works like a password that is uniquely associated with your account, and you send it with every request you make to that API.

# New York Times API
One example of an API that requires a key is the New York Times API. We'll walk through making a call to it. We start by navigating to the NYT API site so that we can look up instructions on how to access their API.

We need to get an API key from the New York Times before we can access the API. We can go to their Dev Portal to sign up and get access: https://developer.nytimes.com/apis. You'll need to make an account, then log in. After you have an account, you can access your Apps by clicking on your username at the top right, and then create an app. Enable the APIs that you want to have access to, and get the key.

After you get the key, create a new text file (I called mine nyt-key.txt) and paste the key into that text file. <b>We want to avoid writing out the key in any documents we share with others</b>, so we're going to keep the key separate, read it into Python, and use it to call the API.



<b style="color:red;"> Question 1: Do the steps described above and assign the API key as a string to nyt_key.</b>

In [None]:
with open('nyt-key.txt', 'r') as f:
    # strip() removes the trailing newline that readline() keeps
    nyt_key = f.readline().strip()
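
A key file often ends with a newline, and `readline()` keeps that newline, which can cause the API to reject the key. Below is a minimal sketch of a reusable reader, assuming the key sits on the first line of the file. The helper name `read_key` is hypothetical and not part of any library.

```python
from pathlib import Path


def read_key(path):
    """Return the first line of a key file with surrounding whitespace removed.

    Hypothetical helper for illustration; assumes the key is on the first line.
    """
    lines = Path(path).read_text().splitlines()
    if not lines or not lines[0].strip():
        raise ValueError(f"No key found in {path}")
    return lines[0].strip()

# Example use (assumes the file exists in the working directory):
# nyt_key = read_key('nyt-key.txt')
```

Raising an error on an empty file makes a missing key show up immediately instead of as a confusing 401 later.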

# NYT Archives
After you do this, you can poke around on the API site a bit to get an idea of what data is available and how you might access that data. We'll start with the Archive API, for which the documentation can be found here: https://developer.nytimes.com/docs/archive-product/1/overview. The Archive API can be used to access article metadata (such as headline, byline, article URL, and so on) for a given month. Let's try getting the content for January 2019.

Following the instructions given on their site, we start with the base URL.

In [None]:
base_url = "https://api.nytimes.com/svc/archive/v1/2019/1.json"

In [None]:
r = get(base_url, params= {'api-key':nyt_key})

Now we can check the status code. Remember that code 200 means everything is fine. When we're sending authentication information, a code of 401 will indicate that our request is not authorized. 

In [None]:
r.status_code
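
The status codes mentioned above can be turned into a small lookup so that a failed request explains itself. This is only a sketch: the function name `describe_status` is hypothetical, and 429 (Too Many Requests) is included on the assumption that the API rate-limits callers. The codes themselves are standard HTTP status codes.

```python
def describe_status(code):
    """Map an HTTP status code to a short, human-readable hint (illustrative only)."""
    if code == 200:
        return "OK: the request succeeded"
    if code == 401:
        return "Unauthorized: check that the API key is correct and has no stray whitespace"
    if code == 429:
        return "Too Many Requests: slow down and retry later"
    if 400 <= code < 500:
        return "Client error: check the URL and parameters"
    if 500 <= code < 600:
        return "Server error: the problem is on the provider's side"
    return "Other status"


print(describe_status(200))
print(describe_status(401))
```

With a real response, `describe_status(r.status_code)` gives a quick hint. `r.raise_for_status()` from `requests` is an alternative that raises an exception on 4xx and 5xx codes.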

We are good to go. Now let's get the content.

In [None]:
json = r.json()  # Parse the JSON response body into a Python dictionary

<b style="color:red;">Question 2: How many NYT articles were there in January 2019?</b>



<b style="color:red;">Question 3: What are the types of metadata that are available in the data from this API? Show the keys from one article to answer this question.</b>

<b style="color:red;">Question 4: Create a list called `abstracts` that contains the article abstract for each article in `json`.</b>

## Editing strings

If we wanted to get all of the metadata of articles published in a certain year, or over an extended time period, we would actually need to change the base URL that we were using. That's because the URL as we've defined it contains the year and month hard-coded into it. This might get tedious, so we can instead edit the strings to do this automatically. This way, we are able to, for example, loop through years and months and get the data we want.

We've already discussed combining strings with the `+` operator. That is one way we might approach this. Another is an f-string, which lets us write the values directly into the string.

In [None]:
month = 10
year = 2020

f"https://api.nytimes.com/svc/archive/v1/{year}/{month}.json"


The `f` in front of the string indicates that it is an f-string, and the pieces that we want to replace within the string are included with curly braces. We use the names of the objects we want to put into those places, and the values are then interpolated into the string.
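
To make the comparison concrete, here is a short sketch that builds the same Archive URL both ways and then generates one URL per month. The function name `archive_url` is hypothetical and only builds the string; it does not make any request.

```python
def archive_url(year, month):
    """Build the Archive API URL for a given year and month (no request is made)."""
    return f"https://api.nytimes.com/svc/archive/v1/{year}/{month}.json"


# The same URL with +: numbers must be converted with str() first
plus_version = "https://api.nytimes.com/svc/archive/v1/" + str(2020) + "/" + str(10) + ".json"
print(archive_url(2020, 10) == plus_version)

# One URL for each month of 2020
urls = [archive_url(2020, month) for month in range(1, 13)]
print(urls[0])
print(len(urls))
```

The f-string version avoids the `str()` conversions and keeps the shape of the URL visible at a glance.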

<b style="color:red;">Question 5: Write a function called nyt_api that has two arguments, month and year, and outputs the response from pulling from the NYT Archive API for that month and year.</b>

<b style="color:red;">Question 6: Write a function called nyt_headlines that has two arguments, month and year, and outputs a list of headlines from pulling from the NYT Archive API for that month and year.</b>

## JSON to Pandas DataFrame

If we have nicely formatted JSON data, we can often convert it into a more usable pandas data frame with minimal effort by using `json_normalize`.

In [None]:
pd.json_normalize(json['response']['docs'])
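
To see what `json_normalize` does with nested fields, here is a small sketch on made-up records. The `toy_docs` list and its values are invented for illustration and are not real API output; only the general shape (a list of dictionaries, some of them nested) mirrors the kind of data discussed above.

```python
import pandas as pd

# Made-up records: each one has a nested "headline" dictionary
toy_docs = [
    {"web_url": "https://example.com/a", "headline": {"main": "First toy headline"}, "word_count": 100},
    {"web_url": "https://example.com/b", "headline": {"main": "Second toy headline"}, "word_count": 250},
]

df = pd.json_normalize(toy_docs)

# Nested keys are flattened into dotted column names such as "headline.main"
print(sorted(df.columns))
print(df["headline.main"].tolist())
```

Each dictionary becomes one row, and nested dictionaries are spread across columns whose names join the keys with a dot.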

## Article Search

If you are looking into the New York Times archives, most of the time you are trying to find articles about a certain topic rather than sift through everything the NYT has published. For example, you might be interested in how they are covering the election. In that case, instead of grabbing every single article, you'd want to search on some keywords. To do this, you can use the Article Search API.

You can look at the documentation at https://developer.nytimes.com/docs/articlesearch-product/1/overview for more information on how this might work. It is very similar to the Archive API, except we use a slightly different base URL, as well as different parameters. 

In [None]:
article_base = 'https://api.nytimes.com/svc/search/v2/articlesearch.json'

We can specify the keywords using `q` in our parameters. Let's look for articles with the keyword "election".

In [None]:
r = get(article_base, params= {'q':'election','api-key':nyt_key}) 

In [None]:
r.url
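
`r.url` shows the full URL that was built from the base URL and the parameters. As a rough illustration of that step, the sketch below assembles a query string with the standard library's `urlencode`. The key value `YOUR_KEY` is a placeholder, and the search term is just an example.

```python
from urllib.parse import urlencode

article_base = 'https://api.nytimes.com/svc/search/v2/articlesearch.json'

# Placeholder key; a multi-word query shows how spaces are encoded
params = {'q': 'mental health', 'api-key': 'YOUR_KEY'}

# Parameters are joined as name=value pairs after a "?"
full_url = article_base + '?' + urlencode(params)
print(full_url)
# → https://api.nytimes.com/svc/search/v2/articlesearch.json?q=mental+health&api-key=YOUR_KEY
```

Note that the key is part of the URL, so the output of `r.url` should be cleared before sharing a notebook with others.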

In [None]:
response_dict =  r.json()
response_dict.keys()

In [None]:
election_articles = r.json()['response']['docs']
len(election_articles)

In [None]:
election_articles[0]

<b style="color:red;">Question 7: Use the NYT Article Search to look for articles about mental health in January 2024. How many articles were there? How does this compare to January 2014?</b>

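Note that the search only returns 10 articles at a time. We can get more using pagination with the `page` parameter. Pages are numbered starting from 0, so `'page': '1'` below requests the second set of 10 results.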

In [None]:
election_parameters = {'q':'election',
                       'page':'1',
                       'api-key':nyt_key}

response_page1 = get(article_base, params= election_parameters).json()
election_articles2 = response_page1['response']['docs']
election_articles2[0]['abstract']
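
To collect several pages, one option is to build a parameter dictionary per page and loop over them. The sketch below only builds the dictionaries and makes no requests. The function name `page_params` is hypothetical, `YOUR_KEY` is a placeholder, and the zero-based page numbering is an assumption based on the API documentation.

```python
def page_params(query, key, n_pages):
    """Build one parameter dictionary per page, starting at page 0 (illustrative)."""
    return [{'q': query, 'page': page, 'api-key': key} for page in range(n_pages)]


all_params = page_params('election', 'YOUR_KEY', 3)
for params in all_params:
    # In a real loop, each dictionary would be passed to get(article_base, params=params),
    # ideally with a pause such as time.sleep() between calls in case of rate limits.
    print(params['page'])
```

Keeping the parameter construction separate from the requests makes it easy to check what will be sent before spending any API calls.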

We can also take a look at the meta information to see how many hits we had. Since we are just searching on "election" without any other qualifiers, we would expect the number of hits to be pretty high.

In [None]:
response_page1['response']['meta']

To narrow our search, we can add filters. For example, you can adjust the begin and end dates of your search to look at specific time periods. Let's take a look at the month of January in 2020. Note that the dates use "YYYYMMDD" formatting. So, January 1, 2020 will be `20200101`. 

In [None]:
election_parameters = {'q':'election',
                       'begin_date':'20200101',
                       'end_date':'20200131',
                       'api-key':nyt_key}

response_2020 = get(article_base, params= election_parameters).json()
election_articles3 = response_2020['response']['docs']
election_articles3[0]['web_url']

In [None]:
response_2020['response']['meta']
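
Writing the "YYYYMMDD" strings by hand is error-prone, especially for the last day of a month. Here is a minimal sketch that computes both dates for any month using the standard library. The function name `month_date_range` is hypothetical.

```python
import calendar
from datetime import date


def month_date_range(year, month):
    """Return (begin, end) strings in YYYYMMDD format covering one calendar month."""
    # monthrange returns (weekday of first day, number of days in the month)
    last_day = calendar.monthrange(year, month)[1]
    begin = date(year, month, 1).strftime('%Y%m%d')
    end = date(year, month, last_day).strftime('%Y%m%d')
    return begin, end


print(month_date_range(2020, 1))
# → ('20200101', '20200131')
```

The two strings can then be used as the `begin_date` and `end_date` values in the parameters dictionary.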

## Census API

One extremely useful API in social science research is the **Census API**. This API provides access to a wide variety of data sources on demographics and characteristics of people in the US. It contains data from the Decennial Census, but also from many other sources, such as the American Community Survey (ACS). Information about the Census API can be found at: https://www.census.gov/data/developers/data-sets.html.

As with the New York Times API, you will need to request an API key in order to access it. You can request an API key here: https://api.census.gov/data/key_signup.html. You will need to provide your email address and organization (you can just put University of Maryland), and you should get an email with your census key shortly after that. 

After you get your Census API key, save it in a text file like before (I put mine in a file called `census-key.txt`), then read it in.

In [None]:
with open('census-key.txt', 'r') as f:
    # strip() removes the trailing newline that readline() keeps
    census_key = f.readline().strip()

Even within just one data source like the ACS, there are lots of different variables and groupings that you can pull data about. We'll start with the 1-year ACS estimates. Information about this data can be found by navigating to the American Community Survey 1-Year Data page (https://www.census.gov/data/developers/data-sets/acs-1year.html). 

The webpage documentation shows how to access their data, as well as example code and a list of variables. For example, if you scroll down to the Detailed Tables section, you can find a link to the detailed tables variables (https://api.census.gov/data/2022/acs/acs1/variables.html). The Examples and Supported Geographies page (https://api.census.gov/data/2022/acs/acs1.html) can also be helpful in identifying the data that you want.

To start, let's find something basic: the total number of people in each state. Looking at the variables table, we can see that this is called `B01001_001E` (not very intuitive, I know). Since we want this for every state, we use `state:*` as our `for` parameter. We include `NAME` as a variable we want to get since we want to know what the state names are for each of the counts. Finally, we make sure to include our key.

In [None]:
census_base_url = 'https://api.census.gov/data/2022/acs/acs1'

census_params = {'get':'NAME,B01001_001E', 
                 'for':'state:*',
                 'key':census_key}

r = get(census_base_url, params = census_params)
r.status_code

In [None]:
people_by_state = r.json()
people_by_state
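
The Census API returns a list of lists: the first inner list holds the column names and each later one holds a row of values as strings. The sketch below shows one way to turn that shape into a list of dictionaries. The `toy_response` values are made up for illustration and are not real Census figures.

```python
# Made-up data in the same shape as the response: header row, then string values
toy_response = [
    ["NAME", "B01001_001E", "state"],
    ["State A", "1200", "01"],
    ["State B", "3400", "02"],
]

# Split the header from the data rows, then pair each value with its column name
header, *rows = toy_response
records = [dict(zip(header, row)) for row in rows]

# Counts arrive as strings, so convert them before doing any arithmetic
for record in records:
    record["B01001_001E"] = int(record["B01001_001E"])

print(records[0])
```

The state codes are left as strings on purpose, since converting them to integers would drop the leading zeros.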

<font color = 'red'>**Question 8: Which states had more than 10,000,000 people in 2022? Create a list that contains the names of these states.**</font>

## Using groups in the Census API

If you look at some of the examples provided, you'll notice that they use the `group()` syntax. For example, on the ACS 1-Year estimates page (https://www.census.gov/data/developers/data-sets/acs-1year.html), you can see an example shown as:

    api.census.gov/data/2022/acs/acs1?get=NAME,group(B01001)&for=us:1&key=YOUR_KEY_GOES_HERE

This grabs all of the variables in that group. Variable groups might be something like all Race categories, or Age categories, or combinations thereof. These are helpful if you want to get every breakdown for a certain characteristic.

In [None]:
census_params = {'get':'NAME,group(B02001)', 
                 'for':'state:*',
                 'key':census_key}

r = get(census_base_url, params = census_params)
r.status_code

In [None]:
r.json()[:2]

There are a lot of variables here! Note that this includes the estimates, their margins of error, and any annotations about them. The variables ending in E are the estimates and the ones ending in M are the margins of error, with EA and MA representing annotations on those values. For more information about these annotations, see this page: https://www.census.gov/data/developers/data-sets/acs-1year/notes-on-acs-estimate-and-annotation-values.html
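
Sorting the columns by suffix takes a little care, because names such as `NAME` also end in E without being estimates. Below is a sketch that classifies column names with a regular expression. The function name `classify` and the exact pattern are assumptions for illustration, based on variable names of the form `B02001_001E`.

```python
import re

# Table ID, underscore, digits, then one of the four suffixes (longer ones first)
PATTERN = re.compile(r'^[A-Z0-9]+_\d+(EA|MA|E|M)$')

LABELS = {
    'E': 'estimate',
    'M': 'margin of error',
    'EA': 'estimate annotation',
    'MA': 'margin of error annotation',
}


def classify(name):
    """Return the kind of Census variable, or None if the name is not a table variable."""
    match = PATTERN.match(name)
    return LABELS[match.group(1)] if match else None


for column in ['NAME', 'B02001_001E', 'B02001_001M', 'B02001_001EA', 'B02001_001MA', 'state']:
    print(column, '->', classify(column))
```

Requiring the underscore and digits before the suffix is what keeps `NAME`, `GEO_ID`, and `state` from being mistaken for estimates.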

<b style="color:red;">Question 9: Find the breakdown of the number of people by race in Maryland in 2014. Create a dictionary with the race category as the key and the counts as integers for the value.</b>