---
title: APIs
subtitle: "Application Programming Interfaces"
author:
  - name: Charles Pletcher
    affiliations: Tufts University
    orcid: 0000-0003-2734-5511
    email: charles.pletcher@tufts.edu
license:
  code: MIT
date: 2025-03-31
---

# Application Programming Interfaces

You will hear the term "API" in many different contexts when it comes to computer programming. Already in this class, we have discussed Python's API for the file system.

Often, however, when someone mentions an API, they are referring to a web-based API that is usually accessed over HTTP(S). You might have heard about the kerfuffle when Twitter shut down much of the access to its API, or when Reddit did something similar. These APIs are servers that provide _interfaces_ (the "I" in "API") to a platform's data.

As you probably noticed while reading @Walker2019, it is not exactly uncommon for references to APIs to become out of date.

Luckily, we can still use the API provided by the [Digital Public Library of America](https://dp.la) for our work for this class.

We'll be working with the Python [Requests](https://docs.python-requests.org/en/latest/) library, which provides its own easy-to-use API for making HTTP requests. In other words, it's APIs all the way down.

## Getting an access token

Generally, APIs will ask that you first obtain a key to use them. Even if APIs offer unlimited requests, it is important for them to require users to supply an API key so that they can track (often anonymized) usage statistics, errors, and so on.

Sometimes, APIs require you to pay, either immediately or after making a certain number of requests. Keys can be used to track usage for payment calculations, too. For an example of this system, see OpenAI's [pricing page](https://openai.com/api/pricing/).

### An API Key for DPLA

For this tutorial, we'll work with the Digital Public Library of America's (DPLA) API. Take a few minutes to read through their [API Basics](https://pro.dp.la/developers/api-basics), then request an API key.

:::{note} Request types

You'll notice that you must submit a `POST` request to receive an API key. `POST` is one of several HTTP verbs. When you enter a URL into a web browser and hit "Enter," you're typically issuing a `GET` request: `GET` requests do not have a request body; they simply ask for the information at the provided URL, perhaps with some query parameters (the `key=value` pairs after a `?` in the URL).

`POST` requests, by contrast, _may_ contain a request body. You've probably submitted `POST` requests without knowing it whenever you signed up for a new service. That's essentially what we're doing with DPLA here; we're just doing it programmatically instead of through an interface that DPLA has built.
:::
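To make the distinction concrete, here is a minimal sketch using Python's built-in `urllib` (not `requests`) that builds, but never sends, one request of each kind. The endpoint `https://api.example.com/search` and its parameters are hypothetical; the point is only where the data ends up.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint, used only to show how the two request types are shaped.
BASE_URL = "https://api.example.com/search"
params = {"q": "weasels", "page": 2}

# GET: parameters travel in the URL's query string; there is no body.
get_request = Request(f"{BASE_URL}?{urlencode(params)}")

# POST: the same data travels in the request body instead of the URL.
post_request = Request(BASE_URL, data=urlencode(params).encode("utf-8"))

print(get_request.get_method(), get_request.full_url)
# GET https://api.example.com/search?q=weasels&page=2
print(post_request.get_method(), post_request.full_url, post_request.data)
# POST https://api.example.com/search b'q=weasels&page=2'
```

`urllib` picks the method from whether `data` is present; with `requests`, you choose it explicitly by calling `requests.get()` or `requests.post()`.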

The DPLA [documentation](https://pro.dp.la/developers/policies#get-a-key) instructs you to submit a request using `curl`, but we don't have access to `curl` from this notebook. Instead, let's make the request using the Python "Requests" library.

```python
%pip install requests
```

```
Note: you may need to restart the kernel to use updated packages.
```


```python
import requests

my_email = "n9linh@gmail.com"

requests.post(f"https://api.dp.la/v2/api_key/{my_email}")
```

```
<Response [409]>
```

After running the above code cell, you should receive an email with your API key. It's good practice not to share API keys or include them in version control (e.g., git).

Instead, create an account-specific [secret](https://docs.github.com/en/codespaces/managing-your-codespaces/managing-your-account-specific-secrets-for-github-codespaces) by following the instructions provided by GitHub. 

Let's call the secret `DPLA_API_KEY`. (It's conventional to use all caps for environment variables and secrets.)

Make sure to give your fork of this repository access to the secret, and then restart this codespace. We'll be here when you get back.

## Making your first request

As we saw above, making requests using the `requests` library is pretty straightforward — for a `GET` request, we can just pass a URL to `requests.get()`.

In order for the request to be successful, though, we'll need to include the API key in the `api_key` querystring parameter. Codespaces makes secrets available to your code as environment variables, so we'll read the key with Python's built-in `os` module.

```python
import os
import requests

DPLA_API_KEY = os.getenv("DPLA_API_KEY")
```
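`os.getenv()` returns `None` when a variable isn't set, and an f-string would then quietly insert the text `None` into the request URL. A minimal sketch of a stricter lookup, using a hypothetical helper named `require_env`, fails immediately with a clear message instead:

```python
import os

def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is missing or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it as a Codespaces secret, "
            "grant this repository access, and restart the codespace."
        )
    return value

# Hypothetical usage in this notebook:
# DPLA_API_KEY = require_env("DPLA_API_KEY")
```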

Let's use the example provided by the DPLA documentation, querying for the term "weasels".

```python
requests.get(f"https://api.dp.la/v2/items?q=weasels&api_key={DPLA_API_KEY}")
```

```
<Response [200]>
```

`<Response [200]>` means that our request was successful, but it doesn't give us a whole lot of information. This is because we have not read the response body. To do so, let's assign the response — which is the return value of `requests.get()` — to a variable and read it as JSON.

```python
response = requests.get(f"https://api.dp.la/v2/items?q=weasels&api_key={DPLA_API_KEY}")

response.json()
```

```
{'count': 244,
 'docs': [{'@context': 'http://dp.la/api/items/context',
   '@id': 'http://dp.la/api/items/58952c8df9811303a16d3599f547d2c2',
   '@type': 'ore:Aggregation',
   'aggregatedCHO': '#sourceResource',
   'dataProvider': {'@id': 'http://dp.la/api/contributor/american-philosophical-society',
    'exactMatch': ['http://www.wikidata.org/entity/Q466089'],
    'name': 'American Philosophical Society'},
   'id': '58952c8df9811303a16d3599f547d2c2',
   'ingestDate': '2025-03-25T15:38:08.858Z',
   'ingestType': 'item',
   'isShownAt': 'https://diglib.amphilsoc.org/islandora/object/weasels',
   'object': 'https://diglib.amphilsoc.org/islandora/object/graphics%3A627/datastream/TN/view/Weasels.jpg',
   'originalRecord': {'stringValue': '<record \nxmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">\n  <header>\n    <identifier>oai:funnel_cake:padig:APS-weasels</identifier>\n    <datestamp>2024-08-12T19:25:29Z</datestamp>\n    <setSpec>Set:dpl
```
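Both the `200` here and the `409` returned by the key request earlier are HTTP status codes: the 2xx range signals success, the 4xx range a problem with the request, and the 5xx range a problem on the server. With `requests`, the number is available as `response.status_code`, and `response.raise_for_status()` raises an exception for 4xx and 5xx codes. Python's standard library can also translate a code into its official name; a small sketch (the helper name `describe_status` is ours, not part of `requests`):

```python
from http import HTTPStatus

def describe_status(code: int) -> str:
    """Return a readable description of an HTTP status code, e.g. '200 OK (success)'."""
    status = HTTPStatus(code)  # raises ValueError for codes the stdlib doesn't know
    if 200 <= code < 300:
        kind = "success"
    elif 400 <= code < 500:
        kind = "client error"
    elif 500 <= code < 600:
        kind = "server error"
    else:
        kind = "other"
    return f"{status.value} {status.phrase} ({kind})"

print(describe_status(200))  # 200 OK (success)
print(describe_status(409))  # 409 Conflict (client error)
```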

## Reading responses

As you can see above, the response returns a JSON (JavaScript Object Notation) object with a few top-level keys. If you're thinking, "Hm, this JSON looks an awful lot like a Python dictionary," you're absolutely right. While the semantics of Python dictionaries and JSON _are_ different, in this case `response.json()` has already parsed the raw JSON into a Python dictionary for us. You can access its values like you would with any Python dict:

```python
parsed_response = response.json()

parsed_response['count']
```

```
244
```
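To see roughly what that parsing step does, here is a minimal sketch with the standard `json` module on a small hand-written JSON string (not real DPLA data). Note how JSON's `true` and `null` become Python's `True` and `None`:

```python
import json

raw = '{"count": 2, "docs": [{"title": "A"}, {"title": null}], "complete": true}'
parsed = json.loads(raw)  # JSON text -> Python dict

print(type(parsed).__name__)       # dict
print(parsed["count"])             # 2
print(parsed["docs"][1]["title"])  # None
print(parsed["complete"])          # True
```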

:::{important}
Experiment a bit. How, for example, would you get all of the titles in a list?
:::

## Constructing queries

Naturally, when you're working with an API, you'll want to be able to construct your own queries. Above, we hard-coded the value `weasels` under the querystring parameter `q`. But you can use Python's string interpolation to set any value you want. For example:

```python
my_query = "foxes"
my_url = f"https://api.dp.la/v2/items?q={my_query}&api_key={DPLA_API_KEY}"

response = requests.get(my_url)

parsed_response = response.json()
parsed_response
```

```
{'count': 1575,
 'docs': [{'@context': 'http://dp.la/api/items/context',
   '@id': 'http://dp.la/api/items/d5728fb0ffcd131d28c33cbefd7437d0',
   '@type': 'ore:Aggregation',
   'aggregatedCHO': '#sourceResource',
   'dataProvider': {'@id': 'http://dp.la/api/contributor/denver-public-library',
    'exactMatch': ['http://www.wikidata.org/entity/Q5259775'],
    'name': 'Denver Public Library'},
   'id': 'd5728fb0ffcd131d28c33cbefd7437d0',
   'ingestDate': '2025-01-29T20:03:33.435Z',
   'ingestType': 'item',
   'isShownAt': 'https://digital.denverlibrary.org/nodes/view/1050323',
   'object': 'https://digital.denverlibrary.org/assets/nodeimg/1050323',
   'originalRecord': {'stringValue': '<record \nxmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">\n  <header>\n    <identifier>\n      oai:YOUR_OAI_PREFIX:DPL:oai:digital.denverlibrary.org:1050323\n    </identifier>\n    <datestamp>2025-01-27T21:49:44Z</datestamp>\n  </header>\n  <metadata>\n   
```

You could even write a function that constructs the request URL and returns the parsed response, so that you don't have to repeat these steps manually over and over again.

```python
def make_dpla_request(query: str):
    url = f"https://api.dp.la/v2/items?q={query}&api_key={DPLA_API_KEY}"
    response = requests.get(url)

    return response.json()

make_dpla_request('red+foxes')
```

```
{'count': 138,
 'docs': [{'@context': 'http://dp.la/api/items/context',
   '@id': 'http://dp.la/api/items/d9dd4dc15414b61176ea5400aae31089',
   '@type': 'ore:Aggregation',
   'aggregatedCHO': '#sourceResource',
   'dataProvider': {'@id': 'http://dp.la/api/contributor/missouri-state-archives-through-missouri-digital-heritage',
    'name': 'Missouri State Archives through Missouri Digital Heritage'},
   'id': 'd9dd4dc15414b61176ea5400aae31089',
   'iiifManifest': 'http://cdm16795.contentdm.oclc.org/iiif/info/p16795coll24/4146/manifest.json',
   'ingestDate': '2025-03-25T17:35:01.813Z',
   'ingestType': 'item',
   'isShownAt': 'http://cdm16795.contentdm.oclc.org/cdm/ref/collection/p16795coll24/id/4146',
   'object': 'http://cdm16795.contentdm.oclc.org/utils/getthumbnail/collection/p16795coll24/id/4146',
   'originalRecord': {'stringValue': '{\n  "@context" : "http://dp.la/api/items/context",\n  "isShownAt" : "http://cdm16795.contentdm.oclc.org/cdm/ref/collection/p16795coll24/id/4146",\n  
```

There's a problem with this code, however. What happens if you try to make a request with a query that contains spaces, such as `"red foxes"`?

Can you find the appropriate workaround in the [documentation](https://pro.dp.la/developers/requests)?

What other features does this API support?
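One common way to handle spaces and other special characters is to let Python percent-encode the query string rather than typing `+` by hand. The sketch below uses the standard library's `urllib.parse.urlencode`; `build_dpla_url` is a hypothetical helper name, and any extra keyword arguments are simply passed through as additional query parameters, so check the DPLA documentation for which ones it accepts. Passing a dictionary as `params=` to `requests.get()` has the same encoding effect.

```python
from urllib.parse import urlencode

DPLA_ITEMS_URL = "https://api.dp.la/v2/items"

def build_dpla_url(query: str, api_key: str, **extra_params: str) -> str:
    """Build a DPLA items URL with every parameter safely encoded."""
    params = {"q": query, "api_key": api_key, **extra_params}
    # urlencode turns spaces into '+' and escapes characters such as '&' and '='
    return f"{DPLA_ITEMS_URL}?{urlencode(params)}"

print(build_dpla_url("red foxes", "YOUR_KEY"))
# https://api.dp.la/v2/items?q=red+foxes&api_key=YOUR_KEY
```

Encoding matters beyond spaces: a query such as `cats & dogs` would otherwise be split at the `&` into two separate parameters.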

## RESTful APIs

Many APIs, including the DPLA's, are built on RESTful principles. REST stands for **Re**presentational **S**tate **T**ransfer. In terms of web APIs, REST means that a given server will respond with a representation of the data that it has available, and that representation will contain additional information for manipulating the data or requesting further data.

Although it is not, strictly speaking, a requirement of REST APIs, many REST implementations use a predictable URL scheme.

For example, you might find a list of "collections" at the `/collections` endpoint. To request a specific collection, you would append its ID — e.g., for Collection 3, `/collections/3`.

Each collection might contain items, so to get a list of items in Collection 3 you could send a request to `/collections/3/items`. And then to get a specific item in that collection — you guessed it, `/collections/3/items/12`.

DPLA does _not_ implement this kind of URL scheme; instead, it relies on facets and other search parameters. But it is worth being aware of such schemes if you want to use other APIs in your work and research.
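For an API that does follow this convention, nested resource URLs can be assembled mechanically. A sketch against a hypothetical base URL (`https://api.example.org` and `resource_url` are illustrative, not a real service):

```python
from urllib.parse import quote

BASE_URL = "https://api.example.org"  # hypothetical REST API

def resource_url(*segments) -> str:
    """Join path segments like ('collections', 3, 'items', 12) into a REST-style URL."""
    # quote() escapes each segment so an ID containing spaces or '/' stays one segment
    path = "/".join(quote(str(segment), safe="") for segment in segments)
    return f"{BASE_URL}/{path}"

print(resource_url("collections", 3, "items", 12))
# https://api.example.org/collections/3/items/12
```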

## Readings

- @Walker2019
- @Matthes2023 [chs. 15–17]

## Homework

Design and test an experiment using the data from a publicly available API, such as the [Digital Public Library of America](https://pro.dp.la/developers) or [Chronicling America](https://chroniclingamerica.loc.gov/about/api/) — you can also use another data source, just run it by me first.

In your report, be sure to discuss your research question, hypothesis, methods, results, and conclusion — in other words, walk the reader through the full scientific process.

These experiments need not be large — think of a small, answerable question that you could tackle in the space of 4 hours of work (i.e., the amount of outside work generally expected for each lab).

#### Research question:
How did public interest in electricity evolve in the United States from 1880 to 1920, as reflected in newspaper coverage?

#### Hypothesis:
The number of newspaper articles mentioning "electricity" increased significantly between 1880 and 1920, as electric power and related technologies (e.g., lighting, streetcars) came into wider use.

#### Methods:
1. Data source: [Chronicling America](https://chroniclingamerica.loc.gov/about/api/)

2. Search parameters
   - Search term: "electricity"
   - Date range: 1880–1920

3. Tools
   - Python, with `requests` to query the [Chronicling America](https://chroniclingamerica.loc.gov/about/api/) collection through the loc.gov API and `pandas` to organize the results


#### Results:
Fewer articles showed up in the DataFrame than I expected, but the results still appear to show more articles containing the term 'electricity' in later years than in the early 1880s.

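```python
import json
import requests

# Prepare terms for our query
qs = 'electricity'
start_date = '1880-01-01'
end_date = '1920-12-31'
lang = 'English'  # not currently used in the request below

# A search is a GET request; passing `params` lets requests build and encode the query string
query_result = requests.get(
    "https://www.loc.gov/collections/chronicling-america/",
    params={
        "dl": "page",
        "end_date": end_date,
        "qs": qs,
        "start_date": start_date,
        "fo": "json",
    },
)
query_result.raise_for_status()  # stop here instead of saving an error response

# Write data into a file
# This step is meant to avoid exceeding the API request limit
with open("data.json", "w") as f:
    json.dump(query_result.json(), f, indent=2)
```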
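The idea behind saving `data.json` can be turned into a reusable pattern: read from the file if it already exists, and only call the API otherwise. A minimal sketch, where `load_or_fetch` is a hypothetical helper and `fetch` is any zero-argument function that makes the request and returns parsed JSON (for example, one wrapping the `requests.get(...).json()` call above):

```python
import json
from pathlib import Path
from typing import Any, Callable

def load_or_fetch(path: str, fetch: Callable[[], Any]) -> Any:
    """Return cached JSON from `path` if it exists; otherwise call fetch(), save the result, and return it."""
    cache = Path(path)
    if cache.exists():
        return json.loads(cache.read_text(encoding="utf-8"))
    data = fetch()  # only reached when there is no cached copy yet
    cache.write_text(json.dumps(data, indent=2), encoding="utf-8")
    return data
```

Delete the file whenever you change the query; otherwise the stale cached results will keep being returned.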


```python
# read data from file
with open("data.json", "r") as f:
    data = json.load(f)

data
```

```
{'advanced_search': True,
 'aka': ['hnews', 'http://lccn.loc.gov/2007618519', 'ndnp', 'chronam'],
 'breadcrumbs': [{'Library of Congress': 'https://www.loc.gov'},
  {'Digital Collections': 'https://www.loc.gov/collections/'},
  {'Chronicling America': 'https://www.loc.gov/collections/chronicling-america/'}],
 'browse': {'advanced_search': True,
  'advanced_search_dropdowns_per_row': 4,
  'coverage_dates': {'end': {'day': 31, 'month': 11, 'year': 1963},
   'start': {'day': 3, 'month': 8, 'year': 1736}},
  'default_state': 'advanced',
  'display_options': {'options': [{'field': 'all', 'label': 'All'},
    {'field': 'title', 'label': 'Titles', 'type': 'title'},
    {'field': 'issue', 'label': 'Issues', 'type': 'issue'},
    {'field': 'page', 'label': 'Pages (Full Text)', 'type': 'segment'}]},
  'facets': {'include': [{'field': 'digitized'},
    {'field': 'object-type', 'label': 'Display Level'},
    {'field': 'original-format'},
    {'field': 'partof_title', 'label': 'Title'},
    {'field
```

I wrote the data to a JSON file because I thought it would make the data fields easier to investigate, but it turned out to confuse me even more.

```python
data.keys()
```

```
dict_keys(['advanced_search', 'aka', 'breadcrumbs', 'browse', 'categories', 'collection_titles', 'content', 'content_is_post', 'digitized', 'expert_resources', 'facet_trail', 'facet_views', 'facets', 'featured_items', 'form_facets', 'inherit_from_portal', 'legacy-url', 'next', 'next_sibling', 'options', 'original_formats', 'pages', 'pagination', 'partof', 'previous', 'previous_sibling', 'research-centers', 'results', 'search', 'shards', 'site_type', 'subjects', 'timestamp', 'title', 'topics', 'total', 'views'])
```
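When a response is this large, it can help to look at its shape rather than its contents. A minimal sketch of a hypothetical helper, `key_tree`, that lists the keys of parsed JSON down to a chosen depth, describing only the first element of each list (on the assumption that list items share a structure):

```python
def key_tree(obj, max_depth: int = 2, _depth: int = 0) -> list:
    """Describe the nested keys of parsed JSON, one line per key, without the values."""
    lines = []
    pad = "  " * _depth
    if isinstance(obj, dict):
        for key, value in obj.items():
            label = type(value).__name__
            if isinstance(value, list):
                label += f" of {len(value)}"
            lines.append(f"{pad}{key}: {label}")
            if _depth < max_depth:
                lines.extend(key_tree(value, max_depth, _depth + 1))
    elif isinstance(obj, list) and obj:
        # Only the first element is described
        lines.append(f"{pad}[0]: {type(obj[0]).__name__}")
        if _depth < max_depth:
            lines.extend(key_tree(obj[0], max_depth, _depth + 1))
    return lines
```

In the notebook, `print("\n".join(key_tree(data, max_depth=1)))` would show each top-level key with its type, one level down, which makes keys like `results` and `pagination` easier to spot.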

After struggling for too long, I decided to spend more time reading and following the documentation. The scripts below are adapted from [this documentation](https://libraryofcongress.github.io/data-exploration/loc.gov%20JSON%20API/Chronicling_America/ChronAm_analyzing_language_location_frequency.html); I only modified them to fit my research question.

```python
import time
import re
import json
from urllib.request import urlopen
import requests
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
import pprint
```

```python
# Query parameters (encoded in searchURL below):
#   qs = 'electricity'
#   start_date = '1880-01-01'
#   end_date = '1920-12-31'
#   lang = 'English'
searchURL = 'https://www.loc.gov/collections/chronicling-america/?dl=page&end_date=1920-12-31&ops=AND&qs=electricity&searchType=advanced&start_date=1880-01-01&fo=json'

# a = requests.post("https://www.loc.gov/collections/chronicling-america/?dl=page&end_date=1920-12-31&ops=AND&qs=electricity&searchType=advanced&start_date=1880-01-01&fo=json")
# a.json()
```

```
{'adv_search_query': 'All of these words: [[tag]]"electricity"[[/tag]]',
 'advanced_search': True,
 'aka': ['hnews', 'http://lccn.loc.gov/2007618519', 'ndnp', 'chronam'],
 'breadcrumbs': [{'Library of Congress': 'https://www.loc.gov'},
  {'Digital Collections': 'https://www.loc.gov/collections/'},
  {'Chronicling America': 'https://www.loc.gov/collections/chronicling-america/'}],
 'browse': {'advanced_search': True,
  'advanced_search_dropdowns_per_row': 4,
  'coverage_dates': {'end': {'day': 31, 'month': 11, 'year': 1963},
   'start': {'day': 3, 'month': 8, 'year': 1736}},
  'default_state': 'advanced',
  'display_options': {'options': [{'field': 'all', 'label': 'All'},
    {'field': 'title', 'label': 'Titles', 'type': 'title'},
    {'field': 'issue', 'label': 'Issues', 'type': 'issue'},
    {'field': 'page', 'label': 'Pages (Full Text)', 'type': 'segment'}]},
  'facets': {'include': [{'field': 'digitized'},
    {'field': 'object-type', 'label': 'Display Level'},
    {'field': 'origin
```

```python
def get_item_ids(url, items=None, conditional='True'):
    # Start a fresh list on each top-level call (a mutable default would be shared between calls)
    if items is None:
        items = []

    # Check that the query URL is not an item or resource link.
    exclude = ["loc.gov/item", "loc.gov/resource"]
    if any(string in url for string in exclude):
        raise NameError('Your URL points directly to an item or '
                        'resource page (you can tell because "item" '
                        'or "resource" is in the URL). Please use '
                        'a search URL instead. For example, instead '
                        'of "https://www.loc.gov/item/2009581123/", '
                        'try "https://www.loc.gov/maps/?q=2009581123". ')

    # Request pages of 100 results at a time
    params = {"fo": "json", "c": 100, "at": "results,pagination"}
    call = requests.get(url, params=params)

    # Check that the API request was successful and returned JSON
    if call.status_code == 200 and 'json' in call.headers.get('content-type', ''):
        data = call.json()
        for result in data['results']:
            original_format = result.get("original_format") or []
            # Filter out anything that's a collection or web page.
            # `conditional` is a Python expression evaluated for each result; pass only trusted strings.
            filter_out = ("collection" in original_format) \
                or ("web page" in original_format) \
                or (eval(conditional) == False)
            if not filter_out:
                # Get the link to the item record
                item = result.get("id")
                # Keep only loc.gov item and resource links (skip Catalog or other platforms)
                if item and item.startswith(("http://www.loc.gov/resource", "http://www.loc.gov/item")):
                    items.append(item)
        # Repeat on the next page, unless we're on the last page.
        next_url = data["pagination"]["next"]
        if next_url is not None:
            get_item_ids(next_url, items, conditional)

        return items
    else:
        print('There was a problem. Try running the cell again, or check your searchURL.')

# Generate a list of records found from performing a query and save these Item IDs
ids_list = get_item_ids(searchURL)

# Add '&fo=json' to the end of each item URL so each item page is returned as JSON
ids_list_json = []
for item_id in ids_list:
    if not item_id.endswith('&fo=json'):
        item_id += '&fo=json'
    ids_list_json.append(item_id)
ids = ids_list_json

print('\nSuccess. Your API Search Query found ' + str(len(ids_list_json)) + ' related newspaper pages.')
```

```
There was a problem. Try running the cell again, or check your searchURL.

Success. Your API Search Query found 2000 related newspaper pages.
```
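`get_item_ids` follows the `pagination.next` link recursively, and when a later page fails it prints a message but still returns whatever it had collected so far; the output above, where the problem message appears yet 2000 IDs are returned, is consistent with that. As an alternative shape, here is a minimal iterative sketch. `iter_pages` and the injected `fetch_json` function are hypothetical names, and the only assumption about the API is the one the code above already makes: each page carries `pagination.next`, which is `None` on the last page.

```python
from typing import Callable, Dict, Iterator

def iter_pages(first_url: str, fetch_json: Callable[[str], Dict]) -> Iterator[Dict]:
    """Yield every page of a paginated JSON API, following pagination['next'] links."""
    url = first_url
    while url is not None:
        page = fetch_json(url)  # errors propagate instead of silently ending the loop
        yield page
        url = page.get("pagination", {}).get("next")
```

With `requests`, `fetch_json` might be `lambda u: requests.get(u, timeout=30).json()`, and the IDs could then be gathered with a comprehension over `page["results"]` for each page.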


##### Pull the metadata from the search query in preparation for download into a CSV.

```python
# Create a list of dictionaries to store the item metadata
item_metadata_list = []

# Iterate over the list of item IDs
for item_id in ids_list_json:
    item_response = requests.get(item_id)

    # Check if the API call was successful, then parse the JSON response
    if item_response.status_code == 200:
        item_data = item_response.json()
        # Skip items that have no city recorded
        if 'location_city' not in item_data['item']:
            continue

        # Extract the relevant item metadata
        Newspaper_Title = item_data['item']['newspaper_title']
        Issue_Date = item_data['item']['date']
        Page = item_data['pagination']['current']
        State = item_data['item']['location_state']
        City = item_data['item']['location_city']
        LCCN = item_data['item']['number_lccn']
        Contributor = item_data['item']['contributor_names']
        Batch = item_data['item']['batch']
        pdf = item_data['resource']['pdf']

        # Add the item metadata to the list
        item_metadata_list.append({
            'Newspaper Title': Newspaper_Title,
            'Issue Date': Issue_Date,
            'Page Number': Page,
            'LCCN': LCCN,
            'City': City,
            'State': State,
            'Contributor': Contributor,
            'Batch': Batch,
            'PDF Link': pdf,
        })

# Change date format to MM-DD-YYYY
for item in item_metadata_list:
    item['Issue Date'] = pd.to_datetime(item['Issue Date']).strftime('%m-%d-%Y')

# Create a Pandas DataFrame from the list of dictionaries
df = pd.DataFrame(item_metadata_list)

print('\nReady to proceed to the next step!')
```

```
Ready to proceed to the next step!
```
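The loop above sends one request per item and silently skips any response that isn't `200`, so if the server starts refusing requests partway through, rows go missing without any message. A minimal sketch of a hypothetical helper, `get_with_retries`, that waits and retries when a response has status `429 Too Many Requests` (the standard code for rate limiting; check the loc.gov documentation for how it actually signals and sets its limits):

```python
import time
from typing import Any, Callable

def get_with_retries(fetch: Callable[[], Any], max_attempts: int = 3,
                     base_delay: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> Any:
    """Call fetch() until it returns a response whose status is not 429, backing off exponentially."""
    response = None
    for attempt in range(max_attempts):
        response = fetch()
        if response.status_code != 429:
            return response
        if attempt < max_attempts - 1:
            # Wait base_delay, then 2x, then 4x, ... before trying again
            sleep(base_delay * 2 ** attempt)
    return response  # still rate-limited after max_attempts; the caller decides what to do
```

Inside the loop, `item_response = get_with_retries(lambda: requests.get(item_id, timeout=30))` could replace the bare `requests.get(item_id)` call. The `sleep` parameter exists so the waiting can be swapped out when testing.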


```python
# # Add your Local saveTo Location (e.g. C:/Downloads/)
# saveTo = 'output'

# Set File Name. Make sure to rename the file so it doesn't overwrite previous!
filename = 'MetadataFileName'
```

```python
print('\nSuccess! Please check your saveTo location to see the saved csv file. See Preview Below:\n')

metadata_dataframe = pd.DataFrame(item_metadata_list)
metadata_dataframe.to_csv(filename + '.csv')
metadata_dataframe
```

```
Success! Please check your saveTo location to see the saved csv file. See Preview Below:
```

| | Newspaper Title | Issue Date | Page Number | LCCN | City | State | Contributor | Batch | PDF Link |
|---|---|---|---|---|---|---|---|---|---|
| 0 | [Evening journal] | 04-01-1914 | 5 | [sn85042354] | [wilmington] | [delaware] | [University of Delaware Library, Newark, DE] | [deu_fairfax_ver01] | https://tile.loc.gov/storage-services/service/... |
| 1 | [The evening bulletin.] | 02-25-1895 | 2 | [sn87060190] | [maysville] | [kentucky] | [University of Kentucky, Lexington, KY] | [kyu_aluminum_ver01] | https://tile.loc.gov/storage-services/service/... |
| 2 | [Lovelock tribune] | 01-15-1909 | 2 | [sn86091313] | [lovelock] | [nevada] | [University of Nevada Las Vegas University Lib... | [nvln_jackpot_ver01] | https://tile.loc.gov/storage-services/service/... |
| 3 | [The Silver state news.] | 01-09-1909 | 2 | [sn86076356] | [winnemucca] | [nevada] | [University of Nevada Las Vegas University Lib... | [nvln_deeth_ver01] | https://tile.loc.gov/storage-services/service/... |
| 4 | [El Paso herald] | 11-26-1910 | 23 | [sn88084272] | [el paso] | [texas] | [University of North Texas; Denton, TX] | [txdn_belgium_ver01] | https://tile.loc.gov/storage-services/service/... |
| 5 | [Fort Worth gazette] | 10-31-1891 | 8 | [sn86071158] | [fort worth] | [texas] | [University of North Texas; Denton, TX] | [txdn_hotel_ver02] | https://tile.loc.gov/storage-services/service/... |
| 6 | [El Paso herald] | 12-02-1911 | 25 | [sn88084272] | [el paso] | [texas] | [University of North Texas; Denton, TX] | [txdn_canada_ver02] | https://tile.loc.gov/storage-services/service/... |
| 7 | [The Birmingham age-herald] | 01-25-1914 | 24 | [sn85038485] | [birmingham] | [alabama] | [University of Alabama Libraries, Tuscaloosa, AL] | [au_inman_ver01] | https://tile.loc.gov/storage-services/service/... |
| 8 | [El Paso herald] | 03-18-1914 | 11 | [sn88084272] | [el paso] | [texas] | [University of North Texas; Denton, TX] | [txdn_egypt_ver01] | https://tile.loc.gov/storage-services/service/... |
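To check the claim in the results more directly, the matching pages can be tallied by year. A minimal sketch with the standard library's `collections.Counter`, assuming records shaped like the entries of `item_metadata_list` above, with `'Issue Date'` already converted to `MM-DD-YYYY` strings; `pages_per_year` is a hypothetical helper name:

```python
from collections import Counter

def pages_per_year(records):
    """Count matching pages per year from records whose 'Issue Date' is 'MM-DD-YYYY'."""
    years = Counter(int(record['Issue Date'][-4:]) for record in records)
    return dict(sorted(years.items()))

# In the notebook: pages_per_year(item_metadata_list)
```

With pandas, `df['Issue Date'].str[-4:].value_counts().sort_index()` gives an equivalent tally keyed by year strings.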
