# Crossref API in Python

By Avery Fernandez, Vincent F. Scalfani, and Michael T. Moen

The Crossref API provides metadata about publications, including articles, books, and conference proceedings. This metadata spans items such as author details, journal details, references, and DOIs (Digital Object Identifiers). Working with Crossref allows for programmatic access to bibliographic information and can streamline large-scale metadata retrieval.

Please see the following resources for more information on API usage:
- Documentation
    - <a href="https://api.crossref.org/swagger-ui/index.html" target="_blank">Crossref API Documentation</a>
    - <a href="https://www.crossref.org/documentation/retrieve-metadata/rest-api/a-non-technical-introduction-to-our-api/" target="_blank">Crossref API Introduction</a>
    - <a href="https://www.crossref.org/documentation/retrieve-metadata/rest-api/text-and-data-mining/" target="_blank">Crossref Data Mining</a>
    - <a href="https://www.crossref.org/documentation/retrieve-metadata/rest-api/text-and-data-mining-for-members/" target="_blank">Crossref Data Mining for Members</a>
    - <a href="https://www.crossref.org/documentation/retrieve-metadata/rest-api/text-and-data-mining-for-researchers/" target="_blank">Crossref Data Mining for Researchers</a>
    - <a href="https://www.crossref.org/documentation/retrieve-metadata/rest-api/providing-full-text-links-to-tdm-tools/" target="_blank">Crossref Full-Text Links</a>
- Terms
    - <a href="https://www.crossref.org/membership/terms/" target="_blank">Crossref Terms of Use</a>
- Data Reuse
    - <a href="https://www.crossref.org/documentation/retrieve-metadata/rest-api/rest-api-metadata-license-information/" target="_blank">Crossref Metadata Reuse</a>
    - <a href="https://www.crossref.org/documentation/retrieve-metadata/rest-api/providing-licensing-information-to-tdm-tools/" target="_blank">Crossref TDM Licensing</a>

_**Note:**_ The <a href="https://api.crossref.org/swagger-ui/index.html" target="_blank">Crossref API</a> limits requests to a maximum of 50 per second.

*These recipe examples were tested on January 28, 2026.*

_**Note:**_ From our testing, we have found that the Crossref metadata across publishers and even journals can vary considerably. As a result, it can be easier to work with one journal at a time when using the Crossref API (particularly when trying to extract selected data from records).

## Setup

The following external libraries need to be installed into your environment to run the code examples in this tutorial:
- <a href="https://github.com/ipython/ipykernel" target="_blank">ipykernel</a>
- <a href="https://github.com/theskumar/python-dotenv" target="_blank">python-dotenv</a>
- <a href="https://github.com/psf/requests" target="_blank">requests</a>

We import the libraries used in this tutorial below:

In [1]:
from dotenv import load_dotenv
import os
from pprint import pprint
import requests
from time import sleep

### Import Email

It is important to provide an email address when making requests to the Crossref API. This is used to contact you in case of any issues with your requests.

We keep our email in a separate file, a `.env` file, and use the `dotenv` library to access it. If you use this method, create a file named `.env` in the same directory as this notebook and add the following line to it:

```text
CROSSREF_EMAIL=PUT_YOUR_EMAIL_HERE
```

In [2]:
load_dotenv()
try:
    email = os.environ['CROSSREF_EMAIL']
except KeyError:
    print("Email not found in environment. Please set CROSSREF_EMAIL in your .env file.")
else:
    print("Environment and email successfully loaded.")

Environment and email successfully loaded.


## 1. Basic Crossref API Call

In this section, we perform a basic API call to the Crossref service to retrieve metadata for a single DOI.

We will:
1. Build the Crossref endpoint using our base URL, DOI, and the `mailto` parameter.
2. Retrieve the response.
3. Examine and parse the JSON data.
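
As a hedged aside on step 1, the endpoint can also be assembled with the standard library's URL helpers, which percent-encode characters such as `+` or `@` in the email address. This is a minimal sketch, not the method used in this tutorial: `build_work_url` is a hypothetical helper, and the email address is a placeholder.

```python
from urllib.parse import quote, urlencode

WORKS_URL = "https://api.crossref.org/works/"

def build_work_url(doi: str, email: str) -> str:
    """Return a Crossref works URL with the DOI and mailto value percent-encoded."""
    # Keep "/" unescaped so the DOI path matches the plain form used in this tutorial
    path = quote(doi, safe="/")
    # urlencode escapes characters such as "+" and "@" in the query value
    query = urlencode({"mailto": email})
    return f"{WORKS_URL}{path}?{query}"

# "name+tag@example.org" is a placeholder address, not a real contact
print(build_work_url("10.1186/1758-2946-4-12", "name+tag@example.org"))
```

Passing `params={"mailto": email}` to `requests.get`, as done later in this tutorial, performs the same query-string encoding.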

In [3]:
# Base URL for Crossref works
WORKS_URL = "https://api.crossref.org/works/"

# Example DOI to retrieve metadata for
doi = "10.1186/1758-2946-4-12"

response = requests.get(f"{WORKS_URL}{doi}?mailto={email}")

# Status code 200 indicates success
response.status_code

200

The request above retrieves metadata for a single DOI. The response body is JSON, which `.json()` parses into a Python dictionary.

In [4]:
data = response.json()

# Print response structure
pprint(data, depth=1)

{'message': {...},
 'message-type': 'work',
 'message-version': '1.0.0',
 'status': 'ok'}


### Extract Data from API Response

In the snippet below, we parse and extract some key fields from the response:
1. **Journal title** via the `container-title` key.
2. **Article title** via the `title` key.
3. **Author names** via the `author` key.
4. **Bibliographic references** via the `reference` key.

In [5]:
# Extract journal title
data["message"]["container-title"]

['Journal of Cheminformatics']

In [6]:
# Extract article title
data["message"]["title"]

['The Molecule Cloud - compact visualization of large collections of molecules']

In [7]:
# Extract author names
for author in data["message"]["author"]:
    # Single quotes inside the f-string keep this valid on Python versions before 3.12
    print(f"{author['given']} {author['family']}")

Peter Ertl
Bernhard Rohde
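
Because metadata varies between publishers, the direct `author["given"]` lookup above raises a `KeyError` when a record lacks that key. The following is a minimal sketch of a more defensive variant. `format_author` is a hypothetical helper, the second and third sample records are made up for illustration, and the fallback to a `name` key (for group authors) is an assumption to check against your own data.

```python
def format_author(author: dict) -> str:
    """Build a display name from a Crossref-style author record."""
    parts = [author.get("given"), author.get("family")]
    full_name = " ".join(part for part in parts if part)
    # Fall back to a "name" key (assumed here for group authors), then to a placeholder
    return full_name or author.get("name") or "Unknown author"

sample_authors = [
    {"given": "Peter", "family": "Ertl"},   # taken from the output above
    {"family": "Rohde"},                    # hypothetical record without a given name
    {"name": "Example Consortium"},         # hypothetical group author
]
for author in sample_authors:
    print(format_author(author))
```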


In [8]:
# Extract the first 75 characters of each reference for demonstration
bib_refs = [ref["unstructured"][:75] for ref in data["message"]["reference"]]
bib_refs

['Martin E, Ertl P, Hunt P, Duca J, Lewis R: Gazing into the crystal ball; th',
 'Langdon SR, Brown N, Blagg J: Scaffold diversity of exemplified medicinal c',
 'Blum LC, Reymond J-C: 970 Million druglike small molecules for virtual scre',
 'Dubois J, Bourg S, Vrain C, Morin-Allory L: Collections of compounds - how ',
 'Medina-Franco JL, Martinez-Mayorga K, Giulianotti MA, Houghten RA, Pinilla ',
 'Schuffenhauer A, Ertl P, Roggo S, Wetzel S, Koch MA, Waldmann H: The Scaffo',
 'Langdon S, Ertl P, Brown N: Bioisosteric replacement and scaffold hopping i',
 'Lipkus AH, Yuan Q, Lucas KA, Funk SA, Bartelt WF, Schenck RJ, Trippe AJ: St',
 'mib 2010.10, Molinspiration Cheminformatics: \n                    http://ww',
 'Bernhard R: Avalon Cheminformatics Toolkit. \n                    http://sou',
 'Wang Y, Bolton E, Dracheva S, Karapetyan K, Shoemaker BA, Suzek TO, Wang J,',
 'Irwin JJ, Shoichet BK: ZINC\u2009−\u2009a free database of commercially available com',
 'Gaulton A, Bellis LJ, Bent

## 2. Crossref API Call with a Loop

In this section, we request metadata for several DOIs, sending one request per DOI. We will:
1. Create a list of several DOIs.
2. Loop through that list, calling the Crossref API for each DOI.
3. Store each response in a new list.
4. Parse specific data, such as article titles and affiliations.

> **Note**: We pause for one second (`sleep(1)`) between requests to respect Crossref's <a href="https://api.crossref.org/swagger-ui/index.html" target="_blank">policies</a>, which discourage rapid repeated requests. For bulk downloads, see Crossref's <a href="https://www.crossref.org/documentation/retrieve-metadata/rest-api/tips-for-using-public-data-files-and-plus-snapshots/" target="_blank">public data file</a>.

In [9]:
dois = [
    '10.1021/acsomega.1c03250',
    '10.1021/acsomega.1c05512',
    '10.1021/acsomega.8b01647',
    '10.1021/acsomega.1c04287',
    '10.1021/acsomega.8b01834'
]

# Loop over each DOI, request metadata, and store the data
doi_metadata = []
for doi in dois:
    response = requests.get(f"{WORKS_URL}{doi}?mailto={email}")
    data = response.json()
    doi_metadata.append(data)
    sleep(1)    # Add a short delay to avoid overwhelming the API
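
The loop above assumes every request succeeds. The following is a minimal sketch of a variant that records failed DOIs instead of stopping. `collect_metadata` and `fake_fetch` are hypothetical names, and the fetcher is injected so the example runs offline; the second DOI is a made-up placeholder.

```python
from time import sleep
from typing import Callable

def collect_metadata(dois: list[str], fetch: Callable[[str], dict], delay: float = 1.0):
    """Call fetch(doi) for each DOI and return (results, failed_dois)."""
    results, failed = {}, []
    for i, doi in enumerate(dois):
        try:
            results[doi] = fetch(doi)
        except Exception as exc:  # broad on purpose: this is only a sketch
            print(f"Skipping {doi}: {exc}")
            failed.append(doi)
        if i < len(dois) - 1:
            sleep(delay)  # pause between requests, but not after the last one
    return results, failed

# Stand-in fetcher so the sketch runs without network access
def fake_fetch(doi: str) -> dict:
    if doi.endswith("missing"):
        raise ValueError("HTTP 404")
    return {"status": "ok", "message": {"DOI": doi}}

results, failed = collect_metadata(
    ["10.1021/acsomega.1c03250", "10.0000/missing"], fake_fetch, delay=0
)
print(len(results), failed)
```

In practice, `fetch` could wrap `requests.get`, call `raise_for_status()` on the response, and return `response.json()`.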

In [10]:
# Extract article titles
titles = [article["message"]["title"] for article in doi_metadata]
titles

[['Navigating into the Chemical Space of Monoamine Oxidase Inhibitors by Artificial Intelligence and Cheminformatics Approach'],
 ['Impact of Artificial Intelligence on Compound Discovery, Design, and Synthesis'],
 ['How Precise Are Our Quantitative Structure–Activity Relationship Derived Predictions for New Query Chemicals?'],
 ['Applying Neuromorphic Computing Simulation in Band Gap Prediction and Chemical Reaction Classification'],
 ['QSPR Modeling of the Refractive Index for Diverse Polymers Using 2D Descriptors']]

In [11]:
# Extract author affiliations for each article
for idx, entry in enumerate(doi_metadata):
    authors = entry.get("message", {}).get("author", [])
    print(f"DOI {idx + 1}:")
    for author in authors:
        # Some authors may not have an affiliation key, so we use get with a default
        affiliation_list = author.get("affiliation", [])
        if affiliation_list:
            # Single quotes inside the f-string keep this valid on Python versions before 3.12
            print(f" - {affiliation_list[0].get('name', 'No affiliation name')}")
        else:
            print(" - No affiliation provided")
    print()

DOI 1:
 - Department of Pharmaceutical Chemistry and Analysis, Amrita School of Pharmacy, Amrita Vishwa Vidyapeetham, AIMS Health Sciences Campus, Kochi 682041, India
 - Department of Pharmaceutical Chemistry and Analysis, Amrita School of Pharmacy, Amrita Vishwa Vidyapeetham, AIMS Health Sciences Campus, Kochi 682041, India
 - Department of Pharmaceutical Chemistry and Analysis, Amrita School of Pharmacy, Amrita Vishwa Vidyapeetham, AIMS Health Sciences Campus, Kochi 682041, India
 - Department of Pharmaceutical Chemistry and Analysis, Amrita School of Pharmacy, Amrita Vishwa Vidyapeetham, AIMS Health Sciences Campus, Kochi 682041, India
 - Department of Pharmaceutical Chemistry and Analysis, Amrita School of Pharmacy, Amrita Vishwa Vidyapeetham, AIMS Health Sciences Campus, Kochi 682041, India
 - Department of Pharmaceutics and Industrial Pharmacy, College of Pharmacy, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia
 - Department of Pharmaceutical Chemistry, College of Phar

## 3. Retrieve Journal Information

Crossref also provides an endpoint to query journal metadata using the **ISSN**. In this section, we:
1. Use the `journals` endpoint.
2. Provide an ISSN.
3. Inspect the returned JSON data.

In [12]:
# Base URL for journal queries
JOURNALS_URL = "https://api.crossref.org/journals/"

# Example ISSN for the journal BMC Bioinformatics
issn = "1471-2105"

response = requests.get(f"{JOURNALS_URL}{issn}?mailto={email}")
data = response.json()

# Print structure of the response message
pprint(data["message"], depth=1)

{'ISSN': [...],
 'breakdowns': {...},
 'counts': {...},
 'coverage': {...},
 'coverage-type': {...},
 'flags': {...},
 'issn-type': [...],
 'last-status-check-time': 1759277629723,
 'publisher': 'Springer (Biomed Central Ltd.)',
 'subjects': [],
 'title': 'BMC Bioinformatics'}


In [13]:
# Extract total number of articles from the journal in Crossref
data["message"]["counts"]["total-dois"]

12831

In [14]:
# Extract fraction (0 to 1) of the journal's current articles with abstracts in Crossref
data["message"]["coverage"]["abstracts-current"]

0.787422497785651
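
The value returned above is a fraction between 0 and 1. A minimal sketch of displaying it as a percentage with Python's format specification, using the value copied from the output above:

```python
abstracts_current = 0.787422497785651  # value copied from the output above

# The "%" format type multiplies by 100 and appends a percent sign
print(f"{abstracts_current:.1%}")  # prints 78.7%
```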

## 4. Get Article DOIs for a Journal

We can get all article DOIs for a given journal and year range by combining the **journals** endpoint with **filters**.
For example, to retrieve all DOIs for BMC Bioinformatics published in **2014**, we filter between the start date (`from-pub-date`) and end date (`until-pub-date`) of 2014.
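
The `filter` value is a comma-separated list of `name:value` pairs. The following is a minimal sketch of assembling it from a dictionary; `build_filter` and `year_range_filter` are hypothetical helpers, not part of any Crossref client library.

```python
def build_filter(filters: dict) -> str:
    """Join name/value pairs into a filter string of the form 'name:value,name:value'."""
    return ",".join(f"{name}:{value}" for name, value in filters.items())

def year_range_filter(start_year: int, end_year: int) -> str:
    """Build the publication-date filter used in this section for a span of years."""
    return build_filter({"from-pub-date": start_year, "until-pub-date": end_year})

print(year_range_filter(2014, 2014))  # from-pub-date:2014,until-pub-date:2014
print(year_range_filter(2014, 2016))  # from-pub-date:2014,until-pub-date:2016
```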

> **Note**: By default, the API only returns the first 20 results. We can specify `rows` to increase this up to **1000**. If the total number of results is **greater** than 1000, we can use the `offset` parameter to page through the results in multiple calls.

Below, we demonstrate:
1. Filtering to get only DOIs from 2014.
2. Increasing the `rows` to 700.
3. Pushing beyond the 1000-row limit by using `offset`.

### Retrieve and Display First 20 DOIs

In [15]:
params = {
    "filter": "from-pub-date:2014,until-pub-date:2014",
    "select": "DOI",
    "mailto": email
}
response = requests.get(f"{JOURNALS_URL}{issn}/works", params=params)
doi_data_2014 = response.json()

# Print DOIs from the response
doi_data_2014["message"]["items"]

[{'DOI': '10.1186/1471-2105-15-38'},
 {'DOI': '10.1186/1471-2105-15-s10-p35'},
 {'DOI': '10.1186/1471-2105-15-s10-p24'},
 {'DOI': '10.1186/1471-2105-15-122'},
 {'DOI': '10.1186/1471-2105-15-24'},
 {'DOI': '10.1186/s12859-014-0397-8'},
 {'DOI': '10.1186/1471-2105-15-16'},
 {'DOI': '10.1186/s12859-014-0411-1'},
 {'DOI': '10.1186/1471-2105-15-268'},
 {'DOI': '10.1186/1471-2105-15-119'},
 {'DOI': '10.1186/1471-2105-15-s6-s3'},
 {'DOI': '10.1186/1471-2105-15-310'},
 {'DOI': '10.1186/1471-2105-15-335'},
 {'DOI': '10.1186/1471-2105-15-222'},
 {'DOI': '10.1186/1471-2105-15-337'},
 {'DOI': '10.1186/1471-2105-15-95'},
 {'DOI': '10.1186/1471-2105-15-s9-s12'},
 {'DOI': '10.1186/1471-2105-15-254'},
 {'DOI': '10.1186/1471-2105-15-152'},
 {'DOI': '10.1186/1471-2105-15-333'}]

### Increase Rows to Retrieve More Than 20 DOIs

In [16]:
# Add the rows parameter to increase the number of results
params = {
    "filter": "from-pub-date:2014,until-pub-date:2014",
    "select": "DOI",
    "rows": 700,
    "mailto": email,
}
response = requests.get(f"{JOURNALS_URL}{issn}/works", params=params)
response.raise_for_status()
doi_data_all = response.json()

# Extract the DOIs from the response
dois_list = []
for item in doi_data_all["message"]["items"]:
    dois_list.append(item.get("DOI", "NoDOI"))

print("Number of DOIs retrieved:", len(dois_list))
print("First 20 DOIs:")
pprint(dois_list[:20])

Number of DOIs retrieved: 619
First 20 DOIs:
['10.1186/1471-2105-15-38',
 '10.1186/1471-2105-15-s10-p35',
 '10.1186/1471-2105-15-s10-p24',
 '10.1186/1471-2105-15-122',
 '10.1186/1471-2105-15-24',
 '10.1186/s12859-014-0397-8',
 '10.1186/1471-2105-15-16',
 '10.1186/s12859-014-0411-1',
 '10.1186/1471-2105-15-268',
 '10.1186/1471-2105-15-119',
 '10.1186/1471-2105-15-s6-s3',
 '10.1186/s12859-014-0376-0',
 '10.1186/1471-2105-15-310',
 '10.1186/1471-2105-15-335',
 '10.1186/1471-2105-15-192',
 '10.1186/1471-2105-15-95',
 '10.1186/1471-2105-15-s9-s12',
 '10.1186/1471-2105-15-254',
 '10.1186/1471-2105-15-152',
 '10.1186/1471-2105-15-333']


### Paged Retrieval with Offsets

If we need more than 1000 records, we can combine `rows=1000` with the `offset` parameter. We:
1. Determine the total number of results (`total-results`).
2. Calculate how many loops we need based on 1000 items per page.
3. For each page `n` (starting at 0), we set the `offset` to `1000 * n`.
4. Collect all DOIs into one large list.
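
The arithmetic in steps 2 and 3 can be sketched on its own, without calling the API. `page_offsets` is a hypothetical helper; it uses ceiling division so that a total that is an exact multiple of the page size does not produce an extra, empty page.

```python
def page_offsets(total_results: int, rows: int = 1000) -> list[int]:
    """Return the offset for each page needed to cover total_results records."""
    pages = -(-total_results // rows)  # ceiling division without importing math
    return [rows * n for n in range(pages)]

print(page_offsets(1772))  # [0, 1000]
print(page_offsets(2000))  # [0, 1000]
print(page_offsets(0))     # []
```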

In [17]:
# First, get total number of results to see if we exceed 1000
params = {
    "filter": "from-pub-date:2014,until-pub-date:2016",
    "select": "DOI",
    "mailto": email,
    "rows": 1000
}
response = requests.get(f"{JOURNALS_URL}{issn}/works", params=params)
initial_data = response.json()

num_results = initial_data["message"].get("total-results", 0)
print("Total results for 2014-2016:", num_results)

Total results for 2014-2016: 1772


In [18]:
# Page through results if more than 1000
journal_dois = []

# Calculate how many pages we need
pages_needed = (num_results + 999) // 1000  # ceiling division: no extra empty request when the total is a multiple of 1000

for n in range(pages_needed):
    # Build the query parameters, moving the offset forward by one page per iteration
    params = {
        "filter": "from-pub-date:2014,until-pub-date:2016",
        "select": "DOI",
        "rows": 1000,
        "mailto": email,
        "offset": 1000 * n
    }
    response = requests.get(f"{JOURNALS_URL}{issn}/works", params=params)
    response.raise_for_status()
    page_data = response.json()

    items = page_data["message"]["items"]
    for record in items:
        journal_dois.append(record.get("DOI", "NoDOI"))
        
    sleep(1)    # Important to respect Crossref usage guidelines

# Print number of DOIs extracted
len(journal_dois)

1772

In [19]:
# Sample DOIs from 1000-1010
journal_dois[1000:1010]

['10.1186/1471-2105-15-116',
 '10.1186/s12859-016-1178-3',
 '10.1186/1471-2105-15-s12-s9',
 '10.1186/1471-2105-15-316',
 '10.1186/s12859-016-1233-0',
 '10.1186/s12859-015-0656-3',
 '10.1186/s12859-016-1327-8',
 '10.1186/s12859-016-1039-0',
 '10.1186/s12859-016-1035-4',
 '10.1186/s12859-015-0646-5']