---
title: Lesson 7. Crossref
format:
  html:
    toc: true
    toc-expand: 2
    toc-title: CONTENTS
---

Crossref is a nonprofit organization that manages a registry of Digital Object Identifiers (DOIs). Publishers collaborate with Crossref to assign a unique DOI to each journal article, book, conference paper, or dataset they publish. This DOI acts like a permanent web address, enabling seamless linking between references, citations, research outputs, funding information, and more.

The __[Crossref REST API](https://www.crossref.org/documentation/retrieve-metadata/rest-api/)__ offers free access to Crossref’s metadata. This tutorial introduces two useful tools: JSON, a lightweight text format whose structure resembles Python dictionaries and lists, and Python’s built-in `logging` module, which records events such as errors while your code runs.

## Data skills | concepts
- APIs
- logging
- JSON data

## Learning objectives
1. Interpret documentation and apply concepts to write functional code.
2. Extract and work with JSON data using Python’s built-in tools.
3. Use Python’s logging module to capture and report errors that interrupt code execution.

This tutorial is designed to support multi-session __[workshops](https://library.osu.edu/events?combine=&tid=All&field_location_code_value=10&sort_bef_combine=field_end_date_value_ASC)__ hosted by The Ohio State University Libraries Research Commons. It assumes you already have a basic understanding of Python, including how to use a for loop to iterate through lists and dictionaries and extract data. To learn basic Python concepts, visit the [Python - Mastering the Basics](python_basics.ipynb) tutorial.

# LESSON 7

## Crossref

Crossref provides detailed __[documentation](https://www.crossref.org/documentation/retrieve-metadata/rest-api/)__ and a wide range of robust __[learning resources](https://www.crossref.org/learning/)__ to help users effectively work with its REST API.

## JSON
Crossref queries return data in JSON format, which looks much like nested Python dictionaries and lists. Once a response is parsed (for example with `response.json()` from the `requests` library), you can reach the information you need by key and by list index, or loop through its key-value pairs.
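
As a minimal sketch of that idea, the example below parses a small JSON string with Python's built-in `json` module and pulls out a few values. The sample text is a hypothetical, heavily trimmed stand-in for a Crossref response (the DOI, publisher, and titles are invented for illustration); a real response contains many more fields.

```python
import json

# Hypothetical, heavily trimmed stand-in for a Crossref works response
sample_text = """
{
  "status": "ok",
  "message": {
    "DOI": "10.0000/example",
    "publisher": "Example Press",
    "title": ["An Example Article"],
    "container-title": ["Journal of Examples"],
    "published": {"date-parts": [[2021, 5, 17]]},
    "reference-count": 42
  }
}
"""

record = json.loads(sample_text)  # JSON text -> nested dictionaries and lists
entry = record["message"]         # The metadata sits under the "message" key

article_title = entry["title"][0]              # "title" holds a list of strings
year = entry["published"]["date-parts"][0][0]  # First number of the first date

# Loop through the key-value pairs, just as with any Python dictionary
for key, value in entry.items():
    print(f"{key}: {value}")
```

Note that some values are themselves lists or dictionaries, so reaching a single value can take several steps of indexing.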

<div class="accordion" id="accordionExercise1">

  <div class="accordion-item"><h2 class="accordion-header" id="ex1-headingOne"><button class="accordion-button fs-3" type="button" data-bs-toggle="collapse" data-bs-target="#ex1-collapseOne" aria-expanded="true" aria-controls="ex1-collapseOne"><img src="images/guidepost_standard_icon.png" alt="" aria-hidden="true" style="height: 3rem; vertical-align: middle; margin-right: 0.5rem;">Exercise 1: Crossref API</button></h2><div id="ex1-collapseOne" class="accordion-collapse collapse show fs-4" aria-labelledby="ex1-headingOne" data-bs-parent="#accordionExercise1"> <div class="accordion-body fs-4"><p>Read through the <a href="https://www.crossref.org/documentation/retrieve-metadata/rest-api/">Crossref REST API documentation</a>. Then:</p><ol><li>Read <code>data/dois.csv</code> into a pandas DataFrame.</li><li>Use the <strong>Crossref works API</strong> to gather the following fields for each DOI:<ul><li><span class="text-primary">publisher</span></li><li><span class="text-primary">article_title</span></li><li><span class="text-primary">journal_title</span></li><li><span class="text-primary">journal_abbr</span></li><li><span class="text-primary">year</span></li><li><span class="text-primary">reference_count</span></li></ul></li></ol>
  </div></div>
  </div>

  <div class="accordion-item"><h2 class="accordion-header" id="ex1-headingTwo"><button class="accordion-button fs-3 collapsed" type="button" data-bs-toggle="collapse" data-bs-target="#ex1-collapseTwo" aria-expanded="false" aria-controls="ex1-collapseTwo"><img src="images/magnifying_glass_standard_icon.png" alt="" aria-hidden="true" style="height: 3rem; vertical-align: middle; margin-right: 0.5rem;">Solution:</button></h2><div id="ex1-collapseTwo" class="accordion-collapse collapse" aria-labelledby="ex1-headingTwo" data-bs-parent="#accordionExercise1"> <div class="accordion-body">

```python
import requests
import pandas as pd

def lookup(target_doi):
    """Request the metadata for one DOI from the Crossref works API."""
    base_url = 'https://api.crossref.org/works/'
    url = base_url + target_doi
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # Raise an HTTPError for bad responses (4xx, 5xx)
    json_data = response.json()  # Parse the JSON response into a dictionary
    return json_data

# Read the DOIs; adjust the path if your notebook is not in the project folder
file = pd.read_csv('data/dois.csv')
dois = file.doi.tolist()
rows = []  # One dictionary per DOI

for doi in dois:
    response = lookup(doi)
    entry = response['message']  # The metadata sits under the 'message' key
    abbr = entry.get('short-container-title', [])  # May be missing or empty
    data = {}
    data['doi'] = doi
    data['publisher'] = entry['publisher']
    data['article_title'] = entry['title'][0]
    data['journal_title'] = entry['container-title'][0]
    data['journal_abbr'] = abbr[0] if abbr else None
    data['year'] = entry['published']['date-parts'][0][0]
    data['reference_count'] = entry['reference-count']
    rows.append(data)

# Build the DataFrame once, keeping rows in the same order as the DOIs
columns = ['doi', 'publisher', 'article_title', 'journal_title', 'journal_abbr', 'year', 'reference_count']
results = pd.DataFrame(rows, columns=columns)
```
  </div>
  </div>
  </div>

</div>
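
Direct indexing such as `entry['container-title'][0]` assumes every record has that field and that its list is not empty. Some records may omit a field or return an empty list, which would raise a `KeyError` or `IndexError`. The sketch below shows one defensive approach; `first_or_none` and `published_year` are hypothetical helper names introduced here for illustration, and the sample record is invented.

```python
def first_or_none(entry, key):
    """Return the first item of the list stored under key, or None."""
    values = entry.get(key) or []  # Missing key or empty list -> []
    return values[0] if values else None

def published_year(entry):
    """Return the year from 'published' date-parts, or None if absent."""
    date_parts = entry.get("published", {}).get("date-parts") or [[None]]
    return date_parts[0][0] if date_parts[0] else None

# Hypothetical record with no journal information and no publication date
entry = {
    "publisher": "Example Press",
    "title": ["An Example Book"],
    "container-title": [],
}

print(first_or_none(entry, "title"))
print(first_or_none(entry, "container-title"))
print(published_year(entry))
```

Returning `None` for a missing value keeps the loop running and leaves an empty cell in the DataFrame that you can inspect afterwards.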

## Logging 

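APIs sometimes return [error codes](https://en.wikipedia.org/wiki/List_of_HTTP_status_codes), for example when a DOI is not found or the server is busy. In the Exercise 1 solution, `response.raise_for_status()` turns such a response into an exception that stops the program. A `try`/`except` block lets the program handle the exception and keep going, while Python's `logging` module records what went wrong so you can identify problem DOIs, or issues with your own code, afterwards.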
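
To illustrate the pattern without calling an API, here is a minimal sketch that logs a handled exception to a file. The file name `example_errors.log` and the function `safe_divide` are hypothetical, chosen only for this example.

```python
import logging

# Write ERROR-level (and more severe) messages to a file.
# "example_errors.log" is a hypothetical file name.
logging.basicConfig(
    filename="example_errors.log",
    level=logging.ERROR,
    format="%(asctime)s - %(levelname)s - %(message)s",
    force=True,  # Replace any earlier logging configuration (Python 3.8+)
)

def safe_divide(numerator, denominator):
    """Return the quotient, or None (after logging) if division fails."""
    try:
        return numerator / denominator
    except ZeroDivisionError as err:
        # Record the problem instead of letting it stop the program
        logging.error(f"Could not divide {numerator} by {denominator}: {err}")
        return None

values = [safe_divide(10, 2), safe_divide(1, 0), safe_divide(9, 3)]
print(values)  # [5.0, None, 3.0]
```

The second call fails, but the program continues; the failure is written to the log file with a timestamp and the `ERROR` level.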

<div class="card border-primary mb-3 p-1" style="max-width: 100%;">
  <div class="card-header" style="font-size: 1.8rem;"><img src="images/idea_standard_icon.png" alt="" aria-hidden="true" style="height: 3rem; vertical-align: middle; margin-right: 0.5rem;">Tip - Copilot</div>
  <div class="card-body"><p>Ask Copilot how to `handle exceptions with the logging module`. Copilot will return code you can modify for your project, along with additional tips.</p><img src="images/microsoft_copilot_icon.svg" alt="">
  </div>
</div>

<div class="accordion" id="accordionExercise2">

  <div class="accordion-item"><h2 class="accordion-header" id="ex2-headingOne"><button class="accordion-button fs-3" type="button" data-bs-toggle="collapse" data-bs-target="#ex2-collapseOne" aria-expanded="true" aria-controls="ex2-collapseOne"><img src="images/guidepost_standard_icon.png" alt="" aria-hidden="true" style="height: 3rem; vertical-align: middle; margin-right: 0.5rem;">Exercise 2: Handling exceptions</button></h2><div id="ex2-collapseOne" class="accordion-collapse collapse show fs-4" aria-labelledby="ex2-headingOne" data-bs-parent="#accordionExercise2"> <div class="accordion-body fs-4">Modify the code from Exercise 1 so that the request function logs and handles HTTP errors from bad responses instead of stopping the program.</div></div>
  </div>

  <div class="accordion-item"><h2 class="accordion-header" id="ex2-headingTwo"><button class="accordion-button fs-3 collapsed" type="button" data-bs-toggle="collapse" data-bs-target="#ex2-collapseTwo" aria-expanded="false" aria-controls="ex2-collapseTwo"><img src="images/magnifying_glass_standard_icon.png" alt="" aria-hidden="true" style="height: 3rem; vertical-align: middle; margin-right: 0.5rem;">Solution:</button></h2><div id="ex2-collapseTwo" class="accordion-collapse collapse" aria-labelledby="ex2-headingTwo" data-bs-parent="#accordionExercise2"> <div class="accordion-body">




```python
import requests
import pandas as pd
import logging
import time

# Configure logging: write ERROR-level messages to a log file
formatstring = "%(asctime)s - %(levelname)s - %(message)s"
datestring = "%m/%d/%Y %I:%M:%S %p"
logging.basicConfig(filename="cr_errors_find_dois.log", level=logging.ERROR, format=formatstring, datefmt=datestring)

# Define function to request a URL and log HTTP errors
def lookup(target_doi):
    """Return the parsed Crossref response, or None if the request failed."""
    try:
        base_url = 'https://api.crossref.org/works/'
        url = base_url + target_doi
        response = requests.get(url, timeout=30)
        response.raise_for_status()  # Raise an HTTPError for bad responses
        json_data = response.json()  # Parse the JSON response
        return json_data
    except requests.exceptions.HTTPError as http_err:
        logging.error(f"HTTP Error for {target_doi} = {http_err}")  # Log the HTTP error
        time.sleep(10)  # Pause before the next request
    except Exception as err:
        logging.error(f"Other error for {target_doi} = {err}")  # Log any other errors
        time.sleep(10)
    return None  # Tell the caller that the lookup failed

# Read the DOIs; adjust the path if your notebook is not in the project folder
file = pd.read_csv('data/dois.csv')
dois = file.doi.tolist()
rows = []  # One dictionary per successful lookup

for doi in dois:  # Use dois[0:2] to test on the first two DOIs
    response = lookup(doi)
    if response is None:  # The error was logged; skip this DOI
        continue
    entry = response['message']
    abbr = entry.get('short-container-title', [])  # May be missing or empty
    data = {}
    data['doi'] = doi
    data['publisher'] = entry['publisher']
    data['article_title'] = entry['title'][0]
    data['journal_title'] = entry['container-title'][0]
    data['journal_abbr'] = abbr[0] if abbr else None
    data['year'] = entry['published']['date-parts'][0][0]
    data['reference_count'] = entry['reference-count']
    rows.append(data)

# Build the DataFrame once, keeping rows in the same order as the DOIs
columns = ['doi', 'publisher', 'article_title', 'journal_title', 'journal_abbr', 'year', 'reference_count']
results = pd.DataFrame(rows, columns=columns)
```
</div>
  </div>
  </div>

</div>
