# Crossref API in Mathematica

by Vishank Patel

**Crossref API documentation:** https://api.crossref.org/swagger-ui/index.html

Also see the CrossRef Mathematica documentation: https://reference.wolfram.com/language/ref/service/CrossRef.html

These recipe examples were tested on January 20, 2022.

*From our testing, we have found that Crossref metadata can vary considerably across publishers and even across journals. As a result, it can be easier to work with one journal at a time when using the Crossref API, particularly when trying to extract selected data from records.*

## 1. Basic Crossref API call

### Establish a connection

In [None]:
crossref = ServiceConnect["CrossRef"]

### Request data from Crossref API

In [None]:
doi = "10.1038/357225a0";
paper = crossref["WorkInformation", "DOI" -> doi];
paper

### Format data output

To present the same data as a formatted table, we can apply `Dataset` in postfix form (`// Dataset`).

In [None]:
paper // Dataset

### Select some specific data

In [None]:
paper["Title"]

In [None]:
authorNum = 2; (*Second author*)

(*The authorNum variable is defined only for clarity;
the same output can be obtained by using the index directly.*)

sAuthor = paper["Author"][[authorNum, "Family"]]

In [None]:
(*Extract the first reference*)
paper["Reference"][[1]] // Dataset

To see the journal in which the paper's first reference was published, we first convert the default list output into an `Association`.

In [None]:
pAssociation = Association[paper["Reference"][[1]]];
pAssociation["journal-title"]

To do this for all of the paper's references:

In [None]:
allRefs = paper["Reference"]; (*returns a list of lists*)
allRefsAssoc = Association @@@ allRefs; (*converts lists at the first level into associations*)
allRefsAssoc[[All, "journal-title"]]

In [None]:
% // Dataset

Similarly, we can also extract the reference years.

In [None]:
allRefsAssoc[[All, "year"]] // Dataset
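
Because Crossref metadata varies between records, some references may lack a given key. The following is a minimal offline sketch of one way to handle that, assuming references shaped like the lists of rules above. The data in `sampleRefs` is hypothetical and is not real Crossref output.

```mathematica
(* Hypothetical sample data shaped like a list of rule lists; not real Crossref output *)
sampleRefs = {
   {"journal-title" -> "Journal A", "year" -> "1990"},
   {"year" -> "1985"},
   {"journal-title" -> "Journal B", "year" -> "1991"}
   };

(* Convert each inner list of rules into an Association *)
sampleAssoc = Association @@@ sampleRefs;

(* Lookup returns the supplied default for entries that lack the key *)
journals = Lookup[sampleAssoc, "journal-title", Missing["NotAvailable"]]

(* Drop the missing entries, keeping only the journal titles that are present *)
knownJournals = DeleteMissing[journals]
```

Here `knownJournals` contains only the two titles that are present, so later steps such as tallying do not have to deal with missing values.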

### Request and select data from multiple sources

Let us define three papers using their DOIs, and the request "WorkInformation".

In [None]:
vortexP = crossref["WorkInformation", "DOI" -> "10.1038/357225a0"];
crisprP = crossref["WorkInformation", "DOI" -> "10.1038/nprot.2013.143"];
globalWarmingP = crossref["WorkInformation", "DOI" -> "10.1038/d41586-018-07586-5"];

Get article titles:

In [None]:
{vortexP[#], crisprP[#], globalWarmingP[#]} &["Title"]
(*Here, # is a placeholder (slot) in a pure function that ends with &; the argument in the brackets after & replaces every #*)
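
The slot pattern can be written in a few equivalent ways. The sketch below uses three hypothetical associations (`paperA`, `paperB`, `paperC`) as offline stand-ins for the records, and assumes each record behaves like an `Association`.

```mathematica
(* Hypothetical stand-ins for the three records; not real Crossref output *)
paperA = <|"Title" -> {"Example title A"}, "Publisher" -> "Example Press"|>;
paperB = <|"Title" -> {"Example title B"}, "Publisher" -> "Example Press"|>;
paperC = <|"Title" -> {"Example title C"}, "Publisher" -> "Sample House"|>;

(* Same pattern as above, with explicit parentheses around the pure function *)
viaSlot = ({paperA[#], paperB[#], paperC[#]} &)["Title"];

(* Equivalent: map a pure function over the list of records *)
viaMap = (#["Title"] &) /@ {paperA, paperB, paperC};

(* Equivalent: Lookup threads over a list of associations *)
viaLookup = Lookup[{paperA, paperB, paperC}, "Title"];

(* All three forms give the same list of titles *)
viaSlot === viaMap === viaLookup
```

The mapped forms scale more easily when the number of records grows, since the records can be kept in a single list.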

Looking at the publisher field, we can see that all three were published by Springer Science and Business Media LLC.

In [None]:
{vortexP[#], crisprP[#], globalWarmingP[#]} &["Publisher"]

Published print date:

In [None]:
{vortexP[#], crisprP[#], globalWarmingP[#]} &["PublishedPrint"] //Dataset

## 2. Acquiring a list of DOIs

Let us extract a list of DOIs by requesting papers published in a particular journal between 2019 and 2021.
We will use the Journal of Cheminformatics; Crossref can be queried using the journal's ISSN (International Standard Serial Number), which is 1758-2946.

In [None]:
papers = crossref["WorksDataset", "ISSN" -> "1758-2946",
   "IssuedDate" -> {DateObject[{2019}], DateObject[{2021}]}, MaxItems -> 10];

As we can see below, all the papers are from J Cheminform.

In [None]:
papers[All, "ShortContainerTitle"] //Normal

Extracting their respective DOIs.

In [None]:
doiList = papers[All, "DOI"] //Normal

Note that the number of DOIs returned can be changed through the MaxItems parameter when querying Crossref. If the parameter is not defined, Mathematica sets it to 20.

## 3. Crossref API call with a loop

Now that we know how to extract a list of DOIs, here are a few ways to operate on such a list.

In [None]:
listDOIs = {"10.1093/oso/9780198828044.003.0003", "10.1093/oso/9780198714934.003.0006", "10.7551/mitpress/13811.003.0005", "10.1093/oso/9780190941659.003.0001", "10.7551/mitpress/8996.003.0003", "10.1017/9781107338548.009", "10.1002/9781119557500.ch1", "10.7551/mitpress/8996.003.0016", "10.7551/mitpress/13811.003.0004", "10.1002/9781119557500.ch12"};

In [None]:
For[i = 1, i <= Length[listDOIs], i++,
 Print[crossref["WorkInformation", "DOI" -> listDOIs[[i]]]["Title"]]]

{Machine learning with sklearn}
{Statistical machine learning}
{Machine Learning, Statistics, and Data Analytics}
{Why Use Automated Machine Learning?}
{Introduction: Optimization and Machine Learning}
{Adversarial Machine Learning Challenges}
{Introduction to Machine Learning}
{Robust Optimization in Machine Learning}
{Why We Are Interested in Machine Learning}
{Deploying Machine Learning Models}
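
The `For` loop above prints each result. To keep the results for later use, one option is to map a function over the DOI list instead. The sketch below runs offline: `fakeTitles`, `lookupTitle`, and `sampleDOIs` are hypothetical stand-ins for the Crossref request and are not part of the CrossRef service.

```mathematica
(* Hypothetical offline stand-in for crossref["WorkInformation", "DOI" -> doi]["Title"] *)
fakeTitles = <|
   "10.0000/example.1" -> {"First example title"},
   "10.0000/example.2" -> {"Second example title"}
   |>;
lookupTitle[doi_String] := Lookup[fakeTitles, doi, Missing["NotFound"]];

(* Made-up DOIs; the third one is deliberately absent from fakeTitles *)
sampleDOIs = {"10.0000/example.1", "10.0000/example.2", "10.0000/example.3"};

(* Map collects one result per DOI instead of printing it *)
titles = lookupTitle /@ sampleDOIs

(* Pair each DOI with its result for later inspection *)
titlesByDOI = AssociationThread[sampleDOIs -> titles]
```

With a live connection, `lookupTitle` could be replaced by a function that wraps the `crossref["WorkInformation", ...]` request shown above.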


To check their respective publishers:

In [None]:
For[i = 1, i <= Length[listDOIs], i++,
 Print[crossref["WorkInformation", "DOI" -> listDOIs[[i]]]["Publisher"]]]

Oxford University Press
Oxford University Press
The MIT Press
Oxford University Press
The MIT Press
Cambridge University Press
John Wiley & Sons, Inc.
The MIT Press
The MIT Press
John Wiley & Sons, Inc.


Extracting author last names from the papers:

As many of the sources in `listDOIs` lack author information, we will switch to the `doiList` defined in the previous section.

In [None]:
For[i = 1, i <= Length[doiList], i++,
 Print[crossref["WorkInformation", "DOI" -> doiList[[i]]]["Author"][[All, "Family"]]]]

{Thibault, Roe, Facelli, Cheatham}
{Steinberg, Russo, Frey}
{Kuhn, Neumann, Steinbeck, Wittekindt, Zielesny}
{Zhang, Zhang, Li, Wang, Zhang, Hou}
{Krüger, Gohlke}
{Rupp, Tkatchenko, Müller, von Lilienfeld}
{Spjuth, Rydberg, Willighagen, Evelo, Jeliazkova}
{Barnard, Downs}
{Mavridis, Mitchell}
{Baumann, Baumann}


## 4. Crossref metadata visualization

Let us use a word cloud to visualize which publishers publish research papers related to tectonic plates.
We will start by generating a list of publishers.

To achieve that, we make one hundred calls to Crossref, each starting at a random index so that the calls generally return different papers (occasional repeats are possible). A pause is added between calls to avoid exceeding Crossref's rate limits.

Note: run `Off[DateObject::date]` to suppress the common DateObject error (irrelevant to our current application).

In [None]:
publisherList = {};

For[i = 1, i <= 100, i++,
  Pause[0.5]; (*pause between calls to respect rate limits*)
  (*immediate assignment (=) stores the result of a single API call*)
  publisherName = crossref["WorksList", "Query" -> "Tectonic Plates", MaxItems -> 1,
      "StartIndex" -> RandomInteger[{1, 10000}]][[1, "Publisher"]];
  AppendTo[publisherList, publisherName]];

publisherList

In [None]:
WordCloud[publisherList]
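
A word cloud conveys relative frequency only visually. As a complementary sketch, the same kind of list can be tallied to get exact counts. The names in `samplePublishers` are hypothetical and stand in for `publisherList`.

```mathematica
(* Hypothetical publisher names standing in for publisherList *)
samplePublishers = {"Publisher A", "Publisher B", "Publisher A",
   "Publisher C", "Publisher A", "Publisher B"};

(* Count occurrences, then order from most to least frequent *)
publisherCounts = Reverse[Sort[Counts[samplePublishers]]]
(* <|"Publisher A" -> 3, "Publisher B" -> 2, "Publisher C" -> 1|> *)
```

Applying the same expression to `publisherList` would show how many of the sampled papers each publisher accounts for.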

We can also build a word cloud from the titles of a particular professor's publications:

In [None]:
titleList = {};
For[i = 1, i <= 30, i++,
  Pause[0.5];
  (*immediate assignment (=) stores the result of a single API call*)
  titleTemp = crossref["WorksList", "Query" -> "Aaleti, S.",
      MaxItems -> 1, "StartIndex" -> i, "SortBy" -> "Published"][[1, "Title"]];
  AppendTo[titleList, titleTemp]];

In [None]:
Flatten[titleList]

In [None]:
(*Join the titles with spaces so that words from adjacent titles do not run together*)
WordCloud[StringRiffle[Flatten[titleList], " "]]

We can also see the frequency of publication for the author using a histogram:

In [None]:
DatesList = {};
For[i = 1, i <= 30, i++,
  Pause[0.5];
  (*immediate assignment (=) stores the result of a single API call*)
  DateTemp = crossref["WorksList", "Query" -> "Aaleti, S.", MaxItems -> 1,
      "StartIndex" -> i][[1, "Published"]];
  AppendTo[DatesList, DateTemp]];
DatesList

In [None]:
DateHistogram[Flatten[Values[DatesList], 2], "Year"]
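
To read exact numbers alongside the histogram, publication dates can also be counted per year. This is a minimal offline sketch that assumes the dates are available as a flat list of `DateObject` values. The dates in `sampleDates` are hypothetical.

```mathematica
(* Hypothetical publication dates; not real Crossref output *)
sampleDates = {
   DateObject[{2015, 3, 1}],
   DateObject[{2015, 9, 12}],
   DateObject[{2018, 1, 5}]
   };

(* Extract the year of each date, count occurrences, and sort by year *)
yearCounts = KeySort[Counts[DateValue[#, "Year"] & /@ sampleDates]]
(* <|2015 -> 2, 2018 -> 1|> *)
```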