# Pure API Demonstration: Research Software

These notebooks demonstrate some uses of the API of Elsevier's *Pure* Current Research Information System (CRIS). This notebook demonstrates some requests for research software.

Research Software is currently recorded in Pure as a type of Research Output.

**Enter API details - including an API key which gives access to the `research-outputs` endpoint - in [`_Config_DO_THIS_FIRST.ipynb`](./_Config_DO_THIS_FIRST.ipynb) and execute that notebook before executing this notebook.**

In [1]:
# We're using the requests library to talk to the API
import requests

# display, HTML and Markdown help render HTML and Markdown in the notebook
# (imported from the public IPython.display module; IPython.core.display is deprecated for this)
from IPython.display import display, HTML, Markdown

# The utility_functions.py script includes:
# - pretty_print_json(json_object, ind=4) - prints json with indentation and colours
import utility_functions as uf

In [2]:
# Retrieve the api_url and headers set in the config notebook
%store -r api_url
%store -r headers

In [3]:
# We'll be making requests to /research-outputs
request_url = "/".join([api_url, "research-outputs"])

Research Software has a type URI of `/dk/atira/pure/researchoutput/researchoutputtypes/nontextual/software`. We could use this as the value of the general-purpose `q` parameter in a GET request, but to be sure we are getting what we expect it's better to POST the request with a JSON (or XML) body that filters on the type URI explicitly.

In [4]:
# We need the json library to create the POST body
import json

# Create the JSON structure using dicts/lists
request_body = {
    "typeUris": [
        "/dk/atira/pure/researchoutput/researchoutputtypes/nontextual/software",
    ]
}
# Serialize as JSON
request_json = json.dumps(request_body)

# We need to modify the headers to specify the type of data we're submitting
post_headers = headers.copy()
post_headers["Content-Type"] = 'application/json'

## Get all Research Software

In [5]:
# Make the request
response = requests.post(url=request_url, headers=post_headers, data=request_json)
research_software_json = response.json()

# Display raw output
uf.pretty_print_json(research_software_json)

{
    "count": 106,
    "pageInformation": {
        "offset": 0,
        "size": 10
    },
    "items": [
        {
            "pureId": 166055,
            "externalId": "3789",
            "externalIdSource": "standrews_research_output",
            "uuid": "4694cdeb-5d07-46b6-ba44-3600c8916b81",
            "title": "Java Hyper-Program System",
            "managingOrganisationalUnit": {
                "uuid": "6eab485b-fea1-4d37-96cc-dbea0ea5b725",
                "link": {
                    "ref": "content",
                    "href": "https://risweb.st-andrews.ac.uk/ws/api/513

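The response above reports a `count` of 106 but a page `size` of 10, so only the first page of items was returned. The following is a minimal sketch of collecting every page. It assumes the POST body accepts an `offset` key alongside `size` (only `size` is used elsewhere in this notebook; `offset` is inferred from the `pageInformation` block and has not been checked against the API documentation). `iter_all_items` and `post_page` are hypothetical names introduced for illustration.

```python
from typing import Callable, Iterator


def iter_all_items(post_page: Callable[[dict], dict], base_body: dict,
                   page_size: int = 10) -> Iterator[dict]:
    """Yield every item by requesting successive pages.

    post_page is a hypothetical callable that POSTs a body dict and
    returns the parsed JSON response as a dict.
    The 'offset' request key is an assumption, not confirmed API behaviour.
    """
    offset = 0
    while True:
        # Copy the base body and add the paging keys for this request
        body = dict(base_body, size=page_size, offset=offset)
        page = post_page(body)
        items = page.get("items", [])
        yield from items
        offset += len(items)
        # Stop on an empty page or once every counted item has been seen
        if not items or offset >= page.get("count", 0):
            break
```

In this notebook, `post_page` could wrap the existing request, for example `lambda body: requests.post(url=request_url, headers=post_headers, data=json.dumps(body)).json()`, after which `list(iter_all_items(post_page, request_body))` would hold all matching records. Passing the request function in as an argument keeps the paging logic separate from the HTTP call, which makes it easy to try out against canned data.
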
## Count Software by publication year

Let's count the number of software items published each year.

In [7]:
# We're going to get a count of research software items published each year from 1989 to 2018
pub_counts = {}

for year in range(1989, 2019):
    
    # Add additional parameters to the request JSON
    request_body_for_count = request_body.copy()
    
    # We don't need the individual records - just the summary information
    request_body_for_count['size'] = 0
    
    # Specify the date range
    request_body_for_count['publishedAfterDate'] = f'{year}-01-01'
    request_body_for_count['publishedBeforeDate'] = f'{year + 1}-01-01'
    
    request_json = json.dumps(request_body_for_count)
    
    # Make the request
    response = requests.post(url=request_url, headers=post_headers, data=request_json)
    research_software_json = response.json()

    # Add the result for this year to the results dictionary
    pub_counts[year] = research_software_json["count"]
    
print(pub_counts)

{1989: 1, 1990: 0, 1991: 1, 1992: 1, 1993: 0, 1994: 1, 1995: 2, 1996: 2, 1997: 3, 1998: 4, 1999: 3, 2000: 1, 2001: 2, 2002: 3, 2003: 2, 2004: 0, 2005: 2, 2006: 4, 2007: 3, 2008: 7, 2009: 5, 2010: 2, 2011: 3, 2012: 3, 2013: 2, 2014: 1, 2015: 1, 2016: 11, 2017: 32, 2018: 25}
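As a quick sanity check, the yearly counts can be totalled and compared with the `count` of 106 reported by the unfiltered request earlier. The sketch below copies the dictionary printed above so that it runs on its own. The yearly counts sum to 127 rather than 106, which may mean that some items match more than one yearly window, for instance if both date bounds are inclusive. That explanation is an assumption and has not been verified against the Pure API documentation.

```python
# Yearly counts copied from the output printed above
pub_counts = {1989: 1, 1990: 0, 1991: 1, 1992: 1, 1993: 0, 1994: 1,
              1995: 2, 1996: 2, 1997: 3, 1998: 4, 1999: 3, 2000: 1,
              2001: 2, 2002: 3, 2003: 2, 2004: 0, 2005: 2, 2006: 4,
              2007: 3, 2008: 7, 2009: 5, 2010: 2, 2011: 3, 2012: 3,
              2013: 2, 2014: 1, 2015: 1, 2016: 11, 2017: 32, 2018: 25}

# "count" value from the unfiltered request shown earlier in this notebook
reported_total = 106

total_by_year = sum(pub_counts.values())
peak_year = max(pub_counts, key=pub_counts.get)

print(f"Sum of yearly counts: {total_by_year}")
print(f"Count from unfiltered request: {reported_total}")
print(f"Busiest year: {peak_year} ({pub_counts[peak_year]} items)")
```

If the difference matters for an analysis, one option is to retrieve the individual records and group them by publication year locally instead of relying on per-window counts.
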


### Visualising this data

We're going to use the Bokeh library to visualise this data. We won't go into the details of using Bokeh here; it's presented as an example of what can be done with data from the API.

In [8]:
from bokeh.io import show, output_notebook
from bokeh.plotting import figure
from bokeh.palettes import PuBu

output_notebook()

# x labels need to be strings (iterating a dict yields its keys)
x_labels = [str(year) for year in pub_counts]
y_values = list(pub_counts.values())

# Recent Bokeh releases (3.0+) use height; older releases used plot_height
p = figure(x_range=x_labels, height=500, title="Publication Counts by Year")
p.vbar(x=x_labels, top=y_values, width=0.9, color=PuBu[7][2])

p.xgrid.grid_line_color = None
p.y_range.start = 0

show(p)