# Pure API Demonstration: Research Software

These notebooks demonstrate some uses of the API of Elsevier's *Pure* Current Research Information System (CRIS). This notebook demonstrates some requests for research software.

Research Software is currently recorded in Pure as a type of Research Output.

**Enter API details - including an API key which gives access to the `research-outputs` endpoint - in [`_Config_DO_THIS_FIRST.ipynb`](./_Config_DO_THIS_FIRST.ipynb) and execute that notebook before executing this notebook. Additionally, the section looking at software across Schools requires access to the `organisational-units` endpoint.**

In [None]:
# We're using the requests library to talk to the API
import requests

# display, HTML and Markdown help render HTML and Markdown in the notebook
# (IPython.display is the public import path; IPython.core.display is deprecated for this)
from IPython.display import display, HTML, Markdown

# The utility_functions.py script includes:
# - pretty_print_json(json_object, ind=4) - prints json with indentation and colours
import utility_functions as uf

In [None]:
# Retrieve the api_url and headers set in the config notebook
%store -r api_url
%store -r headers

In [None]:
# We'll be making requests to /research-outputs
request_url = "/".join([api_url,"research-outputs"])

Research Software has a type URI of `/dk/atira/pure/researchoutput/researchoutputtypes/nontextual/software`. We could use this as the value of the general-purpose `q` parameter in a GET request, but to be sure we are getting what we expect, it's better to POST the request using JSON (or XML).
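
For comparison, the GET alternative mentioned above could be sketched roughly as follows. This only builds the URL and does not send anything. The base URL is a hypothetical placeholder (the real value comes from the config notebook), and how the API interprets `q` is not covered here.

```python
from urllib.parse import urlencode

# Hypothetical base URL, for illustration only
example_api_url = "https://pure.example.org/ws/api"

software_type_uri = "/dk/atira/pure/researchoutput/researchoutputtypes/nontextual/software"

# urlencode percent-encodes the slashes in the type URI
query_string = urlencode({"q": software_type_uri})
get_url = f"{example_api_url}/research-outputs?{query_string}"

print(get_url)
```

The rest of this notebook uses the POST form.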

In [None]:
# We need the json library to create the POST body
import json

# Create the JSON structure using dicts/lists
request_body = {
    "typeUris": [
        "/dk/atira/pure/researchoutput/researchoutputtypes/nontextual/software",
    ]
}
# Serialize as JSON
request_json = json.dumps(request_body)

# We need to modify the headers to specify the type of data we're submitting
post_headers = headers.copy()
post_headers["Content-Type"] = 'application/json'

## Get all Research Software

In [None]:
# Make the request
response = requests.post(url=request_url, headers=post_headers, data=request_json)
# Stop here with a clear error if the API returned a 4xx/5xx status
response.raise_for_status()
research_software_json = response.json()

# Display raw output
uf.pretty_print_json(research_software_json)
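
A single response may hold only one page of results, so collecting every record can mean requesting page after page. The sketch below shows one way to do that, assuming each response carries the `count` and `items` keys used elsewhere in this notebook. `iter_items`, `fetch_page` and `fake_fetch` are hypothetical names introduced here, and a fake data source stands in for the API so the example runs offline.

```python
def iter_items(fetch_page, page_size=100):
    """Yield every item by requesting successive pages until 'count' is reached.

    fetch_page(offset, size) is a caller-supplied function returning the
    decoded JSON of one response, with 'count' and 'items' keys.
    """
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        items = page.get("items", [])
        yield from items
        offset += len(items)
        # Stop when a page comes back empty or the reported total is reached
        if not items or offset >= page["count"]:
            break


# Stand-in for the API: five fake records served two at a time
fake_records = [{"uuid": f"id-{n}"} for n in range(5)]

def fake_fetch(offset, size):
    return {"count": len(fake_records), "items": fake_records[offset:offset + size]}

all_items = list(iter_items(fake_fetch, page_size=2))
print(len(all_items))
```

Against the real API, `fetch_page` could wrap `requests.post` with `size` and an offset added to `request_body`. The name of the offset parameter is an assumption here and should be checked against the API documentation for your Pure version.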

## Count Software by publication year

Let's count the number of software items published each year.

In [None]:
# We're going to get a count of research software items published each year from 1989 to 2018
pub_counts = {}

for year in range(1989, 2019):
    
    # Add additional parameters to the request JSON
    request_body_for_count = request_body.copy()
    
    # We don't need the individual records - just the summary information
    request_body_for_count['size'] = 0
    
    # Specify the date range
    request_body_for_count['publishedAfterDate'] = f'{year}-01-01'
    request_body_for_count['publishedBeforeDate'] = f'{year + 1}-01-01'
    
    request_json = json.dumps(request_body_for_count)
    
    # Make the request
    response = requests.post(url=request_url, headers=post_headers, data=request_json)
    research_software_json = response.json()

    # Add the result for this year to the results dictionary
    pub_counts[year] = research_software_json["count"]
    
print(pub_counts)
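
The per-year request body built in the loop above can also be produced by a small helper, which makes it easy to inspect what is sent for a given year. This is a sketch only: `body_for_year` is a hypothetical name, and the parameter names are the ones already used in the loop.

```python
def body_for_year(base_body, year):
    """Return a new request body restricted to one calendar year, counts only."""
    # {**base_body, ...} makes a shallow copy, so base_body itself is not modified
    return {
        **base_body,
        "size": 0,
        "publishedAfterDate": f"{year}-01-01",
        "publishedBeforeDate": f"{year + 1}-01-01",
    }


base = {
    "typeUris": [
        "/dk/atira/pure/researchoutput/researchoutputtypes/nontextual/software",
    ]
}
body_2018 = body_for_year(base, 2018)
print(body_2018["publishedAfterDate"], body_2018["publishedBeforeDate"])
```

Because only top-level keys are added, a shallow copy is enough; the `typeUris` list is shared with the base body but never changed.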

### Visualising this data

We're going to use the Bokeh library to visualise this data. We won't go into the details of using Bokeh here; it's presented as an example of what can be done with data from the API.

In [None]:
from bokeh.io import show, output_notebook
from bokeh.plotting import figure
from bokeh.palettes import PuBu
import math

output_notebook()

# x labels need to be strings
x_labels = [str(year) for year in pub_counts]
y_values = list(pub_counts.values())

# Recent Bokeh releases use `height`; the older `plot_height` argument has been removed
p = figure(x_range=x_labels, height=500, title="Publication Counts by Year")
p.vbar(x=x_labels, top=y_values, width=0.9, color=PuBu[7][2])

p.xgrid.grid_line_color = None
p.y_range.start = 0
p.xaxis.major_label_orientation = math.pi/2

show(p)

## Count Software items by School

### Identify the schools

First, we need to use the `organisational-units` endpoint to get a list of schools.

In [None]:
# We'll be making a request to /organisational-units
org_request_url = "/".join([api_url,"organisational-units"])

# Create the JSON structure using dicts/lists
# We only need the identifiers and names
# Ordering alphabetically by name
org_request_body = {
  "organisationalUnitTypeUris": [
    "/dk/atira/pure/organisation/organisationtypes/organisation/school"
  ],
  "orderings": [
    "name"
  ],
  "fields": [
    "uuid",
    "name.text.value"
  ],
  "size": 50
}
# Serialize as JSON
org_request_json = json.dumps(org_request_body)

# Make the request
response = requests.post(url=org_request_url, headers=post_headers, data=org_request_json)
schools_json = response.json()

# Display details
for item in schools_json["items"]:
    print(item["uuid"],item["name"]["text"][0]["value"])
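
If the school names are needed again later, it can be convenient to keep them in a dictionary keyed by UUID. The sketch below uses a made-up response fragment with the same shape as the fields requested above; the UUIDs and school names are invented for illustration.

```python
# Hypothetical response fragment, shaped like the fields requested above
sample_schools_json = {
    "items": [
        {"uuid": "uuid-a", "name": {"text": [{"value": "School of Examples"}]}},
        {"uuid": "uuid-b", "name": {"text": [{"value": "School of Samples"}]}},
    ]
}

# Map each school's UUID to the first entry in its list of name texts
school_names = {
    item["uuid"]: item["name"]["text"][0]["value"]
    for item in sample_schools_json["items"]
}

print(school_names["uuid-a"])
```

With the real response, `sample_schools_json` would be replaced by `schools_json`.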

### Get software counts by school

In [None]:
pub_counts = {}

for item in schools_json["items"]:
    
    # Add additional parameters to the request JSON
    request_body_for_count = request_body.copy()
    
    # We don't need the individual records - just the summary information
    request_body_for_count['size'] = 0
    
    # Specify the school
    request_body_for_count['forPersons'] = { "forOrganisations": { "uuids": [item["uuid"],] } }
    
    request_json = json.dumps(request_body_for_count)    
    
    # Make the request
    response = requests.post(url=request_url, headers=post_headers, data=request_json)
    research_software_json = response.json()
    
    # Add the result for this school to the results dictionary
    pub_counts[item["name"]["text"][0]["value"]] = research_software_json["count"]
    
print(pub_counts)
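
A chart of counts by school can be easier to read when empty schools are dropped and the rest are ordered by count. One possible way to prepare the dictionary is sketched below, using made-up counts in place of the real results.

```python
# Made-up counts, for illustration only
sample_counts = {"School A": 3, "School B": 0, "School C": 7}

# Keep schools with at least one item, largest count first
ranked_counts = dict(
    sorted(
        ((name, count) for name, count in sample_counts.items() if count > 0),
        key=lambda pair: pair[1],
        reverse=True,
    )
)

print(list(ranked_counts))
```

Dictionaries preserve insertion order, so passing the keys and values of `ranked_counts` to the plotting code keeps this ordering on the x axis.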

In [None]:
output_notebook()

# x labels need to be strings
x_labels = list(map(str, list(pub_counts.keys())))
y_values = list(pub_counts.values())

# Recent Bokeh releases use `height`; the older `plot_height` argument has been removed
p = figure(x_range=x_labels, height=500, title="Software items by School")
p.vbar(x=x_labels, top=y_values, width=0.9, color=PuBu[7][2])

p.xgrid.grid_line_color = None
p.y_range.start = 0
p.xaxis.major_label_orientation = math.pi/2

show(p)