# Pure API Demonstration: Research Outputs

This notebook is one of a series demonstrating the API of Elsevier's *Pure* Current Research Information System (CRIS). It shows a few example requests for research outputs.

**Enter API details - including an API key which gives access to the `research-outputs` endpoint - in [`_Config_DO_THIS_FIRST.ipynb`](./_Config_DO_THIS_FIRST.ipynb) and execute that notebook before executing this notebook.**

In [None]:
# We're using the requests library to talk to the API
import requests

# display, HTML and Markdown help render HTML and Markdown in the notebook
# (imported from IPython.display; importing from IPython.core.display is deprecated)
from IPython.display import display, HTML, Markdown

# The utility_functions.py script includes:
# - pretty_print_json(json_object, ind=4) - prints json with indentation and colours
import utility_functions as uf

In [None]:
# Retrieve the api_url and headers set in the config notebook
%store -r api_url
%store -r headers

In [None]:
# We'll be making requests to /research-outputs
request_url = "/".join([api_url,"research-outputs"])

## Simple unqualified request

In [None]:
# Make request with no parameters
# By default, returns most recent research outputs in descending date order
response = requests.get(url=request_url, headers=headers)
research_outputs_json = response.json()

In [None]:
# Display raw output
uf.pretty_print_json(research_outputs_json)

In [None]:
# Print the UUIDs and titles of all returned items
for item in research_outputs_json["items"]:
    print(item["uuid"], ":", item["title"])

## Searching

We can search with the `q` parameter in a `GET` request, using the [Lucene syntax](https://lucene.apache.org/core/2_9_4/queryparsersyntax.html).
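
Writing the same terms out once per field is easy to get wrong by hand. The sketch below assembles such a query string from a list of fields and a list of terms. The helper name `build_field_query` and its `field_prefix` argument are hypothetical, introduced only for illustration. The `^` prefix is copied from the query used in this notebook and is not explained here, so treat it as an assumption about your Pure instance.

```python
def build_field_query(fields, terms, field_prefix=""):
    """Build a Lucene-style query matching any term in any of the fields.

    Hypothetical helper: not part of Pure or of utility_functions.py.
    """
    # Multi-word terms are wrapped in double quotes (Lucene phrase syntax)
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    term_group = " OR ".join(quoted)
    # One clause per field, e.g. title:(software OR programming)
    clauses = [f"{field_prefix}{field}:({term_group})" for field in fields]
    return "(" + " OR ".join(clauses) + ")"


query = build_field_query(
    ["title", "subTitle", "abstract"],
    ["software", "programming"],
    field_prefix="^",  # assumption: mirrors the prefix used in this notebook
)
request_params = {"q": [query]}
print(query)
```

The resulting `request_params` dictionary can be passed to `requests.get` through the `params` argument in the same way as the hand-written query.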

In [None]:
# Make request to /research-outputs passing query terms via the 'q' parameter
# Let's look for research outputs which mention "software" or "programming" in their title, subtitle or abstract
request_params = {"q" : ["(^title:(software OR programming) OR ^subTitle:(software OR programming) OR ^abstract:(software OR programming))"]}
response = requests.get(url=request_url, headers=headers, params=request_params)
research_outputs_json = response.json()

In [None]:
# Display raw output
uf.pretty_print_json(research_outputs_json)

## Get only specified fields

In [None]:
# Set the request parameters specifying the desired fields:
# - pureId
# - uuid
# - title
# - open access status
# - names of associated persons
request_params = {"fields" : ["pureId", "uuid", "title", "openAccessPermission.value", "personAssociations.name.*"]}
response = requests.get(url=request_url, headers=headers, params=request_params)
research_outputs_json = response.json()

In [None]:
# Display raw output
uf.pretty_print_json(research_outputs_json)

## Get HTML renderings

Note that when specific renderings are requested, most other fields are omitted from the response unless they are explicitly requested with the `fields` parameter, as above.
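
Following the note above, one option is to send `rendering` and `fields` together in the same request. The sketch below only builds the parameters and shows the query string they produce. It uses the standard library's `urlencode` with `doseq=True`, which repeats the key once per list value in the same way `requests` does for list-valued `params`. The choice of `uuid` and `title` is illustrative, and the exact response should be checked against your own Pure instance.

```python
from urllib.parse import urlencode

# Ask for three renderings and, explicitly, two ordinary fields
request_params = {
    "rendering": ["standard", "harvard", "apa"],
    "fields": ["uuid", "title"],
}

# Each list value becomes its own key=value pair in the query string
query_string = urlencode(request_params, doseq=True)
print(query_string)
```

The same `request_params` dictionary can be passed unchanged to `requests.get(url=request_url, headers=headers, params=request_params)`.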

In [None]:
# Set the request parameters - specifying standard, harvard and apa renderings
request_params = {"rendering" : ["standard", "harvard", "apa"]}
response = requests.get(url=request_url, headers=headers, params=request_params)
research_outputs_json = response.json()

In [None]:
# Display raw output
uf.pretty_print_json(research_outputs_json)

In [None]:
# Render all HTML
for item in research_outputs_json["items"]:
    # Display the format and render the content of each rendering
    for r in item["rendering"]:
        display(HTML("<h4>" + r["format"] + "</h4>"))
        display(HTML(r["value"]))

## POST requests

A greater range of parameters is available in the body of `POST` requests (in JSON or XML format) than can be passed as `GET` parameters.
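
As a minimal sketch of such a body, the hypothetical helper below (`year_count_body` is not part of Pure or of this notebook's utilities) serializes a JSON body that asks for no individual records within a single calendar year. The field names `size`, `publishedAfterDate` and `publishedBeforeDate` are the ones used in this notebook. Whether the date bounds are inclusive is not stated here, so check that against your Pure instance.

```python
import json


def year_count_body(year):
    """Return a JSON POST body covering one calendar year (hypothetical helper)."""
    return json.dumps({
        "size": 0,  # no individual records, only the count is of interest
        "publishedAfterDate": f"{year}-01-01",
        "publishedBeforeDate": f"{year + 1}-01-01",
    })


body = year_count_body(2001)
print(body)
```

A body built this way would be sent with `requests.post`, together with a `Content-Type: application/json` header.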

In [None]:
# We need the json library to create the POST body
import json

# We're going to get a count of research outputs published each year from 2001 to 2018
pub_counts = {}

for year in range(2001, 2019):
    # Create the POST body as a dictionary and serialize as JSON (json.dumps)
    # "size" = 0 because we're only interested in the count at the start of the response
    # - don't need to see any individual records
    post_body = json.dumps({"size": 0, "publishedAfterDate": f'{year}-01-01', "publishedBeforeDate": f'{year + 1}-01-01'})
    # We need to modify the headers to specify the type of data we're submitting
    post_headers = headers.copy()
    post_headers["Content-Type"] = 'application/json'
    # Make the request
    response = requests.post(url=request_url, headers=post_headers, data=post_body)
    research_outputs_json = response.json()
    # Add the result for this year to the results dictionary
    pub_counts[year] = research_outputs_json["count"]

print(pub_counts)

### Visualising this data

We're going to use the Bokeh library to visualise this data. We won't go into the details of using Bokeh here; it's presented as an example of what can be done with data from the API.

In [None]:
from bokeh.io import show, output_notebook
from bokeh.plotting import figure
from bokeh.palettes import PuBu

output_notebook()

# x labels need to be strings
x_labels = [str(year) for year in pub_counts]
y_values = list(pub_counts.values())

# 'height' replaces 'plot_height', which was removed in Bokeh 3.0
p = figure(x_range=x_labels, height=500, title="Publication Counts by Year")
p.vbar(x=x_labels, top=y_values, width=0.9, color=PuBu[7][2])

p.xgrid.grid_line_color = None
p.y_range.start = 0

show(p)