# Using the Primo Search API

## The goal of this script

Download a list of databases from PRIMO that your institution subscribes to.

### Introduction to the Primo Search API

First, you must have the API endpoint. For Orbis Cascade Alliance users, the endpoint is: <code>https://api-na.hosted.exlibrisgroup.com/primo/v1/search?</code><br>

Second, you must have an API key. The API key must be turned on at the institution's API Dashboard.<br>

Third, to issue a query to the [Primo Search API](https://developers.exlibrisgroup.com/primo/apis/docs/primoSearch/R0VUIC9wcmltby92MS9zZWFyY2g=/), you must choose a set of parameters. Several of these are defined by your institution and are unique to its Primo configuration. For example, at the University of Idaho, we use several scope/tab combinations to search Primo for different types of resources.<br>

Scope options:
* All items
    * Scope: <code>DN_and_CI</code> 
    * Tab: <code>Everything</code>
* UI+EResources
    * Scope: <code>MyInst_and_CI</code>
    * Tab: <code>UI_EResources_Slot</code>
* UI+Summit (No EResources)
    * Scope: <code>NZ_PandUI_P</code>
    * Tab: <code>UI_Summit_Slot</code>
* EResources Only
    * Scope: <code>CentralIndex</code>

The <code>vid</code> (view ID) parameter is also required and is unique to your institution. For example, the University of Idaho <code>vid</code> is:
<code>01ALLIANCE_UID:UID</code><br>

Lastly, the search itself goes in the <code>q</code> parameter, written as a field, a precision, and a value separated by commas, e.g. <code>q=any,contains,sage+grouse</code>. This format is standard across institutions; the remaining query parameters and other standard Primo elements are listed in the Ex Libris API documentation.
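A request URL is simply the endpoint followed by these parameters, joined and URL-encoded. As a minimal sketch, assuming the University of Idaho values above and a placeholder key (<code>YOUR_API_KEY</code> is not a real key), Python's standard <code>urllib.parse.urlencode</code> can assemble the full URL. The helper name <code>build_primo_url</code> is hypothetical; <code>requests</code> does the equivalent encoding when you pass it a dictionary of parameters.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint given above for Orbis Cascade Alliance institutions.
BASE_URL = "https://api-na.hosted.exlibrisgroup.com/primo/v1/search"

def build_primo_url(params: dict) -> str:
    """Join the parameters into a percent-encoded query string (hypothetical helper)."""
    return f"{BASE_URL}?{urlencode(params)}"

params = {
    "vid": "01ALLIANCE_UID:UID",        # University of Idaho view ID
    "tab": "Everything",                # "All items" tab
    "scope": "DN_and_CI",               # "All items" scope
    "q": "any,contains,sage grouse",    # field,precision,value
    "apikey": "YOUR_API_KEY",           # placeholder, not a real key
}

url = build_primo_url(params)
print(url)

# Decoding the query string recovers the original values, which is what the server sees.
print(parse_qs(urlsplit(url).query)["q"])
```

In the printed URL the space becomes <code>+</code> and the commas become <code>%2C</code>; decoding restores the original <code>any,contains,sage grouse</code>.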

### Step 1 - Import the required Python libraries

In the cell below, we import three libraries that are useful for this script.
1. requests - a widely used third-party library for sending and receiving information over HTTP
2. pandas - optional, but very useful for organizing tabular information in Python
3. json - Python's built-in module for handling data in the JSON format. The Primo API returns JSON data to us.

```python
import requests
import pandas as pd
import json
```

### Step 2 - Create the query and issue the request(s)

Using requests is simple: define the parameters the API expects, then use the <code>.get()</code> method to send the request.
Store what comes back in a response variable so you can work with it later. The example below is a standard approach.

In this case, since our goal is to find databases, we need a few special parameters outlined in the Primo API documentation. We will use <code>qInclude=</code> with the resource type facet, <code>facet_rtype</code>, set to <code>Databases</code>. Together, this looks like: <code>qInclude=facet_rtype,exact,Databases</code>

In testing, the facet alone did not return precise enough results, so we also search the subject field for "database" with a wildcard, which seems to help. That clause is <code>q=sub,contains,database*</code>.

The remaining parameters (scope, tab, vid, and the API key) are required for any query. They are unique to each institution; see the introduction for details.
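Because the API key identifies your institution, it is safer not to hard-code it in a notebook you might share. A minimal sketch, assuming you store the key in an environment variable named <code>PRIMO_API_KEY</code> (a name chosen here for illustration, not one Primo requires) and using a hypothetical helper called <code>load_api_key</code>:

```python
import os

def load_api_key(var_name: str = "PRIMO_API_KEY") -> str:
    """Read the Primo API key from an environment variable (variable name is hypothetical)."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        # Fail early with a clear message instead of sending a request with no key.
        raise RuntimeError(f"Set the {var_name} environment variable to your Primo API key.")
    return key
```

You could then write <code>key = load_api_key()</code> in place of a hard-coded key when setting the query parameters.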

```python
import requests

# First, set the parameters for the query.

key = 'YOUR_API_KEY'          # replace with your institution's API key
limit = '100'                 # maximum number of results to return
scope = 'DN_and_CI'           # UI scope for "All items"
tab = 'Everything'            # UI tab for "All items"
query = 'database*'           # subject search term, with a wildcard
vid = '01ALLIANCE_UID:UID'    # UI view ID

# Second, send the request and keep the response for later use.
# Passing a dict to params= lets requests build and URL-encode the query string.

params = {
    'vid': vid,
    'qInclude': 'facet_rtype,exact,Databases',
    'limit': limit,
    'tab': tab,
    'scope': scope,
    'q': f'sub,contains,{query}',
    'apikey': key,
}
response = requests.get('https://api-na.hosted.exlibrisgroup.com/primo/v1/search', params=params)

# Optional: show the status code and reason returned by the server.
# 200 means success; anything else probably indicates an error.

print("Status: " + str(response.status_code))
print("Message: " + str(response.reason))
```

```
Status: 200
Message: OK
```
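With <code>limit</code> set to 100, a single request returns at most 100 records. If your institution has more databases than that, one approach is to repeat the request with an increasing <code>offset</code> parameter, which the Primo Search API documentation lists alongside <code>limit</code>. The sketch below is hypothetical: <code>collect_all_docs</code> and <code>fetch_page</code> are names invented for illustration, and it assumes the response reports the total hit count at <code>info.total</code>, so check that against your own response before relying on it. Passing the request in as a function keeps the paging logic separate from the HTTP call.

```python
from typing import Callable, Dict, List

def collect_all_docs(
    fetch_page: Callable[[int, int], Dict],
    limit: int = 50,
    max_records: int = 1000,
) -> List[Dict]:
    """Request successive pages (offset 0, limit, 2*limit, ...) and combine their 'docs'."""
    all_docs: List[Dict] = []
    offset = 0
    while offset < max_records:                      # stop requesting once offset reaches the cap
        page = fetch_page(offset, limit)             # one parsed JSON response
        batch = page.get("docs", [])
        all_docs.extend(batch)
        total = page.get("info", {}).get("total")    # assumed location of the hit count
        offset += limit
        if len(batch) < limit or (total is not None and offset >= total):
            break                                    # short page or past the total: done
    return all_docs
```

In practice, <code>fetch_page</code> would be a small function that calls <code>requests.get()</code> with your query parameters plus <code>offset</code> and <code>limit</code>, and returns <code>response.json()</code>.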


### Step 3 - Process the results

After a successful request, we should have results in [JSON](https://www.w3schools.com/js/js_json_intro.asp) format. JSON is built from key-value pairs (objects) and ordered lists (arrays), which can be nested inside one another.

Generally, the response includes some summary information alongside the results that may or may not be useful. If not, strip the data down to the records of interest for easier processing.

You can look at the parsed data by calling <code>response.json()</code>. You will see that all search results are nested within the 'docs' key.
That is, the results are a list (aka an array) that constitutes the value of the key 'docs'. So, at some point, you will need to isolate the results.
We can do that by saving them to a new variable, which we will call 'docs'. Note that <code>response.json()</code> has already parsed the JSON text into Python objects, so
docs is simply a Python list of dictionaries. We can check the number of search results by asking for the length of the list (i.e. the number of items in it).
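To see the difference between JSON text and the Python objects it becomes, here is a small sketch using the <code>json</code> module imported in Step 1. The response body is a hypothetical, heavily trimmed stand-in (real Primo records contain many more keys); <code>response.json()</code> performs the same parsing step on the real body.

```python
import json

# Hypothetical, trimmed response body: only the path to each title is kept.
sample_body = """
{
  "docs": [
    {"pnx": {"display": {"title": ["Pronunciator."]}}},
    {"pnx": {"display": {"title": ["Mometrix eLibrary."]}}}
  ]
}
"""

data = json.loads(sample_body)   # JSON text -> Python dict
docs = data["docs"]              # the list of result records
print(type(data).__name__, type(docs).__name__, len(docs))
```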

```python
# If the length is zero, we had no results. Otherwise, we have something.
# Since we set the limit to 100 in the request, we should get at most 100 results.

docs = response.json()['docs']  # parse the JSON body and keep the list of results
len(docs)                       # number of results returned
```

```
100
```

### Step 4 - Optional: Clean up the results for easier re-use

Since the results are a list of deeply nested dictionaries, it's usually easier to pull out the parts you actually want. In this case, we just want a list of titles.

You can look at an individual item in the list by using the variable name and the item's index (its position in the list, starting at 0). It's usually easiest to look at the first one to get
a sense of the structure of the data. For example, you can use <code>docs[0]</code> or <code>print(docs[0])</code>.

After doing so, you'll see that in each Primo result the title is nested under 'display', which is under 'pnx'. The title itself is a list, so we take its first element with <code>[0]</code>.
Each search result has a title structured exactly this way. We can then loop through the results (i.e. the docs variable) and extract the titles.
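Indexing straight into <code>doc['pnx']['display']['title'][0]</code> raises a <code>KeyError</code> or <code>IndexError</code> if a record ever lacks a title. A slightly more defensive sketch, using a hypothetical helper named <code>get_title</code>, falls back to a default instead and trims stray whitespace (such as the trailing space after "PDR prescribers' digital reference"). The sample records are trimmed, illustrative stand-ins.

```python
def get_title(doc: dict, default: str = "(no title)") -> str:
    """Return the first display title of a Primo record, or a default if it is missing."""
    titles = doc.get("pnx", {}).get("display", {}).get("title", [])
    return titles[0].strip() if titles else default

sample_docs = [
    {"pnx": {"display": {"title": ["PDR prescribers' digital reference "]}}},
    {"pnx": {"display": {}}},   # a record with no title field
]

titles = [get_title(d) for d in sample_docs]
print(titles)
```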

```python
# Quick check: print the titles of the first six results.
# Remove the [:6] slice to print every title.
for doc in docs[:6]:
    print(doc['pnx']['display']['title'][0])
```

```
Web of science. SciELO citation index
Swank Digital Campus.
Pronunciator.
PDR prescribers' digital reference 
Agricultural & environmental science database.
Mometrix eLibrary.
```

### Step 4 Continued

Using the pandas library, we can view the results more clearly. First, we need to read the data into a dataframe.
A dataframe is the basic structure used in pandas (like a data frame in R, it's a simple grid of rows and columns). To do this, we use
the pandas function <code>pd.json_normalize()</code>, which flattens nested JSON-style records into columns. Our docs variable is a list of nested dictionaries,
which is exactly the kind of input this function accepts; nested keys become dot-separated column names such as <code>pnx.display.title</code>.

Second, pop out the title column. <code>.pop()</code> removes a column from the dataframe and returns it, which lets us move it to the front for readability.

Third, insert it at the front and display the dataframe, as we do below. Note that we use <code>.head()</code> to show only the first five rows.

```python
dbList = pd.json_normalize(docs)          # flatten the list of records into a dataframe
title = dbList.pop('pnx.display.title')   # remove the title column and keep it as a Series
dbList.insert(0, 'title', title)          # re-insert it as the first column, named 'title'
dbList.head()                             # display the first five rows
```

| | title | context | adaptor | @id | pnx.display.source | pnx.display.type | pnx.display.subject | pnx.display.format | pnx.display.creationdate | pnx.display.publisher | ... | pnx.display.relation | pnx.display.ispartof | pnx.display.edition | pnx.addata.edition | pnx.display.lds72 | pnx.display.lds68 | pnx.display.lds71 | pnx.addata.au | pnx.addata.creatorfull | pnx.addata.isbn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | [Web of science. SciELO citation index] | L | Local Search Engine | https://na01.alma.exlibrisgroup.com/primaws/re... | [Alma] | [database] | [Science -- Periodicals -- Indexes -- Database... | [1 online resource] | [2013] | [New York, NY : Thomson Reuters ; São Paulo, ... | ... | | | | | | | | | | |
| 1 | [Swank Digital Campus.] | L | Local Search Engine | https://na01.alma.exlibrisgroup.com/primaws/re... | [Alma] | [database] | [Motion pictures -- Databases] | [1 online resource] | [2012?-] | [Saint Louis, Missouri : Swank Motion Pictures... | ... | | | | | | | | | | |
| 2 | [Pronunciator.] | L | Local Search Engine | https://na01.alma.exlibrisgroup.com/primaws/re... | [Alma] | [database] | [Languages, Modern -- Self-instruction -- Data... | [1 online resource] | [2011-] | [Shepherdstown, WV : Pronunciator] | ... | | | | | | | | | | |
| 3 | [PDR prescribers' digital reference ] | L | Local Search Engine | https://na01.alma.exlibrisgroup.com/primaws/re... | [Alma] | [database] | [Pharmacology -- Databases, Drugs -- Databases... | [1 online resource] | [2017-] | [Whippany, NJ : ConnectiveRx] | ... | | | | | | | | | | |
| 4 | [Agricultural & environmental science database.] | L | Local Search Engine | https://na01.alma.exlibrisgroup.com/primaws/re... | [Alma] | [database] | [Agriculture -- Databases, Agriculture -- Inde... | [1 online resource] | [2016-] | [Ann Arbor, MI : Proquest LLC] | ... | | | | | | | | | | |
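The dot-separated column names in this table, such as <code>pnx.display.title</code>, come from how <code>json_normalize</code> flattens nested dictionaries. The standard-library sketch below mimics that naming for a single record; <code>flatten</code> is a hypothetical helper written for illustration, not pandas' implementation. Like <code>json_normalize</code> with its default settings, it leaves list values unflattened, which is why cells such as the titles appear in square brackets.

```python
def flatten(record: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into a single level, joining keys with dots."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))   # descend into nested dictionaries
        else:
            flat[name] = value                  # lists (e.g. titles) are kept as-is
    return flat

# Trimmed, illustrative record based on the first rows above.
record = {
    "context": "L",
    "pnx": {"display": {"title": ["Swank Digital Campus."], "type": ["database"]}},
}
print(flatten(record))
```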


### Conclusion

From here, you can use pandas to export the data to a CSV file or a SQL database, visualize it, or work with the data in a variety of other ways.
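With pandas, exporting is a single call such as <code>dbList.to_csv('primo_databases.csv', index=False)</code> (the file name is just an example); because each title cell holds a list, <code>dbList['title'].str[0]</code> gives plain strings first. If you only need the titles and want to avoid pandas, the standard <code>csv</code> module is enough. The sketch below uses a hypothetical <code>titles_to_csv</code> helper that writes to an in-memory buffer so the result is easy to inspect; the csv writer quotes any title containing a comma automatically.

```python
import csv
import io
from typing import Iterable

def titles_to_csv(titles: Iterable[str]) -> str:
    """Return CSV text with a 'title' header and one title per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["title"])
    for title in titles:
        writer.writerow([title.strip()])   # drop stray leading/trailing spaces
    return buffer.getvalue()

# The last entry is a made-up title, included only to show comma quoting.
csv_text = titles_to_csv(["Pronunciator.", "Mometrix eLibrary.", "Example title, with a comma"])
print(csv_text)
```

To write a real file, pass <code>open('primo_titles.csv', 'w', newline='', encoding='utf-8')</code> to <code>csv.writer</code> instead of the buffer.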

### Credits/Acknowledgements

This script was created by Jeremy Kenyon (jkenyon@uidaho.edu, University of Idaho).

Thank you to Blake Galbraith at Washington State University, whose MS Powershell version of the script led to the creation of this one.