# Querying ISA investigations with SPARQL and GraphQL

## Abstract
The ISA API comes packaged with a GraphQL interface and a JSON-LD serializer to help users query investigations.
The aim of this notebook is to:
   - load an ISA Investigation from a JSON file.
   - execute a GraphQL query on the ISA Investigation.
   - serialize an ISA Investigation to JSON-LD with different contexts.
   - generate an RDF graph from the JSON-LD.
   - execute a SPARQL query on that graph.

To illustrate these steps, we will retrieve the names of all the protocol types stored in an ISA investigation.

## 1. Getting the tools

In [1]:
# Let's first import all the packages we need

In [2]:
from os import path
import json

from rdflib import Graph, Namespace

from isatools.isajson import load
from isatools.model import set_context

## 2. Reading and loading an ISA Investigation in memory from an ISA-JSON instance

In [3]:
filepath = path.join('json', 'BII-S-3', 'BII-S-3.json')
with open(filepath, 'r') as f:
    investigation = load(f)

## 3. Write a GraphQL query

In [4]:
query = """
{
    studies {
        protocols {
            type: protocolType { annotationValue }
        }
    }
}
"""
protocols_graphql = []
results = investigation.execute_query(query)
for study in results.data['studies']:
    protocols = study['protocols']
    for protocol in protocols:
        value = protocol['type']['annotationValue']
        if value not in protocols_graphql:
            protocols_graphql.append(value)
print(protocols_graphql)

['sample collection', 'nucleic acid extraction', 'reverse transcription', 'library construction', 'nucleic acid sequencing', 'data transformation']
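
The deduplication loop above can also be written with `dict.fromkeys`, which keeps the first-seen order of the values. The sketch below runs on a hand-written dictionary that only mimics the shape of `results.data`; its values are illustrative and `unique_protocol_types` is a hypothetical helper, not part of isatools.

```python
# Hypothetical stand-in for `results.data` (same nesting as the GraphQL response)
data = {
    "studies": [
        {"protocols": [
            {"type": {"annotationValue": "sample collection"}},
            {"type": {"annotationValue": "nucleic acid extraction"}},
        ]},
        {"protocols": [
            {"type": {"annotationValue": "sample collection"}},
            {"type": {"annotationValue": "data transformation"}},
        ]},
    ]
}


def unique_protocol_types(data):
    """Return the protocol type names without duplicates, in first-seen order."""
    values = (
        protocol["type"]["annotationValue"]
        for study in data["studies"]
        for protocol in study["protocols"]
    )
    # dict keys are unique and preserve insertion order
    return list(dict.fromkeys(values))


protocol_types = unique_protocol_types(data)
print(protocol_types)
```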


## 4. Setting options for the context bindings

In [6]:
set_context(new_context='wdt', combine=False, local=False)

The `set_context()` function takes three parameters:
   - `new_context`: the vocabulary to use, one of `sdo`, `obo` or `wdt`.
   - `combine`: if `True`, only one (combined) context is used.
   - `local`: if `True`, the local context files are used; otherwise the contexts hosted on GitHub are used.

## 5. Generate a JSON-LD serialization

In [7]:
ld = investigation.to_dict(ld=True)

The investigation can be serialized to JSON with the `to_dict()` method. Passing the optional parameter `ld=True` makes the serializer attach `@type`, `@context` and `@id` to each object in the JSON.
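
To make the three keywords concrete, here is a minimal sketch that walks a nested dictionary and counts the objects carrying `@context`, `@type` and `@id`. The sample object, its identifiers and its context URL are placeholders invented for illustration; they are not the output of `to_dict(ld=True)`, and `count_ld_objects` is a hypothetical helper.

```python
JSONLD_KEYWORDS = ("@context", "@type", "@id")

# Hypothetical JSON-LD fragment; every value here is a placeholder
sample_ld = {
    "@context": "https://example.org/contexts/protocol.jsonld",
    "@type": "Protocol",
    "@id": "#protocol/example",
    "name": "example protocol",
    "protocolType": {
        "@context": "https://example.org/contexts/annotation.jsonld",
        "@type": "OntologyAnnotation",
        "@id": "#annotation/example",
        "annotationValue": "sample collection",
    },
    "parameters": [{"name": "not annotated"}],
}


def count_ld_objects(node):
    """Count nested dicts that carry all three JSON-LD keywords."""
    if isinstance(node, dict):
        own = 1 if all(key in node for key in JSONLD_KEYWORDS) else 0
        return own + sum(count_ld_objects(value) for value in node.values())
    if isinstance(node, list):
        return sum(count_ld_objects(item) for item in node)
    return 0  # strings, numbers, None


annotated = count_ld_objects(sample_ld)
print(annotated)
```

Running the same helper on the real `ld` dictionary would give a quick sanity check that the serializer annotated the nested objects.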

## 6. Generate an RDF graph

Before we can generate a graph, we need to create the proper namespaces and transform the `ld` variable into a string.

In [8]:
# Creating the namespace
WDT = Namespace("http://www.wikidata.org/wiki/")
WDTP = Namespace('https://www.wikidata.org/wiki/Property:')
ISA = Namespace('https://isa.org/')

ld_string = json.dumps(ld) # Get a string representation of the ld variable
graph = Graph() # Create an empty graph
graph.parse(data=ld_string, format='json-ld') # Load the data into the graph

# Finally, bind the namespaces to the graph
graph.bind('wdt', WDT)
graph.bind('isa', ISA)
graph.bind('wdtp', WDTP)

Traceback (most recent call last):
  File "F:\Work\isa-api\venv3.10\lib\site-packages\rdflib\term.py", line 2084, in _castLexicalToPython
    return conv_func(lexical)  # type: ignore[arg-type]
  File "F:\Work\isa-api\venv3.10\lib\site-packages\isodate\isodates.py", line 203, in parse_date
    raise ISO8601Error('Unrecognised ISO 8601 date format: %r' % datestring)
isodate.isoerror.ISO8601Error: Unrecognised ISO 8601 date format: ''
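
The traceback output above appears to come from rdflib logging a failed conversion of an empty string into a date while the graph is being parsed; the following cells still query the graph. One possible way to avoid these messages, assuming empty string values carry no information you need in the graph, is to drop them from the dictionary before calling `json.dumps`. `drop_empty_strings` is a hypothetical helper (not part of isatools or rdflib), and the sample keys and values are placeholders.

```python
def drop_empty_strings(node):
    """Return a copy of `node` without empty-string values, recursively."""
    if isinstance(node, dict):
        return {
            key: drop_empty_strings(value)
            for key, value in node.items()
            if value != ""
        }
    if isinstance(node, list):
        return [drop_empty_strings(item) for item in node if item != ""]
    return node


# Hypothetical fragment with an empty date field
sample = {
    "@type": "Study",
    "submissionDate": "",
    "publicReleaseDate": "2020-01-01",
    "comments": [{"value": ""}, "", "kept"],
}

cleaned = drop_empty_strings(sample)
print(cleaned)
```

Note that this changes what ends up in the graph: any query that relies on empty literals being present would behave differently.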

## 7. Create a small SPARQL query and execute it

In [9]:
query = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX wdtp: <https://www.wikidata.org/wiki/Property:>
PREFIX wdt: <http://www.wikidata.org/wiki/>

SELECT distinct ?protocolTypeName
WHERE {
    ?p rdf:type wdt:Q41689629 . # Is a protocol
    ?p wdtp:P7793 ?protocolType .
    ?protocolType wdtp:P527 ?protocolTypeName . # Get each protocol type name
    FILTER (?protocolTypeName!=""^^wdt:Q1417099) # Filter out empty protocol type name
}
"""
protocols_sparql = []
for node in graph.query(query):
    n = node.asdict()
    for fieldName in n:
        fieldVal = str(n[fieldName].toPython())
        if fieldVal not in protocols_sparql:
            protocols_sparql.append(fieldVal)
print(protocols_sparql)
assert protocols_sparql == protocols_graphql

['sample collection', 'nucleic acid extraction', 'reverse transcription', 'library construction', 'nucleic acid sequencing', 'data transformation']
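
The assertion above compares two lists, so it also depends on both queries returning the values in the same order. A `SELECT DISTINCT` query without an `ORDER BY` clause does not guarantee any particular order, so a less brittle check ignores ordering. The sketch below uses illustrative stand-in lists and a hypothetical helper, `same_values`.

```python
# Illustrative stand-ins for the two result lists
graphql_values = ["sample collection", "data transformation", "library construction"]
sparql_values = ["library construction", "sample collection", "data transformation"]


def same_values(left, right):
    """True when both lists hold the same values, regardless of order."""
    return sorted(left) == sorted(right)


print(same_values(graphql_values, sparql_values))
```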
