In [1]:
import requests
import json

This document gives a high-level overview of the different ways to access ROBOKOP information programmatically. The first two examples use the Translator Reasoner API (TRAPI) format both to submit the query and to return the results. The first TRAPI example is sent directly to the Automat system, which hosts the ROBOKOP knowledge graph. The query is not preprocessed and the results are not postprocessed. The alternative is the Aragorn interface, which also accepts a TRAPI query and returns a TRAPI response, but expands the query to include synonymous concepts and postprocesses the results to score them for potential relevance. The final two examples query the ROBOKOP knowledge graph directly with Cypher, the Neo4j query language: one runs against the standalone instance of the ROBOKOP knowledge graph, the other against the instance hosted by Automat. The Automat-hosted instance is the knowledge source for all TRAPI interfaces.
More information about accessing the ROBOKOP KG via TRAPI is available in "HelloRobokop_TRAPI", and the Cypher options are covered in much more detail in "HelloRobokop_Cypher".

The first example uses the TRAPI format to query the ROBOKOP instance hosted on the Automat system.

The TRAPI Documentation is available here: https://github.com/NCATSTranslator/ReasonerAPI

Most TRAPI documents contain a `message` key.  Within that `message` are a `query_graph` denoting the user query,
a `knowledge_graph` consisting of the union of all nodes and edges that match the `query_graph` pattern, and a list of `results` that bind `query_graph` elements to `knowledge_graph` elements.
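As a rough illustration of that layout, here is a minimal sketch of the top-level structure only. It is not the full TRAPI schema (real responses carry many more fields, such as attributes and sources), and the helper name `message_parts` is hypothetical:

```python
# Minimal sketch of the TRAPI message layout described above.
# Only the top-level keys are shown; see the ReasonerAPI repository for the full schema.

# A request carries only the query graph.
request_doc = {
    "message": {
        "query_graph": {"nodes": {}, "edges": {}},
    }
}

# A response adds the matching subgraph and the bindings that tie it to the query.
response_doc = {
    "message": {
        "query_graph": {"nodes": {}, "edges": {}},
        "knowledge_graph": {"nodes": {}, "edges": {}},
        "results": [],
    }
}

def message_parts(doc):
    """Return the sorted list of keys present under `message`."""
    return sorted(doc.get("message", {}).keys())

print(message_parts(request_doc))   # ['query_graph']
print(message_parts(response_doc))  # ['knowledge_graph', 'query_graph', 'results']
```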

When a user submits a query, the message contains only the `query_graph`. The query graph below consists of three nodes connected in a line. Two of the nodes (`n00` and `n02`) have specified identifiers, while the middle node (`n01`) does not; instead, it has a list of acceptable `categories`. The node and edge keys are kept in the ordered lists `nodes` and `edges` so that the output strings at the end of this notebook appear in query order. This is not required for running the queries or retrieving results.

This query asks: "Find me a Biological Process or Activity, a Gene, or a Pathway that is related to both `PUBCHEM.COMPOUND:644073` (Buprenorphine) and `HP:0001337` (Tremor)."

In [2]:
edges = ["e00", "e01"]
nodes = ["n00", "n01", "n02"]
query = {
    "message": {
        "query_graph": {
            "edges": {
                edges[0]: {
                    "subject": nodes[0],
                    "object": nodes[1],
                    "predicates": ["biolink:related_to"]
                },
                edges[1]: {
                    "subject": nodes[1],
                    "object": nodes[2],
                    "predicates": ["biolink:related_to"]
                }
            },
            "nodes": {
                nodes[0]: {
                    "ids": ["PUBCHEM.COMPOUND:644073"],
                    "categories": ["biolink:ChemicalEntity"]
                },
                nodes[1]: {
                    "categories": ["biolink:BiologicalProcessOrActivity", "biolink:Gene", "biolink:Pathway"]
                },
                nodes[2]: {
                    "ids": ["HP:0001337"],
                    "categories": ["biolink:DiseaseOrPhenotypicFeature"]
                }
            }
        }
    }
}
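The query graph above is written out by hand. For other linear queries, the same structure could be generated from a list of node specifications. The sketch below uses a hypothetical helper, `build_linear_query`, and assumes every edge uses the same predicate, as the query above does:

```python
def build_linear_query(node_specs, predicate="biolink:related_to"):
    """Build a TRAPI query graph for a chain n00 -> n01 -> ... of nodes.

    node_specs: list of dicts, each with "categories" and optionally "ids".
    Returns (query, node_keys, edge_keys) so the key order can be reused later.
    """
    node_keys = [f"n{i:02d}" for i in range(len(node_specs))]
    edge_keys = [f"e{i:02d}" for i in range(len(node_specs) - 1)]

    qnodes = {}
    for key, spec in zip(node_keys, node_specs):
        qnode = {"categories": list(spec["categories"])}
        if spec.get("ids"):
            qnode["ids"] = list(spec["ids"])
        qnodes[key] = qnode

    # Each query edge links two consecutive nodes in the chain.
    qedges = {
        edge_keys[i]: {
            "subject": node_keys[i],
            "object": node_keys[i + 1],
            "predicates": [predicate],
        }
        for i in range(len(edge_keys))
    }
    query = {"message": {"query_graph": {"edges": qedges, "nodes": qnodes}}}
    return query, node_keys, edge_keys


# Rebuild the Buprenorphine / Tremor query from this notebook.
query2, nodes2, edges2 = build_linear_query([
    {"ids": ["PUBCHEM.COMPOUND:644073"], "categories": ["biolink:ChemicalEntity"]},
    {"categories": ["biolink:BiologicalProcessOrActivity", "biolink:Gene", "biolink:Pathway"]},
    {"ids": ["HP:0001337"], "categories": ["biolink:DiseaseOrPhenotypicFeature"]},
])
print(nodes2, edges2)
```

Because Python dictionaries compare by content, `query2 == query` should hold for the hand-written query above.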


This query can be sent to various Translator components as needed. It can be sent directly to the ROBOKOP knowledge graph hosted in the Automat system like this:

In [5]:
robokop_submit_url = "http://automat-u24.apps.renci.org/robokopkg/1.3/query"
response = requests.post(robokop_submit_url,json=query)

In [6]:
print(response.status_code)

200


In [7]:
print(len(response.json()['message']['results']))

7


In [8]:
import pprint
pp = pprint.PrettyPrinter(indent=5)

The response, parsed from JSON, is a Python dictionary with three main keys: `message`, `log_level`, and `workflow`. The `message` component contains the `query_graph` from the input query plus two parts added by ROBOKOP, the `knowledge_graph` and the `results`, which together contain the answer to the query graph. Although the next few sections keep indexing into `response.json()` to reinforce the structure of the response, we also store these three components in separate variables to make later code easier to read.

In [9]:
print(response.json().keys())
print(response.json()['message'].keys())
query_out = response.json()['message']['query_graph']
kg = response.json()['message']['knowledge_graph']
results = response.json()['message']['results']

dict_keys(['message', 'log_level', 'workflow'])
dict_keys(['query_graph', 'knowledge_graph', 'results'])
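Before digging into individual entries, it can help to check how large each part of the message is. The helper below, `message_stats`, is a hypothetical sketch; it treats a missing or null section as empty, and the toy response is hand-made, not ROBOKOP output:

```python
def message_stats(response_json):
    """Count the entries in each part of a TRAPI response dictionary.

    Missing or null sections are counted as empty.
    """
    message = response_json.get("message") or {}
    qg = message.get("query_graph") or {}
    kg = message.get("knowledge_graph") or {}
    results = message.get("results") or []
    return {
        "query_nodes": len(qg.get("nodes") or {}),
        "query_edges": len(qg.get("edges") or {}),
        "kg_nodes": len(kg.get("nodes") or {}),
        "kg_edges": len(kg.get("edges") or {}),
        "results": len(results),
    }

# Tiny hand-made response (hypothetical data).
toy_response = {
    "message": {
        "query_graph": {"nodes": {"n00": {}, "n01": {}}, "edges": {"e00": {}}},
        "knowledge_graph": None,
        "results": [{}],
    }
}
print(message_stats(toy_response))
```

In the notebook, `message_stats(response.json())` would report the sizes for the current run.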


The `results` component contains the result graphs that match the query. Each result is organized into `node_bindings` and `edge_bindings`, which map each query-graph element to knowledge-graph identifiers. The order of results changes between runs, so the first result could bind either `NCBIGene:4988` or `NCBIGene:1565` to the query node `n01`. We therefore read this ID for the current run from the local `results` variable created above; the same information is available from `response.json()`. All of the information about these identifiers, such as names, properties, and sources, is in the `knowledge_graph` component of the `message`.

In [10]:
# Illustrating the structure of each pathway result from the message component
pp.pprint(response.json()['message']['results'][0])
n01_id = results[0]['node_bindings']['n01'][0]['id']
print(n01_id)
e00_id = results[0]['edge_bindings']['e00'][0]['id']
print(e00_id)

{    'edge_bindings': {    'e00': [    {'attributes': None, 'id': '79325668'},
                                       {'attributes': None, 'id': '113499113'},
                                       {'attributes': None, 'id': '88245379'},
                                       {'attributes': None, 'id': '8608859'}],
                           'e01': [{'attributes': None, 'id': '76822934'}]},
     'node_bindings': {    'n00': [    {    'attributes': None,
                                            'id': 'PUBCHEM.COMPOUND:644073',
                                            'qnode_id': 'PUBCHEM.COMPOUND:644073',
                                            'query_id': None}],
                           'n01': [    {    'attributes': None,
                                            'id': 'NCBIGene:1565',
                                            'query_id': None}],
                           'n02': [    {    'attributes': None,
                                            'id': 'HP:02000
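Because result order varies between runs, `results[0]` may point at a different gene each time. Collecting every ID bound to a query node across all results gives a run-order-independent view. This is a minimal sketch with a hypothetical helper, `bound_ids`, shown on hand-made results that mimic the shape above:

```python
def bound_ids(results, qnode_key):
    """Return each distinct ID bound to `qnode_key` across all results, in first-seen order."""
    ids = []
    for result in results:
        for binding in result["node_bindings"].get(qnode_key, []):
            if binding["id"] not in ids:
                ids.append(binding["id"])
    return ids

# Hypothetical results with the same shape as the output above.
toy_results = [
    {"node_bindings": {"n01": [{"id": "NCBIGene:1565"}]}},
    {"node_bindings": {"n01": [{"id": "NCBIGene:4988"}]}},
    {"node_bindings": {"n01": [{"id": "NCBIGene:1565"}]}},
]
print(bound_ids(toy_results, "n01"))  # ['NCBIGene:1565', 'NCBIGene:4988']
```

With the live response, `bound_ids(results, "n01")` lists every gene that appears in any result.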

The `knowledge_graph` contains information about each of the Nodes and Edges found in `results`.  An example of a Node and an Edge are shown below.

In [11]:
pp.pprint(response.json()['message']['knowledge_graph'].keys())

dict_keys(['nodes', 'edges'])


Information returned for each node includes the concept ID (the dictionary key), Biolink categories, the name/label, and attributes with their value types, among other fields. Note that the entries under `nodes` are stored in a dictionary keyed by concept ID, not in a list. The entry for the node bound to `n01` in the result above is shown below. Depending on the ID selected in this run, this is either the CYP2D6 or the OPRM1 gene, and the entry gives further descriptive information about it.

In [12]:
response.json()['message']['knowledge_graph']['nodes'][n01_id]

{'categories': ['biolink:GeneOrGeneProduct',
  'biolink:BiologicalEntity',
  'biolink:NamedThing',
  'biolink:ChemicalEntityOrGeneOrGeneProduct',
  'biolink:MacromolecularMachineMixin',
  'biolink:Gene',
  'biolink:Entity',
  'biolink:ChemicalEntityOrProteinOrPolypeptide',
  'biolink:GeneProductMixin',
  'biolink:OntologyClass',
  'biolink:PhysicalEssence',
  'biolink:Polypeptide',
  'biolink:GenomicEntity',
  'biolink:Protein',
  'biolink:ThingWithTaxon',
  'biolink:PhysicalEssenceOrOccurrent'],
 'name': 'CYP2D6',
 'attributes': [{'attribute_type_id': 'biolink:Attribute',
   'value': '22q13.2',
   'value_type_id': 'EDAM:data_0006',
   'original_attribute_name': 'location',
   'value_url': None,
   'attribute_source': None,
   'description': None,
   'attributes': None},
  {'attribute_type_id': 'biolink:Attribute',
   'value': 'protein-coding gene',
   'value_type_id': 'EDAM:data_0006',
   'original_attribute_name': 'locus_group',
   'value_url': None,
   'attribute_source': None,
   '
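Individual attributes are easiest to pick out by their `original_attribute_name`. This sketch uses a hypothetical helper, `get_node_attribute`, and a trimmed copy of the CYP2D6 node shown above (most fields omitted):

```python
def get_node_attribute(kg, node_id, attribute_name):
    """Return the values of all attributes on `node_id` whose original_attribute_name matches."""
    node = kg["nodes"][node_id]
    return [
        attr["value"]
        for attr in node.get("attributes") or []
        if attr.get("original_attribute_name") == attribute_name
    ]

# Trimmed copy of the CYP2D6 node shown above.
toy_kg = {
    "nodes": {
        "NCBIGene:1565": {
            "name": "CYP2D6",
            "attributes": [
                {"original_attribute_name": "location", "value": "22q13.2"},
                {"original_attribute_name": "locus_group", "value": "protein-coding gene"},
            ],
        }
    }
}
print(get_node_attribute(toy_kg, "NCBIGene:1565", "location"))  # ['22q13.2']
```

A list is returned because nothing in this example prevents a node from carrying several attributes with the same name. With the live data, the call would be `get_node_attribute(kg, n01_id, "locus_group")`.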

Information returned for each edge includes the edge ID (the dictionary key), the subject's concept ID, the object's concept ID, the edge's predicate, any qualifiers, and attributes. As with nodes, the entries under `edges` are stored in a dictionary keyed by edge ID, not in a list. One of the edges bound to `e00` in the query graph is shown below. In the run shown here, the subject-predicate-object triple that defines the edge states that buprenorphine "regulates" the CYP2D6 gene (`NCBIGene:1565`), and a qualifier gives the direction of the effect (downregulated in this case). Further attributes describe the original source of information used to establish the edge.

In [13]:
response.json()['message']['knowledge_graph']['edges'][e00_id]

{'subject': 'PUBCHEM.COMPOUND:644073',
 'object': 'NCBIGene:1565',
 'predicate': 'biolink:regulates',
 'qualifiers': [{'qualifier_type_id': 'biolink:object_direction_qualifier',
   'qualifier_value': 'downregulated'}],
 'attributes': [{'attribute_type_id': 'biolink:Attribute',
   'value': ['tmkp:12699a64cae70b20411935b5e5028d220ccbe4f7a0ad13a81978026cb43111bf'],
   'value_type_id': 'EDAM:data_0006',
   'original_attribute_name': 'tmkp_ids',
   'value_url': None,
   'attribute_source': None,
   'description': None,
   'attributes': None},
  {'attribute_type_id': 'biolink:Attribute',
   'value': '0.99969625',
   'value_type_id': 'EDAM:data_0006',
   'original_attribute_name': 'biolink:tmkp_confidence_score',
   'value_url': None,
   'attribute_source': None,
   'description': None,
   'attributes': None},
  {'attribute_type_id': 'biolink:Attribute',
   'value': 'Buprenorphine exhibited potent, competitive inhibition of CYP2D6 (Ki 10 +/- 2 microM and 1.8 +/- 0.2 microM) and CYP3A4 (Ki 40 
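A compact one-line rendering of an edge, including its qualifiers, can make these entries easier to scan. This is a sketch with a hypothetical helper, `describe_edge`, applied to a trimmed copy of the edge shown above (attributes omitted, edge key made up):

```python
def describe_edge(kg, edge_id):
    """Render one knowledge-graph edge as 'subject --predicate [qualifiers]--> object'."""
    edge = kg["edges"][edge_id]
    qualifiers = ", ".join(
        # Drop the 'biolink:' prefix from the qualifier type for readability.
        f"{q['qualifier_type_id'].split(':', 1)[-1]}={q['qualifier_value']}"
        for q in edge.get("qualifiers") or []
    )
    label = edge["predicate"] + (f" [{qualifiers}]" if qualifiers else "")
    return f"{edge['subject']} --{label}--> {edge['object']}"

# Trimmed copy of the edge shown above, under a made-up key.
toy_kg = {
    "edges": {
        "edge_1": {
            "subject": "PUBCHEM.COMPOUND:644073",
            "object": "NCBIGene:1565",
            "predicate": "biolink:regulates",
            "qualifiers": [
                {"qualifier_type_id": "biolink:object_direction_qualifier",
                 "qualifier_value": "downregulated"}
            ],
        }
    }
}
print(describe_edge(toy_kg, "edge_1"))
# PUBCHEM.COMPOUND:644073 --biolink:regulates [object_direction_qualifier=downregulated]--> NCBIGene:1565
```

With the live data, `describe_edge(kg, e00_id)` would render the edge displayed above.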

Next, we summarize all results to give an overview of the different result graphs matching our query; each node and edge still has all the additional information shown above available for closer inspection. We use the `nodes` and `edges` lists created before assembling the query graph so that the output is ordered the same way as the query, and the local variables `results` and `kg` created from `response.json()` to keep the code readable. Note that this logic works only for linear queries such as the one in this example. When an edge is supported by more than one information source, a single query edge has multiple bindings. For the summary we take the first binding, but when interpreting results it is important to evaluate all sources of support for each edge.

In [14]:
print(nodes)
print(edges)
pp.pprint(query)
pp.pprint(query_out)

['n00', 'n01', 'n02']
['e00', 'e01']
{    'message': {    'query_graph': {    'edges': {    'e00': {    'object': 'n01',
                                                                   'predicates': [    'biolink:related_to'],
                                                                   'subject': 'n00'},
                                                       'e01': {    'object': 'n02',
                                                                   'predicates': [    'biolink:related_to'],
                                                                   'subject': 'n01'}},
                                         'nodes': {    'n00': {    'categories': [    'biolink:ChemicalEntity'],
                                                                   'ids': [    'PUBCHEM.COMPOUND:644073']},
                                                       'n01': {    'categories': [    'biolink:BiologicalProcessOrActivity',
                                                          

In [15]:
result_summaries = []
for r in results:
    rs = ""
    # Walk the linear query in order: node, edge, node, edge, ..., node
    for j, qnode in enumerate(nodes):
        node_id = r['node_bindings'][qnode][0]['id']
        node_name = kg['nodes'][node_id]['name']
        rs += f"{node_name} ({node_id})"
        # Every node except the last is followed by an edge
        if j < len(edges):
            edge_id = r['edge_bindings'][edges[j]][0]['id']
            predicate = kg['edges'][edge_id]['predicate']
            rs += f"--{predicate}-->"
    result_summaries.append(rs)

In [16]:
for rs in result_summaries:
    print(rs)

Buprenorphine (PUBCHEM.COMPOUND:644073)--biolink:regulates-->CYP2D6 (NCBIGene:1565)--biolink:genetic_association-->Limb tremor (HP:0200085)
Buprenorphine (PUBCHEM.COMPOUND:644073)--biolink:regulates-->CYP2D6 (NCBIGene:1565)--biolink:genetic_association-->Action tremor (HP:0002345)
Buprenorphine (PUBCHEM.COMPOUND:644073)--biolink:regulates-->CYP2D6 (NCBIGene:1565)--biolink:genetic_association-->Resting tremor (HP:0002322)
Buprenorphine (PUBCHEM.COMPOUND:644073)--biolink:directly_physically_interacts_with-->OPRM1 (NCBIGene:4988)--biolink:genetic_association-->Asterixis (HP:0012164)
Buprenorphine (PUBCHEM.COMPOUND:644073)--biolink:regulates-->CYP2D6 (NCBIGene:1565)--biolink:genetic_association-->Tremor (HP:0001337)
Buprenorphine (PUBCHEM.COMPOUND:644073)--biolink:regulates-->CYP2D6 (NCBIGene:1565)--biolink:genetic_association-->Pill-rolling tremor (HP:0025387)
Buprenorphine (PUBCHEM.COMPOUND:644073)--biolink:regulates-->CYP2D6 (NCBIGene:1565)--biolink:genetic_association-->Postural tremor
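The summary above keeps only the first binding for each query edge. To see every supporting edge, the predicates of all bindings can be collected instead. This is a minimal sketch with a hypothetical helper, `edge_predicates`, on hand-made data:

```python
def edge_predicates(result, kg, qedge_key):
    """Distinct predicates of all knowledge-graph edges bound to one query edge, in binding order."""
    predicates = [
        kg["edges"][binding["id"]]["predicate"]
        for binding in result["edge_bindings"].get(qedge_key, [])
    ]
    return list(dict.fromkeys(predicates))  # de-duplicate while keeping order

# Hypothetical knowledge graph and result with three edges bound to e00.
toy_kg = {"edges": {
    "1": {"predicate": "biolink:regulates"},
    "2": {"predicate": "biolink:affects"},
    "3": {"predicate": "biolink:regulates"},
}}
toy_result = {"edge_bindings": {"e00": [{"id": "1"}, {"id": "2"}, {"id": "3"}]}}
print(" | ".join(edge_predicates(toy_result, toy_kg, "e00")))  # biolink:regulates | biolink:affects
```

Inside the summary loop, `" | ".join(edge_predicates(r, kg, edges[j]))` could replace the single first-binding predicate.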

The results above are plain database matches: they carry no scores or other additions. You can instead send the TRAPI query to the ROBOKOP application in Aragorn, which performs additional post-processing of the results, including scoring.

In [17]:
ara_robokop_submit_url = "https://aragorn-u24.apps.renci.org/robokop/query"
response = requests.post(ara_robokop_submit_url,json=query)

In [18]:
response.status_code

200

In [19]:
len(response.json()['message']['results'])

7

In [20]:
pp.pprint(response.json()['message']['results'][0])

{    'edge_bindings': {    'e00': [    {    'attributes': [    {    'attribute_type_id': 'biolink:has_numeric_value',
                                                                    'attributes': [    {    'attribute_type_id': 'biolink:has_qualitative_value',
                                                                                            'original_attribute_name': 'aragorn_weight_source',
                                                                                            'value': 'infores:textminingkp',
                                                                                            'value_type_id': 'biolink:InformationResource'}],
                                                                    'original_attribute_name': 'weight',
                                                                    'value': 0.5184467655944169,
                                                                    'value_type_id': 'EDAM:data_1669'}],
                  

In [21]:
for result in response.json()['message']['results']:
    print(result['score'])

0.2405420768685431
0.13169211512866802
0.13022177911375754
0.1295121474643961
0.12910851956982536
0.12910851956982522
0.12910851956982522
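With scores available, results can be ranked and filtered. The sketch below uses a hypothetical helper, `top_results`, and assumes each result carries a top-level `score` as in the Aragorn output above; results without a score (such as the Automat results earlier) are simply dropped:

```python
def top_results(results, k=3, min_score=None):
    """Return up to k results sorted by descending 'score', optionally dropping low scores."""
    scored = [r for r in results if r.get("score") is not None]
    if min_score is not None:
        scored = [r for r in scored if r["score"] >= min_score]
    return sorted(scored, key=lambda r: r["score"], reverse=True)[:k]

# Hand-made results reusing three scores printed above, plus one unscored result.
toy_results = [
    {"score": 0.13022177911375754},
    {"score": 0.2405420768685431},
    {"score": 0.13169211512866802},
    {},
]
for r in top_results(toy_results, k=2):
    print(round(r["score"], 3))
```

Sorting explicitly avoids relying on the order in which the service returns results.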


In [22]:
# Use Aragorn's own knowledge graph: it can differ from the Automat `kg`
# because Aragorn expands the query and post-processes the results.
aragorn_message = response.json()['message']
aragorn_kg = aragorn_message['knowledge_graph']
aragorn_result_summaries = []
for r in aragorn_message['results']:
    rs = f"Score={round(r['score'], 3)}: "
    for j, qnode in enumerate(nodes):
        node_id = r['node_bindings'][qnode][0]['id']
        node_name = aragorn_kg['nodes'][node_id]['name']
        rs += f"{node_name} ({node_id})"
        if j < len(edges):
            edge_id = r['edge_bindings'][edges[j]][0]['id']
            predicate = aragorn_kg['edges'][edge_id]['predicate']
            rs += f"--{predicate}-->"
    aragorn_result_summaries.append(rs)

In [23]:
for rs in aragorn_result_summaries:
    print(rs)

Score=0.241: Buprenorphine (PUBCHEM.COMPOUND:644073)--biolink:regulates-->CYP2D6 (NCBIGene:1565)--biolink:genetic_association-->Tremor (HP:0001337)
Score=0.132: Buprenorphine (PUBCHEM.COMPOUND:644073)--biolink:affects-->OPRM1 (NCBIGene:4988)--biolink:genetic_association-->Asterixis (HP:0012164)
Score=0.13: Buprenorphine (PUBCHEM.COMPOUND:644073)--biolink:regulates-->CYP2D6 (NCBIGene:1565)--biolink:genetic_association-->Resting tremor (HP:0002322)
Score=0.13: Buprenorphine (PUBCHEM.COMPOUND:644073)--biolink:regulates-->CYP2D6 (NCBIGene:1565)--biolink:genetic_association-->Action tremor (HP:0002345)
Score=0.129: Buprenorphine (PUBCHEM.COMPOUND:644073)--biolink:regulates-->CYP2D6 (NCBIGene:1565)--biolink:genetic_association-->Pill-rolling tremor (HP:0025387)
Score=0.129: Buprenorphine (PUBCHEM.COMPOUND:644073)--biolink:regulates-->CYP2D6 (NCBIGene:1565)--biolink:genetic_association-->Limb tremor (HP:0200085)
Score=0.129: Buprenorphine (PUBCHEM.COMPOUND:644073)--biolink:regulates-->CYP2D6 

You can also bypass TRAPI entirely and query the graph with Cypher. There are two instances. The first, at http://robokopkg.renci.org, has a Neo4j browser where you can write and run Cypher interactively, and it also accepts Cypher sent from code. Doing so from Python requires the `neo4j` package, which is likely not installed if you have not accessed a Neo4j database before. The code below should install it, but if you encounter errors, look into how best to install this package for your local setup.

In [24]:
# Install a pip package in the current Jupyter kernel
import sys
!{sys.executable} -m pip install neo4j
from neo4j import GraphDatabase



In [25]:
# Connect over Bolt to the standalone ROBOKOP instance (user 'neo4j', empty password)
pw = ''
driver = GraphDatabase.driver('bolt://robokopkg.renci.org:7687', auth=('neo4j', pw))

Cypher queries can be sent either to the Neo4j instance at robokopkg.renci.org or through Automat at automat.renci.org. Depending on how the Cypher query is structured, results may be returned differently by the two access points. The query below asks for slightly different information than the TRAPI message above. The TRAPI query asks for results related to `Buprenorphine` and `Tremor` that are of type `Gene`, `Pathway`, or `BiologicalProcessOrActivity`. Because no results were present for `Pathway` or `BiologicalProcessOrActivity`, a Cypher query including these would return 0 results, so the query below has been modified to ask only for results of type `Gene`. It also matches the two end nodes by `name` rather than by identifier.

In [26]:
# Simpler query for checking the connection:
# cypher = 'MATCH (a:`biolink:Gene`) RETURN a LIMIT 1'
cypher = (
    "MATCH (n0_0:`biolink:ChemicalEntity`)-[r0_0]-(n1_0:`biolink:Gene`)"
    "-[r1_0]-(n2_0:`biolink:DiseaseOrPhenotypicFeature`) "
    "WHERE n0_0.name IN ['Buprenorphine'] AND n2_0.name IN ['Tremor'] "
    "RETURN * LIMIT 100"
)
with driver.session() as session:
    cypher_results = session.run(cypher)
    n_records = 0
    for record in cypher_results:
        print(record)
        n_records += 1
    # A Result object is never equal to 0, so count the records instead
    if n_records == 0:
        print("No results found")



<Record n0_0=<Node element_id='8421444' labels=frozenset({'biolink:Entity', 'biolink:ChemicalEntity', 'biolink:NamedThing', 'biolink:ChemicalEntityOrProteinOrPolypeptide', 'biolink:PhysicalEssence', 'biolink:MolecularEntity', 'biolink:ChemicalEntityOrGeneOrGeneProduct', 'biolink:PhysicalEssenceOrOccurrent', 'biolink:SmallMolecule', 'biolink:ChemicalOrDrugOrTreatment'}) properties={'CHEBI_ROLE_delta_opioid_agent': True, 'smiles': 'CO[C@]12CC[C@@]3(C[C@@H]1[C@](C)(O)C(C)(C)C)[C@H]1CC4=CC=C(O)C5=C4[C@@]3(CCN1CC1CC1)[C@H]2O5', 'description': 'A morphinane alkaloid that is 7,8-dihydromorphine 6-O-methyl ether in which positions 6 and 14 are joined by a -CH2CH2- bridge, one of the hydrogens of the N-methyl group is substituted by cyclopropyl, and a hydrogen at position 7 is substituted by a 2-hydroxy-3,3-dimethylbutan-2-yl group. It is highly effective for the treatment of opioid use disorder and is also increasingly being used in the treatment of chronic pain.', 'fda_labels': 74, 'rgb': 28,
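The Cypher query above writes the names directly into the query string. A common alternative is to pass values as Cypher parameters (`$name` placeholders) so they are not pasted into the text. The sketch below uses a hypothetical helper, `linear_cypher`, and matches names with `=` rather than `IN [...]`, which is equivalent for a single name. Labels are formatted into the string directly, since plain `$param` placeholders stand for values, not labels:

```python
def linear_cypher(chemical_name, phenotype_name, middle_label="biolink:Gene", limit=100):
    """Build a parameterized chemical - middle node - phenotype Cypher query.

    Returns (query, params); the names travel as parameters, not as query text.
    """
    query = (
        f"MATCH (n0:`biolink:ChemicalEntity`)-[r0]-(n1:`{middle_label}`)"
        "-[r1]-(n2:`biolink:DiseaseOrPhenotypicFeature`) "
        "WHERE n0.name = $chemical AND n2.name = $phenotype "
        f"RETURN * LIMIT {int(limit)}"
    )
    params = {"chemical": chemical_name, "phenotype": phenotype_name}
    return query, params

query_text, params = linear_cypher("Buprenorphine", "Tremor")
print(query_text)
print(params)

# With the driver created earlier (not run here):
# with driver.session() as session:
#     for record in session.run(query_text, params):
#         print(record)
```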

You can also send the cypher through the automat interface instead:

In [None]:
j = {'query': cypher}
results = requests.post('https://automat.renci.org/robokopkg/cypher',json=j)

In [None]:
print(results)

In [None]:
print(results.json())