In [3]:
import requests
import json

TRAPI Documentation: https://github.com/NCATSTranslator/ReasonerAPI

Most TRAPI documents contain a `message` key.  Within that `message` are a `query_graph` denoting the user query,
a `knowledge_graph` consisting of the union of all nodes and edges that match the `query_graph` pattern, and a list of `results` that bind `query_graph` elements to `knowledge_graph` elements.

The query message used in this notebook contains only a `query_graph`.  This query graph consists of 3 nodes connected together in a line.  Two of the nodes (`n00` and `n02`) have specified identifiers, while the middle node (`n01`) does not.  Instead, the middle node has a list of acceptable `categories`.

More details can be found in the `HelloRobokop_TRAPI.ipynb` notebook.
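
As a rough sketch of the three-part structure described above, the toy message below holds a `query_graph`, a `knowledge_graph`, and one entry in `results`, and checks that every binding points at something in the knowledge graph.  All identifiers (`EXAMPLE:1`, `k0`, and so on) and the helper `missing_bindings` are made up for illustration; they are not part of TRAPI or of this notebook's query.

```python
# Toy TRAPI-style message. Every ID and name here is a placeholder.
toy = {
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {"ids": ["EXAMPLE:1"]},
                "n1": {"categories": ["biolink:Gene"]},
            },
            "edges": {"e0": {"subject": "n0", "object": "n1"}},
        },
        "knowledge_graph": {
            "nodes": {
                "EXAMPLE:1": {"name": "example chemical"},
                "EXAMPLE:2": {"name": "example gene"},
            },
            "edges": {
                "k0": {
                    "subject": "EXAMPLE:1",
                    "predicate": "biolink:related_to",
                    "object": "EXAMPLE:2",
                }
            },
        },
        "results": [
            {
                "node_bindings": {"n0": [{"id": "EXAMPLE:1"}], "n1": [{"id": "EXAMPLE:2"}]},
                "edge_bindings": {"e0": [{"id": "k0"}]},
            }
        ],
    }
}


def missing_bindings(trapi):
    """List (query graph key, bound ID) pairs whose ID is absent from the knowledge graph."""
    msg = trapi["message"]
    kg = msg["knowledge_graph"]
    missing = []
    for result in msg["results"]:
        for kind, kg_key in (("node_bindings", "nodes"), ("edge_bindings", "edges")):
            for qg_id, bindings in result[kind].items():
                for binding in bindings:
                    if binding["id"] not in kg[kg_key]:
                        missing.append((qg_id, binding["id"]))
    return missing


print(missing_bindings(toy))
```

An empty list means every result binding can be looked up in the knowledge graph.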

A researcher who starts from a `name` and wants to use TRAPI can use the Name Resolver tool to get a list of identifiers for the nodes.  For example, the IDs related to `Buprenorphine` are looked up below.

In [4]:
search_string = 'Buprenorphine'
results = requests.post(f'https://name-resolution-sri.renci.org/lookup?string={search_string}&offset=0&limit=10')
results_json = results.json()
#print(json.dumps(results_json,indent=4))
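
The lookup above places the search string directly in the URL, which works for a single word.  For names that contain spaces or other special characters, one option is to URL-encode the parameters first.  The sketch below only builds the URL (no request is sent); the search string `Buprenorphine hydrochloride` is an arbitrary example and not a term used elsewhere in this notebook.

```python
from urllib.parse import urlencode

# Base URL and parameter names are taken from the lookup call above;
# the search string itself is just an example.
base_url = "https://name-resolution-sri.renci.org/lookup"
params = {"string": "Buprenorphine hydrochloride", "offset": 0, "limit": 10}

lookup_url = f"{base_url}?{urlencode(params)}"
print(lookup_url)
```

Passing the same dictionary as `requests.post(base_url, params=params)` lets `requests` do the encoding itself.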

In [5]:
import pprint
pp = pprint.PrettyPrinter(indent=5)

In [8]:
input_node_id_list = list(results_json.keys())
print(f"IDs related to 'Buprenorphine': {input_node_id_list}")
# pp.pprint(results_json)

IDs related to 'Buprenorphine': ['PUBCHEM.COMPOUND:644073', 'PUBCHEM.COMPOUND:9848990', 'UMLS:C0524040', 'PUBCHEM.COMPOUND:9811785', 'UMLS:C0799646', 'PUBCHEM.COMPOUND:91745467', 'UMLS:C0701445', 'PUBCHEM.COMPOUND:3033050', 'UMLS:C0366373', 'UMLS:C1171048']


To confirm the label for each of these IDs, the Node Normalizer tool can be queried as shown below.

In [9]:
nn_query = {
  "curies": input_node_id_list,
  "conflate": True
}
results = requests.post('https://nodenormalization-sri.renci.org/get_normalized_nodes',json=nn_query)
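
The cell above sends the request but does not display the labels.  A minimal sketch of pulling them out is shown below.  It assumes the Node Normalizer response is a dictionary keyed by the input CURIE, where each value holds an `id` entry with a `label` (and is `None` for an unrecognized CURIE); that shape is an assumption and should be checked against the actual response.  The helper name `labels_from_normalizer` and the `EXAMPLE:0000` entry are hypothetical.

```python
def labels_from_normalizer(nn_json):
    """Map each input CURIE to its preferred label, or None if it was not recognized."""
    labels = {}
    for curie, entry in nn_json.items():
        if entry is None:
            labels[curie] = None
        else:
            labels[curie] = (entry.get("id") or {}).get("label")
    return labels


# Hand-written sample in the assumed response shape (not a real API response).
sample_response = {
    "PUBCHEM.COMPOUND:644073": {
        "id": {"identifier": "PUBCHEM.COMPOUND:644073", "label": "Buprenorphine"}
    },
    "EXAMPLE:0000": None,
}

print(labels_from_normalizer(sample_response))
```

In the notebook this would be called as `labels_from_normalizer(results.json())`.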

This query asks: "Find me a Biological Process or Activity, a Gene, or a Pathway that is related to both `HP:0001337` (Tremor) and any of the IDs for `Buprenorphine` returned by the Name Resolver."

In [6]:
query = {
    "message": {
        "query_graph": {
            "edges": {
                "e00": {
                    "subject": "n00",
                    "object": "n01",
                    "predicates": ["biolink:related_to"]
                },
                "e01": {
                    "subject": "n01",
                    "object": "n02",
                    "predicates": ["biolink:related_to"]
                }
            },
            "nodes": {
                "n00": {
                    "ids": input_node_id_list,
                    "categories": ["biolink:ChemicalEntity"]
                },
                "n01": {
                    "categories": ["biolink:BiologicalProcessOrActivity", "biolink:Gene", "biolink:Pathway"]
                },
                "n02": {
                    "ids": ["HP:0001337"],
                    "categories": ["biolink:DiseaseOrPhenotypicFeature"]
                }
            }
        }
    }
}

As in `HelloRobokop_TRAPI.ipynb`, the TRAPI query can be sent to Automat as shown below.  Because the query now includes more search terms, more results are expected.  In that notebook's example, which searched for a `gene` related to both `Buprenorphine` and `Tremors`, seven results were returned.  Here we search for a `gene` related to `Tremors` and to any of a whole list of identifiers, and eight results are returned instead of seven.

In [7]:
robokop_submit_url = "https://automat.renci.org/robokopkg/1.3/query"
response = requests.post(robokop_submit_url,json=query)
print(response.status_code)
number_pathway_results = len(response.json()['message']['results'])
print(number_pathway_results)

200
8


The response in JSON form is a Python dictionary with three main keys: `message`, `log_level`, and `workflow`.  The `message` property contains the `query_graph`, `knowledge_graph`, and `results` for the query.

In [9]:
print(response.json().keys())
print(response.json()['message'].keys())

dict_keys(['message', 'log_level', 'workflow'])
dict_keys(['query_graph', 'knowledge_graph', 'results'])


The `results` property contains the pathways that match the query message.  Each pathway is organized into `edge_bindings` and `node_bindings`, which hold the matches for the edges and nodes specified in the query message.  Results are given as identifiers, which can be looked up in the `knowledge_graph` property of the `message` section of the response.

In [10]:
pp.pprint(response.json()['message']['results'][0])

{    'edge_bindings': {    'e00': [    {'attributes': None, 'id': '114862569'},
                                       {'attributes': None, 'id': '89338256'},
                                       {'attributes': None, 'id': '8697234'}],
                           'e01': [{'attributes': None, 'id': '59984979'}]},
     'node_bindings': {    'n00': [    {    'attributes': None,
                                            'id': 'PUBCHEM.COMPOUND:644073',
                                            'qnode_id': 'PUBCHEM.COMPOUND:644073',
                                            'query_id': None}],
                           'n01': [    {    'attributes': None,
                                            'id': 'NCBIGene:1565',
                                            'query_id': None}],
                           'n02': [    {    'attributes': None,
                                            'id': 'HP:0025387',
                                            'qnode_id': 'HP:0001337',
    

The `knowledge_graph` contains information about each of the Nodes and Edges found in `results`.  An example of a Node and of an Edge is shown below.

In [11]:
pp.pprint(response.json()['message']['knowledge_graph'].keys())

dict_keys(['nodes', 'edges'])


Information returned for each Node includes the concept ID (key), biolink categories, the name/label, attributes, the value type, and others.  Note that `nodes` is a dictionary keyed by concept ID, not a list.  The content for one Node is shown below.

In [12]:
next(iter( response.json()['message']['knowledge_graph']['nodes'].items() ))

('HP:0012164',
 {'categories': ['biolink:Entity',
   'biolink:DiseaseOrPhenotypicFeature',
   'biolink:PhenotypicFeature',
   'biolink:NamedThing',
   'biolink:BiologicalEntity',
   'biolink:ThingWithTaxon'],
  'name': 'Asterixis',
  'attributes': [{'attribute_type_id': 'biolink:same_as',
    'value': ['MEDDRA:10003547',
     'MEDDRA:10057580',
     'HP:0012164',
     'NCIT:C86048',
     'UMLS:C0232766',
     'SNOMEDCT:32838008'],
    'value_type_id': 'metatype:uriorcurie',
    'original_attribute_name': 'equivalent_identifiers',
    'value_url': None,
    'attribute_source': None,
    'description': None,
    'attributes': None}]})
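
Given the node structure shown above, the equivalent identifiers can be pulled out of the `attributes` list.  The sketch below is one way to do that, under the assumption that they always sit in the attribute whose `original_attribute_name` is `equivalent_identifiers`, as in this example.  The helper name `equivalent_ids` is hypothetical, and the sample node is a trimmed copy of the output above.

```python
def equivalent_ids(node):
    """Return the node's equivalent identifiers, or an empty list if none are recorded."""
    # `attributes` may be None, so fall back to an empty list.
    for attribute in node.get("attributes") or []:
        if attribute.get("original_attribute_name") == "equivalent_identifiers":
            return list(attribute.get("value") or [])
    return []


# Trimmed copy of the node printed above.
asterixis = {
    "name": "Asterixis",
    "attributes": [
        {
            "attribute_type_id": "biolink:same_as",
            "value": [
                "MEDDRA:10003547",
                "MEDDRA:10057580",
                "HP:0012164",
                "NCIT:C86048",
                "UMLS:C0232766",
                "SNOMEDCT:32838008",
            ],
            "original_attribute_name": "equivalent_identifiers",
        }
    ],
}

print(equivalent_ids(asterixis))
```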

Information returned for each Edge includes the edge ID (key), the subject's concept ID, the object's concept ID, the edge's predicate, any qualifiers, and attributes.  Note that `edges` is a dictionary keyed by edge ID, not a list.  The content for one Edge is shown below.

In [13]:
next(iter( response.json()['message']['knowledge_graph']['edges'].items() ))

('1498880',
 {'subject': 'NCBIGene:1565',
  'object': 'HP:0002322',
  'predicate': 'biolink:genetic_association',
  'qualifiers': None,
  'attributes': [{'attribute_type_id': 'biolink:aggregator_knowledge_source',
    'value': ['infores:pharos', 'infores:automat-robokop'],
    'value_type_id': 'biolink:InformationResource',
    'original_attribute_name': 'biolink:aggregator_knowledge_source',
    'value_url': None,
    'attribute_source': 'infores:automat-robokop',
    'description': None,
    'attributes': None},
   {'attribute_type_id': 'biolink:primary_knowledge_source',
    'value': 'infores:disgenet',
    'value_type_id': 'biolink:InformationResource',
    'original_attribute_name': 'biolink:primary_knowledge_source',
    'value_url': None,
    'attribute_source': 'infores:automat-robokop',
    'description': None,
    'attributes': None}]})
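
The edge attributes above record where the assertion came from.  As a small sketch, the helper below collects every attribute whose `attribute_type_id` ends in `knowledge_source`.  The helper name `knowledge_sources` is hypothetical, and the sample edge is a trimmed copy of the output above.

```python
def knowledge_sources(edge):
    """Map each *_knowledge_source attribute type to its value."""
    sources = {}
    # `attributes` may be None, so fall back to an empty list.
    for attribute in edge.get("attributes") or []:
        type_id = attribute.get("attribute_type_id") or ""
        if type_id.endswith("knowledge_source"):
            sources[type_id] = attribute.get("value")
    return sources


# Trimmed copy of the edge printed above.
sample_edge = {
    "subject": "NCBIGene:1565",
    "object": "HP:0002322",
    "predicate": "biolink:genetic_association",
    "attributes": [
        {
            "attribute_type_id": "biolink:aggregator_knowledge_source",
            "value": ["infores:pharos", "infores:automat-robokop"],
        },
        {
            "attribute_type_id": "biolink:primary_knowledge_source",
            "value": "infores:disgenet",
        },
    ],
}

print(knowledge_sources(sample_edge))
```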

In [15]:
from datetime import datetime
from pathlib import Path

now = datetime.now()
dt_string = now.strftime("%Y-%m-%d_%H%M%S")
write_dir = Path("output/TRAPI",str(dt_string))
write_dir.mkdir(parents=True, exist_ok=True)

The code below writes out all of the pathway results returned, NOT the edges for each pathway.

In [16]:
import pandas as pd
import os

# Parse the response once and reuse it
message = response.json()['message']
kg = message['knowledge_graph']

# Column order: each query node ID followed by its name (n00, n00_name, ...)
cols = []
for node in sorted(message['results'][0]['node_bindings'].keys()):
    cols.append(node)
    cols.append(node + '_name')

# Build one single-row DataFrame per result, using the first binding of each query node
results_list = []
for result in message['results']:
    result_dict = {}
    for node in result['node_bindings'].keys():
        node_id = result['node_bindings'][node][0]['id']
        result_dict[node] = node_id
        result_dict[node + '_name'] = kg['nodes'][node_id]['name']
    results_list.append(pd.DataFrame([result_dict]))
results_df = pd.concat(results_list)
display(results_df)
results_df.to_csv(os.path.join(write_dir, 'results_TRAPI_multi_ID.csv'), index=False)

# Label each pathway as <n00 name>_<n01 name>_<n02 name>, with spaces replaced by underscores
combined_node_list = ["_".join([row[1].replace(" ", "_"), row[3].replace(" ", "_"), row[5].replace(" ", "_")]) for row in results_df[cols].to_numpy()]
pp.pprint(combined_node_list)

Unnamed: 0,n02,n02_name,n01,n01_name,n00,n00_name
0,HP:0025387,Pill-rolling tremor,NCBIGene:1565,CYP2D6,PUBCHEM.COMPOUND:644073,Buprenorphine
0,HP:0200085,Limb tremor,NCBIGene:1565,CYP2D6,PUBCHEM.COMPOUND:644073,Buprenorphine
0,HP:0012164,Asterixis,NCBIGene:4988,OPRM1,PUBCHEM.COMPOUND:9848990,Brixadi
0,HP:0002345,Action tremor,NCBIGene:1565,CYP2D6,PUBCHEM.COMPOUND:644073,Buprenorphine
0,HP:0012164,Asterixis,NCBIGene:4988,OPRM1,PUBCHEM.COMPOUND:644073,Buprenorphine
0,HP:0001337,Tremor,NCBIGene:1565,CYP2D6,PUBCHEM.COMPOUND:644073,Buprenorphine
0,HP:0002322,Resting tremor,NCBIGene:1565,CYP2D6,PUBCHEM.COMPOUND:644073,Buprenorphine
0,HP:0002174,Postural tremor,NCBIGene:1565,CYP2D6,PUBCHEM.COMPOUND:644073,Buprenorphine


[    'Buprenorphine_CYP2D6_Pill-rolling_tremor',
     'Buprenorphine_CYP2D6_Limb_tremor',
     'Brixadi_OPRM1_Asterixis',
     'Buprenorphine_CYP2D6_Action_tremor',
     'Buprenorphine_OPRM1_Asterixis',
     'Buprenorphine_CYP2D6_Tremor',
     'Buprenorphine_CYP2D6_Resting_tremor',
     'Buprenorphine_CYP2D6_Postural_tremor']


The following writes out each unique edge for each of the pathways in the format of `subject` -> `predicate` -> `object`.

In [17]:
from collections import Counter
import json

# Parse the response once instead of calling response.json() for every lookup
trapi_message = response.json()['message']
kg_nodes = trapi_message['knowledge_graph']['nodes']
kg_edges = trapi_message['knowledge_graph']['edges']

for i in range(number_pathway_results):
    print(f"Pathway result: {combined_node_list[i]}")
    edge_bindings = trapi_message['results'][i]['edge_bindings']

    # Describe every bound edge as "subject -> predicate -> object", using node names
    string_out_list = []
    for edge_name, edge_list in edge_bindings.items():
        for binding in edge_list:
            edge = kg_edges[binding['id']]
            subject_name = kg_nodes[edge['subject']]['name']
            object_name = kg_nodes[edge['object']]['name']
            string_out_list.append(f"{subject_name} -> {edge['predicate']} -> {object_name}")

    # Count how many times each distinct edge description occurs
    string_out_dict = dict(Counter(string_out_list))
    pp.pprint(string_out_dict)
    print("")

    # Write one file per pathway result
    with open(os.path.join(write_dir, combined_node_list[i] + ".txt"), 'w') as convert_file:
        convert_file.write(json.dumps(string_out_dict))

Pathway result: Buprenorphine_CYP2D6_Pill-rolling_tremor
{    'Buprenorphine -> biolink:affects -> CYP2D6': 1,
     'Buprenorphine -> biolink:directly_physically_interacts_with -> CYP2D6': 1,
     'CYP2D6 -> biolink:affects -> Buprenorphine': 1,
     'CYP2D6 -> biolink:genetic_association -> Pill-rolling tremor': 1}

Pathway result: Buprenorphine_CYP2D6_Limb_tremor
{    'Buprenorphine -> biolink:affects -> CYP2D6': 1,
     'Buprenorphine -> biolink:directly_physically_interacts_with -> CYP2D6': 1,
     'CYP2D6 -> biolink:affects -> Buprenorphine': 1,
     'CYP2D6 -> biolink:genetic_association -> Limb tremor': 1}

Pathway result: Brixadi_OPRM1_Asterixis
{    'Brixadi -> biolink:affects -> OPRM1': 1,
     'OPRM1 -> biolink:genetic_association -> Asterixis': 1}

Pathway result: Buprenorphine_CYP2D6_Action_tremor
{    'Buprenorphine -> biolink:affects -> CYP2D6': 1,
     'Buprenorphine -> biolink:directly_physically_interacts_with -> CYP2D6': 1,
     'CYP2D6 -> biolink:affects -> Buprenor

## Aragorn

The results above are plain database matches; they carry no scores or other additions.  You can instead send the TRAPI query to the ROBOKOP application through Aragorn (rather than directly to the graph).

In [18]:
ara_robokop_submit_url = "https://aragorn.renci.org/robokop/query"
response_ara = requests.post(ara_robokop_submit_url,json=query)

In [19]:
response_ara.status_code

200

In [20]:
# Inspect the first Aragorn result: its keys, node bindings, edge binding keys, and score
first_ara_result = response_ara.json()['message']['results'][0]
pp.pprint(first_ara_result.keys())
pp.pprint(first_ara_result['node_bindings'])
pp.pprint(first_ara_result['edge_bindings'].keys())
pp.pprint(first_ara_result['score'])

dict_keys(['node_bindings', 'edge_bindings', 'score'])
{    'n00': [    {    'id': 'PUBCHEM.COMPOUND:644073',
                      'qnode_id': 'PUBCHEM.COMPOUND:644073'}],
     'n01': [{'id': 'NCBIGene:1565'}],
     'n02': [{'id': 'HP:0001337', 'qnode_id': 'HP:0001337'}]}
dict_keys(['e00', 'e01', 's2', 's5', 's6'])
0.4306377466353769
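
Because each Aragorn result carries a `score`, the results can be ranked.  The sketch below sorts a list of results by descending score and keeps the top `n`.  The helper name `top_results` is hypothetical, treating a missing score as `0.0` is an assumption, and the sample list is a hand-trimmed stand-in for `response_ara.json()['message']['results']`.

```python
def top_results(results, n=3):
    """Return the n highest-scoring results, treating a missing score as 0.0."""
    return sorted(results, key=lambda r: r.get("score") or 0.0, reverse=True)[:n]


# Hand-trimmed sample; only the fields needed for ranking are kept.
sample_results = [
    {"node_bindings": {"n02": [{"id": "HP:0012164"}]}, "score": 0.277682},
    {"node_bindings": {"n02": [{"id": "HP:0001337"}]}, "score": 0.430638},
    {"node_bindings": {"n02": [{"id": "HP:0002322"}]}, "score": 0.235553},
]

for result in top_results(sample_results, n=2):
    print(result["node_bindings"]["n02"][0]["id"], result["score"])
```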


In [22]:
ara_message = response_ara.json()['message']
kg = ara_message['knowledge_graph']
cols = []
for node in sorted(ara_message['results'][0]['node_bindings'].keys()):
    cols.append(node)
    cols.append(node + '_name')

# DataFrame.append is deprecated and was removed in pandas 2.0, so the rows
# are collected in a list and the DataFrame is built once at the end.
rows = []
for result in ara_message['results']:
    result_dict = {}
    for node in result['node_bindings'].keys():
        node_id = result['node_bindings'][node][0]['id']
        result_dict[node] = node_id
        result_dict[node + '_name'] = kg['nodes'][node_id]['name']
    result_dict['score'] = result['score']
    rows.append(result_dict)

results_df = pd.DataFrame(rows, columns=cols + ['score'])
display(results_df)
results_df.to_csv(os.path.join(write_dir,'results_TRAPI_multi_ID_aragorn.csv'))



Unnamed: 0,n00,n00_name,n01,n01_name,n02,n02_name,score
0,PUBCHEM.COMPOUND:644073,Buprenorphine,NCBIGene:1565,CYP2D6,HP:0001337,Tremor,0.430638
1,PUBCHEM.COMPOUND:9848990,Brixadi,NCBIGene:4988,OPRM1,HP:0012164,Asterixis,0.277682
2,PUBCHEM.COMPOUND:644073,Buprenorphine,NCBIGene:4988,OPRM1,HP:0012164,Asterixis,0.263952
3,PUBCHEM.COMPOUND:644073,Buprenorphine,NCBIGene:1565,CYP2D6,HP:0002322,Resting tremor,0.235553
4,PUBCHEM.COMPOUND:644073,Buprenorphine,NCBIGene:1565,CYP2D6,HP:0002345,Action tremor,0.234348
5,PUBCHEM.COMPOUND:644073,Buprenorphine,NCBIGene:1565,CYP2D6,HP:0002174,Postural tremor,0.233694
6,PUBCHEM.COMPOUND:644073,Buprenorphine,NCBIGene:1565,CYP2D6,HP:0025387,Pill-rolling tremor,0.233694
7,PUBCHEM.COMPOUND:644073,Buprenorphine,NCBIGene:1565,CYP2D6,HP:0200085,Limb tremor,0.233694
