In [11]:
import requests
import json

TRAPI Documentation: https://github.com/NCATSTranslator/ReasonerAPI

Most TRAPI documents contain a `message` key.  Within that `message` are a `query_graph` denoting the user query,
a `knowledge_graph` consisting of the union of all nodes and edges that match the `query_graph` pattern, and a list of `results` that bind `query_graph` elements to `knowledge_graph` elements.
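
To make the relationship between the three parts concrete, here is a minimal sketch built on a hand-written toy message. The identifiers `X:1`, `Y:2`, and `k0` and the helper `dangling_bindings` are invented for illustration and are not part of TRAPI. The sketch assumes the TRAPI 1.3-style layout shown later in this notebook, where each result carries `node_bindings` and `edge_bindings`, and it reports any binding whose `id` is missing from the `knowledge_graph`.

```python
# Toy TRAPI-style message; every identifier here is made up for illustration.
toy_message = {
    "query_graph": {
        "nodes": {"n00": {"ids": ["X:1"]}, "n01": {"categories": ["biolink:Gene"]}},
        "edges": {"e00": {"subject": "n00", "object": "n01"}},
    },
    "knowledge_graph": {
        "nodes": {"X:1": {"name": "thing one"}, "Y:2": {"name": "thing two"}},
        "edges": {"k0": {"subject": "X:1", "object": "Y:2",
                         "predicate": "biolink:related_to"}},
    },
    "results": [
        {
            "node_bindings": {"n00": [{"id": "X:1"}], "n01": [{"id": "Y:2"}]},
            "edge_bindings": {"e00": [{"id": "k0"}]},
        }
    ],
}


def dangling_bindings(message):
    """Return (kind, query_key, kg_id) for bindings absent from the knowledge graph."""
    kg = message["knowledge_graph"]
    missing = []
    for result in message["results"]:
        for kind, kg_section in (("node_bindings", "nodes"), ("edge_bindings", "edges")):
            for query_key, bindings in result[kind].items():
                for binding in bindings:
                    if binding["id"] not in kg[kg_section]:
                        missing.append((kind, query_key, binding["id"]))
    return missing


print(dangling_bindings(toy_message))  # prints [] because every binding resolves
```

An empty list means every result can be fully described using only the `knowledge_graph` in the same message.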

The following message contains only a `query_graph`.  This query graph consists of three nodes connected in a line.  Two of the nodes (`n00` and `n02`) have specified identifiers; the middle node (`n01`) does not, and instead lists the `categories` that are acceptable.

A researcher who starts from a `name` and wants to use TRAPI can use the Node Resolver tool to get the identifiers for the nodes.  For example, the identifiers for `Tremor` and `Buprenorphine` appear in the query below.

This query asks: "Find me a Biological Process or Activity, a Gene, or a Pathway that is related to both `PUBCHEM.COMPOUND:644073` (Buprenorphine) and `HP:0001337` (Tremor)."

In [12]:
query = {
    "message": {
        "query_graph": {
            "edges": {
                "e00": {
                    "subject": "n00",
                    "object": "n01",
                    "predicates": ["biolink:related_to"]
                },
                "e01": {
                    "subject": "n01",
                    "object": "n02",
                    "predicates": ["biolink:related_to"]
                }
            },
            "nodes": {
                "n00": {
                    "ids": ["PUBCHEM.COMPOUND:644073"],
                    "categories": ["biolink:ChemicalEntity"]
                },
                "n01": {
                    "categories": ["biolink:BiologicalProcessOrActivity", "biolink:Gene", "biolink:Pathway"]
                },
                "n02": {
                    "ids": ["HP:0001337"],
                    "categories": ["biolink:DiseaseOrPhenotypicFeature"]
                }
            }
        }
    }
}
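
The same three-node line can also be assembled programmatically, which helps when the start and end identifiers change between runs. The helper below, `linear_query_graph`, is a hypothetical convenience function (it is not part of TRAPI or of any Translator library). It is only a sketch and assumes that every hop in the line uses the same list of predicates.

```python
def linear_query_graph(nodes, predicates=("biolink:related_to",)):
    """Build a TRAPI query whose nodes are chained n00 -> n01 -> ... in order.

    `nodes` is a list of dicts, each holding "ids" and/or "categories".
    """
    qnodes = {f"n{i:02d}": dict(node) for i, node in enumerate(nodes)}
    keys = list(qnodes)
    # One edge per consecutive pair of nodes
    qedges = {
        f"e{i:02d}": {
            "subject": keys[i],
            "object": keys[i + 1],
            "predicates": list(predicates),
        }
        for i in range(len(keys) - 1)
    }
    return {"message": {"query_graph": {"nodes": qnodes, "edges": qedges}}}


rebuilt = linear_query_graph([
    {"ids": ["PUBCHEM.COMPOUND:644073"], "categories": ["biolink:ChemicalEntity"]},
    {"categories": ["biolink:BiologicalProcessOrActivity", "biolink:Gene", "biolink:Pathway"]},
    {"ids": ["HP:0001337"], "categories": ["biolink:DiseaseOrPhenotypicFeature"]},
])
print(sorted(rebuilt["message"]["query_graph"]["edges"]))  # ['e00', 'e01']
```

Called with the three node descriptions above, the helper returns a dictionary equal to the hand-written `query`.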


This query can be sent to various components of Translator as needed.  It can be sent directly to the ROBOKOP database like this:

In [31]:
robokop_submit_url = "https://automat.renci.org/robokopkg/1.3/query"
response = requests.post(robokop_submit_url, json=query)
print(response.status_code)
# Keep the result count; it is reused later when writing out each pathway's edges
number_pathway_results = len(response.json()['message']['results'])
print(number_pathway_results)

200
7
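
Indexing straight into `['message']['results']` raises an exception if a server returns a message without results. The following is a small defensive sketch, under the assumption that `message` or `results` may be missing or `null` in a response; `result_count` is a hypothetical helper name, not something provided by TRAPI or `requests`.

```python
def result_count(trapi_response):
    """Count results, treating a missing or null `message`/`results` as zero."""
    message = trapi_response.get("message") or {}
    return len(message.get("results") or [])


print(result_count({"message": {"results": [{}, {}]}}))  # 2
print(result_count({"message": {"results": None}}))      # 0
```

With a live response this would be called as `result_count(response.json())`.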


In [16]:
import pprint
pp = pprint.PrettyPrinter(indent=5)

The response in JSON form is a Python dictionary with three main keys: `message`, `log_level`, and `workflow`.  The `message` component contains the `query_graph`, `knowledge_graph`, and `results` from the query.

In [17]:
print(response.json().keys())
print(response.json()['message'].keys())

dict_keys(['message', 'log_level', 'workflow'])
dict_keys(['query_graph', 'knowledge_graph', 'results'])


The `results` component contains the pathways that match the query message. Each pathway is organized into `edge_bindings` and `node_bindings`, keyed by the edge and node names used in the query message.  Bindings are given as identifiers; the details for each identifier can be looked up in the `knowledge_graph` component of the `message` section of the response.

In [18]:
pp.pprint(response.json()['message']['results'])

[    {    'edge_bindings': {    'e00': [    {    'attributes': None,
                                                 'id': '114007302'},
                                            {    'attributes': None,
                                                 'id': '53316700'},
                                            {    'attributes': None,
                                                 'id': '73767346'},
                                            {    'attributes': None,
                                                 'id': '69960824'},
                                            {    'attributes': None,
                                                 'id': '17217758'},
                                            {    'attributes': None,
                                                 'id': '99662894'}],
                                'e01': [    {    'attributes': None,
                                                 'id': '122487516'}]},
          'node_bindings': {    'n00

In [None]:
pp.pprint(response.json()['message']['knowledge_graph'])

The `knowledge_graph` contains information about each of the Nodes and Edges found in `results`.  An example Node and an example Edge are shown below.

In [20]:
pp.pprint(response.json()['message']['knowledge_graph'].keys())


dict_keys(['nodes', 'edges'])


Information returned for each Node includes the concept ID (key), Biolink categories, the name/label, attributes, the value type, and others.  Note that the entries under `nodes` are stored in a dictionary keyed by concept ID, not in a list.  The content for one Node is shown below.

In [21]:
next(iter( response.json()['message']['knowledge_graph']['nodes'].items() ))

('HP:0012164',
 {'categories': ['biolink:Entity',
   'biolink:PhenotypicFeature',
   'biolink:DiseaseOrPhenotypicFeature',
   'biolink:NamedThing',
   'biolink:BiologicalEntity',
   'biolink:ThingWithTaxon'],
  'name': 'Asterixis',
  'attributes': [{'attribute_type_id': 'biolink:Attribute',
    'value': 100.0,
    'value_type_id': 'EDAM:data_0006',
    'original_attribute_name': 'information_content',
    'value_url': None,
    'attribute_source': None,
    'description': None,
    'attributes': None},
   {'attribute_type_id': 'biolink:same_as',
    'value': ['NCIT:C86048',
     'MEDDRA:10057580',
     'HP:0012164',
     'UMLS:C0232766',
     'SNOMEDCT:32838008',
     'MEDDRA:10003547'],
    'value_type_id': 'metatype:uriorcurie',
    'original_attribute_name': 'equivalent_identifiers',
    'value_url': None,
    'attribute_source': None,
    'description': None,
    'attributes': None}]})
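
Because `attributes` is a list of dictionaries, picking out one value means scanning the list. The sketch below shows one way to do that, matching on `original_attribute_name`. The helper name `attribute_value` is an assumption made for illustration, and the `asterixis` dictionary is a trimmed copy of the node printed above rather than a live response.

```python
def attribute_value(kg_element, original_name, default=None):
    """Return the value of the first attribute with the given original_attribute_name."""
    for attribute in kg_element.get("attributes") or []:
        if attribute.get("original_attribute_name") == original_name:
            return attribute.get("value")
    return default


# Trimmed copy of the node printed above.
asterixis = {
    "name": "Asterixis",
    "attributes": [
        {"attribute_type_id": "biolink:Attribute", "value": 100.0,
         "original_attribute_name": "information_content"},
        {"attribute_type_id": "biolink:same_as",
         "value": ["NCIT:C86048", "MEDDRA:10057580", "HP:0012164",
                   "UMLS:C0232766", "SNOMEDCT:32838008", "MEDDRA:10003547"],
         "original_attribute_name": "equivalent_identifiers"},
    ],
}

print(attribute_value(asterixis, "information_content"))          # 100.0
print(len(attribute_value(asterixis, "equivalent_identifiers")))  # 6
```

The `or []` guard is there because `attributes` can be `None`, as the nested `'attributes': None` entries in the output above show.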

Information returned for each Edge includes the edge ID (key), the subject's concept ID, the object's concept ID, the edge's predicate, any qualifiers, and attributes.  Note that the entries under `edges` are stored in a dictionary keyed by edge ID, not in a list.  The content for one Edge is shown below.

In [22]:
next(iter( response.json()['message']['knowledge_graph']['edges'].items() ))

('76927493',
 {'subject': 'NCBIGene:1565',
  'object': 'HP:0001337',
  'predicate': 'biolink:genetic_association',
  'qualifiers': None,
  'attributes': [{'attribute_type_id': 'biolink:primary_knowledge_source',
    'value': 'infores:disgenet',
    'value_type_id': 'biolink:InformationResource',
    'original_attribute_name': 'biolink:primary_knowledge_source',
    'value_url': None,
    'attribute_source': 'infores:automat-robokop',
    'description': None,
    'attributes': None},
   {'attribute_type_id': 'biolink:aggregator_knowledge_source',
    'value': ['infores:automat-robokop', 'infores:pharos'],
    'value_type_id': 'biolink:InformationResource',
    'original_attribute_name': 'biolink:aggregator_knowledge_source',
    'value_url': None,
    'attribute_source': 'infores:automat-robokop',
    'description': None,
    'attributes': None}]})
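
In this response the provenance of an edge is carried in its attributes, so answering "where did this assertion come from?" again means scanning the list. The following sketch assumes that layout (sources stored under `biolink:primary_knowledge_source` and `biolink:aggregator_knowledge_source`, as in the edge printed above). The helper name `knowledge_sources` is hypothetical, and `example_edge` is a trimmed copy of that printed edge.

```python
def knowledge_sources(edge):
    """Collect primary and aggregator knowledge sources from an edge's attributes."""
    sources = {"primary": [], "aggregator": []}
    for attribute in edge.get("attributes") or []:
        value = attribute.get("value")
        # A source value may be a single string or a list of strings
        values = value if isinstance(value, list) else [value]
        type_id = attribute.get("attribute_type_id")
        if type_id == "biolink:primary_knowledge_source":
            sources["primary"].extend(values)
        elif type_id == "biolink:aggregator_knowledge_source":
            sources["aggregator"].extend(values)
    return sources


# Trimmed copy of the edge printed above.
example_edge = {
    "subject": "NCBIGene:1565",
    "object": "HP:0001337",
    "predicate": "biolink:genetic_association",
    "attributes": [
        {"attribute_type_id": "biolink:primary_knowledge_source",
         "value": "infores:disgenet"},
        {"attribute_type_id": "biolink:aggregator_knowledge_source",
         "value": ["infores:automat-robokop", "infores:pharos"]},
    ],
}

print(knowledge_sources(example_edge))
# {'primary': ['infores:disgenet'], 'aggregator': ['infores:automat-robokop', 'infores:pharos']}
```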

Since the edge and node contents are returned as CURIEs (Compact URIs) instead of labels, a sample workflow was written to show how to use the above edge information and the Node Normalizer tool to convert results into a human-readable form.  This workflow does the following steps:
- combine edge_binding IDs and convert edge IDs to labels for each result
- use output from the `knowledge_graph` to convert node IDs to labels
- write the output dictionary as a CSV

In [23]:
import pandas as pd
from pathlib import Path

# Query-node keys (n00, n01, n02) taken from the first result; one sub-dictionary per node
nodes = response.json()['message']['results'][0]['node_bindings'].keys()
output_dict = {key: {} for key in nodes}

In [24]:
# Illustrating the structure of each pathway result from the message component
pp.pprint(response.json()['message']['results'][0])

{    'edge_bindings': {    'e00': [    {'attributes': None, 'id': '114007302'},
                                       {'attributes': None, 'id': '53316700'},
                                       {'attributes': None, 'id': '73767346'},
                                       {'attributes': None, 'id': '69960824'},
                                       {'attributes': None, 'id': '17217758'},
                                       {'attributes': None, 'id': '99662894'}],
                           'e01': [{'attributes': None, 'id': '122487516'}]},
     'node_bindings': {    'n00': [    {    'attributes': None,
                                            'id': 'PUBCHEM.COMPOUND:644073',
                                            'qnode_id': 'PUBCHEM.COMPOUND:644073',
                                            'query_id': None}],
                           'n01': [    {    'attributes': None,
                                            'id': 'NCBIGene:4988',
                           

In [25]:
# Record the first bound ID for every query node in every result
for i, entry in enumerate(response.json()['message']['results']):
    for node_id in nodes:
        output_dict[node_id][i] = entry['node_bindings'][node_id][0]['id']

pp.pprint(output_dict)

{    'n00': {    0: 'PUBCHEM.COMPOUND:644073',
                 1: 'PUBCHEM.COMPOUND:644073',
                 2: 'PUBCHEM.COMPOUND:644073',
                 3: 'PUBCHEM.COMPOUND:644073',
                 4: 'PUBCHEM.COMPOUND:644073',
                 5: 'PUBCHEM.COMPOUND:644073',
                 6: 'PUBCHEM.COMPOUND:644073'},
     'n01': {    0: 'NCBIGene:4988',
                 1: 'NCBIGene:1565',
                 2: 'NCBIGene:1565',
                 3: 'NCBIGene:1565',
                 4: 'NCBIGene:1565',
                 5: 'NCBIGene:1565',
                 6: 'NCBIGene:1565'},
     'n02': {    0: 'HP:0012164',
                 1: 'HP:0025387',
                 2: 'HP:0002322',
                 3: 'HP:0200085',
                 4: 'HP:0001337',
                 5: 'HP:0002174',
                 6: 'HP:0002345'}}


In [26]:
node_list = [x for x in output_dict.keys() if 'n' in x]
# Parse the response once instead of on every lookup
kg_nodes = response.json()['message']['knowledge_graph']['nodes']

for node_id in node_list:
    for node_index, node_value in output_dict[node_id].items():
        # Replace the node ID with its 'name' label from the knowledge graph
        output_dict[node_id][node_index] = kg_nodes[node_value]['name']

pp.pprint(output_dict)

{    'n00': {    0: 'Buprenorphine',
                 1: 'Buprenorphine',
                 2: 'Buprenorphine',
                 3: 'Buprenorphine',
                 4: 'Buprenorphine',
                 5: 'Buprenorphine',
                 6: 'Buprenorphine'},
     'n01': {    0: 'OPRM1',
                 1: 'CYP2D6',
                 2: 'CYP2D6',
                 3: 'CYP2D6',
                 4: 'CYP2D6',
                 5: 'CYP2D6',
                 6: 'CYP2D6'},
     'n02': {    0: 'Asterixis',
                 1: 'Pill-rolling tremor',
                 2: 'Resting tremor',
                 3: 'Limb tremor',
                 4: 'Tremor',
                 5: 'Postural tremor',
                 6: 'Action tremor'}}


In [27]:
from datetime import datetime
from pathlib import Path

now = datetime.now()
dt_string = now.strftime("%Y-%m-%d_%H%M%S")
write_dir = Path("output/TRAPI",str(dt_string))
write_dir.mkdir(parents=True, exist_ok=True)

The code below writes out all of the pathway results returned, NOT the edges for each pathway.

In [33]:
import pandas as pd
import os
from io import StringIO

# Round-trip the output dictionary through JSON to build a DataFrame.
# Recent pandas versions deprecate passing a raw JSON string, so wrap it in StringIO.
json_str = json.dumps(output_dict, indent=4)
df = pd.read_json(StringIO(json_str))
cols = sorted(df.columns.tolist())
df = df[cols]
df.to_csv(os.path.join(write_dir, 'results_TRAPI_automat.csv'))

# One label per pathway, e.g. "Buprenorphine_OPRM1_Asterixis"
combined_node_list = ["_".join(name.replace(" ", "_") for name in row) for row in df[cols].to_numpy()]
pp.pprint(combined_node_list)

The following writes out each unique edge for each of the pathways in the format of `subject` -> `predicate` -> `object`.

In [32]:
from collections import Counter
import json
import pprint
pp = pprint.PrettyPrinter(indent=5)

# Parse the response once instead of on every lookup
message = response.json()['message']
kg_nodes = message['knowledge_graph']['nodes']
kg_edges = message['knowledge_graph']['edges']

for i in range(number_pathway_results):
    print(f"Pathway result: {combined_node_list[i]}")
    edge_bindings = message['results'][i]['edge_bindings']

    string_out_list = []
    for edge_name, edge_list in edge_bindings.items():
        for edge_id in (x['id'] for x in edge_list):
            edge = kg_edges[edge_id]
            subject_name = kg_nodes[edge['subject']]['name']
            # Named object_name to avoid shadowing the built-in `object`
            object_name = kg_nodes[edge['object']]['name']
            string_out_list.append(f"{subject_name} -> {edge['predicate']} -> {object_name}")

    # Count how many bound edges share the same subject -> predicate -> object string
    string_out_dict = dict(Counter(string_out_list))
    pp.pprint(string_out_dict)
    print("")

    with open(os.path.join(write_dir, combined_node_list[i] + ".txt"), 'w') as convert_file:
        convert_file.write(json.dumps(string_out_dict))
        

Pathway result: Buprenorphine_OPRM1_Asterixis
{    'Buprenorphine -> biolink:affects -> CYP2D6': 1,
     'Buprenorphine -> biolink:directly_physically_interacts_with -> CYP2D6': 1,
     'CYP2D6 -> biolink:affects -> Buprenorphine': 1,
     'CYP2D6 -> biolink:genetic_association -> Limb tremor': 1}

Pathway result: Buprenorphine_CYP2D6_Pill-rolling_tremor
{    'Buprenorphine -> biolink:affects -> CYP2D6': 1,
     'Buprenorphine -> biolink:directly_physically_interacts_with -> CYP2D6': 1,
     'CYP2D6 -> biolink:affects -> Buprenorphine': 1,
     'CYP2D6 -> biolink:genetic_association -> Pill-rolling tremor': 1}

Pathway result: Buprenorphine_CYP2D6_Resting_tremor
{    'Buprenorphine -> biolink:affects -> CYP2D6': 1,
     'Buprenorphine -> biolink:directly_physically_interacts_with -> CYP2D6': 1,
     'CYP2D6 -> biolink:affects -> Buprenorphine': 1,
     'CYP2D6 -> biolink:genetic_association -> Tremor': 2}

Pathway result: Buprenorphine_CYP2D6_Limb_tremor
{    'Buprenorphine -> biolink:

## Aragorn is being discontinued.  The code below is not expected to work.

The results above are just database matches; there are no scores or other additions.  You can instead send the TRAPI query to the ROBOKOP application using Aragorn (rather than just to the graph).

In [None]:
ara_robokop_submit_url = "https://aragorn.renci.org/robokop/query"
# Stored under its own name so the ROBOKOP `response` above is not overwritten
response_ara = requests.post(ara_robokop_submit_url, json=query)

In [50]:
response_ara.status_code

500

In [48]:
len(response_ara.json()['message']['results'])

In [None]:
pp.pprint(response_ara.json()['message']['results'])

In [None]:
for result in response_ara.json()['message']['results']:
    print(result['score'])