In [3]:
import requests
import json

TRAPI Documentation: https://github.com/NCATSTranslator/ReasonerAPI

Most TRAPI documents contain a `message` key.  Within that `message` are a `query_graph` denoting the user query,
a `knowledge_graph` consisting of the union of all nodes and edges that match the `query_graph` pattern, and a list of `results` that bind `query_graph` elements to `knowledge_graph` elements.
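
As a rough sketch of the structure just described, the three parts of a `message` can be written out as an empty Python dictionary. The variable name `empty_message` is only illustrative, and real TRAPI documents may carry additional keys.

```python
# Skeleton of a TRAPI document; `empty_message` is an illustrative name.
empty_message = {
    "message": {
        # The pattern the user is asking about
        "query_graph": {"nodes": {}, "edges": {}},
        # Union of all nodes and edges that match the pattern
        "knowledge_graph": {"nodes": {}, "edges": {}},
        # Bindings from query_graph elements to knowledge_graph elements
        "results": [],
    }
}

print(sorted(empty_message["message"].keys()))
```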

The following message contains only a `query_graph`.  This query graph consists of 3 nodes connected in a line.  Two of the nodes (`n00` and `n02`) have specified identifiers, while the middle node (`n01`) does not.  Instead, the middle node has a list of acceptable `categories`.

A researcher who is starting from a `name` rather than an identifier can use the Name Resolver tool to look up the identifiers for the nodes.  This is illustrated in the "HelloRobokop_TRAPI_multiple_IDs" notebook.

This query asks: "Find me a Biological Process or Activity, a Gene, or a Pathway that is related to both `PUBCHEM.COMPOUND:644073` (Buprenorphine) and `HP:0001337` (Tremor)."

In [2]:
query = {
    "message": {
        "query_graph": {
            "edges": {
                "e00": {
                    "subject": "n00",
                    "object": "n01",
                    "predicates": ["biolink:related_to"]
                },
                "e01": {
                    "subject": "n01",
                    "object": "n02",
                    "predicates": ["biolink:related_to"]
                }
            },
            "nodes": {
                "n00": {
                    "ids": ["PUBCHEM.COMPOUND:644073"],
                    "categories": ["biolink:ChemicalEntity"]
                },
                "n01": {
                    "categories": ["biolink:BiologicalProcessOrActivity", "biolink:Gene", "biolink:Pathway"]
                },
                "n02": {
                    "ids": ["HP:0001337"],
                    "categories": ["biolink:DiseaseOrPhenotypicFeature"]
                }
            }
        }
    }
}
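
Because the query is a plain dictionary, the same three-node pattern can be generated for other start and end identifiers. The helper below is a minimal sketch; `build_linear_query` and `example_query` are hypothetical names introduced here, not part of TRAPI or of any Translator library.

```python
def build_linear_query(start, middle_categories, end, predicate="biolink:related_to"):
    """Build a three-node, two-edge TRAPI query (hypothetical helper).

    `start` and `end` are (curie, category) pairs; the middle node is
    constrained only by its list of categories.
    """
    start_id, start_category = start
    end_id, end_category = end
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    "n00": {"ids": [start_id], "categories": [start_category]},
                    "n01": {"categories": list(middle_categories)},
                    "n02": {"ids": [end_id], "categories": [end_category]},
                },
                "edges": {
                    "e00": {"subject": "n00", "object": "n01", "predicates": [predicate]},
                    "e01": {"subject": "n01", "object": "n02", "predicates": [predicate]},
                },
            }
        }
    }

# Rebuild the Buprenorphine / Tremor query shown above.
example_query = build_linear_query(
    ("PUBCHEM.COMPOUND:644073", "biolink:ChemicalEntity"),
    ["biolink:BiologicalProcessOrActivity", "biolink:Gene", "biolink:Pathway"],
    ("HP:0001337", "biolink:DiseaseOrPhenotypicFeature"),
)
print(example_query["message"]["query_graph"]["nodes"]["n01"])
```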


This query can be sent to various components of Translator as needed.  It can be sent directly to the ROBOKOP database like this:

In [3]:
robokop_submit_url = "https://automat.renci.org/robokopkg/1.3/query"
response = requests.post(robokop_submit_url, json=query)
print(response.status_code)
# Number of results (pathways) returned; reused later when reporting edges
number_pathway_results = len(response.json()['message']['results'])
print(number_pathway_results)

200
7
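
The cell above assumes the request succeeds and that `results` is present. A slightly more defensive sketch is shown below; `get_results` is a hypothetical helper, and the 60-second timeout in the commented usage is an arbitrary choice rather than a documented recommendation.

```python
def get_results(payload):
    """Return the list of results from a parsed TRAPI response (hypothetical helper).

    Missing or null `message` / `results` entries are treated as "no results".
    """
    message = payload.get("message") or {}
    return message.get("results") or []

# Usage against a live service would look roughly like:
#   response = requests.post(robokop_submit_url, json=query, timeout=60)
#   response.raise_for_status()   # stop on HTTP errors instead of continuing
#   results = get_results(response.json())

print(len(get_results({"message": {"results": [{}, {}]}})))  # 2
print(len(get_results({"message": {"results": None}})))      # 0
```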


In [4]:
import pprint
pp = pprint.PrettyPrinter(indent=5)

The response in JSON form is a Python dictionary with three main keys: `message`, `log_level`, and `workflow`.  The `message` property contains the `query_graph`, `knowledge_graph`, and `results` from the query.

In [5]:
print(response.json().keys())
print(response.json()['message'].keys())

dict_keys(['message', 'log_level', 'workflow'])
dict_keys(['query_graph', 'knowledge_graph', 'results'])


The `results` property contains the pathways that answer the query message. Each pathway is organized into `edge_bindings` and `node_bindings`, which hold the matches for the edges and nodes specified in the query message.  Results refer to nodes and edges by identifier only. The attributes for those nodes and edges (including the names) are available via the `knowledge_graph` component of the `message` section of the response.

In [6]:
pp.pprint(response.json()['message']['results'])

[    {    'edge_bindings': {    'e00': [    {    'attributes': None,
                                                 'id': '114862569'},
                                            {    'attributes': None,
                                                 'id': '89338256'},
                                            {    'attributes': None,
                                                 'id': '8697234'}],
                                'e01': [    {    'attributes': None,
                                                 'id': '51570037'},
                                            {    'attributes': None,
                                                 'id': '76915076'}]},
          'node_bindings': {    'n00': [    {    'attributes': None,
                                                 'id': 'PUBCHEM.COMPOUND:644073',
                                                 'qnode_id': 'PUBCHEM.COMPOUND:644073',
                                                 'query_id': None}],
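
A binding list can in principle hold more than one entry per query node. The sketch below collects every bound identifier for each query node of one result; `bound_node_ids` is a hypothetical helper, and `sample_result` is a hand-written dictionary in the shape printed above, not real service output.

```python
def bound_node_ids(result):
    """Map each query-node key to the list of knowledge-graph IDs bound to it."""
    return {
        qnode: [binding["id"] for binding in bindings]
        for qnode, bindings in result["node_bindings"].items()
    }

# Hand-written sample in the shape shown above (not real service output).
sample_result = {
    "node_bindings": {
        "n00": [{"id": "PUBCHEM.COMPOUND:644073"}],
        "n01": [{"id": "NCBIGene:1565"}],
        "n02": [{"id": "HP:0001337"}],
    },
    "edge_bindings": {"e00": [{"id": "114862569"}], "e01": [{"id": "51570037"}]},
}
print(bound_node_ids(sample_result))
```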
    

The `knowledge_graph` contains information about each of the Nodes and Edges found in `results`.  One example Node and one example Edge are shown below.

In [10]:
pp.pprint(response.json()['message']['knowledge_graph'].keys())

dict_keys(['nodes', 'edges'])


Information returned for each Node includes the concept ID (key), Biolink categories, the name/label, attributes, the value type, and others.  Note that each entry under the `nodes` level is a dictionary item whose key is the identifier used in the `results`.  The content for one Node is shown below.

In [16]:
next(iter( response.json()['message']['knowledge_graph']['nodes'].items() ))

('HP:0012164',
 {'categories': ['biolink:Entity',
   'biolink:DiseaseOrPhenotypicFeature',
   'biolink:PhenotypicFeature',
   'biolink:NamedThing',
   'biolink:BiologicalEntity',
   'biolink:ThingWithTaxon'],
  'name': 'Asterixis',
  'attributes': [{'attribute_type_id': 'biolink:same_as',
    'value': ['MEDDRA:10003547',
     'MEDDRA:10057580',
     'HP:0012164',
     'NCIT:C86048',
     'UMLS:C0232766',
     'SNOMEDCT:32838008'],
    'value_type_id': 'metatype:uriorcurie',
    'original_attribute_name': 'equivalent_identifiers',
    'value_url': None,
    'attribute_source': None,
    'description': None,
    'attributes': None}]})
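
In the Node above, the equivalent identifiers sit inside an attribute whose `attribute_type_id` is `biolink:same_as`. A minimal sketch for pulling them out follows; `equivalent_identifiers` is a hypothetical helper, and it assumes other nodes use the same attribute layout as this example.

```python
def equivalent_identifiers(node):
    """Collect values of `biolink:same_as` attributes from a knowledge-graph node."""
    identifiers = []
    for attribute in node.get("attributes") or []:
        if attribute.get("attribute_type_id") == "biolink:same_as":
            value = attribute.get("value") or []
            # A value may be a single string or a list of strings.
            identifiers.extend([value] if isinstance(value, str) else value)
    return identifiers

# Abbreviated copy of the node shown above.
sample_node = {
    "name": "Asterixis",
    "attributes": [
        {
            "attribute_type_id": "biolink:same_as",
            "value": ["MEDDRA:10003547", "HP:0012164", "UMLS:C0232766"],
            "original_attribute_name": "equivalent_identifiers",
        }
    ],
}
print(equivalent_identifiers(sample_node))
```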

Information returned for each Edge includes the edge ID (key), the subject's concept ID, the object's concept ID, the edge's predicate, any qualifiers, and attributes.  Note that each entry under the `edges` level is a dictionary item whose key is the identifier used in the `results`.  The content for one Edge is shown below.

In [17]:
next(iter( response.json()['message']['knowledge_graph']['edges'].items() ))

('1498880',
 {'subject': 'NCBIGene:1565',
  'object': 'HP:0002322',
  'predicate': 'biolink:genetic_association',
  'qualifiers': None,
  'attributes': [{'attribute_type_id': 'biolink:aggregator_knowledge_source',
    'value': ['infores:pharos', 'infores:automat-robokop'],
    'value_type_id': 'biolink:InformationResource',
    'original_attribute_name': 'biolink:aggregator_knowledge_source',
    'value_url': None,
    'attribute_source': 'infores:automat-robokop',
    'description': None,
    'attributes': None},
   {'attribute_type_id': 'biolink:primary_knowledge_source',
    'value': 'infores:disgenet',
    'value_type_id': 'biolink:InformationResource',
    'original_attribute_name': 'biolink:primary_knowledge_source',
    'value_url': None,
    'attribute_source': 'infores:automat-robokop',
    'description': None,
    'attributes': None}]})
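
The provenance of the Edge above is carried in its attributes. The sketch below separates primary from aggregator knowledge sources; `knowledge_sources` is a hypothetical helper, and it assumes the two `attribute_type_id` values seen in this example are the ones of interest.

```python
def knowledge_sources(edge):
    """Return (primary_sources, aggregator_sources) for a knowledge-graph edge."""
    found = {
        "biolink:primary_knowledge_source": [],
        "biolink:aggregator_knowledge_source": [],
    }
    for attribute in edge.get("attributes") or []:
        type_id = attribute.get("attribute_type_id")
        if type_id in found:
            value = attribute.get("value")
            # As in the example above, a value can be a string or a list.
            found[type_id].extend([value] if isinstance(value, str) else (value or []))
    return (found["biolink:primary_knowledge_source"],
            found["biolink:aggregator_knowledge_source"])

# Abbreviated copy of the edge shown above.
sample_edge = {
    "subject": "NCBIGene:1565",
    "object": "HP:0002322",
    "predicate": "biolink:genetic_association",
    "attributes": [
        {"attribute_type_id": "biolink:aggregator_knowledge_source",
         "value": ["infores:pharos", "infores:automat-robokop"]},
        {"attribute_type_id": "biolink:primary_knowledge_source",
         "value": "infores:disgenet"},
    ],
}
primary, aggregators = knowledge_sources(sample_edge)
print(primary, aggregators)
```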

Since the edge and node contents are returned as CURIEs (Compact URIs) instead of labels, a sample workflow was written to convert the CURIEs to labels. We start by creating a dictionary, keyed by query node, that will hold the node names.

In [13]:
import pandas as pd
from pathlib import Path

nodes = response.json()['message']['results'][0]['node_bindings'].keys()
output_dict = {}
for key in nodes:
    output_dict[key] = {}

In [14]:
# Illustrating the structure of each pathway result from the message property
pp.pprint(response.json()['message']['results'][0])

{    'edge_bindings': {    'e00': [    {'attributes': None, 'id': '114862569'},
                                       {'attributes': None, 'id': '89338256'},
                                       {'attributes': None, 'id': '8697234'}],
                           'e01': [    {'attributes': None, 'id': '51570037'},
                                       {'attributes': None, 'id': '76915076'}]},
     'node_bindings': {    'n00': [    {    'attributes': None,
                                            'id': 'PUBCHEM.COMPOUND:644073',
                                            'qnode_id': 'PUBCHEM.COMPOUND:644073',
                                            'query_id': None}],
                           'n01': [    {    'attributes': None,
                                            'id': 'NCBIGene:1565',
                                            'query_id': None}],
                           'n02': [    {    'attributes': None,
                                            'id': 'HP:0

First we pull the node identifiers from the `results` component of the answer.

In [16]:
# Only the first binding of each query node is kept for each result
for i, entry in enumerate(response.json()['message']['results']):
    for node_id in nodes:
        output_dict[node_id][i] = entry['node_bindings'][node_id][0]['id']

pp.pprint(output_dict)

{    'n00': {    0: 'PUBCHEM.COMPOUND:644073',
                 1: 'PUBCHEM.COMPOUND:644073',
                 2: 'PUBCHEM.COMPOUND:644073',
                 3: 'PUBCHEM.COMPOUND:644073',
                 4: 'PUBCHEM.COMPOUND:644073',
                 5: 'PUBCHEM.COMPOUND:644073',
                 6: 'PUBCHEM.COMPOUND:644073'},
     'n01': {    0: 'NCBIGene:1565',
                 1: 'NCBIGene:1565',
                 2: 'NCBIGene:1565',
                 3: 'NCBIGene:1565',
                 4: 'NCBIGene:4988',
                 5: 'NCBIGene:1565',
                 6: 'NCBIGene:1565'},
     'n02': {    0: 'HP:0001337',
                 1: 'HP:0200085',
                 2: 'HP:0002345',
                 3: 'HP:0002174',
                 4: 'HP:0012164',
                 5: 'HP:0002322',
                 6: 'HP:0025387'}}


Then we substitute the names for each node, which are obtained from the knowledge_graph component of the answer using the identifiers shown above.

In [17]:
node_list = [x for x in output_dict.keys() if 'n' in x]

# Parse the response once instead of on every lookup
kg_nodes = response.json()['message']['knowledge_graph']['nodes']

for node_id in node_list:
    for node_index, node_value in output_dict[node_id].items():
        # Converting from CURIEs to node labels
        output_dict[node_id][node_index] = kg_nodes[node_value]['name']

pp.pprint(output_dict)

{    'n00': {    0: 'Buprenorphine',
                 1: 'Buprenorphine',
                 2: 'Buprenorphine',
                 3: 'Buprenorphine',
                 4: 'Buprenorphine',
                 5: 'Buprenorphine',
                 6: 'Buprenorphine'},
     'n01': {    0: 'CYP2D6',
                 1: 'CYP2D6',
                 2: 'CYP2D6',
                 3: 'CYP2D6',
                 4: 'OPRM1',
                 5: 'CYP2D6',
                 6: 'CYP2D6'},
     'n02': {    0: 'Tremor',
                 1: 'Limb tremor',
                 2: 'Action tremor',
                 3: 'Postural tremor',
                 4: 'Asterixis',
                 5: 'Resting tremor',
                 6: 'Pill-rolling tremor'}}
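
The lookup above raises an error if an identifier is absent from the `knowledge_graph` or has no name. A minimal sketch of a more forgiving lookup is shown below; `node_label` is a hypothetical helper, and `HP:9999999` is a made-up identifier used only to show the fallback.

```python
def node_label(knowledge_graph, curie):
    """Return the node's name, or the CURIE itself when no name is available."""
    node = knowledge_graph.get("nodes", {}).get(curie) or {}
    return node.get("name") or curie

# Toy knowledge graph for illustration only.
sample_kg = {"nodes": {"NCBIGene:1565": {"name": "CYP2D6"}, "HP:0001337": {"name": None}}}
print(node_label(sample_kg, "NCBIGene:1565"))  # CYP2D6
print(node_label(sample_kg, "HP:0001337"))     # HP:0001337 (name is null in the toy data)
print(node_label(sample_kg, "HP:9999999"))     # HP:9999999 (not in the toy graph)
```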


To export the node information into a CSV file, we create a directory named with the current date and time to hold the output file and then convert our output dictionary into a table, which can be written to the file.

In [20]:
from datetime import datetime
from pathlib import Path

now = datetime.now()
dt_string = now.strftime("%Y-%m-%d_%H%M%S")
write_dir = Path("output/TRAPI",str(dt_string))
write_dir.mkdir(parents=True, exist_ok=True)

The code below writes out nodes from our `results`, but NOT the edges connecting the nodes.

In [21]:
import pandas as pd
import os

# Convert the nested dictionary into a table: one column per query node, one row per result.
# Building the DataFrame directly avoids a round trip through a JSON string.
df = pd.DataFrame(output_dict)
cols = sorted(df.columns)
df = df[cols]
df.to_csv(os.path.join(write_dir,'results_TRAPI_automat.csv'))

combined_node_list = ["_".join([row[0].replace(" ", "_"), row[1].replace(" ", "_"), row[2].replace(" ", "_")]) for row in df[cols].to_numpy()]
pp.pprint(combined_node_list)

[    'Buprenorphine_CYP2D6_Tremor',
     'Buprenorphine_CYP2D6_Limb_tremor',
     'Buprenorphine_CYP2D6_Action_tremor',
     'Buprenorphine_CYP2D6_Postural_tremor',
     'Buprenorphine_OPRM1_Asterixis',
     'Buprenorphine_CYP2D6_Resting_tremor',
     'Buprenorphine_CYP2D6_Pill-rolling_tremor']


The edge IDs are then retrieved from the `results` and used to find the corresponding `predicate` label in the `knowledge_graph`. Two nodes can have more than one edge connecting them because each edge represents a distinct type of association derived from a single data source. Because of this, the number of rows in this report may differ from the number of results above. For an answer with three nodes such as this one, there will be at least one edge between the first and second nodes and at least one between the second and third. The direction of the edge connecting two nodes can differ as well, as for "Buprenorphine" and "CYP2D6" below.

Sometimes a single association type will be derived from multiple data sources. In the code below, we print a single entry for each association type, together with the number of data sources in which that association was found. The following writes out all unique edges for each result in the format `subject` -> `predicate` -> `object`.

In [23]:
from collections import Counter
import json
import pprint
pp = pprint.PrettyPrinter(indent=5)

# Parse the response once rather than on every lookup
message = response.json()['message']
kg_nodes = message['knowledge_graph']['nodes']
kg_edges = message['knowledge_graph']['edges']

for i in range(number_pathway_results):
    print(f"Pathway result: {combined_node_list[i]}")
    edge_bindings = message['results'][i]['edge_bindings']

    string_out_list = []
    for edge_name, edge_list in edge_bindings.items():
        for binding in edge_list:
            edge = kg_edges[binding['id']]
            subject_name = kg_nodes[edge['subject']]['name']
            object_name = kg_nodes[edge['object']]['name']
            string_out_list.append(f"{subject_name} -> {edge['predicate']} -> {object_name}")

    # Count how many edges share the same subject -> predicate -> object string
    string_out_dict = dict(Counter(string_out_list))
    pp.pprint(string_out_dict)
    print("")

    with open(os.path.join(write_dir, combined_node_list[i] + ".txt"), 'w') as convert_file:
        convert_file.write(json.dumps(string_out_dict))
        

Pathway result: Buprenorphine_CYP2D6_Tremor
{    'Buprenorphine -> biolink:affects -> CYP2D6': 1,
     'Buprenorphine -> biolink:directly_physically_interacts_with -> CYP2D6': 1,
     'CYP2D6 -> biolink:affects -> Buprenorphine': 1,
     'CYP2D6 -> biolink:genetic_association -> Tremor': 2}

Pathway result: Buprenorphine_CYP2D6_Limb_tremor
{    'Buprenorphine -> biolink:affects -> CYP2D6': 1,
     'Buprenorphine -> biolink:directly_physically_interacts_with -> CYP2D6': 1,
     'CYP2D6 -> biolink:affects -> Buprenorphine': 1,
     'CYP2D6 -> biolink:genetic_association -> Limb tremor': 1}

Pathway result: Buprenorphine_CYP2D6_Action_tremor
{    'Buprenorphine -> biolink:affects -> CYP2D6': 1,
     'Buprenorphine -> biolink:directly_physically_interacts_with -> CYP2D6': 1,
     'CYP2D6 -> biolink:affects -> Buprenorphine': 1,
     'CYP2D6 -> biolink:genetic_association -> Action tremor': 1}

Pathway result: Buprenorphine_CYP2D6_Postural_tremor
{    'Buprenorphine -> biolink:affects -> CY
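
The counts printed above are the number of edges that share the same `subject` -> `predicate` -> `object` string. Assuming each edge carries a `biolink:primary_knowledge_source` attribute, as in the Edge example earlier, the sources themselves can be listed per association. The sketch below uses a hypothetical helper `sources_per_association` and toy data; the edge IDs `a`, `b`, `c` and `infores:example-source` are made up for illustration.

```python
from collections import defaultdict

def sources_per_association(knowledge_graph, edge_ids):
    """Group edges by (subject, predicate, object) and collect primary sources."""
    grouped = defaultdict(set)
    for edge_id in edge_ids:
        edge = knowledge_graph["edges"][edge_id]
        triple = (edge["subject"], edge["predicate"], edge["object"])
        sources = grouped[triple]  # creates an empty set the first time a triple is seen
        for attribute in edge.get("attributes") or []:
            if attribute.get("attribute_type_id") == "biolink:primary_knowledge_source":
                value = attribute.get("value")
                sources.update([value] if isinstance(value, str) else (value or []))
    return dict(grouped)

def _edge(subject, predicate, obj, source):
    """Build a toy edge with a single primary knowledge source."""
    return {
        "subject": subject, "predicate": predicate, "object": obj,
        "attributes": [{"attribute_type_id": "biolink:primary_knowledge_source",
                        "value": source}],
    }

# Toy knowledge graph: edge IDs and the second source are invented.
toy_kg = {"edges": {
    "a": _edge("NCBIGene:1565", "biolink:genetic_association", "HP:0001337", "infores:disgenet"),
    "b": _edge("NCBIGene:1565", "biolink:genetic_association", "HP:0001337", "infores:example-source"),
    "c": _edge("PUBCHEM.COMPOUND:644073", "biolink:affects", "NCBIGene:1565", "infores:example-source"),
}}
for triple, sources in sources_per_association(toy_kg, ["a", "b", "c"]).items():
    print(triple, sorted(sources))
```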

## Aragorn

The results above are just database matches; there are no scores or other additions.  You can instead send the TRAPI query to the ROBOKOP application using Aragorn (rather than directly to the graph).

In [24]:
ara_robokop_submit_url = "https://aragorn.renci.org/robokop/query"
response_ara = requests.post(ara_robokop_submit_url,json=query)

In [25]:
response_ara.status_code

200

In [75]:
len(response_ara.json()['message']['results'])
pp.pprint(response_ara.json()['message']['results'][0].keys())
pp.pprint(response_ara.json()['message']['results'][0]['node_bindings'])
pp.pprint(response_ara.json()['message']['results'][0]['edge_bindings'].keys())
pp.pprint(response_ara.json()['message']['results'][0]['score'])

dict_keys(['node_bindings', 'edge_bindings', 'score'])
{    'n00': [    {    'id': 'PUBCHEM.COMPOUND:644073',
                      'qnode_id': 'PUBCHEM.COMPOUND:644073'}],
     'n01': [{'id': 'NCBIGene:1565'}],
     'n02': [{'id': 'HP:0001337', 'qnode_id': 'HP:0001337'}]}
dict_keys(['e00', 'e01', 's2', 's5', 's6'])
0.4306377466353769
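
The `edge_bindings` above include keys (`s2`, `s5`, `s6`) that do not appear in the query graph. This notebook does not say what they represent, so the sketch below only separates them from the query-graph edges; `split_edge_binding_keys` is a hypothetical helper and the sample dictionaries are hand-written from the keys printed above.

```python
def split_edge_binding_keys(result, query_graph):
    """Separate edge-binding keys found in the query graph from any others."""
    query_edges = set(query_graph["edges"])
    in_query = sorted(k for k in result["edge_bindings"] if k in query_edges)
    extra = sorted(k for k in result["edge_bindings"] if k not in query_edges)
    return in_query, extra

# Hand-written sample using only the keys printed above (binding lists left empty).
sample_query_graph = {"edges": {"e00": {}, "e01": {}}}
sample_ara_result = {"edge_bindings": {"e00": [], "e01": [], "s2": [], "s5": [], "s6": []}}
print(split_edge_binding_keys(sample_ara_result, sample_query_graph))
```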


The following assumes that the query node keys (`n00`, `n01`, `n02`) sort into the correct order. It exports the results, showing the nodes and the score assigned to each result.

In [84]:
message_ara = response_ara.json()['message']
kg = message_ara['knowledge_graph']
cols = []
for node in sorted(message_ara['results'][0]['node_bindings'].keys()):
    cols.append(node)
    cols.append(node + '_name')
cols.append('score')

rows = []
for result in message_ara['results']:
    result_dict = {}
    for node in result['node_bindings'].keys():
        node_id = result['node_bindings'][node][0]['id']
        result_dict[node] = node_id
        result_dict[node + '_name'] = kg['nodes'][node_id]['name']
    result_dict['score'] = result['score']
    rows.append(result_dict)

# DataFrame.append was removed in pandas 2.0, so build the table from a list of rows
results_df = pd.DataFrame(rows, columns=cols)
print(results_df)
results_df.to_csv(os.path.join(write_dir,'results_TRAPI_aragorn.csv'))


                       n00       n00_name            n01 n01_name         n02  \
0  PUBCHEM.COMPOUND:644073  Buprenorphine  NCBIGene:1565   CYP2D6  HP:0001337   
1  PUBCHEM.COMPOUND:644073  Buprenorphine  NCBIGene:4988    OPRM1  HP:0012164   
2  PUBCHEM.COMPOUND:644073  Buprenorphine  NCBIGene:1565   CYP2D6  HP:0002322   
3  PUBCHEM.COMPOUND:644073  Buprenorphine  NCBIGene:1565   CYP2D6  HP:0002345   
4  PUBCHEM.COMPOUND:644073  Buprenorphine  NCBIGene:1565   CYP2D6  HP:0025387   
5  PUBCHEM.COMPOUND:644073  Buprenorphine  NCBIGene:1565   CYP2D6  HP:0200085   
6  PUBCHEM.COMPOUND:644073  Buprenorphine  NCBIGene:1565   CYP2D6  HP:0002174   

              n02_name     score  
0               Tremor  0.430638  
1            Asterixis  0.263952  
2       Resting tremor  0.235553  
3        Action tremor  0.234348  
4  Pill-rolling tremor  0.233694  
5          Limb tremor  0.233694  
6      Postural tremor  0.233694  
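
Since Aragorn attaches a `score` to each result, the results can be filtered and ranked before export. The sketch below is one way to do that in plain Python; `top_results` is a hypothetical helper, the cutoff of 0.25 is arbitrary, and the sample data reuses three of the rounded scores printed above.

```python
def top_results(results, min_score=0.0):
    """Return results scoring at least `min_score`, highest score first."""
    def score_of(result):
        # Treat a missing or null score as 0.0 so unscored results sort last.
        return result.get("score") or 0.0

    kept = [result for result in results if score_of(result) >= min_score]
    return sorted(kept, key=score_of, reverse=True)

# Toy results carrying only the n02 binding and the rounded scores printed above.
sample_results = [
    {"node_bindings": {"n02": [{"id": "HP:0002322"}]}, "score": 0.235553},
    {"node_bindings": {"n02": [{"id": "HP:0001337"}]}, "score": 0.430638},
    {"node_bindings": {"n02": [{"id": "HP:0012164"}]}, "score": 0.263952},
]
ranked = top_results(sample_results, min_score=0.25)  # 0.25 is an arbitrary cutoff
print([r["node_bindings"]["n02"][0]["id"] for r in ranked])
```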
