In [1]:
import requests
import json

TRAPI Documentation: https://github.com/NCATSTranslator/ReasonerAPI

Most TRAPI documents contain a `message` key.  Within that `message` are a `query_graph` denoting the user query,
a `knowledge_graph` consisting of the union of all nodes and edges that match the `query_graph` pattern, and a list of `results` that bind `query_graph` elements to `knowledge_graph` elements.
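
To make the three parts concrete, here is a minimal sketch of a complete message. The layout mirrors the responses shown in this notebook; the `EX:` identifiers, the edge key `ke0`, and the helper `bindings_resolve` are invented for illustration and are not part of TRAPI.

```python
# Toy TRAPI message. All identifiers and the edge key "ke0" are invented for illustration.
toy_trapi = {
    "message": {
        # The pattern the user asked for
        "query_graph": {
            "nodes": {"n0": {"ids": ["EX:drug"]}, "n1": {"categories": ["biolink:Gene"]}},
            "edges": {"e0": {"subject": "n0", "object": "n1"}},
        },
        # Union of all nodes and edges that matched the pattern
        "knowledge_graph": {
            "nodes": {"EX:drug": {"name": "toy drug"}, "EX:gene": {"name": "toy gene"}},
            "edges": {"ke0": {"subject": "EX:drug", "object": "EX:gene",
                              "predicate": "biolink:related_to"}},
        },
        # Each result binds query_graph keys to knowledge_graph keys
        "results": [
            {
                "node_bindings": {"n0": [{"id": "EX:drug"}], "n1": [{"id": "EX:gene"}]},
                "edge_bindings": {"e0": [{"id": "ke0"}]},
            }
        ],
    }
}


def bindings_resolve(message):
    """Return True if every binding in every result points at a knowledge_graph entry."""
    kg = message["knowledge_graph"]
    for result in message["results"]:
        for bound in result["node_bindings"].values():
            if any(b["id"] not in kg["nodes"] for b in bound):
                return False
        for bound in result["edge_bindings"].values():
            if any(b["id"] not in kg["edges"] for b in bound):
                return False
    return True


print(bindings_resolve(toy_trapi["message"]))
```

The check expresses the relationship described above: results never carry node or edge details themselves, only keys into the `knowledge_graph`.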

The following message contains only a `query_graph`.  This query graph consists of 3 nodes connected in a line.  Two of the nodes (`n00` and `n02`) have specified identifiers, while the middle node (`n01`) does not.  Instead, the middle node has a list of acceptable `categories`.

A researcher who starts from a `name` rather than an identifier can use the Node Resolver tool to look up identifiers for the nodes.  In this example, `Tremor` corresponds to `HP:0001337` and `Buprenorphine` to `PUBCHEM.COMPOUND:644073`.

This query asks: "Find me a Biological Process or Activity, or a Gene, or a Pathway that is related to both `PUBCHEM.COMPOUND:644073` (Buprenorphine) and `HP:0001337` (Tremor)."

In [2]:
query = {
    "message": {
        "query_graph": {
            "edges": {
                "e00": {
                    "subject": "n00",
                    "object": "n01",
                    "predicates": ["biolink:related_to"]
                },
                "e01": {
                    "subject": "n01",
                    "object": "n02",
                    "predicates": ["biolink:related_to"]
                }
            },
            "nodes": {
                "n00": {
                    "ids": ["PUBCHEM.COMPOUND:644073"],
                    "categories": ["biolink:ChemicalEntity"]
                },
                "n01": {
                    "categories": ["biolink:BiologicalProcessOrActivity", "biolink:Gene", "biolink:Pathway"]
                },
                "n02": {
                    "ids": ["HP:0001337"],
                    "categories": ["biolink:DiseaseOrPhenotypicFeature"]
                }
            }
        }
    }
}
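
Before sending a hand-written query graph, it can help to confirm that every edge refers to a node that is actually defined and to see which nodes carry identifiers. The following is a minimal sketch; `check_query_graph` is a hypothetical helper written for this notebook, not part of TRAPI or any Translator library, and `example_qgraph` is a reduced copy of the query graph above.

```python
def check_query_graph(qgraph):
    """Return (pinned, unpinned) node keys; raise ValueError on a dangling edge."""
    nodes = qgraph["nodes"]
    for edge_key, edge in qgraph["edges"].items():
        for end in ("subject", "object"):
            if edge[end] not in nodes:
                raise ValueError(f"{edge_key}.{end} refers to unknown node {edge[end]!r}")
    # A node is "pinned" when it lists specific identifiers
    pinned = sorted(key for key, node in nodes.items() if node.get("ids"))
    unpinned = sorted(key for key in nodes if key not in pinned)
    return pinned, unpinned


# Reduced copy of the query graph above (categories trimmed for brevity)
example_qgraph = {
    "edges": {
        "e00": {"subject": "n00", "object": "n01", "predicates": ["biolink:related_to"]},
        "e01": {"subject": "n01", "object": "n02", "predicates": ["biolink:related_to"]},
    },
    "nodes": {
        "n00": {"ids": ["PUBCHEM.COMPOUND:644073"]},
        "n01": {"categories": ["biolink:Gene"]},
        "n02": {"ids": ["HP:0001337"]},
    },
}

print(check_query_graph(example_qgraph))
```

For this query the two end nodes are pinned and the middle node is not, which matches the description of the line-shaped query.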


This query can be sent to various components of Translator as needed.  It can be sent directly to the ROBOKOP database like this:

In [3]:
robokop_submit_url = "https://automat.renci.org/robokopkg/1.3/query"
response = requests.post(robokop_submit_url,json=query)

In [4]:
print(response.status_code)

200


In [5]:
print(len(response.json()['message']['results']))

7


In [6]:
import pprint
pp = pprint.PrettyPrinter(indent=5)

The response, once parsed from JSON, is a Python dictionary with three top-level keys: `message`, `log_level`, and `workflow`.  The `message` component contains the `query_graph`, `knowledge_graph`, and `results` for the query.

In [7]:
print(response.json().keys())
print(response.json()['message'].keys())

dict_keys(['message', 'log_level', 'workflow'])
dict_keys(['query_graph', 'knowledge_graph', 'results'])


The `results` component lists the paths through the knowledge graph that match the query message. Each result is organized into `edge_bindings` and `node_bindings`, which map the edge and node keys of the query graph (for example `e00` or `n01`) to the matching knowledge graph elements.  Bindings are given as identifiers; the full record for each identifier is found in the `knowledge_graph` component of the `message` section of the response.
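
As a small illustration of that layout, the sketch below collapses one result into a mapping from query key to bound identifiers. `summarize_result` is a hypothetical helper, and `toy_result` is an abridged copy of one result from this response (extra fields and most edge IDs dropped).

```python
def summarize_result(result):
    """Collapse one TRAPI result into {query key: [bound knowledge_graph IDs]}."""
    summary = {}
    for section in ("node_bindings", "edge_bindings"):
        for query_key, bound in result.get(section, {}).items():
            summary[query_key] = [binding["id"] for binding in bound]
    return summary


# Abridged copy of one result; only the 'id' field of each binding is kept
toy_result = {
    "node_bindings": {
        "n00": [{"id": "PUBCHEM.COMPOUND:644073"}],
        "n01": [{"id": "NCBIGene:4988"}],
        "n02": [{"id": "HP:0012164"}],
    },
    "edge_bindings": {
        "e00": [{"id": "52499821"}, {"id": "74263186"}],
        "e01": [{"id": "122488121"}],
    },
}

print(summarize_result(toy_result))
```

Note that a single query edge such as `e00` can be bound to several knowledge graph edges in the same result.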

In [None]:
print(response.json()['message']['results'])

In [None]:
print(response.json()['message']['knowledge_graph'])

The `knowledge_graph` contains information about each of the Nodes and Edges referenced in `results`.  One example Node and one example Edge are shown below.

In [53]:
pp.pprint(response.json()['message']['knowledge_graph'].keys())


dict_keys(['nodes', 'edges'])


Information returned for each Node includes the concept ID (the key), Biolink categories, the name/label, and a list of attributes, each with its value and value type, among other fields.  Note that the entries under `nodes` are stored as a dictionary keyed by ID, not as a list.  The content of one Node is shown below.

In [54]:
next(iter( response.json()['message']['knowledge_graph']['nodes'].items() ))

('HP:0012164',
 {'categories': ['biolink:ThingWithTaxon',
   'biolink:DiseaseOrPhenotypicFeature',
   'biolink:BiologicalEntity',
   'biolink:Entity',
   'biolink:NamedThing',
   'biolink:PhenotypicFeature'],
  'name': 'Asterixis',
  'attributes': [{'attribute_type_id': 'biolink:same_as',
    'value': ['SNOMEDCT:32838008',
     'MEDDRA:10003547',
     'HP:0012164',
     'MEDDRA:10057580',
     'NCIT:C86048',
     'UMLS:C0232766'],
    'value_type_id': 'metatype:uriorcurie',
    'original_attribute_name': 'equivalent_identifiers',
    'value_url': None,
    'attribute_source': None,
    'description': None,
    'attributes': None},
   {'attribute_type_id': 'biolink:Attribute',
    'value': 100.0,
    'value_type_id': 'EDAM:data_0006',
    'original_attribute_name': 'information_content',
    'value_url': None,
    'attribute_source': None,
    'description': None,
    'attributes': None}]})
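
Because attributes arrive as a list rather than a dictionary, pulling out one of them takes a short search. The sketch below looks an attribute up by its `original_attribute_name`; `get_node_attribute` is a hypothetical helper, and `asterixis` is a trimmed copy of the node printed above.

```python
def get_node_attribute(node, original_name, default=None):
    """Return the value of the first attribute whose original_attribute_name matches."""
    # 'attributes' may be missing or None, so fall back to an empty list
    for attribute in node.get("attributes") or []:
        if attribute.get("original_attribute_name") == original_name:
            return attribute.get("value")
    return default


# Trimmed copy of the node shown above (some fields and identifiers omitted)
asterixis = {
    "name": "Asterixis",
    "attributes": [
        {"attribute_type_id": "biolink:same_as",
         "value": ["SNOMEDCT:32838008", "HP:0012164", "UMLS:C0232766"],
         "original_attribute_name": "equivalent_identifiers"},
        {"attribute_type_id": "biolink:Attribute",
         "value": 100.0,
         "original_attribute_name": "information_content"},
    ],
}

print(get_node_attribute(asterixis, "equivalent_identifiers"))
print(get_node_attribute(asterixis, "information_content"))
```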

Information returned for each Edge includes the edge ID (the key), the subject's concept ID, the object's concept ID, the edge's predicate, any qualifiers, and attributes.  Note that the entries under `edges` are stored as a dictionary keyed by edge ID, not as a list.  The content of one Edge is shown below.

In [55]:
next(iter( response.json()['message']['knowledge_graph']['edges'].items() ))

('29138268',
 {'subject': 'PUBCHEM.COMPOUND:644073',
  'object': 'NCBIGene:1565',
  'predicate': 'biolink:directly_physically_interacts_with',
  'qualifiers': None,
  'attributes': [{'attribute_type_id': 'biolink:aggregator_knowledge_source',
    'value': ['infores:automat-robokop'],
    'value_type_id': 'biolink:InformationResource',
    'original_attribute_name': 'biolink:aggregator_knowledge_source',
    'value_url': None,
    'attribute_source': 'infores:automat-robokop',
    'description': None,
    'attributes': None},
   {'attribute_type_id': 'biolink:Attribute',
    'value': ['DrugBank (enzyme)'],
    'value_type_id': 'EDAM:data_0006',
    'original_attribute_name': 'hetio_source',
    'value_url': None,
    'attribute_source': None,
    'description': None,
    'attributes': None},
   {'attribute_type_id': 'biolink:primary_knowledge_source',
    'value': 'infores:hetio',
    'value_type_id': 'biolink:InformationResource',
    'original_attribute_name': 'biolink:primary_knowled
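
The edge attributes above record where each assertion came from. A minimal sketch for collecting those sources follows; `knowledge_sources` is a hypothetical helper, and `toy_edge` is a trimmed copy of the edge printed above. It assumes, as that output suggests, that a source value may be either a single string or a list.

```python
def knowledge_sources(edge):
    """Group an edge's provenance attributes into primary and aggregator sources."""
    sources = {"primary": [], "aggregator": []}
    for attribute in edge.get("attributes") or []:
        type_id = attribute.get("attribute_type_id")
        value = attribute.get("value")
        # Values appear both as plain strings and as lists, so normalize to a list
        values = value if isinstance(value, list) else [value]
        if type_id == "biolink:primary_knowledge_source":
            sources["primary"].extend(values)
        elif type_id == "biolink:aggregator_knowledge_source":
            sources["aggregator"].extend(values)
    return sources


# Trimmed copy of the edge shown above
toy_edge = {
    "subject": "PUBCHEM.COMPOUND:644073",
    "object": "NCBIGene:1565",
    "predicate": "biolink:directly_physically_interacts_with",
    "attributes": [
        {"attribute_type_id": "biolink:aggregator_knowledge_source",
         "value": ["infores:automat-robokop"]},
        {"attribute_type_id": "biolink:Attribute",
         "value": ["DrugBank (enzyme)"]},
        {"attribute_type_id": "biolink:primary_knowledge_source",
         "value": "infores:hetio"},
    ],
}

print(knowledge_sources(toy_edge))
```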

Since the edge and node contents are returned as CURIEs (Compact URIs) instead of labels, a sample workflow was written to show how to use the above edge information and the Node Normalizer tool to convert results into a human-readable form.  This workflow performs the following steps:
- combine edge_binding IDs and convert edge IDs to labels for each result
- use output from the `knowledge_graph` to convert node IDs to labels
- write the output dictionary as a CSV
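
The steps above can be sketched compactly with the standard library alone. This is an illustrative alternative, not the workflow used in the cells that follow: `results_to_rows` and `rows_to_csv` are hypothetical helpers, `toy_message` uses invented `EX:` identifiers, and joining multiple bindings with `|` is an arbitrary choice.

```python
import csv
import io


def results_to_rows(message):
    """Build one row per result, replacing bound IDs with names and predicates."""
    kg = message["knowledge_graph"]
    rows = []
    for result in message["results"]:
        row = {}
        for qnode, bound in result["node_bindings"].items():
            # Fall back to the ID when a node has no name
            row[qnode] = "|".join(kg["nodes"][b["id"]].get("name") or b["id"] for b in bound)
        for qedge, bound in result["edge_bindings"].items():
            row[qedge] = "|".join(kg["edges"][b["id"]]["predicate"] for b in bound)
        rows.append(row)
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text with columns in sorted key order."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=sorted(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Toy message with invented identifiers
toy_message = {
    "knowledge_graph": {
        "nodes": {"EX:1": {"name": "toy drug"}, "EX:2": {"name": "toy gene"}},
        "edges": {"k1": {"subject": "EX:1", "object": "EX:2", "predicate": "biolink:affects"}},
    },
    "results": [
        {"node_bindings": {"n0": [{"id": "EX:1"}], "n1": [{"id": "EX:2"}]},
         "edge_bindings": {"e0": [{"id": "k1"}]}},
    ],
}

rows = results_to_rows(toy_message)
print(rows_to_csv(rows))
```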

In [109]:
edges = response.json()['message']['results'][0]['edge_bindings'].keys()
nodes = response.json()['message']['results'][0]['node_bindings'].keys()
edges_nodes = sorted(edges|nodes)

edges_nodes_keys = [s for s in edges_nodes if not s.startswith('s')]
print(edges_nodes_keys)

# Initializing a base dictionary containing edge and node IDs from message results
base_dict = {}
for key in nodes:
    base_dict[key] = {}
print(base_dict)

['e00', 'e01', 'n00', 'n01', 'n02']
{'n02': {}, 'n01': {}, 'n00': {}}


In [36]:
# Illustrating the structure of each pathway result from the message component
pp.pprint(response.json()['message']['results'][0])

{    'edge_bindings': {    'e00': [    {'attributes': None, 'id': '52499821'},
                                       {'attributes': None, 'id': '74263186'},
                                       {'attributes': None, 'id': '17332614'},
                                       {'attributes': None, 'id': '70496519'},
                                       {'attributes': None, 'id': '53736753'},
                                       {'attributes': None, 'id': '114858437'}],
                           'e01': [{'attributes': None, 'id': '122488121'}]},
     'node_bindings': {    'n00': [    {    'attributes': None,
                                            'id': 'PUBCHEM.COMPOUND:644073',
                                            'qnode_id': 'PUBCHEM.COMPOUND:644073',
                                            'query_id': None}],
                           'n01': [    {    'attributes': None,
                                            'id': 'NCBIGene:4988',
                           

In [110]:
# Populate base_dict with entries from response.json()['message']['results']
for i, entry in enumerate(response.json()['message']['results']):
    # for edge_id in edges:
    #     base_dict[edge_id][i] = entry['edge_bindings'][edge_id][0]['id']
    for node_id in nodes:
        # Only the first binding ([0]) of each query node is kept
        base_dict[node_id][i] = entry['node_bindings'][node_id][0]['id']

pp.pprint(base_dict)

{    'n00': {    0: 'PUBCHEM.COMPOUND:644073',
                 1: 'PUBCHEM.COMPOUND:644073',
                 2: 'PUBCHEM.COMPOUND:644073',
                 3: 'PUBCHEM.COMPOUND:644073',
                 4: 'PUBCHEM.COMPOUND:644073',
                 5: 'PUBCHEM.COMPOUND:644073',
                 6: 'PUBCHEM.COMPOUND:644073'},
     'n01': {    0: 'NCBIGene:4988',
                 1: 'NCBIGene:1565',
                 2: 'NCBIGene:1565',
                 3: 'NCBIGene:1565',
                 4: 'NCBIGene:1565',
                 5: 'NCBIGene:1565',
                 6: 'NCBIGene:1565'},
     'n02': {    0: 'HP:0012164',
                 1: 'HP:0002174',
                 2: 'HP:0025387',
                 3: 'HP:0200085',
                 4: 'HP:0002345',
                 5: 'HP:0001337',
                 6: 'HP:0002322'}}
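
The loop above keeps only the first binding (`[0]`) for each query node. In this response each node has a single binding per result, so nothing is lost, but that is not guaranteed in general. A sketch that keeps every binding is shown below; `collect_node_bindings` is a hypothetical helper and the `EX:` identifiers are invented.

```python
def collect_node_bindings(results):
    """Return {query node: {result index: [all bound node IDs]}}."""
    collected = {}
    for index, result in enumerate(results):
        for qnode, bound in result["node_bindings"].items():
            collected.setdefault(qnode, {})[index] = [b["id"] for b in bound]
    return collected


# Toy results: the second result binds two nodes to n1
toy_results = [
    {"node_bindings": {"n0": [{"id": "EX:a"}], "n1": [{"id": "EX:b"}]}},
    {"node_bindings": {"n0": [{"id": "EX:a"}], "n1": [{"id": "EX:c"}, {"id": "EX:d"}]}},
]

print(collect_node_bindings(toy_results))
```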


In [111]:
# Manually extract Buprenorphine -> CYP2D6 -> Tremor from the results, then look up its edges in the knowledge graph
message = response.json()['message']  # parse the response once instead of on every lookup
kg = message['knowledge_graph']

# Index of the result whose n02 binding is Tremor (HP:0001337)
desired_result = [i for i in base_dict['n02'] if base_dict['n02'][i] == "HP:0001337"]
edge_bindings = message['results'][desired_result[0]]['edge_bindings']

# Map each query edge (e00, e01) to the knowledge graph edge IDs bound to it
edge_ids = {edge_name: [x['id'] for x in edge_list] for edge_name, edge_list in edge_bindings.items()}

for edge_name, desired_edge_list in edge_ids.items():
    print(f"{edge_name}: {desired_edge_list}")
    for edge_id in desired_edge_list:
        kg_edge = kg['edges'][edge_id]
        # Names are looked up from the knowledge graph nodes; avoid shadowing the built-in `object`
        subject_name = kg['nodes'][kg_edge['subject']]['name']
        object_name = kg['nodes'][kg_edge['object']]['name']
        print(f"{subject_name} -> {kg_edge['predicate']} -> {object_name}")
        if edge_name == 'e01':
            # Show provenance for the gene -> phenotype edges
            print(kg_edge['attributes'])
    print("\n")

e00: ['115753223', '29138268', '8786159']
Buprenorphine -> biolink:affects -> CYP2D6
Buprenorphine -> biolink:directly_physically_interacts_with -> CYP2D6
CYP2D6 -> biolink:affects -> Buprenorphine


e01: ['104677555', '14735496']
CYP2D6 -> biolink:genetic_association -> Tremor
[{'attribute_type_id': 'biolink:primary_knowledge_source', 'value': 'DisGeNET', 'value_type_id': 'biolink:InformationResource', 'original_attribute_name': 'biolink:primary_knowledge_source', 'value_url': None, 'attribute_source': 'infores:automat-robokop', 'description': None, 'attributes': None}, {'attribute_type_id': 'biolink:aggregator_knowledge_source', 'value': ['infores:pharos', 'infores:automat-robokop'], 'value_type_id': 'biolink:InformationResource', 'original_attribute_name': 'biolink:aggregator_knowledge_source', 'value_url': None, 'attribute_source': 'infores:automat-robokop', 'description': None, 'attributes': None}]
CYP2D6 -> biolink:genetic_association -> Tremor
[{'attribute_type_id': 'biolink:pri

In [112]:
import copy
output_dict = copy.deepcopy(base_dict)
edge_list = [x for x in output_dict.keys() if 'e' in x]
#print(edge_list)

# - convert and combine edge_binding IDs for each result
# for edge_id in edge_list:
#     for edge_index, edge_number in output_dict[edge_id].items():
#         # Getting the predicate/label using the edge_binding ID from the knowledge graph
#         output_dict[edge_id][edge_index] = response.json()['message']['knowledge_graph']['edges'][edge_number]['predicate']

# pp.pprint(output_dict)

In [113]:
node_list = [x for x in output_dict.keys() if 'n' in x]

for node_id in node_list:
    for node_index, node_value in output_dict[node_id].items():
        # Getting the 'name' label using the node ID from the knowledge graph
        output_dict[node_id][node_index] = response.json()['message']['knowledge_graph']['nodes'][node_value]['name']
        
pp.pprint(output_dict)

{    'n00': {    0: 'Buprenorphine',
                 1: 'Buprenorphine',
                 2: 'Buprenorphine',
                 3: 'Buprenorphine',
                 4: 'Buprenorphine',
                 5: 'Buprenorphine',
                 6: 'Buprenorphine'},
     'n01': {    0: 'OPRM1',
                 1: 'CYP2D6',
                 2: 'CYP2D6',
                 3: 'CYP2D6',
                 4: 'CYP2D6',
                 5: 'CYP2D6',
                 6: 'CYP2D6'},
     'n02': {    0: 'Asterixis',
                 1: 'Postural tremor',
                 2: 'Pill-rolling tremor',
                 3: 'Limb tremor',
                 4: 'Action tremor',
                 5: 'Tremor',
                 6: 'Resting tremor'}}


In [None]:
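import os
import pandas as pd

# Convert the nested dictionary to a CSV file.
# Building the DataFrame directly gives the same layout as a JSON round trip
# (columns n00/n01/n02, one row per result).
df = pd.DataFrame(output_dict)
os.makedirs('output', exist_ok=True)  # to_csv does not create missing directories
df.to_csv('output/results_TRAPI_automat.csv')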

The results above are plain database matches, with no scores or other additions.  You can instead send the TRAPI query to the ROBOKOP application through Aragorn (rather than directly to the graph):

In [115]:
ara_robokop_submit_url = "https://aragorn.renci.org/robokop/query"
response = requests.post(ara_robokop_submit_url,json=query)

KeyboardInterrupt: 

In [50]:
response.status_code

500

In [48]:
# The Aragorn response was stored in `response` above, not `response_ara`
len(response.json()['message']['results'])

In [None]:
pp.pprint(response.json()['message']['results'])

In [None]:
for result in response.json()['message']['results']:
    print(result['score'])
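
Once scores are available, results can be ranked by them. The sketch below assumes each result carries a top-level `score` key, as the loop above reads it, and places results without a score last; `rank_results` is a hypothetical helper and the toy results are invented.

```python
def rank_results(results):
    """Sort results by 'score', highest first; results without a score go last."""
    def sort_key(result):
        score = result.get("score")
        # The first tuple element separates scored from unscored results
        return (score is not None, score if score is not None else 0.0)
    return sorted(results, key=sort_key, reverse=True)


# Toy results with invented labels and scores
toy_scored = [
    {"label": "a", "score": 0.2},
    {"label": "b"},
    {"label": "c", "score": 0.9},
]

for result in rank_results(toy_scored):
    print(result["label"], result.get("score"))
```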