In [1]:
import requests
import json

TRAPI Documentation: https://github.com/NCATSTranslator/ReasonerAPI

Most TRAPI documents contain a `message` key. Within that `message` are a `query_graph` describing the user's query, a `knowledge_graph` containing the union of all nodes and edges that match the `query_graph` pattern, and a list of `results`, each of which binds `query_graph` elements to `knowledge_graph` elements.

The following message contains only a `query_graph`. This query graph consists of three nodes connected in a line. Two of the nodes (`n00` and `n02`) have specified identifiers; the middle node (`n01`) does not, and instead lists the `categories` that are acceptable for it.

A researcher who starts from a name rather than an identifier can use the Name Resolver tool to find identifiers for the nodes. A lookup for "tremor" is shown further below.

This query asks: "Find me a Biological Process or Activity, a Gene, or a Pathway that is related to both `PUBCHEM.COMPOUND:644073` (Buprenorphine) and `HP:0001337` (Tremor)."
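The query graph described above is a simple chain of nodes. As a minimal sketch (the helper `build_path_query` is hypothetical, not part of any Translator library), the same structure can be generated from a list of node specifications, which may be handy when swapping in different start or end nodes. It assumes every edge uses the same predicate, as in this query.

```python
def build_path_query(nodes, predicate="biolink:related_to"):
    """Build a TRAPI message whose query-graph nodes form a simple chain.

    `nodes` is a list of dicts with optional "ids" and "categories" keys.
    Nodes are named n00, n01, ... and edges e00, e01, ..., each edge
    pointing from one node to the next with the same `predicate`.
    """
    qnodes = {}
    for i, spec in enumerate(nodes):
        qnode = {}
        if spec.get("ids"):
            qnode["ids"] = list(spec["ids"])
        if spec.get("categories"):
            qnode["categories"] = list(spec["categories"])
        qnodes[f"n{i:02d}"] = qnode

    qedges = {}
    for i in range(len(nodes) - 1):
        qedges[f"e{i:02d}"] = {
            "subject": f"n{i:02d}",
            "object": f"n{i + 1:02d}",
            "predicates": [predicate],
        }

    return {"message": {"query_graph": {"nodes": qnodes, "edges": qedges}}}


# Rebuild the Buprenorphine / Tremor query used in this notebook
generated_query = build_path_query([
    {"ids": ["PUBCHEM.COMPOUND:644073"], "categories": ["biolink:ChemicalEntity"]},
    {"categories": ["biolink:BiologicalProcessOrActivity", "biolink:Gene", "biolink:Pathway"]},
    {"ids": ["HP:0001337"], "categories": ["biolink:DiseaseOrPhenotypicFeature"]},
])
```

Under these assumptions, `generated_query` should compare equal (as a dictionary) to the hand-written `query` in the next cell.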

In [2]:
query = {
    "message": {
        "query_graph": {
            "edges": {
                "e00": {
                    "subject": "n00",
                    "object": "n01",
                    "predicates": ["biolink:related_to"]
                },
                "e01": {
                    "subject": "n01",
                    "object": "n02",
                    "predicates": ["biolink:related_to"]
                }
            },
            "nodes": {
                "n00": {
                    "ids": ["PUBCHEM.COMPOUND:644073"],
                    "categories": ["biolink:ChemicalEntity"]
                },
                "n01": {
                    "categories": ["biolink:BiologicalProcessOrActivity", "biolink:Gene", "biolink:Pathway"]
                },
                "n02": {
                    "ids": ["HP:0001337"],
                    "categories": ["biolink:DiseaseOrPhenotypicFeature"]
                }
            }
        }
    }
}


This query can be sent to various components of Translator as needed.  It can be sent directly to the ROBOKOP database like this:

In [3]:
robokop_submit_url = "https://automat.renci.org/robokopkg/1.3/query"
response = requests.post(robokop_submit_url,json=query)

In [4]:
print(response.status_code)

200


In [5]:
print(len(response.json()['message']['results']))

7


In [6]:
import pprint
pp = pprint.PrettyPrinter(indent=5)

Parsed with `response.json()`, the response is a Python dictionary with three top-level keys: `message`, `log_level`, and `workflow`. The `message` contains the `query_graph`, `knowledge_graph`, and `results` from the query.

In [7]:
print(response.json().keys())
print(response.json()['message'].keys())

dict_keys(['message', 'log_level', 'workflow'])
dict_keys(['query_graph', 'knowledge_graph', 'results'])
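A quick size check of each section can help confirm that a query returned something before digging into it. The sketch below uses a hypothetical helper, `summarize_message`, and treats a missing or `null` section as empty, since TRAPI responses do not always fill in every section.

```python
def summarize_message(message):
    """Count the main pieces of a TRAPI message; missing or null sections count as 0."""
    qg = message.get("query_graph") or {}
    kg = message.get("knowledge_graph") or {}
    results = message.get("results") or []
    return {
        "query_nodes": len(qg.get("nodes") or {}),
        "query_edges": len(qg.get("edges") or {}),
        "kg_nodes": len(kg.get("nodes") or {}),
        "kg_edges": len(kg.get("edges") or {}),
        "results": len(results),
    }
```

Usage: `summarize_message(response.json()['message'])`.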


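The `results` component lists the answers to the query, each one a path through the knowledge graph that matches the query graph. Each result is organized into `edge_bindings` and `node_bindings`, which map each edge and node of the query graph to the knowledge-graph elements that fulfill it; a single query element can be bound to several knowledge-graph elements (in the first result below, `e00` is bound to six edges). Results refer to nodes and edges only by identifier; the details for each identifier are in the `knowledge_graph` component of the `message`.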
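Because each binding is a list, it can be useful to collect every bound ID rather than just the first. A minimal sketch, using a hypothetical helper `result_bindings` that assumes the binding layout shown in the output below (a list of dicts, each with an `id`):

```python
def result_bindings(result):
    """Map each query-graph key (e.g. 'n01', 'e00') to *all* IDs bound to it in one result."""
    bound = {}
    for section in ("node_bindings", "edge_bindings"):
        for qg_key, bindings in (result.get(section) or {}).items():
            bound[qg_key] = [binding["id"] for binding in bindings]
    return bound
```

Usage: `result_bindings(response.json()['message']['results'][0])['e00']` should give the list of edge IDs bound to `e00` in the first result.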

In [8]:
pp.pprint(response.json()['message']['results'][0])

{    'edge_bindings': {    'e00': [    {'attributes': None, 'id': '70512752'},
                                       {'attributes': None, 'id': '53738502'},
                                       {'attributes': None, 'id': '114867359'},
                                       {'attributes': None, 'id': '34805007'},
                                       {'attributes': None, 'id': '52520264'},
                                       {'attributes': None, 'id': '17306890'}],
                           'e01': [{'attributes': None, 'id': '122503186'}]},
     'node_bindings': {    'n00': [    {    'attributes': None,
                                            'id': 'PUBCHEM.COMPOUND:644073',
                                            'qnode_id': 'PUBCHEM.COMPOUND:644073',
                                            'query_id': None}],
                           'n01': [    {    'attributes': None,
                                            'id': 'NCBIGene:4988',
                           

The `knowledge_graph` contains information about each of the nodes and edges referenced in `results`. Examples of a node and an edge are shown below.

In [9]:
pp.pprint(response.json()['message']['knowledge_graph'].keys())


dict_keys(['nodes', 'edges'])


Information returned for each node includes its concept ID (the dictionary key), its Biolink categories, its name, and a list of attributes, each with a type, a value, and a value type. Note that `nodes` is a dictionary keyed by concept ID, not a list. The content for one node is shown below.

In [10]:
next(iter(response.json()['message']['knowledge_graph']['nodes'].items()))

('HP:0012164',
 {'categories': ['biolink:Entity',
   'biolink:ThingWithTaxon',
   'biolink:DiseaseOrPhenotypicFeature',
   'biolink:PhenotypicFeature',
   'biolink:BiologicalEntity',
   'biolink:NamedThing'],
  'name': 'Asterixis',
  'attributes': [{'attribute_type_id': 'biolink:same_as',
    'value': ['UMLS:C0232766',
     'MEDDRA:10003547',
     'HP:0012164',
     'MEDDRA:10057580',
     'SNOMEDCT:32838008',
     'NCIT:C86048'],
    'value_type_id': 'metatype:uriorcurie',
    'original_attribute_name': 'equivalent_identifiers',
    'value_url': None,
    'attribute_source': None,
    'description': None,
    'attributes': None},
   {'attribute_type_id': 'biolink:Attribute',
    'value': 100.0,
    'value_type_id': 'EDAM:data_0006',
    'original_attribute_name': 'information_content',
    'value_url': None,
    'attribute_source': None,
    'description': None,
    'attributes': None}]})
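Because `nodes` is keyed by CURIE, looking up a label is a plain dictionary access. The sketch below wraps that lookup with a fallback to the CURIE itself (in case a node has no `name`) and also collects the synonymous identifiers stored in the `biolink:same_as` attribute shown above. Both helper names are hypothetical, and the attribute layout is assumed from this ROBOKOP response; other sources may structure attributes differently.

```python
def node_label(kg, curie):
    """Return a knowledge-graph node's 'name', falling back to the CURIE if absent."""
    node = (kg.get("nodes") or {}).get(curie) or {}
    return node.get("name") or curie


def equivalent_ids(node):
    """Collect the values of any 'biolink:same_as' attributes on a node."""
    ids = []
    for attr in node.get("attributes") or []:
        if attr.get("attribute_type_id") == "biolink:same_as":
            value = attr.get("value")
            # The value is a list in the output above; accept a single value too
            ids.extend(value if isinstance(value, list) else [value])
    return ids
```

Usage: with `kg = response.json()['message']['knowledge_graph']`, `node_label(kg, 'HP:0012164')` returns the node's name, and `equivalent_ids(kg['nodes']['HP:0012164'])` returns its `same_as` identifiers.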

Information returned for each edge includes its edge ID (the dictionary key), the subject's concept ID, the object's concept ID, the predicate, any qualifiers, and attributes such as knowledge sources. As with nodes, `edges` is a dictionary keyed by edge ID, not a list. The content for one edge is shown below.

In [11]:
next(iter(response.json()['message']['knowledge_graph']['edges'].items()))

('104671278',
 {'subject': 'NCBIGene:1565',
  'object': 'HP:0001337',
  'predicate': 'biolink:genetic_association',
  'qualifiers': None,
  'attributes': [{'attribute_type_id': 'biolink:primary_knowledge_source',
    'value': ['DisGeNET'],
    'value_type_id': 'biolink:InformationResource',
    'original_attribute_name': 'biolink:primary_knowledge_source',
    'value_url': None,
    'attribute_source': 'infores:automat-robokop',
    'description': None,
    'attributes': None},
   {'attribute_type_id': 'biolink:aggregator_knowledge_source',
    'value': ['infores:automat-robokop'],
    'value_type_id': 'biolink:InformationResource',
    'original_attribute_name': 'biolink:aggregator_knowledge_source',
    'value_url': None,
    'attribute_source': 'infores:automat-robokop',
    'description': None,
    'attributes': None},
   {'attribute_type_id': 'biolink:aggregator_knowledge_source',
    'value': ['infores:pharos'],
    'value_type_id': 'biolink:InformationResource',
    'original_at
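Each knowledge source appears as a separate attribute, so a single edge can carry more than one aggregator source (the edge above lists both `infores:automat-robokop` and `infores:pharos`). A minimal sketch for grouping them by type, using a hypothetical helper `knowledge_sources` and assuming the attribute layout shown above:

```python
def knowledge_sources(edge):
    """Group an edge's knowledge-source attribute values by attribute type."""
    source_types = {
        "biolink:primary_knowledge_source",
        "biolink:aggregator_knowledge_source",
    }
    sources = {}
    for attr in edge.get("attributes") or []:
        attr_type = attr.get("attribute_type_id")
        if attr_type in source_types:
            value = attr.get("value")
            # Values are lists in this response; wrap a bare value just in case
            values = value if isinstance(value, list) else [value]
            sources.setdefault(attr_type, []).extend(values)
    return sources
```

Usage: `knowledge_sources(kg['edges']['104671278'])`, where `kg` is the `knowledge_graph` dictionary.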

Because results refer to nodes by CURIEs (Compact URIs) and to edges by edge IDs rather than by labels, a sample workflow was written to show how to use the `knowledge_graph` to convert results into a human-readable form. This workflow does the following steps:
- collect the edge-binding and node-binding IDs for each result (only the first binding of each query element is kept)
- convert edge IDs to their predicates, and node IDs to their names, using the `knowledge_graph`
- write the output dictionary as a CSV
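The steps above can also be condensed into two functions. This is a sketch rather than the notebook's own method: the helper names are hypothetical, it uses the standard-library `csv` module instead of pandas, and, unlike the cells below (which keep only the first binding), it keeps every binding by joining labels with `"; "`. Nodes without a `name` (or edges without a `predicate`) fall back to their IDs.

```python
import csv
import io


def results_to_rows(message, sep="; "):
    """Flatten TRAPI results into one row per result, using node names and edge predicates."""
    kg = message.get("knowledge_graph") or {}
    kg_nodes = kg.get("nodes") or {}
    kg_edges = kg.get("edges") or {}
    rows = []
    for result in message.get("results") or []:
        row = {}
        for qnode, bindings in (result.get("node_bindings") or {}).items():
            row[qnode] = sep.join(
                (kg_nodes.get(b["id"]) or {}).get("name") or b["id"] for b in bindings
            )
        for qedge, bindings in (result.get("edge_bindings") or {}).items():
            row[qedge] = sep.join(
                (kg_edges.get(b["id"]) or {}).get("predicate") or b["id"] for b in bindings
            )
        rows.append(row)
    return rows


def rows_to_csv(rows):
    """Render rows as CSV text with sorted columns (e00, e01, n00, ...)."""
    columns = sorted({key for row in rows for key in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Usage: `rows = results_to_rows(response.json()['message'])`, then write `rows_to_csv(rows)` to a file. Joining all bindings keeps each result on one row while not silently dropping alternative edges.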

In [12]:
# Query-graph keys bound in the first result (e.g. 'e00', 'n00')
edges = response.json()['message']['results'][0]['edge_bindings'].keys()
nodes = response.json()['message']['results'][0]['node_bindings'].keys()
edges_nodes = sorted(edges | nodes)

# Drop any key starting with 's' (none are present for this query)
edges_nodes_keys = [s for s in edges_nodes if not s.startswith('s')]
print(edges_nodes_keys)

# Initializing a base dictionary containing edge and node IDs from message results
base_dict = {}
for key in edges_nodes_keys:
    base_dict[key] = {}
print(base_dict)

['e00', 'e01', 'n00', 'n01', 'n02']
{'e00': {}, 'e01': {}, 'n00': {}, 'n01': {}, 'n02': {}}


In [14]:
# Populate base_dict with entries from response.json()['message']['results'].
# Only the first binding ([0]) of each query-graph element is kept per result.
trapi_results = response.json()['message']['results']
for i, entry in enumerate(trapi_results):
    for edge_id in edges:
        base_dict[edge_id][i] = entry['edge_bindings'][edge_id][0]['id']
    for node_id in nodes:
        base_dict[node_id][i] = entry['node_bindings'][node_id][0]['id']

pp.pprint(base_dict)

{    'e00': {    0: '70512752',
                 1: '29152440',
                 2: '29152440',
                 3: '29152440',
                 4: '29152440',
                 5: '29152440',
                 6: '29152440'},
     'e01': {    0: '122503186',
                 1: '36237079',
                 2: '40300787',
                 3: '100241886',
                 4: '14737677',
                 5: '93297435',
                 6: '132474660'},
     'n00': {    0: 'PUBCHEM.COMPOUND:644073',
                 1: 'PUBCHEM.COMPOUND:644073',
                 2: 'PUBCHEM.COMPOUND:644073',
                 3: 'PUBCHEM.COMPOUND:644073',
                 4: 'PUBCHEM.COMPOUND:644073',
                 5: 'PUBCHEM.COMPOUND:644073',
                 6: 'PUBCHEM.COMPOUND:644073'},
     'n01': {    0: 'NCBIGene:4988',
                 1: 'NCBIGene:1565',
                 2: 'NCBIGene:1565',
                 3: 'NCBIGene:1565',
                 4: 'NCBIGene:1565',
                 5: 'NCBIGene:15

In [15]:
import copy

output_dict = copy.deepcopy(base_dict)
kg = response.json()['message']['knowledge_graph']  # parse the JSON once, not per lookup
edge_list = [x for x in output_dict if x.startswith('e')]

# Replace each edge-binding ID with its predicate from the knowledge graph
for edge_id in edge_list:
    for edge_index, edge_number in output_dict[edge_id].items():
        output_dict[edge_id][edge_index] = kg['edges'][edge_number]['predicate']

pp.pprint(output_dict)

{    'e00': {    0: 'biolink:affects',
                 1: 'biolink:directly_physically_interacts_with',
                 2: 'biolink:directly_physically_interacts_with',
                 3: 'biolink:directly_physically_interacts_with',
                 4: 'biolink:directly_physically_interacts_with',
                 5: 'biolink:directly_physically_interacts_with',
                 6: 'biolink:directly_physically_interacts_with'},
     'e01': {    0: 'biolink:genetic_association',
                 1: 'biolink:genetic_association',
                 2: 'biolink:genetic_association',
                 3: 'biolink:genetic_association',
                 4: 'biolink:genetic_association',
                 5: 'biolink:genetic_association',
                 6: 'biolink:genetic_association'},
     'n00': {    0: 'PUBCHEM.COMPOUND:644073',
                 1: 'PUBCHEM.COMPOUND:644073',
                 2: 'PUBCHEM.COMPOUND:644073',
                 3: 'PUBCHEM.COMPOUND:644073',
                 4

In [16]:
kg = response.json()['message']['knowledge_graph']  # parse the JSON once, not per lookup
node_list = [x for x in output_dict if x.startswith('n')]

# Replace each node ID with its 'name' label from the knowledge graph
for node_id in node_list:
    for node_index, node_value in output_dict[node_id].items():
        output_dict[node_id][node_index] = kg['nodes'][node_value]['name']

pp.pprint(output_dict)

{    'e00': {    0: 'biolink:affects',
                 1: 'biolink:directly_physically_interacts_with',
                 2: 'biolink:directly_physically_interacts_with',
                 3: 'biolink:directly_physically_interacts_with',
                 4: 'biolink:directly_physically_interacts_with',
                 5: 'biolink:directly_physically_interacts_with',
                 6: 'biolink:directly_physically_interacts_with'},
     'e01': {    0: 'biolink:genetic_association',
                 1: 'biolink:genetic_association',
                 2: 'biolink:genetic_association',
                 3: 'biolink:genetic_association',
                 4: 'biolink:genetic_association',
                 5: 'biolink:genetic_association',
                 6: 'biolink:genetic_association'},
     'n00': {    0: 'Buprenorphine',
                 1: 'Buprenorphine',
                 2: 'Buprenorphine',
                 3: 'Buprenorphine',
                 4: 'Buprenorphine',
                 5: '
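The predicate columns above do not say which node is the subject of each knowledge-graph edge, and a bound edge's direction may not match the direction of the query edge (for example with a symmetric predicate such as `biolink:related_to`). A minimal sketch that renders one edge as a readable triple, using a hypothetical helper `describe_edge` and falling back to CURIEs when a node has no name:

```python
def describe_edge(kg, edge_id):
    """Render a knowledge-graph edge as 'subject name -[predicate]-> object name'."""
    edge = kg["edges"][edge_id]
    nodes = kg.get("nodes") or {}

    def label(curie):
        # Fall back to the CURIE when the node is missing or unnamed
        return (nodes.get(curie) or {}).get("name") or curie

    # Drop the 'biolink:' prefix for readability (keeps the string if there is no prefix)
    predicate = edge["predicate"].split(":", 1)[-1]
    return f"{label(edge['subject'])} -[{predicate}]-> {label(edge['object'])}"
```

Usage: `describe_edge(response.json()['message']['knowledge_graph'], '104671278')`.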

In [19]:
import os
import pandas as pd

# Build the DataFrame directly from the dict: the outer keys (e00, n00, ...)
# become columns and the result indices become rows.
df = pd.DataFrame(output_dict)

# to_csv does not create missing directories, so create 'output' first
os.makedirs('output', exist_ok=True)
df.to_csv('output/results_TRAPI_automat.csv')

A separate tool, the Name Resolver (https://name-resolution-sri.renci.org/docs), has a `lookup` endpoint that takes a string and returns candidate identifiers. Here, we look up the string "tremor".
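A lookup usually returns several candidates, so some rule is needed to pick one. A minimal sketch, assuming the response maps each CURIE to a list of synonym strings (the shape printed below; newer deployments of the service may return a different shape): the hypothetical helper `pick_curie` prefers a CURIE whose synonyms include an exact, case-insensitive match and otherwise falls back to the first candidate.

```python
def pick_curie(lookup_results, search_string):
    """Pick a CURIE from a {curie: [synonyms]} lookup response.

    Prefers an exact case-insensitive synonym match; otherwise returns the
    first CURIE, or None if there are no candidates.
    """
    wanted = search_string.casefold()
    for curie, synonyms in lookup_results.items():
        if any(name.casefold() == wanted for name in synonyms):
            return curie
    return next(iter(lookup_results), None)
```

Usage: `pick_curie(results.json(), search_string)` after the lookup request.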

In [11]:
search_string = 'tremor'
# Pass query parameters via `params` so requests URL-encodes them
results = requests.post(
    'https://name-resolution-sri.renci.org/lookup',
    params={'string': search_string, 'offset': 0, 'limit': 10},
)

In [12]:
print(json.dumps(results.json(),indent=4))

{
    "HP:0001337": [
        "Tremor",
        "tremor",
        "TREMOR",
        "TREMORS",
        "Tremors",
        "tremors",
        "d tremors",
        "Tremor NOS",
        "Tremor, NOS",
        "Has a tremor",
        "A46-A47 TREMORS",
        "Shaking/Tremors",
        "Tremor (finding)",
        "tremors as symptom",
        "tremor (diagnosis)",
        "Tremor, unspecified",
        "tremor (physical finding)",
        "motor exam involuntary movements tremor trembles",
        "involuntary shaking or trembling movements (tremor)",
        "shake",
        "shakes",
        "quiver",
        "Shakes",
        "tremble",
        "Shaking",
        "shaking",
        "Tremble",
        "quivers",
        "SHAKING",
        "Trembled",
        "Quivered",
        "trembles",
        "Trembles",
        "TREMBLING",
        "TREMULOUS",
        "Trembling",
        "trembling",
        "Quivering",
        "quivering",
        "tremulous",
        "The shakes",
        "t

The Node Normalizer (https://nodenormalization-sri.renci.org/docs) takes CURIEs and returns all other CURIEs that are synonymous with the input. It also returns labels for the node, the node's Biolink classes, and often the node's information content.
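The normalizer response is keyed by each input CURIE, with the preferred identifier under `id` and the synonyms under `equivalent_identifiers` (see the output below). A minimal sketch that pulls those pieces out; the helper `summarize_normalized` is hypothetical, and it assumes that CURIEs the service does not recognize come back as `null` (mapped to `None` here).

```python
def summarize_normalized(nn_response):
    """Extract the preferred ID, its label, and all equivalent IDs for each input CURIE."""
    summary = {}
    for curie, entry in nn_response.items():
        if entry is None:
            # Assumed: unrecognized CURIEs are returned as null
            summary[curie] = None
            continue
        preferred = entry.get("id") or {}
        summary[curie] = {
            "preferred_id": preferred.get("identifier"),
            "label": preferred.get("label"),
            "equivalent_ids": [
                eq["identifier"] for eq in entry.get("equivalent_identifiers") or []
            ],
        }
    return summary
```

Usage: `summarize_normalized(results.json())` after the normalizer request.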

In [13]:
nn_query = {
  "curies": [
    "NCBIGene:4988",
  ],
  "conflate": True
}
results = requests.post('https://nodenormalization-sri.renci.org/get_normalized_nodes',json=nn_query)

In [14]:
print(json.dumps(results.json(),indent=4))

{
    "NCBIGene:4988": {
        "id": {
            "identifier": "NCBIGene:4988",
            "label": "OPRM1"
        },
        "equivalent_identifiers": [
            {
                "identifier": "NCBIGene:4988",
                "label": "OPRM1"
            },
            {
                "identifier": "ENSEMBL:ENSG00000112038"
            },
            {
                "identifier": "HGNC:8156",
                "label": "OPRM1"
            },
            {
                "identifier": "OMIM:600018"
            },
            {
                "identifier": "UMLS:C1417965",
                "label": "OPRM1 gene"
            },
            {
                "identifier": "UniProtKB:B8K2Q5",
                "label": "B8K2Q5_HUMAN Mu opioid receptor splice variant MOR-1H (Fragment) (trembl)"
            },
            {
                "identifier": "UniProtKB:G8XRH4",
                "label": "G8XRH4_HUMAN Mu opioid receptor splice variant hMOR-1S (trembl)"
            },
    

The ROBOKOP KG results above are direct database matches, with no scores or other additions. Alternatively, the TRAPI query can be sent to the ROBOKOP application through Aragorn (rather than directly to the graph), which scores the results:

In [22]:
ara_robokop_submit_url = "https://aragorn.renci.org/robokop/query"
response_ara = requests.post(ara_robokop_submit_url, json=query)

In [23]:
response_ara.status_code

500
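An HTTP 500 status means the request failed on the server, so the response has no `results` to read, which is consistent with the `KeyError` in the next cell. A small guard makes that failure explicit. This is a sketch with a hypothetical helper name; it works with any object exposing `.status_code` and `.json()`, such as the `requests.Response` returned above.

```python
def extract_results(response):
    """Return message['results'] from a TRAPI response, or raise a clear error."""
    if response.status_code != 200:
        # Don't try to parse results from a failed request
        raise RuntimeError(f"TRAPI request failed with HTTP {response.status_code}")
    message = (response.json() or {}).get("message") or {}
    if message.get("results") is None:
        raise KeyError("TRAPI message has no 'results'")
    return message["results"]
```

Usage: `ara_results = extract_results(response_ara)`; on a 500 this raises a `RuntimeError` naming the status code instead of a bare `KeyError`.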

In [16]:
len(response_ara.json()['message']['results'])

KeyError: 'results'

In [12]:
pp.pprint(response_ara.json()['message']['results'])

[    {    'edge_bindings': {    'e00': [    {    'attributes': None,
                                                 'id': '8766602'},
                                            {    'attributes': None,
                                                 'id': '29152440'},
                                            {    'attributes': None,
                                                 'id': '115770418'}],
                                'e01': [    {    'attributes': None,
                                                 'id': '14737677'},
                                            {    'attributes': None,
                                                 'id': '104671278'}]},
          'node_bindings': {    'n00': [    {    'attributes': None,
                                                 'id': 'PUBCHEM.COMPOUND:644073',
                                                 'qnode_id': 'PUBCHEM.COMPOUND:644073',
                                                 'query_id': None}],
   

In [15]:
for result in response_ara.json()['message']['results']:
    print(result['score'])

0.17813345082933518
0.1425630949413046
0.13187243298012225
0.11703906793268541
0.11044291163893882
0.11020806196762686
0.10694929389416649
0.1062578101411532
0.10610022630555455
0.10201048277292121
0.10139427809144123
0.10108536708640792
0.1010853670864079
0.1010853670864079
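The scores above already appear in descending order, but sorting explicitly avoids relying on that. A minimal sketch with a hypothetical helper `top_results`; it reads only the result-level `score` field used in this response (in later TRAPI versions, scores live inside each result's `analyses` instead) and places unscored results last.

```python
def top_results(message, k=5):
    """Return up to k results sorted by descending 'score'; unscored results sort last."""
    results = message.get("results") or []
    return sorted(
        results,
        key=lambda r: r["score"] if r.get("score") is not None else float("-inf"),
        reverse=True,
    )[:k]
```

Usage: `top_results(response_ara.json()['message'], k=3)`.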
