In [1]:
import requests
import json

TRAPI Documentation: https://github.com/NCATSTranslator/ReasonerAPI

Most TRAPI documents contain a `message` key.  Within that `message` are a `query_graph` denoting the user query,
a `knowledge_graph` consisting of the union of all nodes and edges that match the `query_graph` pattern, and a list of `results` that bind `query_graph` elements to `knowledge_graph` elements.
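To make the three parts concrete, here is a hand-written toy message in the TRAPI 1.3 layout used in this notebook. The identifiers (`EX:A`, `EX:B`, `kg_edge_1`) and names are invented placeholders, not real data, and `kg_nodes_for` is a hypothetical helper. It follows one result's binding for a `query_graph` node into the `knowledge_graph`.

```python
# Toy TRAPI-style message (TRAPI 1.3 layout as used in this notebook).
# All identifiers and names are invented placeholders for illustration.
toy_message = {
    "message": {
        "query_graph": {
            "nodes": {
                "n00": {"ids": ["EX:A"]},
                "n01": {"categories": ["biolink:Gene"]},
            },
            "edges": {
                "e00": {"subject": "n00", "object": "n01",
                        "predicates": ["biolink:related_to"]},
            },
        },
        "knowledge_graph": {
            "nodes": {
                "EX:A": {"name": "example gene A"},
                "EX:B": {"name": "example gene B"},
            },
            "edges": {
                "kg_edge_1": {"subject": "EX:A", "predicate": "biolink:regulates",
                              "object": "EX:B"},
            },
        },
        "results": [
            {
                "node_bindings": {"n00": [{"id": "EX:A"}], "n01": [{"id": "EX:B"}]},
                "edge_bindings": {"e00": [{"id": "kg_edge_1"}]},
            }
        ],
    }
}


def kg_nodes_for(trapi_doc, qnode_id, result_index=0):
    """Follow one result's bindings for a query node into the knowledge_graph."""
    msg = trapi_doc["message"]
    bindings = msg["results"][result_index]["node_bindings"][qnode_id]
    return [msg["knowledge_graph"]["nodes"][b["id"]] for b in bindings]


print(kg_nodes_for(toy_message, "n01"))  # [{'name': 'example gene B'}]
```

The query node `n01` has no `ids` of its own; the result is what ties it to the concrete knowledge-graph node `EX:B`.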

The query message built later in this notebook contains only a `query_graph`. This query graph consists of 3 nodes connected in a line. The two end nodes (`n00` and `n02`) have specified identifiers, while the middle node (`n01`) does not; instead, it has a list of acceptable `categories`.
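A line-shaped query graph like this can also be generated programmatically. The function below is a hypothetical convenience helper (not part of any Translator library): it names the nodes `n00`, `n01`, ... in order, passes each node spec through unchanged, and joins consecutive nodes with one edge carrying a single predicate.

```python
def linear_query_graph(node_specs, predicate="biolink:related_to"):
    """Build a TRAPI query message whose nodes form a chain n00 - n01 - n02 ...

    node_specs: list of dicts such as {"ids": [...], "categories": [...]}.
    Consecutive nodes are joined by one edge carrying `predicate`.
    """
    nodes = {f"n{i:02d}": dict(spec) for i, spec in enumerate(node_specs)}
    node_ids = list(nodes)
    edges = {
        f"e{i:02d}": {
            "subject": node_ids[i],
            "object": node_ids[i + 1],
            "predicates": [predicate],
        }
        for i in range(len(node_ids) - 1)
    }
    return {"message": {"query_graph": {"nodes": nodes, "edges": edges}}}


# Example: pinned ends, categories-only middle node (IDs taken from this notebook).
qg = linear_query_graph([
    {"ids": ["NCBIGene:5465"], "categories": ["biolink:BiologicalEntity"]},
    {"categories": ["biolink:Gene", "biolink:Pathway"]},
    {"ids": ["HP:0001395"], "categories": ["biolink:DiseaseOrPhenotypicFeature"]},
])
print(list(qg["message"]["query_graph"]["edges"]))  # ['e00', 'e01']
```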

A researcher who starts from a name and wants to use TRAPI can use the Name Resolver tool to get a list of candidate identifiers for the nodes. The example below finds IDs for the search string `ppara`.

In [56]:
search_string = 'ppara'
results = requests.post('https://name-resolution-sri.renci.org/lookup',
                        params={'string': search_string, 'offset': 0, 'limit': 10})
results_json = results.json()
#print(json.dumps(results_json,indent=4))

In [57]:
input_node_id_list = list(results_json.keys())
print(input_node_id_list)

['UniProtKB:P37230', 'UniProtKB:Q07869', 'UniProtKB:Q95N78', 'PR:000013056', 'UniProtKB:P23204', 'NCBIGene:19013', 'NCBIGene:25747', 'NCBIGene:5465', 'NCBIGene:557714', 'NCBIGene:30755']
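The Name Resolver response in this notebook is a dictionary keyed by CURIE, which is why `list(results_json.keys())` yields the IDs. The hypothetical helper below wraps that step. Its list branch (records carrying a `curie` key) is an assumption about other versions of the service, included only so the helper degrades gracefully if the response shape differs.

```python
def curies_from_lookup(payload):
    """Extract CURIEs from a Name Resolver /lookup response.

    Dict form (as in this notebook): {curie: [synonyms, ...], ...}.
    List form (an assumption about other service versions): [{"curie": ...}, ...].
    """
    if isinstance(payload, dict):
        return list(payload.keys())
    return [record["curie"] for record in payload if "curie" in record]


# Offline example shaped like the notebook's response (synonym lists abbreviated).
print(curies_from_lookup({"UniProtKB:P37230": ["PPARA"], "NCBIGene:5465": ["PPARA"]}))

# Live usage (requires network access):
# import requests
# resp = requests.post("https://name-resolution-sri.renci.org/lookup",
#                      params={"string": "ppara", "offset": 0, "limit": 10})
# resp.raise_for_status()
# print(curies_from_lookup(resp.json()))
```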


To confirm the label for each of these IDs, pass them to the Node Normalizer tool, which returns each ID's preferred identifier, label, equivalent identifiers, and Biolink types.

In [58]:
nn_query = {
  "curies": input_node_id_list,
  "conflate": True
}
results = requests.post('https://nodenormalization-sri.renci.org/get_normalized_nodes',json=nn_query)

In [59]:
print(json.dumps(results.json(),indent=4))

{
    "UniProtKB:P37230": {
        "id": {
            "identifier": "NCBIGene:25747",
            "label": "Ppara"
        },
        "equivalent_identifiers": [
            {
                "identifier": "NCBIGene:25747",
                "label": "Ppara"
            },
            {
                "identifier": "ENSEMBL:ENSRNOG00000021463"
            },
            {
                "identifier": "RGD:3369",
                "label": "Ppara"
            },
            {
                "identifier": "UniProtKB:P37230",
                "label": "PPARA_RAT Peroxisome proliferator-activated receptor alpha (sprot)"
            },
            {
                "identifier": "PR:P37230",
                "label": "peroxisome proliferator-activated receptor alpha (rat)"
            },
            {
                "identifier": "ENSEMBL:ENSRNOP00000038651"
            }
        ],
        "type": [
            "biolink:Gene",
            "biolink:GeneOrGeneProduct",
            "biolink:G
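Because the Node Normalizer was called with `"conflate": True`, an input CURIE can normalize to a different preferred identifier (above, `UniProtKB:P37230` maps to `NCBIGene:25747`). The hypothetical helper below reduces the response to `(preferred identifier, label)` pairs. It assumes that CURIEs the service cannot normalize come back as `None`; the `EXAMPLE:unknown` entry is an invented placeholder.

```python
def preferred_labels(nn_response):
    """Map each queried CURIE to (preferred identifier, label).

    Entries assumed to be None (un-normalizable CURIEs) map to (curie, None).
    The preferred id may lack a label, in which case the label is None.
    """
    out = {}
    for curie, record in nn_response.items():
        if record is None:
            out[curie] = (curie, None)
            continue
        preferred = record["id"]
        out[curie] = (preferred["identifier"], preferred.get("label"))
    return out


# Excerpt shaped like the output above, plus one placeholder unknown CURIE.
sample = {
    "UniProtKB:P37230": {
        "id": {"identifier": "NCBIGene:25747", "label": "Ppara"},
        "equivalent_identifiers": [{"identifier": "RGD:3369", "label": "Ppara"}],
        "type": ["biolink:Gene"],
    },
    "EXAMPLE:unknown": None,
}
labels = preferred_labels(sample)
print(labels)
# Distinct concepts after conflation:
print({pid for pid, _ in labels.values()})
```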

Next, find the IDs associated with the disease of interest, liver fibrosis:

In [112]:
search_string = 'liver fibrosis'
results = requests.post('https://name-resolution-sri.renci.org/lookup',
                        params={'string': search_string, 'offset': 0, 'limit': 10})
results_json = results.json()

In [113]:
output_node_id_list = list(results_json.keys())
print(output_node_id_list)

['HP:0001395', 'UMLS:C4227681', 'UMLS:C4034373', 'UMLS:C5189427', 'UMLS:C0544816', 'MONDO:0100430', 'MONDO:0018840', 'UMLS:C1397317', 'UMLS:C4068302', 'UMLS:C4481250']


In [114]:
nn_query = {
  "curies": output_node_id_list,
  "conflate": True
}
results = requests.post('https://nodenormalization-sri.renci.org/get_normalized_nodes',json=nn_query)
print(json.dumps(results.json(),indent=4))

{
    "HP:0001395": {
        "id": {
            "identifier": "HP:0001395",
            "label": "Hepatic fibrosis"
        },
        "equivalent_identifiers": [
            {
                "identifier": "HP:0001395",
                "label": "Hepatic fibrosis"
            },
            {
                "identifier": "NCIT:C168581",
                "label": "Liver Fibrosis"
            },
            {
                "identifier": "UMLS:C0239946",
                "label": "Fibrosis, Liver"
            },
            {
                "identifier": "MEDDRA:10016648"
            },
            {
                "identifier": "MEDDRA:10019668"
            },
            {
                "identifier": "SNOMEDCT:62484002"
            }
        ],
        "type": [
            "biolink:PhenotypicFeature",
            "biolink:DiseaseOrPhenotypicFeature",
            "biolink:ThingWithTaxon",
            "biolink:BiologicalEntity",
            "biolink:NamedThing",
            "bioli

This query asks: "Find a biological process or activity, a gene, or a pathway that is related to both one of the IDs found by searching the Name Resolver for `ppara` and one of the IDs found by searching for `liver fibrosis`."

In [119]:
query = {
    "message": {
        "query_graph": {
            "edges": {
                "e00": {
                    "subject": "n00",
                    "object": "n01",
                    "predicates": ["biolink:related_to"]
                },
                "e01": {
                    "subject": "n01",
                    "object": "n02",
                    "predicates": ["biolink:related_to"]
                }
            },
            "nodes": {
                "n00": {
                    "ids": input_node_id_list,
                    "categories": ["biolink:BiologicalEntity"]
                },
                "n01": {
                    # "categories": ["biolink:BiologicalEntity"]
                    "categories": ["biolink:BiologicalProcessOrActivity", "biolink:Gene", "biolink:Pathway"]
                },
                "n02": {
                    "ids": output_node_id_list,  # ["HP:0001395"],
                    # "categories": ["biolink:BiologicalEntity"]
                    "categories": ["biolink:DiseaseOrPhenotypicFeature"]
                }
            }
        }
    }
}
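A quick sanity check before sending: every edge's `subject` and `object` must name a node defined in the same query graph, and an empty `ids` list (for example, if a Name Resolver search returned nothing) is worth catching early. The hypothetical helper below checks only these two things; it is a minimal sketch, not a full TRAPI validator.

```python
def query_graph_problems(query):
    """Return human-readable problems found in a TRAPI query_graph.

    Checks only that edges reference defined nodes and that no node
    has an empty `ids` list.
    """
    qg = query["message"]["query_graph"]
    nodes, edges = qg.get("nodes", {}), qg.get("edges", {})
    problems = []
    for edge_id, edge in edges.items():
        for end in ("subject", "object"):
            if edge.get(end) not in nodes:
                problems.append(f"{edge_id}.{end} -> unknown node {edge.get(end)!r}")
    for node_id, node in nodes.items():
        if "ids" in node and not node["ids"]:
            problems.append(f"{node_id} has an empty ids list")
    return problems


# A deliberately broken example: e00 points at a node that does not exist.
bad = {"message": {"query_graph": {
    "nodes": {"n00": {"ids": ["HP:0001395"]}},
    "edges": {"e00": {"subject": "n00", "object": "n09"}},
}}}
print(query_graph_problems(bad))  # ["e00.object -> unknown node 'n09'"]
# In the notebook, query_graph_problems(query) should return [] for the query above.
```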

This query can be sent to various Translator components as needed. For example, it can be sent directly to the ROBOKOP knowledge graph through its Automat TRAPI endpoint:

In [137]:
robokop_submit_url = "https://automat.renci.org/robokopkg/1.3/query"
response = requests.post(robokop_submit_url,json=query)

In [138]:
print(response.status_code)

200


In [139]:
print(len(response.json()['message']['results']))

36


In [94]:
import pprint
pp = pprint.PrettyPrinter(indent=5)

The parsed JSON response is a Python dictionary with three top-level keys: `message`, `log_level`, and `workflow`. The `message` component contains the `query_graph`, `knowledge_graph`, and `results` of the query.

In [95]:
print(response.json().keys())
print(response.json()['message'].keys())

dict_keys(['message', 'log_level', 'workflow'])
dict_keys(['query_graph', 'knowledge_graph', 'results'])
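A compact way to check a response at a glance is to count its parts. The hypothetical helper below does that for a parsed response dict; it tolerates missing or `null` sections, which can matter when a service returns an error payload. The example data is a toy dict, not real output.

```python
def summarize_response(resp_json):
    """Count the main parts of a parsed TRAPI response (e.g. response.json())."""
    msg = resp_json.get("message") or {}
    kg = msg.get("knowledge_graph") or {}
    return {
        "results": len(msg.get("results") or []),
        "kg_nodes": len(kg.get("nodes") or {}),
        "kg_edges": len(kg.get("edges") or {}),
        "log_level": resp_json.get("log_level"),
    }


example = {
    "message": {
        "query_graph": {"nodes": {}, "edges": {}},
        "knowledge_graph": {"nodes": {"A:1": {}, "B:2": {}}, "edges": {"k1": {}}},
        "results": [{}],
    },
    "log_level": None,
    "workflow": None,
}
print(summarize_response(example))
# In the notebook: summarize_response(response.json())
```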


The `results` component contains one entry per match of the query graph (here, each match is a path from `n00` through `n01` to `n02`). Each result is organized into `edge_bindings` and `node_bindings`, which map every edge and node of the query graph to the knowledge-graph elements that fill it; a query element can have more than one binding (for example, `e00` below). Bindings are given as identifiers, whose full records are in the `knowledge_graph` component of the `message` section of the response.

In [96]:
pp.pprint(response.json()['message']['results'][0])

{    'edge_bindings': {    'e00': [    {'attributes': None, 'id': '77699818'},
                                       {'attributes': None, 'id': '65555672'}],
                           'e01': [{'attributes': None, 'id': '90073165'}]},
     'node_bindings': {    'n00': [    {    'attributes': None,
                                            'id': 'UniProtKB:Q07869-1',
                                            'qnode_id': 'NCBIGene:5465',
                                            'query_id': None}],
                           'n01': [    {    'attributes': None,
                                            'id': 'NCBIGene:6347',
                                            'query_id': None}],
                           'n02': [    {    'attributes': None,
                                            'id': 'HP:0001395',
                                            'qnode_id': 'HP:0001395',
                                            'query_id': None}]},
     'score': None}
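The result above binds two knowledge-graph edges to `e00`. The hypothetical helper below flattens a result into `{query_graph_id: [knowledge_graph ids]}` while keeping every binding, rather than only the first. The sample is the result printed above with its `attributes` and `query_id` fields trimmed.

```python
def bindings_by_qg_id(result):
    """Flatten one TRAPI 1.3-style result into {query_graph_id: [knowledge_graph ids]}.

    Every binding is kept, so a query edge bound to two KG edges yields two ids.
    """
    flat = {}
    for section in ("edge_bindings", "node_bindings"):
        for qg_id, bindings in result.get(section, {}).items():
            flat[qg_id] = [binding["id"] for binding in bindings]
    return flat


# First result from the ROBOKOP response above (attribute fields trimmed).
result0 = {
    "edge_bindings": {"e00": [{"id": "77699818"}, {"id": "65555672"}],
                      "e01": [{"id": "90073165"}]},
    "node_bindings": {"n00": [{"id": "UniProtKB:Q07869-1", "qnode_id": "NCBIGene:5465"}],
                      "n01": [{"id": "NCBIGene:6347"}],
                      "n02": [{"id": "HP:0001395", "qnode_id": "HP:0001395"}]},
    "score": None,
}
print(bindings_by_qg_id(result0))
```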


The `knowledge_graph` contains the full record for every node and edge referenced in `results`. Its contents are shown below, followed by one example node and one example edge.

In [134]:
pp.pprint(response.json()['message']['knowledge_graph'])


{    'edges': {    '103813207': {    'attributes': [    {    'attribute_source': 'infores:automat-robokop',
                                                             'attribute_type_id': 'biolink:aggregator_knowledge_source',
                                                             'attributes': None,
                                                             'description': None,
                                                             'original_attribute_name': 'biolink:aggregator_knowledge_source',
                                                             'value': [    'infores:automat-robokop',
                                                                           'infores:sri-reference-kg'],
                                                             'value_type_id': 'biolink:InformationResource',
                                                             'value_url': None},
                                                        {    'attribute_source': 'inf

                                                                              'ENSEMBL:ENSP00000505985.1',
                                                                              'ENSEMBL:ENSP00000355627.5',
                                                                              'ENSEMBL:ENSP00000505963',
                                                                              'UMLS:C1366631',
                                                                              'ENSEMBL:ENSP00000504866',
                                                                              'UniProtKB:B2R5S1',
                                                                              'UMLS:C2713666',
                                                                              'ENSEMBL:ENSP00000505063.1',
                                                                              'ENSEMBL:ENSG00000135744',
                                                                            

Information returned for each node includes the concept ID (the dictionary key), the Biolink categories, the name, and a list of attributes (each with a value, a value type, and other metadata). Note that `nodes` is a dictionary keyed by node ID, not a list. The content for one node is shown below.

In [132]:
#response.json()['message']['knowledge_graph']['nodes'].keys()
next(iter( response.json()['message']['knowledge_graph']['nodes'].items() ))

('HP:0001395',
 {'categories': ['biolink:PhenotypicFeature',
   'biolink:DiseaseOrPhenotypicFeature',
   'biolink:Entity',
   'biolink:BiologicalEntity',
   'biolink:NamedThing',
   'biolink:ThingWithTaxon'],
  'name': 'Hepatic fibrosis',
  'attributes': [{'attribute_type_id': 'biolink:Attribute',
    'value': 89.8,
    'value_type_id': 'EDAM:data_0006',
    'original_attribute_name': 'information_content',
    'value_url': None,
    'attribute_source': None,
    'description': None,
    'attributes': None},
   {'attribute_type_id': 'biolink:same_as',
    'value': ['NCIT:C168581',
     'MEDDRA:10016648',
     'HP:0001395',
     'SNOMEDCT:62484002',
     'UMLS:C0239946',
     'MEDDRA:10019668'],
    'value_type_id': 'metatype:uriorcurie',
    'original_attribute_name': 'equivalent_identifiers',
    'value_url': None,
    'attribute_source': None,
    'description': None,
    'attributes': None}]})
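The node record above lists its synonyms in an attribute whose `attribute_type_id` is `biolink:same_as`. The hypothetical helper below collects those identifiers; the sample is the `HP:0001395` record above with unused fields trimmed.

```python
def equivalent_ids(node):
    """Collect the identifiers listed in a node's biolink:same_as attribute(s)."""
    ids = []
    for attr in node.get("attributes") or []:
        if attr.get("attribute_type_id") == "biolink:same_as":
            value = attr.get("value")
            ids.extend(value if isinstance(value, list) else [value])
    return ids


# Excerpt of the HP:0001395 node shown above (unused fields trimmed).
node = {
    "name": "Hepatic fibrosis",
    "attributes": [
        {"attribute_type_id": "biolink:Attribute", "value": 89.8,
         "original_attribute_name": "information_content"},
        {"attribute_type_id": "biolink:same_as",
         "value": ["NCIT:C168581", "MEDDRA:10016648", "HP:0001395",
                   "SNOMEDCT:62484002", "UMLS:C0239946", "MEDDRA:10019668"],
         "original_attribute_name": "equivalent_identifiers"},
    ],
}
print(equivalent_ids(node))
```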

Information returned for each edge includes the edge ID (the dictionary key), the subject's concept ID, the object's concept ID, the edge's predicate, any qualifiers, and attributes. As with nodes, `edges` is a dictionary keyed by edge ID, not a list. The content for one edge is shown below.

In [126]:
next(iter( response.json()['message']['knowledge_graph']['edges'].items() ))

('29300493',
 {'subject': 'NCBIGene:5465',
  'object': 'NCBIGene:6696',
  'predicate': 'biolink:regulates',
  'qualifiers': None,
  'attributes': [{'attribute_type_id': 'biolink:aggregator_knowledge_source',
    'value': ['infores:hetio', 'infores:automat-robokop'],
    'value_type_id': 'biolink:InformationResource',
    'original_attribute_name': 'biolink:aggregator_knowledge_source',
    'value_url': None,
    'attribute_source': 'infores:automat-robokop',
    'description': None,
    'attributes': None},
   {'attribute_type_id': 'biolink:primary_knowledge_source',
    'value': 'infores:lincs',
    'value_type_id': 'biolink:InformationResource',
    'original_attribute_name': 'biolink:primary_knowledge_source',
    'value_url': None,
    'attribute_source': 'infores:automat-robokop',
    'description': None,
    'attributes': None}]})
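The provenance of an edge is carried in attributes such as `biolink:primary_knowledge_source` and `biolink:aggregator_knowledge_source`, as in the edge above. The hypothetical helpers below pull those out and render the edge as a readable triple; the sample is edge `29300493` with unused fields trimmed.

```python
KNOWLEDGE_SOURCE_TYPES = {
    "biolink:primary_knowledge_source",
    "biolink:aggregator_knowledge_source",
}


def knowledge_sources(edge):
    """Return {attribute_type_id: value} for the knowledge-source attributes of a KG edge."""
    return {
        attr["attribute_type_id"]: attr.get("value")
        for attr in edge.get("attributes") or []
        if attr.get("attribute_type_id") in KNOWLEDGE_SOURCE_TYPES
    }


def edge_triple(edge):
    """Render an edge as 'subject --predicate--> object'."""
    return f"{edge['subject']} --{edge['predicate']}--> {edge['object']}"


# Excerpt of edge '29300493' shown above (unused fields trimmed).
edge = {
    "subject": "NCBIGene:5465",
    "object": "NCBIGene:6696",
    "predicate": "biolink:regulates",
    "attributes": [
        {"attribute_type_id": "biolink:aggregator_knowledge_source",
         "value": ["infores:hetio", "infores:automat-robokop"]},
        {"attribute_type_id": "biolink:primary_knowledge_source",
         "value": "infores:lincs"},
    ],
}
print(edge_triple(edge))
print(knowledge_sources(edge))
```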

Because results reference nodes by CURIEs (Compact URIs) and edges by IDs rather than by labels, the sample workflow below uses the `knowledge_graph` to convert results into a human-readable form. It does the following:
- collect the first edge_binding and node_binding ID for each result
- convert edge IDs to their predicates using the `knowledge_graph` edges
- convert node IDs to their names using the `knowledge_graph` nodes
- write the output dictionary as a CSV

In [102]:
edges = response.json()['message']['results'][0]['edge_bindings'].keys()
nodes = response.json()['message']['results'][0]['node_bindings'].keys()
edges_nodes = sorted(edges|nodes)

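# Defensive filter that drops any key starting with 's'; it has no effect on the query IDs used here
edges_nodes_keys = [s for s in edges_nodes if not s.startswith('s')]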
print(edges_nodes_keys)

# Initializing a base dictionary containing edge and node IDs from message results
base_dict = {}
for key in edges_nodes_keys:
    base_dict[key] = {}
print(base_dict)

['e00', 'e01', 'n00', 'n01', 'n02']
{'e00': {}, 'e01': {}, 'n00': {}, 'n01': {}, 'n02': {}}


In [103]:
# Illustrating the structure of each pathway result from the message component
pp.pprint(response.json()['message']['results'][0])

{    'edge_bindings': {    'e00': [    {'attributes': None, 'id': '77699818'},
                                       {'attributes': None, 'id': '65555672'}],
                           'e01': [{'attributes': None, 'id': '90073165'}]},
     'node_bindings': {    'n00': [    {    'attributes': None,
                                            'id': 'UniProtKB:Q07869-1',
                                            'qnode_id': 'NCBIGene:5465',
                                            'query_id': None}],
                           'n01': [    {    'attributes': None,
                                            'id': 'NCBIGene:6347',
                                            'query_id': None}],
                           'n02': [    {    'attributes': None,
                                            'id': 'HP:0001395',
                                            'qnode_id': 'HP:0001395',
                                            'query_id': None}]},
     'score': None}


In [104]:
# Populate base_dict with the FIRST binding ID for every query edge and node of each result
# (additional bindings, such as the second e00 edge in result 0, are not kept)
for i, entry in enumerate(response.json()['message']['results']):
    for edge_id in edges:
        base_dict[edge_id][i] = entry['edge_bindings'][edge_id][0]['id']
    for node_id in nodes:
        base_dict[node_id][i] = entry['node_bindings'][node_id][0]['id']

pp.pprint(base_dict)

{    'e00': {    0: '77699818',
                 1: '103813207',
                 2: '32256396',
                 3: '57629658',
                 4: '20299346',
                 5: '17412017',
                 6: '20299346',
                 7: '73429394',
                 8: '29300493',
                 9: '112023883',
                 10: '19535631',
                 11: '40673173',
                 12: '103813207',
                 13: '107697294',
                 14: '110120221',
                 15: '13704030',
                 16: '13704030',
                 17: '29300493',
                 18: '45775086',
                 19: '40673173',
                 20: '57629658',
                 21: '3445204',
                 22: '73429394',
                 23: '32256396',
                 24: '17412017',
                 25: '86622785',
                 26: '107697294',
                 27: '60874778',
                 28: '86622785',
                 29: '110120221',
              

In [105]:
import copy
output_dict = copy.deepcopy(base_dict)
kg_edges = response.json()['message']['knowledge_graph']['edges']
edge_list = [x for x in output_dict.keys() if x.startswith('e')]
#print(edge_list)

# - convert each edge_binding ID to its predicate for each result
for edge_id in edge_list:
    for edge_index, edge_number in output_dict[edge_id].items():
        # Look up the predicate using the edge_binding ID in the knowledge graph
        output_dict[edge_id][edge_index] = kg_edges[edge_number]['predicate']

pp.pprint(output_dict)

{    'e00': {    0: 'biolink:coexpressed_with',
                 1: 'biolink:interacts_with',
                 2: 'biolink:coexpressed_with',
                 3: 'biolink:coexpressed_with',
                 4: 'biolink:coexpressed_with',
                 5: 'biolink:coexpressed_with',
                 6: 'biolink:coexpressed_with',
                 7: 'biolink:regulates',
                 8: 'biolink:regulates',
                 9: 'biolink:interacts_with',
                 10: 'biolink:directly_physically_interacts_with',
                 11: 'biolink:coexpressed_with',
                 12: 'biolink:interacts_with',
                 13: 'biolink:coexpressed_with',
                 14: 'biolink:interacts_with',
                 15: 'biolink:coexpressed_with',
                 16: 'biolink:coexpressed_with',
                 17: 'biolink:regulates',
                 18: 'biolink:coexpressed_with',
                 19: 'biolink:coexpressed_with',
                 20: 'biolink:coexpressed

In [106]:
kg_nodes = response.json()['message']['knowledge_graph']['nodes']
node_list = [x for x in output_dict.keys() if x.startswith('n')]

for node_id in node_list:
    for node_index, node_value in output_dict[node_id].items():
        # Look up the 'name' label using the node ID in the knowledge graph
        output_dict[node_id][node_index] = kg_nodes[node_value]['name']

pp.pprint(output_dict)

{    'e00': {    0: 'biolink:coexpressed_with',
                 1: 'biolink:interacts_with',
                 2: 'biolink:coexpressed_with',
                 3: 'biolink:coexpressed_with',
                 4: 'biolink:coexpressed_with',
                 5: 'biolink:coexpressed_with',
                 6: 'biolink:coexpressed_with',
                 7: 'biolink:regulates',
                 8: 'biolink:regulates',
                 9: 'biolink:interacts_with',
                 10: 'biolink:directly_physically_interacts_with',
                 11: 'biolink:coexpressed_with',
                 12: 'biolink:interacts_with',
                 13: 'biolink:coexpressed_with',
                 14: 'biolink:interacts_with',
                 15: 'biolink:coexpressed_with',
                 16: 'biolink:coexpressed_with',
                 17: 'biolink:regulates',
                 18: 'biolink:coexpressed_with',
                 19: 'biolink:coexpressed_with',
                 20: 'biolink:coexpressed

In [109]:
import pandas as pd
import os

# Create the output directory, or fail if 'output' exists but is not a directory
if os.path.exists('output'):
    if not os.path.isdir('output'):
        raise NotADirectoryError('output is not a directory')
else:
    os.makedirs('output')
# Build a DataFrame directly from the dict of columns (one row per result) and write it as CSV
df = pd.DataFrame(output_dict)
df.to_csv('output/results_TRAPI_automat_multiple_ids.csv')
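The workflow above keeps only the first binding per query element. A more compact alternative, sketched below as a hypothetical helper rather than the original workflow, builds one row per result and joins multiple bindings with `"; "`. It falls back to the raw identifier when a node has no `name`. The example data is a toy response, and the CSV filename in the final comment is a placeholder.

```python
def results_to_rows(resp_json, sep="; "):
    """Convert TRAPI 1.3-style results into a list of flat dict rows.

    Query edges map to knowledge_graph predicates and query nodes map to
    knowledge_graph names; multiple bindings are joined with `sep`.
    """
    msg = resp_json["message"]
    kg_nodes = msg["knowledge_graph"]["nodes"]
    kg_edges = msg["knowledge_graph"]["edges"]
    rows = []
    for result in msg["results"]:
        row = {}
        for qedge_id, bindings in result["edge_bindings"].items():
            row[qedge_id] = sep.join(kg_edges[b["id"]]["predicate"] for b in bindings)
        for qnode_id, bindings in result["node_bindings"].items():
            # Use the node's name when present, otherwise its identifier
            row[qnode_id] = sep.join(
                kg_nodes.get(b["id"], {}).get("name") or b["id"] for b in bindings
            )
        rows.append(row)
    return rows


# Toy response with invented IDs; e00 has two bindings, EX:C has no name.
example = {"message": {
    "knowledge_graph": {
        "nodes": {"EX:A": {"name": "gene A"}, "EX:B": {"name": "gene B"}, "EX:C": {}},
        "edges": {"k1": {"predicate": "biolink:regulates"},
                  "k2": {"predicate": "biolink:coexpressed_with"},
                  "k3": {"predicate": "biolink:related_to"}},
    },
    "results": [{
        "edge_bindings": {"e00": [{"id": "k1"}, {"id": "k2"}], "e01": [{"id": "k3"}]},
        "node_bindings": {"n00": [{"id": "EX:A"}], "n01": [{"id": "EX:B"}],
                          "n02": [{"id": "EX:C"}]},
    }],
}}
print(results_to_rows(example))

# In the notebook, the rows could be written with pandas (placeholder filename):
# pd.DataFrame(results_to_rows(response.json())).to_csv('output/results_rows.csv', index=False)
```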

The results above are raw knowledge-graph matches: they have no scores or other additions. Alternatively, you can send the TRAPI query to the ROBOKOP application through ARAGORN, rather than only to the graph.

In [135]:
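ara_robokop_submit_url = "https://aragorn.renci.org/robokop/query"
response_ara = requests.post(ara_robokop_submit_url,json=query)

In [136]:
response_ara.status_code

500

In [58]:
# Parse results only on success; the run recorded above returned HTTP 500 (server error)
if response_ara.status_code == 200:
    print(len(response_ara.json()['message']['results']))
else:
    print(f'ARAGORN request failed with HTTP {response_ara.status_code}')

In [None]:
if response_ara.status_code == 200:
    pp.pprint(response_ara.json()['message']['results'])

In [None]:
# Print the score of each ARAGORN result (the Automat results above have score None)
if response_ara.status_code == 200:
    for result in response_ara.json()['message']['results']:
        print(result['score'])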
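Once scored results are available, it is usually convenient to read them best-first. The hypothetical helper below sorts results by `score`, highest first, placing unscored results (like the Automat ones, whose `score` is `None`) last. It assumes the TRAPI 1.3 layout used in this notebook, where `score` sits directly on each result; the toy results are invented.

```python
def rank_results(results):
    """Return results sorted by 'score', highest first; results without a score go last."""
    return sorted(
        results,
        # (is unscored, negated score): False sorts before True, and larger scores come first
        key=lambda r: (r.get("score") is None, -(r.get("score") or 0)),
    )


toy_results = [{"id": "r1", "score": 0.2}, {"id": "r2", "score": None}, {"id": "r3", "score": 0.9}]
print([r["id"] for r in rank_results(toy_results)])  # ['r3', 'r1', 'r2']
# In the notebook: rank_results(response_ara.json()['message']['results'])[:5]
```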