In [1]:
# Parameter inputs
aragorn_submit_url = "https://robokop-ara.apps.renci.org/robokop/query"
input_search_string = 'ppara'
output_search_string = 'liver fibrosis'

# Initializing directory to write
from datetime import datetime
from pathlib import Path

now = datetime.now()
dt_string = now.strftime("%Y-%m-%d_%H%M%S")
write_dir = Path("output/TRAPI",str(dt_string))
write_dir.mkdir(parents=True, exist_ok=True)

In [2]:
import requests
import json
import pprint
import pandas as pd
import os
pp = pprint.PrettyPrinter(indent=5)

This notebook serves as the starting point for an investigation into the relationship between PPAR$\alpha$ and Liver Fibrosis by using the APIs available in ROBOKOP.

Users who are unfamiliar with these APIs and want an introduction should see the RoboDocumentation repo on GitHub: https://github.com/RobokopU24/RoboDocumentation.

Two lists of CURIEs are generated with the Name Resolver tool by submitting the strings "ppara" and "liver fibrosis".

In [3]:
results = requests.post(f'https://name-resolution-sri.renci.org/lookup?string={input_search_string}&offset=0&limit=10')
results_json = results.json()
# print(json.dumps(results_json,indent=4))

input_node_id_list = []
for result in results_json:
    if result["curie"] not in input_node_id_list:
        input_node_id_list.append(result["curie"])
print(input_node_id_list)


['NCBIGene:5465', 'NCBIGene:19013', 'NCBIGene:25747', 'NCBIGene:281992', 'NCBIGene:374120', 'NCBIGene:397239', 'NCBIGene:403654', 'NCBIGene:443457', 'NCBIGene:458910', 'NCBIGene:613250']


In [4]:
results = requests.post(f'https://name-resolution-sri.renci.org/lookup?string={output_search_string}&offset=0&limit=10')
results_json = results.json()
# print(json.dumps(results_json,indent=4))

output_node_id_list = []
for result in results_json:
    if result["curie"] not in output_node_id_list:
        output_node_id_list.append(result["curie"])
print(output_node_id_list)

['UBERON:0002107', 'UMLS:C1397317', 'UMLS:C1954436', 'HP:0001395', 'UMLS:C4034373', 'UMLS:C4227681', 'UMLS:C5548946', 'UMLS:C4695228', 'UMLS:C4695229', 'UMLS:C5189427']
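The two lookup cells above repeat the same de-duplication loop. A minimal sketch of a shared helper follows, assuming (as those cells do) that the lookup response is a list of dictionaries that each carry a `curie` key. The name `unique_curies` and the sample records are hypothetical and only illustrate the shape of the data.

```python
def unique_curies(lookup_results):
    """Return the CURIEs in first-seen order, dropping duplicates."""
    # dict.fromkeys keeps insertion order, so the resolver's ranking is preserved.
    return list(dict.fromkeys(record["curie"] for record in lookup_results))


# Hypothetical records shaped like the lookup response used above.
sample_lookup = [
    {"curie": "NCBIGene:5465"},
    {"curie": "NCBIGene:19013"},
    {"curie": "NCBIGene:5465"},
]
print(unique_curies(sample_lookup))
# ['NCBIGene:5465', 'NCBIGene:19013']
```

With such a helper, each lookup cell would reduce to a single call on `results.json()`.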


A simple three-node, two-hop pathway is constructed using these lists of CURIEs.

In [5]:
query = {
    "message": {
        "query_graph": {
            "edges": {
                "e00": {
                    "subject": "n00",
                    "object": "n01",
                    "predicates": ["biolink:related_to"]
                },
                "e01": {
                    "subject": "n01",
                    "object": "n02",
                    "predicates": ["biolink:related_to"]
                }
            },
            "nodes": {
                "n00": {
                    "ids": input_node_id_list, #['NCBIGene:5465'], #
                    "categories": ["biolink:GeneOrGeneProduct"]
                },
                "n01": {
                    "categories": ["biolink:BiologicalEntity"]
                },
                "n02": {
                    "ids": output_node_id_list, #["HP:0001395"],
                    "categories": ["biolink:DiseaseOrPhenotypicFeature"]
                }
            }
        }
    }
}
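The same query shape can be generated programmatically when the number of hops varies. The sketch below is one possible way to do that, not part of ROBOKOP itself: `build_linear_query` is a hypothetical helper that chains one node per category, pins the first and last nodes to CURIE lists, and uses the `n00`/`e00` naming seen above.

```python
def build_linear_query(start_ids, end_ids, categories, predicate="biolink:related_to"):
    """Build a TRAPI query graph whose nodes form a single chain."""
    # One query node per category, named n00, n01, ...
    nodes = {f"n{i:02d}": {"categories": [category]}
             for i, category in enumerate(categories)}
    last_key = f"n{len(categories) - 1:02d}"
    # Only the two ends of the chain are pinned to specific CURIEs.
    nodes["n00"]["ids"] = list(start_ids)
    nodes[last_key]["ids"] = list(end_ids)
    # One edge between each pair of neighbouring nodes, named e00, e01, ...
    edges = {
        f"e{i:02d}": {
            "subject": f"n{i:02d}",
            "object": f"n{i + 1:02d}",
            "predicates": [predicate],
        }
        for i in range(len(categories) - 1)
    }
    return {"message": {"query_graph": {"edges": edges, "nodes": nodes}}}


example_query = build_linear_query(
    ["NCBIGene:5465"],
    ["HP:0001395"],
    ["biolink:GeneOrGeneProduct",
     "biolink:BiologicalEntity",
     "biolink:DiseaseOrPhenotypicFeature"],
)
print(sorted(example_query["message"]["query_graph"]["edges"]))
# ['e00', 'e01']
```

Passing `input_node_id_list` and `output_node_id_list` instead of the single CURIEs would reproduce the query defined above.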


The query is sent to Aragorn to get scores for the pathways.

In [6]:
response = requests.post(aragorn_submit_url,json=query)
print(response.status_code)
# Count the scored pathways once and reuse the value in later cells
number_pathway_results = len(response.json()['message']['results'])
print(number_pathway_results)

200
56


The TRAPI response can be written to a JSON file as shown below.

In [7]:
# Serialize the Aragorn TRAPI response (not the Name Resolver results held in results_json)
json_object = json.dumps(response.json(), indent=4)

# Write the TRAPI response to a JSON file
with open(Path(write_dir,"trapi_query_response_ara.json"), "w") as outfile:
    outfile.write(json_object)

The JSON response parses into a Python dictionary. In this run its top-level keys are `message`, `logs`, `status`, and `pid`, as the output below shows. The `message` property contains the `query_graph`, `knowledge_graph`, and `results` from the query.

In [8]:
print(response.json().keys())
print(response.json()['message'].keys())

query_out = response.json()['message']['query_graph']
kg = response.json()['message']['knowledge_graph']
results = response.json()['message']['results']

edges = ["e00", "e01"]
nodes = ["n00", "n01", "n02"]

dict_keys(['message', 'logs', 'status', 'pid'])
dict_keys(['query_graph', 'knowledge_graph', 'results'])


The `results` property contains the pathways that answer the query message. Each pathway holds `node_bindings` and, under `analyses`, `edge_bindings` for the nodes and edges specified in the query message. Bindings refer to nodes and edges by identifier only. The attributes for those nodes and edges (including the names) are available via the `knowledge_graph` component of the `message` section of the response.
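To make the binding structure concrete, here is a small offline sketch. The layout mirrors what the cells in this notebook index into (`node_bindings`, `analyses[0]['edge_bindings']`, `kg['nodes']`, `kg['edges']`), but every identifier, name, and score in `toy_kg` and `toy_result` is made up, and the two helper functions are hypothetical.

```python
# Toy knowledge graph: identifiers and names are invented for illustration.
toy_kg = {
    "nodes": {
        "X:1": {"name": "gene A"},
        "X:2": {"name": "gene B"},
        "X:3": {"name": "disease C"},
    },
    "edges": {
        "edge-1": {"subject": "X:1", "predicate": "biolink:affects", "object": "X:2"},
        "edge-2": {"subject": "X:2", "predicate": "biolink:contributes_to", "object": "X:3"},
    },
}

# One toy result: bindings hold identifiers only, never names.
toy_result = {
    "node_bindings": {"n00": [{"id": "X:1"}], "n01": [{"id": "X:2"}], "n02": [{"id": "X:3"}]},
    "analyses": [
        {"score": 0.5,
         "edge_bindings": {"e00": [{"id": "edge-1"}], "e01": [{"id": "edge-2"}]}}
    ],
}


def bound_node_names(result, kg):
    """Map each query node key to the name of its first bound node."""
    return {key: kg["nodes"][bindings[0]["id"]]["name"]
            for key, bindings in result["node_bindings"].items()}


def bound_edge_triples(result, kg):
    """Map each query edge key to (subject, predicate, object) for every bound edge."""
    triples = {}
    for key, bindings in result["analyses"][0]["edge_bindings"].items():
        edges = [kg["edges"][binding["id"]] for binding in bindings]
        triples[key] = [(e["subject"], e["predicate"], e["object"]) for e in edges]
    return triples


print(bound_node_names(toy_result, toy_kg))
print(bound_edge_triples(toy_result, toy_kg))
```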

The code below writes out the nodes of each result together with the score provided by Aragorn. For each hop it shows only the predicate of the first bound edge, NOT every edge connecting the nodes.

In [9]:
aragorn_result_summaries = []
for r in results:
    analysis = r['analyses'][0]
    rs = f"Score={round(analysis['score'], 3)}: "
    for j, node_key in enumerate(nodes):
        node_id = r['node_bindings'][node_key][0]['id']
        node_name = kg['nodes'][node_id]['name']
        rs = rs + f"{node_name} ({node_id})"
        if j < len(edges):
            # Only the first edge bound to this hop is shown
            edge_id = analysis['edge_bindings'][edges[j]][0]['id']
            edge_name = kg['edges'][edge_id]['predicate']
            rs = rs + f"--{edge_name}-->"
    aragorn_result_summaries.append(rs)

In [10]:
for rs in aragorn_result_summaries:
    print(rs)

Score=0.679: PPARA (NCBIGene:5465)--biolink:affects-->PKHD1 (NCBIGene:5314)--biolink:contributes_to-->isolated congenital hepatic fibrosis (MONDO:0018840)
Score=0.495: PPARA (NCBIGene:5465)--biolink:affects-->THBS1 (NCBIGene:7057)--biolink:contributes_to-->isolated congenital hepatic fibrosis (MONDO:0018840)
Score=0.495: PPARA (NCBIGene:5465)--biolink:interacts_with-->AGT (NCBIGene:183)--biolink:contributes_to-->isolated congenital hepatic fibrosis (MONDO:0018840)
Score=0.476: PPARA (NCBIGene:5465)--biolink:contributes_to-->cirrhosis of liver (MONDO:0005155)--biolink:subclass_of-->isolated congenital hepatic fibrosis (MONDO:0018840)
Score=0.473: PPARA (NCBIGene:5465)--biolink:affects-->TGFB1 (NCBIGene:7040)--biolink:contributes_to-->isolated congenital hepatic fibrosis (MONDO:0018840)
Score=0.448: PPARA (NCBIGene:5465)--biolink:contributes_to-->liver disorder (MONDO:0005154)--biolink:subclass_of-->Hepatic bridging fibrosis (HP:0012852)
Score=0.448: PPARA (NCBIGene:5465)--biolink:contri

The following assumes that the query node keys (`n00`, `n01`, `n02`) sort into path order, which is the case with the default naming conventions. It exports each result's nodes together with the score assigned to that result.

In [11]:
# Column order: each query node key followed by its name column
cols = []
for node in sorted(results[0]['node_bindings'].keys()):
    cols.append(node)
    cols.append(node + '_name')

results_dict_list = []
for result in results:
    result_dict = {}
    for node in result['node_bindings'].keys():
        node_id = result['node_bindings'][node][0]['id']
        result_dict[node] = node_id
        result_dict[node + '_name'] = kg['nodes'][node_id]['name']
    result_dict['score'] = result['analyses'][0]['score']
    results_dict_list.append(result_dict)

# Build the frame directly instead of concatenating onto an empty DataFrame,
# which recent pandas versions warn about
results_df = pd.DataFrame.from_records(results_dict_list, columns=cols + ['score'])
print(results_df.shape)
results_df.to_csv(os.path.join(write_dir,'results_TRAPI_aragorn.csv'))


(56, 7)
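The sort-order assumption behind this export holds because the default keys are zero-padded. The sketch below shows what happens with unpadded keys and one possible workaround. `natural_key` is a hypothetical helper, not something the notebook or ROBOKOP provides.

```python
import re


def natural_key(name):
    """Sort key that compares embedded digit runs as integers."""
    # re.split with a capturing group alternates text and digit runs.
    return [int(token) if token.isdigit() else token
            for token in re.split(r"(\d+)", name)]


padded = ["n02", "n10", "n01"]
unpadded = ["n2", "n10", "n1"]

print(sorted(padded))                     # ['n01', 'n02', 'n10']
print(sorted(unpadded))                   # ['n1', 'n10', 'n2']
print(sorted(unpadded, key=natural_key))  # ['n1', 'n2', 'n10']
```

Plain `sorted` is enough for keys such as `n00`, `n01`, `n02`. A key function like this only matters if a query uses unpadded names with ten or more nodes.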


In the code below, we print a single edge for each association type and include the count of the number of data sources where that association was found. Each unique edge is written in the format `subject` -> `predicate` -> `object`, and the edges for each `result` are exported to their own file, one file per `result`.
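The counting step can be sketched on toy data before running it on the real response. Everything in `toy_nodes` and `toy_edges` is invented, and `tally_triples` is a hypothetical helper. The only technique taken from the notebook is building a `subject -> predicate -> object` string per bound edge and counting repeats with `collections.Counter`.

```python
from collections import Counter

# Toy data: two edges assert the same association, one asserts the reverse.
toy_nodes = {"X:1": {"name": "gene A"}, "X:2": {"name": "gene B"}}
toy_edges = {
    "edge-1": {"subject": "X:1", "predicate": "biolink:affects", "object": "X:2"},
    "edge-2": {"subject": "X:1", "predicate": "biolink:affects", "object": "X:2"},
    "edge-3": {"subject": "X:2", "predicate": "biolink:affects", "object": "X:1"},
}


def tally_triples(edge_ids, nodes, edges):
    """Count how many of the given edges share each subject/predicate/object label."""
    labels = []
    for edge_id in edge_ids:
        edge = edges[edge_id]
        subject_name = nodes[edge["subject"]]["name"]
        object_name = nodes[edge["object"]]["name"]
        labels.append(f"{subject_name} -> {edge['predicate']} -> {object_name}")
    return dict(Counter(labels))


triple_counts = tally_triples(["edge-1", "edge-2", "edge-3"], toy_nodes, toy_edges)
print(triple_counts)
# {'gene A -> biolink:affects -> gene B': 2, 'gene B -> biolink:affects -> gene A': 1}
```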

In [12]:
cols = []
for node in sorted(results[0]['node_bindings'].keys()):
    cols.append(node)
    cols.append(node + '_name')
results_df = pd.DataFrame(columns = cols)

results_list = []
for result in results:
    result_dict = {}
    for node in sorted(result['node_bindings'].keys()):
        node_id = result['node_bindings'][node][0]['id']
        result_dict[node] = node_id
        result_dict[node + '_name'] = kg['nodes'][node_id]['name']

    results_list.append(pd.DataFrame([result_dict]))
results_df = pd.concat(results_list)
# display(results_df)
# results_df.to_csv(os.path.join(write_dir,'results_TRAPI.csv'), index=False)

combined_node_list = ["_".join([row[1].replace(" ", "_"), row[3].replace(" ", "_"), row[5].replace(" ", "_")]) for row in results_df[cols].to_numpy()]
pp.pprint(combined_node_list)

[    'PPARA_PKHD1_isolated_congenital_hepatic_fibrosis',
     'PPARA_THBS1_isolated_congenital_hepatic_fibrosis',
     'PPARA_AGT_isolated_congenital_hepatic_fibrosis',
     'PPARA_cirrhosis_of_liver_isolated_congenital_hepatic_fibrosis',
     'PPARA_TGFB1_isolated_congenital_hepatic_fibrosis',
     'PPARA_liver_disorder_Hepatic_bridging_fibrosis',
     'PPARA_liver_disorder_Hepatic_fibrosis',
     'PPARA_liver_disorder_Periportal_fibrosis',
     'PPARA_liver_disorder_isolated_congenital_hepatic_fibrosis',
     'PPARA_REN_isolated_congenital_hepatic_fibrosis',
     'peroxisome_proliferator-activated_receptor_alpha_isoform_1_(human)_TGFB1_isolated_congenital_hepatic_fibrosis',
     'PPARA_ALB_Hepatic_fibrosis',
     'PPARA_NF1_isolated_congenital_hepatic_fibrosis',
     'PPARA_HNF1A_isolated_congenital_hepatic_fibrosis',
     'PPARA_THBS1_Hepatic_fibrosis',
     'PPARA_AGT_Hepatic_fibrosis',
     'PPARA_SMAD3_Hepatic_fibrosis',
     'PPARA_CCL2_Hepatic_fibrosis',
     'PPARA_SPP1_Hepati
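These combined names are later used as file names, and some contain characters such as parentheses. A cautious sketch for normalizing them follows. `safe_filename` and its `max_length` default are assumptions for illustration, and the right character whitelist depends on the target file system.

```python
import re


def safe_filename(name, max_length=150):
    """Replace runs of characters outside a conservative whitelist with '_'."""
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "_", name).strip("_")
    # Fall back to a placeholder if nothing usable is left.
    return cleaned[:max_length] or "unnamed"


print(safe_filename("PPARA_TGFB1_isolated congenital hepatic fibrosis"))
# PPARA_TGFB1_isolated_congenital_hepatic_fibrosis
print(safe_filename("a/b (human)"))
# a_b_human
```

Two different pathways can still map to the same cleaned name, so appending the result index to each name is one way to keep the output files distinct.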

In [13]:
from collections import Counter
import json
import pprint
pp = pprint.PrettyPrinter(indent=5)

for i in range(number_pathway_results):
    print(f"Pathway result: {combined_node_list[i]}")
    edge_bindings = results[i]['analyses'][0]['edge_bindings']

    edge_ids = []
    for edge_name, edge_list in edge_bindings.items():
        edge_ids.append({edge_name: [x['id'] for x in edge_list]})

    string_out_list = []
    for edge_dict in edge_ids:
        for edge_name, edge_list in edge_dict.items():
            for edge_id in edge_list:
                # Resolve names via the knowledge graph; avoid shadowing the built-in `object`
                subject_id = kg['edges'][edge_id]['subject']
                subject_name = kg['nodes'][subject_id]['name']
                predicate = kg['edges'][edge_id]['predicate']
                object_id = kg['edges'][edge_id]['object']
                object_name = kg['nodes'][object_id]['name']
                string_out = f"{subject_name} -> {predicate} -> {object_name}"
                string_out_list.append(string_out)
    string_out_dict = dict(Counter(string_out_list).items())
    pp.pprint(string_out_dict)
    print("")
    
    with open(os.path.join(write_dir,combined_node_list[i]+".txt"), 'w') as convert_file:
        convert_file.write(json.dumps(string_out_dict))
        

Pathway result: PPARA_PKHD1_isolated_congenital_hepatic_fibrosis
{    'PKHD1 -> biolink:affects -> PPARA': 1,
     'PKHD1 -> biolink:contributes_to -> isolated congenital hepatic fibrosis': 1,
     'PKHD1 -> biolink:genetically_associated_with -> isolated congenital hepatic fibrosis': 1}

Pathway result: PPARA_THBS1_isolated_congenital_hepatic_fibrosis
{    'PPARA -> biolink:affects -> THBS1': 1,
     'THBS1 -> biolink:affects -> PPARA': 1,
     'THBS1 -> biolink:contributes_to -> isolated congenital hepatic fibrosis': 1}

Pathway result: PPARA_AGT_isolated_congenital_hepatic_fibrosis
{    'AGT -> biolink:affects -> PPARA': 1,
     'AGT -> biolink:contributes_to -> isolated congenital hepatic fibrosis': 1,
     'PPARA -> biolink:affects -> AGT': 1,
     'PPARA -> biolink:interacts_with -> AGT': 1}

Pathway result: PPARA_cirrhosis_of_liver_isolated_congenital_hepatic_fibrosis
{    'PPARA -> biolink:contributes_to -> cirrhosis of liver': 1,
     'isolated congenital hepatic fibrosis -> b