In [None]:
import requests
import json

In [None]:
import pprint
pp = pprint.PrettyPrinter(indent=5)

## Intro to ExEmPLAR

ExEmPLAR (https://www.exemplar.mml.unc.edu/) is an additional GUI-based tool that uses Cypher to explore the ROBOKOP KG. More information on the use of this tool is available here: https://github.com/beasleyjonm/AOP-COP-Path-Extractor. Start and End Nodes can be specified by type, and multiple terms can be given for each. The labels applied to the Start and End Nodes can be checked against the KG for presence or absence. The number of intermediate nodes can also be specified, along with edge types. By default, a query from ExEmPLAR returns a collapsed version of the edge results; this behavior can be changed by checking the "Get Result Metadata" checkbox. Note that edge directionality is not preserved in the query results, although the order of the nodes is. For example, `Buprenorphine` - `CYP2D6` - `Tremor` is returned with edge labels, but without the direction in which each edge applies.

Using the same `Buprenorphine` - `Gene` - `Tremor` example as above, three rows are returned. These rows are a collapsed form of the results from the queries to `robokopkg.renci.org` above; the underlying edges are summarized below.
 - `Buprenorphine` - `affects` - `CYP2D6`
 - `Buprenorphine` - `directly physically interacts with` - `CYP2D6`
 - `Buprenorphine` - `regulates` - `CYP2D6`
 - `CYP2D6` - `genetic association` - `Tremor`
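
One way to see how four edges can correspond to three rows is to treat each row as a full path that pairs one `Buprenorphine` - `CYP2D6` edge with the `CYP2D6` - `Tremor` edge. The sketch below is illustrative only: the edge tuples are copied from the list above, and the pairing logic is an assumption about how rows are formed, not ExEmPLAR's actual implementation.

```python
from itertools import product

# Edges copied from the summary list above, as (subject, predicate, object).
first_hop = [
    ("Buprenorphine", "affects", "CYP2D6"),
    ("Buprenorphine", "directly physically interacts with", "CYP2D6"),
    ("Buprenorphine", "regulates", "CYP2D6"),
]
second_hop = [
    ("CYP2D6", "genetic association", "Tremor"),
]

# Assumed row construction: every first-hop edge paired with every second-hop edge.
paths = list(product(first_hop, second_hop))

for (start, pred_1, middle), (_, pred_2, end) in paths:
    print(f"{start} - {pred_1} - {middle} - {pred_2} - {end}")

print(f"{len(paths)} rows")
```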

## Cypher - robokopkg.renci.org

The ExEmPLAR tool uses a separate access point to ROBOKOP at `robokopkg.renci.org`, as compared to the method outlined in `HelloRobokop_Cypher.ipynb`. Here, a Cypher query is sent to `bolt://robokopkg.renci.org:7687` using the Bolt protocol. After the helper `Neo4jConnection` class is defined, the query is sent and the results are extracted in the cells below.

In [None]:
# Buprenorphine -> [Gene] -> Tremor
cypher = """MATCH (n0_0:`biolink:ChemicalEntity`)-[r0_0]-(n1_0:`biolink:Gene`)-[r1_0]-(n2_0:`biolink:DiseaseOrPhenotypicFeature`)
WHERE n0_0.name IN ['Buprenorphine'] AND n2_0.name IN ['Tremor']
RETURN [startNode(r0_0),[type(r0_0),properties(r0_0)],endNode(r0_0)] as edge_1,
[startNode(r1_0),[type(r1_0),properties(r1_0)],endNode(r1_0)] as edge_2,
[n0_0.name, n1_0.name, n2_0.name] as node_names LIMIT 100"""

In [None]:
from neo4j import GraphDatabase
class Neo4jConnection:
    
    def __init__(self, uri, user, pwd):
        self.__uri = uri
        self.__user = user
        self.__pwd = pwd
        self.__driver = None
        try:
            self.__driver = GraphDatabase.driver(self.__uri, auth=(self.__user, self.__pwd))
        except Exception as e:
            print("Failed to create the driver:", e)
        
    def close(self):
        if self.__driver is not None:
            self.__driver.close()
        
    def query(self, query, db=None):
        assert self.__driver is not None, "Driver not initialized!"
        session = None
        response = None
        try: 
            session = self.__driver.session(database=db) if db is not None else self.__driver.session()
            response = list(session.run(query))
        except Exception as e:
            print("Query failed:", e)
        finally: 
            if session is not None:
                session.close()
        return response

In [None]:
pw = ''
conn = Neo4jConnection(uri="bolt://robokopkg.renci.org:7687", user = 'neo4j', pwd = pw)
record_list = conn.query(cypher)
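
Because the helper's `query` method prints the error and returns `None` when a query fails, it can help to check the result before indexing into it. The guard below is a minimal sketch; `require_records` is a hypothetical helper name, and the list passed to it is a stand-in so the example runs without a database connection.

```python
def require_records(record_list):
    """Return the records unchanged, or raise if the query returned None."""
    if record_list is None:
        raise RuntimeError("Query failed; see the message printed by the query helper.")
    return record_list


# Stand-in for a successful query result (not real ROBOKOP data).
stand_in_records = [{"node_names": ["Buprenorphine", "CYP2D6", "Tremor"]}]
print(len(require_records(stand_in_records)))
```

In the notebook this could be applied as `record_list = require_records(conn.query(cypher))`.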

In [None]:
from datetime import datetime
from pathlib import Path

now = datetime.now()
dt_string = now.strftime("%Y-%m-%d_%H%M%S")
write_dir = Path("output/Cypher_robokopkg",str(dt_string))
write_dir.mkdir(parents=True, exist_ok=True)


A list of records is returned from the query. The structure of each record is defined by the `RETURN` section of the query above.
```
<Record edge_1=[<Node element containing properties for first node of r0_0>,
                 [list containing the type and properties for the edge],
                 <Node element containing properties for second node of r0_0>]
         edge_2=[<Node element containing properties for first node of r1_0>,
                 [list containing the type and properties for the edge],
                 <Node element containing properties for second node of r1_0>]
         node_names=[list of node names]>
```

In [None]:
record = record_list[0]
print(record)

The data can be accessed using the `data()` method. Known keys can be passed to `data()`, but calling it with no arguments returns everything as a dictionary. The keys in the returned dictionary match the aliases given in the `RETURN` clause of the original query.
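
To make the shape concrete without a live connection, the sketch below uses a plain dictionary as a stand-in for what `record.data()` returns for this query. The node names and predicates come from the example above; the property names and values are placeholders, not real ROBOKOP data.

```python
# Hypothetical stand-in for the dictionary returned by record.data().
record_data = {
    "edge_1": [
        {"name": "Buprenorphine"},
        ["affects", {"placeholder_property": "placeholder"}],
        {"name": "CYP2D6"},
    ],
    "edge_2": [
        {"name": "CYP2D6"},
        ["genetic association", {"placeholder_property": "placeholder"}],
        {"name": "Tremor"},
    ],
    "node_names": ["Buprenorphine", "CYP2D6", "Tremor"],
}

# The keys mirror the aliases used in the RETURN clause.
print(list(record_data))

# Each edge entry unpacks into start node, [type, properties], end node.
start_node, (edge_type, edge_props), end_node = record_data["edge_1"]
print(f"{start_node['name']} -> {edge_type} -> {end_node['name']}")
```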

In [None]:
record_data = record.data()
pp.pprint(record_data)

Results are extracted and stored in the format of subject -> predicate -> object, followed by the remaining edge properties. Including the edge properties helps to distinguish edges that may have the same predicates. Unique entries are appended to a list, counted, and then written to a text file.

The code below extracts results based on the structure of the original Cypher query in the section above. Any changes to the `RETURN` part of the query will require matching adjustments to the code below.
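
One way to reduce that coupling is to select edge entries by key prefix instead of by position. The sketch below assumes the edge aliases keep the `edge_` prefix used in the query above; `format_edges` is a hypothetical helper, and the dictionary is a stand-in for `record.data()` with placeholder values.

```python
def format_edges(record_data):
    """Format every 'edge_*' entry as 'label - subject -> predicate -> object'."""
    lines = []
    for label, data in record_data.items():
        if not label.startswith("edge_"):
            continue  # skip node_names and any other non-edge columns
        start_node, (edge_type, _edge_props), end_node = data
        lines.append(f"{label} - {start_node['name']} -> {edge_type} -> {end_node['name']}")
    return lines


# Stand-in for record.data(); values are placeholders, not real ROBOKOP data.
example = {
    "node_names": ["Buprenorphine", "CYP2D6", "Tremor"],
    "edge_1": [{"name": "Buprenorphine"}, ["affects", {}], {"name": "CYP2D6"}],
    "edge_2": [{"name": "CYP2D6"}, ["genetic association", {}], {"name": "Tremor"}],
}

for line in format_edges(example):
    print(line)
```

With this approach the order of the aliases in the `RETURN` clause no longer matters, although the inner layout of each edge entry still has to match.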

In [None]:
import os
from collections import Counter

string_out_list = []
for record in record_list:
    record_data = record.data()
    # Only grab the edge information (first two keys) and skip the list of node names.
    record_data_first2 = {k: record_data[k] for k in list(record_data)[:2]}
    for label, data in record_data_first2.items():
        # data = [start node, [edge type, edge properties], end node]
        string_out = f"{label} - {data[0]['name']} -> {data[1][0]} -> {data[2]['name']}||{data[1][1]}"
        # Keep one entry per unique edge; the properties distinguish edges that share a predicate.
        if string_out not in string_out_list:
            string_out_list.append(string_out)

# Join the node names of the first record to build the output file name.
combined_node_list = "_".join(record_list[0].data('node_names')['node_names'])
print(combined_node_list)

# Drop the edge properties, then count the unique edges behind each subject -> predicate -> object string.
string_out_list = [i.split('||', 1)[0] for i in string_out_list]

string_out_dict = dict(Counter(string_out_list))
pp.pprint(string_out_dict)

Writing results below after confirming that the output looks good.

In [None]:
with open(os.path.join(write_dir,combined_node_list+".txt"), 'w') as convert_file:
    convert_file.write(json.dumps(string_out_dict))
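
A quick way to confirm the file was written as expected is to read it back and compare it with the dictionary in memory. The sketch below does this round trip in a temporary directory so it can run on its own; the dictionary contents and file name are placeholders rather than real query output.

```python
import json
import tempfile
from pathlib import Path

# Placeholder counts standing in for string_out_dict.
string_out_dict = {
    "edge_1 - Buprenorphine -> affects -> CYP2D6": 2,
    "edge_2 - CYP2D6 -> genetic association -> Tremor": 1,
}

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = Path(tmp_dir) / "Buprenorphine_CYP2D6_Tremor.txt"
    out_path.write_text(json.dumps(string_out_dict))
    # Read the file back and parse the JSON it contains.
    reloaded = json.loads(out_path.read_text())

print(reloaded == string_out_dict)
```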

Close the Cypher connection when finished.

In [None]:
conn.close()

## ExEmPLAR - RETURN * notation

ExEmPLAR also has a `Copy` tool that generates Cypher queries after a query pattern has been set up. An example that sends the copied, pre-generated Cypher query to both `automat.renci.org` and `robokopkg.renci.org` is below. This can be useful for setting up an initial Cypher query if a user is not familiar with writing Cypher.

The difficulty with the copied query is that "RETURN \*" does not specify the structure of the returned data. In the queries to `automat.renci.org` and `robokopkg.renci.org`, the nodes and their properties are preserved, but the edges come back with differing amounts of information. The query to `automat.renci.org` returns most of the edge properties but omits the edge predicate and the subject/object nodes, so directionality is lost. The query to `robokopkg.renci.org` does the opposite: only the edge predicate is returned, with no edge properties, which makes some of the returned rows appear redundant.

In [None]:
cypher_exemplar = "MATCH (n0_0:`biolink:ChemicalEntity`)-[r0_0]-(n1_0:`biolink:Gene`)-[r1_0]-(n2_0:`biolink:DiseaseOrPhenotypicFeature`) WHERE n0_0.name IN ['Buprenorphine'] AND n2_0.name IN ['Tremor'] RETURN * LIMIT 100"

In [None]:
import pprint
pp = pprint.PrettyPrinter(indent=5)

In [None]:
import requests
import json

j = {'query': cypher_exemplar}
results = requests.post('https://automat.renci.org/robokopkg/cypher',json=j)
results_json = results.json()

In [None]:
print("Query to automat.renci.org")
for i, result in enumerate(results_json['results'][0]['data'], start=1):
    row = result['row']
    # This code assumes row[0..2] hold the three nodes and row[3..4] hold the two edges.
    n1 = row[0].get('name') or "NOT FOUND"
    n2 = row[1].get('name') or "NOT FOUND"
    n3 = row[2].get('name') or "NOT FOUND"
    e1 = row[3].get('qualified_predicate') or "NOT FOUND"
    e2 = row[4].get('qualified_predicate') or "NOT FOUND"
    print(f"Result {i}")
    print(f"{n1} -> {e1} -> {n2} -> {e2} -> {n3}")
    print(f"Edge 1 info: {row[3]}")
    print(f"Edge 2 info: {row[4]}\n")

In [None]:
pw = ''
conn = Neo4jConnection(uri="bolt://robokopkg.renci.org:7687", user = 'neo4j', pwd = pw)
record_list = conn.query(cypher_exemplar)

In [None]:
print("Query to robokopkg.renci.org")

for i, record in enumerate(record_list, start=1):
    record_data = record.data()

    print(f"Result {i}:")
    for label, data in record_data.items():
        # Relationship variables in the query are named r0_0 and r1_0.
        if label.startswith('r'):
            # data = (start node, relationship type, end node)
            print(f"{data[0]['name']} - {data[1]} - {data[2]['name']}")
    print()

To summarize, "RETURN \*" caused problems mainly because it does not specify what to return. Queries to `automat.renci.org` return edge properties but not direction or predicates, whereas queries to `robokopkg.renci.org` return direction and predicates but not edge properties. Above, for `robokopkg.renci.org`, results 1 & 5, 2 & 6, 3 & 7, and 4 & 8 appear to be duplicates of each other, while for `automat.renci.org` the edge predicates are missing. We therefore replaced the "\*" with an explicit format that returns node pairs and specific relationship information, including the type and properties. This is demonstrated in the sections [Cypher - robokopkg.renci.org](#Cypher---robokopkg.renci.org) and [Cypher - automat.renci.org](#Cypher---automat.renci.org).
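
The replacement of "\*" with an explicit format can also be done programmatically. The sketch below builds a `RETURN` clause in the same shape as the one used in the [Cypher - robokopkg.renci.org](#Cypher---robokopkg.renci.org) section and swaps it into the copied query. `explicit_return` is a hypothetical helper, and the simple string replacement assumes the copied query ends with exactly `RETURN * LIMIT 100`.

```python
def explicit_return(rel_vars, node_vars, limit=100):
    """Build a RETURN clause with one [start, [type, properties], end] entry per relationship."""
    edge_parts = [
        f"[startNode({r}),[type({r}),properties({r})],endNode({r})] as edge_{i}"
        for i, r in enumerate(rel_vars, start=1)
    ]
    names = ", ".join(f"{n}.name" for n in node_vars)
    return "RETURN " + ",\n".join(edge_parts) + f",\n[{names}] as node_names LIMIT {limit}"


# The copied ExEmPLAR query from this section.
cypher_exemplar = (
    "MATCH (n0_0:`biolink:ChemicalEntity`)-[r0_0]-(n1_0:`biolink:Gene`)-[r1_0]-"
    "(n2_0:`biolink:DiseaseOrPhenotypicFeature`) "
    "WHERE n0_0.name IN ['Buprenorphine'] AND n2_0.name IN ['Tremor'] "
    "RETURN * LIMIT 100"
)

return_clause = explicit_return(["r0_0", "r1_0"], ["n0_0", "n1_0", "n2_0"])
cypher_explicit = cypher_exemplar.replace("RETURN * LIMIT 100", return_clause)
print(cypher_explicit)
```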