# PSKG Validation Testing

The Patient Safety Knowledge Graph (PSKG) is implemented in the Neo4j graph database and runs in AI Bench. This notebook can be used to review the results of a data load running within AI Bench, or it can be run from a laptop on the AZ network with ZScaler installed.

Endpoint for the local Docker instance within AI Bench:
* `neo4j://localhost:7687` (Local instance)

Test and production endpoints for PSKG outside of AI Bench:

* [http://kckb075.1000-672.service.azaibenchdev.net:7474/browser/](http://kckb075.1000-672.service.azaibenchdev.net:7474/browser/) (TEST instance)
* [http://pskg.1000-672.service.azaibenchdev.net:7474/browser/](http://pskg.1000-672.service.azaibenchdev.net:7474/browser/) (PRODUCTION instance)

PSKG can be explored through the Neo4j Browser at these endpoints, as well as programmatically from Python. This notebook is a self-contained example that connects to the database, runs a query, and returns the results.

**Important**: This notebook requires the `neo4j` package (install using pip) as well as `pandas`. The node and relationship count queries also require the APOC plugin on the server. The notebook should be run with `lab_black` to format Python neatly when editing (see next cell); however, this is not needed to run the notebook.

In [None]:
# %load_ext lab_black
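
Before running the remaining cells, a quick standard-library check can confirm that the packages listed above are importable. This is a minimal sketch; `missing_packages` is a hypothetical helper, and `lab_black` is deliberately not checked because it is optional.

```python
import importlib.util

# Packages this notebook needs at runtime (lab_black is optional, so it is not checked)
REQUIRED_PACKAGES = ["neo4j", "pandas"]


def missing_packages(names):
    """Return the subset of `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED_PACKAGES)
if missing:
    print("Install missing packages with: pip install " + " ".join(missing))
else:
    print("All required packages are installed.")
```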

In [None]:
import pandas as pd
from functools import reduce

# Library for accessing neo4j
from neo4j import GraphDatabase

In [None]:
NEO4J_USER = "neo4j"
NEO4J_PW = "pskg"
NEO4J_LOCAL_URI = "neo4j://localhost:7687"
NEO4J_PROD_URI = "neo4j://pskg.1000-672.service.azaibenchdev.net:7687"
NEO4J_TEST_URI = "neo4j://kckb075.1000-672.service.azaibenchdev.net:7687"
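
The Browser links in the introduction are served over HTTP on port 7474, while the constants above point the driver at the same hosts using the `neo4j://` scheme on the Bolt port, 7687. The sketch below derives one from the other, assuming each server uses the default Bolt port; `browser_url_to_bolt_uri` is a hypothetical helper, not part of PSKG or the driver.

```python
from urllib.parse import urlparse

BOLT_PORT = 7687  # default Bolt port, as used by the NEO4J_*_URI constants


def browser_url_to_bolt_uri(browser_url, scheme="neo4j", port=BOLT_PORT):
    """Turn a Neo4j Browser link (HTTP, port 7474) into a driver URI on the Bolt port."""
    host = urlparse(browser_url).hostname
    if not host:
        raise ValueError(f"No host found in {browser_url!r}")
    return f"{scheme}://{host}:{port}"
```

For example, the TEST Browser link maps to the same string as `NEO4J_TEST_URI`.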

## **IMPORTANT** Set `NEO4J_URI` to the instance you wish to test

In [None]:
NEO4J_URI = NEO4J_PROD_URI
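
Rather than editing this cell, the instance (and the password) could be chosen from environment variables, which also keeps the password out of the notebook. This is a minimal sketch; the variable names `PSKG_INSTANCE` and `PSKG_PASSWORD` and the helper `select_instance` are hypothetical and not defined anywhere in PSKG. The default of `"prod"` mirrors the setting above.

```python
import os

# Hypothetical environment variable names; not defined anywhere in PSKG itself
INSTANCE_ENV_VAR = "PSKG_INSTANCE"
PASSWORD_ENV_VAR = "PSKG_PASSWORD"

# Same URIs as the NEO4J_*_URI constants, keyed by a short instance name
INSTANCE_URIS = {
    "local": "neo4j://localhost:7687",
    "test": "neo4j://kckb075.1000-672.service.azaibenchdev.net:7687",
    "prod": "neo4j://pskg.1000-672.service.azaibenchdev.net:7687",
}


def select_instance(default="prod"):
    """Return (uri, password) for the instance named in PSKG_INSTANCE."""
    name = os.environ.get(INSTANCE_ENV_VAR, default).strip().lower()
    if name not in INSTANCE_URIS:
        raise ValueError(f"Unknown instance {name!r}; expected one of {sorted(INSTANCE_URIS)}")
    # Fall back to the notebook's hard-coded password when the variable is unset
    password = os.environ.get(PASSWORD_ENV_VAR, "pskg")
    return INSTANCE_URIS[name], password
```

Usage would be `NEO4J_URI, NEO4J_PW = select_instance()` in place of the assignment above.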

In [None]:
# Helper for running a query inside a transaction (other helpers for working with PSKG are also available)
def run_query(tx, query, verbose=False):
    """
    Minimal query runner; returns the results as a dataframe

    Parameters
    ----------
    tx: Neo4j transaction
        Valid transaction from an active Neo4j session

    query: str
        Cypher query to execute

    verbose: bool, optional
        If True, print the result column names

    Returns
    -------
    (dataframe, ResultSummary)
        Pandas dataframe with the results, and the summary from result.consume()
    """
    try:
        result = tx.run(query)
    except Exception as x:
        # Re-raise the same exception type with the failing query prepended
        raise type(x)(f"tx.run: {query}\n" + str(x)) from x
    if verbose:
        print("Result columns:", result.keys())
    df = pd.DataFrame([r.values() for r in result], columns=result.keys())
    info = result.consume()
    return df, info
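
`run_query` sends the query text exactly as given. When a query needs values (for example, an identifier to look up), the driver can pass them as Cypher parameters through `tx.run(query, parameters)` instead of formatting them into the string. A sketch of a parameterised variant follows; `run_query_with_params` is a hypothetical name.

```python
import pandas as pd


def run_query_with_params(tx, query, params=None):
    """
    Variant of run_query that passes Cypher parameters instead of
    formatting values into the query string (hypothetical helper).
    """
    # tx.run accepts a parameter dictionary as its second argument
    result = tx.run(query, params or {})
    keys = list(result.keys())
    df = pd.DataFrame([record.values() for record in result], columns=keys)
    summary = result.consume()
    return df, summary
```

It plugs into a session the same way as `run_query`, e.g. `session.read_transaction(run_query_with_params, "MATCH (n) WHERE n.id = $id RETURN count(n) AS n", {"id": case_id})`, where the `id` property and `case_id` are placeholders.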

In [None]:
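def query_to_df(cypher, driver, db=None):
    """
    Minimal wrapper to run a read query on the designated database
    (db=None uses the server's default database); returns a dataframe
    """
    try:
        with driver.session(database=db) as session:
            # read_transaction is deprecated in the 5.x driver in favour of execute_read
            result, info = session.read_transaction(run_query, cypher)
        return result
    except Exception as x:
        raise type(x)(f"read_transaction: {cypher}\n" + str(x)) from x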
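
In the 5.x Python driver, `Session.read_transaction` is deprecated in favour of `Session.execute_read`, which takes the same "transaction function plus arguments" form. A minimal sketch of the same wrapper written against `execute_read` is below; `query_to_df_v5` is a hypothetical name and assumes a 5.x driver.

```python
import pandas as pd


def query_to_df_v5(cypher, driver, db=None, params=None):
    """
    Hypothetical variant of query_to_df for the 5.x neo4j Python driver,
    using Session.execute_read instead of the deprecated read_transaction.
    """
    def _work(tx):
        # Materialise the results inside the transaction function
        result = tx.run(cypher, params or {})
        keys = list(result.keys())
        return pd.DataFrame([r.values() for r in result], columns=keys)

    with driver.session(database=db) as session:
        return session.execute_read(_work)
```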

In [None]:
print(f"Connecting to Neo4j at {NEO4J_URI}")
aib_neo4j_driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USER, NEO4J_PW))
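
`GraphDatabase.driver` does not contact the server when it is created, so a wrong URI or an unreachable network only shows up at the first query. The sketch below fails fast by calling `driver.verify_connectivity()` and closes the driver if that check fails. `open_driver` and its `driver_factory` parameter are hypothetical; the parameter exists only so the helper can be exercised without a server.

```python
def open_driver(uri, user, password, driver_factory=None):
    """
    Create a driver and fail fast if the server cannot be reached
    (hypothetical helper). driver_factory defaults to GraphDatabase.driver.
    """
    if driver_factory is None:
        # Imported lazily so a stand-in factory can be used without the neo4j package
        from neo4j import GraphDatabase

        driver_factory = GraphDatabase.driver
    driver = driver_factory(uri, auth=(user, password))
    try:
        # Raises if the server cannot be reached at this URI
        driver.verify_connectivity()
    except Exception:
        driver.close()
        raise
    return driver
```

Usage would be `aib_neo4j_driver = open_driver(NEO4J_URI, NEO4J_USER, NEO4J_PW)`, followed by `aib_neo4j_driver.close()` once the notebook is finished, to release its connections.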

In [None]:
show_dbs_query = """
SHOW DATABASES YIELD * WHERE name <> 'neo4j' AND name <> 'system'
"""

In [None]:
# Run the "show databases" statement as a transaction, using default database
db_result = query_to_df(cypher=show_dbs_query, driver=aib_neo4j_driver)

In [None]:
# Build list of database names (other than the system/neo4j databases)
AVAILABLE_DBS = list(db_result["name"])
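
On a cluster, `SHOW DATABASES` can return one row per server hosting each database, and it also lists databases that are stopped. Either case would make the per-database loop below repeat work or fail. A sketch that de-duplicates names and keeps only online databases, using the `currentStatus` column that `YIELD *` returns; `online_database_names` is a hypothetical helper.

```python
import pandas as pd


def online_database_names(db_result, excluded=("neo4j", "system")):
    """
    Return unique, online database names from a SHOW DATABASES result.

    Names are de-duplicated while preserving their order, since a cluster
    can report the same database once per server.
    """
    df = db_result[~db_result["name"].isin(excluded)]
    if "currentStatus" in df.columns:
        # Skip databases that are stopped or still starting
        df = df[df["currentStatus"] == "online"]
    return list(dict.fromkeys(df["name"]))
```

Usage would be `AVAILABLE_DBS = online_database_names(db_result)`.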

## Connect to each database and gather node and relationship counts

In [None]:
# Count nodes per label and relationships per type (requires the APOC plugin)
node_count_query = """
CALL db.labels() YIELD label
CALL apoc.cypher.run('MATCH (:`' + label + '`) RETURN count(*) AS count',{}) YIELD value
RETURN label AS Node, value.count AS Count
ORDER BY Count DESC
"""

relationship_count_query = """
CALL db.relationshipTypes() YIELD relationshipType as label
CALL apoc.cypher.run('MATCH ()-[:`' + label + '`]->() RETURN count(*) AS count',{}) YIELD value
RETURN label AS Relationship, value.count AS Count
ORDER BY Count DESC
"""

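
The APOC queries above wrap each label or relationship type in backticks, which handles names with spaces but not a name that itself contains a backtick; in Cypher an embedded backtick is escaped by doubling it. The sketch below builds the same per-label and per-type count statements with that escaping, which could also be run one at a time through `query_to_df` on an instance without APOC. The helper names are hypothetical, and the labels in the checks are illustrative only.

```python
def quote_identifier(name):
    """Quote a label or relationship type for Cypher, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def node_count_cypher(label):
    """Per-label count statement, equivalent to the one the APOC call runs."""
    return f"MATCH (:{quote_identifier(label)}) RETURN count(*) AS count"


def relationship_count_cypher(rel_type):
    """Per-type count statement, equivalent to the one the APOC call runs."""
    return f"MATCH ()-[:{quote_identifier(rel_type)}]->() RETURN count(*) AS count"
```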
In [None]:
result_node_counts = {}
result_edge_counts = {}
for db in AVAILABLE_DBS:
    result_node_counts[db] = query_to_df(
        cypher=node_count_query, driver=aib_neo4j_driver, db=db
    )
    result_node_counts[db].columns = ["Node", db]
    result_edge_counts[db] = query_to_df(
        cypher=relationship_count_query, driver=aib_neo4j_driver, db=db
    )
    result_edge_counts[db].columns = ["Relationship", db]

In [None]:
final_node_counts = reduce(
    lambda left, right: pd.merge(left, right, on=["Node"], how="outer",),
    list(result_node_counts.values()),
)

final_edge_counts = reduce(
    lambda left, right: pd.merge(left, right, on=["Relationship"], how="outer",),
    list(result_edge_counts.values()),
)
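
Because the merges are outer joins, a label or relationship type that is missing from one database appears as NaN in that database's column, which also turns its counts into floats. The sketch below treats a missing entry as a count of 0 (assuming absence means no nodes or relationships of that kind) and returns an empty table instead of failing when there are no databases; `merge_counts` is a hypothetical helper.

```python
from functools import reduce

import pandas as pd


def merge_counts(frames, key):
    """
    Outer-merge per-database count tables on `key`, treating a label or type
    that is absent from a database as a count of 0.
    """
    if not frames:
        # reduce() without an initial value fails on an empty list
        return pd.DataFrame(columns=[key])
    merged = reduce(lambda left, right: pd.merge(left, right, on=[key], how="outer"), frames)
    count_cols = [c for c in merged.columns if c != key]
    merged[count_cols] = merged[count_cols].fillna(0).astype("int64")
    return merged
```

Usage would be `final_node_counts = merge_counts(list(result_node_counts.values()), "Node")`, and likewise with `"Relationship"` for the edge counts.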

### Node Comparison

In [None]:
final_node_counts

### Relationship (Edge) Comparison

In [None]:
final_edge_counts
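
When several databases are compared, the rows of interest are usually the ones whose counts disagree. A short sketch that keeps only those rows; `rows_that_differ` is a hypothetical helper, and it treats a missing (NaN) count as different from any number.

```python
import pandas as pd


def rows_that_differ(counts, key):
    """Return only the rows whose counts are not identical across all databases."""
    indexed = counts.set_index(key)
    # dropna=False counts a missing value as a distinct value in the row
    differs = indexed.nunique(axis=1, dropna=False) > 1
    return indexed[differs].reset_index()
```

For example, `rows_that_differ(final_node_counts, "Node")` or `rows_that_differ(final_edge_counts, "Relationship")`.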