# Example Query for Survival Probability of 1-hop Queries

This notebook queries our system with questions of the form:<br>
$P(survival\_time > X | Drug)$<br>
The response is a knowledge graph containing the genes that contributed most strongly to the survival-time probability with respect to the drug. We hope these contributions give some indication of gene sensitivities.

In [1]:
import requests
import json
import csv

# /predicates functionality example
Calling /predicates returns a JSON object with the following predicates:<br>
1.) biolink:GeneToDiseaseAssociation<br>
2.) biolink:ChemicalToDiseaseOrPhenotypicFeatureAssociation<br>
3.) biolink:ChemicalToGeneAssociation<br>
4.) biolink:DiseaseToPhenotypicFeatureAssociation<br>

The above predicates match the following biolink entities:<br>
1.) biolink:Gene<br>
2.) biolink:Drug<br>
3.) biolink:Disease<br>
4.) biolink:PhenotypicFeature<br>

Note that this handler uses only the edge predicate biolink:ChemicalToGeneAssociation, which connects a biolink:Drug to a biolink:Gene.

In [2]:
r = requests.get('http://chp.thayer.dartmouth.edu/predicates/')
json_formatted_str = json.dumps(json.loads(r.content), indent=2)
print(json_formatted_str)

{
  "biolink:Gene": {
    "biolink:Disease": [
      "biolink:GeneToDiseaseAssociation"
    ]
  },
  "biolink:Drug": {
    "biolink:Disease": [
      "biolink:ChemicalToDiseaseOrPhenotypicFeatureAssociation"
    ],
    "biolink:Gene": [
      "biolink:ChemicalToGeneAssociation"
    ]
  },
  "biolink:Disease": {
    "biolink:PhenotypicFeature": [
      "biolink:DiseaseToPhenotypicFeatureAssociation"
    ]
  }
}
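The response nests object categories under subject categories, each mapping to a list of predicates. As a minimal sketch of how that structure can be read, the hypothetical helper `predicates_between` below (not part of CHP) looks up the predicates for a subject/object pair. It assumes the response shape printed above, which is copied into a literal so the example runs offline.

```python
# Response shape copied from the /predicates output above.
predicates = {
    "biolink:Gene": {
        "biolink:Disease": ["biolink:GeneToDiseaseAssociation"]
    },
    "biolink:Drug": {
        "biolink:Disease": ["biolink:ChemicalToDiseaseOrPhenotypicFeatureAssociation"],
        "biolink:Gene": ["biolink:ChemicalToGeneAssociation"],
    },
    "biolink:Disease": {
        "biolink:PhenotypicFeature": ["biolink:DiseaseToPhenotypicFeatureAssociation"]
    },
}


def predicates_between(predicates, subject_category, object_category):
    """Return the predicates linking subject to object category, or [] if none."""
    # Hypothetical helper: two chained lookups with empty defaults.
    return predicates.get(subject_category, {}).get(object_category, [])


print(predicates_between(predicates, "biolink:Drug", "biolink:Gene"))
```

The lookup is directional: asking for biolink:Gene to biolink:Drug returns an empty list, because the response only lists the drug-to-gene direction.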


# Build Query
Constructs a JSON query object from a survival time and a single drug. The gene node is left without a CURIE; in the returned knowledge graph it is replaced with a series of contributing genes.

In [3]:
# Function: buildQuery
#
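# Input:
# -----------
# st:   survival time threshold X (accepted but not encoded in this query graph)
# drug: a single (name, curie) tuple
#
# Output:
# -----------
# A query graph that asks this probabilistic question:
# P(survival_time > X | Drug = d1)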

def buildQuery(st, drug):
    
    # empty response
    reasoner_std = { "query_graph": dict(),
                     "knowledge_graph": dict(),
                     "results": list()
                   }
    # empty query graph
    reasoner_std["query_graph"] = { "edges": dict(),
                                    "nodes": dict()
                                  }
    # empty knowledge graph
    reasoner_std["knowledge_graph"] = { "edges": dict(),
                                        "nodes": dict()
                                      }

    # drug
    reasoner_std['query_graph']['nodes']['n0'] = { 'category':'biolink:Drug',
                                                   'id':'{}'.format(drug[1])
                                                 }
    
    # wildcard gene slot
    reasoner_std['query_graph']['nodes']['n1'] = { 'category':'biolink:Gene'
                                                 }

    
    # link drug to gene
    reasoner_std['query_graph']['edges']['e0'] = { 'predicate':'biolink:ChemicalToGeneAssociation',
                                                   'subject': 'n0',
                                                   'object': 'n1'
                                                 }
    return reasoner_std
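Before sending a query, it can help to confirm that every edge points at nodes that actually exist in the query graph. The sketch below uses a hypothetical helper, `check_query_graph`, which is not part of CHP. It assumes only the `nodes`/`edges` layout that `buildQuery` produces, reproduced here as a literal for the cyclophosphamide example.

```python
def check_query_graph(query_graph):
    """Return a list of problems found in a query graph (empty if none)."""
    problems = []
    nodes = query_graph.get("nodes", {})
    for edge_id, edge in query_graph.get("edges", {}).items():
        # Each edge end must name a key in the nodes dictionary.
        for end in ("subject", "object"):
            if edge.get(end) not in nodes:
                problems.append(
                    "{}: {} '{}' is not a node".format(edge_id, end, edge.get(end))
                )
    return problems


# Same shape as the query graph buildQuery returns for cyclophosphamide.
example_qg = {
    "nodes": {
        "n0": {"category": "biolink:Drug", "id": "CHEMBL:CHEMBL88"},
        "n1": {"category": "biolink:Gene"},  # wildcard gene slot, no id
    },
    "edges": {
        "e0": {
            "predicate": "biolink:ChemicalToGeneAssociation",
            "subject": "n0",
            "object": "n1",
        },
    },
}

print(check_query_graph(example_qg))
```

An empty list means the edge endpoints are consistent. The check says nothing about whether the server accepts the categories or predicates.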

# Read Drugs
Reads our set of available drugs along with their respective ChEMBL CURIE IDs.

In [4]:
def readDrugs():
    with open('drug_curie_map.csv', 'r') as drug_file:
        reader = csv.reader(drug_file)
        next(reader)
        rows = [(row[0],row[1]) for row in reader]
    return rows
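If drugs are looked up by name, the same rows can be turned into a dictionary. This is a sketch only: `drug_lookup` is a hypothetical helper, and the in-memory sample stands in for `drug_curie_map.csv`. The header names are assumed; the single data row is the drug used later in this notebook.

```python
import csv
import io

# Stand-in for drug_curie_map.csv; the header names here are an assumption.
sample_csv = "drug,curie\nCYCLOPHOSPHAMIDE,CHEMBL:CHEMBL88\n"


def drug_lookup(csv_file):
    """Map upper-cased drug name -> CURIE from a two-column CSV with a header."""
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row, as readDrugs does
    return {row[0].upper(): row[1] for row in reader if len(row) >= 2}


curie_by_name = drug_lookup(io.StringIO(sample_csv))
print(curie_by_name["CYCLOPHOSPHAMIDE"])
```

Passing an open file handle instead of `io.StringIO` would work the same way, since `csv.reader` accepts any iterable of lines.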

# Constructing the Query and pinging CHP
Uncomment the `readDrugs()` call to check which drugs are available. A single drug tuple is passed in as evidence.

In [5]:
# list of drugs (and curies) we can query over
#drug_list = readDrugs()

survival_time = 1000
drug = ('CYCLOPHOSPHAMIDE', 'CHEMBL:CHEMBL88')

query = buildQuery(survival_time, drug)
payload = {'message': query}

r = requests.post('http://chp.thayer.dartmouth.edu/query/', json=payload)
chp_res = json.loads(r.content)
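The request above does not check the HTTP status or set a timeout, so a server error would only surface later as a JSON decoding failure or a missing key. A minimal sketch of a more defensive call follows. `post_query` is a hypothetical wrapper, and the 60-second timeout is an arbitrary assumption. The POST function is passed in as an argument so the wrapper can be exercised without network access.

```python
import json


def post_query(post, url, payload, timeout=60):
    """POST a payload and return the decoded JSON body.

    `post` is any callable with the calling convention of requests.post.
    """
    response = post(url, json=payload, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return json.loads(response.content)
```

With `requests` imported as in the first cell, the call in the cell above would become `chp_res = post_query(requests.post, 'http://chp.thayer.dartmouth.edu/query/', payload)`.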

# Extract sensitive genes
Sensitivity values range between -1 and 1. Genes closer to -1 can be thought of as having contributed more to the false assignment of $P(survival\_time > X | Drug)$. Similarly, genes closer to 1 can be thought of as having contributed more to the true assignment. Gene sensitivities are ordered by their absolute value.

## Extract sensitive gene rankings

In [6]:
KG = chp_res['message']['knowledge_graph']
QG = chp_res['message']['query_graph']
results = chp_res['message']['results']

# holds gene sensitivities
sensitivity_results = results[1:]

genes = []
for sr in sensitivity_results:
    for qge_id in sr['edge_bindings'].keys():
        if QG['edges'][qge_id]['predicate'] == 'biolink:ChemicalToGeneAssociation':
            kge_id = sr['edge_bindings'][qge_id][0]['id']
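            sensitivity = KG['edges'][kge_id]
            # NOTE: the edge runs drug -> gene, so 'subject' holds the drug CURIE
            # (which is why the output below repeats CHEMBL:CHEMBL88).
            gene_curie = sensitivity['subject']
            gene_weight = sensitivity['value']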
    for qgn_id in sr['node_bindings'].keys():
        if QG['nodes'][qgn_id]['category'] == 'biolink:Gene':
            kgn_id = sr['node_bindings'][qgn_id][0]['id']
            gene_name = KG['nodes'][kgn_id]['name']
    genes.append((gene_name, gene_curie, gene_weight))
    
for gene in genes:
    print(gene)

('PIK3CA', 'CHEMBL:CHEMBL88', 0.015628888557065554)
('WNK3', 'CHEMBL:CHEMBL88', -0.014322333383106955)
('MUC16', 'CHEMBL:CHEMBL88', -0.013899258374396584)
('ROBO1', 'CHEMBL:CHEMBL88', -0.013812154696132686)
('SGIP1', 'CHEMBL:CHEMBL88', -0.013812154696132686)
('MYCBP2', 'CHEMBL:CHEMBL88', -0.013812154696132686)
('RYR2', 'CHEMBL:CHEMBL88', 0.013725051017868792)
('CDH1', 'CHEMBL:CHEMBL88', -0.012368722313473766)
('ABCA13', 'CHEMBL:CHEMBL88', -0.012070081130854688)
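To separate genes that pushed toward the true assignment from those that pushed toward the false assignment, the ranked list can be split by sign. The sketch below is illustrative and not part of the CHP client. It uses the first few name/weight pairs from the output above and re-sorts by absolute value to make the ordering explicit.

```python
# (gene_name, weight) pairs copied from the output above.
genes = [
    ("PIK3CA", 0.015628888557065554),
    ("WNK3", -0.014322333383106955),
    ("MUC16", -0.013899258374396584),
    ("RYR2", 0.013725051017868792),
]

# Rank by magnitude, largest first, matching the order the service returns.
ranked = sorted(genes, key=lambda g: abs(g[1]), reverse=True)

# Positive weights contributed to the true assignment, negative to the false.
toward_true = [name for name, weight in ranked if weight > 0]
toward_false = [name for name, weight in ranked if weight < 0]

print(toward_true)
print(toward_false)
```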
