# Example Query for Survival Probability of 1-hop Queries

This notebook queries our system with a question of the form:<br>
$P(survival\_time > X | Drug)$<br>
The response is a knowledge graph containing the genes that contributed most strongly to the survival-time question with respect to a drug. We hope these contributions give some indication of gene sensitivities.

In [1]:
import requests
import json
import csv

# /predicates functionality example
Calling /predicates returns a JSON object containing the following predicates:<br>
1.) biolink:gene_associated_with_condition<br>
2.) biolink:interacts_with<br>
3.) biolink:treats<br>
4.) biolink:has_phenotype<br>

The above predicates link the following biolink entities:<br>
1.) biolink:Gene<br>
2.) biolink:Drug<br>
3.) biolink:Disease<br>
4.) biolink:PhenotypicFeature<br>

In [2]:
r = requests.get('http://chp.thayer.dartmouth.edu/predicates/')
json_formatted_str = json.dumps(json.loads(r.content), indent=2)
print(json_formatted_str)

{
  "biolink:Gene": {
    "biolink:Disease": [
      "biolink:gene_associated_with_condition"
    ],
    "biolink:Drug": [
      "biolink:interacts_with"
    ]
  },
  "biolink:Drug": {
    "biolink:Disease": [
      "biolink:treats"
    ],
    "biolink:Gene": [
      "biolink:interacts_with"
    ]
  },
  "biolink:Disease": {
    "biolink:PhenotypicFeature": [
      "biolink:has_phenotype"
    ]
  }
}
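
The nested object above can be easier to scan as flat triples. The following is a minimal sketch that assumes the outer key is the subject category and the inner key is the object category; `flatten_predicates` is a hypothetical helper name, and the sample JSON is copied from the output above rather than fetched from the server.

```python
import json

# Sample copied from the /predicates output shown above.
predicates_json = """
{
  "biolink:Gene": {
    "biolink:Disease": ["biolink:gene_associated_with_condition"],
    "biolink:Drug": ["biolink:interacts_with"]
  },
  "biolink:Drug": {
    "biolink:Disease": ["biolink:treats"],
    "biolink:Gene": ["biolink:interacts_with"]
  },
  "biolink:Disease": {
    "biolink:PhenotypicFeature": ["biolink:has_phenotype"]
  }
}
"""

def flatten_predicates(predicates):
    """Return (subject, predicate, object) triples from the nested mapping.

    Assumption: outer keys are subject categories, inner keys are object
    categories, and each inner value is a list of predicates.
    """
    triples = []
    for subject, objects in predicates.items():
        for obj, preds in objects.items():
            for pred in preds:
                triples.append((subject, pred, obj))
    return triples

triples = flatten_predicates(json.loads(predicates_json))
for subject, pred, obj in triples:
    print(f"{subject} --{pred}--> {obj}")
```

Listing the triples this way makes it quick to confirm that a predicate used in a query, such as biolink:interacts_with between a gene and a drug, is one the service advertises.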


# Build Query
Builds a JSON query object. The function takes a survival time and a single drug. The gene node is left without a CURIE; in the returned knowledge graph it is replaced by a series of contributing genes.

In [3]:
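# Function: buildQuery
#
# Input:
# -----------
# st:   survival time threshold X (accepted, but not used in the query graph built below)
# drug: a single drug as a (name, curie) tuple
#
# Output:
# -----------
# A query graph that asks this probabilistic question:
# P(survival_time > X | Drug = d1)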

def buildQuery(st, drug):
    
    # empty response
    reasoner_std = { "query_graph": dict(),
                   }
    # empty query graph
    reasoner_std["query_graph"] = { "edges": dict(),
                                    "nodes": dict()
                                  }

    # wildcard gene slot
    reasoner_std['query_graph']['nodes']['n0'] = { 'category':'biolink:Gene'
                                                 }
    
    # drug
    reasoner_std['query_graph']['nodes']['n1'] = { 'category':'biolink:Drug',
                                                   'id':'{}'.format(drug[1])
                                                 }
    
    # link drug to gene
    reasoner_std['query_graph']['edges']['e0'] = { 'predicate':'biolink:interacts_with',
                                                   'subject': 'n0',
                                                   'object': 'n1'
                                                 }
    return reasoner_std
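
Before sending a query, it can help to confirm that every edge points at nodes that exist and to see which nodes are wildcards (no `id`). The following is a small sketch, not part of CHP itself: `check_query_graph` is a hypothetical helper, and `sample` is written out by hand to mirror the dict that buildQuery returns for the drug used later in this notebook.

```python
def check_query_graph(query_graph):
    """Return a list of problems; an empty list means every edge endpoint is a known node."""
    problems = []
    nodes = query_graph.get("nodes", {})
    for edge_id, edge in query_graph.get("edges", {}).items():
        for end in ("subject", "object"):
            if edge.get(end) not in nodes:
                problems.append(f"{edge_id}: {end} {edge.get(end)!r} is not a node")
    return problems

# Hand-written copy of the structure buildQuery produces (assumed, for illustration).
sample = {
    "query_graph": {
        "nodes": {
            "n0": {"category": "biolink:Gene"},
            "n1": {"category": "biolink:Drug", "id": "CHEMBL:CHEMBL88"},
        },
        "edges": {
            "e0": {
                "predicate": "biolink:interacts_with",
                "subject": "n0",
                "object": "n1",
            },
        },
    }
}

problems = check_query_graph(sample["query_graph"])

# Nodes without an 'id' are the wildcard slots the service fills in.
wildcards = [
    node_id
    for node_id, node in sample["query_graph"]["nodes"].items()
    if "id" not in node
]

print(problems)
print(wildcards)
```

Here the only wildcard is the gene node `n0`, which matches the description above: the gene slot has no CURIE and is filled with contributing genes in the response.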

# Read Drugs
Reads in our set of available drugs with their respective ChEMBL CURIE IDs.

In [4]:
def readDrugs():
    with open('drug_curie_map.csv', 'r') as drug_file:
        reader = csv.reader(drug_file)
        next(reader)
        rows = [(row[0],row[1]) for row in reader]
    return rows
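
To look up a drug by name rather than scanning the list, the same two columns can be loaded into a dictionary. This is a sketch under assumptions: the header names in `sample_csv` are made up (the real file's header is skipped and never shown here), `read_drug_map` is a hypothetical helper, and the single data row is the drug used later in this notebook. An in-memory string stands in for drug_curie_map.csv so the example runs without the file.

```python
import csv
import io

# Hypothetical file contents: header names are assumptions; the data row is
# the drug and CURIE used later in this notebook.
sample_csv = "drug_name,curie\nCYCLOPHOSPHAMIDE,CHEMBL:CHEMBL88\n"

def read_drug_map(handle):
    """Map upper-cased drug name -> CURIE, skipping the header row."""
    reader = csv.reader(handle)
    next(reader)  # skip header, as readDrugs does
    return {row[0].upper(): row[1] for row in reader if len(row) >= 2}

drug_map = read_drug_map(io.StringIO(sample_csv))

# Build the (name, curie) tuple that buildQuery expects.
name = "cyclophosphamide"
drug = (name.upper(), drug_map[name.upper()])
print(drug)
```

With the real file, the `io.StringIO(sample_csv)` argument would be replaced by the open file handle.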

# Constructing the Query and pinging CHP
You can uncomment the readDrugs call below to check which drugs are available. A single (name, curie) drug tuple is passed in as evidence.

In [5]:
# list of drugs (and curies) we can query over
#drug_list = readDrugs()

survival_time = 1000
drug = ('CYCLOPHOSPHAMIDE', 'CHEMBL:CHEMBL88')

query = buildQuery(survival_time, drug)
payload = {'message': query}

#increase max_results
payload['max_results'] = 10

r = requests.post('http://chp.thayer.dartmouth.edu/query/', json=payload)
chp_res = json.loads(r.content)

# Extract sensitive genes
Sensitivity values range between -1 and 1. Genes closer to -1 can be thought of as having contributed more to the false assignment of $P(survival\_time > X | Drug)$. Similarly, genes closer to 1 can be thought of as having contributed more to the true assignment. Gene sensitivities are ordered by their absolute value.

## Extract sensitive gene rankings

In [6]:
KG = chp_res['message']['knowledge_graph']
QG = chp_res['message']['query_graph']
results = chp_res['message']['results']

# holds gene sensitivities (every result after the first)
sensitivity_results = results[1:]

genes = []
for sr in sensitivity_results:
    for qge_id in sr['edge_bindings'].keys():
        if QG['edges'][qge_id]['predicate'] == 'biolink:interacts_with':
            kge_id = sr['edge_bindings'][qge_id][0]['id']
            sensitivity = KG['edges'][kge_id]
            gene_curie = sensitivity['subject']
            gene_weight = sensitivity['attributes'][0]['value']    
    for qgn_id in sr['node_bindings'].keys():
        if QG['nodes'][qgn_id]['category'] == 'biolink:Gene':
            kgn_id = sr['node_bindings'][qgn_id][0]['id']
            gene_name = KG['nodes'][kgn_id]['name']
    genes.append((gene_name, gene_curie, gene_weight))
    
for gene in genes:
    print(gene)

('WNK3', 'ENSEMBL:ENSG00000196632', -0.029651380276518785)
('ROBO1', 'ENSEMBL:ENSG00000169855', -0.028486670606384976)
('MYCBP2', 'ENSEMBL:ENSG00000005810', -0.027851218557822646)
('SGIP1', 'ENSEMBL:ENSG00000118473', -0.027499445254417717)
('MUC16', 'ENSEMBL:ENSG00000181143', -0.026864718756743795)
('RYR2', 'ENSEMBL:ENSG00000198626', 0.02590807163352221)
('ERBB2', 'ENSEMBL:ENSG00000141736', -0.02417889048482236)
('CDH1', 'ENSEMBL:ENSG00000039068', -0.023975780606831892)
('SPTA1', 'ENSEMBL:ENSG00000163554', 0.023688321656584372)
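
The printed tuples can be checked against the ordering described earlier and split by sign. The sketch below copies the nine tuples from the output above into a literal list, so it runs without querying the service; the variable names `toward_true` and `toward_false` are illustrative only.

```python
# Tuples copied from the output above: (gene name, gene CURIE, sensitivity).
genes = [
    ("WNK3", "ENSEMBL:ENSG00000196632", -0.029651380276518785),
    ("ROBO1", "ENSEMBL:ENSG00000169855", -0.028486670606384976),
    ("MYCBP2", "ENSEMBL:ENSG00000005810", -0.027851218557822646),
    ("SGIP1", "ENSEMBL:ENSG00000118473", -0.027499445254417717),
    ("MUC16", "ENSEMBL:ENSG00000181143", -0.026864718756743795),
    ("RYR2", "ENSEMBL:ENSG00000198626", 0.02590807163352221),
    ("ERBB2", "ENSEMBL:ENSG00000141736", -0.02417889048482236),
    ("CDH1", "ENSEMBL:ENSG00000039068", -0.023975780606831892),
    ("SPTA1", "ENSEMBL:ENSG00000163554", 0.023688321656584372),
]

# Check that the list is in descending order of absolute sensitivity.
is_sorted = all(abs(a[2]) >= abs(b[2]) for a, b in zip(genes, genes[1:]))

# Positive weights lean toward the true assignment, negative toward the false one.
toward_true = [name for name, _, weight in genes if weight > 0]
toward_false = [name for name, _, weight in genes if weight < 0]

print(is_sorted)
print(toward_true)
print(toward_false)
```

For this response, two genes (RYR2 and SPTA1) have positive sensitivities and the remaining seven are negative, all with magnitudes below 0.03.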
