<a href="https://colab.research.google.com/github/di2ag/chp_client/blob/main/notebooks/StandardProbabilisticQueriesTutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# Install the CHP client
!pip install -e git+https://github.com/di2ag/chp_client#egg=chp_client

Obtaining chp_client from git+https://github.com/di2ag/chp_client#egg=chp_client
  Updating ./src/chp-client clone
  Running command git fetch -q --tags
  Running command git reset --hard -q b62941cbdad3ab66c1e02b9b929bbcda2c12360d
Installing collected packages: chp-client
  Found existing installation: chp-client 1.0.0
    Can't uninstall 'chp-client'. No files were found to uninstall.
  Running setup.py develop for chp-client
Successfully installed chp-client


In [2]:
# Import modules
import site
site.main()
from chp_client import get_client
import json

In [3]:
# Helper to pretty-print JSON objects
def pretty_print(json_obj):
  print(json.dumps(json_obj, indent=2))

## The Basics

In [4]:
# Instantiate a client
client = get_client()

In [5]:
# Check the predicates and available curies
curies = client.curies()
predicates = client.predicates()

In [6]:
# curies() returns a nested dictionary of the form
# curies[BiolinkEntityType][CurieID] = [list of English names and synonyms]
biolink_entities = list(curies.keys())
print(biolink_entities)

['biolink:Gene', 'biolink:Drug', 'biolink:PhenotypicFeature']
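Since the curies dictionary maps each entity type to CURIE IDs and their names, a reverse lookup by name is often handy. The following is a minimal sketch, not part of the CHP client: `find_curies` is a hypothetical helper, and `sample_curies` is a hand-written stand-in with placeholder names (the real names returned by `client.curies()` will differ).

```python
def find_curies(curies, entity_type, name):
    """Return every CURIE of `entity_type` whose name list contains `name`.

    Matching is case-insensitive and exact (no substring matching).
    """
    needle = name.lower()
    matches = []
    for curie, names in curies.get(entity_type, {}).items():
        if any(needle == candidate.lower() for candidate in names):
            matches.append(curie)
    return matches


# Hypothetical stand-in for the structure returned by client.curies().
# The CURIE IDs appear in this tutorial; the names are placeholders.
sample_curies = {
    "biolink:Gene": {
        "ENSEMBL:ENSG00000132155": ["GENE_NAME_A", "gene synonym a"],
    },
    "biolink:Drug": {
        "CHEMBL:CHEMBL88": ["DRUG_NAME_A"],
    },
}

print(find_curies(sample_curies, "biolink:Gene", "gene_name_a"))
print(find_curies(sample_curies, "biolink:Drug", "unknown drug"))
```

With the real dictionary, the same call would be `find_curies(curies, 'biolink:Gene', some_name)`.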


In [7]:
# predicates() returns a nested dictionary of the form predicates[EdgeSourceType][EdgeTargetType] = [options]
pretty_print(predicates)

{
  "biolink:Gene": {
    "biolink:Disease": [
      "biolink:GeneToDiseaseAssociation"
    ]
  },
  "biolink:Drug": {
    "biolink:Disease": [
      "biolink:ChemicalToDiseaseOrPhenotypicFeatureAssociation"
    ],
    "biolink:Gene": [
      "biolink:ChemicalToGeneAssociation"
    ]
  },
  "biolink:Disease": {
    "biolink:PhenotypicFeature": [
      "biolink:DiseaseToPhenotypicFeatureAssociation"
    ]
  }
}
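The nested predicates dictionary can be easier to scan as a flat list of (source, predicate, target) triples. The sketch below assumes only the structure printed above; `list_edge_types` is a hypothetical helper, and `sample_predicates` is a literal copy of that output so the example runs without a client.

```python
def list_edge_types(predicates):
    """Flatten predicates[source][target] = [options] into triples."""
    triples = []
    for source, targets in predicates.items():
        for target, options in targets.items():
            for predicate in options:
                triples.append((source, predicate, target))
    return triples


# Literal copy of the predicates output shown above.
sample_predicates = {
    "biolink:Gene": {
        "biolink:Disease": ["biolink:GeneToDiseaseAssociation"],
    },
    "biolink:Drug": {
        "biolink:Disease": [
            "biolink:ChemicalToDiseaseOrPhenotypicFeatureAssociation"
        ],
        "biolink:Gene": ["biolink:ChemicalToGeneAssociation"],
    },
    "biolink:Disease": {
        "biolink:PhenotypicFeature": [
            "biolink:DiseaseToPhenotypicFeatureAssociation"
        ],
    },
}

for source, predicate, target in list_edge_types(sample_predicates):
    print(f"{source} --[{predicate}]--> {target}")
```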


## Query Building
### Single Standard Probabilistic Query

In [8]:
# First import the query building helper utility
from chp_client.query import build_query

In [9]:
# Let's build a simple single query
q = build_query(genes = ['ENSEMBL:ENSG00000132155'],
                therapeutic='CHEMBL:CHEMBL88',
                disease='MONDO:0007254',
                outcome=('EFO:0000714', '>=', 1000))
pretty_print(q)

{
  "message": {
    "query_graph": {
      "edges": {
        "e0": {
          "predicate": "biolink:GeneToDiseaseAssociation",
          "subject": "n0",
          "object": "n2"
        },
        "e1": {
          "predicate": "biolink:ChemicalToDiseaseOrPhenotypicFeatureAssociation",
          "subject": "n1",
          "object": "n2"
        },
        "e2": {
          "predicate": "biolink:DiseaseToPhenotypicFeatureAssociation",
          "subject": "n2",
          "object": "n3",
          "properties": {
            "qualifier": ">=",
            "days": 1000
          }
        }
      },
      "nodes": {
        "n0": {
          "category": "biolink:Gene",
          "id": "ENSEMBL:ENSG00000132155"
        },
        "n1": {
          "category": "biolink:Drug",
          "id": "CHEMBL:CHEMBL88"
        },
        "n2": {
          "category": "biolink:Disease",
          "id": "MONDO:0007254"
        },
        "n3": {
          "category": "biolink:PhenotypicFeature",
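The query printed above is a plain nested dictionary: each edge names its subject and object by node key, and each node carries a category and an ID. As a rough sketch of how to read that structure, the hypothetical helper `describe_edges` below resolves node keys to IDs. `sample_query` reproduces only the edges and nodes that are fully visible in the output above, and nodes without an `id` fall back to their key.

```python
def describe_edges(query):
    """Return one readable line per edge in a query graph dictionary."""
    graph = query["message"]["query_graph"]
    nodes = graph["nodes"]
    lines = []
    for edge_id, edge in graph["edges"].items():
        # Fall back to the node key when a node has no "id" field.
        subject = nodes[edge["subject"]].get("id", edge["subject"])
        obj = nodes[edge["object"]].get("id", edge["object"])
        lines.append(f"{edge_id}: {subject} --[{edge['predicate']}]--> {obj}")
    return lines


# Partial copy of the printed query (edges e0 and e1, nodes n0 to n2 only).
sample_query = {
    "message": {
        "query_graph": {
            "edges": {
                "e0": {
                    "predicate": "biolink:GeneToDiseaseAssociation",
                    "subject": "n0",
                    "object": "n2",
                },
                "e1": {
                    "predicate": "biolink:ChemicalToDiseaseOrPhenotypicFeatureAssociation",
                    "subject": "n1",
                    "object": "n2",
                },
            },
            "nodes": {
                "n0": {"category": "biolink:Gene", "id": "ENSEMBL:ENSG00000132155"},
                "n1": {"category": "biolink:Drug", "id": "CHEMBL:CHEMBL88"},
                "n2": {"category": "biolink:Disease", "id": "MONDO:0007254"},
            },
        }
    }
}

for line in describe_edges(sample_query):
    print(line)
```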
  

In [10]:
# Now let's run the query
res = client.query(q)
pretty_print(res)

{
  "message": {
    "query_graph": {
      "edges": {
        "e0": {
          "predicate": "biolink:GeneToDiseaseAssociation",
          "subject": "n0",
          "object": "n2"
        },
        "e1": {
          "predicate": "biolink:ChemicalToDiseaseOrPhenotypicFeatureAssociation",
          "subject": "n1",
          "object": "n2"
        },
        "e2": {
          "predicate": "biolink:DiseaseToPhenotypicFeatureAssociation",
          "subject": "n2",
          "object": "n3",
          "properties": {
            "qualifier": ">=",
            "days": 1000
          }
        }
      },
      "nodes": {
        "n0": {
          "category": "biolink:Gene",
          "id": "ENSEMBL:ENSG00000132155"
        },
        "n1": {
          "category": "biolink:Drug",
          "id": "CHEMBL:CHEMBL88"
        },
        "n2": {
          "category": "biolink:Disease",
          "id": "MONDO:0007254"
        },
        "n3": {
          "category": "biolink:PhenotypicFeature",
  

In [11]:
# Now let's extract the important probabilistic information from the response
prob = client.get_outcome_prob(res)
print('Probability of survival',prob)

Probability of survival 0.5


### Batch queries

In [12]:
# Now let's build and run a batch of queries
genes = list(curies['biolink:Gene'].keys())[:50]
therapeutic='CHEMBL:CHEMBL88'
disease='MONDO:0007254'
outcome=('EFO:0000714', '>=', 1000)

# Now iterate through the genes, building one query for each gene in the genes list.
queries = []
for _gene in genes:
  queries.append(build_query(
      genes=[_gene],
      therapeutic=therapeutic,
      disease=disease,
      outcome=outcome,
  ))

In [13]:
# Now use the query_all endpoint to run the batch of queries
res = client.query_all(queries)

In [14]:
# Now let's extract the probabilities of each of these. The resultant message is
# a list corresponding to your batch, so make sure to extract appropriately.
for result, _gene in zip(res["message"], genes):
  prob = client.get_outcome_prob(result)
  print(f'Probability of survival for {_gene} =', prob)

Probability of survival for {_gene} = 1.0
Probability of survival for {_gene} = 0.6666666666666666
Probability of survival for {_gene} = 0.5
Probability of survival for {_gene} = -1
Probability of survival for {_gene} = 0.5
Probability of survival for {_gene} = -1
Probability of survival for {_gene} = 0
Probability of survival for {_gene} = 0.5
Probability of survival for {_gene} = -1
Probability of survival for {_gene} = 0
Probability of survival for {_gene} = 1.0
Probability of survival for {_gene} = 0.6666666666666666
Probability of survival for {_gene} = 0.3333333333333333
Probability of survival for {_gene} = -1
Probability of survival for {_gene} = 1.0
Probability of survival for {_gene} = 0.5
Probability of survival for {_gene} = -1
Probability of survival for {_gene} = 0.4285714285714286
Probability of survival for {_gene} = 1.0
Probability of survival for {_gene} = 1.0
Probability of survival for {_gene} = 1.0
Probability of survival for {_gene} = 1.0
Probability of survival f

## What does -1 Probability mean?

You will notice that some answers give a probability of -1. This is intentional: it means that we were not able to make an inference about the query. You can treat these probabilities as zeros, or present the result to the user as something distinct, such as "not enough information" or "no inference can be made". Because our KP can handle incomplete information, a -1 result is semantically different from 0, and we preserve that distinction for you.
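One way to keep the two cases apart in your own code is to check for the -1 sentinel before formatting or aggregating. This is only a sketch: `describe_probability` and `split_results` are hypothetical helpers, not part of the CHP client, and the labels and values below are made-up stand-ins for real gene CURIEs and query results.

```python
NO_INFERENCE = -1  # sentinel value described above


def describe_probability(prob):
    """Format a probability, keeping -1 distinct from a true zero."""
    if prob == NO_INFERENCE:
        return "no inference can be made"
    return f"{prob:.2f}"


def split_results(labels, probs):
    """Separate labels with a usable probability from those without one."""
    inferred = {}
    no_inference = []
    for label, prob in zip(labels, probs):
        if prob == NO_INFERENCE:
            no_inference.append(label)
        else:
            inferred[label] = prob
    return inferred, no_inference


# Made-up labels and values for illustration only.
labels = ["gene_a", "gene_b", "gene_c"]
probs = [0.5, -1, 0]

for label, prob in zip(labels, probs):
    print(label, "->", describe_probability(prob))

inferred, no_inference = split_results(labels, probs)
print(inferred)
print(no_inference)
```

Keeping the -1 entries in a separate list avoids accidentally averaging them in with real probabilities.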