# Introduction

This notebook demonstrates how BioThings Explorer can be used to answer the following query:

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;*"Finding Marketed Drugs that Might Treat an Unknown Syndrome by Perturbing the Disease Mechanism Pathway"*

![](img/tidbit4_300.png)

This query corresponds to [Tidbit 4](https://ncats.nih.gov/tidbit/tidbit_04.html), which was formulated as a demonstration of the NCATS Translator program.

**Background of BTE**: BioThings Explorer can answer two classes of queries: "EXPLAIN" and "PREDICT". EXPLAIN queries are described in [EXPLAIN_demo.ipynb](https://github.com/biothings/biothings_explorer/blob/master/jupyter%20notebooks/EXPLAIN_demo.ipynb), and PREDICT queries are described in [PREDICT_demo.ipynb](https://github.com/biothings/biothings_explorer/blob/master/jupyter%20notebooks/PREDICT_demo.ipynb). Here, we describe PREDICT queries and how to use BioThings Explorer to execute them. A more detailed overview of the BioThings Explorer system is provided in [these slides](https://docs.google.com/presentation/d/1QWQqqQhPD_pzKryh6Wijm4YQswv8pAjleVORCPyJyDE/edit?usp=sharing).

**Background of TIDBIT 04**:

A five-year-old patient was brought to the emergency room with recurrent polymicrobial lung infections and only 29% small airway function, and was unresponsive to antibiotics.

The patient’s medical records included a genetics report from age 1, which showed a 1p34.1 chromosomal duplication encompassing 1.9 Mb, including the PRDX1 gene, which encodes Peroxiredoxin 1. The gene has been linked to airway disease in both rats and humans, and is known to act as an agonist of toll-like receptor 4 (TLR4), a pro-inflammatory receptor. In addition, two patients at another clinic were found to have 1p34.1 duplications:

1. One patient with a duplication including PRDX1 died with similar phenotypes.
2. One patient with a duplication that did NOT include PRDX1 showed no airway disease phenotype.

While recurrent lung infections are typically treated with antibiotics, this patient was unresponsive to standard treatments. The patient’s earlier genetics report and data from other patients with similar duplications gave the physician evidence that PRDX1 may play a role in the disease, but no treatments directly related to the gene were known. With this information in mind, the physician asked a researcher familiar with Translator to try to find possible treatments for this patient.

**How Might Translator Help?**

The patient’s duplication of the 1p34.1 region of chromosome 1 gave Translator researchers a good place to start. Since PRDX1 is an agonist of TLR4, the duplication of the PRDX1 gene likely causes overexpression of PRDX1, which could lead to overactivity of both PRDX1 and TLR4. The researcher decided to try to find drugs that could be used to reduce the activity of those two proteins. An exhaustive search of chemical databases and PubMed to find safe drug options could take days to weeks.

For a known genetic mutation, can Translator be used to quickly find existing modulators to compensate for the dysfunctional gene product?

## Step 1: Find representation of "PRDX1" in BTE

In this step, BioThings Explorer translates our query string "PRDX1" into BioThings objects, which contain mappings to many common identifiers. Generally, the top result returned by the `Hint` module will be the correct item, but you should confirm this by checking the identifiers shown.

Search terms can correspond to any child of [BiologicalEntity](https://biolink.github.io/biolink-model/docs/BiologicalEntity.html) from the [Biolink Model](https://biolink.github.io/biolink-model/docs/), including `DiseaseOrPhenotypicFeature` (e.g., "lupus"), `ChemicalSubstance` (e.g., "acetaminophen"), `Gene` (e.g., "CDK2"), `BiologicalProcess` (e.g., "T cell differentiation"), and `Pathway` (e.g., "Citric acid cycle").

In [1]:
from biothings_explorer.hint import Hint
ht = Hint()
prdx1 = ht.query("PRDX1")['Gene'][0]

prdx1

{'entrez': '5052',
 'name': 'peroxiredoxin 1',
 'symbol': 'PRDX1',
 'taxonomy': 9606,
 'umls': 'C1418879',
 'uniprot': 'Q06830',
 'hgnc': '9352',
 'ensembl': 'ENSG00000117450',
 'display': 'entrez(5052) name(peroxiredoxin 1) symbol(PRDX1) taxonomy(9606) umls(C1418879) uniprot(Q06830) hgnc(9352) ensembl(ENSG00000117450) ',
 'type': 'Gene',
 'primary': {'identifier': 'entrez', 'cls': 'Gene', 'value': '5052'}}
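The confirmation step described above can be sketched as a small check on the returned record. This is only an illustration: the dictionary is a subset of the output shown above, and `find_mismatches` is a hypothetical helper, not part of BioThings Explorer.

```python
# Subset of the Hint record shown above, copied by hand for illustration.
prdx1 = {
    "entrez": "5052",
    "symbol": "PRDX1",
    "taxonomy": 9606,
    "uniprot": "Q06830",
    "hgnc": "9352",
    "ensembl": "ENSG00000117450",
    "type": "Gene",
}


def find_mismatches(hit, expected):
    """Return the fields of `expected` whose values differ from those in `hit`.

    Hypothetical helper: values are compared as strings so that 9606 and
    "9606" are treated as the same identifier.
    """
    return [
        key for key, value in expected.items()
        if str(hit.get(key)) != str(value)
    ]


# Identifiers we expect for human PRDX1 (taken from the output above).
expected = {"symbol": "PRDX1", "taxonomy": 9606, "ensembl": "ENSG00000117450"}

mismatches = find_mismatches(prdx1, expected)
print(mismatches or "top hit matches the expected identifiers")
```

If the list of mismatches is not empty, the other entries returned by `ht.query("PRDX1")['Gene']` would be worth inspecting before continuing.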

## Step 2: Find drugs associated with genes that are associated with PRDX1

In this section, we find all paths in the knowledge graph that connect PRDX1 to any entity that is a chemical compound.  To do that, we will use `FindConnection`.  This class is a convenient wrapper around two advanced functions for **query path planning** and **query path execution**. More advanced features for both query path planning and query path execution are in development and will be documented in the coming months. 

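The parameters for `FindConnection` used in the call below are:

* `input_obj`: the starting node of the path, here the `prdx1` object returned by `Hint`.
* `output_obj`: the type of node at which paths should end, here `'ChemicalSubstance'`.
* `intermediate_nodes`: a list of type constraints for the nodes between input and output, here a single `'Gene'` node.
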
In [2]:
from biothings_explorer.user_query_dispatcher import FindConnection

fc = FindConnection(input_obj=prdx1, output_obj='ChemicalSubstance', intermediate_nodes=['Gene'])
fc.connect(verbose=True)


BTE will find paths that join 'PRDX1' and 'ChemicalSubstance'. Paths will have 1 intermediate node.

Intermediate node #1 will have these type constraints: Gene





==== Step #1: Query path planning ====

Because PRDX1 is of type 'Gene', BTE will query our meta-KG for APIs that can take 'Gene' as input and 'Gene' as output

BTE found 3 apis:

API 1. mygene.info(3 API calls)
API 2. semmedgene(11 API calls)
API 3. biolink_geneinteraction(1 API call)


==== Step #2: Query path execution ====
NOTE: API requests are dispatched in parallel, so the list of APIs below is ordered by query time.

API 1.1: http://mygene.info/v3/query (POST "q=5052&scopes=entrezgene&fields=pantherdb.ortholog&species=human&size=100")
API 1.2: http://mygene.info/v3/query (POST "q=ENSG00000117450&scopes=pantherdb.ortholog.Ensembl&fields=entrezgene&species=human&size=100")
API 1.3: http://mygene.info/v3/query (POST "q=9352&scopes=pantherdb.ortholog.HGNC&fields=entrezgene&species=human&size=100")
API 2.2: https://pen

API 2.13: https://platform-api.opentargets.io/v3/platform/public/evidence/filter?target=ENSG00000146648&datasource=chembl&size=15&fields=drug
API 3.25: http://www.dgidb.org/api/v2/interactions.json?genes=UCHL5
API 3.26: http://www.dgidb.org/api/v2/interactions.json?genes=RC3H1
API 3.6: http://www.dgidb.org/api/v2/interactions.json?genes=PRRX1
API 3.52: http://www.dgidb.org/api/v2/interactions.json?genes=DCUN1D1
API 2.18: https://platform-api.opentargets.io/v3/platform/public/evidence/filter?target=ENSG00000167193&datasource=chembl&size=15&fields=drug
API 3.27: http://www.dgidb.org/api/v2/interactions.json?genes=YAP1
API 2.16: https://platform-api.opentargets.io/v3/platform/public/evidence/filter?target=ENSG00000023191&datasource=chembl&size=15&fields=drug
API 3.57: http://www.dgidb.org/api/v2/interactions.json?genes=FYN
API 2.15: https://platform-api.opentargets.io/v3/platform/public/evidence/filter?target=ENSG00000157326&datasource=chembl&size=15&fields=drug
API 3.23: http://www.dgidb

API 2.64: https://platform-api.opentargets.io/v3/platform/public/evidence/filter?target=ENSG00000285441&datasource=chembl&size=15&fields=drug
API 3.130: http://www.dgidb.org/api/v2/interactions.json?genes=PEBP1
API 3.129: http://www.dgidb.org/api/v2/interactions.json?genes=LARP7
API 2.60: https://platform-api.opentargets.io/v3/platform/public/evidence/filter?target=ENSG00000063244&datasource=chembl&size=15&fields=drug
API 2.66: https://platform-api.opentargets.io/v3/platform/public/evidence/filter?target=ENSG00000142208&datasource=chembl&size=15&fields=drug
API 3.131: http://www.dgidb.org/api/v2/interactions.json?genes=CUL1
API 3.132: http://www.dgidb.org/api/v2/interactions.json?genes=CUL4A
API 2.62: https://platform-api.opentargets.io/v3/platform/public/evidence/filter?target=ENSG00000143543&datasource=chembl&size=15&fields=drug
API 2.68: https://platform-api.opentargets.io/v3/platform/public/evidence/filter?target=ENSG00000204490&datasource=chembl&size=15&fields=drug
API 3.133: http

API 2.99: https://platform-api.opentargets.io/v3/platform/public/evidence/filter?target=ENSG00000138823&datasource=chembl&size=15&fields=drug
API 2.114: https://platform-api.opentargets.io/v3/platform/public/evidence/filter?target=ENSG00000134954&datasource=chembl&size=15&fields=drug
API 2.116: https://platform-api.opentargets.io/v3/platform/public/evidence/filter?target=ENSG00000160957&datasource=chembl&size=15&fields=drug
API 2.126: https://platform-api.opentargets.io/v3/platform/public/evidence/filter?target=ENSG00000100823&datasource=chembl&size=15&fields=drug
API 2.131: https://platform-api.opentargets.io/v3/platform/public/evidence/filter?target=ENSG00000001626&datasource=chembl&size=15&fields=drug
API 2.129: https://platform-api.opentargets.io/v3/platform/public/evidence/filter?target=ENSG00000185345&datasource=chembl&size=15&fields=drug
API 2.107: https://platform-api.opentargets.io/v3/platform/public/evidence/filter?target=ENSG00000012048&datasource=chembl&size=15&fields=drug


API 4.44: https://robokop.renci.org/api/simple/expand/gene/HGNC:8630/chemical_substance/
API 4.53: https://robokop.renci.org/api/simple/expand/gene/HGNC:18424/chemical_substance/
API 4.50: https://robokop.renci.org/api/simple/expand/gene/HGNC:6201/chemical_substance/
API 4.52: https://robokop.renci.org/api/simple/expand/gene/HGNC:9782/chemical_substance/
API 4.49: https://robokop.renci.org/api/simple/expand/gene/HGNC:2264/chemical_substance/
API 4.54: https://robokop.renci.org/api/simple/expand/gene/HGNC:11701/chemical_substance/
API 4.59: https://robokop.renci.org/api/simple/expand/gene/HGNC:9312/chemical_substance/
API 4.58: https://robokop.renci.org/api/simple/expand/gene/HGNC:11179/chemical_substance/
API 4.51: https://robokop.renci.org/api/simple/expand/gene/HGNC:3488/chemical_substance/
API 4.48: https://robokop.renci.org/api/simple/expand/gene/HGNC:16753/chemical_substance/
API 4.64: https://robokop.renci.org/api/simple/expand/gene/HGNC:24500/chemical_substance/
API 4.61: https:

API 3.144 dgidb_gene2chemical: 1 hits
API 3.145 dgidb_gene2chemical: No hits
API 3.146 dgidb_gene2chemical: No hits
API 1.1 mychem.info: 19 hits
API 1.2 mychem.info: 518 hits
API 1.3 mychem.info: 301 hits
API 2.1 opentarget: No hits
API 2.2 opentarget: 15 hits
API 2.3 opentarget: No hits
API 2.4 opentarget: No hits
API 2.5 opentarget: No hits
API 2.6 opentarget: No hits
API 2.7 opentarget: No hits
API 2.8 opentarget: No hits
API 2.9 opentarget: No hits
API 2.10 opentarget: No hits
API 2.11 opentarget: No hits
API 2.12 opentarget: No hits
API 2.13 opentarget: 15 hits
API 2.14 opentarget: No hits
API 2.15 opentarget: No hits
API 2.16 opentarget: No hits
API 2.17 opentarget: No hits
API 2.18 opentarget: No hits
API 2.19 opentarget: No hits
API 2.20 opentarget: No hits
API 2.21 opentarget: No hits
API 2.22 opentarget: 2 hits
API 2.23 opentarget: No hits
API 2.24 opentarget: No hits
API 2.25 opentarget: No hits
API 2.26 opentarget: No hits
API 2.27 opentarget: No hits
API 2.28 opentarget: 1

API 5.1 semmedgene: 25595 hits

After id-to-object translation, BTE retrieved 6103 unique objects.



In the #1 query, BTE found 218 unique Gene nodes
In the #2 query, BTE found 6103 unique ChemicalSubstance nodes
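The verbose log above reports a hit count for each API call. A rough way to see which sources contributed most is to total those counts per API. The sketch below assumes the log has been captured as text and that result lines follow the `API <n>.<m> <name>: <count> hits` pattern seen above; `hits_per_api` is a hypothetical helper, and the sample is a hand-copied excerpt of the log.

```python
import re
from collections import Counter

# Pattern inferred from the result lines in the log above (an assumption,
# not a documented BTE log format).
LOG_LINE = re.compile(r"^API (\d+)\.(\d+) (\S+): (No|\d+) hits$")


def hits_per_api(log_text):
    """Sum the reported hit counts per API name; 'No hits' counts as zero."""
    totals = Counter()
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match:
            api_name = match.group(3)
            count = 0 if match.group(4) == "No" else int(match.group(4))
            totals[api_name] += count
    return totals


# Hand-copied excerpt of the log above.
sample = """\
API 3.144 dgidb_gene2chemical: 1 hits
API 3.145 dgidb_gene2chemical: No hits
API 1.1 mychem.info: 19 hits
API 1.2 mychem.info: 518 hits
API 1.3 mychem.info: 301 hits
API 2.1 opentarget: No hits
API 2.2 opentarget: 15 hits
API 5.1 semmedgene: 25595 hits
"""

totals = hits_per_api(sample)
for api_name, count in totals.most_common():
    print(f"{api_name}: {count}")
```

These are raw hit counts before id-to-object translation, so they will not add up to the number of unique nodes reported at the end of the log.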


In [3]:
df = fc.display_table_view()


The `df` object contains the full output from BioThings Explorer. Each row shows one path that joins the input node (PRDX1) to an intermediate node (a gene or protein) to an ending node (a chemical compound). The data frame includes a set of columns with additional details on each node and edge (including human-readable labels, identifiers, and sources). Let's remove all rows where `output_name` (the compound label) is `None`, and focus on paths with the specific mechanistic predicates `decreasesActivityOf` and `targetedBy`.

### Filter for drugs that target genes which decrease the activity of PRDX1

In [15]:
dfFilt = df.loc[df['output_name'].notnull()].query('pred1 == "decreasesActivityOf" and pred2 == "targetedBy"')
dfFilt

| | input | input_type | pred1 | pred1_source | pred1_api | pred1_pubmed | node1_id | node1_name | node1_type | pred2 | pred2_source | pred2_api | pred2_pubmed | output_id | output_name | output_type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 215 | PRDX1 | Gene | decreasesActivityOf | semmed | semmedgene | | entrez:7422 | VEGFA | Gene | targetedBy | mychem.info | mychem.info | | drugbank:DB14864 | DB14864 | ChemicalSubstance |
| 346 | PRDX1 | Gene | decreasesActivityOf | semmed | semmedgene | | entrez:7422 | VEGFA | Gene | targetedBy | dgidb | dgidb_gene2chemical | | chembl:CHEMBL799 | CILOSTAZOL | ChemicalSubstance |
| 512 | PRDX1 | Gene | decreasesActivityOf | semmed | semmedgene | | entrez:1385 | CREB1 | Gene | targetedBy | dgidb | dgidb_gene2chemical | | chembl:CHEMBL3 | NICOTINE | ChemicalSubstance |
| 793 | PRDX1 | Gene | decreasesActivityOf | semmed | semmedgene | | entrez:7124 | TNF | Gene | targetedBy | mychem.info | mychem.info | | chembl:CHEMBL1201831 | CERTOLIZUMAB PEGOL | ChemicalSubstance |
| 899 | PRDX1 | Gene | decreasesActivityOf | semmed | semmedgene | | entrez:7124 | TNF | Gene | targetedBy | mychem.info | mychem.info | | chembl:CHEMBL173373 | ethyl pyruvate | ChemicalSubstance |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 32292 | PRDX1 | Gene | decreasesActivityOf | semmed | semmedgene | | entrez:7124 | TNF | Gene | targetedBy | mychem.info | mychem.info | | chembl:CHEMBL2103838 | DILMAPIMOD | ChemicalSubstance |
| 32321 | PRDX1 | Gene | decreasesActivityOf | semmed | semmedgene | | entrez:10013 | HDAC6 | Gene | targetedBy | mychem.info | mychem.info | | chembl:CHEMBL98 | VORINOSTAT | ChemicalSubstance |
| 32322 | PRDX1 | Gene | decreasesActivityOf | semmed | semmedgene | | entrez:10013 | HDAC6 | Gene | targetedBy | mychem.info | mychem.info | | chembl:CHEMBL98 | VORINOSTAT | ChemicalSubstance |
| 32652 | PRDX1 | Gene | decreasesActivityOf | semmed | semmedgene | | entrez:7422 | VEGFA | Gene | targetedBy | dgidb | dgidb_gene2chemical | | chembl:CHEMBL16 | PHENYTOIN | ChemicalSubstance |
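The filtered output can contain the same gene-drug pair more than once (rows 32321 and 32322 above both pair HDAC6 with VORINOSTAT). One way to get a compact overview is to collapse the paths into a set of drug names per intermediate gene. The sketch below works on plain dictionaries, such as those produced by `dfFilt.to_dict('records')`; the records are hand-copied from the rows shown above, and `drugs_by_gene` is a hypothetical helper.

```python
from collections import defaultdict

# A few rows from the filtered output above, reduced to the columns used here.
rows = [
    {"node1_name": "VEGFA", "output_id": "drugbank:DB14864", "output_name": "DB14864"},
    {"node1_name": "VEGFA", "output_id": "chembl:CHEMBL799", "output_name": "CILOSTAZOL"},
    {"node1_name": "CREB1", "output_id": "chembl:CHEMBL3", "output_name": "NICOTINE"},
    {"node1_name": "HDAC6", "output_id": "chembl:CHEMBL98", "output_name": "VORINOSTAT"},
    {"node1_name": "HDAC6", "output_id": "chembl:CHEMBL98", "output_name": "VORINOSTAT"},
]


def drugs_by_gene(records):
    """Group drug names by intermediate gene, dropping duplicate pairs."""
    grouped = defaultdict(set)
    for record in records:
        grouped[record["node1_name"]].add(record["output_name"])
    # Sort names so the result is stable and easy to read.
    return {gene: sorted(names) for gene, names in grouped.items()}


summary = drugs_by_gene(rows)
for gene, names in summary.items():
    print(gene, "->", ", ".join(names))
```

Grouping by name keeps the summary readable, but grouping by `output_id` instead would be safer when two sources label the same compound differently.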


## Step 3: Evaluate paths based on published pathway figures

Let's see if PRDX1 (entrez:5052) appears in the same pathway figure as VEGFA (entrez:7422), using our newly created [PFOCR](http://pending.biothings.io/pfocr) API.

In [16]:
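import requests

# Query PFOCR to see whether PRDX1 (5052) and VEGFA (7422) appear in the same pathway figure.
# Passing the query via `params` lets requests URL-encode the spaces and colons.
doc = requests.get(
    'https://pending.biothings.io/pfocr/query',
    params={'q': 'associatedWith.genes:5052 AND associatedWith.genes:7422'},
).json()
doc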

{'max_score': 10.06258,
 'took': 1,
 'total': 1,
 'hits': [{'_id': 'PMC4776098__srep22392-f1.jpg',
   '_score': 10.06258,
   'associatedWith': {'figureUrl': 'https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4776098/bin/srep22392-f1.jpg',
    'genes': ['1000',
     '10013',
     '10451',
     '10540',
     '10657',
     '1072',
     '1147',
     '116085',
     '1232',
     '1410',
     '1499',
     '1600',
     '161003',
     '1654',
     '1956',
     '207',
     '2113',
     '2146',
     '2194',
     '226',
     '2308',
     '2309',
     '2324',
     '2475',
     '2495',
     '25',
     '25759',
     '2885',
     '2902',
     '3091',
     '3181',
     '3326',
     '3329',
     '3397',
     '3399',
     '3717',
     '3791',
     '3921',
     '3927',
     '399694',
     '4092',
     '4171',
     '4172',
     '4176',
     '4282',
     '4306',
     '4602',
     '4609',
     '4627',
     '4628',
     '4846',
     '4869',
     '5013',
     '5052',
     '5058',
     '5062',
     '5063',
     '508

![](img/srep22392-f1.jpg)
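The same check can be repeated for other gene pairs from Step 2. The sketch below builds the query URL for any set of Entrez gene IDs and pulls the figure URLs out of a response shaped like the one above. It makes no network call: `toy_doc` is a hand-abbreviated version of the response, and its gene list is shortened (7422 is included on the assumption that the real hit contains it, since the query required it). `build_pfocr_url` and `figures_with_all_genes` are hypothetical helpers.

```python
from urllib.parse import urlencode

# Endpoint used in the query cell above.
PFOCR_QUERY_URL = "https://pending.biothings.io/pfocr/query"


def build_pfocr_url(gene_ids):
    """Build a PFOCR query URL that requires every gene ID to be present."""
    query = " AND ".join(f"associatedWith.genes:{gene}" for gene in gene_ids)
    return f"{PFOCR_QUERY_URL}?{urlencode({'q': query})}"


def figures_with_all_genes(doc, gene_ids):
    """Return figure URLs of hits whose gene list contains all `gene_ids`."""
    wanted = {str(gene) for gene in gene_ids}
    urls = []
    for hit in doc.get("hits", []):
        associated = hit.get("associatedWith", {})
        if wanted <= set(associated.get("genes", [])):
            urls.append(associated.get("figureUrl"))
    return urls


# Abbreviated stand-in for the response above (gene list shortened by hand).
toy_doc = {
    "total": 1,
    "hits": [{
        "_id": "PMC4776098__srep22392-f1.jpg",
        "associatedWith": {
            "figureUrl": "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4776098/bin/srep22392-f1.jpg",
            "genes": ["10013", "5052", "7422"],
        },
    }],
}

url = build_pfocr_url(["5052", "7422"])
figures = figures_with_all_genes(toy_doc, ["5052", "7422"])
print(url)
print(figures)
```

Re-checking the gene list on the client side is redundant when the query already uses `AND`, but it guards against a query that was built with a different operator by mistake.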