# Introduction

This notebook demonstrates how BioThings Explorer can be used to answer the following query:

> *"Finding Marketed Drugs that Might Treat an Unknown Syndrome by Perturbing the Disease Mechanism Pathway"*

![](img/tidbit4_300.png)

This query corresponds to [Tidbit 4](https://ncats.nih.gov/tidbit/tidbit_04.html), which was formulated as a demonstration of the NCATS Translator program.

**Background of BTE**: BioThings Explorer can answer two classes of queries -- "EXPLAIN" and "PREDICT".  EXPLAIN queries are described in [EXPLAIN_demo.ipynb](https://github.com/biothings/biothings_explorer/blob/master/jupyter%20notebooks/EXPLAIN_demo.ipynb), and PREDICT queries are described in [PREDICT_demo.ipynb](https://github.com/biothings/biothings_explorer/blob/master/jupyter%20notebooks/PREDICT_demo.ipynb). Here, we describe PREDICT queries and how to use BioThings Explorer to execute them.  A more detailed overview of the BioThings Explorer systems is provided in [these slides](https://docs.google.com/presentation/d/1QWQqqQhPD_pzKryh6Wijm4YQswv8pAjleVORCPyJyDE/edit?usp=sharing).

**To experiment with an executable version of this notebook, [load it in Google Colaboratory](https://colab.research.google.com/github/biothings/biothings_explorer/blob/master/jupyter%20notebooks/TIDBIT%2004%20Finding%20Marketed%20Drugs%20that%20Might%20Treat%20an%20Unknown%20Syndrome%20by%20Perturbing%20the%20Disease%20Mechanism%20Pathway.ipynb).**

**Background of TIDBIT 04**:

A five-year-old patient was brought to the emergency room with recurrent polymicrobial lung infections and only 29% small airway function, and was unresponsive to antibiotics.

The patient’s medical records included a genetics report from age 1, which showed a 1p34.1 chromosomal duplication encompassing 1.9 Mb, including the PRDX1 gene, which encodes Peroxiredoxin 1. The gene has been linked to airway disease in both rats and humans, and is known to act as an agonist of toll-like receptor 4 (TLR4), a pro-inflammatory receptor. In addition, two patients at another clinic were found to have 1p34.1 duplications:

1. One patient with a duplication including PRDX1 died with similar phenotypes
2. One patient with a duplication that did NOT include PRDX1 showed no airway disease phenotype

While recurrent lung infections are typically treated with antibiotics, this patient was unresponsive to standard treatments. The patient’s earlier genetics report and data from other patients with similar duplications gave the physician evidence that PRDX1 may play a role in the disease, but no treatments directly related to the gene were known. With this information in mind, the physician asked a researcher familiar with Translator to try to find possible treatments for this patient.

**How Might Translator Help?**

The patient’s duplication of the 1p34.1 region of chromosome 1 gave Translator researchers a good place to start. Since PRDX1 is an agonist of TLR4, the duplication of the PRDX1 gene likely causes overexpression of PRDX1, which could lead to overactivity of both PRDX1 and TLR4. The researcher decided to look for drugs that could reduce the activity of those two proteins. An exhaustive search of chemical databases and PubMed to find safe drug options could take days to weeks.

For a known genetic mutation, can Translator be used to quickly find existing modulators to compensate for the dysfunctional gene product?

## Step 0: Load BioThings Explorer modules

First, install the `biothings_explorer` and `biothings_schema` packages, as described in this [README](https://github.com/biothings/biothings_explorer/blob/master/jupyter%20notebooks/README.md#prerequisite). This only needs to be done once, but it is included here for compatibility with [colab](https://colab.research.google.com/).

In [None]:
!pip install git+https://github.com/biothings/biothings_explorer#egg=biothings_explorer

## Step 1: Find representation of "PRDX1" in BTE

In this step, BioThings Explorer translates our query string "PRDX1" into BioThings objects, which contain mappings to many common identifiers. Generally, the top result returned by the `Hint` module will be the correct item, but you should confirm this by checking the identifiers shown.

Search terms can correspond to any child of [BiologicalEntity](https://biolink.github.io/biolink-model/docs/BiologicalEntity.html) from the [Biolink Model](https://biolink.github.io/biolink-model/docs/), including `DiseaseOrPhenotypicFeature` (e.g., "lupus"), `ChemicalSubstance` (e.g., "acetaminophen"), `Gene` (e.g., "CDK2"), `BiologicalProcess` (e.g., "T cell differentiation"), and `Pathway` (e.g., "Citric acid cycle").

In [1]:
from biothings_explorer.hint import Hint
ht = Hint()
prdx1 = ht.query("PRDX1")['Gene'][0]

prdx1

{'entrez': '5052',
 'name': 'peroxiredoxin 1',
 'symbol': 'PRDX1',
 'taxonomy': 9606,
 'umls': 'C1418879',
 'uniprot': 'Q06830',
 'hgnc': '9352',
 'ensembl': 'ENSG00000117450',
 'display': 'entrez(5052) name(peroxiredoxin 1) symbol(PRDX1) taxonomy(9606) umls(C1418879) uniprot(Q06830) hgnc(9352) ensembl(ENSG00000117450) ',
 'type': 'Gene',
 'primary': {'identifier': 'entrez', 'cls': 'Gene', 'value': '5052'}}
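
The advice to confirm the top hit can be made explicit in code. The following is a minimal sketch, not part of BioThings Explorer: `pick_gene_hit` is a hypothetical helper, and it assumes that `ht.query(...)` returns a dict keyed by semantic type whose values are lists of records shaped like the output shown above.

```python
def pick_gene_hit(hint_result, symbol, taxonomy=9606):
    """Return the first 'Gene' hit whose symbol and taxonomy match, else None.

    Hypothetical helper: assumes hint_result looks like {'Gene': [record, ...]},
    where each record carries 'symbol' and 'taxonomy' keys as in the output above.
    """
    for hit in hint_result.get("Gene", []):
        if hit.get("symbol") == symbol and hit.get("taxonomy") == taxonomy:
            return hit
    return None


# Stand-in for ht.query("PRDX1"), trimmed to a few fields from the output above.
example_result = {
    "Gene": [
        {
            "entrez": "5052",
            "name": "peroxiredoxin 1",
            "symbol": "PRDX1",
            "taxonomy": 9606,
            "type": "Gene",
        }
    ]
}

hit = pick_gene_hit(example_result, "PRDX1")
print(hit["entrez"], hit["name"])
```

Matching on the symbol and the human taxonomy ID (9606) guards against silently picking up an unrelated gene when the top hit is not the intended one.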

## Step 2: Find drugs that are associated with genes which are associated with PRDX1

In this section, we find all paths in the knowledge graph that connect PRDX1 to any entity that is a chemical compound.  To do that, we will use `FindConnection`.  This class is a convenient wrapper around two advanced functions for **query path planning** and **query path execution**. More advanced features for both query path planning and query path execution are in development and will be documented in the coming months. 

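As used in the call below, `FindConnection` takes three parameters:

* `input_obj`: the starting node of the path, here the `prdx1` object returned by `Hint` in Step 1
* `output_obj`: the ending node, given here as a semantic type (`'ChemicalSubstance'`)
* `intermediate_nodes`: a list of type constraints, one per intermediate node (here a single `Gene` node)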

In [2]:
from biothings_explorer.user_query_dispatcher import FindConnection

fc = FindConnection(input_obj=prdx1, output_obj='ChemicalSubstance', intermediate_nodes=['Gene'])
fc.connect(verbose=True)


BTE will find paths that join 'PRDX1' and 'ChemicalSubstance'. Paths will have 1 intermediate node.

Intermediate node #1 will have these type constraints: Gene




==== Step #1: Query path planning ====

Because PRDX1 is of type 'Gene', BTE will query our meta-KG for APIs that can take 'Gene' as input and 'Gene' as output

BTE found 3 apis:

API 1. biolink_geneinteraction(1 API call)
API 2. semmedgene(11 API calls)
API 3. mygene.info(3 API calls)


==== Step #2: Query path execution ====
NOTE: API requests are dispatched in parallel, so the list of APIs below is ordered by query time.

API 3.1: http://mygene.info/v3/query (POST "q=5052&scopes=entrezgene&fields=pantherdb.ortholog&species=human&size=100")
API 3.3: http://mygene.info/v3/query (POST "q=9352&scopes=pantherdb.ortholog.HGNC&fields=entrezgene&species=human&size=100")
API 3.2: http://mygene.info/v3/query (POST "q=ENSG00000117450&scopes=pantherdb.ortholog.Ensembl&fields=entrezgene&species=human&size=100")
API 2.4: https://pend

API 4.59: http://www.dgidb.org/api/v2/interactions.json?genes=PRKDC
API 4.33: http://www.dgidb.org/api/v2/interactions.json?genes=EIF4A3
API 3.17: https://platform-api.opentargets.io/v3/platform/public/evidence/filter?target=ENSG00000135870&datasource=chembl&size=15&fields=drug
API 3.7: https://platform-api.opentargets.io/v3/platform/public/evidence/filter?target=ENSG00000204490&datasource=chembl&size=15&fields=drug
API 4.66: http://www.dgidb.org/api/v2/interactions.json?genes=RNH1
API 4.32: http://www.dgidb.org/api/v2/interactions.json?genes=EEF1A1
API 3.8: https://platform-api.opentargets.io/v3/platform/public/evidence/filter?target=ENSG00000123131&datasource=chembl&size=15&fields=drug
API 3.14: https://platform-api.opentargets.io/v3/platform/public/evidence/filter?target=ENSG00000124006&datasource=chembl&size=15&fields=drug
API 3.16: https://platform-api.opentargets.io/v3/platform/public/evidence/filter?target=ENSG00000167815&datasource=chembl&size=15&fields=drug
API 4.43: http://ww

API 4.117: http://www.dgidb.org/api/v2/interactions.json?genes=CYLD
API 4.116: http://www.dgidb.org/api/v2/interactions.json?genes=PTPRC
API 3.53: https://platform-api.opentargets.io/v3/platform/public/evidence/filter?target=ENSG00000284807&datasource=chembl&size=15&fields=drug
API 4.119: http://www.dgidb.org/api/v2/interactions.json?genes=CUL1
API 4.120: http://www.dgidb.org/api/v2/interactions.json?genes=SLC38A1
API 3.54: https://platform-api.opentargets.io/v3/platform/public/evidence/filter?target=ENSG00000256618&datasource=chembl&size=15&fields=drug
API 4.121: http://www.dgidb.org/api/v2/interactions.json?genes=DHRS4
API 3.55: https://platform-api.opentargets.io/v3/platform/public/evidence/filter?target=ENSG00000105401&datasource=chembl&size=15&fields=drug
API 4.122: http://www.dgidb.org/api/v2/interactions.json?genes=PPHLN1
API 4.123: http://www.dgidb.org/api/v2/interactions.json?genes=OTUB1
API 4.126: http://www.dgidb.org/api/v2/interactions.json?genes=PPP2R1A
API 4.118: http://w

API 3.83: https://platform-api.opentargets.io/v3/platform/public/evidence/filter?target=ENSG00000146834&datasource=chembl&size=15&fields=drug
API 3.91: https://platform-api.opentargets.io/v3/platform/public/evidence/filter?target=ENSG00000073282&datasource=chembl&size=15&fields=drug
API 3.92: https://platform-api.opentargets.io/v3/platform/public/evidence/filter?target=ENSG00000136810&datasource=chembl&size=15&fields=drug
API 3.93: https://platform-api.opentargets.io/v3/platform/public/evidence/filter?target=ENSG00000153071&datasource=chembl&size=15&fields=drug
API 3.89: https://platform-api.opentargets.io/v3/platform/public/evidence/filter?target=ENSG00000179262&datasource=chembl&size=15&fields=drug
API 3.90: https://platform-api.opentargets.io/v3/platform/public/evidence/filter?target=ENSG00000136807&datasource=chembl&size=15&fields=drug
API 4.140: http://www.dgidb.org/api/v2/interactions.json?genes=EGFR
API 5.1: https://robokop.renci.org/api/simple/expand/gene/HGNC:1442/chemical_sub

API 5.30: https://robokop.renci.org/api/simple/expand/gene/HGNC:587/chemical_substance/
API 3.117: https://platform-api.opentargets.io/v3/platform/public/evidence/filter?target=ENSG00000169083&datasource=chembl&size=15&fields=drug
API 3.118: https://platform-api.opentargets.io/v3/platform/public/evidence/filter?target=ENSG00000198400&datasource=chembl&size=15&fields=drug
API 3.119: https://platform-api.opentargets.io/v3/platform/public/evidence/filter?target=ENSG00000136869&datasource=chembl&size=15&fields=drug
API 3.120: https://platform-api.opentargets.io/v3/platform/public/evidence/filter?target=ENSG00000277804&datasource=chembl&size=15&fields=drug
API 5.34: https://robokop.renci.org/api/simple/expand/gene/HGNC:24591/chemical_substance/
API 5.32: https://robokop.renci.org/api/simple/expand/gene/HGNC:177/chemical_substance/
API 3.121: https://platform-api.opentargets.io/v3/platform/public/evidence/filter?target=ENSG00000089220&datasource=chembl&size=15&fields=drug
API 5.36: https://r

API 3.139: https://platform-api.opentargets.io/v3/platform/public/evidence/filter?target=ENSG00000137713&datasource=chembl&size=15&fields=drug
API 3.161: https://platform-api.opentargets.io/v3/platform/public/evidence/filter?target=ENSG00000149781&datasource=chembl&size=15&fields=drug
API 3.162: https://platform-api.opentargets.io/v3/platform/public/evidence/filter?target=ENSG00000186575&datasource=chembl&size=15&fields=drug
API 5.31: https://robokop.renci.org/api/simple/expand/gene/HGNC:11892/chemical_substance/
API 5.61: https://robokop.renci.org/api/simple/expand/gene/HGNC:10291/chemical_substance/
API 5.65: https://robokop.renci.org/api/simple/expand/gene/HGNC:16846/chemical_substance/
API 5.60: https://robokop.renci.org/api/simple/expand/gene/HGNC:8607/chemical_substance/
API 5.68: https://robokop.renci.org/api/simple/expand/gene/HGNC:23077/chemical_substance/
API 5.69: https://robokop.renci.org/api/simple/expand/gene/HGNC:9642/chemical_substance/
API 5.67: https://robokop.renci.o

API 4.8 dgidb_gene2chemical: No hits
API 4.9 dgidb_gene2chemical: No hits
API 4.10 dgidb_gene2chemical: No hits
API 4.11 dgidb_gene2chemical: No hits
API 4.12 dgidb_gene2chemical: No hits
API 4.13 dgidb_gene2chemical: No hits
API 4.14 dgidb_gene2chemical: 20 hits
API 4.15 dgidb_gene2chemical: 1 hits
API 4.16 dgidb_gene2chemical: No hits
API 4.17 dgidb_gene2chemical: 1 hits
API 4.18 dgidb_gene2chemical: 3 hits
API 4.19 dgidb_gene2chemical: No hits
API 4.20 dgidb_gene2chemical: No hits
API 4.21 dgidb_gene2chemical: No hits
API 4.22 dgidb_gene2chemical: 1 hits
API 4.23 dgidb_gene2chemical: No hits
API 4.24 dgidb_gene2chemical: 67 hits
API 4.25 dgidb_gene2chemical: 2 hits
API 4.26 dgidb_gene2chemical: No hits
API 4.27 dgidb_gene2chemical: No hits
API 4.28 dgidb_gene2chemical: No hits
API 4.29 dgidb_gene2chemical: 1 hits
API 4.30 dgidb_gene2chemical: No hits
API 4.31 dgidb_gene2chemical: 33 hits
API 4.32 dgidb_gene2chemical: 2 hits
API 4.33 dgidb_gene2chemical: No hits
API 4.34 dgidb_gene2c

API 3.103 opentarget: No hits
API 3.104 opentarget: No hits
API 3.105 opentarget: No hits
API 3.106 opentarget: No hits
API 3.107 opentarget: No hits
API 3.108 opentarget: No hits
API 3.109 opentarget: No hits
API 3.110 opentarget: No hits
API 3.111 opentarget: No hits
API 3.112 opentarget: No hits
API 3.113 opentarget: 10 hits
API 3.114 opentarget: 15 hits
API 3.115 opentarget: No hits
API 3.116 opentarget: No hits
API 3.117 opentarget: 15 hits
API 3.118 opentarget: 15 hits
API 3.119 opentarget: 3 hits
API 3.120 opentarget: No hits
API 3.121 opentarget: No hits
API 3.122 opentarget: No hits
API 3.123 opentarget: No hits
API 3.124 opentarget: No hits
API 3.125 opentarget: No hits
API 3.126 opentarget: No hits
API 3.127 opentarget: No hits
API 3.128 opentarget: No hits
API 3.129 opentarget: No hits
API 3.130 opentarget: No hits
API 3.131 opentarget: No hits
API 3.132 opentarget: No hits
API 3.133 opentarget: No hits
API 3.134 opentarget: No hits
API 3.135 opentarget: No hits
API 3.136 o
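
To make the "query path planning" step in the log above more concrete, here is a toy sketch of the idea. It is not BTE's implementation: the meta-KG is reduced to a hand-written list of `(api, input_type, output_type)` triples, the API names are taken from the log, and `apis_for` and `plan_path` are hypothetical helpers.

```python
# Toy meta-KG: each entry says "this API maps input_type -> output_type".
# API names come from the log above; the structure itself is a simplification.
TOY_META_KG = [
    ("biolink_geneinteraction", "Gene", "Gene"),
    ("semmedgene", "Gene", "Gene"),
    ("mygene.info", "Gene", "Gene"),
    ("dgidb_gene2chemical", "Gene", "ChemicalSubstance"),
    ("opentarget", "Gene", "ChemicalSubstance"),
]


def apis_for(meta_kg, input_type, output_type):
    """List the APIs that can take input_type and return output_type."""
    return [api for api, src, dst in meta_kg if src == input_type and dst == output_type]


def plan_path(meta_kg, input_type, intermediate_types, output_type):
    """For each hop along input -> intermediates -> output, list the usable APIs."""
    types = [input_type, *intermediate_types, output_type]
    return [apis_for(meta_kg, src, dst) for src, dst in zip(types, types[1:])]


plan = plan_path(TOY_META_KG, "Gene", ["Gene"], "ChemicalSubstance")
for hop, apis in enumerate(plan, start=1):
    print(f"Hop {hop}: {apis}")
```

With one intermediate `Gene` node the plan has two hops, mirroring the Gene-to-Gene and Gene-to-ChemicalSubstance stages visible in the log.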

In [3]:
df = fc.display_table_view()


The `df` object contains the full output from BioThings Explorer. Each row shows one path that joins the input node (PRDX1) to an intermediate node (a gene or protein) and then to an ending node (a chemical compound). The data frame also includes columns with additional details on each node and edge, including human-readable labels, identifiers, and sources. Let's remove all rows where `output_name` (the compound label) is `None`, and keep only paths with the mechanistic predicates `decreasesActivityOf` (first edge) and `targetedBy` (second edge).

### Filter for drugs that target genes which decrease the activity of PRDX1

In [4]:
dfFilt = df.loc[df['output_name'].notnull()].query('pred1 == "decreasesActivityOf" and pred2 == "targetedBy"')
dfFilt

| | input | input_type | pred1 | pred1_source | pred1_api | pred1_pubmed | node1_type | node1_name | node1_id | pred2 | pred2_source | pred2_api | pred2_pubmed | output_type | output_name | output_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9348 | PRDX1 | Gene | decreasesActivityOf | semmed | semmedgene | 21167242 | Gene | CREB1 | entrez:1385 | targetedBy | dgidb | dgidb_gene2chemical | | ChemicalSubstance | ALCOHOL | chembl:CHEMBL545 |
| 9901 | PRDX1 | Gene | decreasesActivityOf | semmed | semmedgene | 20683885 | Gene | TNF | entrez:7124 | targetedBy | mychem.info | mychem.info | | ChemicalSubstance | EPINEPHRINE | chembl:CHEMBL679 |
| 10324 | PRDX1 | Gene | decreasesActivityOf | semmed | semmedgene | 23185615 | Gene | VEGFA | entrez:7422 | targetedBy | mychem.info | mychem.info | | ChemicalSubstance | MINOCYCLINE | chembl:CHEMBL1434 |
| 10409 | PRDX1 | Gene | decreasesActivityOf | semmed | semmedgene | 21167242 | Gene | CREB1 | entrez:1385 | targetedBy | dgidb | dgidb_gene2chemical | | ChemicalSubstance | NICOTINE | chembl:CHEMBL3 |
| 10697 | PRDX1 | Gene | decreasesActivityOf | semmed | semmedgene | 23185615 | Gene | VEGFA | entrez:7422 | targetedBy | dgidb | dgidb_gene2chemical | | ChemicalSubstance | FENOFIBRATE | chembl:CHEMBL672 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 33686 | PRDX1 | Gene | decreasesActivityOf | semmed | semmedgene | 20683885 | Gene | TNF | entrez:7124 | targetedBy | mychem.info | mychem.info | | ChemicalSubstance | PLINABULIN | chembl:CHEMBL1096380 |
| 33687 | PRDX1 | Gene | decreasesActivityOf | semmed | semmedgene | 20683885 | Gene | TNF | entrez:7124 | targetedBy | mychem.info | mychem.info | | ChemicalSubstance | DILMAPIMOD | chembl:CHEMBL2103838 |
| 33688 | PRDX1 | Gene | decreasesActivityOf | semmed | semmedgene | 20683885 | Gene | TNF | entrez:7124 | targetedBy | mychem.info | mychem.info | | ChemicalSubstance | ATIPRIMOD | chembl:CHEMBL103735 |
| 33689 | PRDX1 | Gene | decreasesActivityOf | semmed | semmedgene | 20683885 | Gene | TNF | entrez:7124 | targetedBy | mychem.info | mychem.info | | ChemicalSubstance | ANDROGRAPHOLIDE | chembl:CHEMBL186141 |


In [5]:
dfFilt.node1_id.unique()

array(['entrez:1385', 'entrez:7124', 'entrez:7422', 'entrez:10013',
       'entrez:4547', 'entrez:5657'], dtype=object)

In [6]:
dfFilt.node1_name.unique()

array(['CREB1', 'TNF', 'VEGFA', 'HDAC6', 'MTTP', 'PRTN3'], dtype=object)
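
Beyond listing the intermediate genes, it can help to see how many distinct compounds each one contributes. The sketch below is only illustrative: it rebuilds a small data frame from a handful of rows in the filtered table above rather than using the full `dfFilt`, so the counts reflect the sample, not the real result.

```python
import pandas as pd

# A few (node1_name, output_name) pairs copied from the filtered table above.
rows = [
    ("CREB1", "ALCOHOL"),
    ("TNF", "EPINEPHRINE"),
    ("VEGFA", "MINOCYCLINE"),
    ("CREB1", "NICOTINE"),
    ("VEGFA", "FENOFIBRATE"),
    ("TNF", "PLINABULIN"),
]
sample = pd.DataFrame(rows, columns=["node1_name", "output_name"])

# Count distinct compounds per intermediate gene, largest first.
drugs_per_gene = (
    sample.groupby("node1_name")["output_name"]
    .nunique()
    .sort_values(ascending=False)
)
print(drugs_per_gene)
```

The same `groupby(...).nunique()` expression should apply unchanged to `dfFilt`, since it has the same two columns.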

## Step 3: Evaluating Paths based on published pathway figures

Let's check whether PRDX1 (entrez:5052) appears in the same pathway figure as TNF (entrez:7124), using our newly created [PFOCR](http://pending.biothings.io/pfocr) API.

In [7]:
import requests

# Query PFOCR to see whether PRDX1 (entrez:5052) and TNF (entrez:7124) appear in the same pathway figure
query = 'associatedWith.genes:5052 AND associatedWith.genes:7124'
doc = requests.get('https://pending.biothings.io/pfocr/query', params={'q': query}).json()
doc

{'max_score': 9.817886,
 'took': 1,
 'total': 1,
 'hits': [{'_id': 'PMC4749872__molce-39-1-40f1.jpg',
   '_score': 9.817886,
   'associatedWith': {'figureUrl': 'https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4749872/bin/molce-39-1-40f1.jpg',
    'genes': ['11221',
     '1432',
     '1843',
     '3163',
     '369',
     '4217',
     '4297',
     '5052',
     '5594',
     '5595',
     '5599',
     '5600',
     '5601',
     '5602',
     '5603',
     '5604',
     '5605',
     '5606',
     '5607',
     '5608',
     '5609',
     '5894',
     '6300',
     '6416',
     '673',
     '7124',
     '7180'],
    'pmc': 'PMC4749872'}}]}
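
Rather than reading the JSON by eye, the co-occurrence can be checked programmatically. This is a minimal sketch that assumes the response structure shown above (`hits`, each with an `associatedWith` block holding `genes` and `figureUrl`); `shared_figures` is a hypothetical helper, and the example response is a trimmed copy of the real one.

```python
def shared_figures(response, gene_ids):
    """Return (figure_id, figure_url) for every hit listing all of gene_ids."""
    wanted = set(gene_ids)
    matches = []
    for hit in response.get("hits", []):
        info = hit.get("associatedWith", {})
        if wanted.issubset(info.get("genes", [])):
            matches.append((hit["_id"], info.get("figureUrl")))
    return matches


# Trimmed copy of the response above (gene list shortened for readability).
example_response = {
    "total": 1,
    "hits": [
        {
            "_id": "PMC4749872__molce-39-1-40f1.jpg",
            "associatedWith": {
                "figureUrl": "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4749872/bin/molce-39-1-40f1.jpg",
                "genes": ["5052", "5594", "7124"],
                "pmc": "PMC4749872",
            },
        }
    ],
}

for figure_id, url in shared_figures(example_response, ["5052", "7124"]):
    print(figure_id, url)
```

Note that the gene IDs in the response are strings, so the IDs passed in must be strings as well.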

**image of PMC4749872__molce-39-1-40f1.jpg**

![](img/molce-39-1-40f1.jpg)