# Introduction

![](img/tidbit2_300.png)

This notebook demonstrates how BioThings Explorer can be used to answer the following query:

> *"What drugs might be used to treat Parkinson's disease?"*

This query corresponds to [Tidbit 2](https://ncats.nih.gov/tidbit/tidbit_02.html), which was formulated as a demonstration of the NCATS Translator program.

**Background**: BioThings Explorer can answer two classes of queries: "EXPLAIN" and "PREDICT". EXPLAIN queries are described in [EXPLAIN_demo.ipynb](https://github.com/biothings/biothings_explorer/blob/master/jupyter%20notebooks/EXPLAIN_demo.ipynb), and PREDICT queries are described in [PREDICT_demo.ipynb](https://github.com/biothings/biothings_explorer/blob/master/jupyter%20notebooks/PREDICT_demo.ipynb). Here, we describe PREDICT queries and how to use BioThings Explorer to execute them. A more detailed overview of the BioThings Explorer system is provided in [these slides](https://docs.google.com/presentation/d/1QWQqqQhPD_pzKryh6Wijm4YQswv8pAjleVORCPyJyDE/edit?usp=sharing).

## Step 1: Find representation of "Parkinson disease" in BTE

In this step, BioThings Explorer translates our query string "Parkinson disease" into BioThings objects, which contain mappings to many common identifiers. Generally, the top result returned by the `Hint` module will be the correct item, but you should confirm this against the identifiers shown in the output.

Search terms can correspond to any child of [BiologicalEntity](https://biolink.github.io/biolink-model/docs/BiologicalEntity.html) from the [Biolink Model](https://biolink.github.io/biolink-model/docs/), including `DiseaseOrPhenotypicFeature` (e.g., "lupus"), `ChemicalSubstance` (e.g., "acetaminophen"), `Gene` (e.g., "CDK2"), `BiologicalProcess` (e.g., "T cell differentiation"), and `Pathway` (e.g., "Citric acid cycle").

In [1]:
from biothings_explorer.hint import Hint
ht = Hint()
parkDis = ht.query("Parkinson disease")['DiseaseOrPhenotypicFeature'][0]

parkDis

{'mondo': 'MONDO:0005180',
 'doid': 'DOID:14330',
 'umls': 'C0030567',
 'mesh': 'D010300',
 'name': 'Parkinson disease',
 'display': 'mondo(MONDO:0005180) doid(DOID:14330) umls(C0030567) mesh(D010300) name(Parkinson disease) ',
 'type': 'DiseaseOrPhenotypicFeature',
 'primary': {'identifier': 'mondo',
  'cls': 'DiseaseOrPhenotypicFeature',
  'value': 'MONDO:0005180'}}
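
Because the top hit is not guaranteed to be the intended entity, it can help to list every candidate with its primary identifier before choosing one. The following is a minimal sketch that assumes the object returned by `ht.query` is a dict mapping each semantic type to a list of records shaped like the output above. `summarize_hits` is a hypothetical helper, not part of BioThings Explorer, and the sample data is copied from the output above so that the block runs offline.

```python
def summarize_hits(hint_result):
    """Return (type, name, primary identifier) for every candidate record."""
    rows = []
    for semantic_type, records in hint_result.items():
        for record in records:
            primary = record.get("primary", {})
            rows.append((semantic_type, record.get("name"), primary.get("value")))
    return rows


# Stand-in for ht.query("Parkinson disease"); values copied from the output above.
sample_result = {
    "DiseaseOrPhenotypicFeature": [
        {
            "mondo": "MONDO:0005180",
            "doid": "DOID:14330",
            "umls": "C0030567",
            "mesh": "D010300",
            "name": "Parkinson disease",
            "type": "DiseaseOrPhenotypicFeature",
            "primary": {
                "identifier": "mondo",
                "cls": "DiseaseOrPhenotypicFeature",
                "value": "MONDO:0005180",
            },
        }
    ],
    "Gene": [],  # assumption: types without a match map to an empty list
}

hits = summarize_hits(sample_result)
for semantic_type, name, identifier in hits:
    print(f"{semantic_type}: {name} ({identifier})")
```

With the real `Hint` result in place of `sample_result`, the same loop gives a quick overview of all candidates across types.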

## Step 2: Find drugs that are associated with genes involved in Parkinson disease

In this section, we find all paths in the knowledge graph that connect Parkinson disease to any entity that is a chemical compound.  To do that, we will use `FindConnection`.  This class is a convenient wrapper around two advanced functions for **query path planning** and **query path execution**. More advanced features for both query path planning and query path execution are in development and will be documented in the coming months. 
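
Conceptually, each result is a two-hop path: a disease-to-gene edge joined to a gene-to-chemical edge on the shared gene. The sketch below illustrates that join with pandas on a few edges taken from the results later in this notebook. It only illustrates the idea and is not how `FindConnection` is implemented; the frame and column names (`disease_gene`, `gene_chemical`, `gene`, `chemical`) are assumptions chosen for the example.

```python
import pandas as pd

# First hop: disease -> gene edges (values taken from this notebook's results).
disease_gene = pd.DataFrame(
    {
        "disease": ["Parkinson disease", "Parkinson disease"],
        "pred1": ["causedBy", "causedBy"],
        "gene": ["DRD2", "GBA"],
    }
)

# Second hop: gene -> chemical edges.
gene_chemical = pd.DataFrame(
    {
        "gene": ["DRD2", "DRD2", "GBA"],
        "pred2": ["targetedBy", "targetedBy", "targetedBy"],
        "chemical": ["SPIPERONE", "SERIDOPIDINE", "AFEGOSTAT"],
    }
)

# Joining on the shared gene yields one row per disease-gene-chemical path.
paths = disease_gene.merge(gene_chemical, on="gene", how="inner")
print(paths[["disease", "pred1", "gene", "pred2", "chemical"]])
```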

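The call below passes three parameters to `FindConnection`:

* `input_obj`: the object returned by `Hint` for the input entity (here, `parkDis`)
* `output_obj`: the type of the ending node (here, `'ChemicalSubstance'`)
* `intermediate_nodes`: a list of type constraints for the intermediate nodes (here, a single `Gene` node)
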
In [2]:
from biothings_explorer.user_query_dispatcher import FindConnection

fc = FindConnection(input_obj=parkDis, output_obj='ChemicalSubstance', intermediate_nodes=['Gene'])
fc.connect(verbose=True)


BTE will find paths that join 'Parkinson disease' and 'ChemicalSubstance'. Paths will have 1 intermediate node.

Intermediate node #1 will have these type constraints: Gene





==== Step #1: Query path planning ====

Because Parkinson disease is of type 'DiseaseOrPhenotypicFeature', BTE will query our meta-KG for APIs that can take 'DiseaseOrPhenotypicFeature' as input

BTE found 3 apis:

API 1. semmeddisease(1 API call)
API 2. mydisease.info(1 API call)
API 3. biolink_disease2gene(1 API call)


==== Step #2: Query path execution ====
NOTE: API requests are dispatched in parallel, so the list of APIs below is ordered by query time.

API 2.1: http://mydisease.info/v1/query (POST "q=C0030567&scopes=mondo.xrefs.umls,disgenet.xrefs.umls&fields=disgenet.genes_related_to_disease&species=human&size=100")
API 1.1: http://pending.biothings.io/semmed/query (POST "q=C0030567&scopes=umls&fields=ASSOCIATED_WITH.gene,AFFECTS_reverse.protein,CAUSES_reverse.gene,AFFECTS_reverse.gene,AFFECTS.gene,AFF

API 2.86: http://www.dgidb.org/api/v2/interactions.json?genes=SETD1A
API 2.77: http://www.dgidb.org/api/v2/interactions.json?genes=TRH
API 2.90: http://www.dgidb.org/api/v2/interactions.json?genes=CHM
API 2.88: http://www.dgidb.org/api/v2/interactions.json?genes=MAP3K5
API 2.85: http://www.dgidb.org/api/v2/interactions.json?genes=ATP13A2
API 2.93: http://www.dgidb.org/api/v2/interactions.json?genes=BRINP1
API 2.91: http://www.dgidb.org/api/v2/interactions.json?genes=CRLS1
API 2.87: http://www.dgidb.org/api/v2/interactions.json?genes=CYP17A1
API 2.95: http://www.dgidb.org/api/v2/interactions.json?genes=DENR
API 2.97: http://www.dgidb.org/api/v2/interactions.json?genes=VPS35
API 2.94: http://www.dgidb.org/api/v2/interactions.json?genes=TRIM40
API 2.89: http://www.dgidb.org/api/v2/interactions.json?genes=BDNF
API 2.98: http://www.dgidb.org/api/v2/interactions.json?genes=PON1
API 2.96: http://www.dgidb.org/api/v2/interactions.json?genes=CAT
API 2.99: http://www.dgidb.org/api/v2/interaction

dgidb_gene2chemical failed
API 2.166: http://www.dgidb.org/api/v2/interactions.json?genes=SYT17
API 2.163: http://www.dgidb.org/api/v2/interactions.json?genes=ASXL3
API 2.167: http://www.dgidb.org/api/v2/interactions.json?genes=ZNF646
API 2.153: http://www.dgidb.org/api/v2/interactions.json?genes=GSTP1
API 2.168: http://www.dgidb.org/api/v2/interactions.json?genes=HSPA1A
API 2.170: http://www.dgidb.org/api/v2/interactions.json?genes=SOD2
API 2.171: http://www.dgidb.org/api/v2/interactions.json?genes=PITX3
API 2.162: http://www.dgidb.org/api/v2/interactions.json?genes=FTL
API 2.172: http://www.dgidb.org/api/v2/interactions.json?genes=ATP7A
API 2.173: http://www.dgidb.org/api/v2/interactions.json?genes=BST1
API 2.169: http://www.dgidb.org/api/v2/interactions.json?genes=MAPT
dgidb_gene2chemical failed
API 2.178: http://www.dgidb.org/api/v2/interactions.json?genes=LCN2
API 2.164: http://www.dgidb.org/api/v2/interactions.json?genes=CYP2D6
API 2.177: http://www.dgidb.org/api/v2/interactions.

API 2.304: http://www.dgidb.org/api/v2/interactions.json?genes=SCG3
API 2.306: http://www.dgidb.org/api/v2/interactions.json?genes=TPPP
API 2.301: http://www.dgidb.org/api/v2/interactions.json?genes=IL6
API 2.302: http://www.dgidb.org/api/v2/interactions.json?genes=CP
API 2.303: http://www.dgidb.org/api/v2/interactions.json?genes=ITIH1
API 2.307: http://www.dgidb.org/api/v2/interactions.json?genes=DNAJC5
API 2.311: http://www.dgidb.org/api/v2/interactions.json?genes=WNT3
API 2.312: http://www.dgidb.org/api/v2/interactions.json?genes=SPHK2
API 2.299: http://www.dgidb.org/api/v2/interactions.json?genes=ATP5PF
API 2.310: http://www.dgidb.org/api/v2/interactions.json?genes=CRMP1
API 2.313: http://www.dgidb.org/api/v2/interactions.json?genes=CAB39L
API 2.309: http://www.dgidb.org/api/v2/interactions.json?genes=IGF2
API 2.316: http://www.dgidb.org/api/v2/interactions.json?genes=GCH1
API 2.317: http://www.dgidb.org/api/v2/interactions.json?genes=GBA
API 2.318: http://www.dgidb.org/api/v2/inte

API 2.1 dgidb_gene2chemical: 11 hits
API 2.2 dgidb_gene2chemical: No hits
API 2.3 dgidb_gene2chemical: No hits
API 2.4 dgidb_gene2chemical: No hits
API 2.5 dgidb_gene2chemical: No hits
API 2.6 dgidb_gene2chemical: No hits
API 2.7 dgidb_gene2chemical: No hits
API 2.8 dgidb_gene2chemical: No hits
API 2.9 dgidb_gene2chemical: No hits
API 2.10 dgidb_gene2chemical: 2 hits
API 2.11 dgidb_gene2chemical: 1 hits
API 2.12 dgidb_gene2chemical: No hits
API 2.13 dgidb_gene2chemical: No hits
API 2.14 dgidb_gene2chemical: No hits
API 2.15 dgidb_gene2chemical: No hits
API 2.16 dgidb_gene2chemical: No hits
API 2.17 dgidb_gene2chemical: 9 hits
API 2.18 dgidb_gene2chemical: No hits
API 2.19 dgidb_gene2chemical: No hits
API 2.20 dgidb_gene2chemical: 2 hits
API 2.21 dgidb_gene2chemical: No hits
API 2.22 dgidb_gene2chemical: No hits
API 2.23 dgidb_gene2chemical: No hits
API 2.24 dgidb_gene2chemical: 6 hits
API 2.25 dgidb_gene2chemical: 54 hits
API 2.26 dgidb_gene2chemical: No hits
API 2.27 dgidb_gene2chemic

API 2.222 dgidb_gene2chemical: No hits
API 2.223 dgidb_gene2chemical: No hits
API 2.224 dgidb_gene2chemical: No hits
API 2.225 dgidb_gene2chemical: 19 hits
API 2.226 dgidb_gene2chemical: 29 hits
API 2.227 dgidb_gene2chemical: 1 hits
API 2.228 dgidb_gene2chemical: No hits
API 2.229 dgidb_gene2chemical: No hits
API 2.230 dgidb_gene2chemical: 2 hits
API 2.231 dgidb_gene2chemical: 1 hits
API 2.232 dgidb_gene2chemical: No hits
API 2.233 dgidb_gene2chemical: No hits
API 2.234 dgidb_gene2chemical: No hits
API 2.235 dgidb_gene2chemical: 1 hits
API 2.236 dgidb_gene2chemical: No hits
API 2.237 dgidb_gene2chemical: No hits
API 2.238 dgidb_gene2chemical: No hits
API 2.239 dgidb_gene2chemical: 28 hits
API 2.240 dgidb_gene2chemical: 11 hits
API 2.241 dgidb_gene2chemical: 1 hits
API 2.242 dgidb_gene2chemical: No hits
API 2.243 dgidb_gene2chemical: No hits
API 2.244 dgidb_gene2chemical: No hits
API 2.245 dgidb_gene2chemical: 4 hits
API 2.246 dgidb_gene2chemical: 1 hits
API 2.247 dgidb_gene2chemical: 1

In [3]:
df = fc.display_table_view()


The `df` object contains the full output from BioThings Explorer. Each row shows one path that joins the input node (Parkinson disease) to an intermediate node (a gene or protein) and then to an ending node (a chemical compound). The data frame includes additional columns with details on each node and edge (human-readable labels, identifiers, and sources). Let's remove all rows where `output_name` (the compound label) is `None` and keep only paths with the mechanistic predicates `causedBy` and `targetedBy`.

In [4]:
dfFilt = df.loc[df['output_name'].notnull()].query('pred1 == "causedBy" and pred2 == "targetedBy"')
dfFilt

| | input | input_type | pred1 | pred1_source | pred1_api | pred1_pubmed | node1_id | node1_name | node1_type | pred2 | pred2_source | pred2_api | pred2_pubmed | output_id | output_name | output_type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 35 | Parkinson disease | DiseaseOrPhenotypicFeature | causedBy | semmed | semmeddisease | | entrez:5728 | PTEN | Gene | targetedBy | dgidb | dgidb_gene2chemical | | chembl:CHEMBL1684984 | PA-799 | ChemicalSubstance |
| 49 | Parkinson disease | DiseaseOrPhenotypicFeature | causedBy | semmed | semmeddisease | | entrez:627 | BDNF | Gene | targetedBy | dgidb | dgidb_gene2chemical | | chembl:CHEMBL563 | FLURBIPROFEN | ChemicalSubstance |
| 174 | Parkinson disease | DiseaseOrPhenotypicFeature | causedBy | semmed | semmeddisease | | entrez:760 | CA2 | Gene | targetedBy | dgidb | dgidb_gene2chemical | | chembl:CHEMBL402262 | InChI=1S/C14H11ClN2O4S/c15-11-6-5-8(7-12(11)22... | ChemicalSubstance |
| 193 | Parkinson disease | DiseaseOrPhenotypicFeature | causedBy | semmed | semmeddisease | | entrez:207 | AKT1 | Gene | targetedBy | mychem.info | mychem.info | | drugbank:DB05971 | DB05971 | ChemicalSubstance |
| 278 | Parkinson disease | DiseaseOrPhenotypicFeature | causedBy | semmed | semmeddisease | | entrez:10846 | PDE10A | Gene | targetedBy | mychem.info | mychem.info | | chembl:CHEMBL348000 | CINCHOPHEN | ChemicalSubstance |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 91796 | Parkinson disease | DiseaseOrPhenotypicFeature | causedBy | semmed | semmeddisease | | entrez:1813 | DRD2 | Gene | targetedBy | mychem.info | mychem.info | | chembl:CHEMBL267930 | SPIPERONE | ChemicalSubstance |
| 91848 | Parkinson disease | DiseaseOrPhenotypicFeature | causedBy | semmed | semmeddisease | | entrez:7124 | TNF | Gene | targetedBy | mychem.info | mychem.info | | drugbank:DB05758 | DB05758 | ChemicalSubstance |
| 91865 | Parkinson disease | DiseaseOrPhenotypicFeature | causedBy | semmed | semmeddisease | | entrez:1813 | DRD2 | Gene | targetedBy | dgidb | dgidb_gene2chemical | | chembl:CHEMBL3545388 | SERIDOPIDINE | ChemicalSubstance |
| 92023 | Parkinson disease | DiseaseOrPhenotypicFeature | causedBy | semmed | semmeddisease | | entrez:2629 | GBA | Gene | targetedBy | dgidb | dgidb_gene2chemical | | chembl:CHEMBL206468 | AFEGOSTAT | ChemicalSubstance |
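
Before committing to one predicate pair, it can be useful to see which combinations of `pred1` and `pred2` occur and how often. The following is a minimal sketch on a small stand-in frame whose values are copied from results shown in this notebook; `sample` and `n_paths` are names chosen for the example, and on the real data `df` would take the place of `sample`.

```python
import pandas as pd

# Stand-in for df, restricted to the columns needed here.
sample = pd.DataFrame(
    {
        "pred1": ["associatedWith", "associatedWith", "causedBy", "causedBy"],
        "pred2": ["targetedBy"] * 4,
        "node1_name": ["DRD1", "DRD2", "DRD2", "GBA"],
        "output_name": ["CHLORPROMAZINE", "CHLORPROMAZINE", "CHLORPROMAZINE", "AFEGOSTAT"],
    }
)

# Count rows for every (pred1, pred2) combination, most frequent first.
pair_counts = (
    sample.groupby(["pred1", "pred2"])
    .size()
    .sort_values(ascending=False)
    .reset_index(name="n_paths")
)
print(pair_counts)
```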


Let's examine how many unique Parkinson disease - gene - drug paths there are:

In [5]:
dfFiltUnique = dfFilt[["input","node1_name","output_name"]].drop_duplicates()
dfFiltUnique

| | input | node1_name | output_name |
|---|---|---|---|
| 35 | Parkinson disease | PTEN | PA-799 |
| 49 | Parkinson disease | BDNF | FLURBIPROFEN |
| 174 | Parkinson disease | CA2 | InChI=1S/C14H11ClN2O4S/c15-11-6-5-8(7-12(11)22... |
| 193 | Parkinson disease | AKT1 | DB05971 |
| 278 | Parkinson disease | PDE10A | CINCHOPHEN |
| ... | ... | ... | ... |
| 91765 | Parkinson disease | DRD2 | RACLOPRIDE |
| 91796 | Parkinson disease | DRD2 | SPIPERONE |
| 91848 | Parkinson disease | TNF | DB05758 |
| 91865 | Parkinson disease | DRD2 | SERIDOPIDINE |
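
The cell above displays the de-duplicated paths; the counts themselves can be read off with `len` and `nunique`. Here is a small sketch on a stand-in frame built from rows shown above (`sample_unique` is a hypothetical name; on the real data, `dfFiltUnique` would be used instead):

```python
import pandas as pd

# Stand-in for dfFiltUnique, using rows from the output above.
sample_unique = pd.DataFrame(
    {
        "input": ["Parkinson disease"] * 5,
        "node1_name": ["PTEN", "BDNF", "DRD2", "DRD2", "DRD2"],
        "output_name": ["PA-799", "FLURBIPROFEN", "RACLOPRIDE", "SPIPERONE", "SERIDOPIDINE"],
    }
).drop_duplicates()

n_paths = len(sample_unique)                      # unique disease-gene-drug paths
n_genes = sample_unique["node1_name"].nunique()   # distinct intermediate genes
n_drugs = sample_unique["output_name"].nunique()  # distinct compounds

print(f"{n_paths} paths, {n_genes} genes, {n_drugs} drugs")
```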


## Results

Finally, let's sort the drugs by the number of genes that link them to Parkinson disease.

In [6]:
import pandas as pd

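# Comma-separated list of the genes linking each drug to Parkinson disease
genes = dfFiltUnique.groupby(['output_name'])['node1_name'].apply(','.join)
# Number of linking genes per drug
count = dfFiltUnique.groupby(['output_name'])['node1_name'].count()
result = pd.DataFrame({'genes': genes, 'count': count})

# Show the 30 drugs with the most linking genes
result.sort_values("count", ascending=False).head(30)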

| output_name | genes | count |
|---|---|---|
| ZINC CHLORIDE | TP53,FYN,MT2A,PON1,APP,UTRN,GSN,SLC6A2,CA2 | 9 |
| InChI=1S/Cu | BDNF,PARK7,SNCA,APP,GSN,HSPA8,PON1,PRDX2 | 8 |
| TRETINOIN | NFE2L2,MAPK8,NR4A1,BAX,FUS,HSPA8 | 6 |
| PACLITAXEL | AKT1,BDNF,NAT2,FYN,PTEN,MAPT | 6 |
| InChI=1S/Zn | UTRN,TP53,GSN,APP,MT2A,PON1 | 6 |
| ZINC ACETATE | UTRN,TP53,MT2A,APP,GSN,PON1 | 6 |
| LEVODOPA | BDNF,DRD2,FYN,COMT,BAX,NR4A1 | 6 |
| CETUXIMAB | PTEN,AKT1,BDNF,BAX,EGF | 5 |
| DESIPRAMINE | SLC6A2,CYP2D6,DRD2,MC1R,BDNF | 5 |
| TAMOXIFEN | MAPK8,NFE2L2,LRRK2,FYN,ADORA1 | 5 |
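
The same summary can also be built with a single `groupby` using pandas named aggregation, which avoids grouping twice. This is an alternative sketch rather than the notebook's method. The stand-in frame `pairs` uses gene-drug pairs from the table above, and `nunique` is an assumption made here so that a gene is counted once per drug even if duplicate rows are present.

```python
import pandas as pd

# Stand-in for dfFiltUnique: one row per (drug, gene) pair.
pairs = pd.DataFrame(
    {
        "output_name": ["LEVODOPA"] * 6 + ["CETUXIMAB"] * 5,
        "node1_name": [
            "BDNF", "DRD2", "FYN", "COMT", "BAX", "NR4A1",
            "PTEN", "AKT1", "BDNF", "BAX", "EGF",
        ],
    }
)

# One pass over the groups: join the gene names and count distinct genes.
summary = (
    pairs.groupby("output_name")
    .agg(
        genes=("node1_name", ",".join),
        count=("node1_name", "nunique"),
    )
    .sort_values("count", ascending=False)
)
print(summary)
```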


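Let's focus specifically on drugs that are linked to Parkinson disease through at least one dopamine receptor gene (gene symbols containing "DRD") and through at least three genes overall: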

In [7]:
result[result["genes"].str.contains("DRD")].query("count >= 3").sort_values("count", ascending=False)

| output_name | genes | count |
|---|---|---|
| LEVODOPA | BDNF,DRD2,FYN,COMT,BAX,NR4A1 | 6 |
| RISPERIDONE | CYP2D6,COMT,AKT1,TNF,DRD2 | 5 |
| DESIPRAMINE | SLC6A2,CYP2D6,DRD2,MC1R,BDNF | 5 |
| DOPAMINE | SLC6A2,DRD2,MTNR1B,SLC18A2 | 4 |
| PROCHLORPERAZINE | FYN,SLC6A2,DRD2,CYP2D6 | 4 |
| IMIPRAMINE | CYP2D6,SLC6A2,BDNF,DRD2 | 4 |
| HALOPERIDOL | BDNF,SLC18A2,DRD2,NR4A1 | 4 |
| ALCOHOL | DRD2,NAT2,NFE2L2,CHRNA4 | 4 |
| CHLORPROMAZINE | BDNF,CYP2D6,FYN,DRD2 | 4 |
| BUPROPION | SLC6A2,CYP2D6,DRD2,COMT | 4 |
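
Note that `str.contains("DRD")` matches any gene list in which the substring `DRD` appears anywhere. If stricter matching is wanted, one option is a regular expression that accepts only whole gene symbols. The sketch below assumes the dopamine receptor genes of interest are `DRD1` through `DRD5`; the `XDRD9` entry is a made-up symbol included only to show the difference, and `gene_lists` is a stand-in for the `genes` column of `result`.

```python
import pandas as pd

gene_lists = pd.Series(
    {
        "LEVODOPA": "BDNF,DRD2,FYN,COMT,BAX,NR4A1",
        "ZINC ACETATE": "UTRN,TP53,MT2A,APP,GSN,PON1",
        "HYPOTHETICAL DRUG": "XDRD9,BDNF",  # made-up row for illustration
    }
)

# Substring match: also catches the made-up symbol XDRD9.
loose = gene_lists.str.contains("DRD", regex=False)

# Whole-symbol match: DRD1-DRD5 delimited by commas or the string boundaries.
strict = gene_lists.str.contains(r"(?:^|,)DRD[1-5](?:,|$)", regex=True)

print(pd.DataFrame({"genes": gene_lists, "loose": loose, "strict": strict}))
```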


While the list above could clearly benefit from more filtering and sorting, the table draws on a wide range of information from our distributed knowledge graph and can suggest testable hypotheses. For more details on any individual drug candidate, we can go back to the original BTE results. For example, here we examine the evidence behind the link between Parkinson disease and the drug chlorpromazine.

In [8]:
# Keep rows where the drug is chlorpromazine and the intermediate gene symbol contains "DRD"
df[(df['output_name'] == 'CHLORPROMAZINE') & df['node1_name'].str.contains("DRD", na=False)]


| | input | input_type | pred1 | pred1_source | pred1_api | pred1_pubmed | node1_id | node1_name | node1_type | pred2 | pred2_source | pred2_api | pred2_pubmed | output_id | output_name | output_type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 18016 | Parkinson disease | DiseaseOrPhenotypicFeature | associatedWith | disgenet | mydisease.info | | entrez:1812 | DRD1 | Gene | targetedBy | mychem.info | mychem.info | | chembl:CHEMBL71 | CHLORPROMAZINE | ChemicalSubstance |
| 18017 | Parkinson disease | DiseaseOrPhenotypicFeature | associatedWith | disgenet | mydisease.info | | entrez:1812 | DRD1 | Gene | targetedBy | mychem.info | mychem.info | | chembl:CHEMBL71 | CHLORPROMAZINE | ChemicalSubstance |
| 75084 | Parkinson disease | DiseaseOrPhenotypicFeature | associatedWith | disgenet | mydisease.info | | entrez:1813 | DRD2 | Gene | targetedBy | dgidb | dgidb_gene2chemical | | chembl:CHEMBL71 | CHLORPROMAZINE | ChemicalSubstance |
| 75085 | Parkinson disease | DiseaseOrPhenotypicFeature | associatedWith | disgenet | mydisease.info | | entrez:1813 | DRD2 | Gene | targetedBy | mychem.info | mychem.info | | chembl:CHEMBL71 | CHLORPROMAZINE | ChemicalSubstance |
| 75086 | Parkinson disease | DiseaseOrPhenotypicFeature | associatedWith | disgenet | mydisease.info | | entrez:1813 | DRD2 | Gene | targetedBy | mychem.info | mychem.info | | chembl:CHEMBL71 | CHLORPROMAZINE | ChemicalSubstance |
| 75087 | Parkinson disease | DiseaseOrPhenotypicFeature | causedBy | semmed | semmeddisease | | entrez:1813 | DRD2 | Gene | targetedBy | dgidb | dgidb_gene2chemical | | chembl:CHEMBL71 | CHLORPROMAZINE | ChemicalSubstance |
| 75088 | Parkinson disease | DiseaseOrPhenotypicFeature | causedBy | semmed | semmeddisease | | entrez:1813 | DRD2 | Gene | targetedBy | mychem.info | mychem.info | | chembl:CHEMBL71 | CHLORPROMAZINE | ChemicalSubstance |
| 75089 | Parkinson disease | DiseaseOrPhenotypicFeature | causedBy | semmed | semmeddisease | | entrez:1813 | DRD2 | Gene | targetedBy | mychem.info | mychem.info | | chembl:CHEMBL71 | CHLORPROMAZINE | ChemicalSubstance |