# Introduction

This notebook demonstrates how BioThings Explorer can be used to answer the following query:

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;*"What drugs might be used to treat Parkinson's disease?"*

![](img/tidbit2_300.png)

This query corresponds to [Tidbit 2](https://ncats.nih.gov/tidbit/tidbit_02.html), which was formulated as a demonstration of the NCATS Translator program.

**Background**: BioThings Explorer can answer two classes of queries: "EXPLAIN" and "PREDICT". EXPLAIN queries are described in [EXPLAIN_demo.ipynb](https://github.com/biothings/biothings_explorer/blob/master/jupyter%20notebooks/EXPLAIN_demo.ipynb), and PREDICT queries are described in [PREDICT_demo.ipynb](https://github.com/biothings/biothings_explorer/blob/master/jupyter%20notebooks/PREDICT_demo.ipynb). Here, we describe PREDICT queries and how to use BioThings Explorer to execute them. A more detailed overview of the BioThings Explorer system is provided in [these slides](https://docs.google.com/presentation/d/1QWQqqQhPD_pzKryh6Wijm4YQswv8pAjleVORCPyJyDE/edit?usp=sharing).

## Step 1: Find representation of "Parkinson disease" in BTE

In this step, BioThings Explorer translates our query string "Parkinson disease" into BioThings objects, which contain mappings to many common identifiers. Generally, the top result returned by the `Hint` module will be the correct item, but you should confirm this by checking the identifiers shown.

Search terms can correspond to any child of [BiologicalEntity](https://biolink.github.io/biolink-model/docs/BiologicalEntity.html) from the [Biolink Model](https://biolink.github.io/biolink-model/docs/), including `DiseaseOrPhenotypicFeature` (e.g., "lupus"), `ChemicalSubstance` (e.g., "acetaminophen"), `Gene` (e.g., "CDK2"), `BiologicalProcess` (e.g., "T cell differentiation"), and `Pathway` (e.g., "Citric acid cycle").

In [1]:
from biothings_explorer.hint import Hint
ht = Hint()
parkDis = ht.query("Parkinson disease")['DiseaseOrPhenotypicFeature'][0]

parkDis

{'mondo': 'MONDO:0005180',
 'doid': 'DOID:14330',
 'umls': 'C0030567',
 'mesh': 'D010300',
 'name': 'Parkinson disease',
 'display': 'mondo(MONDO:0005180) doid(DOID:14330) umls(C0030567) mesh(D010300) name(Parkinson disease) ',
 'type': 'DiseaseOrPhenotypicFeature',
 'primary': {'identifier': 'mondo',
  'cls': 'DiseaseOrPhenotypicFeature',
  'value': 'MONDO:0005180'}}
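
Because the top hit is not guaranteed to be the intended concept, a quick check of its identifiers is worthwhile before running the query. The sketch below is only an illustration: it assumes a hit is a plain dictionary shaped like the output above, and `describe_hit` and `matches_expected` are hypothetical helpers, not part of BioThings Explorer. The dictionary is copied by hand so the block runs without network access.

```python
# Hand-copied from the output above so the example runs offline.
park_dis = {
    "mondo": "MONDO:0005180",
    "doid": "DOID:14330",
    "umls": "C0030567",
    "mesh": "D010300",
    "name": "Parkinson disease",
    "type": "DiseaseOrPhenotypicFeature",
    "primary": {
        "identifier": "mondo",
        "cls": "DiseaseOrPhenotypicFeature",
        "value": "MONDO:0005180",
    },
}


def describe_hit(hit):
    """Return a one-line summary: name, type, and primary identifier."""
    return f"{hit['name']} [{hit['type']}] primary={hit['primary']['value']}"


def matches_expected(hit, expected_ids):
    """Return True if every expected identifier (namespace -> value) is present in the hit."""
    return all(hit.get(ns) == value for ns, value in expected_ids.items())


print(describe_hit(park_dis))
print(matches_expected(park_dis, {"mesh": "D010300", "umls": "C0030567"}))
```

If the check fails, inspecting the other entries in the list returned for that type (index 1, 2, and so on) is the natural next step.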

## Step 2: Find drugs that are associated with genes involved in Parkinson disease

In this section, we find all paths in the knowledge graph that connect Parkinson disease to any entity that is a chemical compound.  To do that, we will use `FindConnection`.  This class is a convenient wrapper around two advanced functions for **query path planning** and **query path execution**. More advanced features for both query path planning and query path execution are in development and will be documented in the coming months. 

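The parameters passed to `FindConnection` in the call below are:

* `input_obj`: the starting node, here the `parkDis` object returned by `Hint` in Step 1.
* `output_obj`: the type of the ending node, here `'ChemicalSubstance'`.
* `intermediate_nodes`: a list of type constraints, one per intermediate node, here a single `Gene` node.
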
In [2]:
from biothings_explorer.user_query_dispatcher import FindConnection

fc = FindConnection(input_obj=parkDis, output_obj='ChemicalSubstance', intermediate_nodes=['Gene'])
fc.connect(verbose=True)


BTE will find paths that join 'Parkinson disease' and 'ChemicalSubstance'. Paths will have 1 intermediate node.

Intermediate node #1 will have these type constraints: Gene





==== Step #1: Query path planning ====

Because Parkinson disease is of type 'DiseaseOrPhenotypicFeature', BTE will query our meta-KG for APIs that can take 'DiseaseOrPhenotypicFeature' as input and 'Gene' as output

BTE found 5 apis:

API 1. mydisease.info(1 API call)
API 2. biolink_disease2gene(1 API call)
API 3. mgigene2phenotype(1 API call)
API 4. DISEASES(1 API call)
API 5. semmeddisease(1 API call)


==== Step #2: Query path execution ====
NOTE: API requests are dispatched in parallel, so the list of APIs below is ordered by query time.

API 1.1: http://mydisease.info/v1/query (POST "q=C0030567&scopes=mondo.xrefs.umls,disgenet.xrefs.umls&fields=disgenet.genes_related_to_disease&species=human&size=100")
API 5.1: http://pending.biothings.io/semmed/query (POST "q=C0030567&scopes=umls&fields=ASSOCIATED_WITH.
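
Conceptually, the connection search chains two lookups (disease to gene, then gene to chemical) and keeps only the paths that share an intermediate gene. The sketch below illustrates that join on a few hand-picked toy edges taken from the result tables later in this notebook. It is not how BTE is implemented, and `two_hop_paths` is a hypothetical helper.

```python
from collections import defaultdict

# Toy edges (subject, predicate, object), hand-picked for illustration.
disease_to_gene = [
    ("Parkinson disease", "causedBy", "DRD2"),
    ("Parkinson disease", "causedBy", "SOD2"),
]
gene_to_chemical = [
    ("DRD2", "targetedBy", "QUINAGOLIDE"),
    ("DRD2", "targetedBy", "PROMAZINE"),
    ("SOD2", "targetedBy", "ACETYLCYSTEINE"),
    ("CYP2D6", "targetedBy", "PIMOZIDE"),  # no matching first-hop edge in this toy set
]


def two_hop_paths(first_hop, second_hop):
    """Join two edge lists on the shared intermediate node and return full paths."""
    by_source = defaultdict(list)
    for src, pred, dst in second_hop:
        by_source[src].append((pred, dst))
    paths = []
    for start, pred1, mid in first_hop:
        for pred2, end in by_source[mid]:
            paths.append((start, pred1, mid, pred2, end))
    return paths


paths = two_hop_paths(disease_to_gene, gene_to_chemical)
for path in paths:
    print(" -> ".join(path))
```

The CYP2D6 edge is dropped because nothing in the first hop reaches it, which is the same reason every row in the real output has a gene linked to both the disease and a chemical.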

In [5]:
df = fc.display_table_view()

The `df` object contains the full output from BioThings Explorer. Each row shows one path that joins the input node (Parkinson disease) to an intermediate node (a gene or protein) and then to an ending node (a chemical compound). The data frame also includes columns with additional details on each node and edge, including human-readable labels, identifiers, and sources.

Let's remove all rows where `output_name` (the compound label) is `None`, and focus on paths with the specific mechanistic predicates `causedBy` and `targetedBy`.

In [6]:
dfFilt = df.loc[df['output_name'].notnull()].query('pred1 == "causedBy" and pred2 == "targetedBy"')
dfFilt

| | input | input_type | pred1 | pred1_source | pred1_api | pred1_pubmed | node1_id | node1_name | node1_type | pred2 | pred2_source | pred2_api | pred2_pubmed | output_id | output_name | output_type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 152 | Parkinson disease | DiseaseOrPhenotypicFeature | causedBy | semmed | semmeddisease | | entrez:134 | ADORA1 | Gene | targetedBy | mychem.info | mychem.info | | chembl:CHEMBL193 | NIFEDIPINE | ChemicalSubstance |
| 199 | Parkinson disease | DiseaseOrPhenotypicFeature | causedBy | semmed | semmeddisease | | entrez:1813 | DRD2 | Gene | targetedBy | dgidb | dgidb_gene2chemical | | chembl:CHEMBL290962 | QUINAGOLIDE | ChemicalSubstance |
| 200 | Parkinson disease | DiseaseOrPhenotypicFeature | causedBy | semmed | semmeddisease | | entrez:1813 | DRD2 | Gene | targetedBy | mychem.info | mychem.info | | chembl:CHEMBL290962 | QUINAGOLIDE | ChemicalSubstance |
| 308 | Parkinson disease | DiseaseOrPhenotypicFeature | causedBy | semmed | semmeddisease | | entrez:1137 | CHRNA4 | Gene | targetedBy | mychem.info | mychem.info | | chembl:CHEMBL46 | ONDANSETRON | ChemicalSubstance |
| 333 | Parkinson disease | DiseaseOrPhenotypicFeature | causedBy | semmed | semmeddisease | | entrez:1565 | CYP2D6 | Gene | targetedBy | dgidb | dgidb_gene2chemical | | chembl:CHEMBL1423 | PIMOZIDE | ChemicalSubstance |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 105786 | Parkinson disease | DiseaseOrPhenotypicFeature | causedBy | semmed | semmeddisease | | entrez:1813 | DRD2 | Gene | targetedBy | dgidb | dgidb_gene2chemical | | chembl:CHEMBL564 | PROMAZINE | ChemicalSubstance |
| 105787 | Parkinson disease | DiseaseOrPhenotypicFeature | causedBy | semmed | semmeddisease | | entrez:1813 | DRD2 | Gene | targetedBy | mychem.info | mychem.info | | chembl:CHEMBL564 | PROMAZINE | ChemicalSubstance |
| 105788 | Parkinson disease | DiseaseOrPhenotypicFeature | causedBy | semmed | semmeddisease | | entrez:1813 | DRD2 | Gene | targetedBy | mychem.info | mychem.info | | chembl:CHEMBL564 | PROMAZINE | ChemicalSubstance |
| 105797 | Parkinson disease | DiseaseOrPhenotypicFeature | causedBy | semmed | semmeddisease | | entrez:6648 | SOD2 | Gene | targetedBy | mychem.info | mychem.info | | chembl:CHEMBL600 | ACETYLCYSTEINE | ChemicalSubstance |
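
The chained `.loc[...]` and `.query(...)` filter can also be written as a single boolean mask, which may be easier to extend with further conditions. The sketch below assumes pandas is installed and uses a hand-made toy frame; the rows are illustrative rather than real BTE output, and `otherPredicate` is a placeholder value.

```python
import pandas as pd

# Toy frame with the three columns the filter relies on.
toy = pd.DataFrame(
    {
        "pred1": ["causedBy", "associatedWith", "causedBy", "causedBy"],
        "pred2": ["targetedBy", "targetedBy", "otherPredicate", "targetedBy"],
        "output_name": ["QUINAGOLIDE", "CHLORPROMAZINE", "PIMOZIDE", None],
    }
)

# One mask combining the null check with both predicate constraints.
mask = (
    toy["output_name"].notnull()
    & (toy["pred1"] == "causedBy")
    & (toy["pred2"] == "targetedBy")
)
filtered = toy[mask]

# The chained form used in the notebook cell selects the same rows.
chained = toy.loc[toy["output_name"].notnull()].query(
    'pred1 == "causedBy" and pred2 == "targetedBy"'
)

print(filtered)
```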


Let's examine how many unique paths lead from Parkinson disease through a gene to a drug:

In [7]:
dfFiltUnique = dfFilt[["input","node1_name","output_name"]].drop_duplicates()
dfFiltUnique

| | input | node1_name | output_name |
|---|---|---|---|
| 152 | Parkinson disease | ADORA1 | NIFEDIPINE |
| 199 | Parkinson disease | DRD2 | QUINAGOLIDE |
| 308 | Parkinson disease | CHRNA4 | ONDANSETRON |
| 333 | Parkinson disease | CYP2D6 | PIMOZIDE |
| 417 | Parkinson disease | SLC6A2 | QUINACRINE |
| ... | ... | ... | ... |
| 105691 | Parkinson disease | CYP2D6 | ORPHENADRINE |
| 105765 | Parkinson disease | FYN | VANDETANIB |
| 105786 | Parkinson disease | DRD2 | PROMAZINE |
| 105797 | Parkinson disease | SOD2 | ACETYLCYSTEINE |
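
Duplicates arise because the same disease, gene, and drug triple can be reported by more than one source (for example, DRD2 and QUINAGOLIDE appear via both `dgidb` and `mychem.info` in the filtered table). The sketch below reproduces that effect on three hand-copied rows, assuming pandas is installed.

```python
import pandas as pd

# Two rows describe the same path but come from different sources.
rows = pd.DataFrame(
    {
        "input": ["Parkinson disease"] * 3,
        "node1_name": ["DRD2", "DRD2", "ADORA1"],
        "pred2_source": ["dgidb", "mychem.info", "mychem.info"],
        "output_name": ["QUINAGOLIDE", "QUINAGOLIDE", "NIFEDIPINE"],
    }
)

# Dropping the source column before de-duplicating collapses them into one path.
unique_paths = rows[["input", "node1_name", "output_name"]].drop_duplicates()
print(len(rows), "rows ->", len(unique_paths), "unique paths")
```

`drop_duplicates` keeps the first occurrence by default, which is why the original row labels survive in the de-duplicated frame.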


## Results

Finally, let's sort the drugs by the number of genes that link them to Parkinson disease.

In [8]:
import pandas as pd

# Comma-separated list of linking genes for each drug
genes = dfFiltUnique.groupby(['output_name'])['node1_name'].apply(','.join)
# Number of linking genes for each drug
count = dfFiltUnique.groupby(['output_name'])['node1_name'].count()
result = pd.DataFrame({'genes': genes, 'count': count})

result.sort_values("count", ascending=False).head(30)

| output_name | genes | count |
|---|---|---|
| ZINC CHLORIDE | CA2,UTRN,FYN,APP,SLC6A2,GSN,TP53,PON1,MT2A | 9 |
| InChI=1S/Cu | APP,PRDX2,PARK7,PON1,BDNF,SNCA,HSPA8,GSN | 8 |
| PACLITAXEL | TP53,MAPT,FYN,AKT1,BDNF,PTEN,NAT2 | 7 |
| ZINC ACETATE | MT2A,PON1,UTRN,TP53,GSN,APP | 6 |
| TAMOXIFEN | MAPK8,TP53,ADORA1,LRRK2,NFE2L2,FYN | 6 |
| TRETINOIN | HSPA8,NR4A1,BAX,FUS,MAPK8,NFE2L2 | 6 |
| DOXORUBICIN | FYN,BDNF,BAX,TP53,AKT1,NFE2L2 | 6 |
| HALOPERIDOL | BDNF,HSPA4,TP53,SLC18A2,DRD2,NR4A1 | 6 |
| QUERCETIN | CA2,BAX,ADORA1,AKT1,CDK5R1,GABPA | 6 |
| LEVODOPA | NR4A1,BDNF,COMT,BAX,FYN,DRD2 | 6 |
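
The gene list and the gene count can also be computed in one pass with pandas named aggregation. The sketch below assumes pandas 0.25 or later and uses a small hand-copied subset of gene and drug pairs from the tables above, so the counts are not those of the full result.

```python
import pandas as pd

# Small subset of unique (gene, drug) pairs, copied by hand for illustration.
unique_paths = pd.DataFrame(
    {
        "node1_name": ["DRD2", "COMT", "DRD2", "SOD2"],
        "output_name": ["LEVODOPA", "LEVODOPA", "PROMAZINE", "ACETYLCYSTEINE"],
    }
)

# Each keyword names an output column and maps to (source column, aggregation).
ranked = (
    unique_paths.groupby("output_name")
    .agg(genes=("node1_name", ",".join), count=("node1_name", "count"))
    .sort_values("count", ascending=False)
)
print(ranked)
```

This avoids grouping twice and keeps the two derived columns defined side by side.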


Let's focus specifically on drugs that are linked to Parkinson disease through at least one dopamine receptor gene (symbols containing `DRD`) and through at least three genes overall:

In [9]:
result[result["genes"].str.contains("DRD")].query("count >= 3").sort_values("count", ascending=False)

| output_name | genes | count |
|---|---|---|
| LEVODOPA | NR4A1,BDNF,COMT,BAX,FYN,DRD2 | 6 |
| HALOPERIDOL | BDNF,HSPA4,TP53,SLC18A2,DRD2,NR4A1 | 6 |
| CHLORPROMAZINE | CYP2D6,DRD2,BDNF,HSPA4,FYN | 5 |
| RISPERIDONE | CYP2D6,COMT,DRD2,AKT1,TNF | 5 |
| DESIPRAMINE | SLC6A2,DRD2,MC1R,BDNF,CYP2D6 | 5 |
| ALCOHOL | CHRNA4,MT2A,DRD2,NAT2,NFE2L2 | 5 |
| BUPROPION | DRD2,COMT,SLC6A2,CYP2D6 | 4 |
| DOPAMINE | MTNR1B,SLC6A2,DRD2,SLC18A2 | 4 |
| AMPHETAMINE | SLC6A2,DRD2,CYP2D6,SLC18A2 | 4 |
| PROCHLORPERAZINE | FYN,CYP2D6,DRD2,SLC6A2 | 4 |
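
Note that `str.contains("DRD")` is a substring test, so it would also match any gene symbol that merely contains those letters. A stricter variant, sketched below, splits the comma-separated gene list and tests each symbol against the dopamine receptor genes DRD1 through DRD5. The helper name `has_dopamine_receptor` is hypothetical.

```python
import re

# Matches exactly the symbols DRD1, DRD2, DRD3, DRD4, DRD5.
DOPAMINE_RECEPTOR = re.compile(r"DRD[1-5]")


def has_dopamine_receptor(genes):
    """Return True if any symbol in a comma-separated gene list is DRD1 to DRD5."""
    return any(
        DOPAMINE_RECEPTOR.fullmatch(symbol.strip()) is not None
        for symbol in genes.split(",")
    )


print(has_dopamine_receptor("NR4A1,BDNF,COMT,BAX,FYN,DRD2"))
print(has_dopamine_receptor("CA2,UTRN,FYN,APP"))
```

Applied to the notebook's data, this could replace the substring filter with `result[result["genes"].apply(has_dopamine_receptor)]`.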


While the list above could clearly benefit from more filtering and sorting, the table surfaces a wide range of information from our distributed knowledge graph that can seed testable hypotheses. For more details on any individual drug candidate, we can return to the original BTE results. For example, here we examine the evidence linking Parkinson disease to the drug chlorpromazine through the dopamine receptor genes.

In [10]:
# Rows for chlorpromazine whose gene symbol contains "DRD" (na=False skips missing names)
df[(df['output_name'] == 'CHLORPROMAZINE') & df['node1_name'].str.contains("DRD", na=False)]


| | input | input_type | pred1 | pred1_source | pred1_api | pred1_pubmed | node1_id | node1_name | node1_type | pred2 | pred2_source | pred2_api | pred2_pubmed | output_id | output_name | output_type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4842 | Parkinson disease | DiseaseOrPhenotypicFeature | associatedWith | DISEASES | DISEASES | | entrez:1812 | DRD1 | Gene | targetedBy | dgidb | dgidb_gene2chemical | | chembl:CHEMBL71 | CHLORPROMAZINE | ChemicalSubstance |
| 4843 | Parkinson disease | DiseaseOrPhenotypicFeature | associatedWith | DISEASES | DISEASES | | entrez:1812 | DRD1 | Gene | targetedBy | mychem.info | mychem.info | | chembl:CHEMBL71 | CHLORPROMAZINE | ChemicalSubstance |
| 4844 | Parkinson disease | DiseaseOrPhenotypicFeature | associatedWith | DISEASES | DISEASES | | entrez:1812 | DRD1 | Gene | targetedBy | mychem.info | mychem.info | | chembl:CHEMBL71 | CHLORPROMAZINE | ChemicalSubstance |
| 4845 | Parkinson disease | DiseaseOrPhenotypicFeature | associatedWith | disgenet | mydisease.info | | entrez:1812 | DRD1 | Gene | targetedBy | dgidb | dgidb_gene2chemical | | chembl:CHEMBL71 | CHLORPROMAZINE | ChemicalSubstance |
| 4846 | Parkinson disease | DiseaseOrPhenotypicFeature | associatedWith | disgenet | mydisease.info | | entrez:1812 | DRD1 | Gene | targetedBy | mychem.info | mychem.info | | chembl:CHEMBL71 | CHLORPROMAZINE | ChemicalSubstance |
| 4847 | Parkinson disease | DiseaseOrPhenotypicFeature | associatedWith | disgenet | mydisease.info | | entrez:1812 | DRD1 | Gene | targetedBy | mychem.info | mychem.info | | chembl:CHEMBL71 | CHLORPROMAZINE | ChemicalSubstance |
| 9225 | Parkinson disease | DiseaseOrPhenotypicFeature | associatedWith | DISEASES | DISEASES | | entrez:1814 | DRD3 | Gene | targetedBy | dgidb | dgidb_gene2chemical | | chembl:CHEMBL71 | CHLORPROMAZINE | ChemicalSubstance |
| 9226 | Parkinson disease | DiseaseOrPhenotypicFeature | associatedWith | DISEASES | DISEASES | | entrez:1814 | DRD3 | Gene | targetedBy | mychem.info | mychem.info | | chembl:CHEMBL71 | CHLORPROMAZINE | ChemicalSubstance |
| 15311 | Parkinson disease | DiseaseOrPhenotypicFeature | associatedWith | DISEASES | DISEASES | | entrez:1815 | DRD4 | Gene | targetedBy | dgidb | dgidb_gene2chemical | | chembl:CHEMBL71 | CHLORPROMAZINE | ChemicalSubstance |
| 15312 | Parkinson disease | DiseaseOrPhenotypicFeature | associatedWith | DISEASES | DISEASES | | entrez:1815 | DRD4 | Gene | targetedBy | mychem.info | mychem.info | | chembl:CHEMBL71 | CHLORPROMAZINE | ChemicalSubstance |
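
With many rows per gene, it can help to condense the evidence into the set of sources supporting each hop. The sketch below does this in plain Python on a few rows copied by hand from the table above; `sources_by_gene` is a hypothetical helper, and the tuple layout (gene, `pred1_source`, `pred2_source`) is chosen for this example only.

```python
from collections import defaultdict

# (node1_name, pred1_source, pred2_source), copied from a few rows of the table above.
evidence_rows = [
    ("DRD1", "DISEASES", "dgidb"),
    ("DRD1", "DISEASES", "mychem.info"),
    ("DRD1", "disgenet", "dgidb"),
    ("DRD1", "disgenet", "mychem.info"),
    ("DRD3", "DISEASES", "dgidb"),
    ("DRD3", "DISEASES", "mychem.info"),
]


def sources_by_gene(rows):
    """Collect the distinct sources behind each hop, keyed by intermediate gene."""
    summary = defaultdict(lambda: {"disease_gene": set(), "gene_drug": set()})
    for gene, disease_gene_source, gene_drug_source in rows:
        summary[gene]["disease_gene"].add(disease_gene_source)
        summary[gene]["gene_drug"].add(gene_drug_source)
    return dict(summary)


summary = sources_by_gene(evidence_rows)
for gene, sources in sorted(summary.items()):
    print(gene, sorted(sources["disease_gene"]), sorted(sources["gene_drug"]))
```

A link backed by several independent sources on both hops is a reasonable first candidate for closer review, although source count alone does not establish a mechanism.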
