# Introduction

<img src="img/tidbit2.png" width=300 style="float: right;"/>
This notebook demonstrates how BioThings Explorer can be used to answer the following query: 

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;*"What existing drugs might be used to treat Parkinson's disease based on an intermediate protein?"*

This query corresponds to [Tidbit 2](https://ncats.nih.gov/tidbit/tidbit_02.html), which was formulated as a demonstration of the NCATS Translator program.

**Background**: BioThings Explorer is an engine for autonomously querying a distributed knowledge graph. BioThings Explorer can answer two classes of queries -- "EXPLAIN" and "PREDICT".  EXPLAIN queries are described in [EXPLAIN_demo.ipynb](EXPLAIN_demo.ipynb), and PREDICT queries are described in [PREDICT_demo.ipynb](PREDICT_demo.ipynb). A more detailed overview of the BioThings Explorer system is provided in [these slides](https://docs.google.com/presentation/d/1QWQqqQhPD_pzKryh6Wijm4YQswv8pAjleVORCPyJyDE/edit?usp=sharing).


## Step 1: Find representation of "Parkinson disease" in BTE

In this step, BioThings Explorer translates our query string "Parkinson disease" into BioThings objects, which contain mappings to many common identifiers. Generally, the top result returned by the `Hint` module will be the correct item, but you should confirm this using the identifiers shown.

Search terms can correspond to any child of [BiologicalEntity](https://biolink.github.io/biolink-model/docs/BiologicalEntity.html) from the [Biolink Model](https://biolink.github.io/biolink-model/docs/), including `DiseaseOrPhenotypicFeature` (e.g., "lupus"), `ChemicalSubstance` (e.g., "acetaminophen"), `Gene` (e.g., "CDK2"), `BiologicalProcess` (e.g., "T cell differentiation"), and `Pathway` (e.g., "Citric acid cycle").

```python
from biothings_explorer.hint import Hint

# Hint maps a free-text term to BioThings objects, grouped by Biolink type
ht = Hint()
# Take the top-ranked DiseaseOrPhenotypicFeature hit; confirm its identifiers below
parkDis = ht.query("Parkinson disease")['DiseaseOrPhenotypicFeature'][0]

parkDis
```

```
{'mondo': 'MONDO:0005180',
 'doid': 'DOID:14330',
 'umls': 'C0030567',
 'mesh': 'D010300',
 'name': 'Parkinson disease',
 'display': 'mondo(MONDO:0005180) doid(DOID:14330) umls(C0030567) mesh(D010300) name(Parkinson disease) ',
 'type': 'DiseaseOrPhenotypicFeature',
 'primary': {'identifier': 'mondo',
  'cls': 'DiseaseOrPhenotypicFeature',
  'value': 'MONDO:0005180'}}
```
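
Because `Hint` returns plain dictionaries like the one above, the confirmation step can also be scripted. The sketch below uses a hypothetical helper, `confirm_identifiers` (not part of BioThings Explorer), that compares a hit against identifiers you already trust before passing it to later steps. The dictionary literal is copied from the output above so the sketch runs on its own.

```python
def confirm_identifiers(hit, expected):
    """Return a list of mismatches between a Hint result and expected IDs.

    Hypothetical helper: `expected` maps identifier prefixes (e.g. 'mondo')
    to the values you expect for the intended entity.
    """
    mismatches = []
    for prefix, value in expected.items():
        found = hit.get(prefix)  # None if the hit lacks this identifier
        if found != value:
            mismatches.append(f"{prefix}: expected {value!r}, got {found!r}")
    return mismatches


# Result copied from the Hint output above (the 'display' string is omitted)
park_hit = {
    'mondo': 'MONDO:0005180',
    'doid': 'DOID:14330',
    'umls': 'C0030567',
    'mesh': 'D010300',
    'name': 'Parkinson disease',
    'type': 'DiseaseOrPhenotypicFeature',
    'primary': {'identifier': 'mondo',
                'cls': 'DiseaseOrPhenotypicFeature',
                'value': 'MONDO:0005180'},
}

problems = confirm_identifiers(park_hit, {'mondo': 'MONDO:0005180', 'doid': 'DOID:14330'})
print(problems or "identifiers match")
```

In the notebook itself you would pass `parkDis` in place of the literal.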

## Step 2: Find drugs that are associated with genes which are involved in Parkinson disease

In this section, we find all paths in the knowledge graph that connect Parkinson disease to a chemical compound through an intermediate gene. To do that, we use `FindConnection`. This class is a convenient wrapper around two advanced functions for **query path planning** and **query path execution**. More advanced features for both query path planning and query path execution are in development and will be documented in the coming months.
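
To make query path planning concrete, here is a toy planner. It is not BTE's implementation: `META_KG` below is a hypothetical, hand-written list of (API, input type, output type) edges, loosely named after APIs that appear in the verbose log in this notebook, and real APIs may return several output types. Under that simplification, planning amounts to chaining edges whose types match the requested input, intermediate, and output types.

```python
from itertools import product

# Hypothetical, simplified meta-KG: (api_name, input_type, output_type)
META_KG = [
    ("biolink_disease2gene", "DiseaseOrPhenotypicFeature", "Gene"),
    ("mydisease.info", "DiseaseOrPhenotypicFeature", "Gene"),
    ("semmeddisease", "DiseaseOrPhenotypicFeature", "Gene"),
    ("dgidb_gene2chemical", "Gene", "ChemicalSubstance"),
    ("mychem.info", "Gene", "ChemicalSubstance"),
]


def apis_for(input_type, output_type):
    """Names of APIs in META_KG that map input_type to output_type."""
    return [api for api, src, dst in META_KG if src == input_type and dst == output_type]


def plan_paths(input_type, intermediate_types, output_type):
    """Enumerate API chains input -> intermediates -> output.

    Each returned plan is a tuple with one API name per hop.
    """
    types = [input_type, *intermediate_types, output_type]
    hops = [apis_for(src, dst) for src, dst in zip(types, types[1:])]
    # product() yields nothing if any hop has no candidate API
    return list(product(*hops))


plans = plan_paths("DiseaseOrPhenotypicFeature", ["Gene"], "ChemicalSubstance")
for plan in plans:
    print(" -> ".join(plan))
```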


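```python
from biothings_explorer.user_query_dispatcher import FindConnection

# Search for paths Parkinson disease -> Gene -> ChemicalSubstance
fc = FindConnection(input_obj=parkDis, output_obj='ChemicalSubstance', intermediate_nodes=['Gene'])
# verbose=True prints the query plan and each API call as it completes
fc.connect(verbose=True)
```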


```
BTE will find paths that join 'Parkinson disease' and 'ChemicalSubstance'. Paths will have 1 intermediate node.

Intermediate node #1 will have these type constraints: Gene

==== Step #1: Query path planning ====

Because Parkinson disease is of type 'DiseaseOrPhenotypicFeature', BTE will query our meta-KG for APIs that can take 'DiseaseOrPhenotypicFeature' as input

BTE found 3 apis:

API 1. biolink_disease2gene(1 API call)
API 2. mydisease.info(1 API call)
API 3. semmeddisease(1 API call)


==== Step #2: Query path execution ====
NOTE: API requests are dispatched in parallel, so the list of APIs below is ordered by query time.

API 2.1: http://mydisease.info/v1/query (POST "q=C0030567&scopes=mondo.xrefs.umls,disgenet.xrefs.umls&fields=disgenet.genes_related_to_disease&species=human&size=100")
API 1.1: https://api.monarchinitiative.org/api/bioentity/disease/MONDO:0005180/genes?rows=100
API 3.1: http://pending.biothings.io/semmed/query (POST "q=C0030567&scopes=umls&fields=AFFECTS_

API 3.55: http://www.dgidb.org/api/v2/interactions.json?genes=PTEN
API 3.57: http://www.dgidb.org/api/v2/interactions.json?genes=MAPT-AS1
API 3.49: http://www.dgidb.org/api/v2/interactions.json?genes=CNTN1
API 3.65: http://www.dgidb.org/api/v2/interactions.json?genes=SEMA5A
API 3.71: http://www.dgidb.org/api/v2/interactions.json?genes=FKBP1AP1
API 3.63: http://www.dgidb.org/api/v2/interactions.json?genes=NGF
API 3.67: http://www.dgidb.org/api/v2/interactions.json?genes=CHRNA4
API 3.101: http://www.dgidb.org/api/v2/interactions.json?genes=GFPT1
API 3.73: http://www.dgidb.org/api/v2/interactions.json?genes=DDC
API 3.78: http://www.dgidb.org/api/v2/interactions.json?genes=LINC02331
API 3.103: http://www.dgidb.org/api/v2/interactions.json?genes=CDK5R1
API 3.102: http://www.dgidb.org/api/v2/interactions.json?genes=MOG
API 3.79: http://www.dgidb.org/api/v2/interactions.json?genes=DRD1
API 3.91: http://www.dgidb.org/api/v2/interactions.json?genes=VPS26A
API 3.110: http://www.dgidb.org/api/v2/

API 3.221: http://www.dgidb.org/api/v2/interactions.json?genes=PITX3
API 3.224: http://www.dgidb.org/api/v2/interactions.json?genes=PARK7
API 3.222: http://www.dgidb.org/api/v2/interactions.json?genes=TWNK
API 3.225: http://www.dgidb.org/api/v2/interactions.json?genes=IREB2
API 3.223: http://www.dgidb.org/api/v2/interactions.json?genes=CTSB
API 3.228: http://www.dgidb.org/api/v2/interactions.json?genes=AGAP1
API 3.226: http://www.dgidb.org/api/v2/interactions.json?genes=RAB39B
API 3.229: http://www.dgidb.org/api/v2/interactions.json?genes=KLHDC1
API 3.233: http://www.dgidb.org/api/v2/interactions.json?genes=DAPK2
API 3.232: http://www.dgidb.org/api/v2/interactions.json?genes=MIR30E
API 3.215: http://www.dgidb.org/api/v2/interactions.json?genes=BDNF
API 3.231: http://www.dgidb.org/api/v2/interactions.json?genes=HSPA9
API 3.234: http://www.dgidb.org/api/v2/interactions.json?genes=KIAA1109
API 3.230: http://www.dgidb.org/api/v2/interactions.json?genes=CP
API 3.236: http://www.dgidb.org/ap

API 1.2: http://mychem.info/v1/query (POST "q=FYN,USP9X,SLC2A13,PARL,TMC3-AS1,TIRAP,SIRT1,FBP1,SRY,ZP3,TPM1,EPHX2,MZB1,INS,CSMD1,MCCC1,GRM4,KHDRBS1,NUP62,CNR2,INSR,CAB39L,RHOD,CNTN1,ADORA1,PDSS2,TAT,CRK,SPPL2C,WASH6P,TMEM189,RPS8,MSX1,PSMC1,CHRNA4,CCNT2-AS1,LRRK1,CCDC62,RET,RGS10,LINC02331,DRD1,TCEANC2,SPHK2,NFE2L2,FKBP1AP4,DNAJC10,NQO1,VPS26A,LTB,SNCA-AS1,CD24,HTRA2,PDE10A,MOG,SLC41A1,CDK5R1,BAX,IGF1R,SIPA1L2,FGF18,COX4I1,IGKV2-14,FAM47E-STBD1,DRD2,RIC3,NCAM1,PTEN,ND2,DLG2,CDNF,GSTP1,HTR2A-AS1,SMAD3,ZNF646,ATP13A2,KCNIP4,EBP,TBC1D5,P2RX7,SPTSSB,DGKQ,SH3GL2,CEBPZ,DNAJC13,LINGO1,GPNMB,NPS,DDIT4,RHO,APOE,EPHB1,GFAP,WNT3,IL6,MAOB,MSMB,PRDX2,FKBP1AP2,DNAJC6,ATXN2,PRSS53,MIR185,NR4A1,SYNJ1,CAMK2G,CHL1,PLPPR1,HGF,KITLG,ATP7A,TMED9,COMT,MAOA,BDNF,RIT2,ODAPH,GSTO2,PITX3,RAB39B,NAT2,AGAP1,CP,HSPA9,FKBP1AP3,LINC02210-CRHR1,CNKSR3,BAG2,XK,CD8A,UTRN,PLEKHM1,KCNJ4,TRPS1,NDUFAF2,VPS13C,RTN4,ITGA2B,DNAH8,SYT1,MT2A,TMEM175,CA8,MFN1,NUCKS1,COL11A2,LINC02210,IGF2,TMEM230,GDNF,SLC18A2,TRIB3,ITGA8,GPX1,TN

API 3.103 dgidb_gene2chemical: 2 hits
API 3.104 dgidb_gene2chemical: No hits
API 3.105 dgidb_gene2chemical: No hits
API 3.106 dgidb_gene2chemical: 82 hits
API 3.107 dgidb_gene2chemical: No hits
API 3.108 dgidb_gene2chemical: 37 hits
API 3.109 dgidb_gene2chemical: No hits
API 3.110 dgidb_gene2chemical: No hits
API 3.111 dgidb_gene2chemical: No hits
API 3.112 dgidb_gene2chemical: No hits
API 3.113 dgidb_gene2chemical: No hits
API 3.114 dgidb_gene2chemical: No hits
API 3.115 dgidb_gene2chemical: No hits
API 3.116 dgidb_gene2chemical: No hits
API 3.117 dgidb_gene2chemical: No hits
API 3.118 dgidb_gene2chemical: No hits
API 3.119 dgidb_gene2chemical: No hits
API 3.120 dgidb_gene2chemical: 61 hits
API 3.121 dgidb_gene2chemical: 1 hits
API 3.122 dgidb_gene2chemical: 188 hits
API 3.123 dgidb_gene2chemical: 1 hits
API 3.124 dgidb_gene2chemical: No hits
API 3.125 dgidb_gene2chemical: No hits
API 3.126 dgidb_gene2chemical: No hits
API 3.127 dgidb_gene2chemical: 2 hits
API 3.128 dgidb_gene2chemica

API 3.316 dgidb_gene2chemical: 1 hits
API 3.317 dgidb_gene2chemical: No hits
API 3.318 dgidb_gene2chemical: No hits
API 3.319 dgidb_gene2chemical: No hits
API 3.320 dgidb_gene2chemical: 1 hits
API 3.321 dgidb_gene2chemical: No hits
API 3.322 dgidb_gene2chemical: No hits
API 3.323 dgidb_gene2chemical: No hits
API 3.324 dgidb_gene2chemical: No hits
API 3.325 dgidb_gene2chemical: 1 hits
API 3.326 dgidb_gene2chemical: 13 hits
API 3.327 dgidb_gene2chemical: 17 hits
API 3.328 dgidb_gene2chemical: No hits
API 3.329 dgidb_gene2chemical: 6 hits
API 3.330 dgidb_gene2chemical: No hits
API 3.331 dgidb_gene2chemical: No hits
API 3.332 dgidb_gene2chemical: No hits
API 3.333 dgidb_gene2chemical: 1 hits
API 3.334 dgidb_gene2chemical: No hits
API 3.335 dgidb_gene2chemical: 26 hits
API 3.336 dgidb_gene2chemical: No hits
API 3.337 dgidb_gene2chemical: No hits
API 3.338 dgidb_gene2chemical: No hits
API 1.1 mychem.info: 352 hits
API 1.2 mychem.info: 1208 hits
API 1.3 mychem.info: 1239 hits
API 2.1 semmedge
```
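
The log notes that requests are dispatched in parallel and listed by query time, which is why API numbers appear out of order. The sketch below reproduces that behaviour with Python's standard `concurrent.futures`. `fake_api_call` and its delays are hypothetical stand-ins for real HTTP requests, not BTE code.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_api_call(api_id, delay):
    """Hypothetical stand-in for an HTTP request that takes `delay` seconds."""
    time.sleep(delay)
    return api_id


# Submission order is 1.1, 2.1, 3.1; the delays are made up for illustration
calls = [("API 1.1", 0.3), ("API 2.1", 0.1), ("API 3.1", 0.2)]

with ThreadPoolExecutor(max_workers=len(calls)) as pool:
    futures = [pool.submit(fake_api_call, api_id, delay) for api_id, delay in calls]
    # as_completed yields futures as they finish, so faster calls are reported first
    finished = [future.result() for future in as_completed(futures)]

print(finished)
```

With these delays the printed list is typically ordered 2.1, 3.1, 1.1, mirroring how faster APIs show up first in the log.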

```python
# Tabulate every path found as a pandas DataFrame (one row per path)
df = fc.display_table_view()
```

The `df` object contains the full output from BioThings Explorer. Each row shows one path that joins the input node (Parkinson disease) to an intermediate node (a gene or protein) and then to an ending node (a chemical compound). The data frame also includes columns with additional details on each node and edge, including human-readable labels, identifiers, and sources. Let's remove all rows where `output_name` (the compound label) is missing, and focus on paths with mechanistic predicates: `causedBy` on the disease-gene edge (`pred1`) and `targetedBy` on the gene-compound edge (`pred2`).

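```python
# Keep rows with a named compound, where the disease is causedBy the gene
# and the gene is targetedBy the compound
dfFilt = df.loc[df['output_name'].notnull()].query('pred1 == "causedBy" and pred2 == "targetedBy"')
dfFilt
```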
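
If the `query` string syntax is unfamiliar, the filter above keeps only rows that pass three tests. The sketch below applies the same conditions to a list of plain dictionaries so it runs without BTE or pandas. The rows, the drug placeholders, and the `relatedTo` predicate are hypothetical; only the column names `output_name`, `pred1`, and `pred2` come from the table above.

```python
# Hypothetical rows; only the column names mirror the BTE table
rows = [
    {"node1_name": "MAOB", "output_name": "DRUG_A", "pred1": "causedBy", "pred2": "targetedBy"},
    {"node1_name": "COMT", "output_name": None, "pred1": "causedBy", "pred2": "targetedBy"},
    {"node1_name": "DRD2", "output_name": "DRUG_B", "pred1": "relatedTo", "pred2": "targetedBy"},
]


def keep(row):
    """Mirror of notnull() on output_name plus the pred1/pred2 query.

    Note: pandas' notnull() also treats NaN as missing; this sketch only checks None.
    """
    return (
        row["output_name"] is not None
        and row["pred1"] == "causedBy"
        and row["pred2"] == "targetedBy"
    )


filtered = [row for row in rows if keep(row)]
print([row["output_name"] for row in filtered])  # ['DRUG_A']
```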

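Let's examine how many unique Parkinson disease → gene → drug paths there are. The same path can appear in several rows (for instance, when more than one API returns it), so we keep one row per unique combination of disease, gene, and drug: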

```python
# Keep one row per unique (disease, gene, drug) combination
dfFiltUnique = dfFilt[["input", "node1_name", "output_name"]].drop_duplicates()
dfFiltUnique
```
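
`drop_duplicates` on these three columns keeps the first row for each distinct (input, gene, drug) triple. An order-preserving equivalent over plain records is sketched below. The records are hypothetical; they reuse the table's column names, and the extra `api` field is an illustrative column that gets dropped.

```python
def unique_paths(rows, columns=("input", "node1_name", "output_name")):
    """Return one tuple per distinct combination of `columns`, in first-seen order."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(row[col] for col in columns)
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result


# Hypothetical rows: the first two differ only in a column that is dropped
rows = [
    {"input": "Parkinson disease", "node1_name": "MAOB", "output_name": "DRUG_A", "api": "source_1"},
    {"input": "Parkinson disease", "node1_name": "MAOB", "output_name": "DRUG_A", "api": "source_2"},
    {"input": "Parkinson disease", "node1_name": "DRD2", "output_name": "DRUG_A", "api": "source_1"},
]

print(unique_paths(rows))
```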

### Results
Finally, let's sort the drugs by the number of proteins that link them to Parkinson's Disease.

```python
import pandas as pd

# For each drug, join the names of the linking genes and count them
grouped = dfFiltUnique.groupby(['output_name'])['node1_name']
genes = grouped.apply(','.join)
count = grouped.count()
result = pd.DataFrame({'genes': genes, 'count': count})

# Show the 30 drugs linked to Parkinson disease through the most genes
result.sort_values("count", ascending=False).head(30)
```
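
The `groupby` step builds, for each drug, the list of genes that reach it and then ranks drugs by the length of that list. A dependency-free sketch of the same aggregation follows. The (gene, drug) pairs are hypothetical placeholders, and ties are broken alphabetically by drug name here, an explicit choice the pandas version above does not make.

```python
from collections import defaultdict


def rank_drugs(pairs, top_n=30):
    """Rank drugs by how many genes link them to the disease.

    `pairs` is an iterable of (gene, drug) tuples, assumed already de-duplicated.
    Returns (drug, comma-joined genes, count) tuples, most-linked first.
    """
    genes_by_drug = defaultdict(list)
    for gene, drug in pairs:
        genes_by_drug[drug].append(gene)
    ranked = sorted(
        genes_by_drug.items(),
        key=lambda item: (-len(item[1]), item[0]),  # count descending, then name ascending
    )
    return [(drug, ",".join(genes), len(genes)) for drug, genes in ranked[:top_n]]


# Hypothetical (gene, drug) pairs
pairs = [("MAOB", "DRUG_A"), ("COMT", "DRUG_A"), ("DRD2", "DRUG_B"), ("DRD1", "DRUG_C")]
for row in rank_drugs(pairs):
    print(row)
```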

While the list above could clearly benefit from further filtering and sorting, it already surfaces a wide range of potentially testable hypotheses drawn from our distributed knowledge graph. For more details on any individual drug candidate, we can return to the original BTE results. For example, here we examine the evidence behind the link between Parkinson's disease and the drug nintedanib.

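```python
# Show every path (with its supporting details) that ends at nintedanib;
# the label must match exactly as it appears in output_name
df[df['output_name'] == 'NINTEDANIB']
```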
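
The lookup above matches the label exactly, so it depends on spelling the name as it appears in `output_name`. The sketch below shows a hypothetical helper, `rows_for_drug`, that matches case-insensitively over plain records. The rows are illustrative placeholders, not BTE output.

```python
def rows_for_drug(rows, name):
    """Return rows whose output_name matches `name`, ignoring case and surrounding spaces."""
    target = name.strip().casefold()
    return [
        row for row in rows
        if row.get("output_name") is not None
        and row["output_name"].strip().casefold() == target
    ]


# Hypothetical rows; the real table has many more columns
rows = [
    {"node1_name": "MAOB", "output_name": "DRUG_A"},
    {"node1_name": "DRD2", "output_name": "Drug_A"},
    {"node1_name": "COMT", "output_name": None},
]
print(len(rows_for_drug(rows, "drug_a")))  # 2
```

In pandas, the corresponding mask would be `df['output_name'].str.casefold() == 'nintedanib'`; missing names produce NaN, which compares as False.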