# Introduction

This notebook demonstrates basic usage of BioThings Explorer, an engine for autonomously querying a distributed knowledge graph. BioThings Explorer can answer two classes of queries: "EXPLAIN" and "PREDICT". EXPLAIN queries are described in [EXPLAIN_demo.ipynb](EXPLAIN_demo.ipynb). Here, we describe PREDICT queries and how to use BioThings Explorer to execute them. A more detailed overview of the BioThings Explorer system is provided in [these slides](https://docs.google.com/presentation/d/1QWQqqQhPD_pzKryh6Wijm4YQswv8pAjleVORCPyJyDE/edit?usp=sharing).

PREDICT queries are designed to **predict plausible relationships between one entity and an entity class**.  For example, in this notebook, we explore the question:

> *"What drugs might be used to treat Parkinson's disease?"*



## Step 1: Find representation of "Parkinson disease" in BTE

In this step, BioThings Explorer translates our query string "Parkinson disease" into BioThings objects, which contain mappings to many common identifiers. Generally, the top result returned by the `Hint` module will be the correct item, but you should confirm this by checking the identifiers shown.

Search terms can correspond to any child of [BiologicalEntity](https://biolink.github.io/biolink-model/docs/BiologicalEntity.html) from the [Biolink Model](https://biolink.github.io/biolink-model/docs/), including `DiseaseOrPhenotypicFeature` (e.g., "lupus"), `ChemicalSubstance` (e.g., "acetaminophen"), `Gene` (e.g., "CDK2"), `BiologicalProcess` (e.g., "T cell differentiation"), and `Pathway` (e.g., "Citric acid cycle").

In [8]:
from biothings_explorer.hint import Hint
ht = Hint()
pd = ht.query("Parkinson disease")['DiseaseOrPhenotypicFeature'][0]

In [9]:
pd

{'mondo': 'MONDO:0005180',
 'doid': 'DOID:14330',
 'umls': 'C0030567',
 'mesh': 'D010300',
 'name': 'Parkinson disease',
 'display': 'mondo(MONDO:0005180) doid(DOID:14330) umls(C0030567) mesh(D010300) name(Parkinson disease) ',
 'type': 'DiseaseOrPhenotypicFeature',
 'primary': {'identifier': 'mondo',
  'cls': 'DiseaseOrPhenotypicFeature',
  'value': 'MONDO:0005180'}}
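
To confirm that the top hit is the intended entity, it can help to list every candidate together with its primary identifier. The following is a minimal sketch, assuming the result of `ht.query(...)` is a dict of lists whose records look like the one shown above. `summarize_candidates` is a hypothetical helper, not part of BioThings Explorer, and `sample_result` is an abridged copy of the output above rather than a live query result.

```python
def summarize_candidates(hint_result, semantic_type):
    """Return (name, primary identifier) pairs for one semantic type.

    Assumes hint_result maps semantic types to lists of records shaped
    like the Parkinson disease record shown above.
    """
    rows = []
    for record in hint_result.get(semantic_type, []):
        primary = record.get('primary', {})
        rows.append((record.get('name'), primary.get('value')))
    return rows


# Abridged copy of the output above; a real call would pass ht.query(...).
sample_result = {
    'DiseaseOrPhenotypicFeature': [
        {'mondo': 'MONDO:0005180',
         'doid': 'DOID:14330',
         'umls': 'C0030567',
         'mesh': 'D010300',
         'name': 'Parkinson disease',
         'type': 'DiseaseOrPhenotypicFeature',
         'primary': {'identifier': 'mondo',
                     'cls': 'DiseaseOrPhenotypicFeature',
                     'value': 'MONDO:0005180'}},
    ]
}

candidates = summarize_candidates(sample_result, 'DiseaseOrPhenotypicFeature')
for name, primary_id in candidates:
    print(name, primary_id)
```

Scanning this list before picking index `[0]` makes it easier to spot a case where a similarly named entity ranks first.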

## Step 2: Find drugs that are associated with genes involved in Parkinson disease

In this section, we find all paths in the knowledge graph that connect Parkinson disease to any entity that is a chemical compound.  To do that, we will use `FindConnection`.  This class is a convenient wrapper around two advanced functions for **query path planning** and **query path execution**. More advanced features for both query path planning and query path execution are in development and will be documented in the coming months. 

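The call below passes three parameters to `FindConnection`: `input_obj` (the object returned by `Hint` in Step 1), `output_obj` (the class of the output entity, here `'ChemicalSubstance'`), and `intermediate_nodes` (the type constraints on the intermediate nodes, here a single `Gene` node).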


In [10]:
from biothings_explorer.user_query_dispatcher import FindConnection

In [11]:
fc = FindConnection(input_obj=pd, output_obj='ChemicalSubstance', intermediate_nodes=['Gene'])

In [12]:
fc.connect(verbose=True)


BTE will find paths that join 'Parkinson disease' and 'ChemicalSubstance'. Paths will have 1 intermediate node.

Intermediate node #1 will have these type constraints: Gene





==== Step #1: Query path planning ====

Because Parkinson disease is of type 'DiseaseOrPhenotypicFeature', BTE will query our meta-KG for APIs that can take 'DiseaseOrPhenotypicFeature' as input

BTE found 3 apis:

API 1. mydisease.info(1 API call)
API 2. semmeddisease(1 API call)
API 3. biolink_disease2gene(1 API call)


==== Step #2: Query path execution ====
NOTE: API requests are dispatched in parallel, so the list of APIs below is ordered by query time.

API 1.1: http://mydisease.info/v1/query (POST "q=C0030567&scopes=mondo.xrefs.umls,disgenet.xrefs.umls&fields=disgenet.genes_related_to_disease&species=human&size=100")
API 2.1: http://pending.biothings.io/semmed/query (POST "q=C0030567&scopes=umls&fields=AFFECTS.protein,CAUSES_reverse.gene,ASSOCIATED_WITH.gene,AFFECTS_reverse.protein,AFFECTS_reverse.gene,

API 3.38: http://www.dgidb.org/api/v2/interactions.json?genes=FAF1
API 3.56: http://www.dgidb.org/api/v2/interactions.json?genes=NR4A1
API 3.92: http://www.dgidb.org/api/v2/interactions.json?genes=ATXN8OS
API 3.48: http://www.dgidb.org/api/v2/interactions.json?genes=CYP17A1
dgidb_gene2chemical failed
API 3.45: http://www.dgidb.org/api/v2/interactions.json?genes=DENR
API 3.54: http://www.dgidb.org/api/v2/interactions.json?genes=KCNJ4
API 3.59: http://www.dgidb.org/api/v2/interactions.json?genes=PLEKHM1
API 3.53: http://www.dgidb.org/api/v2/interactions.json?genes=CA8
API 3.58: http://www.dgidb.org/api/v2/interactions.json?genes=TH
API 3.46: http://www.dgidb.org/api/v2/interactions.json?genes=LRRK2-DT
API 3.50: http://www.dgidb.org/api/v2/interactions.json?genes=CHRNA4
API 3.55: http://www.dgidb.org/api/v2/interactions.json?genes=ODAPH
API 3.51: http://www.dgidb.org/api/v2/interactions.json?genes=CRMP1
API 3.91: http://www.dgidb.org/api/v2/interactions.json?genes=MTERF4
API 3.60: http://

API 3.148: http://www.dgidb.org/api/v2/interactions.json?genes=MAOA
API 3.142: http://www.dgidb.org/api/v2/interactions.json?genes=LINC02210
API 3.155: http://www.dgidb.org/api/v2/interactions.json?genes=PARK16
API 3.153: http://www.dgidb.org/api/v2/interactions.json?genes=PSMC1
API 3.156: http://www.dgidb.org/api/v2/interactions.json?genes=INPP5F
API 3.159: http://www.dgidb.org/api/v2/interactions.json?genes=PDSS2
API 3.164: http://www.dgidb.org/api/v2/interactions.json?genes=RAB39B
API 3.152: http://www.dgidb.org/api/v2/interactions.json?genes=GAK
API 3.151: http://www.dgidb.org/api/v2/interactions.json?genes=TWSG1
API 3.165: http://www.dgidb.org/api/v2/interactions.json?genes=TRIB3
API 3.162: http://www.dgidb.org/api/v2/interactions.json?genes=FBXO7
API 3.166: http://www.dgidb.org/api/v2/interactions.json?genes=KANSL1
API 3.170: http://www.dgidb.org/api/v2/interactions.json?genes=FUS
API 3.169: http://www.dgidb.org/api/v2/interactions.json?genes=KITLG
API 3.167: http://www.dgidb.org

API 3.310: http://www.dgidb.org/api/v2/interactions.json?genes=CSMD1
API 3.311: http://www.dgidb.org/api/v2/interactions.json?genes=FAM47E-STBD1
API 3.307: http://www.dgidb.org/api/v2/interactions.json?genes=COX6A1
API 3.313: http://www.dgidb.org/api/v2/interactions.json?genes=LINGO1
API 3.298: http://www.dgidb.org/api/v2/interactions.json?genes=SLC6A2
API 3.312: http://www.dgidb.org/api/v2/interactions.json?genes=ITGA8
API 3.315: http://www.dgidb.org/api/v2/interactions.json?genes=FYN
API 3.314: http://www.dgidb.org/api/v2/interactions.json?genes=DNAJC10
API 3.320: http://www.dgidb.org/api/v2/interactions.json?genes=LINC02451
API 3.319: http://www.dgidb.org/api/v2/interactions.json?genes=LINC02331
API 3.321: http://www.dgidb.org/api/v2/interactions.json?genes=FGB
API 3.323: http://www.dgidb.org/api/v2/interactions.json?genes=RTN4
API 3.324: http://www.dgidb.org/api/v2/interactions.json?genes=CTC1
API 3.322: http://www.dgidb.org/api/v2/interactions.json?genes=FTL
API 3.316: http://www.

API 3.5 dgidb_gene2chemical: No hits
API 3.6 dgidb_gene2chemical: 7 hits
API 3.7 dgidb_gene2chemical: 3 hits
API 3.8 dgidb_gene2chemical: No hits
API 3.9 dgidb_gene2chemical: 1 hits
API 3.10 dgidb_gene2chemical: No hits
API 3.11 dgidb_gene2chemical: 24 hits
API 3.12 dgidb_gene2chemical: 2 hits
API 3.13 dgidb_gene2chemical: No hits
API 3.14 dgidb_gene2chemical: 1 hits
API 3.15 dgidb_gene2chemical: No hits
API 3.16 dgidb_gene2chemical: No hits
API 3.17 dgidb_gene2chemical: 1 hits
API 3.18 dgidb_gene2chemical: No hits
API 3.19 dgidb_gene2chemical: 13 hits
API 3.20 dgidb_gene2chemical: No hits
API 3.21 dgidb_gene2chemical: No hits
API 3.22 dgidb_gene2chemical: 13 hits
API 3.23 dgidb_gene2chemical: No hits
API 3.24 dgidb_gene2chemical: No hits
API 3.25 dgidb_gene2chemical: 17 hits
API 3.26 dgidb_gene2chemical: No hits
API 3.27 dgidb_gene2chemical: 1 hits
API 3.28 dgidb_gene2chemical: 1 hits
API 3.29 dgidb_gene2chemical: No hits
API 3.30 dgidb_gene2chemical: No hits
API 3.31 dgidb_gene2chemi

API 3.223 dgidb_gene2chemical: 17 hits
API 3.224 dgidb_gene2chemical: 3 hits
API 3.225 dgidb_gene2chemical: 1 hits
API 3.226 dgidb_gene2chemical: 2 hits
API 3.227 dgidb_gene2chemical: 1 hits
API 3.228 dgidb_gene2chemical: 2 hits
API 3.229 dgidb_gene2chemical: 6 hits
API 3.230 dgidb_gene2chemical: No hits
API 3.231 dgidb_gene2chemical: 1 hits
API 3.232 dgidb_gene2chemical: No hits
API 3.233 dgidb_gene2chemical: No hits
API 3.234 dgidb_gene2chemical: No hits
API 3.235 dgidb_gene2chemical: No hits
API 3.236 dgidb_gene2chemical: No hits
API 3.237 dgidb_gene2chemical: No hits
API 3.238 dgidb_gene2chemical: 8 hits
API 3.239 dgidb_gene2chemical: No hits
API 3.240 dgidb_gene2chemical: No hits
API 3.241 dgidb_gene2chemical: No hits
API 3.242 dgidb_gene2chemical: No hits
API 3.243 dgidb_gene2chemical: 1 hits
API 3.244 dgidb_gene2chemical: No hits
API 3.245 dgidb_gene2chemical: No hits
API 3.246 dgidb_gene2chemical: 24 hits
API 3.247 dgidb_gene2chemical: No hits
API 3.248 dgidb_gene2chemical: No 
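
The verbose log is long, so a quick tally of hits per API can be easier to read than the raw lines. The sketch below assumes the printed output has already been captured as text (for example with `contextlib.redirect_stdout`) and that result lines follow the `API <n>.<m> <api_name>: <k> hits` pattern seen above. `log_text` holds a few lines copied from that output, and the regular expression is an illustration rather than a format guaranteed by BioThings Explorer.

```python
import re
from collections import Counter

# A few result lines copied from the verbose output above.
log_text = """\
API 3.5 dgidb_gene2chemical: No hits
API 3.6 dgidb_gene2chemical: 7 hits
API 3.7 dgidb_gene2chemical: 3 hits
API 3.8 dgidb_gene2chemical: No hits
API 3.9 dgidb_gene2chemical: 1 hits
"""

# Matches lines such as "API 3.6 dgidb_gene2chemical: 7 hits".
HIT_LINE = re.compile(r'^API \d+\.\d+ (?P<api>\S+): (?P<hits>No|\d+) hits$')

hits_per_api = Counter()   # total hits reported by each API
empty_calls = Counter()    # number of calls that returned no hits
for line in log_text.splitlines():
    match = HIT_LINE.match(line.strip())
    if match is None:
        continue  # skip request URLs, headings, and truncated lines
    count = 0 if match['hits'] == 'No' else int(match['hits'])
    hits_per_api[match['api']] += count
    if count == 0:
        empty_calls[match['api']] += 1

print(dict(hits_per_api))
print(dict(empty_calls))
```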

In [13]:
df = fc.display_table_view()

In [14]:
df.head()

| | input | input_type | pred1 | pred1_source | pred1_api | pred1_pubmed | node1_id | node1_name | node1_type | pred2 | pred2_source | pred2_api | pred2_pubmed | output_id | output_name | output_type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Parkinson disease | DiseaseOrPhenotypicFeature | associatedWith | disgenet | mydisease.info | | entrez:7124 | TNF | Gene | molecularlyInteractsWith | semmed | semmedgene | | umls:C0037874 | | ChemicalSubstance |
| 1 | Parkinson disease | DiseaseOrPhenotypicFeature | associatedWith | disgenet | mydisease.info | | entrez:7124 | TNF | Gene | molecularlyInteractsWith | semmed | semmedgene | | umls:C0037874 | | ChemicalSubstance |
| 2 | Parkinson disease | DiseaseOrPhenotypicFeature | associatedWith | disgenet | mydisease.info | | entrez:7124 | TNF | Gene | molecularlyInteractsWith | semmed | semmedgene | | umls:C0037874 | | ChemicalSubstance |
| 3 | Parkinson disease | DiseaseOrPhenotypicFeature | causedBy | semmed | semmeddisease | | entrez:7124 | TNF | Gene | molecularlyInteractsWith | semmed | semmedgene | | umls:C0037874 | | ChemicalSubstance |
| 4 | Parkinson disease | DiseaseOrPhenotypicFeature | causedBy | semmed | semmeddisease | | entrez:7124 | TNF | Gene | molecularlyInteractsWith | semmed | semmedgene | | umls:C0037874 | | ChemicalSubstance |
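
Several rows in this preview look identical in the columns shown, so it may be useful to collapse repeats and count how many distinct APIs support each gene-to-chemical pair. The sketch below is one way to do that with pandas; it rebuilds a small `sample` frame from the five rows above (only four columns) instead of using the real `df`. Whether the repeated rows are true duplicates in the full table is an assumption. pandas is imported without the usual `pd` alias because `pd` already names the Parkinson disease object in this notebook.

```python
import pandas

# Values copied from the df.head() output above (four columns only).
rows = [
    ('associatedWith', 'mydisease.info', 'TNF', 'umls:C0037874'),
    ('associatedWith', 'mydisease.info', 'TNF', 'umls:C0037874'),
    ('associatedWith', 'mydisease.info', 'TNF', 'umls:C0037874'),
    ('causedBy', 'semmeddisease', 'TNF', 'umls:C0037874'),
    ('causedBy', 'semmeddisease', 'TNF', 'umls:C0037874'),
]
sample = pandas.DataFrame(
    rows, columns=['pred1', 'pred1_api', 'node1_name', 'output_id'])

# Collapse rows that are identical in every column of the sample.
unique_paths = sample.drop_duplicates().reset_index(drop=True)

# Count the distinct APIs supporting each (gene, chemical) pair.
support = (unique_paths
           .groupby(['node1_name', 'output_id'])['pred1_api']
           .nunique()
           .sort_values(ascending=False))

print(unique_paths)
print(support)
```

Applied to the full table, the same two steps would give a rough measure of how many independent sources back each candidate path.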


In [21]:
'SNCA' in df['node1_name'].unique()

True

In [23]:
df_snca = df[df['node1_name'] == 'SNCA']

In [27]:
df_snca.head()

| | input | input_type | pred1 | pred1_source | pred1_api | pred1_pubmed | node1_id | node1_name | node1_type | pred2 | pred2_source | pred2_api | pred2_pubmed | output_id | output_name | output_type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3213 | Parkinson disease | DiseaseOrPhenotypicFeature | associatedWith | biolink | biolink_disease2gene | | entrez:6622 | SNCA | Gene | molecularlyInteractsWith | semmed | semmedgene | | umls:C0243192 | | ChemicalSubstance |
| 3214 | Parkinson disease | DiseaseOrPhenotypicFeature | associatedWith | disgenet | mydisease.info | | entrez:6622 | SNCA | Gene | molecularlyInteractsWith | semmed | semmedgene | | umls:C0243192 | | ChemicalSubstance |
| 3215 | Parkinson disease | DiseaseOrPhenotypicFeature | associatedWith | semmed | semmeddisease | | entrez:6622 | SNCA | Gene | molecularlyInteractsWith | semmed | semmedgene | | umls:C0243192 | | ChemicalSubstance |
| 3216 | Parkinson disease | DiseaseOrPhenotypicFeature | associatedWith | semmed | semmeddisease | | entrez:6622 | SNCA | Gene | molecularlyInteractsWith | semmed | semmedgene | | umls:C0243192 | | ChemicalSubstance |
| 3217 | Parkinson disease | DiseaseOrPhenotypicFeature | causedBy | semmed | semmeddisease | | entrez:6622 | SNCA | Gene | molecularlyInteractsWith | semmed | semmedgene | | umls:C0243192 | | ChemicalSubstance |


In [26]:
df_snca['output_id'].unique()

array(['umls:C0243192', 'umls:C0207636', 'umls:C0008260', 'umls:C0043047',
       'umls:C1101610', 'chembl:CHEMBL165', 'chembl:CHEMBL3833330',
       'umls:C0005525', 'mesh:D002945', 'drugbank:DB09130',
       'umls:C1611640', 'umls:C0031253', 'chembl:CHEMBL59'], dtype=object)
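
The chemicals linked to SNCA are identified in several namespaces (`umls`, `chembl`, `mesh`, `drugbank`), so it is worth checking the mix before comparing or merging results. The sketch below counts identifiers per namespace, treating the text before the first colon as the prefix. `output_ids` is copied from the array above, and the prefix-splitting rule is an assumption about how these identifiers are formatted.

```python
from collections import Counter

# Identifiers copied from the array above.
output_ids = [
    'umls:C0243192', 'umls:C0207636', 'umls:C0008260', 'umls:C0043047',
    'umls:C1101610', 'chembl:CHEMBL165', 'chembl:CHEMBL3833330',
    'umls:C0005525', 'mesh:D002945', 'drugbank:DB09130',
    'umls:C1611640', 'umls:C0031253', 'chembl:CHEMBL59',
]

# Treat the text before the first colon as the namespace prefix.
prefix_counts = Counter(curie.split(':', 1)[0] for curie in output_ids)

for prefix, count in prefix_counts.most_common():
    print(prefix, count)
```

In the notebook itself, the same count could be taken from `df_snca['output_id'].unique()` in place of the copied list.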