# Introduction


This notebook demonstrates how BioThings Explorer can be used to execute queries with more than one intermediate node:

The query starts from the drug "Anisindione". The two intermediate nodes will be `Gene` and `DiseaseOrPhenotypicFeature`, and the final output will be `PhenotypicFeature`.


**Background**: BioThings Explorer can answer two classes of queries: "EXPLAIN" and "PREDICT". EXPLAIN queries are described in [EXPLAIN_demo.ipynb](https://github.com/biothings/biothings_explorer/blob/master/jupyter%20notebooks/EXPLAIN_demo.ipynb), and PREDICT queries are described in [PREDICT_demo.ipynb](https://github.com/biothings/biothings_explorer/blob/master/jupyter%20notebooks/PREDICT_demo.ipynb). Here, we describe PREDICT queries and how to use BioThings Explorer to execute them. A more detailed overview of the BioThings Explorer system is provided in [these slides](https://docs.google.com/presentation/d/1QWQqqQhPD_pzKryh6Wijm4YQswv8pAjleVORCPyJyDE/edit?usp=sharing).

**To experiment with an executable version of this notebook, [load it in Google Colaboratory](https://colab.research.google.com/github/biothings/biothings_explorer/blob/master/jupyter%20notebooks/Multi%20intermediate%20nodes%20query.ipynb).**

## Step 0: Load BioThings Explorer modules

Install the `biothings_explorer` and `biothings_schema` packages, as described in this [README](https://github.com/biothings/biothings_explorer/blob/master/jupyter%20notebooks/README.md#prerequisite). This only needs to be done once (it is included here for compatibility with [colab](https://colab.research.google.com/)).

In [None]:
!pip install git+https://github.com/biothings/biothings_explorer#egg=biothings_explorer

Next, import the relevant modules:

* **Hint**: Finds the bio-entity representation used in BioThings Explorer that corresponds to the user input (which can be any database ID, symbol, or name)
* **FindConnection**: Finds intermediate bio-entities that connect the user-specified input and output

In [1]:
from biothings_explorer.hint import Hint
from biothings_explorer.user_query_dispatcher import FindConnection

## Step 1: Find representation of "Anisindione" in BTE

In this step, BioThings Explorer translates our query string "Anisindione" into BioThings objects, which contain mappings to many common identifiers. Generally, the top result returned by the `Hint` module will be the correct item, but you should confirm that using the identifiers shown.

Search terms can correspond to any child of [BiologicalEntity](https://biolink.github.io/biolink-model/docs/BiologicalEntity.html) from the [Biolink Model](https://biolink.github.io/biolink-model/docs/), including `DiseaseOrPhenotypicFeature` (e.g., "lupus"), `ChemicalSubstance` (e.g., "acetaminophen"), `Gene` (e.g., "CDK2"), `BiologicalProcess` (e.g., "T cell differentiation"), and `Pathway` (e.g., "Citric acid cycle").

In [2]:
ht = Hint()
anisindione = ht.query("Anisindione")['ChemicalSubstance'][0]

anisindione

{'chembl': 'CHEMBL712',
 'drugbank': 'DB01125',
 'name': 'Anisindione',
 'pubchem': 2197,
 'umls': 'C0051919',
 'chebi': 'CHEBI:133809',
 'display': 'chembl(CHEMBL712) drugbank(DB01125) name(Anisindione) pubchem(2197) umls(C0051919) chebi(CHEBI:133809) ',
 'type': 'ChemicalSubstance',
 'primary': {'identifier': 'chembl',
  'cls': 'ChemicalSubstance',
  'value': 'CHEMBL712'}}
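
Because the cell above indexes the query result by semantic type and then by position, it can help to list every candidate before picking one. The following is a minimal sketch, not part of BioThings Explorer: `summarize_hint_hits` is a hypothetical helper, and it assumes only the layout implied above (a dict of lists whose records carry `name` and `primary` keys). The sample data is copied from the output above so that the snippet runs without network access.

```python
def summarize_hint_hits(hint_result):
    """Return (type, name, primary ID) tuples for every hit in a Hint-style result.

    Assumes `hint_result` maps a semantic type to a list of records shaped
    like the `anisindione` dict shown above (hypothetical helper).
    """
    rows = []
    for semantic_type, hits in hint_result.items():
        for hit in hits:
            primary = hit.get("primary", {})
            primary_id = "{}:{}".format(primary.get("identifier"), primary.get("value"))
            rows.append((semantic_type, hit.get("name"), primary_id))
    return rows


# Sample data copied from the output above; the empty "Gene" list is illustrative.
sample_result = {
    "ChemicalSubstance": [
        {
            "name": "Anisindione",
            "type": "ChemicalSubstance",
            "primary": {
                "identifier": "chembl",
                "cls": "ChemicalSubstance",
                "value": "CHEMBL712",
            },
        }
    ],
    "Gene": [],
}

for row in summarize_hint_hits(sample_result):
    print(row)
```

In a live session the same helper could be called as `summarize_hint_hits(ht.query("Anisindione"))`, assuming the live result follows the same layout.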

## Step 2: Find phenotypes that are associated with Anisindione through Gene and DiseaseOrPhenotypicFeature as intermediate nodes

In this section, we find all paths in the knowledge graph that connect Anisindione to any entity that is a phenotypic feature.  To do that, we will use `FindConnection`.  This class is a convenient wrapper around two advanced functions for **query path planning** and **query path execution**. More advanced features for both query path planning and query path execution are in development and will be documented in the coming months. 
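
To make the shape of a multi-hop query concrete, the sketch below breaks a linear path into the input/output type pair for each hop. It only illustrates the decomposition and is not BTE's planner: `plan_hops` is a hypothetical helper. The first pair matches what the verbose log further down reports (`ChemicalSubstance` to `Gene`); the remaining pairs are inferred from the order of the intermediate nodes.

```python
def plan_hops(input_type, intermediate_types, output_type):
    """List the (subject type, object type) pair for each hop in a linear path."""
    node_types = [input_type] + list(intermediate_types) + [output_type]
    # Pair every node type with the one that follows it.
    return list(zip(node_types[:-1], node_types[1:]))


hops = plan_hops(
    "ChemicalSubstance",
    ["Gene", "DiseaseOrPhenotypicFeature"],
    "PhenotypicFeature",
)
for number, (subject_type, object_type) in enumerate(hops, start=1):
    print("Hop {}: {} -> {}".format(number, subject_type, object_type))
```

Two intermediate nodes therefore imply three hops, which is why the result table carries three predicate columns (`pred1`, `pred2`, `pred3`).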

In [3]:
fc = FindConnection(input_obj=anisindione, 
                    output_obj='PhenotypicFeature', 
                    intermediate_nodes=['Gene', 'DiseaseOrPhenotypicFeature'])

In [4]:
fc.connect(verbose=True)


BTE will find paths that join 'Anisindione' and 'PhenotypicFeature'. Paths will have 2 intermediate node.

Intermediate node #1 will have these type constraints: Gene

Intermediate node #2 will have these type constraints: DiseaseOrPhenotypicFeature




==== Step #1: Query path planning ====

Because Anisindione is of type 'ChemicalSubstance', BTE will query our meta-KG for APIs that can take 'ChemicalSubstance' as input and 'Gene' as output

BTE found 4 apis:

API 1. dgidb_chemical2gene(1 API call)
API 2. chembl_drug_mechanism(1 API call)
API 3. semmedgene(6 API calls)
API 4. mychem.info(2 API calls)


==== Step #2: Query path execution ====
NOTE: API requests are dispatched in parallel, so the list of APIs below is ordered by query time.

API 4.2: http://mychem.info/v1/query (POST "q=DB01125&scopes=drugbank.id&fields=drugbank.enzymes,drugbank.targets&species=human&size=100")
API 4.1: http://mychem.info/v1/query (POST "q=CHEMBL712&scopes=chembl.molecule_chembl_id&fields=drugcentral.b

API 1.60: https://api.monarchinitiative.org/api/bioentity/disease/MONDO:0005090/phenotypes?rows=100
API 1.2: https://api.monarchinitiative.org/api/bioentity/disease/MONDO:0009172/phenotypes?rows=100
API 1.66: https://api.monarchinitiative.org/api/bioentity/disease/MONDO:0005492/phenotypes?rows=100
API 1.57: https://api.monarchinitiative.org/api/bioentity/disease/MONDO:0006547/phenotypes?rows=100
API 1.10: https://api.monarchinitiative.org/api/bioentity/disease/MONDO:0004982/phenotypes?rows=100
API 1.63: https://api.monarchinitiative.org/api/bioentity/disease/MONDO:0005335/phenotypes?rows=100
API 1.24: https://api.monarchinitiative.org/api/bioentity/disease/MONDO:0019200/phenotypes?rows=100
API 1.53: https://api.monarchinitiative.org/api/bioentity/disease/MONDO:0000986/phenotypes?rows=100
API 1.7: https://api.monarchinitiative.org/api/bioentity/disease/MONDO:0001415/phenotypes?rows=100
API 1.4: https://api.monarchinitiative.org/api/bioentity/disease/MONDO:0005053/phenotypes?rows=100
API

API 1.102: https://api.monarchinitiative.org/api/bioentity/disease/MONDO:0006857/phenotypes?rows=100
API 1.104: https://api.monarchinitiative.org/api/bioentity/disease/MONDO:0011751/phenotypes?rows=100
API 1.101: https://api.monarchinitiative.org/api/bioentity/disease/MONDO:0007136/phenotypes?rows=100
API 1.32: https://api.monarchinitiative.org/api/bioentity/disease/MONDO:0006873/phenotypes?rows=100
API 1.103: https://api.monarchinitiative.org/api/bioentity/disease/MONDO:0009212/phenotypes?rows=100
API 1.108: https://api.monarchinitiative.org/api/bioentity/disease/MONDO:0016627/phenotypes?rows=100
API 1.110: https://api.monarchinitiative.org/api/bioentity/disease/MONDO:0005311/phenotypes?rows=100
API 1.36: https://api.monarchinitiative.org/api/bioentity/disease/MONDO:0010108/phenotypes?rows=100
API 1.105: https://api.monarchinitiative.org/api/bioentity/disease/MONDO:0007362/phenotypes?rows=100
API 1.106: https://api.monarchinitiative.org/api/bioentity/disease/MONDO:0011782/phenotypes?r

API 1.30 biolink_disease2phenotype: 46 hits
API 1.31 biolink_disease2phenotype: 100 hits
API 1.32 biolink_disease2phenotype: 100 hits
API 1.33 biolink_disease2phenotype: 100 hits
API 1.34 biolink_disease2phenotype: 3 hits
API 1.35 biolink_disease2phenotype: 100 hits
API 1.36 biolink_disease2phenotype: 8 hits
API 1.37 biolink_disease2phenotype: 1 hits
API 1.38 biolink_disease2phenotype: 76 hits
API 1.39 biolink_disease2phenotype: 14 hits
API 1.40 biolink_disease2phenotype: 71 hits
API 1.41 biolink_disease2phenotype: 100 hits
API 1.42 biolink_disease2phenotype: 8 hits
API 1.43 biolink_disease2phenotype: 13 hits
API 1.44 biolink_disease2phenotype: No hits
API 1.45 biolink_disease2phenotype: 15 hits
API 1.46 biolink_disease2phenotype: 100 hits
API 1.47 biolink_disease2phenotype: 41 hits
API 1.48 biolink_disease2phenotype: 100 hits
API 1.49 biolink_disease2phenotype: No hits
API 1.50 biolink_disease2phenotype: 100 hits
API 1.51 biolink_disease2phenotype: 7 hits
API 1.52 biolink_disease2phen

In [5]:
df = fc.display_table_view()

The `df` object contains the full output from BioThings Explorer. Each row shows one path that joins the input node (ANISINDIONE) to an intermediate node (a gene or protein), to another intermediate node (a `DiseaseOrPhenotypicFeature`), and finally to an ending node (a `PhenotypicFeature`). The data frame includes a set of columns with additional details on each node and edge (including human-readable labels, identifiers, and sources). Let's remove all rows where `output_name` (the phenotype label) is None, and focus on paths with the specific mechanistic predicates **target** and **causes**.

In [6]:
dfFilt = df.loc[df['output_name'].notnull()].query('pred1 == "target" and pred2 == "causes"')

In [7]:
dfFilt

Unnamed: 0,input,input_type,pred1,pred1_source,pred1_api,pred1_pubmed,node1_type,node1_name,node1_id,pred2,...,node2_type,node2_name,node2_id,pred3,pred3_source,pred3_api,pred3_pubmed,output_type,output_name,output_id
39,ANISINDIONE,ChemicalSubstance,target,dgidb,dgidb_chemical2gene,,Gene,GC,entrez:2638,causes,...,DiseaseOrPhenotypicFeature,cancer,mondo:MONDO:0004992,associatedWith,biolink,biolink_disease2phenotype,,PhenotypicFeature,HP:0002664,hp:HP:0002664
44,ANISINDIONE,ChemicalSubstance,target,dgidb,dgidb_chemical2gene,,Gene,GC,entrez:2638,causes,...,DiseaseOrPhenotypicFeature,carcinoma of liver and intrahepatic biliary tract,mondo:MONDO:0018531,associatedWith,biolink,biolink_disease2phenotype,,PhenotypicFeature,HP:0002664,hp:HP:0002664
45,ANISINDIONE,ChemicalSubstance,target,dgidb,dgidb_chemical2gene,,Gene,GC,entrez:2638,causes,...,DiseaseOrPhenotypicFeature,exanthem (disease),mondo:MONDO:0006547,associatedWith,biolink,biolink_disease2phenotype,,PhenotypicFeature,HP:0002664,hp:HP:0002664
58,ANISINDIONE,ChemicalSubstance,target,dgidb,dgidb_chemical2gene,,Gene,GC,entrez:2638,causes,...,DiseaseOrPhenotypicFeature,cancer,mondo:MONDO:0004992,associatedWith,biolink,biolink_disease2phenotype,,PhenotypicFeature,HP:0000976,hp:HP:0000976
63,ANISINDIONE,ChemicalSubstance,target,dgidb,dgidb_chemical2gene,,Gene,GC,entrez:2638,causes,...,DiseaseOrPhenotypicFeature,cancer,mondo:MONDO:0004992,associatedWith,biolink,biolink_disease2phenotype,,PhenotypicFeature,HP:0005534,hp:HP:0005534
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
19307,ANISINDIONE,ChemicalSubstance,target,dgidb,dgidb_chemical2gene,,Gene,GC,entrez:2638,causes,...,DiseaseOrPhenotypicFeature,mitochondrial DNA depletion syndrome 4a,mondo:MONDO:0008758,associatedWith,hpo,mydisease.info,,PhenotypicFeature,HP:0002446,hp:HP:0002446
19308,ANISINDIONE,ChemicalSubstance,target,dgidb,dgidb_chemical2gene,,Gene,GC,entrez:2638,causes,...,DiseaseOrPhenotypicFeature,mitochondrial DNA depletion syndrome 4a,mondo:MONDO:0008758,associatedWith,hpo,mydisease.info,,PhenotypicFeature,HP:0000478,hp:HP:0000478
19309,ANISINDIONE,ChemicalSubstance,target,dgidb,dgidb_chemical2gene,,Gene,GC,entrez:2638,causes,...,DiseaseOrPhenotypicFeature,mitochondrial DNA depletion syndrome 4a,mondo:MONDO:0008758,associatedWith,hpo,mydisease.info,,PhenotypicFeature,HP:0001276,hp:HP:0001276
19310,ANISINDIONE,ChemicalSubstance,target,dgidb,dgidb_chemical2gene,,Gene,GC,entrez:2638,causes,...,DiseaseOrPhenotypicFeature,mitochondrial DNA depletion syndrome 4a,mondo:MONDO:0008758,associatedWith,hpo,mydisease.info,,PhenotypicFeature,HP:0003678,hp:HP:0003678
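
With thousands of rows, a per-disease summary can be easier to read than the raw table. The sketch below groups phenotype IDs by disease name; `phenotypes_per_disease` is a hypothetical helper, and it assumes rows are plain dicts with the `node2_name` and `output_id` columns shown above. The sample rows are copied from the table so that the snippet runs on its own.

```python
from collections import defaultdict


def phenotypes_per_disease(rows):
    """Map each disease name (node2_name) to its sorted, de-duplicated phenotype IDs."""
    grouped = defaultdict(set)
    for row in rows:
        grouped[row["node2_name"]].add(row["output_id"])
    return {disease: sorted(ids) for disease, ids in grouped.items()}


# A few rows copied from the table above (only the two columns used here).
sample_rows = [
    {"node2_name": "cancer", "output_id": "hp:HP:0002664"},
    {"node2_name": "cancer", "output_id": "hp:HP:0000976"},
    {"node2_name": "cancer", "output_id": "hp:HP:0005534"},
    {"node2_name": "exanthem (disease)", "output_id": "hp:HP:0002664"},
]

summary = phenotypes_per_disease(sample_rows)
for disease, phenotype_ids in summary.items():
    print(disease, len(phenotype_ids), phenotype_ids)
```

On the real data, the rows could be obtained with `dfFilt.to_dict("records")`, assuming `dfFilt` is the pandas data frame built in the cells above.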
