# Introduction

This notebook demonstrates how BioThings Explorer can be used to answer the following query:

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;*"What biosamples are associated with diseases related to gene SLC15A4?"*

**To experiment with an executable version of this notebook, [load it in Google Colaboratory](https://colab.research.google.com/github/biothings/biothings_explorer/blob/master/jupyter%20notebooks/Demo%20of%20Integrating%20Stanford%20BioSample%20API%20into%20BTE.ipynb).**

**Background**: BioThings Explorer can answer two classes of queries -- "EXPLAIN" and "PREDICT". EXPLAIN queries are described in [EXPLAIN_demo.ipynb](https://github.com/biothings/biothings_explorer/blob/master/jupyter%20notebooks/EXPLAIN_demo.ipynb), and PREDICT queries are described in [PREDICT_demo.ipynb](https://github.com/biothings/biothings_explorer/blob/master/jupyter%20notebooks/PREDICT_demo.ipynb). Here, we describe PREDICT queries and how to use BioThings Explorer to execute them. A more detailed overview of the BioThings Explorer system is provided in [these slides](https://docs.google.com/presentation/d/1QWQqqQhPD_pzKryh6Wijm4YQswv8pAjleVORCPyJyDE/edit?usp=sharing).

In the first stage of the query, BTE will call all APIs that can provide association data between SLC15A4 and diseases, including:
1. [DISEASES API](http://smart-api.info/ui/a7f784626a426d054885a5f33f17d3f8)
2. [BIOLINK API](http://smart-api.info/ui/d22b657426375a5295e7da8a303b9893)
3. [SEMMED API](http://smart-api.info/ui/e99229fc6ccb9ad9889bcc9c77a36bad)
4. [MyDisease.info API](http://smart-api.info/ui/f307760715d91908d0ae6de7f0810b22)
5. [CTD API](http://smart-api.info/ui/0212611d1c670f9107baf00b77f0889a)

In the second stage of the query, BTE will call the **[Stanford Biosample API](http://smart-api.info/ui/553a49d112bb19306253942ebd6377a9)**, which provides association data between diseases and biosamples.
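
Conceptually, the two stages form a two-hop join: gene to disease, then disease to biosample. The sketch below illustrates that data flow with plain dictionaries. The dictionaries and the `two_hop` helper are hypothetical stand-ins and not part of BTE; the associations are copied from the results table at the end of this notebook.

```python
# Toy stand-ins for the two stages; BTE queries live APIs instead.
# The associations are copied from this notebook's results table.
gene_to_disease = {
    "SLC15A4": ["systemic lupus erythematosus (disease)"],
}
disease_to_biosample = {
    "systemic lupus erythematosus (disease)": ["SAMEA2402387", "SAMEA4456858"],
}


def two_hop(gene):
    """Return (gene, disease, biosample) paths by chaining both lookups."""
    paths = []
    for disease in gene_to_disease.get(gene, []):
        for sample in disease_to_biosample.get(disease, []):
            paths.append((gene, disease, sample))
    return paths


paths = two_hop("SLC15A4")
for path in paths:
    print(" -> ".join(path))
```

Each returned tuple corresponds to one path through the knowledge graph, with the disease as the single intermediate node.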



## Step 0: Load BioThings Explorer modules

Install the `biothings_explorer` package, as described in this [README](https://github.com/biothings/biothings_explorer/blob/master/jupyter%20notebooks/README.md#prerequisite). This only needs to be done once (it is included here for compatibility with [colab](https://colab.research.google.com/)).

In [None]:
!pip install git+https://github.com/biothings/biothings_explorer#egg=biothings_explorer

In [1]:
from biothings_explorer.user_query_dispatcher import FindConnection

from biothings_explorer.hint import Hint


## Step 1: Find representation of "SLC15A4" in BTE

In this step, BioThings Explorer translates our query string "SLC15A4"  into BioThings objects, which contain mappings to many common identifiers.  Generally, the top result returned by the `Hint` module will be the correct item, but you should confirm that using the identifiers shown.

Search terms can correspond to any child of [BiologicalEntity](https://biolink.github.io/biolink-model/docs/BiologicalEntity.html) from the [Biolink Model](https://biolink.github.io/biolink-model/docs/), including `DiseaseOrPhenotypicFeature` (e.g., "lupus"), `ChemicalSubstance` (e.g., "acetaminophen"), `Gene` (e.g., "CDK2"), `BiologicalProcess` (e.g., "T cell differentiation"), and `Pathway` (e.g., "Citric acid cycle").

In [2]:
ht = Hint()
SLC15A4 = ht.query("SLC15A4")['Gene'][0]
SLC15A4

{'entrez': '121260',
 'name': 'solute carrier family 15 member 4',
 'symbol': 'SLC15A4',
 'taxonomy': 9606,
 'umls': 'C1427907',
 'uniprot': 'Q8N697',
 'hgnc': '23090',
 'ensembl': 'ENSG00000139370',
 'display': 'entrez(121260) name(solute carrier family 15 member 4) symbol(SLC15A4) taxonomy(9606) umls(C1427907) uniprot(Q8N697) hgnc(23090) ensembl(ENSG00000139370) ',
 'type': 'Gene',
 'primary': {'identifier': 'entrez', 'cls': 'Gene', 'value': '121260'}}
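
One way to confirm that the top hint is the intended gene is to compare a few of its fields against values you already know. The sketch below works on a trimmed copy of the record shown above; `matches_expected` is a hypothetical helper written for this example, not a BTE function.

```python
# The record returned by Hint in the cell above, trimmed to a few fields.
hint_record = {
    "entrez": "121260",
    "symbol": "SLC15A4",
    "taxonomy": 9606,
    "type": "Gene",
    "primary": {"identifier": "entrez", "cls": "Gene", "value": "121260"},
}


def matches_expected(record, symbol, taxonomy, semantic_type):
    """Check a hint record against the values we expect (hypothetical helper)."""
    return (
        record.get("symbol") == symbol
        and record.get("taxonomy") == taxonomy
        and record.get("type") == semantic_type
    )


is_match = matches_expected(hint_record, "SLC15A4", 9606, "Gene")
primary = hint_record["primary"]
print(is_match, primary["identifier"], primary["value"])
```

The `primary` entry names the identifier type and value that the record treats as its main ID, here the Entrez gene ID.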

## Step 2: Find biosamples that are associated with diseases related to gene SLC15A4

In this section, we find all paths in the knowledge graph that connect SLC15A4 to any entity that is a biosample.  To do that, we will use `FindConnection`.  This class is a convenient wrapper around two advanced functions for **query path planning** and **query path execution**. 
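
To make the idea of query path planning concrete, here is a minimal sketch under simplifying assumptions. It models the meta-KG as a flat list of (API, input type, output type) triples and selects the APIs that can serve each hop. `META_KG` and `plan_path` are hypothetical names for this illustration; the API names are the ones that appear in this notebook's `fc.connect` output and results table, and the real meta-KG carries far more detail.

```python
# Hypothetical, simplified meta-KG: (API name, input type, output type).
# API names are taken from this notebook's output; the real meta-KG is richer.
META_KG = [
    ("semmeddisease", "Gene", "DiseaseOrPhenotypicFeature"),
    ("ctd_gene2disease", "Gene", "DiseaseOrPhenotypicFeature"),
    ("DISEASES", "Gene", "DiseaseOrPhenotypicFeature"),
    ("mydisease.info", "Gene", "DiseaseOrPhenotypicFeature"),
    ("biolink_gene2disease", "Gene", "DiseaseOrPhenotypicFeature"),
    ("stanford_biosample_disease2sample", "DiseaseOrPhenotypicFeature", "Biosample"),
]


def plan_path(type_chain):
    """For each consecutive pair of types, list the APIs that can serve that hop."""
    plan = []
    for src, dst in zip(type_chain, type_chain[1:]):
        apis = [name for name, in_type, out_type in META_KG
                if in_type == src and out_type == dst]
        plan.append((src, dst, apis))
    return plan


plan = plan_path(["Gene", "DiseaseOrPhenotypicFeature", "Biosample"])
for src, dst, apis in plan:
    print(f"{src} -> {dst}: {len(apis)} API(s)")
```

Path execution would then call each selected API for the first hop and feed the resulting disease identifiers into the second hop.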

In [3]:
fc = FindConnection(input_obj=SLC15A4, output_obj='Biosample', intermediate_nodes=['DiseaseOrPhenotypicFeature'])

In [4]:
fc.connect(verbose=True)


BTE will find paths that join 'SLC15A4' and 'Biosample'. Paths will have 1 intermediate node.

Intermediate node #1 will have these type constraints: DiseaseOrPhenotypicFeature





==== Step #1: Query path planning ====

Because SLC15A4 is of type 'Gene', BTE will query our meta-KG for APIs that can take 'Gene' as input and 'DiseaseOrPhenotypicFeature' as output

BTE found 5 apis:

API 1. semmeddisease(6 API calls)
API 2. ctd_gene2disease(1 API call)
API 3. DISEASES(1 API call)
API 4. mydisease.info(1 API call)
API 5. biolink_gene2disease(1 API call)


==== Step #2: Query path execution ====
NOTE: API requests are dispatched in parallel, so the list of APIs below is ordered by query time.

API 4.1: http://mydisease.info/v1/query (POST "q=121260&scopes=disgenet.genes_related_to_disease.gene_id&fields=mondo.xrefs.umls,disgenet.xrefs.umls&species=human&size=100")
API 1.4: http://pending.biothings.io/semmed/query (POST "q=C1427907&scopes=AFFECTS_reverse.protein.umls&fields=umls&species=h
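
The logged POST bodies are ordinary URL-encoded query strings, so they can be inspected with the standard library. The sketch below parses the body from the "API 4.1" line above; note that `q` carries the Entrez ID of SLC15A4 found in Step 1. The variable names are chosen for this example only.

```python
from urllib.parse import parse_qs

# POST body copied from the "API 4.1" log line above.
body = (
    "q=121260&scopes=disgenet.genes_related_to_disease.gene_id"
    "&fields=mondo.xrefs.umls,disgenet.xrefs.umls&species=human&size=100"
)

# parse_qs maps each parameter name to a list of values; keep the first of each.
params = {key: values[0] for key, values in parse_qs(body).items()}

# The requested fields are comma-separated inside a single parameter.
fields = params["fields"].split(",")

print(params["q"], params["scopes"])
print(fields)
```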

## Step 3: Explore the results

Through BTE, we found **8 DiseaseOrPhenotypicFeature entities** associated with gene SLC15A4, and **770 biosample entities** associated with these diseases.

In [5]:
fc.display_table_view()

Unnamed: 0,input,input_type,pred1,pred1_source,pred1_api,pred1_pubmed,node1_id,node1_name,node1_type,pred2,pred2_source,pred2_api,pred2_pubmed,output_id,output_name,output_type
0,SLC15A4,Gene,associatedWith,DISEASES,DISEASES,,,systemic lupus erythematosus (disease),DiseaseOrPhenotypicFeature,diseaseAssociatedWithBiosample,NCBI Biosample Database,stanford_biosample_disease2sample,,,SAMEA2402387,Biosample
1,SLC15A4,Gene,associatedWith,biolink,biolink_gene2disease,,,systemic lupus erythematosus (disease),DiseaseOrPhenotypicFeature,diseaseAssociatedWithBiosample,NCBI Biosample Database,stanford_biosample_disease2sample,,,SAMEA2402387,Biosample
2,SLC15A4,Gene,associatedWith,CTD,ctd_gene2disease,,,systemic lupus erythematosus (disease),DiseaseOrPhenotypicFeature,diseaseAssociatedWithBiosample,NCBI Biosample Database,stanford_biosample_disease2sample,,,SAMEA2402387,Biosample
3,SLC15A4,Gene,associatedWith,mydisease.info,mydisease.info,,,systemic lupus erythematosus (disease),DiseaseOrPhenotypicFeature,diseaseAssociatedWithBiosample,NCBI Biosample Database,stanford_biosample_disease2sample,,,SAMEA2402387,Biosample
4,SLC15A4,Gene,associatedWith,DISEASES,DISEASES,,,systemic lupus erythematosus (disease),DiseaseOrPhenotypicFeature,diseaseAssociatedWithBiosample,NCBI Biosample Database,stanford_biosample_disease2sample,,,SAMEA4456858,Biosample
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3075,SLC15A4,Gene,associatedWith,mydisease.info,mydisease.info,,,systemic lupus erythematosus (disease),DiseaseOrPhenotypicFeature,diseaseAssociatedWithBiosample,NCBI Biosample Database,stanford_biosample_disease2sample,,,SAMEA104266477,Biosample
3076,SLC15A4,Gene,associatedWith,DISEASES,DISEASES,,,systemic lupus erythematosus (disease),DiseaseOrPhenotypicFeature,diseaseAssociatedWithBiosample,NCBI Biosample Database,stanford_biosample_disease2sample,,,SAMN04017728,Biosample
3077,SLC15A4,Gene,associatedWith,biolink,biolink_gene2disease,,,systemic lupus erythematosus (disease),DiseaseOrPhenotypicFeature,diseaseAssociatedWithBiosample,NCBI Biosample Database,stanford_biosample_disease2sample,,,SAMN04017728,Biosample
3078,SLC15A4,Gene,associatedWith,CTD,ctd_gene2disease,,,systemic lupus erythematosus (disease),DiseaseOrPhenotypicFeature,diseaseAssociatedWithBiosample,NCBI Biosample Database,stanford_biosample_disease2sample,,,SAMN04017728,Biosample
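
The table holds one row per path, so the same disease-biosample pair appears once for every API that supports the gene-disease edge. That is why the row count is much larger than the number of distinct biosamples. The sketch below groups a few rows copied from the table above by (disease, biosample) pair, using only the standard library; the three-column excerpt and the variable names are choices made for this example.

```python
import csv
import io
from collections import defaultdict

# A few rows copied from the table above, reduced to three columns.
TABLE = """pred1_api,node1_name,output_name
DISEASES,systemic lupus erythematosus (disease),SAMEA2402387
biolink_gene2disease,systemic lupus erythematosus (disease),SAMEA2402387
ctd_gene2disease,systemic lupus erythematosus (disease),SAMEA2402387
mydisease.info,systemic lupus erythematosus (disease),SAMEA2402387
DISEASES,systemic lupus erythematosus (disease),SAMEA4456858
"""

# Collect the supporting gene-disease APIs for each (disease, biosample) pair.
support = defaultdict(set)
for row in csv.DictReader(io.StringIO(TABLE)):
    support[(row["node1_name"], row["output_name"])].add(row["pred1_api"])

for (disease, sample), apis in support.items():
    print(sample, disease, sorted(apis))
```

Counting the APIs behind each pair also gives a rough sense of how many independent sources support the underlying gene-disease association.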
