# Introduction

This notebook demonstrates basic usage of BioThings Explorer, an engine for autonomously querying a distributed knowledge graph of biomedical knowledge.

BioThings Explorer can answer two classes of queries -- "EXPLAIN" and "PREDICT". PREDICT queries are described in [PREDICT_demo.ipynb](PREDICT_demo.ipynb). Here, we describe EXPLAIN queries and how to use BioThings Explorer to execute them. A more detailed overview of the BioThings Explorer system is provided in [these slides](https://docs.google.com/presentation/d/1QWQqqQhPD_pzKryh6Wijm4YQswv8pAjleVORCPyJyDE/edit?usp=sharing).

EXPLAIN queries are designed to **identify plausible reasoning chains to explain the relationship between two specific biomedical entities (objects or concepts)**.  For example, in this notebook, we explore the question:

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;"*Why does imatinib have an effect on the treatment of chronic myelogenous leukemia (CML)?*"  

Later, we also compare those results to a similar query looking at imatinib's role in treating gastrointestinal stromal tumors (GIST).

**To experiment with an executable version of this notebook, [load it in Google Colaboratory](https://colab.research.google.com/github/biothings/biothings_explorer/blob/master/jupyter%20notebooks/EXPLAIN_demo.ipynb).**

## Step 0: Load BioThings Explorer modules

First, install the `biothings_explorer` and `biothings_schema` packages, as described in this [README](https://github.com/biothings/biothings_explorer/blob/master/jupyter%20notebooks/README.md#prerequisite).  This only needs to be done once (but it's included here for compatibility with [Colab](https://colab.research.google.com/)).

In [1]:
%%capture
!pip install git+https://github.com/biothings/biothings_explorer#egg=biothings_explorer

Next, import the relevant modules:

* **Hint**: Find corresponding BioThings representations to use in BioThings Explorer based on user input (could be any database IDs, symbols, names)
* **FindConnection**: Find the relationship(s) between a specific entity and an entity type (PREDICT) or between two specific entities (EXPLAIN)

In [2]:
# import modules from biothings_explorer
from biothings_explorer.hint import Hint
from biothings_explorer.user_query_dispatcher import FindConnection

In [3]:
def hint_display(query, hint_result):
    """
    input: query (string used in hint query), hint_result (object returned from hint query)
    hint_result is a dictionary of lists of dictionaries
    returns: None
    show the name and type of all results returned by the query
    """
    display = ['name', 'type']  ## replace with the parts of the BioThings object you want to see
    concise_results = []
    
    for BT_type, result in hint_result.items():
        if result:  ## basically if it's not empty
            for items in result:
                concise_results.append([items[ele] for ele in display])
    
    print('There are {total} BioThings objects returned for {ht}:'.format(\
                total = len(concise_results), ht = query))
    for (idx, display_info) in enumerate(concise_results):
        print('{i}: {val}'.format(i = idx, val = display_info))
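
The same flattening can be written so that it returns the rows instead of printing them, which is handy for checking a result programmatically. This is a minimal sketch: `summarize_hint` is a hypothetical helper (not part of `biothings_explorer`), and `mock_hint` is a hand-written stand-in for the dictionary of lists of dictionaries that the docstring above describes.

```python
def summarize_hint(hint_result, fields=("name", "type")):
    """Return one tuple per BioThings object, keeping only the requested fields."""
    rows = []
    for objects in hint_result.values():
        # Each value is a (possibly empty) list of BioThings objects.
        for obj in objects or []:
            rows.append(tuple(obj.get(field) for field in fields))
    return rows


# Hypothetical stand-in for the structure returned by a hint query.
mock_hint = {
    "Disease": [
        {"name": "chronic myelogenous leukemia, BCR-ABL1 positive", "type": "Disease"}
    ],
    "Gene": [],
}

print(summarize_hint(mock_hint))
```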

## Step 1: Find representations of "chronic myelogenous leukemia" and "imatinib" in BTE

In this step, BioThings Explorer translates our query strings "chronic myelogenous leukemia" and "imatinib" into BioThings objects, which contain mappings to many common identifiers.  Generally, the top result returned by the `Hint` module will match what you want, but you should confirm that using the identifiers shown. 

BioThings types correspond to children and descendants of [BiologicalEntity](https://biolink.github.io/biolink-model/docs/BiologicalEntity.html) from the [Biolink Model](https://biolink.github.io/biolink-model/docs/), including `Disease` (e.g., "lupus"), `ChemicalSubstance` (e.g., "acetaminophen"), `Gene` (e.g., "CDK2"), `BiologicalProcess` (e.g., "T cell differentiation"), and `Pathway` (e.g., "Citric acid cycle"). **However, [only a subset of the Biolink BiologicalEntity children / descendants are currently implemented in BTE](https://smart-api.info/portal/translator/metakg)**. More biomedical object types will be available as more knowledge sources (APIs) are added to the system. 

**Note that the type `BiologicalEntity` means any BioThings type currently implemented in BTE will be accepted.**

In [4]:
ht = Hint()
cml_str = "chronic myelogenous leukemia"
# find all potential representations of CML
cml_hint = ht.query(cml_str)

hint_display(cml_str, cml_hint)

There are 2 BioThings objects returned for chronic myelogenous leukemia:
0: ['chronic myelogenous leukemia, BCR-ABL1 positive', 'Disease']
1: ['blast phase chronic myelogenous leukemia, BCR-ABL1 positive', 'Disease']


Based on the information above, we'll pick the top Disease choice (indexed at 0) for CML for our query. We can look at identifier mappings inside this BioThings object. 

In [5]:
# select the first Disease object for our CML query
cml = cml_hint['Disease'][0]
cml

{'MONDO': 'MONDO:0011996',
 'DOID': 'DOID:8552',
 'UMLS': 'C0023473',
 'name': 'chronic myelogenous leukemia, BCR-ABL1 positive',
 'OMIM': '608232',
 'ORPHANET': '521',
 'primary': {'identifier': 'MONDO',
  'cls': 'Disease',
  'value': 'MONDO:0011996'},
 'display': 'MONDO(MONDO:0011996) DOID(DOID:8552) OMIM(608232) ORPHANET(521) UMLS(C0023473) name(chronic myelogenous leukemia, BCR-ABL1 positive)',
 'type': 'Disease'}
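
To confirm that the selected object is the intended one, it can help to pull out the primary identifier and the list of cross-references explicitly. The sketch below works on a literal copy of the object printed above (so it runs without calling the `Hint` module); the set of "descriptive" keys is an assumption based on that output.

```python
# Literal copy of the BioThings object shown in the output above.
cml = {
    "MONDO": "MONDO:0011996",
    "DOID": "DOID:8552",
    "UMLS": "C0023473",
    "name": "chronic myelogenous leukemia, BCR-ABL1 positive",
    "OMIM": "608232",
    "ORPHANET": "521",
    "primary": {"identifier": "MONDO", "cls": "Disease", "value": "MONDO:0011996"},
    "display": "MONDO(MONDO:0011996) DOID(DOID:8552) OMIM(608232) ORPHANET(521) "
               "UMLS(C0023473) name(chronic myelogenous leukemia, BCR-ABL1 positive)",
    "type": "Disease",
}

# The primary identifier names the ID namespace and its value.
primary = cml["primary"]
print(primary["identifier"], primary["value"])

# Keys assumed to describe the object rather than map it to an external ID.
descriptive_keys = {"name", "primary", "display", "type"}
identifiers = {key: value for key, value in cml.items() if key not in descriptive_keys}
print(sorted(identifiers))
```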

In [6]:
# find all potential representations of imatinib
imatinib_str = "imatinib"
imatinib_hint = ht.query(imatinib_str)

hint_display(imatinib_str, imatinib_hint)

There are 10 BioThings objects returned for imatinib:
0: ['imatinib', 'ChemicalSubstance']
1: ['imatinib methanesulfonate', 'ChemicalSubstance']
2: ['linkable imatinib analogue', 'ChemicalSubstance']
3: ['IMATINIB', 'ChemicalSubstance']
4: ['IMATINIB MESYLATE', 'ChemicalSubstance']
5: ['HYPEREOSINOPHILIC SYNDROME, IDIOPATHIC, RESISTANT TO IMATINIB', 'Disease']
6: ['CHRONIC MYELOID LEUKEMIA, RESISTANT TO IMATINIB', 'Disease']
7: ['LEUKEMIA, PHILADELPHIA CHROMOSOME-POSITIVE, RESISTANT TO IMATINIB', 'Disease']
8: ['Imatinib Pathway, Pharmacokinetics/Pharmacodynamics', 'Pathway']
9: ['Imatinib and Chronic Myeloid Leukemia', 'Pathway']


While multiple results look valid (indexed at 0, 1, 3, 4), for now we'll pick the top ChemicalSubstance choice (indexed at 0) for imatinib for our query. We can look at identifier mappings inside this BioThings object. 

In [7]:
# select the correct representation of imatinib
imatinib = imatinib_hint['ChemicalSubstance'][0]
imatinib

{'DRUGBANK': 'DB00619',
 'CHEBI': 'CHEBI:45783',
 'name': 'imatinib',
 'primary': {'identifier': 'CHEBI',
  'cls': 'ChemicalSubstance',
  'value': 'CHEBI:45783'},
 'display': 'CHEBI(CHEBI:45783) DRUGBANK(DB00619) name(imatinib)',
 'type': 'ChemicalSubstance'}
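
When several candidates look valid, selecting by name rather than by list position makes the choice explicit and robust to changes in result ordering. The following is a sketch only: `pick_by_name` is a hypothetical helper, and `chemical_candidates` reproduces just the names and types of the five `ChemicalSubstance` results listed above (real objects also carry identifier fields).

```python
def pick_by_name(candidates, name):
    """Return the first candidate whose name matches exactly (case-sensitive), else None."""
    return next((c for c in candidates if c["name"] == name), None)


# Names and types taken from the hint output above; identifier fields omitted.
chemical_candidates = [
    {"name": "imatinib", "type": "ChemicalSubstance"},
    {"name": "imatinib methanesulfonate", "type": "ChemicalSubstance"},
    {"name": "linkable imatinib analogue", "type": "ChemicalSubstance"},
    {"name": "IMATINIB", "type": "ChemicalSubstance"},
    {"name": "IMATINIB MESYLATE", "type": "ChemicalSubstance"},
]

choice = pick_by_name(chemical_candidates, "IMATINIB MESYLATE")
print(choice)
```

With a real hint result, the same helper could be applied to `imatinib_hint['ChemicalSubstance']`.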

## Step 2: Find intermediate nodes connecting imatinib and chronic myelogenous leukemia

In this section, we find some paths in the knowledge graph that connect imatinib and chronic myelogenous leukemia.  To do that, we will use `FindConnection`.  This class is a convenient wrapper around two advanced functions for **query path planning** and **query path execution**. More advanced features for both query path planning and query path execution are in development and will be documented in the coming months. 

The parameters for `FindConnection` are described below:

In [8]:
help(FindConnection.__init__)

Help on function __init__ in module biothings_explorer.user_query_dispatcher:

__init__(self, input_obj, output_obj, intermediate_nodes, registry=None)
    Find relationships in the Knowledge Graph between an Input Object and an Output Object.
    
    Args:
        input_obj (required): must be an object returned from Hint corresponding to a specific biomedical entity.
                            Examples:
                Hint().query("Fanconi anemia")['DiseaseOrPhenotypicFeature'][0]
                Hint().query("acetaminophen")['ChemicalSubstance'][0]
    
        output_obj (required): must EITHER be an object returned from Hint corresponding to a specific biomedical
                            entity, OR be a string or list of strings corresponding to Biolink Entity classes.
                            Examples:
                Hint().query("acetaminophen")['ChemicalSubstance'][0]
                'Gene'
                ['Gene','ChemicalSubstance']
    
        intermediate_nodes (

Here, we formulate a `FindConnection` query with CML as the `input_obj` and imatinib as the `output_obj`.  We further specify with the `intermediate_nodes` parameter that we are looking for paths joining chronic myelogenous leukemia and imatinib with *one* intermediate node that is a "Gene" (BioThings type).  

In [9]:
fc = FindConnection(input_obj = cml, output_obj = imatinib, \
                    intermediate_nodes = 'Gene')

We next execute the `connect` method, which performs the **query path planning** and **query path execution** process.  In short, BioThings Explorer is deconstructing the query into individual API calls, executing those API calls, then assembling the results.

A verbose log of this process is displayed below:

In [10]:
# setting verbose=True displays all the steps BTE takes to find the connection
fc.connect(verbose = True)


BTE will find paths that join 'chronic myelogenous leukemia, BCR-ABL1 positive' and 'imatinib'. Paths will have 1 intermediate node.

Intermediate node #1 will have these type constraints: Gene



==== Step #1: Query path planning ====

Because chronic myelogenous leukemia, BCR-ABL1 positive is of type 'Disease', BTE will query our meta-KG for APIs that can take 'Disease' as input and 'Gene' as output

BTE found 10 apis:

API 1. pharos(1 API call)
API 2. scigraph(1 API call)
API 3. cord_disease(1 API call)
API 4. semmed_disease(15 API calls)
API 5. hetio(1 API call)
API 6. DISEASES(1 API call)
API 7. mgi_gene2phenotype(1 API call)
API 8. mydisease(1 API call)
API 9. biolink(1 API call)
API 10. scibite(1 API call)


==== Step #2: Query path execution ====
NOTE: API requests are dispatched in parallel, so the list of APIs below is ordered by query time.

API 7.1: https://mydisease.info/v1/query?fields=disgenet.genes_related_to_disease.gene_id (POST -d q=C0023474,C0023473&scopes=mondo.xr

API 6.3 mychem: 9 hits
API 2.1 pharos: 6 hits
API 7.2 semmed_chemical: 96 hits
API 7.3 semmed_chemical: 36 hits
API 7.4 semmed_chemical: No hits
API 7.5 semmed_chemical: No hits
API 8.1 dgidb: 5 hits
API 8.2 dgidb: 34 hits
API 9.1 hmdb: No hits
API 7.6 semmed_chemical: No hits
API 7.7 semmed_chemical: 119 hits
API 7.8 semmed_chemical: No hits
API 7.9 semmed_chemical: 11 hits
API 1.1 cord_chemical: 172 hits
API 4.1 ctd: 199 hits
API 4.2 ctd: No hits
API 7.10 semmed_chemical: 166 hits
API 7.11 semmed_chemical: No hits
API 7.12 semmed_chemical: No hits
API 7.13 semmed_chemical: 39 hits

After id-to-object translation, BTE retrieved 728 unique objects.



BTE found 249 unique intermediate nodes connecting 'chronic myelogenous leukemia, BCR-ABL1 positive' and 'imatinib'
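
The planning step in the log can be pictured as a lookup in a table of API capabilities keyed by input and output type. The sketch below is a toy illustration, not BTE's implementation: `TOY_META_KG` and `plan` are hypothetical, the API names are borrowed from the log above, and the input/output types assigned to each are assumptions made for the example.

```python
# Hypothetical, hand-written capability table; not the real meta-KG.
TOY_META_KG = [
    {"api": "mydisease", "input": "Disease", "output": "Gene"},
    {"api": "semmed_disease", "input": "Disease", "output": "Gene"},
    {"api": "dgidb", "input": "ChemicalSubstance", "output": "Gene"},
    {"api": "ctd", "input": "ChemicalSubstance", "output": "Gene"},
]


def plan(input_type, output_type, registry=TOY_META_KG):
    """Return the names of APIs that accept input_type and return output_type."""
    return [
        entry["api"]
        for entry in registry
        if entry["input"] == input_type and entry["output"] == output_type
    ]


print(plan("Disease", "Gene"))
print(plan("ChemicalSubstance", "Gene"))
```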


## Step 3: Display and filter results

This section demonstrates post-query filtering done in Python. Later, more advanced filtering functions will be added to the **query path execution** module for interleaved filtering, thereby enabling longer query paths. More details to come...

First, results can be exported to a data frame. Let's examine a sample of those results.

In [11]:
df = fc.display_table_view()

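# because UMLS is not currently well-integrated in our ID-to-object translation system, removing UMLS-only entries here
# (raw string avoids an invalid-escape warning; the mask name avoids shadowing the built-in `filter`)
pattern_del = r"^UMLS:C\d+"
umls_only = df.node1_id.str.contains(pattern_del)
df = df[~umls_only]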

print(df.shape)
df.sample(10)

(1028, 16)


Unnamed: 0,input,input_type,pred1,pred1_source,pred1_api,pred1_pubmed,node1_type,node1_name,node1_id,pred2,pred2_source,pred2_api,pred2_pubmed,output_type,output_name,output_id
1319,"CHRONIC MYELOGENOUS LEUKEMIA, BCR-ABL1 POSITIVE",Disease,affected_by,SEMMED,SEMMED Disease API,27573699,Gene,MYC,NCBIGene:4609,positively_regulated_by,SEMMED,SEMMED Chemical API,22842456,Gene,IMATINIB,name:IMATINIB
563,"CHRONIC MYELOGENOUS LEUKEMIA, BCR-ABL1 POSITIVE",Disease,related_to,,BioLink API,,Gene,BCR,NCBIGene:613,positively_regulates,SEMMED,SEMMED Chemical API,202288462220439522641215,Gene,IMATINIB,name:IMATINIB
1092,"CHRONIC MYELOGENOUS LEUKEMIA, BCR-ABL1 POSITIVE",Disease,caused_by,SEMMED,SEMMED Disease API,"10092207,10224280,10490629,10676660,10833515,1...",Gene,MTTP,NCBIGene:4547,coexists_with,SEMMED,SEMMED Chemical API,1266972715283151215055922271316123434731,Gene,IMATINIB,name:IMATINIB
1537,"CHRONIC MYELOGENOUS LEUKEMIA, BCR-ABL1 POSITIVE",Disease,related_to,SEMMED,SEMMED Disease API,10688886,Gene,SQSTM1,NCBIGene:8878,related_to,CTD,CTD API,2264161630165626,Gene,IMATINIB,name:IMATINIB
700,"CHRONIC MYELOGENOUS LEUKEMIA, BCR-ABL1 POSITIVE",Disease,treated_by,SEMMED,SEMMED Disease API,"15717993,16205964,16596277,17694264,17891251,2...",Gene,KIT,NCBIGene:3815,positively_regulates,SEMMED,SEMMED Chemical API,171394612876592728878021,Gene,IMATINIB,name:IMATINIB
911,"CHRONIC MYELOGENOUS LEUKEMIA, BCR-ABL1 POSITIVE",Disease,affected_by,SEMMED,SEMMED Disease API,22823957,Gene,ABCB1,NCBIGene:5243,physically_interacts_with,SEMMED,SEMMED Chemical API,"16157201,17495881,17690695,18669873,19225039,2...",Gene,IMATINIB,name:IMATINIB
1264,"CHRONIC MYELOGENOUS LEUKEMIA, BCR-ABL1 POSITIVE",Disease,caused_by,SEMMED,SEMMED Disease API,15833859,Gene,IFI27,NCBIGene:3429,physically_interacts_with,SEMMED,SEMMED Chemical API,18974147,Gene,IMATINIB,name:IMATINIB
759,"CHRONIC MYELOGENOUS LEUKEMIA, BCR-ABL1 POSITIVE",Disease,related_to,Translator Text Mining Provider,CORD Disease API,,Gene,KIT,NCBIGene:3815,related_to,Translator Text Mining Provider,CORD Chemical API,,Gene,IMATINIB,name:IMATINIB
959,"CHRONIC MYELOGENOUS LEUKEMIA, BCR-ABL1 POSITIVE",Disease,related_to,SEMMED,SEMMED Disease API,1857774418984583211541272409114425965880,Gene,LYN,NCBIGene:4067,related_to,Translator Text Mining Provider,CORD Chemical API,,Gene,IMATINIB,name:IMATINIB
935,"CHRONIC MYELOGENOUS LEUKEMIA, BCR-ABL1 POSITIVE",Disease,related_to,Translator Text Mining Provider,CORD Disease API,,Gene,SRC,NCBIGene:6714,physically_interacts_with,SEMMED,SEMMED Chemical API,27087366,Gene,IMATINIB,name:IMATINIB


Notice that results give the knowledge source (API) and that many also give the literature source (Pubmed ID). 
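
The UMLS filter applied above can be checked on a handful of identifiers using the standard `re` module, whose `search` semantics match those of the pandas `str.contains` call. In this sketch the two `NCBIGene` identifiers come from the table above, while the `UMLS:` values are made up for illustration.

```python
import re

# Same pattern as in the cell above: an ID that starts with "UMLS:C" followed by digits.
UMLS_ONLY = re.compile(r"^UMLS:C\d+")

node_ids = [
    "NCBIGene:4609",   # MYC, from the table above
    "UMLS:C0000001",   # made-up UMLS-only identifier
    "NCBIGene:613",    # BCR, from the table above
    "UMLS:C0000002",   # made-up UMLS-only identifier
]

# Keep only identifiers that do not match the UMLS-only pattern.
kept = [node_id for node_id in node_ids if not UMLS_ONLY.search(node_id)]
print(kept)
```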

Next, let's look to see which genes are mentioned the most.

In [12]:
df.node1_name.value_counts().head(10)

BCR      210
ABL1      78
KIT       72
MTTP      42
AKT1      36
TP53      25
ABCG2     20
ABCB1     18
LYN       18
PTEN      16
Name: node1_name, dtype: int64

*Note: as of 2020-08-13, the notebook output above matches the description below. As knowledge sources (APIs) and code are updated and added, users may obtain different results.*

Not surprisingly, the top two genes that BioThings Explorer found that join imatinib to CML are *ABL1* and *BCR*, the two genes that are fused in the "Philadelphia chromosome", the genetic abnormality that underlies CML, *and* the validated target of imatinib.
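
Note that these counts are numbers of table rows, that is, distinct combinations of predicates and sources on either side of a gene, rather than numbers of publications. A minimal sketch of the same tally with `collections.Counter`, using a few hand-written rows (the predicate combinations are illustrative, not taken from the real result set):

```python
from collections import Counter

# Each row is (pred1, intermediate gene, pred2); values are illustrative only.
rows = [
    ("related_to", "BCR", "positively_regulates"),
    ("caused_by", "BCR", "related_to"),
    ("treated_by", "KIT", "positively_regulates"),
    ("related_to", "KIT", "related_to"),
    ("related_to", "BCR", "related_to"),
    ("affected_by", "MYC", "positively_regulated_by"),
]

# Count how many rows mention each intermediate gene.
gene_counts = Counter(gene for _, gene, _ in rows)
print(gene_counts.most_common(2))
```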

Let's examine some of the PubMed articles linking **CML to *ABL1*** and ***ABL1* to imatinib**.

In [13]:
# fetch all articles connecting 'chronic myelogenous leukemia' and 'ABL1'
ABL1_str = 'ABL1'
CML_name = cml['name']

articles = []
for info in fc.display_edge_info(CML_name, ABL1_str).values():
    if 'pubmed' in info['info']:
        articles += info['info']['pubmed']
print("There are "+str(len(articles))+" articles supporting the edge between CML and ABL1. Sampling of 10 of those:")
for pmid in articles[0:10]:
    print("http://pubmed.gov/" + str(pmid))

There are 17 articles supporting the edge between CML and ABL1. Sampling of 10 of those:
http://pubmed.gov/24662807
http://pubmed.gov/26179066
http://pubmed.gov/10498618
http://pubmed.gov/10822991
http://pubmed.gov/11368359
http://pubmed.gov/11979553
http://pubmed.gov/18082628
http://pubmed.gov/18243808
http://pubmed.gov/18406870
http://pubmed.gov/20809971


In [14]:
# fetch all articles connecting 'ABL1' and 'imatinib'
imatinib_name = imatinib['name']

articles = []
for info in fc.display_edge_info(ABL1_str, imatinib_name).values():
    if 'pubmed' in info['info']:
        articles += info['info']['pubmed']
print("There are "+str(len(articles))+" articles supporting the edge between ABL1 and imatinib. Sampling of 10 of those:")
for pmid in articles[0:10]:
    print("http://pubmed.gov/" + str(pmid))

There are 29 articles supporting the edge between ABL1 and imatinib. Sampling of 10 of those:
http://pubmed.gov/15799618
http://pubmed.gov/15917650
http://pubmed.gov/15949566
http://pubmed.gov/16153117
http://pubmed.gov/16205964
http://pubmed.gov/15713800
http://pubmed.gov/19166098
http://pubmed.gov/26030291
http://pubmed.gov/26618175
http://pubmed.gov/21693657
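
The two cells above repeat the same logic, so it could be factored into a helper that also removes duplicate PubMed IDs. This is a sketch under assumptions: `collect_pubmed_urls` is hypothetical, and `mock_edge_info` only imitates the shape implied by the loops above (a dictionary whose values hold an `'info'` dictionary with an optional `'pubmed'` list); the keys and the string type of the IDs are guesses.

```python
def collect_pubmed_urls(edge_info, limit=10):
    """Gather unique PubMed IDs across edges (first-seen order) and return up to `limit` URLs."""
    pmids = []
    for edge in edge_info.values():
        # Edges without literature support simply have no 'pubmed' key.
        for pmid in edge["info"].get("pubmed", []):
            if pmid not in pmids:
                pmids.append(pmid)
    return ["http://pubmed.gov/" + str(pmid) for pmid in pmids[:limit]]


# Hand-written stand-in for the value returned by fc.display_edge_info(...).
mock_edge_info = {
    0: {"info": {"pubmed": ["24662807", "26179066"]}},
    1: {"info": {}},
    2: {"info": {"pubmed": ["26179066", "10498618"]}},
}

urls = collect_pubmed_urls(mock_edge_info)
for url in urls:
    print(url)
```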


## Comparing results between CML and GIST

Let's perform another BioThings Explorer query, this time looking to EXPLAIN the relationship between imatinib and gastrointestinal stromal tumors (GIST), another disease treated by imatinib.

In [15]:
ht = Hint()
gist_str = "gastrointestinal stromal tumor"

# find all potential representations of GIST
gist_hint = ht.query(gist_str)

hint_display(gist_str, gist_hint)

There are 4 BioThings objects returned for gastrointestinal stromal tumor:
0: ['gastrointestinal stromal tumor', 'Disease']
1: ['GASTROINTESTINAL STROMAL TUMOR, FAMILIAL', 'Disease']
2: ['Gastric Gastrointestinal Stromal Tumor', 'Disease']
3: ['colorectal gastrointestinal stromal tumor', 'Disease']


While all results look valid, for now we'll pick the top Disease choice (indexed at 0) for GIST for our query. We can look at identifier mappings inside this BioThings object. 

In [16]:
# select the first Disease object for our GIST query
gist = gist_hint['Disease'][0]
gist

{'MONDO': 'MONDO:0011719',
 'DOID': 'DOID:9253',
 'UMLS': 'C0238198',
 'name': 'gastrointestinal stromal tumor',
 'MESH': 'D046152',
 'OMIM': '606764',
 'ORPHANET': '44890',
 'primary': {'identifier': 'MONDO',
  'cls': 'Disease',
  'value': 'MONDO:0011719'},
 'display': 'MONDO(MONDO:0011719) DOID(DOID:9253) OMIM(606764) ORPHANET(44890) UMLS(C0238198) MESH(D046152) name(gastrointestinal stromal tumor)',
 'type': 'Disease'}

As before, we specify with the `intermediate_nodes` parameter that we are looking for paths joining GIST and imatinib with *one* intermediate node that is a "Gene" (BioThings type).  

In [17]:
fc2 = FindConnection(input_obj = gist, output_obj = imatinib, \
                     intermediate_nodes = 'Gene')

In [18]:
fc2.connect(verbose = True) 


BTE will find paths that join 'gastrointestinal stromal tumor' and 'imatinib'. Paths will have 1 intermediate node.

Intermediate node #1 will have these type constraints: Gene



==== Step #1: Query path planning ====

Because gastrointestinal stromal tumor is of type 'Disease', BTE will query our meta-KG for APIs that can take 'Disease' as input and 'Gene' as output

BTE found 10 apis:

API 1. pharos(1 API call)
API 2. scigraph(1 API call)
API 3. cord_disease(1 API call)
API 4. semmed_disease(15 API calls)
API 5. hetio(1 API call)
API 6. DISEASES(1 API call)
API 7. mgi_gene2phenotype(1 API call)
API 8. mydisease(1 API call)
API 9. biolink(1 API call)
API 10. scibite(1 API call)


==== Step #2: Query path execution ====
NOTE: API requests are dispatched in parallel, so the list of APIs below is ordered by query time.

API 7.1: https://mydisease.info/v1/query?fields=disgenet.genes_related_to_disease.gene_id (POST -d q=C0238198&scopes=mondo.xrefs.umls, disgenet.xrefs.umls)
API 4.11: ht


After id-to-object translation, BTE retrieved 728 unique objects.



BTE found 130 unique intermediate nodes connecting 'gastrointestinal stromal tumor' and 'imatinib'


In [19]:
df2 = fc2.display_table_view()

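# because UMLS is not currently well-integrated in our ID-to-object translation system, removing UMLS-only entries here
# (raw string avoids an invalid-escape warning; the mask name avoids shadowing the built-in `filter`)
pattern_del = r"^UMLS:C\d+"
umls_only = df2.node1_id.str.contains(pattern_del)
df2 = df2[~umls_only]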

print(df2.shape)
df2.sample(10)

(772, 16)


Unnamed: 0,input,input_type,pred1,pred1_source,pred1_api,pred1_pubmed,node1_type,node1_name,node1_id,pred2,pred2_source,pred2_api,pred2_pubmed,output_type,output_name,output_id
694,GASTROINTESTINAL STROMAL TUMOR,Disease,treated_by,SEMMED,SEMMED Disease API,24399735,Gene,CYP3A4,NCBIGene:1576,related_to,Translator Text Mining Provider,CORD Chemical API,,Gene,IMATINIB,name:IMATINIB
707,GASTROINTESTINAL STROMAL TUMOR,Disease,related_to,Translator Text Mining Provider,CORD Disease API,,Gene,EGFR,NCBIGene:1956,physically_interacts_with,SEMMED,SEMMED Chemical API,1588723816476837,Gene,IMATINIB,name:IMATINIB
675,GASTROINTESTINAL STROMAL TUMOR,Disease,related_to,SEMMED,SEMMED Disease API,18265649186487362552585128698433,Gene,PDGFRB,NCBIGene:5159,physically_interacts_with,SEMMED,SEMMED Chemical API,28028030,Gene,IMATINIB,name:IMATINIB
919,GASTROINTESTINAL STROMAL TUMOR,Disease,related_to,SEMMED,SEMMED Disease API,20028860,Gene,H2AX,NCBIGene:3014,physically_interacts_with,SEMMED,SEMMED Chemical API,22388075,Gene,IMATINIB,name:IMATINIB
990,GASTROINTESTINAL STROMAL TUMOR,Disease,related_to,Translator Text Mining Provider,CORD Disease API,,Gene,BCL2,NCBIGene:596,related_to,CTD,CTD API,23994742,Gene,IMATINIB,name:IMATINIB
975,GASTROINTESTINAL STROMAL TUMOR,Disease,related_to,DISEASE,DISEASES API,,Gene,SPINK5,NCBIGene:11005,related_to,Translator Text Mining Provider,CORD Chemical API,,Gene,IMATINIB,name:IMATINIB
431,GASTROINTESTINAL STROMAL TUMOR,Disease,caused_by,SEMMED,SEMMED Disease API,10779223117065201293826027771813,Gene,KIT,NCBIGene:3815,negatively_regulates,SEMMED,SEMMED Chemical API,"12938280,14531349,15139047,15966213,16614006,1...",Gene,IMATINIB,name:IMATINIB
730,GASTROINTESTINAL STROMAL TUMOR,Disease,related_to,DISEASE,DISEASES API,,Gene,BRAF,NCBIGene:673,physically_interacts_with,drugcentral,MyChem.info API,,Gene,IMATINIB,name:IMATINIB
200,GASTROINTESTINAL STROMAL TUMOR,Disease,related_to,Translator Text Mining Provider,CORD Disease API,,Gene,BCR,NCBIGene:613,related_to,CTD,CTD API,20222756,Gene,IMATINIB,name:IMATINIB
965,GASTROINTESTINAL STROMAL TUMOR,Disease,caused_by,SEMMED,SEMMED Disease API,28862263,Gene,NF1,NCBIGene:4763,related_to,Translator Text Mining Provider,CORD Chemical API,,Gene,IMATINIB,name:IMATINIB


In [20]:
df2.node1_name.value_counts().head(10)

KIT       306
PDGFRA     88
BCR        45
VEGFA      24
BRAF       20
TP53       20
MTTP       14
ABL1       13
CD34       12
FIP1L1     12
Name: node1_name, dtype: int64

*Note: as of 2020-08-13, the notebook output above matches the description below. As knowledge sources (APIs) and code are updated and added, users may obtain different results.*

Here, the top two genes that BioThings Explorer found that join imatinib to GIST are *PDGFRA* and *KIT*, the most commonly mutated genes found in GIST *and* validated targets of imatinib.

While several of the listed genes would be considered positive controls, others on the list could be viewed as **testable hypotheses and discovery opportunities** to be evaluated by domain experts.
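
One simple way to compare the two queries is to intersect their top-ten gene lists. The sketch below uses the gene names exactly as they appear in the two `value_counts()` outputs above (as of 2020-08-13); the variable names are illustrative.

```python
# Top-ten intermediate genes from the CML and GIST outputs above.
top_cml = ["BCR", "ABL1", "KIT", "MTTP", "AKT1", "TP53", "ABCG2", "ABCB1", "LYN", "PTEN"]
top_gist = ["KIT", "PDGFRA", "BCR", "VEGFA", "BRAF", "TP53", "MTTP", "ABL1", "CD34", "FIP1L1"]

shared = sorted(set(top_cml) & set(top_gist))
cml_only = sorted(set(top_cml) - set(top_gist))
gist_only = sorted(set(top_gist) - set(top_cml))

print("shared:", shared)
print("CML only:", cml_only)
print("GIST only:", gist_only)
```

Genes that appear in both lists are linked to imatinib regardless of disease, while the disease-specific genes are candidates for follow-up by domain experts.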

## Conclusions and caveats

This notebook demonstrated the use of BioThings Explorer in EXPLAIN mode to investigate the relationship between imatinib and two diseases that it treats -- chronic myelogenous leukemia (CML) and gastrointestinal stromal tumors (GIST).  In each case, BioThings Explorer autonomously queried a **distributed knowledge graph of biomedical APIs** to find the most common genes, and in each case the relevant targets were retrieved.

There are still many areas for improvement (and some areas in which BioThings Explorer is still buggy).  And of course, BioThings Explorer is dependent on the accessibility of the APIs that comprise the distributed knowledge graph.  Nevertheless, we encourage users to try other variants of the EXPLAIN queries demonstrated in this notebook.

## More query information for users

*Note: as of 2020-08-13, the information below is valid.*

The input into the Hint query should be one string with **at least one whole word.** 

Currently, the user must set the `input_obj` parameter of `FindConnection()` to **one specific BioThings object** (e.g., one chemical, phenotype, disease, or GO term). To do an EXPLAIN query, the `output_obj` parameter of `FindConnection()` must be **another specific BioThings object**. 

**For an EXPLAIN query, BTE will only accept one intermediate node.** The intermediate node cannot be a specific BioThings object; it must be one or more BioThings types, None, or an empty list ([]). 

The intermediate node parameter can be a single entity type (string, example: 'BiologicalEntity'), a list of acceptable entity types (list of strings, example: ['Gene', 'AnatomicalEntity']), None, or an empty list. **None or an empty list will tell BTE to look for direct connections (no intermediate nodes).** If the intermediate node parameter is a list, BTE will look for knowledge sources that can handle at least one of the types listed.   
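
The accepted forms of the intermediate node parameter can be summarized as a small normalization routine. This is only a sketch of the rules stated above: `normalize_intermediate_nodes` is a hypothetical helper, not a function provided by BTE, and BTE's own validation may differ.

```python
def normalize_intermediate_nodes(intermediate_nodes):
    """Return a list of type names; an empty list means 'look for direct connections'."""
    if intermediate_nodes is None:
        return []
    if isinstance(intermediate_nodes, str):
        return [intermediate_nodes]
    if isinstance(intermediate_nodes, list) and all(
        isinstance(type_name, str) for type_name in intermediate_nodes
    ):
        return list(intermediate_nodes)
    # Anything else (for example a specific BioThings object, which is a dict) is rejected.
    raise TypeError(
        "intermediate_nodes must be a type name, a list of type names, None, or []"
    )


print(normalize_intermediate_nodes("BiologicalEntity"))
print(normalize_intermediate_nodes(["Gene", "AnatomicalEntity"]))
print(normalize_intermediate_nodes(None))
```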

BTE will query from input_obj -> intermediate and **from output_obj -> intermediate** and then join results based on shared intermediates. The results should therefore be read as **input_obj -> intermediate <- output_obj. This directionality in results should be kept in mind when interpreting them.**
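
The join on shared intermediates can be illustrated with two small edge lists. This is a toy sketch, not BTE's code: `join_on_intermediate` is hypothetical, the edges and predicates are hand-written, and `GENE_X` and `GENE_Y` are placeholder names for genes reached from only one side.

```python
def join_on_intermediate(input_edges, output_edges):
    """Pair edges from the input and output objects that end at the same intermediate node."""
    by_node = {}
    for output_obj, pred2, node in output_edges:
        by_node.setdefault(node, []).append((pred2, output_obj))

    paths = []
    for input_obj, pred1, node in input_edges:
        for pred2, output_obj in by_node.get(node, []):
            # Read as: input_obj -pred1-> node <-pred2- output_obj
            paths.append((input_obj, pred1, node, pred2, output_obj))
    return paths


# Each edge is (queried object, predicate, intermediate node); values are illustrative.
input_edges = [
    ("CML", "related_to", "BCR"),
    ("CML", "caused_by", "ABL1"),
    ("CML", "related_to", "GENE_X"),
]
output_edges = [
    ("imatinib", "physically_interacts_with", "ABL1"),
    ("imatinib", "related_to", "BCR"),
    ("imatinib", "related_to", "GENE_Y"),
]

paths = join_on_intermediate(input_edges, output_edges)
for path in paths:
    print(path)
```

Only `BCR` and `ABL1` survive the join because they are reached from both sides; in each resulting path, the second predicate describes the edge from the output object to the intermediate node.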