# Translator Use Case Question 5: IBD, Imuran, and side effects

## Understanding the question

**To experiment with an executable version of this notebook, [load it in Google Colaboratory](https://colab.research.google.com/github/colleenXu/biothings_explorer/blob/relay/jupyter%20notebooks/TranslatorUseCases/TranslatorUseCase_Q5_IBD_Imuran_SideEffects.ipynb).**

The Translator Use Case Question #5 is:    

> If a patient with disease X is treated off-label with drug Y, what are some potential side effects?

We interpret the Translator Use Case Question to be about **unintended drug-disease interactions**. These occur when a drug has an unintended effect on a person due to the person’s existing medical conditions. 
* The interaction could be beneficial: for example, a person taking [lorazepam](https://www.drugs.com/ppa/lorazepam.html) for nighttime anxiety may also experience drowsiness (a common side effect), which may help them with their insomnia. 
* On the other hand, the interaction could be harmful: for example, a person with peptic ulcers may want to take [ibuprofen](https://www.drugs.com/ppa/ibuprofen.html) for a headache. However, NSAIDs can increase the risk of serious gastrointestinal inflammation, ulceration, bleeding, and perforation, especially for people with current GI ulcers or a history of them. Some adverse drug-disease interactions, like the example here, are described in formal drug warnings and contraindications.

Unintended drug-disease interactions occur in more contexts than off-label drug use. They can also occur when a drug is used according to its labeled indication, and in [the context of comorbidity / multimorbidity](https://www.bmj.com/content/350/bmj.h949), when a person takes a drug to treat one of their diseases and the drug affects a comorbid condition.

**We therefore decided to reframe this question and find potential unintended drug-disease interactions with the following type of question:**
> What symptoms do `Disease` X and `ChemicalSubstance` Y have in common? 

BioThings Explorer (BTE) can answer two classes of queries -- "EXPLAIN" and "PREDICT". This Question fits the EXPLAIN template of starting with **a specific biomedical entity** (a specific `Disease` X) and finding indirect relationships with **another specific biomedical entity** (a specific `ChemicalSubstance` Y). In this notebook, we implement it as two one-hop PREDICT queries (one from each entity) whose results are merged on the shared intermediate node.

## Specific use case: IBD treatment linked to cancer 

We will use **inflammatory bowel disease (IBD)** as our specific disease of interest. We will use **Imuran** as our drug of interest. [Imuran / azathioprine](https://www.uptodate.com/contents/azathioprine-drug-information) has been [commonly used **off-label** to treat Crohn disease (a form of IBD)](https://www.mayoclinic.org/diseases-conditions/crohns-disease/diagnosis-treatment/drc-20353309). 
  
However, the use of Imuran to treat IBD has been linked to hepatosplenic T-cell lymphoma (HSTCL) and other lymphomas (references [1](https://pubmed.ncbi.nlm.nih.gov/20888436/), [2](https://pubmed.ncbi.nlm.nih.gov/21830262/), [3](https://pubmed.ncbi.nlm.nih.gov/23891975/) and [4](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3710465/)). In particular, [hepatosplenic T-cell lymphoma is a rare, aggressive, and usually fatal disease](https://pubmed.ncbi.nlm.nih.gov/29337025/). This serious adverse effect is made clear in the black box warning for [Imuran](https://www.drugs.com/pro/imuran.html). 

We tackle the search for side-effects associated with using Imuran to treat IBD using the query: 
* `Disease` inflammatory bowel disease &rarr; results: `Disease` &larr; `ChemicalSubstance` Imuran. 
    * Note that we use `Disease` as the intermediate node type to represent disease symptoms and drug effects. With the current APIs available and issues around BioThings type annotation (PhenotypicFeatures vs Diseases), it made more sense to use Disease as the intermediate node. 
* The query will return a graph object with entities as nodes and relationships as edges. We then use edge provenance information to **filter** the results. For each intermediate Disease node, we use the number of unique paths from the input nodes to that node to **score** it. The scores can then be used to sort the results.  

<br>

Note 1: This example was inspired by the information shared by Jeff McKnight, [a molecular biology researcher](http://molbio.uoregon.edu/mcknight/) who is currently fighting hepatosplenic T-cell lymphoma after 16 years of taking Imuran and Remicade to treat Crohn disease. To help, he has a [fundraiser here](https://www.gofundme.com/f/mcknight-fund-help-jeff-buy-time-with-his-family). 

Note 2: Many of those with IBD, this drug treatment, and this cancer are [men < 40 years old](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3710465/). Jeff may be in this age group. 

## Step 0: Load BTE modules, notebook functions

In [None]:
%%capture
## for Google Colab (the %%capture cell magic must be the first line of the cell)
!pip install git+https://github.com/colleenXu/biothings_explorer@relay#egg=biothings_explorer

In [1]:
## CX: allows multiple lines of code to print from one code block
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

# import modules from biothings_explorer
from biothings_explorer.hint import Hint
# from biothings_explorer.user_query_dispatcher import FindConnection
from biothings_explorer.query.predict import Predict

## show time that this notebook was executed 
from datetime import datetime

## packages to work with objects 
import re
import requests

## to get around bugs
import nest_asyncio
nest_asyncio.apply()

In [2]:
## functions to add to modules?
def hint_display(query, hint_result):
    """
    show the type, name, number of IDs for all results returned by the query
    
    :param: query: string used in hint query
    :param: hint_result: object returned from hint query, a dictionary of lists of dictionaries
    
    Returns: None
    """
    ## function needs to be rewritten if it's going to give the exact index of each object within its type 
    display = ['type', 'name']  ## replace with the parts of the BioThings object you want to see
    concise_results = []
    for BT_type, result in hint_result.items():
        if result:  ## basically if it's not empty
            for items in result:
                ## number of identifiers per object: number of keys - 4 (name, primary, display, type)
                temp = len(items) - 4
                concise_results.append((items[display[0]], items[display[1]], 
                                         str(temp)))
                    
    print('There are {total} BioThings objects returned for {ht}:'.format(\
                total = len(concise_results), ht = query))
    for display_info in concise_results:
        print('{0}, {1}, num of IDs: {2}'.format(display_info[0], display_info[1], display_info[2]))
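
To make the "num of IDs" value concrete, here is a minimal sketch that applies the same counting rule to a hand-made hint result. The `toy_hint` dictionary and the `count_identifiers` helper are hypothetical; they only mimic the shape described in the docstring and are not real BTE output.

```python
# Hypothetical hint result: a dictionary of lists of dictionaries,
# mirroring the shape described in the hint_display docstring.
toy_hint = {
    "Disease": [
        {
            "MONDO": "MONDO:0005265",
            "MESH": "D015212",
            "name": "inflammatory bowel disease",
            "primary": {"identifier": "MONDO", "cls": "Disease",
                        "value": "MONDO:0005265"},
            "display": "MONDO(MONDO:0005265) MESH(D015212)",
            "type": "Disease",
        },
    ],
    "Gene": [],  # empty lists contribute nothing to the summary
}

# The four bookkeeping keys that hint_display subtracts from the key count
NON_ID_KEYS = {"name", "primary", "display", "type"}


def count_identifiers(hint_obj):
    """Count the keys of one hint object that are not bookkeeping keys."""
    return len([key for key in hint_obj if key not in NON_ID_KEYS])


summary = [
    (obj["type"], obj["name"], count_identifiers(obj))
    for objs in toy_hint.values()
    for obj in objs
]
print(summary)
```

Counting by key name instead of subtracting 4 gives the same number as long as all four bookkeeping keys are present, and it stays correct if one of them is missing.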

In [3]:
def filter_table(df):
    """
    use _source and _method columns to remove rows (paths) from the dataframe
    works with Explain and Predict queries
    :param: pandas dataframe containing results from BTE FindConnection module, in table form
    
    Returns: filtered dataframe
    """
    ## key is the string to match to column, value is a list of strings to match to column values
    filter_out = {'_source': ['SEMMED', 'CTD', 'ctd', 'omia']   
#                   '_method': []  ## currently no method stuff I want to filter out
                 }
    ## SEMMED: text mining results wrong for PhenotypicFeature -> Gene
    ## CTD/ctd: results odd for MSUD -> ChemicalSubstance
    ## omia: results wrong or discontinued gene IDs for PhenotypicFeature -> Gene
    
    
    df_temp = df.copy()  ## so the original df isn't modified in-place
    for key,val in filter_out.items():
        ## find columns that match the key string
        columns = [i for i in df_temp.columns if key in i]
        ## iterate through each column
        for col in columns:
            ## iterate through each value to take out, check if string CONTAINS match. 
            ## only keep rows that don't contain the value
            for i in val:
                df_temp = df_temp[~ df_temp[col].str.contains(i, na = False)]

    return df_temp
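
The filtering rule can be checked on a small table. The sketch below repeats the same substring test on made-up rows; the column names follow the `pred1_source` / `pred2_source` pattern used later in this notebook, and the row values are invented for illustration.

```python
import pandas as pd

# Toy paths table with invented rows
toy = pd.DataFrame({
    "node1_label": ["A", "B", "C", "D"],
    "pred1_source": ["SEMMED", None, "scibite", "CTD"],
    "pred2_source": [None, "omia", None, None],
})

blocked = ["SEMMED", "CTD", "ctd", "omia"]
source_cols = [col for col in toy.columns if "_source" in col]

kept = toy.copy()
for col in source_cols:
    for term in blocked:
        # na=False treats a missing source as "no match", so that row is kept
        kept = kept[~kept[col].str.contains(term, na=False)]

print(kept["node1_label"].tolist())
```

A row is removed if any `_source` column contains any blocked string. The match is case-sensitive, which is why both `CTD` and `ctd` appear in the list.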

In [4]:
def scoring_output(df, q_type):
    """
    score results based on whether query was Predict or Explain type, number of 
        intermediate nodes 
    :param: pandas dataframe containing results from BTE FindConnection module
    :param: string describing type of query (Predict or Explain)
    
    May flatten some edges, because score only counts one edge per 
        unique predicate / API / method (ignoring source and pubmed col)
    
    Predict queries: score each output node by counting # of paths
        from input nodes to it. Normalize by dividing by maximum
        possible # of paths
    Explain two-hop (one intermediate) queries: score each intermediate node by 
        counting # of paths (between input and output nodes) that include it. 
        Normalize by dividing by maximum possible # of paths    

    Explain one-hop (direct) queries: no need to score, prints message
    Other Explain queries (many-hops): currently not able to score, prints message     
    
    Returns: pandas dataframe with 'score' and 'rank' columns, indexed by node label,
             or None (one-hop or many-hop Explain query)
    """
    df_temp = df.copy()  ## so no chance to mutate this   
    flag_direct = False  ## one-hop query or not
    ## use df_col to look quicker into columns
    df_col = set(df_temp.columns)
    
    ## ignore source and pubmed col in looking at unique edges 
    columns_drop = [col for col in df_col if (('_source' in col) or ('_pubmed' in col))]
    df_temp.drop(columns = columns_drop, inplace = True)    
    df_temp.drop_duplicates(inplace = True)
    
    ## check if query is one-hop or not
    if "node1_label" not in df_col:    ## name for first intermediate node layer
        flag_direct = True  
    
    if q_type == 'Explain':
        if flag_direct:   # one hop / no intermediates
            print('No valid node scoring for one-hop (direct) Explain queries.')
            return None
        ## if there are many-hops/intermediate layers
        elif "node2_label" in df_col:  ## name for 2nd intermed. node layer
            print('Cannot currently score many-hop Explain queries.')
            return None
        else:   ## two-hop / 1 intermediate layer
            ## count multi-edges to results (the intermediate node1 col)
            scores = df_temp.node1_label.value_counts() 
            ## to find the maximum-possible number of edges, look at non-result cols
            columns_drop = [col for col in df_col if 'node1' in col]
            df_temp.drop(columns = columns_drop, inplace = True)
            ## now look at number of unique combos for input, edge info, output
            df_temp.drop_duplicates(inplace = True)
            max_paths = df_temp.shape[0]            
            ## normalize scores by dividing each by max number of paths
            scores = scores / max_paths

    else:  ## Predict type query
        ## count multi-edges to results (the output col)
        scores = df_temp.output_label.value_counts()
        ## to find the maximum number of multi-edges, look at non-output col
        columns_drop = [col for col in df_temp.columns if 'output' in col]
        df_temp.drop(columns = columns_drop, inplace = True)
        ## now look at number of unique paths possible
        df_temp.drop_duplicates(inplace = True)
        max_paths = df_temp.shape[0]
        ## normalize scores by dividing each by max number of paths
        scores = scores / max_paths
            
    ## return scores as pandas dataframe, with rank
    scores = scores.to_frame(name = 'score') 
    scores['rank'] = scores['score'].rank(method = 'dense', ascending = False)
    return scores
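
As a worked example of the two-hop Explain branch, the sketch below scores a toy table by hand. The rows and the reduced set of columns are made up; only the counting and normalization steps follow the function above.

```python
import pandas as pd

# Toy two-hop table: input -> node1 -> output, with one API column per edge
toy = pd.DataFrame({
    "input_label": ["IBD"] * 4,
    "pred1_api": ["COHD", "COHD", "CORD", "COHD"],
    "node1_label": ["liver disease", "liver disease", "liver disease", "lymphoma"],
    "pred2_api": ["COHD", "Scibite", "COHD", "COHD"],
    "output_label": ["AZATHIOPRINE"] * 4,
})

paths = toy.drop_duplicates()

# Number of paths that pass through each intermediate node
counts = paths["node1_label"].value_counts()

# Maximum possible paths: unique edge combinations once the node itself is ignored
max_paths = len(paths.drop(columns=["node1_label"]).drop_duplicates())

scores = (counts / max_paths).to_frame(name="score")
scores["rank"] = scores["score"].rank(method="dense", ascending=False)
print(max_paths)
print(scores)
```

Here "liver disease" is reached through all three distinct edge combinations and scores 1.0, while "lymphoma" is reached through one of them and scores 1/3.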

In [5]:
## record when cell blocks are executed
print('The time that this notebook was executed is...')
print('Local time (PST, West Coast USA): ')
print(datetime.now())
print('UTC time: ')
print(datetime.utcnow())

The time that this notebook was executed is...
Local time (PST, West Coast USA): 
2020-09-28 18:47:59.618219
UTC time: 
2020-09-29 01:47:59.618389


## Step 1: Find representation of "inflammatory bowel disease" in BTE

In this step, BioThings Explorer translates our query string "IBD" into BioThings objects, which contain mappings to many common identifiers. We then pick the BioThings object that best matches what we want (inflammatory bowel disease). 

Note: the query failed to retrieve Disease &rarr; PhenotypicFeatures for Crohn disease (a child of IBD in the Mondo ontology).

Generally, the top result returned by the Hint module for your BioThings type of interest will match what you want, but you should confirm that using the identifiers shown. 


> BioThings types correspond to children and descendants of [BiologicalEntity](https://biolink.github.io/biolink-model/docs/BiologicalEntity.html) from the [Biolink Model](https://biolink.github.io/biolink-model/docs/), including `Disease` (e.g., "lupus"), `ChemicalSubstance` (e.g., "acetaminophen"), `Gene` (e.g., "CDK2"), `BiologicalProcess` (e.g., "T cell differentiation"), and `Pathway` (e.g., "Citric acid cycle"). **However, [only a subset of the Biolink BiologicalEntity children / descendants are currently implemented in BTE](https://smart-api.info/portal/translator/metakg)**. More biomedical object types will be available as more knowledge sources (APIs) are added to the system. **Note that the type `BiologicalEntity` means any BioThings type currently implemented in BTE will be accepted.**

In [6]:
ht = Hint()  ## neater way to call this BTE module

## the human user gives this input
disease_starting_str = "IBD"

disease_hint = ht.query(disease_starting_str)
hint_display(disease_starting_str, disease_hint)

There are 12 BioThings objects returned for IBD:
ChemicalSubstance, IBD-78, num of IDs: 2
ChemicalSubstance, IBD-78, (Z)-, num of IDs: 6
ChemicalSubstance, IBD-78 HYDROCHLORIDE, num of IDs: 2
ChemicalSubstance, IBD-78 ACETATE, num of IDs: 2
ChemicalSubstance, IBD-78, (E)-, num of IDs: 6
Disease, irritable bowel syndrome, num of IDs: 5
Disease, inflammatory bowel disease, num of IDs: 4
Disease, IL21-related infantile inflammatory bowel disease, num of IDs: 4
Disease, immune dysregulation-inflammatory bowel disease-arthritis-recurrent infections syndrome, num of IDs: 3
Disease, multiple intestinal atresia, num of IDs: 6
Pathway, Inflammatory bowel disease (IBD) - Homo sapiens (human), num of IDs: 1
Pathway, Inflammatory bowel disease (IBD) - Mus musculus (mouse), num of IDs: 1


Based on the information above, we'll pick the second `Disease` choice (indexed at 1) for our query because we want inflammatory bowel disease. We can look at identifier mappings inside this BioThings object. 

In [7]:
## the human user makes this choice, gives this input
disease_choice_type = 'Disease'
disease_choice_idx = 1

disease_hint_obj = disease_hint[disease_choice_type][disease_choice_idx]  
disease_hint_obj

{'MONDO': 'MONDO:0005265',
 'DOID': 'DOID:0050589',
 'UMLS': 'C0021390',
 'name': 'inflammatory bowel disease',
 'MESH': 'D015212',
 'primary': {'identifier': 'MONDO',
  'cls': 'Disease',
  'value': 'MONDO:0005265'},
 'display': 'MONDO(MONDO:0005265) DOID(DOID:0050589) UMLS(C0021390) MESH(D015212) name(inflammatory bowel disease)',
 'type': 'Disease'}

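Next, we map the disease's MESH identifier to an OMOP concept ID, the identifier space used by some of the clinical data results later in this notebook. Ideally BTE would do this under the hood, but that integration issue hasn't been resolved yet. 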

In [8]:
vocab_OMOP_map = 'MESH'

## the human user needs to set up this URL
## it uses the MESH ID from the disease object above
headers = {'content-type': 'application/x-www-form-urlencoded'}

getDisease_OMOP_url = 'http://tr-kp-clinical.ncats.io/api/'\
      'omop/mapToStandardConceptID?'\
      'concept_code={}&vocabulary_id={}'.format(disease_hint_obj[vocab_OMOP_map], vocab_OMOP_map)
getDisease_OMOP_url

getDisease_OMOP_request = requests.get(getDisease_OMOP_url, headers=headers)
getDisease_OMOP_request.status_code

getDisease_OMOP_response = getDisease_OMOP_request.json()
getDisease_OMOP_response

disease_OMOP_id = getDisease_OMOP_response['results'][0]['standard_concept_id']
disease_OMOP_id

'http://tr-kp-clinical.ncats.io/api/omop/mapToStandardConceptID?concept_code=D015212&vocabulary_id=MESH'

200

{'results': [{'source_concept_code': 'D015212',
   'source_concept_id': 45618215,
   'source_concept_name': 'Inflammatory Bowel Diseases',
   'source_vocabulary_id': 'MeSH',
   'standard_concept_code': '24526004',
   'standard_concept_id': 4074815,
   'standard_concept_name': 'Inflammatory bowel disease',
   'standard_domain_id': 'Condition',
   'standard_vocabulary_id': 'SNOMED'}]}

4074815

In [9]:
disease_hint_obj['OMOP'] = str(disease_OMOP_id)
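
The lookup above can be wrapped in two small helpers so that the URL construction and the response parsing are easy to check without a network call. This is a sketch only: the helper names are hypothetical, and `sample_response` is a trimmed copy of the response printed above.

```python
from urllib.parse import urlencode

# Endpoint taken from the cell above
OMOP_MAP_ENDPOINT = "http://tr-kp-clinical.ncats.io/api/omop/mapToStandardConceptID"


def build_omop_map_url(concept_code, vocabulary_id):
    """Build the mapping URL for one identifier (hypothetical helper)."""
    query = urlencode({"concept_code": concept_code, "vocabulary_id": vocabulary_id})
    return "{}?{}".format(OMOP_MAP_ENDPOINT, query)


def first_standard_concept_id(response_json):
    """Return the first standard concept ID as a string, or None if there are no results."""
    results = response_json.get("results") or []
    if not results:
        return None
    return str(results[0]["standard_concept_id"])


# Trimmed copy of the response shown above; no request is sent here
sample_response = {
    "results": [
        {"source_concept_code": "D015212",
         "standard_concept_id": 4074815,
         "standard_vocabulary_id": "SNOMED"}
    ]
}

url = build_omop_map_url("D015212", "MESH")
omop_id = first_standard_concept_id(sample_response)
print(url)
print(omop_id)
```

Returning `None` for an empty result list avoids the `IndexError` that `['results'][0]` would raise when an identifier has no OMOP mapping.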

## Step 2: Find representation of "Imuran" in BTE

In this step, BioThings Explorer translates our query string "Imuran"  into BioThings objects, which contain mappings to many common identifiers. We then pick the BioThings object that best matches what we want (the drug, also known as azathioprine). 

In [10]:
## the human user gives this input
drug_starting_str = "Imuran"

drug_hint = ht.query(drug_starting_str)
hint_display(drug_starting_str, drug_hint)

There are 4 BioThings objects returned for Imuran:
ChemicalSubstance, azathioprine, num of IDs: 6
ChemicalSubstance, Imuran, num of IDs: 2
ChemicalSubstance, AZATHIOPRINE SODIUM, num of IDs: 8
ChemicalSubstance, AZATHIOPRINE, num of IDs: 13


All of these `ChemicalSubstance` entries seem to be the right object. We'll pick the entry with the most identifiers (indexed at 3) for our query. We can look at identifier mappings inside this BioThings object. 

In [11]:
## the human user makes this choice, gives this input
drug_choice_type = 'ChemicalSubstance'
drug_choice_idx = 3

drug_hint_obj = drug_hint[drug_choice_type][drug_choice_idx]  
drug_hint_obj

{'CHEMBL.COMPOUND': 'CHEMBL1542',
 'DRUGBANK': 'DB00993',
 'PUBCHEM': 2265,
 'CHEBI': 'CHEBI:2948',
 'UMLS': 'C0004482',
 'MESH': 'D001379',
 'UNII': 'AM94R510MS',
 'INCHIKEY': 'LMEKQMALGUDUQG-UHFFFAOYSA-N',
 'INCHI': 'InChI=1S/C9H7N7O2S/c1-15-4-14-7(16(17)18)9(15)19-8-5-6(11-2-10-5)12-3-13-8/h2-4H,1H3,(H,10,11,12,13)',
 'KEGG': 'C06837',
 'name': 'AZATHIOPRINE',
 'CAS': '446-86-6',
 'IUPAC': '6-[(3-methyl-5-nitro-imidazol-4-yl)thio]-7H-purine',
 'formula': 'C9H7N7O2S',
 'primary': {'identifier': 'CHEBI',
  'cls': 'ChemicalSubstance',
  'value': 'CHEBI:2948'},
 'display': 'CHEBI(CHEBI:2948) CHEMBL.COMPOUND(CHEMBL1542) DRUGBANK(DB00993) PUBCHEM(2265) MESH(D001379) UNII(AM94R510MS) UMLS(C0004482) name(AZATHIOPRINE) CAS(446-86-6) IUPAC(6-[(3-methyl-5-nitro-imidazol-4-yl)thio]-7H-purine) formula(C9H7N7O2S)',
 'type': 'ChemicalSubstance'}

In [12]:
## the human user needs to set up this URL
## it uses the MESH ID from the drug object above
getDrug_OMOP_url = 'http://tr-kp-clinical.ncats.io/api/'\
      'omop/mapToStandardConceptID?'\
      'concept_code={}&vocabulary_id={}'.format(drug_hint_obj[vocab_OMOP_map], vocab_OMOP_map)
getDrug_OMOP_url

getDrug_request = requests.get(getDrug_OMOP_url, headers=headers)
getDrug_request.status_code

getDrug_OMOP_response = getDrug_request.json()
getDrug_OMOP_response

drug_OMOP_id = getDrug_OMOP_response['results'][0]['standard_concept_id']
drug_OMOP_id

'http://tr-kp-clinical.ncats.io/api/omop/mapToStandardConceptID?concept_code=D001379&vocabulary_id=MESH'

200

{'results': [{'source_concept_code': 'D001379',
   'source_concept_id': 45610378,
   'source_concept_name': 'Azathioprine',
   'source_vocabulary_id': 'MeSH',
   'standard_concept_code': '1256',
   'standard_concept_id': 19014878,
   'standard_concept_name': 'Azathioprine',
   'standard_domain_id': 'Drug',
   'standard_vocabulary_id': 'RxNorm'}]}

19014878

In [13]:
drug_hint_obj['OMOP'] = str(drug_OMOP_id)

## Step 3: IBD &rarr; Disease

BTE performs the **query path planning** and **query path execution** by deconstructing the query into individual API calls, executing those API calls, and then assembling the results.

The code block below takes ~4 seconds to run. 

In [14]:
## the human user gives this input
result_types = ['Disease']

In [15]:
## the human user sets this up
q1 = Predict(input_objs = [disease_hint_obj],\
                     output_types = result_types, \
                    intermediate_nodes = [], config = {})
q1.connect(verbose = True)


Your query have 1 input nodes, including inflammatory bowel disease .... And BTE will find paths that connect your input nodes to your output types ['Disease']. Paths will contain 0 intermediate nodes.



==== Step #1: Query Path Planning ====

Input Types: Disease
Output Types: ['Disease']
Predicates: None

BTE found 5 APIs based on SmartAPI Meta-KG.

API 1. COHD API (1 API calls)
API 2. Ontology Lookup Service API (1 API calls)
API 3. mydisease.info API (2 API calls)
API 4. SEMMED Disease API (17 API calls)
API 5. CORD Disease API (1 API calls)


==== Step #2: Query path execution ====

NOTE: API requests are dispatched in parallel, so the list of APIs below is ordered by query time.

API 3.1: http://mydisease.info/v1/query?fields=mondo.descendants (POST -d q=MONDO:0005265&scopes=mondo.mondo)
API 3.1 mydisease.info API: 0 hits
API 3.2: http://mydisease.info/v1/query?fields=mondo.parents (POST -d q=MONDO:0005265&scopes=mondo.mondo)
API 3.2 mydisease.info API: 2 hits
API 4.1: https://

In [16]:
q1_r_paths_table = q1.display_table_view()

# q1 = None  ## clear memory

In [17]:
q1_r_paths_table.columns

Index(['input_id', 'input_label', 'input_type', 'pred1', 'pred1_source',
       'pred1_api', 'pred1_publications', 'output_id', 'output_label',
       'output_type', 'output_degree'],
      dtype='object')

In [18]:
cols_rename = ['input_id', 'input_label', 'input_type', 'pred1', 'pred1_source', 'pred1_api',\
               'pred1_publications', 'node1_id', 'node1_label', 'node1_type', 'node1_degree']
q1_r_paths_table.columns = cols_rename

## Step 4: Imuran &rarr; Disease

BTE performs the **query path planning** and **query path execution** by deconstructing the query into individual API calls, executing those API calls, and then assembling the results.

The code block below takes ~4 seconds to run. 

In [19]:
## the human user gives this input
q2 = Predict(input_objs = [drug_hint_obj],\
                     output_types = result_types, \
                    intermediate_nodes = [], config = {})
q2.connect(verbose = True)


Your query have 1 input nodes, including AZATHIOPRINE .... And BTE will find paths that connect your input nodes to your output types ['Disease']. Paths will contain 0 intermediate nodes.



==== Step #1: Query Path Planning ====

Input Types: ChemicalSubstance
Output Types: ['Disease']
Predicates: None

BTE found 9 APIs based on SmartAPI Meta-KG.

API 1. COHD API (1 API calls)
API 2. Automat CORD19 Scibite API (2 API calls)
API 3. mydisease.info API (1 API calls)
API 4. Automat CORD19 Scigraph API (2 API calls)
API 5. SEMMED Chemical API (14 API calls)
API 6. Automat PHAROS API (2 API calls)
API 7. MyChem.info API (2 API calls)
API 8. CORD Chemical API (1 API calls)
API 9. Automat HMDB API (2 API calls)


==== Step #2: Query path execution ====

NOTE: API requests are dispatched in parallel, so the list of APIs below is ordered by query time.

API 3.1: http://mydisease.info/v1/query?fields=disgenet.xrefs.mesh&size=250 (POST -d q=D001379&scopes=ctd.chemical_related_to_disease.mesh_che

In [20]:
q2_r_paths_table = q2.display_table_view()

In [21]:
q2_r_paths_table.columns

Index(['input_id', 'input_label', 'input_type', 'pred1', 'pred1_source',
       'pred1_api', 'pred1_publications', 'output_id', 'output_label',
       'output_type', 'output_degree'],
      dtype='object')

In [22]:
cols_rename = ['output_id', 'output_label', 'output_type', 'pred2', 'pred2_source', 'pred2_api',\
               'pred2_publications', 'node1_id', 'node1_label', 'node1_type', 'node1_degree']
q2_r_paths_table.columns = cols_rename
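
Assigning a full list to `.columns` depends on the column order staying the same. As an alternative sketch, the same relabeling can be done by prefix with `DataFrame.rename`; the `swap_prefix` helper is hypothetical, and the empty frames stand in for the two query tables.

```python
import pandas as pd

cols = ["input_id", "input_label", "input_type", "pred1", "pred1_source",
        "pred1_api", "pred1_publications", "output_id", "output_label",
        "output_type", "output_degree"]

# Empty stand-ins for the two one-hop result tables
hop1 = pd.DataFrame(columns=cols)
hop2 = pd.DataFrame(columns=cols)


def swap_prefix(name, old, new):
    """Replace a leading `old` with `new`; leave other names unchanged."""
    return new + name[len(old):] if name.startswith(old) else name


# First hop: the output of the query becomes the intermediate node
hop1 = hop1.rename(columns=lambda c: swap_prefix(c, "output", "node1"))

# Second hop: relabel output -> node1 first, then input -> output and pred1 -> pred2
hop2 = hop2.rename(columns=lambda c: swap_prefix(c, "output", "node1"))
hop2 = hop2.rename(
    columns=lambda c: swap_prefix(swap_prefix(c, "input", "output"), "pred1", "pred2")
)

shared = sorted(set(hop1.columns) & set(hop2.columns))
print(shared)
```

After relabeling, the only columns the two tables share are the four `node1_*` columns.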

## Merging

In [23]:
final_table = q1_r_paths_table.merge(q2_r_paths_table)
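
Called without `on`, `DataFrame.merge` performs an inner join on every column name the two tables share, which here are the `node1_*` columns. The sketch below shows the effect on toy data with a single shared column; the identifiers and labels are made up.

```python
import pandas as pd

# First hop: disease -> intermediate node
hop1 = pd.DataFrame({
    "input_label": ["IBD", "IBD", "IBD"],
    "pred1_api": ["COHD", "CORD", "COHD"],
    "node1_id": ["MONDO:1", "MONDO:1", "MONDO:2"],
})

# Second hop: drug -> intermediate node
hop2 = pd.DataFrame({
    "output_label": ["AZA", "AZA", "AZA"],
    "pred2_api": ["COHD", "Scibite", "COHD"],
    "node1_id": ["MONDO:1", "MONDO:1", "MONDO:3"],
})

# Inner join on the shared column(s); only nodes reached from both sides survive
paths = hop1.merge(hop2)
print(len(paths))
print(paths["node1_id"].unique().tolist())
```

Node `MONDO:1` has two edges on each side, so it yields 2 x 2 = 4 paths. Nodes reached from only one side are dropped. Because the real tables share four `node1_*` columns, a path survives there only when all four values agree.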

In [24]:
## show number of unique intermediate nodes
print("There are {0} unique {1}s.".format( \
    final_table.node1_label.nunique(), "".join(result_types)))

## show number of paths from IBD to Imuran
print("There are {0} unique paths.".format( \
    final_table.shape[0]))

There are 540 unique Diseases.
There are 1510 unique paths.


## Filtering

Filtering involves using edge provenance, like the source this relationship came from and the method used to make this association, to filter out edges (removing nodes in the process). 

In [25]:
final_table = filter_table(final_table)

After filtering, there are fewer results in the answer knowledge graph. 

In [26]:
## show number of unique intermediate nodes
print("There are {0} unique {1}s.".format( \
    final_table.node1_label.nunique(), "".join(result_types)))

## show number of paths from IBD to Imuran
print("There are {0} unique paths.".format( \
    final_table.shape[0]))

There are 259 unique Diseases.
There are 291 unique paths.


## Getting OMOP map -> MONDO

We have some identifiers for intermediate nodes (our results) that are OMOP ids. We need to map them back to BTE's id-space. This takes about 2-3 min to run. 

In [27]:
stuff_to_map = [i for i in final_table['node1_id'].unique() if 'OMOP:' in i]
len(stuff_to_map)  

199

In [28]:
## not sure if BTE is using the same names
MONDO_translate = {}
name_translate = {}

for i in stuff_to_map:
    OMOP = i[5:]
    
    url = 'http://tr-kp-clinical.ncats.io/api/'\
              'omop/xrefFromOMOP?'\
              'concept_id={}&mapping_targets=MONDO&distance=2'.format(OMOP)
    res = requests.get(url, headers=headers)
    if res.status_code == 200:
        response = res.json()['results']
        if len(response) == 0:
            print("not resolved")
            MONDO_translate['OMOP:'+OMOP] = None  ## have to add the prefix back
            name_translate['OMOP:'+OMOP] = None
        else:
            ## for now, just take the top result
            temp_id = response[0]['target_curie']
            temp_name = response[0]['target_label']
            print("YAY it's {}".format(temp_id))
            MONDO_translate['OMOP:'+OMOP] = temp_id
            name_translate['OMOP:'+OMOP] = temp_name
    else:
        print("not resolved")
        MONDO_translate['OMOP:'+OMOP] = None
        name_translate['OMOP:'+OMOP] = None

YAY it's MONDO:0005101
not resolved
not resolved
not resolved
not resolved
not resolved
not resolved
not resolved
not resolved
YAY it's MONDO:0005292
YAY it's MONDO:0043839
not resolved
not resolved
not resolved
not resolved
not resolved
not resolved
not resolved
not resolved
not resolved
not resolved
YAY it's MONDO:0003409
YAY it's MONDO:0024634
not resolved
YAY it's MONDO:0000707
YAY it's MONDO:0005020
YAY it's MONDO:0016542
not resolved
not resolved
not resolved
not resolved
YAY it's MONDO:0005532
not resolved
YAY it's MONDO:0004335
not resolved
YAY it's MONDO:0000706
not resolved
YAY it's MONDO:0000710
not resolved
YAY it's MONDO:0004335
not resolved
not resolved
YAY it's MONDO:0018305
not resolved
not resolved
not resolved
YAY it's MONDO:0021166
not resolved
YAY it's MONDO:0024635
not resolved
not resolved
not resolved
YAY it's MONDO:0044965
not resolved
not resolved
YAY it's MONDO:0043579
not resolved
not resolved
not resolved
not resolved
not resolved
not resolved
not resolved
n

In [29]:
final_table['node1_id'] = final_table['node1_id'].replace(to_replace = MONDO_translate)
final_table['node1_label'] = final_table['node1_label'].replace(to_replace = name_translate)

Remove results that could not be resolved to a MONDO ID:

In [30]:
final_table = final_table.dropna(subset = ['node1_label'])
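
The translate-then-drop step can be illustrated on a toy column. This sketch uses `Series.map` with a dictionary lookup rather than the `replace` call above, and all identifiers in it are made up.

```python
import pandas as pd

ids = pd.Series(["OMOP:111", "MONDO:0005154", "OMOP:222"])

# Hypothetical lookup results: None marks an OMOP ID that could not be resolved
omop_to_mondo = {"OMOP:111": "MONDO:0000001", "OMOP:222": None}

# IDs missing from the dictionary (already MONDO) pass through unchanged
mapped = ids.map(lambda value: omop_to_mondo.get(value, value))

# Unresolved IDs are now missing values and can be dropped
resolved = mapped.dropna()
print(resolved.tolist())
```

Only the row whose OMOP ID had no MONDO equivalent is removed; rows that already carried a MONDO ID are untouched.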

In [31]:
## show number of unique intermediate nodes
print("There are {0} unique {1}s.".format( \
    final_table.node1_label.nunique(), "".join(result_types)))

## show number of paths from IBD to Imuran
print("There are {0} unique paths.".format( \
    final_table.shape[0]))

There are 98 unique Diseases.
There are 146 unique paths.


## Scoring

In [32]:
final_scoring = scoring_output(final_table, 'Explain')

Different knowledge sources (APIs) were called in different parts of the query. 

In the first part of the query (IBD &rarr; Disease), the following APIs returned results and the following predicates (semantic relationships) were found.

In [33]:
## show that the APIs use different predicates
final_table[['pred1_api', 'pred1']].drop_duplicates().sort_values(by = ['pred1_api', 'pred1'])

| | pred1_api | pred1 |
|---|---|---|
| 0 | COHD API | correlated_with |
| 37 | CORD Disease API | related_to |
| 305 | Ontology Lookup Service API | has_subclass |
| 339 | mydisease.info API | subclass_of |


In the second part of the query (Imuran &rarr; Disease), the following APIs returned results and the following predicates (semantic relationships) were found.

In [34]:
## show that the APIs use different predicates
final_table[['pred2_api', 'pred2']].drop_duplicates().sort_values(by = ['pred2_api', 'pred2'])

| | pred2_api | pred2 |
|---|---|---|
| 218 | Automat CORD19 Scibite API | related_to |
| 36 | Automat CORD19 Scigraph API | related_to |
| 0 | COHD API | correlated_with |
| 600 | CORD Chemical API | related_to |
| 673 | MyChem.info API | contraindication |
| 395 | MyChem.info API | treats |


Still in progress: adding the scores/ranks and provenance to the Reasoner Standard (TRAPI) object that will be returned to the ARS. 
* The provenance will likely include the score's range, the scoring method, and whether larger or smaller scores are better.

## Evaluate results 

**BTE's top results**\* (the highest-scoring diseases in the table below) include (1) diseases that are treated with the drug Imuran and are related to IBD, (2) **potential adverse side effects** of using Imuran to treat IBD, and (3) **potential beneficial side effects** of using Imuran to treat IBD. **BTE successfully identifies the connections between Imuran, IBD, and lymphoma that we described above.** 

\*Note that the actual BTE results (diseases identified and their scores) vary between runs because BTE dynamically generates its knowledge graph from current API results. 

#1: diseases that are treated with the drug Imuran and are related to IBD
* [Crohn disease, ulcerative colitis, IBD](https://www.mayoclinic.org/diseases-conditions/crohns-disease/diagnosis-treatment/drc-20353309), [IBD1](https://beta.monarchinitiative.org/disease/MONDO:0009960): all forms of inflammatory bowel disease
* [Disseminated / systemic lupus erythematosus aka SLE](https://monarchinitiative.org/disease/MONDO:0007915): [drug-induced lupus has occurred in patients with IBD](https://www.sciencedirect.com/science/article/pii/S1873994612001328), and [Imuran has been used to treat SLE](https://www.hopkinslupus.org/lupus-treatment/lupus-medications/immunosuppressive-medications/)

#2: **potential adverse side effects of using Imuran to treat IBD**
* [Lymphoid cancer aka lymphoma](https://monarchinitiative.org/disease/MONDO:0005062): as mentioned above, with references [1](https://pubmed.ncbi.nlm.nih.gov/20888436/), [2](https://pubmed.ncbi.nlm.nih.gov/21830262/), [3](https://pubmed.ncbi.nlm.nih.gov/23891975/) and [4](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3710465/)
* [Colitis aka inflammation of the colon](https://www.drugs.com/cg/colitis.html): this can [occur in IBD](https://www.mayoclinic.org/diseases-conditions/inflammatory-bowel-disease/symptoms-causes/syc-20353315) and shares symptoms (nausea, diarrhea, fever, malaise) with [adverse GI side effects of Imuran treatment](https://www.drugs.com/pro/imuran.html#s-34084-4). 
* Liver disease: [people with IBD can experience liver damage](https://www.crohnscolitisfoundation.org/what-is-ibd/extraintestinal-complications-ibd) and [liver disease affects 3-5% of people with IBD](https://www.merckmanuals.com/professional/gastrointestinal-disorders/inflammatory-bowel-disease-ibd/overview-of-inflammatory-bowel-disease#v894353). [Liver injury is an adverse side effect of Imuran](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5505863/), and [a recent case report described liver cirrhosis as a potential rare adverse event when using Imuran to treat IBD](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7008285/).     

#3: **potential beneficial side effects of using Imuran to treat IBD**
* Rheumatoid arthritis or polyarthritis: [arthritis is the most common non-GI complication of IBD](https://www.crohnscolitisfoundation.org/what-is-ibd/extraintestinal-complications-ibd), [although it is not rheumatoid arthritis (perhaps a misannotation?)](http://europepmc.org/article/PMC/3424429). Imuran is [indicated for the treatment of active rheumatoid arthritis](https://www.drugs.com/pro/imuran.html#s-34067-9) to reduce symptoms. Imuran treatment might therefore also reduce arthritis symptoms in patients with IBD. 

In [35]:
final_scoring.head(10)

| | score | rank |
|---|---|---|
| liver disease | 0.088235 | 1.0 |
| Crohn disease | 0.058824 | 2.0 |
| diarrheal disease | 0.058824 | 2.0 |
| rheumatoid arthritis | 0.044118 | 3.0 |
| lymphoma | 0.044118 | 3.0 |
| systemic lupus erythematosus (disease) | 0.044118 | 3.0 |
| colitis (disease) | 0.044118 | 3.0 |
| systemic sclerosis | 0.029412 | 4.0 |
| digestive system disease | 0.029412 | 4.0 |
| chronic kidney disease | 0.029412 | 4.0 |


The table below lists every path that connects IBD and Imuran through liver disease. It shows how results from multiple APIs, including the Clinical Data Provider (COHD API), were chained together. 

In [36]:
final_table[final_table['node1_label'] == 'liver disease']

Unnamed: 0,input_id,input_label,input_type,pred1,pred1_source,pred1_api,pred1_publications,node1_id,node1_label,node1_type,node1_degree,output_id,output_label,output_type,pred2,pred2_source,pred2_api,pred2_publications
217,MONDO:0005265,inflammatory bowel disease,Disease,correlated_with,,COHD API,,MONDO:0005154,liver disease,Disease,,CHEBI:2948,AZATHIOPRINE,ChemicalSubstance,correlated_with,,COHD API,
218,MONDO:0005265,inflammatory bowel disease,Disease,correlated_with,,COHD API,,MONDO:0005154,liver disease,Disease,,CHEBI:2948,AZATHIOPRINE,ChemicalSubstance,related_to,scibite,Automat CORD19 Scibite API,
219,MONDO:0005265,inflammatory bowel disease,Disease,correlated_with,,COHD API,,MONDO:0005154,liver disease,Disease,,CHEBI:2948,AZATHIOPRINE,ChemicalSubstance,related_to,scigraph,Automat CORD19 Scigraph API,
235,MONDO:0005265,inflammatory bowel disease,Disease,related_to,Translator Text Mining Provider,CORD Disease API,PMC:6835877,MONDO:0005154,liver disease,Disease,,CHEBI:2948,AZATHIOPRINE,ChemicalSubstance,correlated_with,,COHD API,
236,MONDO:0005265,inflammatory bowel disease,Disease,related_to,Translator Text Mining Provider,CORD Disease API,PMC:6835877,MONDO:0005154,liver disease,Disease,,CHEBI:2948,AZATHIOPRINE,ChemicalSubstance,related_to,scibite,Automat CORD19 Scibite API,
237,MONDO:0005265,inflammatory bowel disease,Disease,related_to,Translator Text Mining Provider,CORD Disease API,PMC:6835877,MONDO:0005154,liver disease,Disease,,CHEBI:2948,AZATHIOPRINE,ChemicalSubstance,related_to,scigraph,Automat CORD19 Scigraph API,
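
The six rows above can be tied back to the score reported for liver disease. The sketch below is an inference, not something the notebook prints: it assumes all six rows count as distinct paths in `scoring_output`, and it derives the normalizing constant from the rounded scores shown earlier.

```python
# Rows in the liver disease table above (assumed to be six distinct paths)
n_paths_liver = 6

# Scores copied from the scoring output shown earlier
reported = {
    "liver disease": 0.088235,
    "Crohn disease": 0.058824,
    "lymphoma": 0.044118,
    "systemic sclerosis": 0.029412,
}

# score = paths through node / max possible paths, so max = paths / score
implied_max_paths = round(n_paths_liver / reported["liver disease"])

# With that denominator, recover the path count behind each score
implied_counts = {
    name: round(score * implied_max_paths) for name, score in reported.items()
}
print(implied_max_paths)
print(implied_counts)
```

Under this assumption, the scores correspond to whole-number path counts over a common denominator, which is consistent with the normalization described in the `scoring_output` docstring.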
