# Exploring COVID, hereditary angioedema, drugs

2020-11-26 update: Hint module results changed so the pick selection is changed. 

## Intro

**To experiment with an executable version of this notebook, [load it in Google Colaboratory](https://colab.research.google.com/github/colleenXu/biothings_explorer/blob/relay/jupyter%20notebooks/CX_WIPs/TranslatorUseCase_COVIDproxies_HAdrugs.ipynb).**

BioThings Explorer (BTE) can answer two classes of queries -- "EXPLAIN" and "PREDICT". This Question fits the EXPLAIN  template of starting with **a specific biomedical entity** (a specific `Disease` X) and finding indirect relationships with **another specific biomedical entity** (a specific `ChemicalSubstance` Y).
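
As a rough illustration of the two query classes, the sketch below labels a query by the kind of output it is given. A specific entity (a dictionary of identifiers, like the objects the Hint module returns later in this notebook) suggests EXPLAIN, while a bare type name such as `'ChemicalSubstance'` suggests PREDICT. The `classify_query` helper is hypothetical and is not part of BTE.

```python
def classify_query(output_obj):
    """Label a query by its output: a specific entity or just a type name.

    Hypothetical helper for illustration only; BTE does not expose it.
    """
    if isinstance(output_obj, dict):
        return "Explain"   # specific entity X -> specific entity Y
    if isinstance(output_obj, str):
        return "Predict"   # specific entity X -> any entity of a given type
    raise TypeError("output must be a hint object (dict) or a type name (str)")


# a specific entity, shaped like a (shortened) Hint result from this notebook
specific_drug = {"name": "ICATIBANT", "type": "ChemicalSubstance"}

print(classify_query(specific_drug))        # Explain
print(classify_query("ChemicalSubstance"))  # Predict
```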

## Step 0: Load BTE modules, notebook functions

In [None]:
%%capture
## for Google Colab (%%capture is a cell magic, so it must be the first line of the cell)
!pip install git+https://github.com/colleenXu/biothings_explorer@relay#egg=biothings_explorer

In [1]:
## CX: allows multiple lines of code to print from one code block
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

# import modules from biothings_explorer
from biothings_explorer.hint import Hint
from biothings_explorer.user_query_dispatcher import FindConnection

## show time that this notebook was executed 
from datetime import datetime

## packages to work with objects 
import re

## to get around bugs
import nest_asyncio
nest_asyncio.apply()

In [2]:
## functions to add to modules?
def hint_display(query, hint_result):
    """
    show the type, name, number of IDs for all results returned by the query
    
    :param: query: string used in hint query
    :param: hint_result: object returned from hint query, a dictionary of lists of dictionaries
    
    Returns: None
    """
    ## function needs to be rewritten if it's going to give the exact index of each object within its type 
    display = ['type', 'name']  ## replace with the parts of the BioThings object you want to see
    concise_results = []
    for BT_type, result in hint_result.items():
        if result:  ## basically if it's not empty
            for items in result:
                ## number of identifiers per object: number of keys - 4 (name, primary, display, type)
                temp = len(items) - 4
                concise_results.append((items[display[0]], items[display[1]], 
                                         str(temp)))
                    
    print('There are {total} BioThings objects returned for {ht}:'.format(\
                total = len(concise_results), ht = query))
    for display_info in concise_results:
        print('{0}, {1}, num of IDs: {2}'.format(display_info[0], display_info[1], display_info[2]))

In [3]:
def filter_table(df):
    """
    use _source and _method columns to remove rows (paths) from the dataframe
    :param: pandas dataframe containing results from BTE FindConnection module, in table form
    
    Returns: filtered dataframe
    """
    ## note: still needs checking with EXPLAIN queries
    ## key is the string to match to column, value is a list of strings to match to column values
    filter_out = {'_source': ['SEMMED', 'CTD', 'ctd', 'omia']   
#                   '_method': []  ## currently no method stuff I want to filter out
                 }
    ## SEMMED: text mining results wrong for PhenotypicFeature -> Gene
    ## CTD/ctd: results odd for MSUD -> ChemicalSubstance
    ## omia: results wrong or discontinued gene IDs for PhenotypicFeature -> Gene
    
    
    df_temp = df.copy()  ## so the original df isn't modified in-place
    for key,val in filter_out.items():
        ## find columns that match the key string
        columns = [i for i in df_temp.columns if key in i]
        ## iterate through each column
        for col in columns:
            ## iterate through each value to take out, check if string CONTAINS match. 
            ## only keep rows that don't contain the value
            for i in val:
                df_temp = df_temp[~ df_temp[col].str.contains(i, na = False)]
    return df_temp
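
The filtering rule in `filter_table` can be sketched without pandas. The example below is a plain-Python approximation under stated assumptions: rows are dictionaries, the column names and row values are made up for illustration, and matching is a case-sensitive substring test (which is why both `CTD` and `ctd` appear in the blocklist). Missing values never match, mirroring `na = False`.

```python
BLOCKED_SOURCES = ["SEMMED", "CTD", "ctd", "omia"]  # same strings as filter_table


def keep_row(row, key="_source", blocked=BLOCKED_SOURCES):
    """Return True if no column whose name contains `key` holds a blocked substring."""
    for col, value in row.items():
        if key not in col or value is None:  # skip other columns and missing values
            continue
        if any(b in value for b in blocked):
            return False
    return True


# hypothetical rows, for illustration only
rows = [
    {"pred1_source": "SEMMED", "node1_name": "A"},
    {"pred1_source": "hypothetical_source", "node1_name": "B"},
    {"pred1_source": None, "node1_name": "C"},
]
kept = [r["node1_name"] for r in rows if keep_row(r)]
print(kept)  # ['B', 'C']
```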

In [4]:
def scoring_output(df, q_type):
    """
    score results based on whether query was Predict or Explain type, number of 
        intermediate nodes 
    :param: pandas dataframe containing results from BTE FindConnection module
    :param: string describing type of query (Predict or Explain)
    
    May flatten some edges, because score only counts one edge per 
        unique predicate / API / method (ignoring source and pubmed col)
    
    Predict queries: score each output node by counting # of paths
        from input nodes to it. Normalize by dividing by maximum
        possible # of paths
    Explain two-hop (one intermediate) queries: score each intermediate node by 
        counting # of paths (between input and output nodes) that include it. 
        Normalize by dividing by maximum possible # of paths    

    Explain one-hop (direct) queries: no need to score, prints message
    Other Explain queries (many-hops): currently not able to score, prints message     
    
    Returns: pandas dataframe with 'score' and 'rank' columns, indexed by node name
             or None (one-hop or many-hop Explain query)
    """
    df_temp = df.copy()  ## so no chance to mutate this   
    flag_direct = False  ## one-hop query or not
    ## use df_col to look quicker into columns
    df_col = set(df_temp.columns)
    
    ## ignore source and pubmed col in looking at unique edges 
    columns_drop = [col for col in df_col if (('_source' in col) or ('_pubmed' in col))]
    df_temp.drop(columns = columns_drop, inplace = True)    
    df_temp.drop_duplicates(inplace = True)
    
    ## check if query is one-hop or not
    if "node1_name" not in df_col:    ## name for first intermediate node layer
        flag_direct = True  
    
    if q_type == 'Explain':
        if flag_direct:   # one hop / no intermediates
            print('No valid node scoring for one-hop (direct) Explain queries.')
            return None
        ## if there are many-hops/intermediate layers
        elif "node2_name" in df_col:  ## name for 2nd intermed. node layer
            print('Cannot currently score many-hop Explain queries.')
            return None
        else:   ## two-hop / 1 intermediate layer
            ## count multi-edges to results (the intermediate node1 col)
            scores = df_temp.node1_name.value_counts() 
            ## to find the maximum-possible number of edges, look at non-result cols
            columns_drop = [col for col in df_col if 'node1' in col]
            df_temp.drop(columns = columns_drop, inplace = True)
            ## now look at number of unique combos for input, edge info, output
            df_temp.drop_duplicates(inplace = True)
            max_paths = df_temp.shape[0]            
            ## normalize scores by dividing each by max number of paths
            scores = scores / max_paths

    else:  ## Predict type query
        ## count multi-edges to results (the output col)
        scores = df_temp.output_name.value_counts()
        ## to find the maximum number of multi-edges, look at non-output col
        columns_drop = [col for col in df_temp.columns if 'output' in col]
        df_temp.drop(columns = columns_drop, inplace = True)
        ## now look at number of unique paths possible
        df_temp.drop_duplicates(inplace = True)
        max_paths = df_temp.shape[0]
        ## normalize scores by dividing each by max number of paths
        scores = scores / max_paths
            
    ## return scores as pandas dataframe, with rank
    scores = scores.to_frame(name = 'score') 
    scores['rank'] = scores['score'].rank(method = 'dense', ascending = False)
    return scores
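
To make the two-hop Explain normalization concrete, here is a small worked example in plain Python. It is a simplified sketch, not the function above: it assumes each path is already reduced to a tuple of (input, first predicate, intermediate, second predicate, output) with source and PubMed details dropped, and the node and predicate names are placeholders.

```python
from collections import Counter

# hypothetical de-duplicated paths: (input, pred1, intermediate, pred2, output)
paths = {
    ("X", "related_to", "N1", "treats", "Y"),
    ("X", "related_to", "N1", "causes", "Y"),
    ("X", "causes", "N1", "treats", "Y"),
    ("X", "related_to", "N2", "treats", "Y"),
}

# count how many paths pass through each intermediate node
counts = Counter(p[2] for p in paths)

# maximum possible paths: unique (input, pred1, pred2, output) combinations
max_paths = len({(p[0], p[1], p[3], p[4]) for p in paths})

scores = {node: n / max_paths for node, n in counts.items()}

# dense rank: highest score gets rank 1, ties share a rank
distinct = sorted(set(scores.values()), reverse=True)
ranks = {node: distinct.index(s) + 1 for node, s in scores.items()}

print(scores)
print(ranks)
```

Here `N1` appears in all three possible edge combinations and scores 1.0, while `N2` appears in one of three and scores about 0.33.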

In [5]:
## record when cell blocks are executed
print('The time that this notebook was executed is...')
print('Local time (PST, West Coast USA): ')
print(datetime.now())
print('UTC time: ')
print(datetime.utcnow())

The time that this notebook was executed is...
Local time (PST, West Coast USA): 
2020-11-26 17:51:47.480246
UTC time: 
2020-11-27 01:51:47.480470
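
The timestamps printed above are naive (they carry no timezone), so the "PST" label is only correct if the machine's clock is set to Pacific time. A timezone-aware variant, sketched below with the standard library only, avoids that ambiguity. Note also that `datetime.utcnow()` is deprecated as of Python 3.12.

```python
from datetime import datetime, timezone

# aware UTC timestamp: the offset is stored with the value
utc_now = datetime.now(timezone.utc)

# convert to the machine's local timezone, keeping the offset attached
local_now = utc_now.astimezone()

print("UTC time:  ", utc_now.isoformat())
print("Local time:", local_now.isoformat())
```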


## Step 1: Find representation of "SARS" in BTE

In this step, BioThings Explorer translates our query string "SARS" into BioThings objects, which contain mappings to many common identifiers. We then pick the BioThings object that best matches what we want.

Generally, the top result returned by the Hint module for your BioThings type of interest will match what you want, but you should confirm that using the identifiers shown. 


> BioThings types correspond to children and descendants of [BiologicalEntity](https://biolink.github.io/biolink-model/docs/BiologicalEntity.html) from the [Biolink Model](https://biolink.github.io/biolink-model/docs/), including `Disease` (e.g., "lupus"), `ChemicalSubstance` (e.g., "acetaminophen"), `Gene` (e.g., "CDK2"), `BiologicalProcess` (e.g., "T cell differentiation"), and `Pathway` (e.g., "Citric acid cycle"). **However, [only a subset of the Biolink BiologicalEntity children / descendants are currently implemented in BTE](https://smart-api.info/portal/translator/metakg)**. More biomedical object types will be available as more knowledge sources (APIs) are added to the system. **Note that the type `BiologicalEntity` means any BioThings type currently implemented in BTE will be accepted.**

In [6]:
ht = Hint()  ## neater way to call this BTE module

## the human user gives this input
diseaseOrPheno_starting_str = "SARS"

diseaseOrPheno_hint = ht.query(diseaseOrPheno_starting_str)

hint_display(diseaseOrPheno_starting_str, diseaseOrPheno_hint)

There are 10 BioThings objects returned for SARS:
ChemicalSubstance, Anti-SARS-CoV-2 REGN-COV2, num of IDs: 1
Disease, severe acute respiratory syndrome, num of IDs: 5
Disease, COVID-19, num of IDs: 2
Disease, COVID-19–associated multisystem inflammatory syndrome in children, num of IDs: 2
Disease, IMD74, num of IDs: 0
MolecularActivity, selenocysteine-tRNA ligase activity, num of IDs: 2
MolecularActivity, mRNA (guanine-N7-)-methyltransferase activity, num of IDs: 3
MolecularActivity, 5'-3' RNA helicase activity, num of IDs: 2
MolecularActivity, mRNA (nucleoside-2'-O-)-methyltransferase activity, num of IDs: 3
MolecularActivity, RNA-directed 5'-3' RNA polymerase activity, num of IDs: 3


Based on the information above, we'll pick the top `Disease` choice (indexed at 0, "severe acute respiratory syndrome") for our query. We can look at identifier mappings inside this BioThings object.

In [7]:
## the human user makes this choice, gives this input
diseaseOrPheno_choice_type = 'Disease'
diseaseOrPheno_choice_idx = 0

diseaseOrPheno_hint_obj = diseaseOrPheno_hint[diseaseOrPheno_choice_type][diseaseOrPheno_choice_idx]  
diseaseOrPheno_hint_obj

{'MONDO': 'MONDO:0005091',
 'DOID': 'DOID:2945',
 'UMLS': 'C1175175',
 'name': 'severe acute respiratory syndrome',
 'MESH': 'D045169',
 'ORPHANET': '140896',
 'primary': {'identifier': 'MONDO',
  'cls': 'Disease',
  'value': 'MONDO:0005091'},
 'display': 'MONDO(MONDO:0005091) DOID(DOID:2945) ORPHANET(140896) UMLS(C1175175) MESH(D045169) name(severe acute respiratory syndrome)',
 'type': 'Disease'}
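
One way to confirm the pick is to list only the identifier mappings, leaving out the four descriptive keys (`name`, `primary`, `display`, `type`) that `hint_display` also excludes from its count. The sketch below copies the object shown above into a literal so that it runs without BTE; `identifier_mappings` is a hypothetical helper.

```python
NON_ID_KEYS = {"name", "primary", "display", "type"}


def identifier_mappings(hint_obj):
    """Return only the identifier entries of a hint object (hypothetical helper)."""
    return {k: v for k, v in hint_obj.items() if k not in NON_ID_KEYS}


# copy of the object printed above
sars = {
    "MONDO": "MONDO:0005091",
    "DOID": "DOID:2945",
    "UMLS": "C1175175",
    "name": "severe acute respiratory syndrome",
    "MESH": "D045169",
    "ORPHANET": "140896",
    "primary": {"identifier": "MONDO", "cls": "Disease", "value": "MONDO:0005091"},
    "display": "MONDO(MONDO:0005091) DOID(DOID:2945) ORPHANET(140896) "
               "UMLS(C1175175) MESH(D045169) name(severe acute respiratory syndrome)",
    "type": "Disease",
}

ids = identifier_mappings(sars)
print(sorted(ids))  # ['DOID', 'MESH', 'MONDO', 'ORPHANET', 'UMLS']
print(len(ids))     # 5, matching "num of IDs: 5" in the Hint summary
```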

## Step 2: Find representation of "hereditary angioedema" in BTE

In this step, BioThings Explorer translates our query string "hereditary angioedema" into BioThings objects, which contain mappings to many common identifiers. We then pick the BioThings object that best matches what we want (the rare disease).

Generally, the top result returned by the Hint module for your BioThings type of interest will match what you want, but you should confirm that using the identifiers shown. 


In [8]:
ht = Hint()  ## neater way to call this BTE module

## the human user gives this input
diseaseOrPheno2_starting_str = "hereditary angioedema"

diseaseOrPheno2_hint = ht.query(diseaseOrPheno2_starting_str)

hint_display(diseaseOrPheno2_starting_str, diseaseOrPheno2_hint)

There are 5 BioThings objects returned for hereditary angioedema:
Disease, hereditary angioedema, num of IDs: 5
Disease, hereditary angioedema with C1Inh deficiency, num of IDs: 3
Disease, hereditary angioedema type 3, num of IDs: 5
Disease, hereditary angioedema type 2, num of IDs: 3
Disease, hereditary angioedema type 1, num of IDs: 3


Based on the information above, we'll pick the top `Disease` choice (indexed at 0) for our query. We can look at identifier mappings inside this BioThings object. 

In [9]:
## the human user makes this choice, gives this input
diseaseOrPheno2_choice_type = 'Disease'
diseaseOrPheno2_choice_idx = 0

diseaseOrPheno2_hint_obj = diseaseOrPheno2_hint[diseaseOrPheno2_choice_type][diseaseOrPheno2_choice_idx]  
diseaseOrPheno2_hint_obj

{'MONDO': 'MONDO:0019623',
 'DOID': 'DOID:14735',
 'UMLS': 'C0019243',
 'name': 'hereditary angioedema',
 'MESH': 'D054179',
 'ORPHANET': '91378',
 'primary': {'identifier': 'MONDO',
  'cls': 'Disease',
  'value': 'MONDO:0019623'},
 'display': 'MONDO(MONDO:0019623) DOID(DOID:14735) ORPHANET(91378) UMLS(C0019243) MESH(D054179) name(hereditary angioedema)',
 'type': 'Disease'}

## Step 3: Find representation of icatibant in BTE

In this step, BioThings Explorer translates our query string into BioThings objects, which contain mappings to many common identifiers. We then pick the BioThings object that best matches what we want. 

Generally, the top result returned by the Hint module for your BioThings type of interest will match what you want, but you should confirm that using the identifiers shown. 


Icatibant is one of the drugs used to treat hereditary angioedema. Note that the query does not work when the drug is ecallantide, Cinryze, Berinert, Ruconest, or lanadelumab: BTE can find these objects (Hint module), but the drug -> disease query returns nothing. This may be an issue with the data sources or with identifier mapping inside the Hint module (not enough IDs).

In [10]:
## the human user gives this input
drug_starting_str = "icatibant"

drug_hint = ht.query(drug_starting_str)
hint_display(drug_starting_str, drug_hint)

There are 3 BioThings objects returned for icatibant:
ChemicalSubstance, ICATIBANT, num of IDs: 12
ChemicalSubstance, ICATIBANT ACETATE, num of IDs: 8
ChemicalSubstance, icatibant, num of IDs: 2


2020-11-26 update: Hint module results changed so the pick selection is changed. All of these `ChemicalSubstance` entries seem to be the right object. We'll pick the `ChemicalSubstance` choice with the most identifiers (the  first option, indexed at 0) for our query. We can look at identifier mappings inside this BioThings object. 

In [11]:
## the human user makes this choice, gives this input
drug_choice_type = 'ChemicalSubstance'
drug_choice_idx = 0  ## icatibant: set as 2,3,4 works

drug_hint_obj = drug_hint[drug_choice_type][drug_choice_idx]  
drug_hint_obj

{'CHEMBL.COMPOUND': 'CHEMBL2028850',
 'DRUGBANK': 'DB06196',
 'PUBCHEM': 6918173,
 'CHEBI': 'CHEBI:68556',
 'UMLS': 'C0246269',
 'MESH': 'C065679',
 'UNII': '325O8467XK',
 'INCHIKEY': 'QURWXBZNHXJZBE-SKXRKSCCSA-N',
 'INCHI': 'InChI=1S/C59H89N19O13S/c60-37(14-5-19-67-57(61)62)48(82)72-38(15-6-20-68-58(63)64)52(86)75-22-8-18-43(75)54(88)77-30-35(80)26-44(77)50(84)70-28-47(81)71-40(27-36-13-9-23-92-36)49(83)74-41(31-79)53(87)76-29-34-12-2-1-10-32(34)24-46(76)55(89)78-42-17-4-3-11-33(42)25-45(78)51(85)73-39(56(90)91)16-7-21-69-59(65)66/h1-2,9-10,12-13,23,33,35,37-46,79-80H,3-8,11,14-22,24-31,60H2,(H,70,84)(H,71,81)(H,72,82)(H,73,85)(H,74,83)(H,90,91)(H4,61,62,67)(H4,63,64,68)(H4,65,66,69)/t33-,35+,37+,38-,39-,40-,41-,42-,43-,44-,45-,46+/m0/s1',
 'name': 'ICATIBANT',
 'CAS': '130308-48-4',
 'IUPAC': '(2S)-2-[[(2S,3aS,7aS)-1-[(3R)-2-[(2S)-2-[[(2S)-2-[[2-[[(2S,4R)-1-[(2S)-1-[(2S)-2-[[(2R)-2-amino-5-guanidino-pentanoyl]amino]-5-guanidino-pentanoyl]prolyl]-4-hydroxy-prolyl]amino]acetyl]amino]-3

## Try 1: SARS &rarr; Disease &larr; icatibant

BTE performs the **query path planning** and **query path execution** by deconstructing the query into individual API calls, executing those API calls, and then assembling the results.

The code block below takes ~2 seconds to run. 

Note that this query does not work when the drug is ecallantide, Cinryze, Berinert, Ruconest, or lanadelumab: BTE can find these objects (Hint module), but the drug -> disease step returns nothing. This may be an issue with the data sources or with identifier mapping inside the Hint module (not enough IDs).

In [12]:
## the human user gives this input
q1_intermediate = 'Disease'

q1 = FindConnection(input_obj = diseaseOrPheno_hint_obj,\
                    output_obj = drug_hint_obj, \
                    intermediate_nodes = q1_intermediate)
q1.connect(verbose = True)


BTE will find paths that join 'severe acute respiratory syndrome' and 'ICATIBANT'. Paths will have 1 intermediate node.

Intermediate node #1 will have these type constraints: Disease



==== Step #1: Query path planning ====

Because severe acute respiratory syndrome is of type 'Disease', BTE will query our meta-KG for APIs that can take 'Disease' as input and 'Disease' as output

BTE found 3 apis:

API 1. semmed_disease(17 API calls)
API 2. cord_disease(1 API call)
API 3. ontology_lookup_service(1 API call)


==== Step #2: Query path execution ====
NOTE: API requests are dispatched in parallel, so the list of APIs below is ordered by query time.

API 1.5: https://biothings.ncats.io/semmed/query?fields=caused_by (POST -d q=C1175175&scopes=umls)
API 2.1: https://biothings.ncats.io/cord_disease/query?fields=associated_with (POST -d q=DOID:2945&scopes=doid)
API 1.2: https://biothings.ncats.io/semmed/query?fields=causes (POST -d q=C1175175&scopes=umls)
API 1.1: https://biothings.ncats.io

In [13]:
q1_r_paths_table = q1.display_table_view()

q1_type = re.findall("dispatcher.([a-zA-Z]+)'", str(type(q1.fc)))
q1_type = "".join(q1_type)  ## convert to string

q1 = None  ## clear memory

In [14]:
## show number of unique intermediate nodes
print("There are {0} unique {1}s linked to both {2} and {3}.".format( \
    q1_r_paths_table.node1_name.nunique(), q1_intermediate, diseaseOrPheno_starting_str, drug_starting_str))

## show number of paths from SARS to icatibant
print("There are {0} unique paths between {1} and {2}.".format( \
    q1_r_paths_table.shape[0], diseaseOrPheno_starting_str, drug_starting_str))

There are 3 unique Diseases linked to both SARS and icatibant.
There are 6 unique paths between SARS and icatibant.


In [15]:
q1_r_paths_table['node1_name'].unique()

array(['BLOOD CLOT', 'HYPOTENSION', 'CONDITION'], dtype=object)
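
Conceptually, a query with one intermediate node keeps the nodes that are linked to both ends. The toy sketch below shows that idea with plain sets. It is not BTE's implementation, and the neighbor sets are invented for illustration: only the three shared names come from the output above.

```python
# hypothetical neighbor sets; only the shared names appear in the real output
diseases_linked_to_sars = {"BLOOD CLOT", "HYPOTENSION", "CONDITION", "PLACEHOLDER A"}
diseases_linked_to_drug = {"BLOOD CLOT", "HYPOTENSION", "CONDITION", "PLACEHOLDER B"}

# an intermediate node must be linked to both the input and the output
shared = diseases_linked_to_sars & diseases_linked_to_drug

print(sorted(shared))  # ['BLOOD CLOT', 'CONDITION', 'HYPOTENSION']
```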

## Try 2: SARS &rarr; Disease &larr; hereditary angioedema

BTE performs the **query path planning** and **query path execution** by deconstructing the query into individual API calls, executing those API calls, and then assembling the results.

The code block below takes ~2 seconds to run. 

Note that this query does not work when the intermediate node type is `PhenotypicFeature`.

In [16]:
## the human user gives this input
q2_intermediate = 'Disease'

q2 = FindConnection(input_obj = diseaseOrPheno_hint_obj,\
                    output_obj = diseaseOrPheno2_hint_obj, \
                    intermediate_nodes = q2_intermediate)
q2.connect(verbose = True)


BTE will find paths that join 'severe acute respiratory syndrome' and 'hereditary angioedema'. Paths will have 1 intermediate node.

Intermediate node #1 will have these type constraints: Disease



==== Step #1: Query path planning ====

Because severe acute respiratory syndrome is of type 'Disease', BTE will query our meta-KG for APIs that can take 'Disease' as input and 'Disease' as output

BTE found 3 apis:

API 1. semmed_disease(17 API calls)
API 2. cord_disease(1 API call)
API 3. ontology_lookup_service(1 API call)


==== Step #2: Query path execution ====
NOTE: API requests are dispatched in parallel, so the list of APIs below is ordered by query time.

API 1.17: https://biothings.ncats.io/semmed/query?fields=treats (POST -d q=C1175175&scopes=umls)
API 1.11: https://biothings.ncats.io/semmed/query?fields=negatively_regulated_by (POST -d q=C1175175&scopes=umls)
API 1.15: https://biothings.ncats.io/semmed/query?fields=negatively_regulates (POST -d q=C1175175&scopes=umls)
API 1.9:

In [17]:
q2_r_paths_table = q2.display_table_view()

q2_type = re.findall("dispatcher.([a-zA-Z]+)'", str(type(q2.fc)))
q2_type = "".join(q2_type)  ## convert to string

q2 = None  ## clear memory

In [18]:
## show number of unique intermediate nodes
print("There are {0} unique {1}s linked to both {2} and {3}.".format( \
    q2_r_paths_table.node1_name.nunique(), q2_intermediate, diseaseOrPheno_starting_str, diseaseOrPheno2_starting_str))

## show number of paths from SARS to hereditary angioedema
print("There are {0} unique paths between {1} and {2}.".format( \
    q2_r_paths_table.shape[0], diseaseOrPheno_starting_str, diseaseOrPheno2_starting_str))

There are 24 unique Diseases linked to both SARS and hereditary angioedema.
There are 65 unique paths between SARS and hereditary angioedema.


## Try 3: SARS &rarr; ChemicalSubstance &larr; hereditary angioedema

BTE performs the **query path planning** and **query path execution** by deconstructing the query into individual API calls, executing those API calls, and then assembling the results.

The code block below takes ~4 seconds to run. 

In [19]:
## the human user gives this input
q3_intermediate = 'ChemicalSubstance'

q3 = FindConnection(input_obj = diseaseOrPheno_hint_obj,\
                    output_obj = diseaseOrPheno2_hint_obj, \
                    intermediate_nodes = q3_intermediate)
q3.connect(verbose = True)


BTE will find paths that join 'severe acute respiratory syndrome' and 'hereditary angioedema'. Paths will have 1 intermediate node.

Intermediate node #1 will have these type constraints: ChemicalSubstance



==== Step #1: Query path planning ====

Because severe acute respiratory syndrome is of type 'Disease', BTE will query our meta-KG for APIs that can take 'Disease' as input and 'ChemicalSubstance' as output

BTE found 8 apis:

API 1. hmdb(1 API call)
API 2. mychem(2 API calls)
API 3. scibite(1 API call)
API 4. semmed_disease(15 API calls)
API 5. mydisease(1 API call)
API 6. cord_disease(1 API call)
API 7. scigraph(1 API call)
API 8. pharos(1 API call)


==== Step #2: Query path execution ====
NOTE: API requests are dispatched in parallel, so the list of APIs below is ordered by query time.

API 5.1: https://mydisease.info/v1/query?fields=ctd.chemical_related_to_disease (POST -d q=D045169&scopes=mondo.xrefs.mesh, disgenet.xrefs.mesh)
API 4.1: https://biothings.ncats.io/semmed/quer

In [20]:
q3_r_paths_table = q3.display_table_view()

q3_type = re.findall("dispatcher.([a-zA-Z]+)'", str(type(q3.fc)))
q3_type = "".join(q3_type)  ## convert to string

q3 = None  ## clear memory

In [21]:
## show number of unique intermediate nodes
print("There are {0} unique {1}s linked to both {2} and {3}.".format( \
    q3_r_paths_table.node1_name.nunique(), q3_intermediate, diseaseOrPheno_starting_str, diseaseOrPheno2_starting_str))

## show number of paths from SARS to hereditary angioedema
print("There are {0} unique paths between {1} and {2}.".format( \
    q3_r_paths_table.shape[0], diseaseOrPheno_starting_str, diseaseOrPheno2_starting_str))

There are 11 unique ChemicalSubstances linked to both SARS and hereditary angioedema.
There are 26 unique paths between SARS and hereditary angioedema.


In [22]:
q3_r_paths_table['node1_name'].unique()

array(['C0243077', 'C0450442', 'C0544357', 'C1611640', 'CHEBI:35222',
       'CHEBI:48433', 'PHARMACEUTICAL PREPARATIONS',
       'INHIBITOR, PROTEASE', 'ADSORBED HEPATITIS B VACCINE', 'ESTROGEN',
       'ANIONIC TRNA POLYMER'], dtype=object)
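
Several of the intermediate nodes above are shown by identifier rather than by name. A small sketch for separating the two groups, assuming that a `C` followed by seven digits is a UMLS concept identifier and that a `CHEBI:` prefix marks a ChEBI identifier:

```python
import re

# values copied from the output above
node_names = [
    "C0243077", "C0450442", "C0544357", "C1611640", "CHEBI:35222",
    "CHEBI:48433", "PHARMACEUTICAL PREPARATIONS", "INHIBITOR, PROTEASE",
    "ADSORBED HEPATITIS B VACCINE", "ESTROGEN", "ANIONIC TRNA POLYMER",
]

UMLS_CUI = re.compile(r"C\d{7}")      # assumed shape of a UMLS identifier
CHEBI_ID = re.compile(r"CHEBI:\d+")   # assumed shape of a ChEBI identifier

unnamed = [n for n in node_names if UMLS_CUI.fullmatch(n) or CHEBI_ID.fullmatch(n)]
named = [n for n in node_names if n not in unnamed]

print(len(unnamed), "nodes have only an identifier")  # 6
print(len(named), "nodes have a readable name")       # 5
```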

## Try 4: hereditary angioedema &rarr; ChemicalSubstance

BTE performs the **query path planning** and **query path execution** by deconstructing the query into individual API calls, executing those API calls, and then assembling the results.

The code block below takes ~4 seconds to run. 

Note that the drug-based query does not work when the drug is ecallantide, Cinryze, Berinert, Ruconest, or lanadelumab: BTE can find these objects (Hint module), but the drug -> disease query returns nothing. This may be an issue with the data sources or with identifier mapping inside the Hint module (not enough IDs).

To investigate, this query checks which `ChemicalSubstance` nodes are returned for hereditary angioedema at all.

In [23]:
## the human user gives this input
q4_intermediate = None

q4 = FindConnection(input_obj = diseaseOrPheno2_hint_obj,\
                    output_obj = 'ChemicalSubstance', \
                    intermediate_nodes = q4_intermediate)
q4.connect(verbose = True)


BTE will find paths that join 'hereditary angioedema' and 'ChemicalSubstance'. Paths will have 0 intermediate node.




==== Step #1: Query path planning ====

Because hereditary angioedema is of type 'Disease', BTE will query our meta-KG for APIs that can take 'Disease' as input and 'ChemicalSubstance' as output

BTE found 8 apis:

API 1. hmdb(1 API call)
API 2. mychem(2 API calls)
API 3. scibite(1 API call)
API 4. semmed_disease(15 API calls)
API 5. mydisease(1 API call)
API 6. cord_disease(1 API call)
API 7. scigraph(1 API call)
API 8. pharos(1 API call)


==== Step #2: Query path execution ====
NOTE: API requests are dispatched in parallel, so the list of APIs below is ordered by query time.

API 5.1: https://mydisease.info/v1/query?fields=ctd.chemical_related_to_disease (POST -d q=D054179&scopes=mondo.xrefs.mesh, disgenet.xrefs.mesh)
API 4.15: https://biothings.ncats.io/semmed/query?fields=positively_regulated_by (POST -d q=CN239191,C0019243&scopes=umls)
API 4.3: https://biothing

In [24]:
q4_r_paths_table = q4.display_table_view()

q4_type = re.findall("dispatcher.([a-zA-Z]+)'", str(type(q4.fc)))
q4_type = "".join(q4_type)  ## convert to string

q4 = None  ## clear memory

In [25]:
q4_r_paths_table['output_name'].unique()

array(['C0243076', 'C0243077', 'C0351241', 'C0450442', 'C0544357',
       'C0596537', 'C0597512', 'C1611640', 'C1305959', 'CHEBI:35222',
       'CHEBI:48433', 'CHEBI:8583', 'CHEBI:48705', 'CHEBI:48706',
       'CHEBI:37958', 'CHEBI:64926', 'CHEBI:33697', 'CHEBI:17891',
       'CHEBI:36080', 'CHEBI:24505',
       '(S)-1-(N(2)-(1-CARBOXY-3-PHENYLPROPYL)-L-LYSYL)-L-PROLINE',
       '(3S)-2-{N-[(2S)-1-ETHOXYCARBONYL-4-PHENYLBUTAN-2-YL]-L-ALANYL}-1,2,3,4-TETRAHYDROISOQUINOLINE-3-CARBOXYLIC ACID',
       '(2S-(1(R*(R*)),2ALPHA,3ABETA,6ABETA))-1-(2-((1-(ETHOXYCARBONYL)-3-PHENYLPROPYL)AMINO)-1-OXOPROPYL)OCTAHYDROCYCLOPENTA(B)PYRROLE-2-CARBOXYLIC ACID',
       '(-)-CAPTOPRIL', 'ENALAPRIL ACID',
       '(S)-1-(N-(1-(ETHOXYCARBONYL)-3-PHENYLPROPYL)-L-ALANYL)-L-PROLINE',
       '(7E,9E,11E,13Z)-RETINOIC ACID', '(+/-)-VERAPAMIL',
       '1H-1-BENZAZEPINE-1-ACETIC ACID, 3-((1-(ETHOXYCARBONYL)-3-PHENYLPROPYL)AMINO)-2,3,4,5-TETRAHYDRO-2-OXO-, (S-(R*,R*))-',
       '6-CHLORO-1,1-DIOXO-1,2-DIHYDRO-1LAMB