# COVID, Pathways / BiologicalProcesses, and the bradykinin storm article

2020-11-26 update: the code that removes unresolved UMLS IDs now runs before the report of how many entities are in the answer knowledge graph (KG).

## Introduction

**To experiment with an executable version of this notebook, [load it in Google Colaboratory](https://colab.research.google.com/github/colleenXu/biothings_explorer/blob/relay/jupyter%20notebooks/CX_WIPs/TranslatorUseCase_COVID_PathwaysBP_newPredict.ipynb).**

The ["bradykinin storm" mechanism article](https://elifesciences.org/articles/59177) hypothesizes that RAS- and bradykinin-related pathways are linked to COVID-19 symptoms. 

Using BTE, we can explore this line of reasoning using the following query templates:

* `Disease` &rarr; `PhenotypicFeature` &rarr; `Gene` &rarr; `Pathway`
* `Disease` &rarr; `PhenotypicFeature` &rarr; `Gene` &rarr; `BiologicalProcess`

Our specific disease of interest is severe acute respiratory syndrome (SARS), which we use as a proxy for COVID-19.

Notes:

BioThings Explorer (BTE) can answer two classes of queries: "EXPLAIN" and "PREDICT". This question fits the PREDICT template of starting with **a specific biomedical entity** (a specific `Disease` X) and finding relationships with **one biomedical entity type** (like `PhenotypicFeature` or `Gene`).
* Note that a `Protein` biomedical entity type is not currently implemented in BTE. Instead, protein-coding genes and some non-coding genes are represented as `Gene` entities.

* This query returns a graph object with entities as nodes and relationships as edges. We then use edge provenance information to **filter** the results. For each output node (a `Pathway` or `BiologicalProcess`), we use the number of unique paths from SARS (the input node) to that node to **score** it. The scores can then be used to sort the results.
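The provenance-based filtering mentioned above can be sketched on a toy table. This is a minimal illustration of the substring-matching idea used by the `filter_table` function defined in Step 0, not a replacement for it. The column name `pred1_source` and all row values here are made up for the example.

```python
import pandas as pd

# Toy path table; the column name and values are assumptions for illustration
toy = pd.DataFrame({
    "pred1_source": ["SEMMED", "hpo", None, "CTD"],
    "output_label": ["A", "B", "C", "D"],
})

# Sources whose rows we want to drop
blocked = ["SEMMED", "CTD"]

# True where the source does NOT contain any blocked string;
# na=False treats missing sources as "no match", so those rows are kept
keep = ~toy["pred1_source"].str.contains("|".join(blocked), na=False)
filtered = toy[keep]

print(filtered)
```

Rows with a missing source are kept here; whether that is the desired behavior depends on how much you trust edges without provenance.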

## Step 0: Load BTE modules, notebook functions

In [None]:
%%capture
## for Google Colab (the %%capture cell magic must be the first line of the cell)
!pip install git+https://github.com/colleenXu/biothings_explorer@relay#egg=biothings_explorer

In [1]:
## CX: allows multiple lines of code to print from one code block
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

# import modules from biothings_explorer
from biothings_explorer.hint import Hint
from biothings_explorer.query.predict import Predict  ## new Predict will run

## show time that this notebook was executed 
from datetime import datetime

## packages to work with objects 
import re

## to get around bugs
import nest_asyncio
nest_asyncio.apply()

In [2]:
## functions to add to modules?
def hint_display(query, hint_result):
    """
    show the type, name, number of IDs for all results returned by the query
    
    :param: query: string used in hint query
    :param: hint_result: object returned from hint query, a dictionary of lists of dictionaries
    
    Returns: None
    """
    ## function needs to be rewritten if it's going to give the exact index of each object within its type 
    display = ['type', 'name']  ## replace with the parts of the BioThings object you want to see
    concise_results = []
    for BT_type, result in hint_result.items():
        if result:  ## basically if it's not empty
            for items in result:
                ## number of identifiers per object: number of keys - 4 (name, primary, display, type)
                temp = len(items) - 4
                concise_results.append((items[display[0]], items[display[1]], 
                                         str(temp)))
                    
    print('There are {total} BioThings objects returned for {ht}:'.format(\
                total = len(concise_results), ht = query))
    for display_info in concise_results:
        print('{0}, {1}, num of IDs: {2}'.format(display_info[0], display_info[1], display_info[2]))

In [3]:
def filter_table(df):
    """
    use _source and _method columns to remove rows (paths) from the dataframe
    :param: pandas dataframe containing results from BTE FindConnection module, in table form
    
    Returns: filtered dataframe
    """
    ## note: still needs checking with EXPLAIN queries
    ## key is the string to match to column, value is a list of strings to match to column values
    filter_out = {'_source': ['SEMMED', 'CTD', 'ctd', 'omia']   
#                   '_method': []  ## currently no method stuff I want to filter out
                 }
    ## SEMMED: text mining results wrong for PhenotypicFeature -> Gene
    ## CTD/ctd: results odd for MSUD -> ChemicalSubstance
    ## omia: results wrong or discontinued gene IDs for PhenotypicFeature -> Gene
    
    
    df_temp = df.copy()  ## so the original df isn't modified in-place
    for key,val in filter_out.items():
        ## find columns that match the key string
        columns = [i for i in df_temp.columns if key in i]
        ## iterate through each column
        for col in columns:
            ## iterate through each value to take out, check if string CONTAINS match. 
            ## only keep rows that don't contain the value
            for i in val:
                df_temp = df_temp[~ df_temp[col].str.contains(i, na = False)]
    return df_temp

In [4]:
## set for new predict, note that it's using labels and not ids....(change that behavior?)
def scoring_output(df, q_type):
    """
    score results based on whether query was Predict or Explain type, number of 
        intermediate nodes 
    :param: pandas dataframe containing results from BTE FindConnection module
    :param: string describing type of query (Predict or Explain)
    
    May flatten some edges, because score only counts one edge per 
        unique predicate / API / method (ignoring source and pubmed col)
    
    Predict queries: score each output node by counting # of paths
        from input nodes to it. Normalize by dividing by maximum
        possible # of paths
    Explain two-hop (one intermediate) queries: score each intermediate node by 
        counting # of paths (between input and output nodes) that include it. 
        Normalize by dividing by maximum possible # of paths    

    Explain one-hop (direct) queries: no need to score, prints message
    Other Explain queries (many-hops): currently not able to score, prints message     
    
    Returns: pandas dataframe with 'score' and 'rank' columns, indexed by the label of the scored node
             or None (one-hop or many-hop Explain query)
    """
    df_temp = df.copy()  ## so no chance to mutate this   
    flag_direct = False  ## one-hop query or not
    ## use df_col to look quicker into columns
    df_col = set(df_temp.columns)
    
    ## ignore source and pubmed col in looking at unique edges 
    columns_drop = [col for col in df_col if (('_source' in col) or ('_publications' in col))]
    df_temp.drop(columns = columns_drop, inplace = True)    
    df_temp.drop_duplicates(inplace = True)
    
    ## check if query is one-hop or not
    if "node1_label" not in df_col:    ## name for first intermediate node layer
        flag_direct = True  
    
    if q_type == 'Explain':
        if flag_direct:   # one hop / no intermediates
            print('No valid node scoring for one-hop (direct) Explain queries.')
            return None
        ## if there are many-hops/intermediate layers
        elif "node2_label" in df_col:  ## name for 2nd intermed. node layer
            print('Cannot currently score many-hop Explain queries.')
            return None
        else:   ## two-hop / 1 intermediate layer
            ## count multi-edges to results (the intermediate node1 col)
            scores = df_temp.node1_label.value_counts() 
            ## to find the maximum-possible number of edges, look at non-result cols
            columns_drop = [col for col in df_col if 'node1' in col]
            df_temp.drop(columns = columns_drop, inplace = True)
            ## now look at number of unique combos for input, edge info, output
            df_temp.drop_duplicates(inplace = True)
            max_paths = df_temp.shape[0]            
            ## normalize scores by dividing each by max number of paths
            scores = scores / max_paths

    else:  ## Predict type query
        ## count multi-edges to results (the output col)
        scores = df_temp.output_label.value_counts()
        ## to find the maximum number of multi-edges, look at non-output col
        columns_drop = [col for col in df_temp.columns if 'output' in col]
        df_temp.drop(columns = columns_drop, inplace = True)
        ## now look at number of unique paths possible
        df_temp.drop_duplicates(inplace = True)
        max_paths = df_temp.shape[0]
        ## normalize scores by dividing each by max number of paths
        scores = scores / max_paths
            
    ## return scores as pandas dataframe, with rank
    scores = scores.to_frame(name = 'score') 
    scores['rank'] = scores['score'].rank(method = 'dense', ascending = False)
    return scores

In [5]:
## record when cell blocks are executed
print('The time that this notebook was executed is...')
print('Local time (PST, West Coast USA): ')
print(datetime.now())
print('UTC time: ')
print(datetime.utcnow())

The time that this notebook was executed is...
Local time (PST, West Coast USA): 
2020-11-26 16:47:24.271129
UTC time: 
2020-11-27 00:47:24.271325


## Step 1: Find representation of "SARS" in BTE

In this step, BioThings Explorer translates our query string "SARS"  into BioThings objects, which contain mappings to many common identifiers. We then pick the BioThings object that best matches what we want. 

Generally, the top result returned by the Hint module for your BioThings type of interest will match what you want, but you should confirm that using the identifiers shown. 


> BioThings types correspond to children and descendants of [BiologicalEntity](https://biolink.github.io/biolink-model/docs/BiologicalEntity.html) from the [Biolink Model](https://biolink.github.io/biolink-model/docs/), including `Disease` (e.g., "lupus"), `ChemicalSubstance` (e.g., "acetaminophen"), `Gene` (e.g., "CDK2"), `BiologicalProcess` (e.g., "T cell differentiation"), and `Pathway` (e.g., "Citric acid cycle"). **However, [only a subset of the Biolink BiologicalEntity children / descendants are currently implemented in BTE](https://smart-api.info/portal/translator/metakg)**. More biomedical object types will be available as more knowledge sources (APIs) are added to the system. **Note that the type `BiologicalEntity` means any BioThings type currently implemented in BTE will be accepted.**

In [6]:
ht = Hint()  ## neater way to call this BTE module

## the human user gives this input
disease_starting_str = "SARS"

disease_hint = ht.query(disease_starting_str)
hint_display(disease_starting_str, disease_hint)

There are 10 BioThings objects returned for SARS:
ChemicalSubstance, Anti-SARS-CoV-2 REGN-COV2, num of IDs: 1
Disease, severe acute respiratory syndrome, num of IDs: 5
Disease, COVID-19, num of IDs: 2
Disease, COVID-19–associated multisystem inflammatory syndrome in children, num of IDs: 2
Disease, IMD74, num of IDs: 0
MolecularActivity, selenocysteine-tRNA ligase activity, num of IDs: 2
MolecularActivity, mRNA (guanine-N7-)-methyltransferase activity, num of IDs: 3
MolecularActivity, 5'-3' RNA helicase activity, num of IDs: 2
MolecularActivity, mRNA (nucleoside-2'-O-)-methyltransferase activity, num of IDs: 3
MolecularActivity, RNA-directed 5'-3' RNA polymerase activity, num of IDs: 3


Note: BTE failed to retrieve `Disease` &rarr; `PhenotypicFeature` associations for COVID-19 (a sibling of SARS in the Mondo ontology) and for Orthocoronavirinae infectious disease (the parent of COVID-19 and SARS in the Mondo ontology). 

So we'll pick the SARS `Disease` choice (index 0 among the `Disease` results) for our query. We can look at the identifier mappings inside this BioThings object. 

In [7]:
## the human user makes this choice, gives this input
disease_choice_type = 'Disease'
disease_choice_idx = 0

disease_hint_obj = disease_hint[disease_choice_type][disease_choice_idx]  
disease_hint_obj
## these inner dictionaries are keys = id type, 
##       values = curie, normal string, or dictionary (for the key 'primary')

{'MONDO': 'MONDO:0005091',
 'DOID': 'DOID:2945',
 'UMLS': 'C1175175',
 'name': 'severe acute respiratory syndrome',
 'MESH': 'D045169',
 'ORPHANET': '140896',
 'primary': {'identifier': 'MONDO',
  'cls': 'Disease',
  'value': 'MONDO:0005091'},
 'display': 'MONDO(MONDO:0005091) DOID(DOID:2945) ORPHANET(140896) UMLS(C1175175) MESH(D045169) name(severe acute respiratory syndrome)',
 'type': 'Disease'}
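
When confirming that a Hint result is the intended entity, it can help to list only its identifier mappings. The sketch below uses a hypothetical helper, `list_identifiers`, which is not part of BTE. It is applied to a hand-copied version of the object shown above and assumes the non-identifier keys are `name`, `primary`, `display`, and `type`, as in `hint_display`.

```python
# Hypothetical helper for illustration; not part of biothings_explorer
NON_ID_KEYS = {"name", "primary", "display", "type"}

def list_identifiers(hint_obj):
    """Return {id_type: value} for the identifier keys of a Hint-style dict."""
    return {key: val for key, val in hint_obj.items() if key not in NON_ID_KEYS}

# Hand-copied from the output above
sars_obj = {
    "MONDO": "MONDO:0005091",
    "DOID": "DOID:2945",
    "UMLS": "C1175175",
    "name": "severe acute respiratory syndrome",
    "MESH": "D045169",
    "ORPHANET": "140896",
    "primary": {"identifier": "MONDO", "cls": "Disease", "value": "MONDO:0005091"},
    "display": "MONDO(MONDO:0005091) DOID(DOID:2945) ORPHANET(140896) "
               "UMLS(C1175175) MESH(D045169) name(severe acute respiratory syndrome)",
    "type": "Disease",
}

ids = list_identifiers(sars_obj)
for id_type, value in sorted(ids.items()):
    print(id_type, value)
```

The five identifiers returned agree with the "num of IDs: 5" reported for this object by `hint_display`.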

## SARS &rarr; PhenotypicFeature &rarr; Gene &rarr; Pathway

In this section, we dynamically generate a knowledge graph with paths connecting SARS to pathways *using PhenotypicFeature and Gene intermediates*.  

BTE performs the **query path planning** and **query path execution** by deconstructing the query into individual API calls, executing those API calls, and then assembling the results.

The code block below takes less than a minute to run.   

In [8]:
## the human user gives this input
q1_output_type = 'Pathway'
q1_intermediate = ['PhenotypicFeature', 'Gene']

## uses newer version of BTE Predict function
q1 = Predict(input_objs = [disease_hint_obj],\
             output_types = [q1_output_type], \
             intermediate_nodes = q1_intermediate,
             config = {})  ## no configs set
q1.connect(verbose = False)

In [9]:
# q1_r_graph = q1.fc.G   ## for changing the graph object to reflect the table
q1_r_paths_table = q1.display_table_view()

q1_type = re.findall("query.predict.([a-zA-Z]+)'", str(type(q1)))
q1_type = "".join(q1_type)  ## convert to string

q1 = None  ## clear memory

We can see the number of PhenotypicFeatures that were linked to SARS, the number of Genes linked to those PhenotypicFeatures, the number of Pathways returned as output nodes, and the total number of paths from SARS to Pathway nodes. 

In [10]:
## show number of unique intermediate nodes
print("There are {0} unique {1}s for {2}.".format( \
    q1_r_paths_table.node1_label.nunique(), q1_intermediate[0], disease_starting_str))

print("There are {0} unique {1}s for {2}.".format( \
    q1_r_paths_table.node2_label.nunique(), q1_intermediate[1], disease_starting_str))

## show number of unique output nodes
print("There are {0} unique {1}s for {2}.".format( \
    q1_r_paths_table.output_label.nunique(), q1_output_type, disease_starting_str))

## show number of paths from disease to pathways
print("There are {0} unique paths.".format( \
    q1_r_paths_table.shape[0]))

There are 16 unique PhenotypicFeatures for SARS.
There are 805 unique Genes for SARS.
There are 2059 unique Pathways for SARS.
There are 51768 unique paths.


The PhenotypicFeatures (symptoms) linked to SARS, our proxy for COVID-19, that were used as intermediate nodes:

In [11]:
q1_r_paths_table['node1_label'].unique()

array(['Cough', 'Headache', 'Neoplasm', 'Fever', 'Dyspnea',
       'Respiratory distress', 'Myalgia', 'Immunodeficiency',
       'Diabetes mellitus',
       'Respiratory failure requiring assisted ventilation',
       'Acute infectious pneumonia', 'Hypoxemia',
       'Abnormality of the cardiovascular system', 'Pharyngitis',
       'Acute kidney injury', 'Chronic lung disease'], dtype=object)

### Scoring

Scoring for Predict queries (the type of query used here) relies on the assumption that the user is most interested in Pathways that are connected to SARS through many intermediate nodes (phenotypes and genes). 

1. To score individual Pathway nodes, we first take a copy of the knowledge graph (KG) and remove some multi-edges. 
    * Each edge has predicate, API, method, source, and pubmed information. For scoring purposes, we ignore pubmed and source information because APIs handle this information differently (returning multiple edges or single edges). 
2. We then count the number of paths from the SARS node to each Pathway node.
3. Finally, we "normalize" the score by dividing those counts by the maximum possible number of paths from the SARS node to a Pathway node. In `scoring_output`, this maximum is the number of unique paths that remain after the output-node columns are dropped.

We can then see the top-scored nodes. A score closer to 1 means that many PhenotypicFeatures and Genes link SARS and the Pathway node. A score closer to 0 means that only a few PhenotypicFeatures and Genes link SARS and the Pathway node. 
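
The three scoring steps above can be traced on a toy table. This is a simplified sketch and not the notebook's `scoring_output` function: the rows and the `pred3_source` column are made up, and only the Predict branch is illustrated.

```python
import pandas as pd

# Toy path table (made-up values): one row per path from SARS to an output node
paths = pd.DataFrame({
    "input_label":  ["SARS"] * 6,
    "node1_label":  ["Fever", "Fever", "Fever", "Cough", "Cough", "Cough"],
    "node2_label":  ["IL6", "IL6", "TNF", "IL6", "IL6", "CFTR"],
    "pred3_source": ["A", "B", "A", "A", "A", "A"],
    "output_label": ["Immune System", "Immune System", "Immune System",
                     "Immune System", "Metabolism", "Ion transport"],
})

# Step 1: ignore source columns, then drop the multi-edges this exposes
source_cols = [col for col in paths.columns if "_source" in col]
unique_paths = paths.drop(columns=source_cols).drop_duplicates()

# Step 2: count the paths that reach each output node
counts = unique_paths["output_label"].value_counts()

# Step 3: the maximum possible number of paths is the number of unique
# partial paths left once the output columns are removed
output_cols = [col for col in unique_paths.columns if "output" in col]
max_paths = unique_paths.drop(columns=output_cols).drop_duplicates().shape[0]

scores = (counts / max_paths).to_frame(name="score")
scores["rank"] = scores["score"].rank(method="dense", ascending=False)
print(scores)
```

In this toy table, "Immune System" is reached by 3 of the 4 unique partial paths and scores 0.75, while the other two outputs are each reached by 1 path and tie at 0.25 with a dense rank of 2.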

In [14]:
## create scoring table for Pathways (output nodes)
q1_scoring = scoring_output(q1_r_paths_table, q1_type)
q1_scoring.index.name = 'output_label'
q1_scoring.reset_index(inplace = True)

q1_scoring.head(20)

| | output_label | score | rank |
|---|---|---|---|
| 0 | Immune System | 0.292723 | 1.0 |
| 1 | Signal Transduction | 0.280458 | 2.0 |
| 2 | Metabolism | 0.250204 | 3.0 |
| 3 | Disease | 0.239575 | 4.0 |
| 4 | Metabolism of proteins | 0.206051 | 5.0 |
| 5 | Cytokine Signaling in Immune system | 0.146361 | 6.0 |
| 6 | Gene expression (Transcription) | 0.141455 | 7.0 |
| 7 | Innate Immune System | 0.140638 | 8.0 |
| 8 | RNA Polymerase II Transcription | 0.132461 | 9.0 |
| 9 | Post-translational protein modification | 0.130826 | 10.0 |


Notice that multiple top results have to do with the immune system, infectious disease, cytokines, and interleukins. 

### Comparing answers to bradykinin article

We manually took keywords from the parts of the ["bradykinin storm" mechanism article](https://elifesciences.org/articles/59177) that discussed pathways and mechanisms, and we searched for them in the pathways returned by BTE. The keywords that returned results and the results found are shown below. 

Notice that while RAS signaling and other processes mentioned in the article are found in BTE, they are not the top-scoring results overall. 

In [15]:
## note: each keyword must be separated by '|', including across line continuations
path_mask = 'RAS|ACE Inhibitor|Vascular|Coagulation|Clot|Inflamm|inflamm|'\
            'Vasopressin|Neutrophil|Fibrin|Cardiac|cardiac|Vitamin D|'\
            'vitamin D|Potassium'

q1_scoring_BK = q1_scoring[q1_scoring['output_label'].str.contains(path_mask)].copy()
q1_scoring_BK.head(10)

| | output_label | score | rank |
|---|---|---|---|
| 38 | Neutrophil degranulation | 0.055601 | 34.0 |
| 165 | Cardiac conduction | 0.023712 | 67.0 |
| 190 | Cardiac Progenitor Differentiation | 0.022077 | 69.0 |
| 213 | Calcium Regulation in the Cardiac Cell | 0.020442 | 71.0 |
| 309 | Inflammatory Response Pathway | 0.016353 | 76.0 |
| 337 | Cardiac Hypertrophic Response | 0.014718 | 78.0 |
| 399 | Resistin as a regulator of inflammation | 0.013083 | 80.0 |
| 452 | Relationship between inflammation, COX-2 and EGFR | 0.011447 | 82.0 |
| 492 | Inflammasomes | 0.01063 | 83.0 |
| 525 | Fibrin Complement Receptor 3 Signaling Pathway | 0.009812 | 84.0 |
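
The keyword search above lists both capitalizations of some terms (for example `Inflamm|inflamm`). A possible alternative, sketched here on a handful of labels copied from the tables above, is to build the pattern from a keyword list and match case-insensitively with `case=False`. The keyword list in this sketch is a shortened, illustrative one.

```python
import pandas as pd

# A few pathway labels copied from the outputs above
labels = pd.Series([
    "Neutrophil degranulation",
    "Cardiac conduction",
    "Inflammasomes",
    "Metabolism of proteins",
    "Signal Transduction",
])

# Shortened, illustrative keyword list; one entry per concept
keywords = ["neutrophil", "cardiac", "inflamm", "vitamin d"]
mask = "|".join(keywords)

# case=False makes the regex match regardless of capitalization;
# na=False treats missing labels as non-matches
hits = labels[labels.str.contains(mask, case=False, na=False)]
print(hits.tolist())
```

Short acronyms such as `RAS` are probably better kept case-sensitive, since a case-insensitive `ras` would also match inside unrelated words.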


### APIs used

Different knowledge sources (APIs) were called in different parts of the query. 

In the first part of the query (SARS &rarr; PhenotypicFeature), the following APIs returned results and the following predicates (semantic relationships) were found. 

In [16]:
## show that the APIs use different predicates
q1_r_paths_table[['pred1_api', 'pred1']].drop_duplicates().sort_values(by = ['pred1_api', 'pred1'])

| | pred1_api | pred1 |
|---|---|---|
| 7 | BioLink API | has_phenotype |
| 0 | mydisease.info API | related_to |


In the second part of the query (PhenotypicFeature &rarr; Gene), the following APIs returned results and the following predicates (semantic relationships) were found. 

In [17]:
q1_r_paths_table[['pred2_api', 'pred2']].drop_duplicates().sort_values(by = ['pred2_api', 'pred2'])

| | pred2_api | pred2 |
|---|---|---|
| 0 | BioLink API | related_to |
| 284 | EBIgene2phenotype API | related_to |


In the third part of the query (Gene &rarr; Pathway), the following APIs returned results and the following predicates (semantic relationships) were found.

In [18]:
## show that the APIs use different predicates
q1_r_paths_table[['pred3_api', 'pred3']].drop_duplicates().sort_values(by = ['pred3_api', 'pred3'])

| | pred3_api | pred3 |
|---|---|---|
| 0 | MyGene.info API | functional_association |


## SARS &rarr; PhenotypicFeature &rarr; Gene &rarr; BiologicalProcess

In this section, we dynamically generate a knowledge graph with paths connecting SARS to biological processes (GO biological process terms) *using PhenotypicFeature and Gene intermediates*.  

BTE performs the **query path planning** and **query path execution** by deconstructing the query into individual API calls, executing those API calls, and then assembling the results.

The code block below takes less than a minute to run.   

In [19]:
## the human user gives this input
q2_output_type = 'BiologicalProcess'
q2_intermediate = ['PhenotypicFeature', 'Gene']

q2 = Predict(input_objs = [disease_hint_obj],\
             output_types = [q2_output_type], \
             intermediate_nodes = q2_intermediate, \
             config = {})  ## no configs set
q2.connect(verbose = False)

In [20]:
# q2_r_graph = q2.fc.G   ## for changing the graph object to reflect the table
q2_r_paths_table = q2.display_table_view()

q2_type = re.findall("predict.([a-zA-Z]+)'", str(type(q2)))
q2_type = "".join(q2_type)  ## convert to string

q2 = None  ## clear memory

In [22]:
## current bug: issue resolving UMLS ids, removing them for now
## na = False keeps the mask boolean even if some labels are missing
q2_r_paths_table = q2_r_paths_table[~ q2_r_paths_table['output_label'].str.contains('UMLS:', na = False)]

In [23]:
## show number of unique intermediate nodes
print("There are {0} unique {1}s for {2}.".format( \
    q2_r_paths_table.node1_label.nunique(), q2_intermediate[0], disease_starting_str))

print("There are {0} unique {1}s for {2}.".format( \
    q2_r_paths_table.node2_label.nunique(), q2_intermediate[1], disease_starting_str))

## show number of unique output nodes
print("There are {0} unique {1}s for {2}.".format( \
    q2_r_paths_table.output_label.nunique(), q2_output_type, disease_starting_str))

## show number of paths from disease to biological processes
print("There are {0} unique paths.".format( \
    q2_r_paths_table.shape[0]))

There are 16 unique PhenotypicFeatures for SARS.
There are 924 unique Genes for SARS.
There are 5736 unique BiologicalProcesss for SARS.
There are 73030 unique paths.


The PhenotypicFeatures (symptoms) linked to SARS, our proxy for COVID-19, that were used as intermediate nodes:

In [24]:
q2_r_paths_table['node1_label'].unique()

array(['Cough', 'Headache', 'Neoplasm', 'Fever', 'Dyspnea',
       'Respiratory distress', 'Myalgia', 'Immunodeficiency',
       'Diabetes mellitus',
       'Respiratory failure requiring assisted ventilation',
       'Acute infectious pneumonia', 'Hypoxemia', 'Chronic lung disease',
       'Abnormality of the cardiovascular system', 'Acute kidney injury',
       'Pharyngitis'], dtype=object)

### Scoring

Scoring for Predict queries (the type of query used here) relies on the assumption that the user is most interested in BiologicalProcesses that are connected to SARS through many intermediate nodes (phenotypes and genes). 

1. To score individual BiologicalProcess nodes, we first take a copy of the knowledge graph (KG) and remove some multi-edges. 
    * Each edge has predicate, API, method, source, and pubmed information. For scoring purposes, we ignore pubmed and source information because APIs handle this information differently (returning multiple edges or single edges). 
2. We then count the number of paths from the SARS node to each BiologicalProcess node.
3. Finally, we "normalize" the score by dividing those counts by the maximum possible number of paths from the SARS node to a BiologicalProcess node.

We can then see the top-scored nodes. A score closer to 1 means that many PhenotypicFeatures and Genes link SARS and the BiologicalProcess node. A score closer to 0 means that only a few PhenotypicFeatures and Genes link SARS and the BiologicalProcess node. 

In [25]:
## create scoring table for BiologicalProcesses (output nodes)
q2_scoring = scoring_output(q2_r_paths_table, q2_type)
q2_scoring.index.name = 'output_label'
q2_scoring.reset_index(inplace = True)

q2_scoring.head(20)

| | output_label | score | rank |
|---|---|---|---|
| 0 | growth | 0.102507 | 1.0 |
| 1 | positive regulation of transcription by RNA po... | 0.094708 | 2.0 |
| 2 | gene expression | 0.089694 | 3.0 |
| 3 | negative regulation of transcription by RNA po... | 0.071309 | 4.0 |
| 4 | innate immune response | 0.069081 | 5.0 |
| 5 | positive regulation of transcription, DNA-temp... | 0.068524 | 6.0 |
| 6 | signal transduction | 0.067409 | 7.0 |
| 7 | inflammatory response | 0.066852 | 8.0 |
| 8 | cell population proliferation | 0.064067 | 9.0 |
| 9 | catagen | 0.060724 | 10.0 |


Notice that some top results relate to the immune system, including innate immunity, viral processes, cytokines, neutrophils, and inflammation. 

### Comparing answers to bradykinin article

We manually took keywords from the parts of the ["bradykinin storm" mechanism article](https://elifesciences.org/articles/59177) that discussed pathways and mechanisms, and we searched for them in the biological processes returned by BTE. The keywords that returned results and the results found are shown below. 

Notice that while RAS signaling and other processes mentioned in the article are found in BTE, they are not the top-scoring results overall. 

In [26]:
bp_mask = 'aldosterone|angiotensin|hyaluronan|vaso|vascular permeability|'\
          'coagulation|sodium|pressure|neutrophil|inflamm|fibrin|'\
          'vitamin D|cardiac|pain|potassium'

q2_scoring_BK = q2_scoring[q2_scoring['output_label'].str.contains(bp_mask)].copy()
q2_scoring_BK.head(10)

| | output_label | score | rank |
|---|---|---|---|
| 7 | inflammatory response | 0.066852 | 8.0 |
| 42 | neutrophil degranulation | 0.037883 | 34.0 |
| 62 | blood coagulation | 0.029526 | 43.0 |
| 87 | negative regulation of inflammatory response | 0.024513 | 51.0 |
| 180 | cardiac muscle contraction | 0.015042 | 68.0 |
| 210 | sodium ion transmembrane transport | 0.01337 | 71.0 |
| 223 | vasoconstriction | 0.012813 | 72.0 |
| 281 | regulation of inflammatory response | 0.011142 | 75.0 |
| 288 | sodium ion transport | 0.011142 | 75.0 |
| 297 | regulation of blood pressure | 0.010585 | 76.0 |


### APIs used

Different knowledge sources (APIs) were called in different parts of the query. 

In the first part of the query (SARS &rarr; PhenotypicFeature), the following APIs returned results and the following predicates (semantic relationships) were found. 

In [27]:
## show that the APIs use different predicates
q2_r_paths_table[['pred1_api', 'pred1']].drop_duplicates().sort_values(by = ['pred1_api', 'pred1'])

| | pred1_api | pred1 |
|---|---|---|
| 7 | BioLink API | has_phenotype |
| 0 | mydisease.info API | related_to |


In the second part of the query (PhenotypicFeature &rarr; Gene), the following APIs returned results and the following predicates (semantic relationships) were found. 

In [28]:
q2_r_paths_table[['pred2_api', 'pred2']].drop_duplicates().sort_values(by = ['pred2_api', 'pred2'])

| | pred2_api | pred2 |
|---|---|---|
| 0 | BioLink API | related_to |
| 1964 | EBIgene2phenotype API | related_to |


In the third part of the query (Gene &rarr; BiologicalProcess), the following APIs returned results and the following predicates (semantic relationships) were found.

In [29]:
## show that the APIs use different predicates
q2_r_paths_table[['pred3_api', 'pred3']].drop_duplicates().sort_values(by = ['pred3_api', 'pred3'])

| | pred3_api | pred3 |
|---|---|---|
| 98 | CORD Gene API | related_to |
| 0 | MyGene.info API | functional_association |
