***
***

<img width='700' src="https://user-images.githubusercontent.com/8030363/108961534-b9a66980-7634-11eb-96e2-cc46589dcb8c.png" style="vertical-align:middle">

## Knowledge Graph Entity Search Examples

***

**Author:** [TJCallahan](http://tiffanycallahan.com/)  
**GitHub Repository:** [PheKnowLator](https://github.com/callahantiff/PheKnowLator/wiki)  
**Wiki Page:** [OWL-NETS-2.0](https://github.com/callahantiff/PheKnowLator/wiki/OWL-NETS-2.0)  
**Release:** **[v3.0.0](https://github.com/callahantiff/PheKnowLator/wiki/v3.0.0)**  
  
<br> 

### Purpose  
The goal of this notebook is to explore different ways to examine relationships between entities in a PheKnowLator knowledge graph.

#### PheKnowLator Knowledge Graph Build  
This notebook was built using a `v3.0.2` OWL-NETS-abstracted subclass-based build with inverse relations, which is publicly available and can be downloaded using the following links:  
- [PheKnowLator_v3.0.2_full_subclass_inverseRelations_OWLNETS_NetworkxMultiDiGraph.gpickle](https://storage.googleapis.com/pheknowlator/current_build/knowledge_graphs/subclass_builds/inverse_relations/owlnets/PheKnowLator_v3.0.2_full_subclass_inverseRelations_OWLNETS_NetworkxMultiDiGraph.gpickle)  
- [PheKnowLator_v3.0.2_full_subclass_inverseRelations_OWLNETS_NodeLabels.txt](https://storage.googleapis.com/pheknowlator/current_build/knowledge_graphs/subclass_builds/inverse_relations/owlnets/PheKnowLator_v3.0.2_full_subclass_inverseRelations_OWLNETS_NodeLabels.txt)  

<br>

***  
## Set-Up Environment 
***  

### Dependencies: [pkt_kg](https://pypi.org/project/pkt-kg/), [networkx](https://pypi.org/project/networkx/), [rdflib](https://pypi.org/project/rdflib/)

To prepare for the tutorial, we need to make sure that all needed libraries are installed and imported. If you don't already have `pkt_kg`, `rdflib`, and `networkx` installed, you can extend the code chunk below to install any libraries that you need. You will also need to download the specific version of the knowledge graph that you want to analyze. The files used in this notebook are described in the Knowledge Graph section below.  

In [None]:
# # uncomment and run to install any required modules from notebooks/requirements.txt
# import sys
# !{sys.executable} -m pip install -r ../../notebooks/requirements.txt

In [1]:
# # if running a local version of pkt_kg, uncomment the code below
# import sys
# sys.path.append('../')

In [3]:
# import needed libraries
import json
import networkx as nx
import os
import pandas as pd
import pickle
import random
import re

from pkt_kg.utils import *
from rdflib import Graph, Namespace, URIRef, BNode, Literal
from rdflib.namespace import RDFS
from tqdm.notebook import tqdm
from typing import Callable, Dict, List, Optional, Union

# create namespace for OBO ontologies
obo = Namespace('http://purl.obolibrary.org/obo/')

### Knowledge Graph 
The initial exploration will be performed using a `v3.0.2` OWL-NETS-abstracted subclass-based build with inverse relations, which is publicly available and can be downloaded using the following links:  
- [PheKnowLator_v3.0.2_full_subclass_inverseRelations_OWLNETS_NetworkxMultiDiGraph.gpickle](https://storage.googleapis.com/pheknowlator/current_build/knowledge_graphs/subclass_builds/inverse_relations/owlnets/PheKnowLator_v3.0.2_full_subclass_inverseRelations_OWLNETS_NetworkxMultiDiGraph.gpickle)  
- [PheKnowLator_v3.0.2_full_subclass_inverseRelations_OWLNETS_NodeLabels.txt](https://storage.googleapis.com/pheknowlator/current_build/knowledge_graphs/subclass_builds/inverse_relations/owlnets/PheKnowLator_v3.0.2_full_subclass_inverseRelations_OWLNETS_NodeLabels.txt)  


In [4]:
# notebook will create a temporary directory
# write_location = '../releases/Columbia_Collaboration/tara_anand/data/'
write_location = '../temp_directory/'
if not os.path.exists(write_location):
    os.mkdir(write_location)

In [5]:
# download data to the data directory
data_urls = [
    'https://storage.googleapis.com/pheknowlator/current_build/knowledge_graphs/subclass_builds/inverse_relations/owlnets/PheKnowLator_v3.0.2_full_subclass_inverseRelations_OWLNETS_NetworkxMultiDiGraph.gpickle',
    'https://storage.googleapis.com/pheknowlator/current_build/knowledge_graphs/subclass_builds/inverse_relations/owlnets/PheKnowLator_v3.0.2_full_subclass_inverseRelations_OWLNETS_NodeLabels.txt',
    'https://www.dropbox.com/s/ev0ea6v6fu70fbl/entity_metadata_dict.pkl?dl=1'
]

for url in data_urls:
    file_name = url.split('/')[-1] if 'entity_metadata_dict.pkl' not in url else re.sub(r'\?.*', '', url.split('/')[-1])
    if not os.path.exists(write_location + file_name):
        data_downloader(url, write_location, file_name)

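The download loop above derives each local file name from the last segment of the URL and strips the query string from the Dropbox link. A minimal sketch of the same idea using only the standard library is shown below. The helper name `local_file_name` is hypothetical and is not part of `pkt_kg`.

```python
import posixpath
from urllib.parse import urlparse


def local_file_name(url: str) -> str:
    """Hypothetical helper: returns the last path segment of a URL, ignoring any query string."""
    return posixpath.basename(urlparse(url).path)


# the Dropbox link carries a '?dl=1' query string that should not end up in the file name
print(local_file_name('https://www.dropbox.com/s/ev0ea6v6fu70fbl/entity_metadata_dict.pkl?dl=1'))
# entity_metadata_dict.pkl
```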
### Helper Functions 
Create helper functions that are needed to process node data.

In [None]:
def nx_ancestor_search(kg: nx.multidigraph.MultiDiGraph, nodes: List, prefix: str, anc_list: Optional[List]=None) -> List:
    """Returns all ancestor nodes reachable through rdfs:subClassOf edges. The returned list is ordered by seniority.
    
    Args:
        kg: A networkx MultiDiGraph object.
        nodes: A list of RDFLib URIRef objects. The list is consumed during the search, so pass a copy.
        prefix: A string containing an ontology prefix (e.g., MONDO).
        anc_list: None or a nested list of ancestor URL strings collected so far.
        
    Returns:
        ancestor_list: A nested list of ontology URL strings, where each inner list contains the parents of one visited node.
    """
    
    ancestor_list = [] if anc_list is None else anc_list
    
    if len(nodes) == 0: return ancestor_list
    else:
        node = nodes.pop()
        node_list = list(kg.neighbors(node))
        neighborhood = [a for b in [[[i, n] for j in [kg.get_edge_data(*(node, n)).keys()]
                          for i in j] for n in node_list] for a in b]
        ancestors = [x[1] for x in neighborhood if (prefix in str(x[1]) and x[0] == RDFS.subClassOf)]
        if len(ancestors) > 0:
            ancestor_list += [[str(x) for x in ancestors]]
            nodes += ancestors
        return nx_ancestor_search(kg, nodes, prefix, ancestor_list)
    
    
def processes_ancestor_path_list(path_list: List, node_metadata: Dict) -> List:
    """Processes a nested list of ancestor paths into a nested list of unique, labeled ancestors ordered by level.
    
    Args:
        path_list: A nested list of ontology URLs, where each list represents a set of ancestors.
        node_metadata: A nested dictionary keyed by entity URL that contains node attributes (e.g., 'label').
    
    Returns:
        ancestors: A nested list where each inner list contains 'label (URL)' strings for one level of ancestors.
    """
    
    anc_dict = dict()
    
    for path in path_list:
        for x in path:
            idx = max([i for i, j in enumerate(path_list) if x in j])
            if str(idx) in anc_dict.keys(): anc_dict[str(idx)] |= {x}
            else: anc_dict[str(idx)] = {x}

    # reorder and format keys (labels come from the node_metadata argument rather than a notebook-level variable)
    ancestors = [['{} ({})'.format(node_metadata[str(x)]['label'], x) for x in anc_dict[str(k)]]
                 for k in sorted([int(x) for x in anc_dict.keys()])]
    
    return ancestors


def formats_node_information(neighborhood: List, metadata_dict: Dict, verbose: bool=False) -> None:
    """Processes neighborhood results. The subject node is read from the notebook-level variable `node`, so that
    variable must be set (e.g., node = [obo.CHEBI_3011]) before this function is called.
    
    Args:
        neighborhood: A nested list of [edge, node] pairs, where each entry is an edge or node identifier.
        metadata_dict: A nested dictionary containing node attributes.
        verbose: A bool indicating whether or not node definitions should be printed.
    
    Returns:
        None
    """
    
    for e, o in neighborhood:
        spe = '\n' if neighborhood.index([e, o]) == 0 else '\n\n'
        s, s_label = str(node[0]).split('/')[-1], metadata_dict[str(node[0])]['label']
        e_label = metadata_dict[str(e)]['label']
        o, o_label, o_def = str(o).split('/')[-1], metadata_dict[str(o)]['label'], metadata_dict[str(o)]['description']
        if verbose:
            if o_def != 'None': print(spe + '>>> {} ({}) - {} - {} ({})\n{} Definition: {}'.format(s_label, s, e_label, o_label, o, o, o_def))
            else: print(spe + '>>> {} ({}) - {} - {} ({})'.format(s_label, s, e_label, o_label, o))
        else: print('>>> {} ({}) - {} - {} ({})'.format(s_label, s, e_label, o_label, o))
    
    return None
    

def metadata_formatter(s: str, o: str, metadata_dict: Dict) -> None:
    """Function looks up edge-level metadata and prints it.
    
    Args:
        s: A string containing the identifier for the subject node of a predicate or triple.
        o: A string containing the identifier for the object node of a predicate or triple.
        metadata_dict: A nested dictionary containing node and edge-level metadata.
    
    Returns:
        None.
    """
    
    s = s + '-reactome_' if 'R-HSA' in s else s
    o = o + '-reactome_' if 'R-HSA' in o else o
    
    if s + '-' + o in metadata_dict['edges'].keys():
        print('\nEdge Evidence')
        print(json.dumps(metadata_dict['edges'][s + '-' + o], indent=4))
    elif o + '-' + s in metadata_dict['edges'].keys():
        print('\nEdge Evidence')
        print(json.dumps(metadata_dict['edges'][o + '-' + s], indent=4))
    
    return None


def formats_path_information(kg: nx.multidigraph.MultiDiGraph, paths: List, path_type: str, metadata_func: Callable, metadata_dict: Dict, node_metadata: Dict, verbose: bool=False, rand: bool=False, sample_size: int=10) -> None:
    """Processes shortest and simple path results.
    
    Args:
        kg: A networkx MultiDiGraph object.
        paths: A nested list of strings, where each string contains an entity identifier.
        path_type: A string, either 'simple' or 'shortest' that indicates the types of paths to process.
        metadata_func: A function that processes edge metadata.
        metadata_dict: A nested dictionary containing node and edge-level metadata. 
        node_metadata: A nested dictionary containing node attributes.
        verbose: A bool indicating whether or not node and edge metadata should be printed.
        rand: A bool indicating whether or not to draw random samples from the path.
        sample_size: An integer used when rand is True to specify the size of the random sample to draw.
    
    Returns:
        None
    """
    
    if path_type == 'shortest': 
        # note: only the first shortest path (paths[0]) is printed
        for i in range(0, len(paths[0]) - 1):
            s = paths[0][i]; o = paths[0][i + 1]
            edges = list(kg.get_edge_data(*(s, o)).keys())
            # keep short identifiers in separate variables so the full URIs stay valid for every edge between s and o
            s_id, s_label = str(s).split('/')[-1], node_metadata[str(s)]['label']
            o_id, o_label, o_def = str(o).split('/')[-1], node_metadata[str(o)]['label'], node_metadata[str(o)]['description']
            for idx, e in enumerate(edges):
                spe = '\n' if idx == 0 else '\n\n\n'
                e_label = node_metadata[str(e)]['label']
                if verbose:
                    if o_def != 'None': print(spe + '>>> {} ({}) - {} - {} ({})\n\n{} Definition: {}'.format(s_label, s_id, e_label, o_label, o_id, o_id, o_def))
                    else: print(spe + '>>> {} ({}) - {} - {} ({})'.format(s_label, s_id, e_label, o_label, o_id))
                    metadata_func(s_id, o_id, metadata_dict)
                else: print('>>> {} ({}) - {} - {} ({})'.format(s_label, s_id, e_label, o_label, o_id))
    else:
        # never request more samples than there are paths
        if rand: paths = random.sample(paths, min(sample_size, len(paths)))
        for path in paths:
            print('*' * 100)
            for i in range(0, len(path) - 1):
                spe = '\n' if i == 0 else '\n\n\n'
                s = path[i]; o = path[i + 1]
                # paths may come from the undirected graph, so the directed edge can run in either direction
                edges = kg.get_edge_data(*(s, o))
                if edges is None: edges = kg.get_edge_data(*(o, s))
                s_id, s_label = str(s).split('/')[-1], node_metadata[str(s)]['label']
                o_id, o_label, o_def = str(o).split('/')[-1], node_metadata[str(o)]['label'], node_metadata[str(o)]['description']
                for e in edges.keys():
                    e_label = node_metadata[str(e)]['label']
                    if verbose:
                        if o_def != 'None': print(spe + '>>> {} ({}) - {} - {} ({})\n\n{} Definition: {}'.format(s_label, s_id, e_label, o_label, o_id, o_id, o_def))
                        else: print(spe + '>>> {} ({}) - {} - {} ({})'.format(s_label, s_id, e_label, o_label, o_id))
                        metadata_func(s_id, o_id, metadata_dict)
                    else: print('>>> {} ({}) - {} - {} ({})'.format(s_label, s_id, e_label, o_label, o_id))
            print('*' * 100); print('\n')
    
    return None
    

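The recursive helper `nx_ancestor_search` defined above follows `rdfs:subClassOf` edges one node at a time. As a rough illustration of the same traversal, the sketch below walks a toy graph iteratively, which avoids deep recursion on large hierarchies. It is an assumption-laden stand-in rather than the notebook's method: the graph is a plain dictionary instead of a `networkx` object, the identifiers (`EX_1`, `ex:treats`, and so on) are made up, and ancestors are grouped by their shortest distance from the start node, which differs from how `processes_ancestor_path_list` assigns levels.

```python
from typing import Dict, List, Tuple

SUBCLASS_OF = 'rdfs:subClassOf'


def ancestor_levels(edges: Dict[str, List[Tuple[str, str]]], start: str, prefix: str) -> List[List[str]]:
    """Returns the ancestors of start grouped by distance (level 1 = direct parents)."""
    levels, frontier, seen = [], [start], {start}
    while frontier:
        parents = []
        for node in frontier:
            for predicate, obj in edges.get(node, []):
                # keep only subclass edges that point to a node from the requested ontology
                if predicate == SUBCLASS_OF and prefix in obj and obj not in seen:
                    seen.add(obj)
                    parents.append(obj)
        if parents:
            levels.append(sorted(parents))
        frontier = parents
    return levels


# hypothetical toy hierarchy: EX_4 has two parents that share a single grandparent
toy_edges = {
    'EX_4': [('rdfs:subClassOf', 'EX_2'), ('rdfs:subClassOf', 'EX_3'), ('ex:treats', 'OTHER_9')],
    'EX_2': [('rdfs:subClassOf', 'EX_1')],
    'EX_3': [('rdfs:subClassOf', 'EX_1')],
    'EX_1': [],
}
print(ancestor_levels(toy_edges, 'EX_4', 'EX'))
# [['EX_2', 'EX_3'], ['EX_1']]
```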
<br>


## Loading Data
***

___

The knowledge graph will be loaded as a `networkx` MultiDiGraph object and the node labels will be read in and converted to a dictionary to enable easy access to node labels and other relevant metadata.

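As a small illustration of the label lookup described above, the sketch below builds the same kind of dictionary from tab-delimited text using only the standard library. The two sample rows and the `read_node_labels` helper are hypothetical; only the column names are taken from the NodeLabels file. Unlike the `pandas` approach used in this notebook, empty fields come back as empty strings rather than `NaN`.

```python
import csv
import io

# hypothetical two-row sample in the same column layout as the NodeLabels file
sample = (
    'entity_type\tinteger_id\tentity_uri\tlabel\tdescription/definition\tsynonym\n'
    'NODES\t1\t<http://example.org/EX_1>\texample label\texample definition\t\n'
    'NODES\t2\t<http://example.org/EX_2>\tsecond label\t\tsyn a|syn b\n'
)


def read_node_labels(handle) -> dict:
    """Maps each entity URI (angle brackets removed) to its label, description, and synonym."""
    reader = csv.DictReader(handle, delimiter='\t', quoting=csv.QUOTE_NONE)
    return {
        row['entity_uri'].strip('<>'): {
            'label': row['label'],
            'description': row['description/definition'],
            'synonym': row['synonym'],
        }
        for row in reader
    }


node_lookup = read_node_labels(io.StringIO(sample))
print(node_lookup['http://example.org/EX_1']['label'])
# example label
```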
In [6]:
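# load the knowledge graph (nx.read_gpickle was removed in networkx 3.0; with newer versions, open the file in 'rb' mode and use pickle.load)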
kg = nx.read_gpickle(write_location + data_urls[0].split('/')[-1])
print('The knowledge graph contains {} nodes and {} edges'.format(len(kg.nodes()), len(kg.edges())))

The knowledge graph contains 780753 nodes and 7787308 edges


In [7]:
# convert multidigraph to undirected graph -- needed to run some of the algorithms
undirected_kg = kg.to_undirected()

In [8]:
# read in node metadata
node_data = pd.read_csv(write_location + data_urls[1].split('/')[-1], header=0, sep=r"\t", encoding="utf8", engine='python', quoting=3)
# remove angle brackets
node_data['entity_uri'] = node_data['entity_uri'].str.strip('<>')

node_data.head()

| | entity_type | integer_id | entity_uri | label | description/definition | synonym |
|---|---|---|---|---|---|---|
| 0 | NODES | 684158 | https://www.ncbi.nlm.nih.gov/snp/rs864622148 | NM_000051.4(ATM):c.5887G>A (p.Asp1963Asn) | This variant is a germline single nucleotide v... | |
| 1 | NODES | 668197 | https://uswest.ensembl.org/Homo_sapiens/Transc... | CUL9-211 | Transcript CUL9-211 is classified as type 'non... | |
| 2 | NODES | 769680 | http://purl.obolibrary.org/obo/CHEBI_116891 | 3-(3-methylphenyl)-2-sulfanylidene-1H-benzofur... | | |
| 3 | NODES | 381659 | https://www.ncbi.nlm.nih.gov/snp/rs61741688 | NM_001272071.2(AP1S2):c.288T>C (p.Ser96=) | This variant is a germline single nucleotide v... | |
| 4 | NODES | 720533 | http://purl.obolibrary.org/obo/PR_Q9Y2I7-3 | 1-phosphatidylinositol 3-phosphate 5-kinase is... | A 1-phosphatidylinositol 3-phosphate 5-kinase ... | hPIKFYVE/iso:h3 |


In [9]:
# convert node data to dictionary
node_data_dict = dict()
for idx, row in tqdm(node_data.iterrows(), total=node_data.shape[0]):
    node_data_dict[row['entity_uri']] = {
        'label': row['label'],
        'description': row['description/definition'],
        'synonym': row['synonym']
    }




The `entity_metadata_dict.pkl` metadata file loaded in the next cell is a temporary resource while the next release is being formatted.

In [10]:
# load metadata -- the file was saved without the URL query string ('?dl=1'), so strip it here as well
filepath = write_location + re.sub(r'\?.*', '', data_urls[2].split('/')[-1])
max_bytes = 2**31 - 1; input_size = os.path.getsize(filepath); bytes_in = bytearray(0)
with open(filepath, 'rb') as f_in:
    for _ in range(0, input_size, max_bytes):
        bytes_in += f_in.read(max_bytes)
metadata_dict = pickle.loads(bytes_in)

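The metadata cell above reads the pickle file in pieces of at most `2**31 - 1` bytes, presumably to stay under single-read size limits on some platforms. The sketch below shows the same pattern as a reusable function and checks it with a round trip on a tiny, made-up dictionary. The function name `load_pickle_in_chunks` and the toy data are assumptions for illustration. As with any pickle, only files from a trusted source should be loaded this way.

```python
import os
import pickle
import tempfile


def load_pickle_in_chunks(filepath: str, max_bytes: int = 2**31 - 1):
    """Reads a pickle file in pieces of at most max_bytes and deserializes the joined bytes."""
    buffer = bytearray()
    with open(filepath, 'rb') as f_in:
        while True:
            chunk = f_in.read(max_bytes)
            if not chunk:
                break
            buffer += chunk
    return pickle.loads(bytes(buffer))


# round trip on a small hypothetical dictionary, using a tiny chunk size to force several reads
toy_metadata = {'edges': {'EX_1-EX_2': {'source': 'example'}}}
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'toy_metadata.pkl')
    with open(path, 'wb') as f_out:
        pickle.dump(toy_metadata, f_out)
    restored = load_pickle_in_chunks(path, max_bytes=8)

print(restored == toy_metadata)
# True
```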
<br>


## Knowledge-based Characterization
***
____

The goal is to use the knowledge graph to explore what is known about specific concepts as well as what can be said about pairs of concepts. The two types of analyses are described below:

#### [Node-Level](#node-level)
 - <u>Node Ancestry</u>: Identify all ancestors for each node up to the root.
 - <u>Node Neighborhood</u>: Returns all nodes reachable from a node of interest via a single directed edge.   


#### [Path-Level](#path-level)
  - <u>All Shortest Paths</u>: Returns the shortest simple path between two nodes. If there are multiple shortest paths of the same length, they are all returned.
  - <u>All Simple Paths</u>: A simple path is a path with no repeated nodes. These paths are identified using a modified depth-first search. Given that there are a lot of these, the search is limited to paths of at most 10 edges (`cutoff=10`), and the initial output is limited to a random draw of 10 paths from the first 100 derived paths.
  
For all comparisons, the full edge is returned along with all relevant node and edge metadata provided by each data source.

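To make the idea of "all shortest paths" concrete, the sketch below finds every minimum-length path in a toy directed graph with a breadth-first search that records all parents at the shortest distance. It is a minimal stand-in written with the standard library, not the `networkx` implementation used later in this notebook; the graph and node names are made up, and the function returns an empty list when the target cannot be reached.

```python
from collections import deque
from typing import Dict, List


def all_shortest_paths(graph: Dict[str, List[str]], source: str, target: str) -> List[List[str]]:
    """Returns every minimum-length directed path from source to target (empty list if unreachable)."""
    dist = {source: 0}
    parents: Dict[str, List[str]] = {source: []}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in dist:
                # first visit: record the distance and the first parent
                dist[nxt] = dist[node] + 1
                parents[nxt] = [node]
                queue.append(nxt)
            elif dist[nxt] == dist[node] + 1 and node not in parents[nxt]:
                # another parent that reaches nxt at the same shortest distance
                parents[nxt].append(node)
    if target not in dist:
        return []

    def build(node: str) -> List[List[str]]:
        if node == source:
            return [[source]]
        return [path + [node] for p in parents[node] for path in build(p)]

    return sorted(build(target))


toy_graph = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': ['E'], 'E': []}
print(all_shortest_paths(toy_graph, 'A', 'D'))
# [['A', 'B', 'D'], ['A', 'C', 'D']]
```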
  ---

### Node-Level Characterization  <a class="anchor" id="node-level"></a>

This section characterizes the following concepts:
- [benazepril (`CHEBI_3011`)](#chebi1)  
- [hydrochlorothiazide (`CHEBI_5778`)](#chebi2)  
- [Myocardial Infarction (`MONDO_0005068`)](#mondo1)  
- [Myocardial infarction (`HP_0001658`)](#hpo1)

*Note*. Neighborhood output is presented twice for each concept: first without node definitions and then with them. This is done to facilitate readability.

____


**benazepril (`CHEBI_3011`)** <a class="anchor" id="chebi1"></a>

*Ancestors*

In [None]:
# examine the node's ancestors
prefix = 'CHEBI'; node = [obo.CHEBI_3011]
path_list = nx_ancestor_search(kg, node.copy(), prefix)
chebi3011_ancestors = processes_ancestor_path_list(path_list, node_data_dict)

# print results -- nodes are ordered by seniority (higher numbers indicate closer to root)
print('Ancestors of {}\n'.format(node[0]))
for level in range(len(chebi3011_ancestors)):
    print('Level: {}'.format(str(level + 1)))
    for v in chebi3011_ancestors[level]:
        print('\t- {}'.format(re.sub('http://purl.obolibrary.org/obo/', '', v)))

*Neighborhood*

In [None]:
# examine the node's neighborhood
nodes = list(kg.neighbors(node[0]))
neighbors = [a for b in [[[i, n] for j in [kg.get_edge_data(*(node[0], n)).keys()]
                          for i in j] for n in nodes] for a in b]
chebi3011_sorted_neigbors = sorted(neighbors, key=lambda x: (str(x[1]).split('/')[-1], x[0]))

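The nested comprehension above flattens the edges leaving a node into `[edge, neighbor]` pairs and then sorts them by neighbor identifier. The sketch below reproduces that step on a plain dictionary shaped like the adjacency data of a `networkx` MultiDiGraph node, where each neighbor maps to a dictionary keyed by edge key. The identifiers and predicates are hypothetical.

```python
# hypothetical adjacency for one node, in the shape {neighbor: {edge_key: attributes}}
toy_adjacency = {
    'EX_3': {'ex:interacts_with': {}},
    'EX_2': {'rdfs:subClassOf': {}, 'ex:interacts_with': {}},
}

# one [edge_key, neighbor] pair per parallel edge
pairs = [[edge_key, neighbor] for neighbor, keyed in toy_adjacency.items() for edge_key in keyed]

# sort by neighbor first and by edge key second
sorted_pairs = sorted(pairs, key=lambda x: (x[1], x[0]))
print(sorted_pairs)
# [['ex:interacts_with', 'EX_2'], ['rdfs:subClassOf', 'EX_2'], ['ex:interacts_with', 'EX_3']]
```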
In [None]:
# print nodes without definitions
formats_node_information(chebi3011_sorted_neigbors, node_data_dict, verbose=False)

In [None]:
# print nodes with definitions
formats_node_information(chebi3011_sorted_neigbors, node_data_dict, verbose=True)

<br>

**hydrochlorothiazide (`CHEBI_5778`)** <a class="anchor" id="chebi2"></a>

*Ancestors*

In [None]:
# examine the node's ancestors
prefix = 'CHEBI'; node = [obo.CHEBI_5778]
path_list = nx_ancestor_search(kg, node.copy(), prefix)
chebi5778_ancestors = processes_ancestor_path_list(path_list, node_data_dict)

# print results -- nodes are ordered by seniority (higher numbers indicate closer to root)
print('Ancestors of {}\n'.format(node[0]))
for level in range(len(chebi5778_ancestors)):
    print('Level: {}'.format(str(level + 1)))
    for v in chebi5778_ancestors[level]:
        print('\t- {}'.format(re.sub('http://purl.obolibrary.org/obo/', '', v)))

*Neighborhood*

In [None]:
# examine the node's neighborhood
nodes = list(kg.neighbors(node[0]))
neighbors = [a for b in [[[i, n] for j in [kg.get_edge_data(*(node[0], n)).keys()]
                          for i in j] for n in nodes] for a in b]
chebi5778_sorted_neigbors = sorted(neighbors, key=lambda x: (str(x[1]).split('/')[-1], x[0]))

In [None]:
# print nodes without definitions
formats_node_information(chebi5778_sorted_neigbors, node_data_dict, verbose=False)

In [None]:
# print nodes with definitions
formats_node_information(chebi5778_sorted_neigbors, node_data_dict, verbose=True)

<br>

**Myocardial Infarction (`MONDO_0005068`)** <a class="anchor" id="mondo1"></a>

*Ancestors*

In [None]:
# examine the node's ancestors
prefix = 'MONDO'; node = [obo.MONDO_0005068]
path_list = nx_ancestor_search(kg, node.copy(), prefix)
mondo0005068_ancestors = processes_ancestor_path_list(path_list, node_data_dict)

# print results -- nodes are ordered by seniority (higher numbers indicate closer to root)
print('Ancestors of {}\n'.format(node[0]))
for level in range(len(mondo0005068_ancestors)):
    print('Level: {}'.format(str(level + 1)))
    for v in mondo0005068_ancestors[level]:
        print('\t- {}'.format(re.sub('http://purl.obolibrary.org/obo/', '', v)))

*Neighborhood*

In [None]:
# examine the node's neighborhood
nodes = list(kg.neighbors(node[0]))
neighbors = [a for b in [[[i, n] for j in [kg.get_edge_data(*(node[0], n)).keys()]
                          for i in j] for n in nodes] for a in b]
mondo0005068_sorted_neigbors = sorted(neighbors, key=lambda x: (str(x[1]).split('/')[-1], x[0]))

In [None]:
# print nodes without definitions
formats_node_information(mondo0005068_sorted_neigbors, node_data_dict, verbose=False)

In [None]:
# print nodes with definitions
formats_node_information(mondo0005068_sorted_neigbors, node_data_dict, verbose=True)

<br>

**Myocardial infarction (`HP_0001658`)** <a class="anchor" id="hpo1"></a>

*Ancestors*

In [None]:
# examine the node's ancestors
prefix = 'HP'; node = [obo.HP_0001658]
path_list = nx_ancestor_search(kg, node.copy(), prefix)
hp0001658_ancestors = processes_ancestor_path_list(path_list, node_data_dict)

# print results -- nodes are ordered by seniority (higher numbers indicate closer to root)
print('Ancestors of {}\n'.format(node[0]))
for level in range(len(hp0001658_ancestors)):
    print('Level: {}'.format(str(level + 1)))
    for v in hp0001658_ancestors[level]:
        print('\t- {}'.format(re.sub('http://purl.obolibrary.org/obo/', '', v)))

*Neighborhood*

In [None]:
# examine the node's neighborhood
nodes = list(kg.neighbors(node[0]))
neighbors = [a for b in [[[i, n] for j in [kg.get_edge_data(*(node[0], n)).keys()]
                          for i in j] for n in nodes] for a in b]
hp0001658_sorted_neigbors = sorted(neighbors, key=lambda x: (str(x[1]).split('/')[-1], x[0]))

In [None]:
# print nodes without definitions
formats_node_information(hp0001658_sorted_neigbors, node_data_dict, verbose=False)

In [None]:
# print nodes with definitions
formats_node_information(hp0001658_sorted_neigbors, node_data_dict, verbose=True)

<br>

___

### Path-Level Characterization  <a class="anchor" id="path-level"></a>

This section characterizes the following concept pairs:
- [benazepril (`CHEBI_3011`) - Myocardial Infarction (`MONDO_0005068`)](#pair1)     
- [hydrochlorothiazide (`CHEBI_5778`) - Myocardial Infarction (`MONDO_0005068`)](#pair2)  
- [benazepril (`CHEBI_3011`) - Myocardial infarction (`HP_0001658`)](#pair3)     
- [hydrochlorothiazide (`CHEBI_5778`) - Myocardial infarction (`HP_0001658`)](#pair4)  

*Note*. Simple path output is presented twice for each pair: first without any metadata/evidence and then with metadata. This is done to facilitate readability.
___

**benazepril (`CHEBI_3011`) - Myocardial Infarction (`MONDO_0005068`)** <a class="anchor" id="pair1"></a>

*Shortest Paths*

In [None]:
# look at all shortest paths between the nodes in pair
shortest_paths = list(nx.all_shortest_paths(kg, obo.CHEBI_3011, obo.MONDO_0005068))
formats_path_information(kg, shortest_paths, path_type='shortest', metadata_func=metadata_formatter, metadata_dict=metadata_dict, node_metadata=node_data_dict, verbose=True)

*Simple Paths*

In [None]:
# look at all simple paths between the nodes in pair
simple_paths = []
for path in tqdm(nx.all_simple_paths(undirected_kg, source=obo.CHEBI_3011, target=obo.MONDO_0005068, cutoff=10)):
    simple_paths += [path]
    if len(simple_paths) == 100: break  # stop after the first 100 paths

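The cell above pulls paths from a generator and stops once enough have been collected. The sketch below illustrates both parts on a toy graph: a depth-first generator of simple paths with a `cutoff` on the number of edges, and `itertools.islice` as one way to take only the first few results. The `simple_paths` function and the graph are assumptions written for illustration and are not the `networkx` implementation.

```python
from itertools import islice
from typing import Dict, Iterator, List


def simple_paths(graph: Dict[str, List[str]], source: str, target: str, cutoff: int) -> Iterator[List[str]]:
    """Lazily yields paths without repeated nodes that use at most cutoff edges."""
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == target:
            yield path
            continue
        if len(path) - 1 >= cutoff:
            # the path already uses cutoff edges, so it cannot be extended further
            continue
        for nxt in graph.get(node, []):
            if nxt not in path:
                stack.append((nxt, path + [nxt]))


# hypothetical undirected toy graph, stored as symmetric adjacency lists
toy_graph = {'A': ['B', 'C'], 'B': ['A', 'C', 'D'], 'C': ['A', 'B', 'D'], 'D': ['B', 'C']}

# take only the first two paths without enumerating the rest
first_two = list(islice(simple_paths(toy_graph, 'A', 'D', cutoff=3), 2))
all_paths = sorted(simple_paths(toy_graph, 'A', 'D', cutoff=3))
print(len(first_two), all_paths)
# 2 [['A', 'B', 'C', 'D'], ['A', 'B', 'D'], ['A', 'C', 'B', 'D'], ['A', 'C', 'D']]
```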
In [None]:
# print path information -- without definitions and metadata
formats_path_information(kg, simple_paths, path_type='simple', metadata_func=metadata_formatter, metadata_dict=metadata_dict, node_metadata=node_data_dict, verbose=False, rand=True, sample_size=10)

In [None]:
# print path information -- with definitions and metadata
formats_path_information(kg, simple_paths, path_type='simple', metadata_func=metadata_formatter, metadata_dict=metadata_dict, node_metadata=node_data_dict, verbose=True, rand=True, sample_size=10)

<br>

**hydrochlorothiazide (`CHEBI_5778`) - Myocardial Infarction (`MONDO_0005068`)** <a class="anchor" id="pair2"></a>

*Shortest Paths*

In [None]:
# look at all shortest paths between the nodes in pair
shortest_paths = list(nx.all_shortest_paths(kg, obo.CHEBI_5778, obo.MONDO_0005068))
formats_path_information(kg, shortest_paths, path_type='shortest', metadata_func=metadata_formatter, metadata_dict=metadata_dict, node_metadata=node_data_dict, verbose=True)

*Simple Paths*

In [None]:
# look at all simple paths between the nodes in pair
simple_paths = []
for path in tqdm(nx.all_simple_paths(undirected_kg, source=obo.CHEBI_5778, target=obo.MONDO_0005068, cutoff=10)):
    simple_paths += [path]
    if len(simple_paths) == 100: break  # stop after the first 100 paths

In [None]:
# print path information -- without definitions and metadata
formats_path_information(kg, simple_paths, path_type='simple', metadata_func=metadata_formatter, metadata_dict=metadata_dict, node_metadata=node_data_dict, verbose=False, rand=True, sample_size=10)

In [None]:
# print path information -- with definitions and metadata
formats_path_information(kg, simple_paths, path_type='simple', metadata_func=metadata_formatter, metadata_dict=metadata_dict, node_metadata=node_data_dict, verbose=True, rand=True, sample_size=10)

<br>

**benazepril (`CHEBI_3011`) - Myocardial infarction (`HP_0001658`)** <a class="anchor" id="pair3"></a>

*Shortest Paths*

In [None]:
# look at all shortest paths between the nodes in pair
shortest_paths = list(nx.all_shortest_paths(kg, obo.CHEBI_3011, obo.HP_0001658))
formats_path_information(kg, shortest_paths, path_type='shortest', metadata_func=metadata_formatter, metadata_dict=metadata_dict, node_metadata=node_data_dict, verbose=True)

*Simple Paths*

In [None]:
# look at all simple paths between the nodes in pair
simple_paths = []
for path in tqdm(nx.all_simple_paths(undirected_kg, source=obo.CHEBI_3011, target=obo.HP_0001658, cutoff=10)):
    simple_paths += [path]
    if len(simple_paths) == 100: break  # stop after the first 100 paths

In [None]:
# print path information -- without definitions and metadata
formats_path_information(kg, simple_paths, path_type='simple', metadata_func=metadata_formatter, metadata_dict=metadata_dict, node_metadata=node_data_dict, verbose=False, rand=True, sample_size=10)

In [None]:
# print path information -- with definitions and metadata
formats_path_information(kg, simple_paths, path_type='simple', metadata_func=metadata_formatter, metadata_dict=metadata_dict, node_metadata=node_data_dict, verbose=True, rand=True, sample_size=10)

<br>

**hydrochlorothiazide (`CHEBI_5778`) - Myocardial infarction (`HP_0001658`)** <a class="anchor" id="pair4"></a>

*Shortest Paths*

In [None]:
# look at all shortest paths between the nodes in pair
shortest_paths = list(nx.all_shortest_paths(kg, obo.CHEBI_5778, obo.HP_0001658))
formats_path_information(kg, shortest_paths, path_type='shortest', metadata_func=metadata_formatter, metadata_dict=metadata_dict, node_metadata=node_data_dict, verbose=True)

*Simple Paths*

In [None]:
# look at all simple paths between the nodes in pair
simple_paths = []
for path in tqdm(nx.all_simple_paths(undirected_kg, source=obo.CHEBI_5778, target=obo.HP_0001658, cutoff=10)):
    simple_paths += [path]
    if len(simple_paths) == 100: break  # stop after the first 100 paths

In [None]:
# print path information -- without definitions and metadata
formats_path_information(kg, simple_paths, path_type='simple', metadata_func=metadata_formatter, metadata_dict=metadata_dict, node_metadata=node_data_dict, verbose=False, rand=True, sample_size=10)

In [None]:
# print path information -- with definitions and metadata
formats_path_information(kg, simple_paths, path_type='simple', metadata_func=metadata_formatter, metadata_dict=metadata_dict, node_metadata=node_data_dict, verbose=True, rand=True, sample_size=10)