***
***

<img width='700' src="https://user-images.githubusercontent.com/8030363/108961534-b9a66980-7634-11eb-96e2-cc46589dcb8c.png" style="vertical-align:middle">

## OWL-NETS Application - Example

***

**Author:** [TJCallahan](http://tiffanycallahan.com/)  
**GitHub Repository:** [PheKnowLator](https://github.com/callahantiff/PheKnowLator/wiki)  
**Wiki Page:** [OWL-NETS-2.0](https://github.com/callahantiff/PheKnowLator/wiki/OWL-NETS-2.0)  
**Release:** **[v2.0.0](https://github.com/callahantiff/PheKnowLator/wiki/v2.0.0)**  
  
<br>  

`OWL-NETS` (NEtwork Transformation for Statistical learning) is a computational method that reversibly abstracts Web Ontology Language (OWL)-encoded biomedical knowledge into a more biologically meaningful network representation. OWL-NETS generates semantically rich knowledge graphs that contain heterogeneous nodes and edges and can be used for tasks that do not require OWL semantics. The algorithm consists of the following three steps:  
1. Decode all OWL-encoded classes  
2. Remove all triples that contain `subjects`, `predicates`, and/or `objects` that are needed to ensure OWL semantics, but are not biologically meaningful  
3. Purify the decoded knowledge graph to match an input [knowledge graph construction approach](https://github.com/callahantiff/PheKnowLator/blob/master/resources/construction_approach/README.md) (i.e. `subclass` or `instance`) 

**Resources:**  
> *Callahan TJ, Baumgartner Jr WA, Bada M, Stefanski AL, Tripodi I, White EK, Hunter LE. OWL-NETS: Transforming OWL representations for improved network inference. Pacific Symposium for Biocomputing 2018 Nov (pp. 133-144).*  
*[Article Access](https://www.worldscientific.com/doi/abs/10.1142/9789813235533_0013)*

<br>
 
**Notebook Purpose:**  
Provide an example of how to run `OWL-NETS` independently of the [pkt_kg](https://pypi.org/project/pkt-kg/) knowledge graph construction workflow. In this notebook, we demonstrate how to apply `OWL-NETS` to the [`Human Phenotype Ontology`](https://hpo.jax.org/). 

*Generated Output:* 
- `OWLNETS_edgelist.txt` ➞ A tab-delimited file containing 3 columns (i.e. subject, predicate, object), each populated with a Uniform Resource Identifier  
- `OWLNETS_node_metadata.txt` ➞ A tab-delimited file containing 5 columns (i.e. node_id, node_namespace, node_label, node_synonyms, and node_dbxrefs) for each node in the edge list     
- `OWLNETS_relations.txt` ➞ A tab-delimited file containing 3 columns (i.e. relation_id, relation_namespace, and relation_label) for each relation in the edge list  
- `OWLNETS.nt` ➞ An `N-Triples` formatted file containing the `OWL-NETS` graph    
- `OWLNETS_NetworkxMultiDiGraph.gpickle` ➞ A NetworkX MultiDiGraph representation of the `OWL-NETS` graph    
- `OWLNETS_decoding_dict.pkl` ➞ A dictionary of important metadata from running `OWL-NETS`    


<br>

**Assumptions:**   
- `pkt_kg` has been downloaded (example code to download the library is shown below)  
- Either a URL to the ontology you want to download is available, or a pre-downloaded ontology is located within your current working directory. An argument to set the working directory is provided below.  
- [OWLTools](https://github.com/owlcollab/owltools) is available. If you have cloned the `pkt_kg` library from GitHub, you don't need to do anything; otherwise, follow the directions on the `OWLTools` wiki.

<br>


 
_____
***

### Program Workflow  
* [Set-Up Environment](#environment-setup)
* [Download Ontology Data](#data-download)  
* [Run OWL-NETS](#run-owlnets)  
* [Finalize Output](#finalize-output)


## Set-Up Environment <a class="anchor" id="environment-setup"></a>
_____

#### Python Libraries

If you have not cloned `pkt_kg` from GitHub, you need to install the latest version of the library from `PyPI`. To do this, uncomment the second and third lines in the code chunk below.

In [None]:
# # uncomment and run to install any required modules from notebooks/requirements.txt
# import sys
# !{sys.executable} -m pip install -r requirements.txt

In [None]:
# # if running a local version of pkt_kg, uncomment the code below
# import sys
# sys.path.append('../')

In [None]:
# import needed libraries
import os
import pkt_kg as pkt
import psutil
import ray
import re

from collections import Counter
from functools import reduce
from rdflib import Graph, Namespace, URIRef, BNode, Literal
from rdflib.namespace import OWL, RDF, RDFS
from tqdm import tqdm

#### Dependencies

If you have not cloned `pkt_kg` from GitHub, you need to install the latest version of [OWLTools](https://github.com/owlcollab/owltools) from GitHub. Run the code chunk below to do this (only if you have not cloned `pkt_kg` from GitHub). Then, set the location where you downloaded `OWLTools` in the "Define Global Variables" chunk below.

In [None]:
# move into pkt_kg/libs/ directory
cwd = os.getcwd()
os.chdir('../pkt_kg/libs')

# download owltools and update permissions
os.system('wget https://github.com/callahantiff/PheKnowLator/raw/master/pkt_kg/libs/owltools')
os.system('chmod +x owltools')

# move back to project working directory
os.chdir(cwd)

#### Define Global Variables

In [None]:
working_dir = '../owlnets_output'

# make sure working directory exists
os.makedirs(working_dir, exist_ok=True)
    
# set path to owltools location   
owltools_location = '../pkt_kg/libs/owltools'

# cpus
cpus = psutil.cpu_count(logical=True)

<br>

## Ontology Data Download <a class="anchor" id="data-download"></a>
_____

Downloads the [Human Phenotype Ontology](http://purl.obolibrary.org/obo/hp.owl) (`HPO`) file using the PURL URL (i.e. `http://purl.obolibrary.org/obo/hp.owl`).
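
The download cell below calls `OWLTools` through `os.system` and does not check the exit status. As a minimal sketch, assuming the same `--merge-import-closure` and `-o` flags used in that cell, the command could instead be built as an argument list and run with `subprocess.run`, which raises an error when the tool fails. The helper names `build_merge_command` and `download_ontology` are hypothetical and not part of `pkt_kg`.

```python
import subprocess


def build_merge_command(owltools_path, ontology_url, output_file):
    """Return the OWLTools argument list that merges an ontology's import closure."""
    return [owltools_path, ontology_url, '--merge-import-closure', '-o', output_file]


def download_ontology(owltools_path, ontology_url, output_file):
    """Run OWLTools; raises CalledProcessError if it exits with a non-zero status."""
    subprocess.run(build_merge_command(owltools_path, ontology_url, output_file), check=True)


# only build and show the command here; nothing is downloaded
cmd = build_merge_command('../pkt_kg/libs/owltools',
                          'http://purl.obolibrary.org/obo/hp.owl',
                          '../owlnets_output/hp_with_imports.owl')
print(' '.join(cmd))
```

Passing the arguments as a list also avoids shell quoting problems if a path contains spaces.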

In [None]:
# Specify the url for the ontology you want to download
url = 'http://purl.obolibrary.org/obo/hp.owl'

# download ontology data
filename = working_dir + '/' + url.split('/')[-1][:-4] + '_with_imports.owl'

if not os.path.exists(filename):
    command = "{} {} --merge-import-closure -o {}"
    os.system(command.format(owltools_location, url, filename))

In [None]:
# load downloaded ontology into memory
print('Loading Ontology Data Downloaded From: {}'.format(url))
graph = Graph().parse(filename)

### Extract Ontology Metadata

Prior to running `OWL-NETS`, we want to extract metadata for the nodes and edges. For this project, we extract labels for both nodes and relations. For nodes, we also extract synonyms and any database cross-references (dbxrefs). Metadata for nodes and relations are stored in the `entity_metadata` dictionary under the keys `nodes` and `relations`. An example of what the dictionary looks like is shown below:

<br>

```python
entity_metadata = {
    'nodes': {
        'http://purl.obolibrary.org/obo/HP_0001052': {
            'label': 'Nevus flammeus',
            'synonyms': 'nevus simplex|port-wine stain',
            'dbxrefs': 'snomedct_us:416377005|snomedct_us:254211001|msh:d019339|umls:c0235752|meddra:10067193',
            'namespace': 'HP'
        }...},
    'relations': {
        'http://purl.obolibrary.org/obo/RO_0002231': {
            'label': 'has start location',
            'namespace': 'RO'
        }...}
}

```
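
The `namespace` values in the example above are derived from each URI. The sketch below restates the rule applied to class URIs in the extraction cells that follow: take the text before the first underscore or non-word character of the last URI segment, and otherwise use the host name. The function name `derive_namespace` is hypothetical, and the fallback used when the pattern finds nothing is an added assumption.

```python
import re


def derive_namespace(uri):
    """Illustrative helper: derive a namespace string from a class URI."""
    last_segment = uri.split('/')[-1]
    if '_' in uri:
        # text before the first '_' or non-word character, e.g. 'HP' in 'HP_0001052'
        hits = re.findall(r'^(.*?)(?=\W|_)', last_segment)
        if hits:
            return hits[0].upper()
    # no usable prefix found: fall back to the host portion of the URI
    return uri.split('/')[2]


print(derive_namespace('http://purl.obolibrary.org/obo/HP_0001052'))
print(derive_namespace('http://www.w3.org/2002/07/owl#Thing'))
```

The first call yields `HP` and the second yields `www.w3.org`, since the second URI contains no underscore.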

In [None]:
# create empty node metadata dictionary
entity_metadata = {'nodes': {}, 'relations': {}}

*Extract Node Metadata*

In [None]:
ont_classes = pkt.utils.gets_ontology_classes(graph)
ont_labels = {str(x[0]): str(x[2]) for x in list(graph.triples((None, RDFS.label, None)))}
ont_synonyms = pkt.utils.gets_ontology_class_synonyms(graph)
ont_dbxrefs = pkt.utils.gets_ontology_class_dbxrefs(graph)

In [None]:
# add the metadata to the master metadata dictionary
for cls in tqdm(ont_classes):
    # get class metadata - synonyms and dbxrefs
    syns = '|'.join([k for k, v in ont_synonyms[0].items() if v == str(cls)])
    dbxrefs = '|'.join([k for k, v in ont_dbxrefs[0].items() if v == str(cls)])
    
    # extract metadata
    if '_' in str(cls): namespace = re.findall(r'^(.*?)(?=\W|_)', str(cls).split('/')[-1])[0].upper()
    else: namespace = str(cls).split('/')[2]
    
    # update dict
    entity_metadata['nodes'][str(cls)] = {
        'label': ont_labels[str(cls)] if str(cls) in ont_labels.keys() else 'None',
        'synonyms': syns if syns != '' else 'None',
        'dbxrefs': dbxrefs if dbxrefs != '' else 'None',
        'namespace': namespace 
    }

*Extract Relation Metadata*

In [None]:
ont_objects = pkt.utils.gets_object_properties(graph)

In [None]:
# add the metadata to the master metadata dictionary
for obj in tqdm(ont_objects):
    # get object label
    label_hits = list(graph.objects(obj, RDFS.label))
    label = str(label_hits[0]) if len(label_hits) > 0 else 'None'
    
    # get object namespace
    if 'obo' in str(obj) and len(str(obj).split('/')) > 5: 
        namespace = str(obj).split('/')[-2].upper()
    else:
        if '_' in str(obj): namespace = re.findall(r'^(.*?)(?=\W|_)', str(obj).split('/')[-1])[0].upper()
        else: namespace = str(obj).split('/')[2]
    
    # update dict
    entity_metadata['relations'][str(obj)] = {'label': label, 'namespace': namespace}

# add rdf:type and rdfs:subClassOf (both namespaces use the http scheme)
entity_metadata['relations']['http://www.w3.org/2000/01/rdf-schema#subClassOf'] = {'label': 'subClassOf', 'namespace': 'www.w3.org'}
entity_metadata['relations']['http://www.w3.org/1999/02/22-rdf-syntax-ns#type'] = {'label': 'type', 'namespace': 'www.w3.org'}

<br>

## Run OWL-NETS <a class="anchor" id="run-owlnets"></a>
_____

Run the `OWL-NETS` algorithm. We pass a standard set of input parameters that are unlikely to change for any application, unless you are interested in obtaining a knowledge-purified version of the original input knowledge graph (i.e. `kg_construct_approach`). To learn more about this option, please see the project wiki page ([here](https://github.com/callahantiff/PheKnowLator/wiki/OWL-NETS-2.0)). To make it easier to understand what `OWL-NETS` does, the program is broken down into its component steps below. If you want to run the full program, you can do the following instead:

``` python
# initialize owlnets class (same arguments as the step-by-step cell below)
owlnets = pkt.OwlNets(graph=graph,
                      write_location=working_dir + '/',
                      filename=filename.split('/')[-1],
                      kg_construct_approach=None,
                      owl_tools=owltools_location)
# run algorithm
owlnets_res = owlnets.runs_owlnets(cpus)
ray.shutdown()
```

In [None]:
# print stats for original graph before running OWL-NETS
pkt.utils.derives_graph_statistics(graph)

In [None]:
# initialize owlnets class
owlnets = pkt.OwlNets(graph=graph,
                      write_location=working_dir + '/',
                      filename=filename.split('/')[-1],
                      kg_construct_approach=None,
                      owl_tools=owltools_location)


#### Remove Disjointness Axioms

The first step is to remove all `owl:disjointWith` axioms from the graph. These axioms are removed because they are a form of negation and cannot currently be represented in another way that is also biologically meaningful.
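
To illustrate the effect of this step, the sketch below removes disjointness triples from a toy graph. It uses plain Python tuples in place of an `rdflib` graph so that it runs without dependencies; the `http://example.org/` URIs are made up, and this is not the implementation used by `removes_disjoint_with_axioms`.

```python
OWL_DISJOINT_WITH = 'http://www.w3.org/2002/07/owl#disjointWith'
RDFS_SUBCLASS_OF = 'http://www.w3.org/2000/01/rdf-schema#subClassOf'
EX = 'http://example.org/'  # hypothetical namespace for the toy classes

triples = {
    (EX + 'A', RDFS_SUBCLASS_OF, EX + 'Root'),
    (EX + 'B', RDFS_SUBCLASS_OF, EX + 'Root'),
    (EX + 'A', OWL_DISJOINT_WITH, EX + 'B'),  # negation-like axiom to drop
}

# split the graph into disjointness axioms and everything else
disjoint = {t for t in triples if t[1] == OWL_DISJOINT_WITH}
kept = triples - disjoint

print('removed {} owl:disjointWith triple(s), kept {}'.format(len(disjoint), len(kept)))
```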

In [None]:
# remove disjointness
owlnets.removes_disjoint_with_axioms()

#### Remove Semantic Triples

Creates a filtered knowledge graph that keeps only triples whose nodes are `owl:Class` or `owl:Individual` entities connected via an `owl:ObjectProperty` rather than an `owl:AnnotationProperty`. For example:

REMOVE - edges needed to support owl semantics (not biologically meaningful):  
- subject: `http://purl.obolibrary.org/obo/CLO_0037294`   
- predicate: `owl:AnnotationProperty`   
- object: `rdf:about="http://purl.obolibrary.org/obo/CLO_0037294"`

KEEP - biologically meaningful edges:
- subject: `http://purl.obolibrary.org/obo/CHEBI_16130`
- predicate: `http://purl.obolibrary.org/obo/RO_0002606`
- object: `http://purl.obolibrary.org/obo/HP_0000832`
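
The REMOVE/KEEP distinction above can be sketched on a toy graph. The example uses plain tuples rather than `rdflib`, made-up `http://example.org/` URIs, and a simplified rule; keeping `rdfs:subClassOf` edges is an assumption based on the example edge list shown later in this notebook. It is not the implementation of `removes_edges_with_owl_semantics`.

```python
RDF_TYPE = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'
RDFS_LABEL = 'http://www.w3.org/2000/01/rdf-schema#label'
RDFS_SUBCLASS_OF = 'http://www.w3.org/2000/01/rdf-schema#subClassOf'
OWL_CLASS = 'http://www.w3.org/2002/07/owl#Class'
OWL_OBJECT_PROPERTY = 'http://www.w3.org/2002/07/owl#ObjectProperty'
EX = 'http://example.org/'  # hypothetical namespace

triples = [
    (EX + 'A', RDF_TYPE, OWL_CLASS),
    (EX + 'B', RDF_TYPE, OWL_CLASS),
    (EX + 'regulates', RDF_TYPE, OWL_OBJECT_PROPERTY),
    (EX + 'A', RDFS_LABEL, 'class A'),           # annotation, supports semantics only
    (EX + 'A', EX + 'regulates', EX + 'B'),      # class-to-class via an object property
    (EX + 'B', RDFS_SUBCLASS_OF, EX + 'A'),      # hierarchy edge
]

classes = {s for s, p, o in triples if p == RDF_TYPE and o == OWL_CLASS}
object_properties = {s for s, p, o in triples if p == RDF_TYPE and o == OWL_OBJECT_PROPERTY}
allowed_predicates = object_properties | {RDFS_SUBCLASS_OF}


def is_meaningful(triple):
    """Keep a triple only if both ends are classes and the predicate is allowed."""
    s, p, o = triple
    return s in classes and o in classes and p in allowed_predicates


kept = [t for t in triples if is_meaningful(t)]
filtered = [t for t in triples if not is_meaningful(t)]
print('kept {} triples, filtered {}'.format(len(kept), len(filtered)))
```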

In [None]:
# remove triples used only to support semantics
cleaned_graph = owlnets.removes_edges_with_owl_semantics()
filtered_triple_count = len(owlnets.owl_nets_dict['filtered_triples'])

print('removed {} triples that were not biologically meaningful'.format(filtered_triple_count))

#### Decode OWL Classes
The algorithm used to decode all OWL classes is shown below. It unwinds classes that are constructed from `owl:Restriction` entities and the OWL constructors `owl:unionOf` and `owl:intersectionOf`. Please see the [wiki](https://github.com/callahantiff/PheKnowLator/wiki/OWL-NETS-2.0) for more details.

<img src="https://user-images.githubusercontent.com/8030363/110973128-120a8600-831a-11eb-9064-9bf02608da73.png" width="450" height="600" align="left"/>
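
As a simplified illustration of what decoding means, the sketch below unwinds a single existential restriction (`A subClassOf [onProperty p; someValuesFrom B]`) into the direct edge `A p B`. It handles only this one pattern, uses plain tuples with a string standing in for the blank node, and relies on made-up `http://example.org/` URIs. The full algorithm, which also covers `owl:unionOf` and `owl:intersectionOf`, is described on the wiki.

```python
RDF_TYPE = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'
RDFS_SUBCLASS_OF = 'http://www.w3.org/2000/01/rdf-schema#subClassOf'
OWL_RESTRICTION = 'http://www.w3.org/2002/07/owl#Restriction'
OWL_ON_PROPERTY = 'http://www.w3.org/2002/07/owl#onProperty'
OWL_SOME_VALUES_FROM = 'http://www.w3.org/2002/07/owl#someValuesFrom'
EX = 'http://example.org/'  # hypothetical namespace

# '_:r1' stands in for the anonymous (blank) restriction node
triples = {
    (EX + 'A', RDFS_SUBCLASS_OF, '_:r1'),
    ('_:r1', RDF_TYPE, OWL_RESTRICTION),
    ('_:r1', OWL_ON_PROPERTY, EX + 'part_of'),
    ('_:r1', OWL_SOME_VALUES_FROM, EX + 'B'),
}


def value_of(node, predicate):
    """Return one object for (node, predicate), or None when there is none."""
    return next((o for s, p, o in triples if s == node and p == predicate), None)


decoded = set()
for s, p, o in triples:
    # look for classes whose parent is an anonymous restriction
    if p == RDFS_SUBCLASS_OF and (o, RDF_TYPE, OWL_RESTRICTION) in triples:
        relation = value_of(o, OWL_ON_PROPERTY)
        filler = value_of(o, OWL_SOME_VALUES_FROM)
        if relation and filler:
            decoded.add((s, relation, filler))

print(decoded)
```

Four OWL-encoded triples collapse into one edge that links the two classes directly.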
    

In [None]:
# gather list of owl:class and owl:axiom entities
owl_classes = list(pkt.utils.gets_ontology_classes(owlnets.graph))
owl_axioms = []
for x in tqdm(set(owlnets.graph.subjects(RDF.type, OWL.Axiom))):
    src = set(owlnets.graph.objects(list(owlnets.graph.objects(x, OWL.annotatedSource))[0], RDF.type))
    tgt = set(owlnets.graph.objects(list(owlnets.graph.objects(x, OWL.annotatedTarget))[0], RDF.type))
    if OWL.Class in src and OWL.Class in tgt: owl_axioms += [x]
    elif (OWL.Class in src and len(tgt) == 0) or (OWL.Class in tgt and len(src) == 0): owl_axioms += [x]
    else: pass
node_list = list(set(owl_classes) | set(owl_axioms))

print('There are:\n-{} OWL:Class objects\n-{} OWL:Axiom Objects'.format(len(owl_classes), len(owl_axioms)))

In [None]:
# decode owl semantics
decoded_graph = owlnets.cleans_owl_encoded_entities(node_list)
decoded_graph = owlnets.gets_owlnets_graph()

In [None]:
# update graph to get all cleaned edges
owlnets.graph = cleaned_graph + decoded_graph

In [None]:
# print owlnets results
str1 = 'Decoded {} owl-encoded classes and axioms. Note the following:\nPartially processed {} cardinality ' \
               'elements\nRemoved {} owl:disjointWith axioms\nIgnored:\n  -{} misc classes;\n  -{} classes constructed with ' \
               'owl:complementOf;\n  -{} classes containing negation (e.g. pr#lacks_part, cl#has_not_completed)\n' \
               'Filtering removed {} semantic support triples'
stats_str = str1.format(
    len(owlnets.owl_nets_dict['decoded_entities'].keys()), len(owlnets.owl_nets_dict['cardinality'].keys()),
    len(owlnets.owl_nets_dict['disjointWith']), len(owlnets.owl_nets_dict['misc'].keys()),
    len(owlnets.owl_nets_dict['complementOf'].keys()), len(owlnets.owl_nets_dict['negation'].keys()),
    len(owlnets.owl_nets_dict['filtered_triples']))
print('=' * 80 + '\n' + stats_str + '\n' + '=' * 80)

**Make Graph Single Connected Component**  
Depending on the source ontology that you apply `OWL-NETS` to, it's possible that the decoded graph may contain more than a single connected component. If you'd like to ensure that the graph contains only a single connected component, run the code chunk below.

*How Does it Work?*  
<u>Goal</u>: Ensure the resulting graph is connected without reducing the biological meaningfulness of the resulting graph  

<u>Solution</u>: Make the highest ancestor of every node in the graph an `rdfs:subClassOf` a single, specified ontology class. The choice of class can significantly affect the resulting graph, so it should be selected with caution     
- `BFO` entity ([BFO_0000001](http://purl.obolibrary.org/obo/BFO_0000001)) is the default choice    
- If you prefer a different ontology concept be used, update the code chunk below by replacing the `http://purl.obolibrary.org/obo/BFO_0000001` with your preferred URI
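
The idea can be sketched on a toy edge list: find every node that has no `rdfs:subClassOf` parent and attach it to the common ancestor, then confirm the graph forms one component. This is a minimal illustration under that reading of the solution; the helper names are hypothetical, the `http://example.org/` URIs are made up, and `makes_graph_connected` may work differently.

```python
from collections import defaultdict, deque

SUBCLASS_OF = 'http://www.w3.org/2000/01/rdf-schema#subClassOf'
ANCESTOR = 'http://purl.obolibrary.org/obo/BFO_0000001'
EX = 'http://example.org/'  # hypothetical namespace


def count_components(edges):
    """Count connected components, treating every edge as undirected."""
    neighbors = defaultdict(set)
    for s, _, o in edges:
        neighbors[s].add(o)
        neighbors[o].add(s)
    seen, components = set(), 0
    for start in neighbors:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        queue = deque([start])
        while queue:  # breadth-first walk over one component
            node = queue.popleft()
            for nxt in neighbors[node] - seen:
                seen.add(nxt)
                queue.append(nxt)
    return components


def connect_roots(edges, ancestor):
    """Add a subClassOf edge to `ancestor` from each node lacking a subClassOf parent."""
    nodes = {n for s, _, o in edges for n in (s, o)}
    has_parent = {s for s, p, _ in edges if p == SUBCLASS_OF}
    roots = nodes - has_parent - {ancestor}
    return set(edges) | {(root, SUBCLASS_OF, ancestor) for root in roots}


edges = {(EX + 'A', SUBCLASS_OF, EX + 'B'), (EX + 'C', SUBCLASS_OF, EX + 'D')}
connected = connect_roots(edges, ANCESTOR)
print(count_components(edges), count_components(connected))
```

In the toy graph, `B` and `D` are the roots, so two edges are added and the two components merge into one.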

In [None]:
# run lines below if you want to ensure the resulting graph contains a single connected component
common_ancestor = 'http://purl.obolibrary.org/obo/BFO_0000001'
owlnets.graph = owlnets.makes_graph_connected(owlnets.graph, common_ancestor)

#### Write OWL-NETS Results

The following output files are generated after running the `OWL-NETS` algorithm:
- `OWLNETS.nt` ➞ An `N-Triples` formatted file containing the `OWL-NETS` graph    
- `OWLNETS_NetworkxMultiDiGraph.gpickle` ➞ A NetworkX MultiDiGraph representation of the `OWL-NETS` graph    
- `OWLNETS_decoding_dict.pkl` ➞ A dictionary of important metadata from running `OWL-NETS`  

In [None]:
# save and write owl-nets results
owlnets.write_location = working_dir
owlnets.write_out_results(owlnets.graph)

## Finalize Output <a class="anchor" id="finalize-output"></a>
_____

This section generates the following additional output files from the `OWL-NETS` decoded knowledge graph:  
- `OWLNETS_edgelist.txt` ➞ A tab-delimited file containing 3 columns (i.e. subject, predicate, object), each populated with a Uniform Resource Identifier  
- `OWLNETS_node_metadata.txt` ➞ A tab-delimited file containing 5 columns (i.e. node_id, node_namespace, node_label, node_synonyms, and node_dbxrefs) for each node in the edge list     
- `OWLNETS_relations.txt` ➞ A tab-delimited file containing 3 columns (i.e. relation_id, relation_namespace, and relation_label) for each relation in the edge list 

### Edge List
Write out the `OWL-NETS` results as a `tab-delimited` text file. An example of the output is shown in the table below: 

subject | predicate | object
:--: | :--: | :--:
http://purl.obolibrary.org/obo/UBERON_0003060 | http://purl.obolibrary.org/obo/RO_0002202 | http://purl.obolibrary.org/obo/UBERON_0005721
http://purl.obolibrary.org/obo/HP_0009402 | http://purl.obolibrary.org/obo/RO_0000086 | http://purl.obolibrary.org/obo/PATO_0001512
http://purl.obolibrary.org/obo/CHEBI_35804 | http://www.w3.org/2000/01/rdf-schema#subClassOf | http://purl.obolibrary.org/obo/CHEBI_133748
http://purl.obolibrary.org/obo/HP_0100717 | http://www.w3.org/2000/01/rdf-schema#subClassOf | http://purl.obolibrary.org/obo/HP_0011061
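
Once written, the edge list can be read back with the standard library. The sketch below parses two of the example rows above from an in-memory string (standing in for `OWLNETS_edgelist.txt`) and counts edges per predicate.

```python
import csv
import io
from collections import Counter

# in-memory stand-in for the tab-delimited edge list file
sample = io.StringIO(
    'subject\tpredicate\tobject\n'
    'http://purl.obolibrary.org/obo/HP_0009402\t'
    'http://purl.obolibrary.org/obo/RO_0000086\t'
    'http://purl.obolibrary.org/obo/PATO_0001512\n'
    'http://purl.obolibrary.org/obo/HP_0100717\t'
    'http://www.w3.org/2000/01/rdf-schema#subClassOf\t'
    'http://purl.obolibrary.org/obo/HP_0011061\n'
)

# the header row supplies the dictionary keys
rows = list(csv.DictReader(sample, delimiter='\t'))
predicate_counts = Counter(row['predicate'] for row in rows)

for predicate, count in predicate_counts.most_common():
    print(count, predicate)
```

To read the real file, replace the `io.StringIO` object with `open(edge_list_filename)`.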

In [None]:
# set filename for writing edge list
edge_list_filename = working_dir + '/' + 'OWLNETS_edgelist.txt'

# write out results
with open(edge_list_filename, 'w') as out:
    out.write('subject' + '\t' + 'predicate' + '\t' + 'object' + '\n')
    for row in tqdm(owlnets.graph):
        out.write(str(row[0]) + '\t' + str(row[1]) + '\t' + str(row[2]) + '\n')

### Node Metadata
Write out metadata for all of the `OWL-NETS` nodes as a `tab-delimited` text file. The file contains the following columns:
- <u>node_id</u>: A node identifier in the form of a resolvable URI.  
- <u>node_namespace</u>: A string containing the namespace for the ontology.  
- <u>node_label</u>: A string containing the node's label (If no value provided then 'None')
- <u>node_synonyms</u>: A `|`-delimited string of node synonyms (If no value provided then 'None')
- <u>node_dbxrefs</u>: A `|`-delimited string of node database cross-references (If no value provided then 'None')

<br>

Example rows of data are shown in the table below:

node_id | node_namespace | node_label | node_synonyms | node_dbxrefs
:--: | :--: | :--: | :--: | :--:
http://purl.obolibrary.org/obo/GO_0097164 | GO | ammonium ion metabolic process | ammonium ion metabolism\|ammonium metabolic process | None
http://purl.obolibrary.org/obo/CHEBI_36242 | CHEBI | 3-(4-hydroxyphenyl)pyruvate | hpp\|3-(p-hydroxyphenyl)pyruvate\|4-hydroxyphenylpyruvate\|3-(4-hydroxyphenyl)pyruvate\|3-(4-hydroxyphenyl)-2-oxopropanoate\|p-hydroxyphenylpyruvate | um-bbd_compid:c0235\|reaxys:3950858\|beilstein:3950858\|pmid:11948155\|pmid:14593448
http://purl.obolibrary.org/obo/HP_0009293 | HP | Broad middle phalanx of the 4th finger | broad middle bone of the 4th finger | umls:c4024463
http://purl.obolibrary.org/obo/HP_0009643 | HP | Bullet-shaped distal phalanx of the thumb | bullet-shaped outermost bone of the thumb | umls:c4024260
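
When consuming this file, the `|`-delimited columns and the `'None'` placeholder usually need to be unpacked. A minimal sketch, using a hypothetical helper named `split_field`:

```python
def split_field(value, delimiter='|'):
    """Split a multi-valued metadata field; the placeholder 'None' becomes an empty list."""
    if value == 'None':
        return []
    return value.split(delimiter)


print(split_field('ammonium ion metabolism|ammonium metabolic process'))
print(split_field('None'))
```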


In [None]:
# set filename for writing node metadata
node_metadata_filename = working_dir + '/' + 'OWLNETS_node_metadata.txt'

# get all unique nodes in OWL-NETS graph
nodes = {str(node) for s, _, o in owlnets.graph for node in (s, o)}

# write out results
with open(node_metadata_filename, 'w') as out:
    out.write('node_id' + '\t' + 'node_namespace' + '\t' + 'node_label' + '\t' + 'node_synonyms' + '\t' + 'node_dbxrefs' + '\n')
    for x in tqdm(nodes):
        if x in entity_metadata['nodes'].keys():
            namespace = entity_metadata['nodes'][x]['namespace']
            labels = entity_metadata['nodes'][x]['label']
            synonyms = entity_metadata['nodes'][x]['synonyms']
            dbxrefs = entity_metadata['nodes'][x]['dbxrefs']
            out.write(x + '\t' + namespace + '\t' + labels + '\t' + synonyms + '\t' + dbxrefs + '\n')

### Relations

Write out metadata for all of the `OWL-NETS` relations as a `tab-delimited` text file. The file contains the following columns:
- <u>relation_id</u>: A relation identifier in the form of a resolvable URI.  
- <u>relation_namespace</u>: A string containing the namespace for the ontology.  
- <u>relation_label</u>: A string containing the relation's label (If no value provided then 'None')

<br>

Example rows of data are shown in the table below:

relation_id | relation_namespace | relation_label
:--: | :--: | :--:
http://purl.obolibrary.org/obo/chebi#is_tautomer_of | CHEBI | is tautomer of
http://purl.obolibrary.org/obo/uberon/core#indirectly_supplies | UBERON | indirectly_supplies
http://purl.obolibrary.org/obo/cl#has_not_completed | CL | has_not_completed
http://purl.obolibrary.org/obo/RO_0002380 | RO | branching_part_of
http://purl.obolibrary.org/obo/pato#has_relative_magnitude | PATO | has_relative_magnitude


In [None]:
# set filename for writing relation metadata
relation_filename = working_dir + '/' + 'OWLNETS_relations.txt'

# get all unique relations in OWL-NETS graph
relations = set([str(x[1]) for x in owlnets.graph])

# write out results
with open(relation_filename, 'w') as out:
    out.write('relation_id' + '\t' + 'relation_namespace' + '\t' + 'relation_label' + '\n')
    for x in tqdm(relations):
        namespace = entity_metadata['relations'][x]['namespace']
        labels = entity_metadata['relations'][x]['label']
        out.write(x + '\t' + namespace + '\t' + labels + '\n')