In [None]:
!pip install mowl-borg

# Syntactic embeddings of ontologies

Syntactic embedding methods use the syntax of axioms to generate sentences from them. mOWL provides methods to generate text sentences from the axioms and/or the annotations in the ontology. The syntax used to generate the sentences is [Manchester Syntax](https://www.w3.org/2007/OWL/draft/ED-owl2-manchester-syntax-20081128/).
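As a rough illustration of what "sentences from axioms" means, the sketch below renders a few hypothetical axioms, stored as plain Python tuples, into Manchester-style strings. The tuple format, the class IRIs and the `render` helper are assumptions made for this example only; mOWL performs the real rendering itself.

```python
# Toy axioms as (axiom_type, *operands) tuples; this format is hypothetical
# and only mimics how Manchester Syntax lays out two kinds of axioms.
AXIOMS = [
    ("SubClassOf", "http://Father", "http://Male"),
    ("SubClassOf", "http://Mother", "http://Female"),
    ("DisjointClasses", "http://Male", "http://Female"),
]


def render(axiom):
    """Render one toy axiom as a space-separated, Manchester-style sentence."""
    kind, *operands = axiom
    if kind == "SubClassOf":
        sub, sup = operands
        return f"{sub} SubClassOf {sup}"
    if kind == "DisjointClasses":
        # n-ary axioms list their operands separated by commas
        return "DisjointClasses: " + ", ".join(operands)
    raise ValueError(f"unsupported axiom type: {kind}")


corpus_sketch = [render(ax) for ax in AXIOMS]
for sentence in corpus_sketch:
    print(sentence)
```

Each axiom becomes one "sentence" whose words are entity IRIs and Manchester keywords; this is the kind of text a word embedding model is later trained on.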

In [None]:
import mowl
mowl.init_jvm("10g")

We load the Family ontology (`data/family.owl`) and import the function `extract_axiom_corpus`, which extracts the axioms from the ontology and renders them as sentences in *Manchester Syntax*.

In [None]:
from mowl.corpus import extract_axiom_corpus
from mowl.datasets import PathDataset
dataset = PathDataset("data/family.owl")
corpus = extract_axiom_corpus(dataset.ontology)
len(corpus)

Let's look at the first ten sentences of the generated corpus:

In [None]:
for s in corpus[:10]:
    print(s)

We can now feed this corpus into a word embedding model such as Word2Vec, which learns a numerical vector for every token in the vocabulary. We use the `gensim` library for this.

In [None]:
from gensim.models import Word2Vec

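# Tokenize on spaces and strip commas so that "A," and "A" become one token
sentences = [str(s).split(" ") for s in corpus]
sentences = [[w.replace(",", "") for w in s] for s in sentences]
w2v = Word2Vec(sentences, epochs=200, vector_size=50, min_count=0)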
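Word2Vec in its skip-gram variant learns from (target, context) pairs: each token is paired with the tokens that appear within a fixed window around it in the same sentence. The helper below is a minimal sketch of that pair generation on two hypothetical axiom sentences. It is not gensim's implementation, which adds refinements such as negative sampling and a randomly shrunk window, and gensim uses CBOW rather than skip-gram unless `sg=1` is passed.

```python
def skipgram_pairs(sentences, window=2):
    """Return (target, context) pairs for tokens within `window` positions."""
    pairs = []
    for tokens in sentences:
        for i, target in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # a token is not its own context
                    pairs.append((target, tokens[j]))
    return pairs


# Hypothetical sentences, tokenized on spaces as in the notebook
sentences_sketch = [s.split(" ") for s in [
    "http://Father SubClassOf http://Male",
    "http://Mother SubClassOf http://Female",
]]
pairs = skipgram_pairs(sentences_sketch, window=1)
print(pairs[:3])
```

Entities that occur in similar axioms share context tokens, which is the intuition behind why they tend to receive similar vectors.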

Finally, we can visualize the entity embeddings. We use a modified version of t-SNE implemented for mOWL, imported here from the `scripts.tsne` helper module:

In [None]:
from scripts.tsne import TSNE

vectors = w2v.wv
vocab_dict = vectors.key_to_index
# Keep only entity IRIs (not Manchester keywords); label each by its last path segment
name_to_label = {c: c.split("/")[-1] for c in vocab_dict if str(c).startswith("http://")}
name_to_emb = {c: vectors[c] for c in name_to_label}

tsne = TSNE(name_to_emb, name_to_label)
tsne.generate_points(500, workers=4)

In [None]:
tsne.show(thickness=300)
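The dictionary comprehension that builds `name_to_label` keeps only tokens starting with `http://` and labels each one by its last `/`-separated segment. Ontologies that use `https://` IRIs or `#` fragments (e.g. `http://example.org/family#Father`) would not be handled by that rule. The hypothetical helpers below sketch a variant that covers both cases; they are an assumption for illustration, not part of mOWL.

```python
def short_label(iri):
    """Return the local name of an IRI: the text after the last '#' or '/'."""
    # Prefer the fragment if there is one, otherwise the last path segment
    if "#" in iri:
        return iri.rsplit("#", 1)[-1]
    return iri.rstrip("/").rsplit("/", 1)[-1]


def entity_labels(vocab):
    """Map every IRI-like token in `vocab` to its short label."""
    return {tok: short_label(tok) for tok in vocab
            if tok.startswith(("http://", "https://"))}


vocab_sketch = ["http://Father", "SubClassOf", "http://example.org/family#Mother", "and"]
labels = entity_labels(vocab_sketch)
print(labels)
```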

## Data augmentation via reasoning

We can generate more axioms by reasoning over the current ontology. mOWL provides access to the ELK and HermiT reasoners. They can be used directly through the OWL API or through the `MOWLReasoner` wrapper class, which provides shortcuts for common reasoner queries.

In [None]:
from mowl.reasoning.base import MOWLReasoner
from org.semanticweb.HermiT import Reasoner

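# Create a HermiT reasoner over the ontology and precompute its inferences
reasoner = Reasoner.ReasonerFactory().createReasoner(dataset.ontology)
reasoner.precomputeInferences()

# Wrap it so that axioms can be inferred for a list of classes in one call
mowl_reasoner = MOWLReasoner(reasoner)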
classes_to_infer_over = list(dataset.ontology.getClassesInSignature())

subclass_axioms = mowl_reasoner.infer_subclass_axioms(classes_to_infer_over)
equivalence_axioms = mowl_reasoner.infer_equivalent_class_axioms(classes_to_infer_over)
disjointness_axioms = mowl_reasoner.infer_disjoint_class_axioms(classes_to_infer_over)
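To build some intuition for what the reasoner contributes, the sketch below computes the transitive closure of told `SubClassOf` edges between named classes. It is a deliberately tiny stand-in: HermiT and ELK also handle complex class expressions, equivalences and disjointness, none of which this toy covers. The class names are hypothetical.

```python
def infer_subclasses(told):
    """Return all (sub, sup) pairs entailed by transitivity of SubClassOf.

    `told` is a set of (sub, sup) pairs between named classes.
    """
    inferred = set(told)
    changed = True
    while changed:  # repeat until no new pair can be derived
        changed = False
        for a, b in list(inferred):
            for c, d in list(inferred):
                # a SubClassOf b and b SubClassOf d entail a SubClassOf d
                if b == c and (a, d) not in inferred:
                    inferred.add((a, d))
                    changed = True
    return inferred


told = {("Father", "Parent"), ("Parent", "Person")}
entailed = infer_subclasses(told)
new_axioms = entailed - told
print(sorted(new_axioms))
```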

Once the axioms have been inferred, we add them to the ontology:

In [None]:
from mowl.owlapi import OWLAPIAdapter

manager = OWLAPIAdapter().owl_manager

for ax in subclass_axioms:
    manager.addAxiom(dataset.ontology, ax)
for ax in equivalence_axioms:
    manager.addAxiom(dataset.ontology, ax)
for ax in disjointness_axioms:
    manager.addAxiom(dataset.ontology, ax)
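An OWL ontology is a set of axioms, so adding an axiom it already contains leaves it unchanged, and some inferred axioms may simply restate asserted ones. The sketch below mimics this with Python sets of hypothetical axiom strings, to show that only genuinely new axioms enlarge the corpus.

```python
asserted = {
    "http://Father SubClassOf http://Parent",
    "http://Parent SubClassOf http://Person",
}
inferred = {
    "http://Father SubClassOf http://Parent",  # already asserted
    "http://Father SubClassOf http://Person",  # genuinely new
}

augmented = asserted | inferred  # set union ignores duplicates
added = augmented - asserted
print(len(asserted), len(augmented), sorted(added))
```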

We can then repeat the embedding process on the augmented ontology:

In [None]:
corpus = extract_axiom_corpus(dataset.ontology)
print(f"The inferred ontology contains {len(corpus)} axioms")

In [None]:
sentences = [str(s).split(" ") for s in corpus]
sentences = [[w.replace(",", "") for w in s] for s in sentences]
w2v = Word2Vec(sentences, epochs=200, vector_size = 50, min_count = 0)

vectors = w2v.wv
vocab_dict = vectors.key_to_index
# Keep only entity IRIs (not Manchester keywords); label each by its last path segment
name_to_label = {c: c.split("/")[-1] for c in vocab_dict if str(c).startswith("http://")}
name_to_emb = {c: vectors[c] for c in name_to_label}

tsne = TSNE(name_to_emb, name_to_label)
tsne.generate_points(500, workers=4)
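The tokenization in this cell removes commas because n-ary axioms in Manchester Syntax separate their operands with commas (for example `DisjointClasses: A, B`), so a plain `split(" ")` yields tokens such as `A,` that never match the bare entity `A`. A minimal sketch of the effect on a hypothetical sentence:

```python
def tokenize(sentence):
    """Split on single spaces and drop commas, as in the notebook cell."""
    return [w.replace(",", "") for w in sentence.split(" ")]


raw = "DisjointClasses: http://Male, http://Female"
print(raw.split(" "))  # the comma stays attached to the first operand
print(tokenize(raw))
```

Keyword tokens such as `DisjointClasses:` keep their colon, but they are excluded from the plot anyway by the `startswith("http://")` filter.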

In [None]:
tsne.show(thickness=300)