jefferis edited this page Apr 5, 2014 · 3 revisions

Ontologies

## Background on owl, elk and brain

(All of this information is, of course, available elsewhere. I've added it here so that it provides the basic knowledge and terminology needed to make sense of the query server specs).

### OWL

OWL is a language for classifying individuals and classes via the facts (axioms) we record about them.

The fundamental types in OWL are: class, individual, object property (e.g. part_of, used in expressions relating classes of individuals to each other), and annotation property (used for non-logical components such as labels and definitions).

Every class, individual, object property or annotation property has a URI as an identifier. The terminal part of the URI (after a / or a #) is called the short-form identifier. Where the short-form identifier is not human readable, a human readable label is typically provided as well, usually using the annotation property 'rdfs:label'.
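As a rough illustration of the short-form rule above, the following sketch takes everything after the last `/` or `#` in a URI. The function name `short_form` and the example URIs are made up for this illustration and do not come from any particular library or ontology.

```python
def short_form(uri: str) -> str:
    """Return the part of a URI after its last '/' or '#'."""
    # rfind returns -1 when the character is absent, so max() picks
    # whichever separator occurs last (or -1 if neither occurs).
    cut = max(uri.rfind("/"), uri.rfind("#"))
    return uri[cut + 1:]


# Hypothetical URIs, one ending in a '#' fragment and one in a path segment.
print(short_form("http://example.org/ontology#Example_0000001"))
print(short_form("http://example.org/ontology/Example_0000002"))
```

A short form such as `Example_0000001` is not human readable, which is why a label is normally attached as well.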

Axioms in OWL can be expressed in a number of syntaxes, the easiest to read being Manchester Syntax, which reads rather like English. Commonly used built-in keywords in Manchester Syntax include: only; some; and/that; or; EquivalentTo; SubClassOf. (From here on these will be in italics.)

Perhaps the simplest axioms relate named classes to named superclasses, e.g.

ac1 _SubClassOf_  'sense organ'

But we can also refer to classes anonymously using class expressions e.g. the class of all things that are part of some antenna is defined by the class expression: 'part_of some antenna'. Class expressions can be treated just like named classes, so we can make a named class a SubClassOf some anonymous class:

ac1 _SubClassOf_ __part\_of__ _some_ 'antenna'

Or we can make a named class _EquivalentTo_ an anonymous class, providing necessary and sufficient conditions for class membership:

'antennal sense organ' _EquivalentTo_ 'sense organ' _that_ __part\_of__ _some_ 'antenna'

A reasoner using all three axioms can automatically classify ac1 as a _SubClassOf_ 'antennal sense organ'.
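The inference over the three axioms above can be sketched in a few lines. This is a toy stand-in, not a real reasoner: it handles only definitions of the form "genus that property some filler", and it ignores subclass transitivity and property hierarchies. The data structures and the function name `inferred_superclasses` are assumptions made for this example.

```python
# Asserted named superclasses: class -> set of named superclasses.
named_supers = {"ac1": {"sense organ"}}

# Asserted existential restrictions: class -> set of (property, filler) pairs.
restrictions = {"ac1": {("part_of", "antenna")}}

# EquivalentTo definitions: defined class -> (genus, (property, filler)).
definitions = {
    "antennal sense organ": ("sense organ", ("part_of", "antenna")),
}


def inferred_superclasses(cls):
    """Return the defined classes whose necessary and sufficient conditions cls meets."""
    found = set()
    for defined, (genus, restriction) in definitions.items():
        has_genus = genus in named_supers.get(cls, set())
        has_restriction = restriction in restrictions.get(cls, set())
        if defined != cls and has_genus and has_restriction:
            found.add(defined)
    return found


print(inferred_superclasses("ac1"))
```

Because ac1 is asserted to be both a 'sense organ' and __part\_of__ _some_ 'antenna', it satisfies the definition and is classified under 'antennal sense organ'.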

We can also use anonymous class expressions to specify queries of the ontology. Starting from a named class or a Manchester Syntax class expression, we can use OWL reasoning software to return subclasses, superclasses, equivalent classes or members of the specified class.

Reasoners:

We use software called reasoners to classify an ontology and to answer queries. Before querying, it is necessary to run a classification step. Typically, classification is much slower than querying.

Profiles:

There are various defined profiles for OWL. For our purposes, we need only worry about EL and DL. DL is more expressive, and many DL reasoners are available. The algorithms available for DL reasoning (classification) can be fast for ontologies below a certain size or complexity, but above this, memory and CPU requirements scale badly. No parallel DL query algorithms are available, so we can't take advantage of multiple cores to speed up reasoning.

EL is less expressive, but algorithms are available that scale very well (polynomially at worst) with size and complexity. Concurrent reasoning is also possible, so we can take advantage of multiple cores to speed up reasoning.

FaCT++ and JFACT are fully expressive OWL-DL reasoners. Early versions of the VFB used these reasoners. Classification of the ontology version we use for VFB is slow, but historically query times have been very short. However, increases in the size and complexity of the ontology have slowed both classification and, more importantly, querying to the point where site performance is adversely affected.

### elk

elk is an EL reasoner. This means it has some limitations compared to a full DL reasoner. For example, it does not support ontologies or queries containing 'or'. However, for our purposes it is safe to step outside the model by running multiple queries, one for each disjunctive (or) clause, and then concatenating the results.
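The workaround for 'or' can be sketched as follows. The `run_query` argument stands in for whatever call sends a single class expression to the reasoner; it, the helper `query_disjunction`, and the canned results are all hypothetical. The sketch also drops duplicates while merging, which is an assumption on top of plain concatenation, since a class may satisfy more than one clause.

```python
from typing import Callable, Iterable, List


def query_disjunction(
    clauses: Iterable[str],
    run_query: Callable[[str], Iterable[str]],
) -> List[str]:
    """Run one query per 'or' clause and merge the results in order, without duplicates."""
    merged = []
    seen = set()
    for clause in clauses:
        for hit in run_query(clause):
            if hit not in seen:
                seen.add(hit)
                merged.append(hit)
    return merged


# Canned answers standing in for a reasoner (made-up data for illustration).
FAKE_RESULTS = {
    "part_of some 'antenna'": ["ac1", "ac2"],
    "part_of some 'maxillary palp'": ["ac2", "mp1"],
}


def fake_run_query(expression):
    return FAKE_RESULTS.get(expression, [])


print(query_disjunction(
    ["part_of some 'antenna'", "part_of some 'maxillary palp'"],
    fake_run_query,
))
```

Splitting the query this way only covers a top-level disjunction; an 'or' nested inside a larger expression would need to be expanded into separate expressions first.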

elk has one more limitation: it only allows queries with named classes. Queries with class expressions, e.g. "X that overlaps some Y", require new named classes to be created ("rolled") and defined using axioms that assert their equivalence to the query expression. Queries can then be run with the named classes. Adding these classes means the ontology has to be reclassified before querying. However, classification is so fast (under 500ms with our anatomy ontology circa Dec 2012) that acceptable site performance is possible even with complete reclassification of the ontology. The most recent versions of elk take advantage of incremental reasoning, meaning that a shorter reasoning step than complete reclassification can be used when adding query classes.

The standard way to interact with OWL ontologies and reasoners is via the OWL-API. However, this is a rather complicated API with quite a steep learning curve. brain is a facade that simplifies interaction with the OWL-API + elk in many ways, making it very simple to load ontologies, add classes and run Manchester Syntax queries. Importantly, for a single query there is no need to add a query class or to trigger classification directly. brain simply detects if you are trying to query with a class expression, rolls a new class, triggers classification, runs the query and returns the answer. When querying with a named class, it also knows whether new axioms have been added to the ontology since the last classification and triggers classification if they have. This simplification also adds a limitation: if you want to query with multiple class expressions, you still need to roll all of the query classes before running any queries, to avoid triggering multiple rounds of classification.
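The "classify only when needed" behaviour, and why rolling query classes up front saves work, can be modelled with a toy class. This is not brain's API: the class `ToyFacade` and all of its method names are hypothetical, and classification is reduced to a counter so the number of reasoning rounds can be compared.

```python
class ToyFacade:
    """Toy model of a facade that reclassifies only when axioms have changed."""

    def __init__(self):
        self.axioms = []
        self.dirty = False             # True if axioms were added since the last classification
        self.classification_runs = 0   # counts the (notionally expensive) reasoning rounds
        self._rolled = {}              # class expression -> name of its rolled query class

    def add_axiom(self, axiom):
        self.axioms.append(axiom)
        self.dirty = True

    def roll_query_class(self, expression):
        """Define a named class equivalent to the expression, without classifying yet."""
        if expression not in self._rolled:
            name = f"query_{len(self._rolled)}"
            self.add_axiom(f"{name} EquivalentTo {expression}")
            self._rolled[expression] = name
        return self._rolled[expression]

    def _classify_if_needed(self):
        if self.dirty:
            self.classification_runs += 1
            self.dirty = False

    def query_named(self, name):
        self._classify_if_needed()
        return name  # a real facade would return subclasses of `name` here

    def query_expression(self, expression):
        return self.query_named(self.roll_query_class(expression))


expressions = [
    "'sense organ' that part_of some 'antenna'",
    "'neuron' that overlaps some 'antenna'",
]

# One expression at a time: each newly rolled class forces a new classification.
naive = ToyFacade()
for expression in expressions:
    naive.query_expression(expression)

# Roll every query class first, then query: a single classification covers all.
batched = ToyFacade()
names = [batched.roll_query_class(expression) for expression in expressions]
for name in names:
    batched.query_named(name)

print(naive.classification_runs, batched.classification_runs)
```

In this model the one-at-a-time approach classifies once per expression, while the batched approach classifies once in total, which is the saving the paragraph above describes.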

At this time we use brain 1.3, which has no support for reasoning with individuals and does not use a version of elk supporting incremental reasoning. We aim to move to a newer version of brain with both of these features in the near future.

## OWL files

### Drosophila anatomy ontology (FBbt)

The edited, master copy of the anatomy ontology contains many axioms that are necessary for ontology maintenance but not for querying (at least not so far). For querying and display, we use a derived version in which: (a) definitions have been automatically generated for logically defined terms lacking textual definitions; (b) inferred classifications have been non-redundantly asserted; (c) equivalent class axioms and imported terms have been stripped.
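One reading of "non-redundantly asserted" in step (b) is that only direct inferred superclasses are written out, with anything implied through another superclass left implicit. The sketch below works under that assumption; it is not the actual build script, the function name `non_redundant_parents` is invented, and the class hierarchy is illustrative. It also assumes the input is already transitively closed and contains no equivalence cycles.

```python
def non_redundant_parents(inferred):
    """Reduce a transitively closed superclass map to direct superclasses only.

    inferred maps each class to the set of all its inferred named superclasses
    (excluding the class itself).
    """
    direct = {}
    for cls, supers in inferred.items():
        direct[cls] = {
            candidate
            for candidate in supers
            # A candidate is redundant if another superclass of cls already implies it.
            if not any(
                candidate in inferred.get(other, set())
                for other in supers
                if other != candidate
            )
        }
    return direct


# Illustrative, transitively closed hierarchy (not taken from FBbt).
inferred = {
    "ac1": {"antennal sense organ", "sense organ", "organ"},
    "antennal sense organ": {"sense organ", "organ"},
    "sense organ": {"organ"},
    "organ": set(),
}

for cls, parents in non_redundant_parents(inferred).items():
    print(cls, "->", sorted(parents))
```

Here ac1 keeps only 'antennal sense organ' as an asserted parent, since 'sense organ' and 'organ' follow from it.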

Note that removal of imported terms means no direct queries are possible with GO terms (for function) or CHEBI terms (for neurotransmitter/hormone). We may wish to revisit this in future - producing a version that retains these imported terms.