Editors' manual

Note: this page needs updating, particularly to link to new SOPs.

AIMS

A set of well-defined terms for use in annotation
- Definitions should be easy for biologists to read and understand and should include references to the literature.
- Terms should have extensive synonyms so that curators and users of the ontology can easily find the term they need, no matter what names they are familiar with from the literature. Where possible, synonyms should also be linked to references.
- Additional comments should clarify confusing and conflicting aspects of term usage.
A logically consistent classification and partonomy for use in searching and grouping annotations.
- Construct the ontology such that contradictions can be automatically flagged by reasoners:
  - Declare disjointness.
  - Add domain and range constraints to relations where possible.
- Use reasoner output to sanity check the results of your edits by eye.
  - e.g.- check inferred superclasses for edited terms and those related to them using Protege 4.
An ontology that can be used to return accurate lists of terms in response to a range of standard queries of use to biologists.
- With standard queries in mind (e.g.- find all the neurons that connect two specified parts of the brain), run test DL queries during editing
  - Are the results of test queries correct? Are they complete?
An ontology that is scalable and maintainable.
- It is very easy to build tangled, unmaintainable ontology structures and very difficult to untangle the results if you do.
  - to avoid tangle, maximise the proportion of classification that is inferred.
  - to make work easier for future editors, add good written definitions, references, comments and extensive synonyms.
  - Use and document ontology design patterns

Introduction to anatomy ontologies

Anatomy ontologies are queryable classifications of anatomical structures. They are commonly used by bioinformatics resources to provide controlled vocabularies for annotating a range of entities (such as research papers, genes and genotypes). Typically curation is done manually and consists of assertions about phenotypes and expression patterns, but many other types of assertion are possible and ontologies are also used in conjunction with text mining to automatically annotate mentions in text.

For manual annotation, class and part hierarchies in anatomy ontologies provide terms with a range of specificity, allowing curators to choose an appropriately specific term depending on the information available. Textual definitions of terms, ideally supplemented with images, are important for consistent and accurate manual annotation as term names on their own are frequently ambiguous.

Anatomy ontologies are also used to group annotations in biologically meaningful ways. Typically, this is done by grouping annotations using class and part hierarchies (partonomy). For example, a query for genes expressed in the Drosophila leg could return gene expression annotated with the term middle leg (a subclass of leg) and claw (a part of the leg) as well as with the term leg. The usefulness of such grouping depends, of course, on the accuracy of classification and of assertions about partonomy. More sophisticated groupings can be achieved by taking advantage of ontology semantics expressed in a formal language such as OWL (for example, see Virtual Fly Brain).

OBO and OWL

OWL 2 Web Ontology Language (OWL2) is a W3C recommendation based on description logics. Its rigorous definition, web integration and the wide availability of fast reasoners make it a very attractive language for building and using ontologies. The EL profile of OWL2 is particularly attractive, as reasoning times scale very well with increasing size and complexity, and new, fast reasoners that take advantage of this are available (e.g. ELK).

For various historical and technical reasons, the Drosophila anatomy ontology (DAO) is built and maintained in OBO format. OBO format semantics are defined via translation to OWL in the OBO 1.4 specification. The OWLtools and OBOformat libraries provide a means to translate between OBO and OWL following this standard.

With 2 minor exceptions*, expressiveness of the DAO is currently limited to the OWL-EL profile, which corresponds approximately to the OWL expressiveness of OBO format - or at least those elements of it with tooling support.

(* Some object properties have inverses and specify a domain).

For more background and details of OBO to OWL translation please see these slides

Entities in OBO and OWL - OBO Foundry standard

All entities in the DAO have both a meaningless identifier and a human-readable label. Entity IDs follow the OBO Foundry ID standard, with OBO IDs taking the following form (expressed as a Perl regular expression):

"$IDP:\d{8}"

Translated to an OWL URI as

"http://purl.obolibrary.org/obo/$IDP_\d{8}"

Where $IDP is an ID prefix.

All anatomical classes in the DAO have the IDP FBbt. All classes corresponding to Drosophila stages or processes in the Drosophila Stage Ontology (DSO) have the IDP FBdv.
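The ID rules above can be illustrated with a minimal Python sketch. It assumes that ID prefixes consist of letters only (as FBbt and FBdv do); the function names are hypothetical and the numeric IDs are arbitrary examples rather than references to particular terms. For real format translation, use the OWLtools/OBOformat libraries mentioned above.

```python
import re

# OBO-style ID: an ID prefix (e.g. FBbt, FBdv), a colon, then exactly 8 digits.
OBO_ID = re.compile(r"^([A-Za-z]+):(\d{8})$")
OBO_PURL = "http://purl.obolibrary.org/obo/"


def obo_id_to_uri(obo_id):
    """Translate an OBO-style ID (PREFIX:nnnnnnnn) into its OWL URI."""
    match = OBO_ID.match(obo_id)
    if match is None:
        raise ValueError(f"not a valid OBO-style ID: {obo_id!r}")
    prefix, digits = match.groups()
    return f"{OBO_PURL}{prefix}_{digits}"


def uri_to_obo_id(uri):
    """Translate an OWL URI back into an OBO-style ID, validating the result."""
    if not uri.startswith(OBO_PURL):
        raise ValueError(f"not an OBO PURL: {uri!r}")
    prefix, _, digits = uri[len(OBO_PURL):].partition("_")
    obo_id = f"{prefix}:{digits}"
    if OBO_ID.match(obo_id) is None:
        raise ValueError(f"URI does not encode a valid OBO-style ID: {uri!r}")
    return obo_id


print(obo_id_to_uri("FBbt:00000001"))
# http://purl.obolibrary.org/obo/FBbt_00000001
print(uri_to_obo_id("http://purl.obolibrary.org/obo/FBdv_00000001"))
# FBdv:00000001
```

Validating in both directions means a malformed ID (for example one with too few digits) is rejected rather than silently converted.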

In OBO, the human-readable label is the value of the tag 'name'. In OWL, it is defined by an annotation assertion axiom using the annotation property 'rdfs:label'.

How classification works

(For readability, IDs are omitted in the examples that follow).

In OBO, classes and subclasses are related using is_a:

name: thoracic segment
is_a: segment

In OWL the term SubClassOf is used: 'thoracic segment' SubClassOf segment

Classification can use many different types of criteria. For example:

what something is part of (e.g.- the prothoracic segment is part of the thorax)
what it develops from (e.g.- vPN projection neurons develop from the neuroblast ALv1)
what neurotransmitter it releases (e.g.- a cholinergic neuron releases acetylcholine as a neurotransmitter).

In order to record such criteria for class membership in a computationally tractable form, we use relations (in OWL, these are called object properties). Each ontology contains a set of defined relations that can be used to relate different ontology terms in order to specify the criteria for class membership.

So, for example, we can use the relation 'part_of' to record that, in order to be classified as a prothoracic (T1) segment, an anatomical structure must be part of some thorax. In OBO:

name: prothoracic segment
relationship: part_of thorax

In OWL: 'prothoracic segment' SubClassOf part_of some 'thorax'

The meaning of both forms is captured by the English sentence "all prothoracic segments are part of some thorax".

It is important to note that these statements don't specify sufficient criteria for class membership. We have stated that, in order to be classified as a prothoracic segment, a structure must be part of a thorax. But clearly, this is not enough information to classify something as a prothoracic segment: wings, halteres and legs are part of every thorax but are not prothoracic segments.

For some classes at least, we can go further and specify all the information necessary for determining if something belongs to a particular class. For example, we can define a prothoracic leg as "Any leg that is part of some prothoracic segment".

In OBO this is expressed as:

name: prothoracic leg
intersection_of: leg
intersection_of: part_of prothoracic segment

In OWL: 'prothoracic leg' EquivalentTo leg that part_of some 'prothoracic segment'

The utility of this approach is clearer for cases where many instances of a structure exist in a single animal. For example, a reasoner can take the following three statements:

'vPN projection neuron' SubClassOf 'neuron'
'vPN projection neuron' SubClassOf capable_of some 'acetylcholine secretion, neurotransmission (GO:0014055)'
'cholinergic neuron' EquivalentTo neuron that capable_of some 'acetylcholine secretion, neurotransmission (GO:0014055)'

and conclude that: 'vPN projection neuron' SubClassOf 'cholinergic neuron'.

This means that there is no need to assert the classification 'vPN projection neuron' SubClassOf 'cholinergic neuron'. In an ontology with lots of different types of classification, it rapidly becomes difficult to keep track of all the different assertions of classification that are needed. Letting a reasoner do as much of this work as possible therefore produces a maintainable ontology.
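To make the reasoning step concrete, here is a deliberately tiny Python sketch of how a defined class such as 'cholinergic neuron' lets classification be inferred. It is a toy under strong assumptions, not how ELK or any production reasoner works: it handles only named superclasses and existential (some) restrictions, makes a single pass, and does not treat a defined class's own genus and differentia as superclass axioms. The `Cls` structure and all function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cls:
    name: str
    # Asserted named superclasses (OBO is_a / OWL SubClassOf X).
    is_a: set[str] = field(default_factory=set)
    # Asserted existential restrictions (OWL SubClassOf rel some Filler).
    some: set[tuple[str, str]] = field(default_factory=set)
    # Defined classes only: EquivalentTo genus that rel some filler [...]
    genus: str | None = None
    differentia: set[tuple[str, str]] = field(default_factory=set)


def ancestors(onto: dict[str, Cls], name: str) -> set[str]:
    """The class itself plus all asserted is_a ancestors (reflexive, transitive)."""
    seen, stack = set(), [name]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(onto[current].is_a)
    return seen


def restrictions(onto: dict[str, Cls], name: str) -> set[tuple[str, str]]:
    """Restrictions asserted on the class or inherited from its ancestors."""
    return {r for a in ancestors(onto, name) for r in onto[a].some}


def satisfies(onto: dict[str, Cls], name: str, defined: str) -> bool:
    """Does class `name` meet the genus and every differentium of `defined`?"""
    d = onto[defined]
    if d.genus not in ancestors(onto, name):
        return False
    held = restrictions(onto, name)
    # A held filler counts if it is the required filler or one of its subclasses.
    return all(
        any(rel == r and filler in ancestors(onto, f) for r, f in held)
        for rel, filler in d.differentia
    )


def infer_subclasses(onto: dict[str, Cls]) -> set[tuple[str, str]]:
    """Single pass: (class, defined class) pairs that are not already asserted."""
    inferred = set()
    for name in onto:
        for defined, d in onto.items():
            if d.genus is None or defined in ancestors(onto, name):
                continue
            if satisfies(onto, name, defined):
                inferred.add((name, defined))
    return inferred


ACH = "acetylcholine secretion, neurotransmission (GO:0014055)"
onto = {
    "neuron": Cls("neuron"),
    ACH: Cls(ACH),
    "vPN projection neuron": Cls(
        "vPN projection neuron", is_a={"neuron"}, some={("capable_of", ACH)}
    ),
    "cholinergic neuron": Cls(
        "cholinergic neuron", genus="neuron", differentia={("capable_of", ACH)}
    ),
}
print(infer_subclasses(onto))
# {('vPN projection neuron', 'cholinergic neuron')}
```

Only the two SubClassOf axioms on 'vPN projection neuron' and the definition of 'cholinergic neuron' are supplied; the classification between them is never asserted.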

However, some asserted classification will always be necessary as some important types of classification are extremely difficult to formalise.

Classes still need human-readable definitions

Logically consistent classification is important, but an ontology is only useful (and maintainable) if all the humans who interact with it (users, curators and editors) can quickly find the terms they need and understand what they refer to. This requires clear, unambiguous, human-readable definitions. It also requires extensive addition of synonyms to cope with varied usage.

Non-overlapping classes

Some classes never overlap. No one thing can be both a process (such as gastrulation) and an object (such as your head). Where we are certain that two classes are non-overlapping, it is useful to add this information to the ontology. Doing so serves at least two important functions. Firstly, it helps with error checking: if you accidentally assert or infer that something is classified under two classes that you have declared can never overlap, a reasoner can tell you. Secondly, it speeds up reasoning.

In OBO:

name: X
disjoint_from: Y

In OWL:

X DisjointWith Y
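A reasoner performs this check for you. The Python sketch below only illustrates the idea behind it: any class that ends up under both members of a disjoint pair is an error. It assumes a hypothetical `is_a` dictionary (in practice this should include inferred as well as asserted classification), and 'bad term' is a hypothetical mis-classified term.

```python
def ancestors(is_a, name):
    """The class itself plus all of its is_a ancestors."""
    seen, stack = set(), [name]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(is_a.get(current, []))
    return seen


def disjointness_violations(is_a, disjoint_pairs):
    """Report (class, X, Y) for every class that falls under both X and Y."""
    problems = []
    for name in is_a:
        anc = ancestors(is_a, name)
        for x, y in disjoint_pairs:
            if x in anc and y in anc:
                problems.append((name, x, y))
    return problems


is_a = {
    "gastrulation": ["process"],
    "head": ["object"],
    # Hypothetical mistake: a term accidentally classified under both.
    "bad term": ["gastrulation", "head"],
}
print(disjointness_violations(is_a, [("process", "object")]))
# [('bad term', 'process', 'object')]
```

Without the disjointness declaration, 'bad term' would sit quietly under both a process and an object; with it, the error is flagged automatically.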

Using terms from other ontologies

We can and should use the hard work of other ontologists to make structuring our own ontologies easier. For example, GO maintains a hierarchy of sensory process terms that are perfect for classifying sensory neurons and sensory organs according to their sensory modality. For this reason, we make extensive use of terms from GO, and increasingly use terms from PATO and ChEBI.

The ontologies we use are constantly evolving. It is therefore important to regularly update the terms we rely on for classification in our own ontologies. We have a scripted mechanism for this. For details, please see the SOP for importing or updating foreign terms.

Relationship of editor version to release versions

The edited version of the anatomy ontology is named fbbt-edit.obo. At the time of writing, we also maintain a file of OWL axioms (FBbt-ext.owl) that cannot be translated into OBO. For now at least, this should be used ONLY for adding GCIs (general class inclusion axioms) with expressiveness limited to the OWL-EL profile. We also maintain a file of OWL metadata containing details of licensing and authorship (fbbt_auth_attrib_licence.owl). A job running on our Jenkins Continuous Integration server adds automated definitions where applicable, knits these three files together, runs various syntax and consistency checks and, if the checks pass, rolls a standard set of release files. These files include a full OWL version (fbbt-non-classified.owl) and a 'simple' OBO version used for loading into FlyBase Chado (fbbt-simple.obo). For more details of the various versions available, please see the downloads_guide. For details of where these files live, as well as others used for the release process and official releases, please see the repository guide. Details of the release process can be found in our public release SOP.

Before you edit an OBO file

Advice and guidelines for co-ordinating large scale work on an OBO ontology can be found here. For small scale edits (to one or a few terms), you can ignore this advice.

Naming your term:

Rules:

All names should be singular nouns.
Names should not be capitalised except where they begin with a proper name (e.g.- Johnston's organ).
Avoid special characters: use only alphanumeric characters, spaces, hyphens (-), slashes (/) and apostrophes (').
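The mechanical parts of these rules can be checked automatically. The Python sketch below is a hypothetical helper, not an existing FlyBase tool; it assumes the editor supplies a list of accepted proper names, and it cannot check whether a name is a singular noun, which still needs a human eye.

```python
import re

# Allowed characters: letters, digits, space, slash, apostrophe and hyphen.
ALLOWED = re.compile(r"^[A-Za-z0-9 /'-]+$")


def naming_problems(name, proper_names=()):
    """Return a list of rule violations for a proposed term name."""
    problems = []
    if not ALLOWED.match(name):
        problems.append("special characters: use only alphanumerics, space, -, / and '")
    starts_with_proper = any(name.startswith(p) for p in proper_names)
    if name[:1].isupper() and not starts_with_proper:
        problems.append("capitalised, but does not begin with a listed proper name")
    return problems


print(naming_problems("mushroom body calyx"))  # no problems
print(naming_problems("Wing disc"))  # one capitalisation problem
print(naming_problems("Johnston's organ", proper_names=("Johnston",)))  # no problems
print(naming_problems("calyx (mushroom body)"))  # one special-character problem
```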

Advice:

Bear in mind that users will often encounter terms in isolation. Long descriptive names (within reason!) are therefore preferable, especially where there is obvious potential for confusion. For example, calyx (which simply means cup), could refer to a structure in the oviduct or the mushroom body, depending on your field of specialisation. It is therefore better to use 'mushroom body calyx' and 'oviduct calyx' than simply 'calyx' alone.

Try to maintain consistent patterns of naming where possible. However, it may make sense to override this in order to conform to common usage. For example, 'wing disc' is preferable to 'dorsal mesothoracic disc', even though 'dorsal prothoracic disc' keeps its latinate name because there is no commonly used, plain English equivalent for it.

More advice on naming can be found in the FB ontologies style guide. Where this is not sufficient, the OBO naming guide (http://obofoundry.org/wiki/index.php/Naming) or the GO curator guide may be useful.

Definitions:

AIM: a reasonably succinct statement about the class allowing curators/users to easily distinguish it from other, similar classes and which captures key points of interest about that class with links to the literature. It should capture assertions made in the formal part of the definition (the relationships) as closely as possible without becoming stilted and difficult to read.

Basic definition structure:

A <genus> that <diff1> and <diff2>. It also <diff3>... predominantly/mainly/mostly <blah>.

Where genus is a general classification and differentia (<diff1>, <diff2> etc...) state what differentiates this class from others that share the same general classification. Finally, optionally add (sparingly) statements that apply to only some members of the class where there are no more specific class terms to which these details could be added.

Evidence and citation

Definitions should consist of assertions about a class. They should NOT include reasons for believing those assertions to be true. These should be recorded in comments.

e.g.-
- name: ab1
- def: A large, olfactory basiconic sensillum of the 3rd antennal segment.
- comment: This class of sensillum has been shown by direct electrophysiological assay to be excited by various odours (Reeve et al., 2005).

As far as possible, all assertions about a class made in a definition should be referenced (see defDBxref section below for syntax). If different references are used for different aspects of the definition, then include references in the text as you would for any piece of scientific writing. To do this, use the first part of a miniref in brackets, e.g.: (Reeve, 2009), (Reeve and Ashburner, 2009), (Reeve et al., 2009).

Or, if the authors or paper are the subject of a sentence, use: Reeve et al., 2009, ...

e.g.- Yorozu et al., 2009, identify tonically activated....

The content of definitions

It is difficult to specify, a priori, what assertions should be included in a textual definition. Here are a few guidelines:

DO: Make sure your definition is consistent with the definitions of its superclasses.
DO: Make sure your textual definition includes the information that is recorded in direct formal relationships to the class.
Avoid: assertions about structures that are not part of the structure being described, except where they pertain to some direct relationship with the structure being described (e.g. connected or adjacent to).
Avoid: including details that could better be included in terms referring to subtypes or subparts of the structure.
DO NOT: include information about what happens in mutant backgrounds. For example, do not use a differentium of the form: 'that is lost in glass mutant animals'.
Limit: information that applies to only some members of the class. This should only be used sparingly. Where it is added to the definition, it should be made clear that it does not apply to all members of the class.
If at all possible: avoid using gene expression as a differentium.
Avoid: extensive repetition of assertions made in superclass term definitions unless providing direct evidence for class membership.

References for definitions and synonyms

FlyBase:FBrfnnnnnnn
ISBN:
PMID:
FBC:<curator_initials>
http://....
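A quick format check for these reference types can be sketched in Python. The patterns are assumptions read off the list above (FBrf followed by seven digits, curator initials as letters only) and are deliberately loose: the ISBN pattern checks only the character set, not the checksum. The helper name is hypothetical.

```python
import re

# Loose patterns for the reference types listed above (assumed formats).
XREF_PATTERNS = {
    "FlyBase reference": re.compile(r"^FlyBase:FBrf\d{7}$"),
    "ISBN": re.compile(r"^ISBN:[0-9Xx-]+$"),
    "PubMed": re.compile(r"^PMID:\d+$"),
    "FlyBase curator": re.compile(r"^FBC:[A-Za-z]+$"),
    "URL": re.compile(r"^https?://\S+$"),
}


def xref_type(xref):
    """Return the kind of reference, or None if it matches no accepted form."""
    for kind, pattern in XREF_PATTERNS.items():
        if pattern.match(xref):
            return kind
    return None


print(xref_type("FlyBase:FBrf0000001"))  # FlyBase reference
print(xref_type("FlyBase:FBrf123"))  # None (too few digits)
```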

Comments

The comment field should be used for:

Evidence

In some cases it is useful to know the type of evidence for an assertion. This cannot be recorded in the definition, but can be recorded in comments. Where this is done, references should be included in the text, as you would for any piece of scientific writing.

Disambiguation

Sometimes a single term is used in the literature with multiple meanings. In such cases, a comment should be added outlining these different uses and how they relate to the standard definition for the term in the ontology.

Potential merges and splits.

Comments on potential term merges or splits for which sufficient evidence is not yet available. Such comments should include references.

Synonyms:

Extensive addition of synonyms helps searching. Add as many as you like. Use references for these where possible. Use your judgement in assigning a category to your synonym, e.g.- broad, narrow, exact, related. These categories are deliberately kept vague as they function as a bridge from ontology to language.

Asserting classification

In order to keep the ontology maintainable, asserted classification should be limited where possible. Ideally, all terms will have only a single asserted parent - either in a regular relationship (SubClassOf/is_a) or as Genus in an EquivalentClass/intersection. Given the presence of suitable defined classes and sufficient relationships for the term you are making, additional classification can be inferred automatically using a reasoner.

However, as we are limited in what types of classification we are able to infer, you may need to assert multiple parents: 2 is_a parents are acceptable. If you absolutely must assert 3 is_a parents, then you should make a note with a suggestion for which of the asserted classifications are good candidates for formalising for inferred classification.
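Terms that exceed the limit on asserted is_a parents can be found with a short script. The Python sketch below uses a hypothetical minimal OBO reader (a real workflow should use a proper OBO parser) and hypothetical IDs and term names. It counts only is_a lines, matching the two/three-parent rule above; an intersection_of genus is not counted.

```python
def parse_term_stanzas(obo_text):
    """Minimal OBO reader: one list of (tag, value) pairs per [Term] stanza."""
    stanzas, current = [], None
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            current = [] if line == "[Term]" else None
            if current is not None:
                stanzas.append(current)
        elif current is not None and ":" in line:
            tag, value = line.split(":", 1)
            current.append((tag.strip(), value.strip()))
    return stanzas


def heavily_asserted(obo_text, limit=2):
    """Map term name -> number of asserted is_a parents, for terms over `limit`."""
    report = {}
    for stanza in parse_term_stanzas(obo_text):
        name = next((v for t, v in stanza if t == "name"), "<unnamed>")
        n_is_a = sum(1 for t, _ in stanza if t == "is_a")
        if n_is_a > limit:
            report[name] = n_is_a
    return report


EXAMPLE = """
[Term]
id: FBbt:90000001
name: example term A
is_a: FBbt:90000010
is_a: FBbt:90000011

[Term]
id: FBbt:90000002
name: example term B
is_a: FBbt:90000010
is_a: FBbt:90000011
is_a: FBbt:90000012
"""
print(heavily_asserted(EXAMPLE))  # {'example term B': 3}
```

Terms reported here are the ones that should carry a note suggesting which asserted classifications could be formalised.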

Inference of classification requires suitable defined classes to be present so that classification can occur automatically on the basis of properties (relationships) your class has. The next section deals with how to make these.

defined classes - Necessary and sufficient definitions:

You can limit asserted classification using defined classes (sometimes called cross-products (XPs), genus and differentia definitions, equivalent class or intersections). These terms have definitions that specify complete necessary and sufficient conditions for membership of a class. As a result, a reasoner can auto-classify the ontology by searching for terms that fulfil these conditions.

XP definitions approximate to the plain English sentence: Any X that (is) rel some Y [and (is) rel some Z...]. X is referred to as the Genus. Each clause following it (that rel some Y) is a differentium. Each XP definition can have 1-many differentia.

For example, say the term antennal sense organ has the XP definition meaning: "Any sense organ that is part of some antenna". A reasoner given this definition and relationships stating that 'ab1 is a sense organ' and 'ab1 is part of some antenna' can conclude that ab1 is_a antennal sense organ.

syntax

In OBO -

intersection_of: X
intersection_of: rel Y
[intersection_of: rel Z]

(square brackets indicate 0-many allowed)

or in OWL Manchester Syntax (MS):

EquivalentTo: X and rel some Y [and rel some Z]
(square brackets indicate 0-many allowed)

This approximates to the plain English definition

"Any X that (is) rel some Y [and rel some Z]."

e.g.

name: antennal sense organ
intersection_of: sense organ
intersection_of: part_of antenna

(OWL(MS) - EquivalentTo: 'sense organ' and part_of some 'antenna')
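The correspondence between the two syntaxes is mechanical, as this Python sketch shows. Note that the OBO form has no 'some': the existential quantifier is implicit in each differentium line. The helper names are hypothetical, labels are used for readability where real files use IDs, and labels that themselves contain an apostrophe would need escaping in Manchester syntax, which this sketch ignores.

```python
def xp_to_obo(genus, differentia):
    """OBO form: genus on its own line, one 'relation filler' line per differentium."""
    lines = [f"intersection_of: {genus}"]
    lines += [f"intersection_of: {rel} {filler}" for rel, filler in differentia]
    return "\n".join(lines)


def xp_to_manchester(genus, differentia):
    """Manchester syntax: quoted genus and each 'rel some filler' joined with 'and'."""
    parts = [f"'{genus}'"] + [f"{rel} some '{filler}'" for rel, filler in differentia]
    return "EquivalentTo: " + " and ".join(parts)


diffs = [("part_of", "antenna")]
print(xp_to_obo("sense organ", diffs))
# intersection_of: sense organ
# intersection_of: part_of antenna
print(xp_to_manchester("sense organ", diffs))
# EquivalentTo: 'sense organ' and part_of some 'antenna'
```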

Candidates for XP definitions include classes whose only differentium is:

what it is part of (e.g.- antennal sensillum)
what it innervates
what its axon(s) innervate
what its dendrite(s) innervate
its function (e.g.- olfactory neuron)
what neuron projection bundle it fasciculates with

Using differentia that require new foreign terms

Sometimes, a new differentium will require a term from some foreign ontology to be imported into your XP file. This is currently done by hand. Simply copy the following fields from the stanza of the term in question into your working copy of the XP file.

[Term]
id:
name:
def:

The next time the import script is run, the appropriate related terms will be imported.

Making new defined classes in OBO-Edit

Add new root class (Don't add a new child to an existing term - you are aiming to automate classification not assert it.)
Add appropriate genus and differentia to the cross-product tab of the text editor.
Commit

Checking defined classes in OBO-Edit

(note: testing in P4 may be quicker; it may be more efficient to add a few such classes before checking the results.)

Re-run the reasoner.
check classifications in graph viewer
check classification via the search tab: Select terms that [have] a [Name] that [Contains] om [Ancestor] that can be reached via [is_a]

Making new defined classes in Protege 4

(Currently, this should only be done for testing purposes, as there is, as yet, no safe roundtripping between OBO and OWL2.)

Make new root class
Add EquivalentTo axiom to "Equivalent classes" in the description window.
Run the reasoner
To check results - check descendant classes for the new term in the DL tab.

Textual definitions of terms with XP definitions

As part of the release cycle, these terms get auto-generated text definitions. For example, the above example would get the automatic definition: "Any sense organ that is part of some antenna."

However, in some cases it can be useful to add more detail and/or to add references. To add references alone, without any definition, use '.' for the def. The term will still get an autodef in addition to the references added. However, if you add more than this to the definition, the auto-def will not be added. In this case, some part of the manually added definition should convey the same meaning as the auto-def would have done.
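The autodef rule can be sketched in Python as follows. This is an illustration of the behaviour described above, not the code run by the release job: the relation-to-English phrase table is an assumption, the function names are hypothetical, and the references attached to a '.' definition are not modelled.

```python
# Hypothetical relation -> English phrase table; extend as needed.
REL_PHRASES = {
    "part_of": "is part of",
    "capable_of": "is capable of",
    "develops_from": "develops from",
}


def autodef(genus, differentia):
    """Build an 'Any <genus> that ...' sentence from a genus-differentia definition."""
    clauses = [f"{REL_PHRASES.get(rel, rel)} some {filler}" for rel, filler in differentia]
    return f"Any {genus} that {' and '.join(clauses)}."


def released_definition(manual_def, genus, differentia):
    """No def, or a '.' def, gets the autodef; any other manual text is kept as-is."""
    if manual_def is None or manual_def.strip() == ".":
        return autodef(genus, differentia)
    return manual_def


print(released_definition(".", "sense organ", [("part_of", "antenna")]))
# Any sense organ that is part of some antenna.
```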

How to track terms with necessary and sufficient definitions.

Render in OBO-Edit with:

'select terms that [have] a [is intersection]'

In Protege 4, the presence of an equivalent class definition is indicated by '≡'

Patterns of generalisation

Ontologies are good at capturing generalisations, e.g.- all trichoid sensilla are mechanosensory. But broad generalisations are dangerous, as they often have exceptions. So, be wary of capturing broad generalisations, especially where direct evidence for them is not presented. Such generalisations are frequently found in textbooks, reviews, or the introductions to papers.

The safest approach is to keep generalisations as specific as possible and to use an ontology reasoner to classify. However, high level generalisations are more efficient. We can use them to add large amounts of useful information simply by adding a single axiom to a general class. It may also be the case that the published evidence says nothing about specific subclasses, so it makes more sense (to both users of the ontology and subsequent editors) to attach an assertion and the evidence and references for it, to a general term.

When choosing to add a general assertion to the ontology, you should ask: How good is the evidence? Is it necessary to identify all members of the class in order for this generalisation to be safe? If so, are we certain that all members of the class being generalised about have been identified? For example, it is possible to identify all members of a clone descending from a specific neuroblast, but relying only on enhancer trap markers for a particular class can easily miss some (e.g.- see various papers on antennal lobe projection neurons).

If you choose to capture a broad generalisation, then

a. You should state that generalisation clearly and unambiguously in the textual definition, along with a reference for its origin.
b. Where possible, you should include a brief summary of the evidence for the assertion in the comment field.

There are two forms that generalisations can take:

Generalisations inherited due to asserted classification.

An Example.

In OBO:

name: macrochaeta
is_a: sensillum
relationship: capable_of detection of mechanical stimulus involved in sensory perception
+
name: dorsocentral bristle
is_a: macrochaeta
+
name: mechanosensory sensillum
intersection_of: sensillum
intersection_of: capable_of detection of mechanical stimulus involved in sensory perception
=>
name: dorsocentral bristle
relationship: capable_of detection of mechanical stimulus involved in sensory perception {inferred}
is_a: mechanosensory sensillum {inferred}

In OWL:

macrochaeta SubClassOf capable_of some 'detection of mechanical stimulus involved in sensory perception'
macrochaeta SubClassOf sensillum
'dorsocentral bristle' SubClassOf macrochaeta
'mechanosensory sensillum' EquivalentTo sensillum that capable_of some 'detection of mechanical stimulus involved in sensory perception'
=>
- 'dorsocentral bristle' SubClassOf capable_of some 'detection of mechanical stimulus involved in sensory perception' {inferred}
- 'dorsocentral bristle' SubClassOf 'mechanosensory sensillum' {inferred}

In English:

All macrochaetae have function 'detection of mechanical stimulus involved in sensory perception'.
dorsocentral bristle is a macrochaeta.
Therefore:
- all dorsocentral bristle(s) have function 'detection of mechanical stimulus involved in sensory perception'.
macrochaeta is a sensillum.
Any sensillum that has function 'detection of mechanical stimulus involved in sensory perception' is a mechanosensory sensillum.
Therefore:
- dorsocentral bristle is a mechanosensory sensillum.

Scope for these is somewhat limited by our policy of limiting asserted classification.
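When checking inherited assertions it can help to know which ancestor each one comes from. The Python sketch below walks asserted is_a parents breadth-first and records, for each (relation, filler) pair, the nearest class on which it was asserted. The dictionaries and function name are hypothetical, and because it follows asserted is_a only, it cannot reveal assertions inherited through inferred classification.

```python
from collections import deque


def inherited_assertions(is_a, relationships, name):
    """Map each (relation, filler) holding for `name` to the class it was asserted on.

    is_a: term -> list of asserted parents
    relationships: term -> list of (relation, filler) pairs asserted on that term
    """
    found, seen, queue = {}, set(), deque([name])
    while queue:
        current = queue.popleft()  # breadth-first: nearer ancestors are recorded first
        if current in seen:
            continue
        seen.add(current)
        for assertion in relationships.get(current, []):
            found.setdefault(assertion, current)
        queue.extend(is_a.get(current, []))
    return found


MECH = "detection of mechanical stimulus involved in sensory perception"
is_a = {"dorsocentral bristle": ["macrochaeta"], "macrochaeta": ["sensillum"]}
relationships = {"macrochaeta": [("capable_of", MECH)]}
print(inherited_assertions(is_a, relationships, "dorsocentral bristle"))
# {('capable_of', 'detection of mechanical stimulus involved in sensory perception'): 'macrochaeta'}
```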

Hidden assertions: generalisations inherited due to inferred classification.

e.g. All uniglomerular antennal lobe projection neurons that develop from some 'ventral PN neuroblast' have axons that innervate (axon_innervates) some 'lateral horn'.

OWL:

EquivalentTo 'uniglomerular antennal lobe projection neuron' that develops_from some 'ventral PN neuroblast'
SubClassOf axon_innervates some 'lateral horn'

OBO:

intersection_of: uniglomerular antennal lobe projection neuron
intersection_of: develops_from ventral PN neuroblast
relationship: axon_innervates lateral horn

OBO-Edit and Protege 4 currently have no direct way to track or search for terms with definitions of this form. To make them trackable, add such terms to the hidden_assertion subset. The following can then be used to search for or render these in OBO-Edit:

Select terms that [have] a [Subset] that [contains] the value {hidden_assertion}
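Outside OBO-Edit, both the [is intersection] filter and the hidden_assertion subset filter amount to simple tag searches over the OBO file. The Python sketch below illustrates this with a hypothetical minimal OBO reader and hypothetical IDs and names; values are compared exactly, so a value carrying a trailing '!' comment would not match.

```python
def parse_term_stanzas(obo_text):
    """Minimal OBO reader: one list of (tag, value) pairs per [Term] stanza."""
    stanzas, current = [], None
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            current = [] if line == "[Term]" else None
            if current is not None:
                stanzas.append(current)
        elif current is not None and ":" in line:
            tag, value = line.split(":", 1)
            current.append((tag.strip(), value.strip()))
    return stanzas


def terms_with_tag(obo_text, tag, value=None):
    """Names of terms that have `tag` (optionally with exactly `value`)."""
    names = []
    for stanza in parse_term_stanzas(obo_text):
        if any(t == tag and (value is None or v == value) for t, v in stanza):
            names.append(next((v for t, v in stanza if t == "name"), "<unnamed>"))
    return names


EXAMPLE = """
[Term]
id: FBbt:90000003
name: example defined term
intersection_of: FBbt:90000010
intersection_of: develops_from FBbt:90000011
relationship: axon_innervates FBbt:90000012
subset: hidden_assertion

[Term]
id: FBbt:90000004
name: example plain term
is_a: FBbt:90000010
"""
print(terms_with_tag(EXAMPLE, "intersection_of"))  # ['example defined term']
print(terms_with_tag(EXAMPLE, "subset", "hidden_assertion"))  # ['example defined term']
```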

Tracking inherited assertions

The most efficient way to track the effects of inherited assertions as you edit is to run your ontology in Protege 4 while editing in OBO-Edit or emacs. Inherited assertions are listed in the description panel under 'inferred anonymous superclass'. Protege 4 will prompt for re-load whenever the ontology changes on disc. After every re-load, you should run the reasoner.

Relations

See definitions in OBO file for latest official versions.

Researching the meaning of old terms

In editing the anatomy ontology, you will frequently come across terms whose origin, meaning and usage you need to research. Here are a few pointers:

First, check what the term actually refers to in the ontology.

What definitions and references does it have?
What definitions and references do its supertypes have?
What assertions are made in the ontology about this term?

Find this by checking the superclasses and inferred anonymous superclasses panel in Protege 4 (make sure you run the reasoner first.)

Checking references in ancient archival versions of the ontology

References that appear to have been lost in the mists of some ancient file format conversion can be viewed here

This referencing is unfortunately not good enough to import wholesale, but can provide useful clues to the original intentions behind making the term.

Checking what meaning(s) is/are suggested by previous usage in annotation

The following scripts find details of annotations made using one or more terms, supplied as a list of OBO-style term IDs via STDIN**.

To find all references where input term(s) have been used in curation:

annotation_ref_finder.pl <path to modules.cfg>

To get free-text descriptions of phenotypes for a given cvterm:

phen_desc_by_cvterm.pl -a  <path to modules.cfg>

To get detailed information on expression curation for a given ref or list of references:

ex_hack.pl <path to modules.cfg>

** STDIN examples:

Single term:

echo 'OBO:ID' | ...
Multiple terms in a newline-delimited list in a file:

cat <file_path> | ...
DL query results list (requires reasoner to be initialised - see below):

(a) Initialise reasoner first:
```
 reason <path to ontology file> &
```
(b) then run your query (note: the sed command converts between OWL and OBO ID formats)
```
 query 'DL query' | sed 's/_/:/' | ...
```
Note: the DL query must have OWL-style IDs in place of names. The output also has OWL-style IDs, hence the sed command to convert between formats.
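The sed command replaces only the first underscore on each line, which works when every output line is a single ID. Where a line may contain several IDs, or other underscores (e.g. in relation names), a conversion that rewrites only prefix_digits tokens may be safer. The Python sketch below shows one way to do this; the function name is hypothetical, and to use it in the pipeline above you would wrap it in a small script that reads sys.stdin and writes each converted line, in place of the sed step.

```python
import re

# OWL-style local IDs such as FBbt_00000001 or GO_0014055: letters, underscore, digits.
# "_" is a word character, so \b keeps relation names like part_of untouched.
OWL_STYLE_ID = re.compile(r"\b([A-Za-z]+)_(\d+)\b")


def owl_ids_to_obo(text):
    """Rewrite every OWL-style ID in `text` to OBO style (PREFIX:digits)."""
    return OWL_STYLE_ID.sub(r"\1:\2", text)


print(owl_ids_to_obo("FBbt_00000001 GO_0014055"))
# FBbt:00000001 GO:0014055
```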

Linguistic usage

It's sometimes worth surveying general term usage in the literature or more generally before deciding on a term name. In this, Google is your friend... (and so is PubMed).

Dealing with dangerously ambiguous terms

It is not uncommon for the meaning of an existing term to be ambiguous. In particular, the implicit ontology term definition deduced from relationships may not match usage in annotation. This is, at least in part, the result of lousy systems for presenting ontology terms to curators and so should improve in future.

What do we do in such cases?

In some cases, we may want to retrofit the existing data in Chado to be consistent with the ontology. However, it is also worth bearing in mind that others use our ontologies - particularly the anatomy ontology - for annotation we have no power to change. There are at least a couple of possible solutions:

Make replacement terms, one for each meaning and obsolete the mis-used term, adding a comment about the choice of replacements.
Make the meaning of the term general enough to cover both meanings. This may lead to some reduction in specificity of annotations, but at least none of the annotations will then be wrong. The commonest case here is where a term is commonly used in curation with a more general stage specificity than its definition (in the broad sense of the term - including things only specified in relationships) implies. In this case, it is best to make the term stage neutral and add more stage specific terms.