Dead simple OWL design pattern (DOS-DP) exchange format
For details please see:
Dead Simple OWL Design Patterns. David Osumi-Sutherland, Melanie Courtot, James P. Balhoff and Christopher Mungall. Journal of Biomedical Semantics 2017, 8:18. DOI:10.1186/s13326-017-0126-0
The job of editing the GO and many other OBOish OWL ontologies increasingly involves specifying OWL design patterns. We need a simple, lightweight standard for specifying these design patterns that can then be used for generating documentation, generating new terms and retrofitting old ones. The solution must be readable and editable by anyone with a basic knowledge of OWL and the ability to read Manchester syntax. It must also be easy to use programmatically without the need for custom parsers, i.e. it should follow some existing data exchange standard.
Human readability and editability require that Manchester syntax be written using labels, but sustainability and consistency checking require that the pattern record IDs.
Patterns are specified in the subset of YAML that can be converted to JSON.
- YAML is much easier than JSON for humans to edit: in JSON it can be difficult for human editors to keep curly braces and quotes balanced and to add commas correctly. YAML also has the great advantage over JSON of allowing comments to be embedded. Conversion between YAML and JSON is trivial.
All patterns contain dictionaries (hash lookups) that can be used to look up OWL short form IDs from labels. OWL short form IDs are assumed to be sufficient for entity resolution during usage of the pattern. Labels are assumed to be sufficient for entity resolution within a pattern.
Variable interpolation into Manchester syntax and text is specified using printf format strings. Variable names are stored in associated lists.
Variables are specified in a dictionary with the variable name as key and, as value, the variable's range specified as a Manchester syntax expression.
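The interpolation described above can be sketched in a few lines of Python, assuming that Python's `%` operator is an acceptable printf implementation and that only `%s` slots are used. The `fill` and `quote` helpers, the backslash escaping convention and the example label are illustrative assumptions, not part of the specification.

```python
def fill(text, var_names, values):
    """Fill the printf slots in `text` with the values bound to `var_names`, in order."""
    return text % tuple(values[name] for name in var_names)


def quote(label):
    """Single-quote a label for use in Manchester syntax.

    Escaping embedded single quotes with a backslash is an assumption here.
    """
    return "'" + label.replace("'", "\\'") + "'"


# Hypothetical printf fields: a format string plus its ordered list of variable names.
name_field = {"text": "%s import into cell", "vars": ["cargo"]}
owl_field = {"text": "'import into cell' and 'transports' some %s", "vars": ["cargo"]}

# Hypothetical binding of the variable to a label.
bindings = {"cargo": "calcium ion"}

label = fill(name_field["text"], name_field["vars"], bindings)
expression = fill(
    owl_field["text"],
    owl_field["vars"],
    {name: quote(value) for name, value in bindings.items()},
)

print(label)       # calcium ion import into cell
print(expression)  # 'import into cell' and 'transports' some 'calcium ion'
```

Text fields receive the bare label, while the Manchester syntax field receives the quoted label, because the `%s` slot itself is left unquoted in the pattern.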
Manchester syntax expressions use names (labels). These are always single quoted inside an expression that is double quoted. Note that single quotes in term names must be escaped.
The following is a quick guide to commonly used fields, with an emphasis on OBO-specific (derived) fields:
- pattern_name (string): the name of the pattern. No spaces or special characters allowed.
- description (string): Text describing the pattern and its uses. For use in documentation, not in OWL files.
- classes (associative array): hash lookup for OWL classes used in the pattern. key = name, value = ID
- relations (associative array): hash lookup for OWL object properties (relations) used in the pattern. key = name, value = ID
- vars (associative array): hash lookup for vars in the pattern. key = var name, value = range expressed as Manchester syntax.
- name (associative array):
  - text (string): sprintf label text.
  - vars (array): list of vars for interpolation of class names into sprintf of text.
- def (associative array):
  - text (string): sprintf definition text.
  - vars (array): list of vars for interpolation of class names into sprintf of text.
- EquivalentTo (associative array):
  - text (string): sprintf OWL Manchester syntax string. All OWL entities must be quoted (%s is not quoted).
  - vars (array): list of vars for interpolation into the sprintf OWL Manchester syntax text.
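To make the field guide concrete, here is a minimal sketch of a pattern written as a Python dictionary, which is the structure a YAML or JSON parser would return for a pattern file. The IDs are placeholders and the description, definition text and variable range are illustrative assumptions; only the field names come from the guide above.

```python
import json

# Hypothetical pattern. The EX_ IDs are placeholders, not real ontology IDs.
pattern = {
    "pattern_name": "import_into_cell",
    "description": "Import of a cargo into a cell.",
    # key = name (label), value = ID
    "classes": {
        "import into cell": "EX_0000001",
        "chemical entity": "EX_0000002",
    },
    "relations": {
        "transports or maintains localization of": "EX_0000003",
    },
    # key = var name, value = range as a Manchester syntax expression
    "vars": {"cargo": "'chemical entity'"},
    "name": {"text": "%s import into cell", "vars": ["cargo"]},
    "def": {
        "text": "The directed movement of %s from outside of a cell into the cell.",
        "vars": ["cargo"],
    },
    "EquivalentTo": {
        "text": "'import into cell' and 'transports or maintains localization of' some %s",
        "vars": ["cargo"],
    },
}

# Only strings, lists and dictionaries are used, so the pattern survives a JSON round trip.
as_json = json.dumps(pattern, indent=2)
assert json.loads(as_json) == pattern
print(as_json)
```

Restricting a pattern to these basic types is what keeps it within the subset of YAML that converts cleanly to JSON.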
Draft YAML example - import_into_cell
Draft JSON example - import_into_cell
- test conversion of YAML to JSON
- validate against JSON schema (e.g. see test_schema.py)
Recommendations for additional validation outside of JSON schema:
For all printf fields:
- test length of var array matches number of interpolation slots in string
- test that all var names are valid for the pattern
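The two checks above might look like the following sketch. It assumes that `%s` is the only slot type used and that `%%` denotes a literal percent sign; the function names `count_slots` and `check_printf_field` are hypothetical and do not come from the validator in this repository.

```python
import re

# Matches either an escaped percent sign ("%%") or a "%s" slot.
# "%%" is listed first so that it is consumed before "%s" can match.
_SLOT = re.compile(r"%%|%s")


def count_slots(text):
    """Count the "%s" interpolation slots in a printf string, ignoring "%%"."""
    return sum(1 for match in _SLOT.finditer(text) if match.group() == "%s")


def check_printf_field(field, declared_vars):
    """Return a list of error messages for one printf field (empty if it is valid)."""
    errors = []
    n_slots = count_slots(field["text"])
    if n_slots != len(field["vars"]):
        errors.append(
            "%d slot(s) in text but %d var(s) listed" % (n_slots, len(field["vars"]))
        )
    for name in field["vars"]:
        if name not in declared_vars:
            errors.append("var '%s' is not declared in the pattern" % name)
    return errors


# Hypothetical usage: `declared` stands in for the pattern's top-level vars dictionary.
declared = {"cargo": "'chemical entity'"}
print(check_printf_field({"text": "%s import into cell", "vars": ["cargo"]}, declared))
print(check_printf_field({"text": "%s import into %s", "vars": ["cargo"]}, declared))
```

Returning a list of messages rather than raising on the first problem lets a caller report every issue in a pattern in one pass.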
For printf_owl fields:
- Check quoted names in the printf field correspond to dictionary entries in the pattern.
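A minimal sketch of this check, assuming that quoted names are delimited by single quotes, that an embedded single quote is escaped with a backslash, and that the pattern stores its lookups under `classes` and `relations`. The function names are hypothetical.

```python
import re

# A single-quoted name; a backslash-escaped character (such as \') may appear inside.
_QUOTED = re.compile(r"'((?:[^'\\]|\\.)*)'")


def quoted_names(owl_text):
    """Extract the quoted names from a Manchester syntax printf string."""
    return [m.group(1).replace("\\'", "'") for m in _QUOTED.finditer(owl_text)]


def unknown_names(owl_text, pattern):
    """Return the quoted names that have no entry in the pattern's dictionaries."""
    known = set(pattern.get("classes", {})) | set(pattern.get("relations", {}))
    return [name for name in quoted_names(owl_text) if name not in known]


# Hypothetical pattern fragment; the IDs are placeholders.
pattern = {
    "classes": {"import into cell": "EX_0000001"},
    "relations": {"transports or maintains localization of": "EX_0000003"},
}

ok = "'import into cell' and 'transports or maintains localization of' some %s"
bad = "'import into cell' and 'part of' some %s"

print(unknown_names(ok, pattern))   # []
print(unknown_names(bad, pattern))  # ['part of']
```

The `%s` slots are skipped automatically because they are never quoted in the pattern text.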
Tests against referenced ontologies:
- Are the entities in the dictionaries present and non-obsolete in the latest releases of the relevant ontologies?
- Are the readable names up to date?
- For all printf manchester syntax strings: Is a valid Manchester syntax string generated when variable slots are filled using the range for each variable?
Validation when creating instances:
- Are values for variable slots present and non-obsolete in the latest releases of the relevant ontologies?
- Are values for the variable slots in the specified range?
The aim of this project is to specify a simple design pattern system that can easily be consumed, whatever your code base. This repository includes a simple Python validator (src/simple_pattern_tester.py).
For implementation, we recommend dosdp-tools.