The Genomic Epidemiology Entity Mart (GEEM) is a web portal for examining and downloading ontology-driven specifications for standardized data components. The portal aims to give subject matter experts and software developers ways to review, comment on, and use ontology terms without training in ontology curation or querying, and to show that OWL 2.0 ontologies are a strong platform for detailing standards with controlled vocabularies. GEEM has finished its first round of development and is available to explore at http://watson.bccdc.med.ubc.ca/geem/portal.html. User documentation is at https://genepio.org/geem-user-guide-introduction/.
Screenshot of GEEM Portal for reviewing and downloading specifications and standards
Ontology-driven standards benefit from features of open-source published OWL 2.0 ontologies such as globally unique identifiers for terms, multilingual labels and definitions, and logical validation and reasoning over controlled vocabularies. A specification can be designed to satisfy the requirements of a simple field such as an environmental pH measurement or a person's age, or of a more structured entity such as a contact address or a genomic sequence repository submission. Essentially, an application ontology (a collection of terms and relations from other ontologies that combine to model the operation of some domain) can be extended for use in GEEM by adding certain annotations and relations that bring portions of its content into a form that GEEM can present and distribute as specifications.
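To make the idea of a field specification concrete, here is a minimal Python sketch. It does not reproduce GEEM's actual download format: the dictionary layout, the `EX:` identifiers, the French labels, and the `validate` helper are all hypothetical and chosen only to illustrate unique identifiers, multilingual labels, and validation against a controlled vocabulary.

```python
# Hypothetical specifications; the structure and the "EX:" identifiers are
# illustrative placeholders, not real ontology terms or GEEM output.
AGE_SPEC = {
    "id": "EX:0000002",
    "label": {"en": "age", "fr": "âge"},
    "datatype": "integer",
    "minimum": 0,
}

SPECIMEN_SPEC = {
    "id": "EX:0000003",
    "label": {"en": "specimen type", "fr": "type de spécimen"},
    "datatype": "categorical",
    # Controlled vocabulary: term identifier -> English label.
    "choices": {"EX:0000010": "blood", "EX:0000011": "stool"},
}


def validate(spec, value):
    """Return a list of error messages; an empty list means the value is valid."""
    errors = []
    if spec["datatype"] == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            errors.append("expected an integer")
        elif "minimum" in spec and value < spec["minimum"]:
            errors.append("below minimum")
    elif spec["datatype"] == "categorical":
        if value not in spec["choices"]:
            errors.append("not in controlled vocabulary")
    return errors
```

Because values are checked against term identifiers rather than display labels, the same record stays valid whichever language the labels are shown in.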
A key motivation for this design is the idea that ontology vocabularies should be at the core of a star network data conversion model to connect domain-specific data silos. Rather than entertain peer-to-peer data translation projects, a data silo curator can develop a single converter to and from the ontology "hub" vocabulary, so each silo needs one converter instead of one per partner silo. GEEM works with the OBO Foundry (http://OBOFoundry.org/) family of ontologies expressed in OWL 2.0, enabling an ontology curator to create a specification for each set of numeric, categorical, or ordinal fields they wish to share in a more structured way. GEEM expects these specifications to be placed under the Ontology for Biomedical Investigations (OBI) "data representational model" class. Popular open-source tools like Stanford's Protégé can be used to curate these specifications along with the ontologies they are composed of.
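The star network idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not part of GEEM: the silo names, local field names, and `EX:` hub identifiers are hypothetical placeholders for real ontology terms.

```python
# Hypothetical hub term identifiers; in practice these would be ontology terms.
HUB_PH = "EX:0000001"
HUB_AGE = "EX:0000002"

# Each silo declares ONE mapping from its local field names to hub terms.
SILO_TO_HUB = {
    "lab_a": {"ph_val": HUB_PH, "patient_age": HUB_AGE},
    "lab_b": {"acidity": HUB_PH, "age_years": HUB_AGE},
    "lab_c": {"pH": HUB_PH, "Age": HUB_AGE},
}


def to_hub(silo, record):
    """Re-key a silo record by hub term, dropping unmapped fields."""
    mapping = SILO_TO_HUB[silo]
    return {mapping[f]: v for f, v in record.items() if f in mapping}


def from_hub(silo, hub_record):
    """Re-key a hub record by the silo's local field names."""
    reverse = {term: field for field, term in SILO_TO_HUB[silo].items()}
    return {reverse[t]: v for t, v in hub_record.items() if t in reverse}


def convert(source, target, record):
    """Translate between any two silos by passing through the hub."""
    return from_hub(target, to_hub(source, record))


def mapping_counts(n_silos):
    """Return (star, peer_to_peer) numbers of two-way mappings to maintain."""
    return n_silos, n_silos * (n_silos - 1) // 2
```

For example, `convert("lab_a", "lab_b", {"ph_val": 7.2})` re-keys the value under `acidity` without any mapping written specifically for that pair. With five silos, the hub approach needs five mappings, whereas pairwise translation needs ten, and the gap widens as silos are added.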
A pragmatic starting point for developing specifications is standards for data repository submissions. Our test cases so far mainly involve genomic sequence curation and submission standards described in the Genomic Epidemiology Ontology (GenEpiO).
Example of how a particular standard - the NCBI Antibiogram Standard for data submissions - can be organized within a family of related standards, and annotated for GEEM display
Ontology development introduces extra complexity over and above regular data dictionaries and object-oriented design schemas. GEEM aims to show the benefits of an ontology approach through tangible web forms and downloadable specifications in JSON, YAML, and, soon, Microsoft Excel templates.
GEEM will also target the specification formats of ontology-friendly and mobile-friendly data collection and curation tools like EpiCollect, REDCap, and Stanford's CEDAR. However, there is much work to do to standardize broader domain vocabularies and relations (not to mention secure access) before the ultimate vision of global data silo integration and seamless querying can be fulfilled.