Annotation protocol

maxneal edited this page Jul 31, 2020 · 33 revisions

Rules of thumb

  • Be as precise as possible with your annotations
  • Use a composite annotation wherever possible when annotating codewords
  • If the codeword’s meaning cannot be captured with the composite annotation structure, use a singular annotation that provides a precise definition. If a precise term for a singular annotation is not available, send a term request to the curators of the appropriate knowledge resource.
  • Provide semantic context for custom terms used in composite annotations: Assert each custom physical entity and process as a subclass of some knowledge resource reference term so that software tools can place custom terms within the broader landscape of biomedical semantics.

With SemGen we aim to guide and constrain the annotation process to ensure a high level of quality and consistency in semantic annotation. If you have suggestions about how to improve quality and/or consistency, please contact Maxwell Neal: mneal { at } uw [ dot ] edu.

Model-level annotations

The following are some basic recommendations for applying metadata to the model as a whole (such as curatorial information). The question of how this metadata should be serialized across modeling formats remains open.

  • Describe the taxon in which the simulated processes occur using an NCBI organismal taxonomy reference term
  • Provide the PubMed ID and title of the model’s reference publication
  • Provide a list of authors associated with the model
  • Provide the contact name and email for those who annotated the model
  • Provide a concise description of the model (the reference publication abstract could be used).
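
Since the serialization question is open, the checklist above can at least be tracked in a format-neutral way. The following is a minimal Python sketch under that assumption. The class `ModelMetadata`, its field names, and the helper `missing_fields` are hypothetical and not part of SemGen. The example values are placeholders, apart from NCBI taxonomy ID 9606, which identifies Homo sapiens.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ModelMetadata:
    """Hypothetical record of the recommended model-level annotations."""
    taxon: str = ""                 # NCBI taxonomy reference term
    pubmed_id: str = ""             # PubMed ID of the reference publication
    publication_title: str = ""     # title of the reference publication
    authors: list = field(default_factory=list)
    annotators: list = field(default_factory=list)  # (name, email) pairs
    description: str = ""           # concise description of the model


def missing_fields(meta: ModelMetadata) -> list:
    """Return the names of recommended fields that are still empty."""
    return [f.name for f in fields(meta) if not getattr(meta, f.name)]


# Placeholder draft: only some of the recommended fields are filled in.
draft = ModelMetadata(
    taxon="NCBITaxon:9606",
    authors=["A. Author"],
    description="Placeholder description of the model.",
)
print(missing_fields(draft))
```

A check like this could be run before a model is shared, so that incomplete curatorial information is caught early.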

Codeword-level annotation

In this section we provide an ordered list of recommended steps for annotating a model's codewords, i.e. those model structures that represent quantities of physical entities, physical processes, physical energy differentials, and physical dependencies.

1. Provide free-text definitions for all codewords

This step simplifies subsequent annotation steps by removing the need to look up definitions in external sources such as journal articles. It provides a minimal, human-readable, semantic definition of the model. Model search tools can also leverage free-text definitions to help researchers retrieve models of interest.

  • Collect from source publication(s), in-line comments, communication with authors, etc.
  • Spell out acronyms
  • Do not require model users to look elsewhere to understand the biophysical meaning of a codeword or sub-model

2. Confirm auto-added physical property annotations, add manually as needed

When an un-annotated model is loaded into SemGen, the software attempts to identify the OPB:Physical property that each codeword represents, based on the codeword’s physical units and its mathematical relationship to other codewords. These assignments are not guaranteed to be correct, so confirm each one.

  • Always use an OPB term
  • In ambiguous cases, SemGen will not include a property annotation; it will be the user’s responsibility to specify it. In the future, SemGen will offer suggestions on which OPB terms might be appropriate.

3. Annotate the codewords that represent properties of physical entities

Use the following recommended reference ontologies or create a custom term if an ontology term with the precise meaning is not available:

  • Foundational Model of Anatomy (FMA) for macromolecular structures on up
  • Mouse Adult Gross Anatomy (MA) ontology for rodent-specific anatomy
  • Cell Type Ontology (CL) for cell types not provided by the FMA
  • Gene Ontology:cellular component for subcellular structures not in the FMA
  • Protein Ontology (PR) for proteins (UniProt can also be used, but unless proteins in a model must be disambiguated by their associated taxon, we recommend using terms from PR that are not taxon-specific)
  • Chemical Entities of Biological Interest (ChEBI) for atoms and small molecules (e.g. metabolites)
  • Ontology for Biomedical Investigations (OBI) for laboratory materials

4. Annotate the codewords that represent properties of physical processes

Always create a custom term for physical processes.

  • Provide a precise, human-readable definition for the new term.
  • Define the term logically by specifying the dynamical sources, sinks and mediating entities that participate in the process (SemGen provides an interface for this that allows the user to select from the physical entities entered in step 3).

Rationale: Currently there is no reference ontology for multi-scale biological processes that associates the processes with their physical entity participants. This information is crucial for automating model composition because it indicates how the processes affect the thermodynamic states of the physical entities in the system. It is also critical for intelligently re-formulating conservation and flow equations during the model merging process. The long-term vision is to create a reference ontology of logically-defined biological processes by “harvesting” the custom process terms in SemSim models and aggregating them into a single knowledge base.

Setting sources, sinks and mediators: Source participants are those physical entities that are consumed when the rate of the process is positive. Sink participants are those physical entities that are produced when the rate of the process is positive. Mediator participants are the physical entities that are necessary for the process to occur and whose physical properties modulate the rate of the process, but whose amounts remain unchanged by the process. For chemical reactions, the sources are the reaction's reactants and the sinks are the reaction's products. Mediators are chemicals required for the reaction to occur and that influence the rate of the reaction, but are neither consumed nor produced.
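
The participant roles described above can be recorded in a small data structure. The following is a minimal Python sketch, not SemGen's interface: the class `ProcessTerm`, the helper `from_reaction`, and the entity labels are hypothetical names chosen for illustration. The example is the hexokinase reaction, in which glucose and ATP are converted to glucose 6-phosphate and ADP, with the enzyme acting as mediator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessTerm:
    """Hypothetical custom process term with its participants."""
    name: str
    definition: str
    sources: frozenset = frozenset()    # consumed when the rate is positive
    sinks: frozenset = frozenset()      # produced when the rate is positive
    mediators: frozenset = frozenset()  # modulate the rate, amounts unchanged

    def role_of(self, entity: str) -> str:
        """Return the role an entity plays in this process."""
        if entity in self.sources:
            return "source"
        if entity in self.sinks:
            return "sink"
        if entity in self.mediators:
            return "mediator"
        return "non-participant"


def from_reaction(name, definition, reactants, products, modifiers=()):
    """Map a chemical reaction onto sources, sinks and mediators."""
    return ProcessTerm(
        name=name,
        definition=definition,
        sources=frozenset(reactants),
        sinks=frozenset(products),
        mediators=frozenset(modifiers),
    )


hexokinase_reaction = from_reaction(
    name="glucose phosphorylation",
    definition="Phosphorylation of glucose by ATP, catalyzed by hexokinase",
    reactants=["glucose", "ATP"],
    products=["glucose 6-phosphate", "ADP"],
    modifiers=["hexokinase"],
)
print(hexokinase_reaction.role_of("ATP"))
```

In a real annotation the participants would be the physical entities entered in step 3 rather than plain strings.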

The directionality of processes may not always be clear from descriptions in the model's source publication or associated flow diagrams. Annotators should inspect the model's computational code to ensure that sources, sinks, and mediators are assigned in a way that is consistent with the model's mathematical formulations. We recommend examining conservation equations in the model, if present, when assigning sources and sinks for processes. For example, in electrophysiology models, the temporal rate of change of an intracellular ionic species is often a function of the sum of various transmembrane currents moving ions across the cell membrane. By examining how those currents alter the amount of the ionic species, it becomes clearer which species are source participants and which are sinks. When making this determination for a specific current, it can be helpful to examine how the ionic species amount will change according to its conservation equation when the current is positive and all other currents are set to zero. If the equation indicates that the species amount decreases when the current is positive, then that species is a source for the current: it is consumed when the rate of the process is positive. Conversely, if the species amount increases, the species is a sink.
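
The single-current test described above can be sketched in Python. Everything named here is an assumption made for illustration: the function `classify_role`, the current names, the constant `CELL_VOLUME`, and the conservation equation `dK_i_dt`, which is a simplified form that does not come from any specific model.

```python
FARADAY = 96485.0      # Faraday constant, C/mol
CELL_VOLUME = 1.0e-5   # hypothetical value in arbitrary units


def dK_i_dt(currents):
    """Hypothetical conservation equation for intracellular potassium."""
    net = currents["I_K1"] + currents["I_Kr"] - 2.0 * currents["I_NaK"]
    return -net / (FARADAY * CELL_VOLUME)


def classify_role(rate_of_change, current_names, target, probe=1.0):
    """Classify the solved species as source or sink for one current.

    The target current is set to a positive probe value and every other
    current to zero, then the sign of the rate of change is inspected.
    """
    currents = {name: 0.0 for name in current_names}
    currents[target] = probe
    rate = rate_of_change(currents)
    if rate < 0:
        return "source"     # amount decreases: the species is consumed
    if rate > 0:
        return "sink"       # amount increases: the species is produced
    return "unchanged"      # the current does not alter this species


names = ["I_K1", "I_Kr", "I_NaK", "I_CaL"]
for name in names:
    print(name, classify_role(dK_i_dt, names, name))
```

Under these assumed equations, intracellular potassium is a source for `I_K1` and `I_Kr`, a sink for `I_NaK`, and is not altered by `I_CaL`.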

5. Annotate the codewords that represent properties of physical energy differentials

Conventionally, instances of energy differentials are not given unique names, so they are always anonymous in SemSim models. Examples of properties of energy differentials include voltages, chemical potentials, fluid pressures, and temperatures (for heat transfer processes).

  • Define the energy differential logically by specifying the dynamical sources and sinks that generate the differential (SemGen provides an interface for this that allows the user to select from the physical entities entered in step 3).
  • If a sink is unspecified, the sink is considered to be whatever "ground" is applicable, given the OPB property term used in the composite annotation. For example, the ground for a codeword annotated as an OPB:Voltage would be the electrical ground; the ground for a codeword annotated as an OPB:Temperature would be the ambient temperature.

Rationale: There is no comprehensive set of knowledge resource terms that define specific energy differentials, and so they are defined logically within SemSim models by virtue of the dynamical sources and sinks that generate them.

Examples: For a transmural fluid pressure formulated as the chamber pressure inside a blood vessel minus the external pressure, the blood inside the vessel is the source and the fluid external to the vessel is the sink. For chamber pressures, the source is the blood inside the vessel and the sink is left unspecified; the sink is therefore assumed to be the common "ground" for all fluid pressures in the model.

For membrane voltages, the energy differential is generated by the difference in ionic charge inside the cell and outside the cell. Many electrophysiology models consider the resting potential of a cell to be near -70 mV, expressed as the intracellular potential relative to the extracellular potential. At this potential, the charge outside the cell is greater than inside. By this convention, the collective ionic species inside the cell constitute the energy differential's source and the ionic species outside the cell constitute the sink.
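
The source, sink, and default "ground" rules for energy differentials can be sketched as follows. This is a minimal illustration, not SemGen's data model: the class `EnergyDifferential`, the `GROUNDS` table, the entity labels, and the label "OPB:Fluid pressure" are assumptions. The two named grounds are the ones given in the text above.

```python
from dataclasses import dataclass
from typing import Optional

# Default grounds for property terms, as described in the text above.
GROUNDS = {
    "OPB:Voltage": "electrical ground",
    "OPB:Temperature": "ambient temperature",
}


@dataclass(frozen=True)
class EnergyDifferential:
    """Hypothetical anonymous energy differential defined by its participants."""
    opb_property: str
    source: str
    sink: Optional[str] = None   # None means the sink was left unspecified

    def effective_sink(self) -> str:
        """Return the explicit sink, or the applicable ground if none is set."""
        if self.sink is not None:
            return self.sink
        return GROUNDS.get(self.opb_property, "ground for " + self.opb_property)


transmural = EnergyDifferential(
    "OPB:Fluid pressure", source="blood in vessel", sink="fluid external to vessel"
)
chamber = EnergyDifferential("OPB:Fluid pressure", source="blood in vessel")
membrane = EnergyDifferential(
    "OPB:Voltage", source="intracellular ions", sink="extracellular ions"
)

for differential in (transmural, chamber, membrane):
    print(differential.opb_property, "->", differential.effective_sink())
```

Leaving the sink as `None` mirrors the protocol's convention that an unspecified sink falls back to the ground implied by the OPB property term.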

6. Annotate the codewords that are properties of constitutive dependencies using terms subclassed under OPB:Constitutive property

These include curve-shaping constants, reaction rate parameters, resistances, and other properties that are defined by the relationship between two or more disjoint physical properties.

  • Apply an OPB property to the codeword's composite annotation and leave the rest of the composite empty. Available OPB terms for these annotations are the leaf classes under the "Dynamical constitutive property" sub-tree.
  • In future versions, SemGen will be able to automatically determine the constitutive dependency associated with the physical property based on the OPB annotation.

Delineating and annotating sub-models

A SemSim sub-model is defined as a model section composed of one or more codewords and the computations that solve them. Any codewords that these computations require, but that are not explicitly included in the sub-model, go “along for the ride” as input parameters when the sub-model is extracted.

  • Provide a free-text definition for the sub-model
  • Identify the codewords in the sub-model and any sub-models that it subsumes
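
The "along for the ride" rule can be sketched as a set computation. This is a minimal Python illustration under stated assumptions: the function `submodel_inputs`, the dependency table, and all codeword names are hypothetical. Only codewords directly required by the sub-model's computations are collected.

```python
def submodel_inputs(dependencies, members):
    """Return codewords the sub-model needs but does not itself contain.

    dependencies maps each codeword to the codewords its computation uses.
    members is the collection of codewords explicitly in the sub-model.
    """
    members = set(members)
    required = set()
    for codeword in members:
        required |= set(dependencies.get(codeword, ()))
    return required - members


# Hypothetical dependency table for a sodium-current sub-model.
dependencies = {
    "I_Na": {"g_Na", "m", "h", "V", "E_Na"},
    "m": {"V"},
    "h": {"V"},
    "E_Na": {"Na_i", "Na_o", "T"},
}

inputs = submodel_inputs(dependencies, ["I_Na", "m", "h"])
print(sorted(inputs))  # → ['E_Na', 'V', 'g_Na']
```

Here `E_Na` becomes an input parameter because it is required but not included, and its own dependencies are not pulled in.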