SWEET IRI Patterns for Ontologies and Their Terms

Simon Cox edited this page Jan 9, 2018 · 3 revisions

Note to Readers

This document is based largely on documentation published on the Center for Expanded Data Annotation and Retrieval (CEDAR) website. That documentation is primarily maintained by John Graybeal; we thank him for formalizing the content for use in SWEET.

The document does not reflect the official recommendation of the CEDAR project or its funders; in fact, it originated with the Marine Metadata Interoperability project, and is re-hosted on the CEDAR site as a courtesy.

IRIs: Opaque vs Transparent, Label vs Concept

While many recommend an 'opaque' format for the IRI, in which the meaning of the IRI is not apparent from its form (see for example [10]), we recommend the IRI be constructed according to a semantically meaningful format, as described below. While there are definitely some portability and persistence costs to this approach, we believe that when representing semantic terms, usability and social benefits are more important at this stage of semantic web development. We also consider that the costs of this approach over the long term can be mitigated by other means (for example, ensuring the persistence of the IRIs themselves, and creating mappings to previous formats).

Note that this approach implies and explicitly embeds an important semantic approach: defining names rather than concepts. To make clear this difference, let's take a step back.

Concepts in the real world, companies for example, may change their name; for example, the company called Apple Inc. may become Apple, even though the original company hasn't gone away. In that scenario, a unique persistent web resource for the company should continue to represent the company, and only the associated label should change. This behavior of real-world concepts causes many ontologies to recommend divorcing the unique identifier from the real-world name.

However, in the case of the terms represented by an ontology, we find that most ontologies use the meaningful label to help build the identifier of the concept. In other words, the unique part of the string used to identify a given term, say webResource, specifies the object described by the ontology. Therefore, we connect the term we are defining in a given vocabulary to its familiar string identifier, webResource in this case.

Over time, the meaning associated with this object may change; webResource in that community may come to mean resource used to support the internet rather than resource on the internet, and the original concept may acquire the label internetResource. Since the resource identifier is tied to the label itself, not the meaning, in our model the resource (and corresponding IRI) will now relate to a new meaning associated with webResource. If you think of the ontology as representing dictionary concepts, this makes sense; if you think of it as representing "pure" or abstract concepts, this is heretical. (Of course, if users cite the versioned identifier, described below, such evolutions of the label are not an issue, as the versioned identifier only refers to the term at that point in time.)

If you want your vocabulary to be constructed around codes or opaque strings, rather than meaningful strings, you can still do that in your ontology. Simply use the desired code as the unique component of the term's identifier, and provide a corresponding label in addition to the definition. The label can then change over time, according to usage, without changing the code or its corresponding IRI.

If disambiguation of a common string is required, it is possible with our approach, using one of several conceptual facets that can be incorporated in the IRI. These are described below.

While we expect this process can support terms that embed http-unfriendly characters (e.g., accents, unicode, or '/'), the ESIP COR (for example) does not claim full functionality for those situations. Therefore, basic ASCII characters are strongly encouraged for the unique identifiers for ontologies and terms.

Basic IRI/Path Construction for Ontology Files and Resources

We recommend organizing IRIs using several semantically meaningful patterns. We recognize that each semantic component makes the identifier less opaque, therefore less portable, and more subject to change. We consider the tradeoff useful in each case, and believe the monetary and social cost of identifier changes can be mitigated with proper semantic practices. However, we recommend the ontology creator or publisher fully consider their own design goals with respect to each part of the proposed pattern, while recognizing that each element of the path will be represented in some way in the final resolution of the ontology or term, even in a completely opaque IRI.

Standalone File on the Web

The IRI representing a single ontology file should follow this basic scheme (some modifiers are described later):

https://{hostdomain}/{ontologiesRoot}/{authority}/{resourceIdentifier}

Ideally, the service that hosts the ontology file can also serve it with an extension according to its type (e.g., RDF, SKOS, or OWL), as in

https://{hostdomain}/{ontologiesRoot}/{authority}/{resourceIdentifier}.owl

so that search engines can classify the content appropriately. As this document is being written in 2007, this approach appears to be relatively uncommon.

The hostdomain and ontologiesRoot components are unavoidable since our goal is to serve the ontology at its resolvable IRI. The use of https as the protocol is now so highly recommended as to be a standard. Note that IRI redirection can be used by ontology providers (i.e., the repositories) to automatically redirect http (insecure) IRIs to their more secure equivalents. While this is semantically slightly dubious (which one is the real IRI?), its user-friendliness motivates our recommendation of this practice, and an OWL sameAs relation can be used to tightly bind the two IRIs for any tool that needs the connection.

The authority component is particularly useful in an environment where multiple organizational or conceptual authorities maintain ontologies, which may on occasion have identical topics; the authority provides a mechanism for disambiguation, by authority or other similar property. For a repository, the authority component helps associate the ontology with the actual group managing the ontology in the repository, though a one-to-one mapping between the two should not be assumed (by repository developer, ontology creator, or ontology user).

Finally, the resourceIdentifier is a unique name for the entities being documented in the ontology. A name that is broad enough to encompass all the likely components of the ontology, allowing for possible thematic growth, is ideal.

An example of the IRI scheme proposed here is:

https://sweetontology.net/human/Behavior
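The pattern above can be sketched as a small helper that joins the components. This is only an illustration: the function name and its parameters are hypothetical, and reading the SWEET example as an empty ontologiesRoot with human as the authority and Behavior as the resourceIdentifier is an assumption, not something the pattern states.

```python
def ontology_iri(host_domain, authority, resource_identifier,
                 ontologies_root="", extension=""):
    """Join the pattern's components into an https IRI (hypothetical helper)."""
    # Strip stray slashes and drop empty components, e.g. a host that
    # serves ontologies directly from its root.
    parts = [p.strip("/") for p in (ontologies_root, authority, resource_identifier)]
    path = "/".join(p for p in parts if p)
    return f"https://{host_domain}/{path}{extension}"


# Assumed reading of the SWEET example: no ontologiesRoot component.
print(ontology_iri("sweetontology.net", "human", "Behavior"))
```

Passing extension=".owl" would produce the typed form of the same IRI, for services that support it.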

Web Resource in a Repository

In a repository, it is possible to serve the same ontology as multiple resources, using a combination of IRI extension and content negotiation.

IRI extensions can be performed (with examples) through file name extension (.owl), path extension (/download), or API query construction (?fileFormat=ttl). The first two suffer from the likelihood of collision between the extension content and the IRIs of ontology terms (see below), so using query construction is the preferred IRI extension approach for specifying alternate resources or resource views.
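As a minimal sketch of the query-construction approach, the helper below appends a fileFormat parameter to an ontology IRI without touching its path, which is why it cannot collide with term IRIs. The helper name is hypothetical, and whether a given repository accepts fileFormat or the value ttl is an assumption taken from the example in the text.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def with_format(iri, file_format):
    """Return the IRI with a fileFormat query parameter added (hypothetical helper)."""
    scheme, netloc, path, query, fragment = urlsplit(iri)
    extra = urlencode({"fileFormat": file_format})
    # Preserve any query string that is already present.
    query = f"{query}&{extra}" if query else extra
    return urlunsplit((scheme, netloc, path, query, fragment))


print(with_format("https://sweetontology.net/human/Behavior", "ttl"))
```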

Content negotiation is a more elegant path, in that it allows the same content to be served in different formats according to the requesting agent's preferred view. If the requesting agent is a browser, it will typically not request OWL or RDF content directly, allowing the repository to choose to present a nicely formatted page to the user. Ontology editors, by contrast, generally expect RDF/XML format, and so are likely to set the requested content type accordingly. Users who explicitly want to resolve the content as RDF/XML in their browser, for example because of browser extensions that are available, can also configure their browsers to meet that need.
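From the client side, content negotiation amounts to setting the Accept header on the request. The sketch below builds such a request with Python's standard library but does not send it; the helper name is hypothetical, and whether a particular repository honors the header for this IRI is an assumption.

```python
from urllib.request import Request


def rdf_request(iri, media_type="application/rdf+xml"):
    """Build, without sending, a request that asks for RDF rather than HTML."""
    return Request(iri, headers={"Accept": media_type})


req = rdf_request("https://sweetontology.net/human/Behavior")
print(req.get_header("Accept"))
```

Passing the request to urllib.request.urlopen would perform the actual retrieval; a browser making the same request would normally send an Accept header that prefers HTML instead.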

There is another advantage to serving the ontology content as a web resource in a repository: additional features can be provided that are uniquely tailored to the versioning and presentation needs of semantic content. The following sections show how the IRI pattern above can be exploited by a suitably built ontology repository, to the benefit of ontology providers and users.

Historical note: In the early days of ontology repositories, many advanced users of ontologies did not consider ontology or concept versioning as a practical issue for the semantic tools. Their use of ontologies always used the latest version of any ontology, so while the provenance might be useful to have, it was not integrated into the life cycle of the ontology applications that performed advanced reasoning using the ontological content.

IRI Construction for Ontology Files: Additional Recommendations

These recommendations apply to the assignment of IRIs to ontology files. (Although in principle URNs could also be assigned to ontology files, current practice rarely if ever does so.)

  1. Assign a version identifier to every variant of every semantic resource (ontologies or terms). A version can be any unique string, but we recommend repositories use an autogenerated unique string (because user-offered versions are routinely not updated), and we recommend in particular a timestamp pattern, as it is easily contextualized. We recommend placing the timestamp just before the ontology name, because it keeps the ontology's name and whatever follows it together at the end of the IRI, and also follows the W3C pattern. So the following form is preferred over one that places the timestamp after the ontology name: https://sweetontology.net/20080701T022342/human/Behavior. We have chosen a full timestamp pattern including hours, minutes and seconds; this simplifies disambiguation of versions when multiple versions are submitted in a short period of time. A shorter timestamp string is OK, but you have to make sure every version is unique.
  2. The file name should not contain spaces, and should match the ontology name (and describe its contents). Use lower case for the entire IRI, to eliminate possible confusion or lack of resolution due to case conflicts. Concatenate separate words that make up any component names in the IRI. If you must use a separator, use underscore "_". The use of "-" inside the name is not recommended, since it sometimes confuses search engines. (The character "-" means exclusion, and even if the string is searched as a quoted string ("word1-word2") you are not guaranteed to get pages with that exact string. For example, searching for "moored-buoy" in Google returns not only pages containing "moored-buoy", but also pages containing "moored buoy" and "moored.Buoy". This conflict doesn't occur if we use the underscore character.)
  3. The authority may be tailored for circumstances as required. For example, if three ontologies from an organization must have the same name, the organization may create a separate authority within the ontology repository (and within the IRI) for each ontology. Even if all 3 ontologies are in fact run by the same organization, the different auth fields can map to the same entity. This technique also can support organizational hierarchies within a single organization, but appreciate that if your organizational hierarchies change, IRIs may need to be changed. (We worry less about organization names changing, but that happens too, as in the example above.)
  4. If an ontology must be replaced by another one, use the Dublin Core [6] element isReplacedBy (http://purl.org/dc/terms/) to inform the user or the program about the new ontology. However, to create an OWL document with that element, the Dublin Core concept term should be present or imported, otherwise the ontology will not be valid. The property should go inside the ontology element. An RDF vocabulary or a SKOS vocabulary can use the Dublin Core property without any problem. If SKOS [9] is used, the property tag should go inside the ConceptScheme element.
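The timestamp recommendation in item 1 can be sketched as follows. The function names are hypothetical, and using UTC for the publication time is an assumption; the text only asks that every version string be unique.

```python
from datetime import datetime, timezone


def version_stamp(moment):
    """Format a datetime as a compact timestamp such as 20080701T022342."""
    return moment.strftime("%Y%m%dT%H%M%S")


def versioned_iri(host_domain, version, authority, resource_identifier):
    """Place the version just before the ontology name, as recommended above."""
    return f"https://{host_domain}/{version}/{authority}/{resource_identifier}"


# A fixed time is used here so the result matches the example in the text.
stamp = version_stamp(datetime(2008, 7, 1, 2, 23, 42, tzinfo=timezone.utc))
print(versioned_iri("sweetontology.net", stamp, "human", "Behavior"))
```

A repository would more likely call version_stamp(datetime.now(timezone.utc)) at submission time; two submissions within the same second would still need a separate uniqueness check.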

IRI Construction for Ontology Terms

IRI for a Term

The IRI representing a concept should follow the same basic scheme as above, with the term's shortName appended:

https://{hostdomain}/{ontologiesRoot}/{authority}/{resourceIdentifier}/{shortName}

As previously noted, the shortName is typically a shortened form of the term's label. As an identifier, the shortened name is part of the unique identifier, so it can't contain spaces when it is defined in the ontology.

An example of this IRI scheme is:

https://sweetontology.net/human/Behavior

IRIs for Ontology Terms: Additional Recommendations

a) Techniques for resolving duplicate terms in vocabularies are beyond the scope of this proposal. In brief, each term in the ontology should be unique, and in many vocabularies this may be the case even for terms in many different hierarchical branches. However, if the vocabulary has the same term name appear in different hierarchies, it is necessary to disambiguate the different terms. Possible techniques for doing so include:

  1. specifying the complete path as part of the term name;
  2. specifying path components in reverse order until disambiguation is achieved;
  3. using a specific numbering or appending scheme that distinguishes specific terms, but can be easily disregarded by viewers.

We prefer approach 2, as it is the most intuitive while being the least intrusive. Further, we suggest '__' as a component separator, as it is unlikely to be duplicated in the term itself. In any case, the definition of the term should include the original name of the term in an attribute (attribute name to be specified). Unfortunately, we are not aware of any technique which consistently and unambiguously translates vocabulary term names into graceful unique names, and then reverses the process.
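One way to read approach 2 is sketched below: start from the term name alone and append ancestors, nearest first, only for names that still collide, joining components with '__'. The function name and the sample hierarchy are hypothetical, and the text does not prescribe this exact procedure.

```python
from collections import Counter

SEPARATOR = "__"


def disambiguate(paths):
    """Map each path (a root-to-leaf tuple of names) to a unique term name."""
    depth = {path: 1 for path in paths}

    def name(path):
        # The last `depth` components of the path, leaf first.
        return SEPARATOR.join(reversed(path[-depth[path]:]))

    while True:
        counts = Counter(name(p) for p in paths)
        # Paths whose name still collides and that have an unused ancestor.
        clashing = [p for p in paths if counts[name(p)] > 1 and depth[p] < len(p)]
        if not clashing:
            return {p: name(p) for p in paths}
        for p in clashing:
            depth[p] += 1


# Hypothetical vocabulary in which "Temperature" appears in two branches.
paths = [
    ("Instrument", "Sensor", "Temperature"),
    ("Property", "Temperature"),
    ("Property", "Salinity"),
]
names = disambiguate(paths)
print(names[("Property", "Temperature")])
```

Terms that are already unique keep their bare name, which is what makes this approach less intrusive than always embedding the complete path.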

b) Creating consistently formed term names from unique vocabulary strings is also not explicitly addressed by this document. Briefly, we recommend terms be formed as camel case, without underscores or dashes, or other characters outside [A-Za-z0-9], and that they start with a letter. (Spaces and slashes are particularly unfortunate characters to embed in term names, or any other component of the IRI, as these must be escaped in URLs.) These forms are easily translated into URL components and other concepts on the web. If our recommended forms are not followed, substitution is usually necessary, and as noted above, the original name of the term should be preserved in an attribute.
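A minimal sketch of such a conversion follows, assuming upper camel case (the text does not say whether the first letter should be capitalized) and treating every character outside [A-Za-z0-9] as a word separator. The function name is hypothetical. Because the conversion is lossy, the original label would still need to be preserved in an attribute, as recommended above.

```python
import re


def to_short_name(label):
    """Turn a free-text label into a camel-case name using only [A-Za-z0-9]."""
    words = re.findall(r"[A-Za-z0-9]+", label)
    # Capitalize the first character of each word and keep the rest as given.
    name = "".join(word[0].upper() + word[1:] for word in words)
    if not name or not name[0].isalpha():
        raise ValueError(f"no short name starting with a letter for {label!r}")
    return name


print(to_short_name("moored buoy"))
```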

Versioned and Unversioned IRIs

IRIs for the Unversioned Resource (Ontology or Term)

Versioned (timestamped) IRIs point to a specific version of an ontology or term; the description of that resource is nominally fixed throughout time.

However, in many applications, including most mapping and semantic inferencing activities, the identifier for the desired resource should be constant, even as specific aspects of that resource (definition, preferred label, or even its semantics) may change.

Here is a non-semantic example of this behavior: The web site http://nytimes.com always contains the most recent New York Times content, even as the content itself changes over time. (In the real world, "today's New York Times" similarly contains today's content, maybe different in different editions.) If you want to find yesterday's version of the New York Times, this web site location is of no immediate use. (Note: We are not talking here about the actual IRI representing this resource in the semantic web; the IRI of the resource for the concept described as "the latest New York Times content" could be entirely different, like urn:news:publications:papers:newyorktimes:daily. Yes, that can be confusing.)

The equivalent concept in the world of vocabularies is what does this term mean today? This is what an on-line dictionary will tell you—the latest definition of the term. But every time we change the meaning for a term in our repository, we create a new version, with a new (versioned) IRI. So how can we refer to the concept of the current meaning of the term MooredBuoy?

In SWEET we call this the unversioned form of the resource. To create an unversioned form of an ontology or term IRI, simply delete the version string (and slash) from the versioned IRI, and vice-versa. So, for example, the unversioned form of https://sweetontology.net/20080701T022342/human/Behavior is https://sweetontology.net/human/Behavior.

(Note: For this approach to be unambiguous, the version can never begin with an alphabetic character, and the resourceIdentifier can never begin with a digit. ESIP COR enforces this by convention, and hopes no one will break it.)
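The conversion between the two forms can be sketched as below. The sketch relies on the convention in the note: a path segment is treated as a version only if it begins with a digit. The exact version shape in the regular expression, the helper names, and the placement of the version as the first path segment (as in the SWEET example) are assumptions.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumed version shape: a leading digit, then digits, letters, or dots.
VERSION = re.compile(r"[0-9][0-9A-Za-z.]*")


def to_unversioned(iri):
    """Remove the first path segment that looks like a version, if any."""
    scheme, netloc, path, query, fragment = urlsplit(iri)
    segments = path.split("/")
    for i, segment in enumerate(segments):
        if VERSION.fullmatch(segment):
            del segments[i]
            break
    return urlunsplit((scheme, netloc, "/".join(segments), query, fragment))


def to_versioned(iri, version):
    """Insert the version as the first path segment of an unversioned IRI."""
    scheme, netloc, path, query, fragment = urlsplit(iri)
    return urlunsplit((scheme, netloc, f"/{version}{path}", query, fragment))


print(to_unversioned("https://sweetontology.net/20080701T022342/human/Behavior"))
```

If an authority or resource identifier ever began with a digit, to_unversioned would remove the wrong segment, which is why the convention matters.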

To say this yet another way: The resource identified by https://sweetontology.net/human/Behavior is the term in the ESIP COR platform ontology whose identifier is spelled Behavior. Even if the definition for this term changes, the unversioned IRI will represent the same old spelling, but now with the new definition. If the spelling of the term changes someday (say, people start calling these things character_trait), a new IRI with the new spelling must be created. If the original spelling is no longer needed and no longer has a meaning, it will be deprecated, but still available as a (historical) term in the ontology, and a corresponding resource. (And all the versioned IRIs with the original spelling still always exist, of course.)

When mappings are made to these unversioned resources, it is understood that the mapping is intended to persist through all versions of the ontology. This can be thought of as mapping the two labels together. It does not map the two meanings that correspond to those labels as of the time of the mappings, since those meanings may change over time. While it is tempting to do mappings to unversioned terms (because of the simplicity of inferencing), this can eventually lead to unpleasant consequences as vocabularies evolve. There may be technical reasons to perform unversioned mappings—for example, to identify related text—but expediency is not an ideal rationale. Unfortunately, it is a very practical strategy, given the limitations of mapping subsequent term versions over time.

Special note about unversioned ontologies: If an unversioned ontology is requested, ideally the user should receive the ontology containing all its terms (including deprecated terms), with the terms presented in unversioned form. This allows convenient mappings to be created to all the unversioned terms of an ontology. However, many repositories, like ESIP's COR, will simply return the most recent version of the ontology.

IRI Construction for Ontologies: Backus-Naur Specification

NOTE: This content has not yet been updated to match the above content.

Backus-Naur of Ontology IRI

In Backus-Naur form, the ontology resource specification is represented as follows:

<MMI-IRI> ::= "http://" <hostdomain> "/" <ontologiesRoot> "/"
<authority> "/" <version> "/" <resourceType> ".owl"

<version> ::= <ShortISO8601> | <NumberVersion>
<ShortISO8601> ::= <YYYYMM> | <YYYYMMDD> | <YYYYMMDDThh> | <YYYYMMDDThhmm> | <YYYYMMDDThhmmss>
<NumberVersion> ::= <MajorVersionNumber> "." <RevisionNumber>

In these schemes (and the ones in the following section), the following explanations hold:

  • hostdomain is the URL owned by the administering authority of the repository, and typically resolves to the server hosting the ontology. For example: mmisw.org
  • ontologiesRoot is the node of the host that serves as the base of the ontologies; typically it is a directory where the ontologies live. For example: /ont
  • authority is a code representing an administrative authority responsible for the particular set of terms. This may be the repository authority, or may be a different authority on whose behalf the repository is acting. For example: 'mmi' or 'cf' may be appropriate authority values. The authority is meant to provide a reference to the actual authority, not be a one-to-one mapping with the authority. There may be multiple authorities in a given organization: noaa, noaa1, noaa_dif, and so on.
  • version represents the publication timestamp (or part of it) of the resource or a version control number; the MMI default will be a publication date/time in ISO8601 form.
  • resourceType is the type of objects that are being represented by the vocabulary (as categorized by the vocabulary authority). Some examples are parameter, ucum_units, agu_index_terms and mmi_platforms. The authority may or may not be reflected in this term, which often is effectively the name of the ontology file.

Additional disambiguation may be necessary for a resourceType, for example for different types of files dealing with the same resource type, or multiple organizations dealing with that resource.

Backus-Naur of Term IRI

In Backus-Naur form, this is represented as follows:

<MMI-IRI> ::= "http://" <hostdomain> "/" <ontologiesRoot> "/"
<authority> "/" <version> "/" <resourceType> "/" <shortName>

<version> ::= <ShortISO8601> | <NumberVersion>
<ShortISO8601> ::= <YYYYMM> | <YYYYMMDD> | <YYYYMMDDThh> | <YYYYMMDDThhmm> | <YYYYMMDDThhmmss>
<NumberVersion> ::= <MajorVersionNumber> "." <RevisionNumber>
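The version production of the term grammar can be checked with a regular expression, as in the sketch below. The grammar does not define the date and time non-terminals or the width of the major and revision numbers, so reading each letter as one decimal digit and each number as one or more digits is an assumption, as are the names used here.

```python
import re

# YYYYMM, optionally extended by DD, then Thh, then mm, then ss.
SHORT_ISO8601 = r"\d{6}(?:\d{2}(?:T\d{2}(?:\d{2}(?:\d{2})?)?)?)?"
# Assumed: major and revision numbers are each one or more digits.
NUMBER_VERSION = r"\d+\.\d+"
VERSION = re.compile(rf"(?:{SHORT_ISO8601}|{NUMBER_VERSION})")


def is_version(text):
    """Return True if the whole string matches the version production."""
    return VERSION.fullmatch(text) is not None


print(is_version("20080701T022342"))
```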