Skip to content

Commit

Permalink
update semantic annotations primer
Browse files Browse the repository at this point in the history
Edited text in semantic annotations primer.
  • Loading branch information
stevenchong committed Dec 29, 2018
1 parent a7f24f1 commit d922943
Showing 1 changed file with 9 additions and 9 deletions.
18 changes: 9 additions & 9 deletions docs/eml-semantic-annotations-primer.md
Original file line number Diff line number Diff line change
@@ -1,16 +1,16 @@
# Semantic Annotations Primer (in progress)

## Introduction
A semantic annotation is the attachment of semantic metadata to a resource. Semantic metadata provides a precise definition of concepts and clarifies the relationships between concepts. Although the process of semantic annotation may seem tedious, the payoff is enhanced information retrieval and discovery. For example, if a dataset is annotated as being about "carbon dioxide flux" and another annotated with "CO2 flux" the information system should recognize that the datasets are about equivalent concepts. In another example, if a user performs a search for datasets about "litter" (as in "plant litter"), the system will disambiguate the term from other meanings of "litter" (as in garbage, the group of animals born at the same time, etc.). Yet another example is if a user searches for datasets about "carbon flux", then datasets about "carbon dioxide flux" will also be returned because "carbon dioxide flux" is considered a type of "carbon flux".
The purpose of this primer is to provide a gentle introduction to how semantic annotations are structured in EML documents. A semantic annotation is the attachment of semantic metadata to a resource. Semantic metadata provides a precise definition of concepts and clarifies the relationships between concepts. Although the process of creating semantic annotations may seem tedious, the payoff is enhanced information retrieval and discovery. For example, if a dataset is annotated as being about "carbon dioxide flux" and another annotated with "CO2 flux" the information system should recognize that the datasets are about equivalent concepts. In another example, if a user performs a search for datasets about "litter" (as in "plant litter"), the system will disambiguate the term from other meanings of "litter" (as in garbage, the group of animals born at the same time, etc.). Yet another example is if a user searches for datasets about "carbon flux", then datasets about "carbon dioxide flux" will also be returned because "carbon dioxide flux" is considered a type of "carbon flux".

A semantic annotation follows the Resource Description Framework (RDF) data model and uses semantic triples. A semantic triple is composed of a **subject**, **object property (predicate)**, and **object**. In general, the subject and object can be thought of as nouns in a sentence and the object property is akin to a verb or relationship that connects the subject and object. The semantic triple expresses a statement about the associated resource. Ideally, these components should be globally unique and should be resolvable uniform resource identifiers (URIs) from controlled vocabularies so that users can look up the definitions and relationships of the terms to other terms. An example is "http://purl.obolibrary.org/obo/ENVO_01001357", which resolves to the term "desert" in the Environment Ontology (ENVO) when entered into an address bar in a web browser. Users can find the definition for "desert" and determine its relationship to other terms in the ontology.
A semantic annotation follows the Resource Description Framework (RDF) data model and uses semantic triples. A semantic triple is composed of a **subject**, **object property (predicate)**, and **object**. In general, the subject and object can be thought of as nouns in a sentence and the object property is akin to a verb or relationship that connects the subject and object. The semantic triple expresses a statement about the associated resource. Ideally, the components should be globally unique and should be resolvable uniform resource identifiers (URIs) from controlled vocabularies so that users can look up the definitions and relationships of the terms to other terms. An example of a URI is "http://purl.obolibrary.org/obo/ENVO_01001357", which resolves to the term "desert" in the Environment Ontology (ENVO) when entered into the address bar of a web browser. Users can find the definition for "desert" and determine its relationship to other terms in the ontology.

### Additional background information
### Supplemental background information
* RDF data model: https://www.w3.org/TR/WD-rdf-syntax-971002/
* Tim Berners-Lee's article on the semantic web: ```Berners-Lee, T., Hendler, J., & Lassila, O. (2001). The semantic web. Scientific american, 284(5), 34-43.```

## Semantic Annotations in EML 2.2.0
In **EML 2.2.0** there are 5 kinds of semantic annotations that can be made in an EML document: **dataset-level**, **entity-level**, **attribute-level**, **annotations** element and **additionalMetadata** element annotations.
In **EML 2.2.0** there are 5 types of semantic annotations that can be made in an EML document: **dataset-level**, **entity-level**, **attribute-level**, **annotations** element and **additionalMetadata** element annotations.

### Pattern for dataset-level, entity-level, and attribute-level annotations
Semantic annotations made at the **dataset-level**, **entity-level**, and **attribute-level** follow the same pattern. An annotation made at any of these levels involves inserting an `annotation` element containing a `propertyURI` element and a `valueURI` element within the appropriate element. The *subject* of this annotation is the containing element. *It is recommended to give the subject element an* `id` *attribute and make the subject the value of the* `id`. The `propertyURI` is the *object property* and the `valueURI` is the *object* of the annotation. For example, an attribute-level annotation involves an `attribute` element. Nested under the `attribute` element are `propertyURI` and `valueURI` elements. *The URIs should ideally point to terms in controlled vocabularies*. The `propertyURI` and `valueURI` elements can each have a `label` attribute that displays a more readable label suitable for display in application interfaces. *It is recommended that the labels are populated from the preferred labels field (skos:prefLabel) or label field (rdfs:label) from a controlled vocabulary*.
Expand All @@ -32,7 +32,7 @@ Multiple `annotation` elements may be embedded in the same dataset, entity-level

#### Dataset-level annotations

A dataset is defined as all of the information describing a data collection event. This event may take place over some period of time and include many actual collections (e.g. a time series or remote sensing application) or it could be just one actual collection (e.g. a day in the field). The `dataset` element encompasses all information about a single dataset. It is intended to provide overview information about the dataset: broad information such as the title, abstract, keywords, contacts, maintenance history, purpose, and distribution of the data themselves. A dataset can be (and often is) composed of a series of data entities (see 'entity-level annotation' section below) that are linked together by particular integrity constraints. Further information about datasets may be found in chapter 5.3.
A dataset is defined as all of the information describing a data collection event. This event may take place over some period of time and include many actual collections (e.g. a time series or remote sensing application) or it could be just one actual collection (e.g. a day in the field). The `dataset` element encompasses all information about a single dataset. It is intended to provide overview information about the dataset: broad information such as the title, abstract, keywords, contacts, maintenance history, purpose, and distribution of the data themselves. A dataset can be (and often is) composed of a series of data entities (see 'entity-level annotation' section below) that are linked together by particular integrity constraints. Further information about datasets may be found in the "eml-dataset module" section in chapter 5.3.

A dataset-level annotation represents a precisely-defined semantic statement that applies to a dataset. This semantic statement is used to associate precise measurement semantics with the dataset. A dataset-level `annotation` element is embedded in a containing `dataset` element. The subject of the semantic statement is the `dataset` element that contains the annotation. If the `dataset` element contains an `id` attribute, then the subject should be the value of the `id` attribute. Each annotation consists of a `propertyURI` element and `valueURI` element, which respectively define a property and a value (object) that apply to the dataset. Each URI should resolve to a controlled vocabulary that provides a precise definition, relationships to other terms, and multiple labels for displaying the statement. The associated `label` attribute for each URI can be used to display the property and value in a more readable format to users. Ideally, each `label` should be populated by a preferred label or label from a controlled vocabulary.

Expand Down Expand Up @@ -63,7 +63,7 @@ In the following dataset-level annotation (Example 1), the subject of the semant

#### Entity-level annotations

The entity-level elements include the `dataTable`, `spatialRaster`, `spatialVector`, `storedProcedure`, `view`, and `otherEntity` elements, in addition to custom modules. Entities are usually tables of data (`dataTable`). Data tables may be ascii text files, relational database tables, spreadsheets or other type of tabular data with a fixed logical structure. Related to data tables are views (`view`) and stored procedures (`storedProcedure`). Views and stored procedures are produced by a relational database management system or related system. Other types of data such as raster (`spatialRaster`), vector (`spatialVector`) or spatialReference image data are also data entities. An `otherEntity` element should be used to describe types of entities that are not described by any other entity type. The entity-level elements are nested under `dataset` elements. Further information about entities may be found in chapter 6.1.
The entity-level elements include the `dataTable`, `spatialRaster`, `spatialVector`, `storedProcedure`, `view`, and `otherEntity` elements, in addition to custom modules. Entities are usually tables of data (`dataTable`). Data tables may be ascii text files, relational database tables, spreadsheets or other type of tabular data with a fixed logical structure. Related to data tables are views (`view`) and stored procedures (`storedProcedure`). Views and stored procedures are produced by a relational database management system or related system. Other types of data such as raster (`spatialRaster`), vector (`spatialVector`) or spatialReference image data are also data entities. An `otherEntity` element should be used to describe types of entities that are not described by any other entity type. The entity-level elements are nested under `dataset` elements. Further information about entities may be found in the "eml-entity module" section in chapter 6.1.

An entity-level annotation represents a precisely-defined semantic statement that applies to an entity. This semantic statement is used to associate precise measurement semantics with the entity. An entity-level `annotation` element is embedded in a containing entity-level element. The subject of the semantic statement is the entity-level element that contains the annotation. If the entity-level element contains an `id` attribute, then the subject should be the value of the `id` attribute. Each annotation consists of a `propertyURI` element and `valueURI` element, which respectively define a property and a value (object) that apply to the entity. Each URI should resolve to a controlled vocabulary that provides a precise definition, relationships to other terms, and multiple labels for displaying the statement. The associated `label` attribute for each URI can be used to display the property and value in a more readable format to users. Ideally, each `label` should be populated by a preferred label or label from a controlled vocabulary.

Expand All @@ -90,9 +90,9 @@ In the following entity-level annotation (Example 2), the subject of the semanti

#### Attribute-level annotations

An attribute is a characteristic that describes a 'field' or 'variable' in a data entity, such as a column name in a spreadsheet. An attribute annotation represents a precisely-defined semantic statement that applies to an attribute. This semantic statement is used to associate precise measurement semantics with the attribute, such as the property being measured, the entity being measured, and the measurement standard for interpreting values for the attribute. `attribute` elements may be nested in entity-level elements, including the `dataTable`, `spatialRaster`, `spatialVector`, `storedProcedure`, `view`, or `otherEntity` elements, in addition to custom modules. Refer to chapter 6.2 for additional information about attributes.
An attribute is a characteristic that describes a 'field' or 'variable' in a data entity, such as a column name in a spreadsheet. An attribute annotation represents a precisely-defined semantic statement that applies to an attribute. This semantic statement is used to associate precise measurement semantics with the attribute, such as the property being measured, the entity being measured, and the measurement standard for interpreting values for the attribute. `attribute` elements may be nested in entity-level elements, including the `dataTable`, `spatialRaster`, `spatialVector`, `storedProcedure`, `view`, or `otherEntity` elements, in addition to custom modules. Refer to the "eml-attribute module" section in chapter 6.2 for additional information about attributes.

A typical attribute annotation involves an `annotation` element that is embedded in a containing `attribute` element. The subject of the semantic statement is the `attribute` element that contains the annotation. If the `attribute` element contains an `id` attribute, then the subject should be the value of the `id` attribute. Each annotation consists of a `propertyURI` element and `valueURI` element that respectively define the property and value (object) of the semantic statement. Each URI should be resolvable to a controlled vocabulary that provides a precise definition, relationships to other terms, and multiple labels for displaying the statement. Note that when annotating attributes that are measurements contained in tabular formats the preferred "default" object property is "contains measurements of type" from the OBOE ontology (`http://ecoinformatics.org/oboe/oboe.1.2/oboe-core.owl#containsMeasurementsOfType`). The associated `label` attribute for each URI can be used to display the property and value in a more readable format to users. Ideally, each `label` should be populated by a preferred label or label from a controlled vocabulary.
A typical attribute annotation involves an `annotation` element that is embedded in a containing `attribute` element. The subject of the semantic statement is the `attribute` element that contains the annotation. If the `attribute` element contains an `id` attribute, then the subject should be the value of the `id` attribute. Each annotation consists of a `propertyURI` element and `valueURI` element that respectively define the property and value (object) of the semantic statement. Each URI should be resolvable to a controlled vocabulary that provides a precise definition, relationships to other terms, and multiple labels for displaying the statement. Note that when annotating attributes that are measurements contained in tabular formats the suggested "default" object property is "contains measurements of type" from the OBOE ontology (`http://ecoinformatics.org/oboe/oboe.1.2/oboe-core.owl#containsMeasurementsOfType`). The associated `label` attribute for each URI can be used to display the property and value in a more readable format to users. Ideally, each `label` should be populated by a preferred label or label from a controlled vocabulary.

In the following attribute annotation (Example 3), the subject of the semantic statement is the `attribute` element's `id` attribute value, "att.4". The object property of the statement is `http://ecoinformatics.org/oboe/oboe.1.2/oboe-core.owl#containsMeasurementsOfType`. Note that the URI for the object property resolves to a specific term in the OBOE ontology (https://github.com/NCEAS/oboe). Finally, the value (object) in the semantic statement is `http://purl.dataone.org/odo/ECSO_00001197`, which resolves to the "Plant Cover Percentage" term in the ECSO Ontology (https://github.com/DataONEorg/sem-prov-ontologies/tree/master/observation). Taken together, the semantic statement indicates that "att.4 contains measurements of type plant cover percentage".

Expand Down Expand Up @@ -136,7 +136,7 @@ Multiple `annotation` elements can be used to create multiple annotations about

The `annotations` element is nested under the `eml` root element and contains a list of annotations defining precise semantic statements for parts of a resource. An annotation represents a precisely-defined semantic statement that applies to the resource. This statement is used to associate precise semantics with a particular element in the EML document. For additional details, refer to [insert link]

The `annotations` element contains a set of `annotation` elements. Each `annotation` element has a `references` attribute that points to the `id` attribute of the element being annotated. The id of the element being annotated is listed in the `references` attribute, and must point to a unique id within the EML document. In the semantic statement, the subject is implicitly the id that is referenced. Each annotation also consists of a `propertyURI` element and `valueURI` element that respectively define a property and value (object) that apply to the resource. Each URI should ideally resolve to a controlled vocabulary that provides a definition, relationships to other terms, and multiple labels for displaying the statement. Each `propertyURI` and `valueURI` element can have a `label` attribute that displays a label associated with each URI. Labels are intended to provide a more readable format for users and may be displayed in application interfaces. *It is recommended the labels are populated from the preferred labels field (skos:prefLabel) or label field (rdfs:label) from a controlled vocabulary*.
The `annotations` element contains a set of `annotation` elements. Each `annotation` element has a `references` attribute that points to the `id` attribute of the element being annotated. The id of the element being annotated is listed in the `references` attribute, and must point to a unique id within the EML document. Any of the EML modules may be referenced by the `references` attribute. In the semantic statement, the subject is implicitly the id that is referenced. Each annotation also consists of a `propertyURI` element and `valueURI` element that respectively define a property and value (object) that apply to the resource. Each URI should ideally resolve to a controlled vocabulary that provides a definition, relationships to other terms, and multiple labels for displaying the statement. Each `propertyURI` and `valueURI` element can have a `label` attribute that displays a label associated with each URI. Labels are intended to provide a more readable format for users and may be displayed in application interfaces. *It is recommended the labels are populated from the preferred labels field (skos:prefLabel) or label field (rdfs:label) from a controlled vocabulary*.

The following `annotations` element example (Example 4) has 3 different annotations. For the first annotation, the subject of the semantic statement is "CDF-biodiv-table", which is the id of another element in the EML document. The object property of the statement is `http://purl.org/dc/elements/1.1/subject`. Finally, the value (object) in the semantic statement is `http://purl.obolibrary.org/obo/ENVO_01000177`, which resolves to the "grassland biome" term in the ENVO ontology (`http://www.obofoundry.org/ontology/envo.html`). Taken together, the first semantic statement could be read as "CDR-biodiv-table is about the subject grassland biome".

Expand Down

0 comments on commit d922943

Please sign in to comment.