Commit

update headings and text
Updated the headings and text for the Semantic Annotations in EML 2.2.0 section in the primer.
stevenchong committed Dec 20, 2018
1 parent 434b903 commit e18cc9e
Showing 1 changed file with 37 additions and 14 deletions.
51 changes: 37 additions & 14 deletions docs/eml-semantic-annotations-primer.md
@@ -1,41 +1,64 @@
# Semantic Annotations Primer (in progress)

## Introduction
A semantic annotation is the attachment of semantic metadata to a resource. Semantic metadata provides a precise definition of concepts and clarifies the relationships between concepts. Although the process of semantic annotation may seem tedious, the payoff is enhanced information retrieval and discovery. For example, if a dataset is annotated as being about "carbon dioxide flux" and another annotated with "CO2 flux" the information system should recognize that the datasets are about equivalent concepts. In another example, if a user performs a search for datasets about "litter" (as in "plant litter"), the system will disambiguate the term from other forms of "litter" (as in garbage, the group of animals born at the same time, etc.). Yet another example is if a user searches for datasets about "carbon flux", then datasets about "carbon dioxide flux" will also be returned because "carbon dioxide flux" is considered a type of "carbon flux".

A semantic annotation follows the Resource Description Framework (RDF) data model and uses semantic triples. A semantic triple is composed of a **subject**, **object property (predicate)**, and **object**. Ideally, these components should be globally unique and should be resolvable uniform resource identifiers (URI) from controlled vocabularies so that users can look up the definitions and relationships of the terms to other terms. An example is "http://purl.obolibrary.org/obo/ENVO_01001357", which resolves to the term "desert" in the Environment Ontology (ENVO) when entered into an address bar in a web browser. Users can find the definition for "desert" and determine its relationship to other terms in the ontology. In general, the subject and object can be thought of as nouns in a sentence and the object property is akin to a verb or relationship that connects the subject and object. The semantic triple expresses a statement about the associated resource.
A semantic annotation is the attachment of semantic metadata to a resource. Semantic metadata provides a precise definition of concepts and clarifies the relationships between concepts. Although the process of semantic annotation may seem tedious, the payoff is enhanced information retrieval and discovery. For example, if one dataset is annotated as being about "carbon dioxide flux" and another is annotated with "CO2 flux", the information system should recognize that the datasets are about equivalent concepts. In another example, if a user searches for datasets about "litter" (as in "plant litter"), the system will disambiguate the term from other meanings of "litter" (such as garbage or a group of animals born at the same time). Finally, if a user searches for datasets about "carbon flux", datasets about "carbon dioxide flux" will also be returned because "carbon dioxide flux" is considered a type of "carbon flux".

A semantic annotation follows the Resource Description Framework (RDF) data model and uses semantic triples. A semantic triple is composed of a **subject**, **object property (predicate)**, and **object**. In general, the subject and object can be thought of as nouns in a sentence and the object property is akin to a verb or relationship that connects the subject and object. The semantic triple expresses a statement about the associated resource. Ideally, these components should be globally unique and should be resolvable uniform resource identifiers (URIs) from controlled vocabularies so that users can look up the definitions and relationships of the terms to other terms. An example is "http://purl.obolibrary.org/obo/ENVO_01001357", which resolves to the term "desert" in the Environment Ontology (ENVO) when entered into an address bar in a web browser. Users can find the definition for "desert" and determine its relationship to other terms in the ontology.
* Additional background information on the RDF data model: https://www.w3.org/TR/WD-rdf-syntax-971002/
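
As a minimal sketch of the triple structure described above, the following Python snippet models one statement as a named tuple. The subject and predicate URIs are hypothetical placeholders under `example.org`; only the ENVO URI for "desert" comes from the text.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, object property (predicate), object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical subject and predicate URIs (assumptions for illustration).
# The object is the ENVO URI for "desert" quoted in the text above.
triple = Triple(
    subject="https://example.org/datasets/dataset-1",
    predicate="https://example.org/vocab/isAbout",
    obj="http://purl.obolibrary.org/obo/ENVO_01001357",
)

# Read the triple as a sentence: subject, then predicate, then object.
statement = f"<{triple.subject}> <{triple.predicate}> <{triple.obj}> ."
print(statement)
```

The subject and object play the role of nouns and the predicate the role of a verb, so the printed line reads roughly as "dataset-1 is about desert".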


## Semantic Annotations in EML 2.2.0
A general introduction to creating semantic annotations in an EML document is made here. Users can find additional explanations and examples for each kind of annotation listed in the appropriate section below.
This section gives a general introduction to the patterns for creating semantic annotations in an EML document. Additional explanations and examples for each kind of annotation appear in the corresponding sections below.

In **EML 2.2.0** there are 5 kinds of semantic annotations that can be made in an EML document. The first 3 kinds are at the **dataset-level**, **entity-level**, and **attribute-level**. Semantic annotations may also be inserted in the `annotations` and `additionalMetadata` elements that are nested under the `eml` root element (**/eml/annotations** and **/eml/additionalMetadata**).
In **EML 2.2.0** there are five kinds of semantic annotations that can be made in an EML document: **dataset-level**, **entity-level**, **attribute-level**, **/eml/annotations**, and **/eml/additionalMetadata**.

### Pattern for dataset-level, entity-level, and attribute-level annotations
Semantic annotations made at the **dataset-level**, **entity-level**, and **attribute-level** follow the same pattern. An annotation made at any of these levels involves inserting an `annotation` element containing a `propertyURI` element and a `valueURI` element within the appropriate element. The *subject* of this annotation is the containing element. The `propertyURI` is the *object property* and the `valueURI` is the *object* of the annotation. For example, an attribute-level annotation involves an `attribute` element. Within the `attribute` element is an `annotation` element that contains the `propertyURI` and `valueURI` elements.

Multiple `annotation` elements may be embedded in the same dataset, entity-level, or attribute element to assert multiple semantic statements.

```
<dataset or entity-level or attribute> <- subject
<dataset or entity-level or attribute> <- subject
<annotation>
<propertyURI></propertyURI> <- object property 1
<valueURI></valueURI> <- object 1
<propertyURI>123</propertyURI> <- object property 1 is "123"
<valueURI>abc</valueURI> <- object 1 is "abc"
</annotation>
<annotation>
<propertyURI></propertyURI> <- object property 2
<valueURI></valueURI> <- object 2
<propertyURI>789</propertyURI> <- object property 2 is "789"
<valueURI>xyz</valueURI> <- object 2 is "xyz"
</annotation>
</dataset or entity-level or attribute>
```
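
The pattern above can be read back as triples with a short Python sketch. It uses only the standard-library `xml.etree.ElementTree` module and the placeholder values from the pattern ("123", "abc", and so on) rather than real URIs. Identifying the subject through an `id` attribute on the containing element is an assumption made for this illustration, and `element_triples` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Placeholder values mirror the pattern above; real annotations would use
# resolvable URIs. The id attribute on <attribute> is an assumption.
ELEMENT_PATTERN = """
<attribute id="att.1">
  <annotation>
    <propertyURI>123</propertyURI>
    <valueURI>abc</valueURI>
  </annotation>
  <annotation>
    <propertyURI>789</propertyURI>
    <valueURI>xyz</valueURI>
  </annotation>
</attribute>
"""


def element_triples(element):
    """Return (subject, object property, object) for each direct annotation child."""
    subject = element.get("id")  # the containing element is the subject
    triples = []
    for annotation in element.findall("annotation"):
        prop = annotation.findtext("propertyURI")
        value = annotation.findtext("valueURI")
        triples.append((subject, prop, value))
    return triples


triples = element_triples(ET.fromstring(ELEMENT_PATTERN))
for triple in triples:
    print(triple)
```

Each `annotation` child yields one statement, so two annotations in the same element produce two triples that share a subject.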

### Pattern for /eml/annotations annotations
Semantic annotations may also be inserted in the `annotations` element that is nested under the `eml` root element. This type of semantic annotation involves an `annotation` element that has a `references` attribute. What is listed in the `references` attribute is the *subject* of the semantic annotation. Within the `annotation` element are `propertyURI` and `valueURI` elements. The `propertyURI` is the *object property* and the `valueURI` is the *object* of the annotation.

```
<eml>
    ...
    <annotations>
        <annotation references="test_data"> <- subject is "test_data"
            <propertyURI>abc</propertyURI> <- object property is "abc"
            <valueURI>123</valueURI> <- object is "123"
        </annotation>
    </annotations>
    ...
</eml>
```

Multiple `annotation` elements can be used to create multiple annotations about the same subject. Please see "Example 4" below for an illustrative example.
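
A minimal Python sketch of how the `references` pattern might be read, again using only the standard library. The unprefixed `eml` root and the placeholder values follow the pattern above rather than a complete, namespaced EML document, and `referenced_triples` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Simplified document following the pattern above: an unprefixed <eml> root
# and placeholder values instead of real URIs (assumptions for illustration).
ANNOTATIONS_PATTERN = """
<eml>
  <annotations>
    <annotation references="test_data">
      <propertyURI>abc</propertyURI>
      <valueURI>123</valueURI>
    </annotation>
  </annotations>
</eml>
"""


def referenced_triples(root):
    """Return (subject, object property, object) for each /eml/annotations entry."""
    triples = []
    for annotation in root.findall("./annotations/annotation"):
        subject = annotation.get("references")  # the subject is named by reference
        prop = annotation.findtext("propertyURI")
        value = annotation.findtext("valueURI")
        triples.append((subject, prop, value))
    return triples


result = referenced_triples(ET.fromstring(ANNOTATIONS_PATTERN))
print(result)
```

Unlike the element-level pattern, the subject here comes from the `references` attribute rather than from the containing element.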


### Pattern for /eml/additionalMetadata annotations


and `additionalMetadata`

Semantic annotations made for **/eml/annotations** involve an `annotation` element that has a `references` attribute. What is listed in the `references` attribute is the *subject* of the semantic annotation. Within the `annotation` element are `propertyURI` and `valueURI` elements. The `propertyURI` is the *object property* and the `valueURI` is the *object* of the annotation.




### Dataset-level annotation
### Dataset-level annotations

A dataset is defined as all of the information describing a data collection event. This event may take place over some period of time and include many actual collections (e.g., a time series or remote sensing application), or it could be just one actual collection (e.g., a day in the field). The `dataset` element encompasses all information about a single dataset. It is intended to provide overview information about the dataset: broad information such as the title, abstract, keywords, contacts, maintenance history, purpose, and distribution of the data themselves. A dataset can be (and often is) composed of a series of data entities (see 'entity-level annotation' section below) that are linked together by particular integrity constraints. Further information about datasets may be found at: [insert link].

@@ -66,7 +89,7 @@ In the following dataset-level annotation (Example 1), the subject of the semant
```

### Entity-level annotation
### Entity-level annotations

Entities are usually tables of data (EML element `dataTable`). Data tables may be ASCII text files, relational database tables, spreadsheets, or other types of tabular data with a fixed logical structure. Related to data tables are views (EML element `view`) and stored procedures (EML element `storedProcedure`), which are produced by an RDBMS or related system. Other types of data, such as raster (EML element `spatialRaster`), vector (EML element `spatialVector`), or spatialReference image data, are also data entities. An `otherEntity` element is used to describe entities that are not covered by any other entity type. Entity-level EML elements are nested under `dataset` elements. Further information about entities may be found at: [insert link].

@@ -93,7 +116,7 @@ In the following entity-level annotation (Example 2), the subject of the semanti
```

### Attribute-level annotation
### Attribute-level annotations

An attribute is a characteristic that describes a 'field' or 'variable' in a data entity, such as a column name in a spreadsheet. An attribute annotation represents a precisely-defined semantic statement that applies to an attribute. This semantic statement is used to associate precise measurement semantics with the attribute, such as the property being measured, the entity being measured, and the measurement standard for interpreting values for the attribute. `attribute` elements may be nested in entity-level elements, including the `dataTable`, `spatialRaster`, `spatialVector`, `storedProcedure`, `view`, or `otherEntity` EML elements, in addition to custom modules. Refer to the Data Structures Modules documentation for additional information about attributes [insert link].
