edit introduction
Spiced up the introduction
stevenchong committed Jan 22, 2019
1 parent 84d7db1 commit f3235e0
Showing 1 changed file with 4 additions and 2 deletions.
6 changes: 4 additions & 2 deletions docs/eml-semantic-annotations-primer.md
@@ -1,9 +1,11 @@
# Semantic Annotations Primer

## Introduction
The purpose of this primer is to provide a gentle introduction to how semantic annotations are structured in EML documents. A semantic annotation is the attachment of semantic metadata to a resource. Semantic metadata provide a precise definitions of concepts and clarify the relationships between concepts. Although the process of creating semantic annotations may seem tedious, the payoff is enhanced information retrieval and discovery. For example, if a dataset is annotated as being about "carbon dioxide flux" and another annotated with "CO2 flux" the information system should recognize that the datasets are about equivalent concepts. In another example, if a user performs a search for datasets about "litter" (as in "plant litter"), the system will disambiguate the term from the many meanings of "litter" (as in garbage, the grouping of animals born at the same time, etc.). Yet another example is if a user searches for datasets about "carbon flux", then datasets about "carbon dioxide flux" will also be returned because "carbon dioxide flux" is considered a type of "carbon flux".
The purpose of this primer is to provide a gentle introduction to how semantic annotations are structured in EML documents. It is expected that you have some familiarity with EML prior to reading this document. If you aren't knowledgeable about the Resource Description Framework (RDF) data model or semantic web, you can refer to the supplemental readings listed after this introduction for some background information.

A semantic annotation follows the Resource Description Framework (RDF) data model and uses semantic triples. A semantic triple is composed of a **subject**, **object property or data property (predicate)**, and **object**. In general, the subject and object can be thought of as nouns in a sentence and the object property or data property is akin to a verb or relationship that connects the subject and object. The semantic triple expresses a statement about the associated resource. Ideally, the components should be globally unique and should be resolvable uniform resource identifiers (URIs) from controlled vocabularies so that users can look up precise definitions and relationships of the terms to other terms. An example of a URI is "http://purl.obolibrary.org/obo/ENVO_01001357", which resolves to the term "desert" in the Environment Ontology (ENVO) when entered into the address bar of a web browser. Users can find the definition for "desert" and determine its relationship to other terms in the ontology.
A semantic annotation is the attachment of semantic metadata to a resource. Semantic metadata provide precise definitions of concepts and clarify the relationships between concepts. Although the process of creating semantic annotations may seem tedious, the payoff is enhanced information retrieval and discovery. Semantic annotations will make it easier for others to find and reuse your data (and thus give you credit). For example, if one dataset is annotated as being about "carbon dioxide flux" and another is annotated with "CO2 flux", the information system should recognize that the datasets are about equivalent concepts. In another example, if you search for datasets about "litter" (as in "plant litter"), the system will disambiguate the term from the other meanings of "litter" (garbage, a group of animals born at the same time, etc.). As a third example, if you search for datasets about "carbon flux", datasets about "carbon dioxide flux" will also be returned because "carbon dioxide flux" is considered a type of "carbon flux".
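
The search behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ex:` concept identifiers, the dataset names, and the helper functions are hypothetical and do not come from EML or any real ontology. It shows two labels mapping to one concept (equivalence) and a search on a broader concept also returning datasets annotated with a narrower one.

```python
# Hypothetical labels and concept identifiers, for illustration only.
LABEL_TO_CONCEPT = {
    "carbon flux": "ex:CarbonFlux",
    "carbon dioxide flux": "ex:CarbonDioxideFlux",
    "co2 flux": "ex:CarbonDioxideFlux",  # synonym of the label above
}

# Narrower concept -> broader concept ("is a type of").
SUBCLASS_OF = {"ex:CarbonDioxideFlux": "ex:CarbonFlux"}

# Hypothetical datasets and the concept each one is annotated with.
DATASET_ANNOTATIONS = {
    "dataset-A": "ex:CarbonDioxideFlux",  # annotated via "carbon dioxide flux"
    "dataset-B": "ex:CarbonDioxideFlux",  # annotated via "CO2 flux"
    "dataset-C": "ex:CarbonFlux",
}


def is_a(concept, ancestor):
    """Return True if concept equals ancestor or is a (transitive) subclass of it."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = SUBCLASS_OF.get(concept)
    return False


def search(term):
    """Return the datasets whose annotation is the searched concept or a narrower one."""
    target = LABEL_TO_CONCEPT[term.strip().lower()]
    return sorted(
        name
        for name, concept in DATASET_ANNOTATIONS.items()
        if is_a(concept, target)
    )
```

With these toy tables, `search("carbon flux")` returns all three datasets, while `search("CO2 flux")` returns only `dataset-A` and `dataset-B`, because both labels for carbon dioxide flux resolve to the same concept.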

Semantic annotations follow the RDF data model and use semantic triples. A semantic triple is composed of a **subject**, **object property or data property (predicate)**, and **object**. In general, the subject and object can be thought of as nouns in a sentence, and the object property or data property is akin to a verb or relationship that connects the subject and object. The semantic triple expresses a statement about the associated resource. After the EML is processed into a semantic web format, such as RDF/XML, the semantic statement becomes interpretable by machines. Ideally, the components of the semantic triple should be globally unique and should consist of resolvable uniform resource identifiers (URIs) from controlled vocabularies so that users can look up precise definitions of the terms and their relationships to other terms. An example of a URI is "http://purl.obolibrary.org/obo/ENVO_01001357", which resolves to the term "desert" in the Environment Ontology (ENVO) when entered into the address bar of a web browser. Users can find the definition for "desert" and determine its relationship to other terms in the ontology.
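
As a rough sketch of the triple structure, the following Python models a statement as a subject, predicate, and object and writes it as one N-Triples line. The subject and predicate URIs are hypothetical placeholders; only the ENVO URI for "desert" comes from the text above. The URI check is purely syntactic and does not contact the network, so it cannot confirm that a URI actually resolves.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class Triple(NamedTuple):
    """One semantic statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


def looks_like_http_uri(value: str) -> bool:
    """Return True if value is syntactically an http(s) URI (no network access)."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def to_ntriples(triple: Triple) -> str:
    """Serialize a triple whose three parts are all URIs as one N-Triples line."""
    return f"<{triple.subject}> <{triple.predicate}> <{triple.obj}> ."


# Hypothetical subject and predicate; only the ENVO URI comes from the primer.
statement = Triple(
    subject="https://example.org/dataset/42#site",
    predicate="https://example.org/vocab/locatedInBiome",
    obj="http://purl.obolibrary.org/obo/ENVO_01001357",  # "desert" in ENVO
)

# Check that every component is a URI rather than a free-text label.
all_uris = all(looks_like_http_uri(part) for part in statement)
line = to_ntriples(statement)
```

A free-text label such as `"desert"` fails `looks_like_http_uri`, which is the point of using URIs from controlled vocabularies: the bare word is ambiguous, while the URI identifies exactly one term.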

### Supplemental background information
* RDF data model: https://www.w3.org/TR/WD-rdf-syntax-971002/