
Introduction

This document defines an encoding of RDF graphs called another RDF encoding form (aREF). The encoding combines and simplifies parts of the existing RDF serializations Turtle, JSON-LD, and RDF/JSON. In contrast to these formats, RDF data in aREF is not serialized as a Unicode string but encoded as a list-map-structure, as known from the type systems of most programming languages and from data structuring languages such as JSON and YAML.

This specification of aREF is hosted in a public git repository at <{GITHUB}>, written in Pandoc’s Markdown, and managed with makespec. Please report and comment on issues with this specification at https://github.com/gbv/aREF/issues. The most recent version of this document is made available at http://gbv.github.io/aREF/.

Background

Terminology

Terms written in "bold" refer to terms at the place of their definition in this document. Terms written in "italics" refer to terms defined elsewhere in this document. Uppercase keywords (MUST, MAY, RECOMMENDED, SHOULD...) are used as defined in RFC 2119. Syntax rules in this document are expressed in ABNF notation as specified by RFC 5234.

Examples and notes in this document are informative only. YAML syntax is used to express sample aREF documents, unless noted otherwise.

The following syntax rules are referenced later in this document:

string = *( %x0-10FFFF )

LOWERCASE = %x61-%x7A ; a-z

The term string in this document always refers to Unicode strings as defined by Unicode. A string can also be defined with syntax rule string.

Strings SHOULD always be normalized to Unicode Normalization Form C (NFC). Applications MAY restrict strings by disallowing selected Unicode code points, such as the 66 Unicode noncharacters or the set of Unicode characters not expressible in XML.
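As an illustration only, the following JavaScript sketch shows one way an application might apply these rules. The helper names `prepareString` and `isNoncharacter` are hypothetical. NFC comes from the built-in `String.prototype.normalize`, and the 66 noncharacters are U+FDD0 to U+FDEF plus the last two code points (xxFFFE and xxFFFF) of each of the 17 planes.

```javascript
// Hypothetical helper: true for the 66 Unicode noncharacters.
function isNoncharacter(cp) {
  // U+FDD0..U+FDEF, plus U+xxFFFE and U+xxFFFF in every plane
  return (cp >= 0xfdd0 && cp <= 0xfdef) || (cp & 0xfffe) === 0xfffe;
}

// Hypothetical helper: normalize to NFC and optionally reject noncharacters,
// which this section allows but does not require.
function prepareString(s, rejectNoncharacters = false) {
  const nfc = s.normalize("NFC");
  if (rejectNoncharacters) {
    for (const ch of nfc) { // iterates by code point, not by UTF-16 unit
      const cp = ch.codePointAt(0);
      if (isNoncharacter(cp)) {
        throw new RangeError("noncharacter U+" + cp.toString(16).toUpperCase());
      }
    }
  }
  return nfc;
}

// "e" + COMBINING ACUTE ACCENT becomes the precomposed "é" (U+00E9)
console.log(prepareString("e\u0301") === "\u00e9"); // true
```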

RDF data

RDF is a graph-based data structuring language defined as abstract syntax by Klyne and Carroll (2004). Several RDF variants exist (see in particular Wood, 2013 for a comparison between RDF 1.0 and RDF 1.1). RDF extensions with named graphs, blank nodes as predicates, and literal nodes as subjects are neither covered by this specification nor expressible in aREF.

RDF data as encoded by aREF is defined as follows:

  • An RDF graph is a set of triples.
  • A triple (also known as "statement") consists of a subject, a predicate, and an object.
  • A subject is either an IRI or a blank node.
  • A predicate (also known as "property") is an IRI.
  • An object is either an IRI or a blank node or a literal node.
  • An IRI (Internationalized Resource Identifier) is a string that conforms to the IRI syntax defined in RFC 3987.
  • A blank node is neither an IRI nor a literal node.
  • A literal node is a string tagged with either a language tag or a datatype.
  • A simple literal is a literal node with datatype http://www.w3.org/2001/XMLSchema#string.
  • A datatype is an IRI.
  • A language tag is a well-formed language tag as defined in BCP 47.
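One possible in-memory representation of this data model is sketched below in JavaScript. The term shapes and the helpers `iri`, `blank`, `literal`, `simpleLiteral`, and `triple` are assumptions for illustration: they enforce which kind of node may appear in which position of a triple, but they do not validate IRI or language tag syntax.

```javascript
// Hypothetical term representation: plain objects tagged with "type".
const XSD_STRING = "http://www.w3.org/2001/XMLSchema#string";
const iri = (value) => ({ type: "iri", value });
const blank = (id) => ({ type: "blank", id });

// A literal node carries exactly one of a language tag or a datatype.
function literal(value, { lang, datatype } = {}) {
  if ((lang === undefined) === (datatype === undefined)) {
    throw new TypeError("a literal needs either a language tag or a datatype");
  }
  return lang !== undefined
    ? { type: "literal", value, lang }
    : { type: "literal", value, datatype };
}
const simpleLiteral = (value) => literal(value, { datatype: XSD_STRING });

// Subjects are IRIs or blank nodes, predicates are IRIs, objects may be any node.
function triple(subject, predicate, object) {
  if (subject.type === "literal") throw new TypeError("a literal cannot be a subject");
  if (predicate.type !== "iri") throw new TypeError("a predicate must be an IRI");
  return { subject, predicate, object };
}

const t = triple(
  iri("http://example.org/alice"),
  iri("http://xmlns.com/foaf/0.1/name"),
  simpleLiteral("Alice"),
);
```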

An RDF graph encoded in aREF can also include blank node identifiers to refer to particular blank nodes within the scope of the same RDF graph.

Ask a Semantic Web or Linked Data evangelist for examples of RDF!

List-map-structures

A list-map-structure is an abstract data structure built of

  • strings, which are Unicode strings,
  • lists, which are sequences of zero or more list-map-structures,
  • and maps, which are sets of strings (the maps' keys) and a mapping from these keys to list-map-structures.

Every aREF document MUST be given as a map. Applications MAY restrict aREF documents to non-circular list-map-structures. All non-circular list-map-structures can be serialized in JSON and YAML.

Applications MAY support special null values, disjoint from strings, as elements of lists and/or as values in maps. These null values MUST be ignored when decoding aREF.
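A decoder might honour this rule with a pre-processing pass such as the following JavaScript sketch. The function `stripNulls` is hypothetical; it treats both `null` and `undefined` as null values and assumes a non-circular structure.

```javascript
// Hypothetical pre-processing step: drop null values from lists and maps
// before decoding. Assumes a non-circular list-map-structure.
function stripNulls(node) {
  if (Array.isArray(node)) {
    return node.filter((x) => x !== null && x !== undefined).map(stripNulls);
  }
  if (node !== null && typeof node === "object") {
    const out = {};
    for (const [key, value] of Object.entries(node)) {
      if (value !== null && value !== undefined) out[key] = stripNulls(value);
    }
    return out;
  }
  return node; // strings are returned unchanged
}

const doc = {
  _id: "http://example.org/",
  dct_title: [null, "example@en"],
  dct_creator: null,
};
console.log(JSON.stringify(stripNulls(doc)));
// {"_id":"http://example.org/","dct_title":["example@en"]}
```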

See section [aREF document types](#aref-document-types) and [appendix aREF serializations](#aref-serializations) for examples.

Encoding

IRIs

An IRI in aREF is encoded as string, either as plain IRI, or as explicit IRI, or as qName. The special stringa” can further be used to encode the predicatehttp://www.w3.org/1999/02/22-rdf-syntax-ns#type”.

Plain IRIs

A plain IRI is an IRI, as defined in RFC 3987. If used as object, a plain IRI MUST conform to the syntax rule IRIlike to distinguish it from a literal node.

  IRIlike = LOWERCASE *( LOWERCASE / DIGIT / "+" / "." / "-" ) ":" [ string ]
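In JavaScript, the rule IRIlike can be expressed as a single regular expression, as in this sketch (the names `IRI_LIKE` and `isIRIlike` are illustrative). Since any string may follow the colon, only the scheme-like prefix needs to be checked.

```javascript
// The syntax rule IRIlike: a lowercase letter, then lowercase letters,
// digits, "+", "." or "-", then a colon; anything may follow the colon.
const IRI_LIKE = /^[a-z][a-z0-9+.-]*:/;
const isIRIlike = (s) => IRI_LIKE.test(s);

console.log(isIRIlike("http://example.org/")); // true
console.log(isIRIlike("urn:isbn:0451450523")); // true
console.log(isIRIlike("12:30"));               // false (starts with a digit)
console.log(isIRIlike("foaf_knows"));          // false (no colon)
```

A string such as `12:30` is therefore not mistaken for an IRI and stays a literal node.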

Explicit IRIs

An explicit IRI is an IRI enclosed in angle brackets (“<” and “>”).

  explicitIRI   = "<" IRI ">"   ; IRI syntax rule from RFC 3987

Applications MAY use the syntax rule IRIlike instead of IRI to facilitate decoding aREF.
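A hypothetical `parseExplicitIRI` function in JavaScript could use this relaxation and check the bracketed content against IRIlike rather than against the full RFC 3987 grammar:

```javascript
// Hypothetical decoder for explicit IRIs. The enclosed string is checked
// against IRIlike instead of the full RFC 3987 IRI syntax.
function parseExplicitIRI(s) {
  const m = /^<([a-z][a-z0-9+.-]*:[\s\S]*)>$/.exec(s);
  return m ? m[1] : null; // null: not an explicit IRI
}

console.log(parseExplicitIRI("<http://example.org/>")); // http://example.org/
console.log(parseExplicitIRI("http://example.org/"));   // null (a plain IRI)
console.log(parseExplicitIRI("<Alice>"));               // null
```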

qNames

A qName consists of a prefix and a localName separated by an underscore (“_”):

  qName  = prefix "_" localName

The prefix is a string starting with a lowercase letter (a-z) optionally followed by a sequence of lowercase letters and digits (0-9).

  prefix = LOWERCASE *( LOWERCASE / DIGIT )    ; a-z *( a-z / 0-9 )

The localName is a string that conforms to the following syntax.

  localName     = nameStartChar *(nameChar)

  nameStartChar = ALPHA / "_" / %x00C0-00D6 / %x00D8-00F6 /
                  %x00F8-02FF / %x0370-037D / %x037F-1FFF / 
                  %x200C-200D / %x2070-218F / %x2C00-2FEF / 
                  %x3001-D7FF / %xF900-FDCF / %xFDF0-FFFD /
                  %x10000-EFFFF

  nameChar      = nameStartChar / "-" / DIGIT / %xB7 / %x0300-036F / %x203F-2040

The syntax rule `localName` is more restrictive than corresponding definitions in [Turtle] and [JSON-LD].

A qName is mapped to an IRI by appending its localName to the namespace URI that corresponds to its prefix. Applications SHOULD warn about unknown prefixes and/or ignore all triples that include a node with an unknown prefix.
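The following JavaScript sketch transcribes prefix, nameStartChar, and nameChar into one regular expression and expands a qName with a namespace map given as a plain object. The names `QNAME` and `expandQName` are hypothetical, and throwing on an unknown prefix is only one of the reactions the rule above allows.

```javascript
// Character classes transcribed from nameStartChar and nameChar;
// the "u" flag is required for the \u{10000}-\u{EFFFF} range.
const NAME_START =
  "A-Za-z_\\u00C0-\\u00D6\\u00D8-\\u00F6\\u00F8-\\u02FF\\u0370-\\u037D" +
  "\\u037F-\\u1FFF\\u200C-\\u200D\\u2070-\\u218F\\u2C00-\\u2FEF" +
  "\\u3001-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFFD\\u{10000}-\\u{EFFFF}";
// the trailing "-" is a literal hyphen inside the character class
const NAME_CHAR = NAME_START + "0-9\\u00B7\\u0300-\\u036F\\u203F-\\u2040-";
const QNAME = new RegExp(`^([a-z][a-z0-9]*)_([${NAME_START}][${NAME_CHAR}]*)$`, "u");

// Hypothetical qName expansion against a namespace map (prefix -> URI).
function expandQName(s, namespaces) {
  const m = QNAME.exec(s);
  if (!m) return null; // not a qName
  const [, prefix, localName] = m;
  if (!Object.prototype.hasOwnProperty.call(namespaces, prefix)) {
    // the spec recommends warning and/or ignoring the affected triples
    throw new Error(`unknown prefix "${prefix}"`);
  }
  return namespaces[prefix] + localName;
}

const ns = { foaf: "http://xmlns.com/foaf/0.1/" };
console.log(expandQName("foaf_knows", ns));          // http://xmlns.com/foaf/0.1/knows
console.log(expandQName("http://example.org/", ns)); // null
```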

Literal nodes

A literal node is encoded as string in one of three forms:

  literalNode   = languageString / datatypeString / plainLiteral

Literal nodes with language tag

A literal node with language tag is encoded by appending an at sign ("@") followed by the language tag to the node’s string:

  languageString = string "@" languageTag

  languageTag    = 2*8(ALPHA) *( "-" 1*8( ALPHA / DIGIT ) )
```json
{
  "_id": "http://example.com/MyResource",
  "skos_prefLabel": [
    "east@en",
    "Osten@de",
    "東@ja",
    "東@ja-Hani",
    "ヒガシ@ja-Kana",
    "higashi@ja-Latn"
  ]
}
```
The syntax rule `languageTag` is slightly more restrictive than the syntax of a language tag in [Turtle] but less restrictive than the syntax of a language tag in JSON-LD, which refers to well-formed language tags as defined in [BCP 47].
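A hypothetical JavaScript decoder for languageString can rely on the fact that a languageTag never contains an at sign, so only the last "@" of the string can separate the string from its tag:

```javascript
// The syntax rule languageString; the tag part mirrors languageTag.
const LANGUAGE_STRING = /^([\s\S]*)@([A-Za-z]{2,8}(?:-[A-Za-z0-9]{1,8})*)$/;

// Hypothetical decoder: returns { value, lang } or null.
function parseLanguageString(s) {
  const m = LANGUAGE_STRING.exec(s);
  return m ? { value: m[1], lang: m[2] } : null;
}

console.log(parseLanguageString("ヒガシ@ja-Kana"));     // { value: 'ヒガシ', lang: 'ja-Kana' }
console.log(parseLanguageString("alice@example.com")); // null
```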

Literal nodes with datatype

A literal node with datatype is encoded by appending a caret ("^") followed by the datatype’s IRI, encoded either as explicit IRI or as qName:

  datatypeString = string "^" ( qName / explicitIRI )
```json
{
  "_id": "http://example.org/",
  "dct_modified": [
    "2010-05-29T14:17:39+02:00^xsd_dateTime",
    "2010-05-29^<http://www.w3.org/2001/XMLSchema#date>"
  ]
}
```
[Turtle] uses the character sequence “`^^`” instead of a single “`^`”.
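Similarly, a datatypeString can be split at its last caret, because neither a qName nor an RFC 3987 IRI contains "^". In this hypothetical JavaScript sketch the qName branch uses an ASCII-only approximation of localName, and the datatype is returned in its encoded form, still to be expanded or unbracketed:

```javascript
// datatypeString: string "^" ( qName / explicitIRI ).
// localName is approximated with ASCII characters only.
const DATATYPE_STRING =
  /^([\s\S]*)\^([a-z][a-z0-9]*_[A-Za-z_][A-Za-z0-9_-]*|<[^<>^\s]+>)$/;

// Hypothetical decoder: returns { value, datatype } or null.
function parseDatatypeString(s) {
  const m = DATATYPE_STRING.exec(s);
  return m ? { value: m[1], datatype: m[2] } : null;
}

console.log(parseDatatypeString("2010-05-29T14:17:39+02:00^xsd_dateTime").datatype); // xsd_dateTime
console.log(parseDatatypeString("2010-05-29^"));                                   // null
```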

Simple literals

A simple literal is encoded either as literal node with datatype “http://www.w3.org/2001/XMLSchema#string” or as string that conforms to the plainLiteral syntax rule. This syntax rule MUST be disjoint from the syntax rules languageString and datatypeString, from the syntax rules of IRIs (explicitIRI, IRIlike, qName), and from the syntax rule of blank nodes (blankNode).

  plainLiteral = string / string "@" ; MUST NOT match any of rules
                                     ; languageString, datatypeString, 
                                     ; explicitIRI, IRIlike, qName
                                     ; blankNode

An at sign ("@") can always be appended to the node’s string to distinguish it from the other syntax rules. The at sign MUST be appended if the simple literal ends with an at sign.

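| aREF string         | RDF literal (Turtle syntax) |
|---------------------|-----------------------------|
| `@`                 | `""`                        |
| *empty string*      | `""`                        |
| `^xsd_string`       | `""`                        |
| `@@`                | `"@"`                       |
| `@^xsd_string`      | `"@"`                       |
| `alice@en`          | `"alice"@en`                |
| `alice@example.com` | `"alice@example.com"`       |
| `123`               | `"123"`                     |
| `忍者@ja`            | `"忍者"@ja`                  |
| `Ninja@en`          | `"Ninja"@en`                |
| `Ninja@en@`         | `"Ninja@en"`                |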
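The table can be read as a decoding order, sketched below in JavaScript: first a trailing "@", then languageString, then datatypeString, and finally plainLiteral. The function `decodeLiteral` and its default namespace map are assumptions. It expects a string that has already been recognized as a literal node rather than an IRI or a blank node, and it approximates localName with ASCII characters.

```javascript
const XSD = "http://www.w3.org/2001/XMLSchema#";
const XSD_STRING = XSD + "string";
const LANGUAGE_STRING = /^([\s\S]*)@([A-Za-z]{2,8}(?:-[A-Za-z0-9]{1,8})*)$/;
// datatype as qName (groups 2 and 3) or as explicit IRI (group 4)
const DATATYPE_STRING =
  /^([\s\S]*)\^(?:([a-z][a-z0-9]*)_([A-Za-z_][A-Za-z0-9_-]*)|<([^<>^\s]+)>)$/;

// Hypothetical decoder for a string already known to encode a literal node.
function decodeLiteral(s, namespaces = { xsd: XSD }) {
  // a trailing "@" marks a simple literal; only this one "@" is removed
  if (s.endsWith("@")) return { value: s.slice(0, -1), datatype: XSD_STRING };
  let m = LANGUAGE_STRING.exec(s);
  if (m) return { value: m[1], lang: m[2] };
  m = DATATYPE_STRING.exec(s);
  if (m) {
    if (m[4] !== undefined) return { value: m[1], datatype: m[4] };
    if (!Object.prototype.hasOwnProperty.call(namespaces, m[2])) {
      throw new Error(`unknown prefix "${m[2]}"`);
    }
    return { value: m[1], datatype: namespaces[m[2]] + m[3] };
  }
  return { value: s, datatype: XSD_STRING }; // plainLiteral
}

console.log(decodeLiteral("Ninja@en@").value); // Ninja@en
```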

Blank nodes

A blank node can be encoded as blank node identifier, which is a string that conforms to the following syntax rule:

blankNode      = "_:" 1*( ALPHA / DIGIT )

Within the scope of the same RDF graph, equal blank node identifiers MUST refer to the same blank node. Blank node identifiers SHOULD NOT be shared among different RDF graphs.

In the simplest case, a blank node in aREF can be encoded as an empty map.

```yaml
_ns:
  foaf: http://xmlns.com/foaf/0.1/
_:alice:
  foaf_knows: _:bob
_:bob:
  foaf_knows:
    _id: _:alice
```

```yaml
_ns:
  foaf: http://xmlns.com/foaf/0.1/
_:someone:
  foaf_knows:
    foaf_name: "Bob"
```
The syntax rule `blankNode` is more restrictive than the rule of blank node identifiers in [Turtle] and in [JSON-LD].
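The following JavaScript sketch checks the blankNode rule and keeps one scope per RDF graph, so that equal identifiers resolve to the same node while each map without "_id" gets a fresh one. The class `BlankNodeScope` and its node objects are hypothetical; using a new scope for every graph keeps identifiers from being shared among graphs.

```javascript
// The syntax rule blankNode: "_:" followed by ASCII letters or digits.
const BLANK_NODE = /^_:[A-Za-z0-9]+$/;

// Hypothetical per-graph scope for blank nodes.
class BlankNodeScope {
  constructor() {
    this.byId = new Map(); // identifier -> node object
    this.count = 0;
  }
  // a fresh blank node, e.g. for a map without "_id"
  fresh() {
    this.count += 1;
    return { blank: this.count };
  }
  // equal identifiers in the same scope yield the same node object
  resolve(id) {
    if (!BLANK_NODE.test(id)) {
      throw new SyntaxError(`not a blank node identifier: ${id}`);
    }
    if (!this.byId.has(id)) this.byId.set(id, this.fresh());
    return this.byId.get(id);
  }
}

const scope = new BlankNodeScope();
console.log(scope.resolve("_:alice") === scope.resolve("_:alice")); // true
console.log(scope.fresh() === scope.fresh());                       // false
```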

Graphs

An RDF graph in aREF is encoded as a list-map-structure that is either a subject map or a predicate map.

Subject maps

A subject map is a map with the following constraints:

  1. The subject map MUST NOT contain the key "_id".

  2. The subject map MAY contain the key "_ns", mapped to a namespace map.

  3. Additional keys that start with "_" but not with "_:" SHOULD be ignored.

  4. Every other key is either a plain IRI or a qName or a blank node. These keys encode the subjects of RDF triples.

  5. Every value of a key that encodes a subject MUST be a predicate map that either does not contain the key "_id" or maps the key "_id" to an encoding of the same subject.

```yaml
"http://example.org/alice":
  foaf_knows: http://example.org/bob
  _id: http://example.org/alice # redundant
```
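A hypothetical JavaScript helper `subjectEntries` could apply rules 1, 3, and 5 while listing the subjects of a subject map. Comparing "_id" with the key by string equality is a simplification, because rule 5 also admits a different encoding of the same subject (for example a qName instead of a plain IRI).

```javascript
// Hypothetical helper: return [subject, predicateMap] pairs of a subject map.
// "_id" is compared by string equality, a simplification of rule 5.
function subjectEntries(subjectMap) {
  if (Object.prototype.hasOwnProperty.call(subjectMap, "_id")) {
    throw new Error('a subject map must not contain the key "_id"'); // rule 1
  }
  const entries = [];
  for (const [key, predicateMap] of Object.entries(subjectMap)) {
    if (key.startsWith("_") && !key.startsWith("_:")) continue; // "_ns" and rule 3
    if (predicateMap === null) continue;                         // null values are ignored
    if ("_id" in predicateMap && predicateMap._id !== key) {     // rule 5
      throw new Error(`"_id" of ${key} encodes another subject`);
    }
    entries.push([key, predicateMap]);
  }
  return entries;
}

const doc = {
  _ns: { foaf: "http://xmlns.com/foaf/0.1/" },
  "http://example.org/alice": {
    foaf_knows: "http://example.org/bob",
    _id: "http://example.org/alice", // redundant
  },
};
console.log(subjectEntries(doc).map(([subject]) => subject)); // [ 'http://example.org/alice' ]
```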

Predicate maps

A predicate map encodes a set of RDF triples with the same subject. The subject is given either by context, if the predicate map is part of a subject map, or explicitly with the key "_id"; otherwise the subject is a blank node.

A predicate map is a map with the following constraints:

  1. The optional key "_id", if given, MUST be mapped to a plain IRI, a qName, or a blank node.

  2. The optional key "_ns", if given, MUST be mapped to a namespace map.

  3. Additional keys that start with "_" SHOULD be ignored.

  4. Every key, unless it starts with "_", MUST be either a plain IRI or a qName, or the value "a" that stands for the IRI "http://www.w3.org/1999/02/22-rdf-syntax-ns#type". These keys encode predicates of triples.

  5. Every value of a key that encodes a predicate MUST be an encoded object.
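Extracting the predicates of a predicate map could look like the following JavaScript sketch, where `predicateEntries` is a hypothetical name. Keys starting with "_" are skipped, "a" is replaced by rdf:type, and qName keys are returned unexpanded, so they still need a namespace map.

```javascript
const RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

// Hypothetical helper: return [predicate, encodedObject] pairs of a predicate map.
function predicateEntries(predicateMap) {
  const pairs = [];
  for (const [key, encoded] of Object.entries(predicateMap)) {
    if (key.startsWith("_")) continue; // "_id", "_ns" and ignored keys
    if (encoded === null) continue;    // null values are ignored
    pairs.push([key === "a" ? RDF_TYPE : key, encoded]);
  }
  return pairs;
}

const pm = {
  _id: "http://example.org/alice", // the subject, not a predicate
  a: "foaf_Person",
  foaf_name: "Alice",
  _comment: "ignored",
};
console.log(predicateEntries(pm));
```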

Encoded objects

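An encoded object encodes zero or more RDF objects with the same subject and the same predicate. An encoded object MUST be one of the following, or a list of any of them:

  • a string that encodes an IRI (a plain IRI conforming to IRIlike, an explicit IRI, or a qName),
  • a string that encodes a literal node or a blank node,
  • a predicate map, which encodes its subject as object.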

A list as encoded object represents a set of objects, so the order of its elements is irrelevant, and duplicates SHOULD NOT be included, even if they use different encoding forms.

The following encoded objects, expressed in JSON, refer to the same *IRI*:
  • "http://example.org/"
  • "<http://example.org/>"
  • { "_id": "http://example.org/" }
  • [ "http://example.org/" ]
  • [ "<http://example.org/>" ]
  • [ { "_id": "http://example.org/" } ]
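This equivalence can be checked with a small JavaScript sketch. The helpers `asList` and `iriOf` are hypothetical and recognize only the IRI forms listed here (plain IRI, explicit IRI, map with "_id"); qNames, literal nodes, and blank nodes yield `null`. Collecting the results in a `Set` also removes duplicates that differ only in their encoding form.

```javascript
const IRI_LIKE = /^[a-z][a-z0-9+.-]*:/;

// Wrap a single encoded object in a list and drop null values.
const asList = (encoded) =>
  (Array.isArray(encoded) ? encoded : [encoded]).filter((x) => x !== null);

// Hypothetical decoder covering only the IRI forms shown above.
function iriOf(x) {
  if (x !== null && typeof x === "object") {
    // a predicate map stands for its subject, given by "_id"
    return typeof x._id === "string" && IRI_LIKE.test(x._id) ? x._id : null;
  }
  if (typeof x !== "string") return null;
  const explicit = /^<([\s\S]+)>$/.exec(x);
  if (explicit && IRI_LIKE.test(explicit[1])) return explicit[1];
  return IRI_LIKE.test(x) ? x : null;
}

const forms = [
  "http://example.org/",
  "<http://example.org/>",
  { _id: "http://example.org/" },
  ["http://example.org/"],
  ["<http://example.org/>"],
  [{ _id: "http://example.org/" }],
];
const iris = new Set(forms.flatMap((f) => asList(f).map(iriOf)));
console.log([...iris]); // [ 'http://example.org/' ]
```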

Namespace maps

A namespace map can be specified explicitly with the special key "_ns" in a subject map or in a predicate map. An aREF document MUST NOT contain more than one explicit namespace map.

A namespace map is

  • either a map in which every key conforms to the prefix syntax rule (see qName) and is mapped to an IRI (syntax rule IRI from RFC 3987). The IRIs in a namespace map are also called namespace URIs. The special key underscore ("_") can further be used to refer to another, predefined namespace map given by a string. This string is also called namespace map identifier. Explicitly given namespace URIs take precedence over mappings referred to by a namespace map identifier.

  • or a namespace map identifier that refers to a predefined namespace map.

Applications MAY further assume an implicit namespace map. Mappings from an implicit namespace map can be overridden by explicit namespace maps. The following implicit namespace map, or a superset of it, SHOULD be assumed by default:

```json
{
  "rdf":  "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
  "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
  "owl":  "http://www.w3.org/2002/07/owl#",
  "xsd":  "http://www.w3.org/2001/XMLSchema#"
}
```
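A possible resolution order is sketched below in JavaScript. The function `resolveNamespaces` and the `predefined` lookup table are hypothetical stand-ins for however an application resolves namespace map identifiers. The sketch lets mappings referenced by an identifier override the implicit map, and explicit prefixes override both; the relative order of the first two is still an open question in this specification.

```javascript
// The implicit namespace map recommended above.
const IMPLICIT = {
  rdf: "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
  rdfs: "http://www.w3.org/2000/01/rdf-schema#",
  owl: "http://www.w3.org/2002/07/owl#",
  xsd: "http://www.w3.org/2001/XMLSchema#",
};
const PREFIX = /^[a-z][a-z0-9]*$/;

// Hypothetical resolver; `predefined` maps identifiers to namespace maps.
function resolveNamespaces(ns, predefined = {}) {
  const result = { ...IMPLICIT };
  if (ns === undefined) return result;        // no "_ns" given
  if (typeof ns === "string") ns = { _: ns }; // identifier only
  if (ns._ !== undefined) {
    if (!Object.prototype.hasOwnProperty.call(predefined, ns._)) {
      throw new Error(`unknown namespace map identifier: ${ns._}`);
    }
    Object.assign(result, predefined[ns._]);  // referenced mappings
  }
  for (const [prefix, uri] of Object.entries(ns)) {
    if (prefix === "_") continue;
    if (!PREFIX.test(prefix)) throw new Error(`invalid prefix: ${prefix}`);
    result[prefix] = uri;                     // explicit mappings win
  }
  return result;
}

const maps = { example: { dc: "http://purl.org/dc/terms/" } };
console.log(resolveNamespaces({ _: "example", dc: "http://purl.org/dc/elements/1.1/" }, maps).dc);
// http://purl.org/dc/elements/1.1/
```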

TODO: should the default namespace map always precede namespace maps given by namespace map identifier, so applications can always assume they are right?

The following namespace maps are equivalent:
  • "example"
  • { "_": "example" }

A commonly used namespace map is listed at http://www.w3.org/2011/rdfa-context/rdfa-1.1. If the namespace map identifier http://www.w3.org/2013/json-ld-context/rdfa11 refers to this map, it can be used in aREF as follows (examples in YAML):

```yaml
_ns: http://www.w3.org/2013/json-ld-context/rdfa11
```

Custom prefixes can be added and existing prefixes redefined like this:

```yaml
_ns: 
  _: http://www.w3.org/2013/json-ld-context/rdfa11
  dc: http://purl.org/dc/elements/1.1/ # instead of http://purl.org/dc/terms/
  dct: http://purl.org/dc/terms/       # additional prefix
```

This specification does *not* include rules for how to resolve *namespace map identifiers*. The following guidelines are non-normative:
  • A URL is expected to refer to a JSON-LD document with an @context element. For instance, the default aREF namespace map could be expressed like this:

    {
      "@context": {
        "rdf":  "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
        "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
        "owl":  "http://www.w3.org/2002/07/owl#",
        "xsd":  "http://www.w3.org/2001/XMLSchema#"
      }
    }

    Note that JSON-LD context documents for particular ontologies usually define abbreviations for full URIs and/or default vocabularies (@vocab) that cannot be used in aREF documents, because a qName MUST consist of a prefix and a local name.

  • A string of the form YYYYMMDD is expected to refer to the namespace map defined at this date at http://prefix.cc (see rdfns, available as Debian package librdf-ns-perl, for a related command line tool). For instance, the identifier "20140901" maps the prefix "fabio" to http://purl.org/spar/fabio/ and the identifier "20120521" maps it to http://purl.org/spar/fabio#.
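Following the first guideline, a hypothetical JavaScript helper could derive a namespace map from an already parsed JSON-LD document. Keeping only string values that end in "/" or "#" is a heuristic assumed here to separate namespace URIs from abbreviations of full URIs; array-valued and remote contexts are not handled.

```javascript
// Hypothetical, non-normative helper: derive an aREF namespace map from a
// parsed JSON-LD document with a single object-valued "@context".
// Heuristic: keep keys matching the prefix rule whose value is a string
// ending in "/" or "#"; "@vocab" and expanded term definitions are skipped.
function namespacesFromContext(doc) {
  const context = doc["@context"];
  const ns = {};
  if (context === null || typeof context !== "object" || Array.isArray(context)) return ns;
  for (const [key, value] of Object.entries(context)) {
    if (/^[a-z][a-z0-9]*$/.test(key) && typeof value === "string" && /[\/#]$/.test(value)) {
      ns[key] = value;
    }
  }
  return ns;
}

const ctx = {
  "@context": {
    "@vocab": "http://schema.org/",
    xsd: "http://www.w3.org/2001/XMLSchema#",
    foaf: "http://xmlns.com/foaf/0.1/",
    name: "http://xmlns.com/foaf/0.1/name",
    knows: { "@id": "http://xmlns.com/foaf/0.1/knows", "@type": "@id" },
  },
};
console.log(Object.keys(namespacesFromContext(ctx))); // [ 'xsd', 'foaf' ]
```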

aREF document types

THIS PART OF THE SPEC IS NOT FINISHED YET

Depending on their structure, aREF documents can be classified as circular or non-circular, as flat, as consistent, and as normalized.

An aREF document is circular iff there is at least one path from a predicate map to itself, where each step leads from a predicate map to a predicate map that is part of one of its encoded objects.

A minimal circular aREF document can be created in JavaScript as follows:

```javascript
var aref = { _id: "http://example.org/alice" };
aref.foaf_knows = aref; // alice knows herself
```

Circular aREF documents cannot be serialized in JSON, but they can be serialized in YAML, for instance this normalized circular aREF document:

```yaml
http://example.org/alice: &alice
    _id: http://example.org/alice
    foaf_knows: &bob        # alice knows bob
        _id: http://example.org/bob
        foaf_knows: *alice  # bob knows alice
http://example.org/bob: *bob
```
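For in-memory structures, circularity can be detected with a depth-first walk that remembers the maps and lists on the current path, as in this JavaScript sketch (`isCircular` is a hypothetical name). It reports any cycle, which is slightly broader than the definition above because it also finds cycles through ignored keys; structures that merely share a predicate map are not reported.

```javascript
// Hypothetical cycle check: reaching a map or list that is already on the
// current path means the structure is circular. Strings cannot form cycles.
function isCircular(node, path = new Set()) {
  if (node === null || typeof node !== "object") return false;
  if (path.has(node)) return true;
  path.add(node);
  const children = Array.isArray(node) ? node : Object.values(node);
  const found = children.some((child) => isCircular(child, path));
  path.delete(node); // leaving this node: shared (acyclic) references stay allowed
  return found;
}

const aref = { _id: "http://example.org/alice" };
aref.foaf_knows = aref;
console.log(isCircular(aref)); // true
console.log(isCircular({ "http://example.org/": { dct_title: "example@en" } })); // false
```

Calling `JSON.stringify(aref)` throws a `TypeError`, which reflects that circular documents cannot be serialized in JSON.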

An aREF document is flat iff all of its encoded objects are encoded as strings. All flat aREF documents are non-circular.

The *list-map-structure* of a flat aREF document is nested at most two levels deep if it is a *subject map*, and at most one level deep if it is a *predicate map*:
```yaml
{
  "http://example.org/": {    # first level: predicate map
    "dct_title": [            # second level: list of encoded objects
      "example@en",
      "Beispiel@de"
    ]
  }
}
```
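Flatness can be checked without following any nested map. In this hypothetical JavaScript sketch the caller states whether the document is a subject map, because the two cases nest differently; null values are assumed to have been removed already.

```javascript
// An encoded object is flat if it is a string or a list of strings.
const isStringObject = (x) =>
  typeof x === "string" || (Array.isArray(x) && x.every((e) => typeof e === "string"));

// Hypothetical check for a single predicate map.
function isFlatPredicateMap(predicateMap) {
  return Object.entries(predicateMap)
    .filter(([key]) => !key.startsWith("_")) // skip "_id", "_ns", ignored keys
    .every(([, encoded]) => isStringObject(encoded));
}

// Hypothetical check for a whole document.
function isFlat(doc, isSubjectMap) {
  if (!isSubjectMap) return isFlatPredicateMap(doc);
  return Object.entries(doc)
    .filter(([key]) => !key.startsWith("_") || key.startsWith("_:")) // subject keys only
    .every(([, predicateMap]) => isFlatPredicateMap(predicateMap));
}

const example = { "http://example.org/": { dct_title: ["example@en", "Beispiel@de"] } };
console.log(isFlat(example, true)); // true
```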

An aREF document (or its IRIs) is/are consistent iff ... the same IRI is encoded the same way (but with subtle differences if it is used as subject, predicate, and object)

...

An aREF document is normalized according to a given namespace map if

  1. The document must be a subject map

  2. The document contains no null values or ignored keys

  3. Its IRIs are encoded consistently

  4. All lists have at least two members

  5. what about _ns?

  6. The document is

    • either flat and no predicate map contains the key _id ("normalized form 1")

    • or normalized form 2:

      • all predicate maps must contain the key _id and at least one more predicate key

      • all predicate maps must directly be mapped from a key in the subject map.

...better names for the two forms...

References

Normative references

Other references

Appendix

aref-query.md{.include}

serializations.md{.include}