
JSON-LD mapping

miguel76 edited this page Oct 28, 2017 · 18 revisions

Background

The Linked Data/Semantic Web initiative aims to share data on the Web so that heterogeneous data sets are linked and interoperable. The Audio Commons ecosystem, specifically the API and the mediator, aims to integrate different existing services and repositories of audio content. The current Audio Commons API needs to be enriched to enable broader integration within the linked data paradigm. For this purpose, an ontology has been defined as part of the project, and it now has to be integrated into the API.

General Proposed Solution

We analyse the possible integration of the ontology into the API via JSON-LD, a technology designed to represent linked data as JSON. JSON-LD makes it possible to keep a typical JSON structure alongside a mapping from JSON to linked data. This mapping is called a JSON-LD context. Here we consider adopting the JSON-LD context approach: the responses of the Audio Commons API would remain plain JSON, to which an externally defined JSON-LD context is associated. The link to the JSON-LD context can be provided in an HTTP Link response header.
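A minimal sketch of the header-based approach, assuming a hypothetical context URL (not an existing Audio Commons endpoint). JSON-LD 1.0 defines the link relation `http://www.w3.org/ns/json-ld#context` for attaching a context to a plain `application/json` response; the Link-header parser below is deliberately simplistic and only meant to show what a client would look for.

```python
import json
from typing import Optional, Tuple

# Hypothetical location of the externally defined JSON-LD context.
CONTEXT_URL = "https://example.org/audiocommons/context.jsonld"

# Link relation defined by JSON-LD 1.0 for interpreting plain JSON as JSON-LD.
JSONLD_CONTEXT_REL = "http://www.w3.org/ns/json-ld#context"


def build_response(payload: dict) -> Tuple[dict, str]:
    """Return (headers, body): the body stays plain JSON, the context is linked."""
    headers = {
        "Content-Type": "application/json",
        "Link": f'<{CONTEXT_URL}>; rel="{JSONLD_CONTEXT_REL}"; '
                'type="application/ld+json"',
    }
    return headers, json.dumps(payload)


def context_url_from_link(link_header: str) -> Optional[str]:
    """Return the target of the JSON-LD context link, if present.

    Simplified parser: splits on commas, so it assumes the URLs contain none.
    """
    for part in link_header.split(","):
        target, _, params = part.partition(";")
        if f'rel="{JSONLD_CONTEXT_REL}"' in params:
            return target.strip().strip("<>")
    return None
```

A client that only understands JSON ignores the Link header, while a JSON-LD processor can fetch the context and interpret the same body as linked data.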

Problems and Challenges

Related to the Expressivity of JSON-LD 1.0/1.1 Contexts

Clash of Homonymous JSON Keys and References

Problem: In a JSON-LD 1.0 context, keys and references (values having type @id) are mapped to IRIs irrespective of their position in the JSON structure: e.g., the mapping "@context": {"name": "foaf:name"} applied to the JSON structure {"name": "Bob Marley", "songs": [{"name": "Jamming"}, ...] } will interpret both name keys as foaf:name, although another property (e.g., a title) would have been more appropriate for the inner one. In the current Audio Commons API, service names (e.g., Freesound) are used as keys to organize the contents, the warnings, and the errors. They are also used as prefixes of the audio clip identifiers.
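The clash can be illustrated with a deliberately simplified walk over the JSON, not the full JSON-LD expansion algorithm. It assumes a flat context in which "songs" is mapped to a hypothetical `http://example.org/songs` property (otherwise expansion would drop the whole inner structure), and it records which IRI each key would receive.

```python
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

# Flat, JSON-LD 1.0-style context: one IRI per term, whatever the key's position.
# "http://example.org/songs" is a hypothetical placeholder property.
CONTEXT = {"name": FOAF_NAME, "songs": "http://example.org/songs"}

doc = {"name": "Bob Marley", "songs": [{"name": "Jamming"}]}


def mapped_keys(node, context, found=None):
    """Collect (key, IRI) pairs for mapped keys at any depth.

    Unmapped keys are skipped together with their values, as JSON-LD
    expansion drops them; mapped keys get the same IRI at every level.
    """
    if found is None:
        found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key in context:
                found.append((key, context[key]))
                mapped_keys(value, context, found)
    elif isinstance(node, list):
        for item in node:
            mapped_keys(item, context, found)
    return found
```

Both the artist's "name" and the song's "name" end up as foaf:name, because a JSON-LD 1.0 term definition cannot depend on where the key occurs.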

Proposed solution (short/medium term): Avoid homonymy in the JSON structure. Rename prefixes to distinguish their use as prefixes from the general service description: e.g., "Freesound" for the service and "Freesound-sounds" (or just "fs-sound") as the prefix for sound identifiers.
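A minimal sketch of how the renamed prefix would be used, assuming placeholder IRIs for both the service and the sound namespace. The function follows the JSON-LD rule for compact IRIs ("prefix:suffix", where the prefix is a term in the context and the suffix does not start with "//"), but omits blank nodes, "@vocab" and the other cases of the full IRI expansion algorithm.

```python
# Hypothetical context after renaming: "Freesound" names only the service,
# "fs-sound" is a distinct prefix for sound identifiers. IRIs are placeholders.
CONTEXT = {
    "Freesound": "https://example.org/services/freesound",
    "fs-sound": "https://example.org/freesound/sounds/",
}


def expand_compact_iri(value: str, context: dict) -> str:
    """Expand a compact IRI such as "fs-sound:1234" using the context."""
    prefix, sep, suffix = value.partition(":")
    # A suffix starting with "//" means the value is already an absolute IRI.
    if sep and prefix in context and not suffix.startswith("//"):
        return context[prefix] + suffix
    return value
```

Because the identifier prefix no longer coincides with the service name, the term "Freesound" can describe the service without also determining how sound identifiers are expanded.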

Proposed solution (medium/long term): Monitor the status and level of support of JSON-LD 1.1, which overcomes this limit with scoped contexts.

JSON Indexes Not Supported

Problem: In JSON it can make sense to use the keys of an object to represent an index over a collection rather than individual properties. In the existing API the names of the services are used as keys to index the contents, the errors, and the warnings. In JSON-LD 1.0 the only way to deal with this case is to use "@container": "@index", but then the actual keys are lost in the mapping to linked data. JSON-LD 1.1 offers two more ways to deal with indexes: with "@container": "@id", the keys become the ids of the contained items; with "@container": "@type", the keys become the types of the contained items. This excludes the cases in which the key is related to the contained node in some other way. In the current Audio Commons API, the service names should be connected by a provenance relationship: in that context, the service is where a set of data/warnings/errors comes from. A more generic way to deal with indexes has been proposed for JSON-LD 1.1. This improvement would allow a syntax like "@container": "prov:wasGeneratedBy", but it has not yet made its way into the current JSON-LD 1.1 specification.

Proposed solution (short term): The only way to obtain a sensible representation of the JSON is to avoid using object keys to index contained objects, opting instead for arrays of objects in which the former key becomes one of the properties of the contained objects. In practice, this means replacing

"contents": {
  "Freesound": {...},
  "Jamendo": {...}
}

with

"contents": [
  {"service": "Freesound", ...},
  {"service": "Jamendo", ...}
]
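The restructuring above is mechanical; a minimal sketch, assuming that every indexed value is a JSON object and that it does not already contain the "service" key (the property name taken from the example; how it is mapped in the context, e.g. to a provenance property, is left open):

```python
def index_to_array(indexed: dict, key_name: str = "service") -> list:
    """Turn {"Freesound": {...}, "Jamendo": {...}} into
    [{"service": "Freesound", ...}, {"service": "Jamendo", ...}]."""
    result = []
    for key, value in indexed.items():
        if key_name in value:
            raise ValueError(f"{key!r} already has a {key_name!r} property")
        item = {key_name: key}
        item.update(value)  # keep the original properties next to the former key
        result.append(item)
    return result
```

Usage, with a hypothetical "num_results" field: `index_to_array({"Freesound": {"num_results": 3}})` gives `[{"service": "Freesound", "num_results": 3}]`, where the former key is now an ordinary property that a JSON-LD context can map.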

Proposed solution (long term): Lobby for the generic case described above to be added to JSON-LD 1.1.

Limited Mapping of Types

Problem: The set of mapping styles that can be used in JSON-LD contexts is extremely limited compared with other similar mapping languages (see for example R2RML). Apart from specific cases, in the kind of mapping provided by JSON-LD contexts the generated triple is always of the form <variable_subject> constant_property <variable_object>, where <variable_subject> corresponds to some JSON object, constant_property corresponds to a key used in that object, and <variable_object> corresponds to its value. One practical effect of this limit is that most generated resources cannot be given types, even though they could be typed according to their position in the JSON structure.

Proposed solution (short/medium term): One way to add the missing types is to apply inference to the returned graph using (part of) the ontology. RDFS entailment already allows types to be added based on the domains and ranges of the properties used. We should document how this entailment can easily be realized in an example client.
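The typing step can be sketched with the RDFS entailment rules rdfs2 (domain) and rdfs3 (range) over a small in-memory graph. The "ex:" terms are hypothetical placeholders, not actual Audio Commons ontology terms, and a real client would more likely use an RDF library and the full RDFS rule set (including subclass rules, applied to a fixpoint).

```python
from dataclasses import dataclass

RDF_TYPE, RDFS_DOMAIN, RDFS_RANGE = "rdf:type", "rdfs:domain", "rdfs:range"


@dataclass(frozen=True)
class Literal:
    """Literal object; resources are plain strings (CURIEs) in this sketch."""
    value: str


# Hypothetical ontology fragment.
ONTOLOGY = {
    ("ex:hasAudioFile", RDFS_DOMAIN, "ex:AudioClip"),
    ("ex:hasAudioFile", RDFS_RANGE, "ex:AudioFile"),
}


def infer_types(graph: set, ontology: set) -> set:
    """Add rdf:type triples implied by rdfs:domain (rdfs2) and rdfs:range (rdfs3)."""
    domains = {(p, c) for p, rel, c in ontology if rel == RDFS_DOMAIN}
    ranges = {(p, c) for p, rel, c in ontology if rel == RDFS_RANGE}
    inferred = set()
    for s, p, o in graph:
        for prop, cls in domains:
            if prop == p:
                inferred.add((s, RDF_TYPE, cls))
        for prop, cls in ranges:
            # Literals cannot be subjects of RDF triples, so they are not typed.
            if prop == p and not isinstance(o, Literal):
                inferred.add((o, RDF_TYPE, cls))
    return graph | inferred
```

Applied to the triples produced by the JSON-LD mapping, this single pass types both the clip and its audio file without any type information in the JSON itself.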

Proposed solution (medium/long term): The limit of the previous solution is that the ontology must be linked to the RDF result out-of-band. We should research ways to associate RDF content with an ontology in-band.

Further comment: This major limitation casts doubt on the usefulness of the whole JSON + JSON-LD context mechanism. Considering that, on top of the JSON-LD mapping, the RDF data still have to be transformed, why not provide a native linked data API (i.e., a Hydra/LDP API that consumes and produces JSON-LD)?

Related to Audio Commons ontology

Missing Service Description

Problem: The description of the services is not part of the ontology. It has not yet been specified how an existing vocabulary/ontology could be used for that.

Proposed solution (short/medium term): Analyse how to integrate the Audio Commons ontology with existing vocabularies/ontologies for the description of services (e.g., Hydra). Map the service description and metadata of the Audio Commons API using the chosen vocabulary (or vocabularies).

Open Areas

Problem: Some parts of the ontology specification need to be specified in more detail to be useful. For example, the EBUCore ontology is used for media files, but which EBUCore format resources correspond to the specific formats in use has still to be defined.

Proposed solution (short/medium term): During the mapping, these gaps in the ontology specification should be filled with a "pay as you go" approach.

Related to existing API

Flat JSON

Problem: The API /search endpoint returns a flat JSON representation of each audio clip, with depth 1 (all properties have primitive values directly). This leads to overuse of "shortcut" properties defined only for the purpose of this mapping.

Proposed solution (short term): This JSON structure can be at least partially enriched (see for example the "Richer JSON" examples below). Even relatively small changes (a maximum depth of 2, no more) greatly simplify the mapping to JSON-LD. Furthermore, a slightly richer structure could also improve readability for a developer using plain JSON.
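As an illustration of a depth-2 enrichment, the sketch below groups flat keys sharing a prefix into one nested object. The field names ("author_name", "author_url", and so on) are hypothetical and do not necessarily match the actual /search response; the function also assumes that no flat key is identical to one of the chosen prefixes.

```python
def nest_prefixed(flat: dict, prefixes: list) -> dict:
    """Group keys such as "author_name" and "author_url" under a nested
    "author" object, so the result has depth at most 2."""
    rich = {}
    for key, value in flat.items():
        for prefix in prefixes:
            if key.startswith(prefix + "_"):
                rich.setdefault(prefix, {})[key[len(prefix) + 1:]] = value
                break
        else:
            rich[key] = value  # keys without a grouping prefix stay at depth 1
    return rich
```

The nested "author" object can then be mapped in the context to a property pointing at an author resource, instead of relying on separate shortcut properties for each author attribute.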

Flat API

Problem: The Audio Commons API doesn't offer a way to explore the integrated repository: the only way to get the data about an audio clip is through a search, even if the client already knows the audio clip identifier; there is no way to access data on the author, the collection, etc., even if this information is available in the source repositories' APIs. This is a limitation with respect to both REST API principles and Linked Data principles (in the latter case through the lens of the JSON-LD mapping).

Proposed solution (long term): The Audio Commons ecosystem (through the API mediator or other means) should provide access to a standard view of all the main categories of concepts (e.g., audio clips, authors, collections). The view should be accessible through an HTTP(S) GET request that includes the identifier in the path and returns a JSON description paired with an appropriate JSON-LD context.
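A framework-free sketch of such a view, assuming a hypothetical path layout (/audioclips/{id}, /authors/{id}, /collections/{id}), a hypothetical context URL, and an in-memory dictionary standing in for data fetched through the mediator. It only shows the shape of the exchange: identifier in the path, plain JSON body, JSON-LD context in the Link header.

```python
import json
import re

# Hypothetical routes and context URL; not existing Audio Commons endpoints.
ROUTE = re.compile(r"^/(audioclips|authors|collections)/([^/]+)$")
CONTEXT_URL = "https://example.org/audiocommons/context.jsonld"
LINK = (f'<{CONTEXT_URL}>; rel="http://www.w3.org/ns/json-ld#context"; '
        'type="application/ld+json"')

# Stand-in for data the mediator would fetch from the source repositories.
STORE = {("audioclips", "fs-sound:1234"): {"id": "fs-sound:1234", "title": "example clip"}}


def handle_get(path: str):
    """Return (status, headers, body) for GET /<category>/<identifier>."""
    match = ROUTE.match(path)
    if not match:
        return 404, {}, ""
    record = STORE.get((match.group(1), match.group(2)))
    if record is None:
        return 404, {}, ""
    headers = {"Content-Type": "application/json", "Link": LINK}
    return 200, headers, json.dumps(record)
```

With one such view per category, identifiers returned by /search become dereferenceable, which is what both REST and Linked Data clients expect.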

Unspecified JSON Syntax/Semantics

Problem: The JSON returned for each result of a /search query (each audio clip) is not documented. Without a reference, its semantics are potentially ambiguous: e.g., what is the meaning of .timestamp? What are the possible values of .format, and what does it mean in the case of modular formats (container format + encoding formats)?

Proposed solution (short term): The mapping to the ontology can be a means to also clarify the JSON syntax and semantics.

Multiple values disallowed

Problem: For some properties of an audio clip it could be useful to allow multiple values (e.g., authors, audio files). While this multiplicity may exist in the original services, it is then lost in the JSON offered by the mediator for each audio clip.

Proposed solution (short term): Thoroughly analyse concrete cases and enrich the JSON structure by replacing some single-valued properties with array-valued properties (for a partial example see the "Richer JSON" examples below).
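A minimal sketch of the normalization, assuming hypothetical property names ("authors", "audio_files") for the multi-valued cases. Emitting these properties always as arrays keeps the JSON shape predictable for plain JSON clients; in JSON-LD, each array element simply yields one more triple with the same property (arrays are unordered sets unless the context declares "@container": "@list").

```python
# Hypothetical names of properties that may carry several values.
MULTI_VALUED = {"authors", "audio_files"}


def to_multi_valued(record: dict, multi_valued=MULTI_VALUED) -> dict:
    """Return a copy of record in which multi-valued properties are always arrays."""
    result = {}
    for key, value in record.items():
        if key in multi_valued and not isinstance(value, list):
            result[key] = [] if value is None else [value]
        else:
            result[key] = value
    return result
```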

Examples: