This repository has been archived by the owner on May 24, 2022. It is now read-only.

Outlook

Thomas Gängler edited this page Sep 4, 2015 · 9 revisions

The following features are planned to support the improvement of data quality and structure:

(Note: these features may require a powerful data hub or a processing unit that is suitable for such tasks.)

Deduplication

Deduplication comprises two steps:

  1. Finding duplicates (only in the simplest case can this be done via a common identifier)
  2. Applying an appropriate strategy for merging the duplicates
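
The two steps above can be sketched roughly as follows. This is only an illustration and not d:swarm code: the record fields (`isbn`, `title`, `year`), the fallback matching key and the "first non-empty value wins" merge strategy are all assumptions chosen to keep the example small.

```python
from collections import defaultdict


def match_key(record):
    """Step 1 helper: derive a key that duplicates are expected to share.

    Assumption: records are plain dicts. A common identifier (here a
    hypothetical "isbn" field) is used when present; otherwise a
    normalised title/year pair serves as a fallback key.
    """
    if record.get("isbn"):
        return ("isbn", record["isbn"].replace("-", ""))
    title = " ".join(record.get("title", "").lower().split())
    return ("title-year", title, record.get("year"))


def find_duplicates(records):
    """Step 1: group records that share the same match key."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)
    return list(groups.values())


def merge(group):
    """Step 2: one possible merge strategy, first non-empty value per field wins."""
    merged = {}
    for record in group:
        for field, value in record.items():
            if value and not merged.get(field):
                merged[field] = value
    return merged


def deduplicate(records):
    """Run both steps and return one merged record per duplicate group."""
    return [merge(group) for group in find_duplicates(records)]
```

Note that in this sketch a record with an identifier is never matched against a record without one; real duplicate detection usually needs fuzzier comparison, and other merge strategies (for example, preferring a trusted source) are equally plausible.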

FRBR-ization

FRBR-ization is a process that originates in the bibliographic domain. It creates a graph of connected bibliographic resources at their various levels of abstraction (see also FRBR@Wikipedia). For example, it can relate concrete manifestations to their abstract works.
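
As a rough illustration of relating manifestations to works, the sketch below groups manifestation records under a derived work node. It is a simplification under stated assumptions: the record fields (`id`, `author`, `title`), the `work:` key format and the idea of identifying a work by normalised author and title are hypothetical, and the intermediate FRBR expression level is left out entirely.

```python
from collections import defaultdict


def work_key(manifestation):
    """Derive a hypothetical work identifier from normalised author and title."""
    author = " ".join(manifestation["author"].lower().split())
    title = " ".join(manifestation["title"].lower().split())
    return f"work:{author}:{title}"


def frbrize(manifestations):
    """Map each derived work to the ids of the manifestations that belong to it."""
    works = defaultdict(list)
    for manifestation in manifestations:
        works[work_key(manifestation)].append(manifestation["id"])
    return dict(works)
```

With this, a print edition and an e-book of the same title by the same author end up attached to a single work node, while unrelated titles get their own.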

Deduplication and FRBR-ization can happen in the data hub. Then we can refer to data from a specific version or with a specific provenance. Cleaned data can be stored in the data hub as well.

Filtering Statements by Qualified Attributes (Context)

The ability to filter statements by qualified attributes (context), such as provenance, version or trustworthiness, can be utilised when implementing deduplication or FRBR-ization algorithms. For example, a mapping used in a data quality procedure needs to select data based on the source it originates from.
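
A minimal sketch of such a filter is shown below. The `Statement` class, its field names and the numeric `confidence` value standing in for trustworthiness are assumptions made for this example; they do not describe the actual d:swarm data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Statement:
    """A statement plus hypothetical qualified attributes (its context)."""
    subject: str
    predicate: str
    obj: str
    provenance: Optional[str] = None    # source the statement originates from
    version: Optional[int] = None       # version of the data set
    confidence: Optional[float] = None  # stand-in for trustworthiness


def filter_statements(statements, provenance=None, version=None, min_confidence=None):
    """Keep only statements matching every qualifier that was given."""
    selected = []
    for statement in statements:
        if provenance is not None and statement.provenance != provenance:
            continue
        if version is not None and statement.version != version:
            continue
        if min_confidence is not None and (
            statement.confidence is None or statement.confidence < min_confidence
        ):
            continue
        selected.append(statement)
    return selected
```

A mapping in a data quality procedure could then call, for instance, `filter_statements(statements, provenance="source-a")` to work only on data from one source before deduplicating it.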

Community Sharing

While most entities in d:swarm are already modelled to support reuse and sharing, we are planning to make sharing a prominent feature that is easily accessible from various views in the d:swarm Back Office. It should be possible to share and discuss projects, transformations and mappings with other users who face the same data management tasks.
