Outlook

The following features are planned to support the improvement of data quality and structure:

(Note: these features may require a powerful data hub or a processing unit suited to such tasks.)

Deduplication

Deduplication comprises two steps:

Finding duplicates (only in the simplest case can this be done via a common identifier)
Applying an appropriate strategy for merging the duplicates
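
The two steps above can be sketched as follows. This is a minimal illustration, not d:swarm functionality: it only covers the simplest case of a shared identifier, and the function names, the `isbn` key, and the "first non-empty value wins" merge strategy are all assumptions made for the example.

```python
from collections import defaultdict


def find_duplicates(records, key):
    """Step 1: group records that share the same value for `key`."""
    groups = defaultdict(list)
    for record in records:
        identifier = record.get(key)
        if identifier is not None:
            groups[identifier].append(record)
    # Only groups with more than one member contain duplicates.
    return [group for group in groups.values() if len(group) > 1]


def merge_records(group):
    """Step 2: merge a duplicate group (assumed strategy: first non-empty value wins)."""
    merged = {}
    for record in group:
        for field, value in record.items():
            if value not in (None, "") and field not in merged:
                merged[field] = value
    return merged


# Hypothetical sample records; two of them share an identifier.
records = [
    {"isbn": "978-3-16-148410-0", "title": "Example Title", "year": None},
    {"isbn": "978-3-16-148410-0", "title": "", "year": 2014},
    {"isbn": "978-1-23-456789-7", "title": "Another Title", "year": 2010},
]

duplicate_groups = find_duplicates(records, "isbn")
merged = [merge_records(group) for group in duplicate_groups]
```

In practice, records often lack a common identifier, so the first step would need fuzzy matching on several fields, and the merge strategy would depend on how much each source is trusted.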

FRBR-ization

FRBR-ization is a process that has its origin in the bibliographic domain. It makes it possible to create a graph of connected bibliographic resources at their various levels of abstraction (see also FRBR@Wikipedia). For example, it can relate concrete manifestations to their abstract works.
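
As a rough sketch of the idea, the following example links manifestations to works. It is not a d:swarm algorithm: the record layout, the `work_key` heuristic (author plus normalised title), and all names are assumptions chosen for illustration, and the intermediate FRBR expression level is left out.

```python
from collections import defaultdict


def work_key(manifestation):
    """Assumed heuristic: author plus normalised title identifies the abstract work."""
    author = manifestation["author"].strip().lower()
    title = manifestation["title"].strip().lower()
    return (author, title)


def frbrize(manifestations):
    """Map each work key to the ids of the manifestations that realise it."""
    works = defaultdict(list)
    for manifestation in manifestations:
        works[work_key(manifestation)].append(manifestation["id"])
    return dict(works)


# Hypothetical manifestations: two editions of one work, plus a separate work.
manifestations = [
    {"id": "m1", "author": "Jane Doe", "title": "A Sample Work", "format": "hardcover"},
    {"id": "m2", "author": "jane doe ", "title": "A sample work", "format": "ebook"},
    {"id": "m3", "author": "John Roe", "title": "Another Work", "format": "paperback"},
]

work_graph = frbrize(manifestations)
```

Each key in `work_graph` stands for an abstract work, and each value lists the concrete manifestations attached to it.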

Both deduplication and FRBR-ization can run in the data hub, which makes it possible to refer to data from a specific version or with a specific provenance. Cleaned data can be stored in the data hub as well.

Filtering Statements by Qualified Attributes (Context)

The ability to filter statements by qualified attributes (context), such as provenance, version or trustworthiness, can be utilised when implementing deduplication or FRBR-ization algorithms. For example, a mapping used in a data quality procedure may need to select data based on the source it originates from.
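
A minimal sketch of such a filter is shown below. The `Statement` class, its `context` dictionary, and the attribute names `provenance` and `version` are hypothetical and do not reflect the actual d:swarm data model.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """A subject-predicate-object statement with qualified attributes (hypothetical model)."""
    subject: str
    predicate: str
    obj: str
    context: dict = field(default_factory=dict)


def filter_statements(statements, **required):
    """Keep only statements whose context matches every required attribute."""
    return [
        statement
        for statement in statements
        if all(statement.context.get(name) == value for name, value in required.items())
    ]


# Hypothetical statements from two sources and two versions.
statements = [
    Statement("book:1", "title", "Example Title", {"provenance": "source-a", "version": 1}),
    Statement("book:1", "title", "Example Titel", {"provenance": "source-b", "version": 1}),
    Statement("book:1", "year", "2014", {"provenance": "source-a", "version": 2}),
]

from_source_a = filter_statements(statements, provenance="source-a")
source_a_v2 = filter_statements(statements, provenance="source-a", version=2)
```

A deduplication or FRBR-ization step could call such a filter first, so that it only operates on data from a trusted source or a particular version.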

Community Sharing

While most entities in d:swarm are already modelled to support reuse and sharing, we plan to make sharing a prominent feature that is easily accessible from various views in the d:swarm Back Office. It should be possible to share and discuss projects, transformations and mappings with other users who face the same data management tasks.
