Metadata

Data that provides information about other data [wikipedia].

Metadata is structured (and often standardised) information associated with a (data) resource that provides information about the resource itself.

Metadata provides context and pragmatics (which may be general or domain-specific) for its resource.

{% hint style="success" %} Purpose

Metadata allows you to:

  • Make resources findable by providing a high-level overview that can be inserted into a search index.
  • Make resources reusable by providing information about how they were generated in the first place.

Thinking about types of metadata allows you to:

  • Decide who should produce metadata and when that should happen

{% endhint %}
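To make "inserted into a search index" concrete, here is a minimal Python sketch of an inverted index built from descriptive metadata. It assumes each record is a dictionary with hypothetical `title` and `keywords` fields and uses simple whitespace tokenisation; real search engines do considerably more (stemming, ranking and so on).

```python
from collections import defaultdict

# Hypothetical metadata records: a resource identifier mapped to a few
# descriptive fields. Field names are illustrative only.
records = {
    "dataset-001": {"title": "Daily rainfall, East Africa", "keywords": ["rainfall", "climate"]},
    "dataset-002": {"title": "River discharge measurements", "keywords": ["hydrology", "rivers"]},
}

def build_index(records):
    """Map each lower-cased word in the title and keywords to the ids containing it."""
    index = defaultdict(set)
    for resource_id, meta in records.items():
        words = meta["title"].lower().replace(",", " ").split()
        words += [k.lower() for k in meta["keywords"]]
        for word in words:
            index[word].add(resource_id)
    return index

def search(index, query):
    """Return ids of resources whose metadata contains every word in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

index = build_index(records)
print(sorted(search(index, "rainfall africa")))  # ['dataset-001']
```

Only the metadata is indexed here, not the data itself, which is why a good high-level overview matters for findability.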

Types of Metadata

A 'type' of metadata is a broad-brush classification of what a metadata element is for and, correspondingly, who might create it and when. The How To Fair project describes three types of metadata: Administrative, Descriptive and Structural. Wikipedia defines many more!

How To Fair's description of Structural metadata incorporates two aspects. We've differentiated them into "provenance" and "form" to highlight their different roles and the different times at which they are produced.

{% tabs %} {% tab title="Administrative" %} Administrative metadata is relevant for managing data, for example:

  • Project
  • Resource owner
  • Collaborators
  • Funder
  • Organisation
  • License

These can usually be assigned before you collect or create the data resource itself. {% endtab %}

{% tab title="Descriptive (citation)" %} Descriptive (citation) metadata allows people to discover and identify the resource:

  • Author
  • Title
  • Abstract
  • Keywords
  • Topic
  • Persistent identifier
  • Related resources

These are usually assigned at the point of publication.

{% hint style="info" %} Tip

To facilitate data discovery, descriptive metadata can be made far more powerful than a simple citation.

For example, "find data containing rainfall in Africa last year" requires that a search index or graph is populated with temporal and locational values extracted from the data or its structural metadata.
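As a rough illustration, the sketch below answers that kind of query from fields already extracted into an index. The `IndexedRecord` structure, its field names, the example records and the rough bounding box for Africa are all hypothetical, and 2023 stands in for "last year". The point is only that such a query cannot be answered from citation fields like title and author.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class IndexedRecord:
    """Hypothetical search-index entry holding values extracted from the data."""
    resource_id: str
    variables: list[str]                      # measured quantities found in the data
    start: date                               # temporal coverage
    end: date
    bbox: tuple[float, float, float, float]   # (min_lon, min_lat, max_lon, max_lat)

def overlaps(a_start, a_end, b_start, b_end):
    """True if two date ranges share at least one day."""
    return a_start <= b_end and b_start <= a_end

def bbox_intersects(a, b):
    """True if two (min_lon, min_lat, max_lon, max_lat) boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def find(records, variable, start, end, bbox):
    """Return ids of records matching a variable, a date range and a region."""
    return [
        r.resource_id
        for r in records
        if variable in r.variables
        and overlaps(r.start, r.end, start, end)
        and bbox_intersects(r.bbox, bbox)
    ]

AFRICA = (-18.0, -35.0, 52.0, 38.0)  # rough, illustrative bounding box
records = [
    IndexedRecord("ds-1", ["rainfall"], date(2023, 1, 1), date(2023, 12, 31), (30.0, -5.0, 42.0, 5.0)),
    IndexedRecord("ds-2", ["rainfall"], date(2023, 1, 1), date(2023, 12, 31), (-10.0, 50.0, 2.0, 59.0)),
    IndexedRecord("ds-3", ["temperature"], date(2023, 1, 1), date(2023, 12, 31), (30.0, -5.0, 42.0, 5.0)),
]
print(find(records, "rainfall", date(2023, 1, 1), date(2023, 12, 31), AFRICA))  # ['ds-1']
```

Remove the date range or bounding box from the index and the query can no longer be answered.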

In such cases "structural" metadata might also be thought of as "descriptive", and the lines blur between them. Extra search fields might include, for example:

  • Datetimes or ranges
  • Geolocations
  • External conditions
  • Other content-derived fields
  • Data type {% endhint %} {% endtab %}

{% tab title="Structural (provenance)" %} Structural (provenance) metadata describes how a resource came about, for example:

  • Collection method
  • Sampling procedure
  • Assumptions made
  • Researcher notes

These metadata have to be gathered by the researchers according to best practice in their research community. They should be added continuously throughout data generation and processing.
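One way to add provenance continuously is to append an entry each time a step happens, rather than reconstructing the history afterwards. The `ProvenanceLog` class below is a hypothetical, minimal sketch (not a standard format such as W3C PROV); it timestamps each entry in UTC and serialises the log to JSON.

```python
import json
from datetime import datetime, timezone

class ProvenanceLog:
    """Hypothetical append-only log of how a resource came about."""

    def __init__(self, resource_id):
        self.resource_id = resource_id
        self.entries = []

    def record(self, step, **details):
        # Timestamp each entry in UTC at the moment the step is logged.
        self.entries.append({
            "step": step,
            "time": datetime.now(timezone.utc).isoformat(),
            **details,
        })

    def to_json(self):
        return json.dumps({"resource": self.resource_id, "provenance": self.entries}, indent=2)

log = ProvenanceLog("dataset-001")
log.record("collection", method="tipping-bucket rain gauge", sampling="hourly")
log.record("processing", note="removed readings flagged by the gauge as faulty")
print(log.to_json())
```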

{% hint style="info" %} Tip

The semantics used in defining the data and its structural metadata should provide meaning and context to the data in a formal, machine-readable way. However, where richer meaning or context is difficult or impossible to capture formally, less formal provenance metadata (such as researcher notes) should be used to convey that information. {% endhint %} {% endtab %}

{% tab title="Structural (form)" %} Structural (form) metadata describes how a resource is internally structured, for example:

  • Data size
  • Storage details (e.g. file types, encodings and/or database details)
  • Content and format (data structure)
    • Specified directly, by listing Categories, Variables, Column Names, Types, Relations etc, or
    • Specified indirectly, by referencing an external schema or ontology.
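A directly specified schema can be as simple as a mapping from column names to types. The minimal Python sketch below assumes hypothetical column names and checks a single row (held as a dictionary) against that mapping; an indirectly specified schema would instead point to an externally published definition.

```python
# A schema specified directly: hypothetical column names mapped to expected types.
SCHEMA = {
    "station_id": str,
    "date": str,          # ISO 8601 date string, e.g. "2023-06-01"
    "rainfall_mm": float,
}

def validate_row(row, schema=SCHEMA):
    """Return a list of problems found when checking one row against the schema."""
    problems = []
    for column, expected in schema.items():
        if column not in row:
            problems.append(f"missing column: {column}")
        elif not isinstance(row[column], expected):
            problems.append(
                f"{column}: expected {expected.__name__}, got {type(row[column]).__name__}"
            )
    # Columns present in the row but not declared in the schema.
    for column in row.keys() - schema.keys():
        problems.append(f"unexpected column: {column}")
    return problems

print(validate_row({"station_id": "KE-01", "date": "2023-06-01", "rainfall_mm": 12.5}))  # []
print(validate_row({"station_id": "KE-01", "rainfall_mm": "12.5"}))
# ['missing column: date', 'rainfall_mm: expected float, got str']
```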

Structural (form) metadata arises from the decisions taken by the data engineering team, who should architect their infrastructure in close collaboration with researchers to anticipate size, content and format.

These details should ideally be established before data generation (see "de-risking a project"), but may evolve throughout the lifetime of a resource. {% endtab %} {% endtabs %}
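One way to keep these types separate in practice is to give each its own section of a metadata record, so it is clear who fills in which part and when. The dataclasses below are a hypothetical sketch; the field names are illustrative and do not follow any particular standard.

```python
from dataclasses import dataclass, field, asdict
from typing import Optional

@dataclass
class Administrative:
    # Usually known before the data is collected or created.
    project: str
    owner: str
    license: str

@dataclass
class Descriptive:
    # Usually assigned at the point of publication.
    title: str
    authors: list[str]
    keywords: list[str] = field(default_factory=list)

@dataclass
class Provenance:
    # Added continuously during data generation and processing.
    collection_method: str
    notes: list[str] = field(default_factory=list)

@dataclass
class Form:
    # Decided by the data engineering team, ideally before data generation.
    file_type: str
    columns: dict[str, str] = field(default_factory=dict)  # column name -> type

@dataclass
class MetadataRecord:
    administrative: Administrative
    provenance: Provenance
    form: Form
    descriptive: Optional[Descriptive] = None  # filled in at publication

record = MetadataRecord(
    administrative=Administrative("Rainfall study", "Example University", "CC-BY-4.0"),
    provenance=Provenance("rain gauge network"),
    form=Form("text/csv", {"station_id": "string", "rainfall_mm": "float"}),
)
record.provenance.notes.append("gauges recalibrated mid-study")
record.descriptive = Descriptive("Daily rainfall, East Africa", ["A. Researcher"])
print(asdict(record)["descriptive"]["title"])  # Daily rainfall, East Africa
```

The descriptive section starts empty and is only filled in at publication, while provenance notes are appended as processing proceeds.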

The Dublin Core Metadata Initiative

The Dublin Core Metadata Initiative (DCMI) is an organisation dedicated to metadata. It maintains a set of fifteen widely used metadata elements. If you are publishing a data resource on the web, these elements serve as a guide to which metadata should be included to best facilitate information discovery.

See the Dublin Core elements

See the DCMI elements page for full descriptions.

  • Contributor
  • Coverage
  • Creator
  • Date
  • Description
  • Format
  • Identifier
  • Language
  • Publisher
  • Relation
  • Rights
  • Source
  • Subject
  • Title
  • Type
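As a sketch of what a record using these elements might look like, the Python below serialises a dictionary of element values to XML with the standard library's `xml.etree.ElementTree`, using the Dublin Core element namespace `http://purl.org/dc/elements/1.1/`. The `metadata` wrapper element and the example values are assumptions for illustration; in practice the surrounding format determines where these elements go.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialise the namespace with the "dc" prefix

def dublin_core_xml(elements):
    """Serialise a dict of Dublin Core element names to values as simple XML."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, values in elements.items():
        # Dublin Core elements are optional and repeatable, so accept lists too.
        for value in values if isinstance(values, list) else [values]:
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = value
    return ET.tostring(root, encoding="unicode")

print(dublin_core_xml({
    "title": "Daily rainfall, East Africa",
    "creator": ["A. Researcher", "B. Researcher"],
    "date": "2024-01-15",
    "format": "text/csv",
    "rights": "CC-BY-4.0",
}))
```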

Whilst metadata need not be limited to these elements (particularly where rich descriptive metadata can be used to query for datasets, as opposed to search-based methods), adhering to Dublin Core should help your resource rank well in search results and help you conform to the FAIR principles.

{% hint style="success" %} Purpose

The Dublin Core allows you to:

  • Quickly define a set of metadata to make a published data resource findable {% endhint %}

{% hint style="info" %} Tip

As discussed in Structural (form) metadata, a schema can be an important element of your metadata, for describing the content of the data resource.

It is also possible to use a schema in a different way: not to describe the resource itself, but to describe the associated metadata!

This allows you to check that the provided metadata is valid. The Dublin Core Metadata Initiative publishes schemas for just this purpose. {% endhint %}
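DCMI's published schemas include XML Schema files, which you would normally check with an XML Schema validator. As a lighter-weight illustration of the same idea, the standard-library-only Python sketch below checks a metadata dictionary against a hand-written "schema": element names must be among the fifteen Dublin Core elements, and a hypothetical local profile (`REQUIRED`) lists the elements a project has chosen to make mandatory. This is not the DCMI schema itself.

```python
# The fifteen Dublin Core element names (lower case, as used in the dc namespace).
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}

# Hypothetical local profile: Dublin Core itself makes every element optional,
# but a project may choose to require some of them.
REQUIRED = {"title", "creator", "identifier", "rights"}

def check_metadata(record):
    """Return a sorted list of problems with a dict of Dublin Core metadata."""
    problems = [f"unknown element: {name}" for name in record.keys() - DC_ELEMENTS]
    problems += [f"missing required element: {name}" for name in REQUIRED - record.keys()]
    return sorted(problems)

print(check_metadata({"title": "Daily rainfall", "creator": "A. Researcher", "licence": "CC-BY-4.0"}))
# ['missing required element: identifier', 'missing required element: rights', 'unknown element: licence']
```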

Useful syntaxes for metadata

We think the following syntaxes will be most useful for defining metadata (but this is by no means an exhaustive list):