
Mapping textual description of distribution encodings to URIs #24

Closed
andrea-perego opened this issue Jan 23, 2021 · 2 comments · Fixed by #26

Comments

@andrea-perego
Collaborator

andrea-perego commented Jan 23, 2021

Currently, the XSLT output includes URIs for formats only if they are present in the source record.

The reason is twofold:

  1. Using URIs for formats is a recommended practice, as it allows the format to be identified unambiguously and ensures interoperability.
  2. Textual descriptions / labels for file formats are extremely heterogeneous, and it is not possible to take into account all possible variants.

On the other hand, there are also reasons for supporting a text-to-URI mapping:

  1. The current version of the DCAT-AP SHACL constraints requires formats to be specified with a URI reference; see point (4) in Check compliance with SHACL definitions #22 (comment).
  2. The distribution format is an important piece of information in data catalogues, as it is used for filtering.

Looking at the geospatial records available from the European Data Portal, using URIs for file formats is far from being a common practice.

The proposal is therefore to revise the XSLT to include a provisional mapping from textual labels to URIs, which can be phased out in the future. The labels to be mapped can be selected from those most frequently used in geospatial metadata records on the European Data Portal. The full list can be obtained via the following SPARQL queries:

This solution will not ensure that every distribution has its format specified via a URI, but that is not the purpose of this revision / patch.
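Selecting the labels to map amounts to normalizing the labels found in the records and ranking them by frequency. The Python sketch below is a hypothetical illustration, not part of the XSLT: the function names `normalize_label` and `most_frequent_labels` are made up for this example, and it assumes the SPARQL results have already been flattened into a list of label strings.

```python
from collections import Counter
import re


def normalize_label(label: str) -> str:
    """Remove trivial variation from a format label (case, whitespace, leading dot)."""
    text = label.strip().casefold()
    text = text.lstrip(".")           # ".shp" -> "shp"
    text = re.sub(r"\s+", " ", text)  # collapse runs of whitespace
    return text


def most_frequent_labels(labels, n):
    """Return the n most frequent normalized labels, with their counts."""
    counts = Counter(
        normalize_label(label) for label in labels if label and label.strip()
    )
    return counts.most_common(n)


if __name__ == "__main__":
    # Hypothetical sample; real input would come from the SPARQL query results.
    sample = ["SHP", " shp", ".shp", "GML", "gml", "Shapefile", "ZIP"]
    print(most_frequent_labels(sample, 2))
```

Normalization only removes trivial variation; synonyms such as `SHP` and `Shapefile` still come out as separate entries, which is why a curated label-to-URI mapping is needed on top of it.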

@andrea-perego
Collaborator Author

andrea-perego commented Jan 23, 2021

The proposed revision has been implemented in PR #26.

The adopted approach is as follows:

  1. The reference URI registers used are, in order of precedence:
    • The OP's file types NAL (i.e., the one recommended by DCAT-AP)
    • The IANA Media Types register
    • The INSPIRE Media Types register
  2. If the format denoted by the textual label does not correspond to any entry in the reference registers, the closest entry is used instead (e.g., XML for XML-based formats).
  3. When the textual label denotes a service (WMS, WFS, etc.), it is mapped to the register entry for that service's primary / default output format (e.g., XML for CSW).
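The three rules above can be sketched as a single lookup function. This is a hypothetical Python illustration, not the code in PR #26 (which is XSLT): the register contents are a small illustrative subset keyed by normalized label, the INSPIRE register is left empty, and the word match on `xml` used for the "closest entry" rule is an assumed heuristic. Only the CSW to XML service mapping comes from the issue itself.

```python
import re
from typing import Optional

# Base URIs of two of the reference registers.
OP_FILE_TYPE = "http://publications.europa.eu/resource/authority/file-type/"
IANA_MEDIA_TYPES = "https://www.iana.org/assignments/media-types/"

# Registers in order of precedence; each maps a normalized label to a URI.
# ASSUMPTION: illustrative subset only; real entries must come from the registers.
REGISTERS = [
    {  # OP file types NAL
        "csv": OP_FILE_TYPE + "CSV",
        "json": OP_FILE_TYPE + "JSON",
        "xml": OP_FILE_TYPE + "XML",
    },
    {  # IANA Media Types register
        "application/geo+json": IANA_MEDIA_TYPES + "application/geo+json",
    },
    {},  # INSPIRE Media Types register (entries omitted in this sketch)
]

# Services mapped to the label of their primary / default output format.
# Only CSW -> XML is taken from the issue; other services would be added here.
SERVICE_FORMATS = {"csw": "xml"}


def resolve_format(label: str) -> Optional[str]:
    """Map a textual format label to a register URI, or return None if unmapped."""
    key = re.sub(r"\s+", " ", label.strip().casefold())
    # Rule 3: a service label is replaced by its default output format.
    key = SERVICE_FORMATS.get(key, key)
    # Rule 1: look the label up in each register, in order of precedence.
    for register in REGISTERS:
        if key in register:
            return register[key]
    # Rule 2 (ASSUMED heuristic): fall back to XML for labels mentioning XML.
    if re.search(r"\bxml\b", key):
        return REGISTERS[0]["xml"]
    return None
```

Returning `None` for unmapped labels reflects the limitation noted in the issue: the mapping is provisional, and labels it does not cover simply produce no format URI.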

andrea-perego linked a pull request Jan 24, 2021 that will close this issue
@andrea-perego
Collaborator Author

As no objections were raised, I will merge PR #26 and close this issue.

andrea-perego added a commit that referenced this issue Feb 18, 2021
- Remove explicit class specifications (`skos:Concept`) from code list values, as they are not necessary for validation via SHACL - see #22 (comment).
- Map textual descriptions of distribution encodings to URIs - see #24. 
- Add global configuration parameter (`$include-deprecated`) to specify whether the output must or must not include deprecated mappings - see #25.
- Change reference vocabulary for units of measure from OM (Ontology of Units of Measure) to QUDT (Quantity, Units, Dimensions, and Types Ontology).