identifiers #223

bertvannuffelen · 2022-03-25T08:54:20Z

This is a broad issue to capture questions and opinions on identifiers. During the webinar of 10 March 2022 the WG discussed the role of dct:identifier and adms:identifier in identifying datasets as they are harvested from one catalogue into another.

To streamline the discussion, the WG agreed with the view that dct:identifier is the identifier assigned by the "owner/first publisher" of the dataset. This removes an ambiguity in the definition of dct:identifier, which could also be interpreted as the identifier assigned by the catalogue the dataset is currently part of.

This issue collects community feedback on this topic. We will also provide a coherent proposal based on the WG discussion that has taken place.

bertvannuffelen · 2022-04-25T08:02:34Z

Dear community,

A proposal for the guidelines, open for comments, can be found at:
https://github.com/SEMICeu/DCAT-AP/blob/2.x.y-draft/releases/2.x.y/usageguide-identifiers.md

Because no agreement was reached during the last webinar on the status of this proposal, it is deferred to a future release.
This is also a renewed invitation to comment on the proposal.

jakubklimek · 2023-03-06T11:06:47Z

The Czech data catalog implements what the guidelines advise against: it mints an IRI for a harvested dataset regardless of its original IRI. If there was an original IRI, it is preserved in dct:identifier.

This is not to argue that the approach is correct, but I would like to take this opportunity to mention the arguments that led us to this implementation and that I did not find in the guidelines.

Guaranteed dereferenceability of the IRIs: The source catalog assigns IRIs to datasets, but does not implement their dereferenceability, or the dereferenceability of other IRIs (distributions, data services, etc.). The national catalog does that, but it only works for IRIs in its own domain.
Security (trustworthiness of the registered catalogs): By assigning new (publisher-scoped) IRIs and processing the metadata instead of taking it unaltered when harvesting the datasets, we prevent one publisher from stating (intentionally or by mistake) something about a dataset of another publisher without their knowledge, which could affect query results on the single National Open Data Catalog SPARQL endpoint. Admittedly, this goes against the open-world assumption, but in the context of a public administration system, this is something we want to avoid rather than encourage.
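
For readers who want to picture the approach described above, here is a minimal sketch of minting a publisher-scoped IRI while keeping the original IRI in dct:identifier. It is not the Czech catalog's actual code: the base IRI, the `DatasetRecord` type, the hashing scheme and the function names are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from hashlib import sha256

# Hypothetical base IRI of the harvesting catalogue (an assumption, not from the thread).
CATALOG_BASE = "https://catalog.example.org/dataset/"


@dataclass
class DatasetRecord:
    iri: str                                          # IRI of the dataset in its catalogue
    title: str = ""
    identifiers: list = field(default_factory=list)   # dct:identifier values


def mint_iri(publisher_id: str, local_key: str) -> str:
    """Mint a stable IRI in the catalogue's own domain, scoped to one publisher."""
    digest = sha256(local_key.encode("utf-8")).hexdigest()[:16]
    return f"{CATALOG_BASE}{publisher_id}/{digest}"


def harvest(publisher_id: str, source: DatasetRecord) -> DatasetRecord:
    """Re-publish a harvested record under a minted IRI.

    The original IRI, if there was one, is preserved as a dct:identifier value.
    """
    identifiers = list(source.identifiers)
    if source.iri and source.iri not in identifiers:
        identifiers.append(source.iri)
    # Assumption: fall back to the title as the minting key when no original IRI exists.
    local_key = source.iri or source.title
    return DatasetRecord(
        iri=mint_iri(publisher_id, local_key),
        title=source.title,
        identifiers=identifiers,
    )
```

Because the publisher identifier is part of the minted IRI, two publishers that describe the same source IRI end up with different subjects in the harvesting catalogue, which is the scoping effect the security argument relies on.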

bertvannuffelen · 2023-03-20T16:09:31Z

@jakubklimek, I understand the arguments.

Exactly because of these experiences, the guidelines propose that harvesters and portal owners should ensure that all identifiers are included in adms:identifier.
If every portal did that, a list of equivalent identifiers would build up dynamically.
This in turn makes it possible to implement deduplication algorithms, trusted cross-references throughout the harvesting network, and so on.

It does not affect the user experience of any portal, nor any publisher (it only provides technical support to the harvesting community), but the potential is high.
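
As an illustration of the deduplication potential mentioned above, the following sketch groups catalogue records that share at least one identifier, directly or through a chain of other records. It is one possible approach under stated assumptions, not something the guidelines prescribe: the input shape (a mapping from a record's IRI to the set of its dct:identifier and adms:identifier values) and the function name `group_equivalent` are hypothetical.

```python
from collections import defaultdict


def group_equivalent(records: dict[str, set[str]]) -> list[set[str]]:
    """Group record IRIs that are presumed to describe the same dataset.

    `records` maps a record's IRI to every identifier it carries
    (its dct:identifier plus all adms:identifier values).
    Two records land in the same group when they share an identifier,
    or when one lists the other's IRI as an identifier.
    """
    parent = {iri: iri for iri in records}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_seen: dict[str, str] = {}
    for iri, identifiers in records.items():
        # A record's own IRI also counts as one of its identifiers.
        for ident in identifiers | {iri}:
            if ident in first_seen:
                union(first_seen[ident], iri)
            else:
                first_seen[ident] = iri

    groups: dict[str, set[str]] = defaultdict(set)
    for iri in records:
        groups[find(iri)].add(iri)
    return list(groups.values())
```

The grouping only works as well as the identifier lists are complete, which is why it depends on every portal in the harvesting chain passing the identifiers along.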

bertvannuffelen · 2024-02-01T23:08:23Z

This issue will be closed because a reference to the assessment/proposal is now included in the specification. The assessment/proposal has not been included in full, but this way readers of the specification can find it more easily and take the considerations into account in their implementations.

bertvannuffelen mentioned this issue Mar 31, 2022

adms:Identifier SEMICeu/ADMS-AP#32

Open

bertvannuffelen mentioned this issue Oct 20, 2022

Dataset requires dct:identifier Informatievlaanderen/OSLOthema-DCATAPVlaanderen#15

Open

gabswiersma mentioned this issue May 1, 2023

Gebruik van identifiers Geonovum/dcat2-ap-nl#7

Open

bertvannuffelen added the labels release:3.0.0 (https://semiceu.github.io/DCAT-AP/releases/3.0.0) and status:fixed (This issue has been fixed in a draft.) on Feb 1, 2024
