[Proposal] Dataset identifier #53
Comments
Looks good and aligns with DCAT's identifier property.
I support the use of a dataset identifier, but the question is whether this should be a numeric UID or a URL to the dataset. The WB data catalogue creates a UID when we upload to it. The original schema used a URL to allow for the schema being used in multiple catalogues, so that the dataset could be found across the Web. The drawback, of course, is that URLs can break or change. I think we should support a URL as the dataset identifier and, where data is uploaded to a catalogue that uses its own UID system, append that UID as well. Either way, these probably need to be added retrospectively once a dataset is uploaded, unless the catalogue creates the identifier on upload (as is the case for the WB data catalog).
Looking at other standards:
I suggest that we follow a similar approach:
That way:
I figure that a catalog system's own UID would be part of that catalog's metadata, to which the RDLS metadata will be added/integrated, so I don't think we need to support multiple identifiers in RDLS. Sound good?
Clarify how this works with model.id, e.g. #85.
If I understood correctly, this issue is about an identifier for the dataset that the RDLS metadata describes, whilst #85 is about referencing the dataset's source model, which wouldn't be catalogued using RDLS. Did I miss something?
I note that some (but obviously not all) datasets have a DOI for this purpose.
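To make the identifier forms raised in this thread concrete (a URL, a catalogue-local UID, a DOI), here is a minimal Python sketch that sorts an identifier string into one of those three kinds. The function name `classify_identifier`, the category labels, and the example values are hypothetical and not part of RDLS; the DOI pattern is a deliberate simplification, not a full validator.

```python
import re
from urllib.parse import urlparse

# Simplified DOI shape: optional "doi:" or doi.org resolver prefix,
# then "10.", 4-9 digits, "/", and a non-empty suffix.
# This is an assumption for illustration, not a complete DOI grammar.
DOI_PATTERN = re.compile(
    r"^(?:https?://(?:dx\.)?doi\.org/|doi:)?(10\.\d{4,9}/\S+)$",
    re.IGNORECASE,
)


def classify_identifier(value: str) -> str:
    """Return 'doi', 'uri', or 'local' for a dataset identifier string."""
    value = value.strip()
    if DOI_PATTERN.match(value):
        return "doi"
    parsed = urlparse(value)
    # Treat only absolute HTTP(S) URLs with a host as URIs.
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "uri"
    # Anything else is assumed to be a catalogue-local UID.
    return "local"


if __name__ == "__main__":
    for example in (
        "https://example.org/datasets/flood-hazard",  # hypothetical URL
        "10.1234/example-dataset",                    # hypothetical DOI
        "0038621",                                    # hypothetical UID
    ):
        print(example, "->", classify_identifier(example))
```

A catalogue could use a check like this to decide whether an incoming identifier is already resolvable on the Web or is only meaningful inside the catalogue that issued it.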
@stufraser1 @matamadio can you advise on whether it would be useful to link to listings of the same dataset in other catalogs from within the RDLS metadata? Logging some initial research: In data.gov, there is an There are two IANA Link Relations that might be relevant:
For co-hosting datasets we should be able to use the URI.
In the scenario where a dataset is first listed in a national authority's data catalog and then added to the World Bank's data catalog, which HTTP URI do we want to see in |
What is the context or reason for the change?
There is a need to have a unique id per dataset and per resource. This will act as a parent id per dataset.
What is your proposed change?
Create a dataset identifier, identifier, with the description ‘An id for this dataset. This identifier must be unique within the data catalogue it is stored in, and it is recommended that an identifier is chosen with a high likelihood of being globally unique.’
Why is this not covered by the existing model?
The current model contains the fields dataset title and description.
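The proposed description requires the identifier to be unique within the data catalogue it is stored in. A minimal Python sketch of how a catalogue might check that rule follows; it assumes each dataset's metadata is a plain dict with an `identifier` key, and the function name and sample records are hypothetical.

```python
from collections import Counter


def find_duplicate_identifiers(datasets):
    """Return a sorted list of identifiers used by more than one dataset.

    Datasets without an 'identifier' key are ignored here; a real
    catalogue would probably report them separately.
    """
    counts = Counter(
        d["identifier"] for d in datasets if d.get("identifier") is not None
    )
    return sorted(ident for ident, n in counts.items() if n > 1)


if __name__ == "__main__":
    # Hypothetical catalogue contents, for illustration only.
    catalogue = [
        {"identifier": "https://example.org/datasets/a", "title": "Dataset A"},
        {"identifier": "https://example.org/datasets/b", "title": "Dataset B"},
        {"identifier": "https://example.org/datasets/a", "title": "Copy of A"},
    ]
    print(find_duplicate_identifiers(catalogue))
```

The check only covers uniqueness within one catalogue; the recommendation that identifiers be likely to be globally unique cannot be verified locally and depends on the chosen scheme (for example, a URL under a domain the publisher controls).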