Missing link between checksum class (value) and sum-checked distribution (accessURL?) #49

Open
init-dcat-ap-de opened this issue Oct 26, 2018 · 5 comments

Comments

@init-dcat-ap-de

commented Oct 26, 2018

While working on https://www.dcat-ap.de/ we found that, in RDF, there is no link between the binary file described in a distribution and its checksum value.
Once the metadata is serialized as RDF, without relying on the hierarchical XML structure, it is no longer clear which checksum value belongs to which file in which distribution. The problem arises because the XML processor cannot guarantee the order of elements, and Distribution and Checksum are two independent classes with no link between them. rdf:nodeID is optional, and it is not clear what to put into it to create this link.

Perhaps using the dataset's dcterms:identifier or the distribution's accessURL (following the logic in https://joinup.ec.europa.eu/release/dcat-ap-how-use-identifiers-datasets-and-distributions ) as the rdf:nodeID of the checksum class could serve as a workaround. Could this be added when the spdx vocabulary is reworked, or otherwise be considered in the current ISA² DCAT-AP review?

@addragan

Contributor

commented Nov 8, 2018

Thank you for pointing this out. We propose to discuss this issue in the next major release cycle in 2019.

@addragan

Contributor

commented Jul 23, 2019

Proposed resolution: use either the Dataset's dcterms:identifier or the Distribution's dcat:accessURL as the rdf:nodeID of the checksum class. Please comment on which option you prefer.

@jakubklimek

commented Jul 23, 2019

rdf:nodeID is a blank-node identifier specific to the RDF/XML serialization. It does not solve the underlying issue in the RDF modeling, which is that a single dcat:Distribution can point to multiple files using dcat:accessURL and dcat:downloadURL, and there is no further metadata for those files.

In the Czech Republic, for instance, we allow only one value of dcat:downloadURL, so that all metadata attached to the distribution applies to that one file. This avoids the problem, because the only checksum attached to the distribution is the checksum of the only file referenced in dcat:downloadURL.
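
To illustrate the convention above, here is a minimal Python sketch of how a consumer could check a downloaded file when a distribution has exactly one download URL and one checksum. The `Distribution` class, its field names, and the example URL are hypothetical simplifications, not part of DCAT-AP or of any library, and the algorithm is given as a `hashlib` name rather than an spdx algorithm IRI.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Distribution:
    """Hypothetical, simplified stand-in for a dcat:Distribution record."""
    download_url: str    # the single dcat:downloadURL value
    algorithm: str       # a hashlib algorithm name, e.g. "sha256" (assumption)
    checksum_value: str  # hex digest, playing the role of spdx:checksumValue


def matches_checksum(dist: Distribution, payload: bytes) -> bool:
    """Return True if payload hashes to the checksum attached to dist."""
    digest = hashlib.new(dist.algorithm, payload).hexdigest()
    return digest == dist.checksum_value.lower()


# Stand-in for the bytes fetched from dist.download_url.
payload = b"id,name\n1,example\n"
dist = Distribution(
    download_url="https://example.org/data.csv",
    algorithm="sha256",
    checksum_value=hashlib.sha256(payload).hexdigest(),
)
print(matches_checksum(dist, payload))
```

Because there is only one file per distribution, no extra link from the checksum to a specific file is needed in this sketch.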

@init-dcat-ap-de Do you have a use case where you need multiple files to be referenced from a single distribution, in contrast to having multiple distributions/datasets each pointing to a single file, therefore allowing for a better metadata description of each cataloged file?

@init-dcat-ap-de

Author

commented Jul 25, 2019

Using rdf:nodeID might help us.
We do not have use cases where accessURL points to several files. If we did, we would zip those files and compute the checksum over the ZIP file.
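
A rough Python sketch of that fallback, using only the standard library: several files are bundled into one ZIP archive and a single checksum is computed over the archive bytes. The function name `zip_and_hash`, the file names, and the choice of SHA-256 are assumptions for illustration; the comment above does not name an algorithm.

```python
import hashlib
import io
import zipfile


def zip_and_hash(files: dict[str, bytes]) -> tuple[bytes, str]:
    """Bundle files into one ZIP archive; return (archive bytes, SHA-256 hex digest)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in sorted(files):
            # A fixed timestamp keeps the entry metadata independent of the clock.
            info = zipfile.ZipInfo(name, date_time=(1980, 1, 1, 0, 0, 0))
            archive.writestr(info, files[name])
    data = buffer.getvalue()
    return data, hashlib.sha256(data).hexdigest()


archive_bytes, digest = zip_and_hash({
    "a.csv": b"id\n1\n",
    "b.csv": b"id\n2\n",
})
print(digest)
```

The distribution would then reference the single ZIP file, and the checksum would describe exactly that file.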

@jakubklimek

commented Jul 25, 2019

@init-dcat-ap-de OK, that's two EU countries. Then maybe DCAT-AP could define a restriction that dcat:accessURL and dcat:downloadURL can have only one value, which could help even more.
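
As a sketch of what such a restriction could look like in a validator, the following Python snippet flags any subject that has more than one value for dcat:accessURL or dcat:downloadURL. Triples are modeled as plain string tuples, the helper name `cardinality_violations` is hypothetical, and for brevity the check does not verify that the subject is actually typed as a dcat:Distribution.

```python
from collections import defaultdict

DCAT = "http://www.w3.org/ns/dcat#"
# Properties that the suggested restriction would limit to one value.
SINGLE_VALUED = (DCAT + "accessURL", DCAT + "downloadURL")


def cardinality_violations(triples):
    """Return {(subject, property): sorted values} for properties with more than one value."""
    values = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate in SINGLE_VALUED:
            values[(subject, predicate)].add(obj)
    return {key: sorted(found) for key, found in values.items() if len(found) > 1}


triples = [
    ("ex:dist1", DCAT + "downloadURL", "https://example.org/a.csv"),
    ("ex:dist2", DCAT + "downloadURL", "https://example.org/b.csv"),
    ("ex:dist2", DCAT + "downloadURL", "https://example.org/c.csv"),
]
print(cardinality_violations(triples))
```

In this example data only `ex:dist2` is reported, since it is the only subject with two download URLs.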
