Indication dataset being part of NSIP/ERPD, similarly to HVD datasets #260
Comments
@jakubklimek thanks for pointing the initiative out. From reading the proposal I do not yet see that data.europa.eu would be chosen as the ESAP implementation. But I agree that the same data is probably also reported on both portals, and in that case I agree that the metadata should be aligned.
@bertvannuffelen that seems like another, maybe a follow-up initiative for specific domains. Hence the rush to cover the necessary additional metadata by existing DCAT-AP properties, see the proposed |
@jakubklimek thanks for the documentation. But I see it now: it is about the Data Governance Act: https://eur-lex.europa.eu/legal-content/EN/HIS/?uri=CELEX:52020PC0767. This one is final. And it is about another ESAP. And here I was aware that data.europa.eu would take up the role as ESAP (but not of the planning). Roughly stated, the DGA applies to all data that falls outside the scope of the PSI directive (Directive (EU) 2019/1024) (see article 3). This distinction is already part of DCAT-AP: the public accessibility property (http://purl.org/dc/terms/accessRights) has a codelist, http://publications.europa.eu/resource/authority/access-right. In there, PUBLIC corresponds to the PSI directive, while all other values correspond to the DGA (maybe excluding CONFIDENTIAL).
Essentially, addressing the DGA and the PSI comes down to making accessRights mandatory with this codelist. But I read in the harvesting requirements that both portals should have a separate catalogue (either explicit or implicit), and it seems to be implicitly assumed that they are disjoint. Personally, I would not have made that last point. There might exist combinations/mixtures of both and thus a grey zone. In many cases the world is not black or white. If all datasets have dct:accessRights filled in, then it should be clear which legislation is applicable. An alternative way of taking this into account is to include a property applicable_legislation with ELI as its range.
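As an illustration of the reading proposed above, here is a minimal Python sketch that maps a dct:accessRights value to a candidate legislation. The function name `candidate_legislation` and the treatment of CONFIDENTIAL as "no candidate" are assumptions made for this example only; the ELI URIs are the ones quoted elsewhere in this thread, and the mapping itself is the interpretation discussed here, not something DCAT-AP defines.

```python
from typing import Optional

# Codelist quoted in the comment above; individual codes hang off this base URI.
ACCESS_RIGHT_BASE = "http://publications.europa.eu/resource/authority/access-right/"

# ELIs as quoted elsewhere in this thread.
PSI_ELI = "http://data.europa.eu/eli/dir/2019/1024/oj"  # PSI (open data)
DGA_ELI = "http://data.europa.eu/eli/reg/2022/868/oj"   # DGA


def candidate_legislation(access_rights: Optional[str]) -> Optional[str]:
    """Return the ELI of the legislation a dataset is a candidate for.

    Hypothetical helper: PUBLIC maps to PSI, every other code maps to DGA,
    except CONFIDENTIAL, which the comment above marks as "maybe excluded".
    A missing or unrecognised value yields None (no decision possible).
    """
    if not access_rights or not access_rights.startswith(ACCESS_RIGHT_BASE):
        return None
    code = access_rights[len(ACCESS_RIGHT_BASE):]
    if code == "PUBLIC":
        return PSI_ELI
    if code == "CONFIDENTIAL":
        return None  # assumption: left undecided, following "maybe excluded"
    return DGA_ELI
```

The grey zone mentioned above is exactly what a single-valued function like this cannot express, which is one argument for an explicit, repeatable applicable-legislation property.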
@bertvannuffelen the information about the accessRights codelist and its connection to PSI and DGA is new information for me. On a side note, the Either way, it should be established somewhere, if this is the case and that, in fact, this should be used to make the distinction, that PUBLIC means open data and something else DGA, etc. Which is, unfortunately, not the case now, and it seems that the distinction is currently left to the national portals to implement, and then just inform data.europa.eu of how this is indicated. I like the idea with applicable legislation, but that would be also applicable to HVDs, but there the approach is to use the HVD value from the Dataset Type codelist in |
I agree that the DGA came after DCAT-AP. The current definition of this property in DCAT-AP is: "This property refers to information that indicates whether the Dataset is open data, has access restrictions or is not public." In general, the PSI directive concerned itself with datasets that are (or should be) PUBLIC. Those that aren't are candidates to be investigated for whether they are subject to the PSI, but in general the PSI imposed no obligation to share any knowledge about them. That is what the DGA now captures. This brings us to the basis for setting PUBLIC. The interpretation might differ from one MS to another, but in the end, in each usage context, accessRights will relate to scoped statements such as those in article 3 of the DGA. As an MS, I would consider each dataset that is not set to PUBLIC as a candidate for the DGA, because the PSI is about data that should be accessible and reusable with no (or minimal) limits, while the DGA is about datasets that have some access restrictions. In a simplistic usage of this property it is very binary: PUBLIC or NON_PUBLIC. The value cannot be "unknown", because that would mean that the legal usage rights are unknown, and then one can question why the dataset is published at all. Another basis for deciding the value could be technical reasons, as a summary of a technical access situation. But then the other codelist values such as SENSITIVE and CONFIDENTIAL are meaningless: they are legislative/business notions, not technical ones.
It is a Czech decision to rule out that case. But that decision is determined not by the implementer but by the PSI and DGA legislation. If I have statistical data on my population, the API implementer may want to require a registration key, but the PSI will make it open data by law; the open data rules then apply, and in the Czech case that means an API without a registration key. For that reason, in Flanders, BE, the decision on accessRights is not made on technical arguments but on legislative ones: does the legislation allow access to the data to be restricted? If not, it is PUBLIC. The Czech usage and the Belgian usage are probably very close in practice, although the Czech open data implementing guidelines go a step further than Belgium does today. Note that the codelist does not oblige the Czech Republic to allow that case, but the case is allowed.
I read that in the documentation, indeed.
The DCAT-AP annex for HVD has been neither finalized nor adopted, and thus this could be included. The dct:type was introduced more quickly than it could be discussed with the whole WG.
OK, now I understand the reasoning behind But my point is elsewhere - wouldn't it be cleaner to use something very specifically aligned with the legislation, saying "this dataset is part of NSIP, follows DGA" (and similarly for the conditions for reuse in #259) rather than re-using the pre-existing accessRights property with a controlled vocabulary, that might have been used for something else, and saying that unless there is "PUBLIC" in that item, it should be interpreted as "NSIP/DGA", even though this connection is not explicitly made anywhere? Even though it might be in some cases a bit redundant with respect to |
An explicit indication of whether a dataset is declared in scope of a piece of legislation: yes, that seems to be a recurring request. So that is a good proposal to include in DCAT-AP.
OK, and could we try to come up with a specific proposal that could be adopted by NSIPs even though the final version will have to wait for the next DCAT-AP release?
<DGA_dataset> m8g:applicableLegislation <http://data.europa.eu/eli/reg/2022/868/oj> . # DGA
<HVD_dataset> m8g:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj> . # HVD
Just adding two aspects into the discussion:
This is just to let you know that we have successfully implemented the Czech NSIP as part of the Czech open data catalog, accessible via a SPARQL endpoint, and it is already being harvested by data.europa.eu. The implemented criterion for distinguishing open data from NSIP data follows this discussion, i.e.
@prefix dcatap: <http://data.europa.eu/r5r/> .
<DGA_dataset> dcatap:applicableLegislation <http://data.europa.eu/eli/reg/2022/868/oj> . # DGA
<PSI_dataset> dcatap:applicableLegislation <http://data.europa.eu/eli/dir/2019/1024/oj> . # PSI (open data)
and we plan to do the HVDs in the same way:
<HVD_dataset> dcatap:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj> . # HVD
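For catalogs that generate such statements programmatically, the pattern can be sketched in a few lines of Python. This is only an illustration: the function name `applicable_legislation_turtle`, the `LEGISLATION` lookup table and the example dataset IRI are hypothetical, while the `dcatap:` namespace and the three ELIs are taken from the comment above. The sketch treats the property as repeatable, so a dataset may carry several statements.

```python
from typing import List

# Namespace and ELIs as quoted in the comment above.
DCATAP_NS = "http://data.europa.eu/r5r/"
LEGISLATION = {
    "DGA": "http://data.europa.eu/eli/reg/2022/868/oj",
    "PSI": "http://data.europa.eu/eli/dir/2019/1024/oj",
    "HVD": "http://data.europa.eu/eli/reg_impl/2023/138/oj",
}


def applicable_legislation_turtle(dataset_iri: str, labels: List[str]) -> str:
    """Build Turtle with one dcatap:applicableLegislation statement per label.

    Hypothetical helper: raises KeyError for a label not in LEGISLATION,
    so unknown legislation is never silently emitted.
    """
    lines = [f"@prefix dcatap: <{DCATAP_NS}> ."]
    for label in labels:
        eli = LEGISLATION[label]
        # The terminating "." comes before the comment so the line stays valid Turtle.
        lines.append(
            f"<{dataset_iri}> dcatap:applicableLegislation <{eli}> . # {label}"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Example: an open dataset that is also a high-value dataset.
    print(applicable_legislation_turtle("https://example.org/dataset/1", ["PSI", "HVD"]))
```

Keeping the ELIs in one lookup table means further legislation could be added by the same pattern without touching the serialisation logic.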
Thanks for the information @jakubklimek! Just to confirm: do I assume correctly that the element is optional and repeatable in the context of the open data portal? Just to add that some of our agencies have also requested to have "INSPIRE" solved using the same pattern, and we are also thinking about the addition of an "ODD" option. Altogether this would allow agencies to "tag" their datasets, distributions and services in quite a few different ways in parallel.
applicableLegislation has been included in DCAT-AP 3 as an additional property. The update of the guidelines for ERPD will be addressed in collaboration with CNECT.
In NSIP/ERPD, an urgent need (2 weeks) has been identified to decide how to denote datasets that are not open but are accessible through NSIP/ERPD (DGA implementation).
One suggestion is adding a type to https://op.europa.eu/en/web/eu-vocabularies/dataset/-/resource?uri=http://publications.europa.eu/resource/dataset/dataset-type, similarly to HVD.
Otherwise, data.europa.eu seems to be set on implementing this in a proprietary way indicated by the individual NSIPs, which will IMHO severely hinder interoperability. The argument is that 2 weeks is not enough time to come up with a proper solution in communication with SEMIC 🤷🏻♂️.