Indication dataset being part of NSIP/ERPD, similarly to HVD datasets #260

Open
jakubklimek opened this issue Feb 28, 2023 · 13 comments
Labels
release:3.0.0 https://semiceu.github.io/DCAT-AP/releases/3.0.0 status:fixed This issue has been fixed in a draft. status:resolution-proposed

Comments

@jakubklimek
Contributor

jakubklimek commented Feb 28, 2023

In NSIP/ERPD, an urgent need (within 2 weeks) has been identified to determine how to denote datasets that are not open but are accessible through NSIP/ERPD (the DGA implementation).

One suggestion is adding a type to https://op.europa.eu/en/web/eu-vocabularies/dataset/-/resource?uri=http://publications.europa.eu/resource/dataset/dataset-type, similarly to HVD.

Otherwise, data.europa.eu seems to be set on implementing this in a proprietary way indicated by the individual NSIPs, which will IMHO severely hinder interoperability. The argument is that 2 weeks is not enough time to come up with a proper solution in communication with SEMIC 🤷🏻‍♂️.

@bertvannuffelen
Contributor

@jakubklimek thanks for pointing the initiative out.
Is this about this legal initiative: https://eur-lex.europa.eu/legal-content/EN/HIS/?uri=CELEX:52021PC0723 ?
Then, according to my reading of the process state, it is still under discussion and not yet finalised.

From the reading of the proposal I do not yet see that data.europa.eu would be chosen as the ESAP implementation. But I agree that probably the same data is also reported on both portals, and in that case I agree that the metadata should be aligned.

@jakubklimek
Contributor Author

jakubklimek commented Mar 1, 2023

@bertvannuffelen that seems like another, maybe a follow-up initiative for specific domains.
From yesterday's webinar on the topic, the generic ESAP is in a much more advanced state. It is being implemented as part of data.europa.eu, reusing its harvesting processes, and by September 23rd, all national catalogs containing NSIP datasets (non-opendata datasets) need to be functional, registered and harvested by data.europa.eu.

Hence the rush to cover the necessary additional metadata by existing DCAT-AP properties, see the proposed
Technical recommendations for member states v1.1

@bertvannuffelen
Contributor

@jakubklimek thanks for the documentation. But I see it now: it is about the Data Governance Act: https://eur-lex.europa.eu/legal-content/EN/HIS/?uri=CELEX:52020PC0767. This one is final.

And it is about another ESAP. And here I was aware that data.europa.eu would take up the role as ESAP (but not the planning).

Roughly stated, the DGA applies to all data that falls outside the scope of the PSI directive (Directive (EU) 2019/1024) (see article 3).
Although DCAT-AP was designed from the perspective of the PSI directive, in practice it does not prevent expressing datasets under the DGA.

This distinction is already part of DCAT-AP: the public accessibility ( http://purl.org/dc/terms/accessRights) has a codelist http://publications.europa.eu/resource/authority/access-right

In there, PUBLIC corresponds to the PSI directive, while all other values correspond to the DGA (possibly excluding CONFIDENTIAL).
Thus

  • if your dataset has dct:accessRights PUBLIC then it should follow the PSI directive
  • if it has another value then it should follow the DGA.

Essentially addressing the DGA and the PSI comes down to making the accessRights mandatory with this codelist.
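
As a rough illustration of the rule above, here is a minimal Python sketch. It assumes that codelist values are identified by URIs formed from the access-right codelist base plus a code such as PUBLIC or NON_PUBLIC. The helper name `regime_from_access_rights` and the three labels are hypothetical, and the mapping is only the interpretation given in this comment, not something DCAT-AP or the codelist states.

```python
# Base URI of the access-right codelist referenced in this comment.
ACCESS_RIGHT_BASE = "http://publications.europa.eu/resource/authority/access-right/"

# Illustrative labels only; they are not part of any vocabulary.
PSI = "PSI (open data)"
DGA = "DGA candidate"
UNDETERMINED = "undetermined"


def regime_from_access_rights(access_rights_uri):
    """Map a dct:accessRights value to the regime suggested in this comment.

    Hypothetical helper: PUBLIC is read as PSI, every other codelist value
    as a DGA candidate. CONFIDENTIAL is treated like the other non-PUBLIC
    values here, although the comment leaves that case open.
    """
    if not access_rights_uri:
        # Without dct:accessRights the applicable legislation cannot be derived.
        return UNDETERMINED
    if not access_rights_uri.startswith(ACCESS_RIGHT_BASE):
        # Value does not come from the expected codelist.
        return UNDETERMINED
    code = access_rights_uri[len(ACCESS_RIGHT_BASE):]
    if code == "PUBLIC":
        return PSI
    return DGA


if __name__ == "__main__":
    print(regime_from_access_rights(ACCESS_RIGHT_BASE + "PUBLIC"))
    print(regime_from_access_rights(ACCESS_RIGHT_BASE + "NON_PUBLIC"))
    print(regime_from_access_rights(None))
```

The "undetermined" branch is the reason the property would have to become mandatory for this approach to work: a dataset without the property cannot be assigned to either regime.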

But I read in the harvesting requirements that both portals should have a separate catalogue (either explicit or implicit). Implicitly, it seems to be assumed that they are disjoint. Personally, I would not have made that last assumption. There might exist combinations or mixtures of both, and thus a grey zone. In many cases the world is not black or white.

If all datasets have dct:accessRights filled in, then it should be clear which legislation is applicable.

An alternative way to take this into account is to include a property applicable_legislation with ELI as its range.
And then in the context of the different legislative requirements catalogues should provide a specific ELI.

@jakubklimek
Contributor Author

@bertvannuffelen the information about the accessRights codelist and its connection to PSI and DGA is new information for me.
DCAT-AP states in the usage note This property refers to information that indicates whether the Dataset is open data, has access restrictions or is not public., but nowhere in DCAT-AP or the codelist I see a mention about PSI and DGA, nor the connection to accessRights - which, IMHO could have been used in current data portals in a different context, even before DGA was created.

On a side note, the PUBLIC value has a usage note Usage note: Permissible obstacles include registration and request for API keys, as long as anyone can request such registration and/or API keys. which is not compatible with the Czech definition of open data, where we do not permit APIs accessible only using API keys to be labeled as open data.

Either way, it should be established somewhere, if this is the case and that, in fact, this should be used to make the distinction, that PUBLIC means open data and something else DGA, etc. Which is, unfortunately, not the case now, and it seems that the distinction is currently left to the national portals to implement, and then just inform data.europa.eu of how this is indicated.

I like the idea with applicable legislation, but that would be also applicable to HVDs, but there the approach is to use the HVD value from the Dataset Type codelist in dct:type, which seems like an inconsistency.

@bertvannuffelen
Contributor

> @bertvannuffelen the information about the accessRights codelist and its connection to PSI and DGA is new information for me. DCAT-AP states in the usage note This property refers to information that indicates whether the Dataset is open data, has access restrictions or is not public., but nowhere in DCAT-AP or the codelist I see a mention about PSI and DGA, nor the connection to accessRights - which, IMHO could have been used in current data portals in a different context, even before DGA was created.

I agree that the DGA came after DCAT-AP. The current definition of this property in DCAT-AP is: "This property refers to information that indicates whether the Dataset is open data, has access restrictions or is not public."

In general the PSI directive concerned itself with those that are (or should be) PUBLIC. For those that aren't: they are the candidates to be investigated if they are subject to the PSI. But in general, there was no obligation from the PSI to share any knowledge about them. But that is what the DGA now captures.

Here we come to the basis for setting PUBLIC.
From a legislative perspective applied to datasets: any dataset for which one does not have a legal basis to reduce access (personal data, national security, etc.; for a list see https://inspire.ec.europa.eu/metadata-codelist/LimitationsOnPublicAccess) must be Open Data. In that case PUBLIC is used.

There might be differences from one MS to another MS in this interpretation. But in the end, in each usage context, the accessRights will relate to the scoped statements such as in article 3 of the DGA.

As a MS, I would consider each dataset that is not set to PUBLIC as a candidate for the DGA, because the PSI is about data that should be accessible and reusable with no (or minimal) limits. The DGA is about datasets that have some access restrictions.

In a simplistic usage of this property it is very binary: PUBLIC or NON_PUBLIC. You cannot be "unknown", because that would mean that the legal usage rights are unknown, and thus one can question at all why to publish that dataset.

Another basis for deciding the value could be technical reasons, as a summary of a technical access situation. But then the other codelist values such as SENSITIVE and CONFIDENTIAL are meaningless. They are legislative/business notions and not technical.

> On a side note, the PUBLIC value has a usage note Usage note: Permissible obstacles include registration and request for API keys, as long as anyone can request such registration and/or API keys. which is not compatible with the Czech definition of open data, where we do not permit APIs accessible only using API keys to be labeled as open data.

It is a Czech decision to rule out that case.
In that case, the API implementer of an Open Dataset in Czechia must, to comply with the Czech guideline, provide an API without registration, which is fine.
In the other case, when the API implementer does not do that and an API key is enforced, you consider this dataset as a dataset outside the PSI directive, and thus it now falls under the DGA.

But the choice of that decision is not by the implementer, but by the PSI and DGA legislation. If I have statistical data of my population, the API implementer may want to have a registration key, but the PSI will make it open data by law, and thus, the Open data rules apply, and in the Czech case it means an API without registration key.
It is the nature of the dataset that decides, not the metadata editors or API implementers.
(In the practice, maybe there is some self declaration aspect here: if I consider it Open Data then I have more work or risks, and less income, so I consider it closed until someone complains. But even that is now covered with the DGA, as it makes this almost a binary choice.)

For that reason, in Flanders, BE, the decision on accessRights is not made on technical arguments but on legislative arguments: does the legislation allow restricting access to the data? If not, it is PUBLIC.
This means that if the API allows a key registration and this registration introduces possible access restrictions up front, then this can be the subject of a complaint.
In Flanders, the labels PUBLIC and NON_PUBLIC have for that reason been changed into "access without conditions" and "access with conditions", where "conditions" means that a conditional check is performed by some procedure before access is given. The latter option cannot be used when the data service is the sole one providing access to a dataset that is Open Data.

Probably the Czech usage and the Belgian usage are very close in practice, but the Czech open data implementing guidelines go a step further than Belgium does today.

Note that the codelist does not require Czechia to allow that case; it merely permits it.

> Either way, it should be established somewhere, if this is the case and that, in fact, this should be used to make the distinction, that PUBLIC means open data and something else DGA, etc. Which is, unfortunately, not the case now, and it seems that the distinction is currently left to the national portals to implement, and then just inform data.europa.eu of how this is indicated.

I read that in the documentation, indeed.

> I like the idea with applicable legislation, but that would be also applicable to HVDs, but there the approach is to use the HVD value from the Dataset Type codelist in dct:type, which seems like an inconsistency.

The DCAT-AP annex for HVD has been neither finalized nor adopted, and thus this could be included. The dct:type approach was introduced more quickly than it could be discussed with the whole WG.

@jakubklimek
Contributor Author

jakubklimek commented Mar 1, 2023

OK, now I understand the reasoning behind accessRights. Since it is an optional property and we did not need it for anything (everything in the Czech Open Data Portal is open data), we do not have this implemented.

But my point is elsewhere - wouldn't it be cleaner to use something very specifically aligned with the legislation, saying "this dataset is part of NSIP, follows DGA" (and similarly for the conditions for reuse in #259) rather than re-using the pre-existing accessRights property with a controlled vocabulary, that might have been used for something else, and saying that unless there is "PUBLIC" in that item, it should be interpreted as "NSIP/DGA", even though this connection is not explicitly made anywhere? Even though it might be in some cases a bit redundant with respect to accessRights?

@bertvannuffelen
Contributor

> OK, now I understand the reasoning behind accessRights. Since it is an optional property and we did not need it for anything (everything in the Czech Open Data Portal is open data), we do not have this implemented.
>
> But my point is elsewhere - wouldn't it be cleaner to use something very specifically aligned with the legislation, saying "this dataset is part of NSIP, follows DGA" (and similarly for the conditions for reuse in #259) rather than re-using the pre-existing accessRights property with a controlled vocabulary, that might have been used for something else, and saying that unless there is "PUBLIC" in that item, it should be interpreted as "NSIP/DGA", even though this connection is not explicitly made anywhere? Even though it might be in some cases a bit redundant with respect to accessRights?

An explicit indication of whether a dataset is declared in scope of a piece of legislation: yes, that seems to be a recurring request.
I get more and more the feeling that we should be rather explicit about which legislation applies, and thus relying on ELIs could be our answer. In that way we are future-proof, and it is clear to the readers of the metadata which legislation is applied,
rather than having to derive it indirectly.

So that is a good proposal to include in DCAT-AP.

@jakubklimek
Contributor Author

OK, and could we try to come up with a specific proposal that could be adopted by NSIPs even though the final version will have to wait for the next DCAT-AP release?
Something like

<DGA_dataset> m8g:applicableLegislation <http://data.europa.eu/eli/reg/2022/868/oj> . # DGA
<HVD_dataset> m8g:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj> . # HVD

@kuldaraas

Just adding two aspects into the discussion:

  • first, EC is rather actively pushing the concept of data spaces, meaning that there will soon be additional legally grounded types (like Health Data Space);
  • however, the facets of legal regime (DGA vs PSI and HVD) and data space (Health vs Logistics vs Tourism, etc.) are not exclusive. Any solution should allow stating that we have a dataset that is "Health, PSI-HVD" or "Health, DGA".
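
To make the non-exclusivity point concrete, the following Python sketch treats the applicable legislation as a repeatable (set-valued) property, so that a dataset can be "Health, PSI-HVD" or "Health, DGA" at once. The DGA, PSI and HVD identifiers are the ELIs used elsewhere in this thread. The health data space identifier, the dataset identifiers and the function name `datasets_under` are placeholders invented for illustration.

```python
DGA = "http://data.europa.eu/eli/reg/2022/868/oj"
PSI = "http://data.europa.eu/eli/dir/2019/1024/oj"
HVD = "http://data.europa.eu/eli/reg_impl/2023/138/oj"
# Placeholder only: NOT a real ELI, it stands in for a future data space act.
HEALTH = "urn:example:legislation:health-data-space"

# Hypothetical catalogue: each dataset carries a set of legislation identifiers.
datasets = {
    "urn:example:dataset:hospital-capacity": {PSI, HVD, HEALTH},
    "urn:example:dataset:patient-registry": {DGA, HEALTH},
    "urn:example:dataset:bus-timetables": {PSI},
}


def datasets_under(legislations, catalogue):
    """Return, sorted, the datasets tagged with every legislation given."""
    wanted = set(legislations)
    # Subset test: a dataset qualifies only if it carries all wanted tags.
    return sorted(ds for ds, tags in catalogue.items() if wanted <= tags)


if __name__ == "__main__":
    print(datasets_under([HEALTH, DGA], datasets))
    print(datasets_under([PSI], datasets))
```

A single-valued type field could not express the first two datasets, which is the concern raised in the second bullet.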

@jakubklimek jakubklimek changed the title Indication of NSIP/ESAP dataset by dct:type, similarly to HVD datasets Indication dataset being part of NSIP/ESAP, similarly to HVD datasets Mar 6, 2023
@jakubklimek jakubklimek changed the title Indication dataset being part of NSIP/ESAP, similarly to HVD datasets Indication dataset being part of NSIP/ERPD, similarly to HVD datasets Sep 15, 2023
@jakubklimek
Contributor Author

This is just to let you know that we have successfully implemented Czech NSIP as part of the Czech open data catalog, accessible via SPARQL endpoint, and it is already being harvested by data.europa.eu. The implemented criterion for distinction of open data and NSIP data is according to this discussion, i.e.

@prefix dcatap: <http://data.europa.eu/r5r/> .
<DGA_dataset> dcatap:applicableLegislation <http://data.europa.eu/eli/reg/2022/868/oj> . # DGA
<PSI_dataset> dcatap:applicableLegislation <http://data.europa.eu/eli/dir/2019/1024/oj> . # PSI (open data)

and we plan to do the HVDs in the same way:

<HVD_dataset> dcatap:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj> . # HVD
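
For readers who generate such statements programmatically, here is a small Python sketch that emits Turtle of this shape. It uses the dcatap prefix and the three ELIs from this comment. The function name `applicable_legislation_turtle` and the example dataset IRI are hypothetical, and this is plain string formatting rather than the Czech catalog's actual implementation. The terminating dot is placed before the `#` comment, because in Turtle everything after `#` on a line is a comment and would otherwise swallow the dot.

```python
DCATAP_NS = "http://data.europa.eu/r5r/"

# ELIs as used in this thread, keyed by a short label.
LEGISLATION = {
    "DGA": "http://data.europa.eu/eli/reg/2022/868/oj",
    "PSI": "http://data.europa.eu/eli/dir/2019/1024/oj",
    "HVD": "http://data.europa.eu/eli/reg_impl/2023/138/oj",
}


def applicable_legislation_turtle(dataset_iri, labels):
    """Build Turtle lines stating which legislation applies to a dataset.

    Hypothetical helper: emits one statement per label, so the property
    can be repeated for a dataset that falls under several acts.
    """
    lines = [f"@prefix dcatap: <{DCATAP_NS}> ."]
    for label in labels:
        eli = LEGISLATION[label]
        # Statement terminator first, then the human-readable comment.
        lines.append(
            f"<{dataset_iri}> dcatap:applicableLegislation <{eli}> . # {label}"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    print(applicable_legislation_turtle("https://example.org/dataset/1", ["PSI", "HVD"]))
```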

@kuldaraas

Thanks for the information @jakubklimek! Just to confirm - do I assume correctly that the element is optional and repeatable in the context of the open data portal? Just to add that some of our agencies have also requested to have "INSPIRE" solved using the same pattern, and we are also thinking about the addition of an "ODD" option. Altogether this would allow agencies to "tag" their datasets, distributions and services in quite a few different ways in parallel.

@jakubklimek
Contributor Author

@bertvannuffelen
Contributor

applicableLegislation has been included in DCAT-AP 3 as an additional property. The update of the guidelines for ERPD will be addressed in collaboration with CNECT.

@bertvannuffelen bertvannuffelen added the status:fixed This issue has been fixed in a draft. label Feb 1, 2024