HVD: What is the exact meaning of the mandatory applicableLegislation (HVD IR) on a mixed Catalog? #282

Closed
jakubklimek opened this issue Sep 7, 2023 · 13 comments

Comments

@jakubklimek
Contributor

In DCAT-AP-HVD, applicableLegislation is mandatory on Catalog, and fixed to the HVD IR. The usage note says:

For HVD the value must include the ELI http://data.europa.eu/eli/reg_impl/2023/138/oj.
As multiple legislations may apply to the resource the maximum cardinality is not limited.

Does this indicate that "this catalog may contain some HVD datasets" for the case of one open data catalog where some datasets fall under HVD?
Or does it say that we need a separate HVD catalog that contains only HVD datasets?

I presume it is the first one, also based on the "Scope" part:

To understand these guidelines, it is important to realise that the HVD IR applies to a subset of all the datasets that are collected by (Open) Data Portals in Europe. A single catalogue contains catalogued resources which are within and outside scope of the HVD IR.

but it is not clear to me from the usage note, maybe it could be stated more explicitly.

@matthiaspalmer

I also noticed this and think it needs to be clarified.

I suggest that the current definition of applicableLegislation for the catalog:

The legislation that mandates the creation or management of the Catalogue.

is changed to:

A legislation that mandates the creation or management of some of the resources included in this Catalog.

This assumes you need to provide this "HVD marking" even if only a single dataset / dataservice in the catalog falls under the HVD legislation. The usage note could be changed to clarify this further:

For catalogs that contain HVD resources (as is the scope of this annex) it is mandatory to indicate this by pointing to the ELI http://data.europa.eu/eli/reg_impl/2023/138/oj via applicableLegislation.
As multiple legislations may apply to the resource the maximum cardinality is not limited.

On the other hand, if a "HVD marking" only should be provided if ALL datasets / dataservices are HVD, then I do not think it makes sense to make applicableLegislation mandatory on the catalog level. (Otherwise the annex won't be useful for catalogs that contain some HVD resources and some non-HVD resources.)

@bertvannuffelen bertvannuffelen added the HVD topics releated to the HVD implementation label Oct 11, 2023
@bertvannuffelen
Contributor

I think the specification can be improved here.

Consider your current DCAT-AP catalogue.

ms:catalogue a dcat:Catalog ;
    dcat:dataset ms:dataset1, ms:dataset2 ;
    dcat:service ms:service1 .

ms:dataset1 a dcat:Dataset .

ms:dataset2 a dcat:Dataset .

ms:service1 a dcat:DataService ;
    dcat:servesDataset ms:dataset2 .

Now the HVD impacts ms:dataset2 and ms:service1. The responsible dataset and data service publishers adapt their DCAT-AP descriptions, resulting in:

ms:catalogue a dcat:Catalog ;
    dcat:dataset ms:dataset1, ms:dataset2 ;
    dcat:service ms:service1 .

ms:dataset1 a dcat:Dataset ;
    r5r:applicableLegislation eli:2023/138/oj .

ms:dataset2 a dcat:Dataset .

ms:service1 a dcat:DataService ;
    r5r:applicableLegislation eli:2023/138/oj ;
    dcat:servesDataset ms:dataset2 .

According to the HVD IR, improved metadata has been achieved and thus the objective is reached.

As a next step in the HVD process, the MS policy officer has to provide an official report to the EC.
This report is the collection of all metadata associated with the datasets that are reported by the MS data publishers as HVD.

This can be done by creating an extract from the MS DCAT-AP catalogue, selecting only the entities that are relevant for the HVD reporting: let's call this extract the MS HVD catalogue.
It is this catalogue that is represented in the DCAT-AP HVD.

@bertvannuffelen
Contributor

The improvement that could be made is thus to show that the DCAT-AP catalogue and the DCAT-AP HVD catalogue are distinct entities.

@jakubklimek
Contributor Author

now the HVD impacts ms:dataset2 and ms:service1

Did you omit ms:dataset1 (it has applicableLegislation) or did you mean ms:dataset2 because it is served by ms:service1, even though it does not have applicableLegislation?

Anyway, it seems that applicableLegislation on catalogue is meant to be used with catalogues that contain ONLY (and ALL?) HVD datasets and services, not SOME.

This means that such extract MS HVD catalogues always need to be materialized for reporting purposes (and for compliance with DCAT-AP HVD); only then can they be tagged with applicableLegislation pointing to the HVD IR. Existing catalogues cannot be directly used for that purpose, even if they contain all the necessary HVD datasets among others.

I would have to actually create:

ms:HVDcatalogue a dcat:Catalogue ;
    r5r:applicableLegislation eli:2023/138/oj ;
    dcat:dataset ms:dataset1;
    dcat:service ms:service1.

ms:service1 a dcat:DataService ;
    r5r:applicableLegislation eli:2023/138/oj ;
    dcat:servesDataset ms:dataset2.

This seems somehow inconsistent with how other open data (PSI) and protected data (DGA) are being harvested by data.europa.eu (and here, the purpose of harvesting and reporting seems similar to me), where there is an option to either create separate catalogs for PSI and DGA or provide a filtering mechanism (e.g. based on applicableLegislation) to distinguish those, and therefore it is the receiver of the report who would do the filtering.

Similarly, it seems inconsistent with the rules on HVD scope, Datasets and Distributions (at least as I interpret them), where the agreement seems to be that an HVD dataset can contain non-HVD distributions. So why can an HVD catalog not contain non-HVD datasets?

Or do we view this case in a way where only the HVD distributions of HVD datasets are in the scope of DCAT-AP HVD and the non-HVD distributions are not in the scope of DCAT-AP HVD, even though they are distributions of datasets that are in the scope of DCAT-AP HVD? And therefore, in the reporting catalogue, only distributions and datasets in the HVD scope would be extracted, and that is why DCAT-AP HVD does not concern itself with datasets that contain both HVD and non-HVD distributions?

Additionally, if for reporting purposes I need to create a separate catalogue and tell someone that this is the reporting one, the presence of applicableLegislation on it seems a bit ambiguous: it would only confirm in a machine-readable way that it indeed is the reporting one.

On the other hand, if having the applicableLegislation pointing to HVD IR would indicate that the catalog contains (may contain) some HVD datasets (not only/all HVD datasets), it may be viewed as a weak statement. So maybe applicableLegislation can be omitted altogether on the catalogue level?

@MattGroenewald

I would agree with dropping applicableLegislation as mandatory on catalogue level, to allow catalogues that contain both HVD and non-HVD datasets to be in the scope of DCAT-AP HVD. It could be used optionally on catalogue level to confirm that a catalogue contains only HVD resources.
Creating an MS HVD catalogue for reporting purposes and optionally confirming it as such in a machine-readable fashion by adding applicableLegislation would then still be possible. I see no benefit in stating that a catalogue contains some HVD resources, less so in it being mandatory.

@bertvannuffelen
Contributor

I think there is more agreement than it may seem.

There are two cases here:
a) A general DCAT-AP catalogue can contain both HVD and non-HVD catalogued resources.
b) A reporting catalogue is a DCAT-AP catalogue that only contains information that is subject to the HVD reporting.

This yields the following questions:

  • Do we represent both in DCAT-AP HVD?
  • If we represent only one, how is then the other represented?

I personally would prefer that we find a representation where the reporting catalogue is part of the specification. The motivation is that it should be clear that both are distinct yet connected.

That is why I wrote that this could be made clearer.
Today, the current DCAT-AP HVD document represents a reporting catalogue.

So I am asking myself: what would be an adequate representation for this?

<Reporting Catalogue> owl:subclassOf <DCAT-AP Catalogue> : A Reporting Catalogue will follow the general property requirements as specified by DCAT-AP.

<DCAT-AP Catalogue> dcat:catalog <Reporting Catalogue>: A Reporting catalogue is a part of a DCAT-AP catalogue.

@jakubklimek
Contributor Author

jakubklimek commented Oct 12, 2023

With <DCAT-AP Catalogue> dcat:catalog <Reporting Catalogue>, based on this discussion, the HVD datasets could not also be in <DCAT-AP Catalogue>. Maybe a more appropriate relation (or an additional alternative) would be <DCAT-AP Catalogue> dct:hasPart <Reporting Catalogue>.
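
A minimal Turtle sketch of the dct:hasPart alternative, for illustration only. The ex: namespace and all resource names in it are hypothetical, and dcatap: is assumed to stand for http://data.europa.eu/r5r/. It is not taken from any specification; it only shows the same HVD dataset being listed by both the general catalogue and the reporting catalogue:

```turtle
@prefix dcat:   <http://www.w3.org/ns/dcat#> .
@prefix dct:    <http://purl.org/dc/terms/> .
@prefix dcatap: <http://data.europa.eu/r5r/> .
@prefix ex:     <http://example.org/> .   # hypothetical namespace

# General MS catalogue: lists HVD and non-HVD datasets alike
# and points to the reporting catalogue as one of its parts.
ex:catalogue a dcat:Catalog ;
    dct:hasPart ex:reportingCatalogue ;
    dcat:dataset ex:datasetHVD , ex:datasetOther .

# Reporting catalogue: lists only the HVD resources.
ex:reportingCatalogue a dcat:Catalog ;
    dcatap:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj> ;  # HVD
    dcat:dataset ex:datasetHVD .
```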

I presume that by owl:subclassof you actually mean something like r5r:HVDReportingCatalog rdfs:subClassOf dcat:Catalog, i.e. a relation on the class level, not the instance level? I quite like this solution. However, if applicableLegislation remained mandatory for the instance, the information would be there twice - once as

r5r:HVDReportingCatalog rdfs:subClassOf dcat:Catalog

and then as

<reporting catalog> a r5r:HVDReportingCatalog;
    r5r:applicableLegislation eli:2023/138/oj .

right?

And let's not forget the original question of whether

<dcat-ap catalog> a dcat:Catalog;
    r5r:applicableLegislation eli:2023/138/oj .

means that this is the reporting catalog, or that some of the datasets are in scope of HVD IR.

@oystein-asnes

Many datasets in the public sector have applicable legislation in some sense. Being able to express this is useful beyond HVDs. I wish for a core DCAT-AP feature for this. By hiding it in the HVD profile we suggest it does not apply to non-HVDs.

Over at BRegDCAT-AP it looks like this:
| cpsv:follows | range=cpsv:Rule | This property links a Dataset to the Rule that defines its legal basis. |

  1. Is there a reason why the BRegDCAT-AP approach is not adopted here?
  2. Can r5r:applicableLegislation be used on non-HVDs without breaking the scope?

@bertvannuffelen
Contributor

Many datasets in the public sector have applicable legislation in some sense. Being able to express this is useful beyond HVDs. I wish for a core DCAT-AP feature for this. By hiding it in the HVD profile we suggest it does not apply to non-HVDs.

Over at BRegDCAT-AP it looks like this: | cpsv:follows | range=cpsv:Rule | This property links a Dataset to the Rule that defines its legal basis. |

1. Is there a reason why the BRegDCAT-AP approach is not adopted here?

follows is a "similar" property with Public Service as its domain. That is a whole different setting. See https://semiceu.github.io/CPSV-AP/releases/3.1.0/#Public%20Service%3Afollows .

2. Can `r5r:applicableLegislation` be used on non-HVDs without breaking the scope?

Actually, option 2 is now in action. The property has been lifted into the draft proposal https://semiceu.github.io/DCAT-AP/releases/3.0.0-draft/#Dataset.applicablelegislation. It has also been discussed in #286.

@oystein-asnes If you have any questions regarding the lifting of this property, please use issue #286 or #260.
I would like to keep the discussion here focussed on the nature of the catalogues we are considering: Reporting catalogues versus the MS Data catalogue.

@oystein-asnes

My bad @bertvannuffelen . I was not aware that applicableLegislation is included in DCAT-AP. I rest my case.

@bertvannuffelen
Contributor

I quite like this solution, however, if applicableLegislation would remain mandatory for the instance, the information would be there twice

<reporting catalog> a r5r:HVDReportingCatalog;
    r5r:applicableLegislation eli:2023/138/oj .

right?

Yes, that is the consequence. But it is the consequence of subclassing where the nature of the class corresponds to the value of a single property. (See our other discussion on entity profiles.) As such, I am not too concerned about it, and I would leave this discussion for that more abstract challenge, of which this is another example.

And let's not forget the original question of whether

<dcat-ap catalog> a dcat:Catalog;
    r5r:applicableLegislation eli:2023/138/oj .

means that this is the reporting catalog, or that some of the datasets are in scope of HVD IR.

Given that the second reading is actually covered by a DCAT-AP catalogue, I would consider it as the "reporting catalogue".
For me, the reporting catalogue is either an outcome of an additional effort for HVD implementation, or it is an actively managed entity within an MS catalogue ecosystem. In both cases a single unique value identifies exactly this case.
Note that this interpretation relies on the datasets and other catalogued resources having a PURI to refer to.

While writing this, I do not think there will soon be a case where a catalogue has multiple values for applicableLegislation, because it would mean that the catalogued resources in that catalogue must satisfy eli:1 and eli:2. Such subsets of catalogues will probably not be reflected in a legal context.

@jakubklimek
Contributor Author

While writing this, I do not think there will soon be a case where a catalogue has multiple values for applicableLegislation, because it would mean that the catalogued resources in that catalogue must satisfy eli:1 and eli:2. Such subsets of catalogues will probably not be reflected in a legal context.

Here we are getting to the core of my original question. Again, you indicate that applicableLegislation on an instance of dcat:Catalog means that all datasets (and resources in general) within that catalog also have the same applicableLegislation, which would make it somewhat redundant. But this is not how I have understood the usage of applicableLegislation on a catalog up to now, not even based on the definition in the DCAT-AP 3.0.0 draft.

To illustrate, I used this for the Czech catalog:

@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcatap:     <http://data.europa.eu/r5r/> .
<https://data.gov.cz/zdroj/katalog/NKOD> a dcat:Catalog ;
    dcatap:applicableLegislation
      <http://data.europa.eu/eli/reg_impl/2023/138/oj> , # HVD
      <http://data.europa.eu/eli/dir/2019/1024/oj>, # PSI
      <http://data.europa.eu/eli/reg/2022/868/oj> ; # DGA
   dcat:dataset <datasetHVD>, <datasetDGA> .

<datasetHVD> a dcat:Dataset ;
   dcatap:applicableLegislation 
    <http://data.europa.eu/eli/reg_impl/2023/138/oj> , # HVD
    <http://data.europa.eu/eli/dir/2019/1024/oj> . # PSI

<datasetDGA> a dcat:Dataset ;
  dcatap:applicableLegislation <http://data.europa.eu/eli/reg/2022/868/oj> . # DGA

What I mean by that is that this catalog contains some open data (PSI), some High-Value datasets (HVD) and some protected data (DGA), i.e. PSI, DGA and HVD mandate the creation of a data catalog, and it is this one for all three cases. Not that all datasets in that catalog are created because of all the three legislations at once.

<datasetHVD> is an open dataset and also an HVD one, and <datasetDGA> is part of the Czech NSIP (protected data).

However, based on the presumption that all applicableLegislation values on a catalog need to apply to all registered datasets (or resources in general), this would mean that if I use <https://data.gov.cz/zdroj/katalog/NKOD> dcatap:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj> # HVD, this is the reporting catalog, and therefore all datasets in that catalog need to be HVDs and therefore also need to have <datasetHVD> dcatap:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj> . # HVD.

Then this seems redundant to me, and also it seems that

  1. I can never mix PSI, HVD, DGA and other datasets in a single catalog annotated by applicableLegislation
  2. I can never annotate a catalog with applicableLegislation if it mixes different kinds of datasets, i.e. unless it is a "single-applicable-legislation" catalog
  3. It is also different from how I understand HVD Datasets should be denoted in a DCAT-AP catalog, i.e. they are tagged with dcatap:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj> , # HVD, but that should not mean that all its distributions are the ones mandated by HVD IR.
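
Point 3 can be sketched in Turtle as follows. This is only an illustration under stated assumptions: the ex: namespace, the resource names and the access URLs are hypothetical, and marking the HVD distribution itself with applicableLegislation is assumed here as one possible way to tell it apart from the other distribution.

```turtle
@prefix dcat:   <http://www.w3.org/ns/dcat#> .
@prefix dcatap: <http://data.europa.eu/r5r/> .
@prefix ex:     <http://example.org/> .   # hypothetical namespace

# Dataset tagged as HVD; it has two distributions.
ex:datasetHVD a dcat:Dataset ;
    dcatap:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj> ;  # HVD
    dcat:distribution ex:distributionApi , ex:distributionLegacy .

# Distribution offered to meet the HVD IR requirements (assumed marking).
ex:distributionApi a dcat:Distribution ;
    dcatap:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj> ;  # HVD
    dcat:accessURL <http://example.org/api> .

# Additional distribution that is not claimed to be mandated by the HVD IR.
ex:distributionLegacy a dcat:Distribution ;
    dcat:accessURL <http://example.org/legacy.csv> .
```

Under this reading, the tag on the dataset says that the HVD IR applies to it, without implying anything about every one of its distributions.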

So, shouldn't <DCAT-AP catalog> dcatap:applicableLegislation <http://data.europa.eu/eli/reg_impl/2023/138/oj> , # HVD rather indicate that there are SOME HVD datasets in the catalog? We could leave the indication of THE reporting catalog to subclassing, i.e.:

r5r:HVDReportingCatalog rdfs:subClassOf dcat:Catalog .
<reporting catalog> a r5r:HVDReportingCatalog .

Which would make the subclassing approach also not redundant with

<reporting  catalog> a r5r:HVDReportingCatalog;
    r5r:applicableLegislation eli:2023/138/oj .

@MattGroenewald

I could also see the use case in a mixed catalog as described by @jakubklimek

On a national level we already have dcatde:legalBasis, which in the past similarly allowed indicating the legal scope of a resource, e.g. PSI.

Though to me, having the reporting catalogue as a subset via <DCAT-AP Catalogue> dct:hasPart <Reporting Catalogue> seems preferable. Through <Reporting catalog> r5r:applicableLegislation eli:2023/138/oj it could be marked which legislation is covered by the catalog, and catalogs could still easily be combined and merged as needed via dcat:catalog.
I do not see a use case for explicitly identifying a reporting catalogue via subclassing.

@bertvannuffelen bertvannuffelen added release-2.2.0-hvd-dec2023 and removed HVD topics releated to the HVD implementation labels Dec 14, 2023
5 participants