Achieve compatibility with DCAT 2 #109

Open
clange opened this issue Oct 10, 2019 · 16 comments · Fixed by #270

Comments

@clange
Member

clange commented Oct 10, 2019

The W3C Data Exchange Working Group (https://www.w3.org/2017/dxwg/) is working on DCAT 2. DCAT 2 made it to the Candidate Recommendation stage on 3 Oct 2019 (https://www.w3.org/TR/vocab-dcat-2/). Considering the

we should make sure that our information model is compatible with DCAT 2.

Some relevant features of DCAT 2 include:

  • more expressive description of the temporal and spatial extent of a data resource (e.g., DCAT itself now implements what we modelled on our own using ids:temporalCoverage, ids:begin, etc.)
  • vocabulary to talk about relations of a resource to another resource (e.g. its original version)
  • a way to talk about the data quality of a resource, in terms of the W3C Data Quality Vocabulary
@clange added the labels "enhancement" (ticket proposing an improvement, extension of existing or new features) and "status:open" (issue has been submitted or re-opened recently, waiting for assignment to owner) on Oct 10, 2019
@JohannesLipp
Member

JohannesLipp commented May 12, 2020

Additions: We need to do or verify the following:

  • State that the IDS infomodel conforms to DCAT (via PROF?).
  • Verify that all usages are indeed compatible with DCAT.
  • Verify that both the domains and ranges declared for our classes are consistent with DCAT.
  • (More roughly, because of the current status quo) verify that reasoning and axioms are correct.
  • If we use the same semantics, add "equivalent" statements for clarification.
  • Double-check the correct usage of import and copy; be consistent with explicit and implicit cases.
  • Consider the semantic meaning attached to the presence of these axioms: they identify a subset recommended for use. It is generally stronger to use formal constraints with something like SHACL to make statements about the mandatory presence of properties in instance data, without declaring axioms on the underlying class itself (a frame-based profile making a commitment about what data is accessible, as opposed to the open world assumption in OWL).
  • (Possibly) study DCAT-AP and decide whether and how we want to use it.
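
The PROF and SHACL points above could be sketched in Turtle roughly as follows. This is a minimal sketch, not the infomodel's actual declarations: the `ids:` namespace IRI, the `ex:` names, and the choice of `dcat:keyword` as the constrained property are assumptions made for illustration only.

```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix prof: <http://www.w3.org/ns/dx/prof/> .
@prefix ids:  <https://w3id.org/idsa/core/> .   # assumed IDS namespace
@prefix ex:   <http://example.org/> .            # hypothetical example namespace

# Hypothetical profile resource stating that the infomodel profiles DCAT.
ex:IdsInfomodelProfile a prof:Profile ;
    prof:isProfileOf <http://www.w3.org/ns/dcat> .

# Hypothetical shape: instance data of ids:DigitalContent must carry at
# least one dcat:keyword. No axiom is added to the class itself; the
# constraint only applies when data is validated against the shape.
ex:DigitalContentKeywordShape a sh:NodeShape ;
    sh:targetClass ids:DigitalContent ;
    sh:property [
        sh:path dcat:keyword ;
        sh:minCount 1 ;
    ] .
```

The shape makes the "mandatory presence" statement under closed-world validation, while the OWL/RDFS definitions of the class stay untouched.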

@JohannesLipp JohannesLipp self-assigned this Jun 5, 2020
@JohannesLipp JohannesLipp linked a pull request Jun 5, 2020 that will close this issue
@JohannesLipp
Member

JohannesLipp commented Jun 5, 2020

My first investigation is on where we currently use the dcat prefix. This is in 19 files (open tasks marked in bold):

  • Host
    • Status: Declaring the prefix, but not actually using it
    • Action: Remove unused prefix
  • Artifact
    • Status: Using dcat:byteSize, which is fully compatible with DCAT-2.
    • Action: None
  • DigitalContent
    • Status: Using dcat:Dataset, dcat:theme, and dcat:keyword
    • Action: None (cf. the first bullet point in the comment below)
  • Representation
    • Status: Using dcat:Distribution, dcat:mediaType
    • Action: None
  • Resource
    • Status: ids:Resource rdfs:seeAlso dcat:Dataset, and is logically based on it
    • Action: None
  • VocabularyData
    • Status: VocabularyData rdfs:seeAlso dcat:Dataset
    • Action: None
  • Standard
    • Status: Declaring the prefix, but not actually using it
    • Action: Remove unused prefix
  • adms-20130801
  • csvw-20170523
  • dcat-20140116
  • Catalog
    • Status: Extending dcat:Catalog as subClass
    • Action: None (see comment below)
  • Connector
    • Status: Declaring the prefix, but not actually using it
    • Action: Remove unused prefix
  • HostShape, ArtifactShape, DigitalContentShape, RepresentationShape, ResourceShape, CatalogShape, and ConnectorShape
    • Status: Declaring the prefix, but not actually using it
    • Action: Remove unused prefix

Please note that for comparison, I refer to the DCAT-1 and DCAT-2 Turtle files [1,2] as well as their documentations [3,4].

[1] https://www.w3.org/ns/dcat2014.ttl
[2] https://www.w3.org/ns/dcat2.ttl
[3] https://www.w3.org/TR/2014/REC-vocab-dcat-20140116/
[4] https://www.w3.org/TR/vocab-dcat-2/

@JohannesLipp
Member

JohannesLipp commented Jun 5, 2020

Detailed investigations and explanations:

  • dcat:theme: The range skos:Concept is the same in DCAT-2 and DCAT-1. DCAT-2 no longer defines the domain dcat:Dataset, but the property still inherits from its super property dct:subject. For ids:theme we define the domain ids:DigitalContent, which is a subclass of dcat:Dataset, and the range ids:Concept, which is a subclass of skos:Concept. Decision via video conference: Because dcat:Dataset still matches our ids:DigitalContent best, we will not update it to extend dcat:Resource.
  • dcat:keyword: Same change as with dcat:theme - DCAT-2 refers to describing resources instead of datasets. No action needed, because dataset extends resource.
  • dcat:mediaType: The domain remains dcat:Distribution, but the range changed from dct:MediaTypeOrExtent to the more restrictive dct:MediaType. If no IANA media type can be referenced, DCAT-2 still suggests using dct:format instead. It seems we do not need to take any action here.
  • dcat:Dataset: DCAT-2 introduces the new class dcat:Resource, which is the "class of all cataloged resources" and a superclass of dcat:Dataset.
  • ids:Resource extends ids:DigitalContent, which again extends dcat:Dataset. Following the decision in the first bullet point, we will stick with this model.
  • dcat:Distribution and dcat:distribution: There is no major change in DCAT-2, and it still is dcat:Dataset-->dcat:Distribution. Due to materialization aspects, we do not properly extend this class in ids:Representation and ids:Artifact. This is because we have both a more abstract resource (Representation) and an individual materialization (Artifact).
  • dcat:Catalog: Changed in DCAT-2 from "a collection of metadata about datasets" to "[...] about resources". Please note an issue in NOTES.md. ids:Catalog is a subclass of dcat:Catalog. No action required, because dataset is a subclass of resource and we follow our decision from the first bullet point.
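
The subclass and sub-property relations described in this comment can be summarised as a small Turtle sketch. The statements restate what the bullets above say; the `ids:` namespace IRI is an assumption, and the actual ontology files may declare more than is shown here.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix ids:  <https://w3id.org/idsa/core/> .   # assumed IDS namespace

# Class chain: ids:Resource -> ids:DigitalContent -> dcat:Dataset
# (and, in DCAT-2, dcat:Dataset -> dcat:Resource).
ids:DigitalContent rdfs:subClassOf dcat:Dataset .
ids:Resource       rdfs:subClassOf ids:DigitalContent .
ids:Catalog        rdfs:subClassOf dcat:Catalog .
ids:Concept        rdfs:subClassOf skos:Concept .

# ids:theme narrows dcat:theme in both domain and range.
ids:theme rdfs:subPropertyOf dcat:theme ;
    rdfs:domain ids:DigitalContent ;
    rdfs:range  ids:Concept .
```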

@JohannesLipp
Member

JohannesLipp commented Jun 9, 2020

Coming to the major changes from DCAT-1.1 to DCAT-2, which are of interest for us:

  • Space: Using dct:spatial to point to a dct:Location, which has three new properties locn:geometry, dcat:bbox, dcat:centroid that can also be used concurrently. Our ids:spatialCoverage already extends dct:spatial, which does not yet extend dct:Location
    Suggested action: Make it extend that
  • Time: A new class dct:PeriodOfTime supports the temporal coverage of a resource via dcat:startDate, dcat:endDate, time:hasBeginning, and time:hasEnd
    Suggested action: Use it.
  • Time (in general): DCAT-2 suggests five temporal properties:
    • dct:issued
      Status: Extended in Message/ids:issued and DigitalContent/ids:created
      Action: Add xsd:date to the range of ids:created
    • dct:modified
      Status: Extended in ids:created in DigitalContent
      Action: Add xsd:date to the range
    • dct:accrualPeriodicity
      Status: Currently not used, but there is ids:accrualPeriodicity in DigitalContent
      Action: Make it extend dct:accrualPeriodicity and cascade the changes, which in particular include using dct:Frequency as the range, in order to align it with DCAT-2.
    • dcat:temporalResolution
      Status: Currently not used. DCAT-2 uses this to define a minimum time period resolvable in a dataset.
      Action: Add to DigitalContent (cf. example 19 and usage notes)
    • dct:temporal
      Status: ids:temporalCoverage in DigitalContent extends it.
      Action: Make ids:Interval extend dct:PeriodOfTime cf. DCAT-2 examples
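
For orientation, the DCAT-2 temporal and spatial terms listed above might be used together on a dataset as in the following sketch. It uses plain DCAT-2 and DCMI terms rather than the IDS sub-properties; the resource name, dates, and coordinates are invented for illustration, and the GeoSPARQL WKT datatype is one common choice for the geometry literals rather than a requirement.

```turtle
@prefix dcat:      <http://www.w3.org/ns/dcat#> .
@prefix dct:       <http://purl.org/dc/terms/> .
@prefix xsd:       <http://www.w3.org/2001/XMLSchema#> .
@prefix geosparql: <http://www.opengis.net/ont/geosparql#> .
@prefix ex:        <http://example.org/> .   # hypothetical example namespace

ex:MyDataResource a dcat:Dataset ;
    # Temporal coverage as a period with start and end dates.
    dct:temporal [
        a dct:PeriodOfTime ;
        dcat:startDate "2020-01-01"^^xsd:date ;
        dcat:endDate   "2020-06-30"^^xsd:date
    ] ;
    # Spatial coverage as a location with bounding box and centroid.
    dct:spatial [
        a dct:Location ;
        dcat:bbox "POLYGON((6.0 50.7, 6.2 50.7, 6.2 50.9, 6.0 50.9, 6.0 50.7))"^^geosparql:wktLiteral ;
        dcat:centroid "POINT(6.1 50.8)"^^geosparql:wktLiteral
    ] ;
    # How often the dataset is updated, and its minimum resolvable period.
    dct:accrualPeriodicity <http://purl.org/cld/freq/daily> ;
    dcat:temporalResolution "P1D"^^xsd:duration .
```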

@JohannesLipp
Member

The work for DCAT-2 is done. Compatibility with DCAT-AP is handled in issue #277.

@clange
Member Author

clange commented Jul 22, 2020

@JohannesLipp currently reviewing your investigations. Re ids:VocabularyData I would suggest (could you please do it if it's not yet done?) that we open a separate issue for getting rid of that class? It was a workaround for adding some of the domain-specific structure/semantics features at a time when CodeGen was not yet able to handle terms from non-IDS namespaces.

@clange
Member Author

clange commented Jul 22, 2020

After reviewing, the following questions remain to be asked to DCAT experts.

  • Is it OK to have ids:theme rdfs:subPropertyOf dcat:theme; rdfs:domain [ rdfs:subClassOf dcat:Dataset ]? I think it is, because we are talking about a more specific property, and that property is not mandatory.

@clange
Member Author

clange commented Jul 22, 2020

Re. dcat:mediaType I think we shall take the right decision in the context of our ongoing discussion on how to replace our media type code lists by something more lightweight that can take any standard or non-standard string of the form "type/subtype". @JohannesLipp could you please link to that issue from here, or in any case make sure we have an issue for that discussion? (The discussion would be similar to #296.) My input to that discussion is that I think we should not represent media types simply as string literals but indeed continue to represent them as instances of ids:MediaType, but make sure that additional types can be used easily: the most lightweight representation would be ex:MyDataResource ids:mediaType [ rdfs:label "foo/bar" ], such that the blank node would implicitly be of type ids:MediaType and thus also of dct:MediaType. It does make sense to remain compatible with dct:MediaType, and the good thing is that its specification is so vague that it doesn't restrict us to anything other than modelling media types as resources.
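
The lightweight representation proposed here can be sketched as follows. The blank node is only implicitly typed if ids:mediaType has the range ids:MediaType and ids:MediaType is a subclass of dct:MediaType; both declarations are written out as assumptions, as are the `ids:` namespace IRI and the `ex:` resource name.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ids:  <https://w3id.org/idsa/core/> .   # assumed IDS namespace
@prefix ex:   <http://example.org/> .            # hypothetical example namespace

# Assumed schema statements that make the implicit typing work.
ids:MediaType rdfs:subClassOf dct:MediaType .
ids:mediaType rdfs:range ids:MediaType .

# Instance data: a non-standard media type given only by its label.
ex:MyDataResource ids:mediaType [ rdfs:label "foo/bar" ] .

# Under RDFS entailment the blank node is then an ids:MediaType
# (via rdfs:range) and a dct:MediaType (via rdfs:subClassOf).
```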

@clange
Member Author

clange commented Jul 22, 2020

@JohannesLipp in #270 (How do you easily/directly link to a pull request?) I did not see anything about the first bullet point in the comment about dct:spatial. Did you also cover that?

@clange
Member Author

clange commented Jul 22, 2020

This comment is a placeholder for some more DCAT2 features I'd like to request to be supported by the IDS infomodel. At the very least we should go through the full list of changes from DCAT 1 to 2 once more. I think at least dcat:DataService is related to ids:Endpoint in a way that we have not yet considered here (see https://www.w3.org/TR/vocab-dcat-2/#Class:Data_Service), and there may be further terms.

@JohannesLipp
Member

After reviewing, the following questions remain to be asked to DCAT experts.

  • Is it OK to have ids:theme rdfs:subPropertyOf dcat:theme; rdfs:domain [ rdfs:subClassOf dcat:Dataset ]? I think it is, because we are talking about a more specific property, and that property is not mandatory.

I would say yes. Currently, the domain is ids:DigitalContent, which is a subclass of dcat:Dataset. Your suggestion using a blank node would therefore replace the domain ids:DigitalContent with the more general one, "anything extending dcat:Dataset".
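
Written out in Turtle, the blank-node variant from the question looks as follows. This is a sketch for discussion only; the `ids:` namespace IRI is an assumption, and the comments describe plain RDFS entailment rather than anything specific to the infomodel.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix ids:  <https://w3id.org/idsa/core/> .   # assumed IDS namespace

# The domain is an anonymous class that is a subclass of dcat:Dataset.
ids:theme rdfs:subPropertyOf dcat:theme ;
    rdfs:domain [ rdfs:subClassOf dcat:Dataset ] .

# For any triple  ?x ids:theme ?c  RDFS then entails that ?x is an
# instance of the anonymous class and therefore also a dcat:Dataset,
# without committing ?x to being an ids:DigitalContent.
```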

@JohannesLipp
Member

Re. dcat:mediaType I think we shall take the right decision in the context of our ongoing discussion on how to replace our media type code lists by something more lightweight that can take any standard or non-standard string of the form "type/subtype". @JohannesLipp could you please link to that issue from here, or in any case make sure we have an issue for that discussion? (The discussion would be similar to #296.) My input to that discussion is that I think we should not represent media types simply as string literals but indeed continue to represent them as instances of ids:MediaType, but make sure that additional types can be used easily: the most lightweight representation would be ex:MyDataResource ids:mediaType [ rdfs:label "foo/bar" ], such that the blank node would implicitly be of type ids:MediaType and thus also of dct:MediaType. It does make sense to remain compatible with dct:MediaType, and the good thing is that its specification is so vague that it doesn't restrict us to anything other than modelling media types as resources.

IMHO there is no action needed from our side. dcat:mediaType has range dct:MediaType, and ids:mediaType and ids:MediaType extend these, respectively. We discussed this in #224 and the compact result (following DCAT2) is the following:

:Foo ids:mediaType <http://www.iana.org/assignments/media-types/text/csv> ;

@JohannesLipp
Member

@JohannesLipp in #270 (How do you easily/directly link to a pull request?) I did not see anything about the first bullet point in the comment about dct:spatial. Did you also cover that?

You just did that direct link to a pull request in that comment 😃
Thank you for the info, I have not covered that indeed. I solved it via the most recent commit, which we agreed on in today's Infomodel call.

@clange
Member Author

clange commented Feb 4, 2021

@JohannesLipp in investigating the reuse of the IDS infomodel for the Agricultural Information Model of https://h2020-demeter.eu/, where DCAT was given as the baseline, I identified the following missing points:

  • I did actually see DigitalContent temporalResolution Frequency in IDS. It would be good to align this with DCAT's temporalResolution.
  • Interestingly, DCAT's spatialResolutionInMeters is not reflected in IDS.
  • Also I think my earlier comment on thinking about the relation between dcat:DataService and ids:Endpoint got lost.

@clange clange reopened this Feb 4, 2021
@JohannesLipp
Member

JohannesLipp commented Feb 4, 2021

  • I did actually see DigitalContent temporalResolution Frequency in IDS. It would be good to align this with DCAT's temporalResolution.
    Yes, it is:
ids:temporalResolution
    rdfs:subPropertyOf dcat:temporalResolution ;

cf. https://github.com/International-Data-Spaces-Association/InformationModel/blob/develop/model/content/DigitalContent.ttl#L160

ids:spatialCoverage extends dct:spatial. However, we do not yet use this particular resolution in meters.
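
For reference, this is roughly how the property is used in plain DCAT 2 instance data. The sketch shows DCAT only, not an IDS term; the resource name and the value are invented for illustration.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.org/> .   # hypothetical example namespace

# Minimum spatial separation resolvable in the dataset, in meters.
ex:MyDataResource a dcat:Dataset ;
    dcat:spatialResolutionInMeters "30.0"^^xsd:decimal .
```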

  • Also I think my earlier comment on thinking about the relation between dcat:DataService and ids:Endpoint got lost.

@lcomet
Member

lcomet commented Dec 15, 2023

Related to #593


3 participants