How to model dct:temporal for continuously evolving Datasets? #201

Open
init-dcat-ap-de opened this issue Aug 9, 2021 · 7 comments
Labels
future-work this topic will be dealt in the future status:waitingForDecisionW3C The issue is handled at W3C

Comments

@init-dcat-ap-de

init-dcat-ap-de commented Aug 9, 2021

In GovDataOfficial/DCAT-AP.de#17 we are discussing a real use case where I am surprised to find no obvious answer. Maybe I am missing something.

There is a Dataset which is updated constantly (dcterms:accrualPeriodicity) with a resolution of one hour (dcat:temporalResolution). But you can only get the data of the last 10 days. (Something that's probably pretty common for sensor data.)

How would you model this? Neither xsd:date nor dcterms:PeriodOfTime allows this. We would need an xsd:duration:

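@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

_:ds  a dcat:Dataset ;
  dcterms:accrualPeriodicity <http://publications.europa.eu/resource/authority/frequency/UPDATE_CONT> ;
  dcat:temporalResolution "PT1H"^^xsd:duration ;
  dcterms:temporal "P10D"^^xsd:duration .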

But that is not allowed, because the range of dcterms:temporal is dcterms:PeriodOfTime. (And it would only be implicit that you get the last 10 days.)

@jakubklimek
Contributor

First of all, the accrualPeriodicity should use the Frequency EU vocabulary, i.e. http://publications.europa.eu/resource/authority/frequency/UPDATE_CONT, right?

As to modelling "last 10 days", you could also update the metadata continuously and change startDate and endDate each time. Otherwise, I think there is currently no solution to this using DCAT(-AP).
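A minimal sketch of what such a continuously regenerated snapshot could look like, assuming the publisher re-runs it on every update. The function name `rolling_temporal_turtle` is hypothetical, and the output is a Turtle fragment that relies on the usual `dcat:`, `dcterms:` and `xsd:` prefixes being declared elsewhere. It only illustrates the workaround; it is not a recommendation from DCAT-AP.

```python
from datetime import datetime, timedelta, timezone


def rolling_temporal_turtle(now: datetime, window: timedelta) -> str:
    """Render a dcterms:temporal snapshot covering [now - window, now].

    Hypothetical helper: the result is only correct at the moment it is
    generated and has to be regenerated whenever the window moves.
    """
    start = (now - window).isoformat()
    end = now.isoformat()
    return (
        "_:ds dcterms:temporal [\n"
        "    a dcterms:PeriodOfTime ;\n"
        f'    dcat:startDate "{start}"^^xsd:dateTime ;\n'
        f'    dcat:endDate "{end}"^^xsd:dateTime\n'
        "] .\n"
    )


# Example: a 10-day window evaluated at a fixed, assumed point in time.
now = datetime(2021, 8, 9, 12, 0, tzinfo=timezone.utc)
snippet = rolling_temporal_turtle(now, timedelta(days=10))
print(snippet)
```

Because the two dates are literal values, any copy of this fragment starts to drift from the real window as soon as it has been written.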

@init-dcat-ap-de
Author

Hm, updating the metadata every hour, just because time has moved on by an hour, is not our desired solution...

Does no one else see this use case? Maybe we need a standard solution.

@jze

jze commented Aug 10, 2021

Continuously updating metadata will not work. Even if I update the values in my local open data portal, the national portal will take a (daily) snapshot, and soon afterwards the temporal information will be incorrect. The error will be even bigger by the time the European data portal has harvested the data. Therefore, we need a way to describe these continuously changing datasets.

Here is a real-world example: A weather observation of the Deutscher Wetterdienst always covers the last 24 hours: https://opendata.dwd.de/weather/weather_reports/poi/10015-BEOB.csv

@bertvannuffelen
Contributor

This is indeed not possible to express as such in DCAT(-AP). And as @jze explains, there is no guarantee that the metadata you find at the harvesting data portal is the most accurate.

Both are connected issues but also distinct. If you connect them as @jze does, then a loosely coupled, distributed, cross-organisational system cannot work. For this distribution scheme, temporal delays and information skew are part of the game. However, one can compensate for this with proper PURI handling: a visitor of the EDP might find the German weather reports dataset and consider using it. In that case the visitor has to go to the source of the metadata to find all the information natively. There the most recent information can be found, e.g. that this dataset is now obsolete and has been replaced with a JSON REST API.

This example illustrates that we should keep the objectives of the Open Data Portals clear. If you want machines to connect to your endpoints through your catalogue, then very precise and up-to-date metadata is required. However, that is not the objective of most Open Data Portals. They are a human-browsable interface to (governmental) data.
So if your local catalogue is intended to be part of a machine-to-machine data processing system, my advice is to ensure that all data for that purpose is there. But do not expect that you can replace your catalogue with the EDP one just by changing the domain name of the catalogue.

One can also look at this topic from the human consumer's perspective: knowing that the data is continuously updated is probably a criterion I am going to use when looking for appropriate data sources. But knowing that I only get a window of 10 days is probably less important at first. I would consider that a technical implementation restriction. From that perspective it is less problematic that this information is not available in machine-processable form but is described in some textual notes. I have encountered data streams with windows of 1 day, 1 hour, 10 years. Independent of my intended usage, the question remains how to express this window.

Expressing a window could be done via temporal coverage (https://www.w3.org/TR/vocab-dcat-2/#Property:dataset_temporal). But the window expression is hard to construct. I have no direct answer for that. Probably we could define, based on https://www.w3.org/TR/owl-time/#link-interval-meets, the notion of a coverage window:

window(Period) = only data in the interval [ NOW-Period , NOW ] 

But I have not yet found a notion of NOW.
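To make the window(Period) idea concrete, here is a rough sketch in which the consumer supplies NOW at evaluation time, since no vocabulary term for it was identified above. The names `parse_duration`, `window` and `covers` are hypothetical, and the parser deliberately handles only the day and time part of xsd:duration (values such as "P10D" or "PT1H"), not years or months.

```python
import re
from datetime import datetime, timedelta, timezone

# Day/time subset of xsd:duration only. Years and months are left out
# because their length depends on the calendar.
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_duration(text: str) -> timedelta:
    """Convert a duration such as 'P10D' or 'PT1H' into a timedelta."""
    match = _DURATION.match(text)
    parts = {}
    if match and not text.endswith("T"):
        parts = {k: int(v) for k, v in match.groupdict().items() if v is not None}
    if not parts:
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(**parts)


def window(period: str, now: datetime) -> tuple[datetime, datetime]:
    """Resolve window(Period) to the concrete interval [now - period, now]."""
    return now - parse_duration(period), now


def covers(period: str, instant: datetime, now: datetime) -> bool:
    """Check whether an instant falls inside the window as seen at 'now'."""
    start, end = window(period, now)
    return start <= instant <= end


# Example with an assumed evaluation time.
now = datetime(2021, 8, 9, 12, 0, tzinfo=timezone.utc)
print(window("P10D", now))
print(covers("P10D", datetime(2021, 8, 1, tzinfo=timezone.utc), now))
```

With this approach the metadata only stores the period, so a harvested copy stays valid however old it is; the cost is that every consumer has to agree on what NOW means.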

@jze

jze commented Sep 16, 2021

It is a pity that this problem was marked as wont-fix. In practice it is very relevant. Now there is no way to express these records DCAT-AP compliant.

Especially when forwarding to other portals, it is important not to have to specify fixed times. Without "floating" time data, we will often have incorrect time metadata.

@bertvannuffelen
Contributor

@jze, I tagged it as won't fix because there will be no resolution in the near future in DCAT-AP. If you believe this should be future work, I will tag it as that.

@bertvannuffelen
Contributor

On your sentence:

> Now there is no way to express these records DCAT-AP compliant.

I think you want to say that "I have no formal way to express that only the data of the last days is available".

Note that a landing page in which you explain this situation to the potential reuser is always possible.

> Especially when forwarding to other portals, it is important not to have to specify fixed times. Without "floating" time data, we will often have incorrect time metadata.

As mentioned in my previous answer, you could explore temporal expressions, e.g. built from OWL-Time, but there is no guarantee that this can express your situation.

To a certain extent, your use case is similar to legal information. Before ODRL existed, there was no way to express legal information other than in a document. For your use case there must be a formal language (suggestion: OWL-Time) that is able to express the situation; once that exists, it is easy to adopt it in DCAT-AP.


4 participants