Avoid having too many datasets in a dataset series #276

matthiaspalmer · 2023-09-05T10:45:04Z

It is highly likely that data portals will assume that the datasets in a dataset series are few enough to be listed directly. That means there won't be a searchable UI to find datasets inside a dataset series. This assumption is supported by scenarios such as one dataset being added to a dataset series per year. If this assumption is wrong, that should be stated, at least via an example.

We believe the specification should support this assumption by providing guidance on how to handle situations where a series drifts towards containing many more datasets. For example:

If you find that you need to create hundreds of datasets in a dataset series, it would be better to consider other strategies, e.g.:

Provide the downloadable files as a single distribution that points to a Web-accessible folder or a zip file.
Provide access via a data service.
Introduce a hierarchy of dataset series.

Note that data portals will likely hide datasets that are part of a dataset series and only give access to them via the series.

jakubklimek · 2023-09-05T11:06:55Z

For example, the Czech catalog presumes there can be a large number of datasets in the series and has a search functionality through the members (basically, it filters all datasets for the ones in the series and re-uses the normal faceted search functionality).

So, I would say it should be expected that there can be as many datasets in a series as there are datasets in a catalog.

A specific example is a daily time series running for multiple years. Downloading all files is trivial, as you can search for them in the catalog using SPARQL and then download them.
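
As a rough sketch of what such a SPARQL lookup might look like (not taken from the Czech catalog itself): the snippet below builds a query that lists the download URLs of all datasets in a series. It assumes members link to their series with `dcat:inSeries` and expose files through `dcat:distribution` and `dcat:downloadURL`, as in DCAT 3. The function name and the series IRI are hypothetical, and no particular endpoint is implied.

```python
def build_series_members_query(series_iri: str) -> str:
    """Build a SPARQL query listing download URLs for members of a dataset series.

    Assumes DCAT 3 style modelling: member datasets point to the series via
    dcat:inSeries. The series IRI is supplied by the caller.
    """
    # Reject characters that would break out of the <...> IRI syntax.
    if any(ch in series_iri for ch in '<>" \n'):
        raise ValueError("series_iri contains characters not allowed in an IRI")

    return f"""
PREFIX dcat: <http://www.w3.org/ns/dcat#>

SELECT ?dataset ?downloadURL
WHERE {{
  ?dataset dcat:inSeries <{series_iri}> ;
           dcat:distribution ?distribution .
  ?distribution dcat:downloadURL ?downloadURL .
}}
ORDER BY ?dataset
""".strip()


# Hypothetical series IRI, for illustration only.
query = build_series_members_query("https://example.org/series/daily-measurements")
print(query)
```

The resulting string can be sent to whatever SPARQL endpoint the catalog offers, and the returned `?downloadURL` values can then be fetched one by one.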

matthiaspalmer · 2023-09-05T14:21:44Z

Ok, thanks for the example, I did not expect that!

Still, I think it is important to clarify that this may happen. @jakubklimek, have you checked whether this is handled gracefully by data.europa.eu?

jakubklimek · 2023-09-05T14:28:07Z

Actually, I don't see dataset series handled by data.europa.eu at all, for example this dataset series looks like an empty dataset 🤷🏻‍♂️

bertvannuffelen · 2023-10-23T15:37:39Z

So far there are no restrictions on the size or organisation of a DatasetSeries.
Also, the community has not requested a specific restriction that must be followed.

So unless there are specific requests I would not restrict sizes.

bertvannuffelen · 2024-02-01T14:44:58Z

During the webinar on 21 Nov 2023, the working group agreed not to impose any restrictions of this kind.

bertvannuffelen added status:resolution-proposed release:3.0.0 https://semiceu.github.io/DCAT-AP/releases/3.0.0 labels Oct 23, 2023

bertvannuffelen added status:fixed This issue has been fixed in a draft. and removed status:resolution-proposed labels Feb 1, 2024
