Avoid having too many datasets in a dataset series #276

Open
matthiaspalmer opened this issue Sep 5, 2023 · 5 comments
Labels
release:3.0.0 (https://semiceu.github.io/DCAT-AP/releases/3.0.0)
status:fixed (This issue has been fixed in a draft.)

Comments

@matthiaspalmer

matthiaspalmer commented Sep 5, 2023

It is highly likely that data portals will assume that the datasets in a dataset series can be listed directly, i.e. that there are only a few of them. That means there won't be a searchable UI for finding datasets inside a dataset series. This assumption is supported by scenarios such as one dataset being added to a dataset series per year. If this assumption is wrong, the specification should say so, at least via an example.

We believe this assumption should be supported by the specification by providing guidance on how to solve situations where there is a drift towards providing many more datasets in a series. For example:

If you find that you need to create hundreds of datasets in a dataset series, it would be better to consider other strategies, e.g.:

  1. Provide the downloadable files as a single distribution that points to a Web-accessible folder or a zip file.
  2. Provide access via a data service.
  3. Introduce a hierarchy of dataset series.

Note that data portals will likely hide datasets that are part of a dataset series and only give access to them via their dataset series.

@jakubklimek
Contributor

jakubklimek commented Sep 5, 2023

For example, the Czech catalog presumes there can be a large number of datasets in the series and has a search functionality through the members (basically, it filters all datasets for the ones in the series and re-uses the normal faceted search functionality).

So, I would say it should be expected that there can be as many datasets in a series as there are datasets in a catalog.

A specific example is a daily time series running for multiple years. Downloading all files is trivial, as you can search for them in the catalog using SPARQL and then download them.
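
A minimal sketch of what such a lookup could look like, assuming the catalog links members to their series with the DCAT 3 property `dcat:inSeries` and exposes `dcat:downloadURL` on distributions. The helper name `build_series_members_query` and the series IRI are hypothetical, and the thread does not name a SPARQL endpoint, so the sketch only builds the query text:

```python
def build_series_members_query(series_iri: str) -> str:
    """Build a SPARQL query listing the datasets in a series and their download URLs.

    Assumes members point to the series via dcat:inSeries (DCAT 3) and that
    distributions carry dcat:downloadURL. Both are assumptions about the catalog.
    """
    # Reject characters that are not allowed inside a SPARQL IRI reference,
    # so the IRI cannot break out of the <...> delimiters.
    forbidden = '<>"{}|^`\\'
    if any(ch in forbidden or ch.isspace() for ch in series_iri):
        raise ValueError(f"not a usable IRI: {series_iri!r}")

    return f"""PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT ?dataset ?downloadURL WHERE {{
  ?dataset dcat:inSeries <{series_iri}> ;
           dcat:distribution ?distribution .
  ?distribution dcat:downloadURL ?downloadURL .
}}
ORDER BY ?dataset"""


if __name__ == "__main__":
    # Hypothetical series IRI, for illustration only.
    print(build_series_members_query("https://example.org/series/daily-measurements"))
```

The resulting query text can be sent to whatever SPARQL endpoint the catalog offers; each returned `?downloadURL` can then be fetched with any HTTP client.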

@matthiaspalmer
Author

Ok, thanks for the example, I did not expect that!

Still, I think it is important to clarify that this may happen. @jakubklimek, have you checked whether this is handled gracefully by data.europa.eu?

@jakubklimek
Contributor

Actually, I don't see dataset series handled by data.europa.eu at all. For example, this dataset series looks like an empty dataset 🤷🏻‍♂️

@bertvannuffelen
Contributor

So far there are no restrictions on the size or organisation of a DatasetSeries.
Also, the community has not requested a specific restriction that must be followed.

So, unless there are specific requests, I would not restrict sizes.

bertvannuffelen added the status:resolution-proposed and release:3.0.0 labels on Oct 23, 2023
@bertvannuffelen
Contributor

bertvannuffelen commented Feb 1, 2024

During the webinar on 21 Nov 2023, the working group agreed not to impose any restrictions of this kind.

bertvannuffelen added the status:fixed label and removed the status:resolution-proposed label on Feb 1, 2024

3 participants