Performance issues when uploading files to a dataset containing thousands of files #9557

Closed
ErykKul opened this issue Apr 25, 2023 · 0 comments · Fixed by #9558


ErykKul commented Apr 25, 2023

What steps does it take to reproduce the issue?
Upload a few thousand files (e.g., 2.5 MB each) to a dataset, then try adding one more file to it with the API (e.g., Readme.md). Also try adding multiple files one by one with the API to the same dataset. Each upload takes minutes instead of seconds (on a single-machine deployment, e.g., the Docker deployment with Solr, the database, and Dataverse on the same machine).
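
To make the slowdown visible, each upload can be timed individually. The following is only a sketch of such a harness: `time_uploads` and the `upload` callable are hypothetical names, and the actual API call that adds a file to the dataset is deliberately left to the caller rather than shown here.

```python
import time
from typing import Callable, List


def time_uploads(upload: Callable[[str], None], names: List[str]) -> List[float]:
    """Call upload(name) for each name, one by one, and return the
    wall-clock duration of each call in seconds.

    `upload` is a caller-supplied function (hypothetical here) that is
    expected to block until the server has returned its HTTP response.
    """
    durations = []
    for name in names:
        start = time.perf_counter()
        upload(name)
        durations.append(time.perf_counter() - start)
    return durations
```

If the durations grow as more files are added to the same dataset, that matches the behavior reported here.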

  • When does this issue occur?
    It happens for all update operations for datasets containing thousands of files.

  • Which page(s) does it occur on?
    I have tested it only with the API. The UI experience may differ, but it may also be the same because the update command is reused (I did not test that in the UI).

  • What happens?
    The operation waits for the dataset to be indexed before returning an HTTP response. Indexing takes longer each time a file is added (it is a cumulative problem). When the dataset already contains many files, each update operation becomes very slow.

  • To whom does it occur (all users, curators, superusers)?
    All users.

  • What did you expect to happen?
    I would expect linear behavior overall: uploading a file to a dataset (or any other update operation) should take about the same time regardless of the number of files already present in the dataset.
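
The gap between the observed and the expected behavior can be illustrated with a toy cost model. This is a sketch under a stated assumption, not a measurement: it assumes that each update re-indexes every file in the dataset and that indexing cost is proportional to the file count. The function names and the `cost_per_file` and `cost_per_upload` parameters are hypothetical.

```python
def total_cost_reindex_all(n_uploads: int, existing: int = 0,
                           cost_per_file: float = 1.0) -> float:
    """Total cost when every upload re-indexes all files in the dataset
    (assumed model: cost proportional to the file count after the upload)."""
    return sum(cost_per_file * (existing + i) for i in range(1, n_uploads + 1))


def total_cost_constant(n_uploads: int, cost_per_upload: float = 1.0) -> float:
    """Total cost when each upload costs the same, regardless of dataset size."""
    return cost_per_upload * n_uploads
```

Under this model the total cost of adding files one by one grows quadratically with the number of uploads, whereas a constant per-upload cost gives the linear total described above.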

Which version of Dataverse are you using?
5.13
