Performance issues when uploading files to a dataset containing thousands of files #9557

ErykKul · 2023-04-25T09:41:04Z

What steps does it take to reproduce the issue?
Upload a few thousand files (e.g., 2.5 MB each) to a dataset, then try adding one more file (e.g., Readme.md) to it with the API. Also, try adding multiple files one by one with the API to the same dataset. Each upload takes minutes instead of seconds (on a single-machine deployment, e.g., the Docker deployment with Solr, the database, and Dataverse on the same machine).
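
A minimal sketch of how the per-upload time could be measured. It assumes the native API's add-file endpoint (`POST /api/datasets/:persistentId/add`) and the `X-Dataverse-key` header. The helper names `build_add_url`, `time_uploads`, and `upload_with_requests` are hypothetical, and the server URL, persistent ID, and API token must be supplied by the caller. The upload function is passed in as an argument, so the timing loop can be tried without a running server.

```python
import time
from urllib.parse import quote


def build_add_url(base_url, persistent_id):
    """Return the add-file URL for a dataset identified by its persistent ID."""
    return (
        f"{base_url.rstrip('/')}/api/datasets/:persistentId/add"
        f"?persistentId={quote(persistent_id, safe='')}"
    )


def time_uploads(upload, file_names):
    """Call upload(name) for each file and return (name, seconds) pairs."""
    timings = []
    for name in file_names:
        start = time.perf_counter()
        upload(name)
        timings.append((name, time.perf_counter() - start))
    return timings


def upload_with_requests(url, api_token, path):
    """Upload one file; requires the third-party `requests` package."""
    import requests  # imported lazily so the helpers above work without it

    with open(path, "rb") as handle:
        response = requests.post(
            url,
            headers={"X-Dataverse-key": api_token},
            files={"file": handle},
            timeout=3600,  # generous timeout: slow uploads are what we measure
        )
    response.raise_for_status()
    return response
```

Calling `time_uploads(lambda p: upload_with_requests(url, token, p), paths)` against a dataset that already holds a few thousand files should show whether the seconds per upload keep growing as files are added.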

When does this issue occur?
It happens for all update operations for datasets containing thousands of files.
Which page(s) does it occur on?
I tested this only with the API. The UI experience may be different, but it may also be the same because the UI reuses the same update command (I did not test that in the UI).
What happens?
The operation waits for the dataset to be indexed before returning an HTTP response. Indexing takes longer each time a file is added (it is a cumulative problem). When there are already many files in the dataset, the problem gets very bad.
To whom does it occur (all users, curators, superusers)?
All users.
What did you expect to happen?
I would expect linear behavior overall: uploading a file to a dataset (or any other update operation) should take the same time, regardless of the number of files already present in the dataset.
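
To illustrate why the problem is cumulative, here is a toy cost model, not a measurement. It assumes that each upload waits for a reindex whose cost grows in proportion to the number of files in the dataset, which is a simplification of the behavior described above. The function names and unit costs are hypothetical.

```python
def total_time_reindex_all(n_files, cost_per_indexed_file=1.0):
    """Total time when upload k waits for all k files to be reindexed."""
    return sum(k * cost_per_indexed_file for k in range(1, n_files + 1))


def total_time_constant(n_files, cost_per_upload=1.0):
    """Total time when every upload costs the same (the expected behavior)."""
    return n_files * cost_per_upload


# Compare the two models for growing dataset sizes.
for n in (10, 100, 1000):
    print(n, total_time_reindex_all(n), total_time_constant(n))
```

Under this assumption the total time for n one-by-one uploads grows as n(n+1)/2 rather than n, so the gap between observed and expected behavior widens quickly as the dataset grows.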

Which version of Dataverse are you using?
5.13

ErykKul mentioned this issue Apr 25, 2023

async indexing after update command #9558

Merged

kcondon closed this as completed in #9558 Jun 15, 2023

pdurbin added this to the 5.14 milestone Jun 15, 2023

ErykKul mentioned this issue Jun 28, 2023

Large amount of queries when getting a dataset with API #9683

Closed

pdurbin added the Feature: Performance & Stability label Sep 25, 2023
