8 hours to index a dataset with thousands of files #8097

Closed
pdurbin opened this issue Sep 15, 2021 · 1 comment · Fixed by #8152

Comments

@pdurbin
Member

pdurbin commented Sep 15, 2021

At standup this morning we talked about a dataset that takes eight hours to index (as part of "index all"). It has 25 thousand files: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/3CTMKP

More context:

@pdurbin pdurbin changed the title six hours to index a dataset with thousands of files 8 hours to index a dataset with thousands of files Sep 16, 2021
@PaulBoon
Contributor

PaulBoon commented Sep 22, 2021

The same thing happened to me with a dataset that has 33,870 files, while upgrading from 4.20 to 5.6 with a new Solr version.
Because the logs do not report any progress while such a dataset is being indexed, you end up indexing the remaining datasets 'manually'; you can discover them with curl http://localhost:8080/api/admin/index/status.
Otherwise users won't see those other datasets in the GUI.

Maybe we should change the indexing order and end with the datasets that have many files.
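
To make the reordering suggestion above concrete, here is a minimal sketch, not the actual "index all" implementation. It assumes the file count of each dataset is already known before indexing starts. `DatasetInfo` and `indexing_order` are hypothetical names made up for illustration, and the sample ids and counts are illustrative only.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class DatasetInfo:
    """Hypothetical record: a dataset id plus the number of files it holds."""
    dataset_id: int
    file_count: int


def indexing_order(datasets: Iterable[DatasetInfo]) -> List[DatasetInfo]:
    """Return datasets sorted so that those with the fewest files come first.

    Ties are broken by dataset id so the order is deterministic.
    """
    return sorted(datasets, key=lambda d: (d.file_count, d.dataset_id))


if __name__ == "__main__":
    # Illustrative values only; the ids are made up.
    sample = [
        DatasetInfo(dataset_id=1, file_count=33870),
        DatasetInfo(dataset_id=2, file_count=12),
        DatasetInfo(dataset_id=3, file_count=25000),
        DatasetInfo(dataset_id=4, file_count=12),
    ]
    for d in indexing_order(sample):
        print(d.dataset_id, d.file_count)
```

With an order like this, the many small datasets become searchable early, and the few very large ones only delay themselves at the end of the run.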

@sekmiller sekmiller moved this from Up Next 🛎 to IQSS Team - In Progress 💻 in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) Sep 23, 2021
@sekmiller sekmiller self-assigned this Sep 23, 2021
sekmiller added a commit that referenced this issue Sep 28, 2021
sekmiller added a commit that referenced this issue Oct 7, 2021
sekmiller added a commit that referenced this issue Oct 8, 2021
sekmiller added a commit that referenced this issue Oct 13, 2021
remove debug code
sekmiller added a commit that referenced this issue Oct 14, 2021
sekmiller added a commit that referenced this issue Oct 14, 2021
@sekmiller sekmiller moved this from IQSS Team - In Progress 💻 to Review 🦁 in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) Oct 14, 2021
@sekmiller sekmiller removed their assignment Oct 14, 2021
sekmiller added a commit that referenced this issue Oct 28, 2021
sekmiller added a commit that referenced this issue Nov 9, 2021
landreev added a commit that referenced this issue Nov 29, 2021