
Unable to download certain guestbooks, possibly related to number of responses (~150,000 responses) #7767

Closed
jggautier opened this issue Apr 5, 2021 · 2 comments · Fixed by #7931

@jggautier
Contributor

jggautier commented Apr 5, 2021

In the Harvard Dataverse Repository, visiting certain Dataverse collections' Manage Guestbook pages and trying to download all responses results in a "504 Gateway Time-out". Harvard Dataverse Repository is running version 5.3 as of this writing.

I suspect the issue is related to the number of responses, with the threshold somewhere around 150,000, because I was able to download guestbooks in collections with fewer than 150,000 responses (e.g., Gary King Dataverse and Patent Network Dataverse, whose guestbooks each have almost 130,000 responses).

But I got the error when I tried to download guestbooks that have over 150,000 responses, like:

  • "A multi-source dataset of urban life in the city of Milan and the Province of Trentino Dataverse" with 162,702 responses
  • "OSMnx Street Networks Dataverse" with 167,218 responses

Please feel free to close this GitHub issue if it's already a known problem; I couldn't find it documented anywhere. There's a 2017 issue, "Support Large Guestbooks" (#3609), but it seems to be mostly about downloading responses after first displaying them in the browser (by clicking the "View Responses" button for a particular user-made guestbook). It doesn't seem that, back then, this problem occurred when clicking "Download All Responses" for Dataverse collections with more than a certain number of responses.

@jggautier
Contributor Author

One possible wrinkle with this theory: the admin of one Dataverse collection tells me that as recently as last month he was able to download the responses for a collection with over 400,000 responses, even though that guestbook has had more than 150,000 responses since late May 2020.

@djbrooke
Contributor

djbrooke commented May 5, 2021

  • Optimization is related to the API/download and not to display in the application. Backend optimizations should be the priority here; we will need to revisit this if there are front-end changes.
  • We've already optimized the DB queries here, so should we investigate caching or pagination (e.g., the DataCite get relationships API) as a way to return all the information?
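To illustrate the pagination idea in the second bullet, here is a minimal sketch, assuming a hypothetical `guestbookresponse` table with the columns shown (this is not Dataverse's actual schema, and SQLite stands in for the real database). It fetches responses in fixed-size batches using keyset pagination and writes each batch to CSV as it goes, so the export never holds all ~150,000 rows in memory at once. The names `iter_response_pages`, `stream_responses_csv`, `make_demo_db`, and `PAGE_SIZE` are illustrative only. Streaming lets the first bytes reach the client early instead of after the whole file is built. Whether that is enough to avoid the 504 probably depends on how the proxy in front of the application is configured.

```python
import csv
import io
import sqlite3
from typing import Dict, Iterator, List, TextIO, Tuple

PAGE_SIZE = 1000  # hypothetical batch size; would need tuning on a real database

COLUMNS = ["id", "email", "dataset", "responsetime"]  # hypothetical columns


def iter_response_pages(conn: sqlite3.Connection, dataverse_id: int,
                        page_size: int = PAGE_SIZE) -> Iterator[List[Tuple]]:
    """Yield one collection's guestbook responses in pages (keyset pagination).

    Each query resumes after the last id already seen instead of using OFFSET,
    so later pages are not slower than earlier ones. Assumes positive ids.
    """
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT id, email, dataset, responsetime FROM guestbookresponse "
            "WHERE dataverse_id = ? AND id > ? ORDER BY id LIMIT ?",
            (dataverse_id, last_id, page_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]  # resume point for the next page


def stream_responses_csv(conn: sqlite3.Connection, dataverse_id: int,
                         out: TextIO, page_size: int = PAGE_SIZE) -> int:
    """Write a header plus every response to `out`, page by page; return the row count."""
    writer = csv.writer(out)
    writer.writerow(COLUMNS)
    count = 0
    for page in iter_response_pages(conn, dataverse_id, page_size):
        writer.writerows(page)  # only one page is held in memory at a time
        count += len(page)
    return count


def make_demo_db(counts: Dict[int, int]) -> sqlite3.Connection:
    """Build an in-memory demo table with `counts[dataverse_id]` fake responses each."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE guestbookresponse (id INTEGER PRIMARY KEY, "
        "dataverse_id INTEGER, email TEXT, dataset TEXT, responsetime TEXT)"
    )
    rows = [
        (dv, f"user{i}@example.org", f"doi:10.5072/FK2/{dv}-{i % 7}", "2021-04-05")
        for dv, n in counts.items()
        for i in range(n)
    ]
    conn.executemany(
        "INSERT INTO guestbookresponse (dataverse_id, email, dataset, responsetime) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    return conn


if __name__ == "__main__":
    demo = make_demo_db({1: 2500, 2: 300})
    buf = io.StringIO()
    print(stream_responses_csv(demo, 1, buf))
```

The sketch uses keyset pagination (`WHERE id > last_id`) rather than `LIMIT ... OFFSET`, because with OFFSET the database typically has to walk past all skipped rows on every page, so the last pages of a very large guestbook would be the slowest. In a real deployment, the `out` stream would be the HTTP response body rather than a `StringIO`.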

@djbrooke djbrooke added the Medium label May 5, 2021
@sekmiller sekmiller self-assigned this May 19, 2021
sekmiller added a commit that referenced this issue Jun 2, 2021
sekmiller added a commit that referenced this issue Jun 2, 2021
sekmiller added a commit that referenced this issue Jun 3, 2021
sekmiller added a commit that referenced this issue Jun 8, 2021
sekmiller added a commit that referenced this issue Jun 8, 2021
sekmiller added a commit that referenced this issue Jun 23, 2021
3 participants