Pagination beyond a few results returns 502 gateway error #237

Closed
richardhallett opened this issue Apr 10, 2019 · 7 comments

@richardhallett
Contributor

richardhallett commented Apr 10, 2019

Summary:
Paging through result sets by following the "next" link, with either page numbers or cursors, produces an incomplete response and a 502 error in the logs.

Reproduce:
Example URL:
https://api.datacite.org/dois?page%5Bcursor%5D=1212942&page%5Bsize%5D=500

Or attempt to page through several pages following next links.
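
The second reproduction path can be sketched as a small script. This is only a sketch, assuming the API returns a JSON body with a `links.next` URL on each page; the names `fetch_json` and `follow_next_links` are hypothetical and introduced here for illustration.

```python
import http.client
import json
import urllib.request


def fetch_json(url, timeout=60):
    """Fetch a URL and decode its JSON body (raises on HTTP errors such as 502)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def follow_next_links(start_url, fetch=fetch_json, max_pages=100):
    """Follow `links.next` starting from start_url.

    Returns (pages_read, failing_url, error). failing_url and error are None
    when paging stops without an error (no next link, or max_pages reached).
    """
    url = start_url
    pages = 0
    while url and pages < max_pages:
        try:
            body = fetch(url)
        except (OSError, http.client.HTTPException, ValueError) as exc:
            # OSError covers HTTP errors and timeouts; ValueError covers
            # a truncated body that no longer parses as JSON.
            return pages, url, exc
        pages += 1
        # Assumption: the next page URL lives under links.next.
        url = (body.get("links") or {}).get("next")
    return pages, None, None
```

Passing the example URL above as `start_url` would report how many pages were read before a request failed, along with the URL that failed.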

Possible reason:
The request takes too long to respond, and the load balancer is the component that returns the 502. This would not be an actual error in Lupo but a consequence of the poor performance of certain queries.

@richardhallett
Contributor Author

So far this is proving difficult to reproduce outside a production environment, and it appears only in some situations (beyond the first example).

One possibility:
The ES docs (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-search-after.html#CO58-1) suggest that using a cursor based on _id might cause high memory usage for the sort.
A solution is to sort on another unique, incremental field, either a copy of _id or perhaps a timestamp.
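
A sketch of what such a request body might look like, assuming a hypothetical field `uid_copy` that holds a unique value per document and supports sorting. The helper name and the field name are illustrative and not taken from Lupo.

```python
def build_search_after_body(size, sort_field="uid_copy", last_sort_value=None):
    """Build an Elasticsearch search body that pages with search_after.

    sort_field is assumed to be unique per document, so it can act as the
    only sort key. last_sort_value is the sort value of the final hit on
    the previous page, or None for the first page.
    """
    body = {
        "size": size,
        "query": {"match_all": {}},
        "sort": [{sort_field: "asc"}],
    }
    if last_sort_value is not None:
        # search_after takes one value per sort key, in sort order.
        body["search_after"] = [last_sort_value]
    return body
```

The first page omits `search_after`; each later page passes the sort value of the last hit from the page before it.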

@richardhallett
Contributor Author

Another possibility might be the general performance of the dois endpoint.
One possible speedup is to remove the ability check for including landing page results in the DOI serializer.

@richardhallett
Contributor Author

Further performance investigation and logging on the live server suggest that the Elasticsearch request takes a significant amount of time:

[Benchmark] Elasticsearch request 7512 ms

Although this log line suggests the request does complete, a large result set might take the serializer too long to render, possibly in combination with the individual ability checks for the landing page.

Next steps: investigate the ES query that is taking the time and, if possible, the result set it retrieves.
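
One way to separate query time from rendering time is to time the two phases independently and record the size of what each produces. The sketch below is generic and not Lupo code; `search` and `serialize` are hypothetical callables standing in for the Elasticsearch request and the serializer.

```python
import time


def timed(label, fn, *args, **kwargs):
    """Call fn and return (result, timing dict with elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, {"label": label, "ms": elapsed_ms}


def profile_request(search, serialize, params):
    """Time the search and serialize phases separately.

    search(params) is assumed to return a list of hits, and
    serialize(hits) is assumed to return the response body as a string.
    """
    hits, search_timing = timed("search", search, params)
    payload, serialize_timing = timed("serialize", serialize, hits)
    return {
        "hits": len(hits),
        "payload_bytes": len(payload.encode("utf-8")),
        "timings": [search_timing, serialize_timing],
    }
```

Comparing the two timings alongside the hit count and payload size would show whether a slow request is dominated by the query or by rendering a large result set.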

@richardhallett
Contributor Author

The error that is returned actually looks like it is coming from the Passenger web server. Various posts suggest that this can be caused by running out of memory, which would match the theory above that too much data is being serialised in memory.

@richardhallett
Contributor Author

This has been improved by the fix in #307, but there still appears to be a problem paging through all results.

@richardhallett
Contributor Author

This continues to be an issue with deep paging through large record sets, as users have reported; see datacite/datacite#851.
It appears when working beyond around 6 million records. It might be worth specifically investigating date periods, however, to confirm that the cause is deep paging and not older record sets.

Continued investigation is required.

@mfenner
Contributor

mfenner commented Dec 13, 2019

Made a number of improvements, including more memory for the REST API docker container (up from 4096 to 8192), a fix in cursor pagination (datacite/datacite#897), and a fix of the affiliation facet (datacite/datacite#898).

The cursor uses the uid field, not _id.

mfenner closed this as completed Dec 13, 2019