Only return aggregations on the first page with scroll and forbidden with scan #7497
For reference, I plan to add documentation for this, but first I'm looking for feedback on whether this is the correct approach.
@jpountz This approach looks good to me. I think this issue should be marked as breaking as well, since for normal scroll we previously included the aggs in subsequent responses too.
There is one open question about the behavior with search_type=scan: scan works by collecting just enough documents on each round, while aggs work by collecting all documents in one go. So I don't think there is any way around collecting matches twice (once for aggs and once for the scan). So potentially there are two options:
I like the fact that the first option doesn't do any magic and makes sure that scan is cheap in all cases. On the other hand, option 2 could make scan/scroll more consistent with normal scroll (aggs returned in the first page and ignored in subsequent pages). What do you think?
Side note: assuming option 1 is implemented, option 2 could be done on the client side with negligible overhead (one extra round trip) by first running a count request to compute aggs and then a scan request to get hits back.
As a directly interested client :), I'd like to understand how the two requests would be an alternative to 2 (ideally considering the
I personally like the fact that
@costin I think you can use two requests: one of type SCAN that only fetches hits, and another of type COUNT that computes aggregations. I don't think the preference API raises particular issues: if you use it to go to particular shards, that actually acts as a filter on the documents that match, so I think that is fine?
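The two-request approach described above could be sketched roughly as follows. This is only an illustration, not code from this PR: the helper name `build_count_then_scan`, the index name, the `scroll` and `size` defaults, and the example aggregation are all assumptions, and the sketch only builds request descriptions instead of sending them to a cluster.

```python
import json
from urllib.parse import urlencode


def build_count_then_scan(index, query, aggs, scroll="1m", size=100):
    """Return (count_request, scan_request) as plain dicts; nothing is sent.

    Hypothetical helper: the names and defaults are illustrative assumptions.
    """
    # Request 1: search_type=count computes the aggregations and returns no hits.
    count_request = {
        "method": "POST",
        "path": "/%s/_search?%s" % (index, urlencode({"search_type": "count"})),
        "body": json.dumps({"query": query, "aggs": aggs}, sort_keys=True),
    }
    # Request 2: search_type=scan opens a scroll that only fetches hits,
    # so its body deliberately carries no aggregations.
    scan_params = [("search_type", "scan"), ("scroll", scroll), ("size", size)]
    scan_request = {
        "method": "POST",
        "path": "/%s/_search?%s" % (index, urlencode(scan_params)),
        "body": json.dumps({"query": query}, sort_keys=True),
    }
    return count_request, scan_request


# Example with a hypothetical index and aggregation.
count_req, scan_req = build_count_then_scan(
    "logs",
    {"match_all": {}},
    {"by_host": {"terms": {"field": "host"}}},
)
print(count_req["path"])
print(scan_req["path"])
```

Both requests share the same query, so the aggregations from the first request describe the same set of matches that the scan then pages through, assuming the index does not change between the two calls.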
I'm happy with scan not supporting aggs, and scroll-without-scan just returning aggs on the first request.
Aggregations are collection-wide statistics so they would always be the same. In order to save CPU/bandwidth, we can just return them on the first page. Same as elastic#1642 but for aggregations.
…_type=SCAN. Aggregations are collection-wide statistics, which is incompatible with the collection mode of search_type=SCAN since it doesn't collect all matches on calls to the search API. Close elastic#7429
b2852b9 to fc0748f
Aggregations are collection-wide statistics over one or several indices, so:
- scan should not be allowed to use aggregations since it never collects all documents at once,
- scroll should only return aggregations on the first page (see Facets incorrect when scrolling #1642).
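From a client's point of view, the scroll behavior proposed above means aggregations should be read from the first page only. A minimal sketch, assuming each page is a dict shaped like a search response; the helper name `collect_scroll` and the sample pages are hypothetical, not taken from this PR:

```python
def collect_scroll(pages):
    """Consume an iterable of scroll responses and return (aggregations, hits)."""
    aggregations = None
    hits = []
    for page_number, page in enumerate(pages):
        if page_number == 0:
            # Only the first page is expected to carry aggregations.
            aggregations = page.get("aggregations")
        hits.extend(page["hits"]["hits"])
    return aggregations, hits


# Hypothetical responses: page 1 has aggregations, page 2 does not.
pages = [
    {"hits": {"hits": [{"_id": "1"}]}, "aggregations": {"total": {"value": 2}}},
    {"hits": {"hits": [{"_id": "2"}]}},
]
aggregations, hits = collect_scroll(pages)
```

Clients that previously read aggregations from later scroll pages would need to keep the value from the first page instead, which is why the change was suggested to be marked as breaking.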