Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only return aggregations on the first page with scroll and forbidden with scan #7497

Closed
wants to merge 2 commits into from

Conversation

jpountz
Copy link
Contributor

@jpountz jpountz commented Aug 28, 2014

Aggregations are collection-wide statistics over one or several indices so:

  • scan should not be allowed to use aggregations since it never collects all documents at once,
  • scroll should only return aggregations on the first page (see Facets incorrect when scrolling #1642)

@jpountz jpountz added the review label Aug 28, 2014
@jpountz
Copy link
Contributor Author

jpountz commented Aug 28, 2014

For reference, I plan to add documentation for it, but I'm first looking for feedback to know whether this is the correct approach.

@martijnvg
Copy link
Member

@jpountz This approach looks good to me. I think this issue should be marked as breaking as well, since for normal scroll we did include the aggs also in subsequent responses.

@jpountz
Copy link
Contributor Author

jpountz commented Aug 28, 2014

There is one open question about the behavior with search_type=scan: scan works by collecting just enough documents on each round while aggs work by collecting all documents in one go. So I don't think there is some way around collecting matches twice (once for aggs and once for the scan). So potentially there are two options:

  1. prevent the usage of aggs with scan (what the current PR does)
  2. or collect matches twice: once in the first call purely for aggregations, and then in a streaming fashion for the scan.

I like the fact that the first option doesn't do any magic and makes sure that scan is cheap in all cases but on the other hand, 2. could make scan/scroll more consistent with normal scroll (aggs returned in the first page and ignored in subsequent pages).

What do you think?

@jpountz
Copy link
Contributor Author

jpountz commented Aug 28, 2014

Side note: assuming 1. is implemented, 2. could be done from client side with negligible overhead (one round trip) by running first a count request to compute aggs and then a scan request to get hits back.

@costin
Copy link
Member

costin commented Aug 28, 2014

As a directly interested client :), I'll like to understand how the two requests would be an alternative to 2 (ideally considering the preference api as well).

@s1monw
Copy link
Contributor

s1monw commented Aug 29, 2014

I personally like the fact that scan is simply a stream and I think we should keep it that way. IMO you can do the entire request in 2 steps (one for aggs and one for scan) this essentially means that you can potentially get slightly outdated aggregations but I think that is an acceptable solution. Scan requests should not support aggregations at all.

@jpountz
Copy link
Contributor Author

jpountz commented Aug 29, 2014

@costin I think you can have two requests by having one that is a SCAN and only fetches hits while another query would be of type COUNT and would compute aggregations. I don't think the preference API raises particular issues: if you use it to go to particular shards, that actually acts as a filter on the documents that match, so I think that is fine?

@clintongormley
Copy link

I'm happy with scan not supporting aggs, and scroll-without-scan just returning aggs on the first request.

@jpountz jpountz added v2.0.0 and removed discuss labels Sep 3, 2014
Aggregations are collection-wide statistics so they would always be the same.
In order to save CPU/bandwidth, we can just return them on the first page.

Same as elastic#1642 but for aggregations.
…_type=SCAN.

Aggregations are collection-wide statistics, which is incompatible with the
collection mode of search_type=SCAN since it doesn't collect all matches on
calls to the search API.

Close elastic#7429
@jpountz jpountz added enhancement and removed review labels Sep 3, 2014
@jpountz jpountz closed this Sep 3, 2014
@jpountz jpountz changed the title Aggregations: fix behavior when used in conjunction with scroll or scan Aggregations: fixed behavior when used in conjunction with scroll or scan Sep 3, 2014
@jpountz jpountz changed the title Aggregations: fixed behavior when used in conjunction with scroll or scan Aggregations: only returned on the first page with scroll and forbidden with scan Sep 3, 2014
@clintongormley clintongormley changed the title Aggregations: only returned on the first page with scroll and forbidden with scan Aggregations: Only returned on the first page with scroll and forbidden with scan Sep 11, 2014
@clintongormley clintongormley changed the title Aggregations: Only returned on the first page with scroll and forbidden with scan Only return aggregations on the first page with scroll and forbidden with scan Jun 6, 2015
@clintongormley clintongormley added :Search/Search Search-related issues that do not fall into other categories and removed :Analytics/Aggregations Aggregations labels Jun 6, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>breaking >enhancement :Search/Search Search-related issues that do not fall into other categories v1.4.0.Beta1 v2.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants