Return the sum of the doc counts of other buckets in terms aggregations. #8213
This commit adds a new field to the response of the terms aggregation called `sum_other_doc_count`, which is equal to the sum of the doc counts of the buckets that did not make it to the list of top buckets. It is typically useful to have a sector called e.g. `other` when using terms aggregations to build pie charts.

Example query and response:

```json
GET test/_search?search_type=count
{
  "aggs": {
    "colors": {
      "terms": {
        "field": "color",
        "size": 3
      }
    }
  }
}
```

```json
{
  [...],
  "aggregations": {
    "colors": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 4,
      "buckets": [
        { "key": "blue", "doc_count": 65 },
        { "key": "red", "doc_count": 14 },
        { "key": "brown", "doc_count": 3 }
      ]
    }
  }
}
```
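To illustrate the pie chart use case, here is a minimal sketch in plain Java of how a client might turn the response above into chart sectors. The class `PieSectors` and the method `withOther` are hypothetical names for illustration, not part of Elasticsearch or its client API; the bucket values are copied from the example response.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PieSectors {
    // Copies the top buckets and appends one "other" sector holding
    // sum_other_doc_count. Assumption: no real term is named "other";
    // a real client would need to pick a label that cannot collide.
    static Map<String, Long> withOther(Map<String, Long> topBuckets, long sumOtherDocCount) {
        Map<String, Long> sectors = new LinkedHashMap<>(topBuckets);
        if (sumOtherDocCount > 0) {
            sectors.put("other", sumOtherDocCount);
        }
        return sectors;
    }

    public static void main(String[] args) {
        // Values taken from the example response.
        Map<String, Long> top = new LinkedHashMap<>();
        top.put("blue", 65L);
        top.put("red", 14L);
        top.put("brown", 3L);

        Map<String, Long> sectors = withOther(top, 4L);
        long total = sectors.values().stream().mapToLong(Long::longValue).sum();
        for (Map.Entry<String, Long> e : sectors.entrySet()) {
            System.out.printf("%s: %d (%.1f%%)%n",
                e.getKey(), e.getValue(), 100.0 * e.getValue() / total);
        }
    }
}
```

The percentages here are computed against the sum of all doc counts (top buckets plus other), which is not necessarily the number of hits when the field is multi-valued.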
If we go with a sum of counts rather than a bucket approach, could we configure this field to sum not just counts but also values in another field? Then, if we build a chart of sales $, the `other` segment in your example would show the total of other sales $ rather than a count.
@roytmana We want to see how we can make this feature more flexible in 2.0 (e.g. with sub aggregations), but it is challenging. For now (1.4), we only plan to return document counts.
@jpountz Thanks, that makes sense. I should not be too greedy :-) The count is helpful. Will the missing bucket be implemented in upcoming 1.x versions?
@roytmana Missing is also something we are thinking about. The issue is a bit more complicated than "other" because there are more options: should we add a bucket for missing, allow replacing the missing value with arbitrary values (like sorting does), or both? And also, how do we implement it in a unified way across all aggregations, since this is something that would make sense on nearly all aggregations?
assertSearchResponse(resp);
terms = resp.getAggregations().get("terms");
assertEquals(Math.min(size, totalNumTerms), terms.getBuckets().size());
assertEquals(sumOfDocCounts, sumOfDocCounts(terms));
Should we not also check that `terms.getSumOtherDocCounts() == totalHits - sumOfDocCounts`?
It might not work in the multivalued case: imagine that your size is 1 and you have 2 documents that have colors:
doc1: red, blue
doc2: red
The number of hits would be 2, the sum of doc counts would be 2 (counts for red) and the sum of other doc counts would be 1 (blue) so it doesn't match.
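The two-document case above can be reproduced with a small standalone simulation. This is only a sketch in plain Java, not Elasticsearch code: `MultiValuedCounts`, `docCounts`, and `topAndOther` are hypothetical helpers that count documents per term the way a terms aggregation does on a multi-valued field, then split the counts into the top `size` buckets and the rest.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class MultiValuedCounts {
    // Doc count per term: a document counts once for every distinct term it holds.
    static Map<String, Long> docCounts(List<Set<String>> docs) {
        Map<String, Long> counts = new TreeMap<>();
        for (Set<String> doc : docs) {
            for (String term : doc) {
                counts.merge(term, 1L, Long::sum);
            }
        }
        return counts;
    }

    // Returns {sum of the `size` largest doc counts, sum of the remaining doc counts}.
    static long[] topAndOther(Map<String, Long> counts, int size) {
        List<Long> sorted = new ArrayList<>(counts.values());
        sorted.sort(Collections.reverseOrder());
        long top = 0;
        long other = 0;
        for (int i = 0; i < sorted.size(); i++) {
            if (i < size) {
                top += sorted.get(i);
            } else {
                other += sorted.get(i);
            }
        }
        return new long[] {top, other};
    }

    public static void main(String[] args) {
        List<Set<String>> docs = new ArrayList<>();
        docs.add(new HashSet<>(Arrays.asList("red", "blue"))); // doc1
        docs.add(new HashSet<>(Arrays.asList("red")));         // doc2

        long totalHits = docs.size();
        long[] sums = topAndOther(docCounts(docs), 1);
        System.out.println("hits=" + totalHits + " top=" + sums[0] + " other=" + sums[1]);
        System.out.println("hits - top = " + (totalHits - sums[0]));
    }
}
```

With `size` set to 1, the top sum is 2 and the other sum is 1, while `hits - top` is 0, so the proposed equality does not hold. What stays constant for any `size` is the top sum plus the other sum (3 here), which is the quantity the test compares.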
Ok, I see that now. My concern was that the test is checking the value of the others doc count in an indirect way (from a readability point of view) since we are checking sum of all the doc counts rather than directly checking the other doc count, but actually there probably isn't another way to do it.
@jpountz Code looks good, left some comments on the tests
Ok, thank you @jpountz. I was hoping to get some sense of when these features may become available. I need to start moving my BI framework off facets, but without buckets for _other and _missing, and without agg filtering on an array of terms (including numeric ones), it would be very challenging, and pure wasted effort if missing/other is eventually supported.
@colings86 I replied/addressed your comments
@jpountz Looks good
Will you be pushing a 1.4.2 with this fix?
@niemyjski I am confused, what bug are you talking about?
We were getting a null reference exception when we upgraded (elastic/elasticsearch-net#1042)
Hey @niemyjski this is an issue with NEST, not elasticsearch. As mentioned in elastic/elasticsearch-net#1041, we'll be releasing a new version next week. In the meantime you can pick up the CI package here.