Incorrect bar ordering for unique count with terms sub aggregation #3314

antoinebaudoux · 2015-03-11T00:48:48Z

stormpython · 2015-03-11T01:23:07Z

Adding notes to the above issue which was brought up at Elastic{ON}. Essentially, the issue is with the ordering of values in the bar chart for sub aggregations on unique count. The order should be descending by value, but due to the split, the bars are unordered by unique count.

I need to dive into the issue to debug.

stormpython · 2015-03-11T02:04:08Z

So this seems to be a bug in the vislib. Just reproduced. The response from Elasticsearch seems to return the results in the correct order; however, the chart displays the data out of order.

zaakiy · 2015-03-11T02:19:05Z

+1. I have reproduced this also.

[screen shot 2015-03-10 at 17 45 15] https://cloud.githubusercontent.com/assets/5154448/6588348/a74418d8-c74d-11e4-8ca2-5e7283a67845.png
[screen shot 2015-03-10 at 17 45 58] https://cloud.githubusercontent.com/assets/5154448/6588347/a730c67a-c74d-11e4-9d9e-933dd8a4e6eb.png


antoinebaudoux · 2015-03-11T12:52:28Z

If you look at both screenshots, you can see that the ordering seems to be correct with the split, since it is identical to the ordering without the split. It's more that the bar heights are messed up.

stormpython · 2015-03-11T17:58:30Z

@ab-taktik yes, that is what I was referring to when I titled it ordering. By default, the bars should be ordered on the x axis in descending fashion.

blop · 2015-03-16T14:46:45Z

+1

ajrasch · 2015-03-24T13:02:51Z

+1

antoinebaudoux · 2015-03-24T13:21:42Z

Hello, any news on this? Do you have any idea what the root cause is?

antoinebaudoux · 2015-03-24T16:07:23Z

Maybe this has to do with the approximate nature of count/cardinality aggregations, and also with the fact that we take only the top X terms rather than all terms.

stormpython · 2015-03-25T20:42:09Z

@ab-taktik I think you may be right. By default, Elasticsearch returns buckets in descending order of doc_count, so we have been rendering bar charts under that assumption. However, this is not always the case.

Take for example this dataset and this chart:

As you can see, the second set of stacked bars in this example should come first. It is not returned first because the total doc_count is higher in the first bar. But when you subtract the sum_other_doc_count from the doc_count to get the value that is actually displayed, it's clear why the first set of stacked bars is smaller than the second.

Best solution: Re-order the buckets returned from elasticsearch based on doc_count - sum_other_doc_count. I will add the appropriate time table for a fix.
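A minimal sketch of that re-ordering, assuming a response shaped like a terms aggregation with a nested terms sub-aggregation. `doc_count` and `sum_other_doc_count` are the field names Elasticsearch uses in terms aggregation results; the sub-aggregation key `split`, the helper names, and the sample numbers are made up for illustration.

```python
def displayed_count(bucket, sub_agg="split"):
    """Documents actually drawn for a bar: the bucket total minus the
    documents that fell into sub-buckets omitted from the response."""
    return bucket["doc_count"] - bucket[sub_agg]["sum_other_doc_count"]


def reorder_by_displayed_count(buckets, sub_agg="split"):
    """Return the buckets sorted by displayed count, largest first."""
    return sorted(buckets, key=lambda b: displayed_count(b, sub_agg), reverse=True)


# Hypothetical response fragment: "A" has more documents overall,
# but most of them sit in sub-buckets that were cut off.
buckets = [
    {"key": "A", "doc_count": 100, "split": {"sum_other_doc_count": 70, "buckets": []}},
    {"key": "B", "doc_count": 80, "split": {"sum_other_doc_count": 10, "buckets": []}},
]

reordered_keys = [b["key"] for b in reorder_by_displayed_count(buckets)]
```

Here "A" displays 30 documents and "B" displays 70, so "B" moves to the front. The subtraction is only meaningful when the metric being charted is a plain document count.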

spalger · 2015-04-02T18:44:26Z

@stormpython @ab-taktik this is just the way that aggregations work. Here is a hypothetical step-by-step of what's happening in elasticsearch:

1. The x-axis agg defines that the following happens:
   i. the entire result set is split into buckets based on scheduleFull.raw
   ii. the "unique count of user.ids" is calculated for each bucket
   iii. the buckets are sorted in descending order based on the "unique count of user.ids"
   iv. the first 50 buckets are considered the source for the next phase
2. A copy of the split-bars agg begins to execute inside of each bucket from step 1(i) individually:
   i. the bucket is split up into sub-buckets based on language.raw
   ii. each sub-bucket calculates its "unique count of user.ids"
   iii. the sub-buckets are sorted descending based on the "unique count of user.ids"
   iv. the first 10 buckets are selected and returned in the elasticsearch response.

This process is precisely what we are visualizing in the second screenshot, and why we can't just subtract the sum_other_doc_count.

In the outlined steps, "unique count of user.ids" can be replaced with any metric, even "99.99th percentile", and therefore the sum_other_doc_count would not have any relevance.
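The two phases outlined above can be imitated in a few lines of Python. This is only a toy model under stated assumptions: the document keys `schedule`, `language`, and `user` are simplified stand-ins for scheduleFull.raw, language.raw, and user.ids; exact set sizes stand in for the approximate unique count; and the sample data is invented.

```python
from collections import defaultdict


def unique_users(docs):
    """Exact unique count of users in a list of documents."""
    return len({d["user"] for d in docs})


def top_buckets(docs, field, size):
    """Split docs into buckets on `field`, sort the buckets by unique
    users (descending), and keep the first `size` of them."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[field]].append(doc)
    ranked = sorted(groups.items(), key=lambda kv: unique_users(kv[1]), reverse=True)
    return ranked[:size]


def simulate(docs, x_size, split_size):
    """Phase 1 picks and orders the x-axis buckets; phase 2 runs the
    split inside each selected bucket and keeps its top sub-buckets."""
    bars = []
    for key, bucket_docs in top_buckets(docs, "schedule", x_size):
        subs = top_buckets(bucket_docs, "language", split_size)
        bars.append({
            "key": key,
            "unique_users": unique_users(bucket_docs),                   # used for ordering
            "bar_height": sum(unique_users(sd) for _, sd in subs),      # what gets drawn
        })
    return bars


# "full": four users, each with a different language.
# "part": three users, all with the same language.
docs = (
    [{"schedule": "full", "language": lang, "user": f"u{i}"}
     for i, lang in enumerate(["en", "fr", "de", "es"])]
    + [{"schedule": "part", "language": "en", "user": f"u{i}"} for i in range(4, 7)]
)

bars = simulate(docs, x_size=50, split_size=1)
```

With `split_size=1`, "full" is ordered first because it has 4 unique users, yet its drawn bar has height 1, while "part" comes second with a bar of height 3. The ordering value and the drawn height come from different sets of documents.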

@ab-taktik I think what you really want is for step 1(ii) to happen in a third phase, and for it to go more like "the sum of the 'unique count of user.ids' from the selected child buckets is calculated for each bucket", and then for 1(iii) and 1(iv) to use this new metric to sort and select the top 50 buckets. This is something that the Elasticsearch 2.0 "bucket reducers" feature is aiming to solve. Until it is available, I don't think this is a feature Kibana 4 will support.
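As a rough illustration of that "third phase", here is a client-side sketch that re-sorts already-returned buckets by the sum of their selected child metrics. It is not how Elasticsearch or Kibana implements anything; the aggregation names `split` and `unique_users`, the function name, and the sample response are assumptions made for the example.

```python
def sort_by_child_metric_sum(buckets, sub_agg="split", metric="unique_users", size=None):
    """Sort parent buckets by the sum of `metric` over their returned
    child buckets (descending) and optionally keep the first `size`."""
    def child_sum(bucket):
        return sum(child[metric]["value"] for child in bucket[sub_agg]["buckets"])

    ordered = sorted(buckets, key=child_sum, reverse=True)
    return ordered if size is None else ordered[:size]


# Hypothetical response fragment: "full" has the larger parent metric,
# but its only returned child bucket is small.
response_buckets = [
    {"key": "full", "unique_users": {"value": 4},
     "split": {"buckets": [{"key": "en", "unique_users": {"value": 1}}]}},
    {"key": "part", "unique_users": {"value": 3},
     "split": {"buckets": [{"key": "en", "unique_users": {"value": 3}}]}},
]

sorted_keys = [b["key"] for b in sort_by_child_metric_sum(response_buckets)]
```

A sort like this can only rearrange buckets that made it into the response. Parent buckets that were cut during the original top-50 selection are not recovered, which is why doing it on the Elasticsearch side is the more complete answer. Summing unique counts across child buckets can also count the same user more than once.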

spalger · 2015-04-02T18:58:15Z

Another way to think of this problem is that the buckets that create the bars are sorted based on the ordering parameters in the x-axis aggregation:

and the values used for that sorting include documents that are excluded by the sub aggregation (grey area added to illustrate the excluded documents)

bradvido · 2015-05-01T21:39:35Z

FWIW, I've reproduced this issue without using unique count metrics in #3734.

driskell · 2015-06-11T07:55:46Z

Reading what @spalger says, it seems to me that the ordering is actually correct. The problem is that the Terms Sub Aggregation for Split Bars is incorrectly excluding data, which creates an unarguably misleading representation of the data.
*Sorry about the "what @spalger is saying" - it was rude and badly phrased - I've rephrased! 👍 *

I just did a graph like this with the Top 5 browsers across operating systems, and all of a sudden it looked like iOS was the top operating system, but it wasn't. Windows was; it just had so many browser variations that only the top 5 were shown.

There should be a part of the bar, which @spalger showed in grey, to show "Other" - this would fix both the ordering (which in my opinion is correct actually) and would fix the misleading representation of data. In my case Windows would jump up with a huge "Other" area, and the iOS would still be there at the end but much much tinier.

Summary: Ordering is fine, but what's happening is "Split Bar" + "Terms" is actually doing a "Filtered Split Bar" and filtering data, taking away all meaning from the original X-Axis aggregation. I can't see why somebody would only want to compare bars containing only the Top 5 entries...
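To make the "Other" idea concrete, here is a small sketch that derives an extra segment per bar from the parent bucket's `doc_count`. It assumes the metric is a plain document count and that each document lands in exactly one sub-bucket; it would not be valid for unique counts or percentiles. The sub-aggregation key `split`, the function name, and the numbers are invented for illustration.

```python
def with_other_segment(bucket, sub_agg="split"):
    """Return (label, count) segments for one bar, appending an "Other"
    segment for documents not covered by the returned sub-buckets."""
    segments = [(child["key"], child["doc_count"]) for child in bucket[sub_agg]["buckets"]]
    other = bucket["doc_count"] - sum(count for _, count in segments)
    if other > 0:
        segments.append(("Other", other))
    return segments


# Hypothetical buckets: Windows has many browsers, so most of its
# documents fall outside the returned top sub-buckets.
windows = {"key": "Windows", "doc_count": 1000, "split": {"buckets": [
    {"key": "Chrome", "doc_count": 200},
    {"key": "Firefox", "doc_count": 150},
]}}
ios = {"key": "iOS", "doc_count": 500, "split": {"buckets": [
    {"key": "Safari", "doc_count": 450},
    {"key": "Chrome", "doc_count": 40},
]}}

windows_segments = with_other_segment(windows)
ios_segments = with_other_segment(ios)
```

With the "Other" segment included, the Windows bar totals 1000 and the iOS bar totals 500, so the drawn heights agree with the x-axis ordering again.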

spalger · 2015-06-11T17:11:27Z

@driskell I totally agree that we should be able to produce "other" buckets, but the feature must be implemented in elasticsearch first (see elastic/elasticsearch#5324 for progress). Once that is implemented this will be a far less confusing experience. For now, I recommend setting the size of the aggregation to something that makes the most sense for your data.

spalger · 2015-09-09T03:22:53Z

Looks like elastic/elasticsearch#11042, so we can move forward with #1961.

stormpython self-assigned this Mar 11, 2015

stormpython changed the title ~~Incorrect ordering of terms sub agg~~ Incorrect ordering of terms for unique count with sub aggregation Mar 11, 2015

stormpython added the bug Fixes for quality problems that affect the customer experience label Mar 11, 2015

stormpython changed the title ~~Incorrect ordering of terms for unique count with sub aggregation~~ Incorrect bar ordering for unique count with terms sub aggregation Mar 13, 2015

stormpython removed their assignment Mar 25, 2015

stormpython added the v4.1.0 label Mar 25, 2015

striglia mentioned this issue Apr 1, 2015

Unique count wrong when table includes terms query sub-aggregation #3494

Closed

stormpython assigned spalger Apr 1, 2015

spalger added release_note:enhancement and removed v4.1.0 bug Fixes for quality problems that affect the customer experience labels Apr 2, 2015

spalger removed their assignment Apr 2, 2015

bradvido mentioned this issue May 1, 2015

Bar chart sort order is not as expected with 2 terms aggregations #3734

Closed

stormpython mentioned this issue Dec 4, 2015

Bar graph "order by" incorrect with double split terms #5512

Closed

jamesharr mentioned this issue Feb 12, 2016

Vertical bar visualisation split bar sort issue #5264

Closed

mdeo-iden mentioned this issue Apr 26, 2016

Unique count wrong when visualization includes X-axis split line filters. #7056

Closed

panda01 mentioned this issue May 26, 2016

Sorting by specific term order #7282

Closed

ppisljar mentioned this issue Oct 3, 2016

sorting chart xValues by metric sum #8397

Merged

ppisljar closed this as completed in #8397 Nov 9, 2016

tbragin added bug Fixes for quality problems that affect the customer experience v5.1.1 and removed release_note:enhancement labels Dec 8, 2016

mkurtak mentioned this issue Aug 21, 2018

Incorrect bar ordering for sub-bucket term aggregation and custom metric as "Order by" #22207

Closed
