Terms aggregations order wrong when sorting NaN's #5236
Thanks for opening such a detailed bug report. Your observation and the fix you propose sound good to me, so a pull request would be highly welcome!
I opened the pull request. If you need me to change anything, just let me know.
jpountz pushed a commit to jpountz/elasticsearch that referenced this issue (Feb 27, 2014)
jpountz pushed a commit that referenced this issue (Mar 4, 2014)
jpountz pushed a commit that referenced this issue (Mar 4, 2014)
mute pushed a commit to mute/elasticsearch that referenced this issue (Jul 29, 2015)
I strongly believe there is an issue in the sorting of terms aggregations.
Have a look here. The comment above that line indicates that the code intends to push NaNs to the bottom of the list (which I would consider the correct behaviour). But when I test this, it does not work.
Loading the following test data:
You can start running some aggregations:
When we run these aggregations, it turns out that a term without values in the `c` field ends up at the top. When running the same analysis with faceting, you would see that the term without values in `c` is pushed to the bottom of the list. This looks to me like the correct behaviour.
I have done some research on why this is happening. The if statement referenced above compares the aggregated metric to `Double.NaN`. In Java it turns out that NaN is not equal to NaN :), but luckily the people working on Java thought of this and added a function to check for NaN values: `Double.isNaN`. Changing the line accordingly makes the return statement that follows it work; at the moment it is always skipped.

But... on line 216 it returns 1 or -1 depending on the ordering provided. This would result in NaNs floating to the top of the list when ordering descending, which is a strange effect. My belief is that it should always `return 1` if `v1` is `NaN`.

The last part of the bug is that only `v1` is checked for being `NaN`. You would also need to check `v2` for being `NaN` and `return -1` (!) if so. This would, as the comment suggests, always push NaN values to the bottom of the sorted list. This most closely resembles how facets sort at the moment.

The concrete effect of this bug is that we are not able to use aggregations for a table-like view (which is the main benefit of using aggs, since you can get multiple columns at once) to show terms sorted descending on the avg of a sparse field we have in our collection of documents. This was possible when using facets.
Please have a look, and note that this error is made in two places in the same file (the second is on line 230).
I tested a fix locally (by hand, not via the test suite) and it seems to work for us. If you would like me to share my patch by opening a pull request, I can, although I need to check my code against your guidelines and add some automated tests :)