Terms aggregations order wrong when sorting NaN's #5236

thanodnl · 2014-02-24T13:09:53Z

I have a strong believe there is an issue in the sorting of term aggregations.

Have a look here. If we look at the comment above it indicates that it would like to push NaN's to the bottom of the list (which would be the correct behaviour according to me). But when I test this out it does not work

Loading the following test data:

$ curl -XDELETE 'localhost:9200/sorting?pretty=true'
$ curl -XPOST 'localhost:9200/_bulk?pretty=true' -d '
{ index: { _index: "sorting", _type: "object" }}
{ t: "a", a:1, b:1 }
{ index: { _index: "sorting", _type: "object" }}
{ t: "a", a:2, b:4 }
{ index: { _index: "sorting", _type: "object" }}
{ t: "a", a:3, b:9 }
{ index: { _index: "sorting", _type: "object" }}
{ t: "b", a:4, b:16 }
{ index: { _index: "sorting", _type: "object" }}
{ t: "b", a:5, b:25, c: 42 }
{ index: { _index: "sorting", _type: "object" }}
{ t: "b", a:6, b:36, c: 50 }
{ index: { _index: "sorting", _type: "object" }}
{ t: "c", a:7, b:49 }
{ index: { _index: "sorting", _type: "object" }}
{ t: "c", a:8, b:64 }
{ index: { _index: "sorting", _type: "object" }}
{ t: "c", a:9, b:81 }
{ index: { _index: "sorting", _type: "object" }}
{ t: "c", a:10, c:100 }
'

You can start running some aggregations:

$ curl 'localhost:9200/sorting/_search?pretty=true' -d '
{
  "size": 0,
  "aggs": {
    "t": {
      "terms": {
        "field": "t",
        "order": {
          "aStats.min": "desc"
        }
      },
      "aggs": {
        "aStats": {
          "stats": {
            "field": "c"
          }
        }
      }
    }
  }
}
'

When we run this aggregations if turns out that a term without value's in the c field ends in the top.

{
  "aggregations": {
    "t": {
      "buckets": [
        {
          "key": "a",
          "doc_count": 3,
          "aStats": {
            "count": 0,
            "min": null,
            "max": null,
            "avg": null,
            "sum": null
          }
        },
        {
          "key": "c",
          "doc_count": 4,
          "aStats": {
            "count": 1,
            "min": 100,
            "max": 100,
            "avg": 100,
            "sum": 100
          }
        },
        {
          "key": "b",
          "doc_count": 3,
          "aStats": {
            "count": 2,
            "min": 42,
            "max": 50,
            "avg": 46,
            "sum": 92
          }
        }
      ]
    }
  }
}

When running the same analysis with facetting you would see that the term without the values in c are pushed to the bottom of the list

$ curl 'localhost:9200/sorting/_search?pretty=true' -d '
{
  "size": 0,
  "facets": {
    "t": {
      "terms_stats": {
        "key_field": "t",
        "value_field": "c",
        "order": "min"
      }
    }
  }
}
'

This looks to me as correct behaviour.

I have done some research on why this is happening, and in fact the if statement referenced above is comparing the aggregated metric to Double.NaN. In java it turns out that NaN is not equal to NaN :), luckily the guys working at java thought of this and added a function to check for NaN values Double.isNaN. Changing the line accordingly makes the return statement next work since it is skipped always at the moment. But...

On line 216 it returns 1 or -1 depending on the ordering provided. This would result in NaN floating to the top of the list when ordering descending. Which has the strange effect that NaN's would be at the top of the list. My believe is that is should always return 1 if v1 is NaN.

Last part of the bug is that only v1 is being checked to be NaN. You would also need to check v2 for being NaN and return -1(!) if so. This would, as the comment suggest always push 'NaN' values to the bottom of the sorted list. This resembles the most to how facets sort at the moment.

Concrete effects of this bug is that we are not able to use aggregations for a table like view (which is the main benefit of using aggs, since you can get multiple columns at once) to show terms sorted descending on the avg of a sparse field we have in our collection of documents which was able when using facets.

Please have a look, and note that this error is made on two spots in the same file. (second spot is on line 230).

I tested (not via the test suite, but by hand) out a fix locally and that seems works for us. If you would like to share my patch by opening a pull request I could, although I need to check my code against your guidelines, and add some automated tests :)

The text was updated successfully, but these errors were encountered:

jpountz · 2014-02-24T13:34:42Z

Thanks for opening such a detailed bug report, your observation and the way you propose to fix this issue sound good to me so a pull request would be highly welcome!

Closes elastic#5236.

thanodnl · 2014-02-24T13:49:00Z

I opened the pull request, If you need me to change things just let me know.

Closes elastic#5236.

Closes #5236.

Closes elastic#5236.

dadoonet assigned uboness Feb 24, 2014

jpountz assigned jpountz and unassigned uboness Feb 24, 2014

thanodnl added a commit to thanodnl/elasticsearch that referenced this issue Feb 24, 2014

Changing the sorting for terms aggs,

051ed64

Closes elastic#5236.

thanodnl mentioned this issue Feb 24, 2014

Changing the sorting for terms aggs, #5237

Closed

jpountz pushed a commit to jpountz/elasticsearch that referenced this issue Feb 27, 2014

Changing the sorting for terms aggs,

8055118

Closes elastic#5236.

jpountz pushed a commit that referenced this issue Mar 4, 2014

Fix sorting on NaN values in aggregations.

6b143a9

Closes #5236.

jpountz closed this as completed in 0e57915 Mar 4, 2014

jpountz added v1.1.0 labels Mar 4, 2014

jpountz pushed a commit that referenced this issue Mar 4, 2014

Fix sorting on NaN values in aggregations.

a9f195c

Closes #5236.

clintongormley added the bug label Mar 21, 2014

mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015

Fix sorting on NaN values in aggregations.

c98213e

Closes elastic#5236.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Terms aggregations order wrong when sorting NaN's #5236

Terms aggregations order wrong when sorting NaN's #5236

thanodnl commented Feb 24, 2014

jpountz commented Feb 24, 2014

thanodnl commented Feb 24, 2014

Terms aggregations order wrong when sorting NaN's #5236

Terms aggregations order wrong when sorting NaN's #5236

Comments

thanodnl commented Feb 24, 2014

jpountz commented Feb 24, 2014

thanodnl commented Feb 24, 2014