
Bucket script aggregation returns invalid value for missing docs #27377

Closed
shaharmor opened this issue Nov 14, 2017 · 14 comments · Fixed by #73297 · May be fixed by #44516
Assignees
Labels
:Analytics/Aggregations Aggregations >bug help wanted adoptme high hanging fruit Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)

Comments

@shaharmor

shaharmor commented Nov 14, 2017

Elasticsearch version (bin/elasticsearch --version): 6.0.0-rc1

JVM version (java -version):
java version "1.8.0_151"
Java(TM) SE Runtime Environment (build 1.8.0_151-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.151-b12, mixed mode)
OS version (uname -a if on a Unix-like system):
Linux elasticsearch-data-hot-003 4.11.0-1013-azure #13-Ubuntu SMP Mon Oct 2 17:59:06 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:
When doing a bucket_script aggregation that depends on a cumulative_sum aggregation of another sum aggregation, if the sum aggregation returns null values (because there are no docs in that time interval bucket), the bucket_script aggregation also returns null instead of using the cumulative sum value gathered so far.

Steps to reproduce:


  1. Add docs that span 5 minutes and look like this:
{
  "@timestamp": "...",
  "bytes": 100
}
  2. Run a query over a time range that extends past the end of that 5-minute window (so there will be date_histogram buckets without docs), with this aggregation:
{
"aggs": {
    "timeseries": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "1m",
        "min_doc_count": 0,
        "time_zone": "UTC"
      },
      "aggs": {
        "sum_bytes": {
          "sum": {
            "field": "bytes"
          }
        },
        "cumulative_bytes": {
          "cumulative_sum": {
            "buckets_path": "sum_bytes"
          }
        },
        "bucket": {
          "bucket_script": {
            "buckets_path": {
              "bytes": "cumulative_bytes"
            },
            "script": {
              "source": "params.bytes",
              "lang": "painless"
            }
          }
        }
      }
    }
  }
}
  3. Check the response and see that in the date_histogram buckets without docs, the bucket aggregation does not show the value it is supposed to (the cumulative_bytes value).
@martijnvg
Member

@shaharmor Did you try setting a missing value on the date_histogram aggregation? GitHub issues are meant for bugs and feature requests, and this sounds more like a question that should be asked on the forum first.
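For reference, the missing parameter on a date_histogram supplies a fallback date for documents that lack the field; a minimal sketch of that suggestion (the fallback date is illustrative only):

{
  "timeseries": {
    "date_histogram": {
      "field": "@timestamp",
      "interval": "1m",
      "min_doc_count": 0,
      "time_zone": "UTC",
      "missing": "2017-01-01T00:00:00"
    }
  }
}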

@shaharmor
Author

shaharmor commented Nov 14, 2017

I'm not sure how a missing value on the date_histogram would help, as there are no docs to add the missing field to...

Anyway, I created a forum thread as well: https://discuss.elastic.co/t/bucket-script-fails-when-some-docs-are-missing/107592/1

@colings86
Contributor

colings86 commented Nov 20, 2017

This is actually a bug.

It seems that the bucket_script aggregation is not executed on buckets that have a doc count of zero. The following recreation script highlights this:

PUT test
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "doc": {
      "properties": {
        "@timestamp": {
          "type": "date"
        },
        "bytes": {
          "type": "long"
        }
      }
    }
  }
}

PUT test/doc/1
{
  "@timestamp": "2017-01-01T00:00:00",
  "bytes": 100
}

PUT test/doc/2
{
  "@timestamp": "2017-01-01T00:05:00",
  "bytes": 100
}

GET test/_search
{
  "size": 0,
  "aggs": {
    "timeseries": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "1m",
        "min_doc_count": 0,
        "time_zone": "UTC"
      },
      "aggs": {
        "sum_bytes": {
          "sum": {
            "field": "bytes"
          }
        },
        "cumulative_bytes": {
          "cumulative_sum": {
            "buckets_path": "sum_bytes"
          }
        },
        "bucket": {
          "bucket_script": {
            "buckets_path": {
              "bytes": "cumulative_bytes"
            },
            "script": {
              "source": "params.bytes",
              "lang": "painless"
            }
          }
        }
      }
    }
  }
}

The response from the search request is:

{
  "took": 9,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "timeseries": {
      "buckets": [
        {
          "key_as_string": "2017-01-01T00:00:00.000Z",
          "key": 1483228800000,
          "doc_count": 1,
          "sum_bytes": {
            "value": 100
          },
          "cumulative_bytes": {
            "value": 100
          },
          "bucket": {
            "value": 100
          }
        },
        {
          "key_as_string": "2017-01-01T00:01:00.000Z",
          "key": 1483228860000,
          "doc_count": 0,
          "sum_bytes": {
            "value": 0
          },
          "cumulative_bytes": {
            "value": 100
          }
        },
        {
          "key_as_string": "2017-01-01T00:02:00.000Z",
          "key": 1483228920000,
          "doc_count": 0,
          "sum_bytes": {
            "value": 0
          },
          "cumulative_bytes": {
            "value": 100
          }
        },
        {
          "key_as_string": "2017-01-01T00:03:00.000Z",
          "key": 1483228980000,
          "doc_count": 0,
          "sum_bytes": {
            "value": 0
          },
          "cumulative_bytes": {
            "value": 100
          }
        },
        {
          "key_as_string": "2017-01-01T00:04:00.000Z",
          "key": 1483229040000,
          "doc_count": 0,
          "sum_bytes": {
            "value": 0
          },
          "cumulative_bytes": {
            "value": 100
          }
        },
        {
          "key_as_string": "2017-01-01T00:05:00.000Z",
          "key": 1483229100000,
          "doc_count": 1,
          "sum_bytes": {
            "value": 100
          },
          "cumulative_bytes": {
            "value": 200
          },
          "bucket": {
            "value": 200
          }
        }
      ]
    }
  }
}

In the response above, the buckets with keys 2017-01-01T00:01:00.000Z to 2017-01-01T00:04:00.000Z should have a bucket sub-aggregation whose value is 100 (the same as the cumulative_sum aggregation, since the script just outputs that value).

@liketic
Contributor

liketic commented Nov 22, 2017

Hi @colings86 Could you please provide some instructions to fix this? Thanks in advance!

@colings86
Contributor

I just took a look at the code around this and I think it's going to need some thought about how we can fix this bug without adversely affecting other pipeline aggregations. The problem arises when we resolve the value for the buckets_path for the empty bucket in BucketHelpers.resolveBucketValue here (https://github.com/elastic/elasticsearch/blob/master/core/src/main/java/org/elasticsearch/search/aggregations/pipeline/BucketHelpers.java#L174):

if (Double.isInfinite(value) || Double.isNaN(value) || (bucket.getDocCount() == 0 && !isDocCountProperty)) {
    switch (gapPolicy) {
    case INSERT_ZEROS:
        return 0.0;
    case SKIP:
    default:
        return Double.NaN;
    }
} else {
    return value;
}

That if statement returns Double.NaN (because the gap policy is SKIP) for any empty bucket, where an empty bucket is defined as a bucket with a doc count of zero.
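As an aside, the gap policy can be set explicitly on the bucket_script aggregation; a minimal sketch against the reporter's request is below. With insert_zeros the script does run on empty buckets, but it sees 0.0 rather than the cumulative value, so this does not solve the issue either.

"bucket": {
  "bucket_script": {
    "buckets_path": {
      "bytes": "cumulative_bytes"
    },
    "gap_policy": "insert_zeros",
    "script": {
      "source": "params.bytes",
      "lang": "painless"
    }
  }
}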

In BucketScriptPipelineAggregator.reduce() we don't execute on buckets which return Double.NaN so the bucket is skipped.

BucketHelpers.resolveBucketValue() is used by all of the pipeline aggregations so we need to be careful that any changes do not adversely affect the other pipeline aggregations.

We could potentially change the bucket_script aggregation so it executes on all buckets regardless of the retrieved value, and also pass the doc count of the current bucket to the script so it can determine what value to output, but I'm worried that this might make the aggregation too unwieldy for the user.

I'll mark this issue as discuss for now as I think we need to decide on an approach before it is implemented.

@colings86 colings86 added discuss and removed help wanted adoptme labels Nov 23, 2017
@colings86
Contributor

Discussed in FixItFriday and we decided that this requires more in-depth discussion within the Search and Aggregations team.

@colings86
Contributor

Discussed with the search and aggs team and we decided that we should pass the actual value to the script (instead of converting it to NaN or 0.0 using the gap policy), and also pass the doc count of the bucket to the script, so the user writing the script can know whether the bucket is empty and decide how to interpret the value.
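Under that approach, a user-written script could branch on the bucket's doc count; a hypothetical Painless source is sketched below (the doc_count parameter name is illustrative only, since the exact API had not been decided):

// keep the cumulative value even for empty buckets:
params.bytes

// or reproduce the old skip-like behaviour by zeroing empty buckets
// (params.doc_count is a hypothetical name for the passed-in doc count):
params.doc_count == 0 ? 0 : params.bytes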

@colings86
Contributor

We should ensure that the solution to this issue also fixes the issue in #27544

@shaharmor
Author

@colings86 Is this something that is being actively worked on? Any ETA?

@colings86
Contributor

colings86 commented Dec 19, 2017

@shaharmor it's not being actively worked on right now, but since we now have a way forward it is available to be picked up and worked on. There is no ETA for this currently.

@colings86
Contributor

@elastic/es-search-aggs

@smanolloff

smanolloff commented Jan 15, 2019

For anyone stumbling upon the same issue with bucket_selector aggregations, here is my workaround:

I will use the example from the docs:

{
  "size": 0,
  "aggs": {
    "sales_per_month": {
      "date_histogram": {
        "field": "date",
        "interval": "month"
      },
      "aggs": {
        "total_sales": {
          "sum": {
            "field": "price"
          }
        },
        "sales_bucket_filter": {
          "bucket_selector": {
            "buckets_path": {
              "totalSales": "total_sales"
            },
            "script": "params.totalSales < 200"
          }
        }
      }
    }
  }
}

The issue: the script will select only non-empty buckets. If you try something like params.totalSales == null || params.totalSales < 200, it still won't select the empty ones.
Workaround is to make use of the special _count path:

        "sales_bucket_filter": {
          "bucket_selector": {
            "buckets_path": {
              "totalSales": "total_sales",
              "bucketCount": "_count"
            },
            "script": "params.bucketCount == 0 || params.totalSales < 200"
          }
        }

At least in 6.3, this works for bucket_selector, but not for bucket_script, unfortunately. Can't say for other pipeline aggs.

@shaharmor
Author

Is this being fixed in 7?

@polyfractal
Contributor

Hi @shaharmor, I'm afraid there is still no progress on this issue. It's still on our radar, but we don't have anyone working on it at this time. We'll update this ticket when there's movement.

@polyfractal polyfractal self-assigned this Jul 17, 2019
@rjernst rjernst added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label May 4, 2020
@polyfractal polyfractal removed their assignment Mar 18, 2021
@imotov imotov self-assigned this May 12, 2021
imotov added a commit to imotov/elasticsearch that referenced this issue May 20, 2021
Adds a new keep_values gap policy that works like skip, except if the metric
calculated on an empty bucket provides a non-null non-NaN value, this value is
used for the bucket.

Fixes elastic#27377
imotov added a commit that referenced this issue Jun 8, 2021
Adds a new keep_values gap policy that works like skip, except if the metric
calculated on an empty bucket provides a non-null non-NaN value, this value is
used for the bucket.

Fixes #27377

Co-authored-by: Mark Tozzi <mark.tozzi@gmail.com>
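
For anyone reading this later: the fix referenced above adds a keep_values gap policy, so in versions that contain it the reporter's bucket_script can presumably be written like this sketch:

"bucket": {
  "bucket_script": {
    "buckets_path": {
      "bytes": "cumulative_bytes"
    },
    "gap_policy": "keep_values",
    "script": {
      "source": "params.bytes",
      "lang": "painless"
    }
  }
}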