min_doc_count=0 doesn't work with a date_histogram with a filter #4843

Closed
cmaitchison opened this issue Jan 22, 2014 · 17 comments

@cmaitchison

I'm trying to create a date_histogram for recent events, where days where no events happen are still shown.

{
  "aggs": {
    "events_last_week": {
      "filter": {
        "range": {
          "@timestamp": {
            "from": "2014-01-10"
          }
        }
      },
      "aggs": {
        "events_last_week_histogram": {
          "date_histogram": {
            "min_doc_count": 0,
            "field": "@timestamp",
            "format": "yyyy-MM-dd",
            "interval": "1d"
          }
        }
      }
    }
  }
}

I get a response like this

"aggregations":  {
  "events_last_week": {
    "doc_count": 33861,
    "events_last_week_histogram": [
      {
        "key_as_string": "2014-01-10",
        "key": 1389744000000,
        "doc_count": 2120
      }, {
        "key_as_string": "2014-01-16",
        "key": 1389830400000,
        "doc_count": 3823
      }, {
        "key_as_string": "2014-01-17",
        "key": 1389916800000,
        "doc_count": 27918
      }
    ]
  }
}

The empty days are not returned. If I construct the query without the filter, the empty days are returned correctly.

There is also an issue even when the empty days are returned correctly without the filter. If, for example, today is "2014-01-22", and the latest timestamp in my data is "2014-01-17", then the 5 days between these two dates are not returned as empty buckets, though all the empty buckets prior to "2014-01-17" are returned correctly.

uboness commented Jan 22, 2014

@cmaitchison

I can't really reproduce it; I ran the same queries as you and I get the right responses. What ES version are you working with? We introduced min_doc_count in 1.0.0.RC1.

There is also an issue even when the empty days are returned correctly without the filter. If, for example, today is "2014-01-22", and the latest timestamp in my data is "2014-01-17", then the 5 days between these two dates are not returned as empty buckets, though all the empty buckets prior to "2014-01-17" are returned correctly.

The gaps that are filled are based on the dates in the documents you're aggregating: the first histogram bucket is based on the earliest date in the document set and the last bucket on the latest date in the set, and we then fill in all gaps between these two buckets.

We can consider adding a "range" setting to the histograms which would let you define the value range (or date range in the case of date_histogram) over which the buckets are created. In your case, that means that if you define a range of the form "range": { "to": "now" } along with "min_doc_count": 0, we'll return all the empty buckets up until now (beyond the dates in the document set).
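
This later landed as the extended_bounds option on the histogram aggregations (see the comments further down in this thread). A minimal sketch of it on the date_histogram from the original query, assuming a release that supports extended_bounds; the max here stands in for "today" from the original report:

{
  "aggs": {
    "events_last_week_histogram": {
      "date_histogram": {
        "field": "@timestamp",
        "format": "yyyy-MM-dd",
        "interval": "1d",
        "min_doc_count": 0,
        "extended_bounds": {
          "min": "2014-01-10",
          "max": "2014-01-22"
        }
      }
    }
  }
}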

uboness commented Jan 22, 2014

@cmaitchison scratch that... I finally managed to reproduce it (it happens when you have a single shard)... will work on a fix

@cmaitchison

Wow, nice find! I would never have thought to mention that.


@cmaitchison

Also related to this title, I've found that min_doc_count=0 does not work if all of the buckets would be empty after applying the filter. I can reproduce this issue on an index with 2 shards.

{
  "aggs": {
    "filtered_events": {
      "filter": {
        "and": [
          {
            "range": {
              "@timestamp": {
                "from": 1390267500000,
                "to":   1390267560000
              }
            }
          }
        ]
      },
      "aggs": {
        "filtered_events_histogram": {
          "date_histogram": {
            "min_doc_count": 0,
            "field": "@timestamp",
            "interval": "1s"
          }
        }
      }
    }
  }
}

The above query should return 60 results, one for each second in the minute. If any events are found in that minute, then 60 results are returned. If no events are found in that minute, then 0 results are returned, where you would expect 60 empty buckets.

My use case is zooming in on a series on a chart. The zero-value results are very helpful for knowing where to plot the zeros on the x-axis.
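
As pointed out further down in this thread, later releases add an extended_bounds option that fixes the bucket range independently of the documents, which addresses this case. A sketch for the per-second query above, assuming extended_bounds is available; the bounds mirror the filter's range in epoch milliseconds, with max set to the key of the last one-second bucket so that exactly 60 buckets should come back:

{
  "aggs": {
    "filtered_events": {
      "filter": {
        "range": {
          "@timestamp": {
            "from": 1390267500000,
            "to": 1390267560000
          }
        }
      },
      "aggs": {
        "filtered_events_histogram": {
          "date_histogram": {
            "field": "@timestamp",
            "interval": "1s",
            "min_doc_count": 0,
            "extended_bounds": {
              "min": 1390267500000,
              "max": 1390267559000
            }
          }
        }
      }
    }
  }
}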

@cmaitchison

Another related issue I am finding is that sometimes the intervals do not go back far enough.

{
  "aggs": {
    "events_last_week": {
      "filter": {
        "and": [
          {
            "range": {
              "@timestamp": {
                "from": 1390267432894,
                "to": 1390267547037
              }
            }
          }
        ]
      },
      "aggs": {
        "events_last_week_histogram": {
          "date_histogram": {
            "min_doc_count": 0,
            "field": "@timestamp",
            "interval": "second"
          }
        }
      }
    }
  }
}

returns exactly

{
  "aggregations": {
    "events_last_week": {
      "doc_count": 1099,
      "events_last_week_histogram": [
        {
          "key": 1390267526000,
          "doc_count": 12
        },
        {
          "key": 1390267527000,
          "doc_count": 0
        },
        {
          "key": 1390267528000,
          "doc_count": 29
        },
        {
          "key": 1390267529000,
          "doc_count": 32
        },
        {
          "key": 1390267530000,
          "doc_count": 58
        },
        {
          "key": 1390267531000,
          "doc_count": 64
        },
        {
          "key": 1390267532000,
          "doc_count": 35
        },
        {
          "key": 1390267533000,
          "doc_count": 36
        },
        {
          "key": 1390267534000,
          "doc_count": 43
        },
        {
          "key": 1390267535000,
          "doc_count": 52
        },
        {
          "key": 1390267536000,
          "doc_count": 58
        },
        {
          "key": 1390267537000,
          "doc_count": 62
        },
        {
          "key": 1390267538000,
          "doc_count": 76
        },
        {
          "key": 1390267539000,
          "doc_count": 70
        },
        {
          "key": 1390267540000,
          "doc_count": 53
        },
        {
          "key": 1390267541000,
          "doc_count": 72
        },
        {
          "key": 1390267542000,
          "doc_count": 81
        },
        {
          "key": 1390267543000,
          "doc_count": 48
        },
        {
          "key": 1390267544000,
          "doc_count": 88
        },
        {
          "key": 1390267545000,
          "doc_count": 45
        },
        {
          "key": 1390267546000,
          "doc_count": 83
        },
        {
          "key": 1390267547000,
          "doc_count": 2
        }
      ]
    }
  }
}

But it is missing all of the empty buckets between 1390267432894 and 1390267526000. Again, this is with a 2-shard index on 1.0.0.RC1.

uboness commented Jan 23, 2014

@cmaitchison as I mentioned above, the histogram operates on the dataset and extracts the min/max of the histogram from the documents (the earliest/latest). There is no direct relation between the filter aggregation and the histogram aggregation (aggregations are unaware of other aggregations in their hierarchy). We could potentially add a range feature to the histogram, but if we do, it'll have to be post 1.0.

In the first example you gave, since there are no documents in that minute, there are no buckets (as we can't determine the min/max values). For the second example, it might be that the earliest document in the doc set has a later timestamp than the from value in the filter.

uboness added a commit that referenced this issue Jan 23, 2014
… shard, the reduce call was not propagated properly down the agg hierarchy.

Closes #4843
@cmaitchison

Thanks, @uboness, for your help and excellent explanation. A range option on the histogram is definitely a feature I would use. For now I can fill in the gaps on the client side. Thanks again.

uboness commented Jan 23, 2014

@cmaitchison no worries... thank you for the bug report! important one!

@erikvanzijst

I'm interested in hard range boundaries (returning empty buckets to fill gaps between from and to in the case of missing documents) as well. Is there an issue tracking this, or shall I raise one?

@deanchen

For anyone who arrived at this thread via Google, hard ranges are supported via the extended_bounds param: http://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-histogram-aggregation.html
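
Applied to the original query in this thread, that looks roughly like the following; this is only a sketch, with extended_bounds mirroring the filter's from date and the "today" from the original report:

{
  "aggs": {
    "events_last_week": {
      "filter": {
        "range": {
          "@timestamp": {
            "from": "2014-01-10"
          }
        }
      },
      "aggs": {
        "events_last_week_histogram": {
          "date_histogram": {
            "field": "@timestamp",
            "format": "yyyy-MM-dd",
            "interval": "1d",
            "min_doc_count": 0,
            "extended_bounds": {
              "min": "2014-01-10",
              "max": "2014-01-22"
            }
          }
        }
      }
    }
  }
}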

taf2 commented Jun 18, 2015

I'm now experiencing the same issue as reported, running ES 1.6.0.

histogram = {
  intervals: {
    date_histogram: {
      field: 'called_at',
      interval: 'day',
      order: { _key: "asc" },
      min_doc_count: 0 # doesn't appear to have any impact on the final result.
    },
    aggs: stats
  }
}

taf2 commented Jun 18, 2015

It looks like when nesting a date_histogram within a terms aggregation, there is no way for min_doc_count to auto-fill the zero results.

aggs: {
  groups: {
    terms: {
      min_doc_count: 0,
      script: '...'
    },
    aggs: {
      intervals: {
        date_histogram: {
          field: 'called_at',
          interval: 'day',
          order: { _key: "asc" },
          min_doc_count: 0 # doesn't appear to have any impact on the final result.
        },
        aggs: stats
      }
    }
  }
}
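
For reference, the same nested aggregation as a JSON request body, with extended_bounds added to the inner date_histogram. This is only a sketch: the dates are placeholders, the stats sub-aggregation from the snippet above is omitted, and pairing min_doc_count: 0 with extended_bounds should force the empty day buckets within each term bucket that exists, though whether that is sufficient for this nested case is exactly what a full recreation would confirm:

{
  "aggs": {
    "groups": {
      "terms": {
        "script": "..."
      },
      "aggs": {
        "intervals": {
          "date_histogram": {
            "field": "called_at",
            "interval": "day",
            "order": { "_key": "asc" },
            "min_doc_count": 0,
            "extended_bounds": {
              "min": "2015-06-01",
              "max": "2015-06-18"
            }
          }
        }
      }
    }
  }
}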

@clintongormley

@taf2 please could you open an issue with a complete recreation which explains the problem?

mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015
… shard, the reduce call was not propagated properly down the agg hierarchy.

Closes elastic#4843
@quillan86

Is this bug still there? I am trying to do the exact same thing as the OP right now.

vicapow commented Apr 20, 2020

me too! :)

@mashahabi15

And me as well. :)

@Crijavi4

Hi, I found the same issue, but it can be worked around by adding the extended_bounds object to the date_histogram aggregation, something like this:

{"extended_bounds":{"min":"+timeInit+","max":"+timeFin+"}} where timeInit and timeFin are the same period specified in the range filter in miliseconds

I hope this can help somebody.
