Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Aggregations: date_histogram aggregation DST bug #8339

Closed
thanodnl opened this issue Nov 4, 2014 · 3 comments
Closed

Aggregations: date_histogram aggregation DST bug #8339

thanodnl opened this issue Nov 4, 2014 · 3 comments

Comments

@thanodnl
Copy link
Contributor

thanodnl commented Nov 4, 2014

Hi,

Since it is the time of year where we adjust our clocks from daylight savings time to normal time again a bug in the date histogram struck us.

When we run a date_histogram in a timezone other than UTC you will see some buckets being wrongfully combined. In the first example you see a date_histogram aggregate in UTC followed by the same aggregate in CET;

UTC query:

{
    "size": 0,
    "query": {
        "range": {
           "published": {
              "gte": 1414274400000,
              "lt": 1414292400000
           }
        }
    },
    "aggs": {
        "vot": {
            "date_histogram": {
                "field": "published",
                "interval": "hour",
                "min_doc_count": 0,
                "time_zone": "UTC"
            }
        }
    }
}

UTC result:

{
   "took": 12,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 1130,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "vot": {
         "buckets": [
            {
               "key_as_string": "2014-10-25T22:00:00.000Z",
               "key": 1414274400000,
               "doc_count": 260
            },
            {
               "key_as_string": "2014-10-25T23:00:00.000Z",
               "key": 1414278000000,
               "doc_count": 216
            },
            {
               "key_as_string": "2014-10-26T00:00:00.000Z",
               "key": 1414281600000,
               "doc_count": 222
            },
            {
               "key_as_string": "2014-10-26T01:00:00.000Z",
               "key": 1414285200000,
               "doc_count": 200
            },
            {
               "key_as_string": "2014-10-26T02:00:00.000Z",
               "key": 1414288800000,
               "doc_count": 232
            }
         ]
      }
   }
}

CET query:

{
    "size": 0,
    "query": {
        "range": {
           "published": {
              "gte": 1414274400000,
              "lt": 1414292400000
           }
        }
    },
    "aggs": {
        "vot": {
            "date_histogram": {
                "field": "published",
                "interval": "hour",
                "min_doc_count": 0,
                "time_zone": "CET"
            }
        }
    }
}

UTC result:

{
   "took": 9,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 1130,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "vot": {
         "buckets": [
            {
               "key_as_string": "2014-10-25T22:00:00.000Z",
               "key": 1414274400000,
               "doc_count": 260
            },
            {
               "key_as_string": "2014-10-25T23:00:00.000Z",
               "key": 1414278000000,
               "doc_count": 0
            },
            {
               "key_as_string": "2014-10-26T00:00:00.000Z",
               "key": 1414281600000,
               "doc_count": 216
            },
            {
               "key_as_string": "2014-10-26T01:00:00.000Z",
               "key": 1414285200000,
               "doc_count": 422
            },
            {
               "key_as_string": "2014-10-26T02:00:00.000Z",
               "key": 1414288800000,
               "doc_count": 232
            }
         ]
      }
   }
}

Points of interest in these queries are the result buckets for key: 1414278000000. When ran in UTC it give a doc_count of 216. When ran in CET it has a doc_count of 0.

Further more, you will find a doc_count of 216 in the CET bucket of 1414281600000 (an hour to late). Last to point out is the next CET bucket, it contains the value of 422 which is the sum of 222 + 200. As you can see these values can be found in the corresponding bucket and the previous bucket in UTC.

Attached you will find a patch file adding some basic tests to the org.elasticsearch.common.rounding package. Here we test key rounding for the CET and America/Chicago timezones on the days of the DST switch.

I tried diggin into the issue and found that it has to do with the recalculation of preTz.getOffset on the following lines TimeZoneRounding.java:156 and TimeZoneRounding.java:163.

Running this step by step on my first CET test case results the first time (line 156) in 7200000 (2 hours) and the second time in 3600000 (1 hour). This difference is causeing 2014-10-26T01:01:01 GMT+0200 to resolve to 2014-10-26T02:00:00 GMT+0200. Explaining why the 1414278000000 (2014-10-26T01:01:01 GMT+0200) bucket to be empty, the next bucket containing the contents of this bucket, and the bucket after that a sum of two buckets.

0001-Add-tests-for-timezone-problems.patch:

From 4e022035b25a141027be6aaae78c8ad2454e673f Mon Sep 17 00:00:00 2001
From: Nils Dijk <me@thanod.nl>
Date: Tue, 4 Nov 2014 07:56:05 -0600
Subject: [PATCH 1/1] Add tests for timezone problems.

---
 .../common/rounding/TimeZoneRoundingTests.java     | 39 ++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/src/test/java/org/elasticsearch/common/rounding/TimeZoneRoundingTests.java b/src/test/java/org/elasticsearch/common/rounding/TimeZoneRoundingTests.java
index a3d70c7..e79ad1e 100644
--- a/src/test/java/org/elasticsearch/common/rounding/TimeZoneRoundingTests.java
+++ b/src/test/java/org/elasticsearch/common/rounding/TimeZoneRoundingTests.java
@@ -83,6 +83,45 @@ public class TimeZoneRoundingTests extends ElasticsearchTestCase {
         assertThat(tzRounding.round(utc("2009-02-03T01:01:01")), equalTo(time("2009-02-03T01:00:00", DateTimeZone.forOffsetHours(+2))));
         assertThat(tzRounding.nextRoundingValue(time("2009-02-03T01:00:00", DateTimeZone.forOffsetHours(+2))), equalTo(time("2009-02-03T02:00:00", DateTimeZone.forOffsetHours(+2))));
     }
+    
+    @Test
+    public void testTimeTimeZoneRoundingDST() {
+        Rounding tzRounding;
+        // testing savings to non savings switch 
+        tzRounding = TimeZoneRounding.builder(DateTimeUnit.HOUR_OF_DAY).preZone(DateTimeZone.forID("UTC")).build();
+        assertThat(tzRounding.round(time("2014-10-26T01:01:01", DateTimeZone.forID("CET"))), equalTo(time("2014-10-26T01:00:00", DateTimeZone.forID("CET"))));
+        
+        tzRounding = TimeZoneRounding.builder(DateTimeUnit.HOUR_OF_DAY).preZone(DateTimeZone.forID("CET")).build();
+        assertThat(tzRounding.round(time("2014-10-26T01:01:01", DateTimeZone.forID("CET"))), equalTo(time("2014-10-26T01:00:00", DateTimeZone.forID("CET"))));
+        
+        // testing non savings to savings switch
+        tzRounding = TimeZoneRounding.builder(DateTimeUnit.HOUR_OF_DAY).preZone(DateTimeZone.forID("UTC")).build();
+        assertThat(tzRounding.round(time("2014-03-30T01:01:01", DateTimeZone.forID("CET"))), equalTo(time("2014-03-30T01:00:00", DateTimeZone.forID("CET"))));
+        
+        tzRounding = TimeZoneRounding.builder(DateTimeUnit.HOUR_OF_DAY).preZone(DateTimeZone.forID("CET")).build();
+        assertThat(tzRounding.round(time("2014-03-30T01:01:01", DateTimeZone.forID("CET"))), equalTo(time("2014-03-30T01:00:00", DateTimeZone.forID("CET"))));
+        
+        // testing non savings to savings switch (America/Chicago)
+        tzRounding = TimeZoneRounding.builder(DateTimeUnit.HOUR_OF_DAY).preZone(DateTimeZone.forID("UTC")).build();
+        assertThat(tzRounding.round(time("2014-03-09T03:01:01", DateTimeZone.forID("America/Chicago"))), equalTo(time("2014-03-09T03:00:00", DateTimeZone.forID("America/Chicago"))));
+        
+        tzRounding = TimeZoneRounding.builder(DateTimeUnit.HOUR_OF_DAY).preZone(DateTimeZone.forID("America/Chicago")).build();
+        assertThat(tzRounding.round(time("2014-03-09T03:01:01", DateTimeZone.forID("America/Chicago"))), equalTo(time("2014-03-09T03:00:00", DateTimeZone.forID("America/Chicago"))));
+        
+        // testing savings to non savings switch 2013 (America/Chicago)
+        tzRounding = TimeZoneRounding.builder(DateTimeUnit.HOUR_OF_DAY).preZone(DateTimeZone.forID("UTC")).build();
+        assertThat(tzRounding.round(time("2013-11-03T06:01:01", DateTimeZone.forID("America/Chicago"))), equalTo(time("2013-11-03T06:00:00", DateTimeZone.forID("America/Chicago"))));
+        
+        tzRounding = TimeZoneRounding.builder(DateTimeUnit.HOUR_OF_DAY).preZone(DateTimeZone.forID("America/Chicago")).build();
+        assertThat(tzRounding.round(time("2013-11-03T06:01:01", DateTimeZone.forID("America/Chicago"))), equalTo(time("2013-11-03T06:00:00", DateTimeZone.forID("America/Chicago"))));
+        
+        // testing savings to non savings switch 2014 (America/Chicago)
+        tzRounding = TimeZoneRounding.builder(DateTimeUnit.HOUR_OF_DAY).preZone(DateTimeZone.forID("UTC")).build();
+        assertThat(tzRounding.round(time("2014-11-02T06:01:01", DateTimeZone.forID("America/Chicago"))), equalTo(time("2014-11-02T06:00:00", DateTimeZone.forID("America/Chicago"))));
+        
+        tzRounding = TimeZoneRounding.builder(DateTimeUnit.HOUR_OF_DAY).preZone(DateTimeZone.forID("America/Chicago")).build();
+        assertThat(tzRounding.round(time("2014-11-02T06:01:01", DateTimeZone.forID("America/Chicago"))), equalTo(time("2014-11-02T06:00:00", DateTimeZone.forID("America/Chicago"))));
+    }

     private long utc(String time) {
         return time(time, DateTimeZone.UTC);
-- 
1.9.3 (Apple Git-50)
@thanodnl
Copy link
Contributor Author

I have been fiddling around with this issue and came up with this fix: thanodnl@1cedf3e

The existing tests still pass (I only ran the timezone rounding tests though) and the tests I submitted in the patch above also pass.

If you are happy with the result feel free to take this solution. If you need me to open a pull request let me know!

@jpountz
Copy link
Contributor

jpountz commented Nov 24, 2014

@thanodnl Your fix and tests look good to me.

If you need me to open a pull request let me know!

Please do! (and ping me when you open it so that I don't miss it)

@thanodnl
Copy link
Contributor Author

@jpountz I just opened the pull request, also mentioned your handle there but I do not know if you get alerted for that. So: PING! 😄

jpountz pushed a commit that referenced this issue Nov 25, 2014
jpountz pushed a commit that referenced this issue Nov 25, 2014
jpountz pushed a commit that referenced this issue Nov 25, 2014
@clintongormley clintongormley changed the title date_histogram aggregation DST bug Aggs: date_histogram aggregation DST bug Nov 25, 2014
@clintongormley clintongormley changed the title Aggs: date_histogram aggregation DST bug Aggregations: date_histogram aggregation DST bug Nov 25, 2014
mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015
mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants