Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

date_histogram issue when using "pre_zone_adjust_large_interval" and a timezone with DST #9491

Closed
orenash opened this issue Jan 29, 2015 · 11 comments

Comments

@orenash
Copy link
Contributor

orenash commented Jan 29, 2015

Hey there.

Since we upgraded to version 1.3.7, we noticed that we sometimes get weird double buckets when running our date_histogram aggregations.

After trying some things we figured out it only happens when we set pre_zone_adjust_large_interval to true, pre_zone to our local time zone (which uses DST), and interval is 'day' or bigger.

In the aggregration's result we see that we get two buckets corresponding to the same interval, but with an hour difference between their keys.

Here is a simple example that should reproduce this issue:

POST /test/t
{
    "d": "2014-10-08T13:00:00Z"
}

POST /test/t
{
    "d": "2014-11-08T13:00:00Z"
}

GET /test/_search?size=0
{
   "aggs": {
      "test": {
         "date_histogram": {
            "field": "d",
            "interval": "year",
            "pre_zone": "Asia/Jerusalem",
            "pre_zone_adjust_large_interval": true
         }
      }
   }
}

And the result:

"aggregations": {
      "test": {
         "buckets": [
            {
               "key_as_string": "2013-12-31T21:00:00.000Z",
               "key": 1388523600000,
               "doc_count": 1
            },
            {
               "key_as_string": "2013-12-31T22:00:00.000Z",
               "key": 1388527200000,
               "doc_count": 1
            }
         ]
      }
   }

As you can see, although the two timestamps unarguably belong to the same year, they're put into different buckets. As we figured out, it happens because the timezone offset is different for the two timestamps as one of them occurs when DST is on and the other is not.

We use Asia/Jerusalem timezone but the same problem happens in every timezone with DST (eg CET), as long as pre_zone_adjust_large_interval is used (When it's not used both documents will go to the same bucket "2014-01-01T00:00:00.000Z").

This problem never happened before we upgraded to 1.3.7. Moreover, while digging into the code we figured out that this bug is a direct result of the proposed fix for issue #8339. While that fix indeed solved the timezone problem in 'hour' intervals around DST switch, it caused the new problem with bigger intervals.

The problem is also in version 1.4.2.

@orenash
Copy link
Contributor Author

orenash commented Feb 10, 2015

I had the time to look into this issue and figure out the exact reasons for this to happen and how it might be fixed.

The issue is with the TimeTimeZoneRoundingFloor strategy in TimeZoneRounding (used when pre_zone is not UTC, interval<=hour or pre_zone_adjust_large_interval=true).
Rounding the key in this strategy is done as follows: shifting the key from UTC to local time in pre_zone, rounding the key, then shifting it back to UTC.

The problem is to determine the offset to subtract from the rounded key in order to take it back to the correct UTC time. The desired offset might be different from the original added offset due to DST switch (either the original shift or the rounding resulted in a timestamp that occurs in different DST configuration).

Prior to fix #8655, the offset used to shift back was the offset at the rounded key. This worked well in most cases for large intervals, but not for hour intervals around the DST switch.
After the fix, the offset used to shift back is always same offset used to switch from UTC to local time. This always gives a correct result for hour intervals (assuming DST switch always occurs in round hours).

But as we saw there is a problem with large intervals now - if the timezone offset at the original key is different than what it is at rounded key (which points to midnight), then we'll end up with a rounded key one hour away from midnight. For example 2014-08-08T00:00:00Z has +3 offset in Asia/Jerusalem and after shifting and rounding we get 2014-01-01T00:00:00Z. Subtracting 3 hours from that results in 2013-12-31T21:00:00Z which is not the expected result (which is 2013-12-31T22:00:00Z because the offset is +2 in winter). Moreover, if two timestamps occur in same month/year but with different DST, they are mapped into different keys (one is correct and one is not), and that's why we get the double buckets.

I figured out that in order to solve this, we can use getOffsetFromLocal method of the timezone class. This method is the opposite of getOffset and it always gives the correct offset we need to subtract in order to get back from shifted time to UTC.

I changed the code to use this method and all the tests pass, including tests I added for that issue (that fail on the latest code).

But then, I found another rare case that fails when using getOffsetFromLocal (but succeeds with latest code). This is the case immediately after switching DST on->off, the two hours before and after the switch are "ambiguous" in local time. If interval>=day, this is irrelevant because rounding takes us to midnight anyway. Nevertheless, if interval=hour, there is no way for getOffsetFromLocal to distinguish between them so this means they both go into the same bucket. That's probably OK for most applications, because hour histogram around DST switch is confusing to display whatsoever. But since I am pedantic, I want to solve this case as well... ^_^

After looking at the work done on TimeZoneRounding in #9637 (planned for 2.0), I saw the use of the methods convertUTCToLocal and convertLocalToUTC which basically do the same as adding/subtracting the offset returned by getOffset/getOffsetFromLocal. Except that convertLocalToUTC can get the original UTC timestamp as an optional parameter and it uses it to solve this exact issue (the original UTC time helps to solve the ambiguity). I tried using these methods and it seems to work well in all cases (all tests pass).

I will post a pull request with my fix and my additional tests as soon as I figure out if anything should also be changed in the strategies that deal with intervals such as "2d" (I think this problem is irrelevant there).

Here are the tests I added in order to cover pre_zone_adjust_large_interval functionality and the double buckets bug:

    @Test
    public void testAdjustPreTimeZone() {
        Rounding tzRounding;

        // Day interval
        tzRounding = TimeZoneRounding.builder(DateTimeUnit.DAY_OF_MONTH).preZone(DateTimeZone.forID("Asia/Jerusalem")).preZoneAdjustLargeInterval(true).build();
        assertThat(tzRounding.round(time("2014-11-11T17:00:00", DateTimeZone.forID("Asia/Jerusalem"))), equalTo(time("2014-11-11T00:00:00", DateTimeZone.forID("Asia/Jerusalem"))));
        // DST on
        assertThat(tzRounding.round(time("2014-08-11T17:00:00", DateTimeZone.forID("Asia/Jerusalem"))), equalTo(time("2014-08-11T00:00:00", DateTimeZone.forID("Asia/Jerusalem"))));
        // Day of switching DST on -> off
        assertThat(tzRounding.round(time("2014-10-26T17:00:00", DateTimeZone.forID("Asia/Jerusalem"))), equalTo(time("2014-10-26T00:00:00", DateTimeZone.forID("Asia/Jerusalem"))));
        // Day of switching DST off -> on
        assertThat(tzRounding.round(time("2015-03-27T17:00:00", DateTimeZone.forID("Asia/Jerusalem"))), equalTo(time("2015-03-27T00:00:00", DateTimeZone.forID("Asia/Jerusalem"))));

        // Month interval
        tzRounding = TimeZoneRounding.builder(DateTimeUnit.MONTH_OF_YEAR).preZone(DateTimeZone.forID("Asia/Jerusalem")).preZoneAdjustLargeInterval(true).build();
        assertThat(tzRounding.round(time("2014-11-11T17:00:00", DateTimeZone.forID("Asia/Jerusalem"))), equalTo(time("2014-11-01T00:00:00", DateTimeZone.forID("Asia/Jerusalem"))));
        // DST on
        assertThat(tzRounding.round(time("2014-10-10T17:00:00", DateTimeZone.forID("Asia/Jerusalem"))), equalTo(time("2014-10-01T00:00:00", DateTimeZone.forID("Asia/Jerusalem"))));

        // Year interval
        tzRounding = TimeZoneRounding.builder(DateTimeUnit.YEAR_OF_CENTURY).preZone(DateTimeZone.forID("Asia/Jerusalem")).preZoneAdjustLargeInterval(true).build();
        assertThat(tzRounding.round(time("2014-11-11T17:00:00", DateTimeZone.forID("Asia/Jerusalem"))), equalTo(time("2014-01-01T00:00:00", DateTimeZone.forID("Asia/Jerusalem"))));

        // Two timestamps in same year ("Double buckets" bug in 1.3.7)
        tzRounding = TimeZoneRounding.builder(DateTimeUnit.YEAR_OF_CENTURY).preZone(DateTimeZone.forID("Asia/Jerusalem")).preZoneAdjustLargeInterval(true).build();
        assertThat(tzRounding.round(time("2014-11-11T17:00:00", DateTimeZone.forID("Asia/Jerusalem"))),
                equalTo(tzRounding.round(time("2014-08-11T17:00:00", DateTimeZone.forID("Asia/Jerusalem")))));
    }

Here is a test that covers the ambiguous hours bug:

    @Test
    public void testAmbiguousHoursAfterDSTSwitch() {
        Rounding tzRounding;

        tzRounding = TimeZoneRounding.builder(DateTimeUnit.HOUR_OF_DAY).preZone(DateTimeZone.forID("Asia/Jerusalem")).preZoneAdjustLargeInterval(true).build();
        assertThat(tzRounding.round(time("2014-10-25T22:30:00", DateTimeZone.UTC)), equalTo(time("2014-10-25T22:00:00", DateTimeZone.UTC)));
        assertThat(tzRounding.round(time("2014-10-25T23:30:00", DateTimeZone.UTC)), equalTo(time("2014-10-25T23:00:00", DateTimeZone.UTC)));
    }

@synhershko
Copy link
Contributor

I can confirm I've seen this issue before - back in the facet days with 0.90. I recall opening an issue on this but can't find it now. Good catch!

@cbuescher
Copy link
Member

Thanks for the great test case. Since I'm currently trying to clean up the time zone management for 2.0 I included your test case from the first comment. I can reproduce the issue on 1.4 and current 2.0 branch and I was able to make it pass using the following changes to TimeZoneRounding.TimeTimeZoneRoundingFloor:

public long roundKey(long utcMillis) {
            long local = preTz.convertUTCToLocal(utcMillis);
            return preTz.convertLocalToUTC(field.roundFloor(local), true, utcMillis);
}

This basically follows your suggestions and the work in #9637. I will also add your other Rounding tests there and see if they pass with my intended implementation. However, the pre_zone_adjust_large_interval will hopefully be gone for 2.0.

@orenash
Copy link
Contributor Author

orenash commented Feb 11, 2015

Yes, the code for roundKey that you included looks exactly like my final solution, that's great.
Can this fix also be included in 1.x? (As I said I can post a PR for this myself, but if you plan to include it in 2.0 I guess it's better you manage it).

About removing pre_zone_adjust_large_interval, from what I understood, you plan to always return times in UTC on buckets. Which is equivalent to setting pre_zone_adjust_large_interval=true now. Am I right? If so, it means you can use the tests I provided for pre_zone_adjust_large_interval=true to test 2.0 (instead of the existing test that assume pre_zone_adjust_large_interval=false and will fail on your new code).

@cbuescher
Copy link
Member

Which is equivalent to setting pre_zone_adjust_large_interval=true

Yes, I think that is what it comes down to. I was able to use your test cases above with minor modifications to test on 2.0 branch. I think the plan is to backport the parts that are bug fixes to 1.x branch, but you can also propose a PR for that. This will still have to work with pre/postZone etc... because those will only be cleaned up later.

@cbuescher
Copy link
Member

@orenash I'm going to try to backport some of the changes I did in #9637 on 2.0 to the 1.x branch which will likely adress this issue here. So if you'd like to post a PR for your tests I could include these in the branches as run the backported changes against them as well.

@cbuescher cbuescher added the 1.x label Feb 17, 2015
@orenash
Copy link
Contributor Author

orenash commented Feb 17, 2015

Alright, thanks a lot! I will do so in the next days.

cbuescher pushed a commit to cbuescher/elasticsearch that referenced this issue Feb 20, 2015
This fix enhances the internal time zone conversion in the
TimeZoneRounding classes that were the cause of issues with
strange date bucket keys in elastic#9491 and elastic#7673.

Closes elastic#9491
Closes elastic#7673
@cbuescher
Copy link
Member

Hi @orenash, I included an integration tests based on your first comment in #9790, still would like to include your aditional rounding tests if you want to put them into a separate PR.

@orenash
Copy link
Contributor Author

orenash commented Feb 20, 2015

Don't you prefer to have the tests in the same PR with your fix? If you do not, I will open a PR later today.

@cbuescher
Copy link
Member

No, this is great, will merge that after the fix is on the branch. Many thanks.

cbuescher pushed a commit that referenced this issue Feb 23, 2015
This fix enhances the internal time zone conversion in the
TimeZoneRounding classes that were the cause of issues with
strange date bucket keys in #9491 and #7673.

Closes #9491
Closes #7673
cbuescher pushed a commit that referenced this issue Feb 23, 2015
This fix enhances the internal time zone conversion in the
TimeZoneRounding classes that were the cause of issues with
strange date bucket keys in #9491 and #7673.

Closes #9491
Closes #7673
@cbuescher
Copy link
Member

Fixed on 1.x with 78f0202 and on 1.4 with b7dbf1e

@cbuescher cbuescher added v1.4.5 and removed v1.4.0 labels Mar 18, 2015
mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015
This fix enhances the internal time zone conversion in the
TimeZoneRounding classes that were the cause of issues with
strange date bucket keys in elastic#9491 and elastic#7673.

Closes elastic#9491
Closes elastic#7673
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants