-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add time bucket function #5665
Add time bucket function #5665
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be a good idea to remove the TIMESTAMPTZ
versions from this PR and implement them separately. The implementations you have now can be accessed by just casting the TSTZ
values to TS
.
Thanks @hawkfish ! I will fix them. |
Added one more overload
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this - especially the TimebaseDB compatibility. And I'm sorry someone posted that this is a "Good First Issue"! It's easy enough for instants (and you did a great job with that), but time zones are a black art.
In particular, "local timestamps" are a dangerous hack and should be avoided whenever possible. We only have support for them because some people have dirty data that uses them and this lets them (mostly) clean them up, and also for compatibility with Postgres and the SQL standard.
I may have mentioned my blog post on time binning, but not you might want to read it if I am not making sense.
test/sql/function/timestamp/test_icu_time_bucket_timestamptz.test
Outdated
Show resolved
Hide resolved
Also I can't resolve the conversations that you have fixed, so can you close them so we can focus on the remaining issues? Thx. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry wanted to ask some more questions!
Thanks @hawkfish for careful review! |
Thanks for all the hard work! This is coming along nicely, so please don't be discouraged. I'd highly recommend merging from master before your next push - there have been some broken CI bits. Oh and thanks for fixing the formatting - I swear I thought it was included! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Getting close! I found some small bugs in the naïve timestamp handling but I don't think we will collide for now.
The current implementation still has bug-like behavior.
In Europe/London timezone, DST in 2000 started at '2000-03-26 01:00:00+00' and ended at '2000-10-29 02:00:00+01'. D set timezone = 'Europe/London';
D with t2 as (select t1.ts::timestamptz as ts, time_bucket('10 seconds'::interval, t1.ts::timestamptz, 'Europe/London') as tb from generate_series('2000-10-28 00:00:00'::timestamp, '2000-10-31 00:00:00'::timestamp, '3 hours'::interval) t1 (ts)) select ts, tb, ts = tb as always_should_be_true from t2;
┌──────────────────────────┬──────────────────────────┬───────────────────────┐
│ ts │ tb │ always_should_be_true │
│ timestamp with time zone │ timestamp with time zone │ boolean │
├──────────────────────────┼──────────────────────────┼───────────────────────┤
│ 2000-10-28 00:00:00+01 │ 2000-10-28 00:00:00+01 │ true │
│ 2000-10-28 03:00:00+01 │ 2000-10-28 03:00:00+01 │ true │
│ 2000-10-28 06:00:00+01 │ 2000-10-28 06:00:00+01 │ true │
│ 2000-10-28 09:00:00+01 │ 2000-10-28 09:00:00+01 │ true │
│ 2000-10-28 12:00:00+01 │ 2000-10-28 12:00:00+01 │ true │
│ 2000-10-28 15:00:00+01 │ 2000-10-28 15:00:00+01 │ true │
│ 2000-10-28 18:00:00+01 │ 2000-10-28 18:00:00+01 │ true │
│ 2000-10-28 21:00:00+01 │ 2000-10-28 21:00:00+01 │ true │
│ 2000-10-29 00:00:00+01 │ 2000-10-29 00:00:00+01 │ true │
│ 2000-10-29 03:00:00+00 │ 2000-10-29 04:00:00+00 │ false │
│ 2000-10-29 06:00:00+00 │ 2000-10-29 07:00:00+00 │ false │
│ 2000-10-29 09:00:00+00 │ 2000-10-29 10:00:00+00 │ false │
│ 2000-10-29 12:00:00+00 │ 2000-10-29 13:00:00+00 │ false │
│ 2000-10-29 15:00:00+00 │ 2000-10-29 16:00:00+00 │ false │
│ 2000-10-29 18:00:00+00 │ 2000-10-29 19:00:00+00 │ false │
│ 2000-10-29 21:00:00+00 │ 2000-10-29 22:00:00+00 │ false │
│ 2000-10-30 00:00:00+00 │ 2000-10-30 00:00:00+00 │ true │
│ 2000-10-30 03:00:00+00 │ 2000-10-30 03:00:00+00 │ true │
│ 2000-10-30 06:00:00+00 │ 2000-10-30 06:00:00+00 │ true │
│ 2000-10-30 09:00:00+00 │ 2000-10-30 09:00:00+00 │ true │
│ 2000-10-30 12:00:00+00 │ 2000-10-30 12:00:00+00 │ true │
│ 2000-10-30 15:00:00+00 │ 2000-10-30 15:00:00+00 │ true │
│ 2000-10-30 18:00:00+00 │ 2000-10-30 18:00:00+00 │ true │
│ 2000-10-30 21:00:00+00 │ 2000-10-30 21:00:00+00 │ true │
│ 2000-10-31 00:00:00+00 │ 2000-10-31 00:00:00+00 │ true │
├──────────────────────────┴──────────────────────────┴───────────────────────┤
│ 25 rows 3 columns │
└─────────────────────────────────────────────────────────────────────────────┘ On TimescaleDB, this query returns expected result. postgres=# set timezone = 'Europe/London';
SET
postgres=# with t2 as (select t1.ts::timestamptz as ts, time_bucket('10 seconds'::interval, t1.ts::timestamptz, 'Europe/London') as tb from generate_series('2000-10-28 00:00:00'::timestamp, '2000-10-31 00:00:00'::timestamp, '3 hours'::interval) t1 (ts)) select ts, tb, ts = tb as always_should_be_true from t2;
ts | tb | always_should_be_true
------------------------+------------------------+-----------------------
2000-10-28 00:00:00+01 | 2000-10-28 00:00:00+01 | t
2000-10-28 03:00:00+01 | 2000-10-28 03:00:00+01 | t
2000-10-28 06:00:00+01 | 2000-10-28 06:00:00+01 | t
2000-10-28 09:00:00+01 | 2000-10-28 09:00:00+01 | t
2000-10-28 12:00:00+01 | 2000-10-28 12:00:00+01 | t
2000-10-28 15:00:00+01 | 2000-10-28 15:00:00+01 | t
2000-10-28 18:00:00+01 | 2000-10-28 18:00:00+01 | t
2000-10-28 21:00:00+01 | 2000-10-28 21:00:00+01 | t
2000-10-29 00:00:00+01 | 2000-10-29 00:00:00+01 | t
2000-10-29 03:00:00+00 | 2000-10-29 03:00:00+00 | t
2000-10-29 06:00:00+00 | 2000-10-29 06:00:00+00 | t
2000-10-29 09:00:00+00 | 2000-10-29 09:00:00+00 | t
2000-10-29 12:00:00+00 | 2000-10-29 12:00:00+00 | t
2000-10-29 15:00:00+00 | 2000-10-29 15:00:00+00 | t
2000-10-29 18:00:00+00 | 2000-10-29 18:00:00+00 | t
2000-10-29 21:00:00+00 | 2000-10-29 21:00:00+00 | t
2000-10-30 00:00:00+00 | 2000-10-30 00:00:00+00 | t
2000-10-30 03:00:00+00 | 2000-10-30 03:00:00+00 | t
2000-10-30 06:00:00+00 | 2000-10-30 06:00:00+00 | t
2000-10-30 09:00:00+00 | 2000-10-30 09:00:00+00 | t
2000-10-30 12:00:00+00 | 2000-10-30 12:00:00+00 | t
2000-10-30 15:00:00+00 | 2000-10-30 15:00:00+00 | t
2000-10-30 18:00:00+00 | 2000-10-30 18:00:00+00 | t
2000-10-30 21:00:00+00 | 2000-10-30 21:00:00+00 | t
2000-10-31 00:00:00+00 | 2000-10-31 00:00:00+00 | t
(25 rows)
D set timezone = 'Europe/London';
D with t2 as (select t1.ts::timestamptz as ts, time_bucket('10 seconds'::interval, t1.ts::timestamptz, 'Europe/London') as tb from generate_series('2000-03-25 00:00:00'::timestamp, '2000-03-28 00:00:00'::timestamp, '3 hours'::interval) t1 (ts)) select ts, tb, ts = tb as always_should_be_true from t2;
┌──────────────────────────┬──────────────────────────┬───────────────────────┐
│ ts │ tb │ always_should_be_true │
│ timestamp with time zone │ timestamp with time zone │ boolean │
├──────────────────────────┼──────────────────────────┼───────────────────────┤
│ 2000-03-25 00:00:00+00 │ 2000-03-25 00:00:00+00 │ true │
│ 2000-03-25 03:00:00+00 │ 2000-03-25 03:00:00+00 │ true │
│ 2000-03-25 06:00:00+00 │ 2000-03-25 06:00:00+00 │ true │
│ 2000-03-25 09:00:00+00 │ 2000-03-25 09:00:00+00 │ true │
│ 2000-03-25 12:00:00+00 │ 2000-03-25 12:00:00+00 │ true │
│ 2000-03-25 15:00:00+00 │ 2000-03-25 15:00:00+00 │ true │
│ 2000-03-25 18:00:00+00 │ 2000-03-25 18:00:00+00 │ true │
│ 2000-03-25 21:00:00+00 │ 2000-03-25 21:00:00+00 │ true │
│ 2000-03-26 00:00:00+00 │ 2000-03-26 00:00:00+00 │ true │
│ 2000-03-26 03:00:00+01 │ 2000-03-26 02:00:00+01 │ false │
│ 2000-03-26 06:00:00+01 │ 2000-03-26 05:00:00+01 │ false │
│ 2000-03-26 09:00:00+01 │ 2000-03-26 08:00:00+01 │ false │
│ 2000-03-26 12:00:00+01 │ 2000-03-26 11:00:00+01 │ false │
│ 2000-03-26 15:00:00+01 │ 2000-03-26 14:00:00+01 │ false │
│ 2000-03-26 18:00:00+01 │ 2000-03-26 17:00:00+01 │ false │
│ 2000-03-26 21:00:00+01 │ 2000-03-26 20:00:00+01 │ false │
│ 2000-03-27 00:00:00+01 │ 2000-03-27 00:00:00+01 │ true │
│ 2000-03-27 03:00:00+01 │ 2000-03-27 03:00:00+01 │ true │
│ 2000-03-27 06:00:00+01 │ 2000-03-27 06:00:00+01 │ true │
│ 2000-03-27 09:00:00+01 │ 2000-03-27 09:00:00+01 │ true │
│ 2000-03-27 12:00:00+01 │ 2000-03-27 12:00:00+01 │ true │
│ 2000-03-27 15:00:00+01 │ 2000-03-27 15:00:00+01 │ true │
│ 2000-03-27 18:00:00+01 │ 2000-03-27 18:00:00+01 │ true │
│ 2000-03-27 21:00:00+01 │ 2000-03-27 21:00:00+01 │ true │
│ 2000-03-28 00:00:00+01 │ 2000-03-28 00:00:00+01 │ true │
├──────────────────────────┴──────────────────────────┴───────────────────────┤
│ 25 rows 3 columns │
└─────────────────────────────────────────────────────────────────────────────┘ |
That's odd - I'm going to pull your code and step into it. |
It looks like the problem is that there is a difference between the number of hours between the TB epoch and the non-DST times on the 29th (7203 == 300d 3h) and the interval that |
There is a bug in the ICU TSTZ difference code. It is moving the start date forward instead of moving the end date backwards. So when it moves the epoch date forward by 300 days, it gains an hour crossing the spring DST boundary and never gives it back, resulting in an extra hour of difference at the next step. This is not symmetrical with the interval addition code. I'll have a closer look tomorrow and hopefully have a fix. |
Thanks for your investigation! |
Xcode, actually (which wraps lldb) |
Thanks! |
With the bug fix your examples now work, except the last one above, which is wrong (all the dates should be the same). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the non-ICU changes are ready to go. Can we split them out into a separate PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we are down to the µs computation and the DST test conversation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just resolve the conversations.
Thanks @hawkfish for the long-term review! |
Thanks! |
Woah 😍 |
Closes #5589
Edit: PR of non-ICU time_bucket function has been split into #5835.
Implemented four overloads of ICU time_bucket function.
They behave almost same as TimescaleDB's time_bucket.
To ease the tests, I set the default origin to '2000-01-03 00:00:00+00' (when bucket_width is small) or '2000-01-01 00:00:00+00' (when bucket_width is large) as the same as TimescaleDB.