-
Notifications
You must be signed in to change notification settings - Fork 7.8k
Fix DateTime64 to Date conversion for out-of-range dates when working with time zones #88737
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix DateTime64 to Date conversion for out-of-range dates when working with time zones #88737
Conversation
|
Cause of the very rare case where someone expects: |
I think it should be just a bugfix, it is not an expected behavior |
|
Workflow [PR], commit [4cbd6ef] Summary: ❌
|
|
Also, here I've tested this use-case, and |
Even with saturate I get unexpected behavior, see: https://fiddle.clickhouse.com/9b798be7-6ee3-44d1-a8ff-744591a28942 |
|
@yariks5s |
6ae33f7 to
61becb4
Compare
Why? It is expected that overflow in |
61becb4 to
20039ce
Compare
cfe8b13 to
057c4a9
Compare
src/Functions/DateTimeTransforms.h
Outdated
| t = 0; | ||
| else if (t > MAX_DATE_TIMESTAMP) | ||
| t = MAX_DATE_TIMESTAMP; | ||
| auto day_num = time_zone.toDayNum(t); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you know, why does this make a difference? For example:
SELECT toDate(toDateTime64('2149-06-07 00:00:00', 0, 'UTC'));
and
SELECT toDate(toDateTime64('2149-06-07 02:00:00', 0, 'Europe/Berlin'));
Brought different results before
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your changes indeed fix it, but without your changes? interesting to find out the reason
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it was wrapped into a negative value due to an integer overflow, so it was smaller than 0 and clamped to 1970-01-01.
Previously, both were clamped to 2149-06-06, but then two hours were added for European time, which exceeds the maximum UInt16 value.
Now, we first convert the time zone and then clamp.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but how these two cases are different:
SELECT toDate(toDateTime64('2149-06-07 00:00:00', 0, 'UTC'));
and
SELECT toDate(toDateTime64('2149-06-07 02:00:00', 0, 'Europe/Berlin'));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while writing this it came to my mind, that we might also want to handle this for the throw behaviour:
See: https://fiddle.clickhouse.com/411e1529-cb57-445c-8805-8f5846cf9f89
@yariks5s Should I also apply the timezone conversion in the throw case before the if statement?
or is that such a niche edge case that we don't want to handle it?
static UInt16 execute(Int64 t, const DateLUTImpl & time_zone)
{
auto day_num = time_zone.toDayNum(t);
if constexpr (date_time_overflow_behavior == FormatSettings::DateTimeOverflowBehavior::Saturate)
{
if (day_num < 0)
day_num = 0;
if (day_num > DATE_LUT_MAX_DAY_NUM)
day_num = DATE_LUT_MAX_DAY_NUM;
}
else if constexpr (date_time_overflow_behavior == FormatSettings::DateTimeOverflowBehavior::Throw)
{
if (day_num < 0 || day_num > DATE_LUT_MAX_DAY_NUM) [[unlikely]]
throw Exception(ErrorCodes::VALUE_IS_OUT_OF_RANGE_OF_DATA_TYPE, "Value {} is out of bounds of type Date", day_num);
}
return static_cast<UInt16>(day_num);
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah that's caused by the integer overflow (I guess)
- let's say our
tis a value bigger thanMAX_DATE_TIMESTAMPand we are in european timezone - then in previous implementation we would clamp:
t = MAX_DATE_TIMESTAMP - then we apply the timezone offset in the toDayNum method, which adds 2 hours, this will cause a value bigger than 65,535 which is the max unit16 value, so it get's wrapped around and starts from 0 again
1970-01-01 + 65535 days = 2149-06-06
I suspect this is the case, but I am not 100% sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to fix it later and implement a more general approach here. Your last fiddle makes sense, but we're fixing the symptoms, not the causes. I think it would be better to fix it when we convert between the timezones, because otherwise we just make the code more complex and we can have the same kinds of bug in other places.
It is not super important for now, and I can (or you, if you want so) fix these timezone problems later. Maybe let's fix the initial problem for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the fiddle doesn't make sense, these two describe the same exact time:
SELECT toDate(toDateTime64('2149-06-06 23:59:59', 0));
SELECT toDate(toDateTime64('2149-07-06 01:59:59', 0, 'Europe/Berlin'));so they should return the same, but the second line is throwing an error.
Yeah I see, that moving the saturation to the timezone conversion could make sense, but then we would also need to handle the throw case. I think my above code snippet would handle all cases pretty well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant, your point for this issue does make sense, it is a problem we indeed need to fix, but I would use a more general approach in the future PR
ed41067 to
d664aab
Compare
d664aab to
4cbd6ef
Compare
|
|
26c50f8
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fixed a bug where converting DateTime64 to Date with
date_time_overflow_behavior = 'saturate'could lead to incorrect results for out-of-range values when working with time zonesDocumentation entry for user-facing changes