Conversation

@raimannma
Contributor

@raimannma raimannma commented Oct 17, 2025

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fixed a bug where converting DateTime64 to Date with date_time_overflow_behavior = 'saturate' could lead to incorrect results for out-of-range values when working with time zones

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

@raimannma
Contributor Author

Because of the very rare case where someone expects
select toDate(toDateTime64('1900-01-01 00:00:00', 0)) to return '2079-06-07',
should I mark this PR as a "Backward Incompatible Change"?

@yariks5s yariks5s self-assigned this Oct 17, 2025
@yariks5s yariks5s added the can be tested Allows running workflows for external contributors label Oct 17, 2025
@yariks5s
Member

Because of the very rare case where someone expects select toDate(toDateTime64('1900-01-01 00:00:00', 0)) to return '2079-06-07', should I mark this PR as a "Backward Incompatible Change"?

I think it should be just a bugfix; that is not expected behavior.

@clickhouse-gh

clickhouse-gh bot commented Oct 17, 2025

Workflow [PR], commit [4cbd6ef]

Summary:

job_name test_name status info comment
Stateless tests (amd_binary, ParallelReplicas, s3 storage, parallel) failure
03578_parallel_replicas_minicrawl FAIL cidb, flaky

@clickhouse-gh clickhouse-gh bot added the pr-bugfix Pull request with bugfix, not backported by default label Oct 17, 2025
@yariks5s
Member

yariks5s commented Oct 17, 2025

set date_time_overflow_behavior = 'saturate'; helps for this use case.
It is a documented behavior, see: https://clickhouse.com/docs/operations/settings/formats#date_time_overflow_behavior.

Also, I've tested this use case, and toDateTime64('1900-01-01 00:00:00', 0) leads to a negative integer value, so this is indeed normal behavior. The open question is why we have this behavior by default, but that isn't related to this fix.
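
The '2079-06-07' result mentioned above can be reproduced with plain integer arithmetic. The following is a minimal sketch, assuming that Date is stored as an unsigned 16-bit count of days since 1970-01-01 and that an out-of-range day number is simply truncated to 16 bits when the overflow is ignored. `daysFromCivil` and `wrapToDate` are illustrative helpers written for this sketch, not ClickHouse functions.

```cpp
#include <cassert>
#include <cstdint>

// Days since 1970-01-01 for a proleptic Gregorian date
// (Howard Hinnant's days_from_civil algorithm). Illustrative helper only.
constexpr int64_t daysFromCivil(int64_t y, unsigned m, unsigned d)
{
    assert(m >= 1 && m <= 12 && d >= 1 && d <= 31);
    y -= (m <= 2);                                        // treat Jan/Feb as part of the previous year
    const int64_t era = (y >= 0 ? y : y - 399) / 400;     // 400-year cycle index
    const unsigned yoe = static_cast<unsigned>(y - era * 400);
    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + static_cast<int64_t>(doe) - 719468;
}

// Assumption: with overflow ignored, the day number is truncated to 16 bits.
constexpr uint16_t wrapToDate(int64_t day_num)
{
    return static_cast<uint16_t>(day_num);
}

// 1900-01-01 lies 25567 days before the epoch, so its day number is negative.
static_assert(daysFromCivil(1900, 1, 1) == -25567, "1900-01-01 is day -25567");
// Truncation to UInt16 yields 65536 - 25567 = 39969 ...
static_assert(wrapToDate(-25567) == 39969, "wraps modulo 65536");
// ... and day 39969 is 2079-06-07.
static_assert(daysFromCivil(2079, 6, 7) == 39969, "2079-06-07 is day 39969");
```

Under these assumptions the wrapped value is exactly the date quoted in the first comment, which matches the "negative value in int" observation.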

@raimannma
Contributor Author

set date_time_overflow_behavior = 'saturate'; helps for this usecase.

But still, casting DateTime64(0) to Date is not an overflow.

Even with saturate I get unexpected behavior, see: https://fiddle.clickhouse.com/9b798be7-6ee3-44d1-a8ff-744591a28942

@raimannma
Contributor Author

raimannma commented Oct 17, 2025

@yariks5s
Should I just fix that error with saturate and ignore the "ignore" case?
I think such weird behaviour should not exist even in ignore mode or it should be actively selected by the user, but not the default.

@raimannma raimannma force-pushed the fix/datetime_undefined_behaviour branch from 6ae33f7 to 61becb4 Compare October 17, 2025 16:05
@yariks5s
Member

I think such weird behaviour should not exist even in ignore mode

Why? It is expected that overflow in ignore mode can return random values. Let's do the fix for your second case; thank you for finding it.

@raimannma raimannma force-pushed the fix/datetime_undefined_behaviour branch from 61becb4 to 20039ce Compare October 17, 2025 16:08
@raimannma raimannma changed the title Fix DateTime64 to Date conversion for out-of-range dates Fix DateTime64 to Date conversion for out-of-range dates when working with time zones Oct 17, 2025
@raimannma raimannma force-pushed the fix/datetime_undefined_behaviour branch 2 times, most recently from cfe8b13 to 057c4a9 Compare October 17, 2025 16:16
t = 0;
else if (t > MAX_DATE_TIMESTAMP)
t = MAX_DATE_TIMESTAMP;
auto day_num = time_zone.toDayNum(t);
Member

@yariks5s yariks5s Oct 17, 2025

Do you know why this makes a difference? For example:

SELECT toDate(toDateTime64('2149-06-07 00:00:00', 0, 'UTC'));

and

SELECT toDate(toDateTime64('2149-06-07 02:00:00', 0, 'Europe/Berlin'));

returned different results before this change.

Member

Your changes do fix it, but why did the results differ without them? It would be interesting to find out the reason.

Contributor Author

@raimannma raimannma Oct 17, 2025

I think it was wrapped into a negative value due to an integer overflow, so it was smaller than 0 and clamped to 1970-01-01.

Previously, both were clamped to 2149-06-06, but then two hours were added for European time, which exceeds the maximum UInt16 value.

Now, we first convert the time zone and then clamp.
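
The effect of the ordering can be illustrated with a small model. This is only a sketch under stated assumptions: the time zone is reduced to a fixed UTC offset (a hypothetical stand-in for the real lookup in `DateLUTImpl`), `MAX_DATE_TIMESTAMP` is assumed to be the last second of 2149-06-06 in UTC, and the final narrowing to UInt16 is assumed to wrap. The names `clampThenConvert` and `convertThenClamp` are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

constexpr int64_t SECONDS_PER_DAY = 86400;
constexpr int64_t MAX_DAY_NUM = 65535;  // 2149-06-06, the largest UInt16 day number
// Assumed value: 2149-06-06 23:59:59 UTC.
constexpr int64_t MAX_DATE_TIMESTAMP = MAX_DAY_NUM * SECONDS_PER_DAY + SECONDS_PER_DAY - 1;

// Hypothetical stand-in for the time zone lookup: a fixed UTC offset in seconds.
constexpr int64_t toDayNum(int64_t t, int64_t utc_offset_seconds)
{
    const int64_t local = t + utc_offset_seconds;
    const int64_t q = local / SECONDS_PER_DAY;
    // Floor division, so negative timestamps map to negative day numbers.
    return (local % SECONDS_PER_DAY < 0) ? q - 1 : q;
}

// Old order: clamp the UTC timestamp first, then convert to a local day number.
constexpr uint16_t clampThenConvert(int64_t t, int64_t utc_offset_seconds)
{
    if (t < 0)
        t = 0;
    else if (t > MAX_DATE_TIMESTAMP)
        t = MAX_DATE_TIMESTAMP;
    // The offset is applied after the clamp, so the day number can still exceed 65535.
    return static_cast<uint16_t>(toDayNum(t, utc_offset_seconds));
}

// New order: convert to a local day number first, then clamp the day number.
constexpr uint16_t convertThenClamp(int64_t t, int64_t utc_offset_seconds)
{
    int64_t day_num = toDayNum(t, utc_offset_seconds);
    if (day_num < 0)
        day_num = 0;
    else if (day_num > MAX_DAY_NUM)
        day_num = MAX_DAY_NUM;
    return static_cast<uint16_t>(day_num);
}
```

In this model, for t = 65536 * 86400 (2149-06-07 00:00:00 UTC), `clampThenConvert(t, 0)` gives 65535, while `clampThenConvert(t, 7200)` gives 0, because the clamped timestamp plus two hours lands on day 65536. `convertThenClamp` gives 65535 for both offsets.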

Member

@yariks5s yariks5s Oct 17, 2025

But how are these two cases different:

SELECT toDate(toDateTime64('2149-06-07 00:00:00', 0, 'UTC'));

and

SELECT toDate(toDateTime64('2149-06-07 02:00:00', 0, 'Europe/Berlin'));

Contributor Author

@raimannma raimannma Oct 17, 2025

While writing this, it occurred to me that we might also want to handle this for the throw behaviour.
See: https://fiddle.clickhouse.com/411e1529-cb57-445c-8805-8f5846cf9f89

@yariks5s Should I also apply the time zone conversion in the throw case before the if statement?
Or is that such a niche edge case that we don't want to handle it?

static UInt16 execute(Int64 t, const DateLUTImpl & time_zone)
{
    auto day_num = time_zone.toDayNum(t);
    if constexpr (date_time_overflow_behavior == FormatSettings::DateTimeOverflowBehavior::Saturate)
    {
        if (day_num < 0)
            day_num = 0;
        if (day_num > DATE_LUT_MAX_DAY_NUM)
            day_num = DATE_LUT_MAX_DAY_NUM;
    }
    else if constexpr (date_time_overflow_behavior == FormatSettings::DateTimeOverflowBehavior::Throw)
    {
        if (day_num < 0 || day_num > DATE_LUT_MAX_DAY_NUM) [[unlikely]]
            throw Exception(ErrorCodes::VALUE_IS_OUT_OF_RANGE_OF_DATA_TYPE, "Value {} is out of bounds of type Date", day_num);
    }
    return static_cast<UInt16>(day_num);
}
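
To experiment with this proposal outside the ClickHouse code base, the same control flow can be modelled in a standalone form. This is a rough sketch, not the actual implementation: `OverflowBehavior`, `toDayNum` with a fixed UTC offset, and `toDateModel` are hypothetical stand-ins for `FormatSettings::DateTimeOverflowBehavior`, `DateLUTImpl::toDayNum`, and `execute`, and `std::out_of_range` replaces the ClickHouse exception.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for FormatSettings::DateTimeOverflowBehavior.
enum class OverflowBehavior { Ignore, Saturate, Throw };

constexpr int64_t SECONDS_PER_DAY = 86400;
constexpr int64_t MAX_DAY_NUM = 65535;  // assumed to correspond to DATE_LUT_MAX_DAY_NUM

// Hypothetical stand-in for DateLUTImpl::toDayNum: a fixed UTC offset in seconds.
inline int64_t toDayNum(int64_t t, int64_t utc_offset_seconds)
{
    const int64_t local = t + utc_offset_seconds;
    const int64_t q = local / SECONDS_PER_DAY;
    return (local % SECONDS_PER_DAY < 0) ? q - 1 : q;  // floor division
}

// Convert first, then apply the selected overflow policy to the day number.
template <OverflowBehavior behavior>
uint16_t toDateModel(int64_t t, int64_t utc_offset_seconds)
{
    int64_t day_num = toDayNum(t, utc_offset_seconds);
    if constexpr (behavior == OverflowBehavior::Saturate)
    {
        if (day_num < 0)
            day_num = 0;
        else if (day_num > MAX_DAY_NUM)
            day_num = MAX_DAY_NUM;
    }
    else if constexpr (behavior == OverflowBehavior::Throw)
    {
        if (day_num < 0 || day_num > MAX_DAY_NUM)
            throw std::out_of_range("Value is out of bounds of type Date");
    }
    // In Ignore mode the narrowing below wraps modulo 65536.
    return static_cast<uint16_t>(day_num);
}
```

With this model, the timestamp 5662310399 (assumed to be 2149-06-06 23:59:59 UTC) converts to 65535 at offset 0 in every mode, while at offset 7200 it saturates to 65535 in Saturate mode and throws in Throw mode, because the local day number is 65536.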

Contributor Author

@raimannma raimannma Oct 17, 2025

Yeah, that's caused by the integer overflow (I guess):

  1. Let's say our t is a value bigger than MAX_DATE_TIMESTAMP and we are in a European time zone.
  2. In the previous implementation we would then clamp: t = MAX_DATE_TIMESTAMP.
  3. Then we apply the time zone offset in the toDayNum method, which adds 2 hours. This produces a value bigger than 65,535, which is the maximum UInt16 value, so it gets wrapped around and starts from 0 again.

1970-01-01 + 65535 days = 2149-06-06

I suspect this is the case, but I am not 100% sure

Member

@yariks5s yariks5s Oct 17, 2025

I think we need to fix it later and implement a more general approach here. Your last fiddle makes sense, but we're fixing the symptoms, not the causes. I think it would be better to fix it where we convert between time zones, because otherwise we just make the code more complex and can end up with the same kind of bug in other places.

It is not super important for now, and I (or you, if you want) can fix these time zone problems later. Maybe let's fix the initial problem for now?

Contributor Author

@raimannma raimannma Oct 17, 2025

I think the behaviour shown in the fiddle doesn't make sense; these two describe the exact same time:

SELECT toDate(toDateTime64('2149-06-06 23:59:59', 0));
SELECT toDate(toDateTime64('2149-06-07 01:59:59', 0, 'Europe/Berlin'));

so they should return the same result, but the second line throws an error.

Yeah, I see that moving the saturation into the time zone conversion could make sense, but then we would also need to handle the throw case. I think my code snippet above would handle all cases pretty well?

Member

@yariks5s yariks5s Oct 17, 2025

I meant that your point about this issue does make sense; it is a problem we do need to fix, but I would use a more general approach in a future PR.

@yariks5s yariks5s self-requested a review October 17, 2025 16:32
@raimannma raimannma force-pushed the fix/datetime_undefined_behaviour branch 2 times, most recently from ed41067 to d664aab Compare October 21, 2025 08:22
@raimannma raimannma force-pushed the fix/datetime_undefined_behaviour branch from d664aab to 4cbd6ef Compare October 21, 2025 08:32
@yariks5s
Member

03578_parallel_replicas_minicrawl is flaky

@yariks5s yariks5s added this pull request to the merge queue Oct 21, 2025
Merged via the queue into ClickHouse:master with commit 26c50f8 Oct 21, 2025
121 of 123 checks passed
@robot-ch-test-poll2 robot-ch-test-poll2 added the pr-synced-to-cloud The PR is synced to the cloud repo label Oct 21, 2025
@raimannma raimannma deleted the fix/datetime_undefined_behaviour branch October 21, 2025 13:42
Labels

can be tested Allows running workflows for external contributors pr-bugfix Pull request with bugfix, not backported by default pr-synced-to-cloud The PR is synced to the cloud repo

3 participants