New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Re-add fix for accurateCastOrNull()
#54629
Conversation
This is an automated comment for commit 4cf17dc with description of existing statuses. It's updated for the latest CI running ❌ Click here to open a full report in a separate page Successful checks
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Let's wait till CI results come back and merge then.
@aiven-sal Failure of |
Yes, I saw that. I'm going to take a look asap. |
@@ -49,3 +49,10 @@ SELECT accurateCastOrNull('1xxx', 'Date'); | |||
SELECT accurateCastOrNull('2023-05-30', 'Date'); | |||
SELECT accurateCastOrNull('2180-01-01', 'Date'); | |||
SELECT accurateCastOrNull(19, 'Date'); | |||
|
|||
SELECT 'Tests for issue #38585'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test cases as in 01556_accurate_cast_or_null.sql are also tested in
- 01601_accurate_cast.sql
- 02026_accurate_cast_or_default
It would be nice if you could update these other files as well to keep things in-sync. Please also add some positive test cases (= conversion of valid strings/integers to bool and valid IP4/6 strings to IP4/IP6)
@@ -49,3 +49,10 @@ SELECT accurateCastOrNull('1xxx', 'Date'); | |||
SELECT accurateCastOrNull('2023-05-30', 'Date'); | |||
SELECT accurateCastOrNull('2180-01-01', 'Date'); | |||
SELECT accurateCastOrNull(19, 'Date'); | |||
|
|||
SELECT 'Tests for issue #38585'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(l. 53 is technically not needed if we expand the test cases for positive and negative tests, see my other comment)
{ | ||
serialization_from.deserializeWholeText(column_to, read_buffer, format_settings); | ||
} | ||
catch (const Exception & e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ConvertImplGenericFromString
(l. 1644) is a generic handler, putting special handling only for bool types (l. 1691ff.) into it is not beautiful. I wonder how parsing errors of other data types are handled, e.g. select accurateCastOrNull('invalid', 'Date');
? (didn't check deeper where the logic is). A better alternative would be to handle errors in bool parsing equivalently to these types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK the bool serialization is the only one that is problematic, other serializations usually don't throw errors after they updated the result. Trying to fix it directly (62d06a0) is what caused the previous issues with 02811_csv_input_field_type_mismatch
. I can take a deeper look to see if there is a better way to fix it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my previous Date
example, which code updates the null bitmap on the result column then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My previous message was slightly incorrect, I kind of forgot what I did for the initial commit.
So, the problematic types are Bool and IPv{4,6} (the original PR fixed all of them) because those are the only types (afaik) that end up in
ClickHouse/src/Functions/FunctionsConversion.h
Line 4189 in d5ae444
auto make_custom_serialization_wrapper = [&, cast_ipv4_ipv6_default_on_conversion_error_value, input_format_ipv4_default_on_conversion_error_value, input_format_ipv6_default_on_conversion_error_value](const auto & types) -> bool |
Which doesn't correctly handle the "OrNull" case.
The fix for the IP types was easy and I was able to implement it in that lambda itself, because the various convertToIP* functions already had a template argument to return Null instead of throwing, so it was just a matter of invoking them with the right arguments.
For Bool it's a bit trickier because ConvertImplGenericFromString
doesn't support that (and that is why I modified it).
Other types instead end up in
ClickHouse/src/Functions/FunctionsConversion.h
Lines 3047 to 3050 in d5ae444
if (requested_result_is_nullable && checkAndGetDataType<DataTypeString>(from_type.get())) | |
{ | |
/// In case when converting to Nullable type, we apply different parsing rule, | |
/// that will not throw an exception but return NULL in case of malformed input. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay then.
Let's address the other comments (about tests) and then we can merge.
d5ae444
to
2f12f0c
Compare
AST fuzzer unfortunately complains for queries:
|
7dd8a71
to
4cf17dc
Compare
accurateCastOrNull()
Fixes #38585
This is a re-submit of #54136 which was reverted with #54472.
accurateCastOrNull
must not throw when casting an arbitrary string to Bool. This PR also fixes an unrelated issue that causedaccurateCastOrNull
to behave likeaccurateCastOrDefault
on error when casting to IPv4 or IPv6.Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
SQL function
accurateCastOrNull()
now 1. no longer throws an exception but instead returnsNULL
when casting an arbitrary string without "boolean equivalent" to bool, e.g.SELECT accurateCastOrDefault('test', 'Bool')
, 2. when casting an invalid string to an IP now returnsNULL
instead of an address with all zeros, e.g.SELECT accurateCastOrDefault('192.0.2.1x', 'IPv4')
, and, 3. with settingcast_keep_nullable = 1
, casting a nullable empty string to an IP now producesNULL
which is consistent with the behavior for other types.