New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CSV unquoted nulls and default values #6055
Conversation
Should we rename |
|
||
auto check_for_null = [&istr, &settings, &null_prefix_len] | ||
{ | ||
if (checkStringByFirstCharacterAndAssertTheRest("\\N", istr)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it Ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok.
We can use |
Let's split. |
if (null_prefix_len < buf.count()) | ||
istr.position() = buf.position(); | ||
else if (null_prefix_len > buf.count()) | ||
throw DB::Exception("Some characters were extracted from buffer, but nested parser did not read them", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this happen if there is something like NU
in a place of numeric data type?
(and buffer is split in the middle of NU
)
What will be the error message for the user?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can (also it can happen if format_csv_delimiter
= U
or L
). I've changed the error message to print some diagnostic info.
…nquoted_null_literal_as_null
Will be fixed after merge with master. |
@@ -211,6 +211,11 @@ Possible values: | |||
|
|||
Default value: 0. | |||
|
|||
## input_format_null_as_default {#settings-input_format_null_as_default} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't it be input_format_csv_null_as_default
?
Suppose we want to implement similar logic for TSV
. But in TSV it's much more dangerous, because there is no way to unambiguate from NULL and a String with 'NULL' value. But still usable (e.g. user has only numeric fields in a table). And this gives the motivation to allow the user to enable this logic separately for CSV and TSV.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This setting is about replacing nulls with default values for non-nullable data types (#2633 and #6033), not about parsing NULL
as \N
(input_format_null_as_default
and input_format_csv_unquoted_null_literal_as_null
are independent). Should we add a separate setting *_null_as_default
for each format?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. It's fine :)
@@ -24,5 +24,17 @@ echo '"2016-01-01 01:02:03","1" | |||
1502792101,"3" | |||
99999,"4"' | $CLICKHOUSE_CLIENT --query="INSERT INTO csv FORMAT CSV"; | |||
|
|||
echo '\N, \N' | $CLICKHOUSE_CLIENT --input_format_null_as_default=1 --query="INSERT INTO csv FORMAT CSV"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to make a separate test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And we should have a test case with complex defaults.
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
For changelog. Remove if this is non-significant change.
Category (leave one):
Short description (up to few sentences):
Fixes #5990
Detailed description (optional):
For CSV input format:
NULL
literal as\N
(if settingformat_csv_unquoted_null_literal_as_null=1
)input_format_null_as_default=1
)