Parsing error if you query less columns than source has and some settings are enabled #52591

den-crane · 2023-07-25T19:36:31Z

https://fiddle.clickhouse.com/c43bf6f9-91f8-4209-b2d2-2d59247625e2

create table test(d Date,a String) Engine=Memory
as select  today(), 'a' from numbers(3);

insert into function file('test.csv', 'CSVWithNames') select * from test;

SELECT * FROM file('test.csv', 'CSVWithNames', 'd Date, a String')
SETTINGS
  input_format_skip_unknown_fields = 0,
  input_format_with_names_use_header = 0
Format Pretty;  
+------------+---+
|          d | a |
+------------+---+
| 2023-07-25 | a |
+------------+---+
| 2023-07-25 | a |
+------------+---+
| 2023-07-25 | a |
+------------+---+

-- query only one column of 2
SELECT d FROM file('test.csv', 'CSVWithNames', 'd Date, a String')
SETTINGS
  input_format_skip_unknown_fields = 0,
  input_format_with_names_use_header = 0;

Received exception from server (version 23.6.2):
Code: 117. DB::Exception: Received from localhost:9000. DB::Exception: Expected end of line: (at row 1)
: 
Row 1:
Column 0,   name: d, type: Date, parsed text: "<DOUBLE QUOTE>2023-07-25<DOUBLE QUOTE>"
ERROR: There is no line feed. "<DOUBLE QUOTE>" found instead.
 It's like your file has more columns than expected.
And if your file has the right number of columns, maybe it has an unquoted string value with a comma.

https://fiddle.clickhouse.com/97f8549e-e107-4a37-be37-ed1adc9dd69e

create table test(d Date,a String) Engine=Memory
as select  today(), 'a' from numbers(3);

insert into function file('test.csv', 'CSVWithNames') select * from test;

SELECT * FROM file('test.csv', 'CSVWithNames', 'd Date, a String')
SETTINGS
  input_format_skip_unknown_fields = 0,
  input_format_with_names_use_header = 1
Format Pretty;  
+------------+---+
|          d | a |
+------------+---+
| 2023-07-25 | a |
+------------+---+
| 2023-07-25 | a |
+------------+---+
| 2023-07-25 | a |
+------------+---+

SELECT d FROM file('test.csv', 'CSVWithNames', 'd Date, a String')
SETTINGS
  input_format_skip_unknown_fields = 0,
  input_format_with_names_use_header = 1;
Received exception from server (version 23.6.2):
Code: 117. DB::Exception: Received from localhost:9000. DB::Exception: Unknown field found in format header: 'a' at position 1
Set the 'input_format_skip_unknown_fields' parameter explicitly to ignore and proceed: While executing ParallelParsingBlockInputFormat: While executing File. (INCORRECT_DATA)

https://fiddle.clickhouse.com/446198bb-54a1-47cc-a900-c957059c409b

set allow_experimental_analyzer=1;

create table test(d Date,a String) Engine=Memory
as select  today(), 'a' from numbers(3);

insert into function file('test.csv', 'CSVWithNames') select * from test;

SELECT * FROM file('test.csv', 'CSVWithNames', 'd Date, a String')
SETTINGS
  input_format_with_names_use_header = 0
Format Pretty;  
+------------+---+
|          d | a |
+------------+---+
| 2023-07-25 | a |
+------------+---+
| 2023-07-25 | a |
+------------+---+
| 2023-07-25 | a |
+------------+---+


select d from file('test.csv', 'CSVWithNames', 'd Date, a String')
SETTINGS
  input_format_with_names_use_header = 0;

Received exception from server (version 23.6.2):
Code: 117. DB::Exception: Received from localhost:9000. DB::Exception: Expected end of line: (at row 1)
: 
Row 1:
Column 0,   name: d, type: Date, parsed text: "<DOUBLE QUOTE>2023-07-25<DOUBLE QUOTE>"
ERROR: There is no line feed. "<DOUBLE QUOTE>" found instead.
 It's like your file has more columns than expected.
And if your file has the right number of columns, maybe it has an unquoted string value with a comma.

: While executing ParallelParsingBlockInputFormat: While executing File. (INCORRECT_DATA)

den-crane · 2023-07-25T21:07:28Z

broken in 22.6

22.5 works https://fiddle.clickhouse.com/f95c40b0-092e-4785-8811-41ff30f2ee1e

den-crane · 2023-09-07T23:31:40Z

@Avogar

Avogar · 2023-09-08T10:46:14Z

The problem is in setting input_format_with_names_use_header, we don't check if it's disabled during check if format CSVWithNames supports reading subset of columns. I will fix it

praniti6111 · 2023-09-08T12:59:04Z

@Avogar We are planning to upgrade to v23.3.7.5 and are currently facing the above issue. Are you going to provide the fix in the same version ? What are the tentative timelines for this release ?

Avogar · 2023-09-08T15:57:42Z

The fix will be in the upstream, but It will be backported to 23.3 and will be included in the patch release for 23.3 after the fix is created and merged. So, I cannot say the timeline because tere is no fix yet, I will start working on the fix next week

praniti6111 · 2023-09-11T05:37:25Z

Thank you for the update @Avogar.

Avogar · 2023-09-11T14:53:30Z

Investigated it a bit more, the problem with input_format_with_names_use_header=0 and parsing errors will be fixed, so this will work:

:) SELECT d FROM file('test.csv', 'CSVWithNames', 'd Date, a String')
SETTINGS
  input_format_skip_unknown_fields = 0,
  input_format_with_names_use_header = 0;

┌──────────d─┐
│ 2023-09-11 │
│ 2023-09-11 │
│ 2023-09-11 │
└────────────┘

:) select d from file('test.csv', 'CSVWithNames', 'd Date, a String') SETTINGS
  input_format_with_names_use_header = 0, input_format_skip_unknown_fields=1;

SELECT d
FROM file('test.csv', 'CSVWithNames', 'd Date, a String')
SETTINGS
    input_format_with_names_use_header = 0,
    input_format_skip_unknown_fields = 1

Query id: caa7b66f-e370-4f4f-bb5c-345a3a4bbbb6

┌──────────d─┐
│ 2023-09-11 │
│ 2023-09-11 │
│ 2023-09-11 │
└────────────┘

But parsing error in case when input_format_with_names_use_header = 1, input_format_skip_unknown_fields=0 won't be fixed, because it will require a lot of changes in the code and in the logic. And it's not worth it, you can just enable input_format_skip_unknown_fields, I don't see real usecases when it will be disabled when input_format_with_names_use_header=1

den-crane added unexpected behaviour v22.8-affected v23.3-affected labels Jul 25, 2023

den-crane changed the title ~~Parsing error if you queries less columns than source has and some settings are enabled~~ Parsing error if you query less columns than source has and some settings are enabled Jul 25, 2023

den-crane added the backward compatibility label Jul 25, 2023

den-crane added the v23.8-affected label Sep 7, 2023

Avogar self-assigned this Sep 8, 2023

Avogar mentioned this issue Sep 11, 2023

Fix possible parsing error in WithNames formats with disabled input_format_with_names_use_header #54513

Merged

1 task

robot-clickhouse closed this as completed in #54513 Sep 12, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parsing error if you query less columns than source has and some settings are enabled #52591

Parsing error if you query less columns than source has and some settings are enabled #52591

den-crane commented Jul 25, 2023 •

edited

den-crane commented Jul 25, 2023

den-crane commented Sep 7, 2023

Avogar commented Sep 8, 2023

praniti6111 commented Sep 8, 2023

Avogar commented Sep 8, 2023

praniti6111 commented Sep 11, 2023

Avogar commented Sep 11, 2023

Parsing error if you query less columns than source has and some settings are enabled #52591

Parsing error if you query less columns than source has and some settings are enabled #52591

Comments

den-crane commented Jul 25, 2023 • edited

den-crane commented Jul 25, 2023

den-crane commented Sep 7, 2023

Avogar commented Sep 8, 2023

praniti6111 commented Sep 8, 2023

Avogar commented Sep 8, 2023

praniti6111 commented Sep 11, 2023

Avogar commented Sep 11, 2023

den-crane commented Jul 25, 2023 •

edited