Respect input_format_allow_errors_num/input_format_allow_errors_ratio during schema inference

I made a deliberately malformed CSV file, where the `David` row is missing its `favColour` value:

```
$ cat foo.csv
name,favColour
Mark,Blue
David
Giles,Red
```

When I try to query it, schema inference fails:

```
SELECT *
FROM file('foo.csv')

Query id: 42729c66-9642-4c79-8c77-a68228aa64a4


Elapsed: 0.031 sec.

Received exception:
Code: 636. DB::Exception: The table structure cannot be extracted from a CSV format file. Error:
Code: 117. DB::Exception: Rows have different amount of values. (INCORRECT_DATA) (version 24.3.1.469 (official build)).
You can specify the structure manually: (in file/uri /Users/m
```

That makes sense, because the file really is malformed. So I set `input_format_allow_errors_num`, which I expected would skip the bad row, and I also told it to use the `CSVWithNames` format. But it still throws the same error?

```
SELECT *
FROM file('foo.csv', CSVWithNames)
SETTINGS input_format_allow_errors_num = 5

Query id: 167b8033-9fa4-496f-b3e5-dd05cd3f8e04


Elapsed: 0.001 sec.

Received exception:
Code: 636. DB::Exception: The table structure cannot be extracted from a CSVWithNames format file. Error:
Code: 117. DB::Exception: Rows have different amount of values. (INCORRECT_DATA) (version 24.3.1.469 (official build)).
You can specify the structure manually: (in file/uri /Users/markhneedham/projects/videos/20240305-WindowFunctions/foo.csv). (CANNOT_EXTRACT_TABLE_STRUCTURE)
```

We can work around this by setting `input_format_max_rows_to_read_for_schema_inference=1`, so that only the first row is used to infer the schema. But it would be simpler if `input_format_allow_errors_num` and `input_format_allow_errors_ratio` were also respected during schema inference.
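For reference, here is an untested sketch of that workaround against the same `foo.csv`. Limiting inference to one row means the malformed `David` row is never seen during inference, and `input_format_allow_errors_num` is kept so that row should be skipped when the data is actually read. The second query shows the other option the error message suggests: passing the structure explicitly as the third argument of `file()`, which bypasses inference entirely. The column types (`String`) are an assumption chosen for this example.

```sql
-- Workaround 1: infer the schema from the first data row only,
-- and allow up to 5 malformed rows (such as "David") to be skipped while reading.
SELECT *
FROM file('foo.csv', CSVWithNames)
SETTINGS
    input_format_max_rows_to_read_for_schema_inference = 1,
    input_format_allow_errors_num = 5;

-- Workaround 2: skip schema inference by specifying the structure manually.
-- The String types here are an assumption for illustration.
SELECT *
FROM file('foo.csv', CSVWithNames, 'name String, favColour String')
SETTINGS input_format_allow_errors_num = 5;
```

Both approaches only sidestep inference; neither makes the error-tolerance settings apply to inference itself, which is what this issue asks for.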

cc @Avogar 

Respect input_format_allow_errors_num/input_format_allow_errors_ratio during schema inference #61095