Allow smart "merging" of schema for multi-file schema inference #55428

seandavi · 2023-10-09T19:05:54Z

Use case

When performing schema inference over multiple files, ClickHouse could be more flexible by taking a "union by name" of the per-file schemas, so that files whose schemas differ can still load as one unified table. This would be similar to read_parquet('*.parquet', union_by_name=True): schema inference would simply add a column to the unified schema when it is not present in all files.

The same logic could apply to CSV, TSV, jsonlines, etc., but my use case is for Parquet, so feel free to scope as you see fit.
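
To make the requested behavior concrete, here is a minimal Python sketch of a union-by-name merge over per-file schemas. It is only an illustration of the idea, not how ClickHouse implements it: the function names `union_by_name` and `align_row`, the dict-based schema representation, and the choice to raise on conflicting types are all assumptions made for this example.

```python
from typing import Any, Dict, List


def union_by_name(schemas: List[Dict[str, str]]) -> Dict[str, str]:
    """Merge per-file schemas (column name -> type name) by column name.

    Columns keep the order in which they are first seen. A column that
    appears with two different types raises an error; a real engine might
    instead try to find a common supertype (assumption for this sketch).
    """
    merged: Dict[str, str] = {}
    for schema in schemas:
        for column, column_type in schema.items():
            if column not in merged:
                merged[column] = column_type
            elif merged[column] != column_type:
                raise ValueError(
                    f"column {column!r} has conflicting types: "
                    f"{merged[column]} vs {column_type}"
                )
    return merged


def align_row(row: Dict[str, Any], merged: Dict[str, str]) -> Dict[str, Any]:
    """Project a row onto the merged schema, using None for absent columns."""
    return {column: row.get(column) for column in merged}


# Two hypothetical files whose schemas only partially overlap.
file_a = {"id": "Int64", "name": "String"}
file_b = {"id": "Int64", "score": "Float64"}

unified = union_by_name([file_a, file_b])
row_from_b = align_row({"id": 1, "score": 2.5}, unified)
```

In this sketch, `unified` contains `id`, `name`, and `score`, and a row from the second file gets `None` for the `name` column it does not have.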

seandavi added the feature label Oct 9, 2023

Avogar self-assigned this Oct 9, 2023

Avogar mentioned this issue Oct 10, 2023

Schema inference over multiple files #55463

Closed

Avogar mentioned this issue Oct 20, 2023

Add 'union' mode for schema inference #55892

Merged

1 task

Avogar closed this as completed in #55892 Dec 12, 2023
