Is it possible to write all data files in one folder instead of creating a folder per partition? #2903

@GrigorievNick

Iceberg tracks file locations in metadata, so there is no need to keep the Hive table directory structure.
But Iceberg still writes data into one folder per partition.
In my case, partitions are organized as ranges and my storage is S3.
One of the main issues is that I sometimes need to split a range into two or coalesce ranges.
Because the partitions are ranges, I only need to split the one or two files that straddle the partition border.
But because S3 does not support rename, if the partition is part of the prefix, I have to copy all the data in the partition.

Iceberg is a great tool for managing files, and it looks like its architecture does not require a strict folder hierarchy.
So I wonder: is there a way to tell Iceberg to always write all files to the same folder?

/tmp/iceberg_cdc_test/iceberg_catalog/hdl/enrichment_table/data/idRange=0-50/ts_day=2021-07-30/00000-8-8b701a28-8a19-4b57-a84e-f2ff5b12bbb6-00001.orc 

Example of a file written by Iceberg into a partition folder.
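
To make the difference concrete, here is a minimal sketch of the two layouts described above. It is not Iceberg code: the function names, the bucket name, and the new range `0-25` are hypothetical and only illustrate why a partition-in-prefix layout forces a copy on S3 while a flat layout would not.

```python
import posixpath
from typing import Mapping


def partitioned_location(data_root: str, partition: Mapping[str, str], filename: str) -> str:
    """Hive-style layout: partition values are embedded in the object key."""
    partition_path = "/".join(f"{key}={value}" for key, value in partition.items())
    return posixpath.join(data_root, partition_path, filename)


def flat_location(data_root: str, partition: Mapping[str, str], filename: str) -> str:
    """Flat layout: partition values would live only in table metadata."""
    return posixpath.join(data_root, filename)


# Hypothetical bucket; the file name is taken from the example path above.
data_root = "s3://my-bucket/enrichment_table/data"
filename = "00000-8-8b701a28-8a19-4b57-a84e-f2ff5b12bbb6-00001.orc"

# The same file before and after splitting the range 0-50 (new range is made up).
before = {"idRange": "0-50", "ts_day": "2021-07-30"}
after = {"idRange": "0-25", "ts_day": "2021-07-30"}

# With the partition in the prefix, the key changes, so S3 needs a full copy.
needs_copy_partitioned = (
    partitioned_location(data_root, before, filename)
    != partitioned_location(data_root, after, filename)
)

# With a flat layout, the key is unchanged; only metadata would need updating.
needs_copy_flat = (
    flat_location(data_root, before, filename)
    != flat_location(data_root, after, filename)
)
```

Under these assumptions, a file that lies entirely inside the new range keeps its key in the flat layout, so only the files on the partition border would have to be rewritten.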
