[Python] Expose setting basename_template_functor in Python or make basename_template padding-compatible #45851

@jonasdedden

Describe the enhancement requested

In this PR, a basename_template_functor was added to the C++ dataset writer. With that, it's possible to define arbitrary filenames, for example filenames with 0-padding, as the documentation of the feature itself describes.

Writing datasets in Python, however, only exposes basename_template in the write_dataset method. This means that, as far as I know, it's fundamentally impossible to write a dataset with 0-padded filenames. This is a problem, since writing files without padding and reading them back in does not preserve the order of rows, even though this could be trivially achieved.
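To illustrate the ordering problem, here is a minimal sketch in plain Python (no pyarrow involved). It assumes the files are picked up in lexicographic order of their names when the dataset is read back; the file names are hypothetical, generated from a template of the form "part-{i}.parquet".

```python
# Hypothetical file names in the order they were written (i = 0, 1, 2, ...).
unpadded = [f"part-{i}.parquet" for i in range(12)]
padded = [f"part-{i:03d}.parquet" for i in range(12)]

# Assumption: a reader lists files sorted lexicographically by name.
sorted_unpadded = sorted(unpadded)
sorted_padded = sorted(padded)

# Without padding, part-10 and part-11 sort before part-2.
print(sorted_unpadded[:4])
# ['part-0.parquet', 'part-1.parquet', 'part-10.parquet', 'part-11.parquet']

# With zero padding, lexicographic order matches write order.
print(sorted_padded == padded)  # True
```

With padding, the sorted listing matches the write order as long as the index fits in the chosen width.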

To fix this, it would help if the basename_template parameter could be f-string-ish, in the sense that users could define a custom 0-padding with, for example, 'part-{i:03d}.parquet'. Alternatively, if users could pass an arbitrary Callable[[int], str] here, that would be even better.
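A rough sketch of what the two proposals could mean in practice, written in plain Python. The helper resolve_basename and the BasenameTemplate alias are hypothetical names used only for illustration; they are not part of pyarrow, and this is not how write_dataset currently behaves.

```python
from typing import Callable, Union

# Hypothetical type: either a format-style template or a callable.
BasenameTemplate = Union[str, Callable[[int], str]]


def resolve_basename(template: BasenameTemplate, i: int) -> str:
    """Hypothetical helper: turn a template or callable into a file name."""
    if callable(template):
        # Proposal 2: an arbitrary Callable[[int], str].
        return template(i)
    # Proposal 1: an f-string-ish template, resolved with str.format,
    # so format specs such as {i:03d} are honored.
    return template.format(i=i)


print(resolve_basename("part-{i:03d}.parquet", 7))   # part-007.parquet
print(resolve_basename("part-{i}.parquet", 7))       # part-7.parquet
print(resolve_basename(lambda i: f"chunk_{i:05d}.parquet", 42))  # chunk_00042.parquet
```

Under this sketch, a plain "part-{i}.parquet" template keeps working unchanged, so the format-style variant would be backward compatible for templates that only use the bare {i} placeholder.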

Component(s)

Python
