Describe the enhancement requested
In this PR, a basename_template_functor was added to the C++ dataset writer. With it, it's possible to define arbitrary filenames, for example filenames with zero-padding, as the feature's own documentation describes.
Writing datasets in Python, however, only exposes basename_template in the write_dataset method. This means that, as far as I know, it's fundamentally impossible to write a dataset with zero-padded filenames. This is a problem, since writing files without padding and reading them back in does not preserve the order of rows, even though preserving it would be trivial.
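To illustrate the ordering problem, here is a minimal sketch in plain Python (no pyarrow calls). It assumes the files are picked up in lexicographic order by name when the dataset is read back; the file names are illustrative only.

```python
# File names as a template like "part-{i}.parquet" would produce them (no padding),
# next to the zero-padded names this issue asks for.
unpadded = [f"part-{i}.parquet" for i in range(12)]
padded = [f"part-{i:03d}.parquet" for i in range(12)]

# Assumption: files are discovered in lexicographic order of their names.
sorted_unpadded = sorted(unpadded)
sorted_padded = sorted(padded)

# Without padding, part-10 and part-11 sort before part-2.
print(sorted_unpadded[:4])
# With padding, lexicographic order matches write order.
print(sorted_padded == padded)
```

With unpadded names, file 10 is read before file 2, so the rows come back in a different order than they were written.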
To address this, basename_template could be made f-string-like, so that users could define custom zero-padding with, for example, 'part-{i:03d}.parquet'. Alternatively, and even better, the parameter could accept an arbitrary Callable[[int], str].
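A rough sketch of what the two proposed behaviours could look like, in plain Python. The helper name resolve_basename is hypothetical and not part of pyarrow; it only shows how a format-spec-aware template and a Callable[[int], str] might both be resolved to a file name.

```python
from typing import Callable, Union

# Hypothetical type for the proposed parameter: a template string or a callable.
BasenameTemplate = Union[str, Callable[[int], str]]


def resolve_basename(template: BasenameTemplate, i: int) -> str:
    """Return the file name for the i-th file (hypothetical helper, not a pyarrow API)."""
    if callable(template):
        # Proposal 2: the user supplies any function from file index to name.
        return template(i)
    # Proposal 1: treat the template like a format string, so "{i:03d}" works
    # in addition to the plain "{i}" token.
    return template.format(i=i)


print(resolve_basename("part-{i}.parquet", 7))
print(resolve_basename("part-{i:03d}.parquet", 7))
print(resolve_basename(lambda i: f"chunk_{i:05d}.parquet", 7))
```

Using str.format here keeps a plain "{i}" template working unchanged, which is why the format-string variant could be backward compatible; the callable variant is the more general of the two.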
Component(s)
Python