No obvious mechanism for partitioning groups of record batches #46432

@apjoseph

Description

Describe the enhancement requested

I've got a group of large fixed-width text files that need to be treated as a single dataset in pyarrow, without the onerous requirement of reading them all into memory at once.

Because Arrow has no support for fixed-width files, and `open_csv` has no mechanism for supplying a function to transform lines, I created a `RecordBatchReader` for each file and attempted to pass the sequence of readers to `pyarrow.dataset.dataset`, as the docs state should be possible (see the sketch below).
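Roughly, the approach looked like the following. The column layout, file names, and batch size are placeholders; the point is the final `ds.dataset(readers, ...)` call, which is the step that fails in practice.

```python
import pyarrow as pa
import pyarrow.dataset as ds

# Hypothetical fixed-width layout: (column name, start offset, end offset)
COLSPECS = [("id", 0, 8), ("name", 8, 28), ("value", 28, 40)]
SCHEMA = pa.schema([(name, pa.string()) for name, _, _ in COLSPECS])

def fixed_width_batches(path, batch_size=64_000):
    """Lazily yield RecordBatches by slicing each line at fixed character offsets."""
    rows = []
    with open(path, "r") as f:
        for line in f:
            rows.append([line[start:stop].strip() for _, start, stop in COLSPECS])
            if len(rows) >= batch_size:
                yield pa.RecordBatch.from_arrays(
                    [pa.array(col) for col in zip(*rows)], schema=SCHEMA
                )
                rows = []
    if rows:
        yield pa.RecordBatch.from_arrays(
            [pa.array(col) for col in zip(*rows)], schema=SCHEMA
        )

paths = ["part-0000.txt", "part-0001.txt"]  # placeholder file names
readers = [
    pa.RecordBatchReader.from_batches(SCHEMA, fixed_width_batches(p)) for p in paths
]

# The docs suggest a sequence of readers should be accepted here, but this is
# the call that currently breaks (see #38012).
dataset = ds.dataset(readers, schema=SCHEMA)
```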

However, due to #38012 this is currently broken.

As such, there doesn't appear to be any supported mechanism for lazily loading partitions of record batches.

As far as I can tell from the docs, the only possible workaround for large data sources that don't fit the narrow set of supported file types would be to write an entire fsspec filesystem and customize the returned `TextIOBase` so that, as each file is read, its lines are converted to a standard CSV format that `open_csv` can deal with (roughly as sketched below).
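Concretely, that workaround would have to look something like the sketch below. The class, protocol name, and column offsets are made up for illustration, and the conversion is done eagerly just to keep the example short; a real version would need a streaming file-like object.

```python
import io
from fsspec.spec import AbstractFileSystem

class FixedWidthAsCSVFileSystem(AbstractFileSystem):
    """Illustrative fsspec filesystem that rewrites fixed-width files as CSV."""

    protocol = "fwf2csv"  # hypothetical protocol name

    def __init__(self, colspecs, **kwargs):
        super().__init__(**kwargs)
        self.colspecs = colspecs  # list of (start, stop) offsets per field

    def _open(self, path, mode="rb", **kwargs):
        # Translate every fixed-width line into a comma-separated line.
        # Done eagerly here for brevity, which already gives up the laziness
        # this whole exercise is supposed to preserve.
        out = []
        with open(path, "r") as f:
            for line in f:
                out.append(",".join(line[a:b].strip() for a, b in self.colspecs))
        return io.BytesIO(("\n".join(out) + "\n").encode())
```

And even this ignores the other filesystem methods (`info`, `ls`, etc.) that dataset discovery would need, plus the buffering required to make the conversion actually streaming.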

Needless to say, such a solution would be extremely convoluted and error-prone, and would eliminate much of the utility of Arrow/pyarrow.

There really should be some straightforward mechanism for lazily loading record batches; otherwise, you need yet another intermediary dataframe library just to operate quickly on data stored in text files without overwhelming system memory.

Component(s)

Python
