[C++] Improve performance of parquet readahead

In 7.0.0, the parquet readahead would read up to 256 row groups at once, which meant that, if the consumer was too slow, we would almost certainly run out of memory.

ARROW-15410 improved readahead as a whole and, in the process, changed parquet so that it always reads 1 row group in advance.

This is not always ideal in S3 scenarios. If the row groups are small, we may want to read many of them in advance. To fix this, we should continue reading row groups in parallel until at least `batch_size * batch_readahead` rows are being fetched.
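
A minimal sketch of the proposed stopping rule, to make the arithmetic concrete. This is not the Arrow implementation: `RowGroupsToFetch` and all of its parameters are hypothetical names chosen for illustration, and the row counts per row group are assumed to be known up front (e.g. from the file metadata).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, NOT part of the Arrow API.
// Returns how many more row groups to start fetching, beginning at
// `next_row_group`, so that at least batch_size * batch_readahead rows are
// in flight, or until the file has no more row groups.
std::size_t RowGroupsToFetch(const std::vector<int64_t>& row_group_rows,
                             std::size_t next_row_group,
                             int64_t rows_in_flight,
                             int64_t batch_size,
                             int64_t batch_readahead) {
  assert(batch_size > 0 && batch_readahead >= 0);
  const int64_t target_rows = batch_size * batch_readahead;
  std::size_t count = 0;
  // Keep scheduling row groups while we are below the row target.
  while (next_row_group + count < row_group_rows.size() &&
         rows_in_flight < target_rows) {
    rows_in_flight += row_group_rows[next_row_group + count];
    ++count;
  }
  return count;
}
```

Under this rule, small row groups lead to several concurrent fetches, while a single large row group already satisfies the target on its own, so memory use stays bounded by roughly the target plus one row group.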

**Reporter**: [Weston Pace](https://issues.apache.org/jira/browse/ARROW-16294) / @westonpace
**Assignee**: [Weston Pace](https://issues.apache.org/jira/browse/ARROW-16294) / @westonpace
#### Related issues:
- [[C++][Dataset] Change scanner readahead limits to be based on bytes instead of number of batches](https://github.com/apache/arrow/issues/30191) (relates to)
#### PRs and other links:
- [GitHub Pull Request #12967](https://github.com/apache/arrow/pull/12967)

<sub>**Note**: *This issue was originally created as [ARROW-16294](https://issues.apache.org/jira/browse/ARROW-16294). Please see the [migration documentation](https://github.com/apache/arrow/issues/14542) for further details.*</sub>

[C++] Improve performance of parquet readahead #31683