Bound ArrowWriter peak memory #10071

### Is your feature request related to a problem?
`ArrowWriter` must buffer every column's *compressed* pages for an entire row group before it can splice them into contiguous column chunks at flush. Peak memory is therefore ≈ Σ(compressed bytes of all column chunks) and grows with row group size. On wide, skewed schemas (e.g. ~400 columns, some far larger than others), writing Parquet can consume 12 GB of memory or more. The only existing lever is flushing smaller row groups based on `in_progress_size`, which trades away compression and read-time page/row-group pruning.
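To make the scaling concrete, here is a rough back-of-the-envelope model of the sum above. It is only a sketch: the function name `peak_buffered_bytes` and the per-column byte figures are hypothetical, and it assumes each column compresses to a fixed number of bytes per row, which real data will not do.

```rust
/// Estimated bytes buffered when a row group is flushed: the sum over all
/// columns of (compressed bytes per row * rows in the row group).
/// Hypothetical helper for illustration, not part of the parquet crate.
fn peak_buffered_bytes(compressed_bytes_per_row: &[u64], rows_per_group: u64) -> u64 {
    compressed_bytes_per_row
        .iter()
        .map(|bytes| bytes * rows_per_group)
        .sum()
}

fn main() {
    // Made-up skewed schema: 396 narrow columns plus 4 very wide ones.
    let mut columns = vec![8u64; 396];
    columns.extend([20_000u64; 4]);

    // Shrinking the row group shrinks the buffer linearly, which is why
    // smaller row groups are currently the only way to cap memory.
    for rows in [1_000_000u64, 100_000, 10_000] {
        let bytes = peak_buffered_bytes(&columns, rows);
        println!("{rows} rows/group -> ~{} MiB buffered", bytes / (1024 * 1024));
    }
}
```

In this model the four wide columns account for almost all of the buffered bytes, yet the row group size they force applies to every other column as well.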

### Describe the solution you'd like
A way to buffer a row group's compressed pages somewhere other than memory until the row group is flushed.
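One possible shape for this, sketched with the standard library only: completed pages are appended to a scratch file and streamed into the output at flush time. The `SpillBuffer` type and its methods are hypothetical names for illustration. They are not an existing `parquet` API and not necessarily how the linked PR approaches the problem.

```rust
use std::fs::{self, File};
use std::io::{self, Seek, SeekFrom, Write};
use std::path::PathBuf;

/// Hypothetical page buffer that keeps compressed pages in a scratch file
/// instead of in memory.
struct SpillBuffer {
    path: PathBuf,
    file: File,
    len: u64,
}

impl SpillBuffer {
    fn create(path: PathBuf) -> io::Result<Self> {
        let file = File::options()
            .read(true)
            .write(true)
            .create(true)
            .truncate(true)
            .open(&path)?;
        Ok(Self { path, file, len: 0 })
    }

    /// Append one compressed page; only the running byte count stays in memory.
    fn push_page(&mut self, page: &[u8]) -> io::Result<()> {
        self.file.write_all(page)?;
        self.len += page.len() as u64;
        Ok(())
    }

    /// Number of bytes currently held in the scratch file.
    fn buffered_bytes(&self) -> u64 {
        self.len
    }

    /// Stream the buffered pages into `out`, then delete the scratch file.
    fn splice_into<W: Write>(self, out: &mut W) -> io::Result<u64> {
        let SpillBuffer { path, mut file, .. } = self;
        file.seek(SeekFrom::Start(0))?;
        let copied = io::copy(&mut file, out)?;
        drop(file); // close the handle before removing the file
        fs::remove_file(&path)?;
        Ok(copied)
    }
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("spill_buffer_demo.bin");
    let mut column_chunk = SpillBuffer::create(path)?;
    column_chunk.push_page(b"page-0")?;
    column_chunk.push_page(b"page-1")?;
    println!("buffered on disk: {} bytes", column_chunk.buffered_bytes());

    // At flush, the column chunk is copied contiguously into the output.
    let mut output = Vec::new();
    let copied = column_chunk.splice_into(&mut output)?;
    println!("spliced {copied} bytes into the output");
    Ok(())
}
```

With one such buffer per column, memory use during a row group would be bounded by the pages still being encoded rather than by the full compressed row group, at the cost of extra disk I/O.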

### Describe alternatives you've considered
Reducing the row group size limits buffering, but it sacrifices encoding efficiency and read performance, and it doesn't address the underlying coupling where one column's size forces the page layout of the others.

### Additional context
Related issues:
- #5828
- #5450
- #5484
- Fixed by PR #10020.
