Dynamic BufferExec sizing: row limit + memory cap for sort pushdown #21440
Description
Is your feature request related to a problem or challenge?
#21426 introduced a configurable fixed-size BufferExec capacity (default 1GB) for sort pushdown. While this is better than the SortExec it replaces (which buffers the entire partition), a fixed size is not optimal for all cases:
- Wide rows (many columns, large strings): 1GB may hold too few rows (or row groups) to buffer effectively
- Narrow rows (few small columns): 1GB buffers far more data than needed
As noted by @alamb in #21426 (comment):
I suspect a better solution than a fixed size buffer would be some calculation based on the actual size of the data (e.g. the number of rows to buffer). However, that is tricky to compute / constrain memory when large strings are involved. We probably would need to have both a row limit and a memory cap and pick the smaller of the two.
Describe the solution you'd like
Replace the fixed capacity with a dual-limit approach:
BufferExec stops buffering when EITHER limit is reached:
- Row limit: e.g., 100K rows (prevents over-buffering narrow rows)
- Memory cap: e.g., 1GB (prevents OOM for wide rows)
This adapts to different row widths automatically:
- Narrow rows (100 bytes/row): row limit triggers at ~10MB
- Wide rows (10KB/row): memory cap triggers at 1GB
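The dual-limit check described above can be sketched as follows. This is an illustrative sketch only; `BufferLimits` and `is_full` are hypothetical names, not DataFusion's actual API.

```rust
/// Hypothetical dual-limit configuration for BufferExec-style buffering.
/// Names and defaults are illustrative, not DataFusion's real types.
struct BufferLimits {
    max_rows: usize,  // e.g. 100_000 rows
    max_bytes: usize, // e.g. 1 << 30 (1 GiB)
}

impl BufferLimits {
    /// Buffering stops as soon as EITHER limit is reached,
    /// i.e. the effective capacity is the smaller of the two.
    fn is_full(&self, buffered_rows: usize, buffered_bytes: usize) -> bool {
        buffered_rows >= self.max_rows || buffered_bytes >= self.max_bytes
    }
}

fn main() {
    let limits = BufferLimits {
        max_rows: 100_000,
        max_bytes: 1 << 30,
    };

    // Narrow rows (~100 bytes/row): the row limit triggers first,
    // capping memory use at roughly 10 MB.
    assert!(limits.is_full(100_000, 100_000 * 100));

    // Wide rows (~20 KB/row): the memory cap triggers first, well
    // before the row limit is reached.
    assert!(limits.is_full(52_500, 52_500 * 20_480));

    // Neither limit reached: keep buffering.
    assert!(!limits.is_full(50_000, 500_000_000));
}
```

The point of taking the smaller of the two limits is that neither threshold needs to be tuned per workload: the row limit bounds over-buffering for narrow schemas, while the byte cap bounds memory for wide ones.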
Related issues:
- feat: make sort pushdown BufferExec capacity configurable, default 1GB #21426 — Make BufferExec capacity configurable (current fixed-size approach)
- Make `BUFFER_CAPACITY_AFTER_SORT_ELIMINATION` configurable #21417 — Original issue for the configurable buffer
- feat: sort file groups by statistics during sort pushdown (Sort pushdown phase 2) #21182 — Sort pushdown phase 2 (introduced BufferExec)