Describe the bug
NestedLoopJoinExec's memory-limited (spill) path allocates more memory than it reserves through the MemoryPool. The accounting-pool framework added in #22626 surfaces this when HEADROOM_FACTOR in the SLT pool is tightened: #22721 lowers it from 8.0 → 5.0, which trips this test.
To Reproduce
In datafusion/sqllogictest/src/accounting_pool.rs, set:
```rust
const HEADROOM_FACTOR: f64 = 5.0;
```
Then run nested_loop_join_spill.slt. First query at line 33:
```sql
SET datafusion.execution.target_partitions = 1;
SET datafusion.runtime.memory_limit = '150K';
SELECT count(*) as cnt, min(v1) as mn, max(v1) as mx
FROM generate_series(1, 100000) AS t1(v1)
INNER JOIN generate_series(1, 1) AS t2(v2)
ON (t1.v1 + t2.v2) > 0;
```
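For reference, the query's semantics can be sketched as a naive nested-loop inner join. This is an illustration only, not DataFusion's NestedLoopJoinExec: nlj_count_min_max is a hypothetical helper, values are modeled as i64, and generate_series is assumed to include its upper bound. Because the filter (t1.v1 + t2.v2) > 0 holds for every pair, the query should return cnt = 100000, mn = 1, mx = 100000. The sketch also shows why the operator must hold (or spill) one side: every left row is compared against every right row.

```rust
/// Hypothetical helper: naive nested-loop inner join followed by count/min/max of v1.
/// Illustrates the query's semantics only; it is not DataFusion's NestedLoopJoinExec.
fn nlj_count_min_max(left: &[i64], right: &[i64]) -> (u64, Option<i64>, Option<i64>) {
    let mut cnt = 0u64;
    let mut mn: Option<i64> = None;
    let mut mx: Option<i64> = None;
    // Every left row is compared against every right row.
    for &v1 in left {
        for &v2 in right {
            // Join filter: ON (t1.v1 + t2.v2) > 0
            if v1 + v2 > 0 {
                cnt += 1;
                mn = Some(mn.map_or(v1, |m| m.min(v1)));
                mx = Some(mx.map_or(v1, |m| m.max(v1)));
            }
        }
    }
    (cnt, mn, mx)
}

fn main() {
    // generate_series(1, 100000) and generate_series(1, 1), assumed inclusive.
    let t1: Vec<i64> = (1..=100_000).collect();
    let t2: Vec<i64> = vec![1];
    let (cnt, mn, mx) = nlj_count_min_max(&t1, &t2);
    println!("cnt={cnt} mn={mn:?} mx={mx:?}");
}
```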
Expected behavior
The query succeeds, with allocator usage bounded by the configured memory_limit (plus ~10% slop for legitimate untracked overhead such as Tokio/Rayon thread state).
Actual behavior
```
External error: 1 errors in file datafusion/sqllogictest/test_files/nested_loop_join_spill.slt
- query failed: Other Error: allocator overdraft: account balance at panic = -20245 bytes
at nested_loop_join_spill.slt:33
```
Peak allocator usage reaches ~770KB against a declared 150KB pool, i.e. 5.13× the budget. The first ~750KB (5×) is absorbed by the headroom factor; the remaining 20245 bytes are allocator usage that was never reserved through MemoryPool::try_grow.
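These numbers are mutually consistent. A small check, assuming '150K' is parsed as 150 × 1024 bytes and that the headroom factor multiplies the configured limit (peak_bytes is a hypothetical helper):

```rust
/// Hypothetical helper: peak = headroom allowance + unreserved overdraft.
/// Assumes the headroom factor multiplies the configured limit.
fn peak_bytes(limit_bytes: u64, headroom: f64, overdraft_bytes: u64) -> u64 {
    (limit_bytes as f64 * headroom) as u64 + overdraft_bytes
}

fn main() {
    let limit: u64 = 150 * 1024; // '150K', assumed to be 150 * 1024 = 153_600 bytes
    let allowance = (limit as f64 * 5.0) as u64; // absorbed by HEADROOM_FACTOR = 5.0
    let peak = peak_bytes(limit, 5.0, 20_245); // overdraft reported by the failing run
    println!("allowance = {allowance} B ({} KiB)", allowance / 1024);
    println!("peak      = {peak} B (~{:.0} KiB)", peak as f64 / 1024.0);
    println!("ratio     = {:.2}x", peak as f64 / limit as f64);
}
```

Under those assumptions the allowance is exactly 768,000 bytes (750 KiB), the peak is 788,245 bytes (about 770 KiB), and the ratio rounds to 5.13×.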
Likely root cause
The spill path in NestedLoopJoinExec has at least these candidate untracked allocation sites (each still needs verification with targeted instrumentation):
- generate_next_batch buffering: RecordBatches accumulated for the build side before the spill decision
- concat_batches at the spill boundary: copies into a coalesced batch before the IPC writer sees it
- take_native during the probe phase: gather kernels allocate output buffers
- IPC reader path on spill re-read: the StreamReader/FileReader decoder owns buffers that aren't accounted for
The first query in the test is small (100K × i32 = ~400KB build side), so the 20245-byte overshoot comes from whatever lives off-pool in the spill setup, not from the bulk data itself.
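A minimal sketch of the reserve-then-allocate ordering at one of the candidate sites (the concat_batches coalesce). ToyPool, its try_grow/shrink methods, ResourcesExhausted and concat_reserved are hypothetical std-only stand-ins; in DataFusion the reservation would go through a MemoryReservation, whose try_grow returns an error when the pool is exhausted, and the coalesce through arrow's concat_batches. The point is the order: size the output, reserve it, and only then allocate, treating a failed reservation as the signal to spill.

```rust
/// Hypothetical error mirroring the outcome of a failed reservation.
#[derive(Debug)]
struct ResourcesExhausted {
    requested: usize,
    available: usize,
}

/// Hypothetical std-only stand-in for a memory pool with a fixed byte limit.
struct ToyPool {
    limit: usize,
    used: usize,
}

impl ToyPool {
    /// Reserve `bytes`, or fail without changing `used`.
    fn try_grow(&mut self, bytes: usize) -> Result<(), ResourcesExhausted> {
        let available = self.limit.saturating_sub(self.used);
        if bytes > available {
            return Err(ResourcesExhausted { requested: bytes, available });
        }
        self.used += bytes;
        Ok(())
    }

    /// Release a reservation once the buffer it covered has been dropped.
    fn shrink(&mut self, bytes: usize) {
        self.used = self.used.saturating_sub(bytes);
    }
}

/// Coalesce buffered batches, reserving the output size before allocating it.
/// On Err, a caller would spill the buffered batches instead of concatenating.
fn concat_reserved(pool: &mut ToyPool, batches: &[Vec<i32>]) -> Result<Vec<i32>, ResourcesExhausted> {
    let rows: usize = batches.iter().map(Vec::len).sum();
    pool.try_grow(rows * std::mem::size_of::<i32>())?; // 1. reserve
    let mut out = Vec::with_capacity(rows); // 2. allocate exactly what was reserved
    for batch in batches {
        out.extend_from_slice(batch);
    }
    Ok(out)
}

fn main() {
    let mut pool = ToyPool { limit: 1_000, used: 0 };
    let batches: Vec<Vec<i32>> = vec![vec![1; 100], vec![2; 100]]; // 200 rows * 4 B = 800 B

    let merged = concat_reserved(&mut pool, &batches).expect("800 B fits in a 1000 B pool");
    println!("merged {} rows, pool used = {} B", merged.len(), pool.used);

    // A second coalesce needs another 800 B: this is where the spill path should take over.
    if let Err(e) = concat_reserved(&mut pool, &batches) {
        println!("would spill: requested {} B, {} B available", e.requested, e.available);
    }

    pool.shrink(merged.len() * std::mem::size_of::<i32>());
}
```

Reserving only the output assumes the input batches are already covered by an existing reservation; if they are not, they are another untracked site of the same kind.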
Additional context
… HEADROOM_FACTOR; expected to fail until this is fixed).
… MemoryReservation before each of the sites above, not bump the headroom constant: the constant is a leak detector, not a budget.
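To illustrate why bumping the constant does not fix the leak, here is a std-only model of an accounting pool. It assumes, without having checked accounting_pool.rs, that the SLT pool compares live allocator bytes against HEADROOM_FACTOR × reserved bytes; CountingAlloc, Account and balance_after_alloc are hypothetical, and the headroom is set to 1.0 to keep the numbers simple. An allocation that was never reserved drives the balance negative by exactly its size, while reserving it first keeps the balance at zero.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::cell::Cell;

// Per-thread count of live heap bytes. Const-initialised and Drop-free, so touching it
// from inside the allocator never allocates. Simplification: memory freed on another
// thread than the one that allocated it is miscounted.
thread_local! {
    static LIVE_BYTES: Cell<isize> = const { Cell::new(0) };
}

fn bump(delta: isize) {
    let _ = LIVE_BYTES.try_with(|c| c.set(c.get() + delta));
}

fn live_bytes() -> isize {
    LIVE_BYTES.with(|c| c.get())
}

/// Hypothetical counting allocator: forwards to the system allocator and tracks sizes.
struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            bump(layout.size() as isize);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        bump(-(layout.size() as isize));
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Hypothetical account: balance = headroom * reserved - bytes allocated since creation.
struct Account {
    baseline: isize,
    reserved: usize,
    headroom: f64,
}

impl Account {
    fn new(headroom: f64) -> Self {
        Account { baseline: live_bytes(), reserved: 0, headroom }
    }

    /// Stand-in for reserving through the memory pool (no limit check in this model).
    fn reserve(&mut self, bytes: usize) {
        self.reserved += bytes;
    }

    fn balance(&self) -> isize {
        (self.reserved as f64 * self.headroom) as isize - (live_bytes() - self.baseline)
    }
}

/// Allocates `bytes` with or without reserving first; returns the balance while it is live.
fn balance_after_alloc(bytes: usize, reserve_first: bool) -> isize {
    let mut acct = Account::new(1.0);
    if reserve_first {
        acct.reserve(bytes);
    }
    // black_box keeps the optimiser from eliding the unused allocation.
    let buf: Vec<u8> = std::hint::black_box(Vec::with_capacity(bytes));
    let balance = acct.balance();
    drop(buf);
    balance
}

fn main() {
    println!("untracked: {}", balance_after_alloc(20_245, false));
    println!("reserved:  {}", balance_after_alloc(20_245, true));
}
```

Raising the headroom passed to Account::new only adds slack proportional to what has already been reserved; the unreserved bytes stay invisible to the pool, which is why the fix belongs at the allocation sites rather than in the constant.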