slt+accounting: surface GroupValuesRows::emit untracked decode (closes #22739) #22741

Open
avantgardnerio wants to merge 2 commits into apache:main from coralogix:brent/group-values-rows-emit-leak

Conversation

@avantgardnerio
Contributor

@avantgardnerio avantgardnerio commented Jun 3, 2026

Which issue does this PR close?

Closes #22739. Follow-up in the family of #22721 / #22723 — same framework (#22626), different operator.

Rationale for this change

Described in issue.

What changes are included in this PR?

Two changes:

  1. HEADROOM_FACTOR: f64 = 8.0 → 5.0 in datafusion/sqllogictest/src/accounting_pool.rs. This tightens the framework's slack so that untracked allocations surface as test failures sooner. Same shape as #22721 (Lower SLT HEADROOM_FACTOR 8.0 -> 5.0 to surface nested_loop_join_spill leak).

  2. New SLT group_by_spill_row_decode.slt that exercises the row-encoded GroupValues path. It uses a wide Utf8 key paired with a small List<Int> "schema poisoner" so the schema falls outside multi_group_by::supported_type and routes through GroupValuesRows (a single Utf8 column alone would route to GroupValuesBytes). At pool=1M with HEADROOM_FACTOR=5.0 the test fails with allocator overdraft: account balance at panic = -1344326 bytes, with stack frames pointing at arrow_row::variable::decode_binary_view_inner ← decode_string_view ← decode_column ← RowConverter::convert_rows ← GroupValuesRows::emit ← GroupedHashAggregateStream::emit ← spill.
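
To illustrate why lowering HEADROOM_FACTOR makes a leak visible, here is a minimal sketch of a headroom check. It is not the code in accounting_pool.rs: `Verdict`, `check_headroom`, and the rule "real allocations may reach pool limit × factor before an overdraft is reported" are all assumptions made for illustration, and the byte counts in the usage note are made up rather than taken from the failing test.

```rust
/// Hypothetical verdict type for this sketch; not part of DataFusion.
#[derive(Debug, PartialEq)]
pub enum Verdict {
    WithinHeadroom,
    Overdraft { excess_bytes: u64 },
}

/// Assumed rule: allocations are tolerated up to `pool_limit * headroom_factor`.
/// Anything beyond that is reported as an overdraft.
pub fn check_headroom(pool_limit: u64, headroom_factor: f64, allocated: u64) -> Verdict {
    // Truncating cast is fine for a sketch; the real framework may round differently.
    let allowed = (pool_limit as f64 * headroom_factor) as u64;
    if allocated > allowed {
        Verdict::Overdraft {
            excess_bytes: allocated - allowed,
        }
    } else {
        Verdict::WithinHeadroom
    }
}
```

Under these assumptions, a query that really allocates 6,000,000 bytes against a 1,000,000-byte pool passes with a factor of 8.0 but is flagged with a factor of 5.0, which is the effect the tighter factor is meant to have.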

Are these changes tested?

By the SLT, yes.

Are there any user-facing changes?

Fewer OOMs.

`GroupedHashAggregateStream`'s spill path emits via
`GroupValuesRows::emit` → `RowConverter::convert_rows` → `decode_column`,
which allocates per-column buffers (`arrow_row::list::decode` for
List keys, `decode_binary`/`decode_string` for Utf8 keys) without
`MemoryReservation::try_grow`.
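
The difference between an untracked and a tracked decode can be sketched with toy types. Everything below is hypothetical: `ToyReservation` is a simplified stand-in for a memory reservation, not DataFusion's `MemoryReservation`, and the two `decode_*` functions only mimic the shape of a buffer-allocating decode step; this PR itself only surfaces the problem and does not contain such a fix.

```rust
/// Hypothetical, simplified stand-in for a memory reservation.
pub struct ToyReservation {
    limit: usize,
    used: usize,
}

impl ToyReservation {
    pub fn new(limit: usize) -> Self {
        Self { limit, used: 0 }
    }

    /// Record `bytes` against the limit, or fail without changing state.
    pub fn try_grow(&mut self, bytes: usize) -> Result<(), String> {
        let new_used = self
            .used
            .checked_add(bytes)
            .ok_or_else(|| "reservation overflow".to_string())?;
        if new_used > self.limit {
            return Err(format!(
                "cannot reserve {bytes} bytes: {} of {} already used",
                self.used, self.limit
            ));
        }
        self.used = new_used;
        Ok(())
    }

    pub fn used(&self) -> usize {
        self.used
    }
}

/// Untracked: allocates the output buffer without telling anyone.
pub fn decode_untracked(rows: &[&[u8]]) -> Vec<u8> {
    let total: usize = rows.iter().map(|r| r.len()).sum();
    let mut out = Vec::with_capacity(total);
    for r in rows {
        out.extend_from_slice(r);
    }
    out
}

/// Tracked: reserve the bytes first, and only allocate if that succeeds.
pub fn decode_tracked(rows: &[&[u8]], res: &mut ToyReservation) -> Result<Vec<u8>, String> {
    let total: usize = rows.iter().map(|r| r.len()).sum();
    res.try_grow(total)?;
    let mut out = Vec::with_capacity(total);
    for r in rows {
        out.extend_from_slice(r);
    }
    Ok(out)
}
```

In the tracked variant a decode that would exceed the limit returns an error before any buffer is allocated, so the caller can react (for example by spilling) instead of overshooting the pool.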

Surfaced by apache#22626.
Tighten HEADROOM_FACTOR 8.0 -> 5.0 and update the SLT key shape
(Utf8 + List<Int> schema poisoner) so the test routes through
GroupValuesRows on upstream main. The framework then catches
GroupedHashAggregateStream::emit -> GroupValuesRows::emit ->
RowConverter::convert_rows -> decode_column allocating a
MutableBuffer::with_capacity without MemoryReservation::try_grow.

Overdraft observed: ~1.3 MB. Same operator and emit path that
caused a 79-pod OOM cascade at one DataFusion-based log analytics
deployment on 2026-05-20.

@github-actions bot added the sqllogictest (SQL Logic Tests (.slt)) label on Jun 3, 2026
Labels

sqllogictest SQL Logic Tests (.slt)

Development

Successfully merging this pull request may close these issues.

GroupValuesRows::emit untracked decode buffer leaks past MemoryReservation