Skip to content

Shrink residual Vec in split_vec_min_alloc to fix capacity-based memory accounting on long split_off chains #22548

@RyanJamesStewart

Description

@RyanJamesStewart

Summary

Follow-up to #22416. In split_vec_min_alloc, the residual vector keeps its original capacity across a long split_off chain. Under DataFusion's capacity-based memory accounting, that residual is charged for the full backing allocation until it drops, even after most of the data has been emitted.

Background

split_vec_min_alloc lives in datafusion_common::utils and is used by the multi_group_by group-value builders (bytes.rs, primitive.rs) to carve fixed-size chunks off a growing vector.

In the split_off branch, the emitted prefix keeps the original capacity, and the residual likewise keeps its original capacity (unchanged by split_off).

Concrete pathological case (raised by ariel-miculas in #22416 (comment)): a vector of one million elements is drained 1024 at a time. Each emitted slice has capacity 1024 except the final one, which inherits capacity one million via split_off + mem::replace. Symmetrically, on every iteration before the final one the residual still owns the full backing buffer, so accounting charges for it for the entire chain.

Why this was not fixed in #22416

#22416 considered calling shrink_to_fit on the emitted prefix in the split_off branch and backed it out. A caller in datafusion/physical-plan/src/aggregates/group_values/multi_group_by/bytes.rs pushes onto the emitted prefix immediately after the call (first_n_offsets.push(...)); shrinking the prefix forces a realloc on the next push. The test emitted_prefix_does_not_realloc_on_push in datafusion/common/src/utils/mod.rs pins that constraint.

The push-after-emit constraint applies to the prefix, not to the residual. The residual is the side that lingers; the prefix is the side that gets pushed onto.

Proposed shapes

Two candidates, in order of decreasing utility-side change:

  1. Have split_vec_min_alloc shrink the residual when its length is much smaller than its capacity (some ratio threshold). One realloc near the end of a long chain replaces carrying the full allocation indefinitely. Adds a heuristic to the shared utility.

  2. Push the shrink to the caller's finalize path. Callers driving long split chains (the multi_group_by builders) call shrink_to_fit (or shrink_to) on the residual once they know no more pushes are coming. Keeps the utility allocation-policy-free; each caller picks based on its own access pattern.

Whichever shape lands should come with a benchmark on the 1M to 1024 case plus a smaller fixture to keep CI cheap, so the residual shrink can be shown not to regress the common short-chain path.

Related

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions