Consider deferred copying in string_agg GroupsAccumulator #21156

@neilconway

Description

Is your feature request related to a problem or challenge?

We currently copy every input string in update_batch for StringAggGroupsAccumulator.

We could instead bump the Arc refcount on the input batch and keep <group_id, batch_id, row_id> triples, then assemble the actual results in evaluate() (this is similar to #20504 for array_agg). This would be considerably more complicated than the current approach, but it could be worth it to reduce the amount of data being copied. It would also require some bookkeeping to ensure that the right state is reclaimed after a partial emit.

Note that the current string_agg benchmark uses 3-byte strings, so it would underestimate the impact of this optimization.
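
To make the idea concrete, here is a minimal std-only sketch of the deferred-copy approach described above. It is not the DataFusion implementation: `DeferredStringAgg` and the `Batch` alias are hypothetical names, a plain `Arc<Vec<String>>` stands in for an Arrow string array, and nulls, memory accounting, and partial emit are all left out. Groups that received no rows yield an empty string here, which is a simplification.

```rust
use std::sync::Arc;

/// Stand-in for an Arrow string array (assumption: a real accumulator
/// would hold a reference-counted Arrow array instead).
type Batch = Arc<Vec<String>>;

/// Hypothetical deferred-copy accumulator, for illustration only.
struct DeferredStringAgg {
    delimiter: String,
    /// Input batches kept alive by bumping their refcount.
    batches: Vec<Batch>,
    /// (group_id, batch_id, row_id) triples in arrival order.
    entries: Vec<(usize, usize, usize)>,
    num_groups: usize,
}

impl DeferredStringAgg {
    fn new(delimiter: &str) -> Self {
        Self {
            delimiter: delimiter.to_string(),
            batches: Vec::new(),
            entries: Vec::new(),
            num_groups: 0,
        }
    }

    /// Records which row belongs to which group; no string bytes are copied.
    fn update_batch(&mut self, batch: &Batch, group_ids: &[usize]) {
        assert_eq!(batch.len(), group_ids.len());
        let batch_id = self.batches.len();
        // Only the refcount is bumped here.
        self.batches.push(Arc::clone(batch));
        for (row_id, &group_id) in group_ids.iter().enumerate() {
            self.entries.push((group_id, batch_id, row_id));
            self.num_groups = self.num_groups.max(group_id + 1);
        }
    }

    /// Assembles one output string per group, copying each input value once.
    fn evaluate(&self) -> Vec<String> {
        let mut out = vec![String::new(); self.num_groups];
        let mut seen = vec![false; self.num_groups];
        for &(group_id, batch_id, row_id) in &self.entries {
            if seen[group_id] {
                out[group_id].push_str(&self.delimiter);
            }
            seen[group_id] = true;
            out[group_id].push_str(&self.batches[batch_id][row_id]);
        }
        out
    }
}
```

One trade-off this sketch makes visible: every retained batch stays alive until evaluate() runs, even if only a few of its rows are referenced, so a real implementation would need to weigh the saved copies against the extra memory held.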

Labels: enhancement (New feature or request)