Even more improvements in memory utilization HashAggregationExec when spilling #8428

@milenkovicm

Is your feature request related to a problem or challenge?

This task is related to #7858 and follows up on some issues we found with HashAggregationExec.

The specific issue is that during a spill, HashAggregationExec sorts the aggregation buffer before writing it out, and while sorting it allocates a buffer as big as the current aggregation state:

https://github.com/apache/arrow-datafusion/blob/7acd8833cc5d03ba7643d4ae424553c7681ccce8/datafusion/physical-plan/src/aggregates/row_hash.rs#L672

This makes the operator use more memory (roughly twice as much) than the memory manager has already allocated to it.

We need to find a solution that respects the allocated memory limit.
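To make the accounting concrete, here is a minimal sketch of the problem and of one possible direction (sorting and spilling the state in smaller pieces). Everything in it is an assumption for illustration: `Pool`, `spill_all_at_once`, and `spill_in_chunks` are hypothetical names, not DataFusion's memory manager API, and the functions only model byte counts rather than sorting real data.

```rust
/// Hypothetical, simplified memory pool that only tracks byte counts.
/// This is an illustrative model, not DataFusion's actual memory manager.
struct Pool {
    limit: usize,
    used: usize,
    peak: usize,
}

impl Pool {
    fn new(limit: usize) -> Self {
        Pool { limit, used: 0, peak: 0 }
    }

    /// Reserve `bytes`, failing (and leaving the pool unchanged) if the limit would be exceeded.
    fn try_grow(&mut self, bytes: usize) -> Result<(), String> {
        if self.used + bytes > self.limit {
            return Err(format!(
                "cannot reserve {} bytes: {} of {} already used",
                bytes, self.used, self.limit
            ));
        }
        self.used += bytes;
        self.peak = self.peak.max(self.used);
        Ok(())
    }

    /// Release `bytes` back to the pool.
    fn shrink(&mut self, bytes: usize) {
        self.used = self.used.saturating_sub(bytes);
    }
}

/// Models the behaviour described above: sorting the whole state needs a
/// scratch buffer as large as the state, so the peak is about 2x the state.
/// Assumes the caller has already reserved `state_bytes` for the state itself.
fn spill_all_at_once(pool: &mut Pool, state_bytes: usize) -> Result<(), String> {
    pool.try_grow(state_bytes)?; // scratch buffer for the sorted copy
    // ... the sorted copy would be written to disk here ...
    pool.shrink(state_bytes); // scratch buffer released
    pool.shrink(state_bytes); // state released once it is on disk
    Ok(())
}

/// One possible alternative (an assumption, not the project's chosen fix):
/// sort and spill `chunk_bytes` at a time, so the extra memory is bounded by
/// the chunk size. Returns the number of sorted runs written.
fn spill_in_chunks(
    pool: &mut Pool,
    state_bytes: usize,
    chunk_bytes: usize,
) -> Result<usize, String> {
    if chunk_bytes == 0 {
        return Err("chunk_bytes must be greater than zero".to_string());
    }
    let mut remaining = state_bytes;
    let mut runs = 0;
    while remaining > 0 {
        let n = remaining.min(chunk_bytes);
        pool.try_grow(n)?; // scratch buffer for sorting one chunk
        // ... one sorted run would be written to disk here ...
        pool.shrink(n); // scratch buffer released
        pool.shrink(n); // this part of the state is now on disk
        remaining -= n;
        runs += 1;
    }
    Ok(runs)
}
```

With a limit of 150 bytes and 100 bytes of state already reserved, `spill_all_at_once` fails in this model because it needs another 100 bytes, while `spill_in_chunks` with 25-byte chunks stays within the limit. The trade-off is that several sorted runs are produced instead of one, so they would have to be merged when read back.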

Some ideas can be found in #7858, more specifically in the comments:

but we are open to other ideas as well.

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

Labels: enhancement
