Enable external operator reclaim / spill hooks for external memory managers #21422

@nathanb9

Is your feature request related to a problem or challenge?

Related issue: apache/datafusion-comet#3873

Currently, DataFusion Comet cannot trigger DataFusion native operators to spill in response to memory pressure from Spark's task memory manager. When the Spark task memory manager is under pressure, it may ask one consumer to spill so that another consumer in the same task can make progress. Comet can route that request into native code, but DataFusion has no interface for asking a spill-capable operator to reclaim memory in response to such an external request.

Describe the solution you'd like

One option under consideration:

  • Add an interface that lets spill-capable operators expose their reclaim/spill behavior, so an external memory manager can ask them to free memory.
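
To make the proposal concrete, here is a rough sketch of what such a hook could look like. Everything in it is hypothetical: `ExternalReclaim`, `ReclaimRegistry`, and `ToySortBuffer` are made-up names for illustration, not existing DataFusion or Comet APIs, and the toy operator only tracks byte counts instead of writing real spill files.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical trait (not in DataFusion today): a spill-capable operator
/// implements it so an external memory manager can ask it to free memory.
pub trait ExternalReclaim: Send + Sync {
    /// Try to free at least `target_bytes` by spilling; returns bytes actually freed.
    fn reclaim(&self, target_bytes: usize) -> usize;
    /// Bytes this operator currently holds in memory.
    fn reserved(&self) -> usize;
}

/// Toy stand-in for a spill-capable operator, e.g. a sort buffering batches.
/// Each entry in `in_memory` is the size in bytes of one buffered batch.
pub struct ToySortBuffer {
    state: Mutex<BufferState>,
}

struct BufferState {
    in_memory: Vec<usize>,
    spilled_bytes: usize,
}

impl ToySortBuffer {
    pub fn new(batch_sizes: Vec<usize>) -> Self {
        Self {
            state: Mutex::new(BufferState { in_memory: batch_sizes, spilled_bytes: 0 }),
        }
    }

    pub fn spilled_bytes(&self) -> usize {
        self.state.lock().unwrap().spilled_bytes
    }
}

impl ExternalReclaim for ToySortBuffer {
    fn reclaim(&self, target_bytes: usize) -> usize {
        let mut state = self.state.lock().unwrap();
        let mut freed = 0;
        // "Spill" whole batches, newest first, until the target is met
        // or nothing is left in memory.
        while freed < target_bytes {
            match state.in_memory.pop() {
                Some(size) => {
                    freed += size;
                    state.spilled_bytes += size;
                }
                None => break,
            }
        }
        freed
    }

    fn reserved(&self) -> usize {
        self.state.lock().unwrap().in_memory.iter().sum()
    }
}

/// Hypothetical registry that the embedder (e.g. Comet) would call from
/// the spill callback it receives from Spark's task memory manager.
pub struct ReclaimRegistry {
    operators: Mutex<Vec<Arc<dyn ExternalReclaim>>>,
}

impl ReclaimRegistry {
    pub fn new() -> Self {
        Self { operators: Mutex::new(Vec::new()) }
    }

    pub fn register(&self, op: Arc<dyn ExternalReclaim>) {
        self.operators.lock().unwrap().push(op);
    }

    /// Ask registered operators in turn until `target_bytes` is freed.
    /// Returns the total number of bytes actually freed.
    pub fn reclaim(&self, target_bytes: usize) -> usize {
        let ops = self.operators.lock().unwrap();
        let mut freed = 0;
        for op in ops.iter() {
            if freed >= target_bytes {
                break;
            }
            freed += op.reclaim(target_bytes - freed);
        }
        freed
    }
}
```

In this sketch the embedder would register each spill-capable operator when it is created and call `ReclaimRegistry::reclaim(n)` when Spark asks the native consumer to spill `n` bytes, reporting the returned value back to Spark. How registration, async execution, and operator lifetimes would actually fit into DataFusion's memory pool is left open here.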

Describe alternatives you've considered

  • Keep the current behavior: DataFusion native operators spill only based on their own local decisions, while Spark makes its own consumers spill as usual.
  • Do not trigger a spill in DataFusion directly; instead, send DataFusion native code a different signal.

Additional context

  • This may be generally useful for any Spark accelerator that embeds DataFusion in a similar way to Comet.

Labels: enhancement