generate_series doesn't respect memory limit #12919

@samuelcolvin

Describe the bug

You can trivially cause DataFusion to use an arbitrary amount of memory by simply running:

select generate_series(9876543210);

Memory management functionality, e.g. MemoryPool, doesn't seem to have any effect.

To Reproduce

Run datafusion-cli with a memory limit, then run generate_series:

 datafusion-cli -m 1g -c 'select generate_series(9876543210);'

Memory immediately jumps to ~20GB. (Note that this is not limited to datafusion-cli.)
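
For scale, here is a rough back-of-the-envelope estimate of what fully materializing this series would cost. It is only a sketch under assumptions not stated above: that the series starts at 0 and includes the stop value, that each element is stored as an 8-byte integer, and that `-m 1g` means 1 GiB. `series_bytes` is a hypothetical helper, not a DataFusion function.

```rust
/// Bytes needed to hold the inclusive series 0..=stop when every element
/// occupies `elem_size` bytes. Returns None if the arithmetic overflows u64.
/// (Hypothetical helper for illustration only.)
fn series_bytes(stop: u64, elem_size: u64) -> Option<u64> {
    stop.checked_add(1)?.checked_mul(elem_size)
}

fn main() {
    // Assumption: `-m 1g` is read as 1 GiB.
    const LIMIT: u64 = 1 << 30;
    // Assumption: elements are 8-byte integers.
    let bytes = series_bytes(9_876_543_210, 8).expect("fits in u64");
    println!("series needs {} bytes; limit is {} bytes", bytes, LIMIT);
}
```

Under those assumptions the full result would need roughly 79 GB, far more than the configured limit, so the observed ~20GB would not be the ceiling.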

This query also hangs indefinitely, but in production we see pods being OOM-killed for queries like this.

Expected behavior

generate_series should either be streamed so it uses very little memory, or should be killed/constrained by the memory pool.
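
To make the two options concrete, here is a minimal standard-library sketch of what each could look like in principle. It is not DataFusion's implementation and uses no DataFusion APIs: `SeriesBatches` and `check_budget` are hypothetical names, and the 8-byte element size is an assumption. The first option yields the series in bounded batches, the second refuses up front when the materialized size would exceed a byte budget.

```rust
/// Hypothetical batch iterator over the inclusive series start..=stop.
/// At most `batch_size` values are held in memory at a time.
struct SeriesBatches {
    next: Option<i64>, // next value to emit; None once exhausted
    stop: i64,
    batch_size: usize,
}

impl SeriesBatches {
    fn new(start: i64, stop: i64, batch_size: usize) -> Self {
        assert!(batch_size > 0, "batch_size must be positive");
        let next = if start <= stop { Some(start) } else { None };
        SeriesBatches { next, stop, batch_size }
    }
}

impl Iterator for SeriesBatches {
    type Item = Vec<i64>;

    fn next(&mut self) -> Option<Vec<i64>> {
        let mut cur = self.next.take()?;
        let mut batch = Vec::new();
        loop {
            batch.push(cur);
            if cur == self.stop {
                break; // series finished; self.next stays None
            }
            cur += 1; // cannot overflow: cur < stop <= i64::MAX
            if batch.len() == self.batch_size {
                self.next = Some(cur); // resume here on the next call
                break;
            }
        }
        Some(batch)
    }
}

/// Hypothetical up-front check for the materializing alternative:
/// returns the bytes required, or an error if they exceed `limit_bytes`.
fn check_budget(start: i64, stop: i64, limit_bytes: u64) -> Result<u64, String> {
    if start > stop {
        return Ok(0);
    }
    // Widen to 128 bits so neither the count nor the byte total can overflow.
    let count = (stop as i128 - start as i128 + 1) as u128;
    let needed = count * std::mem::size_of::<i64>() as u128;
    if needed > limit_bytes as u128 {
        Err(format!("series needs {needed} bytes, limit is {limit_bytes}"))
    } else {
        Ok(needed as u64)
    }
}

fn main() {
    // Streaming: memory stays bounded by the batch size.
    let mut total: u64 = 0;
    for batch in SeriesBatches::new(0, 20_000, 8192) {
        total += batch.len() as u64;
    }
    println!("streamed {total} values in batches of at most 8192");

    // Constraining: fail fast instead of allocating past the budget.
    match check_budget(0, 9_876_543_210, 1 << 30) {
        Ok(bytes) => println!("ok, needs {bytes} bytes"),
        Err(msg) => println!("rejected: {msg}"),
    }
}
```

Either behavior would avoid the unbounded allocation described above; which one fits better depends on whether downstream operators can consume the series incrementally.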

Additional context

Same presumably applies to the range UDF.

cc @davidhewitt

Labels: bug
