Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++] Add scheduler to constrain memory of exec plans #30592

Open
2 tasks
asfimport opened this issue Dec 13, 2021 · 0 comments
Open
2 tasks

[C++] Add scheduler to constrain memory of exec plans #30592

asfimport opened this issue Dec 13, 2021 · 0 comments

Comments

@asfimport
Copy link
Collaborator

asfimport commented Dec 13, 2021

This is a high-level JIRA that will probably consist of a number of subtasks. Currently the exec plan supports backpressure. This helps to limit the amount of data read in by a single query. Spillover can also help reduce the amount of memory used by a single query.

However, if multiple queries are run concurrently, then the system will eventually run out of memory. I'd like to propose a simple scheduler to start to solve this problem.

  • Initially, the scheduler will be given a configurable max memory target. We will assume the user is responsible for ensuring this memory is not used elsewhere on the system. For example, a user may configure 12GB of RAM for the scheduler and the user is responsible for ensuring that 12GB of RAM is not consumed by other processes. In future JIRA issues we can add more sophistication such as detecting and adapting to the system memory levels.

  • The scheduler will track memory usage by querying for the RSS usage of the process. An alternative approach would be to use a dedicated memory pool. This is more flexible (allows for multiple schedulers / query engines in a single process) but wouldn't capture the non-pool allocations (although these should be small).

  • The ExecPlan will be modified to submit its tasks to the scheduler instead of directly to the executor (the scheduler will submit tasks to the executor). Tasks will be prioritized. If a task is higher priority then lower priority tasks will be pushed to disk. Prioritization will be based initially on a tasks position in the ExecPlan. Tasks closer to the sink will be higher priority.

  • The scheduler is separate from spillover. Although it may interact with spillover mechanisms to ask for more spillover as memory pressure increases.

Reporter: Weston Pace / @westonpace

Subtasks:

Related issues:

Note: This issue was originally created as ARROW-15079. Please see the migration documentation for further details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant