[C++] Create a scheduler for asynchronous work #32624

Closed
asfimport opened this issue Aug 9, 2022 · 1 comment

@asfimport
Collaborator

Note, in the interest of keeping things simple, this ideally replaces the AsyncTaskGroup. This is needed to simplify the logic in ARROW-17287.

The format and implementation will likely be inspired by the synchronous schedulers, TaskScheduler and TaskGroup, but it will remain a separate implementation. In the future, when we dedicate time to improving our synchronous scheduler, we can decide whether it makes sense to merge these two types.


/// A utility which keeps track of, and schedules, asynchronous tasks
///
/// An asynchronous task has a synchronous component and an asynchronous component.
/// The synchronous component typically schedules some kind of work on an external
/// resource (e.g. the I/O thread pool or some kind of kernel-based asynchronous
/// resource like io_uring).  The asynchronous part represents the work
/// done on that external resource.  Executing the synchronous part will be referred
/// to as "submitting the task" since this usually includes submitting the asynchronous
/// portion to the external thread pool.
///
/// By default the scheduler will submit the task (execute the synchronous part) as
/// soon as it is added, assuming the underlying thread pool hasn't terminated or the
/// scheduler hasn't aborted.  In this mode the scheduler is simply acting as
/// a task group, keeping track of the ongoing work.
///
/// This can be used to provide structured concurrency for asynchronous development.
/// A task group created at a high level can be distributed amongst low level components
/// which register work to be completed.  The high level job can then wait for all work
/// to be completed before cleaning up.
///
/// A task scheduler must eventually be ended when all tasks have been added.  Once the
/// scheduler has been ended it is an error to add further tasks.  Note, it is not an
/// error to add additional tasks after a scheduler has aborted (though these tasks
/// will be ignored and never submitted).  The scheduler has a future which will complete
/// once the scheduler has been ended AND all remaining tasks have finished executing.
/// Ending a scheduler will NOT cause the scheduler to flush existing tasks.
///
/// Task failure (either the synchronous portion or the asynchronous portion) will cause
/// the scheduler to enter an aborted state.  The first such failure will be reported in
/// the final task future.
///
/// The scheduler can also be manually aborted.  A cancellation status will be reported in
/// the final task future.
///
/// It is also possible to limit the number of concurrent tasks the scheduler will
/// execute. This is done by setting a task limit.  The task limit initially assumes all
/// tasks are equal but a custom cost can be supplied when scheduling a task (e.g. based
/// on the total I/O cost of the task, or the expected RAM utilization of the task).
///
/// When the total number of running tasks is limited then scheduler priority may also
/// become a consideration.  By default the scheduler runs with a FIFO queue but a custom
/// task queue can be provided.  One could, for example, use a priority queue to control
/// the order in which tasks are executed.
///
/// It is common to have multiple stages of execution.  For example, when scanning, we
/// first inspect each fragment (the inspect stage) to figure out the row groups and then
/// we scan row groups (the scan stage) to read in the data.  This sort of multi-stage
/// execution should be represented as two separate task groups.  The first task group can
/// then have a custom finish callback which ends the second task group.
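
To make the semantics above concrete, here is a minimal, single-threaded sketch of how such a scheduler might behave. It is not Arrow's API: `ToyScheduler`, `Done`, `Task`, `FinishCallback`, and `RunDemo` are hypothetical names, a plain string stands in for a status (empty means success), and a callback stands in for the final future. The sketch also assumes that a task whose cost exceeds the limit may still run when nothing else is running, which the description does not specify.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Toy model of the described semantics. Every name here is hypothetical.
class ToyScheduler {
 public:
  // Invoked when a task's asynchronous part finishes; "" means success.
  using Done = std::function<void(std::string error)>;
  // The synchronous part: hands work to an external resource along with `done`.
  using Task = std::function<void(Done done)>;
  using FinishCallback = std::function<void(const std::string& status)>;

  // limit == 0 means "no task limit".
  explicit ToyScheduler(int limit = 0, FinishCallback on_finished = nullptr)
      : limit_(limit), on_finished_(std::move(on_finished)) {}

  // Returns false if the scheduler was already ended (adding is then an error).
  // After an abort the task is accepted but dropped without being submitted.
  bool AddTask(Task task, int cost = 1) {
    if (ended_) return false;
    if (aborted_) return true;
    queue_.push_back({std::move(task), cost});
    Pump();
    return true;
  }
  void End() { ended_ = true; MaybeFinish(); }
  void Abort() { Fail("cancelled"); }
  // Empty until ended AND no task is running; then "" or the first error.
  const std::optional<std::string>& final_status() const { return final_status_; }

 private:
  struct Entry { Task task; int cost; };

  void Pump() {  // submit queued tasks in FIFO order while the limit allows
    while (!aborted_ && !queue_.empty()) {
      const int cost = queue_.front().cost;
      // Assumption: an over-budget task may run when nothing else is running.
      if (limit_ > 0 && running_ > 0 && running_cost_ + cost > limit_) break;
      Task task = std::move(queue_.front().task);
      queue_.pop_front();
      ++running_;
      running_cost_ += cost;
      task([this, cost](std::string error) { OnDone(cost, std::move(error)); });
    }
  }
  void OnDone(int cost, std::string error) {
    --running_;
    running_cost_ -= cost;
    if (!error.empty()) Fail(std::move(error));
    Pump();
    MaybeFinish();
  }
  void Fail(std::string error) {
    if (!aborted_) {  // only the first failure is recorded
      aborted_ = true;
      first_error_ = std::move(error);
      queue_.clear();  // queued tasks are never submitted
    }
    MaybeFinish();
  }
  void MaybeFinish() {
    if (final_status_ || !ended_ || running_ > 0 || !queue_.empty()) return;
    final_status_ = first_error_;
    if (on_finished_) on_finished_(*final_status_);
  }

  int limit_;
  FinishCallback on_finished_;
  std::deque<Entry> queue_;
  int running_ = 0;
  int running_cost_ = 0;
  bool ended_ = false;
  bool aborted_ = false;
  std::string first_error_;
  std::optional<std::string> final_status_;
};

// Usage sketch: three unit-cost tasks under a limit of 2, one of which fails.
// `pending` stands in for the external resource that completes tasks later.
std::vector<std::string> RunDemo() {
  std::vector<std::string> log;
  std::vector<ToyScheduler::Done> pending;
  ToyScheduler scan(2, [&](const std::string& s) { log.push_back("finished:" + s); });
  for (int i = 0; i < 3; ++i) {
    scan.AddTask([&, i](ToyScheduler::Done done) {
      log.push_back("submit" + std::to_string(i));
      pending.push_back(std::move(done));
    });
  }
  scan.End();  // task 2 is still queued; ending does not drop it
  if (!scan.AddTask([](ToyScheduler::Done) {})) log.push_back("rejected");
  auto complete = [&](std::size_t i, std::string error) {
    ToyScheduler::Done done = pending[i];  // copy first: completing may grow `pending`
    done(std::move(error));
  };
  complete(0, "");      // frees a slot, so task 2 is submitted
  complete(1, "boom");  // the first failure aborts the scheduler
  complete(2, "");      // last running task; the final status is now reported
  return log;
}
```

In this sketch the finish callback fires only after `End()` has been called and the last running task has completed, and it carries the first failure. For the multi-stage case, the finish callback of a first-stage scheduler could call `End()` on a second-stage one, which is one way to read the last paragraph of the description.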

Reporter: Weston Pace / @westonpace

PRs and other links:

Note: This issue was originally created as ARROW-17350. Please see the migration documentation for further details.

@asfimport
Collaborator Author

Weston Pace / @westonpace:
Issue resolved by pull request 13912
#13912
