move Worker management to separate CPI operation #377

eirrgang · 2023-07-26T22:58:46Z

Let RuntimeManager mediate Raptor resource acquisition.

RPExecutor merely represents the allocated resources, and does not
directly implement the resource management. This allows RPExecutor to
be responsible for providing the concurrent.futures.Executor interface,
which isn't particularly friendly to the asyncio protocols. For regular
scalems usage, RPExecutor should be used in a non-main thread via the
scalems asyncio utilities.
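
As a rough illustration of the threading arrangement described above (not the scalems implementation), blocking `concurrent.futures.Executor` usage can be pushed to a non-main thread so that the event loop stays free. In this sketch, `ThreadPoolExecutor` is only a stand-in for RPExecutor, `use_executor()` is a hypothetical helper, and `asyncio.to_thread()` stands in for whatever the scalems asyncio utilities actually provide.

```python
import asyncio
import concurrent.futures
import threading


def use_executor(executor: concurrent.futures.Executor, value: int):
    """Hypothetical helper: ordinary, blocking Executor usage."""
    future = executor.submit(pow, value, 2)
    # Future.result() blocks the calling thread, so this function
    # must not run in the thread that owns the event loop.
    in_main_thread = threading.current_thread() is threading.main_thread()
    return future.result(), in_main_thread


async def main():
    # ThreadPoolExecutor is a stand-in for RPExecutor (assumption).
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
        # Run the blocking code in a worker thread and await its result.
        return await asyncio.to_thread(use_executor, executor, 7)


result = asyncio.run(main())
```

The coroutine only awaits the worker thread, so other coroutines could keep running while the Executor-style code blocks.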

Acquire the Raptor master task through the RuntimeManager.
Acquire the Worker(s) through CPI call to the RuntimeManager.

Isolate RPExecutor concurrent.futures.Executor support from
asyncio support (avoid blocking the event loop by avoiding event loop
usage in the main implementation).

We also need this so that we can restore and normalize the "stop"
command to shut down everything cleanly and expeditiously. In too many
cases right now, tests take an improperly long time because of various
timeouts.

Supports #335

Reimplement `scalems.radical.executor.manage_raptor()` as `scalems.radical.manager.manage_raptor()`
- [X] Acquire the Raptor master task through the RuntimeManager.
- [ ] Manage a CPI command queue translating CPI calls to Raptor-backed Futures (RPTasks or RPC calls)
- [ ] Acquire the Worker(s) through CPI call to the RuntimeManager.
- [ ] Normalize the "stop" command to shut down everything cleanly and expeditiously. (In too many cases right now, tests take an improperly long time because of various timeouts.)
- [ ] Isolate RPExecutor concurrent.futures.Executor support from asyncio support (avoid blocking the event loop by avoiding event loop usage in the main implementation)
Ref SCALE-MS#345, SCALE-MS#377

* Introduce some containers for CPI Command management.
* Reimplement a `cpi` method on the RuntimeManager.
* Shut down queue runner threads on `close()`
* Translate CPI command messages into appropriate messages to the ScalemsRaptor and fulfil Futures with `CommandItem.run()`
- [X] Acquire the Raptor master task through the RuntimeManager.
- [X] Manage a CPI command queue translating CPI calls to Raptor-backed Futures (RPTasks or RPC calls)
- [ ] Make sure CPI Session is properly shut down.
- [ ] Acquire the Worker(s) through CPI call to the RuntimeManager.
- [ ] Normalize the "stop" command to shut down everything cleanly and expeditiously. (In too many cases right now, tests take an improperly long time because of various timeouts.)
- [ ] Isolate RPExecutor concurrent.futures.Executor support from asyncio support (avoid blocking the event loop by avoiding event loop usage in the main implementation)
Ref SCALE-MS#345, SCALE-MS#377

Add and rearrange some program state management. Add some notes and describe incomplete state management. Ref SCALE-MS#378, SCALE-MS#383.
- [X] Acquire the Raptor master task through the RuntimeManager.
- [X] Manage a CPI command queue translating CPI calls to Raptor-backed Futures (RPTasks or RPC calls)
- [X] Make sure CPI Session is properly shut down. (Partially deferred)
- [ ] Acquire the Worker(s) through CPI call to the RuntimeManager.
- [ ] Normalize the "stop" command to shut down everything cleanly and expeditiously. (In too many cases right now, tests take an improperly long time because of various timeouts.)
- [ ] Isolate RPExecutor concurrent.futures.Executor support from asyncio support (avoid blocking the event loop by avoiding event loop usage in the main implementation)
Ref SCALE-MS#345, SCALE-MS#377

Implement the START_SCOPE and EXIT_SCOPE calls to trigger Worker submission and to leave the scope of those Workers. Note that Raptor does not actually provide an API for stopping Workers from the Raptor master, so the CPI semantics do not map precisely to the runtime logic.
- [X] Acquire the Raptor master task through the RuntimeManager.
- [X] Manage a CPI command queue translating CPI calls to Raptor-backed Futures (RPTasks or RPC calls)
- [X] Make sure CPI Session is properly shut down. (Partially deferred)
- [X] Acquire the Worker(s) through CPI call to the RuntimeManager.
- [ ] Normalize the CPI usage to shut down everything cleanly and expeditiously. (In too many cases right now, tests take an improperly long time because of various timeouts.)
- [ ] Isolate CPIExecutor concurrent.futures.Executor support from asyncio support (avoid blocking the event loop by avoiding event loop usage in the main implementation)
- [ ] Separate `messages` for intercomponent communication from `cpi` calls and responses.
Ref SCALE-MS#345, SCALE-MS#377

Acquire the Raptor master task through the RuntimeManager.
Reimplement `scalems.radical.executor.manage_raptor()` as `scalems.radical.manager.manage_raptor()`
Ref #345, #377

Manage a CPI command queue translating CPI calls to Raptor-backed Futures (RPTasks or RPC calls).
* Introduce some containers for CPI Command management.
* Reimplement a `cpi` method on the RuntimeManager.
* Shut down queue runner threads on `close()`
* Translate CPI command messages into appropriate messages to the ScalemsRaptor and fulfil Futures with `CommandItem.run()`
Ref #345, #377
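
The queue-runner pattern named in this commit could look roughly like the sketch below. It is an illustration under stated assumptions, not the scalems code: this `CommandItem` and `CommandQueue`, the `dispatch` callable, the sentinel-based shutdown, and the `"hello"` operation name are all hypothetical stand-ins. Only the general idea (a `cpi()` call returns a Future, a runner thread fulfils it through `run()`, and `close()` stops the thread) comes from the commit message.

```python
import concurrent.futures
import dataclasses
import queue
import threading
import typing


@dataclasses.dataclass
class CommandItem:
    """Hypothetical queue entry pairing a CPI message with its Future."""

    message: dict
    future: concurrent.futures.Future = dataclasses.field(
        default_factory=concurrent.futures.Future
    )

    def run(self, dispatch: typing.Callable[[dict], typing.Any]) -> None:
        # Skip the work if the caller already cancelled the Future.
        if not self.future.set_running_or_notify_cancel():
            return
        try:
            self.future.set_result(dispatch(self.message))
        except Exception as e:
            self.future.set_exception(e)


class CommandQueue:
    """Hypothetical manager: one runner thread drains the command queue."""

    def __init__(self, dispatch: typing.Callable[[dict], typing.Any]):
        self._dispatch = dispatch
        self._queue: queue.Queue = queue.Queue()
        self._runner = threading.Thread(target=self._run, daemon=True)
        self._runner.start()

    def _run(self) -> None:
        while True:
            item = self._queue.get()
            if item is None:  # Sentinel placed by close().
                break
            item.run(self._dispatch)

    def cpi(self, message: dict) -> concurrent.futures.Future:
        item = CommandItem(message)
        self._queue.put(item)
        return item.future

    def close(self) -> None:
        # Wake the runner with a sentinel, then wait for it to exit.
        self._queue.put(None)
        self._runner.join()


# The dispatch function here just echoes; a real one would talk to Raptor.
commands = CommandQueue(dispatch=lambda message: {"echo": message["operation"]})
reply = commands.cpi({"operation": "hello"}).result(timeout=5)
commands.close()
```

Using a sentinel means `close()` never waits on a `get()` timeout, which is one way to keep shutdown prompt.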

Add and rearrange some program state management. Add some notes and describe incomplete state management.
Deferred:
- Make sure CPI Session is properly shut down. (Partially deferred)
- Acquire the Worker(s) through CPI call to the RuntimeManager.
- Normalize the "stop" command to shut down everything cleanly and expeditiously. (In too many cases right now, tests take an improperly long time because of various timeouts.)
- Isolate RPExecutor concurrent.futures.Executor support from asyncio support (avoid blocking the event loop by avoiding event loop usage in the main implementation)
Ref #345, #377

* RP TaskDescription *metadata* field changes from a `dict` to a `tuple[str, None|dict]` of operation name and operand.
* `scalems.messages.Command` class hierarchy is replaced by simple `TypedDict`s and utility functions in new `scalems.cpi` module.
* `CpiCommand._command_name()` replaces `CpiCommand.command_class()` required classmethod to support `scalems.radical.CpiCommand` subclass registration (for creation function dispatching).
* Update signature for `scalems.radical.manager.RuntimeManager.cpi()`.
Ref #377
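
To make the shape of these changes concrete, here is a minimal sketch, assuming a `TypedDict` with `operation` and `operand` keys and a registry filled in by `__init_subclass__`. The `CpiCall` type, `to_metadata()`, `create()`, the `Hello` subclass, and the `"hello"` operation name are hypothetical; only the `(operation name, operand)` tuple and the `_command_name()` classmethod used for subclass registration are taken from the commit message.

```python
import typing


class CpiCall(typing.TypedDict):
    """Hypothetical TypedDict for a CPI call."""

    operation: str
    operand: typing.Optional[dict]


def to_metadata(call: CpiCall) -> typing.Tuple[str, typing.Optional[dict]]:
    """Pack a call into the (operation name, operand) tuple form."""
    return (call["operation"], call["operand"])


class CpiCommand:
    """Hypothetical base class that registers subclasses by command name."""

    _registry: typing.ClassVar[dict] = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Each subclass is looked up later by the name it reports.
        CpiCommand._registry[cls._command_name()] = cls

    def __init__(self, operand: typing.Optional[dict]):
        self.operand = operand

    @classmethod
    def _command_name(cls) -> str:
        raise NotImplementedError

    @classmethod
    def create(cls, metadata: typing.Tuple[str, typing.Optional[dict]]):
        # Dispatch to the subclass registered for this operation.
        operation, operand = metadata
        return cls._registry[operation](operand)


class Hello(CpiCommand):
    @classmethod
    def _command_name(cls) -> str:
        return "hello"


metadata = to_metadata({"operation": "hello", "operand": None})
command = CpiCommand.create(metadata)
```

An unregistered operation name raises `KeyError` in this sketch; a real dispatcher would presumably report that through the CPI response instead.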

Implement the START_SCOPE and EXIT_SCOPE calls to trigger Worker submission and to leave the scope of those Workers. Clarify the difference between EXIT_SCOPE and STOP. Note that Raptor does not actually provide an API for stopping Workers from the Raptor master, so the CPI semantics do not map precisely to the runtime logic.
- [X] Acquire the Raptor master task through the RuntimeManager.
- [X] Manage a CPI command queue translating CPI calls to Raptor-backed Futures (RPTasks or RPC calls)
- [X] Make sure CPI Session is properly shut down. (Partially deferred)
- [X] Acquire the Worker(s) through CPI call to the RuntimeManager.
- [ ] Normalize the CPI usage to shut down everything cleanly and expeditiously. (In too many cases right now, tests take an improperly long time because of various timeouts.)
- [ ] Isolate CPIExecutor concurrent.futures.Executor support from asyncio support (avoid blocking the event loop by avoiding event loop usage in the main implementation)
- [ ] Separate `messages` for intercomponent communication from `cpi` calls and responses.
Ref #345, #377
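
One way to picture the paired START_SCOPE and EXIT_SCOPE calls is as a context manager around the work that needs Workers. This is only a sketch: `worker_scope()`, the `cpi` callable, and the literal operation strings are assumptions made for illustration, and nothing here models STOP or what Raptor actually does with its Workers.

```python
import contextlib
import typing


@contextlib.contextmanager
def worker_scope(cpi: typing.Callable[[str], typing.Any]):
    """Hypothetical helper bracketing work with scope commands."""
    cpi("START_SCOPE")  # Assumed to trigger Worker submission.
    try:
        yield
    finally:
        # Leave the scope even if the body raised.
        cpi("EXIT_SCOPE")


# A list's append method stands in for a real CPI call.
calls: typing.List[str] = []
with worker_scope(calls.append):
    calls.append("work")
```

After the `with` block, `calls` holds the three entries in the order they were issued, with the work bracketed by the two scope commands.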

* RP TaskDescription *metadata* field changes from a `dict` to a `tuple[str, None|dict]` of operation name and operand.
* `scalems.messages.Command` class hierarchy is replaced by simple `TypedDict`s and utility functions in new `scalems.cpi` module.
* `CpiCommand._command_name()` replaces `CpiCommand.command_class()` required classmethod to support `scalems.radical.CpiCommand` subclass registration (for creation function dispatching).
* Update signature for `scalems.radical.manager.RuntimeManager.cpi()`.
Ref SCALE-MS#377

Implement the START_SCOPE and EXIT_SCOPE calls to trigger Worker submission and to leave the scope of those Workers. Clarify the difference between EXIT_SCOPE and STOP. Note that Raptor does not actually provide an API for stopping Workers from the Raptor master, so the CPI semantics do not map precisely to the runtime logic.
- [X] Acquire the Raptor master task through the RuntimeManager.
- [X] Manage a CPI command queue translating CPI calls to Raptor-backed Futures (RPTasks or RPC calls)
- [X] Make sure CPI Session is properly shut down. (Partially deferred)
- [X] Acquire the Worker(s) through CPI call to the RuntimeManager.
- [ ] Implement EXIT_SCOPE in the ScalemsMaster.
- [ ] Normalize the CPI usage to shut down everything cleanly and expeditiously. (In too many cases right now, tests take an improperly long time because of various timeouts.)
- [ ] Isolate CPIExecutor concurrent.futures.Executor support from asyncio support (avoid blocking the event loop by avoiding event loop usage in the main implementation)
- [ ] Separate `messages` for intercomponent communication from `cpi` calls and responses.
Ref SCALE-MS#345, SCALE-MS#377

FIXME: Getting Aborted while queue runner is waiting for next item
Implement the START_SCOPE and EXIT_SCOPE calls to trigger Worker submission and to leave the scope of those Workers. Clarify the difference between EXIT_SCOPE and STOP. Note that Raptor does not actually provide an API for stopping Workers from the Raptor master, so the CPI semantics do not map precisely to the runtime logic.
- [X] Acquire the Raptor master task through the RuntimeManager.
- [X] Manage a CPI command queue translating CPI calls to Raptor-backed Futures (RPTasks or RPC calls)
- [X] Make sure CPI Session is properly shut down. (Partially deferred)
- [X] Acquire the Worker(s) through CPI call to the RuntimeManager.
- [ ] Implement EXIT_SCOPE in the ScalemsMaster.
- [ ] Normalize the CPI usage to shut down everything cleanly and expeditiously. (In too many cases right now, tests take an improperly long time because of various timeouts.)
- [ ] Isolate CPIExecutor concurrent.futures.Executor support from asyncio support (avoid blocking the event loop by avoiding event loop usage in the main implementation)
- [ ] Separate `messages` for intercomponent communication from `cpi` calls and responses.
Ref SCALE-MS#345, SCALE-MS#377


This was referenced Jul 26, 2023

Add exception handling for Raptor Master #375

Open

Add more state to the ScalemsMaster #378

Open

eirrgang mentioned this issue Aug 10, 2023

Miscellaneous tidying #384

Merged

eirrgang mentioned this issue Aug 10, 2023

Update raptor management #385

Merged

eirrgang mentioned this issue Aug 11, 2023

Simplify the CPI messaging. #387

Merged
