move Worker management to separate CPI operation #377

Open
eirrgang opened this issue Jul 26, 2023 · 0 comments
eirrgang commented Jul 26, 2023

Let RuntimeManager mediate Raptor resource acquisition.

RPExecutor merely represents the allocated resources, and does not
directly implement the resource management. This allows RPExecutor to
be responsible for providing the concurrent.futures.Executor interface,
which isn't particularly friendly to the asyncio protocols. For regular
scalems usage, RPExecutor should be used in a non-main thread via the
scalems asyncio utilities.

  1. Acquire the Raptor master task through the RuntimeManager.
  2. Acquire the Worker(s) through CPI call to the RuntimeManager.

Isolate RPExecutor's concurrent.futures.Executor support from its
asyncio support. (Avoid blocking the event loop by keeping event loop
usage out of the main implementation.)
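
The separation can be illustrated with the standard library alone. This is a minimal sketch, not the scalems implementation: `ThreadPerTaskExecutor` is a hypothetical stand-in for RPExecutor that never touches an event loop, and the asyncio side only wraps it through `loop.run_in_executor()`.

```python
import asyncio
import concurrent.futures
import threading


class ThreadPerTaskExecutor(concurrent.futures.Executor):
    """Hypothetical stand-in for RPExecutor: no event loop usage inside."""

    def __init__(self):
        self._lock = threading.Lock()
        self._closed = False
        self._threads = []

    def submit(self, fn, /, *args, **kwargs):
        future = concurrent.futures.Future()

        def work():
            # Honor cancellation requested before the task started.
            if not future.set_running_or_notify_cancel():
                return
            try:
                future.set_result(fn(*args, **kwargs))
            except BaseException as exc:
                future.set_exception(exc)

        with self._lock:
            if self._closed:
                raise RuntimeError("cannot submit after shutdown")
            thread = threading.Thread(target=work, daemon=True)
            self._threads.append(thread)
            thread.start()
        return future

    def shutdown(self, wait=True, *, cancel_futures=False):
        # cancel_futures is accepted for interface compatibility only;
        # every task in this sketch starts immediately.
        with self._lock:
            self._closed = True
            threads = list(self._threads)
        if wait:
            for thread in threads:
                thread.join()


async def main():
    executor = ThreadPerTaskExecutor()
    loop = asyncio.get_running_loop()
    # The event loop awaits a wrapped Future; the blocking call itself
    # runs in a non-main thread owned by the executor.
    result = await loop.run_in_executor(executor, pow, 2, 10)
    executor.shutdown()
    return result
```

Calling `asyncio.run(main())` drives the example. The point of the design is that the executor can be tested and shut down without any event loop present.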

We also need this so that we can restore and normalize the "stop"
command to shut down everything cleanly and expeditiously. In too many
cases right now, tests take unreasonably long to finish because they
wait on various timeouts.

Supports #335

eirrgang added a commit to eirrgang/scale-ms that referenced this issue Aug 9, 2023
Reimplement `scalems.radical.executor.manage_raptor()`
as `scalems.radical.manager.manage_raptor()`

- [X] Acquire the Raptor master task through the RuntimeManager.
- [ ] Manage a CPI command queue translating CPI calls to Raptor-backed
  Futures (RPTasks or RPC calls)
- [ ] Acquire the Worker(s) through CPI call to the RuntimeManager.
- [ ] Normalize the "stop" command to shut down everything cleanly and
  expeditiously. (In too many cases right now, tests take an improperly
  long time because of various timeouts.)
- [ ] Isolate RPExecutor concurrent.futures.Executor support from
  asyncio support (avoid blocking the event loop by avoiding event loop
  usage in the main implementation)

Ref SCALE-MS#345, SCALE-MS#377
eirrgang added a commit to eirrgang/scale-ms that referenced this issue Aug 9, 2023
* Introduce some containers for CPI Command management.
* Reimplement a `cpi` method on the RuntimeManager.
* Shut down queue runner threads on `close()`
* Translate CPI command messages into appropriate messages to the
  ScalemsRaptor and fulfil Futures with `CommandItem.run()`

- [X] Acquire the Raptor master task through the RuntimeManager.
- [X] Manage a CPI command queue translating CPI calls to Raptor-backed
  Futures (RPTasks or RPC calls)
- [ ] Make sure CPI Session is properly shut down.
- [ ] Acquire the Worker(s) through CPI call to the RuntimeManager.
- [ ] Normalize the "stop" command to shut down everything cleanly and
  expeditiously. (In too many cases right now, tests take an improperly
  long time because of various timeouts.)
- [ ] Isolate RPExecutor concurrent.futures.Executor support from
  asyncio support (avoid blocking the event loop by avoiding event loop
  usage in the main implementation)

Ref SCALE-MS#345, SCALE-MS#377
eirrgang added a commit to eirrgang/scale-ms that referenced this issue Aug 9, 2023
Add and rearrange some program state management. Add some notes and
describe incomplete state management.

Ref SCALE-MS#378, SCALE-MS#383.

- [X] Acquire the Raptor master task through the RuntimeManager.
- [X] Manage a CPI command queue translating CPI calls to Raptor-backed
  Futures (RPTasks or RPC calls)
- [X] Make sure CPI Session is properly shut down. (Partially deferred)
- [ ] Acquire the Worker(s) through CPI call to the RuntimeManager.
- [ ] Normalize the "stop" command to shut down everything cleanly and
  expeditiously. (In too many cases right now, tests take an improperly
  long time because of various timeouts.)
- [ ] Isolate RPExecutor concurrent.futures.Executor support from
  asyncio support (avoid blocking the event loop by avoiding event loop
  usage in the main implementation)

Ref SCALE-MS#345, SCALE-MS#377
eirrgang added a commit to eirrgang/scale-ms that referenced this issue Aug 9, 2023
Implement the START_SCOPE and EXIT_SCOPE calls to trigger Worker
submission and to leave the scope of those Workers.

Note that Raptor does not actually provide an API for stopping Workers
from the Raptor master, so the CPI semantics do not map precisely to the
runtime logic.

- [X] Acquire the Raptor master task through the RuntimeManager.
- [X] Manage a CPI command queue translating CPI calls to Raptor-backed
  Futures (RPTasks or RPC calls)
- [X] Make sure CPI Session is properly shut down. (Partially deferred)
- [X] Acquire the Worker(s) through CPI call to the RuntimeManager.
- [ ] Normalize the CPI usage to shut down everything cleanly and
  expeditiously. (In too many cases right now, tests take an improperly
  long time because of various timeouts.)
- [ ] Isolate CPIExecutor concurrent.futures.Executor support from
  asyncio support (avoid blocking the event loop by avoiding event loop
  usage in the main implementation)
- [ ] Separate `messages` for intercomponent communication from `cpi`
  calls and responses.

Ref SCALE-MS#345, SCALE-MS#377
eirrgang added a commit that referenced this issue Aug 10, 2023
Acquire the Raptor master task through the RuntimeManager.

Reimplement `scalems.radical.executor.manage_raptor()`
as `scalems.radical.manager.manage_raptor()`

Ref #345, #377
eirrgang added a commit that referenced this issue Aug 10, 2023
Manage a CPI command queue translating CPI calls to Raptor-backed
Futures (RPTasks or RPC calls).

* Introduce some containers for CPI Command management.
* Reimplement a `cpi` method on the RuntimeManager.
* Shut down queue runner threads on `close()`
* Translate CPI command messages into appropriate messages to the
  ScalemsRaptor and fulfil Futures with `CommandItem.run()`
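
A command queue of this kind can be sketched with a runner thread and a sentinel. This is an illustration under stated assumptions, not the code from the commit: the `CommandItem.run()` signature, the `CommandQueue` class, and the `dispatch` callable (standing in for the message exchange with the Raptor task) are all hypothetical.

```python
import concurrent.futures
import queue
import threading
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class CommandItem:
    """Hypothetical container pairing a CPI message with its Future."""

    message: dict
    future: concurrent.futures.Future = field(
        default_factory=concurrent.futures.Future
    )

    def run(self, dispatch: Callable[[dict], Any]) -> None:
        # Skip work for Futures that were cancelled while queued.
        if not self.future.set_running_or_notify_cancel():
            return
        try:
            self.future.set_result(dispatch(self.message))
        except Exception as exc:
            self.future.set_exception(exc)


class CommandQueue:
    """Hypothetical manager: one runner thread drains the queue."""

    _STOP = object()  # sentinel that ends the runner loop

    def __init__(self, dispatch: Callable[[dict], Any]):
        self._dispatch = dispatch
        self._queue: queue.Queue = queue.Queue()
        self._runner = threading.Thread(target=self._run, daemon=True)
        self._runner.start()

    def cpi(self, message: dict) -> concurrent.futures.Future:
        item = CommandItem(message)
        self._queue.put(item)
        return item.future

    def _run(self) -> None:
        while True:
            item = self._queue.get()  # blocks without a polling timeout
            if item is self._STOP:
                break
            item.run(self._dispatch)

    def close(self) -> None:
        # Items queued before the sentinel still run; then the thread exits.
        self._queue.put(self._STOP)
        self._runner.join()
```

Because `close()` wakes the runner with a sentinel instead of waiting for a `get()` timeout to expire, shutdown takes only as long as the commands already queued.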

Ref #345, #377
eirrgang added a commit that referenced this issue Aug 10, 2023
Add and rearrange some program state management. Add some notes and
describe incomplete state management.

Deferred:
- Make sure CPI Session is properly shut down. (Partially deferred)
- Acquire the Worker(s) through CPI call to the RuntimeManager.
- Normalize the "stop" command to shut down everything cleanly and
  expeditiously. (In too many cases right now, tests take an improperly
  long time because of various timeouts.)
- Isolate RPExecutor concurrent.futures.Executor support from
  asyncio support (avoid blocking the event loop by avoiding event loop
  usage in the main implementation)

Ref #345, #377
eirrgang added a commit that referenced this issue Aug 11, 2023
* RP TaskDescription *metadata* field changes from a `dict` to a
  `tuple[str, None|dict]` of operation name and operand.
* `scalems.messages.Command` class hierarchy is replaced by simple
  `TypedDict`s and utility functions in new `scalems.cpi` module.
* `CpiCommand._command_name()` replaces
  `CpiCommand.command_class()` required classmethod to support
  `scalems.radical.CpiCommand` subclass registration
  (for creation function dispatching).
* Update signature for `scalems.radical.manager.RuntimeManager.cpi()`.
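
The shape of these changes can be sketched as follows. Only `CpiCommand._command_name()` and the `(operation name, operand)` tuple layout come from the notes above; `CpiCall`, `make_call`, `to_metadata`, `from_metadata`, the `CpiHello` subclass, and the `"hello"` operation name are hypothetical names chosen for illustration, and the real `scalems.cpi` module may differ.

```python
from typing import Dict, Optional, Tuple, TypedDict


class CpiCall(TypedDict):
    """Hypothetical TypedDict for a CPI call."""

    operation: str
    operand: Optional[dict]


def make_call(operation: str, operand: Optional[dict] = None) -> CpiCall:
    """Utility function replacing a constructor in a class hierarchy."""
    return CpiCall(operation=operation, operand=operand)


def to_metadata(call: CpiCall) -> Tuple[str, Optional[dict]]:
    """Pack a call as (operation name, operand) for a task description."""
    return (call["operation"], call["operand"])


class CpiCommand:
    """Base class that registers subclasses by command name."""

    _registry: Dict[str, type] = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Subclass registration supports creation function dispatching.
        CpiCommand._registry[cls._command_name()] = cls

    def __init__(self, operand: Optional[dict]):
        self.operand = operand

    @classmethod
    def _command_name(cls) -> str:
        raise NotImplementedError

    @staticmethod
    def from_metadata(metadata: Tuple[str, Optional[dict]]) -> "CpiCommand":
        name, operand = metadata
        return CpiCommand._registry[name](operand)


class CpiHello(CpiCommand):
    @classmethod
    def _command_name(cls) -> str:
        return "hello"
```

With this layout, `CpiCommand.from_metadata(to_metadata(make_call("hello")))` returns a `CpiHello` instance, and an unregistered operation name raises `KeyError`.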

Ref #377
eirrgang added a commit that referenced this issue Aug 23, 2023
Implement the START_SCOPE and EXIT_SCOPE calls to trigger Worker
submission and to leave the scope of those Workers.

Clarify the difference between EXIT_SCOPE and STOP.

Note that Raptor does not actually provide an API for stopping Workers
from the Raptor master, so the CPI semantics do not map precisely to the
runtime logic.
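
One way to picture the pairing of START_SCOPE and EXIT_SCOPE is a context manager. This is a rough sketch under assumptions: `worker_scope` and the `cpi` callable are hypothetical, the operation names are written as plain strings, and treating STOP as a separate, session-wide call is one reading of the distinction rather than something the commit spells out.

```python
import contextlib
from typing import Callable, Iterator, Optional


@contextlib.contextmanager
def worker_scope(
    cpi: Callable[[str, Optional[dict]], None], **requirements
) -> Iterator[None]:
    """Hypothetical helper: bracket work between START_SCOPE and EXIT_SCOPE."""
    # Entering the scope triggers Worker submission.
    cpi("START_SCOPE", requirements)
    try:
        yield
    finally:
        # Leaving the scope releases the caller's claim on those Workers.
        # It is a request only: per the note above, Raptor gives the master
        # no API to actually stop them.
        cpi("EXIT_SCOPE", None)
```

Under this reading, a caller could enter and leave several scopes in one session and issue STOP once at the end, outside any scope.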

- [X] Acquire the Raptor master task through the RuntimeManager.
- [X] Manage a CPI command queue translating CPI calls to Raptor-backed
  Futures (RPTasks or RPC calls)
- [X] Make sure CPI Session is properly shut down. (Partially deferred)
- [X] Acquire the Worker(s) through CPI call to the RuntimeManager.
- [ ] Normalize the CPI usage to shut down everything cleanly and
  expeditiously. (In too many cases right now, tests take an improperly
  long time because of various timeouts.)
- [ ] Isolate CPIExecutor concurrent.futures.Executor support from
  asyncio support (avoid blocking the event loop by avoiding event loop
  usage in the main implementation)
- [ ] Separate `messages` for intercomponent communication from `cpi`
  calls and responses.

Ref #345, #377
eirrgang added a commit to eirrgang/scale-ms that referenced this issue Oct 19, 2023
* RP TaskDescription *metadata* field changes from a `dict` to a
  `tuple[str, None|dict]` of operation name and operand.
* `scalems.messages.Command` class hierarchy is replaced by simple
  `TypedDict`s and utility functions in new `scalems.cpi` module.
* `CpiCommand._command_name()` replaces
  `CpiCommand.command_class()` required classmethod to support
  `scalems.radical.CpiCommand` subclass registration
  (for creation function dispatching).
* Update signature for `scalems.radical.manager.RuntimeManager.cpi()`.

Ref SCALE-MS#377
eirrgang added a commit to eirrgang/scale-ms that referenced this issue Oct 19, 2023
Implement the START_SCOPE and EXIT_SCOPE calls to trigger Worker
submission and to leave the scope of those Workers.

Clarify the difference between EXIT_SCOPE and STOP.

Note that Raptor does not actually provide an API for stopping Workers
from the Raptor master, so the CPI semantics do not map precisely to the
runtime logic.

- [X] Acquire the Raptor master task through the RuntimeManager.
- [X] Manage a CPI command queue translating CPI calls to Raptor-backed
  Futures (RPTasks or RPC calls)
- [X] Make sure CPI Session is properly shut down. (Partially deferred)
- [X] Acquire the Worker(s) through CPI call to the RuntimeManager.
- [ ] Implement EXIT_SCOPE in the ScalemsMaster.
- [ ] Normalize the CPI usage to shut down everything cleanly and
  expeditiously. (In too many cases right now, tests take an improperly
  long time because of various timeouts.)
- [ ] Isolate CPIExecutor concurrent.futures.Executor support from
  asyncio support (avoid blocking the event loop by avoiding event loop
  usage in the main implementation)
- [ ] Separate `messages` for intercomponent communication from `cpi`
  calls and responses.

Ref SCALE-MS#345, SCALE-MS#377
eirrgang added a commit to eirrgang/scale-ms that referenced this issue Oct 19, 2023
FIXME: Getting Aborted while queue runner is waiting for next item

Implement the START_SCOPE and EXIT_SCOPE calls to trigger Worker
submission and to leave the scope of those Workers.

Clarify the difference between EXIT_SCOPE and STOP.

Note that Raptor does not actually provide an API for stopping Workers
from the Raptor master, so the CPI semantics do not map precisely to the
runtime logic.

- [X] Acquire the Raptor master task through the RuntimeManager.
- [X] Manage a CPI command queue translating CPI calls to Raptor-backed
  Futures (RPTasks or RPC calls)
- [X] Make sure CPI Session is properly shut down. (Partially deferred)
- [X] Acquire the Worker(s) through CPI call to the RuntimeManager.
- [ ] Implement EXIT_SCOPE in the ScalemsMaster.
- [ ] Normalize the CPI usage to shut down everything cleanly and
  expeditiously. (In too many cases right now, tests take an improperly
  long time because of various timeouts.)
- [ ] Isolate CPIExecutor concurrent.futures.Executor support from
  asyncio support (avoid blocking the event loop by avoiding event loop
  usage in the main implementation)
- [ ] Separate `messages` for intercomponent communication from `cpi`
  calls and responses.

Ref SCALE-MS#345, SCALE-MS#377