
Add execution order controls #424

Open · wants to merge 20 commits into base: develop
Conversation

jwhite242 (Collaborator):

Add machinery for controlling execution order/priority of study steps

  • Adds weights to the graph to enable selecting between depth-first and breadth-first (the current production mode) execution order.
  • Adds a new execution block exposing order controls, plus future hooks for using step metadata to control step weights.
  • Changes the internal machinery to use a PriorityQueue to store the ready steps.


| **Key** | **Required** | **Type** | **Description** |
| :- | :-: | :-: | :- |
| `step_order` | No | str | Type of scheduler managing execution. One of: {`depth-first`, `breadth-first`}. Default: `depth-first`. |
Member:

Should the default here be breadth-first like you say in the sentence above?

Collaborator Author:

Yeah, that's definitely a typo. Should be fixed now.

bgunnar5 (Member) left a comment:

I like this addition a lot. Two of my comments here are just suggestions for cleaning up code/planning for the future but if you want to ignore them I'm not going to be offended at all.

```python
if self.weight < other.weight:
    return True

return False
```
Member:

You could get fancy here (and for the __gt__ method below) and make this a one-liner: return self.weight < other.weight

Member:

To add onto this, I think you're only required to implement __lt__ and __eq__ since you can build the other comparison methods from them. But I'm only 85% sure from memory 😅
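That memory is right: `functools.total_ordering` derives the remaining comparison methods from `__eq__` plus one ordering method. A minimal sketch (the `Step` class here is an illustrative stand-in, not the PR's actual class):

```python
from functools import total_ordering

@total_ordering
class Step:
    """Illustrative stand-in for a weighted study step."""
    def __init__(self, weight):
        self.weight = weight

    def __eq__(self, other):
        return self.weight == other.weight

    def __lt__(self, other):
        return self.weight < other.weight

# total_ordering fills in __le__, __gt__, and __ge__ automatically.
assert Step(1) < Step(2)
assert Step(2) >= Step(1)
assert Step(3) > Step(1)
```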

Collaborator Author:

I'll throw another wrench into this and point out that this implementation is probably not what we want, despite it working for the current feature; i.e., step weights can't be the only basis of comparison. The __eq__ gets used for step equality, which means we can't use this __lt__ to construct it if __lt__ only cares about step weights. I had kind of punted on what else these should compare during the initial implementation, but it should definitely be addressed now.
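One hedged way around the __eq__/__lt__ conflict is to keep ordering off the step class entirely and let the queue entries carry it; `dataclasses` with `compare=False` makes this a few lines (the wrapper name is illustrative). The step's own `__eq__` then stays free to mean step identity:

```python
from dataclasses import dataclass, field
from queue import PriorityQueue
from typing import Any

@dataclass(order=True)
class PrioritizedStep:
    # Only weight participates in ordering; the wrapped step is
    # excluded, so step equality remains a separate notion.
    weight: int
    step: Any = field(compare=False)

q = PriorityQueue()
q.put(PrioritizedStep(2, "wide-step"))
q.put(PrioritizedStep(1, "deep-step"))
assert q.get().step == "deep-step"  # lowest weight comes out first
```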

bgunnar5 (Member) commented Apr 10, 2024:

maybe we leave this as is and then set the weights when we determine priorities?

EDIT: this comment is in reply to the previous discussion started here: #424 (comment)

```python
for weight, (parent, step_name, step) in enumerate(self.walk_study()):
    pprint(step)
    pprint("Updating weight")
    if isinstance(step, StudyStep):
```
Member:

Is there ever a case here where the step is not a StudyStep? Would it just be None?

```python
self.step_prioritization_factory.register_priority_expr(
    'step_order',
    step_weight_priority_df
)
```
Member:

I'm not entirely certain of your future plans for this section of code, but if you're planning to add more entries of the same form to this if/else chain, then it might be better in the long run to do something like:

```python
priority_expr_mapper = {
    "breadth-first": {
        "name": "step-order",
        "func": step_weight_priority_bf
    },
    "depth-first": {
        "name": "step-order",
        "func": step_weight_priority_df
    }
}

self.step_prioritization_factory.register_priority_expr(
    priority_expr_mapper[self.step_order]["name"],
    priority_expr_mapper[self.step_order]["func"]
)
```

Then any changes to the register_priority_expr() call would only have to be made in one place rather than several.

Member:

Another option here is to have some Protocol classes that implement the __call__ method -- then you can just point to the class you want and have them all be callable.
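A sketch of that Protocol idea, assuming illustrative class and key names (none of these are the PR's actual identifiers):

```python
from typing import Protocol

class PriorityExpr(Protocol):
    """Anything callable as expr(step) -> sortable priority."""
    def __call__(self, step) -> int: ...

class DepthFirstPriority:
    def __call__(self, step) -> int:
        return step["weight"]       # lower weight runs first

class BreadthFirstPriority:
    def __call__(self, step) -> int:
        return -step["weight"]      # invert to flip traversal order

def evaluate(expr: PriorityExpr, step) -> int:
    # Any object with __call__ satisfies the protocol.
    return expr(step)

step = {"weight": 3}
assert evaluate(DepthFirstPriority(), step) == 3
assert evaluate(BreadthFirstPriority(), step) == -3
```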

Collaborator Author:

Yeah, both good ideas, and I think the end game is some mix of the two to get all the work out of the constructor. I haven't quite fleshed out the design yet, but the idea is to eventually let users build expressions from things like step resource keys (procs, walltime, combinations of them), parameters, maybe even step names, and layer them onto this priority. Priorities would then be the sequential evaluation of all of those expressions (step order being the only one that's always there), exploiting the sorting mechanism of tuples to give fine-grained control over the orders.

Of course, if there are other ideas on things we might want to use to control this ordering, let me know!

@@ -0,0 +1,93 @@
description:
Collaborator:

It would be helpful to have an example in the samples directory as well.

@@ -906,12 +920,32 @@ def execute_ready_steps(self):
```python
# Now, we need to take the min of the length of the queue and the
# computed number of slots. We could have free slots, but have less
# in the queue.
_available = min(_available, len(self.ready_steps))
# _available = min(_available, len(self.ready_steps)) # Kind of inconsistent if it's affected by queue size sometimes
```
Member:

I'm not 100% sure what you mean by inconsistent here. This prevents the need for the conditional below (at least that was its intent when it was originally coded). It also means not checking for the -1 value explicitly.

Collaborator Author:

Problem with this is that PriorityQueues are like standard Queues and do not implement a length the way the deque that used to live here did. The size is also only approximate (unsure if that's really only the case in multithreaded use or not).

Collaborator Author:

Also, this whole block can now be removed since we don't need this check -> the queue.empty() test on the while loop accomplishes the same behavior and exits when no more steps are available.
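To illustrate the sizing point: `queue.PriorityQueue` provides no `__len__`, and `qsize()` is documented as unreliable under concurrency, so `empty()` checks (or separate accounting) are the dependable route. A small single-threaded sketch:

```python
from queue import PriorityQueue

q = PriorityQueue()
for weight in (3, 1, 2):
    q.put(weight)

# No __len__ is provided, so len(q) raises TypeError.
try:
    len(q)
except TypeError:
    pass

# qsize() exists but is only advisory with multiple threads;
# single-threaded it reflects the real count.
assert q.qsize() == 3

# Draining via empty() mirrors the while-loop guard in the PR.
drained = []
while not q.empty():
    drained.append(q.get())
assert drained == [1, 2, 3]
```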

```python
new_steps_running = 0
LOGGER.debug("Available slots: %d, Ready steps is empty %s",
             _available, self.ready_steps.empty() == True)
while not self.ready_steps.empty() and (new_steps_running < _available or _available < 0):
```
Member:

This changes the original intent of the throttle that was added. The --throttle option meant that Maestro would not exceed that number of jobs in total. This form says we're allowed to launch that number of new steps, instead of it being a global limit, to my understanding. Am I incorrect there?

Also, this conditional is a little convoluted -- since _available could be -1, it adds the extra condition.

jwhite242 (Collaborator Author) commented Jul 24, 2023:

Yeah, this part needs another data structure to really work right again: the queue's size method is only approximate, so it isn't reliable for this the way the previous implementation was. We'll need a running-steps tracker alongside this to make the throttling less convoluted.

Edit: new_steps_running just needs to not be reset on every pass, I think, and then we'll have the intended behavior.

Collaborator Author:

So, this actually does work as intended; no need for my edit. The control comes from _available being set using throttle and the number of running steps a little above this loop, so if in_progress already equals throttle, this conditional will not let any new jobs through.

I just tested it on one of the hello world specs and verified it properly limits to 1 job at a time (the throttle setting) with the Slurm adapter.

Will add a comment to explain the variable.
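The throttle interaction described above reduces to plain arithmetic: the global cap holds because the slot count is recomputed from the in-flight steps on each pass. A hedged sketch (variable names mirror the discussion, but the function itself is illustrative, not the PR's code):

```python
def slots_available(throttle, in_progress):
    """How many new steps may launch this pass.

    throttle <= 0 means unlimited, mirroring the -1 sentinel
    the loop condition checks for.
    """
    if throttle <= 0:
        return -1  # sentinel: no limit
    return max(throttle - in_progress, 0)

# throttle=1: once one job is in progress, nothing new launches,
# so the limit behaves as a global cap rather than a per-pass one.
assert slots_available(1, 0) == 1
assert slots_available(1, 1) == 0
assert slots_available(4, 3) == 1
assert slots_available(0, 10) == -1  # unlimited
```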


```python
def __init__(self, name, description,
             studyenv=None, parameters=None, steps=None, out_path="./"):

def __init__(self, name, description, studyenv=None,
             parameters=None, steps=None, out_path="./"):
```
Member:

Since you're passing the study environment here, the output path might be redundant. Could be an opportunity to clean up some parameters here.

@@ -127,6 +127,35 @@ def remove_edge(self, src, dest):
```python
logging.debug("Removing edge (%s, %s).", src, dest)
self.adjacency_table[src].remove(dest)

def __iter__(self):
    """
    Iterator over the graph
```
Member:

This docstring could be clearer. It also looks like this is a DFS traversal; it might be worth mentioning that in the docstring or making it configurable somehow.
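For reference, an explicit-stack DFS iterator over an adjacency table looks roughly like this (a sketch, not the PR's implementation; the graph shape just mirrors a tiny parent/child study):

```python
def dfs_iter(adjacency, source):
    """Depth-first iterator over an adjacency table (illustrative).

    Uses an explicit stack; neighbors are pushed in reverse so they
    pop off in their listed order.
    """
    seen = set()
    stack = [source]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        yield node
        stack.extend(reversed(adjacency.get(node, [])))

graph = {
    "_source": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1"],
}
# Each branch is exhausted before the next begins.
assert list(dfs_iter(graph, "_source")) == ["_source", "a", "a1", "a2", "b", "b1"]
```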


FrankD412 (Member):

@jwhite242 -- Is this a dead end at this point? We're assessing Maestro and it's looking like DFS would be good on our end too. Wondering just in case we need to revisit.

jwhite242 (Collaborator Author):

> @jwhite242 -- Is this a dead end at this point? We're assessing Maestro and it's looking like DFS would be good on our end too. Wondering just in case we need to revisit.

No, I just got derailed by other things for a bit; I'm ramping back up on this now. Having the more general expression-based priorities is going to be pretty helpful -> a major use case here is getting big/long-running variants of steps started sooner, allowing smaller ones to be churned through within the throttle limit alongside them for improved throughput.

Thinking more on the protocol question... I'm on the fence about whether we shouldn't just use abstract base classes and tie info to these things, i.e. per-step overrides of expressions. But I'll play with both and see how they feel.

@jwhite242 jwhite242 marked this pull request as ready for review March 6, 2024 17:47
jwhite242 (Collaborator Author):

@FrankD412, @bgunnar5, @jsemler
I think this is finally ready for another pass/real review. An interesting question left (beyond any implementation issues/comments) is what to do about the spec. I refactored it to be a list so it's more clearly ordered for users, but maybe it'd make sense to contain this list in a subkey (priority_expressions or something) instead of at the root of the execution block? I don't have any other uses in mind for this block yet, but the key would be more future-proof in case we do think of something. (I know the docs are slightly out of sync, pending this subkey-or-not question.)

jwhite242 (Collaborator Author):

Actually, I just had another thought that might fit nicer: expanding it and making the value more of a 'oneOf' type, so it's either a value or an expression. That makes it clearer that there are two types and avoids having to do greedy parsing on things to figure it out on our end.

```yaml
execution:
  priority:
    - name:
      description: # optional, but encouraged... built-ins can dump the code's internal description in the reserialized spec
      value: # use this for built-ins with string keys to select (e.g. the current 'step-order')
    - name:
      description:
      expression: # use this for the eventual string-based expression compilation
```
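Enforcing that 'oneOf' on the parsing side could look roughly like the following (a sketch; the key names follow the proposal above and are not final):

```python
def validate_priority_entry(entry):
    """Each entry needs a name plus exactly one of value/expression."""
    if "name" not in entry:
        raise ValueError("priority entry missing 'name'")
    has_value = "value" in entry
    has_expr = "expression" in entry
    if has_value == has_expr:  # both present, or both absent
        raise ValueError(
            f"priority entry {entry['name']!r}: provide exactly one "
            "of 'value' or 'expression'"
        )

# A built-in selected by string key passes; mixing the two does not.
validate_priority_entry({"name": "step-order", "value": "depth-first"})
try:
    validate_priority_entry({"name": "bad", "value": "x", "expression": "y"})
except ValueError:
    pass
```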

jwhite242 (Collaborator Author) commented Mar 28, 2024:


Continued tweaking/iteration, with a mind toward this being amenable to a mix of built-in and user-supplied things (think plugins for reusable expressions, checkable via the dependencies machinery):

```yaml
execution:
  priority:
    - prioritizer_id: step_order   # built-in dag traversal order method
      args:
        - step_order: 'depth-first'
        - ...
    - prioritizer_id: expression   # the built-in expression prioritizer
      expression: step.procs*step.walltime ...
```

Think of the prioritizer_id (or a similar name) as akin to the key used to identify script adapters, so we can use it to tag plugin-installed things in a way that makes error messaging helpful, since it's a standardized place to register these functions. For sharing, Maestro can then tell a recipient that they're missing a plugin somebody was using. I still like the idea of descriptions too, though I'm not sure about making them mandatory, given that the built-in ones can have theirs set internally and just serialized into the spec in the workspace.
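That registry-by-id idea might be sketched like this (all names illustrative; the point is the standardized lookup and the helpful error for a missing plugin):

```python
PRIORITIZER_REGISTRY = {}

def register_prioritizer(prioritizer_id):
    """Decorator registering a prioritizer under a stable id,
    akin to how script adapters are keyed."""
    def wrap(func):
        PRIORITIZER_REGISTRY[prioritizer_id] = func
        return func
    return wrap

def get_prioritizer(prioritizer_id):
    try:
        return PRIORITIZER_REGISTRY[prioritizer_id]
    except KeyError:
        known = ", ".join(sorted(PRIORITIZER_REGISTRY)) or "<none>"
        # The id names the missing piece, so a shared spec can report
        # exactly which plugin the recipient lacks.
        raise ValueError(
            f"Unknown prioritizer_id {prioritizer_id!r}; known ids: {known}."
        ) from None

@register_prioritizer("step_order")
def step_order(step):
    return step.get("weight", 0)

assert get_prioritizer("step_order")({"weight": 2}) == 2
try:
    get_prioritizer("missing-plugin")
except ValueError:
    pass
```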

Comment on lines +25 to +41
```yaml
- name: extract-data
  description: |
    Extract data and upload to external source, then clean up workspace
  run:
    cmd: |
      # Placeholder for some database upload
      python $(UPLOADER) -i $(echo-params.workspace)

      upload_success=$?  # Capture retcode of uploader for use later
      # Clean up workspace to save space
      if [[ $upload_success == 0 ]]; then
        echo 'Data upload successful'
        echo 'Cleaning up data files'
        rm $(echo-params.workspace)/datafile.txt
      else
        echo 'Data upload failed'
        exit $upload_success
```
Member:

Should this step depend on echo-params?



These execution orders can be demonstrated with the sample hello-bye world specification from the tutorials:
Member:

A link to the Tutorials page here would be helpful.

```yaml
run:
  cmd: |
    echo "$(FAREWELL), $(NAME)!" > $(BYE_FORMAT)
  depends: [say-hello]
```
Member:

If this depends statement is changed to depends: [say_hello_*] can depth-first execution still take place? If so, should this be changed?


=== "Breadth-first order, throttle=1"

Breadth first, order of excution marked by colors with lighter colors executing first:
Member:

typo here, should be: "order of execution". This same typo is in the DFO tab below as well

"""
Defines api for Priority Expressions for study steps.
"""
def __call__(self, study_step):
Member:

For now, the type hint could be `study_step: "StudyStep"` -- that way there's at least something until the circular import is resolved. Up to you though.


@@ -317,12 +317,17 @@ def run_study(args):
```python
batch = spec.batch
if "type" not in batch:
    batch["type"] = "local"

# Check the execution block early, then store it
exec_block = spec.execution
```
Member:

If no execution block is given, is there a default that needs to be set? Something like:

```python
if "graph-order" not in exec_block:
    exec_block["graph-order"] = "breadth-first"
```

assert step.weight == expected_step_weights[step_name]


# NOTE: can we refactor to use a shared study fixture instead of redoing it frequently?
Member:

Maybe you could have a single hello_bye.yaml script and then write fixtures that update certain blocks of it? In the fixture you can yield control to whatever test/fixture needs it, then after the yield do cleanup to reset the yaml to what it was before.

I'm doing something similar with Merlin's CONFIG object on a branch where I'm working on incorporating pytest/coverage into Merlin's test suite: https://github.com/bgunnar5/merlin/blob/feature/coverage/tests/conftest.py#L230. This is definitely subject to change as it's still a WIP, but it showcases what I'm thinking.
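The restore-after-yield pattern could be sketched like this with a scratch copy of the spec (file names illustrative; the same body works as a pytest fixture under `@pytest.fixture`):

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path

@contextmanager
def editable_spec(spec_path):
    """Hand out a scratch copy of a spec; the original stays untouched."""
    scratch_dir = tempfile.mkdtemp()
    scratch = Path(scratch_dir) / Path(spec_path).name
    shutil.copy(spec_path, scratch)
    try:
        yield scratch               # the test mutates the copy freely
    finally:
        shutil.rmtree(scratch_dir)  # cleanup runs after the yield

# Usage: mutate the copy; the base spec is pristine afterward.
original = Path(tempfile.mkdtemp()) / "hello_bye.yaml"
original.write_text("description: base\n")
with editable_spec(original) as copy:
    copy.write_text("description: modified\n")
assert original.read_text() == "description: base\n"
```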

```python
self.step_prioritizer = StepPrioritizer()

for expr in self.exec_list:
    # note: likely be nicer to make these actual objects during spec parsing..
```
Member:

+1 to this note

```python
pprint(f"Parameterized step name: {parameterized_step_name}, step: {parameterized_step}")
if not parameterized_step:  # catch source, which is always none
    continue
# NOTE: really can't get the base step name anymore after staging?
```
Member:

I ran into this same problem when developing Merlin's status commands, so if we could implement an easy way to access this post-staging, that would be amazing.

4 participants