Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flow-specific hold/release #5698

Open
wants to merge 18 commits into
base: master
Choose a base branch
from
Open

Flow-specific hold/release #5698

wants to merge 18 commits into from

Conversation

hjoliver
Copy link
Member

@hjoliver hjoliver commented Aug 17, 2023

Close #4277

Hold/release tasks in a specific flow (or any flow, by default)

  • Implement cylc hold --flow=n
  • Same for cylc release --flow=n

If --flow is not used, matching tasks will be held or released regardless of flow number.

  • (most of the time there will only be one flow)

If --flow=n is specified, matching tasks will be held or released if their flow numbers include n

  • (if a merged task foo has flow numbers {2,3} it belongs to flow 2 (and 3), so if I ask to hold foo in flow 2 (or 3) it should be held. Ditto for release).

Check List

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Applied any dependency changes to both setup.cfg (and conda-environment.yml if present).
  • Tests are included (or explain why tests are not needed).
  • CHANGES.md entry included if this is a change that can affect users
  • Cylc-Doc pull request opened if required at document flow-specific hold and release cylc-doc#646
  • If this is a bug fix, PR should be raised against the relevant ?.?.x branch. (not a bug fix)

@hjoliver hjoliver added this to the cylc-8.3.0 milestone Aug 17, 2023
@hjoliver hjoliver self-assigned this Aug 17, 2023
cylc/flow/scheduler.py Outdated Show resolved Hide resolved
@hjoliver
Copy link
Member Author

hjoliver commented Aug 17, 2023

Test case:

[scheduling]
    [[graph]]
        R1 = "a => b => c => d"
[runtime]
    [[a,b,c,d]]
        pre-script = "sleep 2"
    [[a]]
        script = """
            if (( CYLC_TASK_SUBMIT_NUMBER == 1)); then
                cylc hold --flow=2 ${CYLC_WORKFLOW_ID}//1/b
            fi
        """
    [[c]]
        script = """
            if (( CYLC_TASK_SUBMIT_NUMBER == 1)); then
                cylc trigger --flow=new ${CYLC_WORKFLOW_ID}//1/a
            fi
        """
  • at start-up, a puts a future hold on b in flow 2
  • flow 1 runs b and c, which starts flow 2 at a
  • flow 2 gets held at b

@hjoliver
Copy link
Member Author

hjoliver commented Sep 1, 2023

One thing to note, on "hold after cycle point" functionality:

This branch makes it flow-specific, in that you can start a new flow and tell just that flow to hold after a given point:

cylc trigger --flow=new ...
cylc hold --after=4 --flow=2

However, there is still only one global hold-after point, whether it is for all tasks or for a single flow, so canceling the hold point doesn't need to be flow-specific:

cylc release --all ...

In future we should probably allow multiple flow-specific hold points, but I don't think that's a high priority. This branch is still a significant enhancment as-is. I expect most demand will be for single-flow re-runs that we don't want to carry on indefinitely (so stop or hold at a future point).

@hjoliver hjoliver marked this pull request as ready for review September 1, 2023 23:06
@hjoliver
Copy link
Member Author

hjoliver commented Sep 1, 2023

This bad boy is ready for review.

@hjoliver
Copy link
Member Author

hjoliver commented Sep 7, 2023

Test coverage good without any functional tests, although that makes me a bit nervous for this sort of thing. Any opinions on that?

@dwsutherland
Copy link
Member

dwsutherland commented Sep 7, 2023

With the example, does the 2nd flow get merged into the first flow?
image

It ends, after release;

(flow) sutherlander@cortex:cylc-flow$ cylc release --flow=2 fhold/run1//1/b
Done

(confirmed working)

(flow) sutherlander@cortex:cylc-flow$ cylc release --flow=1 fhold/run1//1/b
Done
(flow) sutherlander@cortex:cylc-flow$ cylc release --flow=3 fhold/run1//1/b
Done

(confirmed not working 👍 )
with flow 2 coming out the other end:

{
  "data": {
    "workflows": [
      {
        "id": "~sutherlander/fhold/run1",
        "taskProxies": [
          {
            "id": "~sutherlander/fhold/run1//1/c",
            "state": "succeeded",
            "flowNums": "[1]",
            "isHeld": false
          },
          {
            "id": "~sutherlander/fhold/run1//1/d",
            "state": "running",
            "flowNums": "[2]",
            "isHeld": false
          }
        ],
        "familyProxies": [
          {
            "id": "~sutherlander/fhold/run1//1/root",
            "state": "running"
          }
        ]
      }
    ]
  }
}

Kind of beyond the scope of this PR, but I guess I'll need to look into whether the flow numbers are correct in the data-store (perhaps no-merge is default but I would have expected flow number "[1, 2]" coming out of it otherwise).

Looks good.

@hjoliver
Copy link
Member Author

hjoliver commented Sep 7, 2023

With the example, does the 2nd flow get merged into the first flow?

No, merging only occurs if the same task from different flows meet in n=0. Here, flow=1 finishes before flow-2 can catch it up:

$ cylc log exa --no-timestamp | grep "=> succeeded" | sed -e 's/^.*-//g'
 [1/a running job:01 flows:1] => succeeded
 [1/b running job:01 flows:1] => succeeded
 [1/c running job:01 flows:1] => succeeded
 [1/a running job:02 flows:2] => succeeded
 [1/d running job:01 flows:1] => succeeded  # <----- flow 1 finished
 [1/b running job:02 flows:2] => succeeded
 [1/c running job:02 flows:2] => succeeded
 [1/d running job:02 flows:2] => succeeded

@hjoliver hjoliver mentioned this pull request Oct 7, 2023
Comment on lines +123 to +126
parser.add_option(
"--flow",
help="Hold tasks that belong to a specific flow.",
metavar="INT", action="store", dest="flow_num")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably --flow=all is the default?

Comment on lines +1738 to +1740
flow_num = Int(
description='Number of flow to hold.'
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the default be flow=all?

The pattern used in other mutations is a field called flow of type Flow which accepts either an integer or a string. See FlowMutationArguments in this file.

In this case new and none don't make sense, so it might need a new type.

@@ -30,6 +31,15 @@
FLOW_NONE = "none"


def validate_flow_opt(val):
Copy link
Member

@wxtim wxtim Oct 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this semi duplicate the logic in cylc trigger? Surely all use-cases of --flow should behave roughly the same, and probably be centralized in option_parser.py. Centralizing them here is reasonable too, but on balance this is more about options than flows?

You might want to give it a kwarg where you pass it a list of acceptable strings?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note, there are some differences, e.g. cylc trigger --flow=new makes sense, but cylc hold --flow=new doesn't.

@@ -30,6 +31,15 @@
FLOW_NONE = "none"


def validate_flow_opt(val):
"""Validate command line --flow opions."""
if val is not None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will the parser allow this value to be None?

) -> None:
"""Flag that we should hold a future task."""
self.hold[(name, point)] = flow_num
self._update_stores((name, point, True))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned that this means that you can only flag a future task for one flow at a time:

> cylc hold flows-hold//10100101T0000Z/a --flow 1
Done
> sqlite3 ~/cylc-run/flows-hold/runN/log/db 'SELECT * FROM tasks_to_hold'
a|10100101T0000Z|1
> cylc hold flows-hold//10100101T0000Z/a --flow 2
Done
> sqlite3 ~/cylc-run/flows-hold/runN/log/db 'SELECT * FROM tasks_to_hold'
a|10100101T0000Z|2

I would expect to get either

a|10100101T0000Z|1,2
# or
a|10100101T0000Z|1
a|10100101T0000Z|2

I also think that this means that:

> cylc hold flows-hold//10100101T0000Z/a --flow 1 --flow 2
Done
> sqlite3 ~/cylc-run/flows-hold/runN/log/db 'SELECT * FROM tasks_to_hold'
a|10100101T0000Z|2

Is not what I as a user would expect: I would expect this to either return an error ("don't use --flow twice please") or set that task as held in both flows as above.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at cylc trigger --help:

  --flow=FLOW           Assign the triggered task to all active flows (all);
                        no flow (none); a new flow (new); or a specific flow
                        (e.g. 2). The default is all. Reuse the option to
                        assign multiple specific flows.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, in that case it's only the Reuse the option that I have a problem with.

Copy link
Member

@wxtim wxtim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have had a good look at this. I think it mostly works as advertised, but I have a couple of questions.

@oliver-sanders oliver-sanders modified the milestones: cylc-8.3.0, cylc-8.4.0 Feb 19, 2024
@hjoliver
Copy link
Member Author

[Update]: I think this is basically done, I just have to come back to respond to some review comments. It's possible I can do that in time for 8.3.0, let's see.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Task hold/release needs to be flow-specific
4 participants