
interventions: agree on the set-outputs/remove/forget discussions #172

Closed
oliver-sanders opened this issue Apr 19, 2023 · 3 comments

@oliver-sanders
Member

oliver-sanders commented Apr 19, 2023

@oliver-sanders oliver-sanders self-assigned this May 15, 2023
@oliver-sanders
Member Author

oliver-sanders commented May 15, 2023

Clarification on use cases 6 & 7 in the cylc set proposal...

6.?? Set jobs to failed when a job platform is known to be down

I don’t think this case is valid (unless I’ve misunderstood the requirement?).

This case describes the scenario where a job has been successfully submitted (or has even started?) on a remote platform which subsequently becomes uncontactable, leaving us with a job "stuck" in the submitted (/running?) state. We cannot poll or kill these tasks, so in Cylc 7 this could stall workflows. The cylc reset command was used to work around the issue, allowing us to disown job submissions in situations where Cylc could not confirm that they had failed.

I think this issue can still occur in Cylc 8. If so, we need a mechanism for telling Cylc to disown these job submissions so that we can resubmit them on another system and continue.

Suggested solutions:
  • cylc forget <job>
  • cylc message <id> <task> -- failed
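A toy Python model (not Cylc code) may help pin down what "disowning" a submission means in this case. `TaskProxy`, `poll`, `disown` and the platform names `hpc1`/`hpc2` are all hypothetical. The sketch only assumes that polling requires a contactable platform, and that a disowned submission is treated as failed so the task can be resubmitted, in the spirit of the two suggestions above.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class TaskProxy:
    """Hypothetical stand-in for a task; NOT Cylc's real data model."""
    name: str
    state: str = "waiting"
    platform: Optional[str] = None

    def submit(self, platform: str) -> None:
        # Record where the job went; from here on we "own" that submission.
        self.platform = platform
        self.state = "submitted"

    def poll(self, reachable: Set[str]) -> str:
        # Polling (like killing) only works if the job platform is contactable.
        if self.platform not in reachable:
            raise ConnectionError(f"cannot contact platform {self.platform!r}")
        return self.state

    def disown(self) -> None:
        # Hypothetical "forget": stop tracking the stuck job and treat it as
        # failed, so the task becomes eligible for resubmission elsewhere.
        self.platform = None
        self.state = "failed"


task = TaskProxy("model")
task.submit("hpc1")
try:
    task.poll(reachable={"hpc2"})  # hpc1 is down: the job's fate is unknown
except ConnectionError:
    task.disown()                  # manual intervention by the user
task.submit("hpc2")                # resubmit on a platform that is up
print(task.state, task.platform)   # submitted hpc2
```

Without the `disown` step the task would stay "submitted" against `hpc1` indefinitely, which is the stall described above.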

7.?? Set switch tasks at an optional branch point, to direct the future flow

I’m not sure this is valid either. Why would we need to do this?

Sometimes tasks act as "if" statements in workflows, governing graph branching. With optional outputs, these branching patterns are likely to become more common as people pull these "if" statements out of task logic and into the workflow graph.

a => b
a:x? => x => b
a:y? => y => b
a:z? => z => b

E.g., we have a few workflows where the first task in every cycle yields an output which decides which data source to use based on runtime conditions. Users might want to override this decision, rather than leaving it to the automatic logic, in order to work around unexpected issues, during development, or to test recovery logic manually. They may want --wait behaviour for this case.

To do this, they need to be able to set the desired output on the switch task (covered by the cylc set proposal), but will probably also want to remove/expire the task to prevent it from being re-run and potentially triggering another branch in doing so.

Suggested solutions:
  • cylc set && cylc expire?
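Why the second step matters can be shown with a toy Python model of the `a:x? / a:y? / a:z?` switch in the graph above. This is not Cylc code: `SwitchTask`, `spawned`, `intervene` and the "expired" state are hypothetical, and the unconditional `a => b` edge is left out. It assumes only that a manually set output triggers its branch, and that an expired task never runs its own decision logic.

```python
from typing import Dict, Set

# Hypothetical encoding of the example graph: optional output of "a" -> branch task
# (a:x? => x, a:y? => y, a:z? => z).
BRANCHES: Dict[str, str] = {"x": "x", "y": "y", "z": "z"}


class SwitchTask:
    """Toy model of a switch task; names and states are illustrative only."""

    def __init__(self) -> None:
        self.outputs: Set[str] = set()
        self.state = "waiting"

    def run(self, automatic_choice: str) -> None:
        # An expired task never runs, so it cannot emit a second branch output.
        if self.state == "expired":
            return
        self.outputs.add(automatic_choice)
        self.state = "succeeded"


def spawned(task: SwitchTask) -> Set[str]:
    """Downstream branch tasks triggered by the switch task's outputs."""
    return {BRANCHES[o] for o in task.outputs if o in BRANCHES}


def intervene(desired: str, expire: bool) -> Set[str]:
    a = SwitchTask()
    a.outputs.add(desired)       # "set"-style step: choose the branch manually
    if expire:
        a.state = "expired"      # "expire"-style step: stop the task running
    a.run(automatic_choice="x")  # the task's own logic would have picked "x"
    return spawned(a)


print(sorted(intervene("y", expire=False)))  # ['x', 'y']  (two branches fire)
print(sorted(intervene("y", expire=True)))   # ['y']
```

Setting the output alone leaves the task free to run and trigger its own branch as well; combining it with something like expiry keeps the flow on the chosen branch only.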

@hjoliver
Member

Thanks for the clarifications; that makes more sense now. I'll digest ASAP and update the doc.

@oliver-sanders
Member Author

Agreed!

The proposal pages have now been merged into cylc-admin; the implementation work is now tracked by the following issues:


3 participants