Handle restart with changed sequence better #6154
Comments
@MetRonnie, I've not been able to reproduce this one; I have tried 8.3.x, 8.3.0 and 8.2.3. Might need a tweak to the example. It's possible that this PR might help: #6229

I can still reproduce on 8.3.2 and 8.3.x as of dc9f01b, copying the example above.

Ach, blindingly obvious, I didn't reinstall! #6229 does not appear to fix this :(
This is very similar to #6229: the `compute_runahead` algorithm is (correctly) erroring due to a complete lack of tasks.
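The failure mode can be illustrated in isolation: slicing an empty list is fine, but indexing into the empty result is not. A minimal sketch with hypothetical values (not the real scheduler state):

```python
# Minimal illustration of the suspected failure: when the changed
# graph leaves no tasks on any sequence, `sequence_points` is empty,
# and indexing the sliced (still empty) list raises IndexError.
ilimit = 3
sequence_points = []  # hypothetical: nothing left to schedule

try:
    limit_point = sorted(sequence_points)[:ilimit + 1][-1]
except IndexError as exc:
    print(f"IndexError: {exc}")
```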
This latter case should be extremely rare, and it is an error case in the first place, so fairly low priority. Easy fix, however! With this diff the workflow will shut down gracefully on restart/reload:

```diff
diff --git a/cylc/flow/task_pool.py b/cylc/flow/task_pool.py
index f1a6942f0..c992e880b 100644
--- a/cylc/flow/task_pool.py
+++ b/cylc/flow/task_pool.py
@@ -398,6 +398,9 @@ class TaskPool:
         self._prev_runahead_sequence_points = sequence_points
         self._prev_runahead_base_point = base_point
+        if len(sequence_points) == 0:
+            # no cycles to schedule
+            return False
         if count_cycles:
             # (len(list) may be less than ilimit due to sequence end)
             limit_point = sorted(sequence_points)[:ilimit + 1][-1]
```

However, a reload that essentially empties the task pool is almost certainly a mistake, so going ahead with the restart/reload and wiping out the pool, making it difficult to recover the workflow, is not necessarily the best move. We could instead raise an error here (examples of this form are the only confirmed way to hit this bug). This would cause the restart/reload to fail with an informative message:

```diff
diff --git a/cylc/flow/task_pool.py b/cylc/flow/task_pool.py
index f1a6942f0..647353378 100644
--- a/cylc/flow/task_pool.py
+++ b/cylc/flow/task_pool.py
@@ -398,6 +398,9 @@ class TaskPool:
         self._prev_runahead_sequence_points = sequence_points
         self._prev_runahead_base_point = base_point
+        if len(sequence_points) == 0:
+            # no cycles to schedule
+            raise WorkflowConfigError('No tasks scheduled to run')
         if count_cycles:
             # (len(list) may be less than ilimit due to sequence end)
             limit_point = sorted(sequence_points)[:ilimit + 1][-1]
```

Which will it be?
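The trade-off between the two options can be sketched side by side; the function names and the exception class here are stand-ins, not the real cylc.flow code:

```python
class WorkflowConfigError(Exception):
    """Stand-in for cylc.flow's WorkflowConfigError."""

def runahead_guard_return(sequence_points):
    # Option 1: treat an empty pool as "nothing left to schedule"
    # and let the scheduler shut down gracefully.
    if not sequence_points:
        return False
    return True  # real code would go on to compute the limit point

def runahead_guard_raise(sequence_points):
    # Option 2: refuse the restart/reload so the (probably mistaken)
    # graph change can be corrected before the pool is wiped out.
    if not sequence_points:
        raise WorkflowConfigError('No tasks scheduled to run')
    return True
```

Option 2 surfaces the problem to the user immediately, at the cost of failing the restart outright.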
Option 2 is no worse than the current behaviour, so it could at least be implemented as a quick fix for the issue.

Edit: however, with the option 2 patch, I've just seen that reloading a workflow after changing:

```diff
 [[graph]]
-    P1 = foo[-P1] => foo
+    R1 = foo[-P1] => foo
```

causes the workflow to get stuck in the reloading state.
Errors raised during reload should cause the reload to be aborted. The error should be logged and the workflow should continue with the original config. If that's not happening, we should fix it. I think the reload aborts correctly on master (due to the error in `compute_runahead`).
It aborts; however, the workflow status gets stuck as reloading.
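The intended reload semantics described above can be sketched as follows; the scheduler class and method names here are hypothetical, not the real cylc.flow API:

```python
import logging

LOG = logging.getLogger("workflow")

class MiniScheduler:
    """Toy scheduler used only to illustrate the intended behaviour."""

    def __init__(self, config):
        self.config = config
        self.status = "running"

    def reload(self, new_config):
        self.status = "reloading"
        try:
            self.validate(new_config)  # e.g. runahead computation may raise
            self.config = new_config
        except Exception as exc:
            # Abort the reload: log the error and keep the original config.
            LOG.error("reload aborted: %s", exc)
        finally:
            # Always leave the "reloading" state, even on failure;
            # otherwise the status gets stuck, as reported above.
            self.status = "running"

    def validate(self, config):
        if not config.get("graph"):
            raise ValueError("no tasks scheduled to run")
```

A failed reload leaves the old config in place and returns the status to running, rather than leaving it stuck.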
Description

Restarting a workflow with a waiting or orphaned task that is no longer on sequence (due to `flow.cylc` having been changed) can result in a traceback.

Reproducible Example

Run this workflow:

When it shuts down, edit `flow.cylc`:

Then restart the workflow to get:

Expected Behaviour

No traceback; a more helpful error message.

Need to handle `sequence_points` being empty here: cylc-flow/cylc/flow/task_pool.py, line 400 in 3962f3d