Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

amendment to proposal amendment #189

Conversation

oliver-sanders
Copy link
Member

Summary

  • Redefine task "incompletion" to avoid problem edge cases.
  • Permit pre-conditions in completion expressions (no longer any reason not to, this makes things clearer for users)
  • Remove redundant safety check that was only required for the previous proposal

Incompletion condition bugs

Closer inspection of the condition for task "incompletion" revealed some logical flaws, e.g:

1. Task cannot be completed by the expired output.

If a task is expired (and expiry is permitted) then it is complete, but subsequently setting a finished output would make it incomplete by satisfying the "it finished executing" condition.

Similarly, if the task succeeds, fails or submit-fails, then it will not be possible to subsequently complete it by setting the expired output.

2. Tasks with no required outputs

It is possible for a task to have no required outputs, e.g:

a:submitted?
a:succeeded?
a:expired?

In this case the completion condition is True. By the previous logic the task is complete before it has even been submitted.

Underlying issue

These flaws weren't intentional, they are just bugs. The issue here is the defining incompletion which results in a lot of double negatives.

For the same reason we cannot use not in completion conditions (task outputs accumulate over multiple executions / interventions), we cannot define task completion in terms of negatives.

The solution is to define task completion in terms of positives. This is also clearer and easier to implement.

Implementation detail: Exposing the completion condition

A bonus of the new definition of task completion is that we can define the default completion condition for a task in a single flat expression e.g:

( succeeded and x) or expired

(As opposed to the previous branched code logic)

This condition can then be exposed to the user clarifying how the optional outputs they define in the graph impact the completion of the task.

E.G:

a?
a:failed?
$ cylc config -i '[runtime][a]completion'
completion = succeeded or failed

Behaviour change

There has been one functional change in this amendment. Consider:

a?
a:failed?
a:x

Following the previous logic the task a could have been completed in one of two ways:

  1. succeeded and x
  2. failed and x

So the completion condition, effectively reduced to:

(succeeded and x) or (failed and x)

This isn't especially helpful as we probably wouldn't have expected x to be generated in the failure scenario.

I'm not sure whether this is a bug in the condition or purposeful. This amendment changes:

All required outputs must be completed if the task runs. The task must succeed unless stated otherwise.

To:

All required outputs must be completed if the task succeeds. The task must succeed unless stated otherwise.

Changing the default completion expression for the above example to:

(succeeded and x) or failed

Which is a tad more practical.

* Redefine task "incompletion" to avoid problem edge cases.
* Permit pre-conditions in completion expressions.
* Remove redundent safety check that was only required for the previous
  proposal
Comment on lines -117 to -119
Note: the completion condition only covers outputs generated when a task is executed - it cannot
include `:expired`, `:submit` or `:submit-fail`. It also cannot include `started` (this has to
happen if a task executes) or `finished` (this isn't an output).
Copy link
Member Author

@oliver-sanders oliver-sanders Mar 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This restriction is no longer necessary with the above change.

We could decide to preserve this restriction, however, it will be clearer for users if we broaden the scope of the completion condition to include pre-conditions as otherwise we will have to automatically add the pre-condition logic internally.

I can't think of a good use case for using started in a completion expression, but there's no reason to blacklist it.

The pseudo finished output should probably be blacklisted though (to avoid confusion if nothing else).

Comment on lines -125 to -126
If a task is configured to clock-expire it must be made optional via the graph - otherwise this
will cause a validation failure. This is designed to reduce the risk of unintended early shutdown.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This statement is left over from the previous proposal. It was necessary there because expired? implied succeeded? so if you wanted a task to be able to expire, you needed to define the expiry pathway in the graph.

The new proposal (Dave's proposal) separates pre-conditions from execution logic so expired? can be defined without making success optional. As a result this check is no longer required to avoid unintended early shutdown.

Now, if you don't tell Cylc that expiry is permitted, either by defining an :expire? pathway in the graph or by permitting it in the completion expression, then the workflow will stall, but won't shutdown.

As a result this validation error has been downgrade to a validation warning (see below).

@oliver-sanders
Copy link
Member Author

Playground:

I have an implementation of the proposed logic on this branch, it's not ready yet of course, but you can use it to see how the default completion condition is generated, e.g:

[scheduling]
    [[graph]]
        R1 = """
            a? & a:x
            a:failed?

            b? & b:x?
            b:failed?

            c? & c:x
            c:failed? & c:y

            d?
            d:submit?
            d:expire?
        """

[runtime]
    [[root]]
        [[[outputs]]]
            x = xxx
            y = yyy
            z = zzz

    [[a]]
    [[b]]
        # user defined completion expression
        # (this can be anything so long as it's consistent
        #  with the graph)
        completion = (succeeded and x) or (failed and y)
    [[c]]
    [[d]]
$ cylc config . -i '[runtime]'
[[root]]
    [[[outputs]]]
        x = xxx
        y = yyy
        z = zzz
[[a]]
    completion = (x and succeeded) or failed
    [[[outputs]]]
        x = xxx
        y = yyy
        z = zzz
[[b]]
    completion = (succeeded and x) or (failed and y)
    [[[outputs]]]
        x = xxx
        y = yyy
        z = zzz
[[c]]
    completion = ((x and y) and succeeded) or failed
    [[[outputs]]]
        x = xxx
        y = yyy
        z = zzz
[[d]]
    completion = succeeded or failed or submit_failed or expired
    [[[outputs]]]
        x = xxx
        y = yyy
        z = zzz

Copy link
Contributor

@dpmatthews dpmatthews left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the way the completion condition is now exposed via cylc config

Comment on lines +49 to +52
1. All required outputs (referenced in the graph) must be satisfied.
2. If success is optional, then the task is complete if it fails.
3. If submission is optional, then the task is complete if it submit-fails.
4. If expiry is optional, then the task is complete if it expires.
Copy link
Member

@hjoliver hjoliver Mar 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These points need ORs, and for 1., the task must be finished, right?

Suggested change
1. All required outputs (referenced in the graph) must be satisfied.
2. If success is optional, then the task is complete if it fails.
3. If submission is optional, then the task is complete if it submit-fails.
4. If expiry is optional, then the task is complete if it expires.
A task is complete if:
1. If it finished and all required outputs (referenced in the graph) are satisfied.
2. Or if it fails and success is optional.
3. Or if it submit-fails and submission is optional.
4. Or if expires and expiry is optional.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, we need to decide whether outputs get "completed" or "satisfied". I have always said the former (only prerequisites get satisfied). However I suppose we could say that outputs get satisfied, in order to avoid any confusion with task completion.

Copy link
Member Author

@oliver-sanders oliver-sanders Mar 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These points need ORs

Can add ors, it doesn't change the meaning.

and for 1., the task must be finished, right?

No, we determine whether a task is "finished" from the task state not the task outputs. E.G. a retrying task may have the "failed" task output but be in the "waiting" state. The completion expression relates only to outputs. Attempting to build the "finished" check into the completion expression results in bugs.

The completion expression tells you whether a task is complete, a task is incomplete if this expression evaluates False AND the task is in a final state, e.g:

https://github.com/cylc/cylc-flow/blob/a9becc2f8cad9e53a6122d2551b7b17f196857e4/cylc/flow/task_pool.py#L1084-L1089

(explained in the lines below in the proposal)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
1. All required outputs (referenced in the graph) must be satisfied.
2. If success is optional, then the task is complete if it fails.
3. If submission is optional, then the task is complete if it submit-fails.
4. If expiry is optional, then the task is complete if it expires.
1. All required outputs (referenced in the graph) must be satisfied.
2. Or, if success is optional, then the task is complete if it fails.
3. Or, if submission is optional, then the task is complete if it submit-fails.
4. Or, if expiry is optional, then the task is complete if it expires.

Copy link
Member

@hjoliver hjoliver Mar 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we determine whether a task is "finished" from the task state not the task outputs...

Yes but no but. By "finished" I did mean a final state, not outputs.

I missed this critical change a bit higher up, sorry, which partly explains your response:

- 3. A task is incomplete if the completion condition is not satisfied and:
+ 3. A task is complete if the completion condition is satisfied.

This change in terminology really requires another attendant change.

Currently:

  • a task is complete if its outputs are complete and it is finished
  • a task is incomplete if its outputs are not complete and it is finished

[N.B., by "its outputs are complete" I mean either "its required outputs are complete" (currently) or "its completion condition is satisfied" (soon).]

So currently the word "complete" is a bit overloaded, but at least complete and incomplete tasks are logical opposites. And that's important because it describes when the scheduler can forget (as complete) or retain (as incomplete) a finished task.

If we change to a task is complete if its output completion condition is satisfied (even if it's not finished), then logically a task should be incomplete if its output completion condition is NOT satisfied (even if not finished). But we still have a task is incomplete, if the completion condition is not satisfied but the task has a final task status.

So I think we need to change the definition of incomplete as well, to a task is incomplete if its output completion condition is not satisfied. And then, a finished task will be removed if complete, or retained if incomplete.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a task is incomplete if its outputs are not complete and it is finished

Read down the proposal a couple lines, this is already in there.

Copy link
Member Author

@oliver-sanders oliver-sanders Mar 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I think that's my only problem with the amended amended proposal, otherwise great)

Sorry, I'm a bit confused about what you're suggesting. This proposal states:

A task is complete if the completion condition is satisfied.

And:

a task is incomplete, if the completion condition is not satisfied
but the task has a final task status (note status not output). This is the
same as presently implemented (8.2.x)

Ok?

Copy link
Member Author

@oliver-sanders oliver-sanders Mar 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's go with your use of "satisfied" instead of "completed" for individual outputs

I'm not fussed by grammar gerrymandering. I would lean towards:

  • An expression can be "satisfied".
  • A task output can be "completed".

(a collection of required outputs form an expression e.g. succeeded and x and y)

If you want something different let me know.

Copy link
Member Author

@oliver-sanders oliver-sanders Mar 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's my only problem with the amended amended proposal, otherwise great

I think the issue is grammar? Assuming there's no fundamental issue with this proposal, I'm going to push ahead with the branch in the name of haste. Let me know if I've misinterpreted this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a task is incomplete if its outputs are not complete and it is finished

Read down the proposal a couple lines, this is already in there.

I know. My point is: that is now logically inconsistent with your new definition of "completed task".

Sorry, I'm a bit confused about what you're suggesting. This proposal states:

A task is complete if the completion condition is satisfied.
And:
a task is incomplete, if the completion condition is not satisfied
but the task has a final task status (note status not output). This is the
same as presently implemented (8.2.x)

Ok?

I agree (well, I observe) that the proposal states both of those things, but the problem is that means "complete" and "incomplete" tasks are no longer logically opposite - only one of them cares about final task status - and I think that is a problem for clarity and user understanding.

let's go with your use of "satisfied" instead of "completed" for individual outputs
I'm not fussed by grammar gerrymandering. I would lean towards:

  • An expression can be "satisfied".
  • A task output can be "completed".
    If you want something different let me know.

This isn't grammar, it's about the definition of the words that we're using to describe these concepts. In this case, I also lean towards those existing uses, but I thought you wanted something different! I was literally quoting your wording in the proposal ("satisfying" rather than "completing" an output), assuming that was deliberate, and speculating that your rationale was to disambiguate our current overloaded use of the word "complete".

I think the issue is grammar? Assuming there's no fundamental issue with this proposal, I'm going to push ahead with the branch in the name of haste. Let me know if I've misinterpreted this.

It's not grammar, as I said, it's about defining terms clearly and consistently. But yes, that's still not a fundamental issue, so please do go ahead with implementation (sorry I was under the impression you already were working on it, and didn't realize you might think this was a "fundamental" objection).

@hjoliver
Copy link
Member

hjoliver commented Mar 20, 2024

@oliver-sanders - I'll try to distill my two "definitional" issues further. This is important because we all have to agree on what the terms mean, and document the system accordingly.

1. complete vs incomplete tasks

Task completion should depend ONLY on outputs, not on task state original comment

  • a task is complete if its output completion expression is satisfied
  • a task is incomplete if its output completion expression is not satisfied #<--- NEW (no final status involved)

Then, a task can be removed from the active task pool only if it is finished (i.e., it has a final task status) AND complete.

Note, without this, as proposed complete and incomplete tasks are not logical opposites - you can have a task that is both not complete and not incomplete. 🤯

2. satisfying vs completing outputs

Individual outputs should be "satisfied", not "completed" original comment

  • this would disambiguate our overloaded use of the term "complete"
  • it can be spun as making good sense too: a task message "satisfies" a task output

Note, I thought, from your wording that you were proposing this; I was just trying to take it seriously 😁. Either way, it is less important than 1., but worth considering.

@oliver-sanders
Copy link
Member Author

oliver-sanders commented Mar 20, 2024

I'm ok with complete and incomplete not being literal opposites, like flammable and inflammable, hence it's a grammar argument, happy to accept suggestions but I don't have any myself. Moreover, it's not inconsistent, one defines when outputs are complete, the other when tasks are complete, note, is_complete is internal only and not even user facing!

Happy to fiddle the meaning of completed and satisfied, but I don't think I'm going to be changing terminology universally in this proposal/PR as it's more far reaching than task_outputs.py.

Trying to avoid getting drawn into one of Cylc's legendary grammar/terminology arguments right now.

@dpmatthews
Copy link
Contributor

We're going to need to support an event (and indicate it in the GUI) if a task can not complete so we need to be able to describe this clearly. I agree that incomplete would not be a good name for this event. Maybe wont_complete?

@hjoliver
Copy link
Member

hjoliver commented Mar 20, 2024

This isn't one of those "legendary" arguments (the one you referenced was a semi-humorous suggestion for a collective noun where one didn't exist). We have to have an agreed meaning of important terms like this or we can't understand each other or expect users to. And it's not grammar (i.e., it not about the structure of language and sentences) it's just about consistent and logical definitions of specific terms.

But OK, since you don't seem to have strong feelings on the matter, except that you think it's grammar and I don't, how about we go with my proposed definitions which are both intuitive and logically consistent:

  • a task is complete if it's completion condition is satisfied
  • a task is incomplete (i.e. not complete) if it's completion is not satisfied
  • (and therefore a task can be removed from the active pool if it finished (i.e., it has a final status) AND it is complete)

@hjoliver
Copy link
Member

hjoliver commented Mar 20, 2024

My suggestion also handles @dpmatthews point - the event that we need to signal is that a task finished without completing. "Incomplete" is not a good name for that, because the "finished" bit is critical as well. "wont_complete" is a bit ungainly, but it makes sense because a finished task can no longer complete automatically. Maybe we can think of a better name for that. Maybe just finished_incomplete?

@oliver-sanders
Copy link
Member Author

oliver-sanders commented Mar 20, 2024

a task is incomplete (i.e. not complete) if it's completion is not satisfied

I don't think your suggestion works? E.G. if a task has no required outputs, then it will be completed before it has even been spawned. A task is not complete if its completion expression is satisfied unless it has a final task status.

This is what I'm proposing:

  • Outputs are complete if the completion expression is satisfied.
  • Tasks are complete if they have a final task status AND their outputs are complete.

This is not inconsistent, it is the required logic. Feel free to suggest different names if you feel strongly.

Note these definitions control internal logic and aren't user facing, we report this to the users as "task finished with incomplete outputs".

@hjoliver
Copy link
Member

hjoliver commented Mar 20, 2024

I don't think your suggestion works? E.G. if a task has no required outputs, then it will be completed before it has even been spawned.

That works fine, if (as I'm proposing) "completion" refers only to completion of outputs. If such a task gets triggered, it is not required to generate any outputs at all, it just needs to finish.

This is what I'm proposing:

  • Outputs are complete if the completion expression is satisfied.
  • Tasks are complete if they have a final task status AND their outputs are complete.

That's not quite what you are proposing. The proposal refers to incomplete tasks as well:

  • A task is incomplete if the completion condition is satisfied and the task has a final status.

Something is either complete or incomplete, end of story, by the universally agreed meaning of those terms. But your use of these terms does not conform to that. As proposed, a task that is "not complete" is not necessarily "incomplete". That is not consistent (aka inconsistent!) with the normal definition of those words, and it will cause confusion.

This is not inconsistent, it is the required logic. ...

The required code logic is 100% the same either way, but language used in the proposal to describe it is inconsistent.

Note these definitions control internal logic and aren't user facing, we report this to the users as "task finished with incomplete outputs".

Our proposals do necessarily also invent and use terminology to describe what's going on, and which then has to be documented in the user guide. So this is both developer- and user-facing. To fix it in this case, there are two choices:

  1. a task can be described as "complete" or "incomplete" based on only output completion
    • task pool removal logic depends on both "task completion" and "task finished"
  2. only outputs, not tasks, can be described as "complete" or "incomplete"
    • task pool removal logic depends on both "task output completion" and "task finished"
    • (this avoids having to explain to users that a "complete task" is not necessarily "finished")

I quite like 1. as a convenient way of describing a task whose outputs are complete. But maybe we should go with 2. It's a little more verbose but it has exactly zero chance of confusing anyone because "completion" is not an overloaded term anymore. What do you think?

@hjoliver
Copy link
Member

hjoliver commented Mar 20, 2024

Feel free to suggest different names if you feel strongly.

Ok, I think 2. above is my suggestion. Reasons:

  • we can't document this new logic by describing complete/incomplete tasks as defined in the proposal (it's inconsistent)
  • but I accept your point that users might be confused that a task can be "complete" without finishing

So let's just not say that tasks can be complete or incomplete. There's no need to. Outputs can be complete or incomplete, and a task can be finished or not finished.

@@ -41,16 +41,33 @@ There are currently three problems with expiry (as of cylc-flow 8.1.4):
executed and that might not happen if expiry or submission are optional (or if the pre-requisites
for `a` are not met).

3. A task is incomplete if the completion condition is not satisfied and:
3. A task is complete if the completion condition is satisfied.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
3. A task is complete if the completion condition is satisfied.
3. Outputs are complete if the completion condition is satisfied.

Copy link
Member Author

@oliver-sanders oliver-sanders Mar 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hjoliver is this correction sufficient for your concerns?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not by itself - it has to be carried throughout the document wherever "complete tasks" and "incomplete tasks" are referred to.

I take it you are agreeing to 2. above then?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to have a crack at that myself if you're busy on implementation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take it you are agreeing to #189 (comment) then?

No, I'm sticking with:

  • Outputs are complete if the completion expression is satisfied.
  • Tasks are complete if they have a final task status AND their outputs are complete.

Reminder, TaskOutputs.is_complete is an internal interface not exposed to users, this definition is not important.

Providing there aren't any technical qualms with this, I'm going to plug on with implementation as there's still quite a lot to do.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell you are now saying that you won't call tasks complete if their outputs are complete, but you will still call them incomplete if their outputs are incomplete AND they are finished

Correct.

You could stop spending time on this by simply agreeing

Harsh. Your turn to propose something, my turn to be nit picky.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, didn't mean to be harsh - but I have proposed a solution (two of them in fact), in great detail, above. This is only a small question of consistent use of two common terms, it doesn't require a whole new proposal. Are you sure you've read and understood my "proposals"? (This sort of thing would be a lot easier to resolve if we didn't live in opposite timezones!)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about I merge this and immediately put up a tweak PR that redoes the use of "in/complete tasks" in the way that I propose? (However, don't take that as full approval of this one yet 😁)

Copy link
Member Author

@oliver-sanders oliver-sanders Mar 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you could pick an option and raise it so I can review it in context.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll clone your branch (rather than merge it) and put up an alternative.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants