Fix param graph when mixing offset and conditional #3452

Merged
merged 10 commits into cylc:master from fix-param-expand-try-01 on Dec 17, 2020

Conversation

kinow
Member

@kinow kinow commented Nov 29, 2019

These changes close #2608

Requirements check-list

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Appropriate tests are included (unit and/or functional).
  • Appropriate change log entry included.
  • No documentation required.

@kinow kinow added this to the cylc-8.0a2 milestone Nov 29, 2019
@kinow kinow self-assigned this Nov 29, 2019
@kinow
Member Author

kinow commented Nov 29, 2019

Hi @hjoliver

While reading the param_expand.py code, I noticed that for m = [cat, dog], when expanding the graph foo<m-1> & baz => foo<m>, the code starts with m=cat. It replaces the out-of-range m-1 with the sentinel -32768, giving foo_-32768 & baz => foo<m>, and applies a regex to the result later.

The regex used is (?:^|\s*=>).*-32768.*?(?:$|=>\s*?). For a given value foo_-32768 & baz => foo_cat, this regex matches from foo_-32768 up to and including the arrow => token. The figure below illustrates this.

image

So the returned text is foo_cat, whereas on paper, and from our discussion in the linked issue, it ought to be baz => foo_cat. In this PR, I have changed the regex to (?:^|\s*=>).*-32768([&|\s]*)?. It is quite similar to the previous regex and changes only the last part: what was .*?(?:$|=>\s*?) is now ([&|\s]*)?, which means, more or less, consume any spaces, & and | characters that follow. This produces:

image

And the returned values in param_expand are now correct I guess, returning:

# a python set
{
  'baz => foo_cat',
  'foo_cat & baz => foo_dog'
}
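
A minimal sketch of the difference between the two patterns, applied to the intermediate string from the description. It assumes the pattern is used with `sub('', ...)` to delete the matched text and that the result is stripped afterwards; the names `REMOVE`, `OLD_REC` and `NEW_REC` are illustrative and not taken from `param_expand.py`.

```python
import re

REMOVE = -32768  # sentinel substituted for an out-of-range parameter value

# Pattern before this PR, and the pattern proposed in the description above.
OLD_REC = re.compile(r'(?:^|\s*=>).*' + str(REMOVE) + r'.*?(?:$|=>\s*?)')
NEW_REC = re.compile(r'(?:^|\s*=>).*' + str(REMOVE) + r'([&|\s]*)?')

line = 'foo_-32768 & baz => foo_cat'

# The old pattern lazily runs on to the next "=>", so "& baz" is deleted too.
old_result = OLD_REC.sub('', line).strip()
# The new pattern stops after the "&" and whitespace that follow the sentinel.
new_result = NEW_REC.sub('', line).strip()

print(repr(old_result))
print(repr(new_result))
```

With the old pattern only `foo_cat` survives; with the new one the line becomes `baz => foo_cat`.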

Here's an example workflow I used to test it.

# file ~/cylc-suites/p1/suite.rc
[cylc]
    [[parameters]]
        m = cat, dog
[scheduling]
    cycling mode = integer
    initial cycle point = 1
    [[dependencies]]
        [[[R1]]]
            graph = "foo<m-1> & baz => foo<m>"

The reference graph for Cylc 8 master:

# master
(venv) kinow@kinow-VirtualBox:~/Development/python/workspace/cylc-flow$ cylc graph --reference p1
2019-11-29T15:31:04+13:00 WARNING - deprecated graph items were automatically upgraded in 'suite definition':
2019-11-29T15:31:04+13:00 WARNING -  * (8.0.0) [scheduling][dependencies][X][graph] -> [scheduling][graph][X] - for X in:
	R1
edge "baz.1" "foo_dog.1"
edge "foo_cat.1" "foo_dog.1"
graph
node "baz.1" "baz\n1"
node "foo_cat.1" "foo_cat\n1"
node "foo_dog.1" "foo_dog\n1"
stop

And for this branch:

# this branch
(venv) kinow@kinow-VirtualBox:~/Development/python/workspace/cylc-flow$ cylc graph --reference p1
2019-11-29T15:30:54+13:00 WARNING - deprecated graph items were automatically upgraded in 'suite definition':
2019-11-29T15:30:54+13:00 WARNING -  * (8.0.0) [scheduling][dependencies][X][graph] -> [scheduling][graph][X] - for X in:
	R1
edge "baz.1" "foo_cat.1"
edge "baz.1" "foo_dog.1"
edge "foo_cat.1" "foo_dog.1"
graph
node "baz.1" "baz\n1"
node "foo_cat.1" "foo_cat\n1"
node "foo_dog.1" "foo_dog\n1"
stop
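
To see what changed between the two reference graphs, the `edge` lines can be compared as sets. This is a small sketch; the `edges` helper is hypothetical, and only the `edge` lines from the two outputs above are reproduced here.

```python
import re

EDGE_REC = re.compile(r'^edge "([^"]+)" "([^"]+)"$')


def edges(reference_output):
    """Return the set of (upstream, downstream) pairs in a reference graph."""
    found = set()
    for line in reference_output.splitlines():
        match = EDGE_REC.match(line.strip())
        if match:
            found.add(match.groups())
    return found


master = '''
edge "baz.1" "foo_dog.1"
edge "foo_cat.1" "foo_dog.1"
'''
branch = '''
edge "baz.1" "foo_cat.1"
edge "baz.1" "foo_dog.1"
edge "foo_cat.1" "foo_dog.1"
'''

added = edges(branch) - edges(master)
removed = edges(master) - edges(branch)
print('added:', added)
print('removed:', removed)
```

The only difference is the new `baz.1` to `foo_cat.1` edge; nothing is removed.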

From what I understood, the SuiteConfig will call the GraphParser to parse the graph. The names & parameters expansion is done in param_expand.py, where I changed this regular expression.

But after the GraphParser calls the methods in param_expand.py's GraphExpander object, it will do a further "dependency chaining". I believe this is the part that you explained to me today, where the

          baz => foo_cat
foo_cat & baz => foo_dog

Gets simplified into

foo_cat => foo_dog
baz => foo_cat & foo_dog

While it doesn't print exactly that, I think after this change the user would get the same graph & dependencies? i.e., baz having no dependencies, foo_cat having only baz as dependency, and foo_dog having baz and foo_cat as dependencies.
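
One way to check that the two forms describe the same dependencies is to break each line into pairwise triggers and compare the sets. This sketch is not Cylc's graph parser; the `pairwise` helper is hypothetical and only understands `=>` and `&`.

```python
def pairwise(lines):
    """Split graph lines into a set of (upstream, downstream) pairs."""
    pairs = set()
    for line in lines:
        # Each "=>"-separated part is a group of tasks joined by "&".
        groups = [
            [task.strip() for task in part.split('&')]
            for part in line.split('=>')
        ]
        for upstream, downstream in zip(groups, groups[1:]):
            pairs.update((u, d) for u in upstream for d in downstream)
    return pairs


expanded = ['baz => foo_cat', 'foo_cat & baz => foo_dog']
rewritten = ['foo_cat => foo_dog', 'baz => foo_cat & foo_dog']

print(pairwise(expanded) == pairwise(rewritten))
```

Both forms reduce to the same three pairs: baz triggers foo_cat, baz triggers foo_dog, and foo_cat triggers foo_dog.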

Added some unit tests to demonstrate the above. Tests (unit & functional) may fail, but I will deal with that later if you confirm this PR is going in the right direction 👍

Thanks!
Bruno

@hjoliver
Member

hjoliver commented Dec 1, 2019

@kinow - thanks for the fix! From a quick look, you're right; but a proper review later.

Gets simplified into

I wouldn't say either one of those is simpler, they're just equivalent. And the "simplification" may have been done in my head, I wouldn't necessarily expect the graph parser to do that transformation.

@kinow
Member Author

kinow commented Dec 1, 2019

Gets simplified into

I wouldn't say either one of those is simpler, they're just equivalent. And the "simplification" may have been done in my head, I wouldn't necessarily expect the graph parser to do that transformation.

Ah! That makes sense! I thought for other graph expressions Cylc would have some sort of optimization/rewriting/etc happening somewhere for that sort of modification 😄

@kinow kinow marked this pull request as ready for review April 14, 2020 04:30
@@ -261,7 +261,7 @@ class GraphExpander(object):

 _REMOVE = -32768
 _REMOVE_REC = re.compile(
-r'(?:^|\s*=>).*' + str(_REMOVE) + r'.*?(?:$|=>\s*?)')
+r'(?:^|\s*=>).*' + str(_REMOVE) + r'.*?(?:$|=>\s*?|&\s*)')
Member Author

Simpler fix. Just stop the greedy regex once it finds a) end of stream $, b) an arrow followed by spaces =>\s*, or c) ampersand symbol followed by spaces &\s*.

The last c) item covers the bug reported in the issue linked.

Member

Would these solutions handle the other forms of the problem e.g:

  • baz & foo<m-1> => foo<m>
  • baz & foo<m-1> & pub => foo<m>
  • bar & foo<m-1> & pub<m-1> & qux => foo<m>

Member Author

@kinow kinow May 26, 2020

@oliver-sanders not really good at parsing the graph expressions, but here's the output of the examples you provided.

With

params_map = {'m': ["cat", "dog"]}
templates = {'m': '_%(m)s'}
  • baz & foo<m-1> => foo<m>
    • is expanded to {'baz & foo_cat => foo_dog', ' foo_cat'}
  • baz & foo<m-1> & pub => foo<m>
    • is expanded to {'baz & foo_cat & pub => foo_dog', 'pub => foo_cat'}
  • bar & foo<m-1> & pub<m-1> & qux => foo<m>
    • is expanded to {'bar & foo_cat & pub_cat & qux => foo_dog', 'qux => foo_cat'}

These look OK to me, but can't really say I'm sure these are really correct (@hjoliver do these look correct?)

WDYT?
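
The m=cat halves of the expansions listed above can be reproduced with the pattern from the diff in this review thread. This is a sketch under two assumptions: that the out-of-range `<m-1>` has already been replaced by the sentinel using the `_%(m)s` template, and that the pattern is applied with `sub('', ...)`. The variable names are illustrative.

```python
import re

REMOVE = -32768
# The pattern from the diff above: stop at end of string, "=>", or "&".
REMOVE_REC = re.compile(
    r'(?:^|\s*=>).*' + str(REMOVE) + r'.*?(?:$|=>\s*?|&\s*)')

# The three graph lines for m=cat, with <m-1> replaced by the sentinel.
lines = [
    'baz & foo_-32768 => foo_cat',
    'baz & foo_-32768 & pub => foo_cat',
    'bar & foo_-32768 & pub_-32768 & qux => foo_cat',
]

results = [REMOVE_REC.sub('', line) for line in lines]
for result in results:
    print(repr(result))
```

Because the leading `.*` is greedy, everything to the left of the last sentinel is deleted as well, which is why `baz` and `bar` disappear from the m=cat lines.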

Member

The expansion of the parameterised items seems correct to me but there is a bit of a discrepancy with the non-parameterised items.

E.g. in example (2) we get pub => foo_cat but not baz => foo_cat. Example (3) behaves similarly (qux => foo_cat but not bar => foo_cat), whereas example (1) gives just foo_cat.

I'm not entirely sure how we should expand these things, here are my guesses:

  • baz & foo<m-1> => foo<m>
    • baz & foo_cat => foo_dog
    • [foo_cat]
    • (baz => foo_cat)
  • baz & foo<m-1> & pub => foo<m>
    • baz & foo_cat & pub => foo_dog
    • [foo_cat]
    • (baz => foo_cat)
    • (pub => foo_cat)
  • bar & foo<m-1> & pub<m-1> & qux => foo<m>
    • bar & foo_cat & pub_cat & qux => foo_dog
    • [foo_cat]
    • (bar => foo_cat)
    • (qux => foo_cat)

Where:

  • compulsory => dependency - needed for graphing
  • [optional => line] - not needed as task is referenced elsewhere but not harmful
  • (pre-initial-type => dependency) - not sure if we should have these or not, but if so they should be consistent

@kinow
Member Author

kinow commented Apr 14, 2020

Travis happy! 🎉 Will push one more commit, with change log. It may fail, but it would be due to some flaky test I guess.

image

@kinow kinow requested a review from hjoliver April 14, 2020 04:59
@hjoliver hjoliver modified the milestones: cylc-8.0a2, cylc-8.0a3 Apr 30, 2020
@kinow
Member Author

kinow commented May 26, 2020

(rebased)

@kinow
Member Author

kinow commented May 26, 2020

CI test failure not related to this change I think

 =========================== short test summary info ============================
FAILED cylc/flow/tests/network/test_client.py::TestSuiteRuntimeClient::test_serial_request
============ 1 failed, 593 passed, 1 skipped, 24 warnings in 22.25s ============

@kinow
Member Author

kinow commented Jul 30, 2020

Rebased

@kinow
Member Author

kinow commented Aug 3, 2020

Rebased, and moved the change log entry from 8.0a2 to 8.0a3.

@kinow
Member Author

kinow commented Aug 3, 2020

Rebased, and moved the change log entry from 8.0a2 to 8.0a3.

And this time the functional tests failed (passed before rebasing). Maybe new tests caught something I had missed before 🤓

@kinow
Member Author

kinow commented Aug 4, 2020

Rebased, and moved the change log entry from 8.0a2 to 8.0a3.

And this time the functional tests failed (passed before rebasing). Maybe new tests caught something I had missed before 🤓

Kicked GH actions and now the builds passed 👀

@hjoliver hjoliver requested a review from wxtim August 7, 2020 11:20
@hjoliver
Member

hjoliver commented Aug 7, 2020

Just came back to look at this, and it doesn't seem to be working correctly (although I haven't tried to understand why). To pick one example:

# m = cat, dog
baz & foo<m-1> & pub => foo<m> 

This should expand as follows:

baz & foo_cat & pub => foo_dog  # m = dog
baz & pub => foo_cat  # m = cat

In terms of individual dependencies this is:

# m = dog:
baz => foo_dog
foo_cat => foo_dog
pub => foo_dog

# m = cat
baz => foo_cat
pub => foo_cat

If I run cylc graph on that graph on this branch, I get all of the above except for this one:

baz => foo_cat  # MISSING!

Member

@hjoliver hjoliver left a comment

(Not working correctly - see prev comment)

@hjoliver
Member

hjoliver commented Aug 7, 2020

I'm actually not sure if the kind of examples above were considered when parameters were originally implemented. Do they make sense? Maybe we should omit the whole line if one or more parameters go beyond their bounds?

# m = cat, dog
foo<m-1> & bar => baz<m>

This works for m=dog where we get foo_cat & bar => baz_dog.

But for m=cat we could choose either:

  • ignore the graph line, because foo<m-1> is undefined
  • or ignore foo<m-1> (undefined) but keep bar => baz_cat

Maybe the former choice is more sensible?

I suspect we originally only considered examples like the following, where the two choices give exactly the same result:

# m = 1, 2, 3
foo<m-2> => foo<m-1> => foo<m>

If we just ignore individual tasks that are undefined, this gives:

foo_m1  # m = 1
foo_m1 => foo_m2  # m = 2
foo_m1 => foo_m2 => foo_m3  # m = 3

Whereas if we only accept lines where all tasks are defined, we just get:

foo_m1 => foo_m2 => foo_m3  # m = 3  (m =1, 2 are invalid)

... which is the same result, because the m = 1, 2 lines in the first case are redundant with the m = 3 line.
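
The claim that both choices give the same graph for this chain can be checked with a short sketch. It is not Cylc's implementation: `expand`, `graph` and `PARAM_REC` are hypothetical helpers, only a single parameter named `m` and `=>` chains are handled, and the parameter values are passed as the labels that appear in task names (`m1`, `m2`, `m3`).

```python
import re

PARAM_REC = re.compile(r'^(\w+)<m(?:\s*([+-])\s*(\d+))?>$')


def expand(chain, labels, drop_whole_line):
    """Expand a '=>' chain over parameter m.

    drop_whole_line=True skips any line containing an out-of-bounds task;
    drop_whole_line=False removes only the out-of-bounds tasks themselves.
    """
    lines = []
    for index in range(len(labels)):
        tasks = []
        for token in (t.strip() for t in chain.split('=>')):
            match = PARAM_REC.match(token)
            if match is None:
                tasks.append(token)  # non-parameterized task
                continue
            name, sign, amount = match.groups()
            target = index + (int(sign + amount) if sign else 0)
            in_bounds = 0 <= target < len(labels)
            tasks.append(f'{name}_{labels[target]}' if in_bounds else None)
        if drop_whole_line and None in tasks:
            continue
        lines.append([t for t in tasks if t is not None])
    return lines


def graph(lines):
    """Return (nodes, edges) for a list of expanded chains."""
    nodes = {task for line in lines for task in line}
    edges = {pair for line in lines for pair in zip(line, line[1:])}
    return nodes, edges


chain = 'foo<m-2> => foo<m-1> => foo<m>'
labels = ['m1', 'm2', 'm3']
per_task = expand(chain, labels, drop_whole_line=False)
per_line = expand(chain, labels, drop_whole_line=True)

print(per_task)
print(per_line)
print(graph(per_task) == graph(per_line))
```

The per-task choice yields three lines and the per-line choice yields one, but the resulting nodes and edges are identical for this particular chain.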

@hjoliver
Member

hjoliver commented Aug 7, 2020

If doing parameterized tasks the old way, with Jinja2 loops, going beyond the list bounds is an error, so you can't loop over the whole range if there's an "index offset". Then you can treat the edge case separately if you need it.

# m = 0,1,2
for m in 1, 2:
    x => foo_{m-1} => bar_{m}
y => bar_{m=0}  # m = 0 edge case if needed

The equivalent for parameterized tasks is:

x => foo<m-1> => bar<m>  # only valid for m = 1, 2, not m = 0
y => bar<m=0>  # m=0 edge case if needed
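
The same idea in plain Python, as a sketch only: loop over just the values for which the offset is in range, then add the edge case by hand. The variable names are illustrative.

```python
values = [0, 1, 2]

# Loop only over the values for which the offset m-1 is in range.
lines = [f'x => foo_{m - 1} => bar_{m}' for m in values[1:]]

# Handle the m = 0 edge case explicitly, if it is needed at all.
lines.append(f'y => bar_{values[0]}')

for line in lines:
    print(line)
```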

However this still means this branch isn't working right.

# m = cat, dog
baz & foo<m-1> & pub => foo<m> 

this should generate only:

baz & foo_cat & pub => foo_dog

(and it should not generate baz => foo_cat or pub => foo_cat)

@hjoliver
Member

hjoliver commented Aug 7, 2020

In terms of implementation, at least it should be easier to ignore whole lines that contain invalid parameterized tasks than to cut out the individual invalid tasks.

This also gets rid of the following problem:

# m = cat, dog
foo => bar<m-1> => baz

if m=cat then bar<m-1> is undefined, so is it valid to say foo => baz? NO, obviously not, if you think about the task IO files that these dependencies represent.

@hjoliver
Member

hjoliver commented Aug 7, 2020

(So I seem to have just proven that my own original advice on this problem was wrong, sorry @kinow 😬 )

@oliver-sanders
Member

oliver-sanders commented Aug 7, 2020

Maybe the former choice is more sensible?

The issue with this is that you get inconsistent behaviour depending on the number of tasks in the parameterisation e.g:

# m = cat, dog
foo<m-1> & bar => baz<m>

Would give you:

foo_cat & bar => baz_dog

However:

# m = cat
foo<m-1> & bar => baz<m>

Would give you nothing, omitting bar and baz from the workflow entirely.

or ignore foo<m-1> (undefined) but keep bar => baz_cat

This is somewhat closer to Cylc's pre-initial logic e.g:

# cycle point=1
# initial cycle point = 1
foo[-P1] & bar => baz

Would give you:

bar => baz

Member

@hjoliver hjoliver left a comment

The new logic doesn't quite work sorry! (although it should be much easier to modify, to make it work, I hope).

This:

foo => bar => baz

is equivalent to this:

foo => bar
bar => baz

so they must yield the same result after parsing.

In the parameter case though:

# m = cat
foo => bar<m-1> => baz

the split (two line) version gives a different result (via cylc graph) because your logic treats the first token in the chain as special.

@kinow
Member Author

kinow commented Dec 14, 2020

So close!!! 😄

@hjoliver
Member

I think this shows we can't treat out-of-bounds parameters like pre-initial cycle dependence, because out-of-bounds parameters can appear on the right side of a trigger arrow.

So:

# m = cat
foo => bar<m-1> => baz

This means:

foo => ERR => baz

This should yield only foo. ✔️

# m = cat
foo => ERR
ERR => baz

This should also yield only foo BUT currently it yields foo and baz 😨

IMO what I'm arguing makes sense in spite of @oliver-sanders pre-initial-logic based argument above because:

    1. consistency between the single and multi-line versions of the same expression
    2. pre-initial dependence is different because it can only ever be on the left side of a chain
    3. ERR => baz says "baz triggers off of an illegal/non-existent task" - clearly the right answer to that is "baz should not trigger" (it is NOT "baz should trigger immediately as if it had no prerequisites")
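
The consistency argument can be sketched by comparing two candidate rules on the chained and split versions of the same expression. This is a hypothetical model, not the code in this branch: graph lines are given as already-tokenized lists, `ERR` marks an out-of-bounds task, and `skip_leading_invalid` is only a guess at what "treating the first token as special" amounts to.

```python
ERR = None  # stands for an out-of-bounds parameterized task


def stop_at_invalid(line):
    """Keep tasks up to (not including) the first out-of-bounds one."""
    kept = []
    for task in line:
        if task is ERR:
            break
        kept.append(task)
    return kept


def skip_leading_invalid(line):
    """Pre-initial-style rule: ignore a leading out-of-bounds task,
    then stop at the next out-of-bounds task as usual."""
    if line and line[0] is ERR:
        line = line[1:]
    return stop_at_invalid(line)


def nodes(lines, rule):
    """Return the set of tasks that survive when a rule is applied."""
    return {task for line in lines for task in rule(line)}


chained = [['foo', ERR, 'baz']]        # foo => ERR => baz
split = [['foo', ERR], [ERR, 'baz']]   # foo => ERR, then ERR => baz

print(nodes(chained, stop_at_invalid), nodes(split, stop_at_invalid))
print(nodes(chained, skip_leading_invalid), nodes(split, skip_leading_invalid))
```

Stopping at the first out-of-bounds task gives only foo for both versions, whereas the rule that ignores a leading out-of-bounds task gives foo for the chain but foo and baz for the split form.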

@hjoliver
Member

hjoliver commented Dec 14, 2020

More thoughts on the pre-initial analogy:

# initial cycle point = 1
P1 = foo[-P1] => bar

Here foo.0 is technically an illegal/non-existent task BUT we ignore foo.0 to get just bar.1 as a convenience to allow users to bootstrap into an infinite cycle without having to handle the initial cycle properly. i.e. we are assuming the user really means this:

R1 = bar
R/2/P1 = foo[-P1] => bar

In the parameter case, the same logic arguably makes sense here

# m = 1,2,3,...
foo<m-1> => bar<m>

but does it make sense here?

# m = 1,2,3 ...
foo<m-1> => bar

(This is another difference between parameters and pre-initial tasks: not all tasks are parameterized, but all tasks do have a cycle point "parameter").

@hjoliver
Member

hjoliver commented Dec 14, 2020

The good news is, maybe it only matters for the case of a single out-of-bounds parameter. This should expand to nothing, not bar:

# m = cat
foo<m-1> => bar

but with m = cat, dog it could expand to bar (m=cat) as well as foo_cat => bar (m=dog) because the total result is the same (bar depends on foo_cat).

@hjoliver
Member

Upshot: this issue is a nightmare 😬 However, it seems to me that we have to go with what I've suggested above because of the fact that parameterized tasks can appear on the right side of a trigger, and we need consistency between chained and list-of-pairs graph strings. Would you agree @oliver-sanders ? (before @kinow bashes his head on this wall any further).

@oliver-sanders
Member

Oh dammit, this one is the gift that just keeps giving.

I'm going to have to go and work through some examples to suss out the implications.

@hjoliver
Member

hjoliver commented Dec 14, 2020

@kinow - I meant to make a PR to your branch, but I accidentally pushed the commit directly to it. Doh! (not a disaster, as it can be backed out if necessary, of course). Anyhow, it simplifies your new logic to make it do what I think is correct, which is: in a chain of dependencies, stop at an out-of-bounds task (whether it is the first one in the chain or not).

# m = cat
foo => bar => baz<m-1>   # becomes foo => bar
foo => bar<m-1> => baz   # becomes foo
foo<m-1> => bar => baz   # becomes nothing
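
The rule and the three cases above can be sketched as follows. This is an illustration under stated assumptions, not the commit that was pushed: `expand_chain` and `PARAM_REC` are hypothetical, only a single parameter named `m` and plain `=>` chains are handled, and offsets are applied to the position in the value list.

```python
import re

PARAM_REC = re.compile(r'^(\w+)<m(?:\s*([+-])\s*(\d+))?>$')


def expand_chain(chain, values):
    """Expand a '=>' chain for each value of m, cutting each expanded
    line at the first out-of-bounds task."""
    lines = []
    for index in range(len(values)):
        kept = []
        for token in (t.strip() for t in chain.split('=>')):
            match = PARAM_REC.match(token)
            if match is None:
                kept.append(token)  # non-parameterized task
                continue
            name, sign, amount = match.groups()
            target = index + (int(sign + amount) if sign else 0)
            if not 0 <= target < len(values):
                break  # out of bounds: drop this task and all that follow
            kept.append(f'{name}_{values[target]}')
        if kept:
            lines.append(' => '.join(kept))
    return lines


for chain in (
    'foo => bar => baz<m-1>',
    'foo => bar<m-1> => baz',
    'foo<m-1> => bar => baz',
):
    print(chain, '->', expand_chain(chain, ['cat']))
```

With m = cat the three chains reduce to `foo => bar`, `foo`, and nothing at all, matching the comments in the example.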

If it passes all the tests, we'll just wait to see if @oliver-sanders can convince himself that this is the correct way to go.

Note:

  • this logic is easy to explain: an out-of-bounds task is illegal, so it can't trigger other tasks downstream of it
  • if users really want an out-of-bounds task to be magically ignored at the start of the chain, they can still do it manually:
# m = cat
R1 = """
foo<m-1> => bar => baz  # becomes nothing
bar => baz  # becomes "bar => baz"
"""

@kinow
Member Author

kinow commented Dec 15, 2020

Feel free to update this branch/PR as necessary :) Thanks @hjoliver !

@hjoliver
Member

Tests all pass 🎉

@oliver-sanders
Member

oliver-sanders commented Dec 15, 2020

## the parameters (prefixed by ##)
non =
one = 1
two = 1, 2

# the results (prefixed by #)

## an empty parameter yields no tasks
<non>

## a non-empty parameter does yield tasks
# 1
<one>

## a dependency on an empty parameter under SoD is like a dependency on a task
## that is not in the graph, <one> will never run so we might as well ignore it here???
<non> => <one>

## ok but what about the other way around if:
##     a => b
## is equivalent to
##     a
##     a => b
## then the same should be true for parameters?
##    <one>
##    <one> => <non>
## (I'm going to go with yes):
# 1
<one> => <non>

## ok, now for offsets...
## a parameter expands simply
# 1 => a
# 2 => a
<two> => a

## but how does a parameter offset like this expand?
## Is it equivalent to:
##     <two>
##     <two - 1> => a
## or:
##     <two - 1>
##     <two - 1> => a
## (I'm going to go with <two - 1> == [1])
# 1 => a
<two - 1> => a

## if you stick another task at the beginning of the chain, the offset
## should still behave the same, but should the head of the chain expand
##    <two>
##    <two> => <two -1> => a
# 1 # ???
# 2 => 1 => a
<two> => <two - 1> => a

## If so
# 1
<one> => <one - 1> => a

Resulting rules:

  1. Ignore everything in the chain after the first out-of-bounds parameter?
    (i.e. <one> => <non> => <one> results in <one>)
  2. The first item in the chain is always added irrespective of what follows?
    (i.e. <one> => <one - 1> results in <one> && <one> => <one - 1>)

@hjoliver
Member

hjoliver commented Dec 15, 2020

@oliver-sanders - in your example above, does "non" mean:

    1. an empty parameter list (m = # nothing with foo<m>)
      • this fails validation
    2. or a non-parameterized task (foo)
      • this doesn't fit with your statement that "## an empty parameter yields no tasks"
    3. or an out-of-bounds parameter?
      • this doesn't seem to fit either, since you're addressing that with the "one" and "two" parameters

It seems to me you mean (1); however that fails validation so we don't need to worry about it (which I think is sensible).

So we only need to address:

  • non-parameterized tasks
  • parameterized tasks with valid parameter values
  • parameterized tasks with invalid or out-of-bounds parameter values

Maybe I didn't clearly state my rule above in one place:

  • remove tasks with invalid parameter values, and all tasks downstream of them

I think that corresponds to your rule 1

Ignore everything in the chain after the first out-of-bounds parameter?

but I'm not sure about your rule 2 ??

The first item in the chain is always added irrespective of what follows?

I would say don't add the first item either if it has an invalid parameter value.

E.g. for

# m = 1,2
foo<m-1> => bar<m> => baz

This becomes:

foo_1 => bar_2 => baz   # m = 2
# (nothing)             # m = 1

When m=1 we don't retain bar_1 => baz because the graph says bar_1 is supposed to trigger off of an illegal task. As I said in previous comments, if we did try to allow that (which is somewhat analogous with our handling of pre-initial dependence) it seems problematic when the out-of-bounds parameter is not at the front of the chain:

# m = 1
foo => bar<m-1> => baz

My rule says this should be interpreted as foo. Surely foo => baz is not reasonable ??? Maybe interpreting it as foo & baz would be acceptable, but a convincing use case would be nice because it seems less reasonable to me than saying baz should not trigger when <m-1> is invalid.

@oliver-sanders
Member

@oliver-sanders - in your example above, does "non" mean:

An empty parameter list (m = # nothing with foo<m>)

The first item in the chain is always added irrespective of what follows?

What I'm asking is should this:

# m = 1,2
foo<m> => bar<m-1> => baz

Result in this:

foo_1 => bar_2 => baz   # m = 1
foo_2                   # m = 2

Or this:

foo_1 => bar_2 => baz   # m = 1
# (nothing)             # m = 2

Based on the logic that this:

foo => bar

Is equivalent to this:

foo
foo => bar

I'm inclined to go with the first example.

@hjoliver
Member

Ah, I think your example got munged.

Presumably what you mean is, should this:

# m = 1,2
foo<m> => bar<m-1> => baz

Result in this:

# Option 1
foo_2 => bar_1 => baz   # m = 2
foo_1                   # m = 1

Or this:

# Option 2
foo_2 => bar_1 => baz   # m = 2
# (nothing)             # m = 1

If that's what you meant, then we're in agreement that Option 1 is correct. And that's what the current implementation does:

$ cylc graph --reference flow.cylc

edge "bar_m1.1" "baz.1"       # <----
edge "foo_m2.1" "bar_m1.1"    # <----
graph
node "bar_m1.1" "bar_m1\n1"
node "baz.1" "baz\n1"
node "foo_m1.1" "foo_m1\n1"   # <----
node "foo_m2.1" "foo_m2\n1"
stop

@hjoliver
Member

@oliver-sanders - requesting re-approval from you (it looks like you are happy with it, based on previous discussion, but just in case).

@oliver-sanders
Member

Yep looks like we are in agreement, good, bye bye to one of our oldest PRs, sorry @kinow!

@oliver-sanders
Member

Quick, let's just get this in before we change our minds again!

@oliver-sanders oliver-sanders merged commit 124dd13 into cylc:master Dec 17, 2020
@kinow kinow deleted the fix-param-expand-try-01 branch December 17, 2020 14:14
@hjoliver hjoliver modified the milestones: cylc-8.0a3, cylc-8.0b0 Feb 25, 2021
Development

Successfully merging this pull request may close these issues.

parameter graph incorrect on mixing offset and conditional
3 participants