Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New conditional api #2443

Merged
merged 9 commits into from May 5, 2020
Merged

New conditional api #2443

merged 9 commits into from May 5, 2020

Conversation

jcrist
Copy link

@jcrist jcrist commented Apr 28, 2020

This adds two new control flow constructs:

  • case: for creating conditional branches in a flow
  • Variable: for merging conditional branches back together

These do similar things to switch and merge, but should hopefully be
easier to read.

Still needs tests and docs, just getting a WIP up for comments.

Fixes #2381

  • adds new tests (if appropriate)
  • updates CHANGELOG.md (if appropriate)
  • updates docstrings for any new functions or function arguments, including docs/outline.toml for API reference docs (if appropriate)

This adds two new control flow constructs:

- `case`: for creating conditional branches in a flow
- `Variable`: for merging conditional branches back together

These do similar things to `switch` and `merge`, but should hopefully be
easier to read.

Still needs tests and docs, just getting a WIP up for comments.
@jcrist
Copy link
Author

jcrist commented Apr 28, 2020

And an example:

from prefect.tasks.control_flow.case import case, Variable

from prefect import task, Flow

@task
def some_condition():
    return True

@task
def other_condition():
    return True

@task
def do_a():
    print("Doing a")
    return 1

@task
def do_b():
    print("Doing b")
    return 2

@task
def do_c(a, b):
    print("Doing c")
    return a + b

@task
def do_d():
    print("Doing d")
    return 1

@task
def do_e(d):
    print("Doing e")
    return d + 1

def example_3():
    """Same example logic without prefect"""
    if some_condition():
        if other_condition():
            a = do_a()
        else:
            a = do_c(1, 2)
        b = do_b()
        do_c(a, b)
    else:
        d = do_d()
        do_e(d)


with Flow("example-3") as flow:
    cond = some_condition()

    with case(cond, True):

        cond2 = other_condition()

        a = Variable(name="a")

        with case(cond2, True):
            a.set(do_a())

        with case(cond2, False):
            a.set(do_c(1, 2))

        b = do_b()
        do_c(a, b)

    with case(cond, False):
        d = do_d()
        do_e(d)

Results in the following graph:

test

@jcrist
Copy link
Author

jcrist commented Apr 28, 2020

See #2381 for the description of the semantics.

A few naming questions:

  • The user should never interact with a case object directly, but case is a class for ease of implementation (it doesn't need to be). We could pull the class into an internal type and make case a @contextmanager wrapped function just as easily.
  • Is Variable the right name? Other options with the same connotation: Var, var, variable.
  • Should Variable's require a name? Should they be named Variable("{name"} instead of just "{name}"?
  • Is Variable.set the right name for this method?

And a few semantic question:

  • Currently Variable returns the last non-None result it gets. This breaks down if the upstream task meant to return None. This is also the opposite of merge, which returns the first. I think using the last makes more sense, since it will use the last successful write of a variable.
  • Mixing Variable.set calls and uses can be a bit confusing. For example:
@task
def inc(x):
    return x + 1

v = Variable(name="a")
a.set(1)
b = inc(a)
a.set(2)

In this case inc gets the value 2 instead of 1, even though a was set to 2 after the call to inc. I see a few options here:

  • Leave this footgun, but document it.
  • Make it an error to call Variable.set twice within the same case block.
  • With a small bit of special casing instead Task we can transform the above to do the right thing automatically (on each use of a Variable a copy is made before adding it to the flow, so future .set calls don't mutate the previous call).
  • Drop Variable, rely on merge as before. I can't decide if I find this easier or harder to read.
with Flow("example-3") as flow:
    cond = some_condition()

    with case(cond, True):

        cond2 = other_condition()

        with case(cond2, True):
            a1 = do_a()

        with case(cond2, False):
            a2 = do_c(1, 2)

        a = merge(a1, a2)

        b = do_b()
        do_c(a, b)

    with case(cond, False):
        d = do_d()
        do_e(d)

@jcrist jcrist mentioned this pull request Apr 28, 2020
@jcrist
Copy link
Author

jcrist commented Apr 28, 2020

Mixing Variable.set calls and uses can be a bit confusing. For example

Thinking about this a bit more, I think I'm leaning slightly towards dropping Variable entirely in favor of merge. In the simple idiomatic cases we can simulate python variable assignment, but I'm not convinced it's more readable and worth the bother.

@jlowin
Copy link
Member

jlowin commented Apr 30, 2020

@jcrist i really love this syntax - I think it removes significant complexity, especially for users who just want to gate a small number of conditional tasks.

My opinion is to drop the Variable entirely; on the one hand I’m super impressed you solved the problem of working with a deferred multi-assigned variable, but I worry the nuance of applying variables correctly will be overshadowed by the additional burden of explaining them, why they differ from parameters, etc., especially because they add a lot of machinery to replace a relatively small merge op.

Therefore I think a great example in our documentation of how to merge two conditional branches will provide a good solution with lower maintenance overhead.

@jcrist
Copy link
Author

jcrist commented May 4, 2020

Ok, I've updated to remove Variable, and added some tests and docs. A few api decisions left:

  • Currently case is a class, but syntactically I like the lowercase case better than the uppercase Case. People may be confused by this, idk.

  • Currently there is no return value from __enter__, meaning that in:

    with case(task, value) as temp:
        pass

    temp is None. I think this makes sense, I don't expect users to ever need to explicitly reference a case object. A few alternatives:

    • Return the case object
    • Return the conditional check task (the CompareValue) task that's created by the case call
  • Currently we reuse the CompareValue tasks used by ifelse and switch. Alternatively we could create a new Case class (basically identical to CompareValue, except in name), which (IMO) may make the task structure clearer (or we could rename CompareValue to Case - I find the name Case a bit clearer for what it's doing). At the same time, it may be confusing to users to have both case (the context manager) and Case (the conditional task).

@codecov
Copy link

codecov bot commented May 4, 2020

Codecov Report

Merging #2443 into master will decrease coverage by 2.64%.
The diff coverage is 96.87%.

@jcrist jcrist changed the title [WIP] New conditional api New conditional api May 5, 2020
@joshmeek
Copy link

joshmeek commented May 5, 2020

@jcrist here's my 2¢ on your Qs:

  • Could you do something where the class is Case and then there's an importable case which in the original file is something like case = Case. Not sure if that's bad practice or not
  • I can't think of a time where case or CompareValue would need to be referenced again BUT I'm not opposed to having __enter__ return the original case
  • Personally I'm used to CompareValue so it makes sense to keep it. Having two cases for the context manager and conditional task is a bit confusing

@cicdw
Copy link
Member

cicdw commented May 5, 2020

Awesome this is great; to your points:

  • I think keeping case a lowercased class is fine
  • I don't have a strong opinion on the behavior of case.__enter__ but if you wanted to return something other than None my vote would be for the CompareValue task created under the hood, because otherwise that task is harder to find and access
  • My bias is to keep CompareValue and not introduce new control flow tasks / renames

@jcrist
Copy link
Author

jcrist commented May 5, 2020

Ok, given the above comments I, I think I'm going to keep things as they are. I've added a changelog entry, I believe this is now ready for merge.

@jcrist jcrist merged commit 55fd1ea into PrefectHQ:master May 5, 2020
@jcrist jcrist deleted the new-conditional-api branch May 5, 2020 19:32
@cicdw cicdw mentioned this pull request May 8, 2020
3 tasks
abrookins pushed a commit that referenced this pull request Jul 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

New conditional api
4 participants