New conditional api #2381
Comments
Should this be more formalized as a PIN? Doesn't have to be very long and this seems like really cool functionality 😄
Maybe? I'd be happy to write one up if that's the route we should go.
@jcrist just for posterity from talking in core release planning, the temperature of the group was to carry on with implementation without a dedicated PIN given that the new API is additive and does not change the existing API. If implementation does tread into breaking change territory for
I put a quick initial draft up in #2443.
Background
The current conditional api (`switch`, `ifelse`, and `merge`) works, but has a few problems. Using `ifelse` as an example:

If `cond` evaluates to `True`, `true_task` is run and `false_task` is skipped. However, tasks upstream of `false_task` aren't skipped. For example, if `some_condition` evaluates to `False`, only the `load` task is skipped; `transform` and `extract` still execute.

If the result of a task is conditional based on upstream tasks, you'll need to call `merge` explicitly. We tried making `ifelse`/`switch` return the merge implicitly, but had to revert due to issues with implicitly creating terminal tasks in the graph (Control flow tasks return executed branch #2310, Revert "Control flow tasks return executed branch" #2379). Calling `merge` explicitly works, but can be tricky to read.
Proposal
I propose we add a contextmanager api for expressing conditional flows. The existing `switch`/`ifelse`/`merge` api will still exist (and may be used internally by the contextmanager api).

The api consists of two new functions:

- `case`: used to express branches in logic
- `var`: used to express merging outputs (this could equally be a class `Variable`)

A few examples to illustrate use:
Example 1
Here we use `case` to express an `ifelse` branch:

If `some_condition` is `True`, `do_a`, `do_b`, and `do_c` are run. If it's false, `do_c` is run with some constant inputs. For each `case` block, a `Condition` task is created. If a task is created inside the `case` block and has no upstream dependencies that were also created inside that `case` block, then the condition is set as an upstream dependency. In this case, `a` and `b` would have a `Condition` for `True` set as an upstream dependency, but `do_c` would only depend on `a` and `b`. We could equally have all tasks created inside the block have the condition as an immediate upstream task, but that makes the graph visually complicated.

Example 2
Here we illustrate merging tasks with `var`. Let's say we wanted to capture the output of `do_c` and use it later on. In normal Python code we might write this as:

Using this proposal, an equivalent flow would be:
Each call to `c.set` binds the value of `c` inside its corresponding `case` block. Behind the scenes this builds up a merge task branch-by-branch (note: we may not use merge directly, but the general structure will be the same). Since `c` is a `Task` itself, it can then be fed to downstream tasks.

Example 3
These context managers should work fine even with nested conditional logic. For example, the following function:
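For illustration, a function with this nested shape might look like the sketch below. The helper names and return values are placeholders chosen here, not taken from the original post.

```python
# Hypothetical helpers; the original post does not define them.
def do_a():
    return "a"


def do_b(a):
    return a + "b"


def do_c(a):
    return a + "c"


def nested(cond1, cond2):
    """Plain-Python nested branching; returns None when cond1 is false."""
    result = None
    if cond1:
        a = do_a()
        if cond2:
            result = do_b(a)
        else:
            result = do_c(a)
    return result
```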
corresponds to an equivalent flow of:
Note that in this case, the `Condition` task created by calling `case` on `cond2` will have the `Condition` created for `cond1` as a direct upstream task (following the logic described above, since it has no upstream tasks created inside this `case` context, the corresponding condition is set as a direct upstream task).

I think implementing the api as described above is fairly doable. It might require #2298 though to properly handle complex branching/merging logic (not sure yet).