-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
swf.executor: improve the generation of a task's id #11
Comments
Here's another illustration for this problem, extracted from real code that was misbehaving on SWF. Let's assume we run two tasks for two sub-datasets (called "blocks" below), each second one depending on the completion of the first one for the same block. If the first task takes longer for block 0, then we'll submit second task twice for block 1, and never for block 0, with the following code: import time
from simpleflow import activity, Workflow, futures
@activity.with_attributes(task_list="test", version="1.0")
def first_activity(block):
print "start first_activity for block {}".format(block)
if block == 0:
time.sleep(5)
else:
time.sleep(1)
print "finish first_activity for block {}".format(block)
@activity.with_attributes(task_list="test", version="1.0")
def second_activity(block):
print "triggered second_activity for block {}".format(block)
class ChangingWorkflow(Workflow):
name = "changing"
version = "1.0"
task_list = "test"
def run(self):
_all = []
for block in [0, 1]:
first = self.submit(first_activity, block)
_all.append(first)
if first.finished:
second = self.submit(second_activity, block)
_all.append(second)
futures.wait(*_all) The resulting log in a single activity worker speaks for itself:
If we can assume that tasks are idempotent, sure that simplifies the problem. I'm not sure about variations of the arguments though (we can imagine simple scenarios where the arguments themselves would vary in the workflow replay, and we probably don't want to spin up a new activity each time). A more generic solution would consist in giving the ability to control the task id generation from the What I don't like here overall is that it's a trap for users, it's very easy to assume that everything will work well regarding the code above, and if you don't understand this basic simpleflow assumption you'll fail. Back to the drawing board, stay tuned ;-) |
Closing this old one, idempotent tasks mostly do the job \o/ |
New variant of this now that we have chains/groups. Example of a buggy decider (given "my_task" is not configured as idempotent): self.submit(
Group(
Chain(ActivityTask(my_task, 1),
ActivityTask(my_task, 2))
Chain(ActivityTask(my_task, 3),
ActivityTask(my_task, 4))
)
) => 1st decision: 2 activities are scheduled:
Now imagine that my_task-1 finishes. A new decision is triggered. => 2nd decision:
In the end we will finish with 4 tasks executed:
=> probably not what we wanted... |
To find a task in the workflow's history, the executor assigns a unique id to the task. It increments the last id associated with the task's name:
However it does not work if the execution flow changes like in simple implementation of a retry on a timeout:
Let's simulate the execution of this workflow:
In the first run, the executor:
1
totask1
.2
totask2
.futures.wait()
.task1
with id1
andtask2
with id2
.In the second run:
task1
timed out.task2
finished at the same time.1
totask1
.2
totask1bis
and finds it in the history. It mistakenly put the result oftask2
intask1bis
(they are not even called with the same argument).3
totask2
and does not find it in the history.futures.wait()
The issue there is that the id of a task is generated with respect to the code that is evaluated before and this code may change depending on the execution of the workflow.
Assuming that tasks are idempotent would make things easier, because the executor would only need to hash the name and input of a task to generate its id. However if tasks can have side-effects (and we cannot ensure they cannot), we need to provide another mechanism. I thought that with could mark a computation as idempotent with a
cached
ormemoized
attribute. But we still need to handle other computations.The text was updated successfully, but these errors were encountered: