Today, when a workflow execution is closed (completed, failed or timed out) or when an activity task has timed out, we stop heartbeating but the worker continues its work.
In some cases that might be a good idea:
1/ we want the activity to finish so it can clean up after itself,
2/ if the task is 99% done, maybe we want it to finish anyway (dubious?)
However, in most use cases at Botify this is counter-intuitive. Since simpleflow workers are limited in number, we can quickly reach a point where all workers are busy with tasks that can no longer complete on the SWF side, leaving new workflows without enough workers.
Hence, we should change how simpleflow handles closed activity tasks (actually: UnknownResource errors when sending a heartbeat):
MVP: kill the worker when this case happens
put this behind a feature flag, probably per activity task; we could do that globally on a simpleflow worker, but it happens that we don't use those for now at Botify (we use private code for launching processes), and they're probably buggy anyway
allow tasks to define a cleanup action, so the worker can be killed and the cleanup can take place after that (not easy a priori, since we'd probably need a way to pass parameters between the two)
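The MVP and the feature flag above could look roughly like the sketch below. It is not simpleflow's actual code: `UnknownResourceError`, `supervise`, `send_heartbeat` and the `kill_on_closed` flag are hypothetical names chosen for illustration, and the worker is modelled as a plain `subprocess.Popen` child.

```python
import subprocess
import sys


class UnknownResourceError(Exception):
    """Hypothetical stand-in for the UnknownResource error returned by SWF."""


def supervise(worker, send_heartbeat, interval=0.1, kill_on_closed=True):
    """Heartbeat while `worker` runs; return True if the worker was killed.

    `worker` is a subprocess.Popen instance, `send_heartbeat` is any callable
    that raises UnknownResourceError once the task is closed on the SWF side.
    `kill_on_closed` plays the role of the per-activity feature flag.
    """
    while worker.poll() is None:
        try:
            send_heartbeat()
        except UnknownResourceError:
            if not kill_on_closed:
                # Current behaviour: stop heartbeating, let the worker go on.
                return False
            worker.kill()  # SIGKILL on POSIX: the worker cannot clean up
            worker.wait()  # reap the child so it does not stay a zombie
            return True
        try:
            # Sleep until the next heartbeat, or return early if the worker exits.
            worker.wait(timeout=interval)
        except subprocess.TimeoutExpired:
            pass
    return False
```

With the flag off, the function simply stops heartbeating and returns, which matches what happens today; the cleanup-action idea from the last item is left out because it needs a way to pass parameters between the task and its cleanup.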
Note that we use SIGKILL here, which might look a bit violent for the
purpose of stopping a process (the process won't be able to clean up
anything before dying for instance). This should probably be a SIGTERM
but we already handle SIGTERM signals today and we alias it to a
graceful shutdown. Maybe we should change this behavior, but that's a
first version.
Closes #88.
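To illustrate the SIGKILL versus SIGTERM trade-off mentioned in the commit message, here is a small POSIX-only sketch. The child script and the `stop_child` helper are made up for this example and are not taken from simpleflow; the child treats SIGTERM as a request for graceful shutdown, much like the aliasing described above.

```python
import signal
import subprocess
import sys

# Hypothetical worker: on SIGTERM it finishes its loop and cleans up.
CHILD = r"""
import signal, time
stop = False
def on_term(signum, frame):
    global stop
    stop = True  # graceful shutdown: let the loop end by itself
signal.signal(signal.SIGTERM, on_term)
print("ready", flush=True)
while not stop:
    time.sleep(0.05)
print("cleaned up", flush=True)
"""


def stop_child(sig):
    """Start the child, send it `sig`, return (exit code, remaining output)."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD], stdout=subprocess.PIPE, text=True
    )
    # Wait until the SIGTERM handler is installed before signalling.
    assert proc.stdout.readline().strip() == "ready"
    proc.send_signal(sig)
    out, _ = proc.communicate(timeout=10)
    return proc.returncode, out.strip()
```

Sending `signal.SIGTERM` lets the child run its cleanup and exit normally, while `signal.SIGKILL` cannot be caught, so the cleanup line is never printed and the exit code is negative.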