Kill worker when the activity task is closed #88

Closed
3 tasks
jbbarth opened this issue May 3, 2016 · 0 comments

jbbarth commented May 3, 2016

Today, when a workflow execution is closed (completed, failed or timed out) or when an activity task has timed out, we stop heartbeating but the worker continues its work.

In some cases that might be a good idea:

  • 1/ we want the activity to finish so it can clean up after itself,
  • 2/ if the task is 99% done, maybe we want it to finish anyway (dubious?)

Still, in most use cases at Botify this is counter-intuitive. As simpleflow workers are limited in number, we can quickly reach a point where all workers are busy on tasks that can no longer complete on the SWF side, leaving new workflows without enough workers.

Hence, we should change how simpleflow handles closed activity tasks (actually: UnknownResource errors when sending a heartbeat):

  • MVP: kill the worker when this case happens
  • put this behind a feature flag, probably per activity task; we could do it globally on a simpleflow worker, but at Botify we don't use those for now (we launch processes with private code), and they're probably buggy anyway
  • allow tasks to define a cleanup action, so the worker can be killed and the cleanup can run afterwards (not easy a priori, since we'd probably need a way to pass parameters between the two)
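
The MVP and the feature flag could look roughly like the sketch below. This is not simpleflow's actual code: `UnknownResourceError`, `monitor_heartbeat`, `send_heartbeat` and the `kill_on_closed` flag are hypothetical names used for illustration, and `signal.SIGKILL` assumes a POSIX system.

```python
import os
import signal
import time


class UnknownResourceError(Exception):
    """Hypothetical stand-in for the error SWF returns for a closed task."""


def monitor_heartbeat(send_heartbeat, worker_pid, interval=0.0,
                      kill_on_closed=True, kill=os.kill):
    """Send heartbeats until SWF reports the task as closed.

    Returns True if the worker process was killed, False if it was
    left running because the (hypothetical) feature flag is off.
    """
    while True:
        try:
            send_heartbeat()
        except UnknownResourceError:
            # The task, or its workflow execution, is closed on the SWF
            # side: nothing the worker produces can be reported anymore.
            if kill_on_closed:
                kill(worker_pid, signal.SIGKILL)
                return True
            return False
        time.sleep(interval)
```

Passing `kill` as a parameter is only there to make the sketch easy to exercise without killing a real process; a per-activity flag would be read from the activity's definition rather than passed by hand.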
@jbbarth jbbarth self-assigned this May 3, 2016
jbbarth added a commit that referenced this issue May 18, 2017
Note that we use SIGKILL here, which might look a bit violent for the
purpose of stopping a process (the process won't be able to clean up
anything before dying, for instance). This should probably be a SIGTERM,
but we already handle SIGTERM signals today and we alias them to a
graceful shutdown. Maybe we should change this behavior, but that's a
first version.

Closes #88.
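
The difference between the two signals can be illustrated with a small standalone sketch. It is an assumption-laden model, not simpleflow's code: the child script, `CHILD` and `stop_child` are hypothetical, the SIGTERM handler only stands in for "graceful shutdown requested, current task keeps running", and the sketch assumes a POSIX system.

```python
import signal
import subprocess
import sys

# Hypothetical worker: SIGTERM is caught and the current "task" keeps running.
CHILD = r"""
import signal, time
signal.signal(signal.SIGTERM, lambda signum, frame: None)
print("ready", flush=True)
while True:
    time.sleep(0.1)
"""


def stop_child(sig, wait=1.0):
    """Start the worker, send it `sig`, and return its exit code.

    Returns None if the worker is still running after `wait` seconds
    (in which case it is force-killed before returning).
    """
    proc = subprocess.Popen([sys.executable, "-c", CHILD],
                            stdout=subprocess.PIPE)
    try:
        proc.stdout.readline()  # wait until the handler is installed
        proc.send_signal(sig)
        try:
            return proc.wait(timeout=wait)
        except subprocess.TimeoutExpired:
            return None
    finally:
        if proc.poll() is None:
            proc.kill()
            proc.wait()
        proc.stdout.close()
```

In this model, `stop_child(signal.SIGTERM)` finds the worker still busy after the wait, while `stop_child(signal.SIGKILL)` stops it immediately, because SIGKILL cannot be caught or handled by the process.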
jbbarth added a commit that referenced this issue May 18, 2017
Kill worker on UnknownResourceFault's during a heartbeat (#88)