Gracefully stop celery tasks when calling helm upgrade #5412

Open

RonZhang724 opened this issue Mar 27, 2019 · 4 comments
Comments

@RonZhang724

RonZhang724 commented Mar 27, 2019

Checklist

  • I have checked the issues list for similar or identical feature requests.
  • I have checked the commit log to find out if a feature was already implemented in master.

Brief Summary

I have a minikube cluster running my RabbitMQ, Flower, and Celery workers. Each task takes approximately 30 seconds or longer to complete. When I run a bunch of tasks at once, they are put into the queue to wait to be processed by the workers. However, when I upgrade the workers to a new image by running `helm upgrade myworker /path/to/helmchart`, the old pod is terminated and all the running and waiting tasks are nuked and gone. Hence, a graceful way of stopping those tasks should probably be implemented.

Design

Architectural Considerations

Helm and Kubernetes are involved; therefore, some orchestration might be required.

Proposed Behavior

When `helm upgrade myworker /path/to/helmchart` is run, pod termination should wait until all the existing tasks have been processed. Then, a new pod will be spun up to process new tasks using the new code.

Proposed UI/UX

None

Diagrams

N/A

Alternatives

A similar question has been asked on Stack Overflow (https://stackoverflow.com/questions/55334885/celery-task-stops-when-calling-helm-install), and the author proposed that

"using the pod PreStop hook OR creating something that will prevent the task from stopping."

might work. Thanks for noticing, and I will keep exploring solutions to this problem.
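
For what it's worth, the preStop idea from that answer might look roughly like the snippet below. This is only a sketch I put together; the image, the app name `myapp`, and the drain check are placeholders, and the time spent in preStop counts against the pod's termination grace period:

```yaml
# Sketch of the preStop idea (not tested): all names, the image, and the grep
# pattern are placeholders, not part of any existing chart.
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 600
      containers:
        - name: worker
          image: myregistry/myworker:latest
          lifecycle:
            preStop:
              exec:
                command:
                  - /bin/sh
                  - -c
                  - |
                    # Poll until the worker reports no active tasks, then return
                    # so Kubernetes can send SIGTERM to the container.
                    # The grep pattern depends on your Celery version's output.
                    while celery -A myapp inspect active | grep -q "id"; do
                      sleep 5
                    done
```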

@thedrow
Member

thedrow commented Mar 28, 2019

Related issue #4213.

The solution would be to send a SIGINT to the master process and wait until it exits before allowing the pod to shut down.

@thedrow
Member

thedrow commented Apr 2, 2019

Sorry, not SIGINT but SIGTERM.

Does that work for you?

@RonZhang724
Author

RonZhang724 commented Apr 9, 2019

Sorry about the late response; I was experimenting with different ways to get this to work. Thanks to the hint, I looked into the lifecycle of Kubernetes pods and found out that there is something called a "Helm hook" that gets executed at different phases of a deployment. One of the hooks I found useful is the pre-upgrade hook. It requires successful execution of a command specified in the hook YAML file before the new pod gets deployed and replaces the old pod.
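
A pre-upgrade hook is declared by annotating a Job in the chart; Helm runs the Job and only proceeds with the upgrade once it succeeds. A rough sketch of what I have in mind, where every name, the image, and especially the drain-check command are placeholders (the check itself is exactly the open question below):

```yaml
# Rough sketch of a pre-upgrade hook Job (not tested). All names, the image,
# and the drain check are placeholders; the Job must exit 0 before Helm
# replaces the old worker pod.
apiVersion: batch/v1
kind: Job
metadata:
  name: wait-for-queue-drain
  annotations:
    "helm.sh/hook": pre-upgrade
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  activeDeadlineSeconds: 1800   # give up eventually so the upgrade is not blocked forever
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: check-queue
          image: myregistry/queue-check:latest   # hypothetical image with the celery CLI and broker config
          command:
            - /bin/sh
            - -c
            - |
              # Succeed only once no worker reports active tasks.
              # The grep pattern is a placeholder for whatever check fits your setup.
              until celery -A myapp inspect active | grep -q "empty"; do
                sleep 10
              done
```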

Now my question would be: how do I create an image that checks whether there are any tasks waiting to be processed by the worker pod?

@thedrow
Member

thedrow commented Apr 11, 2019

You can of course inspect a worker if gossip is enabled.
However, when you shut down the Celery master with SIGTERM we wait for all tasks to be executed before exiting.

I'm not sure if you really need a helm hook for this.
You just need to extend k8s' timeout since it already sends a SIGTERM to PID 1.

Ensure that Celery master is PID 1 and that you have configured terminationGracePeriodSeconds to a large enough number.
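
Something like this, where the names and values are placeholders for your own chart: run the worker directly as the container command, with no shell wrapper, so the Celery master process is PID 1 and receives the SIGTERM Kubernetes sends on pod termination.

```yaml
# Minimal sketch (names and values are placeholders).
spec:
  template:
    spec:
      # Must exceed the longest task runtime, so the warm shutdown triggered by
      # SIGTERM can finish before Kubernetes follows up with SIGKILL.
      terminationGracePeriodSeconds: 600
      containers:
        - name: worker
          image: myregistry/myworker:latest
          # Exec form, no shell: the Celery master becomes PID 1.
          command: ["celery", "-A", "myapp", "worker", "--loglevel=INFO"]
```

If the image starts the worker through a shell or wrapper script instead, the wrapper ends up as PID 1 and the worker may never see the signal unless the wrapper execs or forwards it.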

@auvipy auvipy added this to the Future milestone Oct 31, 2021