
Celery Kubernetes Operator CEP #1 #29

Merged
merged 3 commits into celery:master from gautamp8:kubernetes-operator on Dec 24, 2020

Conversation

gautamp8
Collaborator

This PR adds a draft CEP for the Celery Kubernetes Operator. For now, it is written with Celery 4.x in mind.

It is ready for the first round of reviews by maintainers.

Member

@thedrow thedrow left a comment


This looks good, technically.

I'll review most of the style later as this CEP needs to be formatted according to the template.



Scope
=====
Member


Style nitpick:
Our template includes all primary headings for every CEP.
If you have other headings, they should be included in one of the sections defined by the template.

CRD, controller and KEDA together (more to be explored here). We'll also
support YAML apply based deployments.

Additionally, the Helm approach is extensible in the sense that we can
Member


My intention is for the Controller to deploy and manage the brokers too, if that is possible on the platform Celery is being deployed on.
We should consider including their respective k8s operators in this operator.

Collaborator Author


Yes, that makes sense. I've yet to explore that part.

Operators are a relatively new concept, and very few are stable and actively maintained by their respective organizations. I'm planning to read some failure/downtime stories for insights on where operators go wrong. It'll help us be careful about what we ship with the Celery operator.



Motivation & Rationale
======================
Member


These are two different sections according to the template.

Collaborator Author


Okay, I'll update the proposal to strictly reflect the template.



TODOs for Exploration
=====================
Member


Maybe we should include a section in the template for future work.
It's a good idea.

- Helm chart to install the operator along with a broker of choice
- Add role based access control section for the operator
- Ingress Resource
- KEDA Autoscaling Implementation


I'm happy to help with this one. We have implemented a Metrics API scaler in KEDA which allows users to use any arbitrary API for scaling (for example, the Flower API): https://keda.sh/docs/2.0/scalers/metrics-api/

I personally had a good experience with using KEDA to scale Celery.
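
For illustration, here is a rough sketch of what wiring that up could look like: a ScaledObject using the Metrics API scaler pointed at a Flower queue-length endpoint, created from Python via the Kubernetes client. The resource names, namespace, Flower URL, and thresholds are assumptions, not part of the proposal.

```python
# Hypothetical sketch: create a KEDA ScaledObject that scales the Celery worker
# Deployment based on the queue length exposed by the Flower HTTP API
# (names, namespace and the exact Flower endpoint below are assumptions).
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when running in-cluster

scaled_object = {
    "apiVersion": "keda.sh/v1alpha1",
    "kind": "ScaledObject",
    "metadata": {"name": "celery-worker-scaler", "namespace": "default"},
    "spec": {
        "scaleTargetRef": {"name": "celery-worker"},  # the worker Deployment
        "minReplicaCount": 1,
        "maxReplicaCount": 10,
        "triggers": [
            {
                "type": "metrics-api",
                "metadata": {
                    # Assumed Flower endpoint returning queue lengths as JSON.
                    "url": "http://flower.default.svc:5555/api/queues/length",
                    "valueLocation": "active_queues.0.messages",
                    "targetValue": "10",
                },
            }
        ],
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="keda.sh",
    version="v1alpha1",
    namespace="default",
    plural="scaledobjects",
    body=scaled_object,
)
```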

@turbaszek

turbaszek commented Sep 13, 2020

I see that the proposal suggests using KEDA for autoscaling, and I think this is a good idea. That's what we use in the official Apache Airflow Helm chart for autoscaling Celery workers.

However, there's one thing to remember when scaling down the number of Celery workers: terminationGracePeriodSeconds. I assume that all users expect workers to be terminated gracefully, meaning that no running task will be lost or killed. This is particularly problematic with long-running tasks.

https://www.polidea.com/blog/application-scalability-kubernetes/#tips-for-hpa
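
As a minimal illustration (image, numbers, and field values are placeholders), this is the piece of the worker pod template the operator would have to render, expressed here as a Python dict. It relies on the fact that Celery performs a warm shutdown on SIGTERM, so the grace period only has to cover the longest running task.

```python
# Rough sketch: worker pod template fragment relevant to graceful scale-down.
LONGEST_EXPECTED_TASK_SECONDS = 3600  # assumption; should come from the user's CR

worker_pod_spec = {
    # Kubernetes waits this long between SIGTERM and SIGKILL during scale-down.
    "terminationGracePeriodSeconds": LONGEST_EXPECTED_TASK_SECONDS,
    "containers": [
        {
            "name": "celery-worker",
            "image": "example/celery-app:latest",  # placeholder image
            # Celery performs a warm shutdown on SIGTERM (the signal Kubernetes
            # sends first), finishing the tasks it has already started.
            "args": ["celery", "--app", "proj", "worker", "--loglevel", "INFO"],
        }
    ],
}
```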

@thedrow
Member

thedrow commented Sep 13, 2020

Is there a way to notify Kubernetes to extend the pod's termination grace period?

@turbaszek

Is there a way to notify Kubernetes to extend the pod's termination grace period?

@thedrow users can adjust this parameter in the pod definition, but it will differ from use case to use case. One possible hack is to set it to infinity and then terminate the pod using a preStop hook, but it's a hack and may affect cluster behavior. That was the main problem we had in Apache Airflow: tasks executed on Celery workers can take an arbitrarily long time (from seconds to hours).

Here is a discussion about this problem and workarounds from the Kubernetes community:
kubernetes/kubernetes#45509
and the related KEPs:
kubernetes/enhancements#1828
kubernetes/enhancements#1888
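
A rough sketch of the hack described above, assuming the Celery app lives in a module called proj (all names and values are placeholders): give the pod a very large grace period and block in a preStop hook until this worker has no active tasks; only then does Kubernetes send SIGTERM to the worker process.

```python
# Drain loop run by the preStop hook: poll this worker's active-task list.
drain_script = (
    "import socket, time\n"
    "from proj import app  # assumed Celery application module\n"
    "me = 'celery@' + socket.gethostname()  # default Celery nodename in a pod\n"
    "insp = app.control.inspect(destination=[me], timeout=5)\n"
    "while any((insp.active() or {}).values()):\n"
    "    time.sleep(10)\n"
)

worker_pod_spec = {
    "terminationGracePeriodSeconds": 86400,  # effectively 'infinite' for our purposes
    "containers": [
        {
            "name": "celery-worker",
            "image": "example/celery-app:latest",  # placeholder
            "lifecycle": {
                "preStop": {"exec": {"command": ["python", "-c", drain_script]}},
            },
        }
    ],
}
```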


This project attempts to solve (or automate) these issues. It aims to bridge the gap between application engineers and the infrastructure operators who manually manage Celery clusters.

Moreover, since Celery is written in Python, we plan to use open-source `KOPF <https://github.com/nolar/kopf>`_\ (Kubernetes Operator Pythonic Framework) to write the custom controller implementation.

@turbaszek turbaszek Sep 14, 2020



It seems that the original project was abandoned: https://github.com/zalando-incubator/kopf And the fork seems to be maintained by a single person (at least one person did 95%+ of the commits in the last month). I'm not sure if it is a good idea.

Member


We can collaborate with https://github.com/nolar/kopf as well, though your concerns are valid. We usually try to offer a helping hand to the projects we rely on. I like Kopf, though O:)

Collaborator Author


@nolar wrote a detailed post on this: https://medium.com/@nolar/kopf-is-forked-cdca40026ea7
He plans to continue development on nolar/kopf in the future.


we usually try to offer a helping hand to the projects we rely on

That was also my thought, yet it's more to maintain and more to learn

@nolar

Thanks for mentioning and for your support! ;-)

The previous (original) repository was maintained by a single person too (the same person, actually: me), so there is no big difference with the fork. The 95%+ of commits in the last month is what was actually sitting in my WIP repo for the past year, just finished, tested, pushed, and merged.

Regarding project maintainability: the plan is to start building a community asap, probably even in 2020 (though most likely in 2021). With zero knowledge of how to do that, this looks like a challenge; I'm reading articles on it now and trying to choose a name that is not yet used on GitHub. Another option is joining one of the existing communities/organizations with established workflows, which needs a comparison of pros and cons. This is something on my table to solve after the current backlog of urgent and pending things is done. But these plans are too fuzzy and uncertain to even mention in the blog post.

This might be beneficial for the project both short-term, as it gets a boost, and long-term, as it does not become abandoned/unmaintained again.

I hope this clarifies the future of Kopf for your decision making. If you have any questions, feel free to ask.


@turbaszek turbaszek Sep 15, 2020


The Apache Software Foundation has an incubator for early projects (http://incubator.apache.org). I think going this way has the advantage of established "ways of doing things" (20+ years). I recommend this talk: https://www.youtube.com/watch?v=mD0Lh7si5Hc&t=2s

CNCF has the idea of "sandbox projects" (https://www.cncf.io/sandbox-projects/). I'm not that familiar with it, but from what I see (from KEDA) the projects are more on their own there than in the ASF.

I'm happy to help 🚀


Here is a description of the incubation processes:

@nolar let me know if you would like any help (I'm already a PMC member of Apache Airflow).

@gautamp8
Collaborator Author

gautamp8 commented Sep 16, 2020

I'll review most of the style later as this CEP needs to be formatted according to the template.

@thedrow I've updated the CEP to strictly reflect the template. I've kept the Future Work section as-is, since you mentioned it would be good to have.

Member

@thedrow thedrow left a comment


I've read the entire document now that I've released Celery 5 and I have a little bit more free time.

Overall, this is good work.

There are a few things I'd like you to address:

  • The operator should be aware of the number of processes/greenlets the worker is running and must make autoscaling decisions based on that.
    If we use greenlets, we only utilize one CPU per process and our main resource is I/O.
    Further work could be done there when Cilium or another network management platform is installed. We should investigate this further.
    If we use the prefork pool, we should count the number of worker processes to calculate concurrency.
  • We may wish to connect to various Celery signals to notify Kubernetes of certain events which we can handle.
    For example, if we didn't execute a task for a certain amount of time and there are messages in the queue, we may wish to terminate the worker as it is malfunctioning.
    Another example is an internal error in Celery.
    This requires a Python library.
  • The CEP should address how the operator intends to install the message broker/results backend.
    This can possibly be done using the app's configuration. I think that this is a feature we want to implement at this stage for RabbitMQ and Redis.
    Support for other components can come on demand.
    We'll be doing this for Celery 5 anyway, so it's worth investigating it now.
  • I'd like you to invest a little bit more time in coming up with new features. Your feedback is most valuable.

@gautamp8
Collaborator Author

gautamp8 commented Oct 4, 2020

Thanks for reviewing in detail. I'll get back on these comments with some questions/inputs as soon as I can.

@gautamp8
Collaborator Author

* The operator should be aware of the number of processes/greenlets the worker is running and must make autoscaling decisions based on that.
  If we use greenlets, we only utilize one CPU per process and our main resource is I/O.
  Further work could be done there when Cilium or another network management platform is installed. We should investigate this further.
  If we use the prefork pool, we should count the number of worker processes to calculate concurrency.

Yeah. So in the case of the prefork pool, we could calculate the number of worker pods by taking into account the minimum of the available CPU cores and the concurrency value set by the user. For the solo pool, we can only increase the number of pods, I think.

For greenlets, we could accept min and max concurrency values from the CRD, similar to what Celery already does, and only update the concurrency setting until we reach the threshold value specified for the workers. Beyond that, we'll increase the number of pods directly.

That said, I'd have to look into how the operator could take care of this custom logic alongside the KEDA/HPA scaling implementation. Also, I've not worked with Cilium yet; I'll explore it as well.
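
A rough sketch of how that calculation might look; the spec field names (cpuCores, concurrency, minConcurrency, maxConcurrency, tasksPerSlotTarget) and the queue-length driven target are assumptions for illustration only, not part of the CEP yet.

```python
import math

def desired_workers(pool, queue_length, spec):
    """Return (replicas, per_pod_concurrency) for a hypothetical Celery CR spec."""
    if pool == "prefork":
        # Effective slots per pod: the user's concurrency, capped by the CPU
        # cores available to the pod.
        per_pod = min(spec["cpuCores"], spec["concurrency"])
        replicas = math.ceil(queue_length / (per_pod * spec["tasksPerSlotTarget"]))
    elif pool in ("gevent", "eventlet"):
        # Greenlets: grow per-pod concurrency up to the user-defined maximum
        # first, then start adding pods.
        per_pod = min(spec["maxConcurrency"], max(spec["minConcurrency"], queue_length))
        replicas = math.ceil(queue_length / per_pod) if per_pod else 1
    else:
        # Solo pool: one task at a time per pod, so we can only add pods.
        per_pod = 1
        replicas = queue_length or 1
    return max(1, replicas), per_pod
```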

* We may wish to connect to various Celery signals to notify Kubernetes of certain events which we can handle.

Do you mean these signals?

  For example if we didn't execute a task for a certain amount of time and there are messages in the queue, we may wish to terminate the worker as it is malfunctioning.
  Another example is an internal error in Celery.

And yes, we should do this. I'll read more on all the signals and update the proposal with the ones we could handle (a rough sketch of the idea is below). One additional thing I had in mind, apart from the examples you gave, is implementing a simple kind of backpressure protocol: in case one of the components is overwhelmed, we emit certain events and propagate them back to the application or the starting point.
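
As a hedged sketch of the signal idea: a small helper module shipped in the worker image that connects to a couple of Celery signals and forwards them to Kubernetes as core/v1 Events attached to the worker's pod. The environment variable names, the event payload, and the component name are assumptions.

```python
import os

from celery.signals import task_failure, worker_ready
from kubernetes import client, config

NAMESPACE = os.environ.get("POD_NAMESPACE", "default")  # assumed downward-API env var
POD_NAME = os.environ.get("HOSTNAME", "unknown")


def _post_event(reason, message, event_type="Normal"):
    """Create a core/v1 Event pointing at this pod (payload is a rough sketch)."""
    config.load_incluster_config()  # running inside the cluster
    body = {
        "metadata": {"generateName": "celery-worker-", "namespace": NAMESPACE},
        "involvedObject": {"kind": "Pod", "name": POD_NAME, "namespace": NAMESPACE},
        "reason": reason,
        "message": message,
        "type": event_type,
        "source": {"component": "celery-operator-agent"},  # hypothetical component name
    }
    client.CoreV1Api().create_namespaced_event(namespace=NAMESPACE, body=body)


@worker_ready.connect
def on_worker_ready(sender=None, **kwargs):
    _post_event("WorkerReady", f"Celery worker {sender.hostname} is ready")


@task_failure.connect
def on_task_failure(sender=None, exception=None, **kwargs):
    _post_event("TaskFailure", f"Task {sender.name} failed: {exception!r}", "Warning")
```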

  This requires a Python library.

Can you please elaborate a bit more on this?

* The CEP should address how the operator intends to install the message broker/results backend.
  This can possibly be done using the app's configuration. I think that this is a feature we want to implement at this stage for RabbitMQ and Redis.
  Support for other components can come on demand.
  We'll be doing this for Celery 5 anyway so it's worth investigating it now.

Yes, that makes sense. I did some exploration on how we can go forward with installing a Redis/RabbitMQ cluster once a Celery CR is created. The zeroth version could simply package plain YAML manifests, with very basic customization properties, along with the operator. Support for those properties could be added to the CRD in an optional section.

A more scalable approach would be to use Helm, the standard package manager. The general idea is to decouple the installation/management of Redis/RabbitMQ from the handlers taking care of the Celery workers.

I found actively maintained Helm charts by Bitnami for Redis and RabbitMQ. However, the catch here is that the Python handler taking care of creation/updates needs to somehow tell Kubernetes to install the chart using Helm. One way I thought of is to also bundle a Helm operator with our operator: from our handler we create a resource that this operator understands, and it takes care of the rest (see the sketch below). I'd like to discuss the pros and cons of this approach. If someone has a better idea, I can explore it further.
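
A hedged sketch of that idea using KOPF: a handler that reacts to a new Celery custom resource and creates a HelmRelease-style object for a bundled Helm operator to reconcile. The Celery CRD group/plural and the HelmRelease API group and schema are assumptions for illustration only.

```python
import kopf
import kubernetes

# Load client config for the official kubernetes client (kopf uses its own).
try:
    kubernetes.config.load_incluster_config()
except kubernetes.config.ConfigException:
    kubernetes.config.load_kube_config()


@kopf.on.create("celeryproject.org", "v1alpha1", "celeryapplications")
def install_broker(spec, name, namespace, logger, **kwargs):
    broker = spec.get("broker", {}).get("type", "redis")
    helm_release = {
        "apiVersion": "helm.example.org/v1",  # assumed Helm-operator API group
        "kind": "HelmRelease",
        "metadata": {"name": f"{name}-{broker}", "namespace": namespace},
        "spec": {
            "chart": {
                "repository": "https://charts.bitnami.com/bitnami",
                "name": broker,  # "redis" or "rabbitmq"
                "version": spec.get("broker", {}).get("chartVersion", "latest"),
            },
            "values": spec.get("broker", {}).get("values", {}),
        },
    }
    # Make the release a child of the Celery CR so it is garbage-collected with it.
    kopf.adopt(helm_release)
    kubernetes.client.CustomObjectsApi().create_namespaced_custom_object(
        group="helm.example.org",
        version="v1",
        namespace=namespace,
        plural="helmreleases",
        body=helm_release,
    )
    logger.info("Requested %s broker installation for %s", broker, name)
    return {"broker-release": helm_release["metadata"]["name"]}
```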

* I'd like you to invest a little bit more time on coming up with new features. Your feedback is most valuable.

Yes! One additional thing the operator could do is handle dev, test, and production sandboxes for a given CR specification. We could use Kustomize for that.

On a side note, neither I nor my company has been a power user of Celery as such. We use really basic things, so my knowledge is limited, but I'm willing to explore and learn more on both the Kubernetes and Celery fronts. I feel Celery is awesome and, at the same time, a huge project to understand the ins and outs of. Are there any places where I can read more about how people use Celery in production?

For now, I'm going to go ahead and try to read the whole Celery documentation to filter out some use cases the operator could take care of.

@gautamp8
Collaborator Author

gautamp8 commented Nov 1, 2020

Sorry, I couldn't be active this past month. I'm still working on my knowledge of distributed systems in general. @thedrow, let me know your inputs on my comment above whenever you can, and I'll update the proposal accordingly.

@dejlek

dejlek commented Dec 16, 2020

For some of our use cases, ephemeral storage is quite important too, not just CPU and memory, as some of our tasks make heavy use of this AWS feature, for example. So it is important to be flexible about the parameters that could affect autoscaling.

On a side note, it would be nice if Celery had a cluster scaling API that we could "just" implement...

@thedrow
Member

thedrow commented Dec 16, 2020

On a side note, it would be nice if Celery had a cluster scaling API that we could "just" implement...

We're working on it. See #27.

@thedrow
Member

thedrow commented Dec 16, 2020

@Brainbreaker I was extremely busy with Celery 5 myself.
Let's pick up the pace.

I think this PR is a fine first draft and we can merge it.
Before we do that I'd like you to open an issue which details all our open items.

@gautamp8
Collaborator Author

Created an issue with the open points; we can continue discussing them there.
#30

@thedrow thedrow merged commit be6098e into celery:master Dec 24, 2020
@gautamp8
Collaborator Author

@thedrow A couple of things around next steps to start development:

I'm assuming we'll be creating a new GitHub repository for the Celery operator under the celery org. How does that process work?
I also have this prototype repo on my personal account and was planning to build on that.

I've been through the contributing guide. However, let me know if there are any particular things you have in mind specific to this project's execution. As a first step, I can come up with a detailed, subtasked plan and approximate timelines.

@thedrow
Member

thedrow commented Dec 27, 2020

We can move the prototype repo to the organization.

@gautamp8
Collaborator Author

We can move the prototype repo to the organization.

Great. I'll have to be part of the celery organization for that.
I tried transferring ownership and the message was "You don’t have the permission to create public repositories on celery"

@gautamp8 gautamp8 changed the title Celery Kubernetes Operator CEP (Draft) Celery Kubernetes Operator CEP #1 Jan 28, 2021
@gautamp8 gautamp8 deleted the kubernetes-operator branch January 28, 2021 11:13