Added Volcano as CNCF Sandbox. #318
+1 |
+1 |
Everyone, please no more “+1”s here. They do not provide useful information and do not make a difference in the TOC’s decisions. |
I am familiar with this project and believe it would be a good addition to the CNCF sandbox to promote industry collaboration towards shared implementations of common extensions to Kubernetes to better support batch AI/ML, big data and other similar batch workloads. |
@quinton-hoole have they considered LF AI? There's a scope question here potentially and there may be other organizations that are a better fit: https://landscape.lfai.foundation |
Volcano is a set of common extensions to Kubernetes for batch workloads (a Kubernetes-native batch system), and its goal is to "help batch workload/application to be cloud native". Batch workloads include not only AI and Big Data, but also HPC (e.g. genomics) and others. So CNCF seems to match our goal better :) |
I agree, Klaus.
|
Hi team, is there anything I can help with for the next step? :) |
@k82cn my advice would be to schedule a presentation to one of the CNCF SIGs. In this case, I think App Delivery may be best (e.g., https://github.com/cncf/sig-app-delivery), as I can't think of another SIG that would be a better fit. @quinton-hoole? |
Is that the new process? :)
Hmm, IMO SIG Runtime seems a better fit for Volcano according to their chart; is SIG Runtime ready to do that? |
Yes, we are requiring all projects to present to a SIG first. SIG Runtime could be great, but it's not formed yet; I believe we are almost there. cc: @amye |
@caniszczyk Yes, I discussed with @amye today, and she's busy tying up some loose ends to bring SIG Runtime to a TOC vote. |
Signed-off-by: Klaus Ma <klaus1982.cn@gmail.com>
84ba1f9 to 046fca0
We reviewed it, and our recommendation is to accept it into the Sandbox. The next step is getting sponsors from the TOC. Here are the recommendation and the presentation recording :) /cc @raravena80 @amye |
I've just been reviewing the presentation and recommendation. It looks good but I have a couple of questions:
|
I think maybe @k82cn can better answer these questions but I'd like to pitch in for
There is Cyclone, which is K8s native but seems to focus more on AI. There are also a few similar projects that are not K8s native, for example Airflow, Luigi, and Oozie. Then there is Peloton, a scheduler for mixed batch, stateless and stateful jobs, but it mainly works under Mesos. An interesting feature of Volcano is that it allows for DRF allocation in K8s, which I think no other project implements. Big Data folks running something like Spark or Flink will remember it as the default from the Mesos world. |
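For readers unfamiliar with DRF (Dominant Resource Fairness), here is a minimal Python sketch of the general idea: repeatedly give one more task to the user whose dominant share (their largest fractional use of any single resource) is currently lowest. This is an illustration only, not Volcano's implementation; the function name `drf_allocate`, the dict-based inputs, and the choice to drop a user once their next task no longer fits are all assumptions made for this example.

```python
def drf_allocate(capacity, demands):
    """Toy DRF by progressive filling.

    capacity: resource name -> total amount in the cluster
    demands:  user name -> per-task demand (resource name -> amount)
    Assumes every user demands a positive amount of at least one resource.
    Returns user name -> number of tasks granted.
    """
    used = {r: 0 for r in capacity}
    tasks = {u: 0 for u in demands}
    dominant = {u: 0.0 for u in demands}
    active = set(demands)

    while active:
        # Pick the active user with the lowest dominant share
        # (ties broken by name so the result is deterministic).
        user = min(sorted(active), key=lambda u: dominant[u])
        demand = demands[user]

        # If the user's next task does not fit, stop considering that user.
        if any(used[r] + demand[r] > capacity[r] for r in capacity):
            active.remove(user)
            continue

        for r in capacity:
            used[r] += demand[r]
        tasks[user] += 1
        # Dominant share = largest fraction of any one resource held by the user.
        dominant[user] = max(tasks[user] * demand[r] / capacity[r] for r in capacity)

    return tasks


if __name__ == "__main__":
    cluster = {"cpu": 9, "mem": 18}
    per_task = {"A": {"cpu": 1, "mem": 4}, "B": {"cpu": 3, "mem": 1}}
    print(drf_allocate(cluster, per_task))  # {'A': 3, 'B': 2}
```

With 9 CPUs and 18 GB of memory, user A (memory-heavy) ends up with 3 tasks and user B (CPU-heavy) with 2, so both hold a dominant share of 2/3.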
I think @raravena80 covered most of the similar projects. Besides DRF allocation, Volcano also provides other features, e.g. gang scheduling and queues, that help Kubeflow, Spark and so on run better on K8s.
According to offline discussions, some of them decided to use Volcano + Kubeflow or Spark for their ML/BigData platform; but it's up to the adopters to decide when to update their status, to avoid any media communication concerns. |
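As a rough illustration of what "gang" means here, the sketch below places a job's tasks all-or-nothing: either at least `min_member` tasks fit, or none are placed and the cluster is left untouched. It is a simplified model, not Volcano's code; `try_gang_place`, the single CPU dimension, and the first-fit node choice are assumptions for this example.

```python
def try_gang_place(task_cpus, node_free, min_member):
    """Try to place a job's tasks as a gang.

    task_cpus:  list of CPU requests, one per task
    node_free:  node name -> free CPU
    min_member: minimum number of tasks that must be placed together
    Returns (placement, remaining_free). placement is None when the gang
    cannot start, in which case remaining_free equals the original state.
    """
    free = dict(node_free)  # work on a copy so a failed attempt changes nothing
    placement = {}

    for index, cpu in enumerate(task_cpus):
        # First fit over nodes in name order (an arbitrary, illustrative policy).
        for node in sorted(free):
            if free[node] >= cpu:
                free[node] -= cpu
                placement[index] = node
                break

    if len(placement) < min_member:
        return None, dict(node_free)  # roll back: no partial gang is started
    return placement, free


if __name__ == "__main__":
    nodes = {"n1": 4, "n2": 2}
    print(try_gang_place([2, 2, 2, 2], nodes, min_member=4))
    print(try_gang_place([2, 2, 2, 2], nodes, min_member=3))
```

The point of the rollback is that a distributed training job holding only some of its workers would occupy resources without making progress.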
+1024 |
I would love to vote for Volcano to join the CNCF Sandbox projects. In ug-machine-learning, Volcano is widely adopted by most of the distributed training operators, like tf-operator, pytorch-operator, mpi-operator, etc. As the Kubernetes default scheduler is not well designed for batch jobs, a lot of the features users need, like gang scheduling and binpack, are missing. Users who plan to move from traditional platforms to Kubernetes have to rely on secondary scheduler solutions. Volcano is the most popular one; it solves most of the common problems users have and has become an essential component for running big data and machine learning workloads on Kubernetes. |
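To make the "binpack" idea concrete, here is a small sketch of a bin-packing node choice: among nodes that can fit the request, prefer the one that ends up most full, so that other nodes stay empty for large jobs. This is a generic illustration, not the scoring used by Volcano or by the Kubernetes scheduler; `binpack_pick` and the single-resource model are assumptions.

```python
def binpack_pick(request, nodes):
    """Pick the feasible node with the highest utilization after placement.

    request: amount of a single resource the task needs
    nodes:   node name -> (used, capacity)
    Returns the chosen node name, or None if no node can fit the request.
    """
    best_node, best_score = None, -1.0
    for name in sorted(nodes):  # name order makes tie-breaking deterministic
        used, capacity = nodes[name]
        if used + request > capacity:
            continue  # infeasible: the task does not fit on this node
        score = (used + request) / capacity  # fuller after placement is better
        if score > best_score:
            best_node, best_score = name, score
    return best_node


if __name__ == "__main__":
    cluster = {"a": (6, 8), "b": (1, 8), "c": (7, 8)}
    print(binpack_pick(2, cluster))  # a
```

Here node `c` cannot fit the request, and node `a` wins over `b` because it would be fully used after placement.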
@k82cn is there an Adopters file that they are updating status in? (We don't have a particular gate on usage for Sandbox acceptance, but evidence of adoption could be a persuasive argument to help attract TOC Sponsors.) |
@lizrice, I discussed the latest status with some adopters; for now, 3 adopters have updated their status: iQiyi (Staging, pre-production), Xiaohongshu.com (Production), HuaweiCloud (Production). It'll take time to get more adopters to update their status; do you think that's enough for a sandbox application? |
BTW, there are also two KubeCon presentations from different companies on Volcano use cases: |
I am happy to sponsor Volcano for Sandbox. (Could have sworn I already added a comment to that effect yesterday - maybe I failed to hit the green button!) Thank you SIG Runtime for your review, and @k82cn for the additional answers |
I am happy to sponsor as well. |
@lizrice, @sheng-liang thanks very much! |
It seems sig-scheduling may be doing something similar with the new scheduler framework. Is there any relationship or difference? |
|
I am happy to sponsor Volcano for Sandbox. Support for running batch workloads on Kubernetes is in high demand among Big Data/Machine Learning projects, and having Volcano in the Sandbox can foster community collaboration and knowledge sharing in the Cloud Native space. @k82cn thank you for providing all the extra information! Also great to see GPU topology/share support on the 2020 roadmap. |
@alena1108, thanks very much for sponsoring :) |
@amye @caniszczyk Volcano now has 3 TOC sponsors for Sandbox. Any next steps for the project team? Thanks. |
We'll work with them on transferring assets over; once that's complete, I will merge this in. |
Thanks @amye, can you start on the onboarding checklist here:
|
hey @k82cn can you confirm you and the maintainers have read https://www.cncf.io/services-for-projects/ :) |
Confirmed. The maintainers and I have already read https://www.cncf.io/services-for-projects/. |
#318 Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
This is the proposal to add Volcano to the CNCF.
Name of Project: Volcano
Description:
Volcano is a batch system built on Kubernetes for the above requirements. It provides a suite of mechanisms that are commonly required by many classes of batch & elastic workload including: machine learning/deep learning, bioinformatics/genomics and other "big data" applications. These types of applications typically run on generalized domain frameworks like TensorFlow, Spark, PyTorch, MPI, etc, which Volcano integrates with.