Support for pod schedulers other than schedulerName: default-scheduler #233

Closed
scottyhq opened this issue Feb 13, 2020 · 2 comments

scottyhq commented Feb 13, 2020

Currently dask worker pods are spread onto available nodes by the default kubernetes scheduler:

[ec2-user@ip-192-168-60-131 ~]$ kubectl get pod -o yaml dask-cgentemann-osm2020tutorial-nqchvhmy-6e9099fc-3k2s6c -n binder-staging | grep schedule
  schedulerName: default-scheduler

This can cause scale-down problems when multiple users launch clusters, or when pods hit errors, because by default pods are spread across all available nodes. For example, we recently observed an issue where many dask pods had an Error status, which caused new nodes to be launched to meet capacity. We ended up with 17 nodes running two dask pods each, instead of all the pods packed onto 5 nodes.
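To make the packing argument concrete, here is a toy placement model; it is not the real kube-scheduler algorithm. It assumes 34 worker pods (17 nodes × 2 pods, from the numbers above) and a hypothetical capacity of 7 pods per node, chosen so that 34 pods fit on 5 nodes. The `spread` strategy loosely mimics least-allocated scoring and `pack` loosely mimics most-allocated scoring; the function names are illustrative only.

```python
from typing import List


def place(num_pods: int, num_nodes: int, capacity: int, strategy: str) -> List[int]:
    """Place pods one at a time and return the pod count on each node (toy model)."""
    nodes = [0] * num_nodes
    for _ in range(num_pods):
        # Only nodes that still have room are eligible.
        candidates = [i for i, n in enumerate(nodes) if n < capacity]
        if not candidates:
            raise RuntimeError("cluster is full")
        if strategy == "spread":
            # Least-loaded node first (ties go to the lowest index).
            target = min(candidates, key=lambda i: nodes[i])
        elif strategy == "pack":
            # Most-loaded node that still has room (ties go to the lowest index).
            target = max(candidates, key=lambda i: nodes[i])
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        nodes[target] += 1
    return nodes


def occupied(nodes: List[int]) -> int:
    """Number of nodes running at least one pod."""
    return sum(1 for n in nodes if n > 0)


print(occupied(place(34, 17, 7, "spread")))  # 17
print(occupied(place(34, 17, 7, "pack")))    # 5
```

In this model, spreading leaves a pod on every one of the 17 nodes, while packing fills 5 nodes (7, 7, 7, 7, 6) and leaves 12 nodes empty, which is the kind of node an autoscaler can remove.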

JupyterHub deals with this same scenario by packing user-notebook pods onto nodes with a custom userScheduler:
https://zero-to-jupyterhub.readthedocs.io/en/latest/administrator/optimization.html#using-available-nodes-efficiently-the-user-scheduler

@yuvipanda suggested that one possible solution is simply to reuse the JupyterHub user scheduler in the dask-kubernetes config. Some additional relevant docs are here:
https://kubernetes.io/docs/tasks/administer-cluster/configure-multiple-schedulers/#specify-schedulers-for-pods
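The Kubernetes page linked above documents that a pod selects its scheduler through `spec.schedulerName`, and that pods which do not set it are scheduled by `default-scheduler`, which is what the `kubectl` output at the top of this issue shows. A minimal sketch of that rule over pod manifests written as Python dicts; the name `my-hub-user-scheduler` is a hypothetical placeholder, since the real name of the JupyterHub user scheduler depends on how the Zero to JupyterHub chart was deployed.

```python
from typing import Any, Dict

DEFAULT_SCHEDULER = "default-scheduler"


def effective_scheduler(pod: Dict[str, Any]) -> str:
    """Return the scheduler that will handle a pod manifest given as a dict."""
    # spec.schedulerName selects the scheduler; if it is missing or empty,
    # Kubernetes falls back to "default-scheduler".
    return pod.get("spec", {}).get("schedulerName") or DEFAULT_SCHEDULER


# A worker pod that does not set schedulerName (today's behaviour).
plain_worker = {"kind": "Pod", "spec": {"containers": [{"name": "dask-worker", "image": "daskdev/dask"}]}}

# The same pod pointed at a packing scheduler (hypothetical name).
packed_worker = {
    "kind": "Pod",
    "spec": {
        "schedulerName": "my-hub-user-scheduler",
        "containers": [{"name": "dask-worker", "image": "daskdev/dask"}],
    },
}

print(effective_scheduler(plain_worker))   # default-scheduler
print(effective_scheduler(packed_worker))  # my-hub-user-scheduler
```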

@jacobtomlinson
Member

This sounds reasonable!

I guess the two steps here would be to expose schedulerName via the configuration and then document how users should configure things when running Zero2JupyterHub.

Does that sound right? Or is there anything else we should do here?
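One way the first step could look, sketched under assumptions: a hypothetical `scheduler-name` key in a nested config dict (not an existing dask-kubernetes option) whose value, when set, is copied into the worker pod template's `spec.schedulerName`. Leaving it unset keeps the Kubernetes default. The helper names and the scheduler name are illustrative.

```python
import copy
from typing import Any, Dict, Optional


def scheduler_name_from_config(config: Dict[str, Any]) -> Optional[str]:
    # Hypothetical config layout: {"kubernetes": {"scheduler-name": "..."}}
    return config.get("kubernetes", {}).get("scheduler-name")


def apply_scheduler_name(pod_template: Dict[str, Any], scheduler_name: Optional[str]) -> Dict[str, Any]:
    """Return a copy of the pod template with spec.schedulerName set, if a name was given."""
    pod = copy.deepcopy(pod_template)  # never mutate the caller's template
    if scheduler_name:
        pod.setdefault("spec", {})["schedulerName"] = scheduler_name
    return pod


template = {"kind": "Pod", "spec": {"containers": [{"name": "dask-worker", "image": "daskdev/dask"}]}}
config = {"kubernetes": {"scheduler-name": "my-hub-user-scheduler"}}  # hypothetical name

worker_pod = apply_scheduler_name(template, scheduler_name_from_config(config))
print(worker_pod["spec"]["schedulerName"])  # my-hub-user-scheduler
```

Copying rather than mutating keeps a shared default template reusable, and skipping the field when no name is configured preserves the current `default-scheduler` behaviour for existing users.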

@jacobtomlinson
Member

The classic KubeCluster was removed in #890. All users will need to migrate to the Dask Operator. Closing.

@jacobtomlinson closed this as not planned on Apr 30, 2024.