
Kubernetes, disable dask scheduler and workers auto rescheduling #112

Closed
VMois opened this issue Nov 21, 2018 · 18 comments

@VMois commented Nov 21, 2018

Hello,

The issue is more related to Kubernetes and GCP, but I would still like some advice. I have created a dynamic Dask Kubernetes cluster (using dask-kubernetes) on GCP with node autoscaling enabled. The initial state of the Dask cluster is one scheduler pod (running KubeCluster) and one worker pod (created by the scheduler). Everything works well, but when the scheduler starts adding new workers (due to high load) and GCP begins to scale up nodes, quite often the scheduler pod itself gets rescheduled: Kubernetes or GCP decides to delete the scheduler and recreate it on another node. Because of that, all tasks are lost, I receive an error, and the cluster becomes unstable. Have you ever experienced such behavior?

To tackle this problem, I added a nodeSelector to the scheduler pod definition, and it seems to work (at least it looks that way); a sketch of this is shown below. But the same situation happens to the workers: they are deleted and recreated, and you lose your results. For the workers, you cannot easily set up nodeSelector labels, because they are created dynamically.
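
Roughly, the pinning is just a nodeSelector in the scheduler Deployment's pod template; a minimal sketch, assuming a dedicated GKE node pool (the pool name, image, and args below are placeholders, not my actual manifest):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: dask-scheduler
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dask
      component: scheduler
  template:
    metadata:
      labels:
        app: dask
        component: scheduler
    spec:
      # Keep the scheduler on a dedicated node pool so the GKE autoscaler
      # is less likely to remove the node underneath it (pool name is an assumption)
      nodeSelector:
        cloud.google.com/gke-nodepool: scheduler-pool
      containers:
      - name: scheduler
        image: daskdev/dask:latest   # placeholder; I use my own image
        args: ["dask-scheduler"]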

It would be really great to have a property in the pod definition that says: "Don't delete/move this pod until it fails or succeeds." Would it make sense to add such functionality to the Kubernetes project?

Maybe you have ideas for a different solution?

Thank you for your attention.

@jacobtomlinson (Member)

That is an interesting problem! The way to solve this would be with pod disruption budgets on your scheduler. You should set your scheduler to require a minimum of 1 pod in the budget.
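
For example, a minimal budget of that shape, assuming the scheduler pod carries the app: dask and component: scheduler labels used elsewhere in this thread, would look something like:

apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: dask-scheduler
spec:
  # Keep at least one scheduler pod available during voluntary disruptions
  minAvailable: 1
  selector:
    matchLabels:
      app: dask
      component: scheduler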

@VMois (Author) commented Nov 22, 2018

I have already tried pod disruption budgets with minAvailable = 1 and it didn't work: Kubernetes just creates a second pod on another node and deletes the current one. It also doesn't solve the problem of worker rescheduling.

But I have checked the disruption budget docs one more time, more carefully, and found that you can set maxUnavailable=0 as the budget parameter (info). I will test it tomorrow. Thanks for the pointer.

@jacobtomlinson (Member)

I'm going to close this as it sounds like you have a solution. But please feed back here to let us know how you get on.

@VMois (Author) commented Nov 23, 2018

maxUnavailable=0 is not working. I will check the PDB one more time, but I doubt it will change anything.

Edit:
I tested maxUnavailable=0 one more time. The result is the same; the PDB is not working.

@jacobtomlinson (Member)

That's a shame! I would've expected that to work. Did you set both minAvailable and maxUnavailable at the same time?

Another option could be to explore stateful sets for the scheduler pod. But that would be a last resort.
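
Very roughly, that would mean something like the following (a sketch only; the headless dask-scheduler Service, image, and args are assumptions, not part of this project):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: dask-scheduler
spec:
  serviceName: dask-scheduler   # assumes a matching headless Service exists
  replicas: 1
  selector:
    matchLabels:
      app: dask
      component: scheduler
  template:
    metadata:
      labels:
        app: dask
        component: scheduler
    spec:
      containers:
      - name: scheduler
        image: daskdev/dask:latest   # placeholder image
        args: ["dask-scheduler"]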

@VMois (Author) commented Nov 23, 2018

No, I set only maxUnavailable. I can try setting both of them later.

Edit:
you cannot set both maxUnavailable and minAvailable at the same time. I have no idea why maxUnavailable is not working.

@VMois (Author) commented Nov 27, 2018

Is it possible to reopen the issue? The problem is still relevant. I think it's more related to Kubernetes, but it still affects dask-kubernetes.

@jacobtomlinson (Member)

I would like to get to the bottom of this problem, so let's reopen it.

But to be clear, this isn't a problem with dask-kubernetes, as the issue you are having is with your own custom scheduler pod, which is not created as part of this project.

I'm very surprised that setting maxUnavailable to 0 isn't working. Are you sure you are using the right labels, etc?

@VMois (Author) commented Nov 27, 2018

The scheduler pod is based on dask-kubernetes and my own image (link). When I tested maxUnavailable last time, I checked all the labels a few times, and my pod is part of a Deployment. Maybe I missed something. I will test it one more time today and post the results here. I'm also surprised that maxUnavailable is not working as expected.

Thanks for reopening this issue.

@VMois (Author) commented Nov 27, 2018

I have just tested it. Not working. I can clearly see in the GCP logs that the container was rescheduled, and my tasks failed. Does it make sense to open an issue in the Kubernetes project? I'm sure I have configured everything correctly. Example of the PDB:

apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: dask-scheduler
spec:
  maxUnavailable: 0%
  selector:
    matchLabels:
      component: scheduler
      app: dask

Part of the describe pod output for the scheduler:

Labels:         
  app=dask
  component=scheduler
  pod-template-hash=3761245660
  release=test

Edit:

It's also worth pointing out one more time that I'm using Google Kubernetes Engine with its autoscaler. Their docs mention that during scaling DOWN the autoscaler will respect PDBs and other limitations. Since my problem occurs during scaling UP, it may be that PDBs are not honored during scale up.

@jacobtomlinson (Member)

You may want to ask this somewhere more Kubernetes-related. Perhaps on Stack Overflow with the kubernetes and gke tags?

@VMois (Author) commented Nov 27, 2018

@mamoit commented Nov 27, 2018

You really have to set up a pod disruption budget for your one-pod deployment not to be rescheduled.

apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: dask-scheduler
spec:
  maxUnavailable: 0
  selector:
    matchLabels:
      component: scheduler
      app: dask

I don't have the % and don't know if it changes anything, and I'm using GKE too.
I had the same problem as you, but on scale down, and it went away when I added the PDB.

The correct way to solve this, though, is to change your deployment to a plain pod, since it doesn't seem that you are making use of any of the ReplicaSet properties and you are running single-pod deployments.

EDIT: The % doesn't seem to change anything.
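
As a sketch, replacing the Deployment with a bare pod would look roughly like this (same labels; the image and args are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: dask-scheduler
  labels:
    app: dask
    component: scheduler
spec:
  restartPolicy: Always   # restart in place instead of relying on a ReplicaSet
  containers:
  - name: scheduler
    image: daskdev/dask:latest   # placeholder image
    args: ["dask-scheduler"]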

@VMois (Author) commented Nov 30, 2018

@mamoit Still the same problem: the PDB is not respected during scale up, only scale down. Also, there is no difference between a Deployment and a Pod. The only solution I have found for the scheduler is to manually add a nodeSelector, but the problem still exists for the workers.

@jacobtomlinson (Member)

There is a difference between a deployment and a pod. If you create a pod that is not part of a deployment, the cluster scheduler will not move it around for fear of breaking it.

@mamoit commented Dec 11, 2018

@VMois The "not move it around for fear of breaking it" part that @jacobtomlinson mentioned is what I thought could help in your case.
Did you manage to solve this?

@VMois (Author) commented Dec 14, 2018

@mamoit I have not managed to solve it with a pod, as far as I remember :) The only working solution is still manually adding a nodeSelector. I will be able to test all your suggestions one more time next week.

@jacobtomlinson (Member)

As this is still ticking along and is not actually an issue with dask-kubernetes, but rather a Dask-on-Kubernetes use case, I'm going to close this again in favor of the Stack Overflow question.
