Update cpu and memory requests after cluster creation #88

Closed

jacobtomlinson opened this issue Jul 20, 2018 · 15 comments

@jacobtomlinson (Member)

On our Pangeo deployment users get a default worker-template.yaml, which allows them to create clusters by simply running cluster = KubeCluster() without having to worry about what Kubernetes even is.

However, on some occasions people want to update the memory and CPU of their workers depending on what they are running. The current workflow for this is either to specify the whole template as a dict, or to copy the default worker-template.yaml, understand it, update the values, and then use KubeCluster.from_yaml().
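For reference, that workflow looks roughly like this; the template values below are illustrative rather than our actual defaults:

from dask_kubernetes import KubeCluster

# Either load a copied-and-edited template file...
cluster = KubeCluster.from_yaml('my-worker-template.yaml')

# ...or spell out the whole pod spec as a dict.
cluster = KubeCluster.from_dict({
    'kind': 'Pod',
    'spec': {
        'restartPolicy': 'Never',
        'containers': [{
            'name': 'dask-worker',
            'image': 'daskdev/dask:latest',
            'args': ['dask-worker', '--nthreads', '2', '--memory-limit', '6GB'],
            'resources': {
                'limits': {'cpu': '2', 'memory': '6G'},
                'requests': {'cpu': '2', 'memory': '6G'},
            },
        }],
    },
})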

Personally I ended up writing a couple of helper functions in my notebook which look like this:

def update_worker_memory(cluster, new_limit):
    # Update the Kubernetes memory limit/request on the worker container
    container = cluster.pod_template.spec.containers[0]
    container.resources.limits["memory"] = new_limit
    container.resources.requests["memory"] = new_limit
    # Keep the dask-worker --memory-limit argument in sync with the pod limit
    if '--memory-limit' in container.args:
        index = container.args.index('--memory-limit')
        container.args[index + 1] = new_limit
    return cluster

def update_worker_cpu(cluster, new_limit):
    # Update the Kubernetes CPU limit/request on the worker container
    container = cluster.pod_template.spec.containers[0]
    container.resources.limits["cpu"] = new_limit
    container.resources.requests["cpu"] = new_limit
    # Reuse the CPU limit as the dask-worker thread count
    if '--nthreads' in container.args:
        index = container.args.index('--nthreads')
        container.args[index + 1] = new_limit
    return cluster

This allows me to adjust the worker template after the cluster has been created and all new workers will follow the updated values.
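In a notebook that looks something like this (values are illustrative):

cluster = KubeCluster()
cluster = update_worker_memory(cluster, '8G')
cluster = update_worker_cpu(cluster, 2)
cluster.scale(10)  # workers created from here on follow the updated template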

I'm considering how to add this functionality to the core project. I'm inspired by the dask-jobqueue SLURMCluster, which allows you to specify cores and memory as kwargs, so perhaps @mrocklin, @jhamman or @guillaumeeb have thoughts.

Before I go charging in to raise a PR I would like to discuss options.

  • Would it be useful to add methods to the KubeCluster object to update sizes after creation, as I am doing above?
  • Should we add kwargs to the cluster init, and if so, should they create the cluster and then use the helpers, or update the config before creation? (Both options are sketched below.)
  • Are there any other ways of specifying memory and cpu that I haven't captured in the examples above?
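To make the options concrete, here is a hypothetical sketch of both; none of these method or keyword names exist yet, they are just placeholders for discussion:

# Option 1: methods on the cluster object, along the lines of the helpers above
cluster = KubeCluster()
cluster.set_worker_memory('8G')  # placeholder name
cluster.set_worker_cpu(2)        # placeholder name

# Option 2: kwargs at creation time that override the default template
cluster = KubeCluster(memory='8G', cpu=2)  # placeholder kwargs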
@mrocklin (Member)

mrocklin commented Jul 20, 2018 via email

@guillaumeeb (Member)

I'd be happy to help here, but I don't really know how to... I've never used KubeCluster yet, but I'm planning to give it a go in the next few months. From my limited point of view, I'd like to be able to easily specify my worker size when initializing a cluster, so if this is not easily possible yet, an improvement would be welcome.

One thought: does or could this also affect the cloud VM flavor? Is that hard-coded somewhere else when defining the Kubernetes cluster? It would be good if, by specifying CPU, memory, and perhaps a flavor, the automatically created VMs were adapted to match.

@jacobtomlinson (Member, Author)

jacobtomlinson commented Aug 1, 2018

@guillaumeeb you currently specify the whole worker config when you initialise the cluster and can configure everything from resources to docker image.

The admin of the cluster can also specify a default config to use if the user doesn't specify one. This issue is discussing how to make tweaks to resources in the default config on the fly without having to specify a whole new config.

@guillaumeeb (Member)

@jacobtomlinson does Kubernetes choose the machine type by itself according to what is specified in the pod spec? Looking at http://dask-kubernetes.readthedocs.io/en/latest/#quickstart, I don't see anything like n1-standard-2 or another flavor.

Sorry if this is a dumb question, I still need to learn how Kubernetes works.

@jacobtomlinson (Member, Author)

jacobtomlinson commented Aug 1, 2018 via email

@guillaumeeb (Member)

So looking at core.py, currently the easiest way to do what you say is to use the following method:

pod_spec = make_pod_spec(image='daskdev/dask:latest',
                         memory_limit='4G', memory_request='4G',
                         cpu_limit=1, cpu_request=1,
                         env={'EXTRA_PIP_PACKAGES': 'fastparquet git+https://github.com/dask/distributed'})

That implies you understand all the rest of the spec.

Again, I've only a limited view of all this, but it sounds like it would be welcome functionality here. The way I would do it is to add your update methods (or an equivalent) and call them in __init__ with associated kwargs, just after pod_template is set up, somewhere around https://github.com/dask/dask-kubernetes/blob/master/dask_kubernetes/core.py#L174 or below.

Updating after creation looks like an edge case to me; this is not what we do in dask-jobqueue. But I can understand that in some situations you might want to do this.

@jacobtomlinson (Member, Author)

jacobtomlinson commented Aug 2, 2018

Yes, I agree that updating it after creation is an edge case. Perhaps a better way would be to allow users to call make_pod_spec with missing kwargs that get filled in from the defaults.

E.g.

pod_spec = make_pod_spec(memory_limit='4G', memory_request='4G',
                         cpu_limit=1, cpu_request=1)

In this example the image and extra packages would come from the default. There is no reason why a scientist should even know what a Docker image is.
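A rough sketch of what that could look like, assuming the admin's default template is available as a dict; the helper name and how the default gets loaded are hypothetical:

def make_pod_spec_from_defaults(default_template, memory_limit=None, memory_request=None,
                                cpu_limit=None, cpu_request=None):
    # Hypothetical helper: start from the admin-provided default worker template
    # and only override the resource values the user actually passes in.
    container = default_template['spec']['containers'][0]
    resources = container.setdefault('resources', {})
    limits = resources.setdefault('limits', {})
    requests = resources.setdefault('requests', {})
    if memory_limit is not None:
        limits['memory'] = memory_limit
    if memory_request is not None:
        requests['memory'] = memory_request
    if cpu_limit is not None:
        limits['cpu'] = cpu_limit
    if cpu_request is not None:
        requests['cpu'] = cpu_request
    return default_template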

@mturok

mturok commented Aug 14, 2018

Not sure it's such an edge case. It's possible to think about running a single dask scheduler for multiple tasks, each of which may have a slightly different set of arguments/variables.

@jacobtomlinson (Member, Author)

@mturok interesting point. However, the example I put above would modify the scheduler, which would not be great for multi-use clusters.

@yuvipanda (Collaborator)

As long as the pods get created with matching values, there should be no pathological cases.

@mamoit

mamoit commented Nov 27, 2018

Quick question that may be related to this issue: would it be possible to use multiple pod specs and scale them according to what the client requests?
Like the use case described here for pure dask.distributed.

client.submit(process, d, resources={'GPU': 1})

The main use case is to have some instances with GPUs and others without, and use them as needed.

I guess this would require another scheduling layer to fit the work in the smallest pool possible, plus some logic for when a pool is maxed out but the other one is not.
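For what it's worth, I think the tagging half of this can already be done with a single pod spec by adding --resources to the dask-worker args; an illustrative sketch follows, and it's the multiple-template scaling that seems to be missing:

# Illustrative: tag GPU workers with an abstract 'GPU' resource so that
# client.submit(..., resources={'GPU': 1}) only runs those tasks on them.
gpu_pod_spec = make_pod_spec(image='daskdev/dask:latest',
                             memory_limit='16G', memory_request='16G',
                             cpu_limit=2, cpu_request=2)
gpu_pod_spec.spec.containers[0].args += ['--resources', 'GPU=1']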

@jacobtomlinson (Member, Author)

That is an interesting idea but would involve upstream changes in dask and distributed. For now I would just create multiple clusters.

@mrocklin I'm sure you are very busy at the moment, but do you have any thoughts on this?

@guillaumeeb (Member)

We are thinking about this in dask/distributed#2118 and dask/distributed#2208 (comment).

But we haven't talked about the adaptive part yet ("scale them according to what the client requests"), which would probably need modifications too.

@mrocklin (Member)

I think this seems like a reasonable request. It would require additional logic in the Adaptive class that looks at resources when making requests. It's non-trivial work, but seems reasonably in scope.

@jacobtomlinson (Member, Author)

As work is ongoing in the issues that @guillaumeeb mentioned, and much of the scaling logic in dask-kubernetes has been replaced with SpecCluster from distributed, I'm going to close this in favour of dask/distributed#2118 in particular.
