Feature request: custom VHD or a way to prepull docker images offline #1532

Timvissers · 2020-03-30T06:47:59Z

Request: a way to pre-pull custom Docker images, so that they are already available on the worker node and do not have to be pulled after the node starts due to a scale-up event.

Context:
We have huge docker images (>10GB) which have already been optimized in size.
I tested pulling a Docker image from a premium ACR (from a geo-replicated location that was already in place), triggered by a Kubernetes job: it takes about 6 minutes.
We run kubernetes jobs for which it's crucial to start as soon as possible. We are dealing with +/- 2m scale up time for a new node, but we cannot deal with 6 extra minutes being lost on the docker pulling.

I saw the current VHD packer scripts, which are already prepulling docker images. This request is to bring this to the customer.

zhiweiv · 2020-03-31T03:13:00Z

We'd like this feature too, we have similar use case.

0x53A · 2020-03-31T16:02:09Z

This is especially important for Windows nodes

jluk · 2020-03-31T19:06:27Z

@Timvissers thanks for opening this request. Do I read correctly that the root problem is that image pull time is too long for your scenario? If so, could I retitle your request as "Reduce image pull-time on AKS scale"?

We have options to address that, such as integrating Project Teleport.
https://azure.microsoft.com/en-us/resources/videos/azure-friday-how-to-expedite-container-startup-with-project-teleport-and-azure-container-registry/

zhiweiv · 2020-04-01T04:04:23Z

I think for now, especially for Windows containers, pre-caching base images is the easiest and most stable way.

Timvissers · 2020-04-01T06:51:58Z

@jluk Thanks for your comment. I will investigate the teleport, I was unaware of this.

I would suggest not renaming the request to 'reduce pull-time'. Maybe it should be called 'customisable worker nodes' or similar?
Custom worker nodes have other uses besides offline pre-pulling of Docker images: installing extra exporters (Prometheus, for example), Filebeat collectors for logging, or other software that teams could use on worker nodes.

I think AKS is running a bit behind in the topic of worker node customization compared to other big cloud providers' managed kubernetes solutions.
I do see in AKS engine that packer is already in use, so the effort would be to just bring this to the customer.

jluk · 2020-04-02T17:27:05Z

@Timvissers the teleport integration requires a dependency chain to be unblocked, but it is a path we're investigating to reduce image pull time issues.

To clarify my previous ask to rename: I would like to understand the specific customization needs behind the ask for a BYO image scenario. Often the items needing customization at the OS level have alternative solutions on the existing OS, or we already plan to address the root problem (like slow image pull time via Teleport).

As for custom OS images for worker nodes that are managed by cloud providers: to my knowledge, none will give you actual support/management of customized nodes. AKS is quite clear on this, offering only a managed node that qualifies for true Azure support / on-call. Any full customization can be done with AKS-Engine, which does not provide support but does offer the full suite of customization you could hope for.

The support experience you will face varies widely if you try to get help on an unmanaged "BYO image" node from any provider. That being said, if you are comfortable with no support on a BYO image and acknowledge that you only get support on the control plane, are your requirements still met?

zhiweiv · 2020-04-03T01:42:02Z

Our requirement is relatively simple: pre-cache the .net/asp.net base images to improve the startup time of pods on scaled-up Windows nodes. Our workloads are all based on .net framework, and it takes a long time to pull and extract the base image.

The best approach: AKS provides additional Windows image SKUs with these base images out of the box. We can choose the SKU while creating the Windows pool.

The second approach: AKS provides the ability to BYO images, and we build images based on the official AKS images. Only the control plane is supported by Azure; we take care of the worker nodes ourselves.

Timvissers · 2020-04-03T04:21:00Z

Thank you for the extra insights, also about AKS-Engine. But currently I don't think AKS-Engine is the best option for me, for the following reasons:

- we will be running quite a few clusters in different regions and, if I understand correctly, this is not a managed solution. We currently lack knowledge of the control plane, so we are going for the managed control plane.
- this does not benefit from the cost-free master plane
- we have everything in Terraform, not ARM.

So, yes, I'm ok with no support on the data plane, but I'm not yet at the point that I'm ok with no support at the master plane.
In this case, I would take a supported base image and just add some docker pull statements in a Packer file, so the changes are minor. We have been doing this for 1.5 years on another cloud provider. We are planning to migrate to Azure, hence this feature request.
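
To make the "docker pull statements in a packer file" idea concrete, here is a minimal sketch of the provisioner part of such a template, generated with Python. The image reference and the `prepull_provisioner` helper are hypothetical, the builder section is deliberately left out, and this is not something AKS accepts today (that is what this issue asks for). It only illustrates how small the change on top of a base image would be, assuming Docker is installed and running on the image being baked.

```python
import json


def prepull_provisioner(images):
    """Build a Packer shell provisioner that pulls each image at bake time."""
    return {
        "type": "shell",
        # One inline command per image; assumes the build user can reach
        # the Docker daemon and is already logged in to the registry.
        "inline": [f"docker pull {image}" for image in images],
    }


# Hypothetical image reference, for illustration only.
IMAGES = ["myregistry.azurecr.io/team/gpu-job:1.0"]

# Only the provisioners section is shown; builders depend on the platform.
template_fragment = {"provisioners": [prepull_provisioner(IMAGES)]}

print(json.dumps(template_fragment, indent=2))
```

The printed fragment would have to be merged into a full template that also defines a builder for the target platform.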

I am open to alternative solutions, but for me there seems to be no easy one:

- Teleport is in preview
- overprovisioning and using init-containers to pull images, to have hot standby nodes for when customer jobs come in. Unfortunately, we are supporting expensive GPU nodes, so this would be unnecessarily expensive.
- smaller Docker images. But these are already heavily optimized, so there is not much to gain here.
- there is a possible way to start the nodes earlier and gain 1 minute. Context: the customer is uploading data. Once it is uploaded, we could determine the type of node (group) needed and already start a node. This would mean that we win about 1 minute of the time the new node spends booting and joining the cluster. But this effort is quite big for the possible gain.
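
For reference, the init-container option from the list above could look roughly like the sketch below: a DaemonSet whose init containers reference the large images so the kubelet pulls them on every node it lands on. All names and the image reference are hypothetical, the pause image tag is an assumption, and the `sh -c true` command assumes the image ships a shell. It only warms nodes that are already running, so it still implies paying for standby nodes.

```python
import json


def prepull_daemonset(name, images, pause_image="registry.k8s.io/pause:3.9"):
    """Build a DaemonSet manifest that forces each node to pull `images`."""
    init_containers = [
        {
            "name": f"prepull-{i}",
            "image": image,
            # Exit immediately; the only goal is to trigger the image pull.
            # Assumes the image contains a shell.
            "command": ["sh", "-c", "true"],
        }
        for i, image in enumerate(images)
    ]
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name},
        "spec": {
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "initContainers": init_containers,
                    # A tiny long-running container keeps the pod alive.
                    "containers": [{"name": "pause", "image": pause_image}],
                },
            },
        },
    }


# Hypothetical image reference, for illustration only.
manifest = prepull_daemonset(
    "image-prepuller", ["myregistry.azurecr.io/team/gpu-job:1.0"]
)
print(json.dumps(manifest, indent=2))
```

Since kubectl accepts JSON manifests, the output could be piped into `kubectl apply -f -`.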

Other options:

- not migrate to Azure if we consider this to be a blocker

My previous test results were for a Docker image of 13.5 GB, pulled on a node in westeurope, from a geo-replicated location (premium ACR) that was fully synced. It took 6 minutes. This is a really long time. If it were 1 minute, we probably wouldn't care and would not have this request. Maybe I should open a support request on why this takes so long (though I know that it's not only about network traffic, it's also about decompressing)?

jluk · 2020-04-03T16:18:48Z

Thanks for all the feedback - @mikkelhegn as FYI on the Windows caching requests from @zhiweiv. @zhiweiv if you were provided a BYO image scenario, would zero support of the data plane also be acceptable?

@Timvissers I'm assuming you're running quite a large Linux image, or is it Windows? A 13GB image taking ~5 minutes is about what I would expect. You are correct that the wait is incurred by both pull time and decompression.
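
As a rough back-of-the-envelope check on those numbers, the sketch below converts the figures quoted in this thread (about 13 GB, 5 to 6 minutes, and the 1 minute target) into an effective end-to-end rate. It assumes 1 GB = 1000 MB and treats 13 GB as the amount of data processed. The thread does not say whether that is the compressed or the on-disk size, and the rate lumps download and decompression together.

```python
def effective_throughput_mb_s(image_gb, minutes):
    """End-to-end rate (download plus extraction) in MB/s."""
    return image_gb * 1000 / (minutes * 60)


# 6 and 5 minutes are the observed/expected times; 1 minute is the target.
for minutes in (6, 5, 1):
    rate = effective_throughput_mb_s(13, minutes)
    print(f"{minutes} min -> {rate:.1f} MB/s")

# Prints:
# 6 min -> 36.1 MB/s
# 5 min -> 43.3 MB/s
# 1 min -> 216.7 MB/s
```

Under these assumptions, getting to the 1 minute mark would need roughly six times the observed effective rate.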

Thanks for confirming no support of the data plane is acceptable to you if you bring your own nodes, this is something we're open to discussing. Would you mind sharing other generic requirements you may have for customizing OS nodes, I read you mentioned additional OS logging/binaries?

zhiweiv · 2020-04-04T00:25:34Z

We are ok with zero support of data plane in BYO images scenario.

Timvissers · 2020-04-06T05:13:02Z

@jluk We use Linux on Standard_F8s_v2, 100 GB disk

github-actions · 2020-07-20T01:34:52Z

This issue has been automatically marked as stale because it has not had activity in 90 days. It will be closed if no further activity occurs. Thank you!

zhiweiv · 2020-07-20T01:38:07Z

Any update?

palma21 · 2020-08-12T01:00:57Z

It seems this thread is leaning a bit towards BYO image support, which is not something we're planning in the foreseeable future.

I've created a separate issue specifically for Teleport support, which is being worked on: #1785

ghost · 2021-02-13T16:01:10Z

Action required from @Azure/aks-pm

ghost · 2021-02-28T18:02:04Z