Support for Teleport on AKS #1785

palma21 · 2020-08-12T00:59:32Z

Support for container teleportation on AKS:
https://stevelasker.blog/2019/10/29/azure-container-registry-teleportation/

zhiweiv · 2020-08-12T01:18:10Z

@palma21
This issue is categorized as In Progress. Does this mean MS is actively working on it? Do you have an ETA for the preview?

palma21 · 2020-08-12T01:31:38Z

Yes, we are. No super concrete ETA yet; we will be able to provide one in a couple of weeks. It will definitely happen before the end of the year.

jeanfrancoislarente · 2020-10-02T10:49:47Z

Poke / prod @palma21

This one would be great in supporting our use case.

We have:

- over 12 clusters spread across various Azure regions
- a mix of Windows/Linux nodepools
- nodepools set to scale up/down
- deployments (install/remove) that happen fairly frequently (dev, QA, demos, etc.)
- deployments of 7 pods (4 Linux and 3 Windows)

Image pull time is killing us on a scale up

Thanks in advance for the update!

PixelRobots · 2020-10-02T12:58:22Z

Any update on this?

palma21 · 2020-10-14T23:42:02Z

We're targeting private preview in November.

bplasmeijer · 2020-10-15T11:07:17Z

> We're targeting private preview in November.

@jeanfrancoislarente and I would really like to test the preview, @palma21.

sanderaernouts · 2020-10-20T06:40:55Z

@palma21 we would definitely want to participate in the private preview provided it will include support for Windows containers as well. Our setup is quite similar to what the author of #1532 describes. We run Windows docker workloads on-demand on our AKS cluster and see that the pull time when updating images or scaling nodes is long (10-20 minutes depending on the exact image). Sounds like ACR Teleport is perfect for us 👍

jeanfrancoislarente · 2020-11-17T12:02:28Z

@palma21 - this is just my regular ~30 day ping. Have you guys been able to come up with an estimated timeline for the preview?

Thanks!

bplasmeijer · 2020-12-03T11:58:09Z

@palma21 scaling Windows nodes down and up and then pulling a Windows Server Core image can take ~8-10 minutes or longer.

bplasmeijer · 2020-12-22T20:43:24Z

any update @palma21 on the preview release?

andrewsali · 2020-12-25T13:31:45Z

Is it correct that Teleport will be based on Azure Fileshare premium as listed in the first comment on this issue?

If so, what kind of performance can be expected?

According to many reports (#223 (comment)), Azure Fileshare is not performant enough when it comes to small file / metadata intensive operations, which might be a limitation to mounting large container images.

Some guidance on the expected performance would be useful to know if it's worthwhile to plan to use this feature once available in preview.

JJ11teen · 2021-02-14T00:31:28Z

Any eta on when this will reach public preview?

kishorerv25 · 2021-04-26T18:33:06Z

Is there any roadmap for the rollout? We have a lot of Windows containers running with image sizes of 10+ GB. We are using the autoscale option, but the image download takes a lot of time. It's becoming one of our main issues.

miwithro · 2021-08-18T16:39:27Z

Public Preview is planned for Sept. 2021.

PixelRobots · 2021-08-18T16:43:19Z

@miwithro Is that including windows image support or just Linux?

miwithro · 2021-08-18T17:45:17Z

@PixelRobots it is for both.

PixelRobots · 2021-09-13T14:50:21Z

@miwithro We are nearly halfway through September. Any update on when this will be released to public preview?

miwithro · 2021-09-13T14:53:52Z

This has been pushed to October.

PixelRobots · 2021-09-13T14:55:02Z

Sad times. Can you share why?

miwithro · 2021-09-13T14:56:13Z

staffing.

bplasmeijer · 2021-11-03T19:41:24Z

> This has been pushed to October.

Let's make it happen.

InDieTasten · 2021-11-15T12:05:11Z

I can confirm that auto-scaling with large images (18Gi) is borderline impossible, as scaling up takes up to 30 minutes per node. During that period, the previous resources are completely overloaded, and errors occur due to CPU maxing out.

@miwithro Any updates on the staffing end?

damienpontifex · 2021-11-15T23:21:27Z

Seems GCP are trying to solve the same problem https://cloud.google.com/blog/products/containers-kubernetes/introducing-container-image-streaming-in-gke
Apologies for tangent on topic, but thought it good to be knowledgeable across different solutions to this problem

PixelRobots · 2021-11-24T19:56:05Z

Hey @miwithro any news on this? I could really do with it for some of my customers.

PixelRobots · 2021-12-17T11:43:38Z

Any update on this? I could really do with it. Having to pre pull over 200 images at the moment and not having fun.

george-zubrienko · 2022-01-03T15:51:11Z

Awesome feature, totally +1. Re staffing, the only thing I can say is that I can totally live without extended Windows container support, application ingress gateway, and AAD integration. But features like this are liquid gold, literally, considering we can't really utilize a new node that is busy pulling images, but we still pay for it from the first second of its availability :)

cailyoung · 2022-01-18T23:23:04Z

We're keen for this. We scale up from zero with 'large' (10+ GB) containers running workloads. Anything to speed it up!

johannordincab · 2022-03-14T11:51:54Z

@miwithro October is now 5 months past. Do you have an updated ETA?

mschumacher-syntellis · 2022-04-07T21:48:11Z

Any updates?

ocdi · 2022-04-12T11:55:18Z

I've been trying to work out a way to speed up scaling of Windows nodes. A fresh node coming online takes 20 minutes, which, as other people have said, happens while the servers are busy/overloaded. Not great. It means I need to scale up aggressively in anticipation that load may increase further.

Really would like to see this being a possibility as this seems like a great solution.

bplasmeijer · 2022-07-12T07:09:20Z

hi @palma21

Please prioritize this work item.

Windows containers get smaller every release, but ACR pulling needs improvements on Windows Containers AKS.

cc: @gkaleta @weijuans-msft @richlander @brasmith-ms @brendandburns

guidemetothemoon · 2022-07-26T10:56:57Z

I got a chance to reach out to the ACR and AKS team directly regarding this and their answer is unfortunately that there is no concrete ETA for this and it's not clear when a new ETA will be available 😟
"The item is still accurate, work in progress not further details at the moment. Once we have a date/more details [we] will add it [to the GitHub issue]."

InDieTasten · 2022-08-10T09:01:27Z

For anyone who's interested in working around this issue, Amazon's EKS (Elastic Kubernetes Service) does provide a way to inject pre-pulled/extracted images into node images. So when you scale up your nodes, the new nodes can have a number of pulled images already present on the machine. Source: https://aws.amazon.com/blogs/containers/speeding-up-windows-container-launch-times-with-ec2-image-builder-and-image-cache-strategy/

I would really like a similar option in AKS as well, where you could just specify as part of the node-pool, that certain images/layers should already be present on machines of these pools.

ender1598 · 2022-08-12T20:42:36Z

Happy 2 year anniversary of the issue! Hopefully this third year of progress is the most productive. 😎

efzn · 2022-08-12T20:57:21Z

> Happy 2 year anniversary of the issue! Hopefully this third year of progress is the most productive. 😎

😂

sumitkute · 2022-09-28T05:07:04Z

One of my customers is facing the same issue with a 15 GB image. They want to cache it or store it in the node image so that a new VM coming up in the VMSS already has the Docker image cached. AWS has this; we need an alternative in AKS.

jrauschenbusch · 2022-11-28T19:10:40Z

While this is far from an ideal solution, in the meantime someone could use the following approach.

But AKS support for Teleportation would definitely be the better option.

InDieTasten · 2022-11-29T11:10:16Z

> While this is far from an ideal solution, in the meantime someone could use the following approach.

@jrauschenbusch That works only for cases where nodes exist long before any pods need to be scheduled there, which isn't the case for most of us. We want the teleportation feature because we are scaling out new nodes and want to deploy large-image pods immediately. I don't want to pay for nodes to sit around and wait for load to increase. If I have the node lying around, I could just as easily scale out my replicaset or statefulset directly and force the node to pull the image that way.

If we have multiple deployments utilizing these nodes I see how this could help. If deployment A needs a lot of resources it can schedule on some nodes. The nodes also pull images for deployment B. If load shifts from A to B, then the B deployment can scale quickly, but only if A also decreases. Seems like an uncommon use case :/

InDieTasten · 2022-11-29T11:15:32Z

As of today, missing support for teleportation undermines the entire auto-scaling nodes feature for windows nodes, which is arguably one of the most important features for a cloud-based managed k8s cluster that's supposed to support Windows.

jrauschenbusch · 2022-11-29T13:15:34Z

@InDieTasten This is of course true to a certain degree. But it eases the pain a bit when you have a lot of daemon set pods running on the nodes that are a prerequisite for your workloads. Then the pre-pull daemon set pod can already start pulling the image before your workload is scheduled on the node. As I already mentioned, this is far from the ideal solution, but it may help some people as long as Teleport support is not ready to use.
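
For readers unfamiliar with the pre-pull daemon set idea discussed here, the following is a minimal sketch of one common way to do it, not necessarily the approach originally linked in this thread. It generates a DaemonSet manifest whose init containers reference the images to pull and exit immediately, followed by a pause container that keeps the pod alive. The names `image-prepull` and `myregistry.azurecr.io/big-app:1.0` are hypothetical, the pause image tag is an assumption, and the `sh` command assumes Linux images that contain a shell (Windows images would need a different command and a node selector).

```python
import json


def prepull_daemonset(name, images, pause_image="registry.k8s.io/pause:3.9"):
    """Build a DaemonSet manifest that makes each node pull `images`."""
    init_containers = [
        {
            "name": f"prepull-{i}",
            "image": image,
            # Exit right away: the only goal is to make the kubelet pull the image.
            # Assumption: the image ships a POSIX shell.
            "command": ["sh", "-c", "exit 0"],
        }
        for i, image in enumerate(images)
    ]
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name},
        "spec": {
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "initContainers": init_containers,
                    # A tiny long-running container keeps the pod in Running state.
                    "containers": [{"name": "pause", "image": pause_image}],
                },
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical image reference, for illustration only.
    manifest = prepull_daemonset(
        "image-prepull", ["myregistry.azurecr.io/big-app:1.0"]
    )
    print(json.dumps(manifest, indent=2))
```

Since `kubectl` accepts JSON manifests, the output could be piped to `kubectl apply -f -`. As noted in the discussion, this only helps when the node exists for some time before the workload is scheduled on it; it does not shorten the pull itself.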

jrauschenbusch · 2023-01-13T10:18:15Z

@palma21 Will there be any progress on this topic or is it cancelled?

vaibhav-dhawan · 2023-04-17T13:26:53Z

For anyone else who had a hard time finding this: https://github.com/Azure/acr/blob/main/docs/teleport/aks-getting-started.md , this has a guide for requesting access to the private preview

bplasmeijer · 2024-06-03T13:05:15Z

@justindavies @palma21 any update on windows?

palma21 created this issue from a note in Azure Kubernetes Service Roadmap (Public) (In Progress (Development)) Aug 12, 2020

palma21 mentioned this issue Aug 12, 2020

Feature request: custom VHD or a way to prepull docker images offline #1532

Open

palma21 added the feature-request Requested Features label Aug 12, 2020

bplasmeijer mentioned this issue Oct 2, 2020

Pull Windows images can take a while, also on autoscale #1677

Closed

dgkanatsios mentioned this issue Oct 12, 2021

Large game server container images PlayFab/thundernetes#13

Open

hieumoscow mentioned this issue Oct 14, 2021

AKS should cache container images that are repeatedly requested #2594

Closed

pavneeta assigned palma21 Jul 18, 2022

jrauschenbusch mentioned this issue Apr 19, 2023

Container Teleportation Azure/acr#418

Closed

allyford mentioned this issue Sep 28, 2023

[Feature] Artifact Streaming #3928

Open

allyford removed this from In Progress (Development) in Azure Kubernetes Service Roadmap (Public) Oct 11, 2023

Jamie0 mentioned this issue Jan 11, 2024

[BUG] ReImaging a Windows node doesn't correctly wait for the temporary node to bootstrap #4047

Open

microsoft-github-policy-service bot added the action-required label Feb 2, 2024

bplasmeijer mentioned this issue Jun 6, 2024

[Feature] Artifact Streaming for Windows node pools #4269

Open
