Support for Teleport on AKS #1785
Comments
@palma21 |
Yes we are. No super concrete ETA yet, will be able to provide in a couple of weeks, it will definitely happen before the end of the year. |
Poke / prod @palma21 This one would be great in supporting our use case. We have
Image pull time is killing us on scale-up. Thanks in advance for the update! |
Any update on this? |
We're targeting private preview in November. |
@jeanfrancoislarente, and I would really like to test the preview @palma21 |
@palma21 we would definitely want to participate in the private preview provided it will include support for Windows containers as well. Our setup is quite similar to what the author of #1532 describes. We run Windows docker workloads on-demand on our AKS cluster and see that the pull time when updating images or scaling nodes is long (10-20 minutes depending on the exact image). Sounds like ACR Teleport is perfect for us 👍 |
@palma21 - this is just my regular ~30 day ping. Have you guys been able to come up with an estimated timeline for the preview? Thanks! |
@palma21 Scaling Windows nodes down and up and then pulling a Windows Server Core image can take ~8-10 minutes or longer. |
any update @palma21 on the preview release? |
Is it correct that Teleport will be based on Azure Fileshare premium as listed in the first comment on this issue? If so, what kind of performance can be expected? According to many reports (#223 (comment)), Azure Fileshare is not performant enough when it comes to small file / metadata intensive operations, which might be a limitation to mounting large container images. Some guidance on the expected performance would be useful to know if it's worthwhile to plan to use this feature once available in preview. |
Any ETA on when this will reach public preview? |
Is there a roadmap for the rollout? We have a lot of Windows containers running with image sizes of 10+ GB. We are using the autoscale option, but the image download takes a lot of time. It's becoming one of our main issues. |
Public Preview is planned for Sept. 2021. |
@miwithro Is that including windows image support or just Linux? |
@PixelRobots it is for both. |
@miwithro We are nearly halfway through September. Any update on when this will be released to public preview? |
This has been pushed to October. |
Sad times. Can you share why? |
staffing. |
Let's make it happen. |
I can confirm that auto-scaling with large images (18Gi) is borderline impossible, as scaling up takes up to 30 minutes per node. During that period, the previous resources are completely overloaded, and errors occur due to CPU maxing out. @miwithro Any updates on the staffing end? |
Seems GCP are trying to solve the same problem https://cloud.google.com/blog/products/containers-kubernetes/introducing-container-image-streaming-in-gke |
Hey @miwithro any news on this? I could really do with it for some of my customers. |
Any update on this? I could really do with it. Having to pre pull over 200 images at the moment and not having fun. |
Awesome feature, totally +1. Re staffing, only thing I could say, I can totally live without extended windows container support, application ingress gateway, AAD integration. But features like this, that's liquid gold, literally, considering we can't really utilize a new node that is busy pulling images, but we still pay for it from first second of its availability :) |
We're keen for this. We scale up from zero with 'large' (10+ GB) containers running workloads. Anything to speed it up! |
@miwithro October is now 5 months past. Do you have an updated ETA? |
Any updates? |
I've been trying to work out a way to speed up scaling of Windows nodes. A fresh node takes 20 minutes to come online, and as other people have said, that happens while the existing servers are busy or overloaded, which is not great. It means I need to scale up aggressively in anticipation that load may increase further. I would really like to see this become a possibility, as it seems like a great solution. |
hi @palma21 Please prioritize this work item. Windows containers get smaller every release, but ACR pulling needs improvements on Windows Containers AKS. cc: @gkaleta @weijuans-msft @richlander @brasmith-ms @brendandburns |
I got a chance to reach out to the ACR and AKS team directly regarding this and their answer is unfortunately that there is no concrete ETA for this and it's not clear when a new ETA will be available 😟 |
For anyone who's interested in working around this issue, Amazon's EKS (Elastic Kubernetes Service) does provide a way to inject pre-pulled/extracted images into node images. So when you scale up, the new nodes can have a number of pulled images already present on the machine. Source: https://aws.amazon.com/blogs/containers/speeding-up-windows-container-launch-times-with-ec2-image-builder-and-image-cache-strategy/ I would really like a similar option in AKS as well, where you could just specify as part of the node pool that certain images/layers should already be present on the machines in that pool. |
Happy 2 year anniversary of the issue! Hopefully this third year of progress is the most productive. 😎 |
One of my customers is facing the same issue with a 15 GB image. They want to cache it or store it in the node image so that a new VM coming up in the VMSS already has the Docker image cached. AWS has it; we need an alternative in AKS. |
While this is far from an ideal solution, in the meantime someone could use the following approach. But AKS support for Teleportation would definitely be the better option. |
@jrauschenbusch That works only in cases where nodes exist long before any pods need to be scheduled on them, which isn't the case for most of us. We want the teleportation feature because we are scaling out new nodes and want to deploy pods with large images immediately. I don't want to pay for nodes to sit around and wait for load to increase. If I had the node lying around, I could just as easily scale out my ReplicaSet or StatefulSet directly and force the node to pull the image that way. I can see how this could help if multiple deployments use these nodes: if deployment A needs a lot of resources, it can schedule on some nodes, and those nodes also pull images for deployment B. If load shifts from A to B, then deployment B can scale quickly, but only if A also decreases. Seems like an uncommon use case :/ |
As of today, missing support for teleportation undermines the entire node auto-scaling feature for Windows nodes, which is arguably one of the most important features for a cloud-based managed k8s cluster that's supposed to support Windows. |
@InDieTasten This is of course true to a certain degree. But it eases the pain a bit when you have a lot of daemon set pods running on the nodes that are a prerequisite for your workloads. Then the pre-pull daemon set pod can start pulling the image before your workload is scheduled on the node. As I already mentioned, this is far from an ideal solution, but it may help some people until teleport support is ready to use. |
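For readers who want to try the pre-pull daemon set idea discussed above, here is a minimal sketch of what such a manifest could look like, generated from Python so the image list is easy to change. It is an illustration under stated assumptions, not the approach linked in the thread: the function name `prepull_daemonset`, the image names, the no-op commands, and the `registry.k8s.io/pause:3.9` keep-alive image are all hypothetical choices. Each image is listed as an init container that exits immediately, so the kubelet has to pull it as soon as the node joins.

```python
import json


def prepull_daemonset(name, images, node_selector=None, noop_command=None):
    """Build a DaemonSet manifest (as a dict) that forces each node to pull `images`.

    Every image becomes an init container that runs a no-op command and exits,
    so the only lasting effect is that the image ends up in the node's cache.
    """
    # Assumption: the images contain a shell that can run this command.
    noop_command = noop_command or ["sh", "-c", "exit 0"]
    init_containers = [
        {"name": f"prepull-{i}", "image": image, "command": list(noop_command)}
        for i, image in enumerate(images)
    ]
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name},
        "spec": {
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {
                    "nodeSelector": dict(node_selector or {}),
                    "initContainers": init_containers,
                    # Assumption: a small pause image keeps the pod alive
                    # once all init containers have finished.
                    "containers": [
                        {"name": "pause", "image": "registry.k8s.io/pause:3.9"}
                    ],
                },
            },
        },
    }


if __name__ == "__main__":
    manifest = prepull_daemonset(
        name="image-prepull",
        images=["myregistry.azurecr.io/big-app:1.0"],  # hypothetical image
        node_selector={"kubernetes.io/os": "windows"},
        noop_command=["cmd", "/c", "exit 0"],  # no-op for Windows images
    )
    # kubectl accepts JSON as well as YAML manifests.
    print(json.dumps(manifest, indent=2))
```

The printed JSON could be piped to `kubectl apply -f -`. As noted in the comments above, this only helps when the node has time to pull before the real workload is scheduled; it does not shorten the pull itself.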
@palma21 Will there be any progress on this topic or is it cancelled? |
For anyone else who had a hard time finding this: https://github.com/Azure/acr/blob/main/docs/teleport/aks-getting-started.md , this has a guide for requesting access to the private preview |
Support for container teleportation on AKS:
https://stevelasker.blog/2019/10/29/azure-container-registry-teleportation/