AKS should cache container images that are repeatedly requested #2594
Hi eugen-nw, AKS bot here 👋 I might be just a bot, but I'm told my suggestions are normally quite good, as such:
Hi @eugen-nw, can you describe your use case further? Currently the container image does get cached on the node after the first pull until that node is restarted by an upgrade.
Thanks very much for looking into this issue! It is fantastic to learn that AKS already caches container images. As I suggested earlier, the cache lifetime would ideally be only 24 hours, so that the ACR secret's expiration can kick in and disable those downloads. Our scenario is a bit different from the norm. We run only the Virtual Kubelet on AKS, and it creates our Windows containers in ACI. Recently we had a problem with containers stuck for weeks in a "Failed" state in ACI because the pull from ACR could not complete within N minutes. We use Kubernetes' HPA to scale out the number of containers based on the count of messages received in Azure Message Queues, and we scale out heavily, several times a day, up to 50 running containers. We keep 7 containers running at all times in order to respond quickly to demand.
Another slightly different alternative for you to consider could be deallocate scale-down with the cluster autoscaler. The cluster autoscaler will respond to pending-pod pressure from HPA and start/deallocate VMs as necessary, and deallocated VMs keep your images preserved. https://azure.microsoft.com/en-us/updates/public-preview-scaledown-mode-in-aks/
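For readers who want to try the suggestion above, Scale-down Mode can be set per node pool with the Azure CLI; a minimal sketch, assuming the feature is available in your subscription and using placeholder resource-group, cluster, and pool names:

```shell
# Switch an existing node pool to Deallocate scale-down mode so that
# scaled-down VMs are stopped (deallocated) rather than deleted,
# keeping their OS disks and already-pulled container images intact.
# "myResourceGroup", "myAKSCluster", and "nodepool1" are placeholders.
az aks nodepool update \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name nodepool1 \
  --scale-down-mode Deallocate
```

Deallocated VMs still incur storage costs for their disks, but restarting one is much faster than provisioning a fresh node and re-pulling every image.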
That is fantastic, thanks very much for the info! But does it work with ACI-hosted containers?
Thank you, Eugen
This will not work for us because we are not using AKS nodes to run the containers; we use the Virtual Kubelet to run them in ACI instances.
@miwithro @justindavies I don't think we can fix this on the AKS side; perhaps one of you could relay this ask to the ACI/Virtual Kubelet folks?
I tend to think the caching should happen on the ACI side. It would also require some coordination with the Virtual Kubelet on this matter.
Hi, the image caching is not working for you because the Virtual Kubelet uses ACI to run containers, and ACI does not do any caching: it pulls the image from ACR every time, which makes the whole process slow. We faced exactly the same issue while using ACI directly (not via the Virtual Kubelet). We are now thinking of moving to AKS + KEDA to achieve faster start times. Project Teleport is also coming to AKS first, with no plans for ACI as of today. I am not sure why you are using the Virtual Kubelet if you always want 7 containers running; a better approach would be to keep a few nodes running in AKS and scale out on demand.
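To illustrate the AKS + KEDA approach suggested above, here is a hedged sketch of a KEDA v2 ScaledObject that scales a Deployment on Azure Service Bus queue length; the deployment name, queue name, and environment variable are placeholders, not anything from this thread:

```shell
# Hypothetical example: scale a "worker" Deployment between 7 and 50
# replicas based on the backlog of an Azure Service Bus queue,
# mirroring the 7-always-on / 50-peak numbers discussed above.
# Assumes KEDA v2 is installed and the pod exposes its Service Bus
# connection string in the SERVICEBUS_CONNECTION env var.
kubectl apply -f - <<'EOF'
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaler
spec:
  scaleTargetRef:
    name: worker             # placeholder Deployment name
  minReplicaCount: 7         # always-on baseline
  maxReplicaCount: 50        # scale-out ceiling
  triggers:
    - type: azure-servicebus
      metadata:
        queueName: work-items          # placeholder queue name
        messageCount: "5"              # target messages per replica
        connectionFromEnv: SERVICEBUS_CONNECTION
EOF
```

With this in place, KEDA drives the HPA for you: when the queue drains, the Deployment shrinks back toward the minimum replica count.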
@huzefaqubbawala Thanks very much for having looked into this! As I wrote above, we are scaling out from 7 containers running constantly to 50 containers at a time. Our reasons for running all the containers in ACI are:
I have not used KEDA yet. As far as I know, it creates a new container in response to each incoming request. If you host the KEDA-generated containers on AKS VM nodes, I wonder what happens in the scenario where the count of incoming calls is so high that AKS runs out of resources and can no longer create containers for KEDA. Do those calls get queued up, or are they lost?
First, for scaled-out scenarios caching is not possible even in AKS, since new VMs are spun up dynamically and a first-time pull from ACR is required. Project Teleport will eventually solve this problem by reducing the time it takes to pull from the registry in AKS. Second, why do you need your scaled-out containers to run on ACI? You can use AKS alone to run them with the cluster autoscaler, and it will also scale down when there are no messages (you only pay for your usage). Assuming you need 7 containers running at all times and the rest on load, you can configure your AKS like below: Nodepool - Cluster Autoscaler -> min node 2 - max node 10 (depends on what your containers need). In the above case, when there are thousands of messages in Azure Service Bus, the cluster autoscaler will scale up to its maximum of 10 nodes. Even with ACI you have a quota that needs to be respected and you cannot create more containers beyond it; plus, you would need an external orchestrator to create containers in ACI.
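The node-pool configuration described above can be expressed with the Azure CLI; a sketch with placeholder resource names, assuming an existing cluster and node pool:

```shell
# Enable the cluster autoscaler on an existing node pool with the
# min 2 / max 10 bounds suggested above.
# "myResourceGroup", "myAKSCluster", and "nodepool1" are placeholders.
az aks nodepool update \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name nodepool1 \
  --enable-cluster-autoscaler \
  --min-count 2 \
  --max-count 10
```

The exact min/max bounds depend on how many containers fit per node; size the VM SKU first, then set the counts.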
This issue has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs within 15 days of this comment.
This issue will now be closed because it hasn't had any activity for 7 days after going stale. eugen-nw, feel free to comment again in the next 7 days to reopen, or open a new issue after that time if you still have a question, issue, or suggestion.
Downloading each image time and again from ACR is a waste of cycles. Keep the cached images for only 1 day so the ACR secret’s expiration can kick in and disable the ACR downloads. In our solution we do a lot of on-demand scale-out and the suggested improvement would help the scaled-out containers to start faster.