[Fargate/ECS] [Image caching]: provide image caching for Fargate. #696
Comments
@matthewcummings can you clarify which doc you're talking about ("The doc is horrific")? Can you also clarify which regions your Fargate tasks and your ECR images are in? |
@jtoberon can we have these kinds of things in every region? I generally use us-east-1 and us-west-2 these days. |
The doc seems better now: https://docs.aws.amazon.com/AmazonECR/latest/userguide/vpc-endpoints.html. It has been updated from what I can see. However, it still feels like a leaky abstraction. I'd argue that I shouldn't need to know or think about S3 here; nowhere else in the ECS/EKS/ECR ecosystem do we really see mention of S3. It would be great if the S3 details could be "abstracted away". |
Regarding regions, I'm really asking whether you're doing cross-region pulls. You're right: this is a leaky abstraction. The client (e.g. docker) doesn't care, but from a networking perspective you need to poke a hole to S3 right now. Regarding making all of this easier, we plan to build cross-region replication, and we plan to simplify the registry URL so that you don't have to think as much about which region you're pulling from. #140 has more details and some discussion. |
Ha ha, thanks. Excuse my snarkiness. . . I am not doing cross-region pulls right now but that is something I may need to do. |
@jtoberon your call on whether this should be a separate request or folded into the other one. |
Wait, aren't you really asking for a configurable image pull behavior? This was added (it seems) to ECS on EC2 in 2018: see the agent config docs. I get the impression Fargate does not give control over that setting. |
@ronkorving yes, that's exactly what I've requested. I wasn't aware of the ECS/EC2 feature. . . thanks for pointing me to that. However, a Fargate option would be great. I'm going to update the request. |
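For reference, the EC2-backed caching discussed above is controlled through the ECS container agent configuration on the container instance. A minimal sketch (setting name per the agent config docs linked above; not available on Fargate, which is the point of this issue):

```
# /etc/ecs/ecs.config on the container instance (EC2 launch type only)
ECS_IMAGE_PULL_BEHAVIOR=prefer-cached
```

With `prefer-cached`, the agent skips the pull entirely when the image is already on the instance, which is what gives the ~1-3 second launch times mentioned later in this thread.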
This caching option for Fargate is much needed indeed. |
I would like to upvote this feature too. |
How's this evolving? There are many use cases where what you need is just a Lambda with unrestricted access to a kernel / filesystem. Having Fargate with cached / hot images perfectly fits this use case. |
@jtoberon @samuelkarp I realize that this is a more involved feature to build than it was on ECS with EC2, since the instances change underneath across AWS accounts, but are you able to provide any timeline on if and when this image caching would be available in Fargate? Lambda eventually fixed this same cold-start issue with its short-term cache; this request is for the direct analog in Fargate.

Our use case: we run containers on demand when our customers initiate an action and connect them to the container that we spin up, so it's a real-time use case. Right now we run these containers on ECS with EC2, and the launch times are perfectly acceptable (~1-3 seconds) because we cache the image on the EC2 box. We'd really like to move to Fargate, but our testing shows our Fargate containers spend ~70 seconds in a pending state before they start running.

We have to make some investments in this area soon, so I am trying to get a sense of how much we should invest in optimizing our current EC2-based setup, because we absolutely want to move to Fargate as soon as this cold-start issue is resolved. As always, thank you for your communication. |
I wish Fargate could have some sort of caching. Due to a missing environment variable, my task kept failing all weekend, and every restart meant a new image was downloaded from Docker Hub. In the end I faced horrible traffic usage, since Fargate had been deployed within a private VPC. |
@Brother-Andy For this use-case, I built cdk-ecr-sync which syncs specific images from DockerHub to ECR. Doesn't solve the caching part but might reduce your bill. |
Ditto on the feature. We use containers to spin-off cyber ranges for students. Usage can fluctuate from 0 to thousands, Fargate is the best solution for ease of management, but the launch time is a challenge even with ECR. Caching is a much-needed feature. |
+1 |
Same here, I need to run multiple Fargate cross-region and it takes around a minute to pull the image. Once pulled, the task only takes 4 seconds to run. This completely stops us from using Fargate. |
We had the same problem: the Fargate task should take only 10 seconds to run, but it takes about a minute to pull the image :( |
Would it be possible to store the image on an EFS file system and have the task just run it from there? Or does that move the same problem to pulling from EFS onto the host that runs the container? |
Azure is solving this problem on their platform. |
+1. We run a very large number of tasks with a ~1 GB image. This would significantly speed up our deploys and would be a super helpful feature. We're considering moving to EC2 due to Fargate deployment slowness, and this is one of the factors. |
Currently using the GitLab Runner Fargate driver, which is great except for the spin-up time: ~1-2 minutes for our image (>1 GB), because it has to pull it from ECR for every job. Not super great. Would really like to see some sort of image caching. |
@trivedisorabh I believe cross region replication is what you're looking for. https://aws.amazon.com/blogs/containers/cross-region-replication-in-amazon-ecr-has-landed/ |
Would love an update on this feature! I'm very excited to reduce my organization's Fargate cold start times. The impact will be significant. |
Would also love an update, have been refreshing this issue once a week for 2 years hoping to hear of some progress. Would be a massive game changer. |
Joining the waiting room for this one. |
I am here because our NAT gateway costs went up. Use Fargate they said, it will be easier they said. |
@thereforsunrise if that's your only pain with the lack of caching you may benefit from defining VPC endpoints for ECR & S3 + host your docker images on ECR. This way you will not pay for traffic passing through NAT gateways due to image pull upon task launch. ref.: |
@maxgashkov Yes we're enabling the VPC endpoints and looking at the pull-through caching options for DockerHub too. |
If you are using cross-region replication, cost should not be a problem. The problem is the time it takes to pull the image, which is not acceptable compared with the Kubernetes products from their competitors. Cross-region replication: https://aws.amazon.com/blogs/containers/cross-region-replication-in-amazon-ecr-has-landed/ |
Thanks for your continued interest. We hear you, and we are working on a couple of specific areas to address the requirements discussed in this issue. We wanted to take the opportunity to share how we are thinking about the problem and what to expect.

In the context of this specific feature request, our ultimate goal is to provide mechanisms that reduce launch times. Caching is one approach to this problem, but as pointed out in a previous update, this is not an easy problem to solve in Fargate, as every new task is run on a freshly booted and patched EC2 instance. While caching remains an area of investigation, we are also working towards some alternative approaches to achieve the same goal of reducing launch times.

The first is an easy grab, but it involves a specific build workflow to compress images using zstd. Nothing is required server-side (e.g. in Fargate), and pull-time improvements vary depending on the type of image. This mechanism is available today for use with Fargate, and we have recently published a blog post that gets into the details.

Another approach we are working on to reduce pull times is the concept of lazy loading. The idea here, in a nutshell, is to keep pulling the image at every launch in the background but to start the container as early as possible. Loading only the essential elements needed to start means that the container can be started before the pull is completed. You may have seen we recently launched the soci-snapshotter open source project, which is at the core of this idea. You can also read more about it in this what's new post. One of our goals is to make this technique available to Fargate customers transparently, without changing your current workflows, and thus making it work for all existing container images. We don't yet have timelines to share, but we expect this will be made available before an image caching feature specific to Fargate. 
Just like with the zstd technique, we expect improvements in pull times to vary depending on the image size and type. As others have noted, ECR does offer features to help minimize transfer costs, which is a bit unrelated to improving Fargate launch times but important to call out. One thing to consider for private networks is the use of VPC endpoints to avoid unnecessary NAT Gateway charges, as discussed by @magoun and @alexjeen. For many use cases, ECR Replication can be used to minimize cross-region transfer costs. For images that are stored upstream in public registries, ECR Pull Through Cache may work for you. We have ongoing work to increase the use cases for PTC, including authentication for additional upstreams, and would be happy to hear from you. |
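For anyone curious what the soci-snapshotter workflow mentioned above looks like in practice, here is a dry-run sketch. The `soci create`/`soci push` command shape is my recollection of the project's README, so treat it as an assumption and check the repo; the image reference is a placeholder.

```shell
#!/bin/sh
# Dry-run sketch: print the soci CLI invocations that would build and push a
# SOCI index for an image already in containerd's content store.
# Command shape is an assumption; the image reference is a placeholder.
IMAGE="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest"
echo "soci create $IMAGE"  # build the index from the image's layers
echo "soci push $IMAGE"    # push the index to the registry next to the image
```

The index lives alongside the image in the registry, which is what lets a runtime start the container before the full pull completes.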
Is there any update on this, or am I asking too early? |
Has anyone had much luck with zstd? I haven't found the time to try it out yet, but if anyone else has, I'd be interested to know how much difference it made. |
Hello @MattFellows, I have been on a quest to optimize my company's ECS Fargate containers to bring down the startup time. @mreferre told me about this blog post, where he uses buildx to build and push the container image to ECR using the new zstd compression. I saw a 25% reduction in startup time in ECS, and the image size decreased by 50%. Also take a look at this blog. As a final tip, follow Docker best practices by reducing the number of instructions in the Dockerfile; each instruction creates another layer in the final image, which increases the size and therefore the startup time. |
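For reference, a dry-run sketch of the kind of buildx invocation the blog post describes. The registry, tag, and compression level here are placeholders, not the exact values from the post.

```shell
#!/bin/sh
# Dry-run sketch: print a buildx invocation that builds and pushes a
# zstd-compressed image. Registry, tag, and compression level are
# placeholders worth tuning for your own image.
IMAGE="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest"
echo "docker buildx build . \
  --tag $IMAGE \
  --output type=image,push=true,compression=zstd,compression-level=3,force-compression=true"
```

Note `force-compression=true`: without it, layers that already exist gzip-compressed may be reused as-is instead of being recompressed with zstd.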
P.S.: see this for more information on the Docker instructions I'm talking about. I also forgot to mention that the buildx setup broke my WSL and took a bit to fix. |
Is there any update on this thread? |
2023 , any roadmap or news ? |
I found out I cannot pull these images on my M1 Mac, so I disabled the compression; I prefer being able to pull production images over having them 10-20% smaller. Besides, out of the 2+ minutes it takes for the pod to come online, downloading the image is not the worst part; the worst is registering the Fargate pod in the VPC (at least 60 seconds). |
Podman supports pulling zstd images |
@Maxwell2022 which tool did you use to pull zstd images on Mac? If you used docker, it didn't support zstd until the most recent release v23.0.0. Another option on Mac (apart from podman) is finch, you may give it a try. |
The v23.0.0 docker engine (open source name is Moby) was released 2 weeks ago. I guess it hasn't been integrated into Docker Desktop yet. Once the engine is upgraded to v23, the cli command will work for zstd images. |
Is there any update on this? It is surely a solid downside of using Fargate! |
This limitation makes me want to switch to ECS + EC2. |
Yep, I've already given up on Fargate because of this issue, and am using ECS + EC2 for everything now. With EC2 you can tell it to cache your images so after the first one starts, all subsequent ones start within ~2 seconds. Of course that relies on keeping your same EC2 running and not letting them cycle, but in our case that's easy to do. Shame thought because I used to be excited about Fargate. |
I guess that's assuming your pool doesn't have to add another node, because if it does, you'll probably see the same latency as Fargate, which takes at least a minute to register it in the VPC. |
Hi folks, the Fargate team is invested in solving this issue for our customers. We are actively working on integrating SOCI with Fargate and conducting the due diligence to deliver a frictionless customer experience that incurs minimal onboarding effort. Please give us some more time and we will keep you posted with updates here. |
Thanks for the update. Looks like an interesting solution :)
@vaibhavkhunger Any further update on this? I have very large images (~5 GB), so Fargate is a non-starter for me because of this issue. The image sizes cannot be reduced due to the sizable robotics/AI libraries required. |
@bearrito My guess is that your robotics/ai libraries install large binaries / artifacts as dependencies. Is that right? If so, you might consider hosting those static assets in S3 or something comparable. This way, your much slimmer container can download the asset during its boot sequence. I've never actually done this, but my understanding is that it's usually best to avoid embedding large assets in docker images. |
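A hypothetical sketch of that suggestion, for illustration; the base image, bucket, paths, and entrypoint script are all made-up examples, not a recommendation for any specific stack:

```dockerfile
# Sketch: keep the image slim and fetch large static assets from S3 during
# the container's boot sequence instead of baking them into the image.
# Bucket, paths, and script names below are made up.
FROM python:3.11-slim
RUN pip install --no-cache-dir awscli
COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh
# entrypoint.sh would do something like:
#   aws s3 cp s3://example-assets-bucket/models/ /opt/models/ --recursive
#   exec "$@"
ENTRYPOINT ["/entrypoint.sh"]
```

The trade-off is that the download moves from image pull to container startup, so it only helps if S3 transfer is faster or cheaper than pulling the equivalent layers.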
@mmarinaccio Doesn't work that way in my case. The image I receive is produced by an upstream team. There are no large artifacts embedded in the image e.g model files. Pile on a bunch of robotics libraries and you are going to get a big image... |
EDIT: as @ronkorving mentioned, image caching is available for EC2 backed ECS. I've updated this request to be specifically for Fargate.
What do you want us to build?
I've deployed scheduled Fargate tasks and been clobbered with high data transfer fees pulling down the image from ECR. Additionally, configuring a VPC endpoint for ECR is not for the faint of heart. The doc is a bit confusing.
It would be a big improvement if there were a resource (network/host) local to the instance where my containers run which could be used to load my docker images.
Which service(s) is this request for?
Fargate and ECR.
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
I don't want to be charged for pulling a Docker image every time my scheduled Fargate task runs.
On that note the VPC endpoint doc should be better too.
Are you currently working around this issue?
This was for a personal project; instead, I deployed an EC2 instance running a cron job, which is not my preference. I would prefer using Docker and the ECS/Fargate ecosystem.