[ECR] [Remote Docker Repositories]: Pull through cache #939
Comments
|
We use Artifactory to perform this role, and it has many security, availability, and performance benefits.
|
@DJMatus23 We are using Artifactory today but would like to get out of having to run that ourselves.
|
With the recent rate-limiting changes to Docker Hub, this has become more important than ever.
|
Would love to see this feature in ECR, or even better, have ECR integrated into CodeArtifact (which already supports pull-through caching). I built a CDK construct that syncs specific images from Docker Hub to ECR (https://github.com/pgarbe/cdk-ecr-sync). It might be useful until this feature is implemented.
|
Thanks for raising this issue. We're looking at what this will take to implement. A couple of questions for the community:
|
|
For our use case, all our private images are in ECR, so upstream authentication wouldn't be needed. The main reason we want this is to restrict our cluster to pulling from one source, with only approved images in ECR. Also, with the new Docker Hub limits and the Quay.io interruptions a few months ago, being able to cache public images is important for availability.
|
Our use case is similar. Our private images are in Artifactory, with plans to move them to ECR. The images we are thinking about are public images that do not require auth to pull, so being able to cache public images is the more important capability for us.
|
Same for us too. We'd like this feature to ensure we can get public images. No need for authentication with an upstream registry.
|
Same as above. Caching Docker/Quay/GitHub. About the only auth'd images we might cache are from other ECRs.
|
Please use reactions instead of writing "same here" comments that just repeat what the opening post describes. I'm getting tons of emails.
|
According to the Docker Hub announcement, I think most of us are now in big trouble: there's only one month left before we go from unlimited Docker Hub pulls to just 16 pulls per hour, slated to start on November 1, 2020. That is a big problem, mostly for our automation (CI/CD), which assumes that Docker Hub image pulls just work. I am encouraged that an AWS ECR employee said they would look into solutions, but for all intents and purposes, everybody in this situation is now scrambling, and we should not wait and hope AWS solves this anytime soon. Even if they rolled out a full solution in the next few days, we would still need time to adapt to using it. I am now going to look into how to solve this with Artifactory.
|
I don't know if ECR will help here; I suspect ECR's outgoing IPs would always be at the rate limit. I did some research for my company and started investigating how to set up a caching registry for our CI and our Kubernetes platform (not EKS). I decided to buy a $5/month Docker Hub user, which has no rate limit. I don't know if that's cheaper than a caching ECR, but it may be cheaper and easier than setting up a highly available caching Docker registry. Since a caching registry would be affected by the rate limits too (no matter whether it uses its own IP or a shared IP from ECR), a caching registry may not help.
Google is transforming its Container Registry into Artifact Registry (https://cloud.google.com/artifact-registry). It feels like something similar should happen with AWS CodeArtifact and AWS ECR in the near future. Good time to ask AWS about the roadmap for that, given all these multi-account setups and Docker Hub limits.
Presumably, AWS could afford a Docker subscription 😄
|
This feels like a win-win for AWS: allow people to cache Docker layers somewhere, and charge them the S3 costs plus some small pull fee. AWS could then effectively de-dupe those layers, so they're not storing a million copies of the base Alpine layer or whatever is popular. Every time someone pulls through with a new layer, AWS has to go get it, and that will mean an arrangement with each of the downstream providers. But once they've got it, they can decide how long it stays around and at what point it is cheaper to download it again versus just storing it. This gives the protection people will gladly pay for, with the bonus that there should be a noticeable improvement in container spin-up times too.
|
I think a pull-through cache would be a super useful feature. We host all our golden-source images on Artifactory today and would love a setup where, if an image is not available in ECR, it is pulled from Artifactory and cached in ECR. Artifactory would act as the remote backend (with or without authentication) and ECR as the local cache for EKS clusters.
|
Another alternative would be an AWS-managed Nexus service. We use Nexus internally to cache remote Docker repositories like Docker Hub. Would love to see this in AWS, like the managed Prometheus or Grafana offerings.
|
Hi @omieomye, I think a pull-through registry would be beneficial for a few reasons.
At the moment, this can be partially accomplished with the Amazon ECR Public Gallery, but even Amazon fails to publish its images to its own public repositories. See Corretto, for example: they now lag seven months behind in publishing the image (corretto/corretto-docker#47).
|
I'm fine with even paying Docker; I just want the images my organization uses copied to ECR for security, reliability, and redeployment purposes. Not being able to use Docker Hub as a mirror for the pull-through cache is pretty disappointing, especially because the images now live at a different path within ECR's Public Gallery. Double that down with AWS as an organization being terrible about putting its own images into its own public ECR. FROM docker/library/python:latest vs FROM python:latest feels like it kind of defeats the point.
|
Hey folks, yes, at this time pull through cache can reliably and anonymously access any public image on ECR Public and Quay.io on a customer's behalf. This includes Docker Official Images, which are now also hosted on ECR Public. This will let you cache the most popular public images, as well as AWS images, into your own ECR registry for security, reliability, and redeployment purposes, as you mentioned @JacobWeyer. Most AWS images are hosted on ECR Public, though there may be some exceptions, which will eventually be there. Feel free to DM me on Twitter (@Sravan_R_) with any specific examples, and I can also explain how your performance will change if you update your Dockerfile to pull from ECR or ECR Public instead. I've also created a new issue here for authenticated registries; ECR pull through cache would use these credentials to pull images from private registries or registries that require authentication to access higher pull limits. More information on our design and plans can be found here too: #1581 (comment). Hope this helps!
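For reference, a minimal boto3 sketch of what setting up and using such a cache rule looks like; the `ecr-public` prefix, region, and account ID here are illustrative choices, not required values:

```python
# Minimal sketch: create a pull-through cache rule pointing at ECR Public.
# The repository prefix is our own choice; a Quay.io rule would use
# upstreamRegistryUrl="quay.io" instead.
import boto3

ecr = boto3.client("ecr", region_name="us-east-1")

ecr.create_pull_through_cache_rule(
    ecrRepositoryPrefix="ecr-public",
    upstreamRegistryUrl="public.ecr.aws",
)

# Cached pulls then go through the private registry under that prefix,
# e.g. (hypothetical account ID):
#   docker pull 123456789012.dkr.ecr.us-east-1.amazonaws.com/ecr-public/docker/library/python:latest
```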
|
Yeah, this is frustrating in the way it's only halfway there! I just realized that we still need Docker Hub for some CircleCI-published images that are optimized for CI environments: https://hub.docker.com/u/circleci. We're not trying to avoid the Docker Hub subscription, but we're interested in staying 100% on the AWS network to maximize performance.
@woodhull slightly off-topic, but you're using legacy images. CircleCI recommends migrating to
|
Hello @srrengar, this feature has just been released and announced here. But it only supports ECR Public or Quay.io upstream image repositories... Are there any plans for enhancements to support any private repositories, like Docker Hub or Red Hat, that require authentication for pulling images into the cache?
|
What about support for "vendor official" (as opposed to "Docker official") images, like the official Prometheus and Grafana images?
|
These two are already on ECR Public (e.g. https://gallery.ecr.aws/bitnami/prometheus, https://gallery.ecr.aws/bitnami/grafana), published through verified ECR Public third parties. Please keep telling us which images you need, and we will work with the community and our partners to get them on ECR Public if they aren't there already.
|
Thanks! While I have nothing against Bitnami in particular, I'd rather use "official" images (maintained by the same maintainers as the open source project) if possible.
|
Adding to what @dserodio wrote, the Bitnami images are not the same, and are sometimes different in unexpected ways. Using Prometheus as an example, the most recent Bitnami version in the public gallery is 2.33.1 with a reported size of 111 MB, while the most recent prom/prometheus on Docker Hub is 2.33.3 with a size of ~70 MB. By the way, it's great to see the library/node images (incl. alpine) in the public gallery.
|
Is there any documentation on whether pull-through caching is usable cross-account? We have a centrally managed ECR registry and would like other accounts to be able to pull new tags. By clicking through the UI I found the ecr:BatchImportUpstreamImage IAM action.
|
@johanneswuerbach other accounts can pull new images the same way they do today, as long as they have ecr:BatchImportUpstreamImage and ecr:CreateRepository (if caching a new image altogether). We are working on updating the documentation, but can you reach out to support if that's not working in the meantime?
|
We've found it's not possible for other accounts to pull the image by default without assuming a role in the ECR repository account.
Is this the case even if you have granted ecr:BatchImportUpstreamImage to all principals in your OU? I'm finding that even with a wide-open registry permission like the one below, I can only pull from the account in which the registry resides. And I'm not even trying to create new repositories; these are pre-provisioned repositories (pre-provisioned so we can enable CMK encryption and cross-account access).
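(The policy referenced above didn't survive extraction; the following boto3 sketch is a hypothetical reconstruction of that kind of wide-open registry permission, with placeholder account, region, and org IDs.)

```python
# Hypothetical reconstruction of a wide-open ECR *registry* permission
# (not a repository policy): every principal in the organization may
# import upstream images and create cache repositories. All IDs below
# are placeholders.
import json

import boto3

ecr = boto3.client("ecr")

registry_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowOrgPullThroughCache",
            "Effect": "Allow",
            "Principal": "*",
            "Action": [
                "ecr:BatchImportUpstreamImage",
                "ecr:CreateRepository",
            ],
            "Resource": "arn:aws:ecr:us-east-1:111122223333:repository/*",
            "Condition": {
                "StringEquals": {"aws:PrincipalOrgID": "o-exampleorgid"}
            },
        }
    ],
}

ecr.put_registry_policy(policyText=json.dumps(registry_policy))
```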
@coultn - I've been setting up pull-through caching on ECR (we're using it to contend with the fact that our EKS worker nodes run in a private subnet with no direct internet egress). I have found many of the open-source containers that we install in our clusters on ECR Public, but I was surprised by the absence of the CSI Secrets Store Driver (the secrets-store-csi-driver-provider-aws is present, but not the driver itself).
Also, I thought it was very strange that for the bitnami/cluster-autoscaler images, the "v" was dropped from the tags (e.g. "v1.20.0" is tagged as "1.20.0" in ECR Public). This just makes it harder to use the cluster-autoscaler out of the box.
I am able to pull from other accounts (with sufficient permissions set up, and with a Lambda taking care of repository policies for cross-account access), and the pull-through cache is able to fetch new tags into an existing dynamic repository.
|
You can pull
|
@jpriebe You were able to pull from GCR using ECR? I want to reduce my NAT gateway traffic by using an ECR PrivateLink endpoint.
|
Has anyone managed to successfully pull an image (from ECR Public) through the pull-through cache from another AWS account while the repository doesn't yet exist (first-time pull)?
|
Hi @ahmokhtari. Pull through cache can support this, but it needs to be configured, and there are some warts currently. First, the pulling principal needs the ecr:BatchImportUpstreamImage and ecr:CreateRepository permissions mentioned above, so the cache can create the repository on first pull. Second, the warts... once the repository is created, that secondary account won't actually have permissions to pull content. This is a bit weird, since ECR creates the repo automatically, but new repositories are created without permissions, so you need to go back and set up repository permissions. You can set up cross-account read access on the new repo; see the "AllowPull" example under the "Allow another account" section in the 'Repository policy examples' section of the ECR docs. For this second part, we are working on a new feature that lets you create configuration profiles to be automatically applied when new repositories are created, and that will make this a whole lot more seamless. Hope this helps!
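As a sketch of that follow-up step, here is what granting cross-account read access on an auto-created cache repository might look like with boto3, modeled on the docs' AllowPull example; the repository name and consumer account ID are assumptions:

```python
# Sketch: grant a consumer account read access to a repository that the
# pull-through cache auto-created. Repository name and account ID are
# hypothetical.
import json

import boto3

ecr = boto3.client("ecr")

allow_pull = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowPull",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::210987654321:root"},
            "Action": [
                "ecr:BatchCheckLayerAvailability",
                "ecr:BatchGetImage",
                "ecr:GetDownloadUrlForLayer",
            ],
        }
    ],
}

ecr.set_repository_policy(
    repositoryName="ecr-public/docker/library/python",
    policyText=json.dumps(allow_pull),
)
```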
|
I have used a Lambda to set up default resource and lifecycle policies for new repos. This can automate the warty task mentioned by @jlbutler. In a SAM template, the function definition would look something like the following:
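(The SAM snippet itself didn't survive extraction. As a stand-in, here is a hypothetical sketch of the handler such a function might run, assuming an EventBridge rule on CloudTrail's ECR CreateRepository event; the policy contents and account ID are placeholders, not the commenter's actual code.)

```python
# Hypothetical sketch of the Lambda handler behind such a SAM function:
# an EventBridge rule matches CloudTrail's ECR "CreateRepository" event,
# and the handler stamps default policies onto the new repository.
import json

import boto3

ecr = boto3.client("ecr")

DEFAULT_REPO_POLICY = json.dumps({
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowPull",
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::210987654321:root"},  # placeholder
        "Action": [
            "ecr:BatchCheckLayerAvailability",
            "ecr:BatchGetImage",
            "ecr:GetDownloadUrlForLayer",
        ],
    }],
})

DEFAULT_LIFECYCLE_POLICY = json.dumps({
    "rules": [{
        "rulePriority": 1,
        "description": "Expire untagged images after 14 days",
        "selection": {
            "tagStatus": "untagged",
            "countType": "sinceImagePushed",
            "countUnit": "days",
            "countNumber": 14,
        },
        "action": {"type": "expire"},
    }],
})

def handler(event, context):
    # EventBridge delivers the CloudTrail record under "detail".
    repo = event["detail"]["responseElements"]["repository"]["repositoryName"]
    ecr.set_repository_policy(repositoryName=repo, policyText=DEFAULT_REPO_POLICY)
    ecr.put_lifecycle_policy(repositoryName=repo, lifecyclePolicyText=DEFAULT_LIFECYCLE_POLICY)
```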
Hi @jlbutler, is the new feature you're describing being tracked by this issue so we can follow along? Based on that issue, it looks like it was being worked on but no longer is. Any idea of the time scale (weeks, months, years) for the repository configuration profiles feature to be GA? Thank you!
|
Sorry I missed this, @heyweswu. Yes, that issue is a good one for tracking the new preview feature, called repository creation templates.
Community Note
Tell us about your request
I would like to be able to store Docker images that are usually hosted on third-party registries in ECR.
Which service(s) is this request for?
ECR
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
Our organization would like to avoid being affected if/when those registries go down by having a copy of certain images cached in ECR. Today, if Quay.io or some other public registry goes down, we may not be able to scale up a cluster.
Some secondary benefits would be the ability to limit which images can be used, and savings on network costs, since we would not need every service pulling images from the internet when they can be pulled from ECR via PrivateLink.
I imagine this would work similarly to CodeArtifact, where you can have the service pull libraries from upstream as needed.
Are you currently working around this issue?
How are you currently solving this problem?
Today we have to pull a list of images from our many k8s clusters and run a CodeBuild job to pull those images and push them into ECR.
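For illustration, a minimal sketch of that kind of sync job (not our actual CodeBuild spec), assuming Docker is installed, the shell is already authenticated to the ECR registry, and the target repositories exist; the registry URL and image list are placeholders:

```python
# Mirror a list of upstream images into ECR by pull/tag/push.
import subprocess

ECR_REGISTRY = "123456789012.dkr.ecr.us-east-1.amazonaws.com"  # placeholder
IMAGES = [
    "quay.io/prometheus/prometheus:v2.33.3",  # example entries gathered
    "docker.io/library/python:3.10-slim",     # from the clusters
]

for image in IMAGES:
    # Drop the source registry host to form the ECR repository path.
    path = image.split("/", 1)[1]
    target = f"{ECR_REGISTRY}/{path}"
    subprocess.run(["docker", "pull", image], check=True)
    subprocess.run(["docker", "tag", image, target], check=True)
    subprocess.run(["docker", "push", target], check=True)
```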
Additional context
Anything else we should know?
Attachments
If you think you might have additional information that you'd like to include via an attachment, please do - we'll take a look. (Remember to remove any personally-identifiable information.)