Fargate: CannotPullContainer located on ECS registry #1128

Closed
bitliner opened this Issue Dec 4, 2017 · 14 comments

8 participants

bitliner commented Dec 4, 2017

Summary

I tried many times, and even when the image is in the ECS registry (Amazon ECR), I get one of the following errors:

CannotPullContainerError: API error (500): Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

or

CannotPullContainerError: API error (500): Get https://XXX.dkr.ecr.us-east-1.amazonaws.com/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

Description

I am using images hosted in the ECS registry with Fargate.

Expected Behavior

Provisioning finishes and the container status becomes "RUNNING".

Observed Behavior

The task stays in PENDING status (for at least 5 minutes) until it fails with one of the errors above.

Environment Details

  • Fargate
  • image located in the ECS registry

Supporting Log Snippets

No logs available: Fargate does not provide any logs while provisioning Docker containers.

thogaw commented Dec 13, 2017

Same problem here. I also tried to pull from our private registry, but there is no option to pass the credentials to Fargate.

Member

samuelkarp commented Dec 13, 2017

I am sorry to hear you are having problems.

The error you are seeing is commonly due to a lack of internet access to pull the image. The image pull occurs over the network interface used by the Task, and as such shares its security group and routing rules.

Please check your configuration for the following:

  1. If you are launching a task without a public IP, make sure that the route table on the subnet has "0.0.0.0/0" going to a NAT gateway or NAT instance to ensure access to the internet. If the route goes to an internet gateway instead, a task without a public IP cannot send traffic through it, so the connection is never made. If you are launching a task with a public IP, make sure that the route table on the subnet has "0.0.0.0/0" going to an internet gateway to ensure you will be able to use the public IP successfully for ingress traffic.
  2. Verify that your security group rules for the Task allow outbound access. The default here is typically All Traffic to 0.0.0.0/0.

If neither of those networking changes applies to you, or if they do not fix your problem, please let us know so we can assist further.
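The routing half of the checklist above can be sketched as a small Python helper. This is only an illustration, not an official diagnostic: it assumes the routes are passed in as dicts shaped like the `Routes` entries returned by boto3's `describe_route_tables`, it treats any default route that targets an instance as a NAT instance, and the function names and result strings are hypothetical.

```python
from typing import Dict, List, Optional


def default_route_target(routes: List[Dict]) -> Optional[str]:
    """Classify the 0.0.0.0/0 route as 'nat', 'igw', or None if absent."""
    for route in routes:
        if route.get("DestinationCidrBlock") != "0.0.0.0/0":
            continue
        # Assumption: an instance target on the default route is a NAT instance.
        if route.get("NatGatewayId") or route.get("InstanceId"):
            return "nat"
        if str(route.get("GatewayId", "")).startswith("igw-"):
            return "igw"
    return None


def diagnose(routes: List[Dict], assign_public_ip: bool) -> str:
    """Apply the checklist: private IP needs NAT, public IP needs an IGW."""
    target = default_route_target(routes)
    if target is None:
        return "no-default-route"
    if target == "igw" and not assign_public_ip:
        return "igw-without-public-ip"  # the image pull is expected to time out
    if target == "nat" and assign_public_ip:
        return "nat-with-public-ip"  # public IP not usable for ingress
    return "ok"
```

A result of `igw-without-public-ip` corresponds to the case where the task has only a private IP but the subnet's default route points at an internet gateway.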

tklovett commented Dec 15, 2017

For anyone else who drops by here:
I wrestled with this for a while until I figured out that, in addition to what @samuelkarp said above, I needed to add AssignPublicIp: ENABLED to my network configuration. After adding this, I stopped getting the Client.Timeout exceeded while awaiting headers error.

NetworkConfiguration:
    AwsvpcConfiguration:
        AssignPublicIp: 'ENABLED'
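
For anyone launching tasks from code rather than from a template, the same setting can be sketched in Python. This is an illustration only: the helper name and the subnet and security group IDs are hypothetical, and the dict is meant to follow the `networkConfiguration` shape that boto3's ECS `run_task` and `create_service` calls accept.

```python
from typing import Dict, Iterable


def awsvpc_network_configuration(
    subnets: Iterable[str],
    security_groups: Iterable[str],
    assign_public_ip: bool,
) -> Dict:
    """Build a networkConfiguration dict for an awsvpc (Fargate) task."""
    return {
        "awsvpcConfiguration": {
            "subnets": list(subnets),
            "securityGroups": list(security_groups),
            # The API expects the strings ENABLED or DISABLED, not a boolean.
            "assignPublicIp": "ENABLED" if assign_public_ip else "DISABLED",
        }
    }


# Hypothetical IDs, for illustration only.
config = awsvpc_network_configuration(
    subnets=["subnet-0123456789abcdef0"],
    security_groups=["sg-0123456789abcdef0"],
    assign_public_ip=True,
)
# The result could then be passed along the lines of:
#   boto3.client("ecs").run_task(..., launchType="FARGATE",
#                                networkConfiguration=config)
```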

byF commented Dec 21, 2017

@tklovett thanks!

@samuelkarp How are we supposed to prevent access to the public IP then?

Member

samuelkarp commented Dec 21, 2017

@byF Security groups provide customizable rules to control inbound and outbound traffic.

hadsed commented Jan 2, 2018

I followed @samuelkarp's instructions, but that didn't help until I started a service with a public IP as @tklovett suggested. I don't understand why this should be the case. My service should not be open to the internet, yet deploying any image requires internet access, which is only given if you make the service public? This seems like very poor security practice...

Edit: just saw the last two comments. Perhaps I don't understand this, but from a usability perspective I would like the service to not have a public interface at all, because it will never need one. But it looks like for this purpose it must have one. This is a mismatch with how things are done in EC2, where instances can be made private and no one has to worry about anything (such as someone editing the security group; note that you can't add a public interface to an EC2 instance after it has been started).

Member

samuelkarp commented Jan 2, 2018

@hadsed In order to pull the image, your ENI must have access to the registry. For Docker Hub and for Amazon ECR, this means your ENI must be able to reach the Internet. You can achieve access to the Internet in a few different ways, but the most common are an Internet Gateway with a public IP address, or NAT with a private IP address. For NAT, you can use NAT instances or a NAT gateway.

If you want to disable Internet access entirely, you'll need to use a registry located inside your VPC instead of a registry that requires Internet access.
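
Whether a given network path can reach the registry at all can be checked with a plain TCP probe. The sketch below is only an illustration and the function name is hypothetical. A Fargate task cannot run code before its image is pulled, so this assumes the probe runs on some other host attached to the same subnet and security group, and a successful connection only shows that port 443 is reachable, not that the pull itself will succeed.

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


if __name__ == "__main__":
    # Docker Hub's registry host, taken from the error message in this issue.
    print(can_connect("registry-1.docker.io"))
```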

hadsed commented Jan 2, 2018

It's just very limiting that I cannot restrict access to my services that may not be hardened against all types of internet traffic (that's what frontends like nginx, AWS ELB, etc. are for). So you can see how this is a problem: I either have to run my own registry (AWS ECR being useless for this case now) or I have to harden every service I'll ever deploy because it'll be open to the internet.

Member

samuelkarp commented Jan 2, 2018

@hadsed Security groups provide customizable rules to control inbound and outbound traffic. You can also choose to use NAT instead of adding a public IP address which will also let you restrict inbound traffic.
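
As an illustration of the inbound side, the following Python sketch lists the rules of a security group that admit traffic from anywhere. It is not an official check: the function name is hypothetical, and it assumes the rules are passed in as dicts shaped like the `IpPermissions` list returned by boto3's `describe_security_groups`.

```python
from typing import Dict, List


def world_open_ingress(ip_permissions: List[Dict]) -> List[Dict]:
    """Return the inbound rules that allow traffic from any address."""
    open_rules = []
    for perm in ip_permissions:
        v4_open = any(
            r.get("CidrIp") == "0.0.0.0/0" for r in perm.get("IpRanges", [])
        )
        v6_open = any(
            r.get("CidrIpv6") == "::/0" for r in perm.get("Ipv6Ranges", [])
        )
        if v4_open or v6_open:
            open_rules.append(perm)
    return open_rules
```

An empty result means no inbound rule is open to the whole internet, which is the state a service with a public IP but no public role would want.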

byF commented Jan 3, 2018

image

panuhorsmalahti commented Jan 22, 2018

"You can also choose to use NAT instead of adding a public IP address which will also let you restrict inbound traffic."

How can this be configured? I'm launching the task into a private subnet whose route table points to a NAT gateway in a public subnet. The VPC has an internet gateway. I can verify that EC2 instances in both the private and public subnets can pull the Docker image (as they should be able to), but I'm still getting CannotPullContainerError. What am I missing?

EDIT: My problem was that the outbound rules of the security group attached to the ECS service's task didn't allow pulling the image. I didn't notice that in Terraform a security group doesn't allow outbound traffic by default:
https://www.terraform.io/docs/providers/aws/r/security_group.html
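
A missing egress rule like this can be spotted with a short Python check. The sketch below is an illustration under stated assumptions: the function name is hypothetical, the rules are passed in as dicts shaped like the `IpPermissionsEgress` list returned by boto3's `describe_security_groups`, and only rules whose destination is 0.0.0.0/0 are considered (narrower CIDR ranges and prefix lists are ignored).

```python
from typing import Dict, List


def allows_https_egress(ip_permissions_egress: List[Dict]) -> bool:
    """Return True if some outbound rule allows TCP 443 to 0.0.0.0/0."""
    for perm in ip_permissions_egress:
        protocol = perm.get("IpProtocol")
        if protocol == "-1":  # "-1" means all protocols and all ports
            port_ok = True
        elif protocol == "tcp":
            port_ok = perm.get("FromPort", 0) <= 443 <= perm.get("ToPort", 65535)
        else:
            continue
        to_anywhere = any(
            r.get("CidrIp") == "0.0.0.0/0" for r in perm.get("IpRanges", [])
        )
        if port_ok and to_anywhere:
            return True
    return False
```

An empty egress list, as described above for a Terraform-managed group with no egress block, makes this return False.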

afedulov commented Jan 22, 2018

@samuelkarp, as mentioned by @tklovett and @hadsed, without a public IP assigned, Fargate does not get access to ECR. I have 2 services configured in exactly the same way: same VPC, same subnets, same security groups. 0.0.0.0/0 points to an IGW. ACL rules allow all outbound traffic. The only difference is that the first service has Auto-assign public IP ENABLED and the second has it DISABLED. The first one successfully starts the task; the second one fails with a CannotPullContainer error. Could you please consider reopening this ticket?

Member

samuelkarp commented Jan 22, 2018

@afedulov You either need private IP + NAT or a public IP + IGW. In your example, the task that's failing has neither NAT nor a public IP. My earlier comment has the full information.

I'm going to lock this issue; please see this comment and this comment before opening a new issue.

@aws aws locked as resolved and limited conversation to collaborators Jan 22, 2018
