"Credential should be scoped to a valid region, not 'eu-west-1'. " #74

Closed
seiffert opened this issue May 18, 2015 · 8 comments

@seiffert

Hi,

I'm running the agent in a Docker container (the official amazon/amazon-ecs-agent image from Docker Hub) on a CoreOS EC2 instance; the systemd unit file is below. The agent does not reliably register the instance with ECS. I can't tell exactly what triggers it, but in one out of three starts the agent keeps logging the following lines and the ECS Console never shows it as connected.

2015-05-18T11:34:33Z [ERROR] Unable to discover poll endpoint module="acs handler" err="Credential should be scoped to a valid region, not 'eu-west-1'. "
2015-05-18T11:34:33Z [INFO] Error from acs; backing off module="acs handler" err="Credential should be scoped to a valid region, not 'eu-west-1'. "
2015-05-18T11:36:47Z [ERROR] Unable to discover poll endpoint module="acs handler" err="Credential should be scoped to a valid region, not 'eu-west-1'. "
2015-05-18T11:36:47Z [INFO] Error from acs; backing off module="acs handler" err="Credential should be scoped to a valid region, not 'eu-west-1'. "
2015-05-18T11:39:09Z [ERROR] Unable to discover poll endpoint module="acs handler" err="Credential should be scoped to a valid region, not 'eu-west-1'. "
2015-05-18T11:39:09Z [INFO] Error from acs; backing off module="acs handler" err="Credential should be scoped to a valid region, not 'eu-west-1'. "

The Systemd unit file I'm using is:

[Unit]
Description=The AWS ECS agent
After=docker.service
Requires=docker.service
Type=service
[Service]
TimeoutStartSec=0
TimeoutStopSec=0
Restart=on-failure
SyslogIdentifier=ecs-agent
ExecStartPre=-/bin/mkdir -p /var/log/ecs /var/ecs-data
ExecStartPre=-/usr/bin/docker stop ecs-agent
ExecStartPre=-/usr/bin/docker pull amazon/amazon-ecs-agent
ExecStartPre=-/usr/bin/docker rm ecs-agent
ExecStart=/usr/bin/docker run --name ecs-agent -v /var/run/docker.sock:/var/run/docker.sock -v /var/log/ecs:/log -v /var/ecs-data:/data -p 127.0.0.1:51678:51678 --env-file /etc/ecs/ecs.config -e ECS_LOGFILE=/log/ecs-agent.log amazon/amazon-ecs-agent

/etc/ecs/ecs.config:

ECS_CLUSTER=<name of my existing cluster>
ECS_DATADIR=/data/
ECS_CHECKPOINT=true
AWS_DEFAULT_REGION=eu-west-1
@euank
Contributor

euank commented May 19, 2015

I haven't been able to reproduce this unfortunately. I ran a script that launched and terminated batches of instances using user-data very similar to yours, but all of them ended up with agentConnected = true. I went through at least a couple hundred instances. You can find the hacky script I threw together here.
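For readers who want to run a similar check, here is a minimal sketch of the "did the agent connect?" test. It is not the script mentioned above. It assumes you already have a response shaped like an ECS DescribeContainerInstances result (a `containerInstances` list whose items carry `containerInstanceArn` and a boolean `agentConnected`); the function name and the sample data are hypothetical.

```python
from typing import Any, Dict, List


def disconnected_instances(response: Dict[str, Any]) -> List[str]:
    """Return the ARNs of container instances whose agent is not connected.

    A missing "agentConnected" field is treated as "not connected", which is
    an assumption made here to stay on the cautious side.
    """
    return [
        ci["containerInstanceArn"]
        for ci in response.get("containerInstances", [])
        if not ci.get("agentConnected", False)
    ]


# Hand-written sample standing in for a real API response (hypothetical ARNs).
sample_response = {
    "containerInstances": [
        {"containerInstanceArn": "arn:example:instance-a", "agentConnected": True},
        {"containerInstanceArn": "arn:example:instance-b", "agentConnected": False},
    ],
    "failures": [],
}

print(disconnected_instances(sample_response))
```

Keeping the check as a pure function over the response makes it easy to run against batches of freshly launched instances without tying it to a particular client library.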

I am using a slightly newer build of the agent in the above, but it only has logging changes which I hoped would help debug this if I ran into it.

Can you give me more information about what you're doing? Your instances are in eu-west-1, right? How are you setting your ecs.config file (user-data? Can I see that portion too)?

It would be very helpful if you could add -e ECS_LOGLEVEL=debug to your run command and post the additional logged details as well.

Thanks,
Euan

@seiffert
Author

Thanks for taking the time to investigate this one!
I'll activate debug logging in a minute! Before that, I just got the same problem again. I'll just attach some more debug information.

$ docker inspect <ecs-agent-container-id>
https://gist.github.com/seiffert/504b4e949b28a37a8022

$ docker logs <ecs-agent-container-id>
https://gist.github.com/seiffert/69b70915e93862a96279

Currently I'm running only three EC2 instances in the same ECS cluster in eu-west-1, yes. They are created with a CloudFormation stack and provisioned using CloudInit.

AMI: CoreOS-alpha-668.2.0-hvm (ami-c5b7d8b2)
User Data: https://gist.github.com/seiffert/12af9bf0d092bd3d7a43

I'll enable debug logging now and keep you posted with more information as soon as the problem occurs again.

euank added a commit to euank/amazon-ecs-agent that referenced this issue May 19, 2015
The SDK can handle setting the endpoint better than the Config code did;
the endpoint behavior before was a legacy of a time with no sdk.

The specific issue was that the endpoint would be set incorrectly if the
ec2 metadata service did not return any value.

In line with this, if no region can be determined it now is a fatal
error.

Relates to aws#74
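The fail-fast behavior described in this commit message can be sketched as follows. This is an illustration only, not the agent's code (the agent is not written in Python). The names `resolve_region`, `RegionNotFoundError`, and `metadata_lookup` are hypothetical, and the precedence (environment variable first, then instance metadata) is an assumption.

```python
from typing import Callable, Mapping, Optional


class RegionNotFoundError(RuntimeError):
    """Raised when no region can be determined from any source."""


def resolve_region(
    env: Mapping[str, str],
    metadata_lookup: Callable[[], Optional[str]],
) -> str:
    """Pick a region, or raise instead of continuing with an empty value."""
    # 1. Explicit configuration (assumed to take precedence).
    region = (env.get("AWS_DEFAULT_REGION") or "").strip()
    if region:
        return region
    # 2. Fall back to the instance metadata service, which may return nothing.
    region = (metadata_lookup() or "").strip()
    if region:
        return region
    # 3. Treat "no region" as fatal rather than building a bad endpoint.
    raise RegionNotFoundError("no region found in environment or instance metadata")


# Stand-ins for the metadata call; a real lookup would query the EC2 metadata service.
print(resolve_region({"AWS_DEFAULT_REGION": "eu-west-1"}, lambda: None))
print(resolve_region({}, lambda: "eu-west-1"))
try:
    resolve_region({}, lambda: None)
except RegionNotFoundError as exc:
    print("fatal:", exc)
```

The point of the third branch is that an empty metadata response surfaces immediately at startup instead of showing up later as a credential scoping error.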
@seiffert
Author

Hi,

the problem occurred again. This time we had debug logging turned on.

https://gist.github.com/seiffert/393b72bdde650df9450d

@euank: You referenced this issue in one of your commits. I see that it is merged into the dev branch now. Is there a publicly available Docker image built from the dev branch?

Thanks,
Paul

@euank
Contributor

euank commented May 28, 2015

Unfortunately the debug logging still doesn't show enough information 😞.

I've added more logging around suspect areas since then and, in the above referenced commit, fixed one possible cause of this behavior.

We don't currently have a public image of the dev branch, but you're welcome to use a build I have of it, euank/euank-test-agent:a0db7f2, or to create your own build.

I'm hopeful that if the above doesn't fix it, the debug output from the current dev branch / next agent release will end up being more useful.

Thanks for bearing with me so far and continuing to help figure this out,
Euan

Edit: You also might be able to avoid this issue by setting the environment variable ECS_BACKEND_HOST=ecs.eu-west-1.amazonaws.com:443 in the currently released agent. If this does work around it, that lends weight to my above fix being correct.
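A small sketch of applying that workaround to an env file such as /etc/ecs/ecs.config. The helper names are hypothetical, and the host pattern for regions other than eu-west-1 is extrapolated from the single value quoted above, so treat it as an assumption.

```python
def backend_host(region: str) -> str:
    """Build the host:port value, following the eu-west-1 example above."""
    return f"ecs.{region}.amazonaws.com:443"


def with_backend_host(config_text: str, region: str) -> str:
    """Return env-file text with exactly one ECS_BACKEND_HOST line for `region`."""
    # Drop any existing override so repeated runs do not pile up duplicates.
    lines = [
        line
        for line in config_text.splitlines()
        if not line.startswith("ECS_BACKEND_HOST=")
    ]
    lines.append(f"ECS_BACKEND_HOST={backend_host(region)}")
    return "\n".join(lines) + "\n"


# Placeholder config contents; the cluster name is made up.
example_config = "ECS_CLUSTER=example-cluster\nAWS_DEFAULT_REGION=eu-west-1\n"
print(with_backend_host(example_config, "eu-west-1"), end="")
```

Because the agent container reads the file through `--env-file`, it would need to be restarted after the file changes for the override to take effect.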

@seiffert
Author

Thanks again for the effort you put into this!
I'm switching to a version of the agent built from the dev branch right now. (didn't see your Edit before setting this up) I'll let you know when there are any more errors. :)

@mchakravarty

@euank I just made an observation that might help localise the underlying problem. Using the currently released version of the ECS agent (i.e., the amazon/amazon-ecs-agent image), I noticed that the agent dies with the cited error message Credential should be scoped to a valid region on every single invocation in several regions (I tried ap-southeast-2, us-east-1, and us-west-2) if I leave ECS_DATADIR unset and set ECS_CHECKPOINT=false.

In other words, without checkpointing, but with AWS_DEFAULT_REGION set (to any valid region I tried), the agent fails to connect, whereas with checkpointing enabled, it works fine. (I guess, most people use checkpointing, which may explain why this issue remained fairly undiscovered.)

@seiffert
Author

seiffert commented Jun 4, 2015

We have used checkpointing from the beginning, so at least in our case this problem occurred with checkpointing enabled!

FYI: No crashes since I switched to a build of the dev branch. @euank I'll keep you posted...

@samuelkarp
Contributor

Released with v1.2.0.

danehlim pushed a commit to danehlim/amazon-ecs-agent that referenced this issue Oct 26, 2022