
Fargate check fails when there are stopped containers in the task #1955

Closed
zlangbert opened this issue Jul 28, 2018 · 4 comments

Comments

@zlangbert

zlangbert commented Jul 28, 2018

I'm running the agent as a sidecar in a Fargate task as described here. The Fargate check continuously fails with the following error:

[ AGENT ] 2018-07-27 06:30:33 UTC | ERROR | (runner.go:277 in work) | Error running check ecs_fargate: [{"message": "'NoneType' object has no attribute '__getitem__'", "traceback": "Traceback (most recent call last):
 File \"/opt/datadog-agent/embedded/lib/python2.7/site-packages/datadog_checks/checks/base.py\", line 303, in run
 self.check(copy.deepcopy(self.instances[0]))
 File \"/opt/datadog-agent/embedded/lib/python2.7/site-packages/datadog_checks/ecs_fargate/ecs_fargate.py\", line 126, in check
 self.rate('ecs.fargate.cpu.system', container_stats['cpu_stats']['system_cpu_usage'], tags)
TypeError: 'NoneType' object has no attribute '__getitem__'"}]

After some investigation I found that the ECS stats endpoint returns the stats of stopped containers as null. I use a volume in the task, and Fargate creates a ~internal~ecs-emptyvolume-source container that is stopped immediately, which produces a stats entry with the container ID but no stats. The agent then fails to handle the null value.

This applies not just to volumes but to any stopped container, for example a container that runs a command and exits with status 0.

You can see an example of the metadata and stats output here.
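To illustrate the failure mode described above: when the stats JSON contains `null` for a stopped container, Python's `json` module decodes it as `None`, so subscripting that entry raises a `TypeError`. This is a minimal sketch assuming the stats response is a JSON object mapping container IDs to either a stats object or `null`; the payload and container IDs below are made up for illustration, and `stopped_container_ids` is a hypothetical helper, not part of the agent.

```python
import json

# Illustrative payload only (hypothetical IDs), shaped like the stats output
# described in this issue: one entry per container ID, null for stopped ones.
SAMPLE_STATS_JSON = """
{
  "app-container-id": {"cpu_stats": {"system_cpu_usage": 123456789}},
  "emptyvolume-container-id": null
}
"""


def stopped_container_ids(stats):
    """Return the sorted IDs whose stats entry is null (decoded by json as None)."""
    return sorted(cid for cid, entry in stats.items() if entry is None)


stats = json.loads(SAMPLE_STATS_JSON)
print(stopped_container_ids(stats))  # ['emptyvolume-container-id']

# Subscripting the null entry reproduces the class of error in the traceback.
# On Python 3 the message reads "'NoneType' object is not subscriptable";
# the Python 2.7 agent reports "has no attribute '__getitem__'" instead.
try:
    stats["emptyvolume-container-id"]["cpu_stats"]
except TypeError as exc:
    print("TypeError:", exc)
```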

Additional environment details (Operating System, Cloud provider, etc):

AWS Fargate platform version 1.1.0

Additional information you deem important (e.g. issue happens only occasionally):

This is related to support ticket 156356

@ofek
Contributor

ofek commented Jul 29, 2018

@zlangbert Thanks for the detailed report! We'll get this fixed in short order.

@zlangbert
Author

zlangbert commented Sep 8, 2018

@ofek I don't believe the code change resolved the issue. With 6.5.0-rc.3 of the agent I'm seeing essentially the same error:

[ AGENT ] 2018-09-08 16:55:25 UTC | ERROR | (runner.go:278 in work) | Error running check ecs_fargate: [{"message": "'NoneType' object has no attribute 'get'", "traceback": "Traceback (most recent call last):\n File \"/opt/datadog-agent/embedded/lib/python2.7/site-packages/datadog_checks/checks/base.py\", line 352, in run\n self.check(copy.deepcopy(self.instances[0]))\n File \"/opt/datadog-agent/embedded/lib/python2.7/site-packages/datadog_checks/ecs_fargate/ecs_fargate.py\", line 125, in check\n cpu_stats = container_stats.get('cpu_stats', {})\nAttributeError: 'NoneType' object has no attribute 'get'\n"}]

I think that, as it loops over the stats object, container_stats itself is null:

for container_id, container_stats in stats.iteritems():
    tags = container_tags[container_id]
    # CPU metrics
    cpu_stats = container_stats.get('cpu_stats', {})

Let me know if you need any more info, thanks!
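The quoted loop shows why the first change was not enough: `dict.get(key, default)` returns the default only when the key is absent, not when the value is `None`, and calling `.get` on a `None` entry raises `AttributeError`. A null-safe version of that loop might look like the sketch below. It is an illustration under stated assumptions, not the change merged in #2206 (which this thread does not show): `collect_system_cpu_usage` is a hypothetical helper, the tag values are made up, it uses Python 3 `items()` instead of the Python 2 `iteritems()` in the agent, and it returns one metric instead of submitting rates.

```python
def collect_system_cpu_usage(stats, container_tags):
    """Map container ID -> (system_cpu_usage, tags), skipping stopped containers.

    Hypothetical helper: `stats` maps container IDs to stats dicts or None,
    `container_tags` maps container IDs to tag lists.
    """
    results = {}
    for container_id, container_stats in stats.items():
        # A stopped container's entry is None; calling .get() on it would
        # raise AttributeError, so skip it explicitly.
        if container_stats is None:
            continue
        # Tolerate containers that are missing from the tag mapping.
        tags = container_tags.get(container_id, [])
        # `or {}` also covers a 'cpu_stats' key that is present but null.
        cpu_stats = container_stats.get('cpu_stats') or {}
        value = cpu_stats.get('system_cpu_usage')
        if value is not None:
            results[container_id] = (value, tags)
    return results


# dict.get's default applies only to missing keys, not to None values.
assert {}.get('cpu_stats', {}) == {}
assert {'cpu_stats': None}.get('cpu_stats', {}) is None

stats = {
    'running': {'cpu_stats': {'system_cpu_usage': 42}},
    'stopped': None,
    'no_cpu': {'cpu_stats': None},
}
tags = {'running': ['container_name:app'], 'stopped': [], 'no_cpu': []}
print(collect_system_cpu_usage(stats, tags))
# {'running': (42, ['container_name:app'])}
```

Skipping the `None` entry up front keeps one stopped sidecar from aborting the whole check run, which is the behavior reported as fixed in 6.5.1 below.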

@ofek
Contributor

ofek commented Sep 25, 2018

@zlangbert This should be fixed in 6.5.x final via #2206

Can you confirm?

@ofek ofek reopened this Sep 25, 2018
@zlangbert
Author

@ofek Yes, all good on 6.5.1. Thanks!
