Increasing CPU usage from dockerd when system is idle #641

Comments
Hello, I'm encountering the same problem with more or less the same config as you. Did you find any solution to your problem? Regards,
I started to have the same kind of problem recently. My docker --version says: None of the docker interaction commands work: docker ps, version, info, etc. CPU usage is about 16% when idle, which is much more than it used to be (I don't remember even seeing dockerd in "top" before this started to happen).
For me, pruning docker manually helped; instructions here: https://coderwall.com/p/-vsmba/manually-remove-docker-containers-on-ubuntu (I also updated my docker-ce to "Docker version 18.06.3-ce, build d7080c1", but I had to prune before the reinstallation was successful).
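For reference, the cleanup boils down to something like the following rough sketch, assuming a reasonably recent Docker CLI; review what each command removes before running it on a production box:

```sh
# Rough sketch of the usual manual cleanup. Check the output of each
# command carefully before confirming on a production machine.

docker container prune         # remove all stopped containers
docker image prune -a          # remove unused images (not just dangling ones)
docker system prune --volumes  # stopped containers, unused networks,
                               # dangling images, and unused anonymous volumes
```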
We have exactly the same issue, as seen on the graph below: dockerd using progressively more and more resources every night. It is just strange that the CPU usage is the inverse of our application workload (i.e. high at night when there is very little traffic going through the system). We are also running Datadog, and our cluster is hosted in EKS. Docker info: This has become such an issue that it is now affecting production workloads. Did anyone else work out what may be causing this?
Can you take a CPU profile?
I am seeing similar problems in production. One of our machines is running at 100% CPU and about 96% of that is being used by the docker daemon. There are about 15 docker containers running, all of them orchestrated by HashiCorp Nomad, but there is no increase in traffic that would explain this. And even if traffic increased, the docker containers should use more CPU, not the docker daemon. Below are the details. Also, I'm not sure what the purpose of this "for-linux" repo is; it looks like this issue should be in the main moby project?
This is happening to me as well: when I run the Datadog agent on Docker, the CPU is always high. It seems to work fine when I pause the Datadog agent.
I am seeing very similar behavior as well. No clue yet as to why. How would I determine what dockerd is spending time on?
I am seeing similar behavior; after restarting the docker daemon, CPU usage drops back down.
Can you please provide the CPU profile command?
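For what it's worth, a dockerd CPU profile is usually captured through the daemon's Go pprof endpoints; a minimal sketch, assuming debug mode is enabled in the daemon config and the default socket path (file names below are just examples):

```sh
# Minimal sketch, assuming "debug": true is set in /etc/docker/daemon.json,
# which makes dockerd expose Go pprof handlers on its API socket.

# Capture a 30-second CPU profile from dockerd.
curl --unix-socket /var/run/docker.sock \
  "http://localhost/debug/pprof/profile?seconds=30" \
  -o dockerd-cpu.pprof

# Inspect the hot call paths (the dockerd binary path is an assumption).
go tool pprof -top /usr/bin/dockerd dockerd-cpu.pprof
```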
I found the issue: this is due to the Datadog agent, when log collection is enabled in the Datadog agent config.
That was also our suspicion... Any idea what the agent is doing, as it's literally killing our prod system (but we can't turn DD logging off as we rely on it!).
@anshul0915
FWIW, I spoke to the containers team at Datadog and they said they haven't seen any issues. They recommended I just reinstall Docker and the agent, so I reinstalled both and things are working fine 💯
I can confirm this issue. I reproduced it twice (in staging and in production): I enabled log collection on the Datadog agent, and I can reproduce it systematically. Do not hesitate to ask me for details.
@chabou Thanks! Your issue seems to be in JSON decoding of the container logs. You might try the `local` log driver, which isn't as CPU intensive as `json-file` (it uses protobuf to encode log messages), but probably what will happen is that it will just consume faster and still use up CPU.
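For anyone wanting to try that, a minimal sketch of switching to the `local` driver; the size values are only illustrative and the image name is a placeholder:

```sh
# Per-container: opt a single container into the local log driver.
docker run -d --log-driver local --log-opt max-size=10m some-image:latest

# Daemon-wide: set the default driver in /etc/docker/daemon.json and restart
# dockerd. Note this overwrites any existing daemon.json, and the new default
# only applies to containers created after the change.
cat <<'EOF' | sudo tee /etc/docker/daemon.json
{
  "log-driver": "local",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
EOF
sudo systemctl restart docker
```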
Probably the best way to handle this is to not stream every message and instead do bulk processing, which is something that requires a major change, like moby/moby#40517.
I'm going to close this issue since it seems too generic to be actionable.
I recently came across what I suspect is the same root issue here. I don't use Datadog, but we have a service that captures logs in a stream from the Docker API for a group of 20-40 containers, and long-lived instances were showing very high CPU usage for the dockerd process, combined with pretty high read I/O for the same process. This problem seems to be exacerbated by the fact that Docker does not, by default, place any constraint on the size of the .json log file for each container; as time goes on, it takes progressively more CPU and I/O time to parse the gigantic JSON files (hundreds of megs to a gig or two per container in our case, across 20-40 containers) that result from this. Setting up log rotation for the json-file driver keeps the files bounded and avoids this. As a gotcha, it appears that you must provide the `max-size` option explicitly for the rotation to take effect.
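A quick way to confirm whether unbounded json-file logs are what dockerd is chewing on; the paths below are the defaults, so adjust them if the daemon uses a non-default data root:

```sh
# Check the on-disk size of each container's JSON log file
# (default data root; adjust if yours differs).
sudo du -sh /var/lib/docker/containers/*/*-json.log | sort -h

# Confirm which log driver and options a given container is actually using.
# <container-name> is a placeholder.
docker inspect --format '{{.HostConfig.LogConfig}}' <container-name>
```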
The default driver can be inefficient [1] as it reads / parses / formats / writes a large JSON file over and over. Since all of the playground's communication goes over stdin / stdout, that can be a lot of junk logged! The `local` driver should be more efficient. [1]: docker/for-linux#641
Expected behavior

`dockerd` uses very little CPU resources when idle.

Actual behavior

`dockerd` uses increasing amounts of CPU resources when idle.

The behavior

I've recently discovered concerning CPU usage coming from `dockerd`; it seems to be using more and more CPU as time goes on, but mostly when the system is otherwise idle. The system is primarily running Docker (18.06-ce) with node, redis, mongod, nginx, and Datadog agent containers. The system sees very steady weekday traffic that drops off during the night, as shown in the chart below.

This graph shows the total container CPU usage (in gray) and the total system CPU usage (in orange). The difference between container and system usage is always the `dockerd` process itself. Here's a sample `top` output during one of the more recent nights showing `dockerd` using `1214%` of the CPU. And for good measure, here's a closeup of the worst night.

The docker daemon was restarted and updated (to 18.09-ce) last weekend and the CPU usage dropped back to normal, but it is already showing the same symptoms.

I do not have the knowledge to figure out what `dockerd` is doing at these times. Other similar issues hinted at long-running log or stat related issues, but all our logs are capped, and I hope the Datadog agent is somehow causing the load.

Output of `docker version`:

Output of `docker info`: