Docker OOM event handling inappropriate #670
An example of the aforementioned logs that have insatiably devoured our system resources! We would like more appropriate handling, please.
Edit: Now imagine the following 58G filled with that stuff. How is this not a problem in dockerd?
At 1:29pm a user reported the service was down; everything up to 3:12pm is what happens if we don't intervene.
Why not throttle the stdout/stderr of dockerd rather than have dockerd assume it needs to be throttled at a random rate?
@cpuguy83 Let me rephrase: it's not about the output rate. Throttling would just be kicking the can down the road; given more time, the systems would still go down. In my opinion this should have been implemented as something like a multi-set in the first place. Would anyone seriously suggest that dockerd can take 79G of our 80G of memory, plus hundreds of GBs of disk, because of an unresolved event (which is itself still under cgroup control, and merely a notification) and somehow bear no responsibility? From what I can tell, containerd just gave the FYI, and dockerd simply screwed up handling that message.
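To make the multi-set idea concrete, here is a minimal sketch in Go (hypothetical names; not dockerd's actual API): identical events bump a counter instead of each producing a log line, and one summary per (container, event type) pair is flushed at each interval.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// eventKey identifies "the same" event: one entry per (container, event type).
type eventKey struct {
	containerID string
	eventType   string // e.g. "oom"
}

// EventAggregator is the multi-set: duplicate events increment a count
// instead of producing a log line each; one summary per key is flushed
// at each interval.
type EventAggregator struct {
	mu     sync.Mutex
	counts map[eventKey]int
}

func NewEventAggregator(flushEvery time.Duration, log func(string)) *EventAggregator {
	a := &EventAggregator{counts: make(map[eventKey]int)}
	go func() {
		for range time.Tick(flushEvery) {
			a.mu.Lock()
			for k, n := range a.counts {
				log(fmt.Sprintf("container %s: event %q seen %d times", k.containerID, k.eventType, n))
			}
			a.counts = make(map[eventKey]int) // reset for the next window
			a.mu.Unlock()
		}
	}()
	return a
}

// Record is O(1) per event, however fast the events arrive.
func (a *EventAggregator) Record(containerID, eventType string) {
	a.mu.Lock()
	a.counts[eventKey{containerID, eventType}]++
	a.mu.Unlock()
}

func main() {
	agg := NewEventAggregator(10*time.Second, func(s string) { fmt.Println(s) })
	for i := 0; i < 1000000; i++ { // a storm of identical OOM notifications...
		agg.Record("abc123", "oom")
	}
	time.Sleep(11 * time.Second) // ...surfaces as a single summary line
}
```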
The timeline of events in the journal suggests that something terrible is happening between containerd, the kernel, and docker during a container stop, and that this triggers the logging incident. At 4:15, an OOM event occurs in a student container, which is rather unsurprising. Everything seems quiet. The following message then repeats with sub-millisecond delay until we are able to wrestle control back from the system:
The rate at which this event recurs is mind-boggling. It's not clear to me whether it's the event logging itself, or the strain of attempting this failed operation so rapidly, that causes the system to eat through all available memory and grind to a halt. Messages in the journal about suppressing 500,000 to 1,000,000 messages from docker are very common while this is happening. After the event, it takes a surprisingly long time for the system to release the memory eaten up, in the realm of hours I believe.
I can only see a single "oom" event, using this as a repro: `time docker run -m 500m --memory-swap=500m python python -c 'a = "asdfsadf".join(map(str, range(1, 1000000000)))'`
@kolyshkin Thank you for your attention and input. So you're only seeing one event because the event producer (containerd) generated only one, and that's because the process received SIGKILL (the exit code is 137) and died in time, which is great. But our processes did not die. I guess they are supposed to? (Is that so, @vilhelmen?) We tried to kill the container manually through Kubernetes and through docker, and both failed; we assumed this was because dockerd (not the container) was busy printing the logs, so did any kill signal get through at all? (We don't know.) That theory is strengthened by the fact that we were able to kill every process in the container (by a literal kill -9) and still observe dockerd continue to panic and log about OOM, even left with an empty container, 0 processes inside. Finally, we were able to kill it by … So did the failure to kill play a part in this? Regardless, here is why the failure to kill is not the key issue, and why killing the container is a separate angle we are already working on: we run an education program, and user code that does not die upon OOM should be allowed too, because the user processes are already properly contained and pose no danger to our system. What concerns us is that a user-level OOM triggers dockerd+containerd to panic, which causes a secondary, system-wide OOM by dockerd.
Which is to say: many logging systems are aware of the risks of logging and are capable of suppressing messages, so we still consider this a dockerd vulnerability, and we hope someone can propose a fix. (Otherwise we'll have to attempt a fix for our own case, or explore docker replacements.) It is not reasonable to rely on the OOM offender being killed; what if other runtimes are in use whose behaviors can also trigger an incessant stream of events? I think dockerd should fix the logging.
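For reference, here is a minimal sketch in Go (hypothetical names; not dockerd's actual API) of the suppress-and-count behavior that journald-style rate limiting provides: pass a small burst per time window, then drop further messages and report how many were suppressed when the window rolls over.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// RateLimitedLogger passes at most `burst` messages per `window`, then drops
// further messages and reports how many were suppressed when the window
// rolls over (the same suppress-and-count idea journald applies).
type RateLimitedLogger struct {
	mu          sync.Mutex
	burst       int
	window      time.Duration
	windowStart time.Time
	sent        int
	suppressed  int
}

func (l *RateLimitedLogger) Log(msg string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if now := time.Now(); now.Sub(l.windowStart) > l.window {
		if l.suppressed > 0 {
			fmt.Printf("suppressed %d messages in the last window\n", l.suppressed)
		}
		l.windowStart, l.sent, l.suppressed = now, 0, 0
	}
	if l.sent < l.burst {
		l.sent++
		fmt.Println(msg)
		return
	}
	l.suppressed++ // dropped, but still accounted for
}

func main() {
	logger := &RateLimitedLogger{burst: 5, window: time.Second}
	for i := 0; i < 100000; i++ { // an event storm...
		logger.Log("OOM event for container abc123")
	}
	time.Sleep(1100 * time.Millisecond)
	logger.Log("next event") // ...surfaces as 5 lines plus one summary
}
```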
Update: coming back to investigate this again today. I further found out that @kolyshkin was right: normally there is only one message and the process gets killed, as I observed the other day on my laptop. However, on our systems the exact same code produces an endless stream of messages:
So what is wrong with our containers??
Can anybody help?
And actually, it doesn't matter what we run that hits the memory limit. Initially we thought I/O needed to be involved too, but now I'm using something as simple as this to replicate the issue as quickly as possible. And now we know that, for some reason, it so far only happens on our system...
And the procedure to recover from this is:
Why is this happening on our system and not on others?
What is the (docker daemon and containerd)'s …
@ariesdevil I believe it's listed in the inspect output above (0) if you do a Ctrl-F. So?
Sorry for missing it.
Crossposting my issue #693 (comment) here as well. Does anybody have a fix/workaround for this problem?
@aleozlx What are we doing anyway? I know we discussed patching the docker binary to NOP the event function, but I think you whipped up something to murder containers before they OOM'd? I mean, it's a terrible solution, but we don't have a lot of choices. Other systems respawn the containers on kill, so it works for now.
Hi, I'm new to the game, but I have an easy setup to reproduce this issue: Nginx immediately starts to consume a lot of CPU and soon reaches the memory limit. The continuous OOM events bubble up, causing dockerd to grow in memory endlessly. More details: kubernetes/kubernetes#69364 (comment)
Expected behavior
Don't log things blindly; be more cautious about output.
Blindly logging everything can pose a severe denial-of-service risk by exhausting system resources. I suggest aggregating similar log messages in some way to handle (especially) OOM events more appropriately.
Actual behavior
containerd continues to bubble up OOM events, and dockerd blindly logs every one of them.
Referring to:
https://github.com/docker/docker-ce/blob/98ddba151eebbff519cc86342443c3a09f8ce334/components/engine/daemon/monitor.go#L50
For us this is a huge issue: the consequence has been dockerd generating log output at a continuous 10Mbps on our systems, eventually exhausting all 80GB of system RAM per node, plus all disk space if we don't catch it in time, which leads to a denial of service.
We limit our users' resources, but we have no control over what they run. Once an OOM occurs because of those resource limits, dockerd is at fault for consuming the rest (and the majority) of system resources by logging the recurring OOM events.
Steps to reproduce the behavior
Make a container with resource limits (we did this through RKE Kubernetes) and continually allocate memory until the OOM killer triggers.
An actual example we caught was a container with a 4G memory limit loading a 5G csv file with read.csv() in R. This caused dockerd to continually generate a massive amount of logs, which eventually jammed our system. (A sketch of a comparable workload follows below.)
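If it helps others reproduce: the essential ingredient seems to be a workload that keeps re-triggering OOM without the container exiting. Below is a minimal sketch of one such workload in Go (hypothetical; our real trigger was R's read.csv()): a parent process keeps respawning a memory-hogging child, so each OOM kill emits a fresh event while the container stays alive. Whether this reproduces the log storm presumably depends on the same system-specific factors discussed above.

```go
// oomstorm.go — repro sketch. Build it into an image and run with a memory
// limit, e.g.: docker run -m 500m --memory-swap=500m <image> /oomstorm
package main

import (
	"os"
	"os/exec"
	"time"
)

func main() {
	if len(os.Args) > 1 && os.Args[1] == "child" {
		// Child: allocate and touch memory until the cgroup limit is hit
		// and the kernel OOM-kills this process.
		var chunks [][]byte
		for {
			chunk := make([]byte, 64<<20) // 64 MiB per step
			for i := 0; i < len(chunk); i += 4096 {
				chunk[i] = 1 // touch each page so it is actually committed
			}
			chunks = append(chunks, chunk)
		}
	}
	// Parent: small footprint, so the OOM killer targets the child.
	// The container never exits, and every kill is a fresh OOM event
	// for containerd/dockerd to propagate.
	for {
		_ = exec.Command(os.Args[0], "child").Run()
		time.Sleep(100 * time.Millisecond)
	}
}
```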
Output of `docker version`:
Client:
Version: 18.09.0
API version: 1.39
Go version: go1.10.4
Git commit: 4d60db4
Built: Wed Nov 7 00:48:22 2018
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 18.09.4
API version: 1.39 (minimum version 1.12)
Go version: go1.10.8
Git commit: d14af54
Built: Wed Mar 27 18:04:46 2019
OS/Arch: linux/amd64
Experimental: false
Output of `docker info`:
Containers: 42
Running: 23
Paused: 0
Stopped: 19
Images: 61
Server Version: 18.09.4
Storage Driver: devicemapper
Pool Name: docker-thinpool
Pool Blocksize: 524.3kB
Base Device Size: 10.74GB
Backing Filesystem: xfs
Udev Sync Supported: true
Data Space Used: 112.2GB
Data Space Total: 765GB
Data Space Available: 652.9GB
Metadata Space Used: 70.76MB
Metadata Space Total: 8.049GB
Metadata Space Available: 7.978GB
Thin Pool Minimum Free Space: 76.5GB
Deferred Removal Enabled: true
Deferred Deletion Enabled: true
Deferred Deleted Device Count: 0
Library Version: 1.02.149-RHEL7 (2018-07-20)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: bb71b10fd8f58240ca47fbb579b9d1028eea7c84
runc version: 2b18fe1d885ee5083ef9f0838fee39b62d653e30
init version: fec3683
Security Options:
seccomp
Profile: default
Kernel Version: 5.0.7-1.el7.elrepo.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 18
Total Memory: 78.66GiB
Name: k-prod-cpu-6.dsa.lan
ID: S65O:V6T3:PQ2L:KKKW:PSC6:DTDJ:6ITH:KICE:EFWX:DQT3:NBSP:XSND
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine
WARNING: bridge-nf-call-ip6tables is disabled
WARNING: the devicemapper storage-driver is deprecated, and will be removed in a future release.
Additional environment details (AWS, VirtualBox, physical, etc.)
RKE+oVirt.