Who stole my VM's memory? #2516

Open
songyingjun opened this Issue Oct 26, 2018 · 10 comments

5 participants
songyingjun commented Oct 26, 2018

Issue Report

My server's memory is gone

Bug

pcpu_get_vm_areas has allocated far more memory than expected

Container Linux Version

$ cat /etc/os-release
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1576.5.0
VERSION_ID=1576.5.0
BUILD_ID=2018-01-05-1121
PRETTY_NAME="Container Linux by CoreOS 1576.5.0 (Ladybug)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"
COREOS_BOARD="amd64-usr"


$ docker info
Containers: 31
 Running: 18
 Paused: 0
 Stopped: 13
Images: 163
Server Version: 17.09.0-ce
Storage Driver: overlay
 Backing Filesystem: extfs
 Supports d_type: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 06b9cb35161009dcb7123345749fef02f7cea8e0
runc version: 3f2f8b84a77f73d38244dd690525642a72156c64
init version: v0.13.2 (expected: 949e6facb77383876aeff8a6944dde66b3089574)
Security Options:
 seccomp
  Profile: default
 selinux
Kernel Version: 4.14.11-coreos
Operating System: Container Linux by CoreOS 1576.5.0 (Ladybug)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 13.68GiB
Name: dockerhost04_cluster01
ID: PQ73:SB3L:PK3V:OIC4:ATHD:LYON:5DD6:BD6E:ML7F:LGPI:H3OQ:PK2T
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Registry Mirrors:
 http://44ae3461.m.daocloud.io/
Live Restore Enabled: false

Environment

Azure VM Standard DS3 (4 cores, 14 GiB memory)

Expected Behavior

Actual Behavior

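Possible kernel memory leak? free reports 11896 MB used (excluding buffers/cache), but process RSS sums to only about 4301 MB, while pcpu_get_vm_areas accounts for 6592 MB of vmalloc allocations.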

$ free -m
             total       used       free     shared    buffers     cached
Mem:         14006      12854       1152         39        265        691
-/+ buffers/cache:      11896       2109
Swap:        24575       2304      22271
$ ps aux|awk '{sum+=$6} END {print sum/1024}'
4301.11
$ grep pcpu_get_vm_areas /proc/vmallocinfo | awk '{total+=$2}; END {print total/1024/1024}'
6592
$ cat /proc/vmallocinfo|grep 33554432
0xffffd68cf1e00000-0xffffd68cf3e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68cf7e00000-0xffffd68cf9e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68cf9e00000-0xffffd68cfbe00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d01e00000-0xffffd68d03e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d05e00000-0xffffd68d07e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d07e00000-0xffffd68d09e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d0de00000-0xffffd68d0fe00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d0fe00000-0xffffd68d11e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d13e00000-0xffffd68d15e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d15e00000-0xffffd68d17e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d19e00000-0xffffd68d1be00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d1be00000-0xffffd68d1de00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d35e00000-0xffffd68d37e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d37e00000-0xffffd68d39e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d3be00000-0xffffd68d3de00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d3de00000-0xffffd68d3fe00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d3fe00000-0xffffd68d41e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d41e00000-0xffffd68d43e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d43e00000-0xffffd68d45e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d45e00000-0xffffd68d47e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d47e00000-0xffffd68d49e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d49e00000-0xffffd68d4be00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d4be00000-0xffffd68d4de00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d4de00000-0xffffd68d4fe00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d4fe00000-0xffffd68d51e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d51e00000-0xffffd68d53e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d53e00000-0xffffd68d55e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d55e00000-0xffffd68d57e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d57e00000-0xffffd68d59e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d59e00000-0xffffd68d5be00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d5be00000-0xffffd68d5de00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d5de00000-0xffffd68d5fe00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d5fe00000-0xffffd68d61e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d61e00000-0xffffd68d63e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d63e00000-0xffffd68d65e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d65e00000-0xffffd68d67e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d67e00000-0xffffd68d69e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d69e00000-0xffffd68d6be00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d6be00000-0xffffd68d6de00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d6de00000-0xffffd68d6fe00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d6fe00000-0xffffd68d71e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d71e00000-0xffffd68d73e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d73e00000-0xffffd68d75e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d75e00000-0xffffd68d77e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d77e00000-0xffffd68d79e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d79e00000-0xffffd68d7be00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d7be00000-0xffffd68d7de00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d7de00000-0xffffd68d7fe00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d7fe00000-0xffffd68d81e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d81e00000-0xffffd68d83e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d83e00000-0xffffd68d85e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d85e00000-0xffffd68d87e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d87e00000-0xffffd68d89e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d89e00000-0xffffd68d8be00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d8be00000-0xffffd68d8de00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d8de00000-0xffffd68d8fe00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d8fe00000-0xffffd68d91e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d91e00000-0xffffd68d93e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d93e00000-0xffffd68d95e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d95e00000-0xffffd68d97e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d97e00000-0xffffd68d99e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d99e00000-0xffffd68d9be00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d9be00000-0xffffd68d9de00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d9de00000-0xffffd68d9fe00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68d9fe00000-0xffffd68da1e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68da1e00000-0xffffd68da3e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68da3e00000-0xffffd68da5e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68da5e00000-0xffffd68da7e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68da7e00000-0xffffd68da9e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68da9e00000-0xffffd68dabe00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68dabe00000-0xffffd68dade00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68dade00000-0xffffd68dafe00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68dafe00000-0xffffd68db1e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68db1e00000-0xffffd68db3e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68db3e00000-0xffffd68db5e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68db5e00000-0xffffd68db7e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68db7e00000-0xffffd68db9e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68dbbe00000-0xffffd68dbde00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68dbde00000-0xffffd68dbfe00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68dbfe00000-0xffffd68dc1e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68dc1e00000-0xffffd68dc3e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68dc3e00000-0xffffd68dc5e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68dc7e00000-0xffffd68dc9e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68dc9e00000-0xffffd68dcbe00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68dcbe00000-0xffffd68dcde00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68dcde00000-0xffffd68dcfe00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68dcfe00000-0xffffd68dd1e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68dd1e00000-0xffffd68dd3e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68dd3e00000-0xffffd68dd5e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68dd5e00000-0xffffd68dd7e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68dd7e00000-0xffffd68dd9e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68dd9e00000-0xffffd68ddbe00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68ddbe00000-0xffffd68ddde00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68ddde00000-0xffffd68ddfe00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68ddfe00000-0xffffd68de1e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68de1e00000-0xffffd68de3e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68de3e00000-0xffffd68de5e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68de5e00000-0xffffd68de7e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68de7e00000-0xffffd68de9e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68de9e00000-0xffffd68debe00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68debe00000-0xffffd68dede00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68dede00000-0xffffd68defe00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68defe00000-0xffffd68df1e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68df1e00000-0xffffd68df3e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68df3e00000-0xffffd68df5e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68df5e00000-0xffffd68df7e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68df7e00000-0xffffd68df9e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68df9e00000-0xffffd68dfbe00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68dfbe00000-0xffffd68dfde00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68dfde00000-0xffffd68dffe00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68dffe00000-0xffffd68e01e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e01e00000-0xffffd68e03e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e03e00000-0xffffd68e05e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e05e00000-0xffffd68e07e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e07e00000-0xffffd68e09e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e09e00000-0xffffd68e0be00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e0be00000-0xffffd68e0de00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e0de00000-0xffffd68e0fe00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e0fe00000-0xffffd68e11e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e11e00000-0xffffd68e13e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e13e00000-0xffffd68e15e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e15e00000-0xffffd68e17e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e17e00000-0xffffd68e19e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e19e00000-0xffffd68e1be00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e1be00000-0xffffd68e1de00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e1de00000-0xffffd68e1fe00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e1fe00000-0xffffd68e21e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e21e00000-0xffffd68e23e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e23e00000-0xffffd68e25e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e25e00000-0xffffd68e27e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e27e00000-0xffffd68e29e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e29e00000-0xffffd68e2be00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e2be00000-0xffffd68e2de00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e2de00000-0xffffd68e2fe00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e2fe00000-0xffffd68e31e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e31e00000-0xffffd68e33e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e33e00000-0xffffd68e35e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e35e00000-0xffffd68e37e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e37e00000-0xffffd68e39e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e39e00000-0xffffd68e3be00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e3be00000-0xffffd68e3de00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e3de00000-0xffffd68e3fe00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e3fe00000-0xffffd68e41e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e41e00000-0xffffd68e43e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e43e00000-0xffffd68e45e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e45e00000-0xffffd68e47e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e47e00000-0xffffd68e49e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e49e00000-0xffffd68e4be00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e4be00000-0xffffd68e4de00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e4de00000-0xffffd68e4fe00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e4fe00000-0xffffd68e51e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e51e00000-0xffffd68e53e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e53e00000-0xffffd68e55e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e55e00000-0xffffd68e57e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e57e00000-0xffffd68e59e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e59e00000-0xffffd68e5be00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e5be00000-0xffffd68e5de00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e5de00000-0xffffd68e5fe00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e5fe00000-0xffffd68e61e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e61e00000-0xffffd68e63e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e63e00000-0xffffd68e65e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e65e00000-0xffffd68e67e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e67e00000-0xffffd68e69e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e69e00000-0xffffd68e6be00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e6be00000-0xffffd68e6de00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e6de00000-0xffffd68e6fe00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e6fe00000-0xffffd68e71e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e71e00000-0xffffd68e73e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e73e00000-0xffffd68e75e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e75e00000-0xffffd68e77e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e77e00000-0xffffd68e79e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e79e00000-0xffffd68e7be00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e7be00000-0xffffd68e7de00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e7de00000-0xffffd68e7fe00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e7fe00000-0xffffd68e81e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e81e00000-0xffffd68e83e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e83e00000-0xffffd68e85e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e85e00000-0xffffd68e87e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e87e00000-0xffffd68e89e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e89e00000-0xffffd68e8be00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e8be00000-0xffffd68e8de00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e8de00000-0xffffd68e8fe00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e8fe00000-0xffffd68e91e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e91e00000-0xffffd68e93e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e93e00000-0xffffd68e95e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e95e00000-0xffffd68e97e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e97e00000-0xffffd68e99e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e99e00000-0xffffd68e9be00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e9be00000-0xffffd68e9de00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e9de00000-0xffffd68e9fe00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68e9fe00000-0xffffd68ea1e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68ea1e00000-0xffffd68ea3e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68ea3e00000-0xffffd68ea5e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68ea5e00000-0xffffd68ea7e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68ea7e00000-0xffffd68ea9e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68ea9e00000-0xffffd68eabe00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68eabe00000-0xffffd68eade00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68eade00000-0xffffd68eafe00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68eafe00000-0xffffd68eb1e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68eb1e00000-0xffffd68eb3e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68eb3e00000-0xffffd68eb5e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68eb5e00000-0xffffd68eb7e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68eb7e00000-0xffffd68eb9e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68eb9e00000-0xffffd68ebbe00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68ebbe00000-0xffffd68ebde00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68ebde00000-0xffffd68ebfe00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
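
The listing above contains 206 entries of 33554432 bytes (32 MiB) each, which matches the 6592 MiB total from the awk command. The same accounting can be done for every caller at once with a small script. This is only a sketch: the function names `vmalloc_by_caller` and `report` are made up for illustration, it assumes the `<range> <size> <caller>` line layout shown above, and reading /proc/vmallocinfo normally requires root, so it falls back to two sample lines copied from this report.

```python
from collections import defaultdict

MIB = 1024 * 1024

# Two lines copied from the report above, used when /proc/vmallocinfo is unreadable.
SAMPLE = """\
0xffffd68cf1e00000-0xffffd68cf3e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
0xffffd68cf7e00000-0xffffd68cf9e00000 33554432 pcpu_get_vm_areas+0x0/0x550 vmalloc
"""


def vmalloc_by_caller(text):
    """Return {caller_symbol: total_bytes} for /proc/vmallocinfo-style text."""
    totals = defaultdict(int)
    for line in text.splitlines():
        fields = line.split()
        # Assumed layout: "<start>-<end> <size_bytes> <caller+off/len> [flags...]"
        if len(fields) < 3 or not fields[1].isdigit():
            continue
        caller = fields[2].split("+", 1)[0]  # drop the "+0x0/0x550" suffix
        totals[caller] += int(fields[1])
    return dict(totals)


def report(text, top=5):
    """Return the `top` largest callers as (caller, size_in_MiB) pairs."""
    ranked = sorted(vmalloc_by_caller(text).items(),
                    key=lambda kv: kv[1], reverse=True)
    return [(caller, size // MIB) for caller, size in ranked[:top]]


if __name__ == "__main__":
    try:
        with open("/proc/vmallocinfo") as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # not root, or not Linux
    for caller, mib in report(text):
        print(f"{mib:8d} MiB  {caller}")
```

Running it periodically and comparing the pcpu_get_vm_areas figure would show whether the allocation keeps growing.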

Reproduction Steps

I have no idea how to reproduce this.

Other Information

ajeddeloh commented Oct 26, 2018

Current stable is 1855.5.0; 1576.5.0 is no longer supported. Can you try with latest stable and see if the issue persists? It also is probably worth trying the current alpha since it's running a newer kernel.

songyingjun commented Oct 29, 2018

@ajeddeloh Thanks for your advice, I'll do what you suggested.

dcui commented Oct 31, 2018

The upstream v4.15 and v4.19 kernels have the same or a similar issue, and it looks like people there have written patches:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1792349
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1792349/comments/7

Comment 7 for bug 1792349
Daniel McGinnes (djmcginnes) wrote on 2018-10-11:
#7
Hi, as per this update -> https://www.spinics.net/lists/cgroups/msg20660.html
I have a set of patches on top of Kernel 4.19.rc3 that appear to resolve the issue. What is the process for getting these backported to a 4.15 Kernel build for Ubuntu 18?
The list of patches is:
https://lkml.org/lkml/2018/10/7/84
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ce7ea4af0838ffd4667ecad4eb5eec7a25342f1e
https://marc.info/?l=linux-netdev&m=153900037804969
010cb21d4ede math64: prevent double calculation of DIV64_U64_ROUND_UP() arguments
f77d7a05670d mm: don't miss the last page because of round-off error
d18bf0af683e mm: drain memcg stocks on css offlining
71cd51b2e1ca mm: rework memcg kernel stack accounting
f3a2fccbce15 mm: slowly shrink slabs with a relatively small number of objects

Here the kernel version is v4.14.11-coreos, but obviously some of the patches listed in the above link are also required.

Can CoreOS please integrate these upstream fixes and generate a new build for songyingjun to test?
I hope the aforementioned fixes could also work for songyingjun.

songyingjun commented Oct 31, 2018

@dcui Awesome! You are right. We run a cronjob container every 5 minutes on this server, so it's the same issue.
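
One way to check whether short-lived containers are leaving cgroups behind is to compare the per-controller counts in /proc/cgroups over time. The sketch below assumes the usual `#subsys_name hierarchy num_cgroups enabled` layout of that file and assumes, as the memcg patches linked above suggest, that a memory controller count which keeps climbing while the container count stays flat points at cgroups that were never fully released. The helper names and the numbers in `BEFORE` and `AFTER` are hypothetical, not measurements from this server.

```python
def parse_proc_cgroups(text):
    """Return {subsys_name: num_cgroups} from /proc/cgroups-style text."""
    counts = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the header
        fields = line.split()
        if len(fields) >= 3 and fields[2].isdigit():
            counts[fields[0]] = int(fields[2])
    return counts


def growth(before, after):
    """Return {subsys_name: delta} for controllers whose count changed."""
    return {name: after[name] - before.get(name, 0)
            for name in after
            if after[name] != before.get(name, 0)}


# Hypothetical snapshots, taken some hours apart on a host running cron-style containers.
BEFORE = "#subsys_name\thierarchy\tnum_cgroups\tenabled\ncpu\t2\t80\t1\nmemory\t5\t120\t1\n"
AFTER = "#subsys_name\thierarchy\tnum_cgroups\tenabled\ncpu\t2\t80\t1\nmemory\t5\t450\t1\n"

if __name__ == "__main__":
    delta = growth(parse_proc_cgroups(BEFORE), parse_proc_cgroups(AFTER))
    for name, diff in sorted(delta.items()):
        print(f"{name}: {diff:+d} cgroups")
```

On a real host, the two snapshots would come from reading /proc/cgroups at different times rather than from the hard-coded strings.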

ajeddeloh commented Oct 31, 2018

Thanks for tracking this down.

With the exception of critical security bugs, Container Linux typically waits for patches to make it into kernel releases before including them in our own releases. Bad backports happen and we want to be sure they're reviewed/tested by the upstream kernel folks first.

Do you know if there are plans to backport these to 4.14.x upstream?

dcui commented Oct 31, 2018

I believe the patches should be backported to recent longterm kernels, including linux-4.14.y, but that will take some time. So far, the patches have only gone into the latest mainline (Linus's master branch); they have not gone into linux-4.19.y yet.

jtyoung commented Nov 1, 2018

I believe this issue has explained some very odd behavior we've been seeing in our Kubernetes cluster recently. We run around 100 nodes that use Container Linux and run thousands of pods. Our infrastructure has been experiencing unexplained OOM events across the board recently, despite nothing changing on the workload or resourcing side.

> With the exception of critical security bugs, Container Linux typically waits for patches to make it into kernel releases before including them in our own releases.

Do you have any estimates for how long that process typically takes? If this is a fix that could take on the order of weeks to months to have a patch for, we will need to switch our nodes to run a different OS as this is killing our application.

ajeddeloh commented Nov 1, 2018

Once the needed patches land in a 4.14.x and a 4.19.x release, we can do a release with those new kernels in beta/alpha. After the changes have baked in beta for a bit (~1 week), we can promote that to stable.

You might try messaging the linux stable mailing list to try to get the needed patches in. See also the stable kernel rules.

dcui commented Nov 2, 2018

I asked about the mm patches: https://www.lkml.org/lkml/2018/11/2/160

The other patches have the "Cc: stable@vger.kernel.org" tag, so will be automatically backported to the longterm kernels including v4.14.y.
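
To check which of the listed commits carry that tag, their commit messages can be scanned for it. This is a rough sketch: `has_stable_tag` is a made-up helper, the commit message and author below are invented for illustration, and only the `stable@vger.kernel.org` address mentioned above is matched.

```python
import re

# Matches a "Cc:" trailer line that mentions stable@vger.kernel.org,
# with or without angle brackets or a trailing "# <version>" note.
STABLE_CC = re.compile(
    r"^[ \t]*Cc:.*\bstable@vger\.kernel\.org\b",
    re.IGNORECASE | re.MULTILINE,
)


def has_stable_tag(commit_message):
    """Return True if the commit message carries a Cc: stable trailer."""
    return STABLE_CC.search(commit_message) is not None


# Hypothetical commit message, not one of the real patches from this thread.
EXAMPLE = """\
mm: example fix for a leak

Explain the problem and the fix here.

Cc: <stable@vger.kernel.org>
Signed-off-by: Jane Developer <jane@example.com>
"""

if __name__ == "__main__":
    print(has_stable_tag(EXAMPLE))
```

In practice the message text could come from `git log -1 --format=%B <commit>` in a kernel checkout.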

It looks like the ipv6 fix (ipv6: fix memory leak on dst->_metrics) doesn't apply to v4.14.y.

But I guess the required backport to v4.14.y won't be finished quickly; it may take at least a few weeks.

dcui commented Nov 2, 2018

More patches are required: https://lkml.org/lkml/2018/11/2/182
It looks like we'll have to wait for some time before the kernel stabilizes.
