
Running container fails with failed to add the host cannot allocate memory #1443

Open
hpakniamina opened this issue Nov 29, 2022 · 68 comments

@hpakniamina

hpakniamina commented Nov 29, 2022

OS: Red Hat Enterprise Linux release 8.7 (Ootpa)
Version:

$ sudo yum list installed | grep docker
containerd.io.x86_64                         1.6.9-3.1.el8                               @docker-ce-stable
docker-ce.x86_64                             3:20.10.21-3.el8                            @docker-ce-stable
docker-ce-cli.x86_64                         1:20.10.21-3.el8                            @docker-ce-stable
docker-ce-rootless-extras.x86_64             20.10.21-3.el8                              @docker-ce-stable
docker-scan-plugin.x86_64                    0.21.0-3.el8                                @docker-ce-stable

Out of hundreds of docker calls made over days, a few of them fail. This is the general form of the command line:

/usr/bin/docker run \
-u 1771:1771 \
-a stdout \
-a stderr \
-v /my_path:/data \
--rm \
my_image:latest my_entry --my_args

The failure:

docker: Error response from daemon: failed to create endpoint recursing_aryabhata on network bridge: failed to add the host (veth6ad97f8) <=> sandbox (veth23b66ce) pair interfaces: cannot allocate memory.

It is not easily reproducible. The failure rate is less than one percent. At the time this error happens, the system has lots of free memory. Around the time of the failure, the application is making around 5 docker calls per second. Each call takes about 5 to 10 seconds to complete.

@petergerten

I have the same issue on Arch, also not consistently reproducible.
Docker version 20.10.23, build 715524332f

@hpakniamina
Author

I have the same issue on Arch, also not consistently reproducible. Docker version 20.10.23, build 715524332f

I did not need the networking features of the container, so passing "--network none" on the docker run command line circumvented the problem:

docker run ... --network none ...

@henryborchers

It's happening to me when I am building my images. Sadly, it too cannot be reproduced consistently.

docker build ...

@nixon89

nixon89 commented Feb 8, 2023

I have the same behavior with docker build command (cannot allocate memory)

# docker version

Client: Docker Engine - Community
 Version:           23.0.0
 API version:       1.42
 Go version:        go1.19.5
 Git commit:        e92dd87
 Built:             Wed Feb  1 17:47:51 2023
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          23.0.0
  API version:      1.42 (minimum version 1.12)
  Go version:       go1.19.5
  Git commit:       d7573ab
  Built:            Wed Feb  1 17:47:51 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.16
  GitCommit:        31aa4358a36870b21a992d3ad2bef29e1d693bec
 runc:
  Version:          1.1.4
  GitCommit:        v1.1.4-0-g5fd4c4d
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
# apt list --installed | grep docker

docker-buildx-plugin/jammy,now 0.10.2-1~ubuntu.22.04~jammy amd64 [installed,automatic]
docker-ce-cli/jammy,now 5:23.0.0-1~ubuntu.22.04~jammy amd64 [installed]
docker-ce-rootless-extras/jammy,now 5:23.0.0-1~ubuntu.22.04~jammy amd64 [installed,automatic]
docker-ce/jammy,now 5:23.0.0-1~ubuntu.22.04~jammy amd64 [installed]
docker-compose-plugin/jammy,now 2.15.1-1~ubuntu.22.04~jammy amd64 [installed,automatic]
docker-scan-plugin/jammy,now 0.23.0~ubuntu-jammy amd64 [installed,automatic]

@hostalp

hostalp commented Feb 15, 2023

Exactly the same issue here during docker build.
Rocky Linux 8.7 (RHEL 8.7 clone), Docker 20.10.22-3.el8

@b-khouy

b-khouy commented Feb 15, 2023

I fixed the problem using the docker builder prune command, then ran the build again:
https://docs.docker.com/engine/reference/commandline/builder_prune

@hpakniamina
Author

I fixed the problem using the docker builder prune command, then ran the build again https://docs.docker.com/engine/reference/commandline/builder_prune

If one is dealing with an intermittent problem, then there is no guarantee the issue is resolved.

@bendem

bendem commented Apr 4, 2023

Same problem here: every so often, a build fails with failed to add the host ( ) <=> sandbox ( ) pair interfaces: cannot allocate memory. System info:

$ dnf list --installed docker\* containerd\* | cat
Installed Packages
containerd.io.x86_64                    1.6.20-3.1.el8         @docker-ce-stable
docker-buildx-plugin.x86_64             0.10.4-1.el8           @docker-ce-stable
docker-ce.x86_64                        3:23.0.2-1.el8         @docker-ce-stable
docker-ce-cli.x86_64                    1:23.0.2-1.el8         @docker-ce-stable
docker-ce-rootless-extras.x86_64        23.0.2-1.el8           @docker-ce-stable
docker-compose-plugin.x86_64            2.17.2-1.el8           @docker-ce-stable
docker-scan-plugin.x86_64               0.23.0-3.el8           @docker-ce-stable

$ sudo docker info
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.10.4
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.17.2
    Path:     /usr/libexec/docker/cli-plugins/docker-compose
  scan: Docker Scan (Docker Inc.)
    Version:  v0.23.0
    Path:     /usr/libexec/docker/cli-plugins/docker-scan

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 55
 Server Version: 23.0.2
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 2806fc1057397dbaeefbea0e4e17bddfbd388f38
 runc version: v1.1.5-0-gf19387a
 init version: de40ad0
 Security Options:
  seccomp
   Profile: builtin
 Kernel Version: 4.18.0-425.13.1.el8_7.x86_64
 Operating System: Rocky Linux 8.7 (Green Obsidian)
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 15.4GiB
 Name: x
 ID: NUAJ:VDZR:RMDC:ASCP:5SEG:D4EF:OEIW:RY57:VXYI:5EZV:6F4F:D5RO
 Docker Root Dir: /opt/docker_data
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Registry Mirrors:
  https://x/
 Live Restore Enabled: false
 Default Address Pools:
   Base: 172.17.0.0/16, Size: 24
   Base: 172.20.0.0/16, Size: 24
   Base: 172.30.0.0/16, Size: 24

@bendem

bendem commented Apr 4, 2023

If I understand correctly, this is the same as https://bbs.archlinux.org/viewtopic.php?id=282429 which is fixed by this patch queued here.

@henryborchers

I don't know if this helps but it's happening to me on Rocky Linux 8.7 as well, just like @hostalp.

@pschoen-itsc

We have had the same issue on Ubuntu 20.04 for a few weeks now.

@thaJeztah
Member

/cc @akerouanton FYI (I see a potential kernel issue mentioned above)

@pschoen-itsc

We have the problem with an older kernel (5.15), so I do not think that there is a connection with the mentioned kernel bug.

@XuNiLuS

XuNiLuS commented Jun 29, 2023

I have the same problem with a debian 12 (6.1.0-9-amd64), but no problem with a debian 11 (5.10.0-21-amd64)

@utrotzek

utrotzek commented Jul 3, 2023

Same problem on Ubuntu 22.04.

Linux 5.15.0-76-generic #83-Ubuntu SMP Thu Jun 15 19:16:32 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

This error is really annoying since all of our CI/CD Pipelines fail randomly.

@skast96

skast96 commented Jul 6, 2023

Same problem on Ubuntu 22.04 on a server

@WolleTD

WolleTD commented Jul 14, 2023

I have the same problem with a debian 12 (6.1.0-9-amd64), but no problem with a debian 11 (5.10.0-21-amd64)

Same here. Moved a server to Debian 12, which starts a new container once per day to create backups of docker volumes. After some days or weeks, starting the container fails until I restart the docker daemon.

@skast96

skast96 commented Jul 17, 2023

I compared three of my servers with the failing one and the only thing that is different is that the vm.swappiness is set to 0 and that the server has no swap activated at all. If that helps

@pschoen-itsc

I compared three of my servers with the failing one and the only thing that is different is that the vm.swappiness is set to 0 and that the server has no swap activated at all. If that helps

We disabled swap on our failing servers, but it did not help.

@skast96

skast96 commented Jul 17, 2023

I compared three of my servers with the failing one and the only thing that is different is that the vm.swappiness is set to 0 and that the server has no swap activated at all. If that helps

We disabled swap on our failing servers, but it did not help.

I was thinking the other way around and wanted to check whether enabling swap helps 🤔

@utrotzek

I compared three of my servers with the failing one and the only thing that is different is that the vm.swappiness is set to 0 and that the server has no swap activated at all. If that helps

That was EXACTLY the case on our servers.

Changing the values in /etc/sysctl.conf

from

vm.swappiness = 0

to

vm.swappiness = 60

and applying it with sysctl -p

solved it for us. You saved my life! ;) I forgot that I had set this value.
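For anyone else applying this, the change boils down to the following sketch (the drop-in file name 99-swappiness.conf is my own arbitrary choice, and root via sudo is assumed):

```shell
# Persist the setting via a sysctl drop-in instead of editing /etc/sysctl.conf
echo 'vm.swappiness = 60' | sudo tee /etc/sysctl.d/99-swappiness.conf
# Apply it immediately, without a reboot
sudo sysctl -p /etc/sysctl.d/99-swappiness.conf
# Verify the live value
cat /proc/sys/vm/swappiness
```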

@skast96

skast96 commented Jul 31, 2023

Did anyone fix this error without enabling swap? On that specific server it is not possible to enable swap.

@pschoen-itsc

Configured swap as suggested here (disabled image pruning before each run) and the error appeared again after a few days.

@skast96

skast96 commented Aug 13, 2023

I am still trying to fix this on Ubuntu 22.04 without swap. My next guess is that I misconfigured something in my compose files, which leaves a high number of network connections open. I am not sure if fixing that solves the problem or if it really is due to a kernel error. I will report my findings here next week. If anyone has figured it out, please feel free to comment.

@hpakniamina
Author

My next guess is that I misconfigured something in my compose files

As mentioned before, we did not need the networking, so "--network none" helped us work around it. We don't use docker compose. We simply call docker a couple of thousand times. The container reads the input, writes the output, and is removed by "--rm". Our issue does not have anything to do with unusual configurations or docker compose.
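For clarity, applied to the command schema from the issue description, the workaround looks like this (a sketch; the UID, paths, image, and entrypoint are the placeholders from the original report):

```shell
/usr/bin/docker run \
  -u 1771:1771 \
  -a stdout \
  -a stderr \
  -v /my_path:/data \
  --network none \
  --rm \
  my_image:latest my_entry --my_args
```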

@AmirL

AmirL commented Aug 17, 2023

Have the same problem. vm.swappiness = 60 helped for a while, but now the problem is back again.

@cimchd

cimchd commented Aug 18, 2023

Same problem here. We have 89 services in our compose file. Running docker system prune before the build usually solves the problem temporarily.

@WolleTD

WolleTD commented Oct 13, 2023

I can also report that this doesn't appear to happen on an Arch Linux machine which is starting a lot of containers for CI jobs (which I keep in my production environment for things like this), while the Debian Bookworm machine next to it experiences this all the time. That would also support the theory that the issue addressed by that patch is the underlying one.

Bookworm is at 6.1.55 right now, so I guess I'll wait for the next kernel update and see if the problem disappears?

@pschoen-itsc

We updated our Ubuntu 22.04 Servers to kernel 6.5 (New from the Ubuntu 23.10 release). This kernel has the above mentioned patch and so far the error did not appear anymore. I will report back when we see the error again.

@JonasAlfredsson

Got a comment from a person who is more familiar with the kernel than me: the page allocation failure: order:5 seemed very large for the system we have, and they suggested explicitly setting the nr_cpus kernel parameter at boot, since this apparently has an effect on how much memory the system allocates for this call.

We have a VM with 12 cores exposed to it (while the physical host has many more), so I did the following:

echo 'GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT nr_cpus=12"' > /etc/default/grub.d/cpu-limit.cfg  
update-grub
reboot

and while the reboot definitely helped reset the system to a non-fragmented state, we have seen 36 hours of no errors. I will also update this post in case we see the error on this host again.
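On the fragmentation point: the dmesg dumps later in this thread show zero free blocks above order 3-5 in the Normal zone despite gigabytes of "free" memory, which is exactly the state in which the order:4/order:5 veth queue allocation fails. A rough sketch for watching that state, assuming the standard /proc/buddyinfo layout (fields 5-15 are free-block counts for orders 0-10):

```shell
# Print per-zone free-block counts at the orders that matter here;
# zeros at order >= 4 while free memory is plentiful indicate fragmentation.
awk '{ printf "node %s zone %s order4=%s order5=%s order10=%s\n", $2, $4, $9, $10, $15 }' /proc/buddyinfo
```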

@pschoen-itsc

pschoen-itsc commented Nov 2, 2023

We haven't had any errors since updating our kernel to 6.5, so I think this resolves the issue.

@CoyoteWAN

We updated our Ubuntu 22.04 Servers to kernel 6.5 (New from the Ubuntu 23.10 release). This kernel has the above mentioned patch and so far the error did not appear anymore. I will report back when we see the error again.

We just did the same a few weeks ago, but updating the kernel did not fix our problem. Was there anything else you did after the kernel update?

@pschoen-itsc

We updated our Ubuntu 22.04 Servers to kernel 6.5 (New from the Ubuntu 23.10 release). This kernel has the above mentioned patch and so far the error did not appear anymore. I will report back when we see the error again.

We just did the same a few weeks ago, but updating the kernel did not fix our problem. Was there anything else you did after the kernel update?

Nothing I'm aware of. Before that we also tried different things which were suggested here (mostly playing around with swap), but this did not have a direct effect. Now swap is enabled and vm.swappiness = 60

Kernel in use:
Linux ****** 6.5.0-1004-oem #4-Ubuntu SMP PREEMPT_DYNAMIC Fri Sep 15 19:52:30 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

@pschoen-itsc

@CoyoteWAN
Today, one month later, we had the same error again.

@PieterDeelen

PieterDeelen commented Nov 14, 2023

We applied the nr_cpus workaround suggested by @JonasAlfredsson two weeks ago and have not seen the error since then. We were already running kernel 6.5 before, but that did not help in our case.

@JonasAlfredsson

JonasAlfredsson commented Nov 14, 2023

Just checked the collected logs from the two servers I experimented on:

  • High churn server with applied nr_cpus patch => 28 days without the error
  • Low churn server without patch, but I rebooted it => lasted ~20 days before displaying this error again.

Will apply the patch to the other servers now and see what happens.

@acdha

acdha commented Nov 15, 2023

We applied the workaround suggested by @JonasAlfredsson two weeks ago and have not seen the error since then. We were already running kernel 6.5 before, but that did not help in our case.

We have had similar results on RedHat 8.9 systems with this total kludge in cron:

*/5 * * * * sh -c 'echo 3 > /proc/sys/vm/drop_caches; echo 1 > /proc/sys/vm/compact_memory'

Update 2024-01-08: we still see sporadic failures but only during peak activity when we have a bunch of GitLab CI scheduled builds launching at the same time which keeps every CI runner fully loaded. It's still infrequent then so the workaround is clearly having some impact but is not a solution.

@akerouanton
Member

I'm wondering: has anyone in this thread tried to report this issue to kernel devs or to their distro? This issue isn't actionable on our side until we're able to reproduce it, and AFAICT we're not, so there's little we can do at this point.

@attie-argentum

attie-argentum commented Jan 19, 2024

I've been seeing this issue for some time too. I've tried tweaking swappiness and dropping caches, but it never helps for long, or only improves the chances of things working. The only thing I've found that resolves this is a full reboot, and then it's just a matter of time until it happens again. I think I've tried restarting the docker daemon (and all containers), but I don't remember, so I will give that a go this time.

My last boot was 2023-12-20, and it started occurring again on 2024-01-12... the probability appears to be zero for a while, then starts to increase over time, until docker is virtually unusable, and I'm forced to reboot.

As for reproducing it - I don't think an idle system will show this, but rather a system that creates / destroys many containers will probably work its way to this point over time (or quite possibly veth devices specifically).

From a quick look at metrics recorded by Telegraf, nothing in particular stands out - though I did notice a large number of defunct / zombie processes, so we'll see if dealing with them gives my system a new lease of life... 🤞 (I'm not hopeful)

The output in dmesg I see is similar to @JonasAlfredsson's (here):

dmesg output
[2548650.485135] docker0: port 7(vethadd18b9) entered blocking state
[2548650.485139] docker0: port 7(vethadd18b9) entered disabled state
[2548650.485181] device vethadd18b9 entered promiscuous mode
[2548650.549225] warn_alloc: 1 callbacks suppressed
[2548650.549228] dockerd: page allocation failure: order:4, mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=docker.service,mems_allowed=0
[2548650.549236] CPU: 8 PID: 3830513 Comm: dockerd Tainted: P           O      5.15.0-91-generic #101-Ubuntu
[2548650.549239] Hardware name: ASUS System Product Name/Pro WS W790E-SAGE SE, BIOS 0506 04/28/2023
[2548650.549240] Call Trace:
[2548650.549241]  <TASK>
[2548650.549244]  show_stack+0x52/0x5c
[2548650.549251]  dump_stack_lvl+0x4a/0x63
[2548650.549257]  dump_stack+0x10/0x16
[2548650.549259]  warn_alloc+0x138/0x160
[2548650.549263]  __alloc_pages_slowpath.constprop.0+0xa44/0xa80
[2548650.549265]  __alloc_pages+0x311/0x330
[2548650.549267]  alloc_pages+0x9e/0x1e0
[2548650.549270]  kmalloc_order+0x2f/0xd0
[2548650.549273]  kmalloc_order_trace+0x1d/0x90
[2548650.549275]  __kmalloc+0x2b1/0x330
[2548650.549279]  veth_alloc_queues+0x25/0x80 [veth]
[2548650.549282]  veth_dev_init+0x72/0xd0 [veth]
[2548650.549284]  register_netdevice+0x119/0x650
[2548650.549287]  veth_newlink+0x258/0x440 [veth]
[2548650.549290]  __rtnl_newlink+0x77c/0xa50
[2548650.549293]  ? __find_get_block+0xe0/0x240
[2548650.549297]  ? __nla_validate_parse+0x12f/0x1b0
[2548650.549301]  ? __nla_validate_parse+0x12f/0x1b0
[2548650.549303]  ? netdev_name_node_lookup+0x36/0x80
[2548650.549306]  ? __dev_get_by_name+0xe/0x20
[2548650.549309]  rtnl_newlink+0x49/0x70
[2548650.549311]  rtnetlink_rcv_msg+0x15a/0x400
[2548650.549313]  ? rtnl_calcit.isra.0+0x130/0x130
[2548650.549315]  netlink_rcv_skb+0x53/0x100
[2548650.549319]  rtnetlink_rcv+0x15/0x20
[2548650.549322]  netlink_unicast+0x220/0x340
[2548650.549323]  netlink_sendmsg+0x24b/0x4c0
[2548650.549329]  sock_sendmsg+0x66/0x70
[2548650.549332]  __sys_sendto+0x113/0x190
[2548650.549334]  __x64_sys_sendto+0x24/0x30
[2548650.549335]  do_syscall_64+0x59/0xc0
[2548650.549338]  ? syscall_exit_to_user_mode+0x35/0x50
[2548650.549339]  ? __x64_sys_recvfrom+0x24/0x30
[2548650.549341]  ? do_syscall_64+0x69/0xc0
[2548650.549342]  ? do_syscall_64+0x69/0xc0
[2548650.549343]  ? exit_to_user_mode_prepare+0x37/0xb0
[2548650.549346]  ? syscall_exit_to_user_mode+0x35/0x50
[2548650.549348]  ? do_syscall_64+0x69/0xc0
[2548650.549349]  ? irqentry_exit+0x1d/0x30
[2548650.549350]  ? exc_page_fault+0x89/0x170
[2548650.549352]  entry_SYSCALL_64_after_hwframe+0x62/0xcc
[2548650.549355] RIP: 0033:0x5633739e548e
[2548650.549357] Code: 48 89 6c 24 38 48 8d 6c 24 38 e8 0d 00 00 00 48 8b 6c 24 38 48 83 c4 40 c3 cc cc cc 49 89 f2 48 89 fa 48 89 ce 48 89 df 0f 05 <48> 3d 01 f0 ff ff 76 15 48 f7 d8 48 89 c1 48 c7 c0 ff ff ff ff 48
[2548650.549359] RSP: 002b:000000c00a5c1908 EFLAGS: 00000202 ORIG_RAX: 000000000000002c
[2548650.549361] RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00005633739e548e
[2548650.549362] RDX: 0000000000000074 RSI: 000000c0061d3180 RDI: 000000000000000c
[2548650.549363] RBP: 000000c00a5c1948 R08: 000000c0017a2d00 R09: 000000000000000c
[2548650.549363] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
[2548650.549364] R13: 0000000000000024 R14: 000000c00f6dc820 R15: 000000c000d01400
[2548650.549366]  </TASK>
[2548650.549366] Mem-Info:
[2548650.549369] active_anon:10079171 inactive_anon:12477227 isolated_anon:0
                  active_file:138230 inactive_file:66237 isolated_file:0
                  unevictable:16 dirty:72 writeback:911
                  slab_reclaimable:412944 slab_unreclaimable:5207631
                  mapped:183355 shmem:87571 pagetables:168675 bounce:0
                  kernel_misc_reclaimable:0
                  free:6451381 free_pcp:1506 free_cma:0
[2548650.549373] Node 0 active_anon:40316684kB inactive_anon:49908908kB active_file:552920kB inactive_file:264948kB unevictable:64kB isolated(anon):0kB isolated(file):0kB mapped:733420kB dirty:288kB writeback:3644kB shmem:350284kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 415744kB writeback_tmp:0kB kernel_stack:180064kB pagetables:674700kB all_unreclaimable? no
[2548650.549377] Node 0 DMA free:11260kB min:0kB low:12kB high:24kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[2548650.549380] lowmem_reserve[]: 0 1359 257097 257097 257097
[2548650.549383] Node 0 DMA32 free:1022968kB min:356kB low:1744kB high:3132kB reserved_highatomic:0KB active_anon:23512kB inactive_anon:268248kB active_file:344kB inactive_file:0kB unevictable:0kB writepending:0kB present:1574044kB managed:1507828kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[2548650.549386] lowmem_reserve[]: 0 0 255737 255737 255737
[2548650.549388] Node 0 Normal free:24771296kB min:953664kB low:1215536kB high:1477408kB reserved_highatomic:0KB active_anon:40293172kB inactive_anon:49640660kB active_file:552576kB inactive_file:264948kB unevictable:64kB writepending:4428kB present:266338304kB managed:261884428kB mlocked:64kB bounce:0kB free_pcp:6500kB local_pcp:0kB free_cma:0kB
[2548650.549393] lowmem_reserve[]: 0 0 0 0 0
[2548650.549395] Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 1*512kB (U) 0*1024kB 1*2048kB (M) 2*4096kB (M) = 11260kB
[2548650.549407] Node 0 DMA32: 10822*4kB (ME) 6396*8kB (ME) 3214*16kB (ME) 1269*32kB (ME) 442*64kB (UME) 266*128kB (UME) 196*256kB (UME) 100*512kB (UME) 85*1024kB (ME) 60*2048kB (ME) 113*4096kB (UME) = 1022968kB
[2548650.549415] Node 0 Normal: 145980*4kB (U) 1274711*8kB (UE) 677842*16kB (UE) 98444*32kB (UE) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 24777288kB
[2548650.549421] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[2548650.549423] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[2548650.549423] 331547 total pagecache pages
[2548650.549424] 40340 pages in swap cache
[2548650.549425] Swap cache stats: add 3363822, delete 3323478, find 250191679/250345953
[2548650.549425] Free swap  = 5336824kB
[2548650.549426] Total swap = 16777208kB
[2548650.549427] 66982085 pages RAM
[2548650.549427] 0 pages HighMem/MovableOnly
[2548650.549427] 1130181 pages reserved
[2548650.549428] 0 pages hwpoisoned
[2548650.834172] eth0: renamed from veth47414c9
[2548650.994237] IPv6: ADDRCONF(NETDEV_CHANGE): vethadd18b9: link becomes ready
[2548650.994365] docker0: port 7(vethadd18b9) entered blocking state
[2548650.994368] docker0: port 7(vethadd18b9) entered forwarding state
System Info

Docker is not running inside a VM, but this system is running a number of KVM VMs alongside.

$ uname -a
Linux kenai 5.15.0-91-generic #101-Ubuntu SMP Tue Nov 14 13:30:08 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.3 LTS
Release:        22.04
Codename:       jammy

$ cat /proc/cpuinfo | sed -nre '/^model name/{p;q}'
model name      : Intel(R) Xeon(R) w7-3465X

$ free -h
               total        used        free      shared  buff/cache   available
Mem:           251Gi       223Gi        24Gi       341Mi       2.7Gi        24Gi
Swap:           15Gi        11Gi       4.5Gi

$ dockerd --version
Docker version 24.0.7, build 311b9ff

$ systemctl status docker.service | head -n 11
● docker.service - Docker Application Container Engine
     Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
     Active: active (running) since Wed 2023-12-20 13:05:20 GMT; 4 weeks 1 day ago
TriggeredBy: ● docker.socket
       Docs: https://docs.docker.com
   Main PID: 11267 (dockerd)
      Tasks: 664
     Memory: 416.4M
        CPU: 14h 31min 12.125s
     CGroup: /system.slice/docker.service
             ├─  11267 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock

@pschoen-itsc

Setting the nr_cpus boot parameter resolved the issue for us permanently.

@JonasAlfredsson

Same goes for us: after doing the steps from my comment above, with nr_cpus set to the number of threads available to the system (grep -c processor /proc/cpuinfo), we haven't seen the previously hourly occurring problem for 3 months straight.
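For reference, a sketch of computing that value and the corresponding grub line (the echoed line still has to be written to a file under /etc/default/grub.d/, followed by update-grub and a reboot, as in my earlier comment):

```shell
# Count the hardware threads the kernel currently sees
NCPU=$(grep -c ^processor /proc/cpuinfo)
# Emit the grub drop-in line; the escaped $ keeps the existing
# GRUB_CMDLINE_LINUX_DEFAULT reference intact for when grub sources the file
echo "GRUB_CMDLINE_LINUX_DEFAULT=\"\$GRUB_CMDLINE_LINUX_DEFAULT nr_cpus=$NCPU\""
```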

@attie-argentum

Thanks both for your responses - I've put that in place, and will report back if the issue continues! 🤞

@mumbleskates

Having never seen this before, I just had two gitlab-ci containers (launched by the native runner, not the docker-in-docker one) fail with this error at the same time. Only one allocation failure was logged to dmesg (seen below). The system is also running zfs, and the system root (and docker) are on btrfs. Swap is disabled, and the system has many gigabytes of free memory both before and after the page cache and the zfs ARC.

root@erebor ~ # lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.3 LTS
Release:        22.04
Codename:       jammy
root@erebor ~ # uname -a
Linux erebor 6.5.0-15-generic #15~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Jan 12 18:54:30 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
root@erebor ~ # zfs version
zfs-2.2.2-1
zfs-kmod-2.2.2-1
root@erebor ~ # 
dmesg logs
[906739.889741] dockerd: page allocation failure: order:5, mode:0x440dc0(GFP_KERNEL_ACCOUNT|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=docker.service,mems_allowed=0
[906739.889765] CPU: 52 PID: 1207114 Comm: dockerd Tainted: P           OE      6.5.0-15-generic #15~22.04.1-Ubuntu
[906739.889772] Hardware name: ASUS System Product Name/Pro WS WRX80E-SAGE SE WIFI, BIOS 1003 02/18/2022
[906739.889776] Call Trace:
[906739.889780]  <TASK>
[906739.889786]  dump_stack_lvl+0x48/0x70
[906739.889796]  dump_stack+0x10/0x20
[906739.889801]  warn_alloc+0x174/0x1f0
[906739.889812]  ? __alloc_pages_direct_compact+0x20b/0x240
[906739.889822]  __alloc_pages_slowpath.constprop.0+0x914/0x9a0
[906739.889835]  __alloc_pages+0x31d/0x350
[906739.889847]  ? veth_dev_init+0x95/0x140 [veth]
[906739.889858]  __kmalloc_large_node+0x7e/0x160
[906739.889866]  __kmalloc.cold+0xc/0xa6
[906739.889875]  veth_dev_init+0x95/0x140 [veth]
[906739.889886]  register_netdevice+0x132/0x700
[906739.889895]  veth_newlink+0x190/0x480 [veth]
[906739.889931]  rtnl_newlink_create+0x170/0x3d0
[906739.889944]  __rtnl_newlink+0x70f/0x770
[906739.889959]  rtnl_newlink+0x48/0x80
[906739.889966]  rtnetlink_rcv_msg+0x170/0x430
[906739.889972]  ? srso_return_thunk+0x5/0x10
[906739.889980]  ? rmqueue+0x93d/0xf10
[906739.889985]  ? srso_return_thunk+0x5/0x10
[906739.889991]  ? __check_object_size.part.0+0x72/0x150
[906739.889999]  ? __pfx_rtnetlink_rcv_msg+0x10/0x10
[906739.890005]  netlink_rcv_skb+0x5d/0x110
[906739.890020]  rtnetlink_rcv+0x15/0x30
[906739.890027]  netlink_unicast+0x1ae/0x2a0
[906739.890035]  netlink_sendmsg+0x25e/0x4e0
[906739.890047]  sock_sendmsg+0xcc/0xd0
[906739.890053]  __sys_sendto+0x151/0x1b0
[906739.890072]  __x64_sys_sendto+0x24/0x40
[906739.890078]  do_syscall_64+0x5b/0x90
[906739.890085]  ? srso_return_thunk+0x5/0x10
[906739.890091]  ? do_user_addr_fault+0x17a/0x6b0
[906739.890097]  ? srso_return_thunk+0x5/0x10
[906739.890102]  ? exit_to_user_mode_prepare+0x30/0xb0
[906739.890110]  ? srso_return_thunk+0x5/0x10
[906739.890116]  ? irqentry_exit_to_user_mode+0x17/0x20
[906739.890122]  ? srso_return_thunk+0x5/0x10
[906739.890128]  ? irqentry_exit+0x43/0x50
[906739.890133]  ? srso_return_thunk+0x5/0x10
[906739.890139]  ? exc_page_fault+0x94/0x1b0
[906739.890146]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[906739.890153] RIP: 0033:0x55d44da6700e
[906739.890190] Code: 48 83 ec 38 e8 13 00 00 00 48 83 c4 38 5d c3 cc cc cc cc cc cc cc cc cc cc cc cc cc 49 89 f2 48 89 fa 48 89 ce 48 89 df 0f 05 <48> 3d 01 f0 ff ff 76 15 48 f7 d8 48 89 c1 48 c7 c0 ff ff ff ff 48
[906739.890194] RSP: 002b:000000c0013750c8 EFLAGS: 00000202 ORIG_RAX: 000000000000002c
[906739.890201] RAX: ffffffffffffffda RBX: 000000000000000c RCX: 000055d44da6700e
[906739.890206] RDX: 0000000000000074 RSI: 000000c001d0e880 RDI: 000000000000000c
[906739.890209] RBP: 000000c001375108 R08: 000000c0012a4910 R09: 000000000000000c
[906739.890213] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
[906739.890216] R13: 000000c0016ba800 R14: 000000c00191c1a0 R15: 0000000000000011
[906739.890227]  </TASK>
[906739.890231] Mem-Info:
[906739.890239] active_anon:4026084 inactive_anon:4572236 isolated_anon:0
                 active_file:356682 inactive_file:3106746 isolated_file:0
                 unevictable:7026 dirty:361241 writeback:0
                 slab_reclaimable:417679 slab_unreclaimable:1060505
                 mapped:3338536 shmem:3269641 pagetables:30618
                 sec_pagetables:8669 bounce:0
                 kernel_misc_reclaimable:0
                 free:651883 free_pcp:319 free_cma:0
[906739.890250] Node 0 active_anon:16104336kB inactive_anon:18288944kB active_file:1426728kB inactive_file:12426984kB unevictable:28104kB isolated(anon):0kB isolated(file):0kB mapped:13354144kB dirty:1444964kB writeback:0kB shmem:13078564kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 4063232kB writeback_tmp:0kB kernel_stack:32736kB pagetables:122472kB sec_pagetables:34676kB all_unreclaimable? no
[906739.890262] Node 0 DMA free:11260kB boost:0kB min:0kB low:12kB high:24kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[906739.890274] lowmem_reserve[]: 0 2713 257385 257385 257385
[906739.890289] Node 0 DMA32 free:1022764kB boost:0kB min:712kB low:3488kB high:6264kB reserved_highatomic:32768KB active_anon:678072kB inactive_anon:29120kB active_file:0kB inactive_file:64kB unevictable:0kB writepending:0kB present:2977184kB managed:2910992kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[906739.890301] lowmem_reserve[]: 0 0 254671 254671 254671
[906739.890315] Node 0 Normal free:1574012kB boost:0kB min:66864kB low:327648kB high:588432kB reserved_highatomic:839680KB active_anon:15426264kB inactive_anon:18259824kB active_file:1426728kB inactive_file:12426920kB unevictable:28104kB writepending:1444964kB present:265275392kB managed:260792020kB mlocked:28104kB bounce:0kB free_pcp:744kB local_pcp:0kB free_cma:0kB
[906739.890328] lowmem_reserve[]: 0 0 0 0 0
[906739.890340] Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 1*512kB (U) 0*1024kB 1*2048kB (M) 2*4096kB (M) = 11260kB
[906739.890389] Node 0 DMA32: 2173*4kB (UM) 987*8kB (UM) 586*16kB (UM) 374*32kB (UM) 666*64kB (UM) 451*128kB (UM) 295*256kB (UM) 136*512kB (UM) 60*1024kB (UM) 3*2048kB (M) 164*4096kB (UM) = 1022764kB
[906739.890440] Node 0 Normal: 22973*4kB (UME) 42920*8kB (UME) 30209*16kB (UMEH) 8817*32kB (UMEH) 5762*64kB (UMH) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1569508kB
[906739.890481] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[906739.890485] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[906739.890489] 6735316 total pagecache pages
[906739.890492] 0 pages in swap cache
[906739.890495] Free swap  = 0kB
[906739.890497] Total swap = 0kB
[906739.890500] 67067142 pages RAM
[906739.890503] 0 pages HighMem/MovableOnly
[906739.890505] 1137549 pages reserved
[906739.890508] 0 pages hwpoisoned
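Worth noting in the dump above: the Node 0 Normal zone shows `0*128kB ... 0*4096kB`, i.e. no free contiguous blocks of 128 kB or larger, so a single large kernel allocation can fail with "cannot allocate memory" even though roughly 1.5 GB is nominally free. The same view is available at runtime:

```shell
# /proc/buddyinfo lists, per memory zone, the count of free blocks of
# order 0..10 (4 kB, 8 kB, ..., 4 MB). Zeros in the right-hand columns
# mean large contiguous allocations can fail despite plenty of free RAM.
cat /proc/buddyinfo
```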

@dounoit

dounoit commented Feb 19, 2024

all - I've been struggling with this running on an OpenVZ VPS instance - there is 72GB of RAM allocated and not much of it used -

interesting thing is I can't edit sysctl for vm.swappiness, which is set to 60 - I wanted to try setting it to 0, but I apparently don't have permission even though I'm obviously root -

I tried creating a swapfile and activating it - I get permission denied

this is my first time deploying Docker to this infra. I'm trying to stack a bunch of containers on it, but I'm getting the OOM now and the containers just create/restart/fail - I just got the same OOM when a container tries to join the network - I tried using docker-compose directly vs. docker stack for testing and got the error either way - I'll try the grub kernel flags - I sure hope this works! thanks!

lsb_release -a

No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.3 LTS
Release: 22.04
Codename: jammy

docker info

Client: Docker Engine - Community
Version: 25.0.3
Context: default
Debug Mode: false
Plugins:
buildx: Docker Buildx (Docker Inc.)
Version: v0.12.1
Path: /usr/libexec/docker/cli-plugins/docker-buildx
compose: Docker Compose (Docker Inc.)
Version: v2.24.5
Path: /usr/libexec/docker/cli-plugins/docker-compose

Server:
Containers: 19
Running: 17
Paused: 0
Stopped: 2
Images: 19
Server Version: 25.0.3
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Using metacopy: false
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
Swarm: active
NodeID: x
Is Manager: true
ClusterID: x
Managers: 1
Nodes: 1
Default Address Pool: 10.0.0.0/8
SubnetSize: 24
Data Path Port: 4789
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Number of Old Snapshots to Retain: 0
Heartbeat Tick: 1
Election Tick: 10
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Force Rotate: 0
Autolock Managers: false
Root Rotation In Progress: false
Node Address: 185.185.126.69
Manager Addresses:
x.x.x.x:2377
Runtimes: io.containerd.runc.v2 runc
Default Runtime: runc
Init Binary: docker-init
containerd version: ae07eda36dd25f8a1b98dfbf587313b99c0190bb
runc version: v1.1.12-0-g51d5e94
init version: de40ad0
Security Options:
seccomp
Profile: builtin
Kernel Version: 5.2.0
Operating System: Ubuntu 22.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 72GiB
Name: xxxxx
ID: xxxx-xxx-xxx-xx-xxxx
Docker Root Dir: /var/lib/docker
Debug Mode: false
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false

WARNING: bridge-nf-call-ip6tables is disabled

@dounoit

dounoit commented Feb 19, 2024


this is interesting - no cpu? haha:

lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 0
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz
CPU family: 6
Model: 62
Thread(s) per core: 0
Core(s) per socket: 0
Socket(s): 0
Stepping: 4
BogoMIPS: 4400.16
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu cpuid_faulting pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear spec_ctrl intel_stibp flush_l1d
Virtualization features:
Virtualization: VT-x
Hypervisor vendor: Parallels
Virtualization type: container

@dounoit

dounoit commented Feb 19, 2024

and top shows 4 CPUs:

(screenshot of top output)

@shankerwangmiao

shankerwangmiao commented Feb 20, 2024

Hi, all

I also met this problem, and I think I have identified the cause.

It might be because the kernel changed its default behavior: when a veth pair is created without explicitly specifying the number of rx and tx queues, it now creates one queue per possible CPU instead of a single queue as before. A queue requires 768 bytes of memory on one side of a veth pair, so servers with larger numbers of possible CPUs tend to hit this issue. I've reported the issue to the kernel mailing list.

I wonder if docker could explicitly specify 1 tx and 1 rx queue when creating the veth pair to fix this?
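As a rough sketch of the arithmetic (the ~768 bytes/queue figure comes from the comment above; 240 possible CPUs is a hypothetical value matching hypervisors that advertise large hot-plug ranges):

```shell
# Per-side queue-array size when the queue count defaults to the number
# of *possible* CPUs vs. an explicit single queue.
possible_cpus=240
bytes_per_queue=768
echo "default (per-CPU queues): $(( possible_cpus * bytes_per_queue )) bytes per side"
echo "explicit single queue:    $(( 1 * bytes_per_queue )) bytes per side"
```

At roughly 180 kB, the per-side array needs a sizeable contiguous kernel allocation, which is why the failure is fragmentation-sensitive and shows up even with gigabytes of free RAM.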

@CoyoteWAN

@shankerwangmiao I saw the patch to veth.c based on what you reported. Is there any way you can walk us through applying the patch to the veth module?

@shankerwangmiao

shankerwangmiao commented Mar 1, 2024

@shankerwangmiao I saw the patch to veth.c based on what you reported. Is there any way you can walk us through applying the patch to module veth?

The patch will be included in Linux 6.8 and backported to the Linux LTS versions, so I suggest waiting for the release of Linux 6.8 or the LTS releases, and then for the corresponding kernel release from your Linux distribution.

If you are really affected by this bug, I recommend downgrading your kernel to a version <= 5.14 provided by your Linux distribution.

TL;DR: Sticking to the kernel versions provided by your Linux distribution is always a wise choice; either wait (I'll update this information when such releases are available) or downgrade.

Updates:

The fix has been included in the following kernel lts versions:

  • 5.15.151
  • 6.1.81
  • 6.6.21
  • 6.7.9
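A quick self-check against that list can be sketched like this (`has_fix` is a hypothetical helper; it assumes a plain `x.y.z` version string, and distribution kernels that backport the fix under an older version number will still report "no" or "unknown series", so treat the output as a hint only):

```shell
# has_fix VERSION -> "yes"/"no" for the LTS series listed above,
# "unknown series" otherwise (e.g. mainline 6.8+ already carries the fix).
has_fix() {
  v=$1
  base=${v%.*}     # e.g. "6.1" from "6.1.81"
  patch=${v##*.}   # e.g. "81"
  case "$base" in
    5.15) min=151 ;;
    6.1)  min=81  ;;
    6.6)  min=21  ;;
    6.7)  min=9   ;;
    *)    echo "unknown series"; return 0 ;;
  esac
  if [ "$patch" -ge "$min" ]; then echo yes; else echo no; fi
}
has_fix "$(uname -r | cut -d- -f1)"
```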
If downgrading is not possible, and this must be fixed, the following procedure can be taken to build a patched `veth.ko`. Please note that using a custom patched kernel module might lead to unexpected consequences and might be DANGEROUS if carried out by an inexperienced person. Always back up and run tests before mass deployment. Proceed at your OWN RISK.
  1. Determine the current kernel version

  2. Download the source of the current kernel, and extract the veth.c from drivers/net/veth.c

    An alternative way to do this is to browse https://elixir.bootlin.com/linux/latest/source/drivers/net/veth.c, select the version on the left panel, and copy the source code on the right side.

  3. Install the development package of the current kernel version, which is provided by the linux distribution and contains header needed to build a kernel module.

    This can be confirmed by ensuring the existence of /lib/modules/$(uname -r)/build

  4. Apply the patch https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/patch/?id=1ce7d306ea63f3e379557c79abd88052e0483813 to the extracted veth.c

  5. Prepare a kbuild file for the building of the module:

    obj-m += veth.o
    
  6. Prepare the build environment:

    • Using a non-root user
    • Create a new empty directory
    • Only put in two files, the patched veth.c and the above kbuild file named Kbuild
  7. Build the patched kernel module:

    • Change current dir to the above directory
    • Execute: make -C "/lib/modules/$(uname -r)/build/" M="$(pwd)" modules
    • Ensure veth.ko is generated in that directory
  8. Install the patched kernel module:

    • Copy the generated veth.ko to /lib/modules/$(uname -r)/updates: sudo install -Dm 644 veth.ko -t "/lib/modules/$(uname -r)/updates"
    • Regenerate module dependencies: sudo depmod "$(uname -r)"
    • Ensure the original veth module is overridden: sudo modinfo -k "$(uname -r)" veth and inspect the filename: field, which should contain the veth.ko in updates/ directory, rather than the original one in kernel/drivers/net/
  9. Replace the current loaded veth module

    • Stop all docker containers
    • Stop dockerd (including docker.service and docker.socket systemd units) to prevent the creation of new containers during the process
    • Use ip link show type veth to ensure no veth interfaces are present
    • Execute sudo rmmod veth to unload the currently loaded original veth module
    • Execute sudo modprobe -v veth to load the patched veth module. The command should print the path of the module actually loaded; confirm it is the patched one
    • Start docker daemon and all containers needed

The change made above will persist across reboots, as long as the kernel booted next is exactly the same as the currently running kernel. If the kernel version has been upgraded since this boot, execute the first 8 steps against the kernel version that will be booted next time: install the development package of that kernel in step 3, create a fresh directory in step 6, and replace every $(uname -r) with the exact kernel release of the next boot.

To revert the changes, simply remove the installed veth.ko from the updates/ directory and re-run depmod and follow the 9th step to replace the current loaded veth module.

@attie-argentum

Adding nr_cpus=56 in my case allowed the system to run fine until yesterday... longer than before perhaps, but certainly not a "fix"

@bendem

bendem commented Apr 15, 2024

If you are really affected by this bug, I recommend downgrading your kernel to versions <= 5.14 provided by your linux distribution.

RHEL 8 is affected with kernel 4.18.0-513.18.1.el8_9.x86_64. Has someone reported the problem to them already? Guessing they won't care since they don't support Docker in the first place, but it probably has an impact on other things.

@ExpliuM

ExpliuM commented Apr 16, 2024

We also suffer from this issue on RHEL 8.9 with kernel version 4.18.0-513.11.1.el8_9.x86_64

@pschoen-itsc

Adding nr_cpu=56 in my case has allowed the system to run fine until yesterday... longer perhaps, but certainly not a "fix"

The idea behind the nr_cpus workaround is to reduce the number of CPUs the kernel thinks the machine has. This works well with VMs, because one VM normally has far fewer cores than the host system could provide. If you actually want to use 56 cores, the workaround does not help. For us, with smaller VMs (4-6 cores), it works without any problems.
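For a GRUB-based distribution, the workaround can be wired in roughly like this (a sketch against a scratch copy of the file; `/etc/default/grub` and `update-grub` are Debian/Ubuntu conventions, RHEL uses `grub2-mkconfig -o /boot/grub2/grub.cfg`, and `nr_cpus=4` is an example value for a 4-vCPU VM):

```shell
# Demonstrated on a scratch copy; for real, edit /etc/default/grub,
# regenerate the config (update-grub on Ubuntu, grub2-mkconfig on RHEL)
# and reboot.
f=/tmp/grub.default.example
printf 'GRUB_CMDLINE_LINUX=""\n' > "$f"
# Prepend nr_cpus=4 to the kernel command line:
sed -i 's/^GRUB_CMDLINE_LINUX="/&nr_cpus=4 /' "$f"
cat "$f"
```

After rebooting, `nproc` and the smpboot line in dmesg should both reflect the reduced CPU count.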

@nblazincic

We are facing this issue with Docker 26 on Ubuntu 22.04 LTS.
As far as I can see, neither the nr_cpus nor the vm.swappiness workaround fixed this issue.
Is this a confirmed kernel issue or a Docker problem?

@shankerwangmiao

We are facing this issue with docker 26 on ubuntu 22.0.4 LTS As I can see it, neither the nrcpu or vm.swap fixed this issue. Is this a confirmed kernel issue or a docker problem ?

Can you look into the kernel startup log and find the following line:

smpboot: Allowing XX CPUs, X hotplug CPUs

and see how many CPUs are allocated?
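If the boot log has already rotated out of dmesg, the same numbers are visible without root via standard sysfs paths:

```shell
# "possible" is the range veth queue allocation is sized by; "online" is
# what the hypervisor actually gave you. A huge gap (e.g. 0-239 vs 0-1)
# reproduces the smpboot line without needing the boot log.
echo "possible: $(cat /sys/devices/system/cpu/possible)"
echo "online:   $(cat /sys/devices/system/cpu/online)"
```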

@nblazincic

@shankerwangmiao Thank you for your quick reply.
kernel: smpboot: Allowing 240 CPUs, 238 hotplug CPUs
Do you think nr_cpus or disabling CPU hot-add on the hypervisor could fix the issue?
The machines have 2 vCPUs assigned.

@shankerwangmiao

@shankerwangmiao Thank you for your quick reply. kernel: smpboot: Allowing 240 CPUs, 238 hotplug CPUs Do you think nrcpu or disabling cpu hot add on hypervisor could fix the issue ? Machines have 2 vCpus assigned

Yes, either specifying nr_cpus=2 or disabling CPU hot-add on the hypervisor side should work around this issue.

Currently, neither Debian nor Ubuntu has released a kernel package including this patch.
