
Netlink: Received message overrun (No buffer space available) #392

Closed
jslocinski opened this issue Aug 8, 2016 · 12 comments

@jslocinski

jslocinski commented Aug 8, 2016

Opening a new ticket for clarity; I mentioned this issue in #390.
How can I tune the buffer size? I tried increasing the buffers system-wide via sysctl, but it didn't help.
The message appears when running dozens or hundreds of instances on different interfaces, with or without the VMAC option enabled. It starts at around 15 instances (with the vmac setup) or ~200 instances (without vmac). I start seeing the error at the point of becoming MASTER and sending GARPs.
How can I tune this properly?

PS.
Tested on 1.2.22 and 1.2.23. Tried rmem_max/rmem_default buffer sizes from dozens of kB up to hundreds of MB; same result.

Best regards!

@pqarmitage
Collaborator

Could you please post both your with-VMAC and without-VMAC configs (or the scripts that generate them), so that we can try to reproduce the problem. If you want to anonymise IP addresses etc., please just replace the first octet with 10; if addresses are replaced with XXXs, we can't see the relationships between them.

It is interesting that you only seem to need a relatively small number of VMACs to generate the problem, compared to the number of non-VMAC interfaces.

I have a number of thoughts about what might be worth trying, but it would be really helpful to see your configs first to make sure that what I suggest makes sense.

@jslocinski
Author

jslocinski commented Aug 8, 2016

Sure,
simple scripts:

test_keepalived.sh

#!/bin/bash
if [ $# -ne 2 ]; then
  echo "USAGE: $0 (up|down) NBR_INSTANCES"
  exit 1
fi

N=$2

if [ "$1" == "up" ]; then
  cat <<EOF >/run/conf/keepalived.conf
global_defs {
}
EOF
  ip link add bondZ type dummy
  ip link set up dev bondZ
elif [ "$1" == "down" ]; then
  service keepalived stop
  unlink /run/conf/keepalived.conf
fi

for i in $(seq 1 $N); do
  ifi=bondZ.$i
  if [ "$1" == "up" ]; then
    ip link add link bondZ name $ifi type vlan id $i
    ip link set up dev $ifi
    cat /root/keepalived.template.test | sed s"/__ID__/$i/g" >> /run/conf/keepalived.conf
  fi
done

if [ "$1" == "up" ]; then
  sleep 1
  service keepalived start
elif [ "$1" == "down" ]; then
  ip link del bondZ
fi

/root/keepalived.template.test:

vrrp_instance 4__ID__ {
    state BACKUP
    interface bondZ.__ID__
    use_vmac vrrp.__ID__.1
    virtual_router_id 1
    priority 120
    advert_int 2
    preempt_delay 60
    virtual_ipaddress {
        169.254.__ID__.254/24
    }
    mcast_src_ip 169.254.__ID__.254
    unicast_src_ip 169.254.__ID__.254
}

testing:

./test_keepalived.sh up 20
./test_keepalived.sh down 20

comment out "vmac" option in the template, then:

./test_keepalived.sh up 210
./test_keepalived.sh down 210

tail -f /var/log/syslog | grep 'No buffer space'

PS. Testing on dummy interfaces to simplify and isolate, but I get the same results on a physical bond or ethX.

Regards

@pqarmitage
Collaborator

From looking at your config, the global vrrp_garp_master_delay and vrrp_garp_master_repeat are both set to the default, which is 5. This means that every time an interface comes up, 5 gratuitous ARP messages are sent immediately on that interface, and 5 seconds later another 5 messages are sent. With 200 interfaces, all coming up together, this means that 1000 GARP messages are sent near simultaneously, and a further 1000 GARP messages are sent near simultaneously 5 seconds later.

I would recommend setting vrrp_garp_master_repeat to 1 (so only 1 GARP message at a time is sent when an interface comes up), and set vrrp_garp_master_delay to 0 (so a second message isn't sent 5 seconds later for each interface).

If you are still getting problems, then the GARP messages can be rate limited. Add a garp_group block, as follows:
garp_group {
    garp_interval 0.001
    gna_interval 0.001
    interfaces {
        bondZ.1
        bondZ.2
        .
        .
        bondZ.200    # include all your bondZ.n interfaces in the list
    }
}

This will rate limit GARP messages to one every 0.001 seconds, aggregated across all the bondZ.n interfaces, so sending one GARP message for each of 200 interfaces will take 0.2 seconds.

You can adjust the garp_interval setting to see what (if any) value helps resolve your problem.

It would be most helpful to have feedback for what you find.

@jslocinski
Author

jslocinski commented Aug 10, 2016

From what I can see now, the "No buffer space available" error appears just after entering BACKUP state and the creation of the vrrp interfaces (for 250 instances); the GARPs are only sent a while later.
The changes you suggested don't solve the issue; I tried even with intervals of 1s, with the same result.

It seems this is not really related to GARPs, but to the number of interfaces and the amount of change happening at once.

Regards

@pqarmitage
Collaborator

The "Keepalived_vrrp[]: Netlink: Received message overrun (No buffer space available)" message isn't a kernel buffers problem. It relates to the netlink broadcast messages sent from the kernel to keepalived for LINK and ADDR events (see kernel_netlink_init() in vrrp_netlink.c), which is the kernel notifying keepalived of links and addresses being added and deleted. keepalived isn't reading the messages from the socket quickly enough, so the receive buffers on the socket become exhausted, which causes recvmsg() in netlink_parse_info() to return ENOBUFS.

If I add the following after the call to netlink_socket(&nl_kernel, ...) in kernel_netlink_init():

    int rcv_size = 2 * 1024 * 1024;
    if (setsockopt(nl_kernel.fd, SOL_SOCKET, SO_RCVBUF, &rcv_size, sizeof rcv_size))
        log_message(LOG_INFO, "Set receive socket buffers failed - %d (%m)", errno);

then I stop getting the overrun message with 50 instances using vmacs, although it doesn't stop the error with 200 non-vmac instances. Nevertheless, the fact that the above stops the error with 50 vmac instances shows that this is where the problem lies.

Clearly the above simplistic approach isn't a solution to the problem, but at least we've found the right area. I suspect the solution is to call netlink_parse_info() while the vmacs and addresses are being added, but this needs further investigation.

Many thanks for reporting this and providing the detailed info. I'll continue to work on this to find a solution.

@pqarmitage
Collaborator

The commit in the above pull request resolves the issue by polling for netlink messages after each interface is added, thereby stopping the large queue of messages from building up, which was what was causing the message overrun.

I have tested this with 250 vmac interfaces, and also with 250 non-vmac interfaces, and no longer receive the Received message overrun message.

@acassen acassen closed this as completed Aug 14, 2016
@jslocinski
Author

jslocinski commented Aug 16, 2016

I couldn't compile the latest git version, but I ported everything you merged relating to kernel_netlink_poll() to the latest stable 1.2.23. The result is BETTER :)
I ended up with:

  • 200 instances w. vmac: OK
  • 250 instances w. vmac: sometimes I still get an error

But the weirdest part is when I add "notify_master" (almost empty); then:

  • 200 instances w. vmac, calling the notify_master script (no matter what the script does): errors in 95% of cases

the notify line:
notify_master "/tmp/null.sh"

the script itself (/tmp/null.sh, executable):

#!/bin/bash
echo 1 > /dev/null
exit 0

I don't know if this helps, but here is the strace output near the "No buffer space available" error:

sendto(4, "<143>Aug 16 16:55:25 Keepalived_"..., 78, MSG_NOSIGNAL, NULL, 0) = 78
open("/tmp/null.sh", O_RDONLY)          = 411
close(411)                              = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f37e99db9d0) = 122184
sendmsg(406, {msg_name(16)={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("224.0.0.18")}, msg_iov(1)=[{"E\300\0(\0\2\0\0\377pi\224\251\376\306\376\340\0\0\22!\1x\1\0\5\365\372\251\376\306\376"..., 40}], msg_controllen=0, msg_flags=0}, MSG_DONTROUTE) = 40
sendto(4, "<142>Aug 16 16:55:25 Keepalived_"..., 87, MSG_NOSIGNAL, NULL, 0) = 87
sendmsg(8, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{" \0\0\0\24\0\5\0\223+\263W\0\0\0\0\2\30\0\0\332\213\0\0\10\0\2\0\251\376\270\376", 32}], msg_controllen=0, msg_flags=0}, 0) = 32
--- SIGCHLD (Child exited) @ 0 (0) ---
write(6, "\21\0\0\0", 4)                = 4
rt_sigreturn(0x6)                       = 32
fcntl(8, F_GETFL)                       = 0x802 (flags O_RDWR|O_NONBLOCK)
fcntl(8, F_SETFL, O_RDWR)               = 0
recvmsg(8, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"$\0\0\0\2\0\0\0\223+\263W\207\210\377\377\0\0\0\0 \0\0\0\24\0\5\0\223+\263W"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 36
fcntl(8, F_SETFL, O_RDWR|O_NONBLOCK)    = 0
sendto(9, "\377\377\377\377\377\377\0\0^\0\1\1\10\6\0\1\10\0\6\4\0\1\0\0^\0\1\1\251\376\270\376"..., 42, 0, {sa_family=AF_PACKET, proto=0000, if35802, pkttype=PACKET_HOST, addr(6)={0, 00005e000101}, 20) = 42
sendto(4, "<143>Aug 16 16:55:25 Keepalived_"..., 78, MSG_NOSIGNAL, NULL, 0) = 78
--- SIGCHLD (Child exited) @ 0 (0) ---
write(6, "\21\0\0\0", 4)                = 4
rt_sigreturn(0x6)                       = 78
open("/tmp/null.sh", O_RDONLY)          = 411
close(411)                              = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f37e99db9d0) = 122186
sendmsg(378, {msg_name(16)={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("224.0.0.18")}, msg_iov(1)=[{"E\300\0(\0\2\0\0\377pw\224\251\376\270\376\340\0\0\22!\1x\1\0\5\3\373\251\376\270\376"..., 40}], msg_controllen=0, msg_flags=0}, MSG_DONTROUTE) = 40
recvmsg(7, 0x7ffe72239380, 0)           = -1 ENOBUFS (No buffer space available)
sendto(4, "<142>Aug 16 16:55:25 Keepalived_"..., 109, MSG_NOSIGNAL, NULL, 0) = 109
recvmsg(7, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000010}, msg_iov(1)=[{"T\0\0\0\24\0\0\0\314*\263W\207\210\377\377\2\30\200\0)\213\0\0\10\0\1\0\251\376\7\376"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 84
recvmsg(7, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000010}, msg_iov(1)=[{"T\0\0\0\24\0\0\0\315*\263W\207\210\377\377\2\30\200\0(\213\0\0\10\0\1\0\251\376\6\376"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 84
recvmsg(7, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000010}, msg_iov(1)=[{"T\0\0\0\24\0\0\0\316*\263W\207\210\377\377\2\30\200\0'\213\0\0\10\0\1\0\251\376\5\376"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 84
recvmsg(7, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000010}, msg_iov(1)=[{"T\0\0\0\24\0\0\0\317*\263W\207\210\377\377\2\30\200\0%\213\0\0\10\0\1\0\251\376\3\376"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 84
recvmsg(7, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000010}, msg_iov(1)=[{"T\0\0\0\24\0\0\0\320*\263W\207\210\377\377\2\30\200\0#\213\0\0\10\0\1\0\251\376\1\376"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 84
recvmsg(7, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000010}, msg_iov(1)=[{"T\0\0\0\24\0\0\0\321*\263W\207\210\377\377\2\30\200\0005\213\0\0\10\0\1\0\251\376\23\376"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 84
recvmsg(7, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000010}, msg_iov(1)=[{"T\0\0\0\24\0\0\0\322*\263W\207\210\377\377\2\30\200\0004\213\0\0\10\0\1\0\251\376\22\376"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 84
recvmsg(7, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000010}, msg_iov(1)=[{"T\0\0\0\24\0\0\0\323*\263W\207\210\377\377\2\30\200\0003\213\0\0\10\0\1\0\251\376\21\376"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 84
recvmsg(7, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000010}, msg_iov(1)=[{"T\0\0\0\24\0\0\0\324*\263W\207\210\377\377\2\30\200\0001\213\0\0\10\0\1\0\251\376\17\376"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 84
recvmsg(7, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000010}, msg_iov(1)=[{"T\0\0\0\24\0\0\0\325*\263W\207\210\377\377\2\30\200\0000\213\0\0\10\0\1\0\251\376\16\376"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 84
recvmsg(7, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000010}, msg_iov(1)=[{"T\0\0\0\24\0\0\0\326*\263W\207\210\377\377\2\30\200\0/\213\0\0\10\0\1\0\251\376\r\376"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 84
recvmsg(7, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000010}, msg_iov(1)=[{"T\0\0\0\24\0\0\0\327*\263W\207\210\377\377\2\30\200\0-\213\0\0\10\0\1\0\251\376\v\376"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 84
recvmsg(7, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000010}, msg_iov(1)=[{"T\0\0\0\24\0\0\0\330*\263W\207\210\377\377\2\30\200\0,\213\0\0\10\0\1\0\251\376\n\376"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 84
recvmsg(7, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000010}, msg_iov(1)=[{"T\0\0\0\24\0\0\0\331*\263W\207\210\377\377\2\30\200\0+\213\0\0\10\0\1\0\251\376\t\376"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 84
recvmsg(7, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000010}, msg_iov(1)=[{"T\0\0\0\24\0\0\0\332*\263W\207\210\377\377\2\30\200\0&\213\0\0\10\0\1\0\251\376\4\376"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 84
recvmsg(7, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000010}, msg_iov(1)=[{"T\0\0\0\24\0\0\0\333*\263W\207\210\377\377\2\30\200\0$\213\0\0\10\0\1\0\251\376\2\376"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 84
--- SIGCHLD (Child exited) @ 0 (0) ---
write(6, "\21\0\0\0", 4)                = 4
rt_sigreturn(0x6)                       = 84
recvmsg(7, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000010}, msg_iov(1)=[{"T\0\0\0\24\0\0\0\334*\263W\207\210\377\377\2\30\200\0D\213\0\0\10\0\1\0\251\376\"\376"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 84
recvmsg(7, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000010}, msg_iov(1)=[{"T\0\0\0\24\0\0\0\335*\263W\207\210\377\377\2\30\200\0B\213\0\0\10\0\1\0\251\376 \376"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 84

I don't know how to test this further. I've simplified all the cases (at the beginning I had the error with notify scripts calling sysctl, but as you can see above, that's not the cause).

Regards!

@pqarmitage
Collaborator

Do you have automake installed? If so, running automake --add-missing may resolve your build problem with the latest git version. If that doesn't resolve it, could you post details of the error here and I'll have a look.

I'll have a further look at the "No buffer space available" problem later.

@pqarmitage
Collaborator

Pull request #403 solves the particular cause of the netlink buffer overrun you were experiencing when transitioning to master. The problem was that adding a notify_master script means that, as each vrrp instance transitions to master, keepalived executes open(), close() and fork() system calls; the first and last of these are slow system calls, so the complexity of the script isn't the issue, since the script itself runs asynchronously. This meant that netlink RTM_NEWADDR messages were being buffered, and eventually the buffer overran.

I think there is the possibility of further such problems, although I haven't tested them yet. Perhaps you can. One possibility is with a notify_fault script if all vrrp instances simultaneously go to fault state, e.g. by executing ip link set down bondZ. Adding a notify script as well (so that two scripts are executed on a state transition) can only make the situation worse. Another possibility to cause the problem might be to have a very large number of VIPs/eVIPs on a vrrp instance, since netlink messages are generated every time an address is added or deleted.

Could you please reopen this issue report, since there are some scenarios that at the very least still need further testing.

@jslocinski
Author

jslocinski commented Aug 17, 2016

I don't think I can reopen the case; @acassen, would you?

@acassen acassen reopened this Aug 17, 2016
@pqarmitage
Collaborator

Commit 6255538 adds configuration options

    vrrp_netlink_cmd_rcv_bufs
    vrrp_netlink_cmd_rcv_bufs_force
    vrrp_netlink_monitor_rcv_bufs
    vrrp_netlink_monitor_rcv_bufs_force
    lvs_netlink_cmd_rcv_bufs
    lvs_netlink_cmd_rcv_bufs_force
    lvs_netlink_monitor_rcv_bufs
    lvs_netlink_monitor_rcv_bufs_force

to allow the netlink receive buffers to be increased in size in order to stop the ENOBUFS error.
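As an illustration of how these might be used (the values below are arbitrary examples, not recommendations, and the exact syntax should be checked against keepalived.conf(5)), the options go in the global_defs block; for this issue the monitor socket of the vrrp process is the relevant one:

```
global_defs {
    # bytes for the netlink socket receiving kernel LINK/ADDR broadcasts
    vrrp_netlink_monitor_rcv_bufs 1048576
    # use SO_RCVBUFFORCE so the size is not capped by net.core.rmem_max
    vrrp_netlink_monitor_rcv_bufs_force true
}
```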

@pqarmitage
Collaborator

Issue #803 indicates that this issue has been resolved.

Closing this issue, and if there are any further occurrences please log them against issue #803.
