If you have the following keepalived.conf:
vrrp_instance eth0 {
    interface eth0
    virtual_router_id 10
    virtual_ipaddress {
        192.168.1.1/30
    }
    virtual_routes {
        192.168.2.0/24 via 192.168.1.2 dev eth0
    }
}
...and then reload keepalived with SIGHUP, the VRRP child process crashes and the following log messages appear. (This test server was already in the MASTER state, and there were no other VRRP speakers on eth0.)
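For anyone who wants to script the reload step, here is a minimal sketch. It assumes the parent keepalived process writes its pid to `/var/run/keepalived.pid`; that path and the helper names `read_pid` and `reload_keepalived` are assumptions for illustration, not something taken from this report.

```python
import os
import signal

# Assumed location of the parent keepalived pid file; adjust for your system.
PID_FILE = "/var/run/keepalived.pid"


def read_pid(path=PID_FILE):
    """Return the integer pid stored in a pid file."""
    with open(path) as f:
        return int(f.read().strip())


def reload_keepalived(path=PID_FILE):
    """Send SIGHUP to the parent keepalived process so it reloads its config."""
    pid = read_pid(path)
    os.kill(pid, signal.SIGHUP)  # needs permission to signal that process
    return pid
```

Calling `reload_keepalived()` as root on the test server should have the same effect as sending SIGHUP by hand.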
Keepalived_healthcheckers[1740]: Initializing ipvs 2.6
Keepalived_healthcheckers[1740]: IPVS: Can't initialize ipvs: Protocol not available
Keepalived_healthcheckers[1740]: Registering Kernel netlink reflector
Keepalived_healthcheckers[1740]: Registering Kernel netlink command channel
Keepalived_healthcheckers[1740]: Opening file '/etc/keepalived/keepalived.conf'.
Keepalived_healthcheckers[1740]: Configuration is using : 3157 Bytes
Keepalived_healthcheckers[1740]: Using LinkWatch kernel netlink reflector...
Keepalived_vrrp[1741]: Registering Kernel netlink reflector
Keepalived_vrrp[1741]: Registering Kernel netlink command channel
Keepalived_vrrp[1741]: Registering gratuitous ARP shared channel
Keepalived_vrrp[1741]: Initializing ipvs 2.6
Keepalived_vrrp[1741]: IPVS: Can't initialize ipvs: Protocol not available
Keepalived_vrrp[1741]: Opening file '/etc/keepalived/keepalived.conf'.
kernel: [ 2650.704536] keepalived[1741]: segfault at 0 ip 0000000000410972 sp 00007fff34790dc8 error 4 in keepalived[400000+36000]
Keepalived[1738]: VRRP child process(1741) died: Respawning
Keepalived[1738]: Starting VRRP child process, pid=1787
Keepalived_vrrp[1787]: Registering Kernel netlink reflector
Keepalived_vrrp[1787]: Registering Kernel netlink command channel
Keepalived_vrrp[1787]: Registering gratuitous ARP shared channel
Keepalived_vrrp[1787]: Initializing ipvs 2.6
Keepalived_vrrp[1787]: IPVS: Can't initialize ipvs: Protocol not available
Keepalived_vrrp[1787]: Opening file '/etc/keepalived/keepalived.conf'.
Keepalived_vrrp[1787]: Configuration is using : 60250 Bytes
Keepalived_vrrp[1787]: Using LinkWatch kernel netlink reflector...
Keepalived_vrrp[1787]: VRRP_Instance(eth0) Entering BACKUP STATE
Keepalived_vrrp[1787]: VRRP_Instance(eth0) Transition to MASTER STATE
Keepalived_vrrp[1787]: VRRP_Instance(eth0) Entering MASTER STATE
Note the temporary transition to the BACKUP state, which lasts for about five seconds. During this period, the address 192.168.1.1/30 and the route to 192.168.2.0/24 are not removed from eth0; it appears that the crash makes keepalived forget that it had added them in the first place. This makes the bug service-impacting for us: those five seconds are long enough for another VRRP speaker on the link to transition to the MASTER state. The keepalived process that was reloaded and crashed then remains in the BACKUP state while all of its addresses and routes linger, creating an undesired active/active "split-brain" situation that leads to ARP flip-flopping, asymmetric routing, and so on.
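A rough way to detect the lingering-address condition described above is to compare the instance state with what is actually configured on the interface. The sketch below parses the one-line output of `ip -o addr show dev eth0`. The helper names are hypothetical, and the `state` string is assumed to come from however you track the VRRP state (for example, the log lines above).

```python
import subprocess


def has_address(ip_addr_output, cidr):
    """Return True if `cidr` is listed as an inet/inet6 address in `ip -o addr` output."""
    for line in ip_addr_output.splitlines():
        fields = line.split()
        # One-line records look like: "2: eth0    inet 192.168.1.1/30 scope global eth0 ..."
        for family, value in zip(fields, fields[1:]):
            if family in ("inet", "inet6") and value == cidr:
                return True
    return False


def vip_lingers(state, ip_addr_output, cidr):
    """True when the instance is not MASTER but the virtual address is still configured."""
    return state != "MASTER" and has_address(ip_addr_output, cidr)


def current_addresses(dev):
    """Run `ip -o addr show dev <dev>` and return its stdout."""
    result = subprocess.run(
        ["ip", "-o", "addr", "show", "dev", dev],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

For example, `vip_lingers("BACKUP", current_addresses("eth0"), "192.168.1.1/30")` returning `True` would indicate the situation reported here. A similar check could be written for the route using `ip route show`.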
git bisect identifies the bug as having been introduced by commit 494bd96:
494bd96adcc6982b8de387ebad1308c01ed097ab is the first bad commit
commit 494bd96adcc6982b8de387ebad1308c01ed097ab
Author: Alexandre Cassen <acassen@gmail.com>
Date: Tue Sep 3 14:38:54 2013 +0200
IPv6 support for virtual_routes and static_routes
gdb says (note this is from a different build than the syslog messages above):
Program received signal SIGSEGV, Segmentation fault.
0x000000000041610a in route_exist (l=0x182f110, iproute=0x182f390) at vrrp_iproute.c:285
285         if (ROUTE_ISEQ(ipr, iproute)) {
(gdb) bt full
#0  0x000000000041610a in route_exist (l=0x182f110, iproute=0x182f390) at vrrp_iproute.c:285
        ipr = 0x182e460
        e = 0x182e530
#1  0x00000000004163fc in clear_diff_routes (l=0x182f360, n=0x182f110) at vrrp_iproute.c:314
        iproute = 0x182f390
        tmp_str = 0x182e220 "\002"
        e = 0x182f460
#2  0x000000000041d39a in clear_diff_vrrp_vroutes (old_vrrp=0x182e7f0) at vrrp.c:1319
        vrrp = 0x182e220
#3  0x000000000041d69b in clear_diff_vrrp () at vrrp.c:1395
        new_vrrp = 0x182e220
        e = 0x182dba0
        l = 0x18207b0
        vrrp = 0x182e7f0
#4  0x00000000004183a4 in start_vrrp () at vrrp_daemon.c:137
No locals.
#5  0x00000000004185c8 in reload_vrrp_thread (thread=0x7fff5c58bcc0) at vrrp_daemon.c:232
No locals.
#6  0x00000000004271f4 in thread_call (thread=0x7fff5c58bcc0) at scheduler.c:755
No locals.
#7  0x0000000000427225 in launch_scheduler () at scheduler.c:778
        thread = {id = 52, type = 3 '\003', next = 0x0, prev = 0x0, master = 0x181d010, func = 0x418515 <reload_vrrp_thread>,
          arg = 0x0, sands = {tv_sec = 0, tv_usec = 0}, u = {val = 0, fd = 0, c = {pid = 0, status = 0}}}
#8  0x0000000000418817 in start_vrrp_child () at vrrp_daemon.c:335
        pid = 0
        ret = 0
#9  0x0000000000403c59 in start_keepalived () at main.c:85
No locals.
#10 0x0000000000404348 in main (argc=1, argv=0x7fff5c58be48) at main.c:303
No locals.
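The kernel line in the syslog excerpt (`segfault at 0 ... error 4`) can be decoded to cross-check the backtrace. The sketch below is an illustration only: the function name `decode_segfault` is made up, and it assumes the x86 page-fault error code layout (bit 0 = protection fault on a present page, bit 1 = write, bit 2 = user mode). Under that assumption, a fault address of 0 with error 4 means a user-mode read of an unmapped page, which is consistent with a null pointer being read in `route_exist`.

```python
import re

# Matches the kernel's "segfault at ..." line as it appears in the syslog excerpt.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
    r"(?: in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def decode_segfault(line):
    """Parse a kernel segfault line into a dict, or return None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m["err"], 16)
    info = {
        "comm": m["comm"],
        "pid": int(m["pid"]),
        "fault_addr": int(m["addr"], 16),
        "ip": int(m["ip"], 16),
        # Assumed x86 page-fault error code bits:
        "protection_fault": bool(err & 1),  # clear: the page was not present
        "write": bool(err & 2),             # clear: the access was a read
        "user_mode": bool(err & 4),         # set: the fault happened in user space
    }
    if m["base"] is not None:
        # Instruction pointer relative to the start of the mapped object.
        info["offset_in_object"] = info["ip"] - int(m["base"], 16)
    return info
```

The instruction pointer in that log line will not match the gdb addresses, since the backtrace comes from a different build.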
This is on Ubuntu 12.04.4 LTS, x86_64, with kernel 3.2.0-59-generic.