VRRP crashes on reload when using virtual_routes #81

Closed
toreanderson opened this issue Apr 30, 2014 · 1 comment

@toreanderson
Contributor

If you have the following keepalived.conf:

vrrp_instance eth0 {
    interface eth0
    virtual_router_id 10
    virtual_ipaddress {
       192.168.1.1/30
    }
    virtual_routes {
        192.168.2.0/24 via 192.168.1.2 dev eth0
    }
}

...and then reload keepalived with SIGHUP, the VRRP child process crashes with the following log messages showing up. (This test server was already in the MASTER state and there were no other VRRP speakers on eth0.)

Keepalived_healthcheckers[1740]: Initializing ipvs 2.6
Keepalived_healthcheckers[1740]: IPVS: Can't initialize ipvs: Protocol not available
Keepalived_healthcheckers[1740]: Registering Kernel netlink reflector
Keepalived_healthcheckers[1740]: Registering Kernel netlink command channel
Keepalived_healthcheckers[1740]: Opening file '/etc/keepalived/keepalived.conf'.
Keepalived_healthcheckers[1740]: Configuration is using : 3157 Bytes
Keepalived_healthcheckers[1740]: Using LinkWatch kernel netlink reflector...
Keepalived_vrrp[1741]: Registering Kernel netlink reflector
Keepalived_vrrp[1741]: Registering Kernel netlink command channel
Keepalived_vrrp[1741]: Registering gratuitous ARP shared channel
Keepalived_vrrp[1741]: Initializing ipvs 2.6
Keepalived_vrrp[1741]: IPVS: Can't initialize ipvs: Protocol not available
Keepalived_vrrp[1741]: Opening file '/etc/keepalived/keepalived.conf'.
kernel: [ 2650.704536] keepalived[1741]: segfault at 0 ip 0000000000410972 sp 00007fff34790dc8 error 4 in keepalived[400000+36000]
Keepalived[1738]: VRRP child process(1741) died: Respawning
Keepalived[1738]: Starting VRRP child process, pid=1787
Keepalived_vrrp[1787]: Registering Kernel netlink reflector
Keepalived_vrrp[1787]: Registering Kernel netlink command channel
Keepalived_vrrp[1787]: Registering gratuitous ARP shared channel
Keepalived_vrrp[1787]: Initializing ipvs 2.6
Keepalived_vrrp[1787]: IPVS: Can't initialize ipvs: Protocol not available
Keepalived_vrrp[1787]: Opening file '/etc/keepalived/keepalived.conf'.
Keepalived_vrrp[1787]: Configuration is using : 60250 Bytes
Keepalived_vrrp[1787]: Using LinkWatch kernel netlink reflector...
Keepalived_vrrp[1787]: VRRP_Instance(eth0) Entering BACKUP STATE
Keepalived_vrrp[1787]: VRRP_Instance(eth0) Transition to MASTER STATE
Keepalived_vrrp[1787]: VRRP_Instance(eth0) Entering MASTER STATE

Note the temporary transition to BACKUP state, which lasts for about five seconds. During this period, the address 192.168.1.1/30 and the route to 192.168.2.0/24 are not removed from eth0; it would appear that the crash makes keepalived forget that it had added them in the first place. This makes the bug a service-impacting one for us: those five seconds are sufficient for another VRRP speaker on the link to transition to MASTER state while the keepalived process that was reloaded and crashed remains in the BACKUP state with all of its addresses and routes lingering. The result is an undesired active/active "split-brain" situation, leading to ARP flip-flopping, asymmetric routing, and so on.

git bisect identifies the bug as having been introduced by commit 494bd96:

494bd96adcc6982b8de387ebad1308c01ed097ab is the first bad commit
commit 494bd96adcc6982b8de387ebad1308c01ed097ab
Author: Alexandre Cassen <acassen@gmail.com>
Date:   Tue Sep 3 14:38:54 2013 +0200

IPv6 support for virtual_routes and static_routes

gdb says (note this is from a different build than the syslog messages above):

Program received signal SIGSEGV, Segmentation fault.
0x000000000041610a in route_exist (l=0x182f110, iproute=0x182f390) at vrrp_iproute.c:285
285                     if (ROUTE_ISEQ(ipr, iproute)) {
(gdb) bt full
#0  0x000000000041610a in route_exist (l=0x182f110, iproute=0x182f390) at vrrp_iproute.c:285
        ipr = 0x182e460
        e = 0x182e530
#1  0x00000000004163fc in clear_diff_routes (l=0x182f360, n=0x182f110) at vrrp_iproute.c:314
        iproute = 0x182f390
        tmp_str = 0x182e220 "\002"
        e = 0x182f460
#2  0x000000000041d39a in clear_diff_vrrp_vroutes (old_vrrp=0x182e7f0) at vrrp.c:1319
        vrrp = 0x182e220
#3  0x000000000041d69b in clear_diff_vrrp () at vrrp.c:1395
        new_vrrp = 0x182e220
        e = 0x182dba0
        l = 0x18207b0
        vrrp = 0x182e7f0
#4  0x00000000004183a4 in start_vrrp () at vrrp_daemon.c:137
No locals.
#5  0x00000000004185c8 in reload_vrrp_thread (thread=0x7fff5c58bcc0) at vrrp_daemon.c:232
No locals.
#6  0x00000000004271f4 in thread_call (thread=0x7fff5c58bcc0) at scheduler.c:755
No locals.
#7  0x0000000000427225 in launch_scheduler () at scheduler.c:778
        thread = {id = 52, type = 3 '\003', next = 0x0, prev = 0x0, master = 0x181d010, func = 0x418515 <reload_vrrp_thread>,
          arg = 0x0, sands = {tv_sec = 0, tv_usec = 0}, u = {val = 0, fd = 0, c = {pid = 0, status = 0}}}
#8  0x0000000000418817 in start_vrrp_child () at vrrp_daemon.c:335
        pid = 0
        ret = 0
#9  0x0000000000403c59 in start_keepalived () at main.c:85
No locals.
#10 0x0000000000404348 in main (argc=1, argv=0x7fff5c58be48) at main.c:303
No locals.
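
For readers unfamiliar with the reload path in the backtrace (clear_diff_vrrp, then clear_diff_routes, then route_exist), the general idea appears to be a diff of the old configuration against the new one: each previously configured route is looked up in the new list, and anything no longer present is a candidate for removal. The following is only a minimal sketch of that idea in C. The struct route type and the functions str_eq, route_in_list and stale_routes are hypothetical names invented for illustration; none of this is keepalived's actual code or data layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified route record; NOT keepalived's real struct. */
struct route {
    const char *dst;  /* destination prefix, e.g. "192.168.2.0/24" */
    const char *gw;   /* next hop, or NULL when the route has none */
};

/* Compare two optional strings; two NULLs count as equal. */
static int str_eq(const char *a, const char *b)
{
    if (a == NULL || b == NULL)
        return a == b;
    return strcmp(a, b) == 0;
}

/* Return 1 if route r is present in list[0..n), else 0. */
static int route_in_list(const struct route *list, size_t n,
                         const struct route *r)
{
    for (size_t i = 0; i < n; i++) {
        if (str_eq(list[i].dst, r->dst) && str_eq(list[i].gw, r->gw))
            return 1;
    }
    return 0;
}

/*
 * Copy every route that is in old_cfg but missing from new_cfg into out
 * (which must have room for n_old entries) and return how many were copied.
 * A reload would then withdraw exactly those routes.
 */
static size_t stale_routes(const struct route *old_cfg, size_t n_old,
                           const struct route *new_cfg, size_t n_new,
                           struct route *out)
{
    size_t n_out = 0;

    assert(out != NULL);
    for (size_t i = 0; i < n_old; i++) {
        if (!route_in_list(new_cfg, n_new, &old_cfg[i]))
            out[n_out++] = old_cfg[i];
    }
    return n_out;
}
```

The comparison step is where optional fields matter: in this sketch a route without a gateway carries a NULL pointer, so the equality helper has to handle NULL explicitly rather than dereference it.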

This is on Ubuntu 12.04.4 LTS, x86_64, with kernel 3.2.0-59-generic.

@acassen
Owner

acassen commented May 12, 2014

Hi Tore,

Sorry for the long delay. I just fixed this issue in the current master branch. It was due to a bad IP_ISEQ() macro.
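
The comment above does not say what exactly was wrong with IP_ISEQ(), so the following is only an illustration of one common way an equality macro can fault at address 0, as in the kernel log line: it dereferences a pointer that is NULL because the field is optional. The struct addr type and the ADDR_ISEQ_UNSAFE and ADDR_ISEQ macros are hypothetical names made up for this sketch; they are not the keepalived definitions and not the actual fix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical address wrapper; NOT keepalived's real type. */
struct addr {
    uint32_t v4;  /* IPv4 address in network byte order */
};

/* Unsafe form: dereferences both pointers, so a NULL argument crashes. */
#define ADDR_ISEQ_UNSAFE(X, Y) ((X)->v4 == (Y)->v4)

/*
 * NULL-tolerant form: two NULL pointers compare equal, a NULL and a
 * non-NULL pointer compare unequal, and only two valid pointers are
 * dereferenced. The arguments are evaluated more than once, so they
 * must not have side effects.
 */
#define ADDR_ISEQ(X, Y)                              \
    (((X) == NULL || (Y) == NULL) ? ((X) == (Y))     \
                                  : ((X)->v4 == (Y)->v4))

/* Small self-check exercising the NULL and non-NULL cases. */
void addr_iseq_selfcheck(void)
{
    struct addr a = { 0x0101a8c0u };
    struct addr b = { 0x0101a8c0u };
    struct addr c = { 0x0201a8c0u };
    struct addr *pa = &a, *pb = &b, *pc = &c, *none = NULL;

    assert(ADDR_ISEQ(pa, pb));       /* same value, different objects */
    assert(!ADDR_ISEQ(pa, pc));      /* different values */
    assert(ADDR_ISEQ(none, none));   /* both unset */
    assert(!ADDR_ISEQ(pa, none));    /* one side unset */
}
```

A static inline function would avoid the double-evaluation caveat; the macro form is used here only because the original is described as a macro.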

Thanks for reporting and your time spent debugging.

Regs,
Alexandre

@acassen acassen closed this as completed May 12, 2014