periodic master/backup flaps or spontaneous failovers #2220

aborrero · 2022-10-28T16:44:25Z

Describe the bug
We're experiencing weird periodic master/backup flaps or spontaneous failovers.

We have 2 sets of servers in 2 different datacenters using the exact same configuration (via puppet, only addresses/NICs differ) showing the exact same behavior.
The only unusual thing about this setup is that the unicast_peer route uses a Linux VRF (l3mdev), and the interface that keepalived uses (the interface setting) is also part of the VRF.

The network is not down. The servers are mostly idle. There is no packet loss: we've tested sending 1M ICMP packets with 0% loss.
I can be convinced that this is not a bug in keepalived and that something in our network is triggering it from time to time. If so, I don't know what or

To Reproduce
Start 2 daemons with the attached config.

Expected behavior
No flaps.

Keepalived version

Keepalived v2.2.7 (01/16,2022)

Copyright(C) 2001-2022 Alexandre Cassen, <acassen@gmail.com>

Built with kernel headers for Linux 5.10.84
Running on Linux 5.10.0-15-amd64 #1 SMP Debian 5.10.120-1 (2022-06-09)
Distro: Debian GNU/Linux 11 (bullseye)

configure options: --build=x86_64-linux-gnu --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --disable-option-checking --disable-silent-rules --libdir=${prefix}/lib/x86_64-linux-gnu --runstatedir=/run --disable-maintainer-mode --disable-dependency-tracking --enable-snmp --enable-sha1 --enable-snmp-rfcv2 --enable-snmp-rfcv3 --enable-dbus --enable-json --enable-bfd --enable-regex --with-init=systemd build_alias=x86_64-linux-gnu CFLAGS=-g -O2 -ffile-prefix-map=/build/keepalived-Ja6yBT/keepalived-2.2.7=. -fstack-protector-strong -Wformat -Werror=format-security LDFLAGS=-Wl,-z,relro CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2

Config options:  NFTABLES LVS REGEX VRRP VRRP_AUTH VRRP_VMAC JSON BFD OLD_CHKSUM_COMPAT SNMP_V3_FOR_V2 SNMP_VRRP SNMP_CHECKER SNMP_RFCV2 SNMP_RFCV3 DBUS INIT=systemd SYSTEMD_NOTIFY

System options:  VSYSLOG MEMFD_CREATE IPV6_MULTICAST_ALL IPV4_DEVCONF LIBNL3 RTA_ENCAP RTA_EXPIRES RTA_NEWDST RTA_PREF FRA_SUPPRESS_PREFIXLEN FRA_SUPPRESS_IFGROUP FRA_TUN_ID RTAX_CC_ALGO RTAX_QUICKACK RTEXT_FILTER_SKIP_STATS FRA_L3MDEV FRA_UID_RANGE RTAX_FASTOPEN_NO_COOKIE RTA_VIA FRA_PROTOCOL FRA_IP_PROTO FRA_SPORT_RANGE FRA_DPORT_RANGE RTA_TTL_PROPAGATE IFA_FLAGS LWTUNNEL_ENCAP_MPLS LWTUNNEL_ENCAP_ILA NET_LINUX_IF_H_COLLISION LIBIPVS_NETLINK IPVS_DEST_ATTR_ADDR_FAMILY IPVS_SYNCD_ATTRIBUTES IPVS_64BIT_STATS IPVS_TUN_TYPE IPVS_TUN_CSUM IPVS_TUN_GRE VRRP_IPVLAN IFLA_LINK_NETNSID GLOB_BRACE GLOB_ALTDIRFUNC INET6_ADDR_GEN_MODE VRF SO_MARK

Distro (please complete the following information):

Name: Debian
Version: 11
Architecture: amd64
Linux: 5.10.120-1

Details of any containerisation or hosted service (e.g. AWS)
None.

Configuration file:
server A configuration:

global_defs {
}

vrrp_instance VRRP1 {
  state BACKUP
  interface eno2.2120
  virtual_router_id 52
  nopreempt
  priority 47
  advert_int 1
  authentication {
    auth_type PASS
    auth_pass dummy
  }
  track_interface {
    eno2.2107
  }
  virtual_routes {
    185.15.57.0/29 table 10 nexthop via 185.15.57.10 dev eno2.2107 onlink
    185.15.57.16/29 table 10 nexthop via 185.15.57.10 dev eno2.2107 onlink
    172.16.128.0/24 table 10 nexthop via 185.15.57.10 dev eno2.2107 onlink
  }
  virtual_ipaddress {
    185.15.57.9/30 dev eno2.2107
    208.80.153.190/29 dev eno2.2120
  }
  unicast_peer {
    208.80.153.188
  }
}

Server B configuration:

global_defs {
}

vrrp_instance VRRP1 {
  state BACKUP
  interface eno1.2120
  virtual_router_id 52
  nopreempt
  priority 55
  advert_int 1
  authentication {
    auth_type PASS
    auth_pass dummy
  }
  track_interface {
    eno1.2107
  }
  virtual_routes {
    185.15.57.0/29 table 10 nexthop via 185.15.57.10 dev eno1.2107 onlink
    185.15.57.16/29 table 10 nexthop via 185.15.57.10 dev eno1.2107 onlink
    172.16.128.0/24 table 10 nexthop via 185.15.57.10 dev eno1.2107 onlink
  }
  virtual_ipaddress {
    185.15.57.9/30 dev eno1.2107
    208.80.153.190/29 dev eno1.2120
  }
  unicast_peer {
    208.80.153.189
  }
}

Notify and track scripts
none.

System Log entries
Server A:

Oct 27 17:55:40 cloudgw2002-dev Keepalived_vrrp[146066]: (VRRP1) Entering MASTER STATE
Oct 27 17:55:40 cloudgw2002-dev Keepalived_vrrp[146066]: (VRRP1) Master received advert from 208.80.153.188 with higher priority 55, ours 47
Oct 27 17:55:40 cloudgw2002-dev Keepalived_vrrp[146066]: (VRRP1) Entering BACKUP STATE
Oct 28 04:23:11 cloudgw2002-dev Keepalived_vrrp[146066]: (VRRP1) Entering MASTER STATE
Oct 28 04:23:11 cloudgw2002-dev Keepalived_vrrp[146066]: (VRRP1) Master received advert from 208.80.153.188 with higher priority 55, ours 47
Oct 28 04:23:11 cloudgw2002-dev Keepalived_vrrp[146066]: (VRRP1) Entering BACKUP STATE
Oct 28 11:44:58 cloudgw2002-dev Keepalived_vrrp[146066]: (VRRP1) Entering MASTER STATE
Oct 28 11:44:58 cloudgw2002-dev Keepalived_vrrp[146066]: (VRRP1) Master received advert from 208.80.153.188 with higher priority 55, ours 47
Oct 28 11:44:58 cloudgw2002-dev Keepalived_vrrp[146066]: (VRRP1) Entering BACKUP STATE

Server B:

Oct 27 17:55:40 cloudgw2003-dev Keepalived_vrrp[13732]: (VRRP1) Received advert from 208.80.153.189 with lower priority 47, ours 55, forcing new election
Oct 28 04:23:11 cloudgw2003-dev Keepalived_vrrp[13732]: (VRRP1) Received advert from 208.80.153.189 with lower priority 47, ours 55, forcing new election
Oct 28 11:44:58 cloudgw2003-dev Keepalived_vrrp[13732]: (VRRP1) Received advert from 208.80.153.189 with lower priority 47, ours 55, forcing new election

Did keepalived coredump?
No

Additional context
Server A network config:

aborrero@cloudgw2002-dev:~$ ip -br a
lo               UNKNOWN        127.0.0.1/8 ::1/128 
eno1             UP             10.192.20.18/24 2620:0:860:118:10:192:20:18/64 fe80::2eea:7fff:fe7b:e104/64 
eno2             UP             fe80::2eea:7fff:fe7b:e105/64 
vrf-cloudgw      UP             
eno2.2107@eno2   UP             fe80::2eea:7fff:fe7b:e105/64 
eno2.2120@eno2   UP             208.80.153.189/29 fe80::2eea:7fff:fe7b:e105/64 

aborrero@cloudgw2002-dev:~ $ ip r
default via 10.192.20.1 dev eno1 onlink 
10.192.20.0/24 dev eno1 proto kernel scope link src 10.192.20.18

aborrero@cloudgw2002-dev:~ $ ip route list vrf vrf-cloudgw
default via 208.80.153.185 dev eno2.2120 onlink 
208.80.153.184/29 dev eno2.2120 proto kernel scope link src 208.80.153.189

Server B network config:

aborrero@cloudgw2003-dev:~ $ ip -br a
lo               UNKNOWN        127.0.0.1/8 ::1/128 
enp175s0f0np0    DOWN           
enp175s0f1np1    DOWN           
eno1             UP             10.192.20.7/24 2620:0:860:118:10:192:20:7/64 fe80::d28e:79ff:fef5:8644/64 
eno2             DOWN           
vrf-cloudgw      UP             
eno1.2107@eno1   UP             185.15.57.9/30 fe80::d28e:79ff:fef5:8644/64 
eno1.2120@eno1   UP             208.80.153.188/29 208.80.153.190/29 fe80::d28e:79ff:fef5:8644/64 

aborrero@cloudgw2003-dev:~ $ ip r
default via 10.192.20.1 dev eno1 onlink 
10.192.20.0/24 dev eno1 proto kernel scope link src 10.192.20.7 

aborrero@cloudgw2003-dev:~ $ ip route list vrf vrf-cloudgw
default via 208.80.153.185 dev eno1.2120 onlink 
172.16.128.0/24 via 185.15.57.10 dev eno1.2107 proto keepalived onlink 
185.15.57.0/29 via 185.15.57.10 dev eno1.2107 proto keepalived onlink 
185.15.57.8/30 dev eno1.2107 proto kernel scope link src 185.15.57.9 
185.15.57.16/29 via 185.15.57.10 dev eno1.2107 proto keepalived onlink 
208.80.153.184/29 dev eno1.2120 proto kernel scope link src 208.80.153.188

aborrero · 2022-10-31T13:00:26Z

I left a packet capture running, and here you can see the VRRP packet flow and related keepalived log entries.

capture on cloudgw2002-dev (208.80.153.189):

01:40:36.292219 IP 208.80.153.188 > 208.80.153.189: VRRPv2, Advertisement, vrid 52, prio 55, authtype simple, intvl 1s, length 24
01:40:37.292348 IP 208.80.153.188 > 208.80.153.189: VRRPv2, Advertisement, vrid 52, prio 55, authtype simple, intvl 1s, length 24
01:40:38.292464 IP 208.80.153.188 > 208.80.153.189: VRRPv2, Advertisement, vrid 52, prio 55, authtype simple, intvl 1s, length 24
01:40:39.292581 IP 208.80.153.188 > 208.80.153.189: VRRPv2, Advertisement, vrid 52, prio 55, authtype simple, intvl 1s, length 24
01:40:40.292686 IP 208.80.153.188 > 208.80.153.189: VRRPv2, Advertisement, vrid 52, prio 55, authtype simple, intvl 1s, length 24
01:40:41.292814 IP 208.80.153.188 > 208.80.153.189: VRRPv2, Advertisement, vrid 52, prio 55, authtype simple, intvl 1s, length 24
01:40:42.292936 IP 208.80.153.188 > 208.80.153.189: VRRPv2, Advertisement, vrid 52, prio 55, authtype simple, intvl 1s, length 24
01:40:43.293059 IP 208.80.153.188 > 208.80.153.189: VRRPv2, Advertisement, vrid 52, prio 55, authtype simple, intvl 1s, length 24
01:40:44.293192 IP 208.80.153.188 > 208.80.153.189: VRRPv2, Advertisement, vrid 52, prio 55, authtype simple, intvl 1s, length 24
01:40:45.293319 IP 208.80.153.188 > 208.80.153.189: VRRPv2, Advertisement, vrid 52, prio 55, authtype simple, intvl 1s, length 24
01:40:46.109987 IP 208.80.153.188 > 208.80.153.189: VRRPv2, Advertisement, vrid 52, prio 55, authtype simple, intvl 1s, length 24
01:40:47.110143 IP 208.80.153.188 > 208.80.153.189: VRRPv2, Advertisement, vrid 52, prio 55, authtype simple, intvl 1s, length 24
01:40:48.110257 IP 208.80.153.188 > 208.80.153.189: VRRPv2, Advertisement, vrid 52, prio 55, authtype simple, intvl 1s, length 24
01:40:49.110386 IP 208.80.153.188 > 208.80.153.189: VRRPv2, Advertisement, vrid 52, prio 55, authtype simple, intvl 1s, length 24
01:40:50.110491 IP 208.80.153.188 > 208.80.153.189: VRRPv2, Advertisement, vrid 52, prio 55, authtype simple, intvl 1s, length 24
01:40:51.110527 IP 208.80.153.188 > 208.80.153.189: VRRPv2, Advertisement, vrid 52, prio 55, authtype simple, intvl 1s, length 24
01:40:52.110644 IP 208.80.153.188 > 208.80.153.189: VRRPv2, Advertisement, vrid 52, prio 55, authtype simple, intvl 1s, length 24
01:40:53.110755 IP 208.80.153.188 > 208.80.153.189: VRRPv2, Advertisement, vrid 52, prio 55, authtype simple, intvl 1s, length 24
01:40:54.110882 IP 208.80.153.188 > 208.80.153.189: VRRPv2, Advertisement, vrid 52, prio 55, authtype simple, intvl 1s, length 24
01:40:55.110946 IP 208.80.153.188 > 208.80.153.189: VRRPv2, Advertisement, vrid 52, prio 55, authtype simple, intvl 1s, length 24
01:40:56.111083 IP 208.80.153.188 > 208.80.153.189: VRRPv2, Advertisement, vrid 52, prio 55, authtype simple, intvl 1s, length 24

logs on cloudgw2002-dev:

Oct 31 01:40:46 cloudgw2002-dev Keepalived_vrrp[146066]: (VRRP1) Entering MASTER STATE
Oct 31 01:40:46 cloudgw2002-dev Keepalived_vrrp[146066]: (VRRP1) Master received advert from 208.80.153.188 with higher priority 55, ours 47
Oct 31 01:40:46 cloudgw2002-dev Keepalived_vrrp[146066]: (VRRP1) Entering BACKUP STATE

capture on cloudgw2003-dev (208.80.153.188):

01:40:46.109670 IP 208.80.153.189 > 208.80.153.188: VRRPv2, Advertisement, vrid 52, prio 47, authtype simple, intvl 1s, length 24

logs on cloudgw2003-dev:

Oct 31 01:40:46 cloudgw2003-dev Keepalived_vrrp[13732]: (VRRP1) Received advert from 208.80.153.189 with lower priority 47, ours 55, forcing new election

I've checked system logs, our internal datacenter logs and even other system logs. There were no relevant operations during this event. I'd be happy to check anything else you may suggest.
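
As a sanity check on the timing, the moment the backup promotes itself lines up with the VRRPv2 master-down timer. The sketch below assumes the RFC 3768 formulas (Master_Down_Interval = 3 × Advertisement_Interval + Skew_Time, with Skew_Time = (256 − Priority) / 256) and assumes that the 01:40:42 advert was the last one keepalived actually processed. Both are assumptions, not something the logs state directly, and `master_down_interval` is a hypothetical helper name. The two timestamps also come from different hosts, so the comparison is only approximate.

```python
from datetime import datetime, timedelta


def master_down_interval(priority: int, advert_int: float = 1.0) -> float:
    """VRRPv2 master-down interval in seconds, per the RFC 3768 formulas."""
    skew_time = (256 - priority) / 256
    return 3 * advert_int + skew_time


# Values from the server A config: priority 47, advert_int 1
interval = master_down_interval(priority=47, advert_int=1.0)

# ASSUMPTION: this is the last advert that reset the timer on cloudgw2002-dev.
last_accepted = datetime.strptime("01:40:42.292936", "%H:%M:%S.%f")
# Advert from 208.80.153.189 as seen in the capture on cloudgw2003-dev.
own_advert_seen = datetime.strptime("01:40:46.109670", "%H:%M:%S.%f")

expected_expiry = last_accepted + timedelta(seconds=interval)
gap = (own_advert_seen - expected_expiry).total_seconds()

print(f"master-down interval: {interval:.6f} s")
print(f"expected timer expiry: {expected_expiry.time()}")
print(f"advert seen on peer:   {own_advert_seen.time()} (gap {gap * 1000:.3f} ms)")
```

If that assumption holds, the adverts captured at 01:40:43, 01:40:44 and 01:40:45 reached the interface but did not reset the timer, which would point at something between the capture point and keepalived rather than at loss on the wire.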

pqarmitage · 2022-10-31T14:31:49Z

Do you have the packet captures that include the adverts being sent by the system as well as the adverts being received?

aborrero · 2022-10-31T15:44:58Z

I believe I found the problem: a firewall misconfiguration on our side.

The firewall is stateful, but we didn't have an explicit rule accepting the VRRP traffic from the other peer.

When a node sends its own VRRP advert packet, it creates a local conntrack entry that the advert by the remote peer could use. That conntrack would eventually expire (no original direction traffic for too long), blocking any more incoming adverts. Then, the advert timeout would kick in, triggering the local keepalived to send its own VRRP advert again, opening the conntrack hole again, and starting the loop again.
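
The loop described above can be sketched as a toy model. This is only an illustration under stated assumptions: the conntrack entry's lifetime is measured from the node's own last outgoing advert (as described), the timeout value is arbitrary and not the one from our firewall, the peer's advert phase is kept fixed, and `simulate` is a hypothetical helper, not anything from keepalived or netfilter.

```python
def simulate(duration_s: float,
             conntrack_timeout_s: float,
             master_down_s: float = 3 + (256 - 47) / 256,
             advert_int_s: float = 1.0) -> list[float]:
    """Return the times (seconds) at which the backup briefly promotes itself.

    Toy model: the peer sends an advert every advert_int_s. The local
    firewall accepts it only while a conntrack entry, created by the
    node's own last outgoing advert, is still alive.
    """
    flaps = []
    entry_expires_at = conntrack_timeout_s  # entry created by an advert sent at t=0
    last_accepted = 0.0
    t = 0.0
    while t <= duration_s:
        if t < entry_expires_at:
            # Entry still alive: the peer's advert resets the master-down timer.
            last_accepted = t
        else:
            # Entry expired: the peer's advert is dropped by the firewall.
            deadline = last_accepted + master_down_s
            if t >= deadline:
                # Timer fired at `deadline`: the node sent its own advert,
                # which re-created the conntrack entry, so this advert from
                # the higher-priority peer gets through again.
                flaps.append(deadline)
                entry_expires_at = deadline + conntrack_timeout_s
                last_accepted = t
        t += advert_int_s
    return flaps


# ASSUMPTION: 30 s is an arbitrary timeout chosen to keep the output short.
print(simulate(duration_s=100, conntrack_timeout_s=30))
```

In this model the flaps repeat with a period of roughly the conntrack timeout plus the master-down interval. The real flaps in the logs are hours apart and irregular, so the actual expiry behaviour is clearly more complicated than this sketch.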

This scenario also explains why this wasn't always happening despite the setup being the same for years: it depends on which node is master and which node sends the advert first. And a bonus point: this may partially explain the issues I've experienced in the past with #2032

I just made the changes to fix the problem. I'll wait a few days then come back to confirm that this had nothing to do with keepalived.

pqarmitage · 2022-10-31T16:25:35Z

And issue #2209, which is referenced from issue #2032, identifies a firewall issue.

I really must remember to think about firewalls when these types of issue are raised.

Many thanks for sharing the resolution to the problem.

aborrero · 2023-01-26T17:09:26Z

Months later: the setup is extremely stable. Clearly the firewall rule was missing.

aborrero closed this as completed Oct 31, 2022

arudyavsky mentioned this issue Mar 20, 2023

The nopreempt option is ignored #2257

Closed
