For Almost 23 sec, contention in VRRP State for a particular VIP instance and its reflecting VRRP_MASTER #1810

Closed
rajivginotra opened this issue Dec 11, 2020 · 8 comments
Labels: Awaiting feedback (awaiting feedback from the originator of the issue)

Comments

@rajivginotra

Describe the bug
A clear and concise description of what the bug is.
We have a 3-node cluster. For the VRRP instance vip_14.1.1.234, the nodes did not agree on a master for about 23 seconds, and during that time all 3 nodes reported themselves as VRRP master for this instance.

Once the contention is resolved, 192.168.101.3 is the VRRP master, but no new notification is issued, so the application that depends on the VRRP MASTER state breaks.
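For reference, keepalived invokes a `notify` script only when an instance enters a new state, appending four arguments: the type (INSTANCE or GROUP), the name, the new state, and the priority. The sketch below is a hypothetical handler, not the actual /keepalivednotify.py (which is not shown in this report); the names `parse_notify_args` and `handle` are made up for illustration.

```python
def parse_notify_args(argv):
    """Split the arguments keepalived appends to a notify script.

    Expected order: TYPE ("INSTANCE" or "GROUP"), NAME, STATE, PRIORITY.
    """
    if len(argv) < 4:
        raise ValueError("expected: TYPE NAME STATE PRIORITY")
    kind, name, state, priority = argv[:4]
    return kind, name, state, int(priority)


def handle(argv):
    """Hypothetical handler body: describe the transition that was reported."""
    kind, name, state, priority = parse_notify_args(argv)
    # This runs once per state transition; a node that stays in MASTER
    # is not notified again when its peers change state.
    return f"{kind} {name} entered {state} (priority {priority})"
```

A real script would call `handle(sys.argv[1:])`. Since 192.168.101.3 entered MASTER at 23:49:29 and never left it, a handler like this would have run once for MASTER on that node and not again when the other two nodes fell back to BACKUP.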

192.168.101.1 Node

root@maglev-master-192-168-101-2:~# cat /tmp/kp.log | grep Entering | grep 14.1.1.234
Thu Dec 10 23:49:25 2020: (vip_14.1.1.234) Entering BACKUP STATE
Thu Dec 10 23:49:30 2020: (vip_14.1.1.234) Entering MASTER STATE. ==========> Next VRRP state change after 23 Sec and all the 3 nodes declared as VRRP master
Thu Dec 10 23:49:53 2020: (vip_14.1.1.234) Entering BACKUP STATE
Thu Dec 10 23:49:56 2020: (vip_14.1.1.234) Entering FAULT STATE
Thu Dec 10 23:49:57 2020: (vip_14.1.1.234) Entering BACKUP STATE

192.168.101.2 Node

root@maglev-master-192-168-101-2:~# cat /tmp/kp.log | grep Entering | grep 14.1.1.234
Thu Dec 10 23:49:25 2020: (vip_14.1.1.234) Entering BACKUP STATE
Thu Dec 10 23:49:30 2020: (vip_14.1.1.234) Entering MASTER STATE ==========> Next VRRP state change after 23 Sec and all the 3 nodes declared as VRRP master
Thu Dec 10 23:49:53 2020: (vip_14.1.1.234) Entering BACKUP STATE
Thu Dec 10 23:49:56 2020: (vip_14.1.1.234) Entering FAULT STATE
Thu Dec 10 23:49:57 2020: (vip_14.1.1.234) Entering BACKUP STATE

192.168.101.3 Node

cat /tmp/kp.log | grep Entering | grep 14.1.1.234
Thu Dec 10 23:49:25 2020: (vip_14.1.1.234) Entering BACKUP STATE
Thu Dec 10 23:49:29 2020: (vip_14.1.1.234) Entering MASTER STATE ==========> Next VRRP state change after 23 Sec and all the 3 nodes declared as VRRP master
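The 23-second figure can be checked mechanically from the "Entering ... STATE" lines above. This is a minimal sketch, assuming the log format shown in this report and an English locale for the day and month names; the helper names `transitions` and `time_in_state` are illustrative only.

```python
import re
from datetime import datetime

# Matches lines such as:
# "Thu Dec 10 23:49:30 2020: (vip_14.1.1.234) Entering MASTER STATE"
ENTERING = re.compile(
    r"^(?P<ts>.+?): \((?P<inst>[^)]+)\) Entering (?P<state>\w+) STATE"
)


def transitions(lines, instance):
    """Return [(timestamp, state), ...] for one VRRP instance."""
    events = []
    for line in lines:
        m = ENTERING.match(line)
        if m and m.group("inst") == instance:
            ts = datetime.strptime(m.group("ts"), "%a %b %d %H:%M:%S %Y")
            events.append((ts, m.group("state")))
    return events


def time_in_state(events):
    """Pair each state with the seconds spent in it before the next transition."""
    return [
        (state, (nxt - ts).total_seconds())
        for (ts, state), (nxt, _) in zip(events, events[1:])
    ]


SAMPLE = [
    "Thu Dec 10 23:49:25 2020: (vip_14.1.1.234) Entering BACKUP STATE",
    "Thu Dec 10 23:49:30 2020: (vip_14.1.1.234) Entering MASTER STATE",
    "Thu Dec 10 23:49:53 2020: (vip_14.1.1.234) Entering BACKUP STATE",
]
```

Calling `time_in_state(transitions(SAMPLE, "vip_14.1.1.234"))` gives 5 seconds in BACKUP followed by 23 seconds in MASTER, which matches the window described above.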
To Reproduce
Any steps necessary to reproduce the behaviour:

Expected behavior
A clear and concise description of what you expected to happen.

Keepalived version
Output of keepalived -v
Keepalived v2.0.20 (01/22,2020)

Distro (please complete the following information):
Name [e.g. Fedora, Ubuntu] Ubuntu
Version [e.g. 29] 16.04.1-Ubuntu
Architecture [e.g. x86_64] x86_64
Linux 4.15.0-74-generic #83~16.04.1-Ubuntu SMP Wed Dec 18 04:56:23 UTC 2019 (built for Linux 4.4.211)

Details of any containerisation or hosted service (e.g. AWS)
If keepalived is being run in a container or on a hosted service, provide full details

Configuration file:
A full copy of the configuration file, obfuscated if necessary to protect passwords and IP addresses
192-168-101-1:/# cat /etc/keepalived/keepalived.conf

global_defs {
vrrp_version 3
vrrp_iptables MAGLEV-KEEPALIVED-VIP
enable_script_security
script_user keepalived_script
vrrp_garp_master_delay 40
vrrp_garp_master_refresh 60
}

vrrp_script node_health_check {
script "/node_health_check.py"
interval 60 # check every 60 seconds
timeout 40 # Script Timeout of 40 seconds
fall 3 # require 3 failures for FAULT Transition
}

vrrp_instance vip_10.199.193.234 {
state BACKUP
interface management
virtual_router_id 119
nopreempt
advert_int 1

track_interface {
management
}

virtual_ipaddress {
10.199.193.234 dev management scope global
}

unicast_src_ip 10.199.193.231
unicast_peer {
10.199.193.233
10.199.193.232
}

track_script {
node_health_check
}

notify /keepalivednotify.py root
}

vrrp_instance vip_14.1.1.234 {
state BACKUP
interface enterprise
virtual_router_id 44
nopreempt
advert_int 1

track_interface {
enterprise
}

virtual_ipaddress {
14.1.1.234 dev enterprise scope global
}

unicast_src_ip 14.1.1.231
unicast_peer {
14.1.1.233
14.1.1.232
}

track_script {
node_health_check
}

notify /keepalivednotify.py root
}

vrrp_instance vip_192.168.101.4 {
state BACKUP
interface cluster
virtual_router_id 41
nopreempt
advert_int 1

track_interface {
cluster
}

virtual_ipaddress {
192.168.101.4 dev cluster scope global
}

unicast_src_ip 192.168.101.1
unicast_peer {
192.168.101.3
192.168.101.2
}

track_script {
node_health_check
}

notify /keepalivednotify.py root
}

192-168-101-2:/# cat /etc/keepalived/keepalived.conf

global_defs {
vrrp_version 3
vrrp_iptables MAGLEV-KEEPALIVED-VIP
enable_script_security
script_user keepalived_script
vrrp_garp_master_delay 40
vrrp_garp_master_refresh 60
}

vrrp_script node_health_check {
script "/node_health_check.py"
interval 60 # check every 60 seconds
timeout 40 # Script Timeout of 40 seconds
fall 3 # require 3 failures for FAULT Transition
}

vrrp_instance vip_10.199.193.234 {
state BACKUP
interface management
virtual_router_id 119
nopreempt
advert_int 1

track_interface {
management
}

virtual_ipaddress {
10.199.193.234 dev management scope global
}

unicast_src_ip 10.199.193.232
unicast_peer {
10.199.193.231
10.199.193.233
}

track_script {
node_health_check
}

notify /keepalivednotify.py root
}

vrrp_instance vip_14.1.1.234 {
state BACKUP
interface enterprise
virtual_router_id 44
nopreempt
advert_int 1

track_interface {
enterprise
}

virtual_ipaddress {
14.1.1.234 dev enterprise scope global
}

unicast_src_ip 14.1.1.232
unicast_peer {
14.1.1.231
14.1.1.233
}

track_script {
node_health_check
}

notify /keepalivednotify.py root
}

vrrp_instance vip_192.168.101.4 {
state BACKUP
interface cluster
virtual_router_id 41
nopreempt
advert_int 1

track_interface {
cluster
}

virtual_ipaddress {
192.168.101.4 dev cluster scope global
}

unicast_src_ip 192.168.101.2
unicast_peer {
192.168.101.1
192.168.101.3
}

track_script {
node_health_check
}

notify /keepalivednotify.py root
}

192-168-101-3:/# cat /etc/keepalived/keepalived.conf

global_defs {
vrrp_version 3
vrrp_iptables MAGLEV-KEEPALIVED-VIP
enable_script_security
script_user keepalived_script
vrrp_garp_master_delay 40
vrrp_garp_master_refresh 60
}

vrrp_script node_health_check {
script "/node_health_check.py"
interval 60 # check every 60 seconds
timeout 40 # Script Timeout of 40 seconds
fall 3 # require 3 failures for FAULT Transition
}

vrrp_instance vip_10.199.193.234 {
state BACKUP
interface management
virtual_router_id 119
nopreempt
advert_int 1

track_interface {
management
}

virtual_ipaddress {
10.199.193.234 dev management scope global
}

unicast_src_ip 10.199.193.233
unicast_peer {
10.199.193.231
10.199.193.232
}

track_script {
node_health_check
}

notify /keepalivednotify.py root
}

vrrp_instance vip_14.1.1.234 {
state BACKUP
interface enterprise
virtual_router_id 44
nopreempt
advert_int 1

track_interface {
enterprise
}

virtual_ipaddress {
14.1.1.234 dev enterprise scope global
}

unicast_src_ip 14.1.1.233
unicast_peer {
14.1.1.231
14.1.1.232
}

track_script {
node_health_check
}

notify /keepalivednotify.py root
}

vrrp_instance vip_192.168.101.4 {
state BACKUP
interface cluster
virtual_router_id 41
nopreempt
advert_int 1

track_interface {
cluster
}

virtual_ipaddress {
192.168.101.4 dev cluster scope global
}

unicast_src_ip 192.168.101.3
unicast_peer {
192.168.101.1
192.168.101.2
}

track_script {
node_health_check
}

notify /keepalivednotify.py root
}
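As a rough check on the timing these settings imply: RFC 5798 (VRRPv3) defines how long a backup waits without adverts before taking over. The sketch below applies those formulas in seconds, assuming `advert_int 1` from the configuration and priority 100 (the value reported in the logs further down); keepalived's internal timer handling may differ in detail, so treat the result as an estimate.

```python
def skew_time(priority, advert_int):
    """RFC 5798 Skew_Time: ((256 - priority) * Master_Adver_Interval) / 256."""
    return (256 - priority) * advert_int / 256


def master_down_interval(priority, advert_int):
    """RFC 5798 Master_Down_Interval: 3 * Master_Adver_Interval + Skew_Time."""
    return 3 * advert_int + skew_time(priority, advert_int)


# With priority 100 and a 1-second advert interval this is about 3.6 seconds.
EXPECTED_TIMEOUT = master_down_interval(100, 1.0)
```

With identical priorities on all three nodes, every node reaches this timeout at nearly the same moment after startup, which is consistent with all of them entering BACKUP at 23:49:25 and MASTER at 23:49:29 or 23:49:30.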

Notify and track scripts
If any notify or track scripts are in use, please provide copies of them

System Log entries
Full keepalived system log entries from when keepalived started

Did keepalived coredump?
If so, can you please provide a stacktrace from the coredump, using gdb.

Additional context
Add any other context about the problem here.

@rajivginotra rajivginotra changed the title For Almost 23 sec, contention is VRRP States and its reflecting VRRP_MASTER For Almost 23 sec, contention in VRRP State for a particular VIP instance and its reflecting VRRP_MASTER Dec 11, 2020
@rajivginotra
Author

Logs

@rajivginotra
Author

192.168.101.1

Thu Dec 10 23:49:24 2020: Starting Keepalived v2.0.20 (01/22,2020)
Thu Dec 10 23:49:24 2020: Running on Linux 5.4.0-52-generic #57~18.04.1-Ubuntu SMP Thu Oct 15 14:04:49 UTC 2020 (built for Linux 4.4.228)
Thu Dec 10 23:49:24 2020: Command line: '/usr/sbin/keepalived' '--vrrp' '--dont-fork' '--log-console' '--log-detail'
Thu Dec 10 23:49:24 2020: '--release-vips' '--pid' '/etc/keepalived/keepalived.pid' '&'
Thu Dec 10 23:49:24 2020: Opening file '/etc/keepalived/keepalived.conf'.
Thu Dec 10 23:49:24 2020: Starting VRRP child process, pid=36
Thu Dec 10 23:49:24 2020: Registering Kernel netlink reflector
Thu Dec 10 23:49:24 2020: Registering Kernel netlink command channel
Thu Dec 10 23:49:24 2020: Opening file '/etc/keepalived/keepalived.conf'.
Thu Dec 10 23:49:24 2020: (vip_10.199.193.234) Ignoring track_interface management since own interface
Thu Dec 10 23:49:24 2020: Assigned address 10.199.193.231 for interface management
Thu Dec 10 23:49:24 2020: Assigned address fe80::a653:eff:fedd:4d72 for interface management
Thu Dec 10 23:49:24 2020: (vip_14.1.1.234) Ignoring track_interface enterprise since own interface
Thu Dec 10 23:49:24 2020: Assigned address 14.1.1.231 for interface enterprise
Thu Dec 10 23:49:24 2020: Assigned address fe80::3efd:feff:fee6:5d74 for interface enterprise
Thu Dec 10 23:49:24 2020: (vip_192.168.101.4) Ignoring track_interface cluster since own interface
Thu Dec 10 23:49:24 2020: Assigned address 192.168.101.1 for interface cluster
Thu Dec 10 23:49:24 2020: Assigned address fe80::3efd:feff:fee6:5d75 for interface cluster
Thu Dec 10 23:49:24 2020: Registering gratuitous ARP shared channel
Thu Dec 10 23:49:24 2020: (vip_10.199.193.234) removing VIPs.
Thu Dec 10 23:49:24 2020: (vip_14.1.1.234) removing VIPs.
Thu Dec 10 23:49:24 2020: (vip_192.168.101.4) removing VIPs.
Thu Dec 10 23:49:24 2020: (vip_10.199.193.234) removing VIPs.
Thu Dec 10 23:49:24 2020: (vip_14.1.1.234) removing VIPs.
Thu Dec 10 23:49:24 2020: (vip_192.168.101.4) removing VIPs.
Thu Dec 10 23:49:24 2020: VRRP sockpool: [ifindex(6), family(IPv4), proto(112), unicast(1), fd(10,11)]
Thu Dec 10 23:49:24 2020: VRRP sockpool: [ifindex(8), family(IPv4), proto(112), unicast(1), fd(12,13)]
Thu Dec 10 23:49:24 2020: VRRP sockpool: [ifindex(9), family(IPv4), proto(112), unicast(1), fd(14,15)]
Thu Dec 10 23:49:25 2020: VRRP_Script(node_health_check) succeeded
Thu Dec 10 23:49:25 2020: (vip_10.199.193.234) Entering BACKUP STATE
Thu Dec 10 23:49:25 2020: vip_10.199.193.234: sending gratuitous ARP for 10.199.193.231
Thu Dec 10 23:49:25 2020: Sending gratuitous ARP on management for 10.199.193.231
Thu Dec 10 23:49:25 2020: (vip_14.1.1.234) Entering BACKUP STATE
Thu Dec 10 23:49:25 2020: vip_14.1.1.234: sending gratuitous ARP for 14.1.1.231
Thu Dec 10 23:49:25 2020: Sending gratuitous ARP on enterprise for 14.1.1.231
Thu Dec 10 23:49:25 2020: (vip_192.168.101.4) Entering BACKUP STATE
Thu Dec 10 23:49:25 2020: vip_192.168.101.4: sending gratuitous ARP for 192.168.101.1
Thu Dec 10 23:49:25 2020: Sending gratuitous ARP on cluster for 192.168.101.1
Thu Dec 10 23:49:29 2020: (vip_10.199.193.234) Receive advertisement timeout
Thu Dec 10 23:49:29 2020: (vip_10.199.193.234) Entering MASTER STATE
Thu Dec 10 23:49:29 2020: (vip_10.199.193.234) using locally configured advertisement interval (1000 milli-sec)
Thu Dec 10 23:49:29 2020: (vip_10.199.193.234) setting VIPs.
Thu Dec 10 23:49:29 2020: Sending gratuitous ARP on management for 10.199.193.234
Thu Dec 10 23:49:29 2020: (vip_10.199.193.234) Sending/queueing gratuitous ARPs on management for 10.199.193.234
Thu Dec 10 23:49:29 2020: Sending gratuitous ARP on management for 10.199.193.234
Thu Dec 10 23:49:29 2020: Sending gratuitous ARP on management for 10.199.193.234
Thu Dec 10 23:49:29 2020: Sending gratuitous ARP on management for 10.199.193.234
Thu Dec 10 23:49:29 2020: Sending gratuitous ARP on management for 10.199.193.234
Thu Dec 10 23:49:29 2020: (vip_10.199.193.234) Master received advert from 10.199.193.233 with same priority 100 but higher IP address than ours
Thu Dec 10 23:49:29 2020: (vip_10.199.193.234) Entering BACKUP STATE
Thu Dec 10 23:49:29 2020: (vip_10.199.193.234) removing VIPs.
Thu Dec 10 23:49:30 2020: (vip_14.1.1.234) Receive advertisement timeout
Thu Dec 10 23:49:30 2020: (vip_14.1.1.234) Entering MASTER STATE
Thu Dec 10 23:49:30 2020: (vip_14.1.1.234) using locally configured advertisement interval (1000 milli-sec)
Thu Dec 10 23:49:30 2020: (vip_14.1.1.234) setting VIPs.
Thu Dec 10 23:49:30 2020: Sending gratuitous ARP on enterprise for 14.1.1.234
Thu Dec 10 23:49:30 2020: (vip_14.1.1.234) Sending/queueing gratuitous ARPs on enterprise for 14.1.1.234
Thu Dec 10 23:49:30 2020: Sending gratuitous ARP on enterprise for 14.1.1.234
Thu Dec 10 23:49:30 2020: Sending gratuitous ARP on enterprise for 14.1.1.234
Thu Dec 10 23:49:30 2020: Sending gratuitous ARP on enterprise for 14.1.1.234
Thu Dec 10 23:49:30 2020: Sending gratuitous ARP on enterprise for 14.1.1.234
Thu Dec 10 23:49:39 2020: Deassigned address 14.1.1.231 from interface enterprise
Thu Dec 10 23:49:39 2020: Assigned address 14.1.1.231 for interface enterprise
Thu Dec 10 23:49:53 2020: (vip_14.1.1.234) Master received advert from 14.1.1.233 with same priority 100 but higher IP address than ours
Thu Dec 10 23:49:53 2020: (vip_14.1.1.234) Entering BACKUP STATE
Thu Dec 10 23:49:53 2020: (vip_14.1.1.234) removing VIPs.
Thu Dec 10 23:49:56 2020: Netlink reports enterprise down
Thu Dec 10 23:49:56 2020: (vip_14.1.1.234) Entering FAULT STATE
Thu Dec 10 23:49:56 2020: Deassigned address 14.1.1.231 from interface enterprise
Thu Dec 10 23:49:56 2020: Assigned address 14.1.1.231 for interface enterprise
Thu Dec 10 23:49:56 2020: Netlink reports enterprise up
Thu Dec 10 23:49:56 2020: (vip_14.1.1.234) Entering BACKUP STATE
Thu Dec 10 23:49:56 2020: vip_14.1.1.234: sending gratuitous ARP for 14.1.1.231
Thu Dec 10 23:49:56 2020: Sending gratuitous ARP on enterprise for 14.1.1.231
Fri Dec 11 00:00:10 2020: Interface caliaea39a66dd2 added
Fri Dec 11 00:00:23 2020: Interface caliaea39a66dd2 deleted
Fri Dec 11 01:00:11 2020: Interface calibe21e4c762e added
Fri Dec 11 01:00:23 2020: Interface calibe21e4c762e deleted
Fri Dec 11 03:00:04 2020: Interface calif80bbf54eaf added
Fri Dec 11 03:00:18 2020: Interface calif80bbf54eaf deleted
Fri Dec 11 07:00:06 2020: Interface calic6d82dc2f3b added
Fri Dec 11 07:00:19 2020: Interface calic6d82dc2f3b deleted
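
The "same priority 100 but higher IP address than ours" message in the log above reflects the VRRP tie-break: a master yields to an advert with a higher priority, or with equal priority and a higher sender address. A minimal sketch of that comparison follows; the function name `advert_wins` is illustrative and this is not keepalived's own code.

```python
import ipaddress


def advert_wins(adv_priority, adv_ip, our_priority, our_ip):
    """Return True if a received advert should make the local master step down."""
    if adv_priority != our_priority:
        return adv_priority > our_priority
    # Equal priority: the numerically higher primary address wins.
    return ipaddress.ip_address(adv_ip) > ipaddress.ip_address(our_ip)
```

For example, `advert_wins(100, "14.1.1.233", 100, "14.1.1.231")` is true, so this node drops to BACKUP once an advert from 14.1.1.233 actually arrives, which the log shows happening only at 23:49:53.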

@rajivginotra
Author

192.168.101.2

Thu Dec 10 23:49:24 2020: Starting Keepalived v2.0.20 (01/22,2020)
Thu Dec 10 23:49:24 2020: Running on Linux 5.4.0-52-generic #57~18.04.1-Ubuntu SMP Thu Oct 15 14:04:49 UTC 2020 (built for Linux 4.4.228)
Thu Dec 10 23:49:24 2020: Command line: '/usr/sbin/keepalived' '--vrrp' '--dont-fork' '--log-console' '--log-detail'
Thu Dec 10 23:49:24 2020: '--release-vips' '--pid' '/etc/keepalived/keepalived.pid' '&'
Thu Dec 10 23:49:24 2020: Opening file '/etc/keepalived/keepalived.conf'.
Thu Dec 10 23:49:24 2020: Starting VRRP child process, pid=31
Thu Dec 10 23:49:24 2020: Registering Kernel netlink reflector
Thu Dec 10 23:49:24 2020: Registering Kernel netlink command channel
Thu Dec 10 23:49:24 2020: Opening file '/etc/keepalived/keepalived.conf'.
Thu Dec 10 23:49:24 2020: (vip_10.199.193.234) Ignoring track_interface management since own interface
Thu Dec 10 23:49:24 2020: Assigned address 10.199.193.232 for interface management
Thu Dec 10 23:49:24 2020: Assigned address fe80::a653:eff:fead:27be for interface management
Thu Dec 10 23:49:24 2020: (vip_14.1.1.234) Ignoring track_interface enterprise since own interface
Thu Dec 10 23:49:24 2020: Assigned address 14.1.1.232 for interface enterprise
Thu Dec 10 23:49:24 2020: Assigned address fe80::3efd:feff:fee6:5e34 for interface enterprise
Thu Dec 10 23:49:24 2020: (vip_192.168.101.4) Ignoring track_interface cluster since own interface
Thu Dec 10 23:49:24 2020: Assigned address 192.168.101.2 for interface cluster
Thu Dec 10 23:49:24 2020: Assigned address fe80::3efd:feff:fee6:5e35 for interface cluster
Thu Dec 10 23:49:24 2020: Registering gratuitous ARP shared channel
Thu Dec 10 23:49:24 2020: (vip_10.199.193.234) removing VIPs.
Thu Dec 10 23:49:24 2020: (vip_14.1.1.234) removing VIPs.
Thu Dec 10 23:49:24 2020: (vip_192.168.101.4) removing VIPs.
Thu Dec 10 23:49:24 2020: (vip_10.199.193.234) removing VIPs.
Thu Dec 10 23:49:24 2020: (vip_14.1.1.234) removing VIPs.
Thu Dec 10 23:49:24 2020: (vip_192.168.101.4) removing VIPs.
Thu Dec 10 23:49:24 2020: VRRP sockpool: [ifindex(6), family(IPv4), proto(112), unicast(1), fd(10,11)]
Thu Dec 10 23:49:24 2020: VRRP sockpool: [ifindex(8), family(IPv4), proto(112), unicast(1), fd(12,13)]
Thu Dec 10 23:49:24 2020: VRRP sockpool: [ifindex(9), family(IPv4), proto(112), unicast(1), fd(14,15)]
Thu Dec 10 23:49:25 2020: VRRP_Script(node_health_check) succeeded
Thu Dec 10 23:49:25 2020: (vip_10.199.193.234) Entering BACKUP STATE
Thu Dec 10 23:49:25 2020: vip_10.199.193.234: sending gratuitous ARP for 10.199.193.232
Thu Dec 10 23:49:25 2020: Sending gratuitous ARP on management for 10.199.193.232
Thu Dec 10 23:49:25 2020: (vip_14.1.1.234) Entering BACKUP STATE
Thu Dec 10 23:49:25 2020: vip_14.1.1.234: sending gratuitous ARP for 14.1.1.232
Thu Dec 10 23:49:25 2020: Sending gratuitous ARP on enterprise for 14.1.1.232
Thu Dec 10 23:49:25 2020: (vip_192.168.101.4) Entering BACKUP STATE
Thu Dec 10 23:49:25 2020: vip_192.168.101.4: sending gratuitous ARP for 192.168.101.2
Thu Dec 10 23:49:25 2020: Sending gratuitous ARP on cluster for 192.168.101.2
Thu Dec 10 23:49:29 2020: (vip_10.199.193.234) Receive advertisement timeout
Thu Dec 10 23:49:29 2020: (vip_10.199.193.234) Entering MASTER STATE
Thu Dec 10 23:49:29 2020: (vip_10.199.193.234) using locally configured advertisement interval (1000 milli-sec)
Thu Dec 10 23:49:29 2020: (vip_10.199.193.234) setting VIPs.
Thu Dec 10 23:49:29 2020: Sending gratuitous ARP on management for 10.199.193.234
Thu Dec 10 23:49:29 2020: (vip_10.199.193.234) Sending/queueing gratuitous ARPs on management for 10.199.193.234
Thu Dec 10 23:49:29 2020: Sending gratuitous ARP on management for 10.199.193.234
Thu Dec 10 23:49:29 2020: Sending gratuitous ARP on management for 10.199.193.234
Thu Dec 10 23:49:29 2020: Sending gratuitous ARP on management for 10.199.193.234
Thu Dec 10 23:49:29 2020: Sending gratuitous ARP on management for 10.199.193.234
Thu Dec 10 23:49:29 2020: (vip_10.199.193.234) Master received advert from 10.199.193.233 with same priority 100 but higher IP address than ours
Thu Dec 10 23:49:29 2020: (vip_10.199.193.234) Entering BACKUP STATE
Thu Dec 10 23:49:29 2020: (vip_10.199.193.234) removing VIPs.
Thu Dec 10 23:49:30 2020: (vip_14.1.1.234) Receive advertisement timeout
Thu Dec 10 23:49:30 2020: (vip_14.1.1.234) Entering MASTER STATE
Thu Dec 10 23:49:30 2020: (vip_14.1.1.234) using locally configured advertisement interval (1000 milli-sec)
Thu Dec 10 23:49:30 2020: (vip_14.1.1.234) setting VIPs.
Thu Dec 10 23:49:30 2020: Sending gratuitous ARP on enterprise for 14.1.1.234
Thu Dec 10 23:49:30 2020: (vip_14.1.1.234) Sending/queueing gratuitous ARPs on enterprise for 14.1.1.234
Thu Dec 10 23:49:30 2020: Sending gratuitous ARP on enterprise for 14.1.1.234
Thu Dec 10 23:49:30 2020: Sending gratuitous ARP on enterprise for 14.1.1.234
Thu Dec 10 23:49:30 2020: Sending gratuitous ARP on enterprise for 14.1.1.234
Thu Dec 10 23:49:30 2020: Sending gratuitous ARP on enterprise for 14.1.1.234
Thu Dec 10 23:49:40 2020: Deassigned address 14.1.1.232 from interface enterprise
Thu Dec 10 23:49:40 2020: Assigned address 14.1.1.232 for interface enterprise
Thu Dec 10 23:49:53 2020: (vip_14.1.1.234) Master received advert from 14.1.1.233 with same priority 100 but higher IP address than ours
Thu Dec 10 23:49:53 2020: (vip_14.1.1.234) Entering BACKUP STATE
Thu Dec 10 23:49:53 2020: (vip_14.1.1.234) removing VIPs.
Thu Dec 10 23:49:56 2020: Netlink reports enterprise down
Thu Dec 10 23:49:56 2020: (vip_14.1.1.234) Entering FAULT STATE
Thu Dec 10 23:49:56 2020: Deassigned address 14.1.1.232 from interface enterprise
Thu Dec 10 23:49:56 2020: Assigned address 14.1.1.232 for interface enterprise
Thu Dec 10 23:49:57 2020: Netlink reports enterprise up
Thu Dec 10 23:49:57 2020: (vip_14.1.1.234) Entering BACKUP STATE
Thu Dec 10 23:49:57 2020: vip_14.1.1.234: sending gratuitous ARP for 14.1.1.232
Thu Dec 10 23:49:57 2020: Sending gratuitous ARP on enterprise for 14.1.1.232
Fri Dec 11 04:00:02 2020: Interface calibd4ffe8c412 added
Fri Dec 11 04:00:15 2020: Interface calibd4ffe8c412 deleted
Fri Dec 11 08:00:09 2020: Interface cali1114c9e34a0 added
Fri Dec 11 08:00:22 2020: Interface cali1114c9e34a0 deleted

@rajivginotra
Author

192.168.101.3

Thu Dec 10 23:49:24 2020: Starting Keepalived v2.0.20 (01/22,2020)
Thu Dec 10 23:49:24 2020: Running on Linux 5.4.0-52-generic #57~18.04.1-Ubuntu SMP Thu Oct 15 14:04:49 UTC 2020 (built for Linux 4.4.228)
Thu Dec 10 23:49:24 2020: Command line: '/usr/sbin/keepalived' '--vrrp' '--dont-fork' '--log-console' '--log-detail'
Thu Dec 10 23:49:24 2020: '--release-vips' '--pid' '/etc/keepalived/keepalived.pid' '&'
Thu Dec 10 23:49:24 2020: Opening file '/etc/keepalived/keepalived.conf'.
Thu Dec 10 23:49:24 2020: Starting VRRP child process, pid=31
Thu Dec 10 23:49:24 2020: Registering Kernel netlink reflector
Thu Dec 10 23:49:24 2020: Registering Kernel netlink command channel
Thu Dec 10 23:49:24 2020: Opening file '/etc/keepalived/keepalived.conf'.
Thu Dec 10 23:49:24 2020: (vip_10.199.193.234) Ignoring track_interface management since own interface
Thu Dec 10 23:49:24 2020: Assigned address 10.199.193.233 for interface management
Thu Dec 10 23:49:24 2020: Assigned address fe80::a653:eff:fedd:4b92 for interface management
Thu Dec 10 23:49:24 2020: (vip_14.1.1.234) Ignoring track_interface enterprise since own interface
Thu Dec 10 23:49:24 2020: Assigned address 14.1.1.233 for interface enterprise
Thu Dec 10 23:49:24 2020: Assigned address fe80::3efd:feff:fee6:5d44 for interface enterprise
Thu Dec 10 23:49:24 2020: (vip_192.168.101.4) Ignoring track_interface cluster since own interface
Thu Dec 10 23:49:24 2020: Assigned address 192.168.101.3 for interface cluster
Thu Dec 10 23:49:24 2020: Assigned address fe80::3efd:feff:fee6:5d45 for interface cluster
Thu Dec 10 23:49:24 2020: Registering gratuitous ARP shared channel
Thu Dec 10 23:49:24 2020: (vip_10.199.193.234) removing VIPs.
Thu Dec 10 23:49:24 2020: (vip_14.1.1.234) removing VIPs.
Thu Dec 10 23:49:24 2020: (vip_192.168.101.4) removing VIPs.
Thu Dec 10 23:49:24 2020: (vip_10.199.193.234) removing VIPs.
Thu Dec 10 23:49:24 2020: (vip_14.1.1.234) removing VIPs.
Thu Dec 10 23:49:24 2020: (vip_192.168.101.4) removing VIPs.
Thu Dec 10 23:49:24 2020: VRRP sockpool: [ifindex(6), family(IPv4), proto(112), unicast(1), fd(10,11)]
Thu Dec 10 23:49:24 2020: VRRP sockpool: [ifindex(8), family(IPv4), proto(112), unicast(1), fd(12,13)]
Thu Dec 10 23:49:24 2020: VRRP sockpool: [ifindex(9), family(IPv4), proto(112), unicast(1), fd(14,15)]
Thu Dec 10 23:49:25 2020: VRRP_Script(node_health_check) succeeded
Thu Dec 10 23:49:25 2020: (vip_10.199.193.234) Entering BACKUP STATE
Thu Dec 10 23:49:25 2020: vip_10.199.193.234: sending gratuitous ARP for 10.199.193.233
Thu Dec 10 23:49:25 2020: Sending gratuitous ARP on management for 10.199.193.233
Thu Dec 10 23:49:25 2020: (vip_14.1.1.234) Entering BACKUP STATE
Thu Dec 10 23:49:25 2020: vip_14.1.1.234: sending gratuitous ARP for 14.1.1.233
Thu Dec 10 23:49:25 2020: Sending gratuitous ARP on enterprise for 14.1.1.233
Thu Dec 10 23:49:25 2020: (vip_192.168.101.4) Entering BACKUP STATE
Thu Dec 10 23:49:25 2020: vip_192.168.101.4: sending gratuitous ARP for 192.168.101.3
Thu Dec 10 23:49:25 2020: Sending gratuitous ARP on cluster for 192.168.101.3
Thu Dec 10 23:49:29 2020: (vip_10.199.193.234) Receive advertisement timeout
Thu Dec 10 23:49:29 2020: (vip_10.199.193.234) Entering MASTER STATE
Thu Dec 10 23:49:29 2020: (vip_10.199.193.234) using locally configured advertisement interval (1000 milli-sec)
Thu Dec 10 23:49:29 2020: (vip_10.199.193.234) setting VIPs.
Thu Dec 10 23:49:29 2020: Sending gratuitous ARP on management for 10.199.193.234
Thu Dec 10 23:49:29 2020: (vip_10.199.193.234) Sending/queueing gratuitous ARPs on management for 10.199.193.234
Thu Dec 10 23:49:29 2020: Sending gratuitous ARP on management for 10.199.193.234
Thu Dec 10 23:49:29 2020: Sending gratuitous ARP on management for 10.199.193.234
Thu Dec 10 23:49:29 2020: Sending gratuitous ARP on management for 10.199.193.234
Thu Dec 10 23:49:29 2020: Sending gratuitous ARP on management for 10.199.193.234
Thu Dec 10 23:49:29 2020: (vip_14.1.1.234) Receive advertisement timeout
Thu Dec 10 23:49:29 2020: (vip_14.1.1.234) Entering MASTER STATE
Thu Dec 10 23:49:29 2020: (vip_14.1.1.234) using locally configured advertisement interval (1000 milli-sec)
Thu Dec 10 23:49:29 2020: (vip_14.1.1.234) setting VIPs.
Thu Dec 10 23:49:29 2020: Sending gratuitous ARP on enterprise for 14.1.1.234
Thu Dec 10 23:49:29 2020: (vip_14.1.1.234) Sending/queueing gratuitous ARPs on enterprise for 14.1.1.234
Thu Dec 10 23:49:29 2020: Sending gratuitous ARP on enterprise for 14.1.1.234
Thu Dec 10 23:49:29 2020: Sending gratuitous ARP on enterprise for 14.1.1.234
Thu Dec 10 23:49:29 2020: Sending gratuitous ARP on enterprise for 14.1.1.234
Thu Dec 10 23:49:29 2020: Sending gratuitous ARP on enterprise for 14.1.1.234
Thu Dec 10 23:49:29 2020: (vip_192.168.101.4) Receive advertisement timeout
Thu Dec 10 23:49:29 2020: (vip_192.168.101.4) Entering MASTER STATE
Thu Dec 10 23:49:29 2020: (vip_192.168.101.4) using locally configured advertisement interval (1000 milli-sec)
Thu Dec 10 23:49:29 2020: (vip_192.168.101.4) setting VIPs.
Thu Dec 10 23:49:29 2020: Sending gratuitous ARP on cluster for 192.168.101.4
Thu Dec 10 23:49:29 2020: (vip_192.168.101.4) Sending/queueing gratuitous ARPs on cluster for 192.168.101.4
Thu Dec 10 23:49:29 2020: Sending gratuitous ARP on cluster for 192.168.101.4
Thu Dec 10 23:49:29 2020: Sending gratuitous ARP on cluster for 192.168.101.4
Thu Dec 10 23:49:29 2020: Sending gratuitous ARP on cluster for 192.168.101.4
Thu Dec 10 23:49:29 2020: Sending gratuitous ARP on cluster for 192.168.101.4
Thu Dec 10 23:49:29 2020: (vip_10.199.193.234) Received advert from 10.199.193.231 with lower priority 100, ours 100, forcing new election
Thu Dec 10 23:49:29 2020: Sending gratuitous ARP on management for 10.199.193.234
Thu Dec 10 23:49:29 2020: (vip_10.199.193.234) Sending/queueing gratuitous ARPs on management for 10.199.193.234
Thu Dec 10 23:49:29 2020: Sending gratuitous ARP on management for 10.199.193.234
Thu Dec 10 23:49:29 2020: Sending gratuitous ARP on management for 10.199.193.234
Thu Dec 10 23:49:29 2020: Sending gratuitous ARP on management for 10.199.193.234
Thu Dec 10 23:49:29 2020: Sending gratuitous ARP on management for 10.199.193.234
Thu Dec 10 23:49:29 2020: (vip_10.199.193.234) Received advert from 10.199.193.232 with lower priority 100, ours 100, forcing new election
Thu Dec 10 23:49:29 2020: Sending gratuitous ARP on management for 10.199.193.234
Thu Dec 10 23:49:29 2020: (vip_10.199.193.234) Sending/queueing gratuitous ARPs on management for 10.199.193.234
Thu Dec 10 23:49:29 2020: Sending gratuitous ARP on management for 10.199.193.234
Thu Dec 10 23:49:29 2020: Sending gratuitous ARP on management for 10.199.193.234
Thu Dec 10 23:49:29 2020: Sending gratuitous ARP on management for 10.199.193.234
Thu Dec 10 23:49:29 2020: Sending gratuitous ARP on management for 10.199.193.234
Thu Dec 10 23:49:39 2020: Deassigned address 14.1.1.233 from interface enterprise
Thu Dec 10 23:49:39 2020: Assigned address 14.1.1.233 for interface enterprise
Thu Dec 10 23:50:09 2020: Sending gratuitous ARP on management for 10.199.193.234
Thu Dec 10 23:50:09 2020: (vip_10.199.193.234) Sending/queueing gratuitous ARPs on management for 10.199.193.234
Thu Dec 10 23:50:09 2020: Sending gratuitous ARP on management for 10.199.193.234
Thu Dec 10 23:50:09 2020: Sending gratuitous ARP on management for 10.199.193.234
Thu Dec 10 23:50:09 2020: Sending gratuitous ARP on management for 10.199.193.234
Thu Dec 10 23:50:09 2020: Sending gratuitous ARP on management for 10.199.193.234
Thu Dec 10 23:50:09 2020: Sending gratuitous ARP on enterprise for 14.1.1.234
Thu Dec 10 23:50:09 2020: (vip_14.1.1.234) Sending/queueing gratuitous ARPs on enterprise for 14.1.1.234
Thu Dec 10 23:50:09 2020: Sending gratuitous ARP on enterprise for 14.1.1.234
Thu Dec 10 23:50:09 2020: Sending gratuitous ARP on enterprise for 14.1.1.234
Thu Dec 10 23:50:09 2020: Sending gratuitous ARP on enterprise for 14.1.1.234
Thu Dec 10 23:50:09 2020: Sending gratuitous ARP on enterprise for 14.1.1.234
Thu Dec 10 23:50:09 2020: Sending gratuitous ARP on cluster for 192.168.101.4
Thu Dec 10 23:50:09 2020: (vip_192.168.101.4) Sending/queueing gratuitous ARPs on cluster for 192.168.101.4
Thu Dec 10 23:50:09 2020: Sending gratuitous ARP on cluster for 192.168.101.4
Thu Dec 10 23:50:09 2020: Sending gratuitous ARP on cluster for 192.168.101.4
Thu Dec 10 23:50:09 2020: Sending gratuitous ARP on cluster for 192.168.101.4
Thu Dec 10 23:50:09 2020: Sending gratuitous ARP on cluster for 192.168.101.4
Thu Dec 10 23:50:09 2020: Sending gratuitous ARP on management for 10.199.193.234
Thu Dec 10 23:50:09 2020: (vip_10.199.193.234) Sending/queueing gratuitous ARPs on management for 10.199.193.234
Thu Dec 10 23:50:09 2020: Sending gratuitous ARP on management for 10.199.193.234
Thu Dec 10 23:50:09 2020: Sending gratuitous ARP on management for 10.199.193.234
Thu Dec 10 23:50:09 2020: Sending gratuitous ARP on management for 10.199.193.234
Thu Dec 10 23:50:09 2020: Sending gratuitous ARP on management for 10.199.193.234
Thu Dec 10 23:50:09 2020: Sending gratuitous ARP on management for 10.199.193.234
Thu Dec 10 23:50:09 2020: (vip_10.199.193.234) Sending/queueing gratuitous ARPs on management for 10.199.193.234
Thu Dec 10 23:50:09 2020: Sending gratuitous ARP on management for 10.199.193.234
Thu Dec 10 23:50:09 2020: Sending gratuitous ARP on management for 10.199.193.234
Thu Dec 10 23:50:09 2020: Sending gratuitous ARP on management for 10.199.193.234
Thu Dec 10 23:50:09 2020: Sending gratuitous ARP on management for 10.199.193.234
Thu Dec 10 23:50:29 2020: Sending gratuitous ARP on enterprise for 14.1.1.234
Thu Dec 10 23:50:29 2020: (vip_14.1.1.234) Sending/queueing gratuitous ARPs on enterprise for 14.1.1.234
Thu Dec 10 23:50:29 2020: Sending gratuitous ARP on cluster for 192.168.101.4
Thu Dec 10 23:50:29 2020: (vip_192.168.101.4) Sending/queueing gratuitous ARPs on cluster for 192.168.101.4
Thu Dec 10 23:50:29 2020: Sending gratuitous ARP on management for 10.199.193.234
Thu Dec 10 23:50:29 2020: (vip_10.199.193.234) Sending/queueing gratuitous ARPs on management for 10.199.193.234
Thu Dec 10 23:51:29 2020: Sending gratuitous ARP on enterprise for 14.1.1.234
Thu Dec 10 23:51:29 2020: (vip_14.1.1.234) Sending/queueing gratuitous ARPs on enterprise for 14.1.1.234
Thu Dec 10 23:51:29 2020: Sending gratuitous ARP on cluster for 192.168.101.4
Thu Dec 10 23:51:29 2020: (vip_192.168.101.4) Sending/queueing gratuitous ARPs on cluster for 192.168.101.4
Thu Dec 10 23:51:29 2020: Sending gratuitous ARP on management for 10.199.193.234
Thu Dec 10 23:51:29 2020: (vip_10.199.193.234) Sending/queueing gratuitous ARPs on management for 10.199.193.234
Thu Dec 10 23:52:29 2020: Sending gratuitous ARP on enterprise for 14.1.1.234
Thu Dec 10 23:52:29 2020: (vip_14.1.1.234) Sending/queueing gratuitous ARPs on enterprise for 14.1.1.234
Thu Dec 10 23:52:29 2020: Sending gratuitous ARP on cluster for 192.168.101.4
Thu Dec 10 23:52:29 2020: (vip_192.168.101.4) Sending/queueing gratuitous ARPs on cluster for 192.168.101.4
Thu Dec 10 23:52:29 2020: Sending gratuitous ARP on management for 10.199.193.234
Thu Dec 10 23:52:29 2020: (vip_10.199.193.234) Sending/queueing gratuitous ARPs on management for 10.199.193.234
Thu Dec 10 23:53:29 2020: Sending gratuitous ARP on enterprise for 14.1.1.234
Thu Dec 10 23:53:29 2020: (vip_14.1.1.234) Sending/queueing gratuitous ARPs on enterprise for 14.1.1.234
Thu Dec 10 23:53:29 2020: Sending gratuitous ARP on cluster for 192.168.101.4
Thu Dec 10 23:53:29 2020: (vip_192.168.101.4) Sending/queueing gratuitous ARPs on cluster for 192.168.101.4
Thu Dec 10 23:53:29 2020: Sending gratuitous ARP on management for 10.199.193.234
Thu Dec 10 23:53:29 2020: (vip_10.199.193.234) Sending/queueing gratuitous ARPs on management for 10.199.193.234
Thu Dec 10 23:54:29 2020: Sending gratuitous ARP on enterprise for 14.1.1.234
Thu Dec 10 23:54:29 2020: (vip_14.1.1.234) Sending/queueing gratuitous ARPs on enterprise for 14.1.1.234
Thu Dec 10 23:54:29 2020: Sending gratuitous ARP on cluster for 192.168.101.4
Thu Dec 10 23:54:29 2020: (vip_192.168.101.4) Sending/queueing gratuitous ARPs on cluster for 192.168.101.4
Thu Dec 10 23:54:29 2020: Sending gratuitous ARP on management for 10.199.193.234
Thu Dec 10 23:54:29 2020: (vip_10.199.193.234) Sending/queueing gratuitous ARPs on management for 10.199.193.234
Thu Dec 10 23:55:29 2020: Sending gratuitous ARP on enterprise for 14.1.1.234
Thu Dec 10 23:55:29 2020: (vip_14.1.1.234) Sending/queueing gratuitous ARPs on enterprise for 14.1.1.234
Thu Dec 10 23:55:29 2020: Sending gratuitous ARP on cluster for 192.168.101.4
Thu Dec 10 23:55:29 2020: (vip_192.168.101.4) Sending/queueing gratuitous ARPs on cluster for 192.168.101.4
Thu Dec 10 23:55:29 2020: Sending gratuitous ARP on management for 10.199.193.234
Thu Dec 10 23:55:29 2020: (vip_10.199.193.234) Sending/queueing gratuitous ARPs on management for 10.199.193.234
Thu Dec 10 23:56:29 2020: Sending gratuitous ARP on enterprise for 14.1.1.234
Thu Dec 10 23:56:29 2020: (vip_14.1.1.234) Sending/queueing gratuitous ARPs on enterprise for 14.1.1.234
Thu Dec 10 23:56:29 2020: Sending gratuitous ARP on cluster for 192.168.101.4
Thu Dec 10 23:56:29 2020: (vip_192.168.101.4) Sending/queueing gratuitous ARPs on cluster for 192.168.101.4
Thu Dec 10 23:56:29 2020: Sending gratuitous ARP on management for 10.199.193.234
Thu Dec 10 23:56:29 2020: (vip_10.199.193.234) Sending/queueing gratuitous ARPs on management for 10.199.193.234
Thu Dec 10 23:57:29 2020: Sending gratuitous ARP on enterprise for 14.1.1.234
Thu Dec 10 23:57:29 2020: (vip_14.1.1.234) Sending/queueing gratuitous ARPs on enterprise for 14.1.1.234
Thu Dec 10 23:57:29 2020: Sending gratuitous ARP on cluster for 192.168.101.4
Thu Dec 10 23:57:29 2020: (vip_192.168.101.4) Sending/queueing gratuitous ARPs on cluster for 192.168.101.4
Thu Dec 10 23:57:29 2020: Sending gratuitous ARP on management for 10.199.193.234
Thu Dec 10 23:57:29 2020: (vip_10.199.193.234) Sending/queueing gratuitous ARPs on management for 10.199.193.234
Thu Dec 10 23:58:29 2020: Sending gratuitous ARP on enterprise for 14.1.1.234
Thu Dec 10 23:58:29 2020: (vip_14.1.1.234) Sending/queueing gratuitous ARPs on enterprise for 14.1.1.234
Thu Dec 10 23:58:29 2020: Sending gratuitous ARP on cluster for 192.168.101.4
Thu Dec 10 23:58:29 2020: (vip_192.168.101.4) Sending/queueing gratuitous ARPs on cluster for 192.168.101.4
Thu Dec 10 23:58:29 2020: Sending gratuitous ARP on management for 10.199.193.234
Thu Dec 10 23:58:29 2020: (vip_10.199.193.234) Sending/queueing gratuitous ARPs on management for 10.199.193.234
Thu Dec 10 23:59:29 2020: Sending gratuitous ARP on enterprise for 14.1.1.234
Thu Dec 10 23:59:29 2020: (vip_14.1.1.234) Sending/queueing gratuitous ARPs on enterprise for 14.1.1.234
Thu Dec 10 23:59:29 2020: Sending gratuitous ARP on cluster for 192.168.101.4
Thu Dec 10 23:59:29 2020: (vip_192.168.101.4) Sending/queueing gratuitous ARPs on cluster for 192.168.101.4
Thu Dec 10 23:59:29 2020: Sending gratuitous ARP on management for 10.199.193.234
Thu Dec 10 23:59:29 2020: (vip_10.199.193.234) Sending/queueing gratuitous ARPs on management for 10.199.193.234
Fri Dec 11 00:00:10 2020: Interface cali83f17552d42 added
Fri Dec 11 00:00:29 2020: Sending gratuitous ARP on enterprise for 14.1.1.234
Fri Dec 11 00:00:29 2020: (vip_14.1.1.234) Sending/queueing gratuitous ARPs on enterprise for 14.1.1.234
Fri Dec 11 00:00:29 2020: Sending gratuitous ARP on cluster for 192.168.101.4
Fri Dec 11 00:00:29 2020: (vip_192.168.101.4) Sending/queueing gratuitous ARPs on cluster for 192.168.101.4
Fri Dec 11 00:00:29 2020: Sending gratuitous ARP on management for 10.199.193.234
Fri Dec 11 00:00:29 2020: (vip_10.199.193.234) Sending/queueing gratuitous ARPs on management for 10.199.193.234

@rajivginotra
Author

@pqarmitage : Can you please take a look at this issue? Appreciate any help in this regard.

@pqarmitage
Collaborator

@rajivginotra The first thing you need to do is to set the priorities of the VRRP instances appropriately. With your current configuration which vrrp instance is master is sometimes determined by which system has the higher IP address on the interface being used. So although the VRRP protocol supports the same priority being used by the different nodes with the same vrrp instance, when the MASTER instance stops being master, the other two nodes will both try to become master at the same time. After that the one with the lower priority will back off and revert to backup; this will all cause some flapping. You can see this happening in the logs at 23:49:29 when vip_10.199.193.234 becomes master simultaneously on all three systems, and then the ones with IP addresses 10.199.193.231 and 232 drop back to backup when the advert is received from 10.199.193.233. If the priorities are different, then the next higher priority vrrp instance will take over as master cleanly. If you don't want one vrrp instance to take over as MASTER when another one is already MASTER, then use the nopreempt configuration option. This won't solve your problem, but it will make keepalived operate more cleanly.
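To illustrate the point about equal priorities, here is a minimal Python sketch, not keepalived code. It assumes the VRRPv2 timer formula from RFC 3768 (Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time, with Skew_Time = (256 - Priority) / 256) and a 1 second advert interval. The priority values 100, 120 and 150 and the helper names are hypothetical; only the three IP addresses come from the logs.

```python
from ipaddress import IPv4Address


def master_down_interval(priority, advert_int=1.0):
    """Seconds a backup waits without adverts before becoming master (VRRPv2)."""
    skew_time = (256 - priority) / 256
    return 3 * advert_int + skew_time


def winner(nodes):
    """Highest priority wins; equal priorities are resolved by the higher IP address."""
    return max(nodes, key=lambda n: (n["priority"], IPv4Address(n["ip"])))


# Hypothetical priorities: the same value everywhere versus distinct values.
equal = [
    {"ip": "10.199.193.231", "priority": 100},
    {"ip": "10.199.193.232", "priority": 100},
    {"ip": "10.199.193.233", "priority": 100},
]
distinct = [
    {"ip": "10.199.193.231", "priority": 150},
    {"ip": "10.199.193.232", "priority": 120},
    {"ip": "10.199.193.233", "priority": 100},
]

for label, nodes in (("equal", equal), ("distinct", distinct)):
    timers = {n["ip"]: master_down_interval(n["priority"]) for n in nodes}
    print(label, "timers:", timers, "winner:", winner(nodes)["ip"])
```

With equal priorities every backup's timer expires at the same instant, so they all claim master and the IP address comparison has to sort it out afterwards. With distinct priorities the timers are staggered, so the highest priority backup takes over first.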

Somehow keepalived is starting up at exactly the same time (give or take a few milliseconds) on all three nodes. An understanding of the environment this is all running in might be helpful.

Now I am guessing here, but the PIDs of the keepalived processes are 36, 31 and 31, which makes me think that keepalived is starting at system boot time (or is it being run in containers, since the PIDs are so low?). We quite often see problems when that happens due to the network taking time to settle down and pass traffic reliably, and that is what appears to be happening here.

The reason that all three instances of vip_14.1.1.234 are in MASTER state is that none of them is seeing adverts from the other two nodes. At 23:49:53 traffic starts being received on the 14.1.1.0/24 network, and so the two nodes with lower IP addresses revert to backup. Using tcpdump or wireshark might help see what is happening.
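The behaviour described above can be sketched as a toy model in Python. This is an illustration under stated assumptions, not how keepalived is implemented: the node addresses on 14.1.1.0/24 and the priority of 100 are hypothetical, and `step` is a made-up helper that models one round of adverts.

```python
from ipaddress import IPv4Address


def rank(node):
    """Ordering used to compare adverts: priority first, then IP address."""
    return (node["priority"], IPv4Address(node["ip"]))


def step(nodes, adverts_delivered):
    """One advert round. A MASTER that hears a higher-ranked MASTER reverts to BACKUP."""
    masters = [n for n in nodes if n["state"] == "MASTER"]
    if not adverts_delivered:
        return  # nobody hears anybody, so nobody backs off
    for node in masters:
        if any(rank(other) > rank(node) for other in masters if other is not node):
            node["state"] = "BACKUP"


# Hypothetical addresses; all three nodes have already timed out and claimed MASTER.
nodes = [
    {"ip": "14.1.1.231", "priority": 100, "state": "MASTER"},
    {"ip": "14.1.1.232", "priority": 100, "state": "MASTER"},
    {"ip": "14.1.1.233", "priority": 100, "state": "MASTER"},
]

step(nodes, adverts_delivered=False)  # network not passing VRRP traffic yet
print([n["state"] for n in nodes])

step(nodes, adverts_delivered=True)   # traffic starts flowing
print([n["state"] for n in nodes])
```

While adverts are dropped, all three nodes stay in MASTER state indefinitely. As soon as one round of adverts is delivered, only the node with the highest address remains master.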

@rajivginotra
Author

> @rajivginotra The first thing you need to do is to set the priorities of the VRRP instances appropriately. With your current configuration which vrrp instance is master is sometimes determined by which system has the higher IP address on the interface being used. So although the VRRP protocol supports the same priority being used by the different nodes with the same vrrp instance, when the MASTER instance stops being master, the other two nodes will both try to become master at the same time. After that the one with the lower priority will back off and revert to backup; this will all cause some flapping. You can see this happening in the logs at 23:49:29 when vip_10.199.193.234 becomes master simultaneously on all three systems, and then the ones with IP addresses 10.199.193.231 and 232 drop back to backup when the advert is received from 10.199.193.233. If the priorities are different, then the next higher priority vrrp instance will take over as master cleanly. If you don't want one vrrp instance to take over as MASTER when another one is already MASTER, then use the nopreempt configuration option. This won't solve your problem, but it will make keepalived operate more cleanly.

RG> Thanks, @pqarmitage, for the prompt response as always. One interesting finding I should share: with the v2.0.18 image we did not see this issue; we recently migrated to v2.0.20. Also, as you can see in the keepalived.conf file above, we are already using the 'nopreempt' option, but yes, we can explore the 'priority' option.

> Somehow keepalived is starting up at exactly the same time (give or take a few milliseconds) on all three nodes. An understanding of the environment this is all running in might be helpful.

RG> We are running a K8s cluster with 3 nodes, and each node has 3 interfaces named management, enterprise, and cluster. Each of the 3 interfaces has a VIP that Keepalived manages. We run the Keepalived pods as a daemon set, so each of the 3 nodes runs one instance of the keepalived daemon inside a container.

RG> But I do suspect one case where Keepalived is restarting at the same time; I will check on this and update you.

> Now I am guessing here, but the PIDs of the keepalived processes are 36, 31 and 31, which makes me think that keepalived is starting at system boot time (or is it being run in containers, since the PIDs are so low?). We quite often see problems when that happens due to the network taking time to settle down and pass traffic reliably, and that is what appears to be happening here.

RG> Already mentioned in the last point

> The reason that all three instances of vip_14.1.1.234 are in MASTER state is that none of them is seeing adverts from the other two nodes. At 23:49:53 traffic starts being received on the 14.1.1.0/24 network, and so the two nodes with lower IP addresses revert to backup. Using tcpdump or wireshark might help see what is happening.

RG> Sure, we will try to capture network dumps the next time we see this issue.

@pqarmitage pqarmitage added the Awaiting feedback Awaiting feedback from the originator of the issue label Dec 15, 2020
@pqarmitage
Collaborator

I am closing this issue now since there has been no update for over 1 month. If the problem recurs and you can post your network dumps, then we can reopen this issue if necessary.
