Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BFD Breaks FRR Reload #6511

Closed
mrpaulmanley opened this issue Jun 3, 2020 · 7 comments
Closed

BFD Breaks FRR Reload #6511

mrpaulmanley opened this issue Jun 3, 2020 · 7 comments

Comments

@mrpaulmanley
Copy link

I have a problem with BFD breaking frr-reload. It's either a bug with how frr-reload parses the BFD node in a saved config, or I just misunderstand how BFD works with integrated config.
I'm running vanilla install of FRR 7.3.1 on 2 instances of Debian 10 that are directly connected. I enabled bgpd and bfdd in the daemons file. If I load the following configs, BFD appears to work by dynamically creating the peers (no BFD node is created in the running-config or frr.conf.) using default timers etc.

router1

frr defaults traditional
hostname debian10-1
log syslog informational
no ipv6 forwarding
service integrated-vtysh-config
!
router bgp 65536
 bgp router-id 169.254.1.0
 bgp bestpath as-path multipath-relax
 neighbor 169.254.1.1 remote-as 65536
 neighbor 169.254.1.1 bfd
!
 address-family ipv4 unicast
 exit-address-family
!
line vty

router2

frr defaults traditional
hostname debian10-2
log syslog informational
no ipv6 forwarding
service integrated-vtysh-config
!
router bgp 65536
 bgp router-id 169.254.1.1
 bgp bestpath as-path multipath-relax
 neighbor 169.254.1.0 remote-as 65536
 neighbor 169.254.1.0 bfd
!
 address-family ipv4 unicast
 exit-address-family
!
line vty

But if I add any config for the peer under the BFD node, FRR can't even reload it's own config, saved by issuing a write mem. Say I change the detect multiplier and the issue a write mem. Here is the resulting file written by frr. Notice the section at the end that declares BFD peer with config.

router1 with bfd peer declared

frr version 7.3
frr defaults traditional
hostname debian10-1
log syslog informational
no ip forwarding
no ipv6 forwarding
service integrated-vtysh-config
!
router bgp 65536
 bgp router-id 169.254.1.0
 bgp bestpath as-path multipath-relax
 neighbor 169.254.1.1 remote-as 65536
 neighbor 169.254.1.1 bfd
!
line vty
!
bfd
 peer 169.254.1.1
  detect-multiplier 4
 !
!

Here is the resulting log from a subsequent frr-reload using this config that FRR wrote.

2020-05-13 17:03:50,246  INFO: Called via "Namespace(bindir='/usr/bin', confdir='/etc/frr', daemon='', debug=False, filename='/etc/frr/frr.conf', input=None, overwrite=False, reload=True, rundir='/var/run/frr', stdout=False, test=False)"
2020-05-13 17:03:50,246  INFO: Loading Config object from file /etc/frr/frr.conf
2020-05-13 17:03:50,378  INFO: Loading Config object from vtysh show running
2020-05-13 17:03:50,530  INFO: /var/run/frr/reload-BP4H29.txt content
['\nbfd', '\nbfd\n peer 169.254.1.1', '\nbfd\n detect-multiplier 4']
2020-05-13 17:03:50,673 WARNING: frr-reload.py failed due to
b''
2020-05-13 17:03:50,674  INFO: Loading Config object from vtysh show running
2020-05-13 17:03:50,818  INFO: /var/run/frr/reload-30ZKIA.txt content
['\nbfd\n detect-multiplier 4',
 '\nbfd',
 '\nbfd\n peer 169.254.1.1',
 '\nbfd\n detect-multiplier 4']
2020-05-13 17:03:50,955 WARNING: frr-reload.py failed due to
b''

This results in frr-reload failing. I'm thinking this is a bug with parsing the bfd node in a saved config. vtysh wrote that section and then fails to subsequently reload it.

@rzalamena
Copy link
Member

I can confirm the issue... I've hit the exact same problem when testing the FRR gentoo ebuilds recently. Looks like a problem in frr-reload.py and I didn't do any further investigation.

NOTE: the issue only happened to me in 7.3.1, it works on FRR 7.2.1.

@mrpaulmanley
Copy link
Author

I can confirm the issue... I've hit the exact same problem when testing the FRR gentoo ebuilds recently. Looks like a problem in frr-reload.py and I didn't do any further investigation.

NOTE: the issue only happened to me in 7.3.1, it works on FRR 7.2.1.

I just installed FRR 7.2.1 on ubuntu 20.04 and it has the same problem. I haven't found a version where this works for me.

@mrpaulmanley
Copy link
Author

mrpaulmanley commented Jul 7, 2020

@rzalamena I think the first problem is in frr-reload.py as you stated, but I think it goes deeper into vtysh not understanding the bfd config structure when parsing with the -m option.

Problem 1 appears to be that frr-reload.py is not treating bfd peers as a subcontext of bfd.

['\nbfd\n detect-multiplier 4',
 '\nbfd',
 '\nbfd\n peer 169.254.1.1',
 '\nbfd\n detect-multiplier 4']

the last line should be

 '\nbfd\n peer 169.254.1.1\n detect-multiplier 4'

Adding "peer" to this conditional fixes that, but I'm not sure if that's the correct fix
https://github.com/FRRouting/frr/blob/frr-7.3.1/tools/frr-reload.py#L495

            elif (line.startswith("address-family ") or
                  line.startswith("vnc defaults") or
                  line.startswith("vnc l2-group") or
                  line.startswith("vnc nve-group") or
                  line.startswith("peer") or
                  line.startswith("member pseudowire")):

Problem 2 seems to be in vtysh as it doesn't add any of the context ending/exit markers for bfd.
See what happens when I run an frr config through vtysh -m with bfd peers included.

bfd
 peer 169.254.1.1
  detect-multiplier 4
 !
 peer 169.254.1.3
  detect-multiplier 4
 !
!
sudo vtysh -m -f /etc/frr/frr.conf
...
line vty
!
end
bfd
 peer 169.254.1.1
  detect-multiplier 4
 !
end
 peer 169.254.1.3
  detect-multiplier 4
 !
!

end

vtysh -m is ending after the first bfd peer when it should be exiting the peer only.

@mrpaulmanley
Copy link
Author

I think I've found where to modify vtysh.c as well. This bit of logic attempts to apply config lines in the order they are written in the config file. If a command produces an error, then the command is retried one level higher in the config hierarchy. This logic is completely unaware of the bfd peer config node.

		 * If command doesn't succeeded in current node, try to walk up
		 * in node tree. Changing vty->node is enough to try it just
		 * out without actual walkup in the vtysh.
		 */
		while (ret != CMD_SUCCESS && ret != CMD_SUCCESS_DAEMON
		       && ret != CMD_WARNING && ret != CMD_WARNING_CONFIG_FAILED
		       && vty->node > CONFIG_NODE) {
			vty->node = node_parent(vty->node);
			ret = cmd_execute_command_strict(vline, vty, &cmd);
			tried++;
		}

		/*
		 * If command succeeded in any other node than current (tried >
		 * 0) we have to move into node in the vtysh where it
		 * succeeded.
		 */
		if (ret == CMD_SUCCESS || ret == CMD_SUCCESS_DAEMON
		    || ret == CMD_WARNING) {
			if ((prev_node == BGP_VPNV4_NODE
			     || prev_node == BGP_VPNV6_NODE
			     || prev_node == BGP_IPV4_NODE
			     || prev_node == BGP_IPV6_NODE
			     || prev_node == BGP_FLOWSPECV4_NODE
			     || prev_node == BGP_FLOWSPECV6_NODE
			     || prev_node == BGP_IPV4L_NODE
			     || prev_node == BGP_IPV6L_NODE
			     || prev_node == BGP_IPV4M_NODE
			     || prev_node == BGP_IPV6M_NODE
			     || prev_node == BGP_EVPN_NODE)
			    && (tried == 1)) {
				vty_out(vty, "exit-address-family\n");
			} else if ((prev_node == BGP_EVPN_VNI_NODE)
				   && (tried == 1)) {
				vty_out(vty, "exit-vni\n");
			} else if ((prev_node == KEYCHAIN_KEY_NODE)
				   && (tried == 1)) {
				vty_out(vty, "exit\n");
			} else if (tried) {
				vty_out(vty, "end\n");
			}
		}

I'm going to test by adding

			} else if ((prev_node == BFD_PEER_NODE)
				   && (tried == 1)) {
				vty_out(vty, "exit\n");

@rzalamena
Copy link
Member

Reminder: I think it might be worth backporting this, at least to stable/7.4, but I don't know if there are conflicts.

@mrpaulmanley
Copy link
Author

Reminder: I think it might be worth backporting this, at least to stable/7.4, but I don't know if there are conflicts.

Is there anything I can do to assist backporting? A PR or testing?

rzalamena pushed a commit to opensourcerouting/frr that referenced this issue Jul 14, 2020
add lines starting with 'peer' to the list of sub-contexts that are handled by frr-reload.py.

FRRouting#6511 (comment)

Signed-off-by: Paul Manley <paul.manley@wholefoods.com>
(cherry picked from commit 1c23a0a)
rzalamena pushed a commit to opensourcerouting/frr that referenced this issue Jul 14, 2020
vtysh needs to be aware of how to properly exit a bfd peer when subsequent commands only succeed in a higher context.

FRRouting#6511 (comment)

Signed-off-by: Paul Manley <paul.manley@wholefoods.com>
(cherry picked from commit b727c12)
@rzalamena
Copy link
Member

@mrpaulmanley first of all thank you for your contribution.

I opened a PR against the 7.4 stable branch with your changes so we can have it on the next release (probably 7.4.1). If you want to you can test it and give your feedback on the PR.

PR: #6737 .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants