RFC5549 ipv6 next hop reachability with dynamic neighbors broken in 10.1 #16572

Closed
CS-BryanP opened this issue Aug 13, 2024 · 13 comments

@CS-BryanP

### Description

All IP addresses are in RFC1918 space or are IPv6 link-local addresses; there is no sensitive data.
I have an Arista switch doing BGP peering with a VM running FRR over IPv6 link-local neighbors. Prior to FRR 10.1, the IPv4 address of the loopback would be learned by the Arista switch and advertised out to the rest of the network.

The Arista would see it like this:
sh ip bgp 10.162.43.183/32
BGP routing table information for VRF default
Router identifier 10.160.0.7, local AS number 4260167780
BGP routing table entry for 10.162.43.183/32
Paths: 1 available
65333
fe80::250:56ff:febf:a800%Vl2222 from fe80::250:56ff:febf:a800%Vl2222 (10.162.43.183)
Origin INCOMPLETE, metric 0, localpref 100, IGP metric 1, weight 0, tag 0
Received 00:13:55 ago, valid, external, best
Rx SAFI: Unicast

An excerpt of the FRR `show ip bgp neighbors` output:

External BGP neighbor may be up to 1 hops away.
Local host: fe80::250:56ff:febf:a800, Local port: 179
Foreign host: fe80::febd:67ff:fe30:71c7, Foreign port: 43351
Nexthop: 10.162.43.183
Nexthop global: fe80::250:56ff:febf:a800 <--- Global address doesn't exist, so it is assigned to the link-local
Nexthop local: fe80::250:56ff:febf:a800

This ran fine and the image would be upgraded sequentially to the latest release with no problems.

Once FRR 10.1 was installed, peering still establishes, but the next hop is no longer a (valid) link-local address. It is an invalid global address.

The Arista switch sees this:
sh ip bgp 10.162.43.182/32
BGP routing table information for VRF default
Router identifier 10.160.0.8, local AS number 4260167780
BGP routing table entry for 10.162.43.182/32
Paths: 1 available
65333
::ffff:10.162.43.182 from fe80::250:56ff:febf:c5a2%Vl2222 (10.162.43.182)
Origin INCOMPLETE, metric 0, localpref 100, IGP metric -, weight 0, tag 0
Received 7d16h ago, invalid, external
Rx SAFI: Unicast

The FRR neighbor output shows this:

sh bgp neighbor

External BGP neighbor may be up to 1 hops away.
Local host: fe80::250:56ff:febf:c5a2, Local port: 36972
Foreign host: fe80::febd:67ff:fe30:4ca5, Foreign port: 179
Nexthop: 10.162.43.182
Nexthop global: ::ffff:aa2:2bb6 <--- This is an invalid next hop: the IPv4 next hop (10.162.43.182) converted to hex as an IPv4-mapped IPv6 address.
Nexthop local: fe80::250:56ff:febf:c5a2
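
The relationship between `::ffff:aa2:2bb6` and `10.162.43.182` can be checked with a short sketch. It uses only Python's standard `ipaddress` module and illustrates the address arithmetic; it makes no claim about how FRR builds the next hop internally.

```python
import ipaddress

# "Nexthop global" value reported by FRR 10.1 in the output above.
bad_next_hop = ipaddress.IPv6Address("::ffff:aa2:2bb6")

# ipv4_mapped returns the embedded IPv4 address for ::ffff:a.b.c.d
# addresses, or None for any other IPv6 address.
embedded = bad_next_hop.ipv4_mapped
print(embedded)  # 10.162.43.182

# Going the other way: build the IPv4-mapped form from the IPv4 next hop.
ipv4_next_hop = ipaddress.IPv4Address("10.162.43.182")
mapped = ipaddress.IPv6Address(f"::ffff:{ipv4_next_hop}")
print(mapped == bad_next_hop)  # True
```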

### Version

infra-proxy02# sh ver
FRRouting 10.1 (infra-proxy02) on Linux(5.15.0-88-generic).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
configured with:
    '--build=x86_64-linux-gnu' '--prefix=/usr' '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--disable-silent-rules' '--libdir=${prefix}/lib/x86_64-linux-gnu' '--libexecdir=${prefix}/lib/x86_64-linux-gnu' '--disable-maintainer-mode' '--sbindir=/usr/lib/frr' '--with-vtysh-pager=/usr/bin/pager' '--libdir=/usr/lib/x86_64-linux-gnu/frr' '--with-moduledir=/usr/lib/x86_64-linux-gnu/frr/modules' '--disable-dependency-tracking' '--enable-rpki' '--disable-scripting' '--enable-pim6d' '--with-libpam' '--enable-doc' '--enable-doc-html' '--enable-snmp' '--enable-fpm' '--disable-protobuf' '--disable-zeromq' '--enable-ospfapi' '--enable-bgp-vnc' '--enable-multipath=256' '--enable-user=frr' '--enable-group=frr' '--enable-vty-group=frrvty' '--enable-configfile-mask=0640' '--enable-logfile-mask=0640' 'build_alias=x86_64-linux-gnu' 'PYTHON=python3'

### How to reproduce

Attempt to establish RFC5549 dynamic peering with a non-FRR peer while running FRR version 10.1.

Here is the FRR config I'm using; I've used it to peer with both Arista and Cisco.

infra-proxy02# sh run
Building configuration...

Current configuration:
!
frr version 10.1
frr defaults traditional
hostname infra-proxy02
log file /var/log/frr/frr.log
log stdout
log syslog
no ip forwarding
no ipv6 forwarding
bgp send-extra-data zebra
service integrated-vtysh-config
!
ip prefix-list allow-in seq 10 permit 0.0.0.0/0
ip prefix-list allow-in seq 100 deny 0.0.0.0/32 le 32
ip prefix-list host-routes-out seq 10 permit 10.160.0.0/14 ge 32
ip prefix-list host-routes-out seq 20 permit 172.18.128.0/32 ge 32
ip prefix-list host-routes-out seq 100 deny 0.0.0.0/32 le 32
!
debug bgp neighbor-events
!
router bgp 65333
 bgp log-neighbor-changes
 no bgp ebgp-requires-policy
 no bgp network import-check
 neighbor torswitch peer-group
 neighbor torswitch remote-as 4200000005
 neighbor torswitch description Internal Fabric Network
 neighbor torswitch bfd
 neighbor torswitch bfd profile 1sec
 neighbor torswitch capability dynamic
 neighbor torswitch capability extended-nexthop
 neighbor ens192 interface peer-group torswitch
 neighbor ens224 interface peer-group torswitch
 bgp fast-convergence
 !
 address-family ipv4 unicast
  redistribute connected route-map Loopback
  neighbor torswitch prefix-list allow-in in
  neighbor torswitch prefix-list host-routes-out out
 exit-address-family
 !
 address-family ipv6 unicast
  neighbor torswitch activate
 exit-address-family
exit
!
route-map Loopback permit 10
 match ip address prefix-len 32
exit
!
route-map Loopback deny 100
exit
!
bfd
 profile 1sec
  transmit-interval 1000
  receive-interval 1000
 exit
 !
exit
!
end

Here is the Arista configuration, which hasn't been changed.

service routing protocols model multi-agent

vlan 1222
   name BGP-VMS
   
interface Vlan1222
   mtu 9000
   ipv6 enable
   ipv6 nd ra interval msec 4000 3000
   ipv6 nd router-preference high

ip routing ipv6 interfaces

router bgp 4260167780
   router-id 10.160.0.8
   bgp default ipv4-unicast transport ipv6
   bgp default ipv6-unicast
   timers bgp 20 60
   graceful-restart restart-time 120
   maximum-paths 8
   neighbor BGP-VMS peer group
   neighbor BGP-VMS local-as 4200000005 no-prepend replace-as
   neighbor BGP-VMS bfd
   neighbor BGP-VMS route-map server-loopbacks in
   neighbor BGP-VMS route-map default-route out
   neighbor BGP-VMS maximum-routes 3 warning-limit 67 percent warning-only
   redistribute connected
   neighbor interface Vl1222,2222 peer-group BGP-VMS remote-as 65333
   !
   address-family ipv4
      bgp next-hop address-family ipv6

The server interfaces ens192 and ens224 do not have configured IPv4 addresses, nor any routable IPv6 addresses.

### Expected behavior

I expect the pre-10.1 behavior, where the route learned from the FRR BGP peer has a valid link-local next hop instead of an invalid global next hop.

### Actual behavior

The route is learned on the Arista, but because of the invalid next hop it cannot be installed into the routing table.

### Additional context

I have a packet capture of the BGP OPEN and UPDATE taken from the server running FRR, and I see the invalid next hop being set in packet #185.

[bgp_establish.pcap.zip](https://github.com/user-attachments/files/16593034/bgp_establish.pcap.zip)
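
To make the on-the-wire difference concrete, here is a minimal sketch of splitting a 32-byte IPv6 next hop field, laid out as a global address followed by a link-local address (the RFC 2545 layout that RFC 5549 reuses for IPv4 prefixes). The byte string is rebuilt from the addresses quoted in this report and is an assumption, not data extracted from the pcap; `split_next_hop` is a hypothetical helper name.

```python
import ipaddress

def split_next_hop(raw: bytes):
    """Split an MP_REACH_NLRI IPv6 next hop field into its addresses.

    16 bytes: a single address. 32 bytes: global address followed by
    link-local address.
    """
    if len(raw) not in (16, 32):
        raise ValueError(f"unexpected next hop length: {len(raw)}")
    return [ipaddress.IPv6Address(raw[i:i + 16]) for i in range(0, len(raw), 16)]

link_local = ipaddress.IPv6Address("fe80::250:56ff:febf:c5a2")

# What the "Nexthop global"/"Nexthop local" values shown for 10.1 would
# look like as one 32-byte field (assumed layout, not taken from the pcap).
frr_10_1 = ipaddress.IPv6Address("::ffff:10.162.43.182").packed + link_local.packed
global_part, local_part = split_next_hop(frr_10_1)
print(global_part.ipv4_mapped)   # 10.162.43.182
print(local_part.is_link_local)  # True
```

This is consistent with the switch reporting the first (global) address as the next hop, while the link-local address that would actually be reachable sits in the second half.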


### Checklist

- [X] I have searched the open issues for this bug.
- [X] I have not included sensitive information in this report.

CS-BryanP added the triage label ("Needs further investigation") on Aug 13, 2024.

@ne-vlezay80
Contributor

Use next-hop-self on the peer router.

@toreanderson
Contributor

Seeing the same thing here. 9.1.1-01.el7 from https://rpm.frrouting.org/ works, while 10.1-01.el7 is broken. This breaks BGP Unnumbered interop with Cumulus Linux at least.

My config is as follows:

!
frr version 9.1.1
frr defaults traditional
hostname node31-h23-osl3
!
interface lo
 ip address 100.83.23.31/32
exit
!
router bgp 4283023031
 bgp router-id 100.83.23.31
 no bgp default ipv4-unicast
 neighbor leafs peer-group
 neighbor leafs remote-as external
 neighbor eth1 interface peer-group leafs
 !
 address-family ipv4 unicast
  network 100.83.23.31/32
  neighbor leafs activate
  neighbor leafs soft-reconfiguration inbound
  neighbor leafs prefix-list default in
  neighbor leafs prefix-list router-id out
 exit-address-family
 !
 address-family l2vpn evpn
  neighbor leafs activate
  neighbor leafs soft-reconfiguration inbound
 exit-address-family
exit
!
ip prefix-list default seq 5 permit 0.0.0.0/0
ip prefix-list router-id seq 5 permit 100.83.23.31/32
!
end

This results in the following working route seen on the Cumulus switch:

leaf2# show ip bgp 100.83.23.31/32 
…
  4283023031
    fe80::a6bf:1ff:fe2d:689a from node31-h23-osl3(swp32) (100.83.23.31)
    (fe80::a6bf:1ff:fe2d:689a) (used)
      Origin IGP, metric 0, valid, external, bestpath-from-AS 4283023031, best (First path received)
      Last update: Tue Aug 13 08:27:12 2024

If I upgrade to frr-10, then this changes as follows:

leaf2# show ip bgp 100.83.23.31/32 
…
  4283023031
    ::ffff:6453:171f (inaccessible) from node31-h23-osl3(swp32) (100.83.23.31)
    (fe80::a6bf:1ff:fe2d:689a) (used)
      Origin IGP, metric 0, invalid, external
      Last update: Tue Aug 13 08:36:46 2024
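
The two next hops in the outputs above can be told apart mechanically. The following is a minimal sketch using Python's standard `ipaddress` module; `classify_next_hop` is a hypothetical helper written for illustration, not part of FRR or Cumulus.

```python
import ipaddress

def classify_next_hop(text: str) -> str:
    """Label an IPv6 BGP next hop as it appears in `show ip bgp` output."""
    # Drop an interface/zone suffix such as "%Vl2222" before parsing.
    addr = ipaddress.IPv6Address(text.split("%", 1)[0])
    # Check the IPv4-mapped case first so the result does not depend on how
    # a given Python version treats mapped addresses in the is_* properties.
    if addr.ipv4_mapped is not None:
        return f"ipv4-mapped ({addr.ipv4_mapped})"
    if addr.is_link_local:
        return "link-local"
    return "other"

for hop in ("fe80::a6bf:1ff:fe2d:689a", "::ffff:6453:171f"):
    print(hop, "->", classify_next_hop(hop))
```

The second address decodes to 100.83.23.31, the loopback and router ID from the config above.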

Wondering if this could be the same or related to #15610 somehow.

@toreanderson
Contributor

FRR 10.0.1-01 works fine. So this bug must have been introduced in the 10.1 minor release.

@ton31337
Member

@louis-6wind isn't this related to 0325116?

@ton31337
Member

@toreanderson would you be able to test a custom rpm/deb, especially with this fix: #16439? You can take the rpm/deb from the artifacts here: https://ci1.netdef.org/browse/FRR-PULLREQ3-4302/artifact.

@toreanderson
Contributor

@ton31337 I tried frr-10.2_dev_20240725_git.368abf0-01.el7.x86_64.rpm; it does not solve the problem. The upstream devices still see the advertised route with a ::ffff:6453:171f (inaccessible) next hop.

@ton31337
Member

Regarding the mapped IPv4 address: could you check this one as well? https://ci1.netdef.org/browse/FRR-PULLREQ3-4501/artifact (the packages are still building and will appear there once they are done).

@toreanderson
Contributor

I can confirm that frr-10.2_dev_20240814_git.d506417-01.el7.x86_64.rpm fixes the problem! 👏

@Cellebyte

@ton31337 is it planned to backport a bugfix into 10.1 stable release?

@ton31337
Member

> @ton31337 is it planned to backport a bugfix into 10.1 stable release?

Actually, we briefly discussed this and concluded that it's better to back the IPv4-mapped IPv6 change out of 10.1 entirely, since it's not clear what real issue it was solving. The revert PR is here: #16587.

@ton31337
Member

@toreanderson could you test from these artifacts? https://ci1.netdef.org/browse/FRR-PULLREQ3-4527/artifact

@toreanderson
Contributor

> @toreanderson could you test from these artifacts? https://ci1.netdef.org/browse/FRR-PULLREQ3-4527/artifact

LGTM 👍

@ton31337
Member

ton31337 commented Sep 4, 2024

This has already been backed out of 10.1, but we are still waiting for the 10.1.1 release. Hence closing.
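
For anyone auditing a fleet, the version findings in this thread can be turned into a quick check. This is a rough sketch under stated assumptions: it flags only the exact 10.1 release (9.1.1 and 10.0.1 were reported working, and the revert is expected in 10.1.1), and `frr_version` and `AFFECTED` are hypothetical names, not anything shipped with FRR.

```python
import re

# Results reported in this thread: 9.1.1 and 10.0.1 work, 10.1 is broken.
# The revert is expected in 10.1.1, which was unreleased at the time of the
# last comment, so only the exact 10.1 release is flagged here.
AFFECTED = {(10, 1)}

def frr_version(show_version_output: str):
    """Extract the version tuple from the first line of `show version`."""
    match = re.search(r"FRRouting (\d+(?:\.\d+)*)", show_version_output)
    if match is None:
        raise ValueError("no FRRouting version found")
    return tuple(int(part) for part in match.group(1).split("."))

banner = "FRRouting 10.1 (infra-proxy02) on Linux(5.15.0-88-generic)."
print(frr_version(banner) in AFFECTED)  # True
```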
