zebra: avoid redundant NHG kernel install for singleton-equivalent groups #20023
raja-rajasekar
commented
Nov 12, 2025
…oups
Duplicate nexthop (e.g., EVPN routes)
Problem:
When zebra receives duplicate nexthops (e.g., two paths resolving
to the same 192.168.1.1 for EVPN routes), it can end up installing a
singleton NHG pointing to another singleton NHG.
For example: ip nexthop group (bgp_evpn_rt5-test_evpn_multipath)
id 34 via 192.168.1.1 dev bridge-101 scope link proto zebra onlink
id 44 group 34 proto zebra
id 45 group 33 proto zebra
Zebra treats these as follows:
- NHG 34 is a singleton NHG
- NHG 44 is a singleton NHG which has a duplicate nexthop and depends
on NHG 34.
However, the kernel and lower layers don't care about the duplicate nexthops,
and this redundant NHG installation can cause resource exhaustion at scale.
Fix:
Add intelligence to detect and skip kernel installation of multipath
NHGs that are functionally equivalent to a singleton.
This means the NHG that zebra creates from the received NHs is
still maintained, but the installed NHG differs from the displayed one, i.e.:
root@r2:/tmp/topotests/bgp_evpn_rt5.test_bgp_evpn/r2# vtysh -c "sh ip route vrf vrf-101 10.0.101.1/32 ne"
Routing entry for 10.0.101.1/32
Known via "bgp", distance 200, metric 0, vrf vrf-101, best
Last update 00:00:32 ago
Flags: Recursion iBGP Selected
Status: None
Nexthop Group ID: 108
Installed Nexthop Group ID: 34 >>>>>>>>>
Received Nexthop Group ID: 108 >>>>>>>>>
192.168.1.1, via bridge-101 onlink, weight 1
192.168.1.1, via bridge-101 (duplicate nexthop removed) onlink, weight 1
Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
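To make the idea in the description concrete, here is a minimal Python sketch of the singleton-equivalence check. It is only an illustration, not the zebra C implementation: the `Nexthop` and `Nhg` types, the `singleton_ids` lookup table, and the `installed_id` helper are all hypothetical names, and the IDs 34 and 108 are taken from the example output above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nexthop:
    """Hypothetical model of a nexthop: gateway address plus interface."""
    gateway: str
    ifname: str


@dataclass
class Nhg:
    """Hypothetical model of an NHG as received, duplicates preserved."""
    nhg_id: int
    nexthops: list


def unique_nexthops(nhg):
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return list(dict.fromkeys(nhg.nexthops))


def installed_id(nhg, singleton_ids):
    """Return the ID a route would reference in the kernel.

    If every nexthop in the group is the same and a singleton NHG for it
    is already known, reuse that singleton's ID instead of the group's own.
    """
    unique = unique_nexthops(nhg)
    if len(unique) == 1 and unique[0] in singleton_ids:
        return singleton_ids[unique[0]]
    return nhg.nhg_id


nh_a = Nexthop("192.168.1.1", "bridge-101")
nh_b = Nexthop("192.168.2.1", "bridge-102")  # assumed second nexthop, for contrast
singleton_ids = {nh_a: 34}

received = Nhg(108, [nh_a, nh_a])          # two paths, same nexthop
multipath = Nhg(109, [nh_a, nh_a, nh_b])   # not singleton-equivalent

print(installed_id(received, singleton_ids))   # 34
print(installed_id(multipath, singleton_ids))  # 109
```

Note that `received.nexthops` still holds both entries, which mirrors the description: the received NHG is kept as-is and only the installed ID changes.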
9fe045e to
2c97b98
Test locally: Without Fix: Baseline With Fix:
d5ed5eb to
56dacda
… EVPN tests
Validate singleton-equivalent NHG optimization in existing EVPN tests
Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
Add test for singleton-equivalent NHGs in multipath scenarios (4 paths, 2 sets of duplicate NHs), i.e. NHG X [A,A,B,B]
Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
56dacda to
e95517a
    def _bgp_check_nexthop():
        output = json.loads(r1.vtysh_cmd("show ip route 10.10.10.10/32 json"))
        # With singleton-equivalent NHG optimization applied to duplicate nexthops,
IMO, we don't need these at all. Just remove installed, fib, and that should be fine, because we are checking just the addresses of nexthop.
Done
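For reference, a check that looks only at nexthop addresses could be sketched roughly as below. This is not the code from the test file: `nexthop_ips` is a hypothetical helper, and the sample JSON (including the `nexthops`, `ip`, `fib`, and `duplicate` keys and the address) is an assumed shape used purely for illustration.

```python
import json


def nexthop_ips(route_json, prefix):
    """Collect the distinct nexthop addresses for a prefix.

    Flags such as "fib" or "installed" are ignored on purpose, so the
    check does not depend on which duplicate ends up installed.
    """
    entries = route_json.get(prefix, [])
    return sorted(
        {
            nh["ip"]
            for entry in entries
            for nh in entry.get("nexthops", [])
            if "ip" in nh
        }
    )


# Assumed sample output, with one nexthop marked as a duplicate.
sample = json.loads(
    """
    {
      "10.10.10.10/32": [
        {
          "nexthops": [
            {"ip": "192.168.1.1", "fib": true, "active": true},
            {"ip": "192.168.1.1", "duplicate": true, "active": true}
          ]
        }
      ]
    }
    """
)

print(nexthop_ips(sample, "10.10.10.10/32"))  # ['192.168.1.1']
```

Because the comparison is on a set of addresses, it gives the same result whether or not the duplicate entry carries the `fib` flag.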
…mization
Make changes in bgp_dynamic_capability to adapt to singleton-equivalent NHG optimization
Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
e95517a to
0621fb0
The question is: when should the deduplication pass actually happen? Should it happen when the nexthops are received from the upper-level protocol?
My original approach was to do it before we even create the NHG, i.e. upon reception of the nexthops from the upper-level protocol. But that means we DON'T preserve what the upper-level protocol sends us. After discussing with you/Mark, to preserve what the upper protocols sent, I am doing the deduplication at kernel installation time: zebra maintains the full NHG internally but installs only the singleton-equivalent in the kernel.
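The trade-off between the two timings can be sketched as follows. This is a toy Python model, not zebra code; `NhgDedupeAtReception`, `NhgDedupeAtInstall`, and `kernel_view` are hypothetical names, and nexthops are represented as plain strings for brevity.

```python
def dedupe(nexthops):
    # Order-preserving removal of duplicate nexthops.
    return list(dict.fromkeys(nexthops))


class NhgDedupeAtReception:
    """Deduplicate as soon as nexthops arrive; the original list is lost."""

    def __init__(self, nexthops):
        self.nexthops = dedupe(nexthops)

    def kernel_view(self):
        return self.nexthops


class NhgDedupeAtInstall:
    """Keep what the protocol sent; deduplicate only for the kernel."""

    def __init__(self, nexthops):
        self.nexthops = list(nexthops)

    def kernel_view(self):
        return dedupe(self.nexthops)


sent = ["192.168.1.1", "192.168.1.1"]
early = NhgDedupeAtReception(sent)
late = NhgDedupeAtInstall(sent)

print(early.kernel_view() == late.kernel_view())  # True
print(len(early.nexthops), len(late.nexthops))    # 1 2
```

Both variants hand the same thing to the kernel; the difference is only in what stays visible internally, which is the property the install-time approach is meant to preserve.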
yeah, I think some of this probably comes from the history, the legacy of the logic that existed before the NHG concepts came in, when the code thought about individual nexthops only. it sort of feels like we're missing a step, and that legacy code is pushing us into increasingly complicated workarounds, both in this area and in the "caching" PR that's also open. maybe it would make sense to think about a less nexthop-oriented approach, maybe just at the point where we've done the nexthop resolution/validity check.
Agree, Mark. I think the NHG code is convoluted to such an extent that sitting down and redesigning it might help. Otherwise, our fixes will keep making it more complex. But as for this PR, let me know what needs to be done @donaldsharp @mjstapp
Some EVPN tests are failing internally. Let me rework the fix.