
Zebra not able to get interfaces addresses and kernel route #10404

Closed
fluboi opened this issue Jan 22, 2022 · 15 comments
Labels
triage Needs further investigation

Comments

fluboi commented Jan 22, 2022

Describe the bug

Zebra is not able to get kernel routes and interface addresses. sh ip route returns nothing, and in the sh int brief output the Addresses column is empty.

tc2# sh ip route
tc2# 
tc2# sh int brief
Interface       Status  VRF             Addresses
---------       ------  ---             ---------
dummy0          up      default         
eno2            up      default         
lo              up      default         
tap103i0        up      default         
tap103i1        up      default         
tap113i0        up      default         
tap113i1        up      default         
tap113i2        up      default         
tap115i0        up      default         
tap116i0        up      default         
tap117i0        up      default         
tap118i0        up      default         
tap119i0        up      default         
tap120i0        up      default         
vmbr5           up      default         
vmbr5.2         up      default         

It does not match the kernel state:

lo               UNKNOWN        127.0.0.1/8 ::1/128 
eno2             UP             
vmbr5            UP             fe80::842f:9bff:fea6:488d/64 
vmbr5.2@vmbr5    UP             10.8.2.202/24 2a00:XX/64 fe80::842f:9bff:fea6:488d/64 
tap115i0         UNKNOWN        
tap119i0         UNKNOWN        
tap120i0         UNKNOWN        
tap103i0         UNKNOWN        
tap103i1         UNKNOWN        
tap116i0         UNKNOWN        
tap117i0         UNKNOWN        
tap113i0         UNKNOWN        
tap113i1         UNKNOWN        
tap113i2         UNKNOWN        
tap118i0         UNKNOWN        
dummy0           UNKNOWN        172.22.55.12/32 fe80::7ccb:a9ff:feca:5d52/64

Manually launching Zebra with --log-level debug shows:

root@tc2:~# /usr/lib/frr/zebra -t -F traditional -A 127.0.0.1 -s 90000000 --log-level debug
2022/01/22 14:24:48 ZEBRA: [KQNKJ-R5QVV][EC 4043309092] netlink-cmd (NS 0) error: data remnant size 32768
2022/01/22 14:24:48 ZEBRA: [KQNKJ-R5QVV][EC 4043309092] netlink-cmd (NS 0) error: data remnant size 32768
2022/01/22 14:24:48 ZEBRA: [WVJCK-PPMGD][EC 4043309093] netlink-cmd (NS 0) error: Device or resource busy, type=RTM_GETADDR(22), seq=4, pid=3468439451
2022/01/22 14:24:48 ZEBRA: [NNACN-54BDA][EC 4043309110] Disabling MPLS support (no kernel support)

[X] Did you check if this is a duplicate issue?
[ ] Did you test it on the latest FRRouting/frr master branch?

To Reproduce
I see the same bug on 2 different nodes, but they are in the same Proxmox cluster, with the same hardware...
I'm not able to reproduce it on another Proxmox host (same Kernel/PVE/FRR versions...)

It might be linked to vmbr5, a vlan_aware bridge, and its sub-interface vmbr5.2:

auto vmbr5
iface vmbr5 inet manual
        bridge_ports eno2
        bridge_stp off
        bridge_fd 0
        bridge_vlan_aware yes
        up echo 1 > /sys/devices/virtual/net/$IFACE/bridge/multicast_querier
        up echo 0 > /sys/devices/virtual/net/$IFACE/bridge/multicast_snooping

auto vmbr5.2
iface vmbr5.2 inet static
        address 10.8.2.202/24
        gateway 10.8.2.1

Versions

  • OS Version: Proxmox PVE 7.1-10 (based on Debian 11.2)
  • Kernel: 5.13.19-3-pve
  • FRR Version: 8.1
fluboi added the triage Needs further investigation label Jan 22, 2022
@tufeigunchu

When you kill the zebra task and let it restart, does the status become normal?

fluboi commented Jan 27, 2022

No, killing zebra (kill or kill -9) and letting watchfrr restart it does not change anything.

fluboi commented Jan 27, 2022

Indeed, it might be linked to #10423.
Interestingly, if I launch Zebra and then ifup a dummy interface, that dummy interface correctly appears in zebra. All the other interfaces still do not.

@tufeigunchu

What about changing the IP address of a tap interface?

fluboi commented Jan 27, 2022

If I add a new IP address on any interface while zebra is already running, it appears in frr/zebra. All the other addresses, which were there before zebra started, are still missing.

fluboi commented Jan 27, 2022

The issue is not present with FRR 7.4.
It is broken with 8.1, 8.0.1 and 7.5.1.

mjstapp (Contributor) commented Jan 27, 2022

I'm not able to reproduce this on a fairly vanilla Linux (Ubuntu 20, for example), so it sounds like there's something special going on in your environment.

fluboi commented Jan 27, 2022

OK, I finally managed to understand and reproduce it.

Vanilla Ubuntu 20.04.3 LTS
Kernel: 5.4.0-96-generic
FRR: 8.1

To reproduce:

ip link add vmbr0 type bridge vlan_filtering 1
for i in {10..20}; do
  ip link add dummy$i type dummy
  ip link set dev dummy$i up
  ip link set dummy$i master vmbr0
  bridge vlan del dev dummy$i vid 1
  bridge vlan add dev dummy$i vid 2-4094
done

systemctl restart frr

vtysh -c "sh int brief"
vtysh -c "sh ip route"

The issue occurs when there are too many VLANs.
In my real-world Proxmox setup, I have 2 VMs with no VLAN ID defined on their NICs in the GUI; the goal is to create a trunk with all VLANs available to the VMs (virtual FW/router VMs).
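
A hypothetical standalone probe (not FRR code; the request layout, the AF_BRIDGE family and the minimal error handling are illustrative only) can confirm what the "data remnant size 32768" log line above suggests: with a port carrying VLANs 2-4094, a single kernel netlink reply no longer fits in a fixed 32 KiB read buffer.

/*
 * Hypothetical probe: request an RTM_GETLINK dump with the per-port bridge
 * VLAN list and read it into a fixed 32 KiB buffer.  If recvmsg() reports
 * MSG_TRUNC, the kernel's reply for at least one interface is larger than
 * the buffer.
 */
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

int main(void)
{
    struct {
        struct nlmsghdr nlh;
        struct ifinfomsg ifm;
        struct rtattr ext_req;
        __u32 ext_filter_mask;
    } req = {
        .nlh = { .nlmsg_len = sizeof(req),
                 .nlmsg_type = RTM_GETLINK,
                 .nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP,
                 .nlmsg_seq = 1 },
        .ifm = { .ifi_family = AF_BRIDGE },
        /* ask for the uncompressed per-port VLAN list */
        .ext_req = { .rta_len = RTA_LENGTH(sizeof(__u32)),
                     .rta_type = IFLA_EXT_MASK },
        .ext_filter_mask = RTEXT_FILTER_BRVLAN,
    };
    char buf[32768];                         /* fixed receive buffer */
    struct sockaddr_nl kernel = { .nl_family = AF_NETLINK };
    struct iovec iov = { buf, sizeof(buf) };
    struct msghdr msg = { .msg_name = &kernel, .msg_namelen = sizeof(kernel),
                          .msg_iov = &iov, .msg_iovlen = 1 };
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

    if (fd < 0)
        return 1;
    sendto(fd, &req, sizeof(req), 0,
           (struct sockaddr *)&kernel, sizeof(kernel));

    for (;;) {
        ssize_t n = recvmsg(fd, &msg, 0);
        struct nlmsghdr *h = (struct nlmsghdr *)buf;

        if (n <= 0)
            break;
        if (msg.msg_flags & MSG_TRUNC)       /* reply did not fit in 32 KiB */
            printf("truncated: kernel message larger than %zu bytes\n",
                   sizeof(buf));
        /* a real parser would walk every nlmsghdr in buf; checking the
         * first one is enough to spot the end of the dump here */
        if (h->nlmsg_type == NLMSG_DONE)
            break;
    }
    close(fd);
    return 0;
}

Compiled with gcc and run on a host prepared with the commands above, a "truncated" line would indicate that the dump exceeds the buffer.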

@donaldsharp (Member)

I have reproduced this.

donaldsharp added a commit to donaldsharp/frr that referenced this issue Feb 2, 2022
Currently when the kernel sends netlink messages to FRR
the buffers to receive this data is of fixed length.
The kernel, with certain configurations, will send
netlink messages that are larger than this fixed length.
This leads to situations where, on startup, zebra gets
really confused about the state of the kernel.  Effectively
the current algorithm is this:

read up to buffer in size
while (data to parse)
     get netlink message header, look at size
        parse if you can

The problem is that there is a 32k buffer we read.
We get the first message that is say 1k in size,
subtract that 1k to 31k left to parse.  We then
get the next header and notice that the length
of the message is 33k.  Which is obviously larger
than what we read in.  FRR has no recover mechanism
nor is there a way to know, a priori, what the maximum
size the kernel will send us.

Modify FRR to look at the kernel message and see if the
buffer is large enough, if not, make it large enough to
read in the message.

This code has to be per netlink socket because of the usage
of pthreads.  So add to `struct nlsock` the buffer and current
buffer length.  Growing it as necessary.

Fixes: FRRouting#10404
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
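
A minimal sketch of the approach this commit message describes, with illustrative names rather than the literal FRR patch: peek at the pending message with MSG_PEEK | MSG_TRUNC, which makes recvmsg() return the real datagram length even when the supplied buffer is too small, grow the buffer if needed, then perform the real read.

#include <stdlib.h>
#include <sys/socket.h>

/* Illustrative only: in FRR the buffer and its length live in the
 * per-socket `struct nlsock`; plain globals are used here for brevity. */
static char *nl_buf;
static size_t nl_buflen;

static ssize_t netlink_read_grow(int fd, struct msghdr *msg)
{
    struct iovec iov = { NULL, 0 };
    ssize_t need;

    /* Ask the kernel how big the next pending message really is. */
    msg->msg_iov = &iov;
    msg->msg_iovlen = 1;
    need = recvmsg(fd, msg, MSG_PEEK | MSG_TRUNC);
    if (need < 0)
        return need;

    /* Grow the receive buffer if the current size is not enough. */
    if ((size_t)need > nl_buflen) {
        char *p = realloc(nl_buf, need);
        if (!p)
            return -1;
        nl_buf = p;
        nl_buflen = need;
    }

    /* Now the whole message fits; read it for real. */
    iov.iov_base = nl_buf;
    iov.iov_len = nl_buflen;
    return recvmsg(fd, msg, 0);
}

As the commit message notes, the buffer has to be per netlink socket because of the pthread usage, which is why the real change hangs the buffer and its current length off `struct nlsock`.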
donaldsharp added a commit to donaldsharp/frr that referenced this issue Feb 2, 2022 (same commit message as above)
donaldsharp added a commit to donaldsharp/frr that referenced this issue Feb 4, 2022 (same commit message as above)
donaldsharp added a commit to donaldsharp/frr that referenced this issue Feb 8, 2022 (same commit message as above)
riw777 closed this as completed in 2cf7651 Feb 9, 2022
plsaranya pushed a commit to plsaranya/frr that referenced this issue Feb 28, 2022 (same commit message as above)

cosmedd commented Mar 30, 2022

Will this be backported to FRR 8.2.x?

@aderumier

@fluboi

Hi,
I'm the Proxmox FRR package maintainer. I'll try to see if I can backport it to the current Proxmox 8.0.1. (Another user has reported the same kind of netlink error.)

fluboi commented Apr 24, 2022

Hi,
Thanks @aderumier !
Currently the fix is in master, but not in 8.2.2 nor 8.3-dev.
@donaldsharp Any idea when we could expect that fix to be in a stable version?

@aderumier

@fluboi
I had tried to backport it to 8.0.1, but there are 2 other patches (#10482) and I'm not sure about the stability.

I think I'll update to 8.2.2 + patches (they seem to apply fine; I have done quick tests and I don't see any EVPN regression).

If you want to test, here is a build with the 3 patches:

wget https://mutulin1.odiso.net/frr_8.2.2-1+pve1_amd64.deb
dpkg -i frr_8.2.2-1+pve1_amd64.deb
systemctl restart frr

For the record, I have another Proxmox user on the forum with the same kind of problem:

Apr 22 09:06:55 parker zebra[1597466]: [WVJCK-PPMGD][EC 4043309093] netlink-cmd (NS 0) error: Device or resource busy, type=RTM_GETROUTE(26), seq=5, pid=2594392672

Apr 22 11:01:49 parker bgpd[1632074]: [VX6SM-8YE5W][EC 33554460] 10.0.10.4: nexthop_set failed, resetting connection - intf 0x0

https://forum.proxmox.com/threads/implementations-of-sdn-networking.99628/page-2

fluboi commented Apr 25, 2022

@aderumier
Just tried your package; it fixes the issue, and EVPN seems to work as expected (but it's a small lab). :)

@aderumier

@fluboi
Thanks. It also fixes the forum user's bug. I'm currently testing 8.2.2 on a big test cluster for 7-10 days. If it's OK, I'll update the Proxmox repo to 8.2.2.

patrasar pushed a commit to patrasar/frr that referenced this issue Apr 28, 2022 (same commit message as above)
gpnaveen pushed a commit to gpnaveen/frr that referenced this issue Jun 7, 2022 (same commit message as above)