On Docker, zebra crashed when interface state moved from up->down->up #13523

Closed
2 tasks
skaliassk opened this issue May 15, 2023 · 5 comments
Labels
triage Needs further investigation

skaliassk commented May 15, 2023

While running FRR in Docker, zebra crashed when the interface state changed from up to down and back to up (up->down->up).

Topology:

Peer Device -------------17.0.0.0/8------------------- FRR on Docker (with macvlan mode)

Crash log

2023/05/11 14:19:52 BGP: [WNKP5-SN018] Found existing bnc 17.222.0.254/32(0)(VRF default) flags 0xa ifindex 0 #paths 0 peer 0x7f212afd7010
2023/05/11 14:20:09 BGP: [N25MR-FXT2C] Rx Intf down VRF 0 IF eth0
2023/05/11 14:20:09 BGP: [N25MR-FXT2C] Rx Intf down VRF 0 IF eth0
2023/05/11 14:20:09 BGP: [KGTKH-FVHEW] Rx Router Id update VRF 0 Id 0.0.0.0/32
2023/05/11 14:20:09 BGP: [WMCA1-27995] RID change : vrf VRF default(0), RTR ID 0.0.0.0
2023/05/11 14:20:09 BGP: [ZN4WJ-AVQKV] Rx Intf address del VRF 0 IF eth0 addr 17.0.0.2/8
2023/05/11 14:20:09 ZEBRA: [HSYZM-HV7HF] Extended Error: Carrier for nexthop device is down
2023/05/11 14:20:09 ZEBRA: [WVJCK-PPMGD][EC 4043309093] netlink-dp (NS 0) error: Network is down, type=RTM_NEWNEXTHOP(104), seq=5, pid=4194334435
2023/05/11 14:20:09 ZEBRA: [X5XE1-RS0SW][EC 4043309074] Failed to install Nexthop (4[17.0.0.1 if 8203]) into the kernel
2023/05/11 14:20:42 BGP: [ZXFVW-H54SV] Rx Intf up VRF 0 IF eth0
2023/05/11 14:20:42 BGP: [ZXFVW-H54SV] Rx Intf up VRF 0 IF eth0
ZEBRA: Received signal 11 at 1683795042 (si_addr 0xc8, PC 0x563dd6992a04); aborting...
2023/05/11 14:20:42 BGP: [KGTKH-FVHEW] Rx Router Id update VRF 0 Id 17.0.0.2/32
2023/05/11 14:20:42 BGP: [WMCA1-27995] RID change : vrf VRF default(0), RTR ID 17.0.0.2
2023/05/11 14:20:42 BGP: [GYPW0-GVZQ8] Rx Intf address add VRF 0 IF eth0 addr 17.0.0.2/8
ZEBRA: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(zlog_backtrace_sigsafe+0x6d) [0x7fc8e41adccd]
ZEBRA: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(zlog_signal+0xf3) [0x7fc8e41aded3]
ZEBRA: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(+0xce631) [0x7fc8e41da631]
ZEBRA: /lib/x86_64-linux-gnu/libpthread.so.0(+0x12730) [0x7fc8e40d9730]
ZEBRA: /usr/lib/frr/zebra(zebra_vxlan_macvlan_up+0x24) [0x563dd6992a04]
ZEBRA: /usr/lib/frr/zebra(if_up+0x248) [0x563dd6914238]
ZEBRA: /usr/lib/frr/zebra(netlink_link_change+0xc6b) [0x563dd690e9ab]
ZEBRA: /usr/lib/frr/zebra(netlink_parse_info+0x14b) [0x563dd691a25b]
ZEBRA: /usr/lib/frr/zebra(+0x954ea) [0x563dd691a4ea]
ZEBRA: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(thread_call+0x7d) [0x7fc8e41ec4ed]
ZEBRA: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(frr_run+0xe8) [0x7fc8e41a6178]
ZEBRA: /usr/lib/frr/zebra(main+0x3a3) [0x563dd6907333]
ZEBRA: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb) [0x7fc8e3f2b09b]
ZEBRA: /usr/lib/frr/zebra(_start+0x2a) [0x563dd6907f6a]
ZEBRA: in thread kernel_read scheduled from ../zebra/kernel_netlink.c:505 kernel_read()
2023/05/11 14:20:47 STATIC: [MRN6F-AYZC4] Terminating on signal
2023/05/11 14:20:48 BGP: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
2023/05/11 14:20:52 BGP: [TNK7N-FJF7K] Registering VRF 0
2023/05/11 14:20:52 BGP: [HKBB3-YX6A9] Rx Intf add VRF 0 IF eth0
2023/05/11 14:20:52 BGP: [HKBB3-YX6A9] Rx Intf add VRF 0 IF lo
2023/05/11 14:20:52 BGP: [HKBB3-YX6A9] Rx Intf add VRF 0 IF eth0
2023/05/11 14:20:52 BGP: [GYPW0-GVZQ8] Rx Intf address add VRF 0 IF eth0 addr 17.0.0.2/8
2023/05/11 14:20:52 BGP: [HKBB3-YX6A9] Rx Intf add VRF 0 IF lo
2023/05/11 14:20:52 BGP: [KGTKH-FVHEW] Rx Router Id update VRF 0 Id 17.0.0.2/32
2023/05/11 14:20:52 BGP: [WMCA1-27995] RID change : vrf VRF default(0), RTR ID 17.0.0.2
2023/05/11 14:20:52 BGP: [HKBB3-YX6A9] Rx Intf add VRF 0 IF eth0
2023/05/11 14:20:52 BGP: [GYPW0-GVZQ8] Rx Intf address add VRF 0 IF eth0 addr 17.0.0.2/8
2023/05/11 14:20:52 BGP: [HKBB3-YX6A9] Rx Intf add VRF 0 IF lo
2023/05/11 14:20:52 BGP: [MTH7E-8CG2C] Label Chunk assign: 16 - 143 (0)
2023/05/11 14:21:52 BGP: [WNKP5-SN018] Found existing bnc 17.222.0.254/32(0)(VRF default) flags 0xa ifindex 0 #paths 0 peer 0x7f212afd7010
2023/05/11 14:23:52 BGP: [WNKP5-SN018] Found existing bnc 17.222.0.254/32(0)(VRF default) flags 0xa ifindex 0 #paths 0 peer 0x7f212afd7010

Configuration :

# show  running-config
Building configuration...

Current configuration:
!
frr version 8.5.1
frr defaults traditional
hostname 81ce0b40637c
log syslog informational
no ipv6 forwarding
service integrated-vtysh-config
!
router bgp 100
 no bgp suppress-duplicates
 no bgp hard-administrative-reset
 no bgp graceful-restart notification
 no bgp network import-check
 neighbor 17.222.0.254 remote-as 100
 !
 address-family ipv4 unicast
  network 200.0.1.1/32
  network 200.0.1.2/32
  network 200.0.1.3/32
  network 200.0.1.4/32
  network 200.0.1.5/32
  neighbor 17.222.0.254 route-map DENY_ALL in
 exit-address-family
exit
!
route-map DENY_ALL deny 1
exit
!
end

Describe the bug

  • Did you check if this is a duplicate issue?
  • Did you test it on the latest FRRouting/frr master branch?

To Reproduce
Once the BGP neighborship is established, bring the peer device's interface down and back up by issuing "shutdown" followed by "no shutdown".

Expected behavior
BGP should re-establish the neighborship, and zebra should not crash.

Versions

  • OS Version: Debian 5.10.162-1 (2023-01-21) x86_64 GNU/Linux
  • Kernel: Linux (5.10.0-21-amd64)
  • FRR Version: FRRouting 8.5.1 (81ce0b40637c)

  • Docker : 20.10.5+dfsg1

@skaliassk skaliassk added the triage Needs further investigation label May 15, 2023

tlsalmin commented Jun 8, 2023

Same here: link_ifp is NULL.

(gdb) p link_ifp
$1 = (struct interface *) 0x0
(gdb) p *zif
$3 = {ifp = 0x55555599caa0, flags = 0, shutdown = 2 '\002', multicast = 0 '\000', mpls = false, linkdown = false, linkdownv6 = false, v4mcast_on = false, v6mcast_on = false, rtadv_enable = 0 '\000', ipv4_subnets = 0x55555599cf00, nhg_dependents = {rr = {rbt_root = 0x555555af9c10, count = 2}}, up_count = 1,
up_last = "2023/06/07 14:13:02.15", '\000' <repeats 17 times>, down_count = 0, down_last = '\000' <repeats 39 times>, rtadv = {AdvSendAdvertisements = 0, MaxRtrAdvInterval = 600000, MinRtrAdvInterval = 198000, AdvIntervalTimer = 0, AdvManagedFlag = 0, lastadvmanagedflag = {tv_sec = 0, tv_usec = 0}, AdvOtherConfigFlag = 0, lastadvotherconfigflag = {
tv_sec = 0, tv_usec = 0}, AdvLinkMTU = 0, AdvReachableTime = 0, lastadvreachabletime = {tv_sec = 0, tv_usec = 0}, AdvRetransTimer = 0, lastadvretranstimer = {tv_sec = 0, tv_usec = 0}, AdvCurHopLimit = 64, lastadvcurhoplimit = {tv_sec = 0, tv_usec = 0}, AdvDefaultLifetime = -1, prefixes = {{rr = {rbt_root = 0x55555599f140, count = 1}}},
AdvHomeAgentFlag = 0, HomeAgentPreference = 0, HomeAgentLifetime = -1, AdvIntervalOption = 0, DefaultPreference = 0, AdvRDNSSList = 0x55555599bf40, AdvDNSSLList = 0x55555599ced0, UseFastRexmit = true, inFastRexmit = 0 '\000', ra_configured = 0 '\000', NumFastReXmitsRemain = 0}, ra_sent = 0, ra_rcvd = 0, irdp = 0x0, ptm_enable = 0 '\000',
zif_type = ZEBRA_IF_MACVLAN, zif_slave_type = ZEBRA_IF_SLAVE_NONE, l2info = {br = {vlan_aware = 0 '\000'}, vl = {vid = 0}, vxl = {vni = 0, vtep_ip = {s_addr = 0}, access_vlan = 0, mcast_grp = {s_addr = 0}, ifindex_link = 0, link_nsid = 0}, gre = {vtep_ip = {s_addr = 0}, vtep_ip_remote = {s_addr = 0}, ikey = 0, okey = 0, ifindex_link = 0, link_nsid = 0}},
brslave_info = {bridge_ifindex = 0, br_if = 0x0, ns_id = 0}, bondslave_info = {bond_ifindex = 0, bond_if = 0x0}, bond_info = {mbr_zifs = 0x0}, es_info = {sysmac = {octet = "\000\000\000\000\000"}, lid = 0, esi = {val = "\000\000\000\000\000\000\000\000\000"}, df_pref = 0, flags = 0 '\000', es = 0x0}, vlan_bitmap = {data = 0x0, n = 0, m = 0},
protodown_rc = 0, mac_list = 0x0, link_ifindex = 32, link = 0x0, speed_update_count = 0 '\000', speed_update = 0x55555599cf40, v6_2_v4_ll_neigh_entry = false, neigh_mac = "\000\000\000\000\000", v6_2_v4_ll_addr6 = {__in6_u = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, desc = 0x0}

github-actions bot commented Dec 6, 2023

This issue is stale because it has been open 180 days with no activity. Comment or remove the autoclose label in order to avoid having this issue closed.

frrbot bot commented Dec 6, 2023

This issue will be automatically closed in the specified period unless there is further activity.

@frrbot frrbot bot closed this as completed Dec 13, 2023
@frrbot frrbot bot removed autoclose labels Dec 13, 2023
tlsalmin commented

The problem seems to be this: when a macvlan interface goes through a link down/up change and its parent interface isn't visible in the network namespace where the zebra process is running, the zebra_if_update_link call leaves zif->link as NULL. The if_up->zebra_vxlan_macvlan_up path then crashes, because it assumes the link pointer is non-NULL.
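
To make the failure mode concrete, here is a minimal, self-contained sketch of the pattern described above. It is not FRR's actual code and not necessarily what the eventual fix looks like: the struct layouts and the names `zebra_if_sketch`, `lookup_by_index`, `update_link_sketch`, and `macvlan_up_sketch` are hypothetical stand-ins, and the ifindex values are only illustrative. It shows a parent lookup that can legitimately come back empty, and an "up" handler that checks for that before dereferencing.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for zebra's interface structures. */
struct interface {
	const char *name;
	int ifindex;
	void *info; /* per-interface data (struct zebra_if_sketch here) */
};

struct zebra_if_sketch {
	int link_ifindex;       /* parent ifindex as reported by the kernel */
	struct interface *link; /* parent, or NULL if not visible in this namespace */
};

/* Search only the interfaces visible in the current namespace. */
struct interface *lookup_by_index(struct interface **visible, size_t n,
				  int ifindex)
{
	for (size_t i = 0; i < n; i++)
		if (visible[i]->ifindex == ifindex)
			return visible[i];
	return NULL;
}

/* Plays the role of the link update step: link stays NULL when the
 * parent lives in another namespace (the Docker macvlan case). */
void update_link_sketch(struct interface *ifp, struct interface **visible,
			size_t n)
{
	struct zebra_if_sketch *zif = ifp->info;

	zif->link = lookup_by_index(visible, n, zif->link_ifindex);
}

/* Guarded "up" handler: returns 0 if the parent was processed,
 * -1 if there is no visible parent and the work was skipped. */
int macvlan_up_sketch(struct interface *ifp)
{
	struct zebra_if_sketch *zif = ifp->info;
	struct interface *link_ifp = zif->link;

	if (link_ifp == NULL)
		return -1; /* without this check, the next line would fault */

	struct zebra_if_sketch *link_zif = link_ifp->info;
	(void)link_zif; /* real code would act on the parent here */
	return 0;
}
```

With an empty list of visible interfaces, `update_link_sketch` leaves `link` as NULL and `macvlan_up_sketch` returns -1 instead of dereferencing it; once the parent is visible, the same call returns 0.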

tlsalmin commented

Fixing in #15010

This issue was closed.