FRR crash using macvlan interface in a lxc/lxd/incus container #15370

Closed
1 of 2 tasks
joolli opened this issue Feb 14, 2024 · 27 comments · Fixed by #15399
Labels
triage Needs further investigation

Comments

@joolli

joolli commented Feb 14, 2024

Hi, I have a BGP unnumbered setup going to two Juniper QFX5120 switches. It works, except that peering sometimes takes a long time to form (I don't know if that's related). FRR is running in an LXD/incus container with two macvlan interfaces going to each switch. Those macvlan interfaces are configured with a VLAN tag in the profile for the container. This container is running as a route reflector, and whenever one of the switches is restarted, zebra crashes and FRR restarts, causing an outage.

2024-02-13 15:52:50.400 [CRIT] zebra: Received signal 11 at 1707839570 (si_addr 0xd0, PC 0x55c30fb69be4); aborting...
2024-02-13 15:52:50.401 [CRIT] zebra: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(zlog_backtrace_sigsafe+0x6f) [0x7f1f2a4bce7f]
2024-02-13 15:52:50.401 [CRIT] zebra: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(zlog_signal+0xf5) [0x7f1f2a4bd085]
2024-02-13 15:52:50.401 [CRIT] zebra: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(+0xf2321) [0x7f1f2a4f2321]
2024-02-13 15:52:50.401 [CRIT] zebra: /lib/x86_64-linux-gnu/libc.so.6(+0x3c050) [0x7f1f2a25b050]
2024-02-13 15:52:50.401 [CRIT] zebra: /usr/lib/frr/zebra(zebra_vxlan_macvlan_up+0x24) [0x55c30fb69be4]
2024-02-13 15:52:50.401 [CRIT] zebra: /usr/lib/frr/zebra(if_up+0x2b8) [0x55c30fae2708]
2024-02-13 15:52:50.401 [CRIT] zebra: /usr/lib/frr/zebra(zebra_if_dplane_result+0x12f3) [0x55c30fae4133]
2024-02-13 15:52:50.401 [CRIT] zebra: /usr/lib/frr/zebra(+0xf5bd9) [0x55c30fb40bd9]
2024-02-13 15:52:50.401 [CRIT] zebra: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(event_call+0x81) [0x7f1f2a504741]
2024-02-13 15:52:50.401 [CRIT] zebra: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(frr_run+0xc0) [0x7f1f2a4b4c00]
2024-02-13 15:52:50.401 [CRIT] zebra: /usr/lib/frr/zebra(main+0x3be) [0x55c30fad555e]
2024-02-13 15:52:50.401 [CRIT] zebra: /lib/x86_64-linux-gnu/libc.so.6(+0x2724a) [0x7f1f2a24624a]
2024-02-13 15:52:50.401 [CRIT] zebra: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85) [0x7f1f2a246305]
2024-02-13 15:52:50.401 [CRIT] zebra: /usr/lib/frr/zebra(_start+0x21) [0x55c30fad6501]
2024-02-13 15:52:50.401 [CRIT] zebra: in thread rib_process_dplane_results scheduled from ../zebra/zebra_rib.c:4954 rib_dplane_results()
2024-02-13 15:52:50.403 [ERR!] zebra: log monitor connection closed unexpectedly
2024-02-13 15:52:50.403 [ERR!] watchfrr: [HD38Q-0HBRT][EC 268435457] zebra state -> down : read returned EOF
2024-02-13 15:52:55.396 [INFO] watchfrr: [YFT0P-5Q5YX] Forked background command [pid 479]: /usr/lib/frr/watchfrr.sh restart all
2024-02-13 15:52:55.405 [NTFY] mgmtd: [J2RAS-MZ95C] Terminating on signal
2024-02-13 15:52:55.406 [ERR!] mgmtd: log monitor connection closed unexpectedly
2024-02-13 15:52:55.406 [ERR!] watchfrr: [HD38Q-0HBRT][EC 268435457] mgmtd state -> down : read returned EOF
2024-02-13 15:52:55.405 [ERR!] staticd: [X3G8F-PM93W] BE-client: mgmt_msg_read: got EOF/disconnect
2024-02-13 15:52:55.406 [NTFY] staticd: [MRN6F-AYZC4] Terminating on signal
2024-02-13 15:52:55.406 [ERR!] staticd: log monitor connection closed unexpectedly
2024-02-13 15:52:55.406 [ERR!] watchfrr: [HD38Q-0HBRT][EC 268435457] staticd state -> down : read returned EOF

This initially happened on FRR 8.4.2, which I got from Debian stable:

FRRouting 8.4.2 (IC-RR1) on Linux(6.1.0-12-amd64).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
configured with:
    '--build=x86_64-linux-gnu' '--prefix=/usr' '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--disable-option-checking' '--disable-silent-rules' '--libdir=${prefix}/lib/x86_64-linux-gnu' '--libexecdir=${prefix}/lib/x86_64-linux-gnu' '--disable-maintainer-mode' '--localstatedir=/var/run/frr' '--sbindir=/usr/lib/frr' '--sysconfdir=/etc/frr' '--with-vtysh-pager=/usr/bin/pager' '--libdir=/usr/lib/x86_64-linux-gnu/frr' '--with-moduledir=/usr/lib/x86_64-linux-gnu/frr/modules' '--disable-dependency-tracking' '--enable-rpki' '--disable-scripting' '--disable-pim6d' '--with-libpam' '--enable-doc' '--enable-doc-html' '--enable-snmp' '--enable-fpm' '--disable-protobuf' '--disable-zeromq' '--enable-ospfapi' '--enable-bgp-vnc' '--enable-multipath=256' '--enable-user=frr' '--enable-group=frr' '--enable-vty-group=frrvty' '--enable-configfile-mask=0640' '--enable-logfile-mask=0640' 'build_alias=x86_64-linux-gnu' 'PYTHON=python3'

Then I installed FRR 9.1 from frr-stable, from the FRR repo:

FRRouting 9.1 (IC-RR1) on Linux(6.1.0-12-amd64).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
configured with:
    '--build=x86_64-linux-gnu' '--prefix=/usr' '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--disable-option-checking' '--disable-silent-rules' '--libdir=${prefix}/lib/x86_64-linux-gnu' '--libexecdir=${prefix}/lib/x86_64-linux-gnu' '--disable-maintainer-mode' '--localstatedir=/var/run/frr' '--sbindir=/usr/lib/frr' '--sysconfdir=/etc/frr' '--with-vtysh-pager=/usr/bin/pager' '--libdir=/usr/lib/x86_64-linux-gnu/frr' '--with-moduledir=/usr/lib/x86_64-linux-gnu/frr/modules' '--disable-dependency-tracking' '--enable-rpki' '--disable-scripting' '--enable-pim6d' '--with-libpam' '--enable-doc' '--enable-doc-html' '--enable-snmp' '--enable-fpm' '--disable-protobuf' '--disable-zeromq' '--enable-ospfapi' '--enable-bgp-vnc' '--enable-multipath=256' '--enable-user=frr' '--enable-group=frr' '--enable-vty-group=frrvty' '--enable-configfile-mask=0640' '--enable-logfile-mask=0640' 'build_alias=x86_64-linux-gnu' 'PYTHON=python3'

The problem persisted. Both versions on kernel 6.1.0-12-amd64.

The system is Debian 12.1

To Reproduce

  • Fire up Debian 12 on LXD/incus
  • Have it connect to two switches (Juniper QFX5120?) over tagged macvlan ("vlan: " under the device)
  • Set up BGP unnumbered between them for eBGP
  • Set up iBGP peering between the loopbacks
  • Maybe set up some MAC-VRFs on the Junipers (don't know if it matters)
  • Turn on debug bgp neighbor-events
  • Restart one of the switches

  • Did you check if this is a duplicate issue?
  • Did you test it on the latest FRRouting/frr master branch?

Possibly the same issue:
#13523

Expected behavior
FRR to stay up and keep other bgp-unnumbered peer sessions going.

Versions

  • OS Version: Debian 12.1
  • Kernel: 6.1.0-12-amd64 and 6.1.0-13-amd64
  • FRR version: 8.4.2 and 9.1
@joolli joolli added the triage Needs further investigation label Feb 14, 2024
louis-6wind added a commit to louis-6wind/frr that referenced this issue Feb 20, 2024
Linux kernel can leave a macvlan interface attached to a removed
link-interface. This situation is buggy. However, it must not result in
a zebra crash.

> 6  0x0000559d77a329d3 in zebra_vxlan_macvlan_up (ifp=0x559d798b8e00) at /root/frr/zebra/zebra_vxlan.c:4676
> 4676		link_zif = link_ifp->info;
> (gdb) list
> 4671		struct interface *link_ifp, *link_if;
> 4672
> 4673		zif = ifp->info;
> 4674		assert(zif);
> 4675		link_ifp = zif->link;
> 4676		link_zif = link_ifp->info;
> 4677		assert(link_zif);
> 4678
> (gdb) p zif->link
> $2 = (struct interface *) 0x0
> (gdb) p zif->link_ifindex
> $3 = 15

Fix the crash by returning when the macvlan link-interface is not found.
No need to go further because the macvlan interface is not operational.

Link: FRRouting#15370
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
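
The guard described in this commit message can be sketched in isolation. This is a minimal illustration under stated assumptions, not the actual patch: `struct interface` and `struct zebra_if` are reduced to hypothetical stand-ins with only the fields seen in the gdb output, and `macvlan_up_sketch` is an invented name. It only shows a NULL check placed before the dereference that crashes at line 4676.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for zebra's structures. */
struct interface;

struct zebra_if {
	struct interface *link; /* underlying link-interface, NULL if unknown */
	int link_ifindex;       /* kernel ifindex of the link-interface */
};

struct interface {
	struct zebra_if *info;
};

/*
 * Sketch of a guarded macvlan "up" handler. Returns true when the
 * link-interface was found and processing could continue, false when
 * the handler returned early.
 */
bool macvlan_up_sketch(struct interface *ifp)
{
	struct zebra_if *zif, *link_zif;
	struct interface *link_ifp;

	zif = ifp->info;
	assert(zif);

	link_ifp = zif->link;
	if (!link_ifp)
		return false; /* the guard: no link-interface, nothing to do */

	link_zif = link_ifp->info; /* safe now: link_ifp is not NULL */
	assert(link_zif);

	/* ... the real function would continue with its VXLAN handling ... */
	return true;
}
```

With `zif->link` set to NULL and `link_ifindex` set to 15, as in the gdb session above, the sketch returns early instead of dereferencing a NULL pointer.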
@louis-6wind
Contributor

@joolli

Could you give us the "ip link show" output for all the namespaces?

louis-6wind added a commit to louis-6wind/frr that referenced this issue Feb 21, 2024
A macvlan interface can have its underlying link-interface in another
namespace (aka. netns). However, by default, zebra does not know the
interfaces from the other namespaces. This results in a crash because the
pointer to the link interface is NULL.

> 6  0x0000559d77a329d3 in zebra_vxlan_macvlan_up (ifp=0x559d798b8e00) at /root/frr/zebra/zebra_vxlan.c:4676
> 4676		link_zif = link_ifp->info;
> (gdb) list
> 4671		struct interface *link_ifp, *link_if;
> 4672
> 4673		zif = ifp->info;
> 4674		assert(zif);
> 4675		link_ifp = zif->link;
> 4676		link_zif = link_ifp->info;
> 4677		assert(link_zif);
> 4678
> (gdb) p zif->link
> $2 = (struct interface *) 0x0
> (gdb) p zif->link_ifindex
> $3 = 15

Fix the crash by returning when the macvlan link-interface is in another
namespace. No need to go further because any vxlan under the macvlan
interface would not be accessible by zebra.

Link: FRRouting#15370
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
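
Why the pointer ends up NULL can be illustrated with a toy per-namespace lookup. This is a sketch under assumptions, not zebra's code: `struct iface` and `iface_lookup` are hypothetical, and the table simply stands for "the interfaces visible in one namespace". An ifindex that belongs to an interface in a different namespace is not in the table, so the lookup returns NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified interface record; not zebra's real type. */
struct iface {
	int ifindex;
	const char *name;
};

/*
 * Look up an interface by ifindex among the interfaces visible in ONE
 * namespace. Returns NULL when the index is not known there.
 */
const struct iface *iface_lookup(const struct iface *table, size_t n,
				 int ifindex)
{
	assert(table != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (table[i].ifindex == ifindex)
			return &table[i];
	}
	return NULL; /* e.g. the index refers to another namespace */
}
```

For a table that holds only the container's own interfaces, looking up the `link_ifindex` value 15 from the gdb output would return NULL, and a caller has to handle that case before dereferencing the result.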
@louis-6wind
Contributor

@joolli please test the fix. It no longer crashes when the macvlan link-interface is in another namespace.

@joolli
Author

joolli commented Feb 21, 2024

@louis-6wind I want to, but I have only basic knowledge of git... Do you know of instructions on how I would get your patch?

@louis-6wind
Contributor

You can do

git clone https://github.com/louis-6wind/frr/
cd frr
git checkout fix-macvlan-crash

@joolli
Author

joolli commented Feb 27, 2024

I finally managed to test. I won't take as long next time.

# vtysh -c "show version"
FRRouting 10.1-dev (IC-RR2) on Linux(6.1.0-18-amd64).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
configured with:
    '--build=x86_64-linux-gnu' '--prefix=/usr' '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--disable-option-checking' '--disable-silent-rules' '--libdir=${prefix}/lib/x86_64-linux-gnu' '--libexecdir=${prefix}/lib/x86_64-linux-gnu' '--disable-maintainer-mode' '--sbindir=/usr/lib/frr' '--with-vtysh-pager=/usr/bin/pager' '--libdir=/usr/lib/x86_64-linux-gnu/frr' '--with-moduledir=/usr/lib/x86_64-linux-gnu/frr/modules' '--disable-dependency-tracking' '--enable-rpki' '--disable-scripting' '--enable-pim6d' '--with-libpam' '--enable-doc' '--enable-doc-html' '--enable-snmp' '--enable-fpm' '--disable-protobuf' '--disable-zeromq' '--enable-ospfapi' '--enable-bgp-vnc' '--enable-multipath=256' '--enable-user=frr' '--enable-group=frr' '--enable-vty-group=frrvty' '--enable-configfile-mask=0640' '--enable-logfile-mask=0640' 'build_alias=x86_64-linux-gnu' 'PYTHON=python3'
2024-02-27 11:11:03.788 [CRIT] zebra: Received signal 11 at 1709032263 (si_addr 0xd0, PC 0x5646d3e5a7d2); aborting...
2024-02-27 11:11:03.789 [CRIT] zebra: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(zlog_backtrace_sigsafe+0x6f) [0x7f8ebf6c2dcf]
2024-02-27 11:11:03.789 [CRIT] zebra: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(zlog_signal+0xf5) [0x7f8ebf6c2fd5]
2024-02-27 11:11:03.789 [CRIT] zebra: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(+0xfc5d1) [0x7f8ebf6fc5d1]
2024-02-27 11:11:03.789 [CRIT] zebra: /lib/x86_64-linux-gnu/libc.so.6(+0x3c050) [0x7f8ebf321050]
2024-02-27 11:11:03.789 [CRIT] zebra: /usr/lib/frr/zebra(zebra_vxlan_macvlan_up+0x32) [0x5646d3e5a7d2]
2024-02-27 11:11:03.789 [CRIT] zebra: /usr/lib/frr/zebra(if_up+0x230) [0x5646d3dd5560]
2024-02-27 11:11:03.789 [CRIT] zebra: /usr/lib/frr/zebra(zebra_if_dplane_result+0x12fd) [0x5646d3dd701d]
2024-02-27 11:11:03.789 [CRIT] zebra: /usr/lib/frr/zebra(+0xeda69) [0x5646d3e34a69]
2024-02-27 11:11:03.789 [CRIT] zebra: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(event_call+0x81) [0x7f8ebf70eca1]
2024-02-27 11:11:03.789 [CRIT] zebra: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(frr_run+0xc0) [0x7f8ebf6ba710]
2024-02-27 11:11:03.789 [CRIT] zebra: /usr/lib/frr/zebra(main+0x3e6) [0x5646d3dcb566]
2024-02-27 11:11:03.789 [CRIT] zebra: /lib/x86_64-linux-gnu/libc.so.6(+0x2724a) [0x7f8ebf30c24a]
2024-02-27 11:11:03.789 [CRIT] zebra: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85) [0x7f8ebf30c305]
2024-02-27 11:11:03.789 [CRIT] zebra: /usr/lib/frr/zebra(_start+0x21) [0x5646d3dcc511]
2024-02-27 11:11:03.789 [CRIT] zebra: in thread rib_process_dplane_results scheduled from ../zebra/zebra_rib.c:4998 rib_dplane_results()
2024-02-27 11:11:03.790 [ERR!] zebra: log monitor connection closed unexpectedly
2024-02-27 11:11:03.790 [ERR!] watchfrr: [HD38Q-0HBRT][EC 268435457] zebra state -> down : read returned EOF
2024-02-27 11:11:03.790 [ERR!] mgmtd: [X3G8F-PM93W] BE-adapter: mgmt_msg_read: got EOF/disconnect
2024-02-27 11:11:08.788 [INFO] watchfrr: [YFT0P-5Q5YX] Forked background command [pid 39836]: /usr/lib/frr/watchfrr.sh restart all
2024-02-27 11:11:08.797 [NTFY] bgpd: [ZW1GY-R46JE] Terminating on signal
2024-02-27 11:11:08.797 [NTFY] mgmtd: [J2RAS-MZ95C] Terminating on signal

@louis-6wind
Contributor

Please provide us the full backtrace.

apt-get install systemd-coredump
echo "|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %e"> /proc/sys/kernel/core_pattern

Reproduce the crash.
Then open gdb and get the backtrace output with bt:

coredumpctl gdb
bt
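
The `core_pattern` value written above starts with `|`, which the kernel treats as "pipe the core dump to this program" rather than "write a core file". The following is a small, hypothetical helper for checking a pattern string and extracting the handler path. The function names are invented for illustration, and reading the value from `/proc/sys/kernel/core_pattern` is left out of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True when the pattern tells the kernel to pipe cores to a program. */
bool core_pattern_is_piped(const char *pattern)
{
	assert(pattern != NULL);
	return pattern[0] == '|';
}

/*
 * Copy the handler program path (the text between '|' and the first
 * space) into out. Returns false when the pattern is not piped, the
 * path is empty, or the buffer is too small.
 */
bool core_pattern_handler(const char *pattern, char *out, size_t outlen)
{
	const char *start;
	size_t len;

	if (!core_pattern_is_piped(pattern))
		return false;

	start = pattern + 1;
	len = strcspn(start, " ");
	if (len == 0 || len >= outlen)
		return false;

	memcpy(out, start, len);
	out[len] = '\0';
	return true;
}
```

Applied to the pattern from the `echo` command above, the helper yields `/usr/lib/systemd/systemd-coredump` as the handler path.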

mergify bot pushed a commit that referenced this issue Feb 27, 2024
A macvlan interface can have its underlying link-interface in another
namespace (aka. netns). However, by default, zebra does not know the
interfaces from the other namespaces. This results in a crash because the
pointer to the link interface is NULL.

> 6  0x0000559d77a329d3 in zebra_vxlan_macvlan_up (ifp=0x559d798b8e00) at /root/frr/zebra/zebra_vxlan.c:4676
> 4676		link_zif = link_ifp->info;
> (gdb) list
> 4671		struct interface *link_ifp, *link_if;
> 4672
> 4673		zif = ifp->info;
> 4674		assert(zif);
> 4675		link_ifp = zif->link;
> 4676		link_zif = link_ifp->info;
> 4677		assert(link_zif);
> 4678
> (gdb) p zif->link
> $2 = (struct interface *) 0x0
> (gdb) p zif->link_ifindex
> $3 = 15

Fix the crash by returning when the macvlan link-interface is in another
namespace. No need to go further because any vxlan under the macvlan
interface would not be accessible by zebra.

Link: #15370
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
(cherry picked from commit 44e6e38)
@joolli
Author

joolli commented Feb 28, 2024

I've been unable to provide a backtrace. The crash does not happen as consistently now (at least the debug output doesn't show) but FRR restarts every time.
FRR did output the same debug as before once or twice, but I didn't get a coredump. I'm having trouble getting a coredump inside an unprivileged container.

@louis-6wind
Contributor

You should modify the sysctl on the host on which the Docker containers are running.

@joolli
Author

joolli commented Feb 28, 2024

I modified /proc/sys/kernel/core_pattern on the host. I'm using LXD/incus, but it should be the same.

@joolli
Author

joolli commented Feb 28, 2024

Finally got it:
(gdb) bt
#0 0x00007fce2d76fe2c in ?? ()
#1 0x00007fff3c35ad30 in ?? ()
#2 0x785d594d86c38400 in ?? ()
#3 0x000000000000000b in ?? ()
#4 0x00007fff3c35ae70 in ?? ()
#5 0x0000559dcb9b67d2 in ?? ()
#6 0x00007fff3c35ad30 in ?? ()
#7 0x00007fff3c35afb0 in ?? ()
#8 0x00007fce2d720fb2 in ?? ()
#9 0x000000000000000b in ?? ()
#10 0x00007fce2dafc60c in ?? ()
#11 0x0000000000002400 in ?? ()
#12 0x0000000000000000 in ?? ()

@louis-6wind
Contributor

??
It seems that your coredump is old and differs from the current version

@joolli
Author

joolli commented Feb 28, 2024

The coredump resides on the host and I executed coredumpctl there. I'll see if I can get it to the container

@joolli
Author

joolli commented Feb 28, 2024

I couldn't get the coredump into the container at first glance, so I installed the patched version on the host. I hope that suffices:

(gdb) bt
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=11, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
#1 0x00007fce2d76fe8f in __pthread_kill_internal (signo=11, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2 0x00007fce2d720fb2 in __GI_raise (sig=sig@entry=11) at ../sysdeps/posix/raise.c:26
#3 0x00007fce2dafc60c in core_handler (signo=11, siginfo=0x7fff3c35afb0, context=<optimized out>) at ../lib/sigevent.c:248
#4 <signal handler called>
#5 zebra_vxlan_macvlan_up (ifp=ifp@entry=0x559dccb125e0) at ../zebra/zebra_vxlan.c:5166
#6 0x0000559dcb931560 in if_up (ifp=0x559dccb125e0, install_connected=install_connected@entry=true) at ../zebra/interface.c:995
#7 0x0000559dcb93301d in zebra_if_dplane_ifp_handling (ctx=<optimized out>) at ../zebra/interface.c:2160
#8 zebra_if_dplane_result (ctx=<optimized out>) at ../zebra/interface.c:2249
#9 0x0000559dcb990a69 in rib_process_dplane_results (thread=<optimized out>) at ../zebra/zebra_rib.c:4936
#10 0x00007fce2db0eca1 in event_call (thread=thread@entry=0x7fff3c35bd60) at ../lib/event.c:2011
#11 0x00007fce2daba710 in frr_run (master=0x559dcc87f6f0) at ../lib/libfrr.c:1214
#12 0x0000559dcb927566 in main (argc=8, argv=0x7fff3c35c088) at ../zebra/main.c:512

@louis-6wind
Contributor

then

frame 5
list

@joolli
Author

joolli commented Feb 28, 2024

(gdb) frame 5
#5 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
44 in ./nptl/pthread_kill.c
(gdb) list
39 in ./nptl/pthread_kill.c

@louis-6wind
Contributor

not the same backtrace

@joolli
Author

joolli commented Feb 28, 2024

(gdb) bt
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=11, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
#1 0x00007fce2d76fe8f in __pthread_kill_internal (signo=11, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2 0x00007fce2d720fb2 in __GI_raise (sig=sig@entry=11) at ../sysdeps/posix/raise.c:26
#3 0x00007fce2dafc60c in core_handler (signo=11, siginfo=0x7fff3c35afb0, context=<optimized out>) at ../lib/sigevent.c:248
#4 <signal handler called>
#5 zebra_vxlan_macvlan_up (ifp=ifp@entry=0x559dccb125e0) at ../zebra/zebra_vxlan.c:5166
#6 0x0000559dcb931560 in if_up (ifp=0x559dccb125e0, install_connected=install_connected@entry=true) at ../zebra/interface.c:995
#7 0x0000559dcb93301d in zebra_if_dplane_ifp_handling (ctx=<optimized out>) at ../zebra/interface.c:2160
#8 zebra_if_dplane_result (ctx=<optimized out>) at ../zebra/interface.c:2249
#9 0x0000559dcb990a69 in rib_process_dplane_results (thread=<optimized out>) at ../zebra/zebra_rib.c:4936
#10 0x00007fce2db0eca1 in event_call (thread=thread@entry=0x7fff3c35bd60) at ../lib/event.c:2011
#11 0x00007fce2daba710 in frr_run (master=0x559dcc87f6f0) at ../lib/libfrr.c:1214
#12 0x0000559dcb927566 in main (argc=8, argv=0x7fff3c35c088) at ../zebra/main.c:512
(gdb) frame 5
#5 zebra_vxlan_macvlan_up (ifp=ifp@entry=0x559dccb125e0) at ../zebra/zebra_vxlan.c:5166
5166 ../zebra/zebra_vxlan.c: No such file or directory.
(gdb) list
5161 in ../zebra/zebra_vxlan.c

@louis-6wind
Contributor

Can you run these in frame 5:

p *ifp
p *zif
p link_ifp
p *link_ifp

@joolli
Author

joolli commented Feb 28, 2024

(gdb) p *ifp
$1 = {name_entry = {rbt_parent = 0x559dccb12c20, rbt_left = 0x0, rbt_right = 0x0, rbt_color = 0}, index_entry = {rbt_parent = 0x0, rbt_left = 0x559dccbf2730, rbt_right = 0x559dccb12c40, rbt_color = 0}, 
  name = "eth0", '\000' <repeats 11 times>, ifindex = 73, oldifindex = 0, link_ifindex = 0, status = 5 '\005', flags = 69699, metric = 0, speed = 25000, txqlen = 1000, mtu = 9000, mtu6 = 9000, ll_type = ZEBRA_LLT_ETHER, 
  hw_addr = "\000\026>)\b}", '\000' <repeats 13 times>, hw_addr_len = 6, bandwidth = 0, link_params = 0x0, desc = 0x0, connected = {{dh = {hitem = {next = 0x559dccbf3fa0, prev = 0x559dccbf3fa0}, count = 1}}}, 
  nbr_connected = 0x559dccb12700, info = 0x559dccb12730, ptm_enable = 0 '\000', ptm_status = 2 '\002', node = 0x559dccb12ba0, vrf = 0x559dccae5eb0, configured = false, qobj_node = {nid = 6021160604443149314, nodehash = {hi = {
        next = 0x559dcccf6860, hashval = 325649410}}, type = 0x7fce2db47220 <qobj_t_interface>}}
(gdb) p *zif
$2 = {ifp = 0x559dccb125e0, flags = 0, shutdown = 0 '\000', multicast = 0 '\000', mpls = false, mpls_config = 0 '\000', linkdown = false, linkdownv6 = false, v4mcast_on = false, v6mcast_on = false, rtadv_enable = 0 '\000', 
  ipv4_subnets = 0x559dccb12a00, nhg_dependents = {rr = {rbt_root = 0x0, count = 0}}, up_count = 1, up_last = "2024/02/28 11:43:52.50", '\000' <repeats 17 times>, down_count = 1, 
  down_last = "2024/02/28 11:39:24.93", '\000' <repeats 17 times>, rtadv = {AdvSendAdvertisements = 1, MaxRtrAdvInterval = 10000, MinRtrAdvInterval = 198000, AdvIntervalTimer = 10000, AdvManagedFlag = 0, lastadvmanagedflag = {
      tv_sec = 0, tv_usec = 0}, AdvOtherConfigFlag = 0, lastadvotherconfigflag = {tv_sec = 0, tv_usec = 0}, AdvLinkMTU = 0, AdvReachableTime = 0, lastadvreachabletime = {tv_sec = 0, tv_usec = 0}, AdvRetransTimer = 0, 
    lastadvretranstimer = {tv_sec = 0, tv_usec = 0}, AdvCurHopLimit = 64, lastadvcurhoplimit = {tv_sec = 0, tv_usec = 0}, AdvDefaultLifetime = -1, prefixes = {{rr = {rbt_root = 0x0, count = 0}}}, AdvHomeAgentFlag = 0, 
    HomeAgentPreference = 0, HomeAgentLifetime = -1, AdvIntervalOption = 0, DefaultPreference = 0, AdvRDNSSList = 0x559dccb129a0, AdvDNSSLList = 0x559dccb129d0, UseFastRexmit = true, inFastRexmit = 1 '\001', ra_configured = 1 '\001', 
    NumFastReXmitsRemain = 4}, ra_sent = 79, ra_rcvd = 5, irdp = 0x0, ptm_enable = 0 '\000', zif_type = ZEBRA_IF_MACVLAN, zif_slave_type = ZEBRA_IF_SLAVE_NONE, l2info = {br = {bridge = {vlan_aware = 0 '\000', br_zif = 0x0, 
        vlan_table = 0x0}}, vl = {vid = 0}, vxl = {vni_info = {iftype = 0, {vni = {vni = 0, access_vlan = 0, mcast_grp = {s_addr = 0}, flags = 0}, vni_table = 0x0}}, vtep_ip = {s_addr = 0}, ifindex_link = 0, link_nsid = 0}, gre = {
      vtep_ip = {s_addr = 0}, vtep_ip_remote = {s_addr = 0}, ikey = 0, okey = 0, ifindex_link = 0, link_nsid = 0}}, brslave_info = {bridge_ifindex = 0, br_if = 0x0, ns_id = 0}, bondslave_info = {bond_ifindex = 0, bond_if = 0x0}, 
  bond_info = {mbr_zifs = 0x0}, es_info = {sysmac = {octet = "\000\000\000\000\000"}, lid = 0, esi = {val = "\000\000\000\000\000\000\000\000\000"}, df_pref = 32767, flags = 0 '\000', es = 0x0}, vlan_bitmap = {data = 0x0, n = 0, 
    m = 0}, protodown_rc = 0, mac_list = 0x0, link_nsid = 0, link_ifindex = 72, link = 0x0, speed_update_count = 0 '\000', speed_update = 0x0, v6_2_v4_ll_neigh_entry = false, neigh_mac = "\324\231l\020\033\205", v6_2_v4_ll_addr6 = {
    __in6_u = {__u6_addr8 = "\376\200\000\000\000\000\000\000\326\231l\000\024\020\033\205", __u6_addr16 = {33022, 0, 0, 0, 39382, 108, 4116, 34075}, __u6_addr32 = {33022, 0, 7117270, 2233143316}}}, desc = 0x0}
(gdb) p link_ifp
$3 = (struct interface *) 0x0
(gdb) p *link_ifp
Cannot access memory at address 0x0

@louis-6wind
Contributor

louis-6wind commented Feb 28, 2024

and

p zif->link_nsid
p zif->link

@joolli
Author

joolli commented Feb 28, 2024

(gdb) p zif->link_nsid
$4 = 0
(gdb) p zif->link
$5 = (struct interface *) 0x0

@louis-6wind
Contributor

p zif->link

@louis-6wind louis-6wind reopened this Feb 28, 2024
@louis-6wind
Contributor

louis-6wind commented Feb 28, 2024

For some reason, link_nsid is not set. Try this patch instead: 7861f4f
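
The gdb values above (`link_nsid = 0`, `link_ifindex = 72`, `link = 0x0`) suggest why a guard keyed on the namespace ID could miss this case while a guard keyed on the pointer would not. The sketch below compares the two. It rests on assumptions the thread does not confirm: that the earlier fix tested `link_nsid`, and that 0 means "same namespace". `struct link_state`, `SKETCH_NSID_LOCAL`, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of the three fields printed in gdb. */
struct link_state {
	const void *link;   /* resolved link-interface, NULL if unknown */
	int link_ifindex;   /* ifindex reported for the link-interface */
	uint32_t link_nsid; /* namespace id reported for the link */
};

/* ASSUMPTION: value meaning "link is in the same namespace". */
#define SKETCH_NSID_LOCAL 0u

/* Guard keyed on the namespace id: skip only if the nsid looks foreign. */
bool skip_by_nsid(const struct link_state *s)
{
	assert(s != NULL);
	return s->link_nsid != SKETCH_NSID_LOCAL;
}

/* Guard keyed on the pointer: skip whenever the link was not resolved. */
bool skip_by_pointer(const struct link_state *s)
{
	assert(s != NULL);
	return s->link == NULL;
}
```

For the state in the gdb output, `skip_by_nsid` does not trigger because the nsid is 0, while `skip_by_pointer` does, which matches the observation that the first fix still crashed.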

@joolli
Author

joolli commented Feb 28, 2024

Is it a part of the fix-macvlan-crash branch?

@louis-6wind
Contributor

nope but you can apply the patch manually

@joolli
Author

joolli commented Feb 28, 2024

Compiled with the patch applied and I do not get a crash now when the macvlan interface comes up

@louis-6wind
Contributor

fixed in #15010
