Closed
Labels
triage (Needs further investigation)
Description
Honestly, I don't understand exactly what happened; I can only attach the relevant logs: frr-bgpd.txt
Version
FRRouting 10.2.1 (redacted) on Linux(5.14.0-427.22.1.el9_4.x86_64).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
configured with:
'--build=x86_64-redhat-linux-gnu' '--host=x86_64-redhat-linux-gnu' '--program-prefix=' '--disable-dependency-tracking' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--datadir=/usr/share' '--includedir=/usr/include' '--libdir=/usr/lib64' '--libexecdir=/usr/libexec' '--sharedstatedir=/var/lib' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--sbindir=/usr/lib/frr' '--sysconfdir=/etc' '--localstatedir=/var' '--disable-static' '--disable-werror' '--enable-multipath=256' '--enable-vtysh' '--enable-ospfclient' '--enable-ospfapi' '--enable-rtadv' '--enable-ldpd' '--enable-pimd' '--enable-pim6d' '--enable-pbrd' '--enable-nhrpd' '--enable-eigrpd' '--enable-babeld' '--enable-vrrpd' '--enable-user=frr' '--enable-group=frr' '--enable-vty-group=frrvty' '--enable-fpm' '--enable-watchfrr' '--disable-bgp-vnc' '--enable-isisd' '--enable-rpki' '--enable-bfdd' '--enable-pathd' '--disable-grpc' '--enable-snmp' 'build_alias=x86_64-redhat-linux-gnu' 'host_alias=x86_64-redhat-linux-gnu' 'PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig' 'CC=gcc' 'CXX=g++' 'LT_SYS_LIBRARY_PATH=/usr/lib64:'
How to reproduce
N/A
Expected behavior
When bgpd is unresponsive and does not exit after kill -15 (SIGTERM), watchfrr should issue kill -9 (SIGKILL) once the timeout expires.
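The escalation expected here can be illustrated with a minimal Python sketch. This is not watchfrr's actual implementation. The function name `stop_with_escalation` and the `term_timeout` parameter are hypothetical and exist only to show the "SIGTERM first, SIGKILL after a timeout" pattern.

```python
import subprocess


def stop_with_escalation(proc: subprocess.Popen, term_timeout: float) -> str:
    """Ask a child process to stop, escalating to SIGKILL if it does not exit.

    Hypothetical helper for illustration only. Returns the name of the
    last signal that was sent.
    """
    proc.terminate()  # sends SIGTERM on POSIX (the "kill -15" step)
    try:
        proc.wait(timeout=term_timeout)
        return "SIGTERM"
    except subprocess.TimeoutExpired:
        # The process ignored or could not act on SIGTERM within the timeout.
        proc.kill()  # sends SIGKILL (the "kill -9" step), which cannot be caught
        proc.wait()  # reap the child so it does not linger as a zombie
        return "SIGKILL"
```

In the situation reported here, the second branch is the one that apparently never ran: bgpd stayed alive until SIGKILL was sent by hand.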
Actual behavior
bgpd became unresponsive, and watchfrr did not kill it for two hours.
Additional context
This was logged just before bgpd became unresponsive:
Apr 8, 2025 @ 07:19:58.706 [VH6Z7-MNSN0][EC 33554511] 2001:7b8:62b:1:0:d4ff:fe72:7848(Unknown) has not made any SendQ progress for 1 holdtime (9s), peer overloaded?
Apr 8, 2025 @ 07:20:04.448 [VH6Z7-MNSN0][EC 33554511] 2001:7b8:62b:1:0:d4ff:fe72:7848(Unknown) has not made any SendQ progress for 1 holdtime (9s), peer overloaded?
Apr 8, 2025 @ 07:20:07.355 [JQ5A9-TEQYM][EC 33554512] 2001:7b8:62b:1:0:d4ff:fe72:7848(Unknown) has not made any SendQ progress for 2 holdtimes (18s), terminating session
I resolved the problem by killing bgpd manually with SIGKILL.
Relevant backtrace (produced on systemctl restart frr):
Apr 8, 2025 @ 09:57:43.803 Received signal 11 at 1744106263 (si_addr 0x2d0, PC 0x7f441988b5f2); aborting...
Apr 8, 2025 @ 09:57:43.804 /lib64/libfrr.so.0(zlog_backtrace_sigsafe+0x71) [0x7f4419cceeb1]
Apr 8, 2025 @ 09:57:43.804 /lib64/libfrr.so.0(zlog_signal+0xf5) [0x7f4419ccf0b5]
Apr 8, 2025 @ 09:57:43.804 /lib64/libfrr.so.0(+0x109e45) [0x7f4419d09e45]
Apr 8, 2025 @ 09:57:43.804 /lib64/libc.so.6(+0x3e6f0) [0x7f441983e6f0]
Apr 8, 2025 @ 09:57:43.804 /lib64/libc.so.6(+0x8b5f2) [0x7f441988b5f2]
Apr 8, 2025 @ 09:57:43.804 /lib64/libfrr.so.0(+0xb69fd) [0x7f4419cb69fd]
Apr 8, 2025 @ 09:57:43.805 /lib64/libfrr.so.0(frr_pthread_stop_all+0x57) [0x7f4419cb5087]
Apr 8, 2025 @ 09:57:43.805 /lib64/libfrr.so.0(frr_pthread_finish+0x1d) [0x7f4419cb68dd]
Apr 8, 2025 @ 09:57:43.805 /lib64/libfrr.so.0(frr_fini+0x78) [0x7f4419cc5b78]
Apr 8, 2025 @ 09:57:43.805 /usr/lib/frr/bgpd(sigint+0x20b) [0x55cb8d69d09b]
Apr 8, 2025 @ 09:57:43.805 /lib64/libfrr.so.0(frr_sigevent_process+0x53) [0x7f4419d08b43]
Apr 8, 2025 @ 09:57:43.805 /lib64/libfrr.so.0(event_fetch+0x6b5) [0x7f4419d1cc85]
Apr 8, 2025 @ 09:57:43.805 /lib64/libfrr.so.0(frr_run+0xe3) [0x7f4419cc5933]
Apr 8, 2025 @ 09:57:43.805 /usr/lib/frr/bgpd(main+0x3f2) [0x55cb8d6943e2]
Apr 8, 2025 @ 09:57:43.806 /lib64/libc.so.6(+0x29590) [0x7f4419829590]
Apr 8, 2025 @ 09:57:43.806 /lib64/libc.so.6(__libc_start_main+0x80) [0x7f4419829640]
Apr 8, 2025 @ 09:57:43.806 /usr/lib/frr/bgpd(_start+0x25) [0x55cb8d695125]
gdb symbols:
Reading symbols from /usr/lib/debug/usr/lib64/libfrr.so.0.0.0-10.2.1-01.el9.x86_64.debug...
(gdb) info symbol 0x109e45
core_handler + 181 in section .text of /usr/lib64/libfrr.so.0.0.0
(gdb) info symbol 0xb69fd
fpt_halt + 61 in section .text of /usr/lib64/libfrr.so.0.0.0
gdb /lib64/libc.so.6
(gdb) info symbol 0x3e6f0
__restore_rt in section .text of /usr/lib64/libc.so.6
(gdb) info symbol 0x8b5f2
__pthread_clockjoin_ex + 34 in section .text of /usr/lib64/libc.so.6
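The manual lookups above can be scripted for longer backtraces. The following is a small Python sketch, not an FRR tool. It pulls the frames that were logged without a symbol name (the `module(+0xOFFSET)` form) and prints a matching `gdb` `info symbol` command for each. The helper names `unresolved_frames` and `gdb_commands` are hypothetical, and the sketch assumes gdb can find debug symbols for the module path shown in the log.

```python
import re

# Matches frames logged without a symbol, e.g.
#   /lib64/libfrr.so.0(+0x109e45) [0x7f4419d09e45]
FRAME_RE = re.compile(
    r"(?P<module>/\S+)\(\+(?P<offset>0x[0-9a-f]+)\)\s+\[0x[0-9a-f]+\]"
)


def unresolved_frames(log_text):
    """Return (module, offset) pairs for frames that have no symbol name."""
    return [(m["module"], m["offset"]) for m in FRAME_RE.finditer(log_text)]


def gdb_commands(frames):
    """Build one non-interactive gdb invocation per unresolved frame."""
    return [
        f'gdb -batch -ex "info symbol {offset}" {module}'
        for module, offset in frames
    ]


if __name__ == "__main__":
    sample = (
        "Apr 8, 2025 @ 09:57:43.804 /lib64/libfrr.so.0(zlog_signal+0xf5) [0x7f4419ccf0b5]\n"
        "Apr 8, 2025 @ 09:57:43.804 /lib64/libfrr.so.0(+0x109e45) [0x7f4419d09e45]\n"
        "Apr 8, 2025 @ 09:57:43.804 /lib64/libc.so.6(+0x3e6f0) [0x7f441983e6f0]\n"
    )
    for cmd in gdb_commands(unresolved_frames(sample)):
        print(cmd)
```

Frames that already carry a symbol, such as `zlog_signal+0xf5`, are skipped on purpose, since they need no lookup.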
Checklist
- I have searched the open issues for this bug.
- I have not included sensitive information in this report.