8.2.2 stuck with high number of peers/routes with RPKI #10826
Comments
Can you provide at least a configuration?
Or logs? This is a pretty useless bug report.
2022/03/19 02:40:39 STATIC: [MRN6F-AYZC4] Terminating on signal
Some IP addresses are modified or hidden!
Is this the whole log?
2022/03/22 20:50:01 ZEBRA: [SWQK6-6JY63][EC 4043309074] 0:254:2602:fed3:7021::/48: Failed to enqueue dataplane install
It's too hard to figure out what you are trying to show when you dump information this way. I've asked you to use the template repeatedly and you never do it. You need to use the template in order for others to make sense of the issues you're reporting.
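For anyone triaging logs like the two lines quoted above, here is a minimal sketch of splitting an FRR log line into timestamp, daemon, log ID, optional error code and message. The regular expression is an assumption derived only from the two lines in this thread, not from FRR's documented log format, and `parse_frr_line` is a hypothetical helper name.

```python
import re

# Assumed layout, inferred from the two log lines quoted in this thread:
#   2022/03/22 20:50:01 ZEBRA: [SWQK6-6JY63][EC 4043309074] message
# The "[EC n]" part is optional (the STATIC line has none).
LOG_RE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<daemon>[A-Z0-9_]+): "
    r"\[(?P<log_id>[A-Z0-9]{5}-[A-Z0-9]{5})\]"
    r"(?:\[EC (?P<ec>\d+)\])? "
    r"(?P<msg>.*)$"
)


def parse_frr_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = LOG_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the error code to an int when present, keep None otherwise.
    fields["ec"] = int(fields["ec"]) if fields["ec"] is not None else None
    return fields


if __name__ == "__main__":
    sample = (
        "2022/03/22 20:50:01 ZEBRA: [SWQK6-6JY63][EC 4043309074] "
        "0:254:2602:fed3:7021::/48: Failed to enqueue dataplane install"
    )
    print(parse_frr_line(sample))
```

Grouping parsed lines by `log_id` or `ec` makes it easier to see which message repeats before a hang, instead of reading a raw dump.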
The same report was posted to the mailing list:

Hi everyone,

Just following up on my previous note about BGPD hanging in FRR 8.2.2. I

As background, I've got around 60 BGP feeds total in 30 different

This hang seems to have a period of 5-7 days. Using FRR 8.2.2 on Ubuntu

The latest hang earlier today allowed a colleague to grab debug info

/var/log/frr/frr.log shows entries like this:
Apr 2 11:46:42 frr watchfrr[52904]: [T58XM-TP956][EC 268435457] bgpd
which just repeat every 10 minutes or so.

A few hours earlier I was getting:
Apr 1 22:53:19 frr bgpd[52925]: [YZRX4-ZXG0C][EC 100663315] Thread
Apr 1 23:24:34 frr bgpd[52925]: [YZRX4-ZXG0C][EC 100663315] Thread

Trying to connect by vtysh prints message of day, but never a command

The only way out is a kill -9 of the BGPD process, followed by a

The process stack for bgpd shows:
root@frr:~# cat /proc/52925/stack

Thread debugging shows:
[Thread debugging using libthread_db enabled]

I've got about 2.5Mbytes of strace which I'll happily unicast to whoever

BTW, this is what's running (after I killed and restarted), including
1707406 ? S<s 0:02 /usr/lib/frr/watchfrr -d -F traditional

Any ideas? I'd hate to revert to 8.1 but...

philip
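The `/proc/<pid>/stack` capture mentioned in the mail above could be scripted so that the same data is collected on every hang. The following is only a sketch: `collect_proc_snapshot` is a hypothetical helper, the choice of `status` and `wchan` alongside `stack` is an assumption about what is useful, and reading `stack` normally requires root.

```python
from pathlib import Path


def collect_proc_snapshot(pid, proc_root="/proc",
                          names=("stack", "status", "wchan")):
    """Read a few procfs entries for one process.

    Returns a dict mapping entry name to its text. Entries that cannot
    be read (missing file, no permission) are recorded with a short
    marker instead of raising, so a partial snapshot is still saved.
    """
    base = Path(proc_root) / str(pid)
    snapshot = {}
    for name in names:
        try:
            snapshot[name] = (base / name).read_text(errors="replace")
        except OSError as exc:
            snapshot[name] = f"<unreadable: {exc.__class__.__name__}>"
    return snapshot


if __name__ == "__main__":
    import sys

    # Usage: python3 snapshot.py <pid of bgpd>
    for entry, text in collect_proc_snapshot(int(sys.argv[1])).items():
        print(f"== {entry} ==")
        print(text)
```

Running this from cron with the bgpd PID and saving the output with a timestamp would give a before/after comparison once the daemon hangs.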
I'll try to replicate and work on this.
@liuxyon could you enable
Since version 8.2.2 cannot be used, we have all gone back to version 8.1.
I can't replicate this with 100k routes and two full RPKI validators (cache servers), but I just found a memory leak (which might be the cause; I don't know, which is why I asked for more details).
@ton31337 I'm still staying with 8.2.2 and happy to help troubleshoot this. Will get you
@pfsinoz cool, let me know when you have more details (as I described in a previous comment).
@ton31337 is the stack trace I have from the last hang of any use at all?
At least it's quite clear that it's RPKI-related...
BTW, just for the record, this is what things look like with "situation normal":
Now we just have to wait for the next hang, probably in 3-4 days' time.
@pfsinoz maybe you have more details about this?
@ton31337 Frustratingly it has not hung since! I'm still waiting, still gathering the data every hour. I've had to restart the system once for another reason, but still no hang since. This is the latest snapshot, from about 40 minutes ago:
I've had a couple of instances where the
@ton31337 just a quick update... FRR has been up and running for the last 12 days now and has not exhibited the hang issue. The full BGP feeds do pause for about 20-30 seconds when I do a "sh ip bgp" on them, but I can replicate that on other FRR versions too. I'm left wondering if there were any validator issues that perhaps led to "funny" VRPs being sent to FRR, but I can't even think what those might be. Just weird that the issue has seemingly gone away all by itself. I'm happy to test new/updated code if need be.
I'm currently facing this exact issue: FRR continues to crash and does not recover.
Running FRR v8.2.2 (using the Ubuntu 20.04 and Debian 11 builds) on an Ubuntu 21.10 system: the routing system gets stuck for no apparent reason, which brings FRR down. I haven't found the cause yet; is there any way to find out why?
I'd also like to request FRR releases for the latest Ubuntu versions, such as Ubuntu 21.10 and 21.04.