-
Notifications
You must be signed in to change notification settings - Fork 1.1k
[bug]: bridge_channel.c: Stream topology change amplification with multiple layers of Local channels #530
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Have you tested with the inclusion of the patch from #403? |
Or if you revert 315eb551dbd18ecd424a2f32179d4c1f6f6edd26 does it resolve it? |
I applied the patch from #403 and it resolved my minimally reproducible example but not the actual issue that I stumbled into in the first place, so now I'll have to come up with a new, more complex example. This happens with all channel technologies, not just PJSIP, so I think it's safe to say it is not resolved by either of the above. I've narrowed down now I can make the issue reproduce or not reproduce based on replacing |
Okay, I figured out how to get it to still reproduce with #403. Just add another layer of dialing a Local channel. So add:
And then change the Dial to:
|
Try reverting the commit @jcolp mentioned above. We've had another report of this (and can reproduce it) and narrowed it down to that commit from November 2022. |
@InterLinked1 We were going to cut release candidates today so if you can test the revert and it fixes your issue, we can include the revert in the RCs. |
This seems to be somewhat subtle. The example I showed actually didn't reproduce with DAHDI so maybe you guys are right that it's that PJSIP thing. However, I reverted lines 1938-1946 in res_pjsip_session.c, and the issue still happens with that modified reproducible example. Didn't bother with the |
The res_pjsip_session change is not the huge change, the change to channel.c is. |
I'm not following. Did you revert the changes in channel.c from 315eb55? |
Ah, I'm dumb. I saw that but it looked like the same code from #403 so I assumed I already had it and #403 was just reverting that bit. Same code but different file. Yes, this resolves the new example and the original issue - thanks for the catch! |
This reverts commit 315eb55. Over the past year, we've had several reports of "topology storms" occurring where 2 external facing channels connected by one or more local channels and bridges will get themseles in a state where they continually send each other topology change requests. This usually manifests itself in no-audio calls and a flood of "Exceptionally long queue length" messages. It appears that this commit is the cause so we're reverting it for now until we can determine a more appropriate solution. Resolves: asterisk#530
This reverts commit 315eb55. Over the past year, we've had several reports of "topology storms" occurring where 2 external facing channels connected by one or more local channels and bridges will get themselves in a state where they continually send each other topology change requests. This usually manifests itself in no-audio calls and a flood of "Exceptionally long queue length" messages. It appears that this commit is the cause so we're reverting it for now until we can determine a more appropriate solution. Resolves: asterisk#530
It's not strictly required, but does no harm in still being present. |
This reverts commit 315eb55. Over the past year, we've had several reports of "topology storms" occurring where 2 external facing channels connected by one or more local channels and bridges will get themselves in a state where they continually send each other topology change requests. This usually manifests itself in no-audio calls and a flood of "Exceptionally long queue length" messages. It appears that this commit is the cause so we're reverting it for now until we can determine a more appropriate solution. Resolves: #530
This reverts commit 315eb55. Over the past year, we've had several reports of "topology storms" occurring where 2 external facing channels connected by one or more local channels and bridges will get themselves in a state where they continually send each other topology change requests. This usually manifests itself in no-audio calls and a flood of "Exceptionally long queue length" messages. It appears that this commit is the cause so we're reverting it for now until we can determine a more appropriate solution. Resolves: #530
This reverts commit 315eb55. Over the past year, we've had several reports of "topology storms" occurring where 2 external facing channels connected by one or more local channels and bridges will get themselves in a state where they continually send each other topology change requests. This usually manifests itself in no-audio calls and a flood of "Exceptionally long queue length" messages. It appears that this commit is the cause so we're reverting it for now until we can determine a more appropriate solution. Resolves: #530
This reverts commit 315eb55. Over the past year, we've had several reports of "topology storms" occurring where 2 external facing channels connected by one or more local channels and bridges will get themselves in a state where they continually send each other topology change requests. This usually manifests itself in no-audio calls and a flood of "Exceptionally long queue length" messages. It appears that this commit is the cause so we're reverting it for now until we can determine a more appropriate solution. Resolves: #530
This reverts commit 315eb55. Over the past year, we've had several reports of "topology storms" occurring where 2 external facing channels connected by one or more local channels and bridges will get themselves in a state where they continually send each other topology change requests. This usually manifests itself in no-audio calls and a flood of "Exceptionally long queue length" messages. It appears that this commit is the cause so we're reverting it for now until we can determine a more appropriate solution. Resolves: #530 (cherry picked from commit b71e338)
This reverts commit 315eb55. Over the past year, we've had several reports of "topology storms" occurring where 2 external facing channels connected by one or more local channels and bridges will get themselves in a state where they continually send each other topology change requests. This usually manifests itself in no-audio calls and a flood of "Exceptionally long queue length" messages. It appears that this commit is the cause so we're reverting it for now until we can determine a more appropriate solution. Resolves: #530 (cherry picked from commit 4715c1b)
This reverts commit 315eb55. Over the past year, we've had several reports of "topology storms" occurring where 2 external facing channels connected by one or more local channels and bridges will get themselves in a state where they continually send each other topology change requests. This usually manifests itself in no-audio calls and a flood of "Exceptionally long queue length" messages. It appears that this commit is the cause so we're reverting it for now until we can determine a more appropriate solution. Resolves: #530 (cherry picked from commit c31cd32)
I'm having an issue on Asterisk 20.8.1 with the same symptoms as originally described in this issue, but I'm not quite sure if it has the same root cause, so I wanted to check here first before opening a new issue. I am experiencing:
My setup is a local channel that is recorded on both sides (and each side is in a bridge with another channel using PJSIP), and this seems to be enough to trigger the issue. I'm not using video in any way. Basically, my issue is exactly the same as this 2021 post, but that post had no replies, and that is literally the only thing I can find on Google for "Indicated Video Update". Does this sound like the same issue that was reported here, or like a separate issue? |
It's most likely a separate issue. |
Thank you, created issue #780 for it |
Severity
Major
Versions
21.0.0
Components/Modules
bridge_channel.c
Operating Environment
Debian 12
Frequency of Occurrence
Constant
Issue Description
I dug into a specific instance of the infamous
Extremely long queue length queuing...
warnings that I was seeing and found that this is due to an amplification of stream topology change control frames (AST_CONTROL_STREAM_TOPOLOGY_CHANGED
) being propagated back and forth between the channels. By this, I mean if I analyze how many frames are queued on the channel, it grows to oblivion, quickly reaching thousands of queued frames within seconds and this will keep growing as long as the channels are bridged.As far as these kinds of things go, they make the bridge completely unusable, and result in a generally degraded system due to high CPU usage - and left unchecked, this is technically a resource leak until channels are hung up since frames will accumulate forever - faster than they are being consumed. If users don't do that manually, then I suspect Asterisk would, in time, use all available memory. Based on my experience, I suspect a number of instances of this error message might have the same underlying issue.
I've put together a sample dialplan that reproduces this as well as included logs from running against that, with some debug messages added to show the continuous stream topology change frames. This happens precisely when dialing a queue agent that is via a local channel and the agent does an attended transfer, and when the channel gets bridged.
The issue only happens when using a local channel; however,
core_unreal
isn't constantly triggering the stream topology change frames, that might be happening just in the bridging core.(Either
Queue
orDial
can be used, though Dial is simpler for reproduction purposes).Caller A calls B, and B initiates an attended transfer, e.g. using *2 (starting at
test-operator-transfer,1,1
)The issue happens with both
Wait
andBridgeWait
, i.e. regardless of if the channel is masqueraded.Example below where party A dials extension 123, triggering the issue when B (
PJSIP/Polycom5
in this example) initiates the attended transfer, more precisely 3 seconds afterwards, when the channels are bridged.I went ahead and reproduced it using the following on a second system, using DAHDI instead of PJSIP for both endpoints, without modifications to log bridging changes but indeed I do get hundreds of these messages, so this issue is very easy to reproduce consistently:
DEBUG[136258][C-000001e4]: chan_dahdi.c:9344 dahdi_indicate: Requested indication 36 on channel DAHDI/18-1
GitHub complains the relevant log output is too long, so I've linked it here: https://paste.interlinked.us/zl4geodvat.txt
Relevant log output
No response
Asterisk Issue Guidelines
The text was updated successfully, but these errors were encountered: