Skip to content
This repository has been archived by the owner on Jul 12, 2023. It is now read-only.

Crash from assertion in agent.c #268

Closed
brstgt opened this issue Jul 2, 2018 · 7 comments
Closed

Crash from assertion in agent.c #268

brstgt opened this issue Jul 2, 2018 · 7 comments

Comments

@brstgt
Copy link

brstgt commented Jul 2, 2018

KMS Version: dev-master, 2018-06-29

1:51:15.152657484     8 0x7fc988136c00 WARN                 srtpdec gstsrtpdec.c:1149:gst_srtp_dec_decode_buffer:<srtp-decoder:rtp_sink> Unable to unprotect buffer (unprotect failed code 9)
1:51:15.152686393     8 0x7fc988136c00 WARN                 srtpdec gstsrtpdec.c:1178:gst_srtp_dec_decode_buffer:<srtp-decoder> Other error, dropping
1:51:15.372441789     8 0x7fcde4f62a80 FIXME                default gstutils.c:3766:gst_pad_create_stream_id_internal:<nicesrc39428:src> Creating random stream-id, consider implementing a deterministic way of creating a stream-id
1:51:15.586825941     8 0x7fcc3c214320 WARN               rtpsource rtpsource.c:1147:update_receiver_stats: duplicate or reordered packet (seqnr 62374, expected 62400)
1:51:15.590294716     8 0x7fcc3c214320 WARN               rtpsource rtpsource.c:1147:update_receiver_stats: duplicate or reordered packet (seqnr 62375, expected 62400)
1:51:15.590352733     8 0x7fcc3c214320 WARN               rtpsource rtpsource.c:1147:update_receiver_stats: duplicate or reordered packet (seqnr 62376, expected 62400)
1:51:15.596358116     8 0x7fcc3c214320 WARN               rtpsource rtpsource.c:1147:update_receiver_stats: duplicate or reordered packet (seqnr 62377, expected 62400)
1:51:15.609937376     8 0x7fc118492370 WARN                  rtpbin gstrtpbin.c:1156:stream_set_ts_offset:<rtpbin32704> offset unusually large, ignoring
**
libnice:ERROR:agent.c:2342:agent_signal_component_state_change: assertion failed: (TRANSITION (DISCONNECTED, FAILED) || TRANSITION (GATHERING, FAILED) || TRANSITION (CONNECTING, FAILED) || TRANSITION (CONNECTED, FAILED) || TRANSITION (READY, FAILED) || TRANSITION (DISCONNECTED, GATHERING) || TRANSITION (GATHERING, CONNECTING) || TRANSITION (CONNECTING, CONNECTED) || TRANSITION (CONNECTED, READY) || TRANSITION (READY, CONNECTED) || TRANSITION (FAILED, CONNECTING) || TRANSITION (FAILED, GATHERING) || TRANSITION (DISCONNECTED, CONNECTING))
Aborted (thread 140489430521600, pid 8)
Stack trace:
[__GI_raise]
sysdeps/unix/sysv/linux/raise.c:54
[__GI_abort]
/build/glibc-Cl5G7W/glibc-2.23/stdlib/abort.c:91
[g_assertion_message]
/build/glib2.0-prJhLS/glib2.0-2.48.2/./glib/gtestutils.c:2429
[g_assertion_message_expr]
/build/glib2.0-prJhLS/glib2.0-2.48.2/./glib/gtestutils.c:2453
[agent_signal_component_state_change]
/opt/kurento/agent/agent.c:2353
[priv_map_reply_to_conn_check_request]
/opt/kurento/agent/conncheck.c:3420
[agent_recv_message_unlocked]
/opt/kurento/agent/agent.c:3886
[component_io_cb]
/opt/kurento/agent/agent.c:5181
[socket_source_dispatch]
/build/glib2.0-prJhLS/glib2.0-2.48.2/./gio/gsocket.c:3543
[g_main_dispatch]
/build/glib2.0-prJhLS/glib2.0-2.48.2/./glib/gmain.c:3157
[g_main_context_iterate]
/build/glib2.0-prJhLS/glib2.0-2.48.2/./glib/gmain.c:3840
[g_main_loop_run]
/build/glib2.0-prJhLS/glib2.0-2.48.2/./glib/gmain.c:4033
[gst_nice_src_get_type]
/usr/lib/x86_64-linux-gnu/gstreamer-1.5/libgstnice15.so:0x2F24
[gst_base_src_get_range]
/opt/kurento/libs/gst/base/gstbasesrc.c:2465
[gst_base_src_loop]
/opt/kurento/libs/gst/base/gstbasesrc.c:2737
[gst_task_func]
/opt/kurento/gst/gsttask.c:346
Aborted (core dumped)

Other libraries versions:

ii  gir1.2-gst-plugins-base-1.5            1.8.1.1.xenial~20180213131504.55.7b19cfd  amd64        Description: GObject introspection data for the GStreamer Plugins Base library
ii  gir1.2-gstreamer-1.5                   1.8.1.1.xenial~20180213130525.170.0d6031b amd64        Description: GObject introspection data for the GStreamer library
ii  gstreamer1.5-alsa:amd64                1.8.1.1.xenial~20180213131504.55.7b19cfd  amd64        GStreamer plugin for ALSA
ii  gstreamer1.5-libav:amd64               1.8.2.1.xenial~20180214151106.96.493eee4  amd64        libav plugin for GStreamer
ii  gstreamer1.5-libav-dbg:amd64           1.8.2.1.xenial~20180214151106.96.493eee4  amd64        libav plugin for GStreamer (debug symbols)
ii  gstreamer1.5-nice:amd64                0.1.15.xenial~20180213132444.83.28531a4   amd64        ICE library (GStreamer plugin)
ii  gstreamer1.5-plugins-bad:amd64         1.8.1.1.xenial~20170725164047.100.3db37b1 amd64        GStreamer plugins from the "bad" set
ii  gstreamer1.5-plugins-bad-dbg:amd64     1.8.1.1.xenial~20170725164047.100.3db37b1 amd64        GStreamer plugins from the "bad" set (debug symbols)
ii  gstreamer1.5-plugins-base:amd64        1.8.1.1.xenial~20180213131504.55.7b19cfd  amd64        GStreamer plugins from the "base" set
ii  gstreamer1.5-plugins-base-dbg:amd64    1.8.1.1.xenial~20180213131504.55.7b19cfd  amd64        GStreamer plugins from the "base" set
ii  gstreamer1.5-plugins-good:amd64        1.8.1.1.xenial~20180213133150.112.9ee4248 amd64        GStreamer plugins from the "good" set
ii  gstreamer1.5-plugins-good-dbg:amd64    1.8.1.1.xenial~20180213133150.112.9ee4248 amd64        GStreamer plugins from the "good" set
ii  gstreamer1.5-plugins-ugly:amd64        1.8.1.1.xenial~20180214150540.89.2685b0f  amd64        GStreamer plugins from the "ugly" set
ii  gstreamer1.5-plugins-ugly-dbg:amd64    1.8.1.1.xenial~20180214150540.89.2685b0f  amd64        GStreamer plugins from the "ugly" set (debug symbols)
ii  gstreamer1.5-pulseaudio:amd64          1.8.1.1.xenial~20180213133150.112.9ee4248 amd64        GStreamer plugin for PulseAudio
ii  gstreamer1.5-tools                     1.8.1.1.xenial~20180213130525.170.0d6031b amd64        Tools for use with GStreamer
ii  gstreamer1.5-x:amd64                   1.8.1.1.xenial~20180213131504.55.7b19cfd  amd64        GStreamer plugins for X11 and Pango
ii  kms-chroma                             6.7.1.xenial~20180626115825.10.4086adc    amd64        Kurento Chroma filter
ii  kms-chroma-dbg                         6.7.1.xenial~20180626115825.10.4086adc    amd64        Kurento Chroma filter - Debugging symbols
ii  kms-core                               6.7.3.xenial~20180626112538.24.c171bbd    amd64        Kurento Core module
ii  kms-core-dbg                           6.7.3.xenial~20180626112538.24.c171bbd    amd64        Kurento Core module
ii  kms-crowddetector                      6.7.1.xenial~20180626120735.10.bb14a6a    amd64        Kurento Crowd Detector filter
ii  kms-crowddetector-dbg                  6.7.1.xenial~20180626120735.10.bb14a6a    amd64        Kurento Crowd Detector filter - Debug symbols
ii  kms-elements                           6.7.3.xenial~20180626113357.18.37dbc17    amd64        Kurento Elements module
ii  kms-elements-dbg                       6.7.3.xenial~20180626113357.18.37dbc17    amd64        Kurento Elements module
ii  kms-filters                            6.7.2.xenial~20180626114125.12.c075e3c    amd64        Kurento Filters module
ii  kms-filters-dbg                        6.7.2.xenial~20180626114125.12.c075e3c    amd64        Kurento Filters module
ii  kms-jsonrpc                            6.7.1.xenial~20180613184007.8.23d8b98     amd64        Kurento JSON-RPC library
ii  kms-jsonrpc-dbg                        6.7.1.xenial~20180613184007.8.23d8b98     amd64        Kurento JSON-RPC library
ii  kms-platedetector                      6.7.1.xenial~20180626122320.12.e6d7fc6    amd64        Kurento Plate Detector filter
ii  kms-platedetector-dbg                  6.7.1.xenial~20180626122320.12.e6d7fc6    amd64        Kurento Plate Detector filter - Debug symbols
ii  kms-pointerdetector                    6.7.1.xenial~20180626123120.9.ce39350     amd64        Kurento Pointer Detector filter
ii  kms-pointerdetector-dbg                6.7.1.xenial~20180626123120.9.ce39350     amd64        Kurento Pointer Detector filter - Debug symbols
ii  kmsjsoncpp                             1.6.3.xenial.20180613183930.d78deb7       amd64        Kurento jsoncpp library
ii  kmsjsoncpp-dbg                         1.6.3.xenial.20180613183930.d78deb7       amd64        Kurento jsoncpp library
ii  kmsjsoncpp-dev                         1.6.3.xenial.20180613183930.d78deb7       amd64        Kurento jsonrpc library
ii  kurento-media-server                   6.7.3.xenial~20180626114843.4.e7e0a73     amd64        Kurento Media Server
ii  kurento-media-server-dbg               6.7.3.xenial~20180626114843.4.e7e0a73     amd64        Kurento Media Server
ii  libgstreamer-plugins-bad1.5-0:amd64    1.8.1.1.xenial~20170725164047.100.3db37b1 amd64        GStreamer development files for libraries from the "bad" set
ii  libgstreamer-plugins-base1.5-0:amd64   1.8.1.1.xenial~20180213131504.55.7b19cfd  amd64        GStreamer libraries from the "base" set
ii  libgstreamer-plugins-base1.5-dev       1.8.1.1.xenial~20180213131504.55.7b19cfd  amd64        GStreamer development files for libraries from the "base" set
ii  libgstreamer1.5-0:amd64                1.8.1.1.xenial~20180213130525.170.0d6031b amd64        Core GStreamer libraries and elements
ii  libgstreamer1.5-0-dbg:amd64            1.8.1.1.xenial~20180213130525.170.0d6031b amd64        Core GStreamer libraries and elements
ii  libgstreamer1.5-dev                    1.8.1.1.xenial~20180213130525.170.0d6031b amd64        GStreamer core development files
ii  libnice-dbg:amd64                      0.1.15.xenial~20180213132444.83.28531a4   amd64        ICE library (debugging symbols)
ii  libnice-dev                            0.1.15.xenial~20180213132444.83.28531a4   amd64        ICE library (development files)
ii  libnice10:amd64                        0.1.15.xenial~20180213132444.83.28531a4   amd64        ICE library (shared library)
ii  openh264-gst-plugins-bad-1.5:amd64     1.8.1.1.xenial~20170725164047.100.3db37b1 amd64        GStreamer plugins from openh264

Client libraries

  • Language: <Java, Node.js, JavaScript, homebrewed>
  • Version:

Browsers tested
Add OK or FAIL, along with the version, after browsers where you
have tested this issue:

  • Chrome:
  • Firefox:
  • Native:

System description:
Please describe your setup (where is KMS located, where are the
clients, STUN, TURN...)

What steps will reproduce the problem?
Happened in production, can't tell how to repro

What is the expected result?
No Crash

What happens instead?
Crash

Does it happen with one of the tutorials?
No

Please provide any additional information below.

@brstgt
Copy link
Author

brstgt commented Jul 2, 2018

I had this crash many times today, it seems to happen whenever the server is getting a bit more busy. Unfortunately I don't have more details right now

@j1elo
Copy link
Member

j1elo commented Sep 18, 2018

If this error happens, as an emergency measure it's possible to downgrade libnice version to the one available in Kurento Release repo (libnice version 0.1.13.1.xenial~20170725160546.81.eebfdab in Xenial); the old, "stable" 0.1.13 version has a custom modification from a previous Kurento developer, that removes the assertion (replaces "g_assert" it with a more permissive "g_warn_if_fail"). This modification was lost when updating to latest libnice upstream, as I'm finding out now. I'll make sure this makes its way back to the experimental versions!

If it's possible, please before downgrading try to get libnice debug logs (you'll need to launch KMS on console, the "service init" won't log libnice messages yet). A command such as this should do it:

{
    sudo service kurento-media-server stop
    source /etc/default/kurento-media-server
    export GST_DEBUG="${GST_DEBUG:-3},kmsiceniceagent:5,kmswebrtcsession:5,webrtcendpoint:4"
    export G_MESSAGES_DEBUG="libnice,libnice-stun"
    export NICE_DEBUG="$G_MESSAGES_DEBUG"
    /usr/bin/kurento-media-server
}

libnice, when running with debug enabled, outputs a message such as this:

"Agent %p : stream %u component %u STATE-CHANGE %s -> %s."

(with placeholders filled with actual values). We'd benefit from knowing what's the state change that's causing this crash.

Also for completeness, if possible, also repeat the same procedure with the older version of libnice; it won't crash, but will still print some warning with the exact same format. We'll need to know what are those state changes that are not expected by libnice (see here) to then tell libnice maintainer about this issue (I'll probably open a bug report on your behalf, unless you prefer to do it yourself).

@jmaiquez
Copy link

jmaiquez commented Sep 18, 2018

Hi Juan,

We were able to catch this error twice on the same server:
KMS: 6.7.2~19.g181284d
libnice: 0.1.15.xenial~20180709152002.83.28531a4

I've attached the logs captured when launching KMS on the console.

We were able to reproduce this with a user that was inside a corporate network in France. The crash occurred when that user stopped publishing his stream.

He was nice enough to help us reproduce this, but we no longer have access to him- he was an external speaker for our customer.

It seems that the nature of corporate networks might have something to do with this particular bug.

it won't crash, but will still print some warning

We did not see any logs when using the same KMS version with 0.1.13.1.xenial~20170725160546.81.eebfdab

But there was definitely weird behavior going on:

  • user A broadcasting (us)
  • user B broadcasting (our customer)
  • user C broadcasting(external corporate speaker)
  • we could see & hear B and C
  • user C stops broadcasting
  • no crash, but we could no longer see or hear user B

So .13 still has the issue (as you mentioned), but it simply does not crash kurento. It's a step better than a crash, but still problematic for our customer.

I really hope this helps!

Best regards,
Jorge

1.txt
2.txt

@j1elo
Copy link
Member

j1elo commented Sep 19, 2018

@jmaiquez I've checked your linked debug logs, but it turns out the reason for crash in those two cases is not the assert problem described in this issue report and in your groups message; instead KMS crashed due to the proverbial bug in libnice socket code (Kurento report, libnice report).

They key to differentiate is that the assert has this stack trace:

/lib/x86_64-linux-gnu/libc.so.6:0x35428
[abort]
/lib/x86_64-linux-gnu/libc.so.6:0x3702A
[g_assertion_message]
/build/glib2.0-b4FPyK/glib2.0-2.48.2/./glib/gtestutils.c:2429
[g_assertion_message_expr]
/build/glib2.0-b4FPyK/glib2.0-2.48.2/./glib/gtestutils.c:2453
[agent_signal_component_state_change]
/opt/libnice/agent/agent.c:2353

And the socket crash has this other:

[g_socket_send_message]
/usr/lib/x86_64-linux-gnu/libgio-2.0.so.0:0x7B044
[socket_send_message]
/opt/libnice/socket/tcp-bsd.c:309
[socket_send_messages]
/opt/libnice/socket/tcp-bsd.c:362
[nice_agent_send_messages_nonblocking_internal]
/opt/libnice/agent/agent.c:4833

Please keep monitoring your servers to see if we can catch the invalid state transition (remember that in old, Release version of libnice 0.1.13, the assert was replaced by a warning, so you won't find a Kurento crash when the invalid state change occurs).

I'll provide your debug logging to the libnice socket crash issue report, maybe they are helpful for the libnice team.

@jmaiquez
Copy link

jmaiquez commented Sep 21, 2018

Hi Juan,

Understood. Thanks for the clarification. Our customer has been running webinars all week, and we have also been testing. Here are our observations from this week.

Monday, 17 Sep

  • pre-event testing with corporate network speaker
  • low load (just 2 users: corporate network speaker + our customer)
  • KMS crashed twice when the corporate network speaker
  • no logs pulled (KMS was restarted, which overwrites the error log)
  • KMS 6.7.2 + libnice 0.1.15.xenial were in play

Tuesday, 18 Sep

  • webinar event, in which the corporate network speaker participated
  • user load was 30 viewers with 3 broadcasters
  • initially, we used a KMS VM running KMS 6.7.2 + libnice 0.1.15.xenial
  • KMS crashed quickly when the corporate network speaker started broadcasting
  • this is the log I sent you here and shows the assert bug
  • our load balancer moved the webinar to a VM running KMS 6.7.0 + libnice 0.1.13.1.xenial
  • no more crashes, but the weird behavior I described here was observed

Tuesday, 18 Sep

  • post-webinar troubleshooting in the evening with the corporate network speaker
  • low load (just 3 users)
  • KMS 6.7.2 + libnice 0.1.15.xenial was tested first
  • KMS crashed twice
  • those are the 2 logs I sent here, which show the socket crash bug
  • KMS 6.7.2 + libnice 0.1.13.1.xenial was tested next
  • no more crashes, but same weird behavior as the KMS 6.7.0 + libnice 0.1.13.1.xenial combination described earlier

Wednesday, 19 Sep

  • webinar event (without that corporate network speaker)
  • user load was 32 viewers with 3 broadcasters
  • all our VMs were now on KMS 6.7.2 + libnice 0.1.13.1.xenial
  • around 40 minutes into the webinar, KMS crashed
  • the log (not sent, but similar to others) shows the socket crash bug

LOW LOAD SITUATIONS

  • corporate user produced socket crash bug when using libnice 0.1.15.xenial
  • corporate user produced weird behavior when using libnice 0.1.13.1.xenial- we can only speculate that this is the assert bug presenting itself, but no logs are produced
  • internal testing, trying to mimic a corporate-type setup produced no problems

HIGHER LOAD SITUATIONS

  • corporate user produced assert bug, which crashed libnice 0.1.15.xenial
  • corporate user produced weird behavior when using libnice 0.1.13.1.xenial- we can only speculate that this is the assert bug presenting itself, but no logs are produced
  • without this corporate user, KMS seems to crash without any specific user interaction, and usually after a longer period of time

SUMMARY

  • With libnice 0.1.15.xenial, a certain type of corporate network user will produce a socket crash quickly, independent of load.
  • With libnice 0.1.13.1.xenial, a certain type of corporate network user will produce weird behavior, independent of load (might be the assert bug).
  • With libnice 0.1.13.1.xenial, “normal” users under low load does not produce a crash.
  • With libnice 0.1.13.1.xenial, “normal” users under high load will produce a socket crash after some period of time.

CONCLUSIONS

  • Certain client/network configurations cause problems (crash or weird behavior), independent of load.
  • Higher load situations cause socket crash, even without problematic clients/networks.

We have been so swamped with crisis management that we have not been able to put together an effective load testing platform, but (ignoring the special client/network configurations for a while) it seems to me that the key to reproducing the socket crash is load testing. How much load have you guys been able to put on KMS and run a session for over an hour?

Best regards,
Jorge

@j1elo
Copy link
Member

j1elo commented Oct 18, 2018

We have replaced the undesired assert() in our fork with a saner warning that won't break if the condition applies. Please check my comment in the other issue.

By the way it turns out that the assert() is actually part of the original libnice sources!
If you ask me, using assertions in production code is a Very Bad practice, to say the least. OK for debug builds, but totally unacceptable for releases. That's my personal policy, anyway. We could probably file a bug report in libnice to ask for removing that assert...

The latest Pre-Release builds don't have the assertion any more, and can be installed with a simple apt-get install kurento-media-server.

You can check if the latest version of libnice is installed in your system if you run this command:

apt-cache policy libnice10

and the result looks like this:

libnice10:
  Installed: 0.1.15-1kurento1~20181018[...]

This version will stay in Pre-Release for some time to give people time to try it, and will be promoted to Release if no regressions or blocking bugs are found.

@j1elo
Copy link
Member

j1elo commented Jun 25, 2020

Kurento 6.14 uses libnice 0.1.17, which has received a ton of fixes and improvements. So this is probably fixed by now.

@j1elo j1elo closed this as completed Jun 25, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants