New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libnice crashes in socket code: g_socket_send_message (socket=0x0) #247

Closed
j1elo opened this Issue Apr 12, 2018 · 75 comments

Comments

Projects
None yet
@j1elo
Contributor

j1elo commented Apr 12, 2018

Mail thread: https://groups.google.com/d/topic/kurento/_rf1ANq5Cm8/discussion
Related: #208

Analysis
It seems that the crash is happening in the 3rd-party library libnice, during socket access:

#5  0x00007f87c1272044 in g_socket_send_message (socket=0x0, address=address@entry=0x0, vectors=0x7f875c3f9840, num_vectors=2, messages=messages@entry=0x0, num_messages=num_messages@entry=0, flags=0, cancellable=0x0, error=0x7f86dc6951b0) at /build/glib2.0-prJhLS/glib2.0-2.48.2/./gio/gsocket.c:4255
#6  0x00007f87bedb8cbf in socket_send_message (sock=sock@entry=0x7f87940596e0, message=message@entry=0x7f86dc6952a0, reliable=reliable@entry=0) at tcp-bsd.c:306
#7  0x00007f87bedb8f3b in socket_send_messages (sock=0x7f87940596e0, to=<optimized out>, messages=<optimized out>, n_messages=1) at tcp-bsd.c:360
#8  0x00007f87beda0ae9 in nice_agent_send_messages_nonblocking_internal (agent=0x7f875c104650 [NiceAgent], stream_id=<optimized out>, component_id=<optimized out>, messages=0x7f875c3bfdb0, n_messages=5, allow_partial=0, error=0x0) at agent.c:4748

Crash occurs because of g_socket_send_message (socket=0x0). For some reason, g_socket_send_message is called with NULL socket, so it breaks in line 4255 of glib/gsocket.c.

Seems that libnice is famous for having random crashes in socket code. I could find more information about similar issues here and here; in the second link, they mention that some of the crashes were fixed in 2016 with this commit, so I assume that the fix is included in the libnice release version 0.1.14, on Apr 3, 2017.

Kurento uses libnice 0.1.13, so it suffers the bug.

Solution
Proposed solution is to update the version of libnice, however this is a big update, and we tried it already in the past but several integration tests broke. Currently, updating libnice to 0.1.14 will break some use cases due to the changes it brings. So updating this library requires careful planning and lots of tests.

Update 2018-08-22
This is a bug in libnice, reported 9 months ago and currently tracked here:
libnice issue #33 segfault in g_socket_send_message

We still don't have enough information about the cause of the issue, and have been unable to reproduce it in debugging sessions (as it seems to be a crash that only happens with some amount of load on production servers). As it is, the best option is to show interest in solving that bug in the upstream project's bug tracker. They have all the context and the know-how about libnice's code base and will probably be able to solve the issue.

@zhangalex

This comment has been minimized.

zhangalex commented Apr 15, 2018

Could you please tell me when the bug will be fixed? thank you.

@bugwheels94

This comment has been minimized.

bugwheels94 commented Apr 25, 2018

It is getting very problematic and kurento is not usable anymore in any production environment. We are losing people. May you please give me a timeline about this issue? Thanks

@micaelgallego

This comment has been minimized.

Member

micaelgallego commented Apr 26, 2018

@Kukunin

This comment has been minimized.

Kukunin commented Jun 1, 2018

Affected by this issue as well. That's sad since we also can't use Kurento in production because it fails every a couple of hours =( Any possible workarounds? (cherry picking, custom builds, etc?)

@micaelgallego

This comment has been minimized.

Member

micaelgallego commented Jun 1, 2018

Hi Sergey,

We are going to put some effort on this issue in the following days. We are going to update libnice lib to the last version to see if it solves the problem.

Please stay tuned

@Kukunin

This comment has been minimized.

Kukunin commented Jul 26, 2018

Please, can you provide a little update on this issue? We can't go live without having this fixed

@JerryGwd

This comment has been minimized.

JerryGwd commented Aug 8, 2018

I faced the same problem, as follow:
egmentation fault (thread 139909583050496, pid 8231)
Stack trace:
[g_socket_send_message]
/usr/lib/x86_64-linux-gnu/libgio-2.0.so.0:0x7B044
[nice_output_stream_new]
/usr/lib/x86_64-linux-gnu/libnice.so.10:0x2ACBF
[nice_output_stream_new]
/usr/lib/x86_64-linux-gnu/libnice.so.10:0x2AF3B
[nice_agent_recv_nonblocking]
/usr/lib/x86_64-linux-gnu/libnice.so.10:0x12AE9
[gst_nice_src_get_type]
/usr/lib/x86_64-linux-gnu/gstreamer-1.5/libgstnice15.so:0x36B2
[gst_nice_sink_get_type]
/usr/lib/x86_64-linux-gnu/gstreamer-1.5/libgstnice15.so:0x3FB3
[gst_base_sink_do_preroll]
/usr/lib/x86_64-linux-gnu/libgstbase-1.5.so.0:0x2A1B2
[gst_base_sink_do_preroll]
/usr/lib/x86_64-linux-gnu/libgstbase-1.5.so.0:0x2B620
[gst_flow_get_name]
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x6E5CF
[gst_pad_push]
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x76533
[gst_proxy_pad_chain_default]
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x5F5E3
[gst_flow_get_name]
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x6E5CF
[gst_pad_push]
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x76533
0x1B48D at /usr/lib/x86_64-linux-gnu/gstreamer-1.5/libgstcoreelements.so
[gst_flow_get_name]
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x6E5CF
[gst_pad_push]
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x76533

@j1elo

This comment has been minimized.

Contributor

j1elo commented Aug 9, 2018

Hi,

We've been working on this issue, and found out that the latest development branch of libnice seems to work better. In our tests, either versions 0.1.13 and 0.1.14 crashed, but the development branch for 0.1.15 (or as libnice creators like to put it, "0.1.14.1") is currently working pretty well.

It would help a lot if we could confirm with more people if this is only due to our specific test environment, or if this latest version of libnice does actually solve problems that are being encountered by KMS users.

I've prepared Debian package files from upstream libnice commit 090d3dba, the latest as of this week. If you have a staging server where you could try out this version, it would help us all to know whether this version is a good candidate to be included in the upcoming release of Kurento 6.8.

Installation steps for a clean Ubuntu 16.04 (Xenial) server:

  1. Download the experimental packages.

For Kurento Pre-Release:

wget -O test.zip https://www.dropbox.com/sh/525depzmhj2vt47/AADcgc4o_QwjcZpaBlQWqgspa?dl=1

For Kurento Release:

No build available.
  1. Install the media server:
sudo apt-get update
sudo apt-get install kurento-media-server
  1. Install the debug symbols. These will provide needed information in case of a crash. Run the apt-get steps to install all -dbg symbols, as explained in Media Server crashed.

  2. Install the experimental version of the provided package(s):

unzip test.zip
sudo dpkg -i ./*.*deb
  1. If any errors happen, force installing any missing dependencies and try again:
sudo apt-get install -f
sudo dpkg -i ./*.*deb
  1. Test again your use case and let us know if it fails again with the same issue.
@Richard-Aasa

This comment has been minimized.

Richard-Aasa commented Aug 14, 2018

(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd87404ca40 : STUN-CC RESP to '213.184.55.243:57201', socket=47, len=80, cand=0x7fd8640b4070 (c-id:1), use-cand=1.
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd87404ca40 : Found a matching pair 0x7fd864100820 (6:remote1) (SUCCEEDED) ...
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd87404ca40 : nothing to do for pair 0x7fd864100820.
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd87404ca40: Finding highest priority for component 1
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd87404ca40: Pruning pending checks. Highest nominated priority is 4341472238214462975
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd87404ca40 : conn.check list status: 3 nominated, 3 valid, c-id 1.
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd87404ca40 : marking pair 0x7fd864100820 (6:remote1) as nominated
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd87404ca40: Finding highest priority for component 1
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd87404ca40: Pruning pending checks. Highest nominated priority is 4341472238214462975
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd87404ca40 : conn.check list status: 3 nominated, 3 valid, c-id 1.
(kurento-media-server:13770): libnice-DEBUG: agent_recv_message_unlocked: Valid STUN packet received.
(kurento-media-server:13770): libnice-stun-DEBUG: STUN error: Incomplete message: 61 of 65300 bytes!
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd88407ba40: agent_recv_message_unlocked returned -1, errno (11) : Resource temporarily unavailable
(kurento-media-server:13770): libnice-DEBUG: component_io_cb: 0x7fd88407ba40: error receiving message
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd88407ba40 : Retransmissions failed, giving up on pair 0xff2000
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd88407ba40 : pair 0xff2000 state FAILED (candidate_check_pair_fail)
(kurento-media-server:13770): libnice-DEBUG: Detach socket 0x7fd898050b10.
(kurento-media-server:13770): libnice-DEBUG: Detaching source 0xec4550 (socket 0x7fd898050b10, FD 138) from context 0x7fd89409bc80
(kurento-media-server:13770): libnice-DEBUG: Detaching source (nil) (socket 0x7fd898050b10, FD 138) from context (nil)

[1]+ Stopped /usr/bin/kurento-media-server
$ Segmentation fault (thread 140566658152192, pid 13770)
Stack trace:
[g_socket_send_message]
/usr/lib/x86_64-linux-gnu/libgio-2.0.so.0:0x7B044
[socket_send_message]
/opt/libnice/socket/tcp-bsd.c:309
[socket_send_messages]
/opt/libnice/socket/tcp-bsd.c:362
[nice_agent_send_messages_nonblocking_internal]
/opt/libnice/agent/agent.c:4833
[gst_nice_src_get_type]
/usr/lib/x86_64-linux-gnu/gstreamer-1.5/libgstnice15.so:0x36A2
[gst_nice_sink_get_type]
/usr/lib/x86_64-linux-gnu/gstreamer-1.5/libgstnice15.so:0x3FA3
[gst_base_sink_do_preroll]
/usr/lib/x86_64-linux-gnu/libgstbase-1.5.so.0:0x2A1B2
[gst_base_sink_do_preroll]
/usr/lib/x86_64-linux-gnu/libgstbase-1.5.so.0:0x2B620
[gst_flow_get_name]
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x6E5CF
[gst_pad_push]
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x76533
[gst_proxy_pad_chain_default]
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x5F5E3
[gst_flow_get_name]
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x6E5CF
[gst_pad_push]
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x76533
0x1B48D at /usr/lib/x86_64-linux-gnu/gstreamer-1.5/libgstcoreelements.so
[gst_flow_get_name]
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x6E5CF
[gst_pad_push]
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x76533

Hope this helps. Seems something to do with partial messages or nulled sockets.

@j1elo

This comment has been minimized.

Contributor

j1elo commented Aug 14, 2018

Hi, thanks for testing! I've updated the corresponding bug report in libnice's bugtracker: https://gitlab.freedesktop.org/libnice/libnice/issues/33#note_19329

But, we'll need more information about this crash, as that's the first thing they will need to fix this. Do you have any additional information, especially about how to reproduce the crash?

@Richard-Aasa

This comment has been minimized.

Richard-Aasa commented Aug 14, 2018

@Richard-Aasa

This comment has been minimized.

Richard-Aasa commented Aug 17, 2018

We had some miscommunication and turns out the error was not produced on patched libnice. Here's the stacktrace that actually happened on the provided patch.

Kurento Media Server version: 6.7.1 
Found modules: 
       'core' version 6.7.1 
       'elements' version 6.7.1 
       'filters' version 6.7.1

ii  gir1.2-nice-0.1:amd64                  0.1.15-1ubuntu1~20180808133501.gbpae8742   amd64        ICE library (GObject introspection) 
ii  gstreamer1.5-nice:amd64                0.1.15-1ubuntu1~20180808133501.gbpae8742   amd64        ICE library (GStreamer 1.5 plugin) 
ii  libnice10:amd64                        0.1.15-1ubuntu1~20180808133501.gbpae8742   amd64        ICE library (shared library)

(kurento-media-server:4362): libnice-DEBUG: Agent 0xa7d650 : STUN-CC RESP to '213.184.55.243:60754', socket=34, len=80, cand=0x7ff904005ef0 (c-id:1), use-cand=1. 
(kurento-media-server:4362): libnice-DEBUG: Agent 0xa7d650 : Found a matching pair 0x7ff91c021590 (6:remote1) (SUCCEEDED) ... 
(kurento-media-server:4362): libnice-DEBUG: Agent 0xa7d650 : nothing to do for pair 0x7ff91c021590. 
(kurento-media-server:4362): libnice-DEBUG: Agent 0xa7d650: Finding highest priority for component 1 
(kurento-media-server:4362): libnice-DEBUG: Agent 0xa7d650: Pruning pending checks. Highest nominated priority is 4341472238214462975 
(kurento-media-server:4362): libnice-DEBUG: Agent 0xa7d650 : conn.check list status: 1 nominated, 1 valid, c-id 1. 
(kurento-media-server:4362): libnice-DEBUG: Agent 0xa7d650 : marking pair 0x7ff91c021590 (6:remote1) as nominated 
(kurento-media-server:4362): libnice-DEBUG: Agent 0xa7d650: Finding highest priority for component 1 
(kurento-media-server:4362): libnice-DEBUG: Agent 0xa7d650: Pruning pending checks. Highest nominated priority is 4341472238214462975 
(kurento-media-server:4362): libnice-DEBUG: Agent 0xa7d650 : conn.check list status: 1 nominated, 1 valid, c-id 1. 
(kurento-media-server:4362): libnice-DEBUG: agent_recv_message_unlocked: Valid STUN packet received. 
(kurento-media-server:4362): libnice-DEBUG: Agent 0x7ff914062780: inbound STUN packet for 1/1 (stream/component) from [213.184.55.243]:60803 (100 octets) : 
(kurento-media-server:4362): libnice-stun-DEBUG: STUN demux: OK! 
(kurento-media-server:4362): libnice-stun-DEBUG: Comparing username/ufrag of len 9 and 4, equal=0 
(kurento-media-server:4362): libnice-stun-DEBUG:   username: 0x665551463a75782f72 
(kurento-media-server:4362): libnice-stun-DEBUG:   ufrag:    0x66555146 
(kurento-media-server:4362): libnice-stun-DEBUG: Found valid username, returning password: 'NAypFUPXKnVufpeu1zq7Wj' 
(kurento-media-server:4362): libnice-stun-DEBUG:  Message HMAC-SHA1 fingerprint: 
(kurento-media-server:4362): libnice-stun-DEBUG:   key     : 0x4e417970465550584b6e567566706575317a7137576a 
(kurento-media-server:4362): libnice-stun-DEBUG:   expected: 0xe60e79247caf1e6f0ce35f17ce2bf2c1558ea912 
(kurento-media-server:4362): libnice-stun-DEBUG:   received: 0xe60e79247caf1e6f0ce35f17ce2bf2c1558ea912 
(kurento-media-server:4362): libnice-stun-DEBUG: STUN auth: OK! 
(kurento-media-server:4362): libnice-stun-DEBUG: STUN unknown: 0 mandatory attribute(s)! 
(kurento-media-server:4362): libnice-stun-DEBUG: STUN Reply (buffer size = 1300)... 
(kurento-media-server:4362): libnice-stun-DEBUG:  Message HMAC-SHA1 message integrity: 
(kurento-media-server:4362): libnice-stun-DEBUG:   key     : 0x4e417970465550584b6e567566706575317a7137576a 
(kurento-media-server:4362): libnice-stun-DEBUG:   sent    : 0x3bfebf4eff653d3136bcc700ac33b980fd527fa3 
(kurento-media-server:4362): libnice-stun-DEBUG:  Message HMAC-SHA1 fingerprint: 0x29668cba 
(kurento-media-server:4362): libnice-stun-DEBUG:  All done (response size: 80) 
(kurento-media-server:4362): libnice-DEBUG: Agent 0x7ff914062780 : STUN-CC RESP to '213.184.55.243:60803', socket=60, len=80, cand=0x7ff92c0734e0 (c-id:1), use-cand=1. 
(kurento-media-server:4362): libnice-DEBUG: Agent 0x7ff914062780 : Found a matching pair 0x7ff92c046960 (6:remote1) (SUCCEEDED) ... 
(kurento-media-server:4362): libnice-DEBUG: Agent 0x7ff914062780 : nothing to do for pair 0x7ff92c046960. 
(kurento-media-server:4362): libnice-DEBUG: Agent 0x7ff914062780: Finding highest priority for component 1 
(kurento-media-server:4362): libnice-DEBUG: Agent 0x7ff914062780: Pruning pending checks. Highest nominated priority is 4341472238197816831 
(kurento-media-server:4362): libnice-DEBUG: Agent 0x7ff914062780 : conn.check list status: 2 nominated, 2 valid, c-id 1. 
(kurento-media-server:4362): libnice-DEBUG: Agent 0x7ff914062780 : marking pair 0x7ff92c046960 (6:remote1) as nominated 
(kurento-media-server:4362): libnice-DEBUG: Agent 0x7ff914062780: Finding highest priority for component 1 
(kurento-media-server:4362): libnice-DEBUG: Agent 0x7ff914062780: Pruning pending checks. Highest nominated priority is 4341472238197816831 
(kurento-media-server:4362): libnice-DEBUG: Agent 0x7ff914062780 : conn.check list status: 2 nominated, 2 valid, c-id 1. 
(kurento-media-server:4362): libnice-DEBUG: agent_recv_message_unlocked: Valid STUN packet received. 
(kurento-media-server:4362): libnice-DEBUG: Agent 0x7ff93c1b8e20 :STUN transaction retransmitted on pair 0x7ff928009ac0 (timer=6/7 2/3840ms). 
(kurento-media-server:4362): libnice-DEBUG: Agent 0x7ff93c1b8e20 : discovery tick #4101 with list 0x7ff910001d50 (1) 
(kurento-media-server:4362): libnice-stun-DEBUG: STUN error: Incomplete message: 61 of 65300 bytes! 
(kurento-media-server:4362): libnice-DEBUG: Agent 0x7ff93c1b8e20 : stream 1: timer tick #201: 0 frozen, 4 in-progress, 0 waiting, 1 succeeded, 0 discovered, 1 nominated, 0 waiting-for-nom, 1 valid. 
(kurento-media-server:4362): libnice-DEBUG: Agent 0x7ff93c1b8e20: agent_recv_message_unlocked returned -1, errno (11) : Resource temporarily unavailable 
(kurento-media-server:4362): libnice-DEBUG: component_io_cb: 0x7ff93c1b8e20: error receiving message 
(kurento-media-server:4362): libnice-DEBUG: Agent 0x7ff93c1b8e20 : Retransmissions failed, giving up on pair 0x7ff908022b90 
(kurento-media-server:4362): libnice-DEBUG: Agent 0x7ff93c1b8e20 : pair 0x7ff908022b90 state FAILED (candidate_check_pair_fail) 
(kurento-media-server:4362): libnice-DEBUG: Detach socket 0x7ff9280c80a0. 
(kurento-media-server:4362): libnice-DEBUG: Detaching source 0x7ff908010f80 (socket 0x7ff9280c80a0, FD 76) from context 0x7ff9380e4f20 
(kurento-media-server:4362): libnice-DEBUG: Detaching source (nil) (socket 0x7ff9280c80a0, FD 76) from context (nil) 

[1]+  Stopped                 /usr/bin/kurento-media-server 
$ Segmentation fault (thread 140707155990272, pid 4362) 
Stack trace: 
[g_socket_send_message] 
/usr/lib/x86_64-linux-gnu/libgio-2.0.so.0:0x7B044 
[nice_output_stream_new] 
/usr/lib/x86_64-linux-gnu/libnice.so.10:0x2B04F 
[nice_output_stream_new] 
/usr/lib/x86_64-linux-gnu/libnice.so.10:0x2B2CB 
[nice_agent_recv_nonblocking] 
/usr/lib/x86_64-linux-gnu/libnice.so.10:0x12CF9 
[gst_nice_src_get_type] 
/usr/lib/x86_64-linux-gnu/gstreamer-1.5/libgstnice15.so:0x36A2 
[gst_nice_sink_get_type] 
/usr/lib/x86_64-linux-gnu/gstreamer-1.5/libgstnice15.so:0x3FA3 
[gst_base_sink_do_preroll] 
/usr/lib/x86_64-linux-gnu/libgstbase-1.5.so.0:0x2A1B2 
[gst_base_sink_do_preroll] 
/usr/lib/x86_64-linux-gnu/libgstbase-1.5.so.0:0x2B620 
[gst_flow_get_name] 
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x6E5CF 
[gst_pad_push] 
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x76533 
[gst_proxy_pad_chain_default] 
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x5F5E3 
[gst_flow_get_name] 
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x6E5CF 
[gst_pad_push] 
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x76533 
0x1B48D at /usr/lib/x86_64-linux-gnu/gstreamer-1.5/libgstcoreelements.so 
[gst_flow_get_name] 
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x6E5CF 
[gst_pad_push] 
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x76533
@j1elo

This comment has been minimized.

Contributor

j1elo commented Aug 17, 2018

Hi,
that stack trace doesn't contain any file names and line numbers; please make sure you follow the step 3 from the install instructions; also in step 4, make sure you install packages gstreamer1.5-nice-dbgsym and libnice10-dbgsym, which contain debug symbols of libnice itself.

@Kukunin

This comment has been minimized.

Kukunin commented Aug 17, 2018

Here is my crash report on the updated libnice:

[1]+ Stopped /usr/bin/kurento-media-server
root@ns3113927:/home# [g_socket_send_message]
/build/glib2.0-b4FPyK/glib2.0-2.48.2/./gio/gsocket.c:4255
[socket_send_message]
/workspace/socket/tcp-bsd.c:309
[socket_send_messages]
/workspace/socket/tcp-bsd.c:362
[nice_agent_send_messages_nonblocking_internal]
/workspace/agent/agent.c:4833
[gst_nice_sink_render_buffers]
/workspace/gst/gstnicesink.c:301
[gst_nice_sink_render]
/workspace/gst/gstnicesink.c:362
[gst_base_sink_chain_unlocked]
/opt/kurento/libs/gst/base/gstbasesink.c:3544
[gst_base_sink_chain_main]
/opt/kurento/libs/gst/base/gstbasesink.c:3656
[gst_pad_chain_data_unchecked]
/opt/kurento/gst/gstpad.c:4185
[gst_pad_push]
/opt/kurento/gst/gstpad.c:4556
[gst_proxy_pad_chain_default]
/opt/kurento/gst/gstghostpad.c:127
[gst_pad_chain_data_unchecked]
/opt/kurento/gst/gstpad.c:4185
[gst_pad_push]
/opt/kurento/gst/gstpad.c:4556
[gst_funnel_sink_chain_object]
/opt/kurento/plugins/elements/gstfunnel.c:454
[gst_pad_chain_data_unchecked]
/opt/kurento/gst/gstpad.c:4185
[gst_pad_push]
/opt/kurento/gst/gstpad.c:4556
@Kukunin

This comment has been minimized.

Kukunin commented Aug 17, 2018

I have the following packages:
dpkg -l | grep libnice:

ii libnice10:amd64 0.1.15-1ubuntu1~20180808133501.gbpae8742 amd64 ICE library (shared library)
ii libnice10-dbgsym:amd64 0.1.15-1ubuntu1~20180808133501.gbpae8742 amd64 debug symbols for package libnice10

dpkg -l | grep gstreamer

ii gstreamer1.5-alsa:amd64 1.8.1.1.xenial~20170725154709.55.7b19cfd amd64 GStreamer plu gin for ALSA
ii gstreamer1.5-libav:amd64 1.8.2.1.xenial~20170725171352.96.493eee4 amd64 libav plugin for GStreamer
ii gstreamer1.5-libav-dbg:amd64 1.8.2.1.xenial~20170725171352.96.493eee4 amd64 libav plugin for GStreamer (debug symbols)
ii gstreamer1.5-nice:amd64 0.1.15-1ubuntu1~20180808133501.gbpae8742 amd64 ICE library ( GStreamer 1.5 plugin)
ii gstreamer1.5-nice-dbgsym:amd64 0.1.15-1ubuntu1~20180808133501.gbpae8742 amd64 debug symbols for package gstreamer1.5-nice
ii gstreamer1.5-plugins-bad:amd64 1.8.1.1.xenial~20170725164047.100.3db37b1 amd64 GStreamer plu gins from the "bad" set
ii gstreamer1.5-plugins-bad-dbg:amd64 1.8.1.1.xenial~20170725164047.100.3db37b1 amd64 GStreamer plu gins from the "bad" set (debug symbols)
ii gstreamer1.5-plugins-base:amd64 1.8.1.1.xenial~20170725154709.55.7b19cfd amd64 GStreamer plu gins from the "base" set
ii gstreamer1.5-plugins-base-dbg:amd64 1.8.1.1.xenial~20170725154709.55.7b19cfd amd64 GStreamer plu gins from the "base" set
ii gstreamer1.5-plugins-good:amd64 1.8.1.1.xenial~20170725161537.112.9ee4248 amd64 GStreamer plu gins from the "good" set
ii gstreamer1.5-plugins-good-dbg:amd64 1.8.1.1.xenial~20170725161537.112.9ee4248 amd64 GStreamer plu gins from the "good" set
ii gstreamer1.5-plugins-ugly:amd64 1.8.1.1.xenial~20170725170621.89.2685b0f amd64 GStreamer plu gins from the "ugly" set
ii gstreamer1.5-plugins-ugly-dbg:amd64 1.8.1.1.xenial~20170725170621.89.2685b0f amd64 GStreamer plu gins from the "ugly" set (debug symbols)
ii gstreamer1.5-pulseaudio:amd64 1.8.1.1.xenial~20170725161537.112.9ee4248 amd64 GStreamer plu gin for PulseAudio
ii gstreamer1.5-x:amd64 1.8.1.1.xenial~20170725154709.55.7b19cfd amd64 GStreamer plu gins for X11 and Pango
ii libgstreamer-plugins-bad1.5-0:amd64 1.8.1.1.xenial~20170725164047.100.3db37b1 amd64 GStreamer dev elopment files for libraries from the "bad" set
ii libgstreamer-plugins-base1.5-0:amd64 1.8.1.1.xenial~20170725154709.55.7b19cfd amd64 GStreamer lib raries from the "base" set
ii libgstreamer1.5-0:amd64 1.8.1.1.xenial~20170725152356.170.0d6031b amd64 Core GStreame r libraries and elements
ii libgstreamer1.5-0-dbg:amd64 1.8.1.1.xenial~20170725152356.170.0d6031b amd64 Core GStreame r libraries and elements
@Richard-Aasa

This comment has been minimized.

Richard-Aasa commented Aug 22, 2018

Same message. Bug exists on both the dev and latest. Still can't reproduce it. We tried different latency, bandwith, dropping connections and those don't seem to affect it either. If your patch seemed to fix it, then is this something completely unrelated? @j1elo I know this ain't the place to ask for estimates, but can You give an estimate if this problem is solvable for You in the next 30 days? Currently it's completely unusable.

(kurento-media-server:13770): libnice-stun-DEBUG: STUN demux: OK!
(kurento-media-server:13770): libnice-stun-DEBUG: Comparing username/ufrag of len 9 and 4, equal=0
(kurento-media-server:13770): libnice-stun-DEBUG:   username: 0x56336e5a3a326c5465
(kurento-media-server:13770): libnice-stun-DEBUG:   ufrag:    0x56336e5a
(kurento-media-server:13770): libnice-stun-DEBUG: Found valid username, returning password: 'zusuWV9dq9WWtaE8ZZ9YhB'
(kurento-media-server:13770): libnice-stun-DEBUG:  Message HMAC-SHA1 fingerprint:
(kurento-media-server:13770): libnice-stun-DEBUG:   key     : 0x7a7573755756396471395757746145385a5a39596842
(kurento-media-server:13770): libnice-stun-DEBUG:   expected: 0x49a816ea79cc96ef7dfb6693a20cd221a5ba64a1
(kurento-media-server:13770): libnice-stun-DEBUG:   received: 0x49a816ea79cc96ef7dfb6693a20cd221a5ba64a1
(kurento-media-server:13770): libnice-stun-DEBUG: STUN auth: OK!
(kurento-media-server:13770): libnice-stun-DEBUG: STUN unknown: 0 mandatory attribute(s)!
(kurento-media-server:13770): libnice-stun-DEBUG: STUN Reply (buffer size = 1300)...
(kurento-media-server:13770): libnice-stun-DEBUG:  Message HMAC-SHA1 message integrity:
(kurento-media-server:13770): libnice-stun-DEBUG:   key     : 0x7a7573755756396471395757746145385a5a39596842
(kurento-media-server:13770): libnice-stun-DEBUG:   sent    : 0x662a3e3f135745a9d2bcd29d0b26277d8593c9e7
(kurento-media-server:13770): libnice-stun-DEBUG:  Message HMAC-SHA1 fingerprint: 0x16afbd47
(kurento-media-server:13770): libnice-stun-DEBUG:  All done (response size: 80)
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd8580105c0 : STUN-CC RESP to '213.184.55.243:63450', socket=86, len=80, cand=0x7fd860016e00 (c-id:1), use-cand=1.
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd8580105c0 : Found a matching pair 0x7fd8641a4f00 (6:remote1) (SUCCEEDED) ...
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd8580105c0 : nothing to do for pair 0x7fd8641a4f00.
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd8580105c0: Finding highest priority for component 1
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd8580105c0: Pruning pending checks. Highest nominated priority is 4341472238197816831
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd8580105c0 : conn.check list status: 1 nominated, 1 valid, c-id 1.
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd8580105c0 : marking pair 0x7fd8641a4f00 (6:remote1) as nominated
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd8580105c0: Finding highest priority for component 1
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd8580105c0: Pruning pending checks. Highest nominated priority is 4341472238197816831
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd8580105c0 : conn.check list status: 1 nominated, 1 valid, c-id 1.
(kurento-media-server:13770): libnice-DEBUG: agent_recv_message_unlocked: Valid STUN packet received.
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd87404ca40: inbound STUN packet for 1/1 (stream/component) from [213.184.55.243]:57201 (100 octets) :
(kurento-media-server:13770): libnice-stun-DEBUG: STUN demux: OK!
(kurento-media-server:13770): libnice-stun-DEBUG: Comparing username/ufrag of len 9 and 4, equal=0
(kurento-media-server:13770): libnice-stun-DEBUG:   username: 0x525047483a796a4a31
(kurento-media-server:13770): libnice-stun-DEBUG:   ufrag:    0x52504748
(kurento-media-server:13770): libnice-stun-DEBUG: Found valid username, returning password: 'qdSqKiT7GWoxP0s8CxyOJD'
(kurento-media-server:13770): libnice-stun-DEBUG:  Message HMAC-SHA1 fingerprint:
(kurento-media-server:13770): libnice-stun-DEBUG:   key     : 0x716453714b69543747576f78503073384378794f4a44
(kurento-media-server:13770): libnice-stun-DEBUG:   expected: 0x5f03b741338b921b6ec381306e21370dd07f994d
(kurento-media-server:13770): libnice-stun-DEBUG:   received: 0x5f03b741338b921b6ec381306e21370dd07f994d
(kurento-media-server:13770): libnice-stun-DEBUG: STUN auth: OK!
(kurento-media-server:13770): libnice-stun-DEBUG: STUN unknown: 0 mandatory attribute(s)!
(kurento-media-server:13770): libnice-stun-DEBUG: STUN Reply (buffer size = 1300)...
(kurento-media-server:13770): libnice-stun-DEBUG:  Message HMAC-SHA1 message integrity:
(kurento-media-server:13770): libnice-stun-DEBUG:   key     : 0x716453714b69543747576f78503073384378794f4a44
(kurento-media-server:13770): libnice-stun-DEBUG:   sent    : 0x4e81cf80dab32530fb2f7bec11277c8089e87e1c
(kurento-media-server:13770): libnice-stun-DEBUG:  Message HMAC-SHA1 fingerprint: 0x4346a4bb
(kurento-media-server:13770): libnice-stun-DEBUG:  All done (response size: 80)
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd87404ca40 : STUN-CC RESP to '213.184.55.243:57201', socket=47, len=80, cand=0x7fd8640b4070 (c-id:1), use-cand=1.
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd87404ca40 : Found a matching pair 0x7fd864100820 (6:remote1) (SUCCEEDED) ...
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd87404ca40 : nothing to do for pair 0x7fd864100820.
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd87404ca40: Finding highest priority for component 1
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd87404ca40: Pruning pending checks. Highest nominated priority is 4341472238214462975
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd87404ca40 : conn.check list status: 3 nominated, 3 valid, c-id 1.
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd87404ca40 : marking pair 0x7fd864100820 (6:remote1) as nominated
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd87404ca40: Finding highest priority for component 1
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd87404ca40: Pruning pending checks. Highest nominated priority is 4341472238214462975
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd87404ca40 : conn.check list status: 3 nominated, 3 valid, c-id 1.
(kurento-media-server:13770): libnice-DEBUG: agent_recv_message_unlocked: Valid STUN packet received.
(kurento-media-server:13770): libnice-stun-DEBUG: STUN error: Incomplete message: 61 of 65300 bytes!
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd88407ba40: agent_recv_message_unlocked returned -1, errno (11) : Resource temporarily unavailable
(kurento-media-server:13770): libnice-DEBUG: component_io_cb: 0x7fd88407ba40: error receiving message
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd88407ba40 : Retransmissions failed, giving up on pair 0xff2000
(kurento-media-server:13770): libnice-DEBUG: Agent 0x7fd88407ba40 : pair 0xff2000 state FAILED (candidate_check_pair_fail)
(kurento-media-server:13770): libnice-DEBUG: Detach socket 0x7fd898050b10.
(kurento-media-server:13770): libnice-DEBUG: Detaching source 0xec4550 (socket 0x7fd898050b10, FD 138) from context 0x7fd89409bc80
(kurento-media-server:13770): libnice-DEBUG: Detaching source (nil) (socket 0x7fd898050b10, FD 138) from context (nil)

[1]+  Stopped                 /usr/bin/kurento-media-server
$ Segmentation fault (thread 140566658152192, pid 13770)
Stack trace:
[g_socket_send_message]
/usr/lib/x86_64-linux-gnu/libgio-2.0.so.0:0x7B044
[socket_send_message]
/opt/libnice/socket/tcp-bsd.c:309
[socket_send_messages]
/opt/libnice/socket/tcp-bsd.c:362
[nice_agent_send_messages_nonblocking_internal]
/opt/libnice/agent/agent.c:4833
[gst_nice_src_get_type]
/usr/lib/x86_64-linux-gnu/gstreamer-1.5/libgstnice15.so:0x36A2
[gst_nice_sink_get_type]
/usr/lib/x86_64-linux-gnu/gstreamer-1.5/libgstnice15.so:0x3FA3
[gst_base_sink_do_preroll]
/usr/lib/x86_64-linux-gnu/libgstbase-1.5.so.0:0x2A1B2
[gst_base_sink_do_preroll]
/usr/lib/x86_64-linux-gnu/libgstbase-1.5.so.0:0x2B620
[gst_flow_get_name]
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x6E5CF
[gst_pad_push]
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x76533
[gst_proxy_pad_chain_default]
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x5F5E3
[gst_flow_get_name]
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x6E5CF
[gst_pad_push]
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x76533
0x1B48D at /usr/lib/x86_64-linux-gnu/gstreamer-1.5/libgstcoreelements.so
[gst_flow_get_name]
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x6E5CF
[gst_pad_push]
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x76533
@j1elo

This comment has been minimized.

Contributor

j1elo commented Aug 22, 2018

I know this ain't the place to ask for estimates, but can You give an estimate if this problem is solvable for You in the next 30 days? Currently it's completely unusable.

Hi @Richard-Aasa, if this was some known bug on Kurento code I'd be able to let you know what is wrong and how much time we expected to take for a fix.

But this is a bug in libnice, reported 9 months ago and currently tracked here:
libnice issue #33: segfault in g_socket_send_message

We at Kurento still don't have enough information about the cause of the issue, because everybody just writes their stack traces, which are all the same so more reports don't equal more information. If you can reproduce this while debugging with GDB and get some insight about why the crash happens, it could help a lot.

We all know that libnice is crashing in agent.c at line 4833 but we have been unable to reproduce it in debugging sessions (as it seems to be a crash that only happens with some amount of load on production servers); we are working on a project that will allow to simulate serious high-load scenarios, as a side effort. But for now, the best option is to show interest in solving that bug in the link above. They have all the context and the know-how about libnice's code base and will probably be able to solve the issue.

I want to have it fixed, so you'll see I've been active in the comments trying to move it forward. No answer so far, but hopefully that will happen soon. I'd encourage you (and everybody interested) to provide debugging information in that bug report, so the maintainers have as much information as possible.

@neilyoung

This comment has been minimized.

neilyoung commented Aug 22, 2018

@Richard-Aasa How did you make the KMS to trace the libnice requests? Just setting DEBUG level?

@j1elo

This comment has been minimized.

Contributor

j1elo commented Aug 22, 2018

You mean all those libnice-DEBUG and libnice-stun-DEBUG messages?

It's enabled by the WebRtcEndpoint code if the environment variables G_MESSAGES_DEBUG and NICE_DEBUG are set when KMS runs.

By default, it's easy to enable this behavior by editing the Kurento service settings (/etc/default/kurento-media-server) after an apt-get installation.

@neilyoung

This comment has been minimized.

neilyoung commented Aug 22, 2018

Yepp, thanks. Maybe those will help me to provide more input for my TURN problem (#294), even though from my POV it is self explanatory...

@neilyoung

This comment has been minimized.

neilyoung commented Aug 22, 2018

Is the KMS supposed to be compiled as debug version? I can't make these traces visible, even though I see this in my release KMS:

2018-08-22T18:26:22,245807 11634 0x00007f36f544a700    info kmsiceniceagent           kmsiceniceagent.c:256 kms_ice_nice_agent_new() <KmsIceNiceAgent@0x7f36e002f560>  Enable debug logging in 'libnice' library
@Richard-Aasa

This comment has been minimized.

Richard-Aasa commented Aug 30, 2018

@neilyoung Delegating some info from IT regarding version 6.7.1:

Just take all export lines from /etc/default/kurento-media-server and run everything from command line under kurento user(not as service).
All debug messages are visible in console.
Kurento does not write those messages to log files - maybe it is a bug.
su - kurento --shell=/bin/bash
export GST_DEBUG="3,Kurento*:4,kms*:4,kmsiceniceagent:5,kmswebrtcsession:5,webrtcendpoint:4"
export NICE_DEBUG="libnice,libnice-stun"
export KURENTO_LOGS_PATH="/var/log/kurento-media-server"
export KURENTO_NUMBER_LOG_FILES=2
/usr/bin/kurento-media-server

@neilyoung

This comment has been minimized.

neilyoung commented Aug 30, 2018

@Richard-Aasa

This comment has been minimized.

Richard-Aasa commented Sep 11, 2018

I'm not very comfortable with C so here's something we've tried that did not work. It seemed very similar to what was proposed by: @yongje.lee on https://gitlab.freedesktop.org/libnice/libnice/issues/33,

libnice tcp-bsd.c :

 if (!sock) { 
   nice_debug ("!sock detected! Returning -1"); 
   return -1; 
} 

 if (!sock->fileno) { 
   nice_debug ("!sock->fileno detected! Returning -1"); 
   return -1; 
 }

Stack:

(kurento-media-server:8259): libnice-DEBUG: Agent 0x7f39181b2a10: inbound STUN packet for 1/1 (stream/component) from [195.80.114.254]:53412 (92 octets) : 
(kurento-media-server:8259): libnice-stun-DEBUG: STUN demux: OK! 
(kurento-media-server:8259): libnice-stun-DEBUG: Comparing username/ufrag of len 13 and 4, equal=0 
(kurento-media-server:8259): libnice-stun-DEBUG:   username: 0x6c4337583a6264383030356366 
(kurento-media-server:8259): libnice-stun-DEBUG:   ufrag:    0x6c433758 
(kurento-media-server:8259): libnice-stun-DEBUG: Found valid username, returning password: 'L+c0QxkX1JJEk0jhZO1MoX' 
(kurento-media-server:8259): libnice-stun-DEBUG:  Message HMAC-SHA1 fingerprint: 
(kurento-media-server:8259): libnice-stun-DEBUG:   key     : 0x4c2b633051786b58314a4a456b306a685a4f314d6f58 
(kurento-media-server:8259): libnice-stun-DEBUG:   expected: 0x2ec8262372427f305714adba5f3fa453e191fe9f 
(kurento-media-server:8259): libnice-stun-DEBUG:   received: 0x2ec8262372427f305714adba5f3fa453e191fe9f 
(kurento-media-server:8259): libnice-stun-DEBUG: STUN auth: OK! 
(kurento-media-server:8259): libnice-stun-DEBUG: STUN unknown: 0 mandatory attribute(s)! 
(kurento-media-server:8259): libnice-stun-DEBUG: STUN Reply (buffer size = 1300)... 
(kurento-media-server:8259): libnice-stun-DEBUG:  Message HMAC-SHA1 message integrity: 
(kurento-media-server:8259): libnice-stun-DEBUG:   key     : 0x4c2b633051786b58314a4a456b306a685a4f314d6f58 
(kurento-media-server:8259): libnice-stun-DEBUG:   sent    : 0xd60fb40e3729cc5412fd9c0fb868aabdae21d50a 
(kurento-media-server:8259): libnice-stun-DEBUG:  Message HMAC-SHA1 fingerprint: 0x2a8e344b 
(kurento-media-server:8259): libnice-stun-DEBUG:  All done (response size: 84) 
(kurento-media-server:8259): libnice-DEBUG: Agent 0x7f39181b2a10 : STUN-CC RESP to '195.80.114.254:53412', socket=166, len=84, cand=0x7f390034c000 (c-id:1), use-cand=0. 
(kurento-media-server:8259): libnice-DEBUG: Agent 0x7f39181b2a10 : Found a matching pair 0x7f38fc2a2240 (6:remote1) (SUCCEEDED) ... 
(kurento-media-server:8259): libnice-DEBUG: Agent 0x7f39181b2a10 : nothing to do for pair 0x7f38fc2a2240. 
(kurento-media-server:8259): libnice-DEBUG: Agent 0x7f39181b2a10: Finding highest priority for component 1 
(kurento-media-server:8259): libnice-DEBUG: Agent 0x7f39181b2a10: Pruning pending checks. Highest nominated priority is 4341472239187460607 
(kurento-media-server:8259): libnice-DEBUG: Agent 0x7f39181b2a10 : conn.check list status: 1 nominated, 1 valid, c-id 1. 
(kurento-media-server:8259): libnice-DEBUG: agent_recv_message_unlocked: Valid STUN packet received. 
(kurento-media-server:8259): libnice-DEBUG: !sock->fileno detected! Returning -1 
(kurento-media-server:8259): libnice-DEBUG: !sock->fileno detected! Returning -1 
(kurento-media-server:8259): libnice-DEBUG: !sock->fileno detected! Returning -1 
(kurento-media-server:8259): libnice-DEBUG: !sock->fileno detected! Returning -1 
(kurento-media-server:8259): libnice-DEBUG: !sock->fileno detected! Returning -1 
(kurento-media-server:8259): libnice-DEBUG: !sock->fileno detected! Returning -1 
(kurento-media-server:8259): libnice-DEBUG: !sock->fileno detected! Returning -1 
(kurento-media-server:8259): libnice-DEBUG: !sock->fileno detected! Returning -1 
(kurento-media-server:8259): libnice-DEBUG: !sock->fileno detected! Returning -1 
(kurento-media-server:8259): libnice-DEBUG: !sock->fileno detected! Returning -1 
(kurento-media-server:8259): libnice-DEBUG: !sock->fileno detected! Returning -1 
(kurento-media-server:8259): libnice-DEBUG: !sock->fileno detected! Returning -1 
(kurento-media-server:8259): libnice-DEBUG: !sock->fileno detected! Returning -1 
(kurento-media-server:8259): libnice-DEBUG: !sock->fileno detected! Returning -1 
(kurento-media-server:8259): libnice-DEBUG: !sock->fileno detected! Returning -1 
(kurento-media-server:8259): libnice-DEBUG: !sock->fileno detected! Returning -1 
(kurento-media-server:8259): libnice-DEBUG: !sock->fileno detected! Returning -1 
(kurento-media-server:8259): libnice-DEBUG: !sock->fileno detected! Returning -1 
(kurento-media-server:8259): libnice-DEBUG: !sock->fileno detected! Returning -1 
(kurento-media-server:8259): libnice-DEBUG: !sock->fileno detected! Returning -1
@garry81

This comment has been minimized.

garry81 commented Sep 12, 2018

@Richard-Aasa in my case, it is not enough to check whether fd is valid value becuase socket is invalidated due to RST but libnice's socket fd isn't updated. would you check fd with G_IS_SOCKET which I suggested on that post?

@j1elo

This comment has been minimized.

Contributor

j1elo commented Sep 12, 2018

Hi @Richard-Aasa ,
I don't think the socket_send_message() will get called with NiceSocket *sock = NULL, but for these tests it's worth adding the check.

Do as @garry81 explains and do this (complete snippet for clarity):

static gssize
socket_send_message (NiceSocket *sock,
    const NiceOutputMessage *message, gboolean reliable)
{
  TcpPriv *priv = sock->priv;
  gssize ret;
  GError *gerr = NULL;
  gsize message_len;

  if (sock == NULL) {
    nice_debug ("NULL sock detected! Returning -1");
    return -1;
  }

  /* Socket has been closed: */
  if (sock->priv == NULL)
    return -1;
    
  /* Don't try to access the socket if it had an error, otherwise we risk a
   * crash with SIGPIPE (Broken pipe) */
  if (priv->error)
    return -1;
    
  if (!G_IS_SOCKET (sock->fileno)) {
    nice_debug ("INVALID sock->fileno detected! Returning -1");
    return -1;
  }

  message_len = output_message_get_size (message);
  
  ...

Also please let us know:

  • What version of the code are you testing? (commit ID, version number, etc)

  • Are you building the library and replacing the old /usr/lib/x86_64-linux-gnu/libnice.so.10.* with the new one?

  • Are you instead building Debian packages on your own and installing them on top of the default ones?
    (In summary, how are you making sure that the new, modified version of the libnice code is being actually used when your tests are run? I mean, I see you know it's being used because the printed logs show your message, I just want to know what method you are using)

  • Do you have a Kurento test application that can be used to reproduce this crash?
    This is the easiest way to be successful troubleshooting and fixing complex issues as this one.

@puneet89

This comment has been minimized.

puneet89 commented Oct 3, 2018

Hi Jorge,
I have raised a pull request for same with libnice ( Kurento/libnice#3) .
In the mean time you can use following branch to test your scenarios :
https://github.com/puneet89/libnice/tree/patch_gsocket_fix1
( KMS version 6.7.2 deb pkg , libnice patch created from 0.1.14 )

Let me know in case of any issue.

Thanks & Regards
Puneet Singh

@jmaiquez

This comment has been minimized.

jmaiquez commented Oct 4, 2018

Hi Puneet,

Thanks so much for this. We are still trying to figure out how to reproduce the crash reliably using our video collaboration application. Once we can semi-reliably reproduce it, we will use your patch to see if it becomes less (or not at all) reproducible.

I will report my findings here when I have them.

Thanks again,
Jorge

@guzmand

This comment has been minimized.

guzmand commented Oct 12, 2018

Hi,

I would like to know if this would be released in Kurento 6.8? and also if the Kurento 6.8 version would be considered a stable version?

Thanks,

Dario

@j1elo

This comment has been minimized.

Contributor

j1elo commented Oct 15, 2018

@guzmand Please wait until official announcement for Kurento 6.8.1, where there are already some minor problems fixed (like the KMS truncating the error log on each restart) and improvements added to aid in working with the libnice issues (libnice logging integrated with Kurento logging, all will be logged now to the same output files). I'll write an official announcement on the Kurento mail list and the homepage blog (kurento.org)

@puneet89 I have used your test but wasn't able to reproduce the crash, after several repetitions libnice never crashed, so I couldn't see either the assert or the socket issues...
I have tried several versions of KMS:

  • KMS 6.7.2, which comes with libnice 0.1.13
  • KMS 6.8.0, which comes with libnice 0.1.13
  • KMS Nightly, that has an experimental (mostly unreliable) version of libnice 0.1.15
  • Running test + KMS + java app in localhost
  • Running test in localhost, and KMS + java app in AWS machine

I think these issues are strongly related to the actual network configuration where the failure happens, and is not so much related to load, because it seems strange that some are able to easily reproduce the issue with their systems, and some other people have full system load and never see the issue...

j1elo added a commit to Kurento/experiments that referenced this issue Oct 22, 2018

@j1elo

This comment has been minimized.

Contributor

j1elo commented Nov 15, 2018

Latest Kurento Media Server 6.8.1 brings some performance improvements that were merged into upstream libnice. We've load tested this version and couldn't make libnice crash.

Our load test consists of 13 rooms with 7 browser participants on each room. This means a total of 91 participants:

  • 91 WebRTC incoming streams at KMS
  • 546 WebRTC outgoing streams from KMS to all participants
    Test ended successfully and the crash described in this report didn't happen.

If the crash from this report still happens in your systems, please provide a detailed explanation of the network topology (firewalls? using TURN? using TCP vs. UDP streams?) and organization of participants (all in same room? multiple rooms? etc).

@pabloFuente

This comment has been minimized.

Member

pabloFuente commented Nov 15, 2018

Worth noticing that this load test has been executed in a AWS machine with 4 cores and 8GB of RAM.

@Kukunin

This comment has been minimized.

Kukunin commented Nov 15, 2018

@j1elo what great news, thanks! a small question: were you able to reproduce the crash earlier with the described test environment?

@puneet89

This comment has been minimized.

puneet89 commented Nov 15, 2018

@j1elo Hi,

I will pick up this activity again in next week then will report if i can still reproduce the crash with latest build 6.8.1.
Regards
Puneet Singh

@j1elo

This comment has been minimized.

Contributor

j1elo commented Nov 15, 2018

@puneet89 thank you; also please remember to update Olivier in libnice's issue #33

@Kukunin Not really, we finished development of our load tests just last week. We'll be doing various testing in the coming days.

@American-Ledger

This comment has been minimized.

American-Ledger commented Nov 16, 2018

I can reproduce this on a fresh AWS instance of 6.8.1 every time as follows:

  1. Use one client of Tutorial Recorder in Chrome Browser.
  2. Modify index.js to ignore all IceCandidates that are not TCP.
  3. Load tutorial in browser. Press Play. Stream will appear on right via TCP.
  4. Press F5 to break the connection... Boom! Kurento crashes with:
    Segmentation fault (thread 140198302856960, pid 8089)
  • Every time.
@puneet89

This comment has been minimized.

puneet89 commented Nov 19, 2018

@j1elo this issue is coming frequently with KMS 6.8.1 ( no performance test) ,even in normal testing this issue is coming .
for issue https://gitlab.freedesktop.org/libnice/libnice/issues/33 i will update Olivier soon .

pkg Info:
root@imp-desk-101:/var/log/kurento-media-server# dpkg -l | grep kms
ii kms-core 6.8.1.xenial.20181023170759.64317be amd64 Kurento Core module
ii kms-elements 6.8.1.xenial.20181023171349.0f655c6 amd64 Kurento Elements module
ii kms-filters 6.8.1.xenial.20181023172029.83f02d4 amd64 Kurento Filters module
ii kms-jsonrpc 6.8.1.xenial.20181023170316.9ad5ec7 amd64 Kurento JSON-RPC library
ii kmsjsoncpp 1.6.3.xenial.20181023152331.d78deb7 amd64 Kurento jsoncpp library
root@imp-desk-101:/var/log/kurento-media-server# dpkg -l | grep libnice
ii libnice10:amd64 0.1.15-1kurento1 amd64 ICE library (shared library)

Attached error.log file for KMS 6.8.1.
errors.log

Stack trace:
�[34;1m[g_socket_send_message]�[0m
/usr/lib/x86_64-linux-gnu/libgio-2.0.so.0�[32;1m:0x7B044�[0m
�[34;1m[nice_output_stream_new]�[0m
/usr/lib/x86_64-linux-gnu/libnice.so.10�[32;1m:0x2769E�[0m
�[34;1m[nice_output_stream_new]�[0m
/usr/lib/x86_64-linux-gnu/libnice.so.10�[32;1m:0x27813�[0m
�[34;1m[nice_agent_recv_nonblocking]�[0m
/usr/lib/x86_64-linux-gnu/libnice.so.10�[32;1m:0x11879�[0m
�[34;1m[gst_nice_src_get_type]�[0m
/usr/lib/x86_64-linux-gnu/gstreamer-1.5/libgstnice15.so�[32;1m:0x3902�[0m
�[34;1m[gst_nice_sink_get_type]�[0m
/usr/lib/x86_64-linux-gnu/gstreamer-1.5/libgstnice15.so�[32;1m:0x4203�[0m
�[34;1m[gst_base_sink_do_preroll]�[0m
/usr/lib/x86_64-linux-gnu/libgstbase-1.5.so.0�[32;1m:0x2A1B2�[0m
�[34;1m[gst_base_sink_do_preroll]�[0m
/usr/lib/x86_64-linux-gnu/libgstbase-1.5.so.0�[32;1m:0x2B620�[0m
�[34;1m[gst_flow_get_name]�[0m
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0�[32;1m:0x6E5CF�[0m
�[34;1m[gst_pad_push]�[0m
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0�[32;1m:0x76533�[0m
�[34;1m[gst_proxy_pad_chain_default]�[0m
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0�[32;1m:0x5F5E3�[0m
�[34;1m[gst_flow_get_name]�[0m
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0�[32;1m:0x6E5CF�[0m
�[34;1m[gst_pad_push]�[0m
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0�[32;1m:0x76533�[0m
�[32;1m0x1B48D�[0m at /usr/lib/x86_64-linux-gnu/gstreamer-1.5/libgstcoreelements.so
�[34;1m[gst_flow_get_name]�[0m
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0�[32;1m:0x6E5CF�[0m
�[34;1m[gst_pad_push]�[0m
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0�[32;1m:0x76533�[0m
0:00:00.610831422 �[334m 8757�[00m 0x226eaa0 �[33;01mWARN

@micaelgallego

This comment has been minimized.

Member

micaelgallego commented Nov 19, 2018

@j1elo is in vacancy right now. It will take care of this issue as soon as he comes (in two days).

Having a way to reproduce the crash we will be able to fix it very soon.

@chevonc

This comment has been minimized.

chevonc commented Nov 20, 2018

This is currently blocking us from launching a feature with Kurento in production. Any ETA on what the turn around time will be once he's back? Is this a difficult fix?

@micaelgallego

This comment has been minimized.

Member

micaelgallego commented Nov 20, 2018

We will work hard on having it fixed ASAP. Maybe at the beginning of the next week.

@chevonc

This comment has been minimized.

chevonc commented Nov 20, 2018

Thanks for the quick response @micaelgallego. Eagerly awaiting the fix, let me know if can use a nightly build or something right away

@micaelgallego

This comment has been minimized.

Member

micaelgallego commented Nov 20, 2018

@chevonc we will notify when probable fix is in nightly.

Are you able to reproduce the crash easily?

@chevonc

This comment has been minimized.

chevonc commented Nov 20, 2018

@micaelgallego yes I can. Can I work with you to get it fixed? Will do w/e necessary :)

@chevonc

This comment has been minimized.

chevonc commented Nov 20, 2018

FYI - looks like Olivier was nice enough to push another set of fixes. I'm willing to work with whoever to get this verified/merged.

https://gitlab.freedesktop.org/libnice/libnice/issues/33#note_80203

Currently trying to figure out how to test out a patched lib with his changes if anyone has pointers

@chevonc

This comment has been minimized.

chevonc commented Nov 20, 2018

Validated this fix no longer causes the crash 🍾 . If we can merge/get this a released patched version of Kurento that'd be awesome. Currently just running with src-built patched version of the libnice

@j1elo

This comment has been minimized.

Contributor

j1elo commented Nov 21, 2018

Tested with the procedure suggested by @American-Ledger and it works. Curiously enough, it didn't work with the simpler Hello World tutorial. After some investigation I found the reason: when the page is closed / reloaded, the recorder tutorials abort abruptly by deleting the client objects, instead of doing a controlled stop. This is actually a bug in the tutorial code ;-)

But funnily enough this "bug" does actually trigger the libnice crash; a controlled release of resources from the application wouldn't cause the crash...

I've imported this behavior to a customized version of kurento-hello-world (Java tutorial). This can be used to easily reproduce the crash by anyone interested (you need to install Java JDK and Maven, like for any of the other Kurento Java tutorials).

I've tested latest commits from libnice PR 15 and the issue doesn't happen any more with this test application. So I've merged these patches into our fork of libnice and packages are already available in Kurento's pre-release repository.

Please test with the pre-release version of Kurento and let us know if these patches do actually fix the crashes!

@puneet89

This comment has been minimized.

puneet89 commented Nov 21, 2018

@j1elo Thanks !
I have upgraded all my setups with above kurento build, now will do all kind of sanity test on same .

@puneet89

This comment has been minimized.

puneet89 commented Nov 23, 2018

@j1elo with latest dev repo i am getting following error on Kurento when more than 12 user joining conference.

(kurento-media-server:1251): GStreamer-CRITICAL **: gst_mini_object_unlock: assertion 'state >= SHARE_ONE' failed

(kurento-media-server:1251): GStreamer-CRITICAL **: gst_mini_object_unlock: assertion 'state >= SHARE_ONE' failed

(kurento-media-server:1251): GStreamer-CRITICAL **: gst_mini_object_unlock: assertion 'state >= SHARE_ONE' failed

(kurento-media-server:1251): GStreamer-CRITICAL **: gst_mini_object_unlock: assertion 'state >= SHARE_ONE' failed

(kurento-media-server:1251): GStreamer-CRITICAL **: gst_mini_object_unlock: assertion 'state >= SHARE_ONE' failed

(kurento-media-server:1251): GStreamer-CRITICAL **: gst_mini_object_unlock: assertion 'state >= SHARE_ONE' failed

(kurento-media-server:1251): GStreamer-CRITICAL **: gst_mini_object_unlock: assertion 'state >= SHARE_ONE' failed

(kurento-media-server:1251): GStreamer-CRITICAL **: gst_mini_object_unlock: assertion 'state >= SHARE_ONE' failed

(kurento-media-server:1251): GStreamer-CRITICAL **: gst_mini_object_unlock: assertion 'state >= SHARE_ONE' failed

(kurento-media-server:1251): GStreamer-CRITICAL **: gst_mini_object_unlock: assertion 'state >= SHARE_ONE' failed

(kurento-media-server:1251): GStreamer-CRITICAL **: gst_mini_object_unlock: assertion 'state >= SHARE_ONE' failed

(kurento-media-server:1251): GStreamer-CRITICAL **: gst_mini_object_unlock: assertion 'state >= SHARE_ONE' failed

(kurento-media-server:1251): GStreamer-CRITICAL **: gst_mini_object_unlock: assertion 'state >= SHARE_ONE' failed

(kurento-media-server:1251): GStreamer-CRITICAL **: gst_mini_object_unlock: assertion 'state >= SHARE_ONE' failed

(kurento-media-server:1251): GStreamer-CRITICAL **: gst_mini_object_unlock: assertion 'state >= SHARE_ONE' failed

(kurento-media-server:1251): GStreamer-CRITICAL **: gst_mini_object_unlock: assertion 'state >= SHARE_ONE' failed

(kurento-media-server:1251): GStreamer-CRITICAL **: gst_mini_object_unlock: assertion 'state >= SHARE_ONE' failed

@puneet89

This comment has been minimized.

puneet89 commented Nov 23, 2018

I am raising different issue for this GStreamer error , as libnice issue is closed and working.

@j1elo

This comment has been minimized.

Contributor

j1elo commented Nov 23, 2018

That assertion is happening at this point in GStreamer code.

The docs of g_return_if_fail provide a way to debug this: you would need to follow these steps:

  1. Install KMS and the debug symbols, as explained here.
  2. Set an environment variable: G_DEBUG=fatal-criticals.
  3. Run KMS with GDB, until the same assertion happens.
  4. Generate a backtrace.
# Install KMS and debug symbols
apt-get update && apt-get install [...]

$ export G_DEBUG=fatal-criticals
$ export GST_DEBUG="3,Kurento*:4,kms*:4"
$ gdb /usr/bin/kurento-media-server
(gdb) run
# Now run until the assertion happens, GDB should break at that point
(gdb) backtrace

Then copy the backtrace so we can see it and know where it comes from.
Please use Markdown formatting when you copy stuff from logs or from a terminal... (use ``` for quoting blocks)

@puneet89

This comment has been minimized.

puneet89 commented Nov 23, 2018

@j1elo okay thanks i will take all logs and will share with you.

@j1elo

This comment has been minimized.

Contributor

j1elo commented Dec 5, 2018

Closing as it seems to be fixed since upstream merged their Merge Request !15

Fix release: Kurento 6.9.0

@j1elo j1elo closed this Dec 5, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment