This repository has been archived by the owner on Jul 12, 2023. It is now read-only.

Out of memory: Kill process 3386 (kurento-media-s) score 765 or sacrifice child #313

Closed
puneet89 opened this issue Nov 23, 2018 · 36 comments

@puneet89

I have installed:
KMS 6.8.1 and the Dev branch.
I am getting the following error from Kurento when more than 12 users join a conference; after some time the binary is killed by the kernel.
(kurento-media-server:1251): GStreamer-CRITICAL **: gst_mini_object_unlock: assertion 'state >= SHARE_ONE' failed

KMS 6.7.2:
Joining with 14 users works fine for about 1 hour; after that the KMS binary stopped and the following errors were logged.
(kurento-media-server:3386): GStreamer-CRITICAL **: gst_mini_object_unlock: assertion 'state >= SHARE_ONE' failed

dmesg logs: it looks like a memory leak in KMS, since the kernel is killing the Kurento process.

[15874.505398] Out of memory: Kill process 3386 (kurento-media-s) score 765 or sacrifice child
[15874.511950] Killed process 3386 (kurento-media-s) total-vm:10667176kB, anon-rss:3087576kB, file-rss:0kB

[15874.505143] java invoked oom-killer: gfp_mask=0x24201ca, order=0, oom_score_adj=0
[15874.505146] java cpuset=/ mems_allowed=0
[15874.505151] CPU: 0 PID: 3436 Comm: java Not tainted 4.4.0-1072-aws #82-Ubuntu
[15874.505153] Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
[15874.505155] 0000000000000286 9cd6e1d582553a57 ffff88003687f9e8 ffffffff81401783
[15874.505158] ffff88003687fba0 ffff8800e9f31b80 ffff88003687fa58 ffffffff8121039a
[15874.505160] ffffffff81cc7209 0000000000000000 ffffffff81e6b2a0 0000000000000206
[15874.505163] Call Trace:
[15874.505170] [] dump_stack+0x63/0x90
[15874.505175] [] dump_header+0x5a/0x1c5
[15874.505178] [] oom_kill_process+0x202/0x3c0
[15874.505181] [] out_of_memory+0x219/0x460
[15874.505185] [] __alloc_pages_slowpath.constprop.88+0x943/0xaf0
[15874.505187] [] __alloc_pages_nodemask+0x288/0x2a0
[15874.505190] [] alloc_pages_current+0x8c/0x110
[15874.505193] [] __page_cache_alloc+0xab/0xc0
[15874.505195] [] filemap_fault+0x160/0x440
[15874.505199] [] ext4_filemap_fault+0x36/0x50
[15874.505202] [] __do_fault+0x77/0x110
[15874.505204] [] handle_mm_fault+0x1252/0x1b70
[15874.505208] [] __do_page_fault+0x1a4/0x410
[15874.505210] [] do_page_fault+0x22/0x30
[15874.505214] [] page_fault+0x28/0x30
[15874.505216] Mem-Info:
[15874.505220] active_anon:933608 inactive_anon:1067 isolated_anon:0
active_file:523 inactive_file:714 isolated_file:0
unevictable:913 dirty:0 writeback:0 unstable:0
slab_reclaimable:4455 slab_unreclaimable:10385
mapped:1448 shmem:1483 pagetables:4107 bounce:0
free:20927 free_pcp:138 free_cma:0
[15874.505224] Node 0 DMA free:15900kB min:264kB low:328kB high:396kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15900kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[15874.505230] lowmem_reserve[]: 0 3729 3919 3919 3919
[15874.505233] Node 0 DMA32 free:64792kB min:64064kB low:80080kB high:96096kB active_anon:3567896kB inactive_anon:4260kB active_file:2092kB inactive_file:2856kB unevictable:3080kB isolated(anon):0kB isolated(file):0kB present:3915776kB managed:3835340kB mlocked:3080kB dirty:0kB writeback:0kB mapped:5220kB shmem:5896kB slab_reclaimable:17140kB slab_unreclaimable:38480kB kernel_stack:15312kB pagetables:15108kB unstable:0kB bounce:0kB free_pcp:552kB local_pcp:432kB free_cma:0kB writeback_tmp:0kB pages_scanned:32360 all_unreclaimable? yes
[15874.505239] lowmem_reserve[]: 0 0 189 189 189
[15874.505242] Node 0 Normal free:3016kB min:3248kB low:4060kB high:4872kB active_anon:166536kB inactive_anon:8kB active_file:0kB inactive_file:0kB unevictable:572kB isolated(anon):0kB isolated(file):0kB present:262144kB managed:193804kB mlocked:572kB dirty:0kB writeback:0kB mapped:572kB shmem:36kB slab_reclaimable:680kB slab_unreclaimable:3060kB kernel_stack:1584kB pagetables:1320kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[15874.505247] lowmem_reserve[]: 0 0 0 0 0
[15874.505250] Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15900kB
[15874.505261] Node 0 DMA32: 626*4kB (UE) 228*8kB (UME) 263*16kB (UME) 542*32kB (UE) 344*64kB (UME) 96*128kB (UME) 18*256kB (UME) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 64792kB
[15874.505272] Node 0 Normal: 484*4kB (UMH) 39*8kB (M) 48*16kB (UM) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3016kB
[15874.505281] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[15874.505282] 3382 total pagecache pages
[15874.505284] 0 pages in swap cache
[15874.505285] Swap cache stats: add 0, delete 0, find 0/0
[15874.505286] Free swap = 0kB
[15874.505287] Total swap = 0kB
[15874.505289] 1048477 pages RAM
[15874.505290] 0 pages HighMem/MovableOnly
[15874.505291] 37216 pages reserved
[15874.505291] 0 pages cma reserved
[15874.505292] 0 pages hwpoisoned
[15874.505293] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
[15874.505298] [ 372] 0 372 8035 724 20 3 0 0 systemd-journal
[15874.505301] [ 427] 0 427 23693 167 17 3 0 0 lvmetad
[15874.505303] [ 457] 0 457 10669 657 23 3 0 -1000 systemd-udevd
[15874.505305] [ 706] 100 706 25081 431 20 3 0 0 systemd-timesyn
[15874.505307] [ 944] 0 944 4030 533 11 3 0 0 dhclient
[15874.505309] [ 1090] 0 1090 1305 29 9 3 0 0 iscsid
[15874.505312] [ 1091] 0 1091 1430 877 9 3 0 -17 iscsid
[15874.505314] [ 1095] 107 1095 10755 562 26 3 0 -900 dbus-daemon
[15874.505316] [ 1097] 0 1097 7136 497 18 3 0 0 systemd-logind
[15874.505318] [ 1103] 0 1103 68654 794 37 3 0 0 accounts-daemon
[15874.505320] [ 1111] 0 1111 75444 1573 31 6 0 0 amazon-ssm-agen
[15874.505322] [ 1118] 104 1118 65157 511 28 3 0 0 rsyslogd
[15874.505324] [ 1120] 0 1120 6511 416 18 3 0 0 atd
[15874.505326] [ 1125] 0 1125 77160 242 20 3 0 0 lxcfs
[15874.505329] [ 1132] 0 1132 6932 383 18 3 0 0 cron
[15874.505331] [ 1136] 0 1136 78097 3009 39 5 0 -900 snapd
[15874.505333] [ 1145] 0 1145 1099 306 8 3 0 0 acpid
[15874.505335] [ 1164] 0 1164 16377 569 35 3 0 -1000 sshd
[15874.505337] [ 1178] 0 1178 3343 36 11 3 0 0 mdadm
[15874.505339] [ 1179] 0 1179 69278 163 39 3 0 0 polkitd
[15874.505341] [ 1242] 0 1242 4868 251 15 3 0 0 irqbalance
[15874.505343] [ 1317] 0 1317 3664 351 12 3 0 0 agetty
[15874.505345] [ 1321] 0 1321 3618 419 12 3 0 0 agetty
[15874.505348] [ 1657] 1000 1657 11287 495 27 3 0 0 systemd
[15874.505350] [ 1661] 1000 1661 15389 566 33 3 0 0 (sd-pam)
[15874.505352] [ 2612] 0 2612 23200 678 51 3 0 0 sshd
[15874.505354] [ 2667] 1000 2667 23391 441 48 3 0 0 sshd
[15874.505356] [ 2668] 0 2668 23199 673 48 3 0 0 sshd
[15874.505358] [ 2670] 1000 2670 5428 944 15 3 0 0 bash
[15874.505360] [ 2696] 0 2696 13938 539 32 3 0 0 sudo
[15874.505362] [ 2697] 0 2697 12751 509 30 4 0 0 su
[15874.505364] [ 2698] 0 2698 5077 612 14 3 0 0 bash
[15874.505366] [ 2738] 1000 2738 23199 292 47 3 0 0 sshd
[15874.505368] [ 2739] 1000 2739 3220 442 12 3 0 0 sftp-server
[15874.505371] [19655] 0 19655 23200 648 46 3 0 0 sshd
[15874.505373] [19710] 1000 19710 23268 576 47 3 0 0 sshd
[15874.505375] [19711] 0 19711 23199 668 50 3 0 0 sshd
[15874.505377] [19713] 1000 19713 5428 961 15 3 0 0 bash
[15874.505379] [19770] 1000 19770 23199 375 48 3 0 0 sshd
[15874.505381] [19771] 1000 19771 3220 441 12 3 0 0 sftp-server
[15874.505383] [19772] 0 19772 13938 554 32 3 0 0 sudo
[15874.505385] [19773] 0 19773 12751 492 30 4 0 0 su
[15874.505387] [19774] 0 19774 5032 596 14 3 0 0 bash
[15874.505390] [ 3386] 1001 3386 2666794 771894 2535 15 0 0 kurento-media-s
[15874.505392] [ 3402] 0 3402 899797 141796 389 6 0 0 java
[15874.505394] [ 3417] 0 3417 1500 320 8 3 0 0 tailf
[15874.505396] [12376] 0 12376 10130 422 24 3 0 0 top
[15874.505398] Out of memory: Kill process 3386 (kurento-media-s) score 765 or sacrifice child
[15874.511950] Killed process 3386 (kurento-media-s) total-vm:10667176kB, anon-rss:3087576kB, file-rss:0kB

@puneet89 puneet89 changed the title Streamer-CRITICAL **: gst_mini_object_unlock: assertion 'state >= SHARE_ONE' failed Out of memory: Kill process 3386 (kurento-media-s) score 765 or sacrifice child Nov 23, 2018
@puneet89
Author

I am using an AWS t2.medium instance. Analysing further, I can see that with 14 users Kurento starts at 35-40% CPU utilization but suddenly spikes to 200% for a moment, and that's when the kernel kills the Kurento process.

@puneet89
Author

The issue appears when users rejoin with different video streams (QCIF, SD, HD); I can see the Kurento process get stuck (KMS 6.7.2).
KMS hangs and CPU utilization goes to 200%.

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
bt
0x00007f1d89b6098d in pthread_join (threadid=139764602025728, thread_return=0x0) at pthread_join.c:90
90 pthread_join.c: No such file or directory.
(gdb) bt
#0 0x00007f1d89b6098d in pthread_join (threadid=139764602025728, thread_return=0x0) at pthread_join.c:90
#1 0x00007f1d87dc3b97 in std::thread::join() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#2 0x0000000000522133 in kurento::WebSocketTransport::stop() ()
#3 0x00000000004d209e in main ()
(gdb) bt
#0 0x00007f1d89b6098d in pthread_join (threadid=139764602025728, thread_return=0x0) at pthread_join.c:90
#1 0x00007f1d87dc3b97 in std::thread::join() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#2 0x0000000000522133 in kurento::WebSocketTransport::stop() ()
#3 0x00000000004d209e in main ()
(gdb)

@puneet89
Author

@j1elo This is one more issue I am able to reproduce. I will enable debug symbols/logs and will share the exact steps and backtrace with you.

@j1elo
Member

j1elo commented Nov 23, 2018

Note that we have received reports from other users of KMS 6.8.1 and DEV, testing 1-to-400 WebRTC broadcasting and also a 30-to-30 video conference, and it all worked fine. We are investigating a possible memory leak, but it shouldn't break so easily with only 12 streams.

@puneet89
Author

puneet89 commented Nov 23, 2018

@j1elo actually in my case the CPU utilization goes high for a moment and then the kernel kills the process.
CPU utilization reaches 200% even with 12 users when each user changes their resolution from QCIF to HD via the video conference UI (we developed a UI for changing video resolution). When a user changes resolution they actually rejoin the conference with a higher resolution in the SDP offer, and that makes the Kurento CPU spike. In my case the CPU was pegged at 200% (2 vCPUs) when the process was killed.

The oom-killer kills the processes with the highest oom_score first; the score is driven mainly by memory usage, so it generally targets the process using the most memory.
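For reference, the score the kernel uses can be inspected directly while the server is under load (a minimal sketch; 3386 is the KMS PID from the dmesg output above):

# higher oom_score = killed first; the score is driven mainly by memory use
cat /proc/3386/oom_score
cat /proc/3386/oom_score_adj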

@puneet89
Author

Can you tell me how much CPU utilization the 400-user WebRTC broadcast and the 30-to-30 video conference tests showed?
If we don't change the resolution from QCIF to HD (rejoining with HD), normal calls work fine; this issue only appears in the scenario I described above.

@j1elo
Member

j1elo commented Nov 23, 2018

Look out for the TRANSCODING is ACTIVE or TRANSCODING is INACTIVE messages in the KMS logs.

Also keep in mind that KMS has trouble processing 1080p streams. We had a user with a product for 4K videos; they tried to send 4K through KMS, but after some experimentation they found that the maximum possible was 720p. And that was only 1 stream... with 12 streams I would say it's normal to see the CPU getting exhausted.

Another user tried 1080p RTSP streams and found something similar: max. 720p and 2 MBps (see notes here).

@puneet89
Author

@j1elo in our case it was 720p: we were selecting HD, but the browser was actually sending a 720p stream. I verified it with chrome://webrtc-internals.

@j1elo
Member

j1elo commented Nov 23, 2018

Ok, then start with only 1 participant, monitor CPU, and slowly add participants 1 by 1, so you can work out how many participants are able to work at the same time. I'll ask the Kurento team about these limits.
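For example, per-thread CPU usage can be watched while participants join with something like this (a minimal sketch, assuming a single KMS process is running):

# show per-thread CPU usage of the running KMS process
top -H -p "$(pgrep -f kurento-media-server | head -n1)"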

@ffxxrr

ffxxrr commented Nov 28, 2018

Hey @puneet89 - we are considering moving forward and implementing the latest version of KMS in our production environment, to hopefully get away from the segmentation fault issue detailed in #247.

Have you any further insights following your testing?
We are trying to decide whether it's safe to take this latest version to production.

I would also appreciate it if you could detail the exact versions of kms and libnice that you have been using.

Thank you!

@puneet89
Author

Issue #247 is definitely solved, and it's working fine on the latest dev release.

@ffxxrr This week I am planning to test the latest Kurento dev release for other scenarios (not specific to #247), as I am facing new issues like the GStreamer-CRITICAL error on the latest release. I will share a detailed analysis of issues on the latest release ASAP, as asked by @j1elo in #247 (comment).

@kylefoley

Had a similar issue pop up yesterday. We had 8 users in an SFU, all sharing video, on an M5-4XL instance. Kurento started pegging the CPU like I've never seen; normally I see around 30-40% core utilization (6% total) for 8 users all sharing video. About 10 minutes in, it started getting above 100% core usage. At 20 minutes in, the core % was dancing between 700% and 1000%. This is on 6.8.1, btw.

I'll see if I can repro this today; so far I haven't had any luck. I'll also see if any of our users are using a 1080p camera.

@j1elo
Member

j1elo commented Dec 5, 2018

Please remember to share any findings you may have; if they help solve some performance issues with Kurento, we'll all benefit.

Key things to check:

  • Currently Kurento has performance issues with source videos at or above 720p. 1080p is not recommended (it might work, but the Kurento team hasn't done any factual analysis to prove it).
  • Source and destination video codecs must be compatible. This has always been a source of performance problems in WebRTC communications.
    • For example, if some participants are using Firefox and talking in a room, they will probably negotiate the VP8 codec with Kurento; if someone later enters with Safari, CPU usage explodes because transcoding is suddenly required, since Safari only supports H.264 (VP8 support was added only in Safari Technology Preview 68; not sure about iOS Safari, but I think it still doesn't support it).
    • Another example: you have some VP8 streams running nicely, but then stream recording is enabled with the MP4 recording profile, which uses H.264. Same story: the video needs to be converted, and that uses a lot of CPU.
  • Also check whether other processes are running on the same machine and using the CPU. For example, Coturn may be using a lot of resources because too many users end up connecting via relay (TURN).

Of these, video transcoding is the main user of CPU cycles, because encoding video is a computationally expensive operation. As mentioned earlier, keep an eye on the TRANSCODING events sent from Kurento to your Application Server, or alternatively check the TRANSCODING is ACTIVE or TRANSCODING is INACTIVE messages in the KMS logs.

If you see that TRANSCODING is ACTIVE at some point, you may get a bit more information about why, by adding this:
export GST_DEBUG="${GST_DEBUG:-3},Kurento*:5,agnosticbin*:5"
to your daemon settings file, /etc/default/kurento-media-server

Then looking for these messages:

  • Current output caps: [...]
  • Downstream input caps: [...]
  • Find TreeBin with output caps: [...]

Which will end up with either of these sets of messages:

  • If source codec is compatible with destination:
    • TreeBin found! Reuse it
    • TRANSCODING is INACTIVE for this media
  • If source codec is not compatible with destination:
    • TreeBin not found! Connection requires transcoding
    • TRANSCODING is ACTIVE for this media

The input caps and output caps in the first messages can help you understand what Kurento is receiving and what is expected at the other side.
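For instance, a quick way to check the logs afterwards (a minimal sketch; the log path assumes the default package installation):

# list every transcoding activation/deactivation recorded in the KMS logs
grep -E "TRANSCODING is (ACTIVE|INACTIVE)" /var/log/kurento-media-server/*.log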

This comment ended up being quite verbose, I think I'll use it to add a new section in the Troubleshooting Issues documentation in case it helps somebody :)

@puneet89
Author

puneet89 commented Dec 5, 2018

@j1elo sorry for the late reply. I have compiled with all debug symbols on one of the servers, but I am not able to reproduce the crash or the high CPU issue on it.
However, I have one production server where I am getting the crash continuously at short intervals. I am discussing with our production team whether we can enable debug symbols on the production server.
On the production server, users join rooms with Chrome, the iOS app, Safari, etc.
I will try to collect all the requested logs on our production server and share them with you.

Production server Kurento logs, without debug symbols enabled:
error.logs
" ^[[31;1mSegmentation fault^[[0m (thread ^[[33;1m140598692333696^[[0m, pid ^[[33;1m5066^[[0m)
Stack trace:
^[[34;1m[g_mutex_lock]^[[0m
/lib/x86_64-linux-gnu/libglib-2.0.so.0^[[32;1m:0x8EB15^[[0m
^[[34;1m[gst_buffer_pool_set_flushing]^[[0m
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0^[[32;1m:0x3FD1B^[[0m
^[[34;1m[g_list_sort_with_data]^[[0m
/lib/x86_64-linux-gnu/libglib-2.0.so.0^[[32;1m:0x471E6^[[0m
^[[34;1m[g_list_sort_with_data]^[[0m
/lib/x86_64-linux-gnu/libglib-2.0.so.0^[[32;1m:0x4744E^[[0m
^[[34;1m[g_main_context_check]^[[0m
/lib/x86_64-linux-gnu/libglib-2.0.so.0^[[32;1m:0x49CEA^[[0m
^[[34;1m[g_main_context_dispatch]^[[0m
/lib/x86_64-linux-gnu/libglib-2.0.so.0^[[32;1m:0x4A330^[[0m
^[[34;1m[g_main_loop_run]^[[0m
/lib/x86_64-linux-gnu/libglib-2.0.so.0^[[32;1m:0x4A712^[[0m
^[[32;1m0x555BB5846824^[[0m at /usr/bin/kurento-media-server
^[[34;1m[__libc_start_main]^[[0m
csu/libc-start.c^[[32;1m:325^[[0m
^[[32;1m0x555BB58487C9^[[0m at /usr/bin/kurento-media-server
"

@puneet89
Author

puneet89 commented Dec 7, 2018

@j1elo
I got the following random crash.

kurento-media-server Version - 6.9.0.xenial~201812 amd64
2018-12-07T122649.00000.pid7583.log

Debug-level stack trace (log file also attached):

Segmentation fault (thread 139921209538304, pid 26999)
Stack trace:
[sigc::slot_base::empty() const]
/usr/include/sigc++-2.0/sigc++/functors/slot_base.h:309
[kurento::MediaFlowOutStateChange::MediaFlowOutStateChange()]
/opt/kurento/obj-x86_64-linux-gnu/src/server/interface/generated-cpp/MediaFlowOutStateChange.hpp:33
[std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count()]
/usr/include/c++/5/bits/shared_ptr_base.h:659
[g_closure_invoke]
/build/glib2.0-7ZsPUq/glib2.0-2.48.2/./gobject/gclosure.c:811 (discriminator 2)
[accumulate]
/build/glib2.0-7ZsPUq/glib2.0-2.48.2/./gobject/gsignal.c:3134
[g_signal_emit_valist]
/build/glib2.0-7ZsPUq/glib2.0-2.48.2/./gobject/gsignal.c:3413 (discriminator 1)
[g_signal_emit]
/build/glib2.0-7ZsPUq/glib2.0-2.48.2/./gobject/gsignal.c:3443
[cb_buffer_received]
/opt/kurento/src/gst-plugins/commons/kmselement.c:578
[probe_hook_marshal]
/opt/kurento/gst/gstpad.c:3450
[g_hook_list_marshal]
/build/glib2.0-7ZsPUq/glib2.0-2.48.2/./glib/ghook.c:673
[do_probe_callbacks]
/opt/kurento/gst/gstpad.c:3605
[gst_pad_chain_data_unchecked]
/opt/kurento/gst/gstpad.c:4163
[gst_pad_push]
/opt/kurento/gst/gstpad.c:4556
[gst_rtp_base_depayload_push]
/opt/kurento/gst-libs/gst/rtp/gstrtpbasedepayload.c:823
[gst_rtp_base_depayload_handle_buffer]
/opt/kurento/gst-libs/gst/rtp/gstrtpbasedepayload.c:466
[gst_pad_chain_data_unchecked]
/opt/kurento/gst/gstpad.c:4185
^@-ne

2018-12-07T06:56:49+00:00 -- New execution

@puneet89
Author

@j1elo Again I am getting the following GStreamer-CRITICAL error.

Can you please check the log file?
Full log files attached:
2018-12-12T124822.00000.pid26101_12Dec.log

(kurento-media-server:9545): GStreamer-CRITICAL **: Element rtpbin460 already has a pad named send_rtp_sink_0, the behaviour of gst_element_get_request_pad() for existing pads is undefined!
2018-12-11T08:05:22+00:00 -- New execution

(kurento-media-server:595): GStreamer-CRITICAL **: Element rtpbin6 already has a pad named send_rtp_sink_0, the behaviour of gst_element_get_request_pad() for existing pads is undefined!
-ne

2018-12-11T08:08:33+00:00 -- New execution

(kurento-media-server:1031): GStreamer-CRITICAL **: Element rtpbin37 already has a pad named send_rtp_sink_0, the behaviour of gst_element_get_request_pad() for existing pads is undefined!
-ne

2018-12-11T08:16:40+00:00 -- New execution

(kurento-media-server:2219): GStreamer-CRITICAL **: Element rtpbin914 already has a pad named send_rtp_sink_0, the behaviour of gst_element_get_request_pad() for existing pads is undefined!
-ne

2018-12-11T14:32:24+00:00 -- New execution

(kurento-media-server:31700): GStreamer-CRITICAL **: Element rtpbin39 already has a pad named send_rtp_sink_0, the behaviour of gst_element_get_request_pad() for existing pads is undefined!
-ne

2018-12-11T14:39:31+00:00 -- New execution

(kurento-media-server:32590): GStreamer-CRITICAL **: Element rtpbin32 already has a pad named send_rtp_sink_0, the behaviour of gst_element_get_request_pad() for existing pads is undefined!

@puneet89
Author

@j1elo please let me know if you need more logs, as I am getting this crash on the latest Kurento only.

kurento-media-server 6.9.0.xenial~2018120517 amd64 Kurento Media Server
2018-12-18T05:20:50+00:00 -- New execution
Segmentation fault (thread 139817040701184, pid 1465)
Stack trace:
[sigc::internal::signal_emit1<void, kurento::ElementDisconnected, sigc::nil>::emit(sigc::internal::signal_impl*, kurento::ElementDisconnected const&)]
/usr/lib/x86_64-linux-gnu/libkmscoreimpl.so.6:0x10BF29
[kurento::MediaElementImpl::mediaFlowOutStateChange(int, char*, KmsElementPadType)]
/usr/lib/x86_64-linux-gnu/libkmscoreimpl.so.6:0x102ECB
[virtual thunk to kurento::MediaElementImpl::getGstreamerDot(std::shared_ptr<kurento::GstreamerDotDetails>)]
/usr/lib/x86_64-linux-gnu/libkmscoreimpl.so.6:0xFA919
[g_closure_invoke]
/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0:0xFEFF
[g_signal_handler_disconnect]
/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0:0x21FC1
[g_signal_emit_valist]
/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0:0x2AD5C
[g_signal_emit]
/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0:0x2B08F
[check_if_flow_media]
/usr/lib/x86_64-linux-gnu/libkmsgstcommons.so.6:0x1FAA4
[gst_mini_object_steal_qdata]
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x6C29B
[g_hook_list_marshal]
/lib/x86_64-linux-gnu/libglib-2.0.so.0:0x3A904
[gst_mini_object_steal_qdata]
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x6AAFB
[gst_flow_get_name]
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x6E98B
[gst_pad_push]
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x76533
[gst_rtp_base_depayload_push]
/usr/lib/x86_64-linux-gnu/libgstrtp-1.5.so.0:0x12F69
[gst_rtp_base_depayload_push]
/usr/lib/x86_64-linux-gnu/libgstrtp-1.5.so.0:0x13583
[gst_flow_get_name]
/usr/lib/x86_64-linux-gnu/libgstreamer-1.5.so.0:0x6E5CF
^@

@j1elo
Member

j1elo commented Dec 20, 2018

please let me know if you need more logs

Yes, stack traces without debug symbols are not very useful; please install all debug symbols before getting a stack trace, instructions here: https://doc-kurento.readthedocs.io/en/latest/user/troubleshooting.html#media-server-crashed
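Once the symbols are installed, a readable trace can be extracted from a core dump with something like this (a minimal sketch; the core file location depends on your kernel's core_pattern setting):

# enable core dumps, reproduce the crash, then dump all thread backtraces
ulimit -c unlimited
gdb -batch -ex "thread apply all bt" /usr/bin/kurento-media-server /path/to/core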

@j1elo
Member

j1elo commented Dec 20, 2018

However, in your previous logs the debug symbols were installed, as I can see file names and line numbers, but they don't point to any interesting place in the source code. I think the crash is happening because of memory corruption or some other kind of issue that doesn't show up in the stack trace.

This probably requires debugging with GDB to get a breakpoint exactly when the crash happens. Please describe the situation when the crash occurs.

Is the CPU at 100% load when KMS crashes? I have seen kurento::MediaElementImpl::mediaFlowOutStateChange in stack traces before, and the situation was that the server had its CPU overloaded, so moving to a bigger machine was a good counter-measure. Of course we still want to prevent crashes, so I am interested in finding the real cause.

@puneet89
Author

puneet89 commented Feb 2, 2019

@j1elo
I have compiled Kurento master (6.9.1) with all debug symbols enabled. I have commented out the VP8 codec in SdpEndpoint.conf.json, as I have to use the H264 codec only.
5 Chrome desktop users joined a conference, and Kurento crashed.

./kurento-media-server/server/kurento-media-server --version
Kurento Media Server version: 6.9.1~5.g3e41ba3

Crash BackTrace:

'Core was generated by `kurento-media-server/server/kurento-media-server --modules-path=. --modules-con'.
Program terminated with signal SIGTRAP, Trace/breakpoint trap.
#0  0x00007f88e1db7afb in _g_log_abort (breakpoint=1) at /build/glib2.0-7ZsPUq/glib2.0-2.48.2/./glib/gmessages.c:325
325     /build/glib2.0-7ZsPUq/glib2.0-2.48.2/./glib/gmessages.c: No such file or directory.
[Current thread is 1 (Thread 0x7f886d7b4700 (LWP 4363))]
(gdb) 
(gdb) 
(gdb) 
(gdb) bt
#0  0x00007f88e1db7afb in g_logv (breakpoint=1) at /build/glib2.0-7ZsPUq/glib2.0-2.48.2/./glib/gmessages.c:325
#1  0x00007f88e1db7afb in g_logv (log_domain=0x7f88e1df888e "GLib", log_level=G_LOG_LEVEL_ERROR, format=<optimized out>, args=args@entry=0x7f886d7b17b0) at /build/glib2.0-7ZsPUq/glib2.0-2.48.2/./glib/gmessages.c:1080
#2  0x00007f88e1db7c6f in g_log (log_domain=log_domain@entry=0x7f88e1df888e "GLib", log_level=log_level@entry=G_LOG_LEVEL_ERROR, format=format@entry=0x7f88e1e021e8 "%s: failed to allocate %lu bytes") at /build/glib2.0-7ZsPUq/glib2.0-2.48.2/./glib/gmessages.c:1119
#3  0x00007f88e1db67e4 in g_malloc (n_bytes=n_bytes@entry=213942697) at /build/glib2.0-7ZsPUq/glib2.0-2.48.2/./glib/gmem.c:99
#4  0x00007f88e1dcd9c3 in g_slice_alloc (mem_size=mem_size@entry=213942697) at /build/glib2.0-7ZsPUq/glib2.0-2.48.2/./glib/gslice.c:1025
#5  0x00007f88e29aa8ea in _sysmem_new_block (flags=(unknown: 0), maxsize=213942553, align=7, offset=0, size=213942546) at gstallocator.c:414
#6  0x00007f88e29b58d2 in gst_buffer_new_allocate (allocator=allocator@entry=0x0, size=size@entry=213942546, params=params@entry=0x0)
    at gstbuffer.c:767
#7  0x00007f8894332afd in gst_h264_parser_store_nal (h264parse=h264parse@entry=0x7f888004b730 [GstH264Parse], id=20, naltype=naltype@entry=GST_H264_NAL_SPS, nalu=nalu@entry=0x7f886d7b2bd0) at gsth264parse.c:467
#8  0x00007f8894335033 in gst_h264_parse_process_nal (h264parse=h264parse@entry=0x7f888004b730 [GstH264Parse], nalu=nalu@entry=0x7f886d7b2bd0) at gsth264parse.c:749
#9  0x00007f88943368bf in gst_h264_parse_handle_frame (frame=0x7f8880112c50, parse=0x7f888004b730 [GstH264Parse]) at gsth264parse.c:984
#10 0x00007f88943368bf in gst_h264_parse_handle_frame (parse=0x7f888004b730 [GstH264Parse], frame=0x7f8880112c50, skipsize=<optimized out>)
    at gsth264parse.c:1065
#11 0x00007f88df32b464 in gst_base_parse_handle_buffer (parse=parse@entry=0x7f888004b730 [GstH264Parse], buffer=<optimized out>, skip=skip@entry=0x7f886d7b2dc8, flushed=flushed@entry=0x7f886d7b2dcc) at gstbaseparse.c:2166
#12 0x00007f88df32fd4e in gst_base_parse_chain (pad=<optimized out>, parent=0x7f888004b730 [GstH264Parse], buffer=<optimized out>)
    at gstbaseparse.c:3230
#13 0x00007f88e29e95cf in gst_pad_push_data (data=0x7f88a80c7af0, type=4112, pad=0x7f88800442a0 [GstPad]) at gstpad.c:4183
#14 0x00007f88e29e95cf in gst_pad_push_data (pad=pad@entry=0x7f8880040580 [GstProxyPad], type=type@entry=4112, data=data@entry=0x7f88a80c7af0) at gstpad.c:4435
#15 0x00007f88e29f1533 in gst_pad_push (pad=pad@entry=0x7f8880040580 [GstProxyPad], buffer=buffer@entry=0x7f88a80c7af0) at gstpad.c:4554
#16 0x00007f88e29da5e3 in gst_proxy_pad_chain_default (pad=0x55831966fb30 [GstGhostPad], parent=<optimized out>, buffer=0x7f88a80c7af0)
    at gstghostpad.c:126
#17 0x00007f88e29e95cf in gst_pad_push_data (data=0x7f88a80c7af0, type=4112, pad=0x55831966fb30 [GstGhostPad]) at gstpad.c:4183
#18 0x00007f88e29e95cf in gst_pad_push_data (pad=pad@entry=0x7f8880040320 [GstTeePad], type=type@entry=4112, data=data@entry=0x7f88a80c7af0)
    at gstpad.c:4435
#19 0x00007f88e29f1533 in gst_pad_push (pad=pad@entry=0x7f8880040320 [GstTeePad], buffer=0x7f88a80c7af0) at gstpad.c:4554
#20 0x00007f88bc52ebd2 in gst_tee_handle_data (is_list=0, data=0x7f88a80c7af0, pad=0x7f8880040320 [GstTeePad], tee=0x7f88a802c9d0 [GstTee])
    at gsttee.c:603
#21 0x00007f88bc52ebd2 in gst_tee_handle_data (tee=tee@entry=0x7f88a802c9d0 [GstTee], data=data@entry=0x7f88a80c7af0, is_list=is_list@entry=0) at gsttee.c:683
#22 0x00007f88bc52f0a9 in gst_tee_chain (pad=<optimized out>, parent=0x7f88a802c9d0 [GstTee], buffer=0x7f88a80c7af0) at gsttee.c:766
#23 0x00007f88e29e95cf in gst_pad_push_data (data=0x7f88a80c7af0, type=4112, pad=0x7f88840194a0 [GstPad]) at gstpad.c:4183
#24 0x00007f88e29e95cf in gst_pad_push_data (pad=pad@entry=0x55831967b680 [GstProxyPad], type=type@entry=4112, data=data@entry=0x7f88a80c7af0)---Type <return> to continue, or q <return> to quit---
 at gstpad.c:4435
#25 0x00007f88e29f1533 in gst_pad_push (pad=pad@entry=0x55831967b680 [GstProxyPad], buffer=buffer@entry=0x7f88a80c7af0) at gstpad.c:4554
#26 0x00007f88e29da5e3 in gst_proxy_pad_chain_default (pad=0x55831966f170 [GstGhostPad], parent=<optimized out>, buffer=0x7f88a80c7af0)
    at gstghostpad.c:126
#27 0x00007f88bc75927c in kms_agnostic_bin2_sink_chain (pad=0x55831966f170 [GstGhostPad], parent=0x7f88ac014320 [KmsAgnosticBin2], buffer=0x7f88a80c7af0) at /home/singh1989p/VOICE/new/kms-omni-build/kms-core/src/gst-plugins/kmsagnosticbin.c:1400
#28 0x00007f88e29e95cf in gst_pad_push_data (data=0x7f88a80c7af0, type=4112, pad=0x55831966f170 [GstGhostPad]) at gstpad.c:4183
#29 0x00007f88e29e95cf in gst_pad_push_data (pad=pad@entry=0x7f8884019b60 [GstPad], type=type@entry=4112, data=data@entry=0x7f88a80c7af0)
    at gstpad.c:4435
#30 0x00007f88e29f1533 in gst_pad_push (pad=0x7f8884019b60 [GstPad], buffer=0x7f88a80c7af0) at gstpad.c:4554
#31 0x00007f88dfd7df69 in gst_rtp_base_depayload_push (filter=filter@entry=0x7f88a8071540 [GstRtpH264Depay], out_buf=out_buf@entry=0x7f88a80c7af0) at gstrtpbasedepayload.c:818
#32 0x00007f88dfd7e583 in gst_rtp_base_depayload_handle_buffer (filter=0x7f88a8071540 [GstRtpH264Depay], in=0x7f88983663e0, bclass=<optimized out>, bclass=<optimized out>) at gstrtpbasedepayload.c:466
#33 0x00007f88e29e95cf in gst_pad_push_data (data=0x7f88983663e0, type=4112, pad=0x7f8884019920 [GstPad]) at gstpad.c:4183
#34 0x00007f88e29e95cf in gst_pad_push_data (pad=pad@entry=0x55831966ef00 [GstGhostPad], type=type@entry=4112, data=data@entry=0x7f88983663e0) at gstpad.c:4435
#35 0x00007f88e29f1533 in gst_pad_push (pad=pad@entry=0x55831966ef00 [GstGhostPad], buffer=buffer@entry=0x7f88983663e0) at gstpad.c:4554
#36 0x00007f88e29da5e3 in gst_proxy_pad_chain_default (pad=0x55831967b1e0 [GstProxyPad], parent=<optimized out>, buffer=0x7f88983663e0)
    at gstghostpad.c:126
#37 0x00007f88e29e95cf in gst_pad_push_data (data=0x7f88983663e0, type=4112, pad=0x55831967b1e0 [GstProxyPad]) at gstpad.c:4183
#38 0x00007f88e29e95cf in gst_pad_push_data (pad=pad@entry=0x7f8884019260 [GstPad], type=type@entry=4112, data=data@entry=0x7f88983663e0)
    at gstpad.c:4435
#39 0x00007f88e29f1533 in gst_pad_push (pad=pad@entry=0x7f8884019260 [GstPad], buffer=buffer@entry=0x7f88983663e0) at gstpad.c:4554
#40 0x00007f88bd6a3752 in gst_rtp_pt_demux_chain (pad=<optimized out>, parent=0x558319676900 [GstRtpPtDemux], buf=0x7f88983663e0)
    at gstrtpptdemux.c:442
#41 0x00007f88e29e95cf in gst_pad_push_data (data=0x7f88983663e0, type=4112, pad=0x7f8884018de0 [GstPad]) at gstpad.c:4183
#42 0x00007f88e29e95cf in gst_pad_push_data (pad=pad@entry=0x7f8884018960 [GstPad], type=type@entry=4112, data=data@entry=0x7f88983663e0)
    at gstpad.c:4435
#43 0x00007f88e29f1533 in gst_pad_push (pad=0x7f8884018960 [GstPad], buffer=buffer@entry=0x7f88983663e0) at gstpad.c:4554
#44 0x00007f88bd69d9f0 in pop_and_push_next (jitterbuffer=jitterbuffer@entry=0x7f888001d310 [GstRtpJitterBuffer], seqnum=3490)
    at gstrtpjitterbuffer.c:3058
#45 0x00007f88bd69ea1a in gst_rtp_jitter_buffer_loop (jitterbuffer=0x7f888001d310 [GstRtpJitterBuffer]) at gstrtpjitterbuffer.c:3157
#46 0x00007f88bd69ea1a in gst_rtp_jitter_buffer_loop (jitterbuffer=0x7f888001d310 [GstRtpJitterBuffer]) at gstrtpjitterbuffer.c:3593
#47 0x00007f88e2a1bc5b in gst_task_func (task=0x558319676a70 [GstTask]) at gsttask.c:344
#48 0x00007f88e1dd85ee in g_thread_pool_thread_proxy (data=<optimized out>) at /build/glib2.0-7ZsPUq/glib2.0-2.48.2/./glib/gthreadpool.c:307
#49 0x00007f88e1dd7c55 in g_thread_proxy (data=0x558319679990) at /build/glib2.0-7ZsPUq/glib2.0-2.48.2/./glib/gthread.c:780
#50 0x00007f88e3e376ba in start_thread (arg=0x7f886d7b4700) at pthread_create.c:333
---Type <return> to continue, or q <return> to quit---
#51 0x00007f88e150c41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Please let me know if you need any further Inputs.

@puneet89
Author

puneet89 commented Feb 2, 2019

I can also see the following warnings before the crash:

0:04:55.288650955 4912 0x7f64d40734f0 FIXME default gstutils.c:3766:gst_pad_create_stream_id_internal:nicesrc1:src Creating random stream-id, consider implementing a deterministic way of creating a stream-id
0:04:55.426536854 4912 0x7f64cc06c8f0 FIXME basesink gstbasesink.c:3125:gst_base_sink_default_event: stream-start event without group-id. Consider implementing group-id handling in the upstream elements
0:04:55.427578970 4912 0x7f64cc0972d0 FIXME default gstutils.c:3766:gst_pad_create_stream_id_internal:nicesrc2:src Creating random stream-id, consider implementing a deterministic way of creating a stream-id
0:04:56.186355616 4912 0x7f64b8007800 FIXME default gstutils.c:3766:gst_pad_create_stream_id_internal:nicesrc3:src Creating random stream-id, consider implementing a deterministic way of creating a stream-id
0:04:56.187209604 4912 0x7f64b8007c50 FIXME basesink gstbasesink.c:3125:gst_base_sink_default_event: stream-start event without group-id. Consider implementing group-id handling in the upstream elements
0:04:56.771305732 4912 0x7f64b8007800 WARN dtlssrtpdemux gstdtlssrtpdemux.c:137:sink_chain: received invalid buffer: 1
0:05:01.154000575 4912 0x7f64e8067230 WARN rtpsource rtpsource.c:1147:update_receiver_stats: duplicate or reordered packet (seqnr 8834, expected 8836)
0:06:05.872303842 4912 0x7f64d800db70 FIXME basesink gstbasesink.c:3125:gst_base_sink_default_event: stream-start event without group-id. Consider implementing group-id handling in the upstream elements
0:06:05.884484427 4912 0x7f64e0033590 FIXME default gstutils.c:3766:gst_pad_create_stream_id_internal:nicesrc14:src Creating random stream-id, consider implementing a deterministic way of creating a stream-id
0:06:05.920824517 4912 0x7f64adcc1800 FIXME default gstutils.c:3766:gst_pad_create_stream_id_internal:nicesrc15:src Creating random stream-id, consider implementing a deterministic way of creating a stream-id
0:06:05.930113558 4912 0x55d453931b20 FIXME basesink gstbasesink.c:3125:gst_base_sink_default_event: stream-start event without group-id. Consider implementing group-id handling in the upstream elements
0:06:05.931547188 4912 0x7f64cc1281e0 FIXME default gstutils.c:3766:gst_pad_create_stream_id_internal:nicesrc16:src Creating random stream-id, consider implementing a deterministic way of creating a stream-id
0:06:06.020981418 4912 0x7f64ac6ad400 FIXME default gstutils.c:3766:gst_pad_create_stream_id_internal:nicesrc17:src Creating random stream-id, consider implementing a deterministic way of creating a stream-id
0:06:06.086993766 4912 0x7f64d002e0a0 FIXME basesink gstbasesink.c:3125:gst_base_sink_default_event: stream-start event without group-id. Consider implementing group-id handling in the upstream elements
0:06:06.087334088 4912 0x7f64ac289b70 FIXME default gstutils.c:3766:gst_pad_create_stream_id_internal:nicesrc18:src Creating random stream-id, consider implementing a deterministic way of creating a stream-id
0:06:06.204968899 4912 0x7f64b00ff280 FIXME basesink gstbasesink.c:3125:gst_base_sink_default_event: stream-start event without group-id. Consider implementing group-id handling in the upstream elements
0:06:06.207235915 4912 0x7f64cc0d5000 FIXME default gstutils.c:3766:gst_pad_create_stream_id_internal:nicesrc19:src Creating random stream-id, consider implementing a deterministic way of creating a stream-id
0:06:06.282522923 4912 0x7f64e40379e0 FIXME default gstutils.c:3766:gst_pad_create_stream_id_internal:nicesrc20:src Creating random stream-id, consider implementing a deterministic way of creating a stream-id
0:06:06.513107803 4912 0x7f64cc0d5000 WARN dtlssrtpdemux gstdtlssrtpdemux.c:137:sink_chain:<dtls-srtp-demux
0:05:59.332809174 4912 0x7f64cc0da370 WARN rtpsource rtpsource.c:1147:update_receiver_stats: duplicate or reordered packet (seqnr 1631, expected 1651)
0:05:59.393099890 4912 0x7f64cc0da370 WARN srtpdec gstsrtpdec.c:1149:gst_srtp_dec_decode_buffer:srtp-decoder:rtp_sink Unable to unprotect buffer (unprotect failed code 9)
0:05:59.393284015 4912 0x7f64cc0da370 WARN srtpdec gstsrtpdec.c:1178:gst_srtp_dec_decode_buffer: Other error, dropping
0:06:00.861235983 4912 0x7f64cc0da370 WARN rtpsource rtpsource.c:1147:update_receiver_stats: duplicate or reordered packet (seqnr 1751, expected 1764)
0:06:00.865357337 4912 0x7f64cc0da370 WARN rtpsource rtpsource.c:1147:update_receiver_stats: duplicate or reordered packet (seqnr 1752, expected 1764)
0:06:00.999526119 4912 0x7f64cc0da370 WARN srtpdec gstsrtpdec.c:1149:gst_srtp_dec_decode_buffer:srtp-decoder:rtp_sink Unable to unprotect buffer (unprotect failed code 9)
0:06:00.999629343 4912 0x7f64cc0da370 WARN srtpdec gstsrtpdec.c:1178:gst_srtp_dec_decode_buffer: Other error, dropping
0:06:01.002740198 4912 0x7f64cc0da370 WARN srtpdec gstsrtpdec.c:1149:gst_srtp_dec_decode_buffer:srtp-decoder:rtp_sink Unable to unprotect buffer (unprotect failed code 9)
0:06:01.002788213 4912 0x7f64cc0da370 WARN srtpdec gstsrtpdec.c:1178:gst_srtp_dec_decode_buffer: Other error, dropping
0:06:01.006516376 4912 0x7f64cc0da370 WARN rtpsource rtpsource.c:1147:update_receiver_stats: duplicate or reordered packet (seqnr 1760, expected 1776)
0:06:01.010574519 4912 0x7f64cc0da370 WARN rtpsource rtpsource.c:1147:update_receiver_stats: duplicate or reordered packet (seqnr 1761, expected 1776)
0:06:01.101570936 4912 0x7f64cc0da370 WARN srtpdec gstsrtpdec.c:1149:gst_srtp_dec_decode_buffer:srtp-decoder:rtp_sink Unable to unprotect buffer (unprotect failed code 9)
0:06:01.101606191 4912 0x7f64cc0da370 WARN srtpdec gstsrtpdec.c:1178:gst_srtp_dec_decode_buffer: Other error, dropping
0:06:01.106527867 4912 0x7f64cc0da370 WARN srtpdec gstsrtpdec.c:1149:gst_srtp_dec_decode_buffer:srtp-decoder:rtp_sink Unable to unprotect buffer (unprotect failed code 9)
0:06:01.106563187 4912 0x7f64cc0da370 WARN srtpdec gstsrtpdec.c:1178:gst_srtp_dec_decode_buffer: Other error, dropping
0:06:03.389789239 4912 0x7f64cc0da370 WARN rtpsource rtpsource.c:1147:update_receiver_stats: duplicate or reordered packet (seqnr 1915, expected 1926)
0:06:03.393055872 4912 0x7f64cc0da370 WARN rtpsource rtpsource.c:1147:update_receiver_stats: duplicate or reordered packet (seqnr 1916, expected 1926)
0:06:03.483342847 4912 0x7f64cc0da370 WARN srtpdec gstsrtpdec.c:1149:gst_srtp_dec_decode_buffer:srtp-decoder:rtp_sink Unable to unprotect buffer (unp

@j1elo
Member

j1elo commented Feb 2, 2019

Hi, that looks like an abort message from GLib, the system library, complaining that it wasn't able to allocate some memory. I assume because all memory in the machine was full...

Did you see this message in the log?

failed to allocate NNN bytes

All other warnings in your other message are unrelated to this issue. The "Creating random stream-id, consider implementing a deterministic way of creating a stream-id" message is a GStreamer warning which doesn't cause any harm, and the other warnings are more or less part of a typical streaming session.

Did you monitor memory usage during this crash?
It was probably full, and the kernel refused to give any more memory to Kurento, so the malloc() call failed and the process crashed.
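If it helps, memory usage can be sampled while reproducing the issue, along these lines (a minimal sketch; the interval and file name are arbitrary):

# log PID, RSS and VSZ (in KiB) and %CPU of the KMS process once per second
while true; do ps -o pid=,rss=,vsz=,pcpu= -C kurento-media-server; sleep 1; done >> kms-mem.log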

@puneet89
Author

puneet89 commented Feb 2, 2019

I observed that RAM was sufficient (around 800 MB free), but the CPU was spiking randomly to 200% (2 cores) even with 5-6 users in H264-only mode (note: VP8 commented out in the SdpEndpoint file).
I will run the use case again, this time closely observing RAM, CPU, and other system parameters, and will share the findings.

@puneet89
Author

puneet89 commented Feb 2, 2019

@j1elo
Just now I was able to reproduce the crash again.
Findings (4-5 fake users join a conference, H264):
I am running Kurento using the following commands:
cd /home/user/VOICE/new/kms-omni-build/build-Debug
ulimit -n 65000
ulimit -c unlimited
sleep 1
export GST_DEBUG="3,Kurento*:1,kms*:1,sdp*:1,webrtc*:1,rtpendpoint:1,rtphandler:1,rtpsynchronizer:1,agnosticbin:1"
kurento-media-server/server/kurento-media-server --modules-path=. --modules-config-path=./config --conf-file=./config/kurento.conf.json --gst-plugin-path=. --logs-path=/home/user/VOICE &

I do see the error: 2019-02-02T16:23:48,204248 11159 0x00007f58ae7fc700 error glib GLib:0 () /build/glib2.0-7ZsPUq/glib2.0-2.48.2/./glib/gmem.c:100: failed to allocate 213942697 bytes

I also noticed a sudden spike in RAM consumption by Kurento (70%), even with only 4-5 fake users' video; the system had 2.3 GB of RAM free when Kurento was started.

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
11358 root 20 0 4092480 1.704g 2524 D 9.1 69.9 0:07.54 kurento-media-s
11360 root 20 0 3142440 386544 584 S 0.5 15.1 0:20.77 java
11558 root 20 0 83860 19032 5900 D 24.1 0.7 0:02.70 apport

Please find the attached log file as well.
2019-02-02T162924.00000.pid11586.log

Please let me know if you need more details.

@j1elo
Member

j1elo commented Feb 2, 2019

failed to allocate 213942697 bytes

That's the most surprising part: it tries to allocate 200 MB!

I'd need more information about these things:

  • What is the Media Pipeline in Kurento?
  • What are the characteristics of the video? (resolution, bitrate)
  • Can you see any TRANSCODING is ACTIVE in your logs? (I'm on mobile right now and cannot inspect your log file easily, will do next week)

@puneet89
Author

puneet89 commented Feb 7, 2019

Hi @j1elo
sorry for the delayed reply.

  1. We have created the same pipeline as used by the Java group-call example: https://doc-kurento.readthedocs.io/en/stable/tutorials/java/tutorial-groupcall.html

  2. Video resolution: QCIF (176*144), 15 fps, H264.

  3. Transcoding is not active, as the VP8 codec is disabled in the SdpEndpoints.conf file; only the H264 video codec is enabled.

Let me know if you need any further info.

Thanks & Regards
Puneet Singh

@puneet89
Author

puneet89 commented Feb 12, 2019

HI @j1elo
I am facing the same out-of-memory and CPU spike problems in multiple cases.
Right now my main focus is to run 50 users in a single conference, and I need your urgent help to find the root cause of the CPU getting stuck with the VP8 codec (no H264, no transcoding).

I am trying to run 50 users in a single conference on an AWS c5.large (4 GB RAM) instance, but when I reach around 28-34 users the CPU goes to 100%, or sometimes RAM utilization also fills up, and Kurento gets stuck at 100% CPU (on all cores). On the same server, running 10 users each in 5 conferences, or 5 users each in 10 conferences, works perfectly fine; if the 50 users are distributed across different conferences, no issue appears. I am using the latest master branch.
I also tried increasing the RAM to 8 GB (c5.xlarge, 4 cores); the same issue appears. So it seems the issue is not related to RAM but to some bug in Kurento that gets hit in multiple scenarios.

I noticed the following when the CPU got stuck; can these traces help find the root cause? Let me know if you need further details.

root 12594 1 99 10:42 pts/1 00:31:45 kurento-media-server/server/kurento-media-server --modules-path=. --modules-config-path=./config --conf-file=./config/kurento.conf.json --gst-plugin-path=. --logs-path, -d /home/ubuntu/VOICE/
root@ip-20-3-20-98:/home/ubuntu# cat /proc/12594/stack
[] exit_to_usermode_loop+0x72/0xd0
[] prepare_exit_to_usermode+0x26/0x30
[] retint_user+0x8/0x5a
[] 0xffffffffffffffff

root@ip-20-3-20-98:/home/ubuntu# cat /proc/12594/stack
[] exit_to_usermode_loop+0x72/0xd0
[] prepare_exit_to_usermode+0x26/0x30
[] retint_user+0x8/0x5a
[] 0xffffffffffffffff
root@ip-20-3-20-98:/home/ubuntu# cat /proc/12594/stack
[] futex_wait_queue_me+0xc0/0x120
[] futex_wait+0x116/0x270
[] do_futex+0x120/0x5a0
[] SyS_futex+0x81/0x180
[] entry_SYSCALL_64_fastpath+0x22/0xcb
[] 0xffffffffffffffff
root@ip-20-3-20-98:/home/ubuntu#
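Note that /proc/<pid>/stack shows only the kernel-side stack; the user-space backtraces of all threads can be captured with something like this (a minimal sketch; 12594 is the KMS PID above, and gdb must be installed):

# attach to the running process, dump every thread's backtrace, then detach
gdb -p 12594 -batch -ex "thread apply all bt" > kms-threads.txt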

@puneet89
Author

Number of threads: 5368 for around 34-40 users in a single conference; with each thread taking ~1% CPU, it seems every thread is stuck somewhere.

top - 13:33:09 up 46 min, 2 users, load average: 180.20, 158.27, 102.17
Threads: 5368 total, 179 running, 5189 sleeping, 0 stopped, 0 zombie
%Cpu(s): 99.0 us, 1.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 3800968 total, 482804 free, 2699996 used, 618168 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 762444 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
8137 root 20 0 44.161g 1.267g 37132 R 1.1 35.0 1:21.06 kurento-media-server/server/kurento-media-server --modules-path=. --modules-config-path=.+
15971 root 20 0 44.161g 1.267g 37132 R 1.1 35.0 0:13.49 kurento-media-server/server/kurento-media-server --modules-path=. --modules-config-path=.+
15975 root 20 0 44.161g 1.267g 37132 R 1.1 35.0 0:13.61 kurento-media-server/server/kurento-media-server --modules-path=. --modules-config-path=.+
15982 root 20 0 44.161g 1.267g 37132 R 1.1 35.0 0:13.57 kurento-media-server/server/kurento-media-server --modules-path=. --modules-config-path=.+
15983 root 20 0 44.161g 1.267g 37132 R 1.1 35.0 0:13.67 kurento-media-server/server/kurento-media-server --modules-path=. --modules-config-path=.+
15984 root 20 0 44.161g 1.267g 37132 R 1.1 35.0 0:13.66 kurento-media-server/server/kurento-media-server --modules-path=. --modules-config-path=.+
15991 root 20 0 44.161g 1.267g 37132 R 1.1 35.0 0:13.58 kurento-media-server/server/kurento-media-server --modules-path=. --modules-config-path=.+
15995 root 20 0 44.161g 1.267g 37132 R 1.1 35.0 0:13.66 kurento-media-server/server/kurento-media-server --modules-path=. --modules-config-path=.+
16006 root 20 0 44.161g 1.267g 37132 R 1.1 35.0 0:13.63 kurento-media-server/server/kurento-media-server --modules-path=. --modules-config-path=.+
16008 root 20 0 44.161g 1.267g 37132 R 1.1 35.0 0:13.70 kurento-media-server/server/kurento-media-server --modules-path=. --modules-config-path=.+
16016 root 20 0 44.161g 1.267g 37132 R 1.1 35.0 0:13.58 kurento-media-server/server/kurento-media-server --modules-path=. --modules-config-path=.+
16017 root 20 0 44.161g 1.267g 37132 R 1.1 35.0 0:10.34 kurento-media-server/server/kurento-media-server --modules-path=. --modules-config-path=.+
16026 root 20 0 44.161g 1.267g 37132 R 1.1 35.0 0:13.15 kurento-media-server/server/kurento-media-server --modules-path=. --modules-config-path=.+
16511 root 20 0 44.161g 1.267g 37132 R 1.1 35.0 0:06.32 kurento-media-server/server/kurento-media-server --modules-path=. --modules-config-path=.+
16513 root 20 0 44.161g 1.267g 37132 R 1.1 35.0 0:07.87 kurento-media-server/server/kurento-media-server --modules-path=. --modules-config-path=.+
16516 root 20 0 44.161g 1.267g 37132 R 1.1 35.0 0:05.35 kurento-media-server/server/kurento-media-server --modules-path=. --modules-config-path=.+
17014 root 20 0 44.161g 1.267g 37132 R 1.1 35.0 0:01.60 kurento-media-server/server/kurento-media-server --modules-path=. --modules-config-path=.+
17021 root 20 0 44.161g 1.267g 37132 R 1.1 35.0 0:05.38 kurento-media-server/server/kurento-media-server --modules-path=. --modules-config-path=.+
17029 root 20 0 44.161g 1.267g 37132 R 1.1 35.0 0:05.26 kurento-media-server/server/kurento-media-server --modules-path=. --modules-config-path=.+
17032 root 20 0 44.161g 1.267g 37132 R 1.1 35.0 0:05.26 kurento-media-server/server/kurento-media-server --modules-path=. --modules-config-path=.+
17033 root 20 0 44.161g 1.267g 37132 R 1.1 35.0 0:05.26 kurento-media-server/server/kurento-media-server --modules-path=. --modules-config-path=.+
17034 root 20 0 44.161g 1.267g 37132 R 1.1 35.0 0:05.26 kurento-media-server/server/kurento-media-server --modules-path=. --modules-config-path=.+
17036 root 20 0 44.161g 1.267g 37132 R 1.1 35.0 0:05.26 kurento-media-server/server/kurento-media-server --modules-path=. --modules-config-path=.+
17072 root 20 0 44.161g 1.267g 37132 R 1.1 35.0 0:05.25 kurento-media-server/server/kurento-media-server --modules-path=. --modules-config-path=.+
17087 root 20 0 44.161g 1.267g 37132 R 1.1 35.0 0:05.20 kurento-media-server/server/kurento-media-server --modules-path=. --modules-config-path=.+
17097 root 20 0 44.161g 1.267g 37132 R 1.1 35.0 0:05.20 kurento-media-server/server/kurento-media-server --modules-path=. --modules-config-path=.+
17101 root 20 0 44.161g 1.267g 37132 R 1.1 35.0 0:05.20 kurento-media-server/server/kurento-media-server --modules-path=. --modules-config-path=.+
17130 root 20 0 44.161g 1.267g 37132 R 1.1 35.0 0:05.20 kurento-media-server/server/kurento-media-server --modules-path=. --modules-config-path=.+
17131 root 20 0 44.161g 1.267g 37132 R 1.1 35.0 0:05.20 kurento-media-server/server/kurento-media-server --modules-path=. --modules-config-path=.+
17132 root 20 0 44.161g 1.267g 37132 R 1.1 35.0 0:05.20 kurento-media-server/server/kurento-media-server --modules-path=. --modules-config-path=.+
17135 root 20 0 44.161g 1.267g 37132 R 1.1 35.0 0:05.20 kurento-media-server/server/kurento-media-server --modules-path=. --modules-config-path=.+
17137 root 20 0 44.161g 1.267g 37132 R 1.1 35.0 0:05.20 kurento-media-server/server/kurento-media-server --modules-path=. --modules-config-path=.+
17145 root 20 0 44.161g 1.267g 37132 R 1.1 35.0 0:05.23 kurento-media-server/server/kurento-media-server --modules-path=. --modules-config-path=.+
1596 root 20 0 44.161g 1.267g 37132 R 1.1 35.0 0:12.27 kurento-media-server/server/kurento-media-server --modules-path=. --modules-config-path=.+

@puneet89
Author

puneet89 commented Mar 8, 2019

Hi ,

We have implemented horizontal scaling to spread a single conference across multiple Kurento media servers, since we were not able to run a big conference on a single Kurento instance due to the CPU spike issue. But even in this case we got a Kurento CPU spike, as described below.
Can anyone help figure out the reason for the Kurento CPU spike by looking at the gdb thread logs below?

The conference was distributed across 5 c5.large AWS VMs:
1 conference - 35 users (7 users on each VM)
6 more conferences - 30 users (5 users in each conference, running on the first VM, so the first VM had a total of 37 users across multiple conferences). CPU usage at this point was 46%.

As soon as I started the 8th conference, the Kurento CPU spiked to 100%; the gdb dump of all threads is below:

AWS Media Server Stats:
IP : 20.3.20.84 User Count : 37
IP : 20.3.20.92 User Count : 7
IP : 20.3.20.230 User Count : 7
IP : 20.3.20.17 User Count : 7
IP : 20.3.20.64 User Count : 7


Media Server Stats per MeetingId
Counts for Meeting Id: 162171330
IP : 20.3.20.17 Count : 7
IP : 20.3.20.230 Count : 7
IP : 20.3.20.64 Count : 7
IP : 20.3.20.84 Count : 7
IP : 20.3.20.92 Count : 7
Counts for Meeting Id: 142170783
IP : 20.3.20.84 Count : 5
Counts for Meeting Id: 845120154
IP : 20.3.20.84 Count : 5
Counts for Meeting Id: 439580888
IP : 20.3.20.84 Count : 5
Counts for Meeting Id: 513497678
IP : 20.3.20.84 Count : 5
Counts for Meeting Id: 878191275
IP : 20.3.20.84 Count : 5
Counts for Meeting Id: 194074432
IP : 20.3.20.84 Count : 5

gdb thread logs attached:
gdb_thread_log.txt

Thanks & Regards
Puneet Singh

@j1elo
Member

j1elo commented Mar 10, 2019

Hi, thanks for taking the time to get the traces, but lines like

#2  0x00007f79ece2d966 in ?? () from /usr/lib/x86_64-linux-gnu/gstreamer-1.5/libgstcoreelements.so

won't help much in locating the problem (apart from knowing that it is somewhere in GStreamer).

Please see: #313 (comment)

@puneet89
Author

Okay @j1elo, I will try to share more details.

@puneet89
Author

Hi @j1elo
Please find attached DEBUG LEVEL backtrace .

thread_debug_logs.txt

Regards
Puneet Singh

@j1elo
Member

j1elo commented Mar 20, 2019

@puneet89 We've been lately studying a similar crash in Kurento that happened because of a bug in GStreamer H.264 parser code: #352

This bug had been fixed in a more modern GStreamer version than that of the Kurento fork. @prlanzarin contributed a cherry-pick backport of the relevant commit in Kurento/gst-plugins-good#2, so now we have that bug solved.

The symptoms of that bug are very similar to the observations in your bug report, so maybe they are actually caused by the same bug. Please install Kurento nightly or apt-get upgrade if you already have nightly installed, then test your use case and let us know if the issue is gone.

@puneet89
Author

@j1elo Thanks !
I have upgraded to the nightly release. Now I will test all my use cases and will update you with the results.

@j1elo j1elo self-assigned this Jun 2, 2019
@puneet89
Author

puneet89 commented Jun 3, 2019

This issue is not happening anymore; just sometimes I see the following warnings:

(kurento-media-server:26306): GStreamer-CRITICAL **: Element rtpbin33 already has a pad named send_rtp_sink_1, the behaviour of gst_element_get_request_pad() for existing pads is undefined!

(kurento-media-server:26306): GStreamer-CRITICAL **: Element rtpbin34 already has a pad named send_rtp_sink_1, the behaviour of gst_element_get_request_pad() for existing pads is undefined!

No out of memory, no crash observed.

@j1elo
Member

j1elo commented Jun 5, 2019

This issue is not coming anymore
[...]
No out of memory ,no crash observed.

Hi, so just to confirm: This issue does not happen any more to you since you installed Kurento nightly, is that correct?

@j1elo j1elo closed this as completed Sep 5, 2019