I recently ran into an intermittent segmentation fault. cloudwu, what usually causes this kind of problem? #813

Closed
sue602 opened this issue Apr 4, 2018 · 4 comments

Comments

sue602 commented Apr 4, 2018

Note: skynet has always been kept at the latest version.

Program terminated with signal 11, Segmentation fault.
#0 0x000000000040fc10 in lua_rawgeti (L=0x7f0478886268, idx=, n=3) at lapi.c:663
663 lapi.c: No such file or directory.
in lapi.c
Missing separate debuginfos, use: debuginfo-install cyrus-sasl-lib-2.1.23-15.el6_6.2.x86_64 glibc-2.12-1.209.el6_9.2.x86_64 keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64 libcom_err-1.41.12-23.el6.x86_64 libcurl-7.19.7-53.el6_9.x86_64 libgcc-4.4.7-18.el6.x86_64 libidn-1.18-2.el6.x86_64 libselinux-2.0.94-7.el6.x86_64 libssh2-1.4.2-2.el6_7.1.x86_64 libstdc++-4.4.7-18.el6.x86_64 libuuid-2.17.2-12.28.el6_9.1.x86_64 nspr-4.13.1-1.el6.x86_64 nss-3.28.4-4.el6_9.x86_64 nss-softokn-freebl-3.14.3-23.3.el6_8.x86_64 nss-util-3.28.4-1.el6_9.x86_64 openldap-2.4.40-16.el6.x86_64 openssl-1.0.1e-57.el6.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) where
#0 0x000000000040fc10 in lua_rawgeti (L=0x7f0478886268, idx=, n=3) at lapi.c:663
#1 0x00007f04a69f7446 in ?? ()
#2 0x000000000000002f in ?? ()
#3 0x00007f03b6133fa0 in ?? ()
#4 0x0000000000000081 in ?? ()
#5 0x0000000000000001 in ?? ()
#6 0x00007f03b61347c6 in ?? ()
#7 0x0000003d2ae18488 in Curl_client_write () from /usr/lib64/libcurl.so.4
#8 0x0000003d2ae2b016 in Curl_readwrite () from /usr/lib64/libcurl.so.4
#9 0x0000003d2ae2cbf8 in Curl_perform () from /usr/lib64/libcurl.so.4
#10 0x00007f04a69f88f4 in ?? ()
#11 0x00007f02eae3fa98 in ?? ()
#12 0x00007f0433cdd948 in ?? ()
#13 0x00007f0297fc8a60 in ?? ()
#14 0x0000000000413593 in luaD_precall (L=0x7f0433cdd948, func=0x7f03284f9130, nresults=-1) at ldo.c:434
#15 0x000000000041ebae in luaV_execute (L=0x7f0433cdd948) at lvm.c:1162
#16 0x000000000041381b in luaD_call (L=0x7f0433cdd948, func=, nResults=) at ldo.c:499
#17 0x00000000004100ec in lua_pcallk (L=0x7f0433cdd948, nargs=0, nresults=-1, errfunc=, ctx=2, k=) at lapi.c:981
#18 0x000000000042695f in luaB_xpcall (L=0x7f0433cdd948) at lbaselib.c:441
#19 0x0000000000413593 in luaD_precall (L=0x7f0433cdd948, func=0x7f03284f90e0, nresults=4) at ldo.c:434
#20 0x000000000041ec89 in luaV_execute (L=0x7f0433cdd948) at lvm.c:1146
#21 0x000000000041381b in luaD_call (L=0x7f0433cdd948, func=, nResults=) at ldo.c:499
#22 0x00000000004100ec in lua_pcallk (L=0x7f0433cdd948, nargs=3, nresults=-1, errfunc=, ctx=2, k=) at lapi.c:981
#23 0x000000000042695f in luaB_xpcall (L=0x7f0433cdd948) at lbaselib.c:441
#24 0x0000000000413593 in luaD_precall (L=0x7f0433cdd948, func=0x7f03284f8f80, nresults=3) at ldo.c:434
#25 0x000000000041ec89 in luaV_execute (L=0x7f0433cdd948) at lvm.c:1146
#26 0x00000000004132d0 in unroll (L=0x7f0433cdd948, ud=) at ldo.c:556
#27 0x00000000004127ee in luaD_rawrunprotected (L=0x7f0433cdd948, f=0x4136e0 , ud=0x7f04b11f86dc) at ldo.c:142
#28 0x0000000000412a87 in lua_resume (L=0x7f0433cdd948, from=, nargs=6) at ldo.c:664
#29 0x0000000000427877 in auxresume (L=0x7f04786e6048, co=0x7f0433cdd948, narg=6) at lcorolib.c:39
#30 0x0000000000427b27 in luaB_coresume (L=0x7f04786e6048) at lcorolib.c:60
#31 0x0000000000413593 in luaD_precall (L=0x7f04786e6048, func=0x7f047871f3d0, nresults=-1) at ldo.c:434
#32 0x000000000041ec89 in luaV_execute (L=0x7f04786e6048) at lvm.c:1146
#33 0x000000000041381b in luaD_call (L=0x7f04786e6048, func=, nResults=) at ldo.c:499
#34 0x0000000000413851 in luaD_callnoyield (L=0x7f04786e6048, func=, nResults=) at ldo.c:509
#35 0x00000000004127ee in luaD_rawrunprotected (L=0x7f04786e6048, f=0x410110 <f_call>, ud=0x7f04b11f8a20) at ldo.c:142
#36 0x000000000041286f in luaD_pcall (L=0x7f04786e6048, func=, u=, old_top=176, ef=) at ldo.c:729
#37 0x0000000000410042 in lua_pcallk (L=0x7f04786e6048, nargs=5, nresults=-1, errfunc=, ctx=0, k=) at lapi.c:969
#38 0x0000000000426a40 in luaB_pcall (L=0x7f04786e6048) at lbaselib.c:424
#39 0x0000000000413593 in luaD_precall (L=0x7f04786e6048, func=0x7f047871f2f0, nresults=2) at ldo.c:434
#40 0x000000000041ec89 in luaV_execute (L=0x7f04786e6048) at lvm.c:1146
#41 0x000000000041381b in luaD_call (L=0x7f04786e6048, func=, nResults=) at ldo.c:499
#42 0x0000000000413851 in luaD_callnoyield (L=0x7f04786e6048, func=, nResults=) at ldo.c:509
#43 0x00000000004127ee in luaD_rawrunprotected (L=0x7f04786e6048, f=0x410110 <f_call>, ud=0x7f04b11f8d20) at ldo.c:142
#44 0x000000000041286f in luaD_pcall (L=0x7f04786e6048, func=, u=, old_top=48, ef=) at ldo.c:729
#45 0x0000000000410042 in lua_pcallk (L=0x7f04786e6048, nargs=5, nresults=0, errfunc=, ctx=0, k=) at lapi.c:969
#46 0x00007f04ac9dc1b3 in ?? ()
#47 0x00007f04662b6520 in ?? ()
#48 0x00007f040346b070 in ?? ()
#49 0x00007f04b11f8da0 in ?? ()
#50 0x00007f01a0c750e0 in ?? ()
#51 0x00007f04786d6f20 in ?? ()
#52 0x00007f04786d6f20 in ?? ()
#53 0x00007f04b11f8e10 in ?? ()
#54 0x000000000000001a in ?? ()
#55 0x000000000000000a in ?? ()
#56 0x0000000000000001 in ?? ()
#57 0x00007f04b11f8e10 in ?? ()
#58 0x0000000000408a1d in dispatch_message (ctx=0x7f04786e6048, msg=0x0) at skynet-src/skynet_server.c:274
#59 0x0000000000408d1f in skynet_context_message_dispatch (sm=0x15c1b40, q=0x7f04786d6fa0, weight=-1) at skynet-src/skynet_server.c:334
#60 0x0000000000409a6d in thread_worker (p=) at skynet-src/skynet_start.c:162
#61 0x0000003d1e207aa1 in start_thread () from /lib64/libpthread.so.0
#62 0x0000003d1dee8bcd in clone () from /lib64/libc.so.6

Author

sue602 commented Apr 4, 2018

Here is also an error message captured by the monitoring system. The machine runs CentOS 6.5 on Alibaba Cloud.

Apr 3 14:29:58 iZrj96t0x8tgt7jphqi4iwZ kernel: skynet[21447]: segfault at 4a ip 000000000040fc0d sp 00007fe5bd186f60 error 4 in skynet[400000+42000]
Apr 3 14:30:01 iZrj96t0x8tgt7jphqi4iwZ ntpdate[27099]: adjust time server 13.65.245.138 offset 0.013942 sec
Apr 3 14:32:13 iZrj96t0x8tgt7jphqi4iwZ kernel: INFO: task skynet:21441 blocked for more than 120 seconds.
Apr 3 14:32:13 iZrj96t0x8tgt7jphqi4iwZ kernel: Not tainted 2.6.32-696.18.7.el6.x86_64 #1
Apr 3 14:32:13 iZrj96t0x8tgt7jphqi4iwZ kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 3 14:32:13 iZrj96t0x8tgt7jphqi4iwZ kernel: skynet D 0000000000000001 0 21441 1 0x00000000
Apr 3 14:32:13 iZrj96t0x8tgt7jphqi4iwZ kernel: ffff88092255bc98 0000000000000082 0000000000000000 ffff8810227ff680
Apr 3 14:32:13 iZrj96t0x8tgt7jphqi4iwZ kernel: ffff881021e19520 ffff8810227ff6e8 0000ca0122baec05 ffff88092255bc30
Apr 3 14:32:13 iZrj96t0x8tgt7jphqi4iwZ kernel: ffffffff8154ea76 000000010d387e8f ffff881021e19ad8 ffff88092255bfd8
Apr 3 14:32:13 iZrj96t0x8tgt7jphqi4iwZ kernel: Call Trace:
Apr 3 14:32:13 iZrj96t0x8tgt7jphqi4iwZ kernel: [] ? rwsem_down_read_failed+0x26/0x30
Apr 3 14:32:13 iZrj96t0x8tgt7jphqi4iwZ kernel: [] ? call_rwsem_down_read_failed+0x14/0x30
Apr 3 14:32:13 iZrj96t0x8tgt7jphqi4iwZ kernel: [] exit_mm+0x95/0x180
Apr 3 14:32:13 iZrj96t0x8tgt7jphqi4iwZ kernel: [] do_exit+0x15f/0x850
Apr 3 14:32:13 iZrj96t0x8tgt7jphqi4iwZ kernel: [] ? __sigqueue_free+0x3d/0x50
Apr 3 14:32:13 iZrj96t0x8tgt7jphqi4iwZ kernel: [] ? sched_clock_local+0x25/0x90
Apr 3 14:32:13 iZrj96t0x8tgt7jphqi4iwZ kernel: [] ? __dequeue_signal+0x102/0x200
Apr 3 14:32:13 iZrj96t0x8tgt7jphqi4iwZ kernel: [] do_group_exit+0x58/0xd0
Apr 3 14:32:13 iZrj96t0x8tgt7jphqi4iwZ kernel: [] get_signal_to_deliver+0x1f6/0x460
Apr 3 14:32:13 iZrj96t0x8tgt7jphqi4iwZ kernel: [] do_signal+0x75/0x870
Apr 3 14:32:13 iZrj96t0x8tgt7jphqi4iwZ kernel: [] ? wake_up_new_task+0xd3/0x120
Apr 3 14:32:13 iZrj96t0x8tgt7jphqi4iwZ kernel: [] ? check_for_xstate+0x3b/0x90
Apr 3 14:32:13 iZrj96t0x8tgt7jphqi4iwZ kernel: [] ? sys_futex+0x7b/0x170
Apr 3 14:32:13 iZrj96t0x8tgt7jphqi4iwZ kernel: [] do_notify_resume+0x90/0xc0
Apr 3 14:32:13 iZrj96t0x8tgt7jphqi4iwZ kernel: [] int_signal+0x12/0x17
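
For readers unfamiliar with the kernel's `segfault at ...` line, the following is a minimal Python sketch that decodes the first log entry above. It assumes the x86 page-fault error-code layout (bit 0: page present, bit 1: write access, bit 2: user mode); the function name `decode_segfault` is made up for illustration.

```python
import re

# The segfault line copied from the kernel log above.
LINE = ("kernel: skynet[21447]: segfault at 4a ip 000000000040fc0d "
        "sp 00007fe5bd186f60 error 4 in skynet[400000+42000]")

# All numeric fields, including the error code, are printed in hexadecimal.
PATTERN = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

def decode_segfault(line):
    """Split a kernel segfault line into its fields (hypothetical helper)."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a segfault line")
    err = int(m["err"], 16)
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    return {
        "object": m["obj"],
        "fault_addr": int(m["addr"], 16),
        "ip": ip,
        "offset_in_object": ip - base,              # offset of the faulting instruction
        "page_present": bool(err & 1),              # bit 0: 0 means the page was not mapped
        "access": "write" if err & 2 else "read",   # bit 1
        "mode": "user" if err & 4 else "kernel",    # bit 2
    }

info = decode_segfault(LINE)
for key, value in info.items():
    print(key, hex(value) if isinstance(value, int) and not isinstance(value, bool) else value)
```

Under those assumptions, `error 4` is a user-mode read of an unmapped page, and the fault address `0x4a` is small enough to suggest a field access through a null or near-null pointer. The instruction pointer `0x40fc0d` lies three bytes before the `0x40fc10` that gdb reports for `lua_rawgeti` in the backtrace, so both records appear to point at the same spot in `lapi.c`.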

Owner

cloudwu commented Apr 4, 2018

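Just a guess:

It looks like the call went into the curl library, yet the top frame ends up in lua_rawgeti. curl itself should not call back into Lua, and the frames just before it are unresolved (??). I suspect the lifetime of some block of memory that curl accesses is managed incorrectly. It is also possible that the curl binding corrupted the stack frame.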
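
One way to probe this guess from the backtrace alone is to group frames by the `lua_State` pointer they were called with. The following is a minimal Python sketch that uses a handful of frames excerpted from the trace above; the helper name `frames_by_state` is made up for illustration.

```python
import re

# A few frames excerpted verbatim from the gdb backtrace in this issue.
BACKTRACE = """\
#0 0x000000000040fc10 in lua_rawgeti (L=0x7f0478886268, idx=, n=3) at lapi.c:663
#14 0x0000000000413593 in luaD_precall (L=0x7f0433cdd948, func=0x7f03284f9130, nresults=-1) at ldo.c:434
#28 0x0000000000412a87 in lua_resume (L=0x7f0433cdd948, from=, nargs=6) at ldo.c:664
#29 0x0000000000427877 in auxresume (L=0x7f04786e6048, co=0x7f0433cdd948, narg=6) at lcorolib.c:39
#45 0x0000000000410042 in lua_pcallk (L=0x7f04786e6048, nargs=5, nresults=0, errfunc=, ctx=0, k=) at lapi.c:969
"""

# Matches "#<n> 0x<pc> in <function> (L=0x<state>" at the start of a line.
FRAME = re.compile(r"^#(\d+)\s+0x[0-9a-f]+ in (\w+) \(L=(0x[0-9a-f]+)", re.M)

def frames_by_state(text):
    """Map each lua_State pointer to the (frame number, function) pairs using it."""
    groups = {}
    for num, func, state in FRAME.findall(text):
        groups.setdefault(state, []).append((int(num), func))
    return groups

groups = frames_by_state(BACKTRACE)
for state, frames in groups.items():
    print(state, [num for num, _ in frames])
```

In the full trace, frame #0 runs on `L=0x7f0478886268`, which matches neither the coroutine used in frames #14 to #28 (`0x7f0433cdd948`) nor the state used in frames #29 to #45 (`0x7f04786e6048`). That would be consistent with the curl wrapper holding on to a `lua_State` whose lifetime it does not control, although the trace by itself does not prove it.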


Edimier commented Apr 12, 2018

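My personal advice is to avoid calling the system curl from the business layer. I have seen curl calls make skynet report 'maybe in an endless loop', and I have also hit memory problems inside the curl library itself. I suggest using the tools skynet provides to write your own HTTP request service, which is easier to control.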

Author

sue602 commented Apr 23, 2018

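Thanks, everyone. I later solved it by using the tools skynet provides.
HTTPS sometimes still requires curl, though, because skynet does not provide HTTPS support itself.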

cloudwu closed this as completed Oct 11, 2018