core dump under macosx #99
Comments
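I tested today on Linux with the current head of master. It ran 2^25 agent launches to completion without problems. Judging from the core information in the report, the crash happens here, in the skynet_lalloc frame. The ud passed to Lua's lalloc can only be 0, so the value 0x00000000017dc19c suggests that memory was overwritten from somewhere else. Strangely, though, the frealloc function pointer and ud sit next to each other in Lua's global state structure. The frealloc pointer is evidently still correct, yet ud has been changed to a non-zero value. osize is also far too large and must be wrong (most likely a bad string length was read while a string was being freed). If you have the core file, you can check what tampered with the global state structure; that may be enough to identify the cause. Launching and destroying services is very simple work, so the problem should be easy to track down. I tried switching from jemalloc back to malloc, and that seems to work. Confirming the cause will take more time (options include swapping the allocator, turning off -O optimization, and so on). If you can, please help find the bug.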
I tried a few more times afterwards, and it never core dumped again :(
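This problem is most likely caused by jemalloc. skynet does not directly use jemalloc's malloc zone (on macosx), so it is missing the malloc zone's … The current workaround is to disable jemalloc on macosx; with the standard library's malloc the problem does not occur.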
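I wrote a simple test that creates and destroys agents: with 8 threads configured, it starts 8 services, each of which launches and destroys agents in a loop (2^24 times in total). I wanted to find out what happens once the service node id exceeds 24 bits.
Every run core dumps about halfway through, as shown below: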
$ ulimit -c unlimited
$ ./skynet example/config
......
[:17dc074] LAUNCH snlua agent
[:17dc075] LAUNCH snlua agent
[:17dc076] LAUNCH snlua agent
[:17dc077] LAUNCH snlua agent
[:17dc078] LAUNCH snlua agent
[:17dc079] LAUNCH snlua agent
[:17dc07a] LAUNCH snlua agent
[:17dc07b] LAUNCH snlua agent
[:100000e] KILL :17dc074
[:17dc074] exit
[:100001e] KILL :17dc075
[:17dc075] exit
[:1000010] KILL :17dc076
[:17dc076] exit
[:100000f] KILL :17dc077
[:17dc077] exit
[:100001d] K./start.sh: line 3: 684 Segmentation fault: 11 (core dumped) ./skynet example/config
=========== core dump follows =============
(lldb) bt all
thread #1: tid = 0x0000, 0x00007fff8b0e2a3a libsystem_kernel.dylib`__semwait_signal + 10, stop reason = signal SIGSTOP
    frame #0: 0x00007fff8b0e2a3a libsystem_kernel.dylib`__semwait_signal + 10
    frame #1: 0x00007fff96ce17f3 libsystem_pthread.dylib`pthread_join + 433
    frame #2: 0x000000010a6c8b7a skynet`_start(thread=<unavailable>) + 442 at skynet_start.c:174
    frame #3: 0x000000010a6c8956 skynet`skynet_start(config=0x00007fff5553a7f8) + 214 at skynet_start.c:222
    frame #4: 0x000000010a6c61f6 skynet`main(argc=<unavailable>, argv=<unavailable>) + 1158 at skynet_main.c:131
thread #2: tid = 0x0001, 0x00007fff8b0e2a3a libsystem_kernel.dylib`__semwait_signal + 10, stop reason = signal SIGSTOP
    frame #0: 0x00007fff8b0e2a3a libsystem_kernel.dylib`__semwait_signal + 10
    frame #1: 0x00007fff97fd6dc0 libsystem_c.dylib`nanosleep + 200
    frame #2: 0x00007fff97fd6c1f libsystem_c.dylib`sleep + 42
    frame #3: 0x000000010a6c8cc5 skynet`_monitor(p=0x000000010b186020) + 101 at skynet_start.c:91
    frame #4: 0x00007fff96cdd899 libsystem_pthread.dylib`_pthread_body + 138
    frame #5: 0x00007fff96cdd72a libsystem_pthread.dylib`_pthread_start + 137
thread #3: tid = 0x0002, 0x00007fff8b0e2a3a libsystem_kernel.dylib`__semwait_signal + 10, stop reason = signal SIGSTOP
    frame #0: 0x00007fff8b0e2a3a libsystem_kernel.dylib`__semwait_signal + 10
    frame #1: 0x00007fff97fd6dc0 libsystem_c.dylib`nanosleep + 200
    frame #2: 0x00007fff97fd6cb2 libsystem_c.dylib`usleep + 54
    frame #3: 0x000000010a6c8d2b skynet`_timer(p=0x000000010b186020) + 59 at skynet_start.c:105
    frame #4: 0x00007fff96cdd899 libsystem_pthread.dylib`_pthread_body + 138
    frame #5: 0x00007fff96cdd72a libsystem_pthread.dylib`_pthread_start + 137
thread #4: tid = 0x0003, 0x00007fff8b0e364a libsystem_kernel.dylib`kevent + 10, stop reason = signal SIGSTOP
    frame #0: 0x00007fff8b0e364a libsystem_kernel.dylib`kevent + 10
    frame #1: 0x000000010a6ca7c3 skynet`socket_server_poll [inlined] sp_wait(max=<unavailable>) + 5 at socket_kqueue.h:70
    frame #2: 0x000000010a6ca7be skynet`socket_server_poll(ss=0x000000010b400000, result=0x000000010aa0fe60, more=0x000000010aa0fe5c) + 382 at socket_server.c:835
    frame #3: 0x000000010a6c9bad skynet`skynet_socket_poll + 45 at skynet_socket.c:75
    frame #4: 0x000000010a6c8db9 skynet`_socket(p=0x000000010b186020) + 89 at skynet_start.c:54
    frame #5: 0x00007fff96cdd899 libsystem_pthread.dylib`_pthread_body + 138
    frame #6: 0x00007fff96cdd72a libsystem_pthread.dylib`_pthread_start + 137
thread #5: tid = 0x0004, 0x00007fff8b0e2716 libsystem_kernel.dylib`__psynch_cvwait + 10, stop reason = signal SIGSTOP
    frame #0: 0x00007fff8b0e2716 libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007fff96cdfc3b libsystem_pthread.dylib`_pthread_cond_wait + 727
    frame #2: 0x000000010a6c8e36 skynet`_worker(p=<unavailable>) + 102 at skynet_start.c:127
    frame #3: 0x00007fff96cdd899 libsystem_pthread.dylib`_pthread_body + 138
    frame #4: 0x00007fff96cdd72a libsystem_pthread.dylib`_pthread_start + 137
thread #6: tid = 0x0005, 0x00007fff8b0e2716 libsystem_kernel.dylib`__psynch_cvwait + 10, stop reason = signal SIGSTOP
    frame #0: 0x00007fff8b0e2716 libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007fff96cdfc3b libsystem_pthread.dylib`_pthread_cond_wait + 727
    frame #2: 0x000000010a6c8e36 skynet`_worker(p=<unavailable>) + 102 at skynet_start.c:127
    frame #3: 0x00007fff96cdd899 libsystem_pthread.dylib`_pthread_body + 138
    frame #4: 0x00007fff96cdd72a libsystem_pthread.dylib`_pthread_start + 137
thread #7: tid = 0x0006, 0x00007fff8b0e2716 libsystem_kernel.dylib`__psynch_cvwait + 10, stop reason = signal SIGSTOP
    frame #0: 0x00007fff8b0e2716 libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007fff96cdfc3b libsystem_pthread.dylib`_pthread_cond_wait + 727
    frame #2: 0x000000010a6c8e36 skynet`_worker(p=<unavailable>) + 102 at skynet_start.c:127
    frame #3: 0x00007fff96cdd899 libsystem_pthread.dylib`_pthread_body + 138
    frame #4: 0x00007fff96cdd72a libsystem_pthread.dylib`_pthread_start + 137
thread #8: tid = 0x0007, 0x00007fff8b0e2716 libsystem_kernel.dylib`__psynch_cvwait + 10, stop reason = signal SIGSTOP
    frame #0: 0x00007fff8b0e2716 libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007fff96cdfc3b libsystem_pthread.dylib`_pthread_cond_wait + 727
    frame #2: 0x000000010a6c8e36 skynet`_worker(p=<unavailable>) + 102 at skynet_start.c:127
    frame #3: 0x00007fff96cdd899 libsystem_pthread.dylib`_pthread_body + 138
    frame #4: 0x00007fff96cdd72a libsystem_pthread.dylib`_pthread_start + 137
thread #9: tid = 0x0008, 0x000000010a6cca9e skynet`skynet_lalloc [inlined] skynet_free(ptr=0x000000010e051be0) + 111 at malloc_hook.c:173, stop reason = signal SIGSTOP
    frame #0: 0x000000010a6cca9e skynet`skynet_lalloc [inlined] skynet_free(ptr=0x000000010e051be0) + 111 at malloc_hook.c:173
    frame #1: 0x000000010a6cca2f skynet`skynet_lalloc(ud=0x00000000017dc19c, ptr=0x000000010e051be0, osize=4470229392, nsize=<unavailable>) + 31 at malloc_hook.c:194
    frame #2: 0x000000010a6d8d87 skynet`luaM_realloc_ + 39
    frame #3: 0x000000010a6d5c45 skynet`sweeplist + 405
    frame #4: 0x000000010a6d5a96 skynet`luaC_freeallobjects + 230
    frame #5: 0x000000010a6dd6f2 skynet`close_state + 34
    frame #6: 0x000000010a8867e1 snlua.so`snlua_release(l=0x000000010e0658a0) + 17 at service_snlua.c:277
    frame #7: 0x000000010a6c7994 skynet`skynet_context_release [inlined] _delete_context(ctx=0x000000010e082be0) + 12 at skynet_server.c:152
    frame #8: 0x000000010a6c7988 skynet`skynet_context_release(ctx=0x000000010e082be0) + 24 at skynet_server.c:161
    frame #9: 0x000000010a6c6471 skynet`skynet_handle_retire(handle=25018780) + 113 at skynet_handle.c:79
    frame #10: 0x000000010a6c8367 skynet`skynet_command [inlined] handle_exit(context=<unavailable>, handle=<unavailable>) + 5 at skynet_server.c:287
    frame #11: 0x000000010a6c8362 skynet`skynet_command(context=<unavailable>, cmd=<unavailable>, param=<unavailable>) + 1538 at skynet_server.c:372
    frame #12: 0x000000010ab9c63f skynet.so`_command(L=0x000000010c49d2c0) + 95 at lua-skynet.c:85
    frame #13: 0x000000010a6d3b08 skynet`luaD_precall + 520
    frame #14: 0x000000010a6e153b skynet`luaV_execute + 1915
    frame #15: 0x000000010a6d44c0 skynet`unroll + 160
    frame #16: 0x000000010a6d35d6 skynet`luaD_rawrunprotected + 86
    frame #17: 0x000000010a6d4153 skynet`lua_resume + 83
    frame #18: 0x000000010a6e6482 skynet`auxresume + 82
    frame #19: 0x000000010a6e61b9 skynet`luaB_coresume + 73
    frame #20: 0x000000010aba2bdd profile.so`lresume(L=0x000000010c41f200) + 189 at lua-profile.c:105
    frame #21: 0x000000010a6d3b08 skynet`luaD_precall + 520
    frame #22: 0x000000010a6e153b skynet`luaV_execute + 1915
    frame #23: 0x000000010a6d40c2 skynet`luaD_call + 66
    frame #24: 0x000000010a6d35d6 skynet`luaD_rawrunprotected + 86
    frame #25: 0x000000010a6d4588 skynet`luaD_pcall + 56
    frame #26: 0x000000010a6cf157 skynet`lua_pcallk + 215
    frame #27: 0x000000010a6e524c skynet`luaB_pcall + 76
    frame #28: 0x000000010a6d3b08 skynet`luaD_precall + 520
    frame #29: 0x000000010a6e153b skynet`luaV_execute + 1915
    frame #30: 0x000000010a6d40c2 skynet`luaD_call + 66
    frame #31: 0x000000010a6d35d6 skynet`luaD_rawrunprotected + 86
    frame #32: 0x000000010a6d4588 skynet`luaD_pcall + 56
    frame #33: 0x000000010a6cf157 skynet`lua_pcallk + 215
    frame #34: 0x000000010ab9c8d6 skynet.so`_cb(context=0x000000010c498160, ud=0x000000010c41f200, type=<unavailable>, session=708227, source=16777222, msg=0x000000010c489250, sz=<unavailable>) + 182 at lua-skynet.c:33
    frame #35: 0x000000010a6c7c0d skynet`skynet_context_message_dispatch [inlined] _dispatch_message(ctx=<unavailable>, msg=0x000ace8301000006) + 68 at skynet_server.c:205
    frame #36: 0x000000010a6c7bc9 skynet`skynet_context_message_dispatch(sm=0x000000010b0104e0) + 217 at skynet_server.c:241
    frame #37: 0x000000010a6c8e08 skynet`_worker(p=<unavailable>) + 56 at skynet_start.c:121
    frame #38: 0x00007fff96cdd899 libsystem_pthread.dylib`_pthread_body + 138
    frame #39: 0x00007fff96cdd72a libsystem_pthread.dylib`_pthread_start + 137
thread #10: tid = 0x0009, 0x00007fff8b0e2716 libsystem_kernel.dylib`__psynch_cvwait + 10, stop reason = signal SIGSTOP
    frame #0: 0x00007fff8b0e2716 libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007fff96cdfc3b libsystem_pthread.dylib`_pthread_cond_wait + 727
    frame #2: 0x000000010a6c8e36 skynet`_worker(p=<unavailable>) + 102 at skynet_start.c:127
    frame #3: 0x00007fff96cdd899 libsystem_pthread.dylib`_pthread_body + 138
    frame #4: 0x00007fff96cdd72a libsystem_pthread.dylib`_pthread_start + 137
thread #11: tid = 0x000a, 0x000000010a6ddae8 skynet`luaS_newlstr + 232, stop reason = signal SIGSTOP
    frame #0: 0x000000010a6ddae8 skynet`luaS_newlstr + 232
    frame #1: 0x000000010a6ce37e skynet`lua_getglobal + 62
    frame #2: 0x000000010a6c9968 skynet`skynet_getenv(key=<unavailable>) + 56 at skynet_env.c:26
    frame #3: 0x000000010a6c8392 skynet`skynet_command(context=<unavailable>, cmd=<unavailable>, param=0x000000010d932718) + 1586 at skynet_server.c:409
    frame #4: 0x000000010ab9c63f skynet.so`_command(L=0x000000010d499700) + 95 at lua-skynet.c:85
    frame #5: 0x000000010a6d3b08 skynet`luaD_precall + 520
    frame #6: 0x000000010a6e1587 skynet`luaV_execute + 1991
    frame #7: 0x000000010a6d40c2 skynet`luaD_call + 66
    frame #8: 0x000000010a6cf059 skynet`lua_callk + 73
    frame #9: 0x000000010a6eda49 skynet`ll_require + 489
    frame #10: 0x000000010a6d3b08 skynet`luaD_precall + 520
    frame #11: 0x000000010a6e153b skynet`luaV_execute + 1915
    frame #12: 0x000000010a6d40c2 skynet`luaD_call + 66
    frame #13: 0x000000010a6d35d6 skynet`luaD_rawrunprotected + 86
    frame #14: 0x000000010a6d4588 skynet`luaD_pcall + 56
    frame #15: 0x000000010a6cf157 skynet`lua_pcallk + 215
    frame #16: 0x000000010a8866b6 snlua.so`_init(l=<unavailable>, ctx=0x000000010d4a3890, args=<unavailable>) + 566 at service_snlua.c:223
    frame #17: 0x000000010a886341 snlua.so`_launch(context=0x000000010d4a3890, ud=0x000000010d448f80, type=<unavailable>, session=<unavailable>, source=<unavailable>, msg=0x000000010cda1370, sz=6) + 113 at service_snlua.c:242
    frame #18: 0x000000010a6c7c0d skynet`skynet_context_message_dispatch [inlined] _dispatch_message(ctx=<unavailable>, msg=0x00000000017dc1a7) + 68 at skynet_server.c:205
    frame #19: 0x000000010a6c7bc9 skynet`skynet_context_message_dispatch(sm=0x000000010b010520) + 217 at skynet_server.c:241
    frame #20: 0x000000010a6c8e08 skynet`_worker(p=<unavailable>) + 56 at skynet_start.c:121
    frame #21: 0x00007fff96cdd899 libsystem_pthread.dylib`_pthread_body + 138
    frame #22: 0x00007fff96cdd72a libsystem_pthread.dylib`_pthread_start + 137
thread #12: tid = 0x000b, 0x00007fff8b0e2716 libsystem_kernel.dylib`__psynch_cvwait + 10, stop reason = signal SIGSTOP
    frame #0: 0x00007fff8b0e2716 libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007fff96cdfc3b libsystem_pthread.dylib`_pthread_cond_wait + 727
    frame #2: 0x000000010a6c8e36 skynet`_worker(p=<unavailable>) + 102 at skynet_start.c:127
    frame #3: 0x00007fff96cdd899 libsystem_pthread.dylib`_pthread_body + 138
    frame #4: 0x00007fff96cdd72a libsystem_pthread.dylib`_pthread_start + 137