snlua_release core down #249

Closed
twohouses opened this issue Mar 11, 2015 · 9 comments

Comments

@twohouses

Version: v0.9.3. Environment: Ubuntu 14.04, 64-bit.

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000000000042e0b5 in je_tcache_dalloc_small (binind=<optimized out>, ptr=<optimized out>,
    tcache=<optimized out>) at include/jemalloc/internal/tcache.h:370
370     tbin->avail[tbin->ncached] = ptr;
(gdb) bt
#0  0x000000000042e0b5 in je_tcache_dalloc_small (binind=<optimized out>, ptr=<optimized out>,
    tcache=<optimized out>) at include/jemalloc/internal/tcache.h:370
#1  je_arena_dalloc (try_tcache=<optimized out>, ptr=<optimized out>, chunk=<optimized out>)
    at include/jemalloc/internal/arena.h:1014
#2  je_idalloct (try_tcache=<optimized out>, ptr=<optimized out>)
    at include/jemalloc/internal/jemalloc_internal.h:830
#3  je_iqalloct (try_tcache=<optimized out>, ptr=<optimized out>)
    at include/jemalloc/internal/jemalloc_internal.h:849
#4  je_iqalloc (ptr=<optimized out>) at include/jemalloc/internal/jemalloc_internal.h:856
#5  ifree (ptr=<optimized out>) at src/jemalloc.c:1249
#6  je_free (ptr=0x7f010a806000) at src/jemalloc.c:1324
#7  0x000000000040dfbb in free (ptr=<optimized out>) at skynet-src/malloc_hook.c:154
#8  0x000000000040e139 in skynet_lalloc (ud=<optimized out>, ptr=<optimized out>,
    osize=<optimized out>, nsize=<optimized out>) at skynet-src/malloc_hook.c:221
#9  0x0000000000414d9d in luaM_realloc_ ()
#10 0x0000000000413310 in sweeplist ()
#11 0x0000000000414982 in luaC_freeallobjects ()
#12 0x00000000004188de in close_state ()
#13 0x00007f0112dfe4dc in snlua_release (l=0x7f010b91d300) at service-src/service_snlua.c:156
#14 0x0000000000408f66 in delete_context (ctx=0x7f0109036630) at skynet-src/skynet_server.c:198
#15 skynet_context_release (ctx=ctx@entry=0x7f0109036630) at skynet-src/skynet_server.c:207
#16 0x00000000004094c8 in skynet_context_message_dispatch (sm=sm@entry=0x7f011380e420,
    q=0x7f01094f2d00, q@entry=0x7f0109881e80, weight=weight@entry=-1)
    at skynet-src/skynet_server.c:322
#17 0x0000000000409ae1 in _worker (p=<optimized out>) at skynet-src/skynet_start.c:128
#18 0x00007f0114cef182 in start_thread (arg=0x7f010fdf6700) at pthread_create.c:312
#19 0x00007f011430a47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

twohouses changed the title from "core down" to "snlua_release core down" Mar 11, 2015
@twohouses
Author

The crash does not happen every time; it occurs randomly.

@cloudwu
Owner

cloudwu commented Mar 11, 2015

Without a test case, I cannot reproduce this.
The problem occurs when your Lua program frees memory while closing the VM; it is not a skynet problem.

@twohouses
Author

The packets sent to the client are too large, which leads to a series of randomly occurring core dumps.

@cloudwu
Owner

cloudwu commented Mar 11, 2015

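You can construct a scenario that triggers the error and check which C object is being freed when it fails. The most likely cause is that the same pointer is freed more than once.

You can also modify the malloc hook and add some debugging information of your own to help locate the problem.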
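
As a rough illustration of the kind of debugging information that could be added, here is a minimal sketch of a tracking wrapper that refuses to free a pointer it does not know to be live and logs the attempt instead. All names (`dbg_malloc`, `dbg_free`, `DBG_MAX_LIVE`) are hypothetical and not part of skynet or jemalloc. The sketch assumes a single thread and a small fixed-size table, so it only suits a reduced repro.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define DBG_MAX_LIVE 4096  /* hypothetical capacity, enough for a small repro */

static void *dbg_live[DBG_MAX_LIVE];   /* pointers currently allocated */
static size_t dbg_live_count = 0;      /* number of entries in dbg_live */
static size_t dbg_bad_free_count = 0;  /* suspicious frees seen so far */

/* Allocate memory and remember the pointer as live. */
static void *dbg_malloc(size_t size) {
    void *ptr = malloc(size);
    if (ptr != NULL) {
        assert(dbg_live_count < DBG_MAX_LIVE);
        dbg_live[dbg_live_count++] = ptr;
    }
    return ptr;
}

/* Free ptr only if it is a known live allocation.
 * Returns 0 on success, -1 if the pointer is unknown
 * (for example, it was already freed). */
static int dbg_free(void *ptr, const char *tag) {
    if (ptr == NULL) {
        return 0;  /* free(NULL) is a no-op */
    }
    for (size_t i = 0; i < dbg_live_count; i++) {
        if (dbg_live[i] == ptr) {
            /* Swap-remove the entry, then release the memory. */
            dbg_live[i] = dbg_live[--dbg_live_count];
            free(ptr);
            return 0;
        }
    }
    dbg_bad_free_count++;
    fprintf(stderr, "suspect free of %p (%s): not a live allocation\n", ptr, tag);
    return -1;
}
```

In the backtrace above, the failing call passes through `free` and `skynet_lalloc` in skynet-src/malloc_hook.c, so a check of this kind would belong on that path. A real hook would also need a lock, because skynet runs several worker threads, and a hash table in place of the linear scan.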

@twohouses
Author

proto:
.test1{
    atrr1 0:integer
    atrr2 1:string
    atrr3 2:*integer
}

.test2{
    atrr1 0:integer
    atrr2 1:string
    atrr3 2:*test1
}

.test3{
    testa1 0:integer
    testa2 1:integer
    testa3 2:*integer
}

# test
test 999 {
    request {
    }
    response {
        value1 0:*test1
        value2 1:*test2
        value3 2:*test3
    }
}

agent code:
local function make_test1(num)
    local value1 = {}
    for i = 1, num do
        table.insert(value1, {atrr1 = 1, atrr2 = "atrr2", atrr3 = {1, 2, 3, 4, 5, 6, 7, 7, 8}})
    end
    return value1
end

function REQUEST:test()
    local value1 = make_test1(70)
    local value2 = {}
    local value3 = {}

    for i = 1, 30 do
        local atrr3 = make_test1(30)
        table.insert(value2, {atrr1 = 1, atrr2 = "atrr2", atrr3 = atrr3})
    end
    for i = 1, 20 do
        table.insert(value3, {testa1 = 1, testa2 = 33, testa3 = {1, 2, 3, 4, 5, 6, 7, 7, 8}})
    end
    return {value1 = value1, value2 = value2, value3 = value3}
end

client:
while true do
    fd = assert(socket.connect("127.0.0.1", 8889))
    send_request("test")
    socket.usleep(1000*800)
    socket.close(fd)
    socket.usleep(1000*500)
end

@cloudwu
Owner

cloudwu commented Mar 17, 2015

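Have you updated to alpha2?

In alpha1, there was a bug in the sproto sharing code that could cause an sproto object to be released more than once.

The patch 03f7661 fixes this problem.

I would like to know whether this is what caused your crash.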
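
For readers unfamiliar with this class of bug, the sketch below shows one common way to keep a shared object from being released more than once: a reference count, so that only the last holder destroys it. This is an illustration only, not the content of the patch mentioned above. `struct shared_obj` and the `shared_*` functions are hypothetical names, and the counter is not atomic, so the sketch assumes single-threaded use.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical shared object, standing in for data shared by several holders. */
struct shared_obj {
    int refcount;   /* number of holders still using the object */
    char *payload;  /* memory owned by the object */
};

static int shared_destroyed = 0;  /* counts real destructions, for checking */

/* Create an object held by one owner. Returns NULL on allocation failure. */
static struct shared_obj *shared_new(size_t payload_size) {
    struct shared_obj *obj = malloc(sizeof(*obj));
    if (obj == NULL) {
        return NULL;
    }
    obj->payload = malloc(payload_size);
    if (obj->payload == NULL) {
        free(obj);
        return NULL;
    }
    obj->refcount = 1;
    return obj;
}

/* Register one more holder of the same object. */
static struct shared_obj *shared_retain(struct shared_obj *obj) {
    obj->refcount++;
    return obj;
}

/* Drop one holder. The memory is freed only when the last holder lets go.
 * Returns 1 if the object was destroyed, 0 if it is still held. */
static int shared_release(struct shared_obj *obj) {
    assert(obj->refcount > 0);
    if (--obj->refcount > 0) {
        return 0;
    }
    free(obj->payload);
    free(obj);
    shared_destroyed++;
    return 1;
}
```

With this scheme, two holders that each call `shared_release` once trigger a single `free`, where an unconditional free on each side would hit the same pointer twice.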

cloudwu reopened this Mar 17, 2015
@cloudwu
Owner

cloudwu commented Mar 18, 2015

I have located this bug and am working on a fix.

cloudwu added a commit that referenced this issue Mar 18, 2015
@cloudwu
Owner

cloudwu commented Mar 18, 2015

The problem should be fixed now. Thanks.

cloudwu closed this as completed Mar 18, 2015
@twohouses
Author

Yes, the problem has not occurred again so far. Thanks.
