Thread safety issue #816

Closed
xzzwandi opened this issue Apr 10, 2018 · 8 comments

Comments

@xzzwandi

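I'd like to ask a question:
skynet has crashed several times before, each time while calling a third-party library. I asked the library's author, who said the library is not thread-safe. After I restricted its use to a single service, the problem seemed to go away. Recently, calling require openssl from multiple services sometimes triggers a crash, and I suspect thread safety is the cause again. The OpenSSL version is 1.0.1f. Here is the gdb backtrace: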
#0 __strcmp_sse2_unaligned () at ../sysdeps/x86_64/multiarch/strcmp-sse2-unaligned.S:29
No locals.
#1 0x00007f03d1ad3699 in ?? () from /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
No symbol table info available.
#2 0x00007f03d1ad3972 in lh_insert () from /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
No symbol table info available.
#3 0x00007f03d1a4f8aa in OBJ_NAME_add () from /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
No symbol table info available.
#4 0x00007f03c772daf5 in SSL_library_init () from /lib/x86_64-linux-gnu/libssl.so.1.0.0
No symbol table info available.
#5 0x00007f03c35e55b4 in luaopen_ssl (L=0x7f03be8de7e8) at src/ssl.c:1862
i = 0
#6 0x00007f03c35d46b7 in luaopen_openssl (L=0x7f03be8de7e8) at src/openssl.c:374
init = 1
#7 0x0000000000414739 in luaD_precall (L=L@entry=0x7f03be8de7e8, func=0x7f03be6eba40, nresults=1) at ldo.c:434
n =
f = 0x7f03c35d4444 <luaopen_openssl>
ci = 0x7f03be7af4b0
#8 0x0000000000414a03 in luaD_call (L=L@entry=0x7f03be8de7e8, func=, nResults=) at ldo.c:498
No locals.
#9 0x0000000000414a61 in luaD_callnoyield (L=0x7f03be8de7e8, func=, nResults=) at ldo.c:509
No locals.
#10 0x0000000000411fb9 in lua_callk (L=L@entry=0x7f03be8de7e8, nargs=nargs@entry=2, nresults=nresults@entry=1, ctx=ctx@entry=0, k=k@entry=0x0) at lapi.c:925
func =
#11 0x000000000043188c in ll_require (L=0x7f03be8de7e8) at loadlib.c:609
name = 0x7f03c82dbcb8 "openssl"
#12 0x0000000000414739 in luaD_precall (L=0x7f03be8de7e8, func=0x7f03be6eba00, nresults=1) at ldo.c:434
n =
f = 0x4317d0 <ll_require>
ci = 0x7f03be7af460
#13 0x000000000041ffca in luaV_execute (L=) at lvm.c:1146
b =
nresults = 1
i =
ra = 0x7f03be6eba00
ci =
cl = 0x7f03bd480b40
k = 0x7f03b735c000
base =
#14 0x0000000000414a0f in luaD_call (L=L@entry=0x7f03be8de7e8, func=, nResults=) at ldo.c:499
No locals.
#15 0x0000000000414a61 in luaD_callnoyield (L=0x7f03be8de7e8, func=, nResults=) at ldo.c:509
No locals.
#16 0x0000000000411fb9 in lua_callk (L=L@entry=0x7f03be8de7e8, nargs=nargs@entry=2, nresults=nresults@entry=1, ctx=ctx@entry=0, k=k@entry=0x0) at lapi.c:925
func =
#17 0x000000000043188c in ll_require (L=0x7f03be8de7e8) at loadlib.c:609
name = 0x7f03bd480ac8 "room.PrivateRoom"
#18 0x0000000000414739 in luaD_precall (L=0x7f03be8de7e8, func=0x7f03be6eb960, nresults=1) at ldo.c:434
n =
f = 0x4317d0 <ll_require>
ci = 0x7f03be72ffb0
#19 0x000000000041ffca in luaV_execute (L=) at lvm.c:1146
b =
nresults = 1
i =
ra = 0x7f03be6eb960
ci =

Does a non-thread-safe library always have to be confined to a single service, or are there other solutions? How is this usually handled? Thanks.

@cloudwu
Owner

cloudwu commented Apr 10, 2018

skynet creates an independent VM for each service, which means you have to make sure that each C library keeps independent state in every VM.

If the library has shared state, load it in only one service and have the other services call that service with skynet.call.
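
As a rough illustration of that arrangement (not code from this thread), the sketch below registers a "lua" message handler that forwards each command to a table of functions wrapping the library. It assumes the standard skynet.start, skynet.dispatch, skynet.ret and skynet.pack calls; serve_library and the api table are hypothetical names, and skynet is passed in as a parameter so the handler can also be exercised outside the runtime.

```lua
-- Sketch only: the one service that is allowed to load the non-thread-safe
-- library. `skynet` is normally `require "skynet"`; `api` is a hypothetical
-- table mapping command names to functions that call into the library.
local function serve_library(skynet, api)
  skynet.start(function()
    skynet.dispatch("lua", function(session, source, cmd, ...)
      local f = api[cmd]
      if not f then
        -- Reject commands the table does not define.
        error("unknown command: " .. tostring(cmd))
      end
      -- The library call happens in this service only; the results are
      -- packed and sent back to the service that issued skynet.call.
      skynet.ret(skynet.pack(f(...)))
    end)
  end)
end
```

In a real service file this would be started with something like serve_library(require "skynet", { sha256 = ... }), and other services would call skynet.call(addr, "lua", "sha256", data), where addr could come from skynet.uniqueservice. The command name sha256 is only a placeholder for whatever the wrapped library offers.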

@cloudwu
Owner

cloudwu commented Apr 10, 2018

Alternatively, you can design a lock service built on skynet's own mechanisms and use locks to wrap the API of your non-thread-safe library.

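That is, write a separate service that manages locks by name and places each lock request in that lock's own queue. Usage would look roughly like this:

skynet.call(lock_service, "lua", "lock", "openssl") -- Wait until the lock is acquired.
pcall(do_something_with_openssl) -- Call the non-thread-safe code.
skynet.send(lock_service, "lua", "unlock", "openssl") -- No reply is needed.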

Note that you have to handle the case where a service exits without calling unlock.

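The advantage of this approach is that the arguments and results of the openssl calls do not have to cross service boundaries, and because the lock is built on skynet's own scheduling, no CPU time is wasted overall. The disadvantage is that every call may yield asynchronously, so the latency of a single call is relatively high.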
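
The bookkeeping such a lock service needs can be sketched in plain Lua as below. This is an assumption about one possible design, not code from this thread: new_lock_table, owner, grant and release_owner are hypothetical names. Inside a skynet service, owner would likely be the source address passed to the dispatch handler and grant the closure returned by skynet.response(), called as grant(true) to wake the waiting caller. How the lock service learns that a client has exited is left open here.

```lua
-- Sketch only: per-name FIFO lock bookkeeping for a hypothetical lock service.
local function new_lock_table()
  local held = {}     -- lock name -> owner currently holding it
  local waiting = {}  -- lock name -> FIFO list of { owner = ..., grant = ... }
  local self = {}

  -- Grant the lock at once if it is free, otherwise queue the request.
  function self.lock(name, owner, grant)
    if held[name] == nil then
      held[name] = owner
      grant(true)
      return
    end
    local q = waiting[name]
    if not q then
      q = {}
      waiting[name] = q
    end
    q[#q + 1] = { owner = owner, grant = grant }
  end

  -- Release the lock and hand it to the oldest waiter, if any.
  -- Returns false when `owner` does not hold the lock.
  function self.unlock(name, owner)
    if held[name] ~= owner then
      return false
    end
    local q = waiting[name]
    local next_req = q and table.remove(q, 1)
    if next_req then
      held[name] = next_req.owner
      next_req.grant(true)
    else
      held[name] = nil
      waiting[name] = nil
    end
    return true
  end

  -- Covers the missed-unlock case: drop everything a departed owner
  -- was waiting for, then release everything it still holds.
  function self.release_owner(owner)
    for _, q in pairs(waiting) do
      for i = #q, 1, -1 do
        if q[i].owner == owner then
          table.remove(q, i)
        end
      end
    end
    for name, o in pairs(held) do
      if o == owner then
        self.unlock(name, owner)
      end
    end
  end

  return self
end
```

Keeping one queue per lock name means callers waiting on "openssl" never delay callers waiting on an unrelated lock.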

@xzzwandi
Author

Got it, thanks for the pointers!

@xzzwandi
Author

Oh, one more thing: a service can switch threads while it is running, right? Would that cause problems when calling a non-thread-safe library?

@xzzwandi
Author

Oh, one more thing: a service can switch threads while it is running, right? Would that cause problems when calling a non-thread-safe library? @cloudwu

@xzzwandi xzzwandi reopened this Apr 10, 2018
@harrywong
Contributor

harrywong commented Apr 10, 2018 via email

@cloudwu
Owner

cloudwu commented Apr 10, 2018

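I don't recommend the second approach I described, because I think it is hard to put into practice well until the person implementing it understands the underlying problem. I regard it only as a theoretical way to solve the problem.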

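I have seen too many examples that are correct in theory but cannot be carried out correctly in practice. A simple message memory allocation and release issue in skynet still trips people up again and again, even though it is clearly documented. So I later deleted the wiki page that gave usage recommendations for the skynet.netpack library. Stating plainly that it should not be called is the simplest option.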

Using just one service is still the safer choice.

@xzzwandi
Author

OK, I've switched to using a single service. Thanks a lot.

@cloudwu cloudwu closed this as completed Oct 11, 2018