Crash when cleaning up thread-local storage at process exit #731

Closed
kueiwoodwolf opened this issue Sep 6, 2017 · 15 comments

@kueiwoodwolf

When skynet exits, it crashes while cleaning up thread-local storage. The stack trace is as follows:
#0 atomic_load_p (mo=atomic_memory_order_relaxed, a=0x13e58) at include/jemalloc/internal/atomic.h:55
#1 rtree_leaf_elm_bits_read (tsdn=, rtree=, dependent=true, elm=0x13e58) at include/jemalloc/internal/rtree.h:175
#2 rtree_leaf_elm_szind_read (tsdn=, rtree=, dependent=true, elm=0x13e58) at include/jemalloc/internal/rtree.h:215
#3 rtree_szind_read (dependent=true, key=41725968, rtree_ctx=, rtree=, tsdn=) at include/jemalloc/internal/rtree.h:422
#4 arena_salloc (ptr=0x27cb010, tsdn=) at include/jemalloc/internal/arena_inlines_b.h:127
#5 isalloc (ptr=0x27cb010, tsdn=) at include/jemalloc/internal/jemalloc_internal_inlines_c.h:19
#6 je_malloc_usable_size (ptr=ptr@entry=0x27cb010) at src/jemalloc.c:2978
#7 0x0000000000410543 in clean_prefix (ptr=0x27cb010 "") at skynet-src/malloc_hook.c:136
#8 free (ptr=0x27cb010) at skynet-src/malloc_hook.c:219
#9 0x00007f5f10ffe0f9 in __GI__dl_deallocate_tls (tcb=0x7f5f0d4f7700, dealloc_tcb=) at dl-tls.c:555
#10 0x00007f5f10dd6e07 in __free_stacks (limit=limit@entry=41943040) at allocatestack.c:282
#11 0x00007f5f10dd6f1f in queue_stack (stack=0x7f5f0b2f3700) at allocatestack.c:310
#12 __deallocate_stack (pd=pd@entry=0x7f5f0b2f3700) at allocatestack.c:746
#13 0x00007f5f10dd8029 in __free_tcb (pd=pd@entry=0x7f5f0b2f3700) at pthread_create.c:224
#14 0x00007f5f10dd8f33 in pthread_join (threadid=140046186264320, thread_return=thread_return@entry=0x0) at pthread_join.c:113
#15 0x000000000040c0d9 in start (thread=8) at skynet-src/skynet_start.c:226
#16 skynet_start (config=config@entry=0x7ffdc3649690) at skynet-src/skynet_start.c:276
#17 0x000000000040939d in main (argc=, argv=) at skynet-src/skynet_main.c:163

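A third-party library uses thread-local storage. In pthread, the TLS memory is allocated with __libc_memalign, which bypasses skynet's hook, but it is released with free, which ends up calling skynet_free: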

static void *
allocate_and_init (struct link_map *map)
{
  void *newp;

  newp = __libc_memalign (map->l_tls_align, map->l_tls_blocksize);
  if (newp == NULL)
    oom ();

  /* Initialize the memory. */
  memset (__mempcpy (newp, map->l_tls_initimage, map->l_tls_initimage_size),
          '\0', map->l_tls_blocksize - map->l_tls_initimage_size);

  return newp;
}
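
To make the mismatch described above concrete, here is a minimal sketch. It is not skynet's or jemalloc's actual code. It models a hooked allocator that keeps a small registry of the blocks it handed out, and a hooked free that can tell when it is given a pointer from somewhere else. All names (`hooked_malloc`, `hooked_free`, `tracked`, `MAX_TRACKED`) are hypothetical, plain `malloc` stands in for `__libc_memalign`, and the registry is not thread-safe.

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_TRACKED 64

/* Hypothetical registry of blocks handed out by the hooked allocator. */
static void *tracked[MAX_TRACKED];

/* Hooked allocation: allocate a block and remember its address. */
static void *hooked_malloc(size_t size) {
    void *p = malloc(size);
    if (p == NULL) {
        return NULL;
    }
    for (int i = 0; i < MAX_TRACKED; i++) {
        if (tracked[i] == NULL) {
            tracked[i] = p;
            return p;
        }
    }
    free(p); /* registry full: refuse rather than hand out an untracked block */
    return NULL;
}

/* Hooked free: returns 1 if the block was released (or p is NULL),
 * 0 if the pointer was not allocated by hooked_malloc. */
static int hooked_free(void *p) {
    if (p == NULL) {
        return 1;
    }
    for (int i = 0; i < MAX_TRACKED; i++) {
        if (tracked[i] == p) {
            tracked[i] = NULL;
            free(p);
            return 1;
        }
    }
    return 0; /* foreign pointer: no metadata exists for it */
}
```

In the crash above there is no such check: the hooked free asks jemalloc for the usable size of a block that jemalloc never allocated, which is where the stack trace ends.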

@cloudwu
Owner

cloudwu commented Sep 6, 2017

Are you sure you have updated? a9122a9

@kueiwoodwolf
Author

Yes, I'm sure I have updated the memalign code.

@cloudwu
Owner

cloudwu commented Sep 6, 2017

Then you need to find out why memalign was not replaced while free was.

@kueiwoodwolf
Author

A third-party library uses thread-local storage, and in pthread the TLS memory is allocated with __libc_memalign, which bypasses skynet's hook.

@cloudwu
Owner

cloudwu commented Sep 6, 2017

Calling __libc_memalign directly seems wrong; it should call memalign.

@cloudwu
Owner

cloudwu commented Sep 6, 2017

Why would it call __libc_memalign but not release with __libc_free?

Try defining one more function in malloc_hook.c and see whether it solves the problem:

void *
__libc_memalign(size_t alignment, size_t size) {
  return skynet_memalign(alignment, size);
}

Or remove jemalloc entirely.

@kueiwoodwolf
Author

The third-party library is written like this:
static __thread unsigned int sThreadIndex = kNullIndex;
...
if ( sThreadIndex == kNullIndex )
{
    sThreadIndex = gThreadCounter++;
}
The if statement ends up calling tls_get_addr_tail in pthread's dl-tls.c, which in turn reaches allocate_and_init. allocate_and_init uses __libc_memalign directly. I also think pthread probably should not be using __libc_memalign.
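
For readers unfamiliar with the pattern, the following is a minimal, self-contained sketch of a lazily assigned per-thread index, modeled on the snippet above. It is not the third-party library's code: the names (`thread_index`, `thread_counter`, `get_thread_index`, `collect_indices`, `NULL_INDEX`, `MAX_THREADS`) are hypothetical, and the C11 atomic counter is my substitution, since the thread does not show how `gThreadCounter` is declared. Note also that in a standalone executable the compiler normally uses a static TLS model, so this sketch does not by itself go through `allocate_and_init`; that path is the one taken for TLS in dynamically loaded modules.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define NULL_INDEX 0xffffffffu
#define MAX_THREADS 16

/* Shared counter; atomic so concurrent first accesses get distinct values. */
static atomic_uint thread_counter;

/* One copy per thread, initialized to the "not assigned yet" marker. */
static __thread unsigned int thread_index = NULL_INDEX;

/* Assigns an index on the calling thread's first call, then reuses it. */
static unsigned int get_thread_index(void) {
    if (thread_index == NULL_INDEX) {
        thread_index = atomic_fetch_add(&thread_counter, 1u);
    }
    return thread_index;
}

static void *worker(void *arg) {
    *(unsigned int *)arg = get_thread_index();
    return NULL;
}

/* Starts n threads (n <= MAX_THREADS), stores each thread's index in out[],
 * joins them all, and returns 0 on success or -1 on failure. */
static int collect_indices(unsigned int *out, int n) {
    pthread_t tid[MAX_THREADS];
    int started = 0;
    int ok = (n >= 0 && n <= MAX_THREADS);
    while (ok && started < n) {
        if (pthread_create(&tid[started], NULL, worker, &out[started]) != 0) {
            ok = 0;
        } else {
            started++;
        }
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);
    }
    return ok ? 0 : -1;
}
```

Each thread touches its own copy of `thread_index`, so the first read on a new thread is the point where the runtime may have to set up that thread's TLS block.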

@kueiwoodwolf
Author

Defining the __libc_memalign function solves the problem.
Also, why not use the __malloc_hook mechanism? The __libc_malloc family checks __malloc_hook. I see the man page says __malloc_hook is no longer recommended?

@cloudwu
Owner

cloudwu commented Sep 6, 2017

The malloc hook has been deprecated, and it is not part of the POSIX standard either.

@cloudwu
Owner

cloudwu commented Sep 6, 2017

I'll apply this patch for now. I don't know whether it has potential problems on other systems.

@cloudwu
Owner

cloudwu commented Sep 6, 2017

Why would pthread use __libc_memalign without using the matching __libc_free? Very strange.

@kueiwoodwolf
Author

https://sourceware.org/bugzilla/show_bug.cgi?id=17730
It looks like glibc 2.25 has already fixed this problem.
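
If it helps to know which side of that version a given machine is on, the following sketch checks the running glibc at runtime. It relies on `gnu_get_libc_version()` from `<gnu/libc-version.h>`, which is glibc-specific; the helper names `version_at_least` and `glibc_at_least` are hypothetical. The 2.25 threshold is taken from the comment above and has not been verified independently here.

```c
#include <stdio.h>
#include <gnu/libc-version.h>

/* Compares a "major.minor[...]" version string against a wanted version.
 * Returns 1 if ver >= wanted, 0 if older, -1 if ver cannot be parsed. */
static int version_at_least(const char *ver, int want_major, int want_minor) {
    int major = 0;
    int minor = 0;
    if (sscanf(ver, "%d.%d", &major, &minor) != 2) {
        return -1;
    }
    if (major != want_major) {
        return major > want_major;
    }
    return minor >= want_minor;
}

/* Same check against the glibc the process is actually running on. */
static int glibc_at_least(int major, int minor) {
    return version_at_least(gnu_get_libc_version(), major, minor);
}
```

A call such as `glibc_at_least(2, 25)` could then decide whether to warn at startup when the hooks are enabled on an older glibc.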

@kueiwoodwolf
Author

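Isn't defining only __libc_memalign still risky? What if the system allocates with __libc_memalign and releases with __libc_free...
The more reliable options are to upgrade glibc, or to drop the hooks entirely and accept a small performance loss.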

@cloudwu
Owner

cloudwu commented Sep 6, 2017

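I'll revert this patch after all; upgrading glibc looks like the better option. Otherwise we would also need to define __libc_free, and even that would not be a complete fix.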

cloudwu added a commit that referenced this issue Sep 6, 2017
@kueiwoodwolf
Author

OK. On my side, I have removed the hooks for the relevant functions for now.
