Possible deadlock after linking jemalloc #859

Closed
niukuo opened this issue Jul 29, 2019 · 9 comments · Fixed by #2684
Labels
bug the code does not work as expected

Comments

@niukuo
Contributor

niukuo commented Jul 29, 2019

Describe the bug
jemalloc takes and releases locks while allocating memory. brpc hooks pthread_mutex_unlock, and the hooked function calls submit_contention, which itself allocates memory, so a deadlock results.

To Reproduce
Occurs with low probability during memory operations.

Expected behavior

Versions
OS: centos 5.7
Compiler: CC=/usr/local/gcc-4.9.2/bin/gcc CXXFLAGS='-gdwarf-2'
brpc: 3becc72
protobuf: 2.4.1

Additional context/screenshots
Thread 10 (Thread 0x7fe2009ff700 (LWP 17081)):
#0 0x00007fe28a98d51d in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007fe28a988e51 in _L_lock_1022 () from /lib64/libpthread.so.0
#2 0x00007fe28a988df2 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x0000000000ac982b in pthread_mutex_lock ()
#4 0x000000000050d860 in je_arena_choose_hard ()
#5 0x0000000000544028 in je_tcache_get_hard ()
#6 0x000000000051165b in malloc ()
#7 0x00000000011e97ec in operator new(unsigned long, std::nothrow_t const&) ()
#8 0x0000000000aca071 in bthread::submit_contention(bthread_contention_site_t const&, long) ()
#9 0x0000000000ac9bb7 in pthread_mutex_unlock ()
#10 0x0000000000522750 in je_base_alloc ()
#11 0x0000000000516f4f in je_arena_new ()
#12 0x000000000050d7ab in arena_init_locked ()
#13 0x000000000050d94d in je_arena_choose_hard ()
#14 0x0000000000544028 in je_tcache_get_hard ()
#15 0x000000000051165b in malloc ()
#16 0x00000000011f4cd8 in operator new(unsigned long) ()
#17 0x0000000000693e5e in std::function ()>::function, std::__future_base::_Result_base::_Deleter>, rocksdb::Status>, void>(std::__future_base::_Task_setter, std::__future_base::_Result_base::_Deleter>, rocksdb::Status>) ()
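
Reading the trace bottom-up: frame #13 is already inside jemalloc's arena setup when the hooked pthread_mutex_unlock (frame #9) calls submit_contention, whose operator new re-enters malloc and then waits on a lock (frames #4 to #0). A minimal single-threaded model of that re-entry is sketched below. Every name in it (ToyLock, fake_malloc, profiler_hook, arena_init) is a hypothetical stand-in, not jemalloc or brpc code, and the toy lock returns false where a real non-recursive mutex would block forever. That the inner call waits on the same lock the outer call still holds is an assumption drawn from the trace.

```cpp
#include <cassert>

// Hypothetical toy lock: a single-threaded model of a non-recursive mutex.
struct ToyLock {
    bool held = false;

    // Returns false in the situation where a real mutex would block forever
    // (the owner tries to lock it a second time).
    bool lock() {
        if (held) return false;
        held = true;
        return true;
    }

    void unlock() {
        assert(held);
        held = false;
    }
};

static ToyLock g_arena_lock;  // stand-in for an allocator-internal lock

// Stand-in for malloc(): needs the arena lock for the duration of the call.
// Returns true on success, false if the call would self-deadlock.
bool fake_malloc() {
    if (!g_arena_lock.lock()) return false;
    g_arena_lock.unlock();
    return true;
}

// Stand-in for the profiler code run by the hooked unlock: it allocates.
bool profiler_hook() { return fake_malloc(); }

// Stand-in for arena initialization: it holds the arena lock, and an inner
// unlock of some other lock fires the profiler hook.
// Returns the result of the hook's allocation attempt.
bool arena_init() {
    bool locked = g_arena_lock.lock();
    assert(locked);
    (void)locked;
    bool hook_ok = profiler_hook();
    g_arena_lock.unlock();
    return hook_ok;
}
```

Calling fake_malloc() on its own succeeds, while arena_init() reports that the allocation made from inside the hook could not take the lock, which is the point where the real process hangs.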

@niukuo
Contributor Author

niukuo commented Jul 29, 2019

It happened right after the process started.

@smartlee

> It happened right after the process started.

Has this issue been fixed?

@niukuo
Contributor Author

niukuo commented Feb 19, 2020

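> > It happened right after the process started.
>
> Has this issue been fixed?

No. It just has a low probability; it is triggered when one of jemalloc's cache pools runs short.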

@niukuo
Contributor Author

niukuo commented Feb 25, 2020

In bthread::submit_contention, butil::get_object() initializes TLS and allocates memory the first time it is called on a thread.
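
The pattern described here, where the first call on a thread allocates and later calls do not, can be sketched as follows. This is not butil::get_object(); LocalPool, get_slot and the counter are hypothetical names used only to show why the very first call on a thread is the risky one when it happens inside an allocator lock.

```cpp
#include <cassert>
#include <new>

// Hypothetical per-thread pool; the real object pool in brpc is more involved.
struct LocalPool {
    static const int kCapacity = 64;
    int slots[kCapacity];
    int used = 0;
};

static thread_local LocalPool* tls_pool = nullptr;  // created lazily
static thread_local int tls_heap_allocs = 0;        // counts calls to operator new

// Hands out a slot from the calling thread's pool.
// The first call on each thread goes through operator new (and so malloc);
// later calls only touch memory the thread already owns.
// The pool is intentionally never freed in this sketch.
int* get_slot() {
    if (tls_pool == nullptr) {
        tls_pool = new (std::nothrow) LocalPool();
        if (tls_pool == nullptr) return nullptr;
        ++tls_heap_allocs;
    }
    if (tls_pool->used == LocalPool::kCapacity) return nullptr;
    return &tls_pool->slots[tls_pool->used++];
}

// Number of heap allocations get_slot() has made on the calling thread.
int heap_allocs_on_this_thread() { return tls_heap_allocs; }
```

If the first get_slot() call on a thread happens while that thread is inside malloc, the nested operator new is where it would get stuck; a thread that has already warmed up its pool would not hit this particular allocation.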

@niukuo
Contributor Author

niukuo commented Feb 25, 2020

Temporary workaround: set bvar_collector_expected_per_second=0

@lrita
Contributor

lrita commented Nov 17, 2020

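On my side, almost every time I open the contention page in the debug HTTP service, the process dies.

For now, tcmalloc is about 10% slower than jemalloc for us; I tuned tcmalloc several times without any noticeable gain.

I am currently considering having submit_contention first check, via backtrace, whether it is running inside the call stack of a je_ function, to avoid the deadlock.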
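
A rough sketch of only the filtering step of that idea is below: given symbol names for the current stack, decide whether any frame belongs to jemalloc. The function names are hypothetical, and how to obtain the names is deliberately left out, because glibc's backtrace_symbols() itself returns malloc'ed memory and so cannot be called from this path as-is. Note also that a prefix test misses frames such as arena_init_locked from the trace above, which carry no je_ prefix.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Returns true if any of the given symbol names starts with "je_".
// Works on caller-provided C strings so that the check itself never allocates.
bool has_allocator_frame(const char* const* names, std::size_t count) {
    assert(names != nullptr || count == 0);
    for (std::size_t i = 0; i < count; ++i) {
        if (names[i] != nullptr && std::strncmp(names[i], "je_", 3) == 0) {
            return true;
        }
    }
    return false;
}

// Hypothetical gate for the profiler: skip the sample when the unlock
// that triggered it came from inside the allocator.
bool should_submit_contention(const char* const* names, std::size_t count) {
    return !has_allocator_frame(names, count);
}
```

With the frames from the trace (pthread_mutex_unlock called from je_base_alloc), should_submit_contention returns false and the sample would be dropped instead of allocating.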

@haoyixin

> Temporary workaround: set bvar_collector_expected_per_second=0

How does this work?

wwbmmm added the "bug" label (the code does not work as expected) on Apr 19, 2023
@chenBright
Contributor

chenBright commented Jul 9, 2024

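> > Temporary workaround: set bvar_collector_expected_per_second=0
>
> How does this work?

It roughly amounts to turning sampling off, so the contention profiler is never triggered, I think.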
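
As a simplified model of that explanation, assume the flag acts as a per-second budget of samples. The class below is hypothetical and is not how bvar's collector is implemented; it only shows that a budget of 0 means no event is ever sampled, so the allocating profiler path is never reached.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sampler: keeps at most `expected_per_second` events per second.
class BudgetSampler {
public:
    explicit BudgetSampler(std::int64_t expected_per_second)
        : budget_(expected_per_second < 0 ? 0 : expected_per_second) {}

    // `seen_this_second` is how many events were already observed in the
    // current one-second window (0 for the first event).
    // Returns true if this event should be sampled.
    bool should_sample(std::int64_t seen_this_second) const {
        assert(seen_this_second >= 0);
        return seen_this_second < budget_;
    }

private:
    std::int64_t budget_;
};
```

A BudgetSampler constructed with 0 rejects every event, including the first one in each window, while one constructed with 2 keeps the first two events per second and drops the rest.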

@chenBright
Contributor

@niukuo @lrita You can try #2684 and see whether the problem still occurs.
