[Bug] BE crash occasionally #3929

Open
morningman opened this issue Jun 23, 2020 · 1 comment
Labels
help wanted kind/fix Categorizes issue or PR as related to a bug.

Comments

morningman commented Jun 23, 2020

Describe the bug
The BE crashes occasionally, and be.out shows:

palo_be: ../nptl/pthread_mutex_lock.c:80: __pthread_mutex_lock: Assertion `mutex->__data.__owner == 0' failed.
*** Aborted at 1592890873 (unix time) try "date -d @1592890873" if you are using GNU date ***
PC: @     0x7f599a2af3f7 __GI_raise
*** SIGABRT (@0x1f4000081a6) received by PID 33190 (TID 0x7f59797b6700) from PID 33190; stack trace: ***
    @     0x7f599a2af470 (unknown)
    @     0x7f599a2af3f7 __GI_raise
    @     0x7f599a2b07d8 __GI_abort
    @     0x7f599a2a8516 __assert_fail_base
    @     0x7f599a2a85c2 __GI___assert_fail
    @     0x7f599a06658c __GI___pthread_mutex_lock
    @          0x1ba34d6 pthread_mutex_lock
    @          0x145f4ac doris::OlapScanNode::scanner_thread()
    @           0xfa8a35 doris::PriorityThreadPool::work_thread()
    @          0x1a5bbed thread_proxy
    @     0x7f599a0641c3 start_thread
    @     0x7f599a36112d __clone

The immediate cause is a failed assertion while locking a mutex: glibc asserts mutex->__data.__owner == 0, meaning it expected the mutex to have no owner, but at that moment __owner was not 0.

However, when I inspect the core dump, the __owner field of that mutex is 0:

#0  0x00007f599a2af3f7 in raise () from /opt/compiler/gcc-4.8.2/lib64/libc.so.6
#1  0x00007f599a2b07d8 in abort () from /opt/compiler/gcc-4.8.2/lib64/libc.so.6
#2  0x00007f599a2a8516 in __assert_fail_base () from /opt/compiler/gcc-4.8.2/lib64/libc.so.6
#3  0x00007f599a2a85c2 in __assert_fail () from /opt/compiler/gcc-4.8.2/lib64/libc.so.6
#4  0x00007f599a06658c in pthread_mutex_lock () from /opt/compiler/gcc-4.8.2/lib64/libpthread.so.0
#5  0x0000000001ba34d6 in pthread_mutex_lock_impl (mutex=0x67521610) at /home/palo/thirdparty/src/incubator-brpc-0.9.5/src/bthread/mutex.cpp:551
#6  pthread_mutex_lock (__mutex=__mutex@entry=0x67521610) at /home/palo/thirdparty/src/incubator-brpc-0.9.5/src/bthread/mutex.cpp:809
#7  0x000000000145f4ac in pthread_mutex_scoped_lock (m_=0x67521610, this=<synthetic pointer>) at /home/palo/thirdparty/installed/include/boost/thread/pthread/pthread_mutex_scoped_lock.hpp:26
#8  notify_one (this=0x67521610) at /home/palo/thirdparty/installed/include/boost/thread/pthread/condition_variable.hpp:126
#9  doris::OlapScanNode::scanner_thread (this=0x67521000, scanner=0x20200bd40) at /home/palo/be/src/exec/olap_scan_node.cpp:1322
#10 0x0000000000fa8a35 in operator() (this=0x7f59797b2828) at /home/palo/thirdparty/installed/include/boost/function/function_template.hpp:759
#11 doris::PriorityThreadPool::work_thread (this=0x50ac300, thread_id=<optimized out>) at /home/palo/be/src/util/priority_thread_pool.hpp:138
#12 0x0000000001a5bbed in thread_proxy ()
#13 0x00007f599a0641c3 in start_thread () from /opt/compiler/gcc-4.8.2/lib64/libpthread.so.0
#14 0x00007f599a36112d in clone () from /opt/compiler/gcc-4.8.2/lib64/libc.so.6
(gdb) f 5
#5  0x0000000001ba34d6 in pthread_mutex_lock_impl (mutex=0x67521610) at /home/palo/thirdparty/src/incubator-brpc-0.9.5/src/bthread/mutex.cpp:551
551	/home/palo/thirdparty/src/incubator-brpc-0.9.5/src/bthread/mutex.cpp: No such file or directory.
(gdb) p mutex
$1 = (pthread_mutex_t *) 0x67521610
(gdb) p *mutex
$2 = {
  __data = {
    __lock = 0,
    __count = 0,
    __owner = 0,
    __nusers = 4294967295,
    __kind = 0,
    __spins = 0,
    __elision = 0,
    __list = {
      __prev = 0x0,
      __next = 0x0
    }
  },
  __size = '\000' <repeats 12 times>, "\377\377\377\377", '\000' <repeats 23 times>,
  __align = 0
}

That mutex is the internal mutex of a boost::condition_variable. I have no idea why the assertion fails.
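
One detail in the gdb output may be worth a closer look: `__nusers = 4294967295`. gdb prints that field as an unsigned 32-bit integer, and 4294967295 is exactly what such a counter holds after being decremented once from 0. The sketch below only illustrates that arithmetic. The function name `decrement_from_zero` is hypothetical and is not part of glibc, boost, or Doris.

```cpp
#include <cstdint>

// Hypothetical helper for illustration only: models an unsigned 32-bit
// user counter (like the __nusers value printed by gdb) that is
// decremented once while it is already 0.
std::uint32_t decrement_from_zero() {
    std::uint32_t nusers = 0;
    --nusers;  // unsigned arithmetic wraps around modulo 2^32
    return nusers;
}
```

This does not establish why the counter reached that value. It is consistent with the mutex having been released or modified without a matching lock, or with its memory having been written by something else, but the dump alone cannot distinguish between those cases.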

morningman added the kind/fix and help wanted labels Jun 23, 2020
vagetablechicken commented

The binary goes through /home/palo/thirdparty/src/incubator-brpc-0.9.5/src/bthread/mutex.cpp:809 when calling pthread_mutex_lock. Could that be closely related to this core dump?

morningman added a commit that referenced this issue Jun 29, 2020
Replace some boost with std in OlapScanNode.

This refactoring seems to solve the problem described in #3929.
I found that the BE crashes when calling `boost::condition_variable.notify_all()`.
After switching to std, the BE no longer crashes.
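
For readers unfamiliar with the pattern, here is a minimal sketch of a queue guarded by `std::mutex` and `std::condition_variable`, the standard-library counterparts of the boost types mentioned in the commit message. It is not the actual OlapScanNode code: `BatchQueue`, `push`, `pop`, and `sum_batches` are hypothetical names chosen for illustration, and the sketch does not attempt to reproduce the crash.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical stand-in for a queue shared between scanner threads and a
// consumer. Not taken from Doris.
class BatchQueue {
public:
    void push(int batch) {
        {
            std::lock_guard<std::mutex> guard(_lock);
            _batches.push(batch);
        }
        // Notify after releasing the lock so the woken thread can acquire it.
        _added_cv.notify_one();
    }

    int pop() {
        std::unique_lock<std::mutex> guard(_lock);
        // The predicate form of wait() handles spurious wakeups.
        _added_cv.wait(guard, [this] { return !_batches.empty(); });
        int batch = _batches.front();
        _batches.pop();
        return batch;
    }

private:
    std::mutex _lock;
    std::condition_variable _added_cv;
    std::queue<int> _batches;
};

// Pushes 1..count from a producer thread and sums them on the caller's thread.
int sum_batches(int count) {
    BatchQueue queue;
    std::thread producer([&queue, count] {
        for (int i = 1; i <= count; ++i) {
            queue.push(i);
        }
    });
    int total = 0;
    for (int i = 0; i < count; ++i) {
        total += queue.pop();
    }
    producer.join();  // the queue must outlive every thread that touches it
    return total;
}
```

In this sketch the queue is only destroyed after the producer thread has been joined, so no thread can call `notify_one()` on a condition variable that has already been destroyed.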

Change-Id: I4baeeb6e76ecc751cb042796d52692d79432994e