
Progress stuck due to deadlock #363

@Besroy

Environment

  Cluster: 908
  Namespace: nuobject2sh-dev
  Pod: sm-long-running4-1010-6db4d944f-rzrbj
  HomeObject Version: homeobject/3.0.6@oss/main
  HomeStore Version: homestore/7.0.0@oss/master

Description

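One SM is stuck and cannot process further requests during the SH test. The cause appears to be a deadlock between _pg_lock (the HomeObject PG lock) and m_meta_mtx (the MetaBlkService mutex): Thread 73 (GC worker) is blocked acquiring m_meta_mtx in MetaBlkService::update_sub_sb, while Thread 25 (GC scanner), which owns m_meta_mtx, is blocked acquiring _pg_lock exclusively in HSHomeObject::can_chunks_in_pg_be_gc.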

More details are in the SH issue records (Issue75).
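The cycle described above is a lock-order inversion. The following C++ sketch reproduces the pattern with hypothetical stand-ins: pg_lock plays _pg_lock, meta_mtx plays m_meta_mtx, and both_threads_finish, demo_lock_orders and the "worker"/"scanner" thread roles are illustrative names, not HomeObject code. It assumes, as the deadlock diagnosis implies but the backtraces do not show directly, that the GC worker already holds _pg_lock when it calls update_sub_sb. Timed locks replace a real hang with a timeout so the outcome can be checked.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <shared_mutex>
#include <thread>

// Hypothetical stand-ins for HSHomeObject::_pg_lock and MetaBlkService::m_meta_mtx.
// Timed variants are used only so the demo reports a would-be deadlock instead of hanging.
struct Locks {
    std::shared_timed_mutex pg_lock;
    std::timed_mutex meta_mtx;
};

// Spin until both threads have incremented `counter`.
static void arrive_and_wait(std::atomic<int>& counter) {
    counter.fetch_add(1);
    while (counter.load() < 2) std::this_thread::yield();
}

// Runs a "GC worker" and a "GC scanner" thread. Returns true if both threads
// obtained both locks, false if either gave up after the timeout.
bool both_threads_finish(bool consistent_order) {
    using namespace std::chrono_literals;
    Locks locks;
    std::atomic<int> holding_first{0}, tried_second{0};
    std::atomic<bool> worker_ok{false}, scanner_ok{false};

    // Worker (mirrors Thread 73, assumed): holds pg_lock (shared here; the real
    // mode is not visible in the trace), then wants meta_mtx as update_sub_sb does.
    std::thread worker([&] {
        std::shared_lock<std::shared_timed_mutex> pg(locks.pg_lock);
        if (!consistent_order) arrive_and_wait(holding_first);
        std::unique_lock<std::timed_mutex> meta(locks.meta_mtx, 100ms);
        worker_ok = meta.owns_lock();
        // Keep pg_lock until the scanner has also tried, so the outcome is deterministic.
        if (!consistent_order) arrive_and_wait(tried_second);
    });

    std::thread scanner([&] {
        if (consistent_order) {
            // Same global order as the worker: pg_lock first, then meta_mtx.
            std::unique_lock<std::shared_timed_mutex> pg(locks.pg_lock);
            std::unique_lock<std::timed_mutex> meta(locks.meta_mtx, 100ms);
            scanner_ok = meta.owns_lock();
        } else {
            // Inverted order, as reported for Thread 25: owns meta_mtx,
            // then wants an exclusive pg_lock (can_chunks_in_pg_be_gc).
            std::unique_lock<std::timed_mutex> meta(locks.meta_mtx);
            arrive_and_wait(holding_first);
            std::unique_lock<std::shared_timed_mutex> pg(locks.pg_lock, 100ms);
            scanner_ok = pg.owns_lock();
            arrive_and_wait(tried_second);
        }
    });

    worker.join();
    scanner.join();
    return worker_ok && scanner_ok;
}

// Usage: the inverted order times out on both sides; one global order completes.
void demo_lock_orders() {
    assert(!both_threads_finish(false));
    assert(both_threads_finish(true));
}
```

With the inverted order each thread times out waiting for the lock the other one holds, which is the cycle visible in the two backtraces; with a single global order (pg_lock, then meta_mtx) no cycle can form. Whether a consistent order is the right fix for HomeObject depends on code paths not shown in this report.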

Threads

Thread 73 (LWP 99) - GC Worker Thread

Thread 73 (Thread 0x7c6bddffb680 (LWP 99)):
 #0  0x00007c6d18c1ef70 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
 #1  0x00007c6d18c26101 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libc.so.6
 #2  0x00005ced90b217a9 in __gthread_mutex_lock (__mutex=0x5ceda0018ce0)
 #3  std::mutex::lock (this=0x5ceda0018ce0)
 #4  std::lock_guard<std::mutex>::lock_guard (__m=..., this=<synthetic pointer>)
 #5  homestore::MetaBlkService::update_sub_sb (this=0x5ceda0018cd0, context_data=0x5ceda05776f0 "", sz=1751, cookie=0x5ceda032ac00) 
     at /home/ubuntu/.conan2/p/b/homes10e04144d6517/b/src/lib/meta/meta_blk_service.cpp:806
 #6  0x00005ced9096cffe in homeobject::HSHomeObject::update_pg_meta_after_gc (this=0x5ced9ffd0690, pg_id=<optimized out>, move_from_chunk=<optimized out>, move_to_chunk=<optimized out>, task_id=<optimized out>) 
     at /home/ubuntu/HomeObject/src/lib/homestore_backend/hs_pg_manager.cpp:1010
 #7  0x00005ced90a6dcef in homeobject::GCManager::pdev_gc_actor::replace_blob_index (this=this@entry=0x5ceda0693100, move_from_chunk=<optimized out>, move_from_chunk@entry=224, move_to_chunk=<optimized out>, move_to_chunk@entry=187, 
 valid_blob_indexes=std::vector of length 154, capacity 256 = {...}, task_id=<optimized out>, task_id@entry=496)
     at /home/ubuntu/HomeObject/src/lib/homestore_backend/gc_manager.cpp:582
 #8  0x00005ced90a6e609 in homeobject::GCManager::pdev_gc_actor::process_after_gc_metablk_persisted (this=this@entry=0x5ceda0693100, gc_task_sb=..., valid_blob_indexes=std::vector of length 154, capacity 256 = {...}, 
 task_id=task_id@entry=496)
     at /home/ubuntu/HomeObject/src/lib/homestore_backend/gc_manager.cpp:1184
 #9  0x00005ced90a788b8 in homeobject::GCManager::pdev_gc_actor::process_gc_task (this=0x5ceda0693100, move_from_chunk=<optimized out>, priority=<optimized out>, task=..., task_id=<optimized out>) 
     at /home/ubuntu/HomeObject/src/lib/homestore_backend/gc_manager.cpp:1148
 #10 0x00005ced90a7981f in operator() (__closure=0x7c6bddfecf40) 
     at /home/ubuntu/HomeObject/src/lib/homestore_backend/gc_manager.cpp:346

Thread 25 (LWP 23) - GC Scanner Thread

holds the mutex Thread 73 is waiting on (0x5ceda0018ce0); __owner = 23 matches this thread's LWP:

  (gdb) p *(pthread_mutex_t*)0x5ceda0018ce0
  $1 = {__data = {__lock = 2, __count = 0, __owner = 23, __nusers = 1, ...}

trace:

Thread 25 (Thread 0x7c6d15825680 (LWP 23)):
 #0  0x00007c6d18c1ec37 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
 #1  0x00007c6d18c2933b in pthread_rwlock_wrlock () from /lib/x86_64-linux-gnu/libc.so.6
 #2  0x00005ced90962217 in std::__glibcxx_rwlock_wrlock (__rwlock=0x5ced9ffd0710)
 #3  std::__shared_mutex_pthread::lock (this=0x5ced9ffd0710)
 #4  std::shared_mutex::lock (this=0x5ced9ffd0710)
 #5  std::scoped_lock<std::shared_mutex>::scoped_lock (__m=..., this=<synthetic pointer>)
 #6  homeobject::HSHomeObject::can_chunks_in_pg_be_gc (this=0x5ced9ffd0690, pg_id=<optimized out>, pg_id@entry=0) 
     at /home/ubuntu/HomeObject/src/lib/homestore_backend/hs_pg_manager.cpp:517
 #7  0x00005ced90a6acf1 in homeobject::GCManager::pdev_gc_actor::add_gc_task (this=0x5ceda0693100, priority=priority@entry=1 '\001', move_from_chunk=227) 
     at /home/ubuntu/HomeObject/src/lib/homestore_backend/gc_manager.cpp:325
 #8  0x00005ced90a7a31c in homeobject::GCManager::scan_chunks_for_gc (this=0x5ceda0582840)
 #9  0x00005ced90dbfd17 in std::function<void (void*)>::operator()(void*) const
 #10 iomgr::timer_epoll::on_timer_armed (this=0x5ced9ffe0740, iodev=<optimized out>)
 #11 0x00005ced90dbff57 in iomgr::timer_epoll::on_timer_fd_notification (iodev=iodev@entry=0x5ced9ff30af0)
 #12 0x00005ced90dfc728 in iomgr::IOReactorEPoll::listen (this=0x7c6cf8000980)
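The ownership conclusion for Thread 25 relies on glibc internals: for a default pthread mutex, glibc records the kernel thread ID of the locking thread in __data.__owner, which is the same number gdb prints as LWP, and resets it to 0 on unlock. The sketch below checks that relationship from C++. It is specific to Linux with glibc and libstdc++ (where std::mutex::native_handle() returns a pthread_mutex_t*), and the helper names mutex_owner_tid, current_tid and demo_owner_field are hypothetical.

```cpp
#include <cassert>
#include <mutex>
#include <pthread.h>
#include <sys/syscall.h>
#include <unistd.h>

// glibc-specific, not portable: read the TID recorded in the mutex's
// __data.__owner field (0 when the mutex is unlocked).
int mutex_owner_tid(std::mutex& m) {
    return m.native_handle()->__data.__owner;
}

// Kernel thread ID of the calling thread (the number gdb shows as "LWP").
long current_tid() {
    return syscall(SYS_gettid);
}

// Usage: the owner field tracks the locking thread's TID.
void demo_owner_field() {
    std::mutex m;
    assert(mutex_owner_tid(m) == 0);  // unlocked: no owner recorded
    m.lock();
    assert(mutex_owner_tid(m) == current_tid());
    m.unlock();
    assert(mutex_owner_tid(m) == 0);  // glibc resets __owner on unlock
}
```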
