Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rgw_file: alternate fix deadlock on lru eviction #20034

Merged
merged 1 commit into from Jan 22, 2018

Conversation

mattbenjamin
Copy link
Contributor

@mattbenjamin mattbenjamin commented Jan 19, 2018

Fixes: https://tracker.ceph.com/issues/22736

This change is an alternate fix for two problems found and fixed
by Yao Zongyou yaozongyou@vip.qq.com.

The deadlock can be avoided just by not taking it in the recycle
case, which invariantly holds the lock.

The invalidation of the insert iterator by the recyle derived dtor
we'd like to handle as a condition in order to preserve the cached
insertion point optimization we get in the common case. (The
original behavior was, indeed, incorrect.)

Signed-off-by: Matt Benjamin mbenjamin@redhat.com

Copy link
Contributor

@cbodley cbodley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

untested, but the approach is sane and it preserves the latch in the common case

@mattbenjamin
Copy link
Contributor Author

@yaozongyou @cbodley I'll try to get proper testing finished shortly, thanks for your help!

@yaozongyou
Copy link
Contributor

I have been testing this fix in my environment, still hangs, here is the backtrace:

(gdb) Thread 135
[Switching to thread 135 (Thread 0x7f3cfcf7a700 (LWP 1742))]
#6  0x00007f3d7f0a1110 in rgw::RGWFileHandle::reclaim (this=0x7f3d5e0db200)
    at /data/home/richardyao/workspace/github/ceph/src/rgw/rgw_file.cc:1014
1014          fs->fh_cache.remove(fh.fh_hk.object, this, FHCache::FLAG_LOCK);
(gdb) bt
#0  0x00007f3d8fae51bd in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f3d8fae0d02 in _L_lock_791 () from /lib64/libpthread.so.0
#2  0x00007f3d8fae0c08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f3d7f0b2df3 in __gthread_mutex_lock (__mutex=0x7f3d82045748)
    at /data/home/richardyao/bin/gcc72/include/c++/7.2.0/x86_64-pc-linux-gnu/bits/gthr-default.h:748
#4  std::mutex::lock (this=this@entry=0x7f3d82045748)
    at /data/home/richardyao/bin/gcc72/include/c++/7.2.0/bits/std_mutex.h:103
#5  0x00007f3d7f0ccd5d in cohort::lru::TreeX<rgw::RGWFileHandle, boost::intrusive::rbtree<rgw::RGWFileHandle, boost::intrusive::compare<rgw::RGWFileHandle::FhLT>, boost::intrusive::member_hook<rgw::RGWFileHandle, boost::intrusive::set_member_hook<boost::intrusive::link_mode<(boost::intrusive::link_mode_type)1>, void, void, void>, &rgw::RGWFileHandle::fh_hook>, void, void, void, void>, rgw::RGWFileHandle::FhLT, rgw::RGWFileHandle::FhEQ, rgw::fh_key, std::mutex>::remove (this=0x7f3d820cee10, 
    hk=5070069916126884224, v=v@entry=0x7f3d5e0db200, flags=flags@entry=1)
    at /data/home/richardyao/workspace/github/ceph/src/common/cohort_lru.h:447
#6  0x00007f3d7f0a1110 in rgw::RGWFileHandle::reclaim (this=0x7f3d5e0db200)
    at /data/home/richardyao/workspace/github/ceph/src/rgw/rgw_file.cc:1014
#7  0x00007f3d7f0d1db2 in evict_block (this=<optimized out>)
    at /data/home/richardyao/workspace/github/ceph/src/common/cohort_lru.h:147
#8  insert (flags=<synthetic pointer>, edge=cohort::lru::MRU, fac=<synthetic pointer>, this=<optimized out>)
    at /data/home/richardyao/workspace/github/ceph/src/common/cohort_lru.h:239
#9  rgw::RGWLibFS::lookup_fh (this=this@entry=0x7f3d820cec00, parent=parent@entry=0x7f3d820cec20, 
    name=name@entry=0x7f3d73036058 "61", flags=flags@entry=68)
    at /data/home/richardyao/workspace/github/ceph/src/rgw/rgw_file.h:1065
#10 0x00007f3d7f0a5ec6 in rgw::RGWLibFS::create (this=0x7f3d820cec00, parent=0x7f3d820cec20, name=0x7f3d73036058 "61", 
    st=0x7f3cfcf75a80, mask=7, flags=0) at /data/home/richardyao/workspace/github/ceph/src/rgw/rgw_file.cc:645
#11 0x00007f3d7f0a62a8 in rgw_create (rgw_fs=<optimized out>, parent_fh=<optimized out>, name=<optimized out>, 
    st=<optimized out>, mask=<optimized out>, fh=0x7f3cfcf75998, posix_flags=193, flags=0)
---Type <return> to continue, or q <return> to quit---
    at /data/home/richardyao/workspace/github/ceph/src/rgw/rgw_file.cc:1667
#12 0x00007f3d7fdcb4c3 in rgw_fsal_open2 (obj_hdl=0x7f3d7ce0b200, state=0x7f3d73048d40, openflags=2, 
    createmode=FSAL_EXCLUSIVE, name=0x7f3d73036058 "61", attrib_set=0x7f3cfcf766e0, verifier=0x7f3cfcf767c8 "\241^\353-:\t", 
    new_obj=0x7f3cfcf75d00, attrs_out=0x7f3cfcf75c10, caller_perm_check=0x7f3cfcf75daf)
    at /data/home/richardyao/workspace/github/nfs-ganesha/src/FSAL/FSAL_RGW/handle.c:1060
#13 0x0000000000535a48 in mdcache_open2 (obj_hdl=0x7f3d7cd7e738, state=0x7f3d73048d40, openflags=2, 
    createmode=FSAL_EXCLUSIVE, name=0x7f3d73036058 "61", attrs_in=0x7f3cfcf766e0, verifier=0x7f3cfcf767c8 "\241^\353-:\t", 
    new_obj=0x7f3cfcf767e0, attrs_out=0x0, caller_perm_check=0x7f3cfcf75daf)
    at /data/home/richardyao/workspace/github/nfs-ganesha/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_file.c:672
#14 0x000000000042fda3 in open2_by_name (in_obj=0x7f3d7cd7e738, state=0x7f3d73048d40, openflags=2, 
    createmode=FSAL_EXCLUSIVE, name=0x7f3d73036058 "61", attr=0x7f3cfcf766e0, verifier=0x7f3cfcf767c8 "\241^\353-:\t", 
    obj=0x7f3cfcf767e0, attrs_out=0x0) at /data/home/richardyao/workspace/github/nfs-ganesha/src/FSAL/fsal_helper.c:406
#15 0x0000000000432a8c in fsal_open2 (in_obj=0x7f3d7cd7e738, state=0x7f3d73048d40, openflags=2, createmode=FSAL_EXCLUSIVE, 
    name=0x7f3d73036058 "61", attr=0x7f3cfcf766e0, verifier=0x7f3cfcf767c8 "\241^\353-:\t", obj=0x7f3cfcf767e0, 
    attrs_out=0x0) at /data/home/richardyao/workspace/github/nfs-ganesha/src/FSAL/fsal_helper.c:1788
#16 0x000000000047054d in open4_ex (arg=0x7f3cb918e918, data=0x7f3cfcf76950, res_OPEN4=0x7f3d73090128, 
    clientid=0x7f3cb8019100, owner=0x7f3cb201d200, file_state=0x7f3cfcf76898, new_state=0x7f3cfcf76897)
    at /data/home/richardyao/workspace/github/nfs-ganesha/src/Protocols/NFS/nfs4_op_open.c:1440
#17 0x000000000047191a in nfs4_op_open (op=0x7f3cb918e910, data=0x7f3cfcf76950, resp=0x7f3d73090120)
    at /data/home/richardyao/workspace/github/nfs-ganesha/src/Protocols/NFS/nfs4_op_open.c:1845
#18 0x000000000045c791 in nfs4_Compound (arg=0x7f3cb928b630, req=0x7f3cb928ae28, res=0x7f3cafc0e280)
    at /data/home/richardyao/workspace/github/nfs-ganesha/src/Protocols/NFS/nfs4_Compound.c:743
#19 0x000000000044b19d in nfs_rpc_execute (reqdata=0x7f3cb928ae00)
    at /data/home/richardyao/workspace/github/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1290
#20 0x000000000044b9b7 in worker_run (ctx=0x7f3d3ffca680)
    at /data/home/richardyao/workspace/github/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1562
---Type <return> to continue, or q <return> to quit---
#21 0x0000000000504bc5 in fridgethr_start_routine (arg=0x7f3d3ffca680)
    at /data/home/richardyao/workspace/github/nfs-ganesha/src/support/fridgethr.c:550
#22 0x00007f3d8fadedc5 in start_thread () from /lib64/libpthread.so.0
#23 0x00007f3d8ec9474d in clone () from /lib64/libc.so.6
(gdb) 
(gdb) f 6
#6  0x00007f3d7f0a1110 in rgw::RGWFileHandle::reclaim (this=0x7f3d5e0db200)
    at /data/home/richardyao/workspace/github/ceph/src/rgw/rgw_file.cc:1014
1014          fs->fh_cache.remove(fh.fh_hk.object, this, FHCache::FLAG_LOCK);
(gdb) print fs->fh_cache.part[2].lock
$7 = {<std::__mutex_base> = {_M_mutex = {__data = {__lock = 2, __count = 0, __owner = 1742, __nusers = 1, __kind = 0, 
        __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, 
      __size = "\002\000\000\000\000\000\000\000\316\006\000\000\001", '\000' <repeats 26 times>, 
      __align = 2}}, <No data fields>}
(gdb) 

fs->fh_cache.remove(fh.fh_hk.object, this, FHCache::FLAG_LOCK);
/* in this case, we are being called from a context which holds
* the partition lock */
fs->fh_cache.remove(fh.fh_hk.object, this, FHCache::FLAG_NONE);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may be we should keep this as it is, and instead change RGWFileHandle::reclaim:
https://github.com/ceph/ceph/pull/20034/files#diff-b03934e90d5d464122f42ce5e12cca65R1014

@mattbenjamin
Copy link
Contributor Author

mattbenjamin commented Jan 20, 2018 via email

@yaozongyou
Copy link
Contributor

We should change the flag in RGWFileHandle::reclaim instead of RGWFileHandle::~RGWFileHandle()

The stack trace seems to have flag_lock in the call to fh_cache.remove. It
should be flag_none, yes?

I'll reread tomorrow. Thanks for testing.

Matt

@mattbenjamin
Copy link
Contributor Author

mattbenjamin commented Jan 20, 2018 via email

@mattbenjamin
Copy link
Contributor Author

mattbenjamin commented Jan 20, 2018

@yaozongyou update; It looks to me as if the cond-unlink belongs in the RGWFileHandle dtor, but NOT in the reclaim method; that method predates the factory method which I now have calling the RGWFileHandle dtor--which is more symmetrical; reclaim() should just indicate if we think an object must not be reclaimed; i.e., it seems clear that reclaim() and ~RGWFileHandle both attempted the conditional unlink--presumably in the reclaim path, we were always unlinking in reclaim first, but suggesting we don't anymore?

@mattbenjamin
Copy link
Contributor Author

@yaozongyou actually, I think you're right that having unlink in reclaim() is preferable. updated

This change is an alternate fix for two problems found and fixed
by Yao Zongyou <yaozongyou@vip.qq.com>.

The deadlock can be avoided just by not taking it in the recycle
case, which invariantly holds the lock.

The invalidation of the insert iterator by the recyle-path unlink
we'd like to handle as a condition in order to preserve the cached
insertion point optimization we get in the common case.  (The
original behavior was, indeed, incorrect.)

Based on feedback from Yao, removed the RGWFileHandle dtor version
of the unlink check, which I think happened twice.

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
@yaozongyou
Copy link
Contributor

tested, it works greatly well in my environment.

@mattbenjamin
Copy link
Contributor Author

(accidentally pushed additional commits, reverted to top commit 3cf0880)

@mattbenjamin mattbenjamin merged commit debb6b9 into ceph:master Jan 22, 2018
@smithfarm
Copy link
Contributor

@mattbenjamin I'm guessing this is the fix for https://tracker.ceph.com/issues/22736 ?

@mattbenjamin
Copy link
Contributor Author

@smithfarm right, sorry to leave that off

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants