journal: fix race between player shut down and cache rebalance #28748

Merged


@trociny trociny commented Jun 25, 2019

Signed-off-by: Mykola Golub <mgolub@suse.com>

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug


trociny commented Jun 25, 2019

An example of the crash [1]: shut_down is called just after handle_cache_rebalanced but before prefetch, which handle_cache_rebalanced schedules asynchronously, so prefetch runs on a player that is already shutting down:

2019-06-25T10:41:15.249+0000 7f31b9435700  5 JournalPlayer: 0x6fb6ac0 commit position: [positions=[[object_number=1, tag_tid=2, entry_tid=101], [object_number=0, tag_tid=2, entry_tid=100], [object_number=3, tag_tid=2, entry_tid=99], [object_number=2, tag_tid=2, entry_tid=98]]]
2019-06-25T10:41:16.105+0000 7f31eecd1340 10 JournalPlayer: 0x6fb6ac0 handle_cache_rebalanced: new_cache_bytes=134217728, max_fetch_bytes=33554432
2019-06-25T10:41:16.109+0000 7f31b9435700 20 JournalPlayer: 0x6fb6ac0 shut_down
2019-06-25T10:41:16.141+0000 7f31df481700 10 JournalPlayer: 0x6fb6ac0 prefetch: prefetching 4 objects
2019-06-25T10:41:16.141+0000 7f31df481700 10 JournalPlayer: 0x6fb6ac0 fetch: journal_data.2.17149983bc35.0
2019-06-25T10:41:16.141+0000 7f31df481700 10 JournalPlayer: 0x6fb6ac0 fetch: journal_data.2.17149983bc35.1
2019-06-25T10:41:16.141+0000 7f31df481700 10 JournalPlayer: 0x6fb6ac0 fetch: journal_data.2.17149983bc35.2
2019-06-25T10:41:16.141+0000 7f31df481700 10 JournalPlayer: 0x6fb6ac0 fetch: journal_data.2.17149983bc35.3
2019-06-25T10:41:16.149+0000 7f31cb459700 10 JournalPlayer: 0x6fb6ac0 handle_fetched: journal_data.2.17149983bc35.0: r=0
2019-06-25T10:41:16.149+0000 7f31cb459700 10 JournalPlayer: 0x6fb6ac0 handle_fetched: journal_data.2.17149983bc35.1: r=0
2019-06-25T10:41:16.149+0000 7f31cb459700 10 JournalPlayer: 0x6fb6ac0 handle_fetched: journal_data.2.17149983bc35.2: r=0
2019-06-25T10:41:16.149+0000 7f31cb459700 10 JournalPlayer: 0x6fb6ac0 handle_fetched: journal_data.2.17149983bc35.3: r=0

journal/JournalPlayer.cc: 105: FAILED ceph_assert(m_async_op_tracker.empty())

(gdb) bt
#0  0x00007f31e53d9269 in raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/pt-raise.c:35
#1  0x0000000000a41cc0 in reraise_fatal (signum=6) at /build/ceph-15.0.0-2168-g2f35ab7/src/global/signal_handler.cc:81
#2  handle_fatal_signal (signum=6) at /build/ceph-15.0.0-2168-g2f35ab7/src/global/signal_handler.cc:326
#3  <signal handler called>
#4  0x00007f31e4c27428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#5  0x00007f31e4c2902a in __GI_abort () at abort.c:89
#6  0x00007f31e58ad56f in ceph::__ceph_assert_fail (assertion=<optimized out>, file=<optimized out>, line=<optimized out>, func=0xb954c0 <journal::JournalPlayer::~JournalPlayer()::__PRETTY_FUNCTION__> "journal::JournalPlayer::~JournalPlayer()")
    at /build/ceph-15.0.0-2168-g2f35ab7/src/common/assert.cc:73
#7  0x00007f31e58ad6f9 in ceph::__ceph_assert_fail (ctx=...) at /build/ceph-15.0.0-2168-g2f35ab7/src/common/assert.cc:78
#8  0x00000000009bc132 in journal::JournalPlayer::~JournalPlayer (this=0x6fb6ac0, __in_chrg=<optimized out>) at /build/ceph-15.0.0-2168-g2f35ab7/src/journal/JournalPlayer.cc:105
#9  0x00000000009a8b4a in journal::Journaler::<lambda(int)>::operator() (r=0, __closure=0x7136b60) at /build/ceph-15.0.0-2168-g2f35ab7/src/journal/Journaler.cc:381
#10 boost::detail::function::void_function_obj_invoker1<journal::Journaler::stop_replay(Context*)::<lambda(int)>, void, int>::invoke(boost::detail::function::function_buffer &, int) (function_obj_ptr=..., a0=0)
    at /build/ceph-15.0.0-2168-g2f35ab7/obj-x86_64-linux-gnu/boost/include/boost/function/function_template.hpp:159
#11 0x00000000005ddcdc in boost::function1<void, int>::operator() (a0=<optimized out>, this=<optimized out>) at /build/ceph-15.0.0-2168-g2f35ab7/obj-x86_64-linux-gnu/boost/include/boost/function/function_template.hpp:768
#12 FunctionContext::finish (this=<optimized out>, r=<optimized out>) at /build/ceph-15.0.0-2168-g2f35ab7/src/include/Context.h:487
#13 0x00000000005dbda9 in Context::complete (this=0x7136b50, r=<optimized out>) at /build/ceph-15.0.0-2168-g2f35ab7/src/include/Context.h:77
#14 0x00007f31e595566a in ThreadPool::worker (this=0x2ed4700, wt=<optimized out>) at /build/ceph-15.0.0-2168-g2f35ab7/src/common/WorkQueue.cc:118
#15 0x00007f31e5956920 in ThreadPool::WorkThread::entry (this=<optimized out>) at /build/ceph-15.0.0-2168-g2f35ab7/src/common/WorkQueue.h:465
#16 0x00007f31e53cf6ba in start_thread (arg=0x7f31df481700) at pthread_create.c:333
#17 0x00007f31e4cf941d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

(gdb) p this
$1 = (journal::JournalPlayer * const) 0x6fb6ac0

[1] http://qa-proxy.ceph.com/teuthology/trociny-2019-06-25_06:56:00-rbd-wip-mgolub-testing-distro-basic-smithi/4066734/teuthology.log
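
To make the ordering above easier to follow, here is a minimal single-threaded sketch of the same interleaving. It is not the Ceph code: `ToyOpTracker`, `ToyPlayer`, and `simulate` are hypothetical names, the work queue is a plain `std::deque`, and the "tracked" variant (counting the queued prefetch callback as an in-flight op before it is queued) is only one plausible way to close this kind of race, shown here as an assumption.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

using Task = std::function<void()>;

// Hypothetical stand-in for an async-op tracker: counts in-flight ops and
// runs a single waiter once the count drops to zero.
class ToyOpTracker {
public:
  void start_op() { ++m_pending; }
  void finish_op() {
    assert(m_pending > 0);
    if (--m_pending == 0 && m_on_drained) {
      Task cb = std::move(m_on_drained);
      m_on_drained = nullptr;
      cb();
    }
  }
  // Runs cb immediately when idle, otherwise when the last op finishes.
  void wait_for_ops(Task cb) {
    if (m_pending == 0) { cb(); } else { m_on_drained = std::move(cb); }
  }
  bool empty() const { return m_pending == 0; }
private:
  int m_pending = 0;
  Task m_on_drained;
};

// Hypothetical toy player; `track` selects whether the queued prefetch
// callback is itself counted as an in-flight op.
class ToyPlayer {
public:
  ToyPlayer(std::deque<Task>& queue, bool track)
    : m_queue(queue), m_track(track) {}

  // Schedules prefetch asynchronously, as in the log above.
  void handle_cache_rebalanced() {
    if (m_track) m_tracker.start_op();
    m_queue.push_back([this] {
      prefetch();
      if (m_track) m_tracker.finish_op();
    });
  }

  // Starts one fetch op whose completion arrives later via the queue.
  void prefetch() {
    m_tracker.start_op();
    m_queue.push_back([this] { m_tracker.finish_op(); });
  }

  // Once the tracker is drained, queues the owner's "destroy" step.
  void shut_down(Task on_destroy) {
    m_tracker.wait_for_ops([this, on_destroy] { m_queue.push_back(on_destroy); });
  }

  bool tracker_empty() const { return m_tracker.empty(); }

private:
  std::deque<Task>& m_queue;
  bool m_track;
  ToyOpTracker m_tracker;
};

// Replays: handle_cache_rebalanced, then shut_down, then the queued work.
// Returns whether the tracker was empty when the "destroy" step ran, i.e.
// whether a destructor assert like the one above would have passed.
bool simulate(bool track_scheduled_prefetch) {
  std::deque<Task> queue;
  ToyPlayer player(queue, track_scheduled_prefetch);
  bool empty_at_destroy = false;

  player.handle_cache_rebalanced();
  player.shut_down([&] { empty_at_destroy = player.tracker_empty(); });

  while (!queue.empty()) {
    Task task = std::move(queue.front());
    queue.pop_front();
    task();
  }
  return empty_at_destroy;
}
```

In this toy model, `simulate(false)` reaches the "destroy" step while a fetch is still in flight, which mirrors the failed `m_async_op_tracker.empty()` assert, and `simulate(true)` defers the "destroy" step until the queued prefetch and its fetch have finished.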


@dillaman dillaman left a comment


lgtm
