
journal: fix flush by age and in-flight byte tracking #31392

Merged
merged 3 commits into ceph:master from wip-42598 on Nov 20, 2019

Conversation

@dillaman commented Nov 5, 2019

The flush by age was always causing an immediate flush due to a
backwards comparison. Additionally, the in-flight byte tracker was
never decremented which caused premature closure of the journal
object. Finally, there was a potential race condition between
closing the object and in-flight notification callbacks executing.

Fixes: https://tracker.ceph.com/issues/42598
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
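
For illustration, the first two problems amount to an inverted age comparison plus a byte counter that is only ever incremented. A minimal sketch, assuming hypothetical names (this is not the actual journal::ObjectRecorder code):

// Minimal sketch with hypothetical names; not the actual journal::ObjectRecorder code.
#include <chrono>
#include <cstdint>

struct FlushTracker {
  std::chrono::steady_clock::time_point last_flush_time = std::chrono::steady_clock::now();
  std::chrono::milliseconds flush_age{1000};    // "flush entries older than this"
  uint64_t in_flight_bytes = 0;

  bool should_flush_by_age() const {
    auto age = std::chrono::steady_clock::now() - last_flush_time;
    // buggy form: comparison backwards, e.g. `return age <= flush_age;`,
    // which is true immediately after every append and forces a flush each time
    return age >= flush_age;        // flush only once the pending data is old enough
  }

  void handle_append(uint64_t length) {
    in_flight_bytes += length;      // incremented when an append is issued ...
  }

  void handle_append_complete(uint64_t length) {
    in_flight_bytes -= length;      // ... and must be decremented when it completes;
                                    // otherwise the object always looks "full" and
                                    // gets closed prematurely.
  }
};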

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

Available Jenkins commands:
  • jenkins retest this please
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard backend
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

@zhengchengyao
Contributor

jenkins test make check arm64

Contributor

@trociny left a comment

LGTM

@trociny
Contributor

trociny commented Nov 8, 2019

@dillaman Interestingly, the arm64 make check failed in the librbd unittest, though I can't see the exact test as the output is truncated.

@trociny
Contributor

trociny commented Nov 8, 2019

In my environment it reproduced when running unittest_librbd with RBD_FEATURES=109. It hung here:

[ RUN      ] DiffIterateTest/0.DiffIterateStress
using new format!
 i=0 exists [37355854~3427138,45162559~3059654,91388154~2042384,94214682~5520490,114340599~29642,141867571~2142690,144039207~467521,231295015~2522088,262103328~921101,270697387~5865069] wrote [37355854~3427138,45162559~3059654,91388154~2042384,94214682~5520490,114340599~29642,141867571~2142690,144039207~467521,231295015~2522088,262103328~921101,270697387~5865069]
 i=1 exists [37355854~3427138,41904811~25677,45162559~3059654,48997342~759544,57992185~1571362,91021878~59114,91388154~2042384,94214682~5520490,114340599~29642,141867571~2142690,144039207~467521,213592770~6114229,231295015~2522088,262103328~921101,270697387~5865069,280998390~5515261,329093409~3137244,360410602~8168757,402407194~1670168] wrote [41904811~25677,48997342~759544,57992185~1571362,91021878~59114,213592770~6114229,272213070~1336722,280998390~5515261,329093409~3137244,360410602~8168757,402407194~1670168]

If it does not reproduce in your environment I can try to debug.

@trociny
Contributor

trociny commented Nov 8, 2019

In the logs I see that it hangs when, after "already closed or overflowed", there is no "JournalRecorder: handle_advance_object_set", so it looks like the "notify_handler" is not fired.

@dillaman
Author

dillaman commented Nov 8, 2019

@trociny I was able to hit it, thanks

@dillaman
Author

Should be fixed now.

@trociny
Contributor

trociny commented Nov 13, 2019

@dillaman There is still an issue. I observed this crash in a test:

[ RUN      ] TestMigration.Stress2
...
snap_create: user3
/home/mgolub/ceph/ceph.ci/src/journal/JournalRecorder.cc: In function 'void journal::JournalRecorder::open_object_set()' thread 7fdac77fe700 time 2019-11-12T17:42:46.294782+0000
/home/mgolub/ceph/ceph.ci/src/journal/JournalRecorder.cc: 267: FAILED ceph_assert(object_recorder->is_closed())
 ceph version 15.0.0-7138-g0b8b6fdb79 (0b8b6fdb79b86ad4231d96e209f6680fa72de95d) octopus (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x19d) [0x7fdb5453d091]
 2: (()+0x15e6313) [0x7fdb5453d313]
 3: (journal::JournalRecorder::open_object_set()+0x27d) [0x5622300144c7]
 4: (journal::JournalRecorder::handle_advance_object_set(int)+0x347) [0x56223001414b]
 5: (journal::JournalRecorder::C_AdvanceObjectSet::finish(int)+0x24) [0x562230016c2c]
 6: (Context::complete(int)+0x27) [0x56222fae714f]
 7: (journal::JournalMetadata::C_AioNotify::finish(int)+0x55) [0x56222fffda1f]
 8: (Context::complete(int)+0x27) [0x56222fae714f]
 9: (journal::utils::rados_ctx_callback(void*, void*)+0x40) [0x56223002d958]

Also, it was hit by the rbd_mirror.sh test for the "laggy" image, which is created with --journal-object-size 64K: the bench write got stuck in my case. I was able to reproduce it with:

rbd --cluster cluster2 create --size 128 --image-feature layering,exclusive-lock,journaling --journal-object-size 64K mirror/laggy1
rbd --cluster cluster2 bench mirror/laggy1 --io-type write --io-size 16384 --io-threads 1 --io-total 327680000 --io-pattern rand

It gets stuck. Interestingly, when I add --debug-rbd=30 --debug-journaler=30 it starts crashing for me with JournalRecorder.cc: 267: FAILED ceph_assert(object_recorder->is_closed()), so I suppose it is the same problem that was hit by TestMigration.Stress2.

@dillaman force-pushed the wip-42598 branch 2 times, most recently from 4b9f4b8 to 0d70963 on November 14, 2019 01:33
@dillaman
Author

Found a brand new race condition in the code that I guess my batching fixes exposed. I will run it through teuthology just to double-check that nothing else shakes out.

@dillaman force-pushed the wip-42598 branch 2 times, most recently from 9443f9c to 01cbbea on November 19, 2019 03:00
Jason Dillaman added 3 commits November 19, 2019 08:45
Use move semantics and RAII to control the locking of the
per-object recorder lock.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
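
Roughly, the idea is the following minimal sketch, assuming hypothetical names rather than the real journal::ObjectRecorder interface: the caller acquires a std::unique_lock and moves it into the callee, so ownership of the per-object lock is explicit and release is guaranteed by RAII even on early return.

// Hypothetical sketch of the move-semantics/RAII idea; not the real interface.
#include <cstddef>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

class ObjectBuffer {
public:
  explicit ObjectBuffer(std::mutex& object_lock) : m_lock(object_lock) {}

  std::unique_lock<std::mutex> lock() {
    return std::unique_lock<std::mutex>{m_lock};
  }

  // The caller proves it holds the per-object lock by moving the locker in;
  // the callee decides when the lock is dropped.
  bool append(std::unique_lock<std::mutex>&& locker, std::string entry) {
    m_pending.push_back(std::move(entry));        // still under the lock
    bool full = m_pending.size() >= m_flush_limit;
    locker.unlock();                              // or just let the destructor unlock
    return full;
  }

private:
  std::mutex& m_lock;
  std::vector<std::string> m_pending;
  std::size_t m_flush_limit = 16;
};

// Usage:
//   auto locker = buffer.lock();
//   bool needs_flush = buffer.append(std::move(locker), "entry");
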
The flush by age was always causing an immediate flush due to a
backwards comparison. Additionally, the in-flight byte tracker was
never decremented which caused premature closure of the journal
object.

Finally, there was a potential race condition between closing the
object and in-flight notification callbacks executing. Now we keep
the lock held for both closed and overflow callbacks to prevent
the small chance of a race.

Fixes: https://tracker.ceph.com/issues/42598
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
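
The race-avoidance part can be pictured with another minimal sketch, again assuming hypothetical names rather than the actual Ceph code: the per-object lock stays held while the closed/overflow notification is delivered, instead of being dropped just before the callback.

// Simplified, hypothetical sketch; not the actual Ceph code.
#include <mutex>

struct Handler {
  virtual void closed_or_overflowed() = 0;
  virtual ~Handler() = default;
};

class ObjectBuffer {
public:
  void handle_append_complete(Handler* handler, bool overflowed) {
    std::unique_lock<std::mutex> locker{m_lock};
    --m_in_flight_appends;
    m_overflowed = m_overflowed || overflowed;
    if ((m_closed || m_overflowed) && m_in_flight_appends == 0) {
      // Dropping the lock before notifying leaves a small window in which the
      // object can be closed/torn down while the notification is still racing
      // in another thread; keeping the lock held closes that window.
      handler->closed_or_overflowed();
    }
  }

private:
  std::mutex m_lock;
  int m_in_flight_appends = 0;
  bool m_closed = false;
  bool m_overflowed = false;
};
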
While the old set is being closed, additional IO can be queued up within
the old, closed objects while the in-flight IO settles. It's therefore
possible that the queued IO that is transferred from the old set to the
new set causes an immediate overflow of the new set.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
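
A minimal sketch of that situation, assuming hypothetical names: the backlog transferred from the closed set may by itself already exceed the new set's soft object size, so the overflow/advance handling has to run immediately rather than waiting for further appends.

// Hypothetical sketch: entries queued while the old object set was closing are
// re-queued to the new set and can immediately exceed its soft size limit.
#include <cstdint>
#include <deque>

struct Entry { uint64_t length; };

// Returns true if the transferred backlog alone already overflows the new set.
bool requeue_to_new_object_set(const std::deque<Entry>& backlog,
                               uint64_t soft_max_object_size) {
  uint64_t total = 0;
  for (const auto& entry : backlog) {
    total += entry.length;
  }
  // (actual appending of 'backlog' to the new set omitted)
  return total >= soft_max_object_size;
}
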
@dillaman
Author

I am pretty confident that I've been chasing an unrelated failure [1] that was introduced into the master branch [2] and is affecting the dynamic features test. It seems to fail consistently on Ubuntu 18 but not on other distros.

The failure is related to CRC errors causing the messenger layer to get into a connect/resend/fail CRC/disconnect spin loop.

[1] http://pulpito.ceph.com/jdillaman-2019-11-19_16:32:12-rbd-wip-42598-distro-basic-smithi/
[2] http://pulpito.ceph.com/teuthology-2019-11-18_02:01:01-rbd-master-distro-basic-smithi/

@dillaman
Author

Build failures from PR #31672

@dillaman
Author

jenkins test make check

@dillaman
Author

jenkins test make check arm64

@trociny merged commit 840fc07 into ceph:master on Nov 20, 2019
@dillaman deleted the wip-42598 branch on November 20, 2019 16:31
@trociny
Contributor

trociny commented Nov 20, 2019

@dillaman Just FYI (as it is not related to the PR), when testing it locally and running something like rbd bench --io-type write --io-size 64K --io-threads 16 --io-total 1G test for a journaling-enabled image, I hit this assertion in the messenger once:

recursive lock of OSDSession::lock (27)
 ceph version 15.0.0-7401-g6dcef068ff (6dcef068ffa21c0c9e0084dd4dab49c897e128b8) octopus (dev)
 1: (ceph::mutex_debug_detail::mutex_debugging_base::_will_lock(bool)+0x4a) [0x7fdd48e931fc]
 2: (ceph::shared_mutex_debug::lock()+0x30) [0x7fdd48f34226]
 3: (()+0x13272f) [0x7fdd5201a72f]
 4: (()+0x12d039) [0x7fdd52015039]
 5: (()+0x16bfd9) [0x7fdd52053fd9]
 6: (()+0x15bb38) [0x7fdd52043b38]
 7: (()+0x15d826) [0x7fdd52045826]
 8: (()+0x15b786) [0x7fdd52043786]
 9: (()+0x125fda) [0x7fdd5200dfda]
 10: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0xe9) [0x7fdd48f46881]
 11: (DispatchQueue::entry()+0x622) [0x7fdd48f45234]
 12: (DispatchQueue::DispatchThread::entry()+0x1c) [0x7fdd4908fcda]
 13: (Thread::entry_wrapper()+0x78) [0x7fdd48d3e752]
 14: (Thread::_entry_func(void*)+0x18) [0x7fdd48d3e6d0]
 15: (()+0x8f2a) [0x7fdd47443f2a]
 16: (clone()+0x3f) [0x7fdd46f7b4af]

@smithfarm
Contributor

@trociny @dillaman (Referring to https://tracker.ceph.com/issues/42953#change-175846) Would a successful nautilus backport of this PR fix the following "make check" failure that we occasionally see in nautilus?

177/177 Test #123: unittest_journal ........................***Timeout 3600.02 sec
...
Failed to load class: cas (/home/jenkins-build/build/workspace/ceph-pull-requests/build/lib/libcls_cas.so): /home/jenkins-build/build/workspace/ceph-pull-requests/build/lib/libcls_cas.so: undefined symbol: _Z27cls_cxx_chunk_write_and_setPviiPN4ceph6buffer7v14_2_04listEjS4_i
Failed to load class: cephfs (/home/jenkins-build/build/workspace/ceph-pull-requests/build/lib/libcls_cephfs.so): /home/jenkins-build/build/workspace/ceph-pull-requests/build/lib/libcls_cephfs.so: undefined symbol: _ZTI10PGLSFilter
Failed to load class: hello (/home/jenkins-build/build/workspace/ceph-pull-requests/build/lib/libcls_hello.so): /home/jenkins-build/build/workspace/ceph-pull-requests/build/lib/libcls_hello.so: undefined symbol: _ZTI10PGLSFilter
Failed to load class: log (/home/jenkins-build/build/workspace/ceph-pull-requests/build/lib/libcls_log.so): /home/jenkins-build/build/workspace/ceph-pull-requests/build/lib/libcls_log.so: undefined symbol: _Z23cls_cxx_map_read_headerPvPN4ceph6buffer7v14_2_04listE
Failed to load class: rgw (/home/jenkins-build/build/workspace/ceph-pull-requests/build/lib/libcls_rgw.so): /home/jenkins-build/build/workspace/ceph-pull-requests/build/lib/libcls_rgw.so: undefined symbol: _Z23cls_cxx_map_read_headerPvPN4ceph6buffer7v14_2_04listE
Failed to load class: user (/home/jenkins-build/build/workspace/ceph-pull-requests/build/lib/libcls_user.so): /home/jenkins-build/build/workspace/ceph-pull-requests/build/lib/libcls_user.so: undefined symbol: _Z23cls_cxx_map_read_headerPvPN4ceph6buffer7v14_2_04listE
Failed to load class: version (/home/jenkins-build/build/workspace/ceph-pull-requests/build/lib/libcls_version.so): /home/jenkins-build/build/workspace/ceph-pull-requests/build/lib/libcls_version.so: undefined symbol: _Z19cls_gen_rand_base64Pci

@dillaman
Author

Not sure, but the original issue in that linked tracker was a crash, not a hang, so I'd lean towards no.
