
journal: fix flush by age and in-flight byte tracking #31392

Merged
merged 3 commits into ceph:master from wip-42598 on Nov 20, 2019

Conversation

@dillaman commented Nov 5, 2019

The flush by age was always causing an immediate flush due to a
backwards comparison. Additionally, the in-flight byte tracker was
never decremented which caused premature closure of the journal
object. Finally, there was a potential race condition between
closing the object and in-flight notification callbacks executing.

Fixes: https://tracker.ceph.com/issues/42598
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
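
For illustration, the first two problems amount to an inverted age comparison plus a byte counter that is only ever incremented. A minimal sketch, assuming hypothetical names (this is not the actual journal::ObjectRecorder code):

// Minimal sketch with hypothetical names; not the actual journal::ObjectRecorder code.
#include <chrono>
#include <cstdint>

struct FlushTracker {
  std::chrono::steady_clock::time_point last_flush_time = std::chrono::steady_clock::now();
  std::chrono::milliseconds flush_age{1000};    // "flush entries older than this"
  uint64_t in_flight_bytes = 0;

  bool should_flush_by_age() const {
    auto age = std::chrono::steady_clock::now() - last_flush_time;
    // buggy form: comparison backwards, e.g. `return age <= flush_age;`,
    // which is true immediately after every append and forces a flush each time
    return age >= flush_age;        // flush only once the pending data is old enough
  }

  void handle_append(uint64_t length) {
    in_flight_bytes += length;      // incremented when an append is issued ...
  }

  void handle_append_complete(uint64_t length) {
    in_flight_bytes -= length;      // ... and must be decremented when it completes;
                                    // otherwise the object always looks "full" and
                                    // gets closed prematurely.
  }
};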

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

Available Jenkins commands:
  • jenkins retest this please
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard backend
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

@zhengchengyao
Contributor

jenkins test make check arm64

Contributor

@trociny left a comment

LGTM

@trociny
Contributor

trociny commented Nov 8, 2019

@dillaman Interestingly, the arm64 make check failed in the librbd unittest, though I can't see the exact test as the output is truncated.

@trociny
Contributor

trociny commented Nov 8, 2019

In my environment it reproduced when running unittest_librbd with RBD_FEATURES=109. It hung here:

[ RUN      ] DiffIterateTest/0.DiffIterateStress
using new format!
 i=0 exists [37355854~3427138,45162559~3059654,91388154~2042384,94214682~5520490,114340599~29642,141867571~2142690,144039207~467521,231295015~2522088,262103328~921101,270697387~5865069] wrote [37355854~3427138,45162559~3059654,91388154~2042384,94214682~5520490,114340599~29642,141867571~2142690,144039207~467521,231295015~2522088,262103328~921101,270697387~5865069]
 i=1 exists [37355854~3427138,41904811~25677,45162559~3059654,48997342~759544,57992185~1571362,91021878~59114,91388154~2042384,94214682~5520490,114340599~29642,141867571~2142690,144039207~467521,213592770~6114229,231295015~2522088,262103328~921101,270697387~5865069,280998390~5515261,329093409~3137244,360410602~8168757,402407194~1670168] wrote [41904811~25677,48997342~759544,57992185~1571362,91021878~59114,213592770~6114229,272213070~1336722,280998390~5515261,329093409~3137244,360410602~8168757,402407194~1670168]

If it does not reproduce in your environment I can try to debug.

@trociny
Contributor

trociny commented Nov 8, 2019

In the logs I see that it hangs when, after "already closed or overflowed", there is no "JournalRecorder: handle_advance_object_set", so it looks like the "notify_handler" is not fired.

@dillaman
Author

dillaman commented Nov 8, 2019

@trociny I was able to hit it, thanks

@dillaman
Author

Should be fixed now.

@trociny
Contributor

trociny commented Nov 13, 2019

@dillaman There is still an issue. I observed this crash in a test:

[ RUN      ] TestMigration.Stress2
...
snap_create: user3
/home/mgolub/ceph/ceph.ci/src/journal/JournalRecorder.cc: In function 'void journal::JournalRecorder::open_object_set()' thread 7fdac77fe700 time 2019-11-12T17:42:46.294782+0000
/home/mgolub/ceph/ceph.ci/src/journal/JournalRecorder.cc: 267: FAILED ceph_assert(object_recorder->is_closed())
 ceph version 15.0.0-7138-g0b8b6fdb79 (0b8b6fdb79b86ad4231d96e209f6680fa72de95d) octopus (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x19d) [0x7fdb5453d091]
 2: (()+0x15e6313) [0x7fdb5453d313]
 3: (journal::JournalRecorder::open_object_set()+0x27d) [0x5622300144c7]
 4: (journal::JournalRecorder::handle_advance_object_set(int)+0x347) [0x56223001414b]
 5: (journal::JournalRecorder::C_AdvanceObjectSet::finish(int)+0x24) [0x562230016c2c]
 6: (Context::complete(int)+0x27) [0x56222fae714f]
 7: (journal::JournalMetadata::C_AioNotify::finish(int)+0x55) [0x56222fffda1f]
 8: (Context::complete(int)+0x27) [0x56222fae714f]
 9: (journal::utils::rados_ctx_callback(void*, void*)+0x40) [0x56223002d958]

Also, it was hit by the rbd_mirror.sh test for the "laggy" image, which is created with --journal-object-size 64K: the bench write got stuck in my case. I was able to reproduce it with:

rbd --cluster cluster2 create --size 128 --image-feature layering,exclusive-lock,journaling --journal-object-size 64K mirror/laggy1
rbd --cluster cluster2 bench mirror/laggy1 --io-type write --io-size 16384 --io-threads 1 --io-total 327680000 --io-pattern rand

It gets stuck. Interestingly, when I add --debug-rbd=30 --debug-journaler=30 it starts crashing for me with JournalRecorder.cc: 267: FAILED ceph_assert(object_recorder->is_closed()), so I suppose it is the same problem that was hit by TestMigration.Stress2.

@dillaman force-pushed the wip-42598 branch 2 times, most recently from 4b9f4b8 to 0d70963 on November 14, 2019 01:33
@dillaman
Author

Found a brand new race condition in the code that I guess my batching fixes exposed. I will run it through teuthology just to double-check that nothing else shakes out.

@dillaman force-pushed the wip-42598 branch 2 times, most recently from 9443f9c to 01cbbea on November 19, 2019 03:00
Jason Dillaman added 3 commits November 19, 2019 08:45
Use move semantics and RAII to control the locking of the
per-object recorder lock.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
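
Roughly, the idea is the following minimal sketch, assuming hypothetical names rather than the real journal::ObjectRecorder interface: the caller acquires a std::unique_lock and moves it into the callee, so ownership of the per-object lock is explicit and release is guaranteed by RAII even on early return.

// Hypothetical sketch of the move-semantics/RAII idea; not the real interface.
#include <cstddef>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

class ObjectBuffer {
public:
  explicit ObjectBuffer(std::mutex& object_lock) : m_lock(object_lock) {}

  std::unique_lock<std::mutex> lock() {
    return std::unique_lock<std::mutex>{m_lock};
  }

  // The caller proves it holds the per-object lock by moving the locker in;
  // the callee decides when the lock is dropped.
  bool append(std::unique_lock<std::mutex>&& locker, std::string entry) {
    m_pending.push_back(std::move(entry));        // still under the lock
    bool full = m_pending.size() >= m_flush_limit;
    locker.unlock();                              // or just let the destructor unlock
    return full;
  }

private:
  std::mutex& m_lock;
  std::vector<std::string> m_pending;
  std::size_t m_flush_limit = 16;
};

// Usage:
//   auto locker = buffer.lock();
//   bool needs_flush = buffer.append(std::move(locker), "entry");
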
The flush by age was always causing an immediate flush due to a
backwards comparison. Additionally, the in-flight byte tracker was
never decremented which caused premature closure of the journal
object.

Finally, there was a potential race condition between closing the
object and in-flight notification callbacks executing. Now we keep
the lock held for both closed and overflow callbacks to prevent
the small chance of a race.

Fixes: https://tracker.ceph.com/issues/42598
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
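
The race-avoidance part can be pictured with another minimal sketch, again assuming hypothetical names rather than the actual Ceph code: the per-object lock stays held while the closed/overflow notification is delivered, instead of being dropped just before the callback.

// Simplified, hypothetical sketch; not the actual Ceph code.
#include <mutex>

struct Handler {
  virtual void closed_or_overflowed() = 0;
  virtual ~Handler() = default;
};

class ObjectBuffer {
public:
  void handle_append_complete(Handler* handler, bool overflowed) {
    std::unique_lock<std::mutex> locker{m_lock};
    --m_in_flight_appends;
    m_overflowed = m_overflowed || overflowed;
    if ((m_closed || m_overflowed) && m_in_flight_appends == 0) {
      // Dropping the lock before notifying leaves a small window in which the
      // object can be closed/torn down while the notification is still racing
      // in another thread; keeping the lock held closes that window.
      handler->closed_or_overflowed();
    }
  }

private:
  std::mutex m_lock;
  int m_in_flight_appends = 0;
  bool m_closed = false;
  bool m_overflowed = false;
};
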
While the old set is being closed, additional IO can be queued up within
the old, closed objects while the in-flight IO settles. It's therefore
possible that the queued IO that is transferred from the old set to the
new set causes an immediate overflow of the new set.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
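
A minimal sketch of that situation, assuming hypothetical names: the backlog transferred from the closed set may by itself already exceed the new set's soft object size, so the overflow/advance handling has to run immediately rather than waiting for further appends.

// Hypothetical sketch: entries queued while the old object set was closing are
// re-queued to the new set and can immediately exceed its soft size limit.
#include <cstdint>
#include <deque>

struct Entry { uint64_t length; };

// Returns true if the transferred backlog alone already overflows the new set.
bool requeue_to_new_object_set(const std::deque<Entry>& backlog,
                               uint64_t soft_max_object_size) {
  uint64_t total = 0;
  for (const auto& entry : backlog) {
    total += entry.length;
  }
  // (actual appending of 'backlog' to the new set omitted)
  return total >= soft_max_object_size;
}
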
@dillaman
Author

I am pretty confident that I've been chasing an unrelated failure [1] that was introduced into the master branch [2] and is affecting the dynamic features test. It seems to fail consistently on Ubuntu 18 but not on other distros.

The failure is related to CRC errors causing the messenger layer to get into a connect/resend/fail CRC/disconnect spin loop.

[1] http://pulpito.ceph.com/jdillaman-2019-11-19_16:32:12-rbd-wip-42598-distro-basic-smithi/
[2] http://pulpito.ceph.com/teuthology-2019-11-18_02:01:01-rbd-master-distro-basic-smithi/

@dillaman
Author

Build failures from PR #31672

@dillaman
Author

jenkins test make check

@dillaman
Author

jenkins test make check arm64

@trociny merged commit 840fc07 into ceph:master on Nov 20, 2019
@dillaman deleted the wip-42598 branch on November 20, 2019 16:31
@trociny
Contributor

trociny commented Nov 20, 2019

@dillaman Just FYI (as it is not related to the PR), when testing it locally and running something like rbd bench --io-type write --io-size 64K --io-threads 16 --io-total 1G test for a journaling-enabled image, I hit this assertion in the messenger once:

recursive lock of OSDSession::lock (27)
 ceph version 15.0.0-7401-g6dcef068ff (6dcef068ffa21c0c9e0084dd4dab49c897e128b8) octopus (dev)
 1: (ceph::mutex_debug_detail::mutex_debugging_base::_will_lock(bool)+0x4a) [0x7fdd48e931fc]
 2: (ceph::shared_mutex_debug::lock()+0x30) [0x7fdd48f34226]
 3: (()+0x13272f) [0x7fdd5201a72f]
 4: (()+0x12d039) [0x7fdd52015039]
 5: (()+0x16bfd9) [0x7fdd52053fd9]
 6: (()+0x15bb38) [0x7fdd52043b38]
 7: (()+0x15d826) [0x7fdd52045826]
 8: (()+0x15b786) [0x7fdd52043786]
 9: (()+0x125fda) [0x7fdd5200dfda]
 10: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0xe9) [0x7fdd48f46881]
 11: (DispatchQueue::entry()+0x622) [0x7fdd48f45234]
 12: (DispatchQueue::DispatchThread::entry()+0x1c) [0x7fdd4908fcda]
 13: (Thread::entry_wrapper()+0x78) [0x7fdd48d3e752]
 14: (Thread::_entry_func(void*)+0x18) [0x7fdd48d3e6d0]
 15: (()+0x8f2a) [0x7fdd47443f2a]
 16: (clone()+0x3f) [0x7fdd46f7b4af]

@smithfarm
Contributor

@trociny @dillaman (Referring to https://tracker.ceph.com/issues/42953#change-175846) Would a successful nautilus backport of this PR fix the following "make check" failure that we occasionally see in nautilus?

177/177 Test #123: unittest_journal ........................***Timeout 3600.02 sec
...
Failed to load class: cas (/home/jenkins-build/build/workspace/ceph-pull-requests/build/lib/libcls_cas.so): /home/jenkins-build/build/workspace/ceph-pull-requests/build/lib/libcls_cas.so: undefined symbol: _Z27cls_cxx_chunk_write_and_setPviiPN4ceph6buffer7v14_2_04listEjS4_i
Failed to load class: cephfs (/home/jenkins-build/build/workspace/ceph-pull-requests/build/lib/libcls_cephfs.so): /home/jenkins-build/build/workspace/ceph-pull-requests/build/lib/libcls_cephfs.so: undefined symbol: _ZTI10PGLSFilter
Failed to load class: hello (/home/jenkins-build/build/workspace/ceph-pull-requests/build/lib/libcls_hello.so): /home/jenkins-build/build/workspace/ceph-pull-requests/build/lib/libcls_hello.so: undefined symbol: _ZTI10PGLSFilter
Failed to load class: log (/home/jenkins-build/build/workspace/ceph-pull-requests/build/lib/libcls_log.so): /home/jenkins-build/build/workspace/ceph-pull-requests/build/lib/libcls_log.so: undefined symbol: _Z23cls_cxx_map_read_headerPvPN4ceph6buffer7v14_2_04listE
Failed to load class: rgw (/home/jenkins-build/build/workspace/ceph-pull-requests/build/lib/libcls_rgw.so): /home/jenkins-build/build/workspace/ceph-pull-requests/build/lib/libcls_rgw.so: undefined symbol: _Z23cls_cxx_map_read_headerPvPN4ceph6buffer7v14_2_04listE
Failed to load class: user (/home/jenkins-build/build/workspace/ceph-pull-requests/build/lib/libcls_user.so): /home/jenkins-build/build/workspace/ceph-pull-requests/build/lib/libcls_user.so: undefined symbol: _Z23cls_cxx_map_read_headerPvPN4ceph6buffer7v14_2_04listE
Failed to load class: version (/home/jenkins-build/build/workspace/ceph-pull-requests/build/lib/libcls_version.so): /home/jenkins-build/build/workspace/ceph-pull-requests/build/lib/libcls_version.so: undefined symbol: _Z19cls_gen_rand_base64Pci

@dillaman
Author

Not sure, but the original issue in that linked tracker was a crash, not a hang, so I'd lean towards no.
