os/bluestore: fix bugs in bluefs and bdev flush #13911

Merged
3 commits merged into ceph:master on Mar 17, 2017

Conversation

2 participants
@liewegas
Member

liewegas commented Mar 9, 2017

liewegas added some commits Mar 9, 2017

os/bluestore/KernelDevice: make flush() thread safe
flush() may be called from multiple racing threads (notably, rocksdb can call fsync via
bluefs at any time), and we need to make sure that if one thread sees the io_since_flush
flag and does an actual flush, other racing threads also wait until that flush is
complete.  This is accomplished with a simple mutex.

Also, set the flag on IO *completion*, since flush is only a promise about
completed IOs, not submitted IOs.

Document.

Fixes: http://tracker.ceph.com/issues/19251
Signed-off-by: Sage Weil <sage@redhat.com>
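The pattern the commit message describes can be sketched as follows. This is a hypothetical illustration, not the actual KernelDevice code: an atomic flag records that IO has *completed* since the last flush, and a mutex serializes racing flush() callers so that threads which lose the race still wait behind the winner's sync.

```cpp
#include <atomic>
#include <mutex>

// Sketch of the flush() pattern (hypothetical names; do_fdatasync is a
// stand-in for the real ::fdatasync on the device fd).
class Device {
  std::atomic<bool> io_since_flush{false};
  std::mutex flush_mutex;

  bool do_fdatasync() { return true; }  // stand-in for the actual device sync

public:
  // Called from the aio *completion* path: flush() only promises
  // durability for completed IOs, not merely submitted ones.
  void on_io_complete() {
    io_since_flush.store(true);
  }

  // Whichever thread clears the flag first performs the real sync; racing
  // callers block on flush_mutex until that sync is stable on disk.
  bool flush() {
    std::lock_guard<std::mutex> l(flush_mutex);
    bool expected = true;
    if (!io_since_flush.compare_exchange_strong(expected, false)) {
      return false;  // nothing new has completed since the last flush
    }
    do_fdatasync();
    return true;
  }
};
```

Without the mutex, a thread that loses the compare-exchange race could return from flush() before the winning thread's sync has actually reached the disk.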
os/bluestore/BlueFS: fix flush_bdev placement
We need to flush any new writes on any fsync().  Notably, this includes
the rocksdb log. However, previously _fsync was only doing a bdev flush if
we also had a dirty bluefs journal and called into _sync_and_flush_log.
If we didn't, we weren't doing a flush() at all, which could lead to
corrupted data.

Fix this by moving the first flush_bdev *out* of _sync_and_flush_log.  (The
second one is there to flush the bluefs journal; the first one was to
ensure prior writes are stable.)  Instead, flush prior writes in all of the
callers prior to calling _sync_and_flush_log.  This includes _fsync (and
fixes the bug by covering the non-journal-flush path) as well as several
other callers.

Fixes: http://tracker.ceph.com/issues/19250
Signed-off-by: Sage Weil <sage@redhat.com>
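The corrected control flow can be sketched as below. This is a simplified, hypothetical model mirroring the names in the commit message, not the actual BlueFS code; the counter exists only to make the ordering observable.

```cpp
// Sketch of the fixed flush_bdev placement: every fsync flushes prior
// writes, whether or not the bluefs journal is dirty.
struct BlueFSSketch {
  bool journal_dirty = false;
  int bdev_flushes = 0;

  void flush_bdev() { ++bdev_flushes; }  // make prior writes stable on disk

  void _sync_and_flush_log() {
    // The flush_bdev that used to sit at the top of this function moved
    // out to the callers; the one remaining here flushes the bluefs
    // journal write itself.
    journal_dirty = false;
    flush_bdev();
  }

  void _fsync() {
    flush_bdev();            // always flush prior writes first (the fix)
    if (journal_dirty) {
      _sync_and_flush_log();
    }
    // Before the fix, the non-journal path skipped flush_bdev entirely,
    // so fsync could return with data still unflushed.
  }
};
```

The key point is that the first flush_bdev now runs unconditionally in _fsync, covering the path where the bluefs journal is clean.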

@liewegas liewegas requested a review from yuyuyu101 Mar 9, 2017

// aio completion notification will not return before that aio is
// stable on disk: whichever thread sees the flag first will block
// followers until the aio is stable.
std::lock_guard<std::mutex> l(flush_mutex);

@yuyuyu101

yuyuyu101 Mar 10, 2017

Member

I suggest we add a perf counter for contention on this flush_mutex. I suspect it will cause a serious performance hit...


@yuyuyu101

yuyuyu101 Mar 10, 2017

Member

We could fix this now.


@liewegas

liewegas commented Mar 10, 2017

Member

@yuyuyu101

yuyuyu101 commented Mar 10, 2017

Member

Oh, I get the actual idea now.

@liewegas liewegas merged commit 6c6b1be into ceph:master Mar 17, 2017

3 checks passed

Signed-off-by — all commits in this PR are signed
Unmodified Submodules — submodules for project are unmodified
default — Build finished.

@liewegas liewegas deleted the liewegas:wip-bluestore-fix-flush branch Mar 17, 2017
