
os/bluestore: fix bugs in bluefs and bdev flush #13911

Merged
merged 3 commits into from Mar 17, 2017

Conversation

liewegas
Member

@liewegas liewegas commented Mar 9, 2017

flush() may be called from multiple racing threads (notably, rocksdb can call fsync via
bluefs at any time), and we need to make sure that if one thread sees the io_since_flush
flag and does an actual flush, the other racing threads also wait until that flush is
complete.  This is accomplished with a simple mutex.

Also, set the flag on IO *completion*, since flush is only a promise about
completed IOs, not submitted IOs.

Document.

Fixes: http://tracker.ceph.com/issues/19251
Signed-off-by: Sage Weil <sage@redhat.com>
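The pattern described above can be sketched as follows. This is a hypothetical, simplified model (not the actual Ceph code): `BlockDevice`, `aio_complete`, and the `flushes_done` counter are stand-ins, but the two key ideas from the commit message are shown, setting the flag on IO *completion* rather than submission, and using a mutex so that racing callers of flush() block until the winning thread's flush is stable.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Simplified stand-in for the bluestore block device.
struct BlockDevice {
  std::atomic<bool> io_since_flush{false};
  std::mutex flush_mutex;
  int flushes_done = 0;  // guarded by flush_mutex; stand-in for real barriers

  // Called on aio *completion*: flush() is only a promise about completed
  // IOs, not submitted IOs, so the flag is set here, not at submit time.
  void aio_complete() {
    io_since_flush.store(true);
  }

  void flush() {
    if (!io_since_flush.load())
      return;  // fast path: nothing has completed since the last flush
    // Whichever thread sees the flag first does the real flush; racing
    // threads block on the mutex until that flush is complete, so none of
    // them can return before the data is stable.
    std::lock_guard<std::mutex> l(flush_mutex);
    if (io_since_flush.exchange(false)) {
      ++flushes_done;  // stand-in for the actual device flush/barrier
    }
  }
};
```

With this structure, N concurrent flush() calls after a single completed IO perform exactly one device flush, and every caller returns only after that flush has happened.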
We need to flush any new writes on any fsync().  Notably, this includes
the rocksdb log.  However, previously _fsync was only doing a bdev flush if
we also had a dirty bluefs journal and called into _sync_and_flush_log.
If we didn't, we weren't doing a flush() at all, which could lead to
corrupted data.

Fix this by moving the first flush_bdev *out* of _sync_and_flush_log.  (The
second one is there to flush the bluefs journal; the first one was to
ensure prior writes are stable.)  Instead, flush prior writes in all of the
callers prior to calling _sync_and_flush_log.  This includes _fsync (and
fixes the bug by covering the non-journal-flush path) as well as several
other callers.

Fixes: http://tracker.ceph.com/issues/19250
Signed-off-by: Sage Weil <sage@redhat.com>
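The restructuring described above can be sketched as follows. This is a hypothetical, minimal model (the names `BlueFS`, `journal_dirty`, and the `bdev_flushes` counter are illustrative, not the real Ceph implementation): after the fix, _sync_and_flush_log keeps only the flush that makes the bluefs journal stable, while each caller, including _fsync, flushes prior writes itself, which is what covers the non-journal-flush path.

```cpp
#include <cassert>

// Simplified stand-in for bluefs.
struct BlueFS {
  int bdev_flushes = 0;       // counts device flushes, for illustration
  bool journal_dirty = false;

  void flush_bdev() {
    ++bdev_flushes;  // stand-in for flushing the block device
  }

  // After the fix: no leading flush_bdev here.  Callers are responsible
  // for making prior writes stable before calling in; the flush below is
  // the second one from the commit message, for the journal write itself.
  void _sync_and_flush_log() {
    // ... append and write the bluefs journal entry ...
    flush_bdev();
    journal_dirty = false;
  }

  void _fsync() {
    flush_bdev();  // always flush prior writes, even with a clean journal
    if (journal_dirty)
      _sync_and_flush_log();
  }
};
```

The point of the fix is visible in the clean-journal case: _fsync still performs a device flush even when it never enters _sync_and_flush_log, so newly written data (e.g. the rocksdb log) cannot be lost on that path.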
// aio completion notification will not return before that aio is
// stable on disk: whichever thread sees the flag first will block
// followers until the aio is stable.
std::lock_guard<std::mutex> l(flush_mutex);
Member

I suggest we add a perf counter for contention on this flush_mutex lock. I suspect it could cause a serious performance hit...

Member

we could fix this now.

@liewegas
Member Author

liewegas commented Mar 10, 2017 via email

@yuyuyu101
Member

Oh, now I get the actual idea.

@liewegas liewegas merged commit 6c6b1be into ceph:master Mar 17, 2017
@liewegas liewegas deleted the wip-bluestore-fix-flush branch March 17, 2017 18:36