New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os/bluestore: rewrite deferred write handling #14491

Merged
merged 4 commits into from Apr 27, 2017

Conversation

Projects
None yet
3 participants
@liewegas
Member

liewegas commented Apr 12, 2017

No description provided.

@liewegas liewegas requested a review from ifed01 Apr 14, 2017

@liewegas

This comment has been minimized.

Show comment
Hide comment
@liewegas

liewegas Apr 14, 2017

Member

retest this please

Member

liewegas commented Apr 14, 2017

retest this please

Show outdated Hide outdated src/os/bluestore/BlueStore.cc
@@ -3165,7 +3267,11 @@ void *BlueStore::MempoolThread::entry()
static void aio_cb(void *priv, void *priv2)
{
BlueStore *store = static_cast<BlueStore*>(priv);
store->_txc_aio_finish(priv2);
if ((unsigned long long)priv2 & 0x1ull) {
store->deferred_aio_finish((void*)((unsigned long long)priv2 & ~1ull));

This comment has been minimized.

@ifed01

ifed01 Apr 17, 2017

Contributor

What's about having more universal and less error-prone approach here -> inherit both TransContext & DeferredBatch from an abstract class, implementing single virtual callback method:
struct AioContext{
virtual void aio_finish(BlueStore*) = 0;
};

@ifed01

ifed01 Apr 17, 2017

Contributor

What's about having more universal and less error-prone approach here -> inherit both TransContext & DeferredBatch from an abstract class, implementing single virtual callback method:
struct AioContext{
virtual void aio_finish(BlueStore*) = 0;
};

deferred_queue_size -= osr->deferred_pending.size();
auto b = osr->deferred_pending;
deferred_queue_size -= b->seq_bytes.size();

This comment has been minimized.

@ifed01

ifed01 Apr 17, 2017

Contributor

It looks like deferred_queue_size tracks amount of txcs, not bytes. See ++deferred_queue_size in _deferred_queue()

@ifed01

ifed01 Apr 17, 2017

Contributor

It looks like deferred_queue_size tracks amount of txcs, not bytes. See ++deferred_queue_size in _deferred_queue()

@liewegas

This comment has been minimized.

Show comment
Hide comment
@liewegas

liewegas Apr 17, 2017

Member
Member

liewegas commented Apr 17, 2017

@liewegas

This comment has been minimized.

Show comment
Hide comment
@liewegas

liewegas Apr 17, 2017

Member

Updated to use AioContext

Member

liewegas commented Apr 17, 2017

Updated to use AioContext

@ifed01

ifed01 approved these changes Apr 18, 2017

@@ -7964,16 +8072,29 @@ bluestore_deferred_op_t *BlueStore::_get_deferred_op(
void BlueStore::_deferred_queue(TransContext *txc)
{
dout(20) << __func__ << " txc " << txc << " on " << txc->osr << dendl;
dout(20) << __func__ << " txc " << txc << " osr " << txc->osr << dendl;
std::lock_guard<std::mutex> l(deferred_lock);

This comment has been minimized.

@ifed01

ifed01 Apr 18, 2017

Contributor

Just some thoughts I want to share - having such a common lock might probably impact the performance negatively - it seems the only global entity really needed such a protection is deferred_queue itself. Other stuff is per-OSR and hence might have another lock with narrower scope. May be refactor later?

@ifed01

ifed01 Apr 18, 2017

Contributor

Just some thoughts I want to share - having such a common lock might probably impact the performance negatively - it seems the only global entity really needed such a protection is deferred_queue itself. Other stuff is per-OSR and hence might have another lock with narrower scope. May be refactor later?

This comment has been minimized.

@liewegas

liewegas Apr 19, 2017

Member

I agree, but the locking gets pretty complicated. I prefer to make that a follow-on change (when we see this as a point of contention)

@liewegas

liewegas Apr 19, 2017

Member

I agree, but the locking gets pretty complicated. I prefer to make that a follow-on change (when we see this as a point of contention)

Show outdated Hide outdated src/os/bluestore/BlueStore.cc
DeferredBatch *b = osr->deferred_running;
{
std::lock_guard<std::mutex> l2(osr->qlock);
std::lock_guard<std::mutex> l(kv_lock);

This comment has been minimized.

@ifed01

ifed01 Apr 18, 2017

Contributor

Do we really need this lock here?

@ifed01

ifed01 Apr 18, 2017

Contributor

Do we really need this lock here?

This comment has been minimized.

@liewegas

liewegas Apr 19, 2017

Member

hmm we could move it down, but we need it for the deferred_done_queue.emplace.

@liewegas

liewegas Apr 19, 2017

Member

hmm we could move it down, but we need it for the deferred_done_queue.emplace.

@liewegas liewegas added needs-qa and removed wip-sage-testing labels Apr 19, 2017

@markhpc markhpc added cleanup performance and removed cleanup labels Apr 20, 2017

liewegas added some commits Apr 12, 2017

os/bluestore: restructure deferred writes
Explicitly aggregate deferred writes into a batch.  When we
submit, take the opportunity to coalesce contiguous writes.
Handle aio completion independently from the original txcs.

Note that this paves the way for a few additional steps:

1- we could make deallocations cancel deferred writes.
2- we could drop the txc deferred states entirely and rely on
the explicit deferred write batch machinery instead... if we
build an alternative way to complete the SharedBlob writes
and ensure the lifecycle issue are dealt with.  (I'm not sure
it would be worth it, but it might be.)

Signed-off-by: Sage Weil <sage@redhat.com>
os/bluestore: use superclass for aio completions
This is less fragile and more explicit than using the lower bit
of the priv pointer.

Signed-off-by: Sage Weil <sage@redhat.com>
os/bluestore: narrow lock scope in _deferred_aio_finish
Signed-off-by: Sage Weil <sage@redhat.com>

@liewegas liewegas merged commit 930d215 into ceph:master Apr 27, 2017

4 checks passed

Signed-off-by all commits in this PR are signed
Details
Unmodifed Submodules submodules for project are unmodified
Details
arm build successfully built on arm
Details
default Build finished.
Details

@liewegas liewegas deleted the liewegas:wip-bluestore-deferred-rewrite branch Apr 27, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment