os/bluestore: include modified objects in flush list even if onode unchanged #12541

Merged
merged 1 commit into from Dec 20, 2016

Projects

None yet

3 participants

@liewegas
Member

We use the onode flush list so that we can ->flush() as
a barrier before doing any read/modify/write. For
example, omap_rmkeyrange will flush before reading to
see what keys to erase in order to ensure that any
previous inserts are applied to the db and we see them
and remove them.

However, some omap operations don't update the onode
itself, which means write_onode() doesn't get called and
we aren't put on this list.

Add a note_modified_object() helper that can be called
instead of write_onode() for those cases. That way we
get on the list and flush() works as expected.

Signed-off-by: Sage Weil sage@redhat.com

@liewegas liewegas added this to the kraken milestone Dec 16, 2016
src/os/bluestore/BlueStore.cc
+ if (o->flush_txns.empty())
+ o->flush_cond.notify_all();
+ }
+ }
// clear out refs
txc->onodes.clear();
@ifed01
ifed01 Dec 19, 2016 Contributor

clear modified_objects here similar as we do for onodes?

src/os/bluestore/BlueStore.cc
@@ -6417,6 +6417,13 @@ void BlueStore::_txc_write_nodes(TransContext *txc, KeyValueDB::Transaction t)
o->flush_txns.insert(txc);
}
+ for (auto o : txc->modified_objects) {
+ if (txc->onodes.count(o) == 0) {
@ifed01
ifed01 Dec 19, 2016 edited Contributor

We can remove onode from modified_objects here if it's present in onodes and hence avoid the same check in _txc_finish.

src/os/bluestore/BlueStore.cc
@@ -6489,7 +6496,17 @@ void BlueStore::_txc_finish(TransContext *txc)
if ((*p)->flush_txns.empty())
(*p)->flush_cond.notify_all();
}
-
+ for (auto o : txc->modified_objects) {
@ifed01
ifed01 Dec 19, 2016 edited Contributor

If modified_objects is normalized in _txc_write_nodes then we can replace both loops with something like
set* sets[] = {&(txc->onodes), &(txc->modified_objects), nullptr);
for( set** _set = sets; *_set != nullptr, ++_set){
for( set::iterator p = _set->begin();
...
}

@ifed01
Contributor
ifed01 commented Dec 19, 2016

Shouldn't we call txc->modified_object.erase(o) in _do_remove similar to txc->onodes.erase(o) ???

@ifed01
Contributor
ifed01 commented Dec 19, 2016

LGTM

@liewegas liewegas os/bluestore: include modified objects in flush list even if onode un…
…changed

We use the onode flush list so that we can ->flush() as
a barrier before doing any read/modify/write.  For
example, omap_rmkeyrange will flush before reading to
see what keys to erase in order to ensure that any
previous inserts are applied to the db and we see them
and remove them.

However, some omap operations don't update the onode
itself, which means write_onode() doesn't get called and
we aren't put on this list.

Add a note_modified_object() helper that can be called
instead of write_onode() for those cases.  That way we
get on the list and flush() works as expected.

We could have resolved this by just putting ourselves on
the dirty onode list, but in practice every OSD op is
writing omap keys to the pgmeta object and there is no
need to touch the onode key in this case, so doing so
would be a big regression.

Signed-off-by: Sage Weil <sage@redhat.com>
1b319f2
@liewegas liewegas merged commit f674fa2 into ceph:master Dec 20, 2016

3 checks passed

Signed-off-by all commits in this PR are signed
Details
Unmodifed Submodules submodules for project are unmodified
Details
default Build finished.
Details
@liewegas liewegas deleted the liewegas:wip-bluestore-omap-flush branch Dec 20, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment