Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os/bluestore: include modified objects in flush list even if onode unchanged #12541

Merged
merged 1 commit into from Dec 20, 2016

Conversation

liewegas
Copy link
Member

We use the onode flush list so that we can ->flush() as
a barrier before doing any read/modify/write. For
example, omap_rmkeyrange will flush before reading to
see what keys to erase in order to ensure that any
previous inserts are applied to the db and we see them
and remove them.

However, some omap operations don't update the onode
itself, which means write_onode() doesn't get called and
we aren't put on this list.

Add a note_modified_object() helper that can be called
instead of write_onode() for those cases. That way we
get on the list and flush() works as expected.

Signed-off-by: Sage Weil sage@redhat.com

@liewegas liewegas added this to the kraken milestone Dec 16, 2016
if (o->flush_txns.empty())
o->flush_cond.notify_all();
}
}
// clear out refs
txc->onodes.clear();

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clear modified_objects here similar as we do for onodes?

@@ -6417,6 +6417,13 @@ void BlueStore::_txc_write_nodes(TransContext *txc, KeyValueDB::Transaction t)
o->flush_txns.insert(txc);
}

for (auto o : txc->modified_objects) {
if (txc->onodes.count(o) == 0) {
Copy link
Contributor

@ifed01 ifed01 Dec 19, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove onode from modified_objects here if it's present in onodes and hence avoid the same check in _txc_finish.

@@ -6489,7 +6496,17 @@ void BlueStore::_txc_finish(TransContext *txc)
if ((*p)->flush_txns.empty())
(*p)->flush_cond.notify_all();
}

for (auto o : txc->modified_objects) {
Copy link
Contributor

@ifed01 ifed01 Dec 19, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If modified_objects is normalized in _txc_write_nodes then we can replace both loops with something like
set* sets[] = {&(txc->onodes), &(txc->modified_objects), nullptr);
for( set** _set = sets; *_set != nullptr, ++_set){
for( set::iterator p = _set->begin();
...
}

@ifed01
Copy link
Contributor

ifed01 commented Dec 19, 2016

Shouldn't we call txc->modified_object.erase(o) in _do_remove similar to txc->onodes.erase(o) ???

@liewegas liewegas force-pushed the wip-bluestore-omap-flush branch 4 times, most recently from 9864bf8 to 00f277e Compare December 19, 2016 14:36
@ifed01
Copy link
Contributor

ifed01 commented Dec 19, 2016

LGTM

…changed

We use the onode flush list so that we can ->flush() as
a barrier before doing any read/modify/write.  For
example, omap_rmkeyrange will flush before reading to
see what keys to erase in order to ensure that any
previous inserts are applied to the db and we see them
and remove them.

However, some omap operations don't update the onode
itself, which means write_onode() doesn't get called and
we aren't put on this list.

Add a note_modified_object() helper that can be called
instead of write_onode() for those cases.  That way we
get on the list and flush() works as expected.

We could have resolved this by just putting ourselves on
the dirty onode list, but in practice every OSD op is
writing omap keys to the pgmeta object and there is no
need to touch the onode key in this case, so doing so
would be a big regression.

Signed-off-by: Sage Weil <sage@redhat.com>
@liewegas liewegas merged commit f674fa2 into ceph:master Dec 20, 2016
@liewegas liewegas deleted the wip-bluestore-omap-flush branch December 20, 2016 03:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants