New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
os/bluestore: include modified objects in flush list even if onode unchanged #12541
Conversation
05ea2ad
to
1f77e5c
Compare
if (o->flush_txns.empty()) | ||
o->flush_cond.notify_all(); | ||
} | ||
} | ||
// clear out refs | ||
txc->onodes.clear(); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clear modified_objects here similar as we do for onodes?
@@ -6417,6 +6417,13 @@ void BlueStore::_txc_write_nodes(TransContext *txc, KeyValueDB::Transaction t) | |||
o->flush_txns.insert(txc); | |||
} | |||
|
|||
for (auto o : txc->modified_objects) { | |||
if (txc->onodes.count(o) == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can remove onode from modified_objects here if it's present in onodes and hence avoid the same check in _txc_finish.
@@ -6489,7 +6496,17 @@ void BlueStore::_txc_finish(TransContext *txc) | |||
if ((*p)->flush_txns.empty()) | |||
(*p)->flush_cond.notify_all(); | |||
} | |||
|
|||
for (auto o : txc->modified_objects) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If modified_objects is normalized in _txc_write_nodes then we can replace both loops with something like
set* sets[] = {&(txc->onodes), &(txc->modified_objects), nullptr);
for( set** _set = sets; *_set != nullptr, ++_set){
for( set::iterator p = _set->begin();
...
}
Shouldn't we call txc->modified_object.erase(o) in _do_remove similar to txc->onodes.erase(o) ??? |
9864bf8
to
00f277e
Compare
LGTM |
00f277e
to
30455c0
Compare
…changed We use the onode flush list so that we can ->flush() as a barrier before doing any read/modify/write. For example, omap_rmkeyrange will flush before reading to see what keys to erase in order to ensure that any previous inserts are applied to the db and we see them and remove them. However, some omap operations don't update the onode itself, which means write_onode() doesn't get called and we aren't put on this list. Add a note_modified_object() helper that can be called instead of write_onode() for those cases. That way we get on the list and flush() works as expected. We could have resolved this by just putting ourselves on the dirty onode list, but in practice every OSD op is writing omap keys to the pgmeta object and there is no need to touch the onode key in this case, so doing so would be a big regression. Signed-off-by: Sage Weil <sage@redhat.com>
30455c0
to
1b319f2
Compare
We use the onode flush list so that we can ->flush() as
a barrier before doing any read/modify/write. For
example, omap_rmkeyrange will flush before reading to
see what keys to erase in order to ensure that any
previous inserts are applied to the db and we see them
and remove them.
However, some omap operations don't update the onode
itself, which means write_onode() doesn't get called and
we aren't put on this list.
Add a note_modified_object() helper that can be called
instead of write_onode() for those cases. That way we
get on the list and flush() works as expected.
Signed-off-by: Sage Weil sage@redhat.com