New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
os/bluestore/BlueFS: move release unused extents work in _flush_and_syn_log #17684
Conversation
src/os/bluestore/BlueFS.cc
Outdated
alloc[i]->release(p.get_start(), p.get_len()); | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to retry the reserve() call, then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actualy, no, this still isn't right.. we have to flush the log before we can release and potentially re-use (i.e. write to) previously referenced extents or else we may crash and have overwritten data we previously referenced.
If we want to improve this then we should be able to move that release dance in sync_metadata() into _sync_and_flush_log(). I.e., we need to repvent a case where an unlink causes us to release extents, reallocate, and overwrite old data before the unlink event has committed to the log..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. But i have a question: your commit : e0ec06e which introduce deferred release unused extents. I'm not understand. Why? Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unlink A: release extents(a,b,c...)
A allocate: a,b.
But unlink A will update log_t before log_t update when allocate(a,b). If crash, those logs all missed. Which case cause log(allocate) exist but log(release) missed.
Earlier versions of the allocators would handle the delay of releases. Released extents would go on a pending_release lease, and after each commit there was a call into allocator telling it prevoiusly queued releases are now safe to actaully release. That commit moved that logic into the caller. In your example, the problem is that after we alllocate teh blocks we will presumably (over)write them before queueing the log_t record.. which will overwrite content that was there before. We we crash and the unlink (and allocate) are lost, we still have a valid pointer to those blocsk from the old version of A and it's contents has been corrupted. |
d601c99
to
f17b944
Compare
@liewegas , Thanks your detail explain. By your suggestion: move release into _flush_and_sync_log. And call _flush(log_write, true) if reserve faild. |
The first commit looks good. The second commit is problematic, because _flush will drop the lock, and most of the _allocate() callers aren't prepared for that. I suspect what actually needs to happen is each of those callers independently needs to handle _allocate() failure instead of asserting, and be adjusted one-by-one. Honestly, though, I'm not sure it is worth the effort. With the first patch it seems very unlikely that we'll be in a situation where we need to allocate and have enough space in the pending free list to continue. |
f17b944
to
2fc80f9
Compare
ASAP to release unused extents. Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
This looks right to me but bluefs data safety is critical so I'd like a few more eyes on it! Thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm also
os/bluestore: ceph#17354; ceph#17627; ceph#17684 See merge request clove/ceph!251
Now only in sync_metadata release unused extents. So if allocate extents
failed, we should try relase unused extents.
Signed-off-by: Jianpeng Ma jianpeng.ma@intel.com