New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mds/MDLog: manage the lifetime of a LogSegment with shared pointers #53086
base: main
Are you sure you want to change the base?
Conversation
430d867
to
57df0f1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main concern with this PR is that it doesn't actually address:
"These really ought to be ref-counted in some way to prevent early expiry."
from https://tracker.ceph.com/issues/4744
It's not well laid out what the concern is but I suspect it's something like to prevent tickets like:
https://tracker.ceph.com/issues/59119
You could potentially resolve that ticket with this PR by checking that the LogSegment
reference count is 1 before trimming* the segment.
- I believe Greg actually meant "early trimming" as that's the real concern: that the log segment will be removed from the physical journal.
|
||
class MDSRank; | ||
class LogSegment; | ||
class EMetaBlob; | ||
using AutoSharedLogSegment = auto_shared_ptr<LogSegment>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I'm mystified by the utility of this class. Why couldn't it be:
using foo = std::shared_ptr<LogSegment>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I mentioned in the commit message, the utility of this class is to automatically convert a pointer to an object that inherits from enable_shared_from_this
into a shared pointer and also adds an implicit conversion from the shared pointer back to a plain pointer. This helps minimize refactoring.
I'll elaborate on how I believe this helps address the original ticket in a separate comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Normally we use boost::intrusive_ptr
if other accessors have a raw pointer. That's somewhat inappropriate in a code base we fully control, the mds.
Your autoshared class is interesting though. I'll have to think about this more.
|
||
[[nodiscard]] static AutoSharedLogSegment create(uint64_t _seq, loff_t off = -1) | ||
{ | ||
return std::shared_ptr<LogSegment>(new LogSegment(_seq, off)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make_shared
is inaccessible here because the constructor is private. See class Best()
in the example from the documentation on shared_ptr
Additionally, make_shared
comes at the cost of pinning the memory of the whole object when only weak pointers are left since the same memory block is used to hold the shared pointer control and the class data. This is mentioned in the second note on the documentation page you've referenced.
To clarify, I'm not sure this will be a blocker for us, and maybe we could even benefit from it, but I think it's an important point to keep in mind
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make_shared
is inaccessible here because the constructor is private. See classBest()
in the example from the documentation on shared_ptr
ok
Additionally,
make_shared
comes at the cost of pinning the memory of the whole object when only weak pointers are left since the same memory block is used to hold the shared pointer control and the class data. This is mentioned in the second note on the documentation page you've referenced. To clarify, I'm not sure this will be a blocker for us, and maybe we could even benefit from it, but I think it's an important point to keep in mind
I don't really think that will be significant but I also wonder if a clever implementation is permitted to realloc
the memory block when T
is destroyed. What an odd caveat I had the opposite understanding about.
@batrick I do think that this PR addresses the original ticket. It does so by changing all containers to hold a shared pointer to the LogSegment instead of a plain pointer. It wouldn't be sufficient to only update the storage semantics, we also needed to correctly create shared pointers to the same original object. To achieve this I decided to utilize the To further reduce the scope of refactoring I've introduced a helper class
This point needs discussion, as I don't think this requirement is still applicable - though I'll be happy to learn if I'm wrong about this. At the time the original ticket was created, mds trimming logic was different - and simpler. Back then trimming was done synchronously, but now trimming is staged through
I haven't yet fully understood the logic, but it looks like there's already an asynchronous process that only trims segments that were reported as unused. |
You've refcounted Lines 842 to 851 in ce9771b
to not trim (i.e. delete) a segment unless its refcount is 1. |
Well, I don't need to do anything for the log segment to be deleted only when the last strong reference to it is gone, that's the primary function of a shared pointer. In other words, if the delete operation is what we'd like to delay - that's already the case with the current implementation. |
57df0f1
to
f50344f
Compare
f50344f
to
03faff6
Compare
03faff6
to
47028d4
Compare
@batrick @vshankar I ended up changing quite a bit. Unfortunately, there is no unittest infrastructure for MDLog, but it's worth investing in it. Most if not all of my changes can be relatively quickly verified with unit tests, and MDLog does not have that many external dependencies compared to other core classes. I've changed this PR to a draft so that I can upload the current version which compiles, and have you go over it and start giving overall comments. If we want to proceed with this, I will have another commit where I make MDLog constructible in a unit test environment and start adding unit tests to verify the invariants that I have most probably violated For details about the changes please see the commit message |
47028d4
to
dc9d658
Compare
Specific questions to look at here:
|
src/mds/MDLog.cc
Outdated
logger->set(l_mdl_wrpos, ls->end); | ||
// acquire the mds lock until the end of the scope | ||
// to update the LS offsets and major offsets | ||
std::lock_guard l(mds->mds_lock); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still need to go through these changes in more detail but this will not fly. The mds_lock
has too much thread contention already. Also, the submit thread is intended to operate completely outside the mds_lock
. I do not think we should ever change that.
I believe there may be another way to protect the LogSegment members behind the mds_lock
: add the LogSegment
to C_MDL_Flushed
and update the segment's end/offset in the completion (::finish
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
C_MDL_Flushed will take mds_lock, so not much difference in terms of contention. But I agree that we could benefit from performing the update on the finisher thread instead of the submit thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you said, the journaler thread already regularly acquires the mds_lock
so it's advantageous to use it for this purpose instead of adding a new thread to the contention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see a1c9d1a, I removed the need for locking.
339c4ba
to
c8f70be
Compare
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
c8f70be
to
ed82c7c
Compare
ed82c7c
to
e5bb5d5
Compare
jenkins retest this please |
@vshankar apparently, there are some mdlog unit test timeouts that I need to investigate. I will get back to it after the quiesce db |
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days. |
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days. |
e5bb5d5
to
4164e8a
Compare
damaged_unlocked is another method in the same class that has the same semantics but a better name, so now hande_write_error_unlocked will take the mds lock before executing handle_write_error. Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
* Make sure that any structure referring to a LogSegment is using a shared pointer to do that * Limit the API changes by enhancing the LogSegment class with an `enable_shared_from_this` base - Make LogSegment::LogSegment private and expose a `create` method to correctly initialize the base class - Introduce AutoSharedLogSegment helper class that recovers a shared pointer from a LogSegment* Signed-off-by: Leonid Usov <leonid.usov@ibm.com> Fixes: https://tracker.ceph.com/issues/4744
The main goal of this change is to keep expired segments as weak references, allowing to spot "zombie" segments which should've been deleted by the time we decide to trim them. Since it's not a fatal error to have a dangling log segment given that it was properly expired, the system will ignore zombies unless a dedicated new debug flag is enabled: - name: mds_debug_zombie_log_segments type: bool level: advanced default: false services: - mds fmt_desc: MDS will abort if an expired log segment is still strongly referenced (for developers only). To achieve this it was necessary to change how segments are kept in the log. Specifically, it was necessary to break up the `segments` map into two: unexpired_segments and expired_segments, where the latter holds structs of ExpiredSegmentInfo with a weak ref to the original log segment, as well as some importany by-value copies for accounting and syncronization. Since the change already required significant refactoring, it was used to deliver a number of additional improvements: * expiring segments don't need a separate container, they are now kept in `unexpired_segments` until they complete expiration. The expiration boundary is kept in a separate variable `last_expiring_segment_seq` which is sufficient. * submit_mutex is now used exclusively to protect the `pending_events` when sending them to the submit thread. All other internal state is already protected by the mds_lock * expiration waiters are now added as a single context associated with a segment sequence, which will be finished when no more unexpired segments exists with the sequence or below. * expiration and trimming logic were optimized to make as much use of the O(log N) lookup in the std containers as possible. * methods and fields were renamed to reduce ambiguity Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
4164e8a
to
c3c87ed
Compare
enable_shared_from_this
basecreate
method to correctly initialize the base classFixes: https://tracker.ceph.com/issues/4744
Show available Jenkins commands
jenkins retest this please
jenkins test classic perf
jenkins test crimson perf
jenkins test signed
jenkins test make check
jenkins test make check arm64
jenkins test submodules
jenkins test dashboard
jenkins test dashboard cephadm
jenkins test api
jenkins test docs
jenkins render docs
jenkins test ceph-volume all
jenkins test ceph-volume tox
jenkins test windows