-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
crimson/os/seastore/lba_manager: don't increase intermediate mappings' refcount if LBAManager::clone_mapping() is called to remap mappings #57262
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
2a3c498
to
f50253d
Compare
jenkins test windows |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking about how to restructure the logic properly, let me know what you think.
@@ -146,6 +145,37 @@ class LBAManager { | |||
laddr_t addr, | |||
int delta) = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can it be dropped from LBAManager?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be used by the UT test_transaction_manager
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not able to see the user directly, can you point to the code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ceph/src/test/crimson/seastore/test_transaction_manager.cc
Lines 689 to 691 in fab66be
auto refcnt = with_trans_intr(*(t.t), [&](auto& trans) { | |
return tm->inc_ref(trans, offset); | |
}).unsafe_get0(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't look like it is calling ref_ret LBAManager::incref_extent(t, addr, delta)
, right ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But TM::inc_ref
will call LBAManager::incref_extent()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it doesn't call the version of ref_ret LBAManager::incref_extent(t, addr, delta)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes, will remove it:-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Um, ref_ret LBAManager::incref_extent(t, addr, delta)
is called by the LBAManager::remap_mappings()
in this PR. Although we only remap at most two mappings for now, which means we can call ref_ret LBAManager::incref_extent(t, addr)
instead of ref_ret LBAManager::incref_extent(t, addr, delta)
, but I think maybe we can keep the way it is in case that we remap more than two mappings in the future. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BtreeLBAManager::remap_mappings()
is reusing BtreeLBAManager::incref_extent(t, addr, delta)
, which is fine, and should already support to remap more than 2 mappings IIUC.
But I don't see why incref_extent(t, addr, delta)
is necessary to be exposed at the level of LBAManager
.
return seastar::now(); | ||
}); | ||
} | ||
return ref_iertr::now(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Towards the direction (#57262 (comment)) to avoid unnecessary lba tree accesses, how about restructuring the logic like below:
To remap non-clone mapping:
- For the 1st remap, update the original mapping directly (no need to do a remove and an insert).
- For the 2-n-th remaps, probably
alloc_extents()
is more efficient than iterative calls toalloc_extent()
?
To remap mapping with clone:
- For the 1st remap, update the original direct mapping
- For the 2-n-th remaps, reuse
alloc_cloned_mappings()
(mentioned in another comment) - Adjust the ref-counter of the indirect mapping at last (instead of the current way to adjust in 2 places)
I think the above structure is also cleaner without mixing the very-different clone and non-clone pathes.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update: I think we should assert that:
- At non-clone mapping step 1, assert that the mapping is direct and ref-counter must be 1.
- At mapping with clone step 3, assert that the mapping ref-counter must be >= 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2. For the 2-n-th remaps, probably
alloc_extents()
is more efficient than iterative calls toalloc_extent()
?
LBAManager::alloc_extents()
can only allocate laddr continuous mappings, which is not the case for the remap
scenario.
Maybe we should add a LBAManager::batch_modify()
interface, so that ObjectDataHandler
's do_remappings
, do_removals
and do_insertions
can be done with one lba tree search. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LBAManager::alloc_extents() can only allocate laddr continuous mappings, which is not the case for the remap scenario.
I see, remapping might need to allocate non-continuous mappings with a hole for the new write.
Maybe we should add a LBAManager::batch_modify() interface, so that ObjectDataHandler's do_remappings, do_removals and do_insertions can be done with one lba tree search. What do you think?
I agree that the new interface is more flexible, so that update (remove+insert) the first mapping and insert the follow-up mappings can be done in the same lookup.
f50253d
to
fdde90e
Compare
test-transaction-manager backtrace:
|
Yeah, looking into it. |
a49e5a5
to
e2a76e8
Compare
Just fixed. |
BTW, this failure pattern looks similar to https://tracker.ceph.com/issues/65635, not sure whether it can be fixed by 032ae27. |
e2a76e8
to
41b58ef
Compare
This failure is because this PR actually moved the |
41b58ef
to
d74a155
Compare
@cyx1231st Just address the comments, please take a look again, thanks:-) |
7e976e5
to
ddec6d0
Compare
@cyx1231st I've just refactored |
fae4574
to
9cf026c
Compare
if (!pin->is_indirect()) { | ||
auto fut2 = base_iertr::make_ready_future<TCachedExtentRef<T>>(); | ||
if (full_extent_integrity_check) { | ||
fut2 = read_pin<T>(t, pin->duplicate()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated to this PR, extended discussion.
As for read_pin(), I cannot recall why pin is an unique-ptr
and duplicate()
is needed. Probably it's simpler and faster if convert the pin ref to a shared ptr?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think your above comment is an example of why keeping pins unique_ptr
. A pin may get invalidated once its parent lba leaf node is modified, and mandating duplicating a pin explicitly
can, to some extent, remind developers of the danger of using an invalid pin.
I think we can make it shared once we have implemented LBAMapping::is_valid()
, what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(extended discussion)
For remap_pin(), it is fine to require an rvalue pin to clarify the intention that the original pin will become invalid.
For read_pin(), it seems inconvenient to require an rvalue pin as the pin can still be valid after reading (e.g. looks object data handler wants to keep track the pin)
A pin may get invalidated once its parent lba leaf node is modified, mandating duplicating a pin explicit can, to some extent, remind developers of the danger of using an invalid pin.
That's true, and this defines how object data handler should work -- that it should not modify lba tree and reading a pin concurrently in the same transaction.
However, the method duplicate()
looks to be an overkill because it involves an extra allocation, so it is not merely a reminder. Also I think duplicate()
cannot make it safer in case of "A pin may get invalidated once its parent lba leaf node is modified".
I think we can make it shared once we have implemented LBAMapping::is_valid(), what do you think?
I think keeping the pin as unique-ptr is also fine, so that read-pin can become read_pin(Transaction &, LBAMapping &)
-- user needs to make sure the pin outlives reading.
Anyway, for this PR, I suppose we can keep the style of remap_pin(std::move(pin) ... )
, what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, totally agree.
original_laddr, | ||
intermediate_base, | ||
remapped_intermediate_key, | ||
std::move(original_bptr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack, this was wrong.
t, original_laddr, original_paddr, original_len, remaps.size()); | ||
// The according extent might be stable or pending. | ||
auto fut = base_iertr::now(); | ||
if (!pin->is_indirect()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack, we don't need to deal with the cached extent if the pin is indirect.
9cf026c
to
71b202b
Compare
refcount if LBAManager::clone_mapping() is called to remap mappings TransactionManager::remap_pin() will increase the refcount itself Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
…nterface Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
dec_ref/alloc/clone/inc_ref of lba mappings caused by extent remappings are now integrated into a single LBAManager::remap_lba_mappings() interface Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
interface Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
`Cache::retire_extent_addr` in the future Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
71b202b
to
ac49811
Compare
jenkins test api |
LGTM, and how do you think about #57262 (comment) ? |
Ah, sorry, I forgot to address that comment, will change it. |
Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
Done:-) |
jenkins test windows |
TransactionManager::remap_pin() will increase the refcount itself
Signed-off-by: Xuehan Xu xuxuehan@qianxin.com
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an
x
between the brackets:[x]
. Spaces and capitalization matter when checking off items this way.Checklist
Show available Jenkins commands
jenkins retest this please
jenkins test classic perf
jenkins test crimson perf
jenkins test signed
jenkins test make check
jenkins test make check arm64
jenkins test submodules
jenkins test dashboard
jenkins test dashboard cephadm
jenkins test api
jenkins test docs
jenkins render docs
jenkins test ceph-volume all
jenkins test ceph-volume tox
jenkins test windows
jenkins test rook e2e