Skip to content

crimson/os/seastore/cache: initial_pending extents also do "set_io_wait/complete_io/wait_io"#62878

Closed
xxhdx1985126 wants to merge 5 commits intoceph:mainfrom
xxhdx1985126:wip-70976
Closed

crimson/os/seastore/cache: initial_pending extents also do "set_io_wait/complete_io/wait_io"#62878
xxhdx1985126 wants to merge 5 commits intoceph:mainfrom
xxhdx1985126:wip-70976

Conversation

@xxhdx1985126
Copy link
Copy Markdown
Member

Fixes: https://tracker.ceph.com/issues/70976
Signed-off-by: Xuehan Xu xuxuehan@qianxin.com

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands

@xxhdx1985126
Copy link
Copy Markdown
Member Author

jenkins test make check

@xxhdx1985126
Copy link
Copy Markdown
Member Author

jenkins test api

@xxhdx1985126 xxhdx1985126 force-pushed the wip-70976 branch 3 times, most recently from 913d2bc to ee2e878 Compare April 18, 2025 09:52
@xxhdx1985126
Copy link
Copy Markdown
Member Author

jenkins test make check

@cyx1231st cyx1231st moved this to Awaits review in Crimson Apr 21, 2025
// stable and visible, see prepare_record().
//
// XXX: It might be good to mark this case as DIRTY from the definition,
// XXX: It might be good to mark this case as DIRTY/CLEAN from the definition,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The major reason is that I think it is inappropriate (and wrong) to expose the pending states that are not from (irrelavent to) this transaction. These states should actually be DIRTY/CLEAN from this transaction's perspective.

This will make things challenging (even worse when initial-pending is also affecting now) as there are lots of different places relying on deciding whether the extent is pending in this transaction.

@xxhdx1985126
Copy link
Copy Markdown
Member Author

@cyx1231st I've just modified the code as discussed offline, please take a look, thanks:-)

// XXX: It might be good to mark this case as DIRTY/CLEAN from the definition,
// which probably can make things simpler.
return is_mutation_pending() && is_pending_io();
if (is_pending_io()) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if needs to be wrapped in DEBUG macro to build conditionally as well, otherwise non-debug build will see an empty if.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Um, IIUC, the empty if would be optimized by the compiler: https://godbolt.org/z/zd6oz6sbv

@cyx1231st
Copy link
Copy Markdown
Member

cyx1231st commented Apr 22, 2025

The essential gap from the discussions is that:

  • We cannot add stable initial-pending extents to the transaction read-set when its paddr is unknown.
  • We need to make sure transaction conflict still works in whatever way to work around the above gap.

@xxhdx1985126
Copy link
Copy Markdown
Member Author

xxhdx1985126 commented Apr 23, 2025

@cyx1231st It seems that the use cases in which an initial_pending extents might be met are as follows:

  1. In the omap tree, the user transactions might hit an initial pending omap nodes that are being committed by trimmier/cleaner transactions.
  2. In the future, when doing bottom-up lba iterator construction, we might hit an initial pending parent node.

Both of the above cases must go through Cache::get_extent_viewable_by_trans(). So I tried to add transaction interfaces for the seperated Transaction::add_to_read_set() steps ---- adding transactions to extents' read_transactions and adding extents to transactions' read_set ---- and do the seperated steps for initial pending extents in Cache::get_extent_viewable_by_trans().

I also added asserts to make sure the seperated steps are only used for initial pending and stable writing extents and prevent other cases from wrongly using the interfaces.

Please take a look, thanks:-)

Copy link
Copy Markdown
Member

@cyx1231st cyx1231st left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it makes sense to me that only get_extent_viewable_by_trans() is and will be related to this issue in the near future, and there are proper asserts added.

Can you help to take a look at the paddr type validations in #62938 ?

If possible, I prefer to merge the code consolidation/cleanup work before moving forward.

@github-actions
Copy link
Copy Markdown

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@cyx1231st
Copy link
Copy Markdown
Member

cyx1231st commented May 15, 2025

The test shows this PR intensifies the issue https://tracker.ceph.com/issues/71317, which depends the PR #63218

See https://pulpito.ceph.com/yingxin-2025-05-15_02:31:43-crimson-rados-wip-yingxin-testing-2025-05-14-1435-distro-default-smithi/8285808/ osd2

ceph-osd: /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/20.3.0-354-g69e4144d/rpm/el9/BUILD/ceph-20.3.0-354-g69e4144d/
src/crimson/os/seastore/transaction.h:303:
void crimson::os::seastore::Transaction::replace_placeholder(
  crimson::os::seastore::CachedExtent&,
  crimson::os::seastore::CachedExtent&):
Assertion  `where->ref.get() == &placeholder' failed.

Copy link
Copy Markdown
Member

@cyx1231st cyx1231st left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above.

Copy link
Copy Markdown
Member

@cyx1231st cyx1231st left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding to "Assertion `where->ref.get() == &placeholder' failed."

I see further gaps related to retired-placeholer (probably still incomplete):

  • retired-placeholder is not added atomically upon get_extent_if_linked(), it is only added later into the cache index
  • retired-placeholder is not linked atomically in the lba tree, so it is possible that
    • a retired placeholder can be added multiple times repeatedly
    • a real extent can be linked in the middle, making the errors looking more confusing
  • even after it is linked in the lba tree, there is no logic to handle this case properly (see comments).

Probably we should merge #63218 with this PR to fix the issues together.

Let me know your thoughts.

touch_extent(*p_extent, &t_src, t.get_cache_hint());
if (ret.is_paddr_known) {
touch_extent(*p_extent, &t_src, t.get_cache_hint());
} else {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the else case, validate p_extent must not be a retired-placeholder.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add assert:
assert(!is_retired_placeholder_type(p_extent->get_type()));

++stats.access.cache_lru;
}
touch_extent(*p_extent, &t_src, t.get_cache_hint());
if (ret.is_paddr_known) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #63218 (comment), after retired-placeholder is linked in the lba tree, it is possible to see a retired-placeholder here, I think the proper way to handle this is to:

  1. check if ret is a retired-placeholder.
  2. create a real one (which means we need to know the real type here)
  3. and replace placeholder by the real one immediately (see Cache::do_get_caching_extent())

There might be asserts needed somewhere to make sure things are working properly, still thinking.

CachedExtent* p_extent;
bool needs_step_2 = false;
bool needs_touch = false;
if (extent->is_stable()) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After #63218 (comment), and starting from this case, it should be possible to see the extent is a retired-placeholder.

bool needs_step_2 = false;
bool needs_touch = false;
if (extent->is_stable()) {
p_extent = extent->get_transactional_view(t);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a retired-placeholder cannot get-transactional-view, might need some adjustments.

needs_touch = true;
}
} else {
// already exists
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Must not be a retired-placeholder if already exists.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add assert:
assert(!is_retired_placeholder_type(p_extent->get_type()));

@cyx1231st
Copy link
Copy Markdown
Member

I'll proceed to see if we can merge #62071 first.

@xxhdx1985126
Copy link
Copy Markdown
Member Author

Probably we should merge #63218 with this PR to fix the issues together.

Yeah, will do.

@xxhdx1985126
Copy link
Copy Markdown
Member Author

jenkins test api

auto it = read_set.emplace_hint(where, this, &extent);
placeholder.read_transactions.erase(
read_trans_set_t<Transaction>::s_iterator_to(*where));
read_items.emplace_back(this, &extent);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment:
// Note, the retired-placeholder is not removed from read_items after replace.

touch_extent(*p_extent, &t_src, t.get_cache_hint());
if (ret.is_paddr_known) {
touch_extent(*p_extent, &t_src, t.get_cache_hint());
} else {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add assert:
assert(!is_retired_placeholder_type(p_extent->get_type()));

needs_touch = true;
}
} else {
// already exists
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add assert:
assert(!is_retired_placeholder_type(p_extent->get_type()));

auto it = read_set.emplace_hint(where, this, &extent);
placeholder.read_transactions.erase(
read_trans_set_t<Transaction>::s_iterator_to(*where));
read_items.emplace_back(this, &extent);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume the only changed place is here compared with the last version, right?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes:-)

@github-actions
Copy link
Copy Markdown

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

set_io_wait/complete_io/wait_io

Fixes: https://tracker.ceph.com/issues/70976
Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
…ntrusive set

Fixes: https://tracker.ceph.com/issues/70976
Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
…tion::do_add_to_read_set()

The current implementation of Transaction::do_add_to_read_set() can
be seperated into two steps:
1. attach the extent to the transaction, i.e. insert the extent into the
   transaction's read_set
2. attach the transaction to the extent, i.e. insert the transaction
   into the extent's read_transactions

For initial pending and stable writing extents, we need to do the second
step before doing "CachedExtent::wait_io()" and to do the first step
after "CachedExtent::wait_io()". This commit add interfaces
corresponding to those two steps.

Fixes: https://tracker.ceph.com/issues/70976
Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
initial pending and stable writing extents

Fixes: https://tracker.ceph.com/issues/70976
Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
`BaseChildNode::get_parent_node()` to
`BaseChildNode::peek_parent_node()`

Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
@xxhdx1985126
Copy link
Copy Markdown
Member Author

closing this PR as its commits have been merged to #63218

@Matan-B Matan-B removed this from Crimson Jun 30, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants