Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crimson/osd: decouple cross-core pg submission out of the OrderedExclusivePhase #53537

Merged
merged 11 commits into from
Nov 7, 2023

Conversation

cyx1231st
Copy link
Member

@cyx1231st cyx1231st commented Sep 20, 2023

The first 7 commits are cleanups and potential fixes to the pipelining infrastructure:

  • Drop the unused interface from PipelineExitBarrierI.
  • Generalize the shared logic around the remote pg submission.
  • Fix the usages of PipelineHandle::complete() and exit().

The last 2 commits are to decouple cross-core pg submission out of the OrderedExclusivePhase, by:

  • Introducing 2 independent phases before and after the cross-core pg submission.
  • Preserve the ordering during cross-core pg submissing using per-core sequence ids.

The rough evaluation shows at most 235% end performance improvements at 8 cores with cyanstore and 1 OSD:
iops-ref-opt

Note that the OPT case was evaluated with #53130.

@liu-chunmei has also observed the excessive latencies around the cross-core submissions.

Contribution Guidelines

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

@cyx1231st cyx1231st requested a review from a team as a code owner September 20, 2023 06:37
@cyx1231st cyx1231st force-pushed the wip-crimson-osd-fix-complete-exit branch 3 times, most recently from d8300ef to 144e568 Compare September 20, 2023 08:07
@cyx1231st cyx1231st marked this pull request as draft September 26, 2023 02:00
@cyx1231st
Copy link
Member Author

Convert to draft, will submit the changes to address the crimson OSD starvation issue in the same PR.

@cyx1231st cyx1231st force-pushed the wip-crimson-osd-fix-complete-exit branch from 144e568 to dfecd83 Compare September 27, 2023 03:08
@cyx1231st cyx1231st changed the title crimson/osd/osd_operations: cleanups and potential fixes around the pipeline operations crimson/osd: decouple cross-core pg submission out of the OrderedExclusivePhase Sep 27, 2023
@cyx1231st cyx1231st marked this pull request as ready for review September 27, 2023 03:20
@cyx1231st cyx1231st force-pushed the wip-crimson-osd-fix-complete-exit branch from dfecd83 to 3bb2e74 Compare September 27, 2023 05:16
@cyx1231st
Copy link
Member Author

Sorry submitted an outdated branch, repushed.

phase = nullptr;
}
if (phase) {
std::ignore = seastar::smp::submit_to(
Copy link
Member Author

@cyx1231st cyx1231st Sep 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@athanatos Probably it's simpler to use OrderedConcurrentPhaseT only in the local reactor, and for the special cross-core case, relying on the newly introduced class crosscore_t .

Unlike the original OrderedConcurrentPhaseT, crosscore_t doesn't rely on any additional submit_to() to preserve the ordering. submit_to() is usually expensive and should be avoided wherever possible.

Copy link
Contributor

@athanatos athanatos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, much cleaner.

@@ -11,13 +13,80 @@

namespace crimson::osd {

/**
* crosscore_t
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

crosscore_ordering_t ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed

@cyx1231st cyx1231st force-pushed the wip-crimson-osd-fix-complete-exit branch from 3bb2e74 to be23265 Compare October 9, 2023 07:39
@cyx1231st
Copy link
Member Author

changeset:

  • afbf0a8 rename crosscore_t to crosscore_ordering_t with related readability improvements.

@cyx1231st cyx1231st force-pushed the wip-crimson-osd-fix-complete-exit branch from be23265 to 9587ffd Compare October 10, 2023 06:13
@cyx1231st
Copy link
Member Author

rebased to prepare for the CI build

@liu-chunmei
Copy link
Contributor

jenkins test api

@cyx1231st
Copy link
Member Author

Looks like the ref-counter is somehow broken:

ceph-osd: /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/18.0.0-6684-g9587ffdf/rpm/el9/BUILD/ceph-18.0.0-6684-g9587ffdf/redhat-linux-build/boost/include/boost/smart_ptr/intrusive_ptr.hpp:201: T* boost::intrusive_ptr::operator->() const [with T = MOSDPGUpdateLogMissingReply]: Assertion `px != 0' failed.

Aborting on shard 1.

}, [](std::exception_ptr) {
return seastar::now();
}, pg).finally([this, ref] {
logger().debug("{}: exit", *this);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@athanatos FYI, the ref-counter issue of #53537 (comment) turns out to because that this line is trying to print this op without a valid req (moved away at line 70 above), causing seg fault according to

void LogMissingRequestReply::print(std::ostream& os) const
{
os << "LogMissingRequestReply("
<< "from=" << req->from
<< " req=" << *req
<< ")";
}

@cyx1231st
Copy link
Member Author

changeset:

@cyx1231st
Copy link
Member Author

Another suspious failure from https://pulpito.ceph.com/yingxin-2023-10-18_02:44:02-crimson-rados-wip-yingxin-crimson-osd-crosscore-pg-submission-3-distro-default-smithi/7431249/ :

DEBUG 2023-10-18 03:23:27,258 [shard 2] osd - log is not dirty
=================================================================
==79463==ERROR: AddressSanitizer: heap-use-after-free on address 0x619000191fec at pc 0x55812caa7ba8 bp 0x7fc7f56e90f0 sp 0x7fc7f56e90e0
READ of size 1 at 0x619000191fec thread T2

    #0 0x55812caa7ba7 in seastar::shared_mutex::unlock() (/usr/bin/ceph-osd+0x1edccba7)
    #1 0x55812ee16ffc in auto seastar::futurize_invoke<crimson::OrderedConcurrentPhaseT<crimson::osd::SnapTrimObjSubEvent::WaitRepop>::ExitBarrier<crimson::OrderedConcurrentPhaseT<crimson::osd::SnapTrimObjSubEvent::WaitRepop>::BlockingEvent::Trigger<crimson::osd::SnapTrimObjSubEvent> >::exit()::{lambda()#1}&>(crimson::OrderedConcurrentPhaseT<crimson::osd::SnapTrimObjSubEvent::WaitRepop>::     ExitBarrier<crimson::OrderedConcurrentPhaseT<crimson::osd::SnapTrimObjSubEvent::WaitRepop>::BlockingEvent::Trigger<crimson::osd::SnapTrimObjSubEvent> >::exit()::{lambda()#1}&) (/usr/bin/ceph-osd+0x2113bffc)
    #2 0x55812ee1734f in _ZN7seastar20noncopyable_functionIFNS_6futureIvEEvEE17direct_vtable_forIZNS2_4thenIZN7crimson23OrderedConcurrentPhaseTINS7_3osd19SnapTrimObjSubEvent9WaitRepopEE11ExitBarrierINSC_13BlockingEvent7TriggerISA_EEE4exitEvEUlvE_S2_EET0_OT_EUlDpOT_E_E4callEPKS4_ (/usr/bin/ceph-osd+0x2113c34f)
    #3 0x55812c67f0ba in auto seastar::internal::future_invoke<seastar::noncopyable_function<seastar::future<void> ()>&, seastar::internal::monostate>(seastar::noncopyable_function<seastar::future<void> ()>&, seastar::internal::monostate&&) (/usr/bin/ceph-osd+0x1e9a40ba)
    #4 0x55812c6a4477 in void seastar::futurize<seastar::future<void> >::satisfy_with_result_of<seastar::future<void>::then_impl_nrvo<seastar::noncopyable_function<seastar::future<void> ()>, seastar::future<void> >(seastar::noncopyable_function<seastar::future<void> ()>&&)::{lambda(seastar::internal::promise_base_with_type<void>&&, seastar::noncopyable_function<seastar::future<void> ()>&,     seastar::future_state<seastar::internal::monostate>&&)#1}::operator()(seastar::internal::promise_base_with_type<void>&&, seastar::noncopyable_function<seastar::future<void> ()>&, seastar::future_state<seastar::internal::monostate>&&) const::{lambda()#1}>(seastar::internal::promise_base_with_type<void>&&, seastar::noncopyable_function<seastar::future<void> ()>&&) (/usr/bin/ceph-osd+0x1e9c9477)
    #5 0x55812c6c283f in seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::noncopyable_function<seastar::future<void> ()>, seastar::future<void>::then_impl_nrvo<seastar::noncopyable_function<seastar::future<void> ()>, seastar::future<void> >(seastar::noncopyable_function<seastar::future<void> ()>&&)::{lambda(seastar::internal::promise_base_with_type<void>&&,      seastar::noncopyable_function<seastar::future<void> ()>&, seastar::future_state<seastar::internal::monostate>&&)#1}, void>::run_and_dispose() (/usr/bin/ceph-osd+0x1e9e783f)
    #6 0x55813a49032d in seastar::reactor::run_tasks(seastar::reactor::task_queue&) (/usr/bin/ceph-osd+0x2c7b532d)
    #7 0x55813a52e148 in seastar::reactor::run_some_tasks() (/usr/bin/ceph-osd+0x2c853148)
    #8 0x55813a751efa in seastar::reactor::do_run() (/usr/bin/ceph-osd+0x2ca76efa)
    #9 0x55813a75582e in seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::{lambda()#3}::operator()() const (/usr/bin/ceph-osd+0x2ca7a82e)
    #10 0x55813a755dda in std::_Function_handler<void (), seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::{lambda()#3}>::_M_invoke(std::_Any_data const&) (/usr/bin/ceph-osd+0x2ca7adda)
    #11 0x55813a245437 in seastar::posix_thread::start_routine(void*) (/usr/bin/ceph-osd+0x2c56a437)
    #12 0x7fc80109f801 in start_thread (/lib64/libc.so.6+0x9f801)
    #13 0x7fc80103f44f in __GI___clone3 (/lib64/libc.so.6+0x3f44f)

0x619000191fec is located 364 bytes inside of 920-byte region [0x619000191e80,0x619000192218)
freed by thread T2 here:
    #0 0x7fc8036b73cf in operator delete(void*, unsigned long) (/lib64/libasan.so.6+0xb73cf)
    #1 0x55812ef80734 in crimson::osd::SnapTrimObjSubEvent::~SnapTrimObjSubEvent() (/usr/bin/ceph-osd+0x212a5734)

previously allocated by thread T2 here:
    #0 0x7fc8036b6367 in operator new(unsigned long) (/lib64/libasan.so.6+0xb6367)
    #1 0x55812ed5331c in auto crimson::osd::PerShardState::start_operation_may_interrupt<crimson::interruptible::interruptor<crimson::osd::IOInterruptCondition>, crimson::osd::SnapTrimObjSubEvent, boost::intrusive_ptr<crimson::osd::PG>&, hobject_t const&, snapid_t const&>(boost::intrusive_ptr<crimson::osd::PG>&, hobject_t const&, snapid_t const&) (/usr/bin/ceph-osd+0x2107831c)

Thread T2 created by T0 here:
    #0 0x7fc8036587d5 in pthread_create (/lib64/libasan.so.6+0x587d5)
    #1 0x55813a455f6b in seastar::posix_thread::posix_thread(seastar::posix_thread::attr, std::function<void ()>) (/usr/bin/ceph-osd+0x2c77af6b)

SUMMARY: AddressSanitizer: heap-use-after-free (/usr/bin/ceph-osd+0x1edccba7) in seastar::shared_mutex::unlock()
Shadow bytes around the buggy address:
  0x0c328002a3a0: fd fd fd fd fd fd fd fd fd fd fd fd fd fa fa fa
  0x0c328002a3b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c328002a3c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c328002a3d0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c328002a3e0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
=>0x0c328002a3f0: fd fd fd fd fd fd fd fd fd fd fd fd fd[fd]fd fd
  0x0c328002a400: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c328002a410: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c328002a420: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c328002a430: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c328002a440: fd fd fd fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==79463==ABORTING
daemon-helper: command failed with exit status 1

@cyx1231st cyx1231st force-pushed the wip-crimson-osd-fix-complete-exit branch from dd37ce5 to 80c49a7 Compare October 25, 2023 02:18
@cyx1231st
Copy link
Member Author

changeset: 80c49a7

…rI::cancel()

Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
…ross-core

Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
…e() and exit()

complete() should be called to leave the last phase in the normal path,
and exit() to be called in finally() to release the resources under all
circumstances.

Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
To address warning:
/root/ceph/src/crimson/osd/osd_connection_priv.h:89:27: warning:
‘crimson::osd::OSDConnectionPriv&
crimson::osd::get_osd_priv(crimson::net::Connection*)’ defined but not
used [-Wunused-function]
   89 | static OSDConnectionPriv &get_osd_priv(crimson::net::Connection
      *conn) { |                           ^~~~~~~~~~~~

Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
Split the cross-core phase into 2 independent core-local phases, and
preserve the ordering using sequential ID instead.

Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
They are supposed to be used cross-core.

Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
… req

So that print can always deal with a valid req.

Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
@cyx1231st cyx1231st force-pushed the wip-crimson-osd-fix-complete-exit branch from 80c49a7 to 0fd2e75 Compare November 2, 2023 07:29
@cyx1231st
Copy link
Member Author

Rebased to prepare for another round of tests.

@cyx1231st
Copy link
Member Author

@cyx1231st
Copy link
Member Author

jenkins retest this please

@Matan-B
Copy link
Contributor

Matan-B commented Nov 6, 2023

jenkins test make check

@cyx1231st
Copy link
Member Author

jenkins test api

@cyx1231st cyx1231st merged commit 543a62f into ceph:main Nov 7, 2023
10 of 11 checks passed
@cyx1231st cyx1231st deleted the wip-crimson-osd-fix-complete-exit branch November 7, 2023 01:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants