crimson/osd: decouple cross-core pg submission out of the OrderedExclusivePhase #53537
d8300ef
to
144e568
Compare
Converted to draft; I will submit the changes that address the crimson OSD starvation issue in the same PR.
144e568
to
dfecd83
Compare
dfecd83
to
3bb2e74
Compare
Sorry, I submitted an outdated branch; repushed.
  phase = nullptr;
}
if (phase) {
  std::ignore = seastar::smp::submit_to(
@athanatos Probably it's simpler to use `OrderedConcurrentPhaseT` only in the local reactor and, for the special cross-core case, to rely on the newly introduced class `crosscore_t`.
Unlike the original `OrderedConcurrentPhaseT`, `crosscore_t` doesn't rely on any additional `submit_to()` to preserve the ordering. `submit_to()` is usually expensive and should be avoided wherever possible.
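To make the idea of ordering without an extra `submit_to()` concrete, here is a minimal single-threaded sketch, assuming the ordering is carried by a sequence number stamped on the source side and checked on the target side. The class name `crosscore_ordering_sketch` and the methods `prepare_submit()` and `on_arrival()` are hypothetical and are not taken from the PR; no Seastar calls are used.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the ordering helper discussed above; the names
// and layout are assumptions, not the code from the PR.
class crosscore_ordering_sketch {
public:
  using seq_t = std::uint64_t;

  // Source side: stamp an operation before it is sent to the target core.
  seq_t prepare_submit() { return ++out_seq_; }

  // Target side: run `fn` once every lower sequence number has run.
  // Operations that arrive early are parked until their turn comes.
  void on_arrival(seq_t seq, std::function<void()> fn) {
    assert(seq > in_seq_ && "sequence numbers must not repeat");
    parked_.emplace(seq, std::move(fn));
    // Drain everything that is now contiguous with the last completed op.
    for (auto it = parked_.begin();
         it != parked_.end() && it->first == in_seq_ + 1;
         it = parked_.erase(it)) {
      it->second();
      ++in_seq_;
    }
  }

private:
  seq_t out_seq_ = 0;  // last sequence handed out on the source side
  seq_t in_seq_ = 0;   // last sequence completed on the target side
  std::map<seq_t, std::function<void()>> parked_;
};

// Deliver ops 1..3 in the scrambled order 2, 3, 1 and record the order in
// which they actually execute.
std::vector<int> demo_reordering() {
  crosscore_ordering_sketch ordering;
  std::vector<int> executed;
  const auto s1 = ordering.prepare_submit();
  const auto s2 = ordering.prepare_submit();
  const auto s3 = ordering.prepare_submit();
  ordering.on_arrival(s2, [&] { executed.push_back(2); });
  ordering.on_arrival(s3, [&] { executed.push_back(3); });
  ordering.on_arrival(s1, [&] { executed.push_back(1); });
  return executed;
}
```

In this toy model the three operations run in submission order even though they arrive scrambled, and the only cross-side state is the sequence number itself.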
Yeah, much cleaner.
@@ -11,13 +13,80 @@

namespace crimson::osd {

/**
 * crosscore_t
crosscore_ordering_t ?
renamed
3bb2e74
to
be23265
Compare
changeset:
be23265
to
9587ffd
Compare
Rebased to prepare for the CI build.
jenkins test api
Looks like the ref-counter is somehow broken:
ceph-osd: /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/18.0.0-6684-g9587ffdf/rpm/el9/BUILD/ceph-18.0.0-6684-g9587ffdf/redhat-linux-build/boost/include/boost/smart_ptr/intrusive_ptr.hpp:201: T* boost::intrusive_ptr::operator->() const [with T = MOSDPGUpdateLogMissingReply]: Assertion `px != 0' failed.
Aborting on shard 1.
}, [](std::exception_ptr) {
  return seastar::now();
}, pg).finally([this, ref] {
  logger().debug("{}: exit", *this);
@athanatos FYI, the ref-counter issue of #53537 (comment) turns out to be caused by this line trying to print `this` op without a valid req (moved away at line 70 above), which causes a segfault according to
ceph/src/crimson/osd/osd_operations/logmissing_request_reply.cc
Lines 28 to 34 in 9587ffd
void LogMissingRequestReply::print(std::ostream& os) const
{
  os << "LogMissingRequestReply("
     << "from=" << req->from
     << " req=" << *req
     << ")";
}
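The failure mode described in this comment can be reproduced in miniature. The sketch below is an illustration under stated assumptions: `std::shared_ptr` stands in for `boost::intrusive_ptr`, and `Message`, `ReplyOp`, `describe()`, `take_req_moved()`, and `take_req_copied()` are hypothetical names, not the Ceph types or the actual fix.

```cpp
#include <cassert>
#include <memory>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical stand-in for the message type carried by the op.
struct Message {
  int from = 0;
};

struct ReplyOp {
  std::shared_ptr<Message> req;

  // Mirrors the shape of the print() shown above: it dereferences req
  // unconditionally, so req must still be valid when this runs.
  std::string describe() const {
    assert(req && "print path needs a valid req");
    std::ostringstream os;
    os << "ReplyOp(from=" << req->from << ")";
    return os.str();
  }

  // Problematic pattern: moving req out leaves the op unprintable.
  std::shared_ptr<Message> take_req_moved() { return std::move(req); }

  // Safer pattern: hand out a copy of the reference and keep our own.
  std::shared_ptr<Message> take_req_copied() const { return req; }
};

// Returns the log line produced after the request has been handed off
// by copy, which keeps the op printable.
std::string log_after_handoff() {
  ReplyOp op;
  op.req = std::make_shared<Message>();
  op.req->from = 3;
  auto handed_off = op.take_req_copied();  // op.req stays valid
  (void)handed_off;
  return op.describe();
}
```

Calling `describe()` after `take_req_moved()` would trip the assertion in this sketch, which is the analogue of the `px != 0` assertion in the log above.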
changeset:
Another suspicious failure from https://pulpito.ceph.com/yingxin-2023-10-18_02:44:02-crimson-rados-wip-yingxin-crimson-osd-crosscore-pg-submission-3-distro-default-smithi/7431249/ :
dd37ce5
to
80c49a7
Compare
changeset: 80c49a7
…rI::cancel() Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
…ross-core Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
…e() and exit() complete() should be called to leave the last phase in the normal path, and exit() to be called in finally() to release the resources under all circumstances. Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
To address warning: /root/ceph/src/crimson/osd/osd_connection_priv.h:89:27: warning: ‘crimson::osd::OSDConnectionPriv& crimson::osd::get_osd_priv(crimson::net::Connection*)’ defined but not used [-Wunused-function] 89 | static OSDConnectionPriv &get_osd_priv(crimson::net::Connection *conn) { | ^~~~~~~~~~~~ Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
Split the cross-core phase into 2 independent core-local phases, and preserve the ordering using sequential ID instead. Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
They are supposed to be used cross-core. Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
… req So that print can always deal with a valid req. Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
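The split between `complete()` and `exit()` described in the commit messages above can be sketched as follows. This is a minimal illustration in plain C++, assuming a try/catch followed by an unconditional call plays the role of `finally()`; `PipelineHandleSketch`, `enter()`, and `run_op()` are hypothetical names, and the logged strings exist only to make the control flow visible.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical handle; only the complete()/exit() split follows the commit
// message, everything else is an assumption made for illustration.
class PipelineHandleSketch {
public:
  explicit PipelineHandleSketch(std::vector<std::string>& log) : log_(log) {}

  void enter(const std::string& phase) {
    in_phase_ = true;
    log_.push_back("enter " + phase);
  }

  // Normal path: leave the last phase once the operation has succeeded.
  void complete() {
    if (in_phase_) {
      in_phase_ = false;
      log_.push_back("complete");
    }
  }

  // Cleanup path: release whatever is still held; safe after complete().
  void exit() {
    if (in_phase_) {
      in_phase_ = false;
      log_.push_back("exit (abnormal)");
    }
    log_.push_back("released");
  }

private:
  std::vector<std::string>& log_;
  bool in_phase_ = false;
};

// Runs one operation; `fail` simulates an error in the middle of the phase.
std::vector<std::string> run_op(bool fail) {
  std::vector<std::string> log;
  PipelineHandleSketch handle(log);
  try {
    handle.enter("process");
    if (fail) {
      throw std::runtime_error("interrupted");
    }
    handle.complete();  // normal path only
  } catch (const std::exception&) {
    log.push_back("error");
  }
  handle.exit();  // plays the role of finally(): always runs
  return log;
}
```

On success the log reads enter, complete, released; on failure `complete()` is skipped and `exit()` both leaves the phase and releases the resources.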
80c49a7
to
0fd2e75
Compare
Rebased to prepare for another round of tests.
build: https://github.com/ceph/ceph-ci/tree/wip-yingxin-crimson-osd-crosscore-pg-submission-5
issues:
jenkins retest this please
jenkins test make check
jenkins test api
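The first 7 commits are cleanups and potential fixes to the pipelining infrastructure, including `PipelineExitBarrierI` and `PipelineHandle::complete()` and `exit()`.
The last 2 commits decouple cross-core pg submission out of the OrderedExclusivePhase by splitting the cross-core phase into 2 independent core-local phases and preserving the ordering with a sequential ID instead.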
A rough evaluation shows up to a 235% end performance improvement at 8 cores with cyanstore and 1 OSD:
![iops-ref-opt](https://private-user-images.githubusercontent.com/7736006/270851863-26ddb302-1d83-4401-b950-206002e1e42c.png)
Note that the OPT case was evaluated with #53130.
@liu-chunmei has also observed the excessive latencies around the cross-core submissions.