crimson/osd/pg: submit_error_log send messages to osd by order #52192
Conversation
@cyx1231st's comment from a different PR may relate here:

> DNM, as this is not resolved yet.
>
> This won't cause any issues unless the sending involves a cross-core hop. If there is no cross-core hop, send() enqueues the message inside the connection immediately. This might be the reason we haven't seen any real impact yet.
Use chained futurized `send_to_osd()` instead of voided `send_cluster_message()`.

Fixes: https://tracker.ceph.com/issues/61651

Signed-off-by: Matan Breizman <mbreizma@redhat.com>

The payload is already decoded in `IOHandler::read_message` (`decode_message`).

Signed-off-by: Matan Breizman <mbreizma@redhat.com>
See ceph/src/crimson/net/Connection.h, lines 79 to 87 in fa09e73.

Currently the ordering is guaranteed for msg_a and msg_b if and only if the user logic can be reduced to:

```cpp
return conn->send(msg_a).then([conn] {
  return conn->send(msg_b);
});
```

The other cases can cause reordering, even if send()'s future is not discarded and the sender sends msg_a before msg_b, such as:

```cpp
OSD::ms_dispatch(conn, msg_a) {
  gate->dispatch_in_background([conn, msg_a] {
    return conn->send(msg_a);
  });
}

OSD::ms_dispatch(conn, msg_b) {
  gate->dispatch_in_background([conn, msg_b] {
    return conn->send(msg_b);
  });
}
```

If guaranteeing [A] (the chained case) is challenging from the OSD's perspective, we might need to support send ordering without requiring send()'s future to be chained, so that [B] (the dispatch-in-background case) also works. cc @athanatos
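The reordering described above can be illustrated with a minimal, single-threaded toy model. Note this is a hypothetical sketch: `Reactor` and `Connection` here are illustrative stand-ins, not Seastar or Crimson APIs. The reactor's LIFO drain models an unlucky cross-core scheduling order; chaining the second send behind the first's completion (case [A]) keeps the wire order correct, while fire-and-forget submission (case [B]) does not.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <vector>

// Toy stand-in for a cross-core hop: tasks are submitted now but run later.
struct Reactor {
  std::deque<std::function<void()>> tasks;
  void submit(std::function<void()> t) { tasks.push_back(std::move(t)); }
  // Draining LIFO models an unlucky scheduling order across cores.
  void run_unordered() {
    while (!tasks.empty()) {
      auto t = std::move(tasks.back());
      tasks.pop_back();
      t();
    }
  }
};

// Toy connection: records the order in which messages actually hit the wire.
struct Connection {
  std::vector<std::string> wire;  // order seen by the peer
  Reactor* reactor;
  void send(const std::string& msg, std::function<void()> on_done) {
    // The actual enqueue happens on the "other core" (a deferred task).
    reactor->submit([this, msg, on_done] {
      wire.push_back(msg);
      if (on_done) on_done();
    });
  }
};
```

With this model, submitting both sends up front lets the reactor run them in reverse, while sending msg_b only from msg_a's completion callback preserves the order regardless of how the reactor schedules tasks.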
@cyx1231st, @rzarzynski
Reviewed code (from the diff):

```cpp
.then([this] {
  return shard_services.send_to_osd(peer.osd,
                                    std::move(log_m),
                                    get_osdmap_epoch());
```
`send_to_osd()` is now chained inside `CommonPGPipeline::process`, which is an `OrderedExclusivePhase`. IIUC this should be able to serialize message sending across phases (i.e. #52192 (comment) case [A]).

So, is it guaranteed that different tids must be disjoint to enter `CommonPGPipeline::process`?
The correct serialization can be verified by adding another log (say, finish-send-osd) at L968 below, after `send_to_osd()`, to see whether sends across tids are really serialized, so that:
- start-send-osd rep_tid=72057594037928058
- finish-send-osd rep_tid=72057594037928058
- start-send-osd rep_tid=72057594037928059
- finish-send-osd rep_tid=72057594037928059
If the above order is correct when the reordering happens, at least we can be sure that #52192 (comment) is unrelated to this issue as [A] is already enforced.
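The serialization property described above can be stated as a simple invariant over the log stream: every start-send-osd must be followed by the finish-send-osd for the same rep_tid before the next start appears. A small hypothetical checker (the `Event` type and `sends_serialized` helper are illustrative, not part of Ceph) makes the invariant concrete:

```cpp
#include <cassert>
#include <string>
#include <vector>

// One parsed log event: kind is "start" or "finish", tid is the rep_tid.
struct Event {
  std::string kind;
  unsigned long long tid;
};

// Sends are serialized iff each "start" is closed by a matching "finish"
// for the same tid before any other "start" occurs.
bool sends_serialized(const std::vector<Event>& log) {
  bool in_flight = false;
  unsigned long long cur = 0;
  for (const auto& e : log) {
    if (e.kind == "start") {
      if (in_flight) return false;  // overlapping sends: not serialized
      in_flight = true;
      cur = e.tid;
    } else {  // "finish"
      if (!in_flight || e.tid != cur) return false;  // unmatched finish
      in_flight = false;
    }
  }
  return !in_flight;  // every start must have been finished
}
```

Applied to the example above, the strictly alternating start/finish sequence for rep_tid 72057594037928058 and then 72057594037928059 satisfies the invariant; two interleaved starts would violate it.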
> `send_to_osd()` is now chained inside `CommonPGPipeline::process`, which is an `OrderedExclusivePhase`. IIUC this should be able to serialize message sending across phases (i.e. #52192 (comment) case [A]).
>
> So, is it guaranteed that different tids must be disjoint to enter `CommonPGPipeline::process`?
This should be guaranteed. However, it seems that it is not, since we have racing senders.
> The correct serialization can be verified by adding another log (as finish-send-osd) at L968 below after `send_to_osd()` to see if sends across tids are really serialized, so that:
>
> ...
>
> If the above order is correct when the reordering happens, at least we can be sure that #52192 (comment) is unrelated to this issue as [A] is already enforced.
After adding the (start/finish-send) logs, it can be seen that the re-ordering happens on the sender side. I will verify that [A] is actually enforced.
See: https://gist.github.com/Matan-B/b28ae9fe2ce43efe58f8ce4555c28ead?permalink_comment_id=4623285#gistcomment-4623285
Yeah, that's how I'd expect the interface to work. The change in this PR to chain the send_to_osd future within the Pipeline stage should suffice to ensure the above condition. I take it we're still seeing the same issue even with that fix?
Seems @Matan-B still sees this issue after 436706c, right? Inside the messenger, these paths are not logically changed by the refactorings for the multi-core msgr.
Yes. There might be another missing piece here, since there are racing senders. Still WIP.
Yes, although less frequently - job 7331433
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
`MOSDPGUpdateLogMissing` messages sent to the OSD were handled in the wrong order rather than the order in which they were sent. See how LogMissingRequest's `entries 69'6` is handled before `entries 69'4`.

Fixes: https://tracker.ceph.com/issues/61651