
crimson/osd/pg: submit_error_log send messages to osd by order #52192

Closed
wants to merge 4 commits

Conversation

Matan-B
Contributor

@Matan-B Matan-B commented Jun 26, 2023

MOSDPGUpdateLogMissing messages sent to the OSD were handled in the wrong order rather than in the order in which they were sent.

See how the LogMissingRequest with entries 69'6 is handled before the one with entries 69'4.

DEBUG 2023-06-27 11:30:21,504 [shard 0] osd - logmissing_request(id=5885, detail=LogMissingRequest(from=255 req=pg_update_log_missing(10.1a epoch 69/67 rep_tid 144115188075855936 entries 69'6 (0'0) error    10:5cea7035:::rbd_info:head by client.4188.0:2806 0.000000 -17 ObjectCleanRegions clean_offsets: [0~18446744073709551615], clean_omap: 1, new_object: 0 trim_to 0'0 roll_forward_to 0'0) v3)): starting start_pg_operation
DEBUG 2023-06-27 11:30:21,506 [shard 0] osd - logmissing_request(id=5886, detail=LogMissingRequest(from=255 req=pg_update_log_missing(10.1a epoch 69/67 rep_tid 144115188075855935 entries 69'4 (0'0) error    10:5cea7035:::rbd_info:head by client.4188.0:2805 0.000000 -17 ObjectCleanRegions clean_offsets: [0~18446744073709551615], clean_omap: 1, new_object: 0 trim_to 0'0 roll_forward_to 0'0) v3)): starting start_pg_operation

Fixes: https://tracker.ceph.com/issues/61651

@Matan-B Matan-B force-pushed the wip-matanb-crimson-61651 branch 2 times, most recently from 26a64c4 to e35523d on June 29, 2023 08:34
@Matan-B Matan-B changed the title from "crimson/osd_operations/logmissing_request: Add to Peering PG Pipeline" to "crimson/osd/pg: submit_error_log send messages to osd by order" on Jun 29, 2023
@Matan-B Matan-B marked this pull request as ready for review June 29, 2023 08:39
@Matan-B Matan-B requested a review from a team as a code owner June 29, 2023 08:39
@Matan-B
Contributor Author

Matan-B commented Jun 29, 2023

@cyx1231st's comment from a different PR may relate here:

I'm not sure whether there is any assumption about the send message ordering in this case, but it's worth noting that cross-core calls cannot guarantee FIFO in the target core. With the current implementation, the future returned by Connection::send() should be chained correctly before sending the next message.

Also see #51147 (comment)

@Matan-B Matan-B requested a review from cyx1231st June 29, 2023 11:47
@Matan-B
Contributor Author

Matan-B commented Jun 29, 2023

DNM, as this is not resolved yet.
https://pulpito.ceph.com/matan-2023-06-29_10:59:39-crimson-rados-wip-matanb-crimson-61651-distro-crimson-smithi/7321209

DEBUG 2023-06-29 11:36:16,202 [shard 0] osd - logmissing_request(id=39134, detail=LogMissingRequest(from=255 req=pg_update_log_missing(19.4 epoch 154/151 rep_tid 72057594037928191 entries 154'3 (0'0) error    19:2062e8f2:::rb.0.1106.5aa4f408.00000000000b:head by client.4358.0:276 0.000000 -2 ObjectCleanRegions clean_offsets: [0~18446744073709551615], clean_omap: 1, new_object: 0 trim_to 0'0 roll_forward_to 154'1) v3)): starting start_pg_operation
DEBUG 2023-06-29 11:36:16,205 [shard 0] osd - logmissing_request(id=39136, detail=LogMissingRequest(from=255 req=pg_update_log_missing(19.4 epoch 154/151 rep_tid 72057594037928190 entries 154'2 (0'0) error    19:20a48d33:::rb.0.1106.5aa4f408.000000000004:head by client.4358.0:269 0.000000 -2 ObjectCleanRegions clean_offsets: [0~18446744073709551615], clean_omap: 1, new_object: 0 trim_to 0'0 roll_forward_to 154'1) v3)): starting start_pg_operation

@Matan-B Matan-B added the DNM label Jun 29, 2023
@cyx1231st
Member

@cyx1231st's comment from a different PR may relate here:

This won't cause any issues unless the sending involves cross-core. If there is no cross-core hop, send() will enqueue the message inside the connection immediately.

This might be the reason we don't see any real impact yet.

Use chained futurized `send_to_osd()` instead of voided `send_cluster_message()`.

Fixes: https://tracker.ceph.com/issues/61651

Signed-off-by: Matan Breizman <mbreizma@redhat.com>
Payload is already decoded in IOHandler::read_message (decode_message).

Signed-off-by: Matan Breizman <mbreizma@redhat.com>
Signed-off-by: Matan Breizman <mbreizma@redhat.com>
Signed-off-by: Matan Breizman <mbreizma@redhat.com>
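
For illustration only (not code from this PR): a minimal seastar sketch of the difference between discarding the send futures and chaining them. send_one() is a hypothetical stand-in for a message send that may hop to the connection's core; the artificial delays only simulate that hop.

#include <seastar/core/app-template.hh>
#include <seastar/core/future.hh>
#include <seastar/core/sleep.hh>
#include <chrono>
#include <cstdio>

using namespace std::chrono_literals;

// Hypothetical stand-in for a message send that may hop cores: pretend the
// first message takes longer to reach the connection core than the second.
seastar::future<> send_one(int seq) {
  auto delay = (seq == 1) ? 10ms : 1ms;
  return seastar::sleep(delay).then([seq] {
    std::printf("enqueued message %d\n", seq);
  });
}

// Fire-and-forget (the "voided" pattern the commit message replaces): the
// returned futures are discarded, so message 2 can be enqueued before message 1.
void send_unordered() {
  (void)send_one(1);
  (void)send_one(2);
}

// Chained (the "futurized" pattern): message 2 is sent only after message 1's
// send future resolves, matching the ordering contract on Connection::send().
seastar::future<> send_ordered() {
  return send_one(1).then([] {
    return send_one(2);
  });
}

int main(int argc, char** argv) {
  seastar::app_template app;
  return app.run(argc, argv, [] {
    send_unordered();                        // typically prints 2 before 1
    return seastar::sleep(50ms).then([] {
      return send_ordered();                 // always prints 1 before 2
    });
  });
}
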
@Matan-B
Contributor Author

Matan-B commented Jul 4, 2023

@cyx1231st
Member

cyx1231st commented Jul 5, 2023

See

/**
 * send
 *
 * Send a message over a connection that has completed its handshake.
 *
 * May be invoked from any core, but that requires to chain the returned
 * future to preserve ordering.
 */
virtual seastar::future<> send(MessageURef msg) = 0;

Currently the ordering is guaranteed for msg_a and msg_b if and only if the user logic can be reduced to:
[A]

return conn->send(msg_a).then([conn] {
  return conn->send(msg_b);
});

The other cases can cause reordering, even if the send()'s future is not discarded and the sender sends msg_a before msg_b, such as:
[B] msg_a is dispatched before msg_b

OSD::ms_dispatch(conn, msg_a) {
  gate->dispatch_in_background([conn, msg_a] {
    return conn->send(msg_a);
  });
}
OSD::ms_dispatch(conn, msg_b) {
  gate->dispatch_in_background([conn, msg_b] {
    return conn->send(msg_b);
  });
}

If guaranteeing [A] is challenging from the OSD's perspective, we might need to support send ordering without requiring the send()'s future to be chained, so that [B] also works.

cc @athanatos
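
If guaranteeing [A] for every caller turns out to be impractical, one possible shape for supporting [B] is a per-connection wrapper that chains sends internally. A minimal sketch of that idea, not existing crimson code: OrderedSender is hypothetical, and the placeholder Message/Connection types only mimic the send() signature.

#include <seastar/core/future.hh>
#include <memory>
#include <utility>

// Placeholders standing in for the real messenger types; only the
// send(MessageURef) -> future<> shape matters for this sketch.
struct Message {};
using MessageURef = std::unique_ptr<Message>;

struct Connection {
  seastar::future<> send(MessageURef) {
    // The real Connection::send() may hop to the connection's core;
    // here we just pretend it completed.
    return seastar::make_ready_future<>();
  }
};

// Hypothetical per-connection wrapper: every send is chained onto the
// previous one, so callers that cannot chain the returned future themselves
// (case [B]) still get FIFO enqueue ordering.
class OrderedSender {
  Connection& conn;
  seastar::future<> last_send = seastar::make_ready_future<>();
public:
  explicit OrderedSender(Connection& c) : conn(c) {}

  void send(MessageURef msg) {
    last_send = std::move(last_send).then(
        [this, msg = std::move(msg)]() mutable {
          return conn.send(std::move(msg));
        });
  }

  // Callers that care about completion (e.g. flushing before shutdown)
  // can wait for the whole chain to drain.
  seastar::future<> flush() {
    return std::exchange(last_send, seastar::make_ready_future<>());
  }
};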

@Matan-B
Contributor Author

Matan-B commented Jul 5, 2023

@cyx1231st, @rzarzynski
The issue seems to be caused by the multi-core enablement, as suspected.
It's confirmed by running 15 rbd_api_tests jobs (which otherwise fail almost constantly) with seastar smp 1:
https://pulpito.ceph.com/matan-2023-07-05_15:20:07-crimson-rados-main-distro-crimson-smithi/
Matan-B@5ff255f

seastar smp 3 run (main) for comparison:
https://pulpito.ceph.com/matan-2023-07-05_17:42:09-crimson-rados-main-distro-crimson-smithi/

.then([this] {
  return shard_services.send_to_osd(peer.osd,
                                    std::move(log_m),
                                    get_osdmap_epoch());
Member

send_to_osd() is now chained inside CommonPGPipeline::process which is an OrderedExclusivePhase. IIUC this should be able to serialize message sending across phases (i.e. #52192 (comment) case [A]).

So, is it guaranteed that different tids must be disjoint to enter CommonPGPipeline::process?

The correct serialization can be verified by adding another log (as finish-send-osd) at L968 below after send_to_osd() to see if sends across tids are really serialized, so that:

  • start-send-osd rep_tid=72057594037928058
  • finish-send-osd rep_tid=72057594037928058
  • start-send-osd rep_tid=72057594037928059
  • finish-send-osd rep_tid=72057594037928059

If the above order is correct when the reordering happens, at least we can be sure that #52192 (comment) is unrelated to this issue as [A] is already enforced.
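
A minimal sketch of the suggested bracketing, assuming a seastar::logger and treating the chained shard_services.send_to_osd(...) call as an opaque send_fn; traced_send() itself is a hypothetical helper, not code from this PR.

#include <seastar/core/future.hh>
#include <seastar/util/log.hh>
#include <cstdint>

static seastar::logger lg{"osd"};

// Hypothetical helper: wrap the chained send (e.g. the send_to_osd() call in
// the hunk above) with start/finish log lines. If sends across rep_tids are
// serialized by the pipeline, the log shows start/finish/start/finish;
// interleaved start lines indicate reordering on the sender side.
template <typename SendFn>
seastar::future<> traced_send(uint64_t rep_tid, SendFn send_fn) {
  lg.debug("start-send-osd rep_tid={}", rep_tid);
  return send_fn().then([rep_tid] {
    lg.debug("finish-send-osd rep_tid={}", rep_tid);
  });
}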

Contributor Author

send_to_osd() is now chained inside CommonPGPipeline::process which is an OrderedExclusivePhase. IIUC this should be able to serialize message sending across phases (i.e. #52192 (comment) case [A]).

So, is it guaranteed that different tids must be disjoint to enter CommonPGPipeline::process?

This should be guaranteed. However, it seems like this is not true since we have racing senders.

The correct serialization can be verified by adding another log (as finish-send-osd) at L968 below after send_to_osd() to see if sends across tids are really serialized, so that:
...
If the above order is correct when the reordering happens, at least we can be sure that #52192 (comment) is unrelated to this issue as [A] is already enforced.

After adding the (start/finish-send) logs, it can be seen that the re-ordering is happening on the sender side. I will verify that [A] is actually enforced.
See: https://gist.github.com/Matan-B/b28ae9fe2ce43efe58f8ce4555c28ead?permalink_comment_id=4623285#gistcomment-4623285

@athanatos
Contributor

Currently the ordering is guaranteed for msg_a and msg_b if and only if the user logic can be reduced to: [A]
...
If guaranteeing [A] is challenging from the OSD's perspective, we might need to support send ordering without requiring the send()'s future to be chained, so that [B] also works.

cc @athanatos

Yeah, that's how I'd expect the interface to work. The change in this PR to chain the send_to_osd future within the Pipeline stage should suffice to ensure the above condition. I take it we're still seeing the same issue even with that fix?

@cyx1231st
Member

I take it we're still seeing the same issue even with that fix?

It seems @Matan-B still sees this issue after 436706c, right?

Inside messenger,

  • for the send path, the message will be enqueued FIFO once the connection core is reached, at:

    assert(seastar::this_shard_id() == get_shard_id());
    logger().trace("{} do_send() got message -- {}", conn, *msg);
    if (get_io_state() != io_state_t::drop) {
      out_pending_msgs.push_back(std::move(msg));

  • for the ms_dispatch() path, the message will be dispatched to OSD::ms_dispatch() synchronously at:

    auto msg_ref = MessageRef{message, false};
    assert(ctx.get_io_state() == io_state_t::open);
    assert(get_io_state() == io_state_t::open);
    ceph_assert_always(conn_ref);
    // throttle the reading process by the returned future
    return dispatchers.ms_dispatch(conn_ref, std::move(msg_ref));

They have not been logically changed by the multi-core msgr refactorings.

@Matan-B
Contributor Author

Matan-B commented Jul 9, 2023

Yeah, that's how I'd expect the interface to work. The change in this PR to chain the send_to_osd future within the Pipeline stage should suffice to ensure the above condition. I take it we're still seeing the same issue even with that fix?

Yes. There might be another missing piece here, since there are racing senders. Still WIP.
#52192 (comment)

@Matan-B
Contributor Author

Matan-B commented Jul 9, 2023

I take it we're still seeing the same issue even with that fix?

Seems @Matan-B still see this issue after 436706c, right?

Yes, although less frequently - see job 7331433
https://pulpito.ceph.com/matan-2023-07-09_07:24:53-crimson-rados-wip-matanb-crimson-only-61651-distro-crimson-smithi/
https://github.com/ceph/ceph-ci/commits/wip-matanb-crimson-only-61651


github-actions bot commented Nov 1, 2023

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label Nov 1, 2023
@Matan-B Matan-B closed this Nov 1, 2023