msgr/simple: set Pipe::out_seq to in_seq of the connecting side #21585
src/msg/simple/Pipe.cc
Outdated
@@ -1424,6 +1424,8 @@ void Pipe::discard_requeued_up_to(uint64_t seq)
     rq.pop_front();
     out_seq++;
   }
+
+  out_seq = seq;
If out_q is empty, this method is a no-op, because it returns at https://github.com/ceph/ceph/pull/21585/files#diff-459b1eb43a0f5b6b6b430e3c67fee6ceR1414 . So in which case does your change kick in?
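To make the early-return concern concrete, here is a minimal sketch of the method's shape as discussed in this review. It is not the real Ceph code: `PipeModel`, `kPrioHighest`, and the use of bare sequence numbers in place of `Message*` are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <map>

// Assumed stand-in for CEPH_MSG_PRIO_HIGHEST; the value is illustrative.
constexpr int kPrioHighest = 255;

// Hypothetical, stripped-down model of the pipe state discussed here.
struct PipeModel {
  // priority -> sequence numbers of requeued messages
  std::map<int, std::list<uint64_t>> out_q;
  uint64_t out_seq = 0;

  // Discard requeued messages the peer has already seen (seq <= peer's in_seq).
  void discard_requeued_up_to(uint64_t seq) {
    auto it = out_q.find(kPrioHighest);
    if (it == out_q.end())
      return;  // nothing requeued: out_seq is left untouched
    std::list<uint64_t>& rq = it->second;
    while (!rq.empty()) {
      uint64_t s = rq.front();
      if (s == 0 || s > seq)
        break;  // unsent message or one the peer has not seen yet
      rq.pop_front();
      out_seq++;
    }
    if (rq.empty())
      out_q.erase(it);
  }
};
```

In this model, calling `discard_requeued_up_to(1000)` on a freshly constructed `PipeModel` leaves `out_seq` at 0, so an assignment placed after the loop is never reached on that path.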
@tchaikov Sorry, I uploaded the wrong code. I have just corrected that.
Looks right to me! Did you check to see if msg/async has the same bug?
Hmm, blindly setting seq values makes me nervous, especially in the case where we're not looking at the queue at all. That must be a special case for a reason?
@gregsfortytwo
@yuriw I still can't see the test result. Could you paste it here?
If the current pipe has just replaced a newly created "existing pipe",
its out_q would be empty, in which case out_seq cannot be moved up to
in_seq of the connecting side and later requests forwarded to the leader
wouldn't be acked.

Fixes: http://tracker.ceph.com/issues/23807
Signed-off-by: Xuehan Xu <xuxuehan@360.cn>
@gregsfortytwo
I am confused. Is the connection a new one or not in your case?
@tchaikov The error happens this way: the accepting side is about to connect to the other side and has just created a new pipe. At that moment the other side initiates a connection, so when the accepting side receives the connect message, its "existing pipe" is a newly created one that is completely empty. If the connecting side's pipe is an old, long-lived one whose in_seq and in_seq_acked are nonzero, then after the connect phase the accepting side's out_seq is inconsistent with the connecting side's in_seq and in_seq_acked. Until the accepting side's out_seq grows large enough, the connecting side does not ack any of the messages sent by the accepting side, although it should. In our case, this caused the accepting side's mon_daemon_bytes to fill up, because the connecting side's in_seq_acked was very large and many messages sent from the peon monitor were never acked.
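The mismatch described above can be sketched with a toy model. Everything here is hypothetical (`Sender`, `Receiver`, `accepted_count`), and the rule that the receiver ignores any message whose seq is not greater than its in_seq is an assumption about the mechanism, not a quote of the messenger code.

```cpp
#include <cassert>
#include <cstdint>

// Accepting side: a freshly created pipe, so out_seq starts at 0.
struct Sender {
  uint64_t out_seq = 0;
  uint64_t next_seq() { return ++out_seq; }
};

// Connecting side: a long-lived pipe with a large in_seq.
struct Receiver {
  uint64_t in_seq;
  // Returns true if the message is accepted (and would later be acked).
  // Assumption: anything with seq <= in_seq is treated as already seen.
  bool receive(uint64_t seq) {
    if (seq <= in_seq)
      return false;
    in_seq = seq;
    return true;
  }
};

// Send n messages and count how many the receiver accepts.
// Both arguments are taken by value so each run starts from the given state.
inline unsigned accepted_count(Sender s, Receiver r, unsigned n) {
  unsigned ok = 0;
  for (unsigned i = 0; i < n; ++i)
    if (r.receive(s.next_seq()))
      ++ok;
  return ok;
}
```

With a receiver at in_seq 1000, a sender starting from out_seq 0 gets none of its first 10 messages accepted, and only begins to be accepted once its own counter passes 1000. A sender whose out_seq was first set to 1000 has all 10 accepted, which is the effect the patch aims for.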
retest this please