Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

msgr/simple: set Pipe::out_seq to in_seq of the connecting side #21585

Merged
merged 1 commit into from Apr 26, 2018

Conversation

xxhdx1985126
Copy link
Contributor

If the current pipe has just replaced a newly created "existing pipe",
its out_q would be empty, in which case out_seq cannot be move up to
in_seq of the connecting side and later requests forwarded to leader
wouldn't be acked.

Fixes: http://tracker.ceph.com/issues/23807
Signed-off-by: Xuehan Xu xuxuehan@360.cn

@tchaikov tchaikov changed the title Mon: set Pipe::out_seq to in_seq of the connecting side msgr/simple: set Pipe::out_seq to in_seq of the connecting side Apr 22, 2018
@@ -1424,6 +1424,8 @@ void Pipe::discard_requeued_up_to(uint64_t seq)
rq.pop_front();
out_seq++;
}

out_seq = seq;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if out_q is empty, this method would be a no-op, as it returns at https://github.com/ceph/ceph/pull/21585/files#diff-459b1eb43a0f5b6b6b430e3c67fee6ceR1414 . so, in which case, your change kicks in?

@xxhdx1985126
Copy link
Contributor Author

@tchaikov Sorry, I uploaded the wrong code, just corrected that.

Copy link
Member

@liewegas liewegas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks right to me! Did you check to see if msg/async has the same bug?

@liewegas liewegas added this to the mimic milestone Apr 23, 2018
@yuriw
Copy link
Contributor

yuriw commented Apr 23, 2018

@gregsfortytwo
Copy link
Member

Hmm, blindly setting seq values makes me nervous. Especially in the case where we’re not looking at the queue at all — that must be a special case for a reason?

@xxhdx1985126
Copy link
Contributor Author

@gregsfortytwo
Hi, greg.
I thought that the main purpose of the variables "out_seq", "in_seq", "in_seq_acked" is to serve as the seq number of the message transmission to make sure that messages are reliably delivered, and it seems that they don't have direct relation with the out_q; and on the other hand, we encountered a problem caused by the inconsistency of the out_seq and in_seq on both sides of the transmission as demonstrated in http://tracker.ceph.com/issues/23807, so we chose to set out_seq to the in_seq of the other side. We think that, as long as the purpose of those seqs is what we thought, this modification wouldn't cause anything bad. Is this right? Thanks:-)

@xxhdx1985126
Copy link
Contributor Author

@yuriw I still can't see the test result, could you paste it here?

If the current pipe has just replaced a newly created "existing pipe",
its out_q would be empty, in which case out_seq cannot be move up to
in_seq of the connecting side and later requests forwarded to leader
wouldn't be acked.

Fixes: http://tracker.ceph.com/issues/23807
Signed-off-by: Xuehan Xu <xuxuehan@360.cn>
@xxhdx1985126
Copy link
Contributor Author

xxhdx1985126 commented Apr 24, 2018

@gregsfortytwo
Hi, greg. I got what you mean, out_seq and in_seq also identify whether a message has been received. So I changed the modification to only set out_seq when out_q is empty, since our problem only happens when the Pipe or AsyncConnection is newly created and if out_q is not empty, the pipe or asyncconnection mustn't be newly created and the out_seq could have been decreased by "requeue_sent".

@tchaikov
Copy link
Contributor

the Pipe or AsyncConnection is newly created and if out_q is not empty, the pipe or asyncconnection mustn't be newly created and the out_seq could have been decreased by "requeue_sent".

i am confused. is the connection a new one or not in your case?

@xxhdx1985126
Copy link
Contributor Author

xxhdx1985126 commented Apr 24, 2018

@tchaikov Um, the error happens this way: the accepting side is just about to connect to the other side and has just created a new pipe, and at this moment the other side establishes a connect, so upon receiving that connect message, the accepting side will have a newly created existing pipe which is all empty. And if the connecting side is an old long-existed pipe, in which the in_seq and in_seq_acked is not zero, then after the connect phase, the out_seq on the accepting side would be inconsistent with the in_seq and in_seq_acked of the connecting side, which, before the accepting side increases to large enough, would make the connecting side not ack any of the messages sent from the accepting side which it should be. In our case, the error led to the accepting side's mon_daemon_bytes filled up, since the connecting side's in_seq_acked is very large and so many messages sent from peon monitor is not acked.

@xxhdx1985126
Copy link
Contributor Author

retest this please

@yuriw yuriw merged commit 8c0f2b6 into ceph:master Apr 26, 2018
@xxhdx1985126 xxhdx1985126 deleted the wip-23807 branch April 27, 2018 09:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants