Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

luminous: msg/async: do not bump connect_seq for fault during ACCEPTING_SESSION #29788

Closed
wants to merge 1 commit into from
Closed

Conversation

chenerqi
Copy link

@chenerqi chenerqi commented Aug 21, 2019

backport tracker issue: https://tracker.ceph.com/issues/42318


partial backport (1 commit) of: #24546

parent tracker: https://tracker.ceph.com/issues/42316

@neha-ojha neha-ojha added this to the luminous milestone Aug 21, 2019
@neha-ojha
Copy link
Member

Is there a tracker issue for this backport?

@chenerqi
Copy link
Author

chenerqi commented Aug 22, 2019

Is there a tracker issue for this backport?

No, I met the issue in my ceph cluster, it needs fix for ACCEPTING phase fault, this part of code has been rewritten in which Sage fixed the problem in msg/async/ProtocolV2 with commit b2b1234
I didn't find track issue for Sage's commit.

There is a similar track issue for CONNECTING phase fault, Sage fixed it in msg/async/ProtocolV1
#25343
https://tracker.ceph.com/issues/36612
36612 links to luminous backport issue 37521, but I checked luminous branch has no this issue, so this track issue can be closed since nothing to backport to luminous for CONNECTING phase fault.
https://tracker.ceph.com/issues/37521

@smithfarm smithfarm added the messenger Issues involving one of the Ceph messenger implementations label Oct 1, 2019
@smithfarm
Copy link
Contributor

@chenerqi The corresponding master PR for this backport is #24546

It was merged for nautilus. That means: if it is going to be backported, there needs to be a tracker issue. The tracker issue should be created with the following fields:

Status: Pending Backport
Backport: mimic, luminous
Pull request ID: 24546

(In the description just briefly explain the bug that #24546 is fixing)

Once that tracker is created, we can proceed. Adding DNM until then.

@smithfarm smithfarm added the DNM label Oct 11, 2019
@chenerqi
Copy link
Author

@chenerqi The corresponding master PR for this backport is #24546

It was merged for nautilus. That means: if it is going to be backported, there needs to be a tracker issue. The tracker issue should be created with the following fields:

Status: Pending Backport
Backport: mimic, luminous
Pull request ID: 24546

(In the description just briefly explain the bug that #24546 is fixing)

Once that tracker is created, we can proceed. Adding DNM until then.

Tracker issue 42316 created.
https://tracker.ceph.com/issues/42316

@smithfarm
Copy link
Contributor

@chenerqi Thanks. There's one more thing left to do: fix the commit message. The correct commit message should be:

msg/async/ProtocolV2: do not bump connect_seq for fault during ACCEPTING_SESSION

If we have a connection race, and we lose, we may end up with outgoing
messages *and* be in ACCEPTING_SESSION.  If we then fault, we want to
leave connect_seq at 0 to avoid triggering a reset.

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit b2b123435f32338826afc4a662e64c29f1262f68)

Conflicts:
     src/msg/async/ProtocolV2.cc
- file does not exist in luminous. Made change manually in the file src/msg/async/AsyncConnection.cc

(in other words, please run git commit --amend and replace the existing commit message with the text shown above)

Copy link
Contributor

@smithfarm smithfarm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commit message is wrong

@smithfarm smithfarm requested a review from rjfd October 15, 2019 08:11
@smithfarm
Copy link
Contributor

@rjfd Does this look right to you for luminous?

@rjfd rjfd requested a review from liewegas October 15, 2019 08:30
ACCEPTING_SESSION

If we have a connection race, and we lose, we may end up with outgoing
messages *and* be in ACCEPTING_SESSION.  If we then fault, we want to
leave connect_seq at 0 to avoid triggering a reset.

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit b2b1234)

Conflicts:
     src/msg/async/ProtocolV2.cc
- file does not exist in luminous. Made change manually in the file
  src/msg/async/AsyncConnection.cc
@chenerqi
Copy link
Author

@chenerqi Thanks. There's one more thing left to do: fix the commit message. The correct commit message should be:

msg/async/ProtocolV2: do not bump connect_seq for fault during ACCEPTING_SESSION

If we have a connection race, and we lose, we may end up with outgoing
messages *and* be in ACCEPTING_SESSION.  If we then fault, we want to
leave connect_seq at 0 to avoid triggering a reset.

Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit b2b123435f32338826afc4a662e64c29f1262f68)

Conflicts:
     src/msg/async/ProtocolV2.cc
- file does not exist in luminous. Made change manually in the file src/msg/async/AsyncConnection.cc

(in other words, please run git commit --amend and replace the existing commit message with the text shown above)

Updated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
messenger Issues involving one of the Ceph messenger implementations
Projects
None yet
3 participants