mimic: msg/async: do not trigger RESETSESSION from connect fault during connection phase #30672

Merged
merged 1 commit into from Oct 15, 2019


smithfarm
Contributor

@smithfarm smithfarm added this to the mimic milestone Oct 1, 2019
@smithfarm smithfarm added the messenger Issues involving one of the Ceph messenger implementations label Oct 1, 2019
@smithfarm
Contributor Author

Luminous is perilously close to crossing over the line into EOL land, if it has not crossed it already. But mimic is still alive and well.

@smithfarm smithfarm force-pushed the wip-37520-mimic branch 3 times, most recently from fa39d92 to 65b6864 October 1, 2019 15:44
…ection phase

Previously, if we got a connection fault during the connect/connect_reply
phase, we would increment connect_seq on the client side and trigger a
RESETSESSION on the server side (because there was not yet an existing
connection to replace).  This led to dropped messages, usually in the
form of stuck peering in the rados/thrash suite.

The problem is that the condition for 'reconnect' vs 'backoff' inherited
the test from SimpleMessenger, which had only a STATE_CONNECTING.  In
contrast, AsyncMessenger also has CONNECTING_WAIT_BANNER_AND_IDENTIFY and
CONNECTING_SEND_CONNECT_MSG, and if we were in these states, we would
increment connect_seq instead of backing off and retrying (without an
increment).

Fix by adjusting the condition to match the range of CONNECTING states
in AsyncMessenger.

Fixes: http://tracker.ceph.com/issues/36612
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 8346e39)

Conflicts:
	src/msg/async/Protocol.cc
- file does not exist in mimic: applied changes manually in src/msg/async/AsyncConnection.cc
- mimic uses different states and does not have "connection" abstraction
- mimic needs to pass "async_msgr->cct" to ldout instead of just cct
@yuriw
Contributor

yuriw commented Oct 8, 2019

Labels
messenger Issues involving one of the Ceph messenger implementations mimic-batch-1 needs-qa wip-yuri4-testing
3 participants