Remove inFlightEcho entry on ECHO_REQ failure #4863

Open
grom358 wants to merge 1 commit into apache:trunk from instaclustr:CASSANDRA-21428-trunk
grom358 (Contributor) commented Jun 4, 2026

In Gossiper, echoHandler implements only onResponse. RequestCallback.onFailure has a default no-op implementation, so when the ECHO_REQ times out or the remote node returns an error, inflightEcho.remove(addr) is never called and the stale entry persists.

Any subsequent markAlive(addr, localState) call, where localState is the same in-place-mutated object already stored in inflightEcho, sees localState.equals(prevState) = true (identity equality, same reference) and skips sending the echo, indefinitely.

In a temporary-partition scenario (node briefly unreachable, echo times out, node recovers with the same generation), the node can stay marked dead permanently: the failure detector sees it as alive and keeps triggering markAlive, but every invocation is suppressed by the stale entry. The stale entry is cleared only by removeEndpoint() (explicit removal) or by silentlyMarkDead() via markDead() (failure detector conviction), and neither fires while the failure detector reports the node as healthy.

Fix: override onFailure in echoHandler to call inflightEcho.remove(addr).
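
To make the failure mode concrete, here is a minimal, self-contained model of the behaviour described above. It is not Cassandra code: EchoSketch, Callback, State, the string addresses, and the removeOnFailure switch are all hypothetical stand-ins, and the real RequestCallback signature differs. It only illustrates how a default no-op onFailure leaves a stale map entry that suppresses the next markAlive, and how removing the entry on failure lets the echo be retried.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class EchoSketch {
    /** Hypothetical stand-in for a request callback whose failure hook defaults to a no-op. */
    interface Callback {
        void onResponse(String from);
        default void onFailure(String from, String reason) {}
    }

    /** Hypothetical endpoint state; it inherits Object.equals, i.e. identity equality. */
    static final class State {}

    final Map<String, State> inflightEcho = new ConcurrentHashMap<>();
    final boolean removeOnFailure; // true models the proposed fix
    int echoesSent = 0;
    Callback lastCallback;

    EchoSketch(boolean removeOnFailure) {
        this.removeOnFailure = removeOnFailure;
    }

    void markAlive(String addr, State localState) {
        State prevState = inflightEcho.get(addr);
        if (localState.equals(prevState)) {
            return; // an echo for this exact state is considered in flight, so skip
        }
        inflightEcho.put(addr, localState);
        echoesSent++;
        if (removeOnFailure) {
            // With the fix: the entry is cleared on success and on failure.
            lastCallback = new Callback() {
                @Override public void onResponse(String from) { inflightEcho.remove(from); }
                @Override public void onFailure(String from, String reason) { inflightEcho.remove(from); }
            };
        } else {
            // Without the fix: only onResponse is implemented; onFailure stays a no-op.
            lastCallback = from -> inflightEcho.remove(from);
        }
    }

    /** Sends an echo, fails it, then retries with the same state object; returns echoes sent. */
    static int run(boolean removeOnFailure) {
        EchoSketch gossiper = new EchoSketch(removeOnFailure);
        State state = new State();
        gossiper.markAlive("10.0.0.1", state);
        gossiper.lastCallback.onFailure("10.0.0.1", "TIMEOUT");
        gossiper.markAlive("10.0.0.1", state); // suppressed if the stale entry is still present
        return gossiper.echoesSent;
    }

    public static void main(String[] args) {
        System.out.println("echoes sent without fix: " + run(false));
        System.out.println("echoes sent with fix:    " + run(true));
    }
}
```

In this model the second markAlive is dropped when the failure hook is a no-op, and goes through once the entry is removed on failure.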

}

@Override
public void onFailure(InetAddressAndPort from, RequestFailureReason failureReason)

Should this be RequestFailure?

smiklosovic force-pushed the CASSANDRA-21428-trunk branch from 2f73906 to cc2fb8c on June 6, 2026 16:07