@Vladsz83 commented Sep 9, 2021

If a node loses all of its outgoing connections, it can conclude that it is alone in the cluster and keep running instead of failing. This happens on small clusters, where the failed node manages to attempt a connection to every other node before connRecoveryTimeout expires.

Consider:

1. The cluster ring is n1 -> n2 -> n3 -> n4 -> n1.
2. n4 loses all outgoing connections.
3. n3 still pings n4 successfully.
4. n4 attempts to connect to n1, n2 and n3. Each attempt fails because of the outgoing network failure.
5. spi.connRecoveryTimeout has not been reached yet. n4 decides it is alone and continues working.
6. n3 still sends messages to n4, so n4 does not lack incoming connections.
7. The ring is actually broken because of n4, yet n3 cannot detect that n4 has failed.

Solution: a node could watch its incoming traffic, which shows whether the incoming network is alive. If all outgoing connections are lost but messages are still being received, the node must leave the grid to prevent the ring from breaking.
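
The proposed check can be sketched roughly as follows. This is only an illustration of the idea, not the code from this pull request and not part of TcpDiscoverySpi: the class `RingIsolationDetector`, its methods, the `Verdict` enum and the `incomingWindowMs` parameter are all hypothetical names introduced here. The assumption is that the discovery layer reports failed and successful outgoing connection attempts as well as the arrival time of every incoming message.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper illustrating the proposal; not an Ignite API. */
final class RingIsolationDetector {
    /** Possible conclusions a node can draw about its own state. */
    enum Verdict { HEALTHY, ALONE, OUTGOING_BROKEN }

    /** All other nodes of the ring known to this node. */
    private final Set<String> peers;

    /** Peers to which the latest outgoing connection attempt failed. */
    private final Set<String> failedPeers = new HashSet<>();

    /** How recent an incoming message must be to count as live incoming traffic (assumed parameter). */
    private final long incomingWindowMs;

    private boolean seenIncoming;
    private long lastIncomingMs;

    RingIsolationDetector(Set<String> peers, long incomingWindowMs) {
        this.peers = new HashSet<>(peers);
        this.incomingWindowMs = incomingWindowMs;
    }

    /** Called when an outgoing connection attempt to a peer fails. */
    void onConnectFailed(String peer) {
        if (peers.contains(peer))
            failedPeers.add(peer);
    }

    /** Called when an outgoing connection to a peer is established. */
    void onConnectSucceeded(String peer) {
        failedPeers.remove(peer);
    }

    /** Called for every message received from the previous node in the ring. */
    void onMessageReceived(long nowMs) {
        seenIncoming = true;
        lastIncomingMs = nowMs;
    }

    /**
     * HEALTHY: at least one peer is still reachable (or there are no peers).
     * ALONE: every peer is unreachable and nothing has been received recently.
     * OUTGOING_BROKEN: every peer is unreachable, yet messages keep arriving,
     * so the node itself is the one breaking the ring and should leave.
     */
    Verdict evaluate(long nowMs) {
        if (peers.isEmpty() || !failedPeers.containsAll(peers))
            return Verdict.HEALTHY;

        boolean incomingAlive = seenIncoming && nowMs - lastIncomingMs <= incomingWindowMs;

        return incomingAlive ? Verdict.OUTGOING_BROKEN : Verdict.ALONE;
    }
}
```

In the scenario above, n4 would register failures for n1, n2 and n3 while still recording messages from n3, so `evaluate` would return `OUTGOING_BROKEN` rather than `ALONE`, and n4 would leave the grid instead of continuing to work.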

@Vladsz83 (Contributor, Author) commented

Won't be fixed.

@Vladsz83 closed this Sep 27, 2021