[4.0.x] Added missing destination check before processing message batch #515
We found a bug in message batching on the receiver side.
The bug causes a new node to silently ignore messages from other nodes, so it cannot fully join the cluster.
Steps to reproduce:
- node 1 has physical address ip1 and logical address A
- node 2 has physical address ip2 and logical address B
- a new node is started with the same physical address ip2 but a new logical address C
- Messages from node 1 to B are not acknowledged, so UNICAST3 keeps them in its send table and retransmits them at a fixed interval.
What happens on the new node:
1. `TP.handleSingleMessage` filters out single messages to the old destination B (via `unicastDestMismatch`) before calling `MaxOneThreadPerSender.process`.
2. `TP.handleMessageBatch` or `TP.processBatch` does not filter out message batches to the old destination B before calling `MaxOneThreadPerSender.process`.
3. The batch ends up in a `MaxOneThreadPerSender.MessageTable.Entry`, obtained by calling `MessageTable.get`. The destination of the batch is set to the old destination B.
4. Messages to the new destination C are then added to the same batch in the `MaxOneThreadPerSender.MessageTable.Entry`. A message batch must never bundle messages to different destinations, and this is where the problem lies.
5. `SubmitToThreadPool.BatchHandler.run` drops the whole batch via the `unicastDestMismatch` check, because the destination of the batch is set to B. This includes the messages to the new destination C.
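To make steps 3 to 5 concrete, here is a minimal sketch under stated assumptions: a hypothetical, simplified model in which a batch carries one destination fixed at creation time and the receiver-side check looks only at that destination. `Msg`, `Batch`, `deliver` and this `unicastDestMismatch` are illustrative names, not JGroups signatures.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the bug; NOT the JGroups API.
public class BatchDestBug {
    record Msg(String dest, String payload) {}

    static final class Batch {
        final String dest;                       // destination fixed when the batch is created
        final List<Msg> msgs = new ArrayList<>();
        Batch(String dest) { this.dest = dest; }
    }

    // Mirrors the idea of the check: a unicast that is not addressed to the local node.
    static boolean unicastDestMismatch(String dest, String localAddr) {
        return dest != null && !dest.equals(localAddr);
    }

    // Batch-level delivery: only batch.dest is checked, so the whole batch is dropped on mismatch.
    static List<Msg> deliver(Batch batch, String localAddr) {
        if (unicastDestMismatch(batch.dest, localAddr))
            return List.of();
        return batch.msgs;
    }

    public static void main(String[] args) {
        String local = "C";                          // the new node's logical address
        Batch batch = new Batch("B");                // batch created for a retransmission to old address B
        batch.msgs.add(new Msg("B", "m1"));
        batch.msgs.add(new Msg("C", "m2"));          // message to C bundled into the same batch
        System.out.println("batch size=" + batch.msgs.size()
                + " delivered=" + deliver(batch, local).size()); // prints "batch size=2 delivered=0"
    }
}
```

The message `m2` is correctly addressed to C, yet it is lost because it shares a batch whose destination is B.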
Result: messages arrive at the transport level but are never passed up the stack, so the new node is stuck even though heartbeats work.
The fix is simple: the `unicastDestMismatch` check is now performed for message batches in the same way as for single messages, before the batch is processed.
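A minimal sketch of why the early check helps, assuming a hypothetical per-sender table that merges later messages into the first pending batch (a stand-in for `MaxOneThreadPerSender.MessageTable` entries). All class and method names here are illustrative, not JGroups signatures, and the model ignores threading entirely.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the fix; NOT the JGroups API.
public class BatchDestFix {
    record Msg(String dest, String payload) {}

    static final class Batch {
        final String dest;                       // destination fixed at creation
        final List<Msg> msgs = new ArrayList<>();
        Batch(String dest, List<Msg> msgs) { this.dest = dest; this.msgs.addAll(msgs); }
    }

    static boolean unicastDestMismatch(String dest, String localAddr) {
        return dest != null && !dest.equals(localAddr);
    }

    final String localAddr;
    final Map<String, Batch> pending = new HashMap<>(); // one pending batch per sender

    BatchDestFix(String localAddr) { this.localAddr = localAddr; }

    // Receiver entry point for a batch; applyFix enables the early destination check.
    void handleBatch(String sender, Batch batch, boolean applyFix) {
        if (applyFix && unicastDestMismatch(batch.dest, localAddr))
            return;                                      // stale batch dropped before it can be merged
        Batch entry = pending.get(sender);
        if (entry == null) pending.put(sender, batch);   // first batch fixes the entry's destination
        else entry.msgs.addAll(batch.msgs);              // later messages are merged into it
    }

    // Later delivery step: checks only the destination of the merged batch.
    List<Msg> drain(String sender) {
        Batch b = pending.remove(sender);
        if (b == null || unicastDestMismatch(b.dest, localAddr)) return List.of();
        return b.msgs;
    }

    public static void main(String[] args) {
        for (boolean fix : new boolean[] {false, true}) {
            BatchDestFix node = new BatchDestFix("C");   // new node with logical address C
            node.handleBatch("A", new Batch("B", List.of(new Msg("B", "m1"))), fix); // retransmission to B
            node.handleBatch("A", new Batch("C", List.of(new Msg("C", "m2"))), fix); // message to C
            System.out.println("fix=" + fix + " delivered=" + node.drain("A").size());
        }
        // prints "fix=false delivered=0" then "fix=true delivered=1"
    }
}
```

Dropping the stale batch at the entry point means the per-sender entry is only ever created for batches addressed to the local node, so messages to C are never bundled under destination B.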