[broker] Fix issue that message ordering could be broken when redelivering messages on Key_Shared subscription #10762
- if (addMessageToReplay(position.getLedgerId(), position.getEntryId())) {
+ // TODO: We want to pass a sticky key hash as a third argument to guarantee the order of the messages
+ // on Key_Shared subscription, but it's difficult to get the sticky key here
+ if (addMessageToReplay(position.getLedgerId(), position.getEntryId(), null)) {
This method is executed when redelivery is requested from the consumer side (e.g. negative ack, ack timeout). So the user should allow the messages to be out of order in this case.
LGTM
Great work!
I left one question out of curiosity.
+1
@lhotari PTAL
LGTM
@massakam nice idea!
LGTM
Outstanding work @massakam !
public int removeAllUpTo(long markDeleteLedgerId, long markDeleteEntryId) {
    if (hashesToBeBlocked != null) {
        for (LongPair longPair : hashesToBeBlocked.keys()) {
Using `keys()` will make a copy of the whole set, so we should use `forEach()` instead.
@merlimat If we try to remove any entry in `forEach()`, it seems to cause a deadlock. Is there any way to avoid this without using `keys()`?
@lhotari It happens in the master branch. `hashesToBeBlocked` is an instance of `org.apache.bookkeeper.util.collections.ConcurrentLongLongPairHashMap`, so it should be unrelated to those fixes.
OK, we can go on with this, but we should also import here a fixed version of `ConcurrentLongLongPairHashMap` that doesn't have the problem.
> but we should also import here a fixed version of `ConcurrentLongLongPairHashMap` that doesn't have the problem

@merlimat Such a version doesn't exist yet, does it?
I changed the code to add the keys to be removed to a newly created list inside the `forEach()` loop, and to remove them from `hashesToBeBlocked` after exiting the loop. This way only the entries to be removed are copied, rather than the whole key set. PTAL.
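
The collect-then-remove approach described in this comment can be sketched roughly as follows. This is only an illustration, not the code merged in the PR: the class name, the `Position` record, the `add` and `size` helpers, and the returned count are hypothetical, and a `java.util.concurrent.ConcurrentHashMap` stands in for `ConcurrentLongLongPairHashMap`. The stand-in does not reproduce the deadlock reported above; it only shows the shape of the loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CollectThenRemoveSketch {
    // Hypothetical stand-in for a (ledgerId, entryId) key; requires Java 16+.
    record Position(long ledgerId, long entryId) {}

    // Stand-in for hashesToBeBlocked: message position -> sticky key hash.
    private final Map<Position, Long> hashesToBeBlocked = new ConcurrentHashMap<>();

    void add(long ledgerId, long entryId, long stickyKeyHash) {
        hashesToBeBlocked.put(new Position(ledgerId, entryId), stickyKeyHash);
    }

    // Removes every entry at or before the mark-delete position.
    // Returning the number of removed entries is an assumption of this sketch.
    int removeAllUpTo(long markDeleteLedgerId, long markDeleteEntryId) {
        // Step 1: collect only the keys that must go; nothing is removed in the loop.
        List<Position> keysToRemove = new ArrayList<>();
        hashesToBeBlocked.forEach((position, hash) -> {
            boolean atOrBeforeMarkDelete = position.ledgerId() < markDeleteLedgerId
                    || (position.ledgerId() == markDeleteLedgerId
                        && position.entryId() <= markDeleteEntryId);
            if (atOrBeforeMarkDelete) {
                keysToRemove.add(position);
            }
        });
        // Step 2: remove the collected keys after the iteration has finished.
        for (Position position : keysToRemove) {
            hashesToBeBlocked.remove(position);
        }
        return keysToRemove.size();
    }

    int size() {
        return hashesToBeBlocked.size();
    }
}
```

Compared with iterating over `keys()`, the temporary list holds only the entries that are actually removed, so the copy stays small when few positions fall at or before the mark-delete position.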
LGTM
@merlimat Please help review this PR again.
…ering messages on Key_Shared subscription (apache#10762)

Messages with the same key can be out of order if message redelivery occurs on a Key_Shared subscription.

1. Suppose `PersistentDispatcherMultipleConsumers#messagesToRedeliver` contains message-1 and message-2. Message-1 will be delivered to consumer-a and message-2 will be delivered to consumer-b.
2. The dispatcher tried to send message-1 to consumer-a, but could not because the consumer was too slow.
3. Consumer-a is added to `stuckConsumers`. https://github.com/apache/pulsar/blob/894d92b2be3bee334e7ce32760c4d2e7978603aa/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentStickyKeyDispatcherMultipleConsumers.java#L263-L266
4. The next time `readMoreEntries()` is run, `getMessagesToReplayNow()` will return an empty Set because `isDispatcherStuckOnReplays` is true. https://github.com/apache/pulsar/blob/894d92b2be3bee334e7ce32760c4d2e7978603aa/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentStickyKeyDispatcherMultipleConsumers.java#L368-L374
5. The dispatcher reads newer messages instead of the messages contained in `messagesToRedeliver`. https://github.com/apache/pulsar/blob/894d92b2be3bee334e7ce32760c4d2e7978603aa/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentDispatcherMultipleConsumers.java#L233-L267
6. A new message (message-3) is delivered to consumer-b.
7. Message-2 contained in `messagesToRedeliver` is then delivered to consumer-b.
8. As a result, the order of message-2 and message-3 is reversed.

When adding a message to be redelivered to `messagesToRedeliver`, save the hash of the message's key. If the dispatcher attempts to send the consumer newer messages whose key corresponds to any one of the saved hash values, they will be added to `messagesToRedeliver` instead of being sent. This prevents messages with the same key from being out of order.

(cherry picked from commit 5aee599)
Cherry-picking #10762 broke the Delayed messages feature in branch-2.7. This patch restores the method that has been dropped
…essage skip to avoid unnecessary consumption stuck (#20335)

### Motivation

- #7105 provides a mechanism to prevent a stuck consumer from affecting the consumption of other consumers:
  - if none of the consumers can accept more messages, stop delivering messages to the client.
  - if one consumer cannot accept more messages, just read new messages and deliver them to other consumers.
- #7553 provides a mechanism to fix the loss of consumption order: if the consumer cannot accept any more messages, skip the consumer for the next round of message delivery, because there may be messages with the same key in the replay queue.
- #10762 provides a mechanism to fix the loss of consumption order: if there are any messages with the same key in the replay queue, do not deliver the new messages to this consumer.

#10762 and #7553 do the same thing, and #10762 is better than #7553, so #7553 is unnecessary.

### Modifications

Remove the mechanism provided by #7553 to avoid unnecessary consumption stuck.
### Motivation

Messages with the same key can be out of order if message redelivery occurs on a Key_Shared subscription.

1. Suppose `PersistentDispatcherMultipleConsumers#messagesToRedeliver` contains message-1 and message-2. Message-1 will be delivered to consumer-a and message-2 will be delivered to consumer-b.
2. The dispatcher tried to send message-1 to consumer-a, but could not because the consumer was too slow.
3. Consumer-a is added to `stuckConsumers`. https://github.com/apache/pulsar/blob/894d92b2be3bee334e7ce32760c4d2e7978603aa/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentStickyKeyDispatcherMultipleConsumers.java#L263-L266
4. The next time `readMoreEntries()` is run, `getMessagesToReplayNow()` will return an empty Set because `isDispatcherStuckOnReplays` is true. https://github.com/apache/pulsar/blob/894d92b2be3bee334e7ce32760c4d2e7978603aa/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentStickyKeyDispatcherMultipleConsumers.java#L368-L374
5. The dispatcher reads newer messages instead of the messages contained in `messagesToRedeliver`. https://github.com/apache/pulsar/blob/894d92b2be3bee334e7ce32760c4d2e7978603aa/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentDispatcherMultipleConsumers.java#L233-L267
6. A new message (message-3) is delivered to consumer-b.
7. Message-2 contained in `messagesToRedeliver` is then delivered to consumer-b.
8. As a result, the order of message-2 and message-3 is reversed.

### Modifications

When adding a message to be redelivered to `messagesToRedeliver`, save the hash of the message's key. If the dispatcher attempts to send the consumer newer messages whose key corresponds to any one of the saved hash values, they will be added to `messagesToRedeliver` instead of being sent. This prevents messages with the same key from being out of order.
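
The modification described above can be illustrated with a minimal, single-threaded sketch. It is not the broker's implementation: `RedeliveryOrderingSketch`, the `Position` and `Message` records, `dispatchNewMessages`, `containsStickyKeyHash`, and `pendingReplayCount` are hypothetical names chosen for the example, and plain `java.util` collections replace the broker's data structures. Only `messagesToRedeliver`, `hashesToBeBlocked`, and `addMessageToReplay` are names taken from this PR.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RedeliveryOrderingSketch {
    // Hypothetical value types for the example; require Java 16+.
    record Position(long ledgerId, long entryId) {}
    record Message(Position position, long stickyKeyHash) {}

    // Positions waiting to be redelivered.
    private final Set<Position> messagesToRedeliver = new HashSet<>();
    // Sticky key hash saved for each position waiting to be redelivered.
    private final Map<Position, Long> hashesToBeBlocked = new HashMap<>();

    // Adds a message to the replay set and, when the hash is known, remembers it.
    // A null hash mirrors callers that cannot determine the sticky key.
    boolean addMessageToReplay(Position position, Long stickyKeyHash) {
        if (stickyKeyHash != null) {
            hashesToBeBlocked.put(position, stickyKeyHash);
        }
        return messagesToRedeliver.add(position);
    }

    // True if any message waiting for redelivery carries this key hash.
    boolean containsStickyKeyHash(long stickyKeyHash) {
        return hashesToBeBlocked.containsValue(stickyKeyHash);
    }

    // Returns the messages that may be sent now. Messages whose key hash is
    // blocked are queued for replay instead, so they cannot overtake older
    // messages with the same key.
    List<Message> dispatchNewMessages(List<Message> newMessages) {
        List<Message> sendable = new ArrayList<>();
        for (Message message : newMessages) {
            if (containsStickyKeyHash(message.stickyKeyHash())) {
                addMessageToReplay(message.position(), message.stickyKeyHash());
            } else {
                sendable.add(message);
            }
        }
        return sendable;
    }

    int pendingReplayCount() {
        return messagesToRedeliver.size();
    }
}
```

In terms of the scenario in the Motivation section, message-3 shares its key hash with message-2, which is still waiting in `messagesToRedeliver`, so message-3 is queued behind it rather than being sent to consumer-b first.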