Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Messages that have already been acked are redelivered when upgrading Pulsar version #7554

Closed
massakam opened this issue Jul 16, 2020 · 3 comments · Fixed by #7861
Closed
Labels
area/broker type/bug The PR fixed a bug or issue reported a bug

Comments

@massakam
Copy link
Contributor

The other day, we upgraded the Pulsar version of our broker servers from 2.3.2 to 2.4.2.

At that time, a large number of messages were redelivered on some topics. These messages should have already been acked in the past. The topics where this happened had some unacked messages in their backlog. I can understand that the unacked messages were redelivered. The strange thing is that some or all of the acked messages that followed the unacked messages were also redelivered.

We have found that this happens when the Pulsar 2.3.2 broker that owns the topic in question is shut down and the topic is moved to the Pulsar 2.4.2 broker. The cause seems to be the difference in behavior of individualDeletedMessages held by ManagedCursorImpl instance.

In Pulsar 2.3.2, individualDeletedMessages is an instance of the TreeRangeSet class. Instances of TreeRangeSet can contain ranges that span different ledgers.

Verification code 1:

RangeSet<PositionImpl> rangeSet = TreeRangeSet.create();
rangeSet.add(Range.openClosed(new PositionImpl(1, 100), new PositionImpl(2, 200)));
System.out.println(rangeSet);

Result 1:

[(1:100..2:200]]

On the other hand, in Pulsar 2.4.2, individualDeletedMessages is an instance of ConcurrentOpenLongPairRangeSet by default. If a range that spans multiple ledgers is added to this instance, the information of the first half will be lost. It seems that adding such a range to ConcurrentOpenLongPairRangeSet is not allowed.

Verification code 2:

LongPairConsumer<PositionImpl> positionRangeConverter = (key, value) -> new PositionImpl(key, value);
LongPairRangeSet<PositionImpl> rangeSet = new ConcurrentOpenLongPairRangeSet<PositionImpl>(4096, positionRangeConverter);
rangeSet.addOpenClosed(1, 100, 2, 200);
System.out.println(rangeSet);

Result 2:

[(2:-1..2:200]]

And if we set managedLedgerUnackedRangesOpenCacheSetEnabled to false on Pulsar 2.4.2 broker, individualDeletedMessages will be an instance of LongPairRangeSet.DefaultRangeSet. Such a range can be added to LongPairRangeSet.DefaultRangeSet.

Verification code 3:

LongPairConsumer<PositionImpl> positionRangeConverter = (key, value) -> new PositionImpl(key, value);
LongPairRangeSet<PositionImpl> rangeSet = new LongPairRangeSet.DefaultRangeSet<>(positionRangeConverter);
rangeSet.addOpenClosed(1, 100, 2, 200);
System.out.println(rangeSet);

Result 3:

[(1:100..2:200]]

This difference in behavior causes redelivery of acked messages. For example, suppose a topic owned by a Pulsar 2.3.2 broker has the following individuallyDeletedMessages:

"individuallyDeletedMessages" : "[(2625703:-1..2625703:9], (2625719:-1..2625727:9]]",

When this Pulsar 2.3.2 broker is shut down and the topic moves to a Pulsar 2.4.2 broker, the individuallyDeletedMessages changes as follows:

"individuallyDeletedMessages" : "[(2625703:-1..2625703:9],(2625727:-1..2625727:9]]",

The messages with ledger ID 2625719 have already been acked, but after the topic moves to the Pulsar 2.4.2 broker, that information is lost and these messages are redelivered to the consumers.

@massakam massakam added the type/bug The PR fixed a bug or issue reported a bug label Jul 16, 2020
@massakam
Copy link
Contributor Author

The class of individualDeletedMessages has been changed by the following two PRs:

@massakam
Copy link
Contributor Author

@rdhabalia Is the difference in behavior between ConcurrentOpenLongPairRangeSet and TreeRangeSet/LongPairRangeSet.DefaultRangeSet intentional? If so, what is the reason?

@sijie
Copy link
Member

sijie commented Jul 17, 2020

@codelipenghui Can you check this issue?

sijie pushed a commit that referenced this issue Aug 27, 2020
…Messages (#7861)

Fixes #7554

### Motivation

As mentioned in #7554, the class of `individualDeletedMessages` is different between Pulsar 2.3.2 (and earlier) and 2.4.0 (and later). This causes some of ranges contained in `individualDeletedMessages` to be lost when the version of Pulsar is upgraded, and a large number of messages that have already been acked can be redelivered to consumers.

Also, even if the Pulsar version is 2.4.0 or later, the same phenomenon occurs when the value of `managedLedgerUnackedRangesOpenCacheSetEnabled` is switched from false to true.

### Modifications

If the list of individually deleted message ranges loaded from ZK contains ranges that span different ledgers, split the ranges by ledger ID and store them in `individualDeletedMessages`.

As a result, information about deleted message ranges is never lost and messages that have already been acked will not be redelivered.
lbenc135 pushed a commit to lbenc135/pulsar that referenced this issue Sep 5, 2020
…Messages (apache#7861)

Fixes apache#7554

### Motivation

As mentioned in apache#7554, the class of `individualDeletedMessages` is different between Pulsar 2.3.2 (and earlier) and 2.4.0 (and later). This causes some of ranges contained in `individualDeletedMessages` to be lost when the version of Pulsar is upgraded, and a large number of messages that have already been acked can be redelivered to consumers.

Also, even if the Pulsar version is 2.4.0 or later, the same phenomenon occurs when the value of `managedLedgerUnackedRangesOpenCacheSetEnabled` is switched from false to true.

### Modifications

If the list of individually deleted message ranges loaded from ZK contains ranges that span different ledgers, split the ranges by ledger ID and store them in `individualDeletedMessages`.

As a result, information about deleted message ranges is never lost and messages that have already been acked will not be redelivered.
lbenc135 pushed a commit to lbenc135/pulsar that referenced this issue Sep 5, 2020
…Messages (apache#7861)

Fixes apache#7554

### Motivation

As mentioned in apache#7554, the class of `individualDeletedMessages` is different between Pulsar 2.3.2 (and earlier) and 2.4.0 (and later). This causes some of ranges contained in `individualDeletedMessages` to be lost when the version of Pulsar is upgraded, and a large number of messages that have already been acked can be redelivered to consumers.

Also, even if the Pulsar version is 2.4.0 or later, the same phenomenon occurs when the value of `managedLedgerUnackedRangesOpenCacheSetEnabled` is switched from false to true.

### Modifications

If the list of individually deleted message ranges loaded from ZK contains ranges that span different ledgers, split the ranges by ledger ID and store them in `individualDeletedMessages`.

As a result, information about deleted message ranges is never lost and messages that have already been acked will not be redelivered.
lbenc135 pushed a commit to lbenc135/pulsar that referenced this issue Sep 5, 2020
…Messages (apache#7861)

Fixes apache#7554

### Motivation

As mentioned in apache#7554, the class of `individualDeletedMessages` is different between Pulsar 2.3.2 (and earlier) and 2.4.0 (and later). This causes some of ranges contained in `individualDeletedMessages` to be lost when the version of Pulsar is upgraded, and a large number of messages that have already been acked can be redelivered to consumers.

Also, even if the Pulsar version is 2.4.0 or later, the same phenomenon occurs when the value of `managedLedgerUnackedRangesOpenCacheSetEnabled` is switched from false to true.

### Modifications

If the list of individually deleted message ranges loaded from ZK contains ranges that span different ledgers, split the ranges by ledger ID and store them in `individualDeletedMessages`.

As a result, information about deleted message ranges is never lost and messages that have already been acked will not be redelivered.
wolfstudy pushed a commit that referenced this issue Oct 30, 2020
…Messages (#7861)

Fixes #7554

### Motivation

As mentioned in #7554, the class of `individualDeletedMessages` is different between Pulsar 2.3.2 (and earlier) and 2.4.0 (and later). This causes some of ranges contained in `individualDeletedMessages` to be lost when the version of Pulsar is upgraded, and a large number of messages that have already been acked can be redelivered to consumers.

Also, even if the Pulsar version is 2.4.0 or later, the same phenomenon occurs when the value of `managedLedgerUnackedRangesOpenCacheSetEnabled` is switched from false to true.

### Modifications

If the list of individually deleted message ranges loaded from ZK contains ranges that span different ledgers, split the ranges by ledger ID and store them in `individualDeletedMessages`.

As a result, information about deleted message ranges is never lost and messages that have already been acked will not be redelivered.

(cherry picked from commit 0b7f034)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/broker type/bug The PR fixed a bug or issue reported a bug
Projects
None yet
2 participants