[Bug][broker] Cursor reads in a dead loop during tailing reads with enableTransaction #22943
Labels
type/bug
Search before asking
Read release policy
Version
client: pulsar-3.0.5
broker: pulsar-3.0.5
Minimal reproduce step
Use pulsar-perf to produce transactional messages to a 200-partition topic and consume them with a normal consumer. The throughput is 10 MB/s, the batch size is 10, and the subscription type is Exclusive. The consumer does a tailing read, consuming the latest messages.
Produce config: -txn -nmt 1000 -time 0 -s 1024 -i 60 -bm 10 -b 1000 -bb 4194304 -r 10000 -mk random -threads 3
Consume config: -time 0 -i 60 -s sub_test_txn_p200 -ss sub_test_txn_p200 -sp Latest -ioThreads 1 -n 1
What did you expect to see?
Low CPU load.
What did you see instead?
The broker has little throughput but high CPU load.
Anything else?
This issue has been reported before, but it still exists on the master branch. It is a serious issue that makes transactions unusable.
The root cause is:
In ManagedCursorImpl#asyncReadEntriesWithSkipOrWait, hasMoreEntries() only compares readPosition with lastConfirmedEntry. However, when transactions are enabled, maxReadPosition also determines whether an entry can be read.
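The mismatch between the two conditions can be illustrated with a small, self-contained model. The `Position` record and the `hasMoreEntries` and `hasReadableEntries` methods below are hypothetical stand-ins, not Pulsar's classes: `hasMoreEntries` only mirrors the comparison described above (simplified to `<=`), and `hasReadableEntries` assumes that a transactional read is additionally bounded by maxReadPosition.

```java
// Hypothetical model of the read check; this is NOT Pulsar's ManagedCursorImpl.
public class ReadCheckSketch {

    // Hypothetical (ledgerId, entryId) position, ordered by ledger first, then by entry.
    public record Position(long ledgerId, long entryId) implements Comparable<Position> {
        @Override
        public int compareTo(Position other) {
            int byLedger = Long.compare(ledgerId, other.ledgerId);
            return byLedger != 0 ? byLedger : Long.compare(entryId, other.entryId);
        }
    }

    // Mirrors the check described in the issue: only readPosition vs lastConfirmedEntry.
    public static boolean hasMoreEntries(Position readPosition, Position lastConfirmedEntry) {
        return readPosition.compareTo(lastConfirmedEntry) <= 0;
    }

    // Assumed transaction-aware condition: an entry is readable only if it is also
    // at or below maxReadPosition.
    public static boolean hasReadableEntries(Position readPosition, Position lastConfirmedEntry,
                                             Position maxReadPosition) {
        return hasMoreEntries(readPosition, lastConfirmedEntry)
                && readPosition.compareTo(maxReadPosition) <= 0;
    }

    public static void main(String[] args) {
        Position lastConfirmedEntry = new Position(3, 100);
        Position maxReadPosition = new Position(3, 40); // e.g. held back by an open transaction
        Position readPosition = new Position(3, 41);    // already past maxReadPosition

        // Prints true: the cursor believes it can read right away.
        System.out.println(hasMoreEntries(readPosition, lastConfirmedEntry));
        // Prints false: nothing at readPosition is actually readable yet.
        System.out.println(hasReadableEntries(readPosition, lastConfirmedEntry, maxReadPosition));
    }
}
```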
Currently, if readPosition < lastConfirmedEntry and readPosition > maxReadPosition, hasMoreEntries() returns true, so the cursor reads entries immediately instead of waiting. However, once inside internalReadFromLedger(), the read goes into opReadEntry.checkReadCompletion(), which then triggers callback.readEntriesComplete() without having read any entries.
Therefore, the cursor keeps reading in a dead loop, even though there are no entries it is allowed to read.
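The resulting loop can be sketched as a single-threaded simulation under simplifying assumptions: positions are plain entry ids within one ledger, the caller issues the next read as soon as the previous one completes, and the loop is capped at `maxIterations` so it terminates. `Cursor`, `readOrWait`, `internalRead` and `runDispatcher` are hypothetical names that only model ManagedCursorImpl#asyncReadEntriesWithSkipOrWait and internalReadFromLedger(); the `checkMaxReadPosition` flag is an illustrative contrast case, not necessarily how an actual fix would look.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-threaded model of the tailing-read loop; none of these names are Pulsar APIs.
// Positions are plain entry ids within a single ledger.
public class TailingReadLoopSketch {

    public static final class Cursor {
        public final long lastConfirmedEntry;
        public final long maxReadPosition;          // e.g. held back by an ongoing transaction
        public final boolean checkMaxReadPosition;  // false models the behavior reported in the issue
        public long readPosition;
        public int emptyReads = 0;
        public boolean waiting = false;

        public Cursor(long readPosition, long lastConfirmedEntry, long maxReadPosition,
                      boolean checkMaxReadPosition) {
            this.readPosition = readPosition;
            this.lastConfirmedEntry = lastConfirmedEntry;
            this.maxReadPosition = maxReadPosition;
            this.checkMaxReadPosition = checkMaxReadPosition;
        }

        boolean hasMoreEntries() {
            boolean belowLac = readPosition <= lastConfirmedEntry;
            return checkMaxReadPosition ? belowLac && readPosition <= maxReadPosition : belowLac;
        }

        // Models asyncReadEntriesWithSkipOrWait: read now if hasMoreEntries(), otherwise park.
        // Returns null when the cursor parks to wait for new entries.
        List<Long> readOrWait() {
            if (!hasMoreEntries()) {
                waiting = true;
                return null;
            }
            return internalRead();
        }

        // Models internalReadFromLedger: reads are bounded by maxReadPosition, so a cursor
        // already past it completes the read with an empty list.
        List<Long> internalRead() {
            List<Long> entries = new ArrayList<>();
            while (readPosition <= maxReadPosition && readPosition <= lastConfirmedEntry) {
                entries.add(readPosition++);
            }
            if (entries.isEmpty()) {
                emptyReads++;
            }
            return entries;
        }
    }

    // Models a caller that requests the next batch as soon as a read completes.
    // maxIterations caps the simulation so the looping case terminates.
    public static int runDispatcher(Cursor cursor, int maxIterations) {
        int reads = 0;
        while (reads < maxIterations) {
            if (cursor.readOrWait() == null) {
                break; // parked until new entries arrive
            }
            reads++;
        }
        return reads;
    }

    public static void main(String[] args) {
        Cursor unchecked = new Cursor(41, 100, 40, false);
        Cursor checked = new Cursor(41, 100, 40, true);
        int uncheckedReads = runDispatcher(unchecked, 1_000);
        int checkedReads = runDispatcher(checked, 1_000);
        System.out.println("without maxReadPosition check: " + uncheckedReads + " reads, "
                + unchecked.emptyReads + " empty");
        System.out.println("with maxReadPosition check: " + checkedReads + " reads, waiting=" + checked.waiting);
    }
}
```

With readPosition = 41, lastConfirmedEntry = 100 and maxReadPosition = 40, the unchecked cursor completes all 1,000 capped reads with empty results, while the checked cursor parks without issuing any read.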
pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java
Lines 934 to 979 in 5dc0304
pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java
Lines 2051 to 2056 in 5dc0304
pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/OpReadEntry.java
Lines 164 to 186 in 5dc0304
Are you willing to submit a PR?