[Managed Ledger] Resolved race by fixing order of adding OpAddEntry to pendingAddEntries #10758
@lhotari thoughts?

@devinbost thanks for your contribution. For this PR, do we need to update docs?

@Anonymitaet Thanks for asking. I don't think that will be necessary for this PR.

I believe that we should not add the operation to the list until it is fully prepared.
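The ordering suggested in the comment above (finish preparing the op, then add it to the shared list) can be illustrated with a minimal, single-threaded sketch. `PrepareThenPublish`, `PendingOp`, `pending`, `addPublishFirst`, `addPrepareFirst` and `peekOldestLedgerId` are hypothetical names, not Pulsar APIs; they only stand in for `OpAddEntry` and `pendingAddEntries`. The `betweenSteps` hook is an assumption used to simulate a timeout checker running at the worst possible moment, so the interleaving is deterministic.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical, simplified model: PendingOp and pending stand in for OpAddEntry and
// pendingAddEntries but are NOT Pulsar's real classes.
final class PrepareThenPublish {

    static final class PendingOp {
        // -1 means "not yet assigned to a ledger"; volatile so a late write is
        // visible to other threads in the publish-first variant.
        volatile long ledgerId = -1L;
    }

    // Shared queue that a concurrent timeout checker could scan at any moment.
    static final Queue<PendingOp> pending = new ConcurrentLinkedQueue<>();

    // Racy order: the op becomes visible before it is fully prepared.
    static void addPublishFirst(long currentLedgerId, Runnable betweenSteps) {
        PendingOp op = new PendingOp();
        pending.add(op);
        betweenSteps.run();          // a checker running here sees ledgerId == -1
        op.ledgerId = currentLedgerId;
    }

    // Fixed order: prepare completely, then publish.
    static void addPrepareFirst(long currentLedgerId, Runnable betweenSteps) {
        PendingOp op = new PendingOp();
        op.ledgerId = currentLedgerId;
        betweenSteps.run();          // op is not queued yet, so no checker can see it half-built
        pending.add(op);             // the concurrent queue safely publishes the prepared op
    }

    // Simplified checker: ledgerId of the oldest pending op, or null if the queue is empty.
    static Long peekOldestLedgerId() {
        PendingOp head = pending.peek();
        return head == null ? null : head.ledgerId;
    }
}
```

With the publish-first order, a checker invoked through the hook observes `-1`; with the prepare-first order it observes an empty queue, and once the op is queued its `ledgerId` is already set. The design relies on `ConcurrentLinkedQueue`'s documented guarantee that actions before inserting an element happen-before actions after another thread reads it.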
Force-pushed from a223cc2 to 6583c24.
@eolivelli I moved the

What happens if this block in (pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java, line 1403 in 380cf92)
Seems like a race... The state could be set to
/pulsarbot run-failure-checks

I think we have more flaky tests.
Another one that failed earlier was:
but that test passed for me locally. @lhotari FYI.

/pulsarbot run-failure-checks

@eolivelli PTAL. All the tests are passing.

@sijie?
Due to this race, if the ledger isn't attached to the
Adding to pendingAddEntries after finishing changes on addOperation
Force-pushed from 380cf92 to b546920.
The PR had no activity for 30 days; marking it with the Stale label.

@devinbost: Thanks for your contribution. For this PR, do we need to update docs?
Closed as stale and conflicting. Please rebase and resubmit the patch if it's still relevant. There seems to be no consensus here. The PR page is always under high traffic: according to GitHub data, the Pulsar community opens and closes 300+ PRs per month. I suggest you start a thread on the dev@ mailing list to reach a consensus first.
It looks like it's possible for `pendingAddEntries` to have an `OpAddEntry` instance that hasn't had a `ledgerId` set before `checkAddTimeout()` is called.

We add the `OpAddEntry` to `pendingAddEntries` here (pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java, line 716 in a223cc2) and set the `ledgerId` on the `OpAddEntry` later in that method (pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java, line 760 in a223cc2).

If `checkAddTimeout()` is called before the `ledgerId` is set, the `ledgerId` will show as -1 (pulsar/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java, line 3678 in a223cc2).

`ledgerClosed(..)`, we may not close the ledger that was timing out.
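The failure mode described above can be modeled roughly as follows. This is a hypothetical, single-threaded sketch, not `ManagedLedgerImpl`'s actual logic: `AddTimeoutSketch`, `Op`, `closedLedgerId` and the matching rule (close the current ledger only if the timed-out op was written to it) are assumptions made for illustration. It shows why an op whose `ledgerId` is still `-1` when the timeout check runs can leave the timed-out ledger open.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of the failure mode; NOT Pulsar's ManagedLedgerImpl logic.
// Single-threaded on purpose, so ArrayDeque is sufficient here.
final class AddTimeoutSketch {
    static final long UNSET = -1L;

    static final class Op {
        long ledgerId = UNSET;       // stays -1 until the add path assigns it
        final long createdAtMs;
        Op(long createdAtMs) { this.createdAtMs = createdAtMs; }
    }

    final Deque<Op> pending = new ArrayDeque<>();
    final long currentLedgerId;
    long closedLedgerId = UNSET;     // which ledger the timeout handler chose to close

    AddTimeoutSketch(long currentLedgerId) { this.currentLedgerId = currentLedgerId; }

    // Stand-in for a checkAddTimeout()-style pass: if the oldest op has timed out,
    // close the ledger that op belongs to.
    void checkAddTimeout(long nowMs, long timeoutMs) {
        Op oldest = pending.peekFirst();
        if (oldest == null || nowMs - oldest.createdAtMs < timeoutMs) {
            return;                  // nothing pending, or not timed out yet
        }
        // Assumed matching rule: only close the current ledger if the op was written to it.
        // An op whose ledgerId is still -1 never matches, so the stuck ledger stays open.
        if (oldest.ledgerId == currentLedgerId) {
            closedLedgerId = currentLedgerId;
        }
    }
}
```

In this model, running the check between queuing the op and assigning its `ledgerId` closes nothing; running it after the assignment closes the expected ledger.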