Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid problem that topic becomes unavailable due to failure of cursor recovery #2673

Merged
merged 2 commits into from
Oct 2, 2018

Conversation

massakam
Copy link
Contributor

@massakam massakam commented Sep 28, 2018

Motivation

This is a change to avoid #2666.

If a topic is under the following condition and the broker is restarted, that topic becomes unavailable and we can not send or receive messages.

  • The managed ledger contains two ledgers and the newer ledger is empty
ledgerInfo {
  ledgerId: 65425
  entries: 2
  size: 238
  timestamp: 1506510325426
}
ledgerInfo {
  ledgerId: 1952728
  timestamp: 0
}
  • cursorsLedgerId included in the managed cursor info is -1
  • markDeleteLedgerId is between the two ledgers contained in the managed ledger
cursorsLedgerId: -1
markDeleteLedgerId: 1952575
markDeleteEntryId: -1

This is because position gets ahead of the last position of the managed ledger, and IllegalArgumentException occurs in the following line.

messagesConsumedCounter = -getNumberOfEntries(Range.openClosed(position, ledger.getLastPosition()));

Modifications

If position gets ahead of the last position of the managed ledger when recovering the cursor, position is adjusted to the last position.

Result

Even a topic with metadata as described above, the cursor will be recovered successfully.

@massakam massakam added this to the 2.3.0 milestone Sep 28, 2018
@massakam massakam self-assigned this Sep 28, 2018
@massakam
Copy link
Contributor Author

rerun cpp tests

@sijie sijie changed the title Avoid problem that topic becomes unavailable due to failure of cursor… Avoid problem that topic becomes unavailable due to failure of cursor recovery Sep 28, 2018
@massakam
Copy link
Contributor Author

rerun cpp tests

@merlimat
Copy link
Contributor

@massakam The change LGTM. It would be nice to have a test that reproduces this issue in the Mocked managed ledger tests, to ensure we won't have a regression in the future.

@massakam
Copy link
Contributor Author

massakam commented Oct 1, 2018

rerun cpp tests

@massakam
Copy link
Contributor Author

massakam commented Oct 1, 2018

rerun java8 tests

@massakam
Copy link
Contributor Author

massakam commented Oct 2, 2018

@merlimat I have added the test.

Copy link
Contributor

@merlimat merlimat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@merlimat merlimat merged commit b2484d9 into apache:master Oct 2, 2018
@merlimat merlimat added the type/bug The PR fixed a bug or issue reported a bug label Oct 2, 2018
@massakam massakam deleted the cursor-recovery branch October 2, 2018 00:52
sijie pushed a commit that referenced this pull request Oct 4, 2018
For a topic with metadata similar to #2673, IllegalArgumentException may occur in the following line:
https://github.com/apache/pulsar/blob/b2484d92d5068d4f0699eb9c3d31640cb48f9dd0/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java#L655

This is the broker log when the exception has occurred:
[invalid_range_error.txt](https://github.com/apache/pulsar/files/2442924/invalid_range_error.txt)

It is because `readPosition` is ahead of `ledger.getLastPosition().getNext()`, so `ManagedCursorImpl#getNumberOfEntries()` should return 0 as the number of entries to read in that case.

I think this issue and #2673 are the result of that `ledger.getLastPosition()` is no longer the real last of the managed ledger because of #1550.
@massakam massakam modified the milestones: 2.3.0, 2.2.1 Oct 29, 2018
@merlimat merlimat modified the milestones: 2.2.1, 2.2.0 Dec 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/broker type/bug The PR fixed a bug or issue reported a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants