Fix bugs related to the journal recovering from a UFS failure #9723

Merged: 1 commit, Aug 14, 2019

calvinjia
Contributor

Fixes #9722

Fixes several edge cases in journal recovery.

  1. If a flush to the journal fails, do not attempt recovery immediately. Instead, flag the stream as needsRecovery and handle the recovery on the next operation.
  2. When recovering, verify that a previous journal entry exists.
  3. When recovering, create a new log file starting with the latest committed journal entry (instead of our current one).
  4. When recovering, fail if the sequence number of the latest journal entry recovered from our buffer does not equal our expected next sequence number minus 1.
  5. When recovering, ignore incomplete logs when inferring the last persisted sequence number from the file name (for an incomplete log that value will be INTEGER_MIN).
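
To make the five points above concrete, here is a minimal in-memory sketch of the recovery flow. It is not the code from this PR: the class `JournalWriterSketch`, its fields, the `ufsUp` test hook, and the `INCOMPLETE` sentinel (a stand-in for the INTEGER_MIN mentioned in point 5) are all hypothetical names chosen for illustration. It also assumes that point 2 means "fail when no previous entry can be found" and that the new log file in point 3 starts at the sequence number right after the last persisted entry.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory model of a journal writer; none of these names are Alluxio's API. */
final class JournalWriterSketch {
  /** Assumed sentinel end for a log that was never completed (stand-in for INTEGER_MIN). */
  static final long INCOMPLETE = Integer.MIN_VALUE;

  /** A log file: its name encodes [start, end); entries are what the simulated UFS holds. */
  static final class LogFile {
    final long start;
    long end = INCOMPLETE;
    final List<Long> entries = new ArrayList<>();
    LogFile(long start) { this.start = start; }
  }

  final List<LogFile> logs = new ArrayList<>();
  private final List<Long> unflushed = new ArrayList<>(); // written but not yet durable
  private long nextSeq = 0;
  boolean needsRecovery = false;
  boolean ufsUp = true; // test hook simulating a UFS outage

  JournalWriterSketch() { logs.add(new LogFile(0)); }

  void write() throws IOException {
    maybeRecover();
    unflushed.add(nextSeq++);
  }

  void flush() throws IOException {
    maybeRecover();
    if (!ufsUp) {
      needsRecovery = true; // point 1: only flag; recovery runs on the next operation
      throw new IOException("flush failed: UFS unavailable");
    }
    logs.get(logs.size() - 1).entries.addAll(unflushed);
    unflushed.clear();
  }

  /** Highest sequence number known to be persisted, or -1 if there is none. */
  long lastPersistedSeq() {
    long last = -1;
    for (LogFile log : logs) {
      if (log.end != INCOMPLETE) {
        last = Math.max(last, log.end - 1); // completed log: trust the file name
      } else if (!log.entries.isEmpty()) {
        // point 5: ignore the sentinel in the name and read the entries instead
        last = Math.max(last, log.entries.get(log.entries.size() - 1));
      }
    }
    return last;
  }

  private void maybeRecover() throws IOException {
    if (!needsRecovery) { return; }
    if (!ufsUp) { throw new IOException("UFS still unavailable"); }
    long lastPersisted = lastPersistedSeq();
    if (lastPersisted == -1) { // point 2: there must be a previous entry
      throw new IllegalStateException("no previous journal entry to recover from");
    }
    LogFile fresh = new LogFile(lastPersisted + 1); // point 3: not nextSeq
    logs.add(fresh);
    long lastReplayed = lastPersisted;
    for (long seq : unflushed) {
      if (seq > lastPersisted) { // replay only what the UFS does not already hold
        fresh.entries.add(seq);
        lastReplayed = seq;
      }
    }
    if (lastReplayed != nextSeq - 1) { // point 4: nothing may be missing
      throw new IllegalStateException(
          "recovered up to " + lastReplayed + ", expected " + (nextSeq - 1));
    }
    unflushed.clear();
    needsRecovery = false;
  }
}
```

In this sketch a failed `flush()` leaves the buffered entries in place and sets `needsRecovery`; the next `write()` or `flush()` then opens a fresh log at the last persisted sequence number plus one and replays the buffer before doing its own work.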

@calvinjia calvinjia requested a review from ggezer August 14, 2019 18:13
@alluxio-bot
Contributor

Automated checks report:

  • AmplabJenkins build check: PENDING
    • We were not able to detect AmplabJenkins test results on this PR. Status will update when testing completes.
  • Commits associated with Github account: PASS
  • PR title follows the conventions: PASS

Some checks failed. Please fix the reported issues and reply 'alluxio-bot, check this please' to re-run checks.

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Alluxio-Pull-Request-Builder/5099/
Test PASSed.

Contributor

@ggezer ggezer left a comment

LGTM.

@alluxio-bot
Contributor

Automated checks report:

  • AmplabJenkins build check: PASS
  • Commits associated with Github account: PASS
  • PR title follows the conventions: PASS

All checks passed!

@calvinjia
Contributor Author

alluxio-bot, merge this please.

@alluxio-bot alluxio-bot merged commit be4e9cb into Alluxio:master Aug 14, 2019
calvinjia added a commit to calvinjia/tachyon that referenced this pull request Aug 14, 2019
Fixes Alluxio#9722

Fixes several edge cases in journal recovery.

1. If we fail to flush to the journal, do not try to recover, instead
flag the stream as needsRecovery and handle the recovery on next
operation
2. When recovering, make sure there is a previous journal entry
3. When recovering, create a new log file starting with the latest
committed journal entry (instead of our current one)
4. When recovering, fail if the latest journal entry recovered from our
buffer is not equal to our expected next sequence number - 1
5. When recovering, do not consider incomplete logs when inferring the
last persisted sequence number from the file name (it will be
INTEGER_MIN)

pr-link: Alluxio#9723
change-id: cid-63623a4e96d91f9619605dd9cad1b7da7679ac9c
alluxio-bot pushed a commit that referenced this pull request Aug 14, 2019
Cherry-pick of existing commit.
orig-pr: #9723
orig-commit: be4e9cb
orig-commit-author: Calvin Jia <jia.calvin@gmail.com>

pr-link: #9726
change-id: cid-63623a4e96d91f9619605dd9cad1b7da7679ac9c
alluxio-bot pushed a commit that referenced this pull request Aug 15, 2019
Cherry-pick of existing commit.
orig-pr: #9723
orig-commit: be4e9cb
orig-commit-author: Calvin Jia <jia.calvin@gmail.com>

pr-link: #9727
change-id: cid-63623a4e96d91f9619605dd9cad1b7da7679ac9c
jiacheliu3 pushed a commit to jiacheliu3/alluxio that referenced this pull request Nov 12, 2019
Successfully merging this pull request may close these issues.

Address edge cases in UFS journal recovery in case of UFS failure