[Release-7.1] Fix false-alarm of sev40 error in distributed consistency checker setting #11203
Merged: kakaiu merged 2 commits into apple:release-7.1 from kakaiu:fix-sev-error-distributed-consistency-checker on Feb 16, 2024
kakaiu changed the title from "fix sev error in distributed consistency checker" to "Fix SEV error in distributed consistency checker" on Feb 15, 2024
kakaiu changed the title from "Fix SEV error in distributed consistency checker" to "Fix false-alarm of sev40 error in distributed consistency checker setting" on Feb 15, 2024
kakaiu force-pushed the fix-sev-error-distributed-consistency-checker branch from 7fbf6fd to 8d5d980 (February 15, 2024 18:48)
Result of foundationdb-pr-macos on macOS Ventura 13.x
Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x
Result of foundationdb-pr-clang on Linux CentOS 7
Result of foundationdb-pr-cluster-tests on Linux CentOS 7
Result of foundationdb-pr on Linux CentOS 7
jzhou77 approved these changes on Feb 15, 2024
liquid-helium approved these changes on Feb 15, 2024
kakaiu changed the title from "Fix false-alarm of sev40 error in distributed consistency checker setting" to "[Release-7.1] Fix false-alarm of sev40 error in distributed consistency checker setting" on Feb 16, 2024
This is a corner case that triggers a sev40 error even though no fatal event actually happens. The story is as follows. In round 1, the checker requested a tester to run the workload. On the tester side, the request was received and the workload started. However, the checker could not determine whether the tester had received the request (request_maybe_delivered in the ConsistencyCheckUrgent_RunWorkloadError2 trace event). So the checker assumed that this tester had failed and marked all ranges assigned to it as failed (to be retried later). The checker then immediately proceeded to round 2, while the tester was still running the round 1 workload. In round 2, the checker issued a new workload to the same tester. The tester saw a new workload while the round 1 workload was still running, so it cancelled the old workload and started the new one. The TestFailure trace event was triggered by the old, cancelled workload, not by a real failure.
500K correctness:
20240216-025627-zhewang-aedeffe9df3ae1b5 compressed=True data_size=26923596 duration=19154774 ended=500000 fail_fast=10 max_runs=500000 pass=500000 priority=100 remaining=0 runtime=2:13:36 sanity=False started=500000 stopped=20240216-051003 submitted=20240216-025627 timeout=5400 username=zhewang