Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Release-7.1] Fix false-alarm of sev40 error in distributed consistency checker setting #11203

Merged

Conversation

kakaiu
Copy link
Member

@kakaiu kakaiu commented Feb 15, 2024

This is a corner case triggers s40 error while no fatal event actually happens. The story is that: In round 1, the checker tried to request a tester to run the workload. At the tester side, the request was received and the workload started. However, the checker was failed to decide whether the tester had received the request (request_maybe_delivered in ConsistencyCheckUrgent_RunWorkloadError2 trace event). So, the checker thought that this tester had failed and the checker marked all ranges assigned to this tester as failed (will retry later). Then the checker immediately proceeded to round 2. However, the tester was still running the workload of round 1. In round 2, the checker issued a new workload to the tester. The tester saw a new workload while the existing workload of round 1 is still running. Therefore the tester cancelled the old workload and started the new workload. The TestFailure trace event is triggered by the old workload.

500K correctness:
20240216-025627-zhewang-aedeffe9df3ae1b5 compressed=True data_size=26923596 duration=19154774 ended=500000 fail_fast=10 max_runs=500000 pass=500000 priority=100 remaining=0 runtime=2:13:36 sanity=False started=500000 stopped=20240216-051003 submitted=20240216-025627 timeout=5400 username=zhewang

Code-Reviewer Section

The general pull request guidelines can be found here.

Please check each of the following things and check all boxes before accepting a PR.

  • The PR has a description, explaining both the problem and the solution.
  • The description mentions which forms of testing were done and the testing seems reasonable.
  • Every function/class/actor that was touched is reasonably well documented.

For Release-Branches

If this PR is made against a release-branch, please also check the following:

  • This change/bugfix is a cherry-pick from the next younger branch (younger release-branch or main if this is the youngest branch)
  • There is a good reason why this PR needs to go into a release branch and this reason is documented (either in the description above or in a linked GitHub issue)

@kakaiu kakaiu changed the title fix sev error in distributed consistency checker Fix SEV error in distributed consistency checker Feb 15, 2024
@kakaiu kakaiu changed the title Fix SEV error in distributed consistency checker Fix false-alarm of sev40 error in distributed consistency checker setting Feb 15, 2024
@kakaiu kakaiu force-pushed the fix-sev-error-distributed-consistency-checker branch from 7fbf6fd to 8d5d980 Compare February 15, 2024 18:48
@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: 7fbf6fd
  • Duration 0:10:57
  • Result: ❌ FAILED
  • Error: reference not found for primary source and source version 7fbf6fd1ceb21b8d3d978366c794dfcbf0e97ce8
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: 7fbf6fd
  • Duration 0:11:29
  • Result: ❌ FAILED
  • Error: reference not found for primary source and source version 7fbf6fd1ceb21b8d3d978366c794dfcbf0e97ce8
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-clang on Linux CentOS 7

  • Commit ID: 7fbf6fd
  • Duration 0:21:48
  • Result: ❌ FAILED
  • Error: Error while executing command: ninja -v -C build_output -j ${NPROC} all packages strip_targets. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-cluster-tests on Linux CentOS 7

  • Commit ID: 7fbf6fd
  • Duration 0:28:34
  • Result: ❌ FAILED
  • Error: Error while executing command: ninja -v -C build_output -j ${NPROC} all packages strip_targets. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr on Linux CentOS 7

  • Commit ID: 7fbf6fd
  • Duration 0:30:46
  • Result: ❌ FAILED
  • Error: Error while executing command: ninja -v -C build_output -j ${NPROC} all packages strip_targets. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-clang on Linux CentOS 7

  • Commit ID: 8d5d980
  • Duration 0:48:17
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr on Linux CentOS 7

  • Commit ID: 8d5d980
  • Duration 1:07:26
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-cluster-tests on Linux CentOS 7

  • Commit ID: 8d5d980
  • Duration 1:17:43
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@kakaiu kakaiu closed this Feb 15, 2024
@kakaiu kakaiu reopened this Feb 15, 2024
@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: 8d5d980
  • Duration 0:27:49
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: 8d5d980
  • Duration 0:34:43
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-clang on Linux CentOS 7

  • Commit ID: 8d5d980
  • Duration 0:45:57
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr on Linux CentOS 7

  • Commit ID: 8d5d980
  • Duration 1:11:24
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-cluster-tests on Linux CentOS 7

  • Commit ID: 8d5d980
  • Duration 1:16:22
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: 05f8c66
  • Duration 0:26:02
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: 05f8c66
  • Duration 0:34:56
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-clang on Linux CentOS 7

  • Commit ID: 05f8c66
  • Duration 0:44:32
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr on Linux CentOS 7

  • Commit ID: 05f8c66
  • Duration 1:08:55
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-cluster-tests on Linux CentOS 7

  • Commit ID: 05f8c66
  • Duration 1:16:19
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@kakaiu kakaiu merged commit e8856e9 into apple:release-7.1 Feb 16, 2024
5 checks passed
@kakaiu kakaiu changed the title Fix false-alarm of sev40 error in distributed consistency checker setting [Release-7.1] Fix false-alarm of sev40 error in distributed consistency checker setting Feb 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants