Fix handleTssMismatches crashes. [release-7.4] #12330
handleTssMismatches(DatabaseContext* cx) holds a raw pointer to a DatabaseContext object, and that object can be destroyed when "tr" is reset within this actor. The actor itself can't be destroyed at that moment because it is still on the stack, so it keeps running with a dangling pointer. Introducing this delay gives the actor a chance to be cancelled before it uses cx again.
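To make the hazard concrete, here is a minimal standalone C++ model. It is not Flow code and not the actual patch. Scheduler, TaskState, Context and runHandler are hypothetical stand-ins:

- Context plays the role of DatabaseContext.
- The unique_ptr owner plays the role of whatever "tr" was keeping alive.
- Posting a continuation to the scheduler plays the role of the added delay.

The model assumes, as the description states, that cancellation is only observed at a yield point. Code that keeps running on the stack right after the reset would therefore still reach the freed context.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>

// Hypothetical run-to-completion scheduler standing in for Flow's event loop.
class Scheduler {
public:
    void post(std::function<void()> fn) { ready_.push_back(std::move(fn)); }
    void runAll() {
        while (!ready_.empty()) {
            auto fn = std::move(ready_.front());
            ready_.pop_front();
            fn();
        }
    }
private:
    std::deque<std::function<void()>> ready_;
};

// State shared between the context and its background task (hypothetical).
struct TaskState {
    bool cancelled = false;    // set when the owner drops the task
    bool contextAlive = true;  // lets the model detect a dangling use without UB
};

// Stand-in for DatabaseContext: destroying it cancels its background task.
struct Context {
    std::shared_ptr<TaskState> task;
    explicit Context(std::shared_ptr<TaskState> t) : task(std::move(t)) {}
    ~Context() {
        task->cancelled = true;
        task->contextAlive = false;
    }
};

// Stand-in for the handleTssMismatches body. Returns true if the code after
// the reset would have used cx after it was destroyed.
bool runHandler(Scheduler& sched, std::unique_ptr<Context>& owner, bool yieldAfterReset) {
    Context* cx = owner.get();                   // raw pointer, as in the PR
    std::shared_ptr<TaskState> state = cx->task; // keep the flags observable

    owner.reset();                               // "reset tr": destroys *cx
    if (!yieldAfterReset) {
        // Still on the stack with no yield: cancellation cannot be observed,
        // so the next use of cx would be a use-after-free.
        return !state->contextAlive;
    }

    bool dangling = false;
    sched.post([state, &dangling] {              // analogue of wait(delay(0))
        if (state->cancelled) return;            // cancelled at the yield point
        dangling = !state->contextAlive;
    });
    sched.runAll();                              // runs before `dangling` goes out of scope
    return dangling;
}
```

The two cases behave differently. Called with yieldAfterReset = false, runHandler reports a dangling use. Called with true, the posted continuation sees the cancellation first and returns without touching the context.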
Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x
Result of foundationdb-pr-clang-arm on Linux CentOS 7
Result of foundationdb-pr-macos on macOS Ventura 13.x
Result of foundationdb-pr-cluster-tests on Linux RHEL 9
Result of foundationdb-pr-clang on Linux RHEL 9
Result of foundationdb-pr on Linux RHEL 9
gxglass
left a comment
Interesting. I wonder if there is a use for something like
#define CHECK_FOR_CANCELLATION wait(delay(0))
and calling it at the top of loops that need this property? That said, I don't know how often this need arises.
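Below is a rough plain-C++ model of the suggested pattern. It assumes a cooperative scheduler like Flow's, in which every loop iteration starts by yielding. A cancellation requested while the loop was suspended is therefore seen before the body runs again.

Scheduler, LoopTask and startLoop are hypothetical names. In real Flow code the yield would be the wait(delay(0)) from the proposed macro, and cancellation would surface at that wait rather than through an explicit flag check.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>

// Hypothetical cooperative scheduler standing in for Flow's run loop.
class Scheduler {
public:
    void post(std::function<void()> fn) { ready_.push_back(std::move(fn)); }
    // Runs one queued task; returns false when nothing is queued.
    bool runOne() {
        if (ready_.empty()) return false;
        auto fn = std::move(ready_.front());
        ready_.pop_front();
        fn();
        return true;
    }
private:
    std::deque<std::function<void()>> ready_;
};

struct LoopTask {
    bool cancelled = false;  // set by whoever owns the task
    int iterations = 0;      // how many times the loop body ran
};

// Each iteration begins by yielding to the scheduler: the analogue of
// CHECK_FOR_CANCELLATION, i.e. wait(delay(0)), at the top of a Flow loop.
void startLoop(Scheduler& sched, std::shared_ptr<LoopTask> task) {
    sched.post([&sched, task] {
        if (task->cancelled) return;  // cancellation observed at the yield
        ++task->iterations;           // loop body
        startLoop(sched, task);       // yield again before the next iteration
    });
}
```

As an example, run three steps, set cancelled, then drain the queue. The iteration count stays at 3, because the pending fourth step observes the flag at its yield point and stops.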
That's a good idea. I think there are maybe two or three other places where we have used this trick before.
cherrypick #12328
20250826-165641-jzhou-a1cbfc05926fdfd9 compressed=True data_size=59276546 duration=4833332 ended=100000 fail=1 fail_fast=10 max_runs=100000 pass=99999 priority=100 remaining=0 runtime=1:22:44 sanity=False started=100000 stopped=20250826-181925 submitted=20250826-165641 timeout=5400 username=jzhou
Interestingly, the single failure is -f ./tests/rare/ClogRemoteTLog.toml -s 870359747 -b off with ClogRemoteTLogCheckFailed Sev40, which is not the crash this PR fixes.
Code-Reviewer Section
The general pull request guidelines can be found here.
Please check each of the following things and check all boxes before accepting a PR.
For Release-Branches
If this PR is made against a release-branch, please also check the following:
release-branch or main if this is the youngest branch)