kvserver: deflake TestLeaseQueueLeasePreferencePurgatoryError #134653
The test sets up an environment in which 40 replicas of interest are supposed to enter the lease queue purgatory. The test was waiting for this to happen before proceeding, but was doing so incorrectly: It checked that the number of replicas in the purgatory matches 40 (as opposed to checking directly that all ranges of interest had entered it). Since other ranges could slip in, occasionally the test would proceed too early, remove the condition that causes ranges to enter the purgatory, and then find that a few ranges would not be processed (since they never entered the purgatory in the first place).

This commit fixes this by waiting explicitly for the RangeIDs of interest to be represented in the lease queue purgatory.

I was able to reproduce the flake in a few minutes on my gceworker via

```
./dev test --count 10000 --stress ./pkg/kv/kvserver \
  --filter TestLeaseQueueLeasePreferencePurgatoryError -- \
  --jobs 100 --local_resources=cpu=100 --local_resources=memory=HOST_RAM 2>&1
```

This no longer reproduces as of this PR.

Fixes cockroachdb#134578.

Epic: none
Release note: None
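To make the failure mode concrete, here is a minimal Go sketch (not the actual test code) contrasting the two wait conditions. It assumes the purgatory can be observed as a set of RangeIDs; `RangeID`, `countCheck` and `membershipCheck` are hypothetical names, and the scenario shrinks the 40 ranges of interest down to 3.

```go
package main

import "fmt"

// RangeID is a hypothetical stand-in for the kvserver range identifier,
// modeled as a plain int64 for illustration.
type RangeID int64

// countCheck mirrors the flaky condition: it only looks at how many ranges
// are in purgatory, not which ones.
func countCheck(purgatory map[RangeID]struct{}, want int) bool {
	return len(purgatory) == want
}

// membershipCheck mirrors the fixed condition: every range of interest must
// be present in purgatory, regardless of what else is in there.
func membershipCheck(purgatory map[RangeID]struct{}, ofInterest []RangeID) bool {
	for _, id := range ofInterest {
		if _, ok := purgatory[id]; !ok {
			return false
		}
	}
	return true
}

func main() {
	ofInterest := []RangeID{10, 11, 12}
	// Range 12 has not entered purgatory yet, but unrelated range 99 has.
	purgatory := map[RangeID]struct{}{10: {}, 11: {}, 99: {}}

	// The count matches, so the flaky check would proceed too early.
	fmt.Println(countCheck(purgatory, len(ofInterest)))
	// The membership check notices range 12 is missing and keeps waiting.
	fmt.Println(membershipCheck(purgatory, ofInterest))
}
```

The membership check is insensitive to unrelated ranges entering the purgatory, which is exactly the interference that made the count-based check flaky.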
We are somewhat abusing purgatory here. However, that was intentional: purgatory is a convenient retry mechanism for leaseholders that are not currently the raft leader and will therefore fail the lease transfer check in the allocator.
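The "purgatory as a retry mechanism" idea can be illustrated with a toy Go model. This is a hypothetical sketch, not CockroachDB's queue implementation: `queue`, `errNotRaftLeader`, `add` and `retryPurgatory` are invented names, and retries are triggered by an explicit call rather than by whatever timer or signal the real queue uses.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotRaftLeader is a hypothetical error standing in for the allocator's
// lease transfer check failing because the leaseholder is not the raft leader.
var errNotRaftLeader = errors.New("leaseholder is not raft leader")

// queue is a toy replica queue with a purgatory: replicas whose processing
// fails with a "retry later" error are parked instead of dropped.
type queue struct {
	purgatory map[int]struct{}
	process   func(rangeID int) error
}

// add processes a replica once and parks it in purgatory on errNotRaftLeader.
// Other errors are simply ignored in this toy model.
func (q *queue) add(rangeID int) {
	if err := q.process(rangeID); errors.Is(err, errNotRaftLeader) {
		q.purgatory[rangeID] = struct{}{}
	}
}

// retryPurgatory re-processes every parked replica and removes the ones that
// now succeed. Deleting map entries while ranging over the map is safe in Go.
func (q *queue) retryPurgatory() {
	for id := range q.purgatory {
		if err := q.process(id); err == nil {
			delete(q.purgatory, id)
		}
	}
}

func main() {
	isLeader := map[int]bool{1: true, 2: false}
	q := &queue{
		purgatory: map[int]struct{}{},
		process: func(id int) error {
			if !isLeader[id] {
				return errNotRaftLeader
			}
			return nil
		},
	}
	q.add(1)
	q.add(2)
	// Range 2 is parked because its leaseholder is not the raft leader.
	fmt.Println(len(q.purgatory))

	// Raft leadership moves to the leaseholder; the retry now succeeds.
	isLeader[2] = true
	q.retryPurgatory()
	fmt.Println(len(q.purgatory))
}
```

The appeal of this design for the test is that the failing condition does not need to be polled by the caller: parked replicas are retried automatically once the condition clears.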
Reviewed 2 of 2 files at r1, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)
bors r+
Based on the specified backports for this PR, I applied new labels to the following linked issue(s). Please adjust the labels as needed to match the branches actually affected by the issue(s), including adding any known older branches. Issue #134578: branch-release-24.1, branch-release-24.3. Issue #134768: branch-release-24.1, branch-release-24.2. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
Accidentally missed this in cockroachdb#134653.
Epic: none
The test sets up an environment in which 40 replicas of interest are
supposed to enter the lease queue purgatory. The test was waiting for
this to happen before proceeding, but was doing so incorrectly: It
checked that the number of replicas in the purgatory matches 40 (as
opposed to checking directly that all ranges of interest had entered
it). Since other ranges could slip in, occasionally the test would
proceed too early, remove the condition that causes ranges to enter the
purgatory, and then find that a few ranges would not be processed (since
they never entered the purgatory in the first place).
This commit fixes this by waiting explicitly for the RangeIDs of
interest to be represented in the lease queue purgatory.
I was able to reproduce the flake in a few minutes on my gceworker via
the same `./dev test --stress` invocation shown in the PR description.
This no longer reproduces as of this PR.
Fixes #134578.
Fixes #134768.
The backports will fix but this
Touches #134765.
Epic: none
Release note: None