kv/kvserver: TestLeasePreferencesDuringOutage failed #120605
Labels
A-testing
Testing tools and infrastructure
branch-master
Failures and bugs on the master branch.
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
C-test-failure
Broken test (automatically or manually discovered).
O-robot
Originated from a bot.
P-2
Issues/test failures with a fix SLA of 3 months
T-kv
KV Team
cockroach-teamcity added the branch-master, C-test-failure, O-robot, release-blocker, and T-kv labels on Mar 17, 2024
I can reproduce this in under 200 runs with:
This is failing on the up-replication step, where there should be no voters on n2 or n3.
kvoli added the C-bug, A-testing, and P-2 labels and removed the release-blocker label on Mar 18, 2024
kvoli added a commit to kvoli/cockroach that referenced this issue on Mar 26, 2024
A replica lease queue was introduced in cockroachdb#119155, which processes lease transfers for replicas. Previously, the replicate queue handled lease transfers. The linked change omitted updating the `ReplicationManual` test cluster mode to disable the lease queue, resulting in unexpected lease transfers in some tests.
Disable the lease queue under `ReplicationManual` replication mode. Misc tests are also updated to disable/enable the lease queue appropriately.
Epic: None
Touches: cockroachdb#120605
Release note: None
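The effect of that change can be sketched with a small, self-contained Go model. This is not the CockroachDB code: `ReplicationMode`, `QueueKnobs`, and `KnobsForMode` are hypothetical names invented for illustration, and the real test cluster sets up its knobs differently. The sketch only shows the idea that manual replication mode should switch off every queue that can move leases, including the newly split-out lease queue.

```go
package main

// ReplicationMode is a hypothetical stand-in for a test cluster's
// replication mode; it is not the real CockroachDB type.
type ReplicationMode int

const (
	// ReplicationAuto leaves the background queues running.
	ReplicationAuto ReplicationMode = iota
	// ReplicationManual means the test drives replica and lease
	// placement itself, so background queues must stay out of the way.
	ReplicationManual
)

// QueueKnobs is a hypothetical set of per-store switches.
type QueueKnobs struct {
	DisableReplicateQueue bool
	DisableLeaseQueue     bool
}

// KnobsForMode returns the knobs a store should run with under the
// given mode. Under manual replication both queues are disabled;
// forgetting the lease queue is the omission the commit describes.
func KnobsForMode(mode ReplicationMode, knobs QueueKnobs) QueueKnobs {
	if mode == ReplicationManual {
		knobs.DisableReplicateQueue = true
		knobs.DisableLeaseQueue = true
	}
	return knobs
}
```

Calling `KnobsForMode(ReplicationManual, QueueKnobs{})` in this model yields knobs with both queues disabled, while `ReplicationAuto` leaves the supplied knobs untouched, so a test can still opt out of individual queues explicitly.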
kvoli added a commit to kvoli/cockroach that referenced this issue on Mar 26, 2024
After cockroachdb#118966, lease preference satisfaction is no longer tied to up-replication. As such, there is no guarantee that a range will up-replicate before the lease is transferred to satisfy a preference.
This caused `TestLeasePreferencesDuringOutage` to occasionally fail, as the test relies on the replica scanner to enqueue replicas into the replicate queue to up-replicate, whereas previously they were enqueued into the replicate queue initially for lease preferences.
Only assert on the lease preference, as this is all that is guaranteed to occur quickly.
Fixes: cockroachdb#120605
Release note: None
craig bot pushed a commit that referenced this issue on Mar 26, 2024
120857: compose: remove PG ComposeCompare test r=rafiss a=rafiss
This test does not provide us much value and is too flaky to be useful. Most of its failures are due to minor differences in things like names, formatting, or precision, and accommodating each of these differences is not worth it.
fixes #109400
fixes #116150
fixes #112154
Release note: None
121052: base,testutils,kvserver: disable lease queue in replication manual r=arulajmani a=kvoli
A replica lease queue was introduced in #119155, which processes lease transfers for replicas. Previously, the replicate queue handled lease transfers. The linked change omitted updating the `ReplicationManual` test cluster mode to disable the lease queue, resulting in unexpected lease transfers in some tests.
Disable the lease queue under `ReplicationManual` replication mode. Misc tests are also updated to disable/enable the lease queue appropriately.
Epic: None
Touches: #120605
Release note: None
121120: roachtest: deflake admission-control/multitenant-fairness r=sumeerbhola a=aadityasondhi
In #120236, we removed the `failed` metadata from the `crdb_internal.statement_statistics` table.
Fixes #120586.
Fixes #120587.
Fixes #120588.
Fixes #120589.
Release note: None
121132: batcheval: move test to `large` RBE pool r=rail a=rickystewart
Epic: CRDB-8308
Release note: None
Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
Co-authored-by: Austen McClernon <austen@cockroachlabs.com>
Co-authored-by: Aaditya Sondhi <20070511+aadityasondhi@users.noreply.github.com>
Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com>
craig bot pushed a commit that referenced this issue on Mar 27, 2024
119719: kv: disable circuit breaker on destroyed replica r=lyang24 a=lyang24
This commit disables the circuit breaker on a replica that is destroyed or in the process of being destroyed.
Informs #104567
Release note: None
120643: kvserver: don't wait for replication in outage lease pref test r=andrewbaptist a=kvoli
After #118966, lease preference satisfaction is no longer tied to up-replication. As such, there is no guarantee that a range will up-replicate before the lease is transferred to satisfy a preference.
This caused `TestLeasePreferencesDuringOutage` to occasionally fail, as the test relies on the replica scanner to enqueue replicas into the replicate queue to up-replicate, whereas previously they were enqueued into the replicate queue initially for lease preferences.
Only assert on the lease preference, as this is all that is guaranteed to occur quickly.
Fixes: #120605
Release note: None
120727: sql: add crdb_internal.protect_cluster builtin r=dt a=stevendanna
Migration tooling uses historical queries that may need to run for many minutes or hours. Ensuring that these queries can continue to completion requires protecting the table's data from garbage collection.
The new builtin crdb_internal.protect_cluster creates a cluster-wide PTS record at the given timestamp. The timestamp will expire in 24 hours or when the returned job ID is canceled.
We re-purpose the stream ingestion producer job as the owner of this timestamp. This has a couple of advantages over a session-scoped builtin for now:
- Ability for an operator to remove the PTS record explicitly using CANCEL JOB.
- Some visibility into the source of the PTS record via the jobs table.
We use a cluster-wide PTS record rather than a table-specific PTS record because being sure that we can do a historical query as of a given timestamp also requires access to at least the descriptor and namespace tables at that timestamp.
We've re-used the stream ingestion producer job for now as it also gives a simple way for the caller to extend the PTS record beyond 24 hours with crdb_internal.replication_stream_progress(job_id, protectTS).
Epic: CC-27068
Release note: None
Co-authored-by: lyang24 <lanqingy@usc.edu>
Co-authored-by: Austen McClernon <austen@cockroachlabs.com>
Co-authored-by: Steven Danna <danna@cockroachlabs.com>
Co-authored-by: David Taylor <tinystatemachine@gmail.com>
kv/kvserver.TestLeasePreferencesDuringOutage failed on master @ 067e48d29b9093038f6fcf2074cd761ffdcd4fe2:
Parameters:
attempt=1
deadlock=true
run=1
shard=25
Jira issue: CRDB-36769