upgrades: Setting the cluster version can get stuck behind leasing #113908
Labels
branch-release-23.1
Used to mark GA and release blockers, technical advisories, and bugs for 23.1
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
O-support
Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs
release-blocker
Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked.
T-sql-foundations
SQL Foundations Team (formerly SQL Schema + SQL Sessions)
Comments
fqazi added the C-bug and T-sql-foundations labels on Nov 6, 2023
nvanbenschoten added the O-support label on Nov 7, 2023
rafiss added the branch-release-23.1 and release-blocker labels on Nov 7, 2023
fqazi added a commit to fqazi/cockroach that referenced this issue on Nov 8, 2023
Previously, it was possible for the leasing subsystem to starve out attempts to set the cluster version during upgrades, since the leasing subsystem uses high priority txn for renewals. To address this, this patch makes the logic to set the cluster version high priority so it can't be pushed out by lease renewals.
Fixes: cockroachdb#113908
Release note (bug fix): Addressed a bug that could cause cluster version finalization to get starved out by descriptor lease renewals on larger clusters.
fqazi added a commit to fqazi/cockroach that referenced this issue on Nov 9, 2023
fqazi added a commit to fqazi/cockroach that referenced this issue on Nov 9, 2023
fqazi added a commit to fqazi/cockroach that referenced this issue on Nov 13, 2023
fqazi added a commit to fqazi/cockroach that referenced this issue on Nov 13, 2023
fqazi added a commit to fqazi/cockroach that referenced this issue on Nov 15, 2023
fqazi added a commit to fqazi/cockroach that referenced this issue on Nov 16, 2023
craig bot pushed a commit that referenced this issue on Nov 17, 2023
113934: roachprod: use gcloud CLI instead of net.LookupSRV r=renatolabs a=herkolategan
Previously `net.LookupSRV` with a custom resolver was used to lookup DNS records. This approach resulted in several flakes and required waiting on DNS servers to have the records available. The CLI is more stable, but has a greater call overhead.
This PR also introduces a cache to reduce the cost of the `LookupSRVRecords` call which could be called frequently depending on the origin of use. The cache is updated for any CRUD operations on the DNS entries, and a call to the CLI will not occur if any entry exists for the name the lookup is attempting. The names are also normalised to remove a trailing dot in order to make matching against the cache work correctly.
There is a small risk that the cache could go out of sync if any other roachprod process manipulates the records with a create, update or destroy operation, while a continuous roachprod process is interacting with the entries. This risk is relatively small and usually applies to roachtest rather than everyday use of roachprod.
Fixes #111269
Epic: None
Release Note: None
113996: upgrade: use high priority txn's to update the cluster version r=fqazi a=fqazi
Previously, it was possible for the leasing subsystem to starve out attempts to set the cluster version during upgrades, since the leasing subsystem uses high priority txn for renewals. To address this, this patch makes the logic to set the cluster version high priority so it can't be pushed out by lease renewals.
Fixes: #113908
Release note (bug fix): Addressed a bug that could cause cluster version finalization to get starved out by descriptor lease renewals on larger clusters.
Co-authored-by: Herko Lategan <herko@cockroachlabs.com>
Co-authored-by: Faizan Qazi <faizan@cockroachlabs.com>
yuzefovich pushed a commit to yuzefovich/cockroach that referenced this issue on Nov 28, 2023
fqazi added a commit that referenced this issue on Nov 29, 2023
Previously, it was possible for the leasing subsystem to starve out attempts to set the cluster version during upgrades, since the leasing subsystem uses high priority txn for renewals. To address this, this patch makes the logic to set the cluster version high priority so it can't be pushed out by lease renewals.
Fixes: #113908
Release note (bug fix): Addressed a bug that could cause cluster version finalization to get starved out by descriptor lease renewals on larger clusters.
On clusters with a large number of nodes and descriptors, it's possible for leasing traffic to be continuous. As part of 23.1 we added version checks during lease renewal and deletion to detect whether the format should be regional by row or the old non-multi-region format. Unfortunately, the version guards in the leasing manager query KV in a high-priority transaction, which prevents us from bumping the cluster version number.
We need to know the version transactionally so that we can determine whether the new regional by row table should be used, so skipping the transactional version check is not an option. Instead, the alternative we are going to pursue is to bump the priority of the upgrade transaction when setting the cluster version.
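The starvation and the proposed fix can be illustrated with a toy model. This is only a sketch under stated assumptions, not CockroachDB code: the `Priority` type, `canPush`, and `restartsUnderRenewals` are hypothetical names, and the rule "only a strictly higher priority can push a conflicting transaction" is a simplification of how priorities resolve conflicts.

```go
package main

import "fmt"

// Priority is a hypothetical stand-in for a transaction priority level.
type Priority int

const (
	NormalPriority Priority = iota
	HighPriority
)

// canPush reports whether a transaction at priority pusher can push a
// conflicting transaction at priority pushee out of the way. Assumption for
// this sketch: only a strictly higher priority wins.
func canPush(pusher, pushee Priority) bool {
	return pusher > pushee
}

// restartsUnderRenewals counts how many times the version-setting transaction
// is pushed while `renewals` high-priority lease version checks each conflict
// with it once.
func restartsUnderRenewals(versionTxn Priority, renewals int) int {
	restarts := 0
	for i := 0; i < renewals; i++ {
		if canPush(HighPriority, versionTxn) {
			restarts++
		}
	}
	return restarts
}

func main() {
	// At normal priority, every conflicting lease check pushes the upgrade txn.
	fmt.Println(restartsUnderRenewals(NormalPriority, 1000)) // prints 1000
	// At high priority, lease checks can no longer push it.
	fmt.Println(restartsUnderRenewals(HighPriority, 1000)) // prints 0
}
```

In this model, continuous leasing traffic means the normal-priority upgrade transaction is pushed on every conflict and may never commit, while raising it to the same priority as the lease checks removes their ability to push it.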
Jira issue: CRDB-33252
gz#19331
Epic CRDB-35306