
upgrade: use high priority txn's to update the cluster version #113996

Merged
merged 2 commits into from Nov 17, 2023

Conversation


@fqazi fqazi commented Nov 8, 2023

Previously, it was possible for the leasing subsystem to starve out attempts to set the cluster version during upgrades, since the leasing subsystem uses high-priority txns for renewals. To address this, this patch makes the transaction that sets the cluster version high priority as well, so it can't be pushed out by lease renewals.

Fixes: #113908

Release note (bug fix): Addressed a bug that could cause cluster version finalization to get starved out by descriptor lease renewals on larger clusters.

@fqazi fqazi added backport-23.1.x Flags PRs that need to be backported to 23.1 backport-23.2.x Flags PRs that need to be backported to 23.2. labels Nov 8, 2023
@fqazi fqazi requested a review from a team November 8, 2023 01:46


@nvanbenschoten nvanbenschoten left a comment


Thanks @fqazi!

In addition to the test that you're adding here, there are two other things that would give us more confidence in the efficacy and safety of this change:

  1. we demonstrate that it fixes the reproduction presented in https://github.com/cockroachlabs/support/issues/2685#issuecomment-1788125904.
  2. we determine all of the keys that the upgrade txn that we're marking as high priority could read from and write to. The code itself makes it look like we're only reading from the version key and then writing to it. But are there any other reads or writes that are performed on behalf of the txn (now at high priority) due to the use of an internal executor?

Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dt and @fqazi)


pkg/sql/set_cluster_setting.go line 621 at r1 (raw file):

		return db.Txn(ctx, func(ctx context.Context, txn isql.Txn) error {
			// Run the version bump inside the upgrade as high priority, since
			// lease manager ends up locking the version row inside the settings

s/locking/reading/

"with high priority"


pkg/sql/set_cluster_setting.go line 624 at r1 (raw file):

			// table when refreshing leases. On larger clusters or with a large number
			// of descriptors its possible for normal transactions to be starved
			// (#95227)

nit: punctuation.


pkg/sql/set_cluster_setting.go line 624 at r1 (raw file):

			// table when refreshing leases. On larger clusters or with a large number
			// of descriptors its possible for normal transactions to be starved
			// (#95227)

Let's also add a paragraph about why this is safe (see my comment above).

Assuming that we're only reading from the version key and then writing to it, changing all of these upgrade transactions from normal priority to high priority should not alter how they interact with each other — two transactions at the same priority interact the same with each other, regardless of what that priority is.

So the only effects this will have are:

  1. the transaction will now abort any other transactions that write to the version key with normal priority (there aren't any, right?)
  2. the transaction will now block any other transaction that reads from the version key. That's kind of the point.
  3. Depending on whether the internal executor is hiding any other reads or writes, this may interact differently with other txns for those.
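The priority interaction described in points 1 and 2 can be sketched with a small model (this is an illustrative simplification, not CockroachDB's actual transaction-push logic): a pusher can force a conflicting pushee out of its way only with strictly higher priority, which is why raising both txns to high priority leaves their mutual behavior unchanged.

```go
package main

import "fmt"

// Priority models a transaction priority level. These names are
// illustrative; CockroachDB's real priorities are numeric values.
type Priority int

const (
	Normal Priority = iota + 1
	High
)

// canPush reports whether a pusher txn may force a conflicting pushee
// out of its way. In this simplified model a pusher only wins with
// strictly higher priority; equal priorities leave the conflict to
// normal (timestamp-based) resolution.
func canPush(pusher, pushee Priority) bool {
	return pusher > pushee
}

func main() {
	// A high-priority lease renewal can push a normal-priority version bump...
	fmt.Println(canPush(High, Normal)) // true
	// ...but once the version bump is also high priority, it can no longer be pushed.
	fmt.Println(canPush(High, High)) // false
	// Two same-priority txns interact identically regardless of the level.
	fmt.Println(canPush(Normal, Normal) == canPush(High, High)) // true
}
```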

pkg/sql/set_cluster_setting.go line 660 at r1 (raw file):

				ctx, "update-setting", txn.KV(),
				sessiondata.RootUserSessionDataOverride,
				`UPSERT INTO system.settings (name, value, "lastUpdated", "valueType") VALUES ($1, $2, now(), $3)`,

Could this use of SQL ever result in a lease acquisition? Does that risk a deadlock, if the lease acquisition gets stuck on the version upgrade's write?


pkg/upgrade/upgrades/version_starvation_test.go line 32 at r1 (raw file):

// the cluster version is done with a high priority txn and cannot
// be pushed out. Previously, this would be normal priority and
// get pushed by the leasing code. Note: This test just confirms

"by the leasing code, leading to starvation when leases were acquired with sufficiently high frequency"


@fqazi fqazi left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dt and @nvanbenschoten)


pkg/sql/set_cluster_setting.go line 660 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Could this use of SQL ever result in a lease acquisition? Does that risk a deadlock, if the lease acquisition gets stuck on the version upgrade's write?

Oh good point, that is a potential danger here if a lease renewal is needed on the system.settings table. Specifically, if a synchronous renewal is required, then we can deadlock.


@fqazi fqazi left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dt and @nvanbenschoten)


pkg/sql/set_cluster_setting.go line 624 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Let's also add a paragraph about why this is safe (see my comment above).

Assuming that we're only reading from the version key and then writing to it, changing all of these upgrade transactions from normal priority to high priority should not alter how they interact with each other — two transactions at the same priority interact the same with each other, regardless of what that priority is.

So the only effects this will have are:

  1. the transaction will now abort any other transactions that write to the version key with normal priority (there aren't any, right?)
  2. the transaction will now block any other transaction that reads from the version key. That's kind of the point.
  3. Depending on whether the internal executor is hiding any other reads or writes, this may interact differently with other txns for those.

Clarified why this should be safe:
// Run the version bump inside the upgrade as high priority, since
// lease manager ends up reading the version row (with high priority)
// inside the settings table when refreshing leases. On larger clusters
// or with a large number of descriptors it's possible for normal
// transactions to be starved (#95227).
// This is safe because we expect this transaction to only do the following:
// 1) We expect this transaction to only read and write to the
// version key in the system.settings table.
// 2) Reads from the system.sql_instances table to confirm all SQL servers
// have been upgraded in multi-tenant environments.
// 3) Other transactions will use a normal priority and get pushed out by
// this one.
// 4) There is a potential danger of a conflict between lease renewal of the
// settings table and the upgrade, but this cannot occur in practice,
// because the lease will always be acquired before we try to write to
// the settings table.

If we want to be paranoid, we could disable using leased descriptors in the planner.


pkg/sql/set_cluster_setting.go line 660 at r1 (raw file):

Previously, fqazi (Faizan Qazi) wrote…

Oh good point that is a potential danger here, if a lease renewal is needed on the system.settings table. Specifically a synchronous renewal was required then we can deadlock.

Turns out this is safe: with some experimentation, I was able to confirm this is a non-issue, since we will always acquire the lease on the settings table before writing.


@fqazi fqazi left a comment


  1. I looked into (2), and the only other operation is a read from the system.sql_instances table.

I'm going to run (1) locally to reproduce the scenario and confirm the fix works.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dt and @nvanbenschoten)


fqazi commented Nov 9, 2023

@nvanbenschoten Actually, disregard my previous comment; yeah, there is a deadlock here, my test was flawed. Let me see if I can work around it by avoiding leasing during the settings update.

@fqazi fqazi force-pushed the upgradeHiPriVersion branch 5 times, most recently from 46b2a42 to 9ac22a2 Compare November 9, 2023 21:41

@nvanbenschoten nvanbenschoten left a comment


Reviewed 6 of 6 files at r3, 3 of 3 files at r4, 5 of 5 files at r5, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dt and @fqazi)


-- commits line 29 at r5:
"simulate"?


pkg/sql/set_cluster_setting.go line 623 at r4 (raw file):

			// lease manager ends up reading the version row (with high priority)
			// inside the settings table when refreshing leases. On larger clusters
			// or with a large number of descriptors its possible for normal

nit: "it's"


pkg/sql/set_cluster_setting.go line 626 at r4 (raw file):

			// transactions to be starved  (#95227).
			// This is safe because we expected this transaction only do the following:
			// 1) We expect this transaction to only read and write to the

How does the use of WithSkipDescriptorCache change this behavior? Are there new keys that we read from and write to?


pkg/sql/set_cluster_setting.go line 630 at r4 (raw file):

			// 2) Reads from the system.sql_instances table to confirm all SQL servers
			//    have been upgraded in multi-tenant environments.
			// 3) Other transactions will use a normal priority and get pushed out by

Which other transactions is this referring to? Other transactions that write to the version key? Do such transactions exist?


pkg/sql/set_cluster_setting.go line 634 at r4 (raw file):

			// 4) There is a potential danger to conflict with lease renewal of the
			//    settings table and the upgrade, but to avoid this, we intentionally
			//		force the internal executor to avoid leased descriptors.

nit: indentation looks off here.


pkg/sql/set_cluster_setting.go line 686 at r4 (raw file):

			return err
		},
			isql.WithSkipDescriptorCache())

Could we give this a comment? Specifically, why does this help? If we never perform leasing then this transaction will never get stuck waiting on a leasing transaction. Does it have to perform other reads and writes to compensate? If so, how do those operations interact with other transactions?


pkg/sql/catalog/lease/kv_writer.go line 71 at r5 (raw file):

) (settingswatcher.VersionGuard, error) {
	targetVersion := clusterversion.TODO_Delete_V23_1_SystemRbrCleanup
	if w.knobs.ForceVersionGuardReads {

Would it be cleaner to push this knob one level deeper and consult it directly in SettingsWatcher.MakeVersionGuard? So we would have logic in that function that looks like: if activeVersion.IsActive(maxGate) || forceVersionGuardReads {


@fqazi fqazi left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dt and @nvanbenschoten)


pkg/sql/set_cluster_setting.go line 626 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

How does the use of WithSkipDescriptorCache change this behavior? Are there new keys that we read from and write to?

// 1) We expect this transaction to only read and write to the
// version key in the system.settings table. It will also do high
// priority reads from the system.namespace and system.descriptor tables
// because of the WithSkipDescriptorCache option below (this will push out
// schema changes during the upgrade).


pkg/sql/set_cluster_setting.go line 630 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Which other transactions is this referring to? Other transactions that write to the version key? Do such transactions exist?

I was thinking of other transactions getting pushed if they involve schema changes on the system database descriptor:

// 3) Other transactions will use a normal priority and get pushed out by
// this one, if they involve schema changes on the system database
// descriptor (highly unlikely).


pkg/sql/set_cluster_setting.go line 686 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Could we give this a comment? Specifically, why does this help? If we never perform leasing then this transaction will never get stuck waiting on a leasing transaction. Does it have to perform other reads and writes to compensate? If so, how do those operations interact with other transactions?

I clarified this above, but I'll add another comment stating why this is needed here:

// We will intentionally skip the leasing layer to update the
// system.settings table to avoid a deadlock with the leasing layer.
// The leasing layer will read the version key in system.settings with
// a high priority before any renewal. The update code above will need a
// lease on system.settings and potentially system.sqlinstances. If any of
// the required leases expire, we could deadlock, with the renewal being
// blocked on the version key. To get around this, we will force the
// transaction to directly make round trips to read descriptors instead
// of using the leasing cache.
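The hazard described above is a classic wait-for cycle: the version bump waits on a lease acquisition, while the lease renewal's high-priority read waits on the version key the bump has written. A minimal sketch of detecting such a cycle (txn names are illustrative, not real CockroachDB identifiers):

```go
package main

import "fmt"

// waitFor maps each blocked transaction to the transaction it waits on.
// A deadlock exists iff following these edges revisits a node.
func hasDeadlock(waitFor map[string]string) bool {
	for start := range waitFor {
		seen := map[string]bool{start: true}
		cur := start
		for {
			next, ok := waitFor[cur]
			if !ok {
				break // cur is not blocked; no cycle along this path
			}
			if seen[next] {
				return true
			}
			seen[next] = true
			cur = next
		}
	}
	return false
}

func main() {
	// The feared scenario: the version-bump txn needs a synchronous lease
	// renewal, while the renewal's high-priority read is blocked on the
	// bump's write to the version key.
	cycle := map[string]string{
		"version-bump":  "lease-renewal",
		"lease-renewal": "version-bump",
	}
	fmt.Println(hasDeadlock(cycle)) // true

	// Skipping leased descriptors removes the bump's dependency edge.
	noCycle := map[string]string{
		"lease-renewal": "version-bump",
	}
	fmt.Println(hasDeadlock(noCycle)) // false
}
```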


@nvanbenschoten nvanbenschoten left a comment


Reviewed 6 of 6 files at r6, 9 of 9 files at r7, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dt and @fqazi)


pkg/sql/set_cluster_setting.go line 686 at r4 (raw file):

Previously, fqazi (Faizan Qazi) wrote…

I clarified this above, but I'll add another comment stating why this is needed here:

// We will intentionally skip the leasing layer to update the
// system.settings table to avoid a deadlock with the leasing layer.
// The leasing layer will read the version key in system.settings with
// a high priority before any renewal. The update code above will need a
// lease on system.settings and potentially system.sqlinstances. If any of
// leases required expire, we could deadlock with the renewal being
// blocked on the version key. To get around this, we will force the
// transaction to directly make round trips to read descriptors instead
// of using the leasing cache.

Thanks for spelling this out. The high-priority read on the system.sqlinstances and system.descriptors tables makes me vaguely concerned that this may run into trouble with #104728. We'll want to make sure that this change is minimizing risk. Expanding the footprint of this transaction while making it high priority is working against this goal.

This makes me wonder how difficult it would be to have the upgrade transaction read and write the settings key directly, without going through SQL. The leasing transaction is already doing this with GetClusterVersionFromStorage. Would it be difficult to do the same here?


pkg/server/settingswatcher/version_guard_test.go line 145 at r7 (raw file):

			watcher := settingswatcher.New(nil, s.Codec(), settings, nil, nil, nil)
			require.NoError(t, kvDB.Txn(ctx, func(ctx context.Context, txn *kv.Txn) error {
				guard, err := watcher.MakeVersionGuard(ctx, txn, maxVersion, false /*force versin guard reads*/)

nit: the comment here is off.


@chrisseto chrisseto left a comment


Mostly just questions and some comment nits.

The change makes sense to me but I don't think I can vouch for it being side effect free so I'll leave an approval to Nathan.

Will the session based leasing work fix this issue more reliably? If so, can we add a note or link to either a commit message or comment?

@@ -50,6 +50,12 @@ func WithSessionData(sd *sessiondata.SessionData) Option {
return (*sessionDataOption)(sd)
}

// WithSkipDescriptorCache forces the internal executor to avoid all leasing

Could you add a short note about when or why this option might be used, and any warnings that users should consider if they do opt to use this option?

Comment on lines 620 to 624
// Run the version bump inside the upgrade as high priority, since
// lease manager ends up reading the version row (with high priority)
// inside the settings table when refreshing leases. On larger clusters
// or with a large number of descriptors it's possible for normal
// transactions to be starved (#95227).

Could we add some reference numbers to "large" here? Doesn't have to be exact, just within the ballpark as large is quite ambiguous.

The wording here feels a bit difficult to digest, in particular, "normal transactions" made me think about non-system transactions but I think the intended meaning is non-high priority transactions.

What would you think about (With appropriate line wrapping and appropriate reference numbers):

Suggested change
// Run the version bump inside the upgrade as high priority, since
// lease manager ends up reading the version row (with high priority)
// inside the settings table when refreshing leases. On larger clusters
// or with a large number of descriptors it's possible for normal
// transactions to be starved (#95227).
// On clusters with a large number of descriptors (> 10k) or nodes (>1k), non-priority transactions reading the version row can be starved. This is due to the lease manager reading the version row at high priority (#95227).
// Run the version bump inside the upgrade as a high priority txn to avoid being bullied out by the lease manager.

// TestLeasingClusterVersionStarvation validates that setting
// the cluster version is done with a high priority txn and cannot
// be pushed out. Previously, this would be normal priority and
// get pushed by the leasing by the leasing code, leading to starvation

Suggested change
// get pushed by the leasing by the leasing code, leading to starvation
// get pushed by the leasing, leading to starvation

return
}
resumeBump <- struct{}{}
for retry := retry.Start(retry.Options{}); retry.Next(); {

I'm not sure which is the correct question to ask here: Why is this in a retry loop or why is there a Commit call after an infinite loop?

Are we expecting the select for update query to receive the txn retry error?

// inside the settings table when refreshing leases. On larger clusters
// or with a large number of descriptors it's possible for normal
// transactions to be starved (#95227).
// This is safe because we expected this transaction only do the following:

I think this is a me issue but I don't see how these items say using a high priority txn is safe? I also don't understand how it may be unsafe 😅


@fqazi fqazi left a comment


It would eliminate one of the causes; unfortunately, that migration will itself run into this, so it will temporarily invoke the same wrath 👀

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @chrisseto, @dt, and @nvanbenschoten)


pkg/sql/set_cluster_setting.go line 686 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Thanks for spelling this out. The high-priority read on the system.sqlinstances and system.descriptors tables makes me vaguely concerned that this may run into trouble with #104728. We'll want to make sure that this change is minimizing risk. Expanding the footprint of this transaction while making it high priority is working against this goal.

This makes me wonder how difficult it would be to have the upgrade transaction read and write the settings key directly, without going through SQL. The leasing transaction is already doing this with GetClusterVersionFromStorage. Would it be difficult to do the same here?

We can directly do the version bump using KV calls. The system.sqlinstances read comes from the upgrade manager hook and is deeply nested, so we can't easily remove that (it would only impact serverless).


pkg/sql/set_cluster_setting.go line 624 at r7 (raw file):

Previously, chrisseto (Chris Seto) wrote…

Could we add some reference numbers to "large" here? Doesn't have to be exact, just within the ballpark as large is quite ambiguous.

The wording here feels a bit difficult to digest, in particular, "normal transactions" made me think about non-system transactions but I think the intended meaning is non-high priority transactions.

What would you think about (With appropriate line wrapping and appropriate reference numbers):

			// On clusters with a large number of descriptors (> 10k) or nodes (>1k), non-priority transactions reading the version row can be starved. This is due to the lease manager reading the version row at high priority (#95227).
			// Run the version bump inside the upgrade as a high priority txn to avoid being bullied out by the lease manager.

// On complex clusters with a large number of descriptors (> 500) and
// multi-region nodes (> 9), normal priority transactions reading/updating
// the version row can be starved. This is due to the lease manager reading
// the version row at high priority when refreshing leases (#95227); with
// a complex cluster this traffic will be continuous.


pkg/sql/set_cluster_setting.go line 625 at r7 (raw file):

Previously, chrisseto (Chris Seto) wrote…

I think this is a me issue but I don't see how these items say using a high priority txn is safe? I also don't understand how it may be unsafe 😅

Clarified this with:

// This is safe from deadlocks / starvation because we expect this
// transaction to only do the following:


pkg/sql/isql/options.go line 53 at r3 (raw file):

Previously, chrisseto (Chris Seto) wrote…

Could you add a short not about when or why this option might be used and any warnings that users should consider if they do opt to use this option?

Updated to:

WithSkipDescriptorCache forces the internal executor to avoid all leasing infrastructure, which will force descriptors to be read from KV. This can incur network latencies when accessing descriptors.


pkg/upgrade/upgrades/version_starvation_test.go line 34 at r7 (raw file):

Previously, chrisseto (Chris Seto) wrote…
// get pushed by the leasing, leading to starvation

Done.


pkg/upgrade/upgrades/version_starvation_test.go line 100 at r7 (raw file):

Previously, chrisseto (Chris Seto) wrote…

I'm not sure which is the correct question to ask here: Why is this in a retry loop or why is there a Commit call after an infinite loop?

Are we expecting the select for update query to receive the txn retry error?

Ooh good catch, yes the commit isn't needed. We expect the select to get pushed and hit an error eventually.
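The shape of that test loop — re-issue a normal-priority operation until the high-priority version bump pushes it and a retry error surfaces — can be sketched generically (the error text and helper names here are illustrative, not CockroachDB's real API):

```go
package main

import (
	"errors"
	"fmt"
)

// errRetry stands in for a transaction-retry error, like the one the
// test's normal-priority SELECT FOR UPDATE is expected to hit once the
// high-priority version bump pushes it.
var errRetry = errors.New("restart transaction: pushed by high-priority txn")

// runUntilPushed keeps re-issuing op until it reports the retry error.
// It mirrors the test loop: success is observing the push, not
// committing, which is why no Commit call is needed after the loop.
func runUntilPushed(op func(attempt int) error, maxAttempts int) (int, error) {
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err := op(attempt); errors.Is(err, errRetry) {
			return attempt, nil // got pushed: the expected outcome
		}
	}
	return 0, errors.New("never pushed; starvation fix may not be working")
}

func main() {
	// Simulate a txn that succeeds twice, then gets pushed on attempt 3.
	attempts, err := runUntilPushed(func(attempt int) error {
		if attempt >= 3 {
			return errRetry
		}
		return nil
	}, 10)
	fmt.Println(attempts, err) // 3 <nil>
}
```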

@fqazi fqazi requested a review from chrisseto November 13, 2023 22:23
@daniel-crlabs

Do we know when this issue will be fixed? We have a high visibility customer who is severely impacted by this. No one seems to be assigned to this, so I'm cc'ing everyone who has commented on it:

@fqazi @nvanbenschoten @andreimatei @rupertharwood-crl

@fqazi fqazi force-pushed the upgradeHiPriVersion branch 2 times, most recently from 6f497bf to 7455a4b Compare November 15, 2023 20:47

@chrisseto chrisseto left a comment


unfortunately that migration will run into this

Oof, that's rough.

:lgtm: though I'll leave the final sign off to @nvanbenschoten

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @dt and @nvanbenschoten)


@nvanbenschoten nvanbenschoten left a comment


Reviewed 15 of 15 files at r11, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @dt and @fqazi)


pkg/sql/set_cluster_setting.go line 625 at r11 (raw file):

			return err
		}
		return db.KV().Txn(ctx, func(ctx context.Context, txn *kv.Txn) error {

Could you add a comment about why this goes out of its way to use raw KV?


pkg/sql/set_cluster_setting.go line 633 at r11 (raw file):

			// Run the version bump inside the upgrade as a high priority txn to avoid
			// being starved out by the lease manager.
			// Run the version bump inside the upgrade as high priority, since

nit: some redundancy here. "Run the version bump inside the upgrade as" twice.


pkg/sql/set_cluster_setting.go line 682 at r11 (raw file):

			}
			// Encode the setting value to write out the updated version.
			var tuple []byte

This all looks correct to me, but the logic feels easy to get wrong. It also feels like it could drift from SQL over time. Do we feel like we have sufficient testing around this upgrade code to give us confidence that we're writing SQL-decodable bytes?


pkg/sql/set_cluster_setting.go line 682 at r11 (raw file):

			}
			// Encode the setting value to write out the updated version.
			var tuple []byte

On ^ note, consider pulling this into a separate function and unit testing it.


pkg/sql/set_cluster_setting.go line 707 at r11 (raw file):

			newValue.SetTuple(tuple)
			if row.Value != nil {
				if err := txn.CPut(ctx, row.Key, newValue, row.Value.TagAndDataBytes()); err != nil {

This is all in a serializable transaction, so is there a need for the CPut? I might be missing something.

@fqazi fqazi force-pushed the upgradeHiPriVersion branch 3 times, most recently from dfb2d0c to ccf31fb Compare November 16, 2023 23:12

@fqazi fqazi left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @dt and @nvanbenschoten)


pkg/sql/set_cluster_setting.go line 682 at r11 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This all looks correct to me, but the logic feels easy to get wrong. It also feels like it could drift from SQL over time. Do we feel like we have sufficient testing around this upgrade code to give us confidence that we're writing SQL-decodable bytes?

Done

Added unit tests as well.


@nvanbenschoten nvanbenschoten left a comment


:lgtm: thank you for the hard work on pushing this along @fqazi. I think we can go ahead and merge this once the remaining comments are addressed.

I do wonder if the backport would benefit from another look from someone else who has worked in this code recently, just to provide additional confidence that we aren't missing anything. @ajstorm or @JeffSwenson would either of you two (or both!) like to provide a review?

Reviewed 8 of 8 files at r12, 4 of 4 files at r13, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @dt and @fqazi)


-- commits line 4 at r12:
"minimimal"


-- commits line 5 at r12:
"s/writoe/write to/"


pkg/server/settingswatcher/setting_encoder.go line 27 at r12 (raw file):

// EncodeSettingKey encodes a key for the system.settings table, which
// can be used for direct KV operations.
func EncodeSettingKey(codec keys.SQLCodec, setting string) []byte {

Should we call this from GetClusterVersionFromStorage?


pkg/server/settingswatcher/setting_encoder.go line 29 at r12 (raw file):

func EncodeSettingKey(codec keys.SQLCodec, setting string) []byte {
	indexPrefix := codec.IndexPrefix(keys.SettingsTableID, uint32(1))
	return encoding.EncodeUvarintAscending(encoding.EncodeStringAscending(indexPrefix, "version"), uint64(0))

Was this intending to pass setting, instead of "version"?


pkg/sql/set_cluster_setting.go line 682 at r11 (raw file):

Previously, fqazi (Faizan Qazi) wrote…

Done

Added unit tests as well.

This turned out nicely.


pkg/sql/set_cluster_setting.go line 636 at r13 (raw file):

			// 1) We expect this transaction to only read and write to the
			//    version key in the system.settings table. To achieve the smallest
			//    possible txn and extra to other tables we are going to

"and extra to"

Something sounds off here.


pkg/server/settingswatcher/setting_encoder_test.go line 31 at r12 (raw file):

)

func TestSettingsEncoder(t *testing.T) {

Nice test!

To be able to have minimal high-priority txns for bumping the cluster
version, we need the ability to write the settings table using KV calls.
This patch adds the functions EncodeSettingKey and EncodeSettingValue
to allow directly generating values for Put/Get calls.

Release note: None
Previously, it was possible for the leasing subsystem to starve out
attempts to set the cluster version during upgrades, since the leasing
subsystem uses high-priority txns for renewals. To address this, this
patch makes the logic to set the cluster version high priority so
it can't be pushed out by lease renewals.

Fixes: cockroachdb#113908

Release note (bug fix): Addressed a bug that could cause cluster version
finalization to get starved out by descriptor lease renewals on larger
clusters.

@fqazi fqazi left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @dt and @nvanbenschoten)


pkg/server/settingswatcher/setting_encoder.go line 27 at r12 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Should we call this from GetClusterVersionFromStorage?

I'm tempted to leave these as general in case we run into similar use cases for other settings.


pkg/server/settingswatcher/setting_encoder.go line 29 at r12 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Was this intending to pass setting, instead of "version"?

Yes, I was trying to generalize this code.


pkg/sql/set_cluster_setting.go line 636 at r13 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

"and extra to"

Something sounds off here.

Ugh.. I meant to write:

// To achieve the smallest
// possible txn and avoid extra operations on other keys, we are going to
// use KV calls with the EncodeSettingKey/EncodeSettingValue functions
// instead of using the internal executor.


fqazi commented Nov 17, 2023

@nvanbenschoten @chrisseto TFTR!

bors r+


craig bot commented Nov 17, 2023

Build succeeded:

@craig craig bot merged commit 8d026f9 into cockroachdb:master Nov 17, 2023
8 checks passed

blathers-crl bot commented Nov 17, 2023

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 07f3628 to blathers/backport-release-23.1-113996: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 23.1.x failed. See errors above.


error creating merge commit from 07f3628 to blathers/backport-release-23.2-113996: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 23.2.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

Successfully merging this pull request may close these issues.

upgrades: Setting the cluster version can get stuck behind leasing
5 participants