cluster: Finalize cluster version upgrade automatically. #24987

Merged
merged 1 commit into from May 14, 2018

Conversation

windchan7
Contributor

Added a feature to auto upgrade the cluster setting version after a rolling
upgrade. Now operators no longer have to manually type the upgrade command.

Fixes #23912.

Release note: None

@windchan7 windchan7 requested a review from a team as a code owner April 23, 2018 14:41
@windchan7 windchan7 requested review from a team April 23, 2018 14:41
@cockroach-teamcity
Member

This change is Reviewable

@benesch
Contributor

benesch commented Apr 24, 2018

Looking really good! Everything below is mostly code style/plumbing preferences.

I've got some more feedback that will be more efficient to go through verbally. Let's find some time tomorrow to sit down!


Reviewed 11 of 11 files at r1.
Review status: all files reviewed at latest revision, all discussions resolved.


pkg/cli/start.go, line 604 at r1 (raw file):

				fmt.Print(msg)
			}
			s.AttemptUpgrade(ctx)

See below. I think there's a better place for this that will make your life easier.


pkg/cmd/roachtest/upgrade.go, line 234 at r1 (raw file):

		})
	}
}

This is a great test! Need to review in more detail tomorrow though.


pkg/server/server.go, line 1629 at r1 (raw file):

// AttemptUpgrade attempts to upgrade cluster version.
func (s *Server) AttemptUpgrade(ctx context.Context) {
	go func() {

For Reasons™ you should use stopper.RunWorker instead of directly launching a goroutine. Something about being able to cleanly shut things down.
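
To make the shutdown concern concrete, here is a minimal sketch of the pattern. It assumes a toy stand-in type, `toyStopper`, invented for illustration; it is not CockroachDB's actual stopper and its method signatures are hypothetical. The point is only that a tracked worker can be signalled and waited on, while a bare `go func()` cannot.

```go
package main

import (
	"fmt"
	"sync"
)

// toyStopper is a hypothetical, simplified stand-in for a stopper: it tracks
// background workers and lets the owner wait for all of them on shutdown.
type toyStopper struct {
	wg      sync.WaitGroup
	quiesce chan struct{}
}

func newToyStopper() *toyStopper {
	return &toyStopper{quiesce: make(chan struct{})}
}

// RunWorker starts f in a goroutine that the stopper knows about.
func (s *toyStopper) RunWorker(f func(quit <-chan struct{})) {
	s.wg.Add(1)
	go func() {
		defer s.wg.Done()
		f(s.quiesce)
	}()
}

// Stop signals all workers to quit and blocks until they have returned.
func (s *toyStopper) Stop() {
	close(s.quiesce)
	s.wg.Wait()
}

// runAndStop starts one worker, stops it, and reports whether the worker
// had fully exited by the time Stop returned.
func runAndStop() bool {
	s := newToyStopper()
	exited := false
	s.RunWorker(func(quit <-chan struct{}) {
		<-quit // a real worker would do periodic work until told to quit
		exited = true
	})
	s.Stop()
	return exited
}

func main() {
	fmt.Println("worker exited cleanly:", runAndStop())
}
```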


pkg/server/server.go, line 1632 at r1 (raw file):

		for {
			sqlExecutor := &sql.InternalExecutor{ExecCfg: s.execCfg}
			querySystemSettings(ctx, sqlExecutor)

The fact that you need to poll to wait for system.settings to become available suggests that you're starting this daemon too early! Let's sit down tomorrow and figure out a better place for it. (After the call to EnsureMigrations in pkg/server/server.go is a good bet.)


pkg/server/server.go, line 1637 at r1 (raw file):

			if canUpgrade, err := checkCanUpgrade(ctx, s); err != nil {
				log.Warningf(ctx, "error when upgrading cluster version: %s", err)
				return

We should retry in this case. I recommend using util/retry so you get exponential backoff. I'll throw out an initial backoff of 1s and a max backoff of 30s with unlimited retries as a reasonable starting point.
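
A rough sketch of the suggested schedule (1s initial backoff, 30s cap, unlimited retries), written with hand-rolled helpers rather than util/retry. The names `nextBackoff`, `retryForever`, and `recordWaits` are hypothetical and exist only for this illustration; the sleep function is injected so the schedule can be inspected without actually waiting.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// nextBackoff doubles cur, capped at max.
func nextBackoff(cur, max time.Duration) time.Duration {
	next := cur * 2
	if next > max {
		return max
	}
	return next
}

// retryForever calls op until it succeeds, pausing between attempts with
// exponential backoff. It returns the number of attempts made.
func retryForever(
	initial, max time.Duration, sleep func(time.Duration), op func() error,
) int {
	attempts := 0
	backoff := initial
	for {
		attempts++
		if err := op(); err == nil {
			return attempts
		}
		sleep(backoff)
		backoff = nextBackoff(backoff, max)
	}
}

// recordWaits runs an operation that fails the given number of times and
// returns the attempt count plus the pauses that would have been taken.
func recordWaits(failures int) (int, []time.Duration) {
	var waits []time.Duration
	calls := 0
	attempts := retryForever(time.Second, 30*time.Second,
		func(d time.Duration) { waits = append(waits, d) }, // record, don't sleep
		func() error {
			calls++
			if calls <= failures {
				return errors.New("cluster not ready")
			}
			return nil
		})
	return attempts, waits
}

func main() {
	attempts, waits := recordWaits(7)
	fmt.Println(attempts, waits)
}
```

In production code the injected sleep would be `time.Sleep`, or a select on a timer and a shutdown channel so the loop can be interrupted.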


pkg/server/server.go, line 1655 at r1 (raw file):

				log.Warningf(ctx, "error when upgrading cluster version: %s", err)
				return
			}

This should happen inside a transaction so that we can't end up in a state where version has been bumped but preserve_downgrade_option has not been cleared. You can also get the new value all in one shot:

BEGIN;
SET CLUSTER SETTING version = crdb_internal.node_executable_version();
RESET CLUSTER SETTING cluster.preserve_downgrade_option;
SHOW CLUSTER SETTING version;
COMMIT;

I think you'll want to use s.sqlExecutor.ExecuteStatementsBuffered instead of an internal executor for this to work properly.


pkg/server/server.go, line 1681 at r1 (raw file):

// all live nodes are running the new version,
// all non-decommissioned nodes are alive.
func checkCanUpgrade(ctx context.Context, s *Server) (bool, error) {

The signature of this function is rather confusing. I think you should just return an error. If the error is present, you should abort and retry. If it's nil, you should give up.


pkg/server/server.go, line 1741 at r1 (raw file):

// Get current cluster version.
func clusterVersion(ctx context.Context, sqlExecutor *sql.InternalExecutor) (string, error) {
	clusterVersionStmt := "SHOW cluster setting version;"

const clusterVersionStmt = SHOW CLUSTER SETTING version.


pkg/server/server.go, line 1759 at r1 (raw file):

	datums, _, err := sqlExecutor.QueryRows(ctx, "gossip-nodes", gossipStmt)
	return datums, err
}

Extracting this into a function isn't buying you much! I'd inline it above.


pkg/server/server.go, line 1773 at r1 (raw file):

	}
	return true, nil
}

Ditto here.


pkg/server/server.go, line 1804 at r1 (raw file):

		}
	}
	return nodes, nil

Isn't liveness.Statues exactly what you want? A map from node ID to liveness status, that is.


pkg/settings/version.go, line 92 at r1 (raw file):

	register(key, desc, setting)
	return setting
}

It doesn't seem like this version setting is buying you much besides a lot of boilerplate. Am I missing something? It seems like you could just use a ValidatedStringSetting and save yourself some work!

Oh, I see. The callback for string validation doesn't take the right type. I'd prefer if you instead adjust the signature of all validate functions to include the *settings.Values object that you need. There's only a few validated settings. It might also be cleaner to just use a StateMachineSetting. Let's see what @tschottdorf thinks, though!


pkg/sql/executor.go, line 75 at r1 (raw file):

	st := opaque.(*cluster.Settings)
	serverVersion := st.Version.ServerVersion
	clusterVersion := st.Version.Version().MinimumVersion

Ugh, this opaque stuff is kind of nasty. Can you move this function into the settings/cluster package? I'd put it next to the versionTransformer, which has to do the same opaque dance, so that it's all in the same place.

Actually, I think this is the wrong place for this setting. Can you move the whole thing into settings/cluster?


Comments from Reviewable

@windchan7
Contributor Author

windchan7 commented Apr 25, 2018

@benesch
I fixed all things except for the liveness thing. Please see the specific comments below.

In addition, I found that acceptance tests just hang indefinitely because TestDockerCLI hangs, which is causing a bunch of acceptance tests to fail.

Here are a couple of tests that need to be updated or removed:

TestClusterVersionUpgrade1_0To2_0 no longer works since auto upgrade is introduced. I managed to get it to pass by adjusting a bunch of things, but I don't think we should keep it given we have a new roachtest for it.

TestDockerReadWriteBidirectionalReferenceVersion no longer works because we cannot restart a 2.0-X node into 2.0.0. This is clearly forbidden by SynthesizeClusterVersionFromEngines in stores.go. I don't think we should keep this test either.

version.go no longer works because we won't be able to do a rolling downgrade once we have done a rolling upgrade. But I intend to keep and modify it: say we have n nodes. We do a rolling upgrade for the first n-1 nodes and stop the last node. Then we set cluster.preserve_downgrade_option to the old cluster version. Then we start the last node with the new binary and complete the rest of the logic in version.go.

Review status: 2 of 13 files reviewed at latest revision, 13 unresolved discussions.


pkg/server/server.go, line 1629 at r1 (raw file):

Previously, benesch (Nikhil Benesch) wrote…

For Reasons™ you should use stopper.RunWorker instead of directly launching a goroutine. Something about being able to cleanly shut things down.

Done.


pkg/server/server.go, line 1632 at r1 (raw file):

Previously, benesch (Nikhil Benesch) wrote…

The fact that you need to poll to wait for system.settings to become available suggests that you're starting this daemon too early! Let's sit down tomorrow and figure out a better place for it. (After the call to EnsureMigrations in pkg/server/server.go is a good bet.)

Done.


pkg/server/server.go, line 1637 at r1 (raw file):

Previously, benesch (Nikhil Benesch) wrote…

We should retry in this case. I recommend using util/retry so you get exponential backoff. I'll throw out an initial backoff of 1s and a max backoff of 30s with unlimited retries as a reasonable starting point.

Done.


pkg/server/server.go, line 1655 at r1 (raw file):

Previously, benesch (Nikhil Benesch) wrote…

This should happen inside a transaction so that we can't end up in a state where version has been bumped but preserve_downgrade_option has not been cleared. You can also get the new value all in one shot:

BEGIN;
SET CLUSTER SETTING version = crdb_internal.node_executable_version();
RESET CLUSTER SETTING cluster.preserve_downgrade_option;
SHOW CLUSTER SETTING version;
COMMIT;

I think you'll want to use s.sqlExecutor.ExecuteStatementsBuffered instead of an internal executor for this to work properly.

Done.


pkg/server/server.go, line 1681 at r1 (raw file):

Previously, benesch (Nikhil Benesch) wrote…

The signature of this function is rather confusing. I think you should just return an error. If the error is present, you should abort and retry. If it's nil, you should give up.

Done.


pkg/server/server.go, line 1741 at r1 (raw file):

Previously, benesch (Nikhil Benesch) wrote…

const clusterVersionStmt = SHOW CLUSTER SETTING version.

Done.


pkg/server/server.go, line 1759 at r1 (raw file):

Previously, benesch (Nikhil Benesch) wrote…

Extracting this into a function isn't buying you much! I'd inline it above.

Done.


pkg/server/server.go, line 1773 at r1 (raw file):

Previously, benesch (Nikhil Benesch) wrote…

Ditto here.

Done.


pkg/server/server.go, line 1804 at r1 (raw file):

Previously, benesch (Nikhil Benesch) wrote…

Isn't liveness.Statues exactly what you want? A map from node ID to liveness status, that is.

I think it makes more sense to pull from the admin server's liveness endpoint. I checked the implementation of s.admin.Liveness. Essentially it calls a function in nodeliveness, but it also preprocesses the clock time and a bunch of other data. If we don't call it, we have to reimplement all of that, which is a bit redundant.


pkg/cli/start.go, line 604 at r1 (raw file):

Previously, benesch (Nikhil Benesch) wrote…

See below. I think there's a better place for this that will make your life easier.

Done.


pkg/settings/version.go, line 92 at r1 (raw file):

Previously, benesch (Nikhil Benesch) wrote…

It doesn't seem like this version setting is buying you much besides a lot of boilerplate. Am I missing something? It seems like you could just use a ValidatedStringSetting and save yourself some work!

Oh, I see. The callback for string validation doesn't take the right type. I'd prefer if you instead adjust the signature of all validate functions to include the *settings.Values object that you need. There's only a few validated settings. It might also be cleaner to just use a StateMachineSetting. Let's see what @tschottdorf thinks, though!

I adjusted the signature of all validation functions. Done.


pkg/sql/executor.go, line 75 at r1 (raw file):

Previously, benesch (Nikhil Benesch) wrote…

Ugh, this opaque stuff is kind of nasty. Can you move this function into the settings/cluster package? I'd put it next to the versionTransformer, which has to do the same opaque dance, so that it's all in the same place.

Actually, I think this is the wrong place for this setting. Can you move the whole thing into settings/cluster?

Done.


Comments from Reviewable

@tbg
Member

tbg commented May 13, 2018

Review status: all files reviewed at latest revision, 23 unresolved discussions, some commit checks failed.


pkg/server/version_cluster_test.go, line 137 at r5 (raw file):

Previously, windchan7 (Victor Chen) wrote…

Also, I'm kinda curious about what's the Minimum Version / UseVersion of a binary with server version 2.0-X. Is it something that user can change?

Yes, that seems like it would be what's happening. I think the most reasonable way to deal with this is to introduce a TestingKnob that gets passed in here:

Knobs: base.TestingKnobs{

The knob would be an atomically updated bool (int32 via atomic.LoadInt32) and when it's set, the auto update loop treats it as a mixed-version cluster.

The good thing about that is that this allows you to make a subtest that verifies the manual upgrade path as well.
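
A minimal sketch of such a knob, assuming hypothetical names (`upgradeKnobs`, `shouldAttemptUpgrade`, `setUpgradeDisabled`) that are not taken from the PR. The flag is a plain int32 that is only read and written through sync/atomic, so a test can flip it while the upgrade loop is running.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// upgradeKnobs is a hypothetical testing knob. DisableUpgrade is accessed
// atomically; a nonzero value tells the loop to behave as if the cluster
// were still mixed-version.
type upgradeKnobs struct {
	DisableUpgrade int32
}

// shouldAttemptUpgrade reports whether the auto-upgrade loop may try an
// upgrade on this iteration.
func shouldAttemptUpgrade(k *upgradeKnobs) bool {
	if k == nil {
		return true // no knob configured: normal production behavior
	}
	return atomic.LoadInt32(&k.DisableUpgrade) == 0
}

// setUpgradeDisabled flips the knob, typically from a test goroutine.
func setUpgradeDisabled(k *upgradeKnobs, disabled bool) {
	var v int32
	if disabled {
		v = 1
	}
	atomic.StoreInt32(&k.DisableUpgrade, v)
}

func main() {
	k := &upgradeKnobs{}
	setUpgradeDisabled(k, true)
	fmt.Println("attempt while disabled:", shouldAttemptUpgrade(k))
	setUpgradeDisabled(k, false)
	fmt.Println("attempt after re-enabling:", shouldAttemptUpgrade(k))
}
```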


Comments from Reviewable

@tbg
Member

tbg commented May 13, 2018

Review status: all files reviewed at latest revision, 23 unresolved discussions, some commit checks failed.


pkg/server/version_cluster_test.go, line 137 at r5 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

Yes, that seems like it would be what's happening. I think the most reasonable way to deal with this is to introduce a TestingKnob that gets passed in here:

Knobs: base.TestingKnobs{

The knob would be an atomically updated bool (int32 via atomic.LoadInt32) and when it's set, the auto update loop treats it as a mixed-version cluster.

The good thing about that is that this allows you to make a subtest that verifies the manual upgrade path as well.

Oh, and re: UseVersion: in the RFC, there was originally a distinction between the two that was never implemented. The idea was that you could bump MinVersion but then later decrease UseVersion. This would disable features that could be turned off at that point, but would keep features that couldn't be turned off on. I wouldn't worry about it. Suffice to say that the two always move in lockstep today, and have the semantics of MinVersion.


Comments from Reviewable

@windchan7 windchan7 force-pushed the auto branch 2 times, most recently from 6a97619 to 2e079a4 Compare May 13, 2018 21:56
@windchan7
Contributor Author

Review status: 15 of 21 files reviewed at latest revision, 23 unresolved discussions.


pkg/cmd/roachtest/upgrade.go, line 203 at r4 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

I'd prefer a retry loop, but your call.

I'll keep it for now.


pkg/server/server.go, line 1635 at r5 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

Could you move the new code here into a new file server_update.go?

Done.


pkg/server/server.go, line 1636 at r5 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

rename to startAttemptUpgrade (when a goroutine runs "indefinitely", we usually make that clear in the name, see startWriteNodeStatus).

Done.


pkg/server/server.go, line 1645 at r5 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

What happens if the node starts draining? Will this get in its way? What if it starts draining and then stops draining, is the session still ok? Perhaps open a session every time you need one.

Done.


pkg/server/server_update.go, line 1 at r6 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

2018 🎆

Done.


pkg/server/server_update.go, line 21 at r6 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

nit: move this down into the "cockroach" group of imports below.

Done.


pkg/server/server_update.go, line 146 at r6 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

nit: this wants a dot.

Done.


pkg/server/server_update.go, line 156 at r6 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

// clusterVersion returns the current cluster version from the SQL subsystem (which returns the version from the KV store as opposed to the possibly lagging settings subsystem).

Done.


pkg/server/version_cluster_test.go, line 142 at r2 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

I commented on this on the other PR that removes this test. I think we should keep it, but strip it down (in particular remove all the tombstone migration test, as this is now obsolete), and also make sure the versions used here don't rot in the future.

Done.


pkg/server/version_cluster_test.go, line 146 at r4 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

Nice to see all of this go.

Done.


pkg/server/version_cluster_test.go, line 133 at r5 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

My suggestion was to use newVersion = cluster.BinaryServerVersion (which is always the highest version known to the binary) and oldVersion := prev(newVersion) where prev(2.1) == 2.0, prev(2.1-5) == 2.1, prev(2.0) = 1.0 etc. That way you never have to update the test again, and it will always enable the "new" migrations during the fake upgrade.

Done.
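
For illustration, one way the prev helper described above could look. This is a sketch under assumptions: the `version` struct and its field names are hypothetical, and stepping back across a major version lands on minor 0 only because that is what the examples in the comment show (prev(2.0) = 1.0).

```go
package main

import "fmt"

// version is a hypothetical stand-in for a cluster version of the form
// Major.Minor or Major.Minor-Unstable.
type version struct {
	Major, Minor, Unstable int32
}

func (v version) String() string {
	if v.Unstable == 0 {
		return fmt.Sprintf("%d.%d", v.Major, v.Minor)
	}
	return fmt.Sprintf("%d.%d-%d", v.Major, v.Minor, v.Unstable)
}

// prev returns the version to treat as "old" relative to v:
// an unstable version falls back to its release, a release falls back to
// the previous minor, and a x.0 release falls back to (x-1).0.
func prev(v version) version {
	switch {
	case v.Unstable > 0:
		v.Unstable = 0
	case v.Minor > 0:
		v.Minor--
	case v.Major > 0:
		v.Major--
	}
	return v
}

func main() {
	for _, v := range []version{{2, 1, 0}, {2, 1, 5}, {2, 0, 0}} {
		fmt.Printf("prev(%s) = %s\n", v, prev(v))
	}
}
```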


pkg/server/version_cluster_test.go, line 137 at r5 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

Oh, and re: UseVersion: in the RFC, there was originally a distinction between the two that was never implemented. The idea was that you could bump MinVersion but then later decrease UseVersion. This would disable features that could be turned off at that point, but would keep features that couldn't be turned off on. I wouldn't worry about it. Suffice to say that the two always move in lockstep today, and have the semantics of MinVersion.

Done.


pkg/server/version_cluster_test.go, line 140 at r5 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

"1.0" will be oldVersion, "2.0" will be newVersion.

Done.


pkg/server/version_cluster_test.go, line 144 at r5 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

This too.

Done.


pkg/server/version_cluster_test.go, line 150 at r5 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

Hmm, I just thought this through and it'll be awkward, right? When you create the cluster, it's immediately up for auto-upgrading, and so you have a race here. Maybe just add a comment to the test that the non-auto upgrade path is covered by a roachtest (and mention which one).

I added some test logic here that is kinda similar to the roachtest.


pkg/server/version_cluster_test.go, line 126 at r6 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

Also rename this test to something like TestVersionUpgrade please.

Done.


pkg/settings/cluster/settings.go, line 91 at r4 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

The comment that is shown to the user. Also always be specific about the version, there are two versions involved in an upgrade.

"disable (automatic or manual) cluster version upgrades from the specified version until reset"

Done.


pkg/settings/cluster/settings.go, line 371 at r4 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

Makes sense, though I really should've run into this explanation in the parts of the code that I looked at. A comment here would go a long way, and the cluster var comment should also mention this behavior (the comment I suggested there is wrong in light of this information). Something like

Disable auto-upgrading to the specific version. Also prevents manual upgrades until cleared.

Done.


pkg/sql/logictest/testdata/logic_test/cluster_version, line 6 at r4 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

?

It is racy but I moved it down a bit to make it less racy.


pkg/sql/logictest/testdata/logic_test/cluster_version, line 46 at r5 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

Don't just change the outcome, make sure that we retain the original spirit of the test.

I think all behaviors are expected.


Comments from Reviewable

@bdarnell
Member

Reviewed 7 of 19 files at r2, 5 of 8 files at r3, 1 of 5 files at r4, 1 of 4 files at r5, 1 of 3 files at r6, 6 of 6 files at r7.
Review status: all files reviewed at latest revision, 23 unresolved discussions.


pkg/server/server_update.go, line 45 at r7 (raw file):

// startAttemptUpgrade attempts to upgrade cluster version.
func (s *Server) startAttemptUpgrade(ctx context.Context) {
	s.stopper.RunWorker(s.stopper.WithCancel(ctx), func(ctx context.Context) {

This should be RunAsyncTask, not RunWorker (we're not perfectly consistent about this, but the idea is that workers run for the lifetime of the server, while tasks are finite).


pkg/server/server_update.go, line 92 at r7 (raw file):

					`BEGIN;
					 SET CLUSTER SETTING version = crdb_internal.node_executable_version();
					 RESET CLUSTER SETTING cluster.preserve_downgrade_option;

Why is this reset needed? If we successfully set the version, doesn't that mean that preserve_downgrade_option is not set?


Comments from Reviewable

@windchan7
Contributor Author

Review status: 20 of 21 files reviewed at latest revision, 25 unresolved discussions.


pkg/server/server_update.go, line 45 at r7 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

This should be RunAsyncTask, not RunWorker (we're not perfectly consistent about this, but the idea is that workers run for the lifetime of the server, while tasks are finite).

Done.


pkg/server/server_update.go, line 92 at r7 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Why is this reset needed? If we successfully set the version, doesn't that mean that preserve_downgrade_option is not set?

Oh, thanks for the catch. I missed it because the initial RFC does not have the restriction. Done.


Comments from Reviewable

@tbg
Member

tbg commented May 14, 2018

Almost there 💪!


Reviewed 5 of 6 files at r7, 1 of 1 files at r8.
Review status: all files reviewed at latest revision, 11 unresolved discussions.


pkg/server/server_update.go, line 156 at r6 (raw file):

Previously, windchan7 (Victor Chen) wrote…

Done.

nit: needs a dot.


pkg/server/server_update.go, line 37 at r8 (raw file):

// version upgrade should happen automatically or not.
type UpgradeTestingKnobs struct {
	DisableUpgrade *int32

// accessed atomically

Also this would be an int32, not *int32.


pkg/server/server_update.go, line 60 at r8 (raw file):

			// Check if auto upgrade is disabled for test purposes.
			if k := s.cfg.TestingKnobs.Upgrade; k != nil {

This simplifies because there's no nil check.


pkg/server/version_cluster_test.go, line 100 at r6 (raw file):

					Knobs: base.TestingKnobs{
						Store: &storage.StoreTestingKnobs{
							BootstrapVersion: &bootstrapVersion,

This isn't needed?


pkg/server/version_cluster_test.go, line 159 at r8 (raw file):

	// The first node will have cluster version (aka. minimum version) set to be
	// 1.0 and UseVersion set to be 2.0 so that we can test auto upgrade. The

You still need to remove UseVersion from the comment. While you're there, also fix 1.0 and 2.0 and the version it mentions. Basically just rewrite the whole thing ;)


pkg/server/version_cluster_test.go, line 175 at r8 (raw file):

			BootstrapVersion: &bootstrapVersion,
		},
		Upgrade: &server.UpgradeTestingKnobs{

This will just be &server.UpgradeTestingKnobs{DisableUpgrade: 1}


pkg/server/version_cluster_test.go, line 187 at r8 (raw file):

	}
	atomic.StoreInt32(&upgrade, 0)
	time.Sleep(10 * time.Second)

No sleeping in unit tests. Just remove this (you would miss a failure here, but that's ok -- it would still be flaky and we're covering this functionality in the roachtests).


pkg/server/version_cluster_test.go, line 189 at r8 (raw file):

	time.Sleep(10 * time.Second)

	// Check the cluster version is stil oldVersion.

still


pkg/server/version_cluster_test.go, line 202 at r8 (raw file):

	// Check the cluster version is bumped to newVersion.
	for curVersion != newVersion.String() {
		time.Sleep(time.Second)

Never sleep in unit tests. Use a SucceedsSoon and then just set curVersion after the SucceedsSoon.
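
A sketch of the retry-until-true idea behind a SucceedsSoon-style helper. The `succeedsSoon` function below is a hypothetical local reimplementation, not the project's test utility, and `pollsUntilUpgraded` simulates a cluster version that flips after a few reads. The helper still pauses briefly between attempts, but it returns as soon as the condition holds instead of always waiting a fixed duration.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// succeedsSoon retries fn with short, growing pauses until it returns nil
// or the timeout elapses, in which case the last error is reported.
func succeedsSoon(timeout time.Duration, fn func() error) error {
	deadline := time.Now().Add(timeout)
	wait := time.Millisecond
	for {
		err := fn()
		if err == nil {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("condition not met within %s: %v", timeout, err)
		}
		time.Sleep(wait)
		if wait < 100*time.Millisecond {
			wait *= 2
		}
	}
}

// pollsUntilUpgraded simulates a version that becomes "new" on the
// flipAfter-th read and returns how many reads were needed.
func pollsUntilUpgraded(flipAfter int) (reads int, err error) {
	cur := "old"
	err = succeedsSoon(5*time.Second, func() error {
		reads++
		if reads >= flipAfter {
			cur = "new"
		}
		if cur != "new" {
			return errors.New("cluster version is still old")
		}
		return nil
	})
	return reads, err
}

func main() {
	reads, err := pollsUntilUpgraded(3)
	fmt.Println(reads, err)
}
```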


pkg/settings/cluster/settings.go, line 89 at r8 (raw file):

// Register cluster.preserve_downgrade_option in the cluster settings for auto upgrade.
// Setting it will diable auto-upgrading for the specified cluster version. It also

"diable", and also consider that this comment is still fuzzy. I think you can just remove this comment, now that the help for the cluster setting is concise and clear.


pkg/sql/logictest/testdata/logic_test/cluster_version, line 6 at r4 (raw file):

Previously, windchan7 (Victor Chen) wrote…

It is racy but I moved it down a bit to make it less racy.

That won't be good enough. I think you should change the default-v1.1@v1.0 configuration so that it disables auto upgrade (you just introduced the corresponding knob). See

{name: "default-v1.1@v1.0", numNodes: 1, overrideDistSQLMode: "Off",

and

ServerArgs: base.TestServerArgs{

The test should then remain unchanged in this PR. Perhaps rename the config to default-v1.1@v1.0-noupgrade.


pkg/sql/logictest/testdata/logic_test/cluster_version, line 46 at r5 (raw file):

Previously, windchan7 (Victor Chen) wrote…

I think all behaviors are expected.

But you're testing fewer behaviors than before because you're moving off the manual upgrade path.


Comments from Reviewable

@windchan7
Contributor Author

Review status: 17 of 22 files reviewed at latest revision, 20 unresolved discussions, some commit checks failed.


pkg/server/server_update.go, line 156 at r6 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

nit: needs a dot.

Done.


pkg/server/server_update.go, line 37 at r8 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

// accessed atomically

Also this would be an int32, not *int32.

Done.


pkg/server/server_update.go, line 60 at r8 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

This simplifies because there's no nil check.

Done.


pkg/server/version_cluster_test.go, line 143 at r5 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

This will be oldVersion

Done.


pkg/server/version_cluster_test.go, line 100 at r6 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

This isn't needed?

We pass in TestingKnobs as an argument, which has the store testingKnob information.


pkg/server/version_cluster_test.go, line 159 at r8 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

You still need to remove UseVersion from the comment. While you're there, also fix 1.0 and 2.0 and the version it mentions. Basically just rewrite the whole thing ;)

Done.


pkg/server/version_cluster_test.go, line 175 at r8 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

This will just be &server.UpgradeTestingKnobs{DisableUpgrade: 1}

Done.


pkg/server/version_cluster_test.go, line 187 at r8 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

No sleeping in unit tests. Just remove this (you would miss a failure here, but that's ok -- it would still be flaky and we're covering this functionality in the roachtests).

Done.


pkg/server/version_cluster_test.go, line 189 at r8 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

still

Done.


pkg/server/version_cluster_test.go, line 202 at r8 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

Never sleep in unit tests. Use a SucceedsSoon and then just set curVersion after the SucceedsSoon.

Done.


pkg/settings/cluster/settings.go, line 89 at r8 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

"diable", and also consider that this comment is still fuzzy. I think you can just remove this comment, now that the help for the cluster setting is concise and clear.

Done.


pkg/sql/logictest/testdata/logic_test/cluster_version, line 6 at r4 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

That won't be good enough. I think you should change the default-v1.1@v1.0 configuration so that it disables auto upgrade (you just introduced the corresponding knob). See

{name: "default-v1.1@v1.0", numNodes: 1, overrideDistSQLMode: "Off",

and

ServerArgs: base.TestServerArgs{

The test should then remain unchanged in this PR. Perhaps rename the config to default-v1.1@v1.0-noupgrade.

Done.


pkg/sql/logictest/testdata/logic_test/cluster_version, line 46 at r5 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

But you're testing fewer behaviors than before because you're moving off the manual upgrade path.

Done.


Comments from Reviewable

defer session.Finish(s.sqlExecutor)

// Check if auto upgrade is disabled for test purposes.
upgradeTestingKnobs := s.cfg.TestingKnobs.Upgrade.(*UpgradeTestingKnobs)
Contributor

Sentry caught a crash on this type conversion. It's dangerous to cast without checking for success (or even nil):

stopper.go:176: *runtime.TypeAssertionError: interface conversion: base.ModuleTestingKnobs is nil, not *server.UpgradeTestingKnobs

https://sentry.io/cockroach-labs/cockroachdb/issues/555937530/?referrer=slack#
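
A small sketch of the safer form, using the two-value ("comma, ok") type assertion so that a nil or differently typed knob does not panic. The types here are hypothetical local stand-ins for base.ModuleTestingKnobs and server.UpgradeTestingKnobs, and the flag is read without atomics only to keep the example short.

```go
package main

import "fmt"

// moduleTestingKnobs is a hypothetical marker interface standing in for
// the real knobs interface.
type moduleTestingKnobs interface {
	isModuleTestingKnobs()
}

// upgradeTestingKnobs is a hypothetical stand-in for the upgrade knobs.
type upgradeTestingKnobs struct {
	DisableUpgrade int32
}

func (*upgradeTestingKnobs) isModuleTestingKnobs() {}

// upgradeDisabled uses the comma-ok assertion: when knobs is a nil
// interface or holds another type, ok is false and nothing panics.
func upgradeDisabled(knobs moduleTestingKnobs) bool {
	k, ok := knobs.(*upgradeTestingKnobs)
	if !ok || k == nil {
		return false
	}
	return k.DisableUpgrade != 0
}

func main() {
	fmt.Println(upgradeDisabled(nil))
	fmt.Println(upgradeDisabled(&upgradeTestingKnobs{DisableUpgrade: 1}))
}
```

The single-value form `knobs.(*upgradeTestingKnobs)` panics with a runtime type assertion error when the interface is nil, which matches the crash reported above.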

Contributor Author

Changed back with a nil check. I don't understand what Tobi means by saying: "This simplifies because there's no nil check."

@windchan7
Contributor Author

pkg/server/server_update.go, line 60 at r8 (raw file):

Previously, windchan7 (Victor Chen) wrote…

Done.

Can you clarify what you mean by that? Removing the nil check will cause errors, as some tests do not set UpgradeTestingKnobs.


Comments from Reviewable

@tbg
Member

tbg commented May 14, 2018

:lgtm_strong:


Review status: 17 of 22 files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.


pkg/server/server_update.go, line 60 at r9 (raw file):

Previously, windchan7 (Victor Chen) wrote…

Changed back with a nil check. I don't understand what Tobi means by saying: "This simplifies because there's no nil check."

Discussed offline, I was just wrong.


pkg/sql/logictest/testdata/logic_test/cluster_version, line 73 at r10 (raw file):


statement error cannot upgrade to 1.1-999: node running 1.1
SET CLUSTER SETTING version = '1.1-999'

Not sure what happened here. If (and only if) you eat another CI run anyway, could you restore the original line ending?


Comments from Reviewable

@tbg
Member

tbg commented May 14, 2018

Reviewed 2 of 5 files at r9, 3 of 3 files at r10.
Review status: all files reviewed at latest revision, 4 unresolved discussions, some commit checks failed.


Comments from Reviewable

Added a feature to auto upgrade the cluster setting version after a rolling
upgrade. Now operators no longer have to manually type the upgrade command.

Fixes cockroachdb#23912.

Release note: None
craig bot pushed a commit that referenced this pull request May 14, 2018
25184: cluster: remove legacy version upgrade tests for auto upgrade merge r=windchan7 a=windchan7

Legacy tests such as `TestClusterVersionUpgrade1_0To2_0`,
`TestDockerReadWriteBidirectionalReferenceVersion`, and
`TestDockerReadWriteForwardReferenceVersion` no longer apply.

Therefore, we remove them; they will be replaced with new roachtests
when the auto upgrade PR #24987 merges.

Release note: None

25473: opt/optbuilder: Fix bug with order by, aggregate alias, and having. r=rytaft a=rytaft

Previously, queries such as:
`SELECT SUM(a) AS a2 FROM abcd GROUP BY c HAVING SUM(a)=10
ORDER BY a2`

were incorrectly causing an error:
`error: column name "a2" not found`

This commit fixes the error by ensuring that a column alias for an
aggregate function is applied even when the same aggregate appears
multiple times in the same query (e.g., in the HAVING clause).

Release note: None

Co-authored-by: Victor Chen <victor@cockroachlabs.com>
Co-authored-by: Rebecca Taft <becca@cockroachlabs.com>
@windchan7
Contributor Author

bors r+

craig bot pushed a commit that referenced this pull request May 14, 2018
24987: cluster: Finalize cluster version upgrade automatically. r=windchan7 a=windchan7

Added a feature to auto upgrade the cluster setting version after a rolling
upgrade. Now operators no longer have to manually type the upgrade command.

Fixes #23912.

Release note: None

Co-authored-by: Victor Chen <victor@cockroachlabs.com>
@craig
Contributor

craig bot commented May 14, 2018

Build succeeded

@craig craig bot merged commit fcc4eb4 into cockroachdb:master May 14, 2018
@tbg
Member

tbg commented May 14, 2018 via email

@benesch
Contributor

benesch commented May 16, 2018 via email
