New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cluster: Finalize cluster version upgrade automatically. #24987
Conversation
b0520f0
to
423d75a
Compare
Looking really good! Everything below is mostly code style/plumbing preferences. I've got some more feedback that will be more efficient to go through verbally. Let's find some time tomorrow to sit down! Reviewed 11 of 11 files at r1. pkg/cli/start.go, line 604 at r1 (raw file):
See below. I think there's a better place for this that will make your life easier. pkg/cmd/roachtest/upgrade.go, line 234 at r1 (raw file):
This is a great test! Need to review in more detail tomorrow though. pkg/server/server.go, line 1629 at r1 (raw file):
For Reasons™ you should use pkg/server/server.go, line 1632 at r1 (raw file):
The fact that you need to poll to wait for pkg/server/server.go, line 1637 at r1 (raw file):
We should retry in this case. I recommend using util/retry so you get exponential backoff. I'll throw out an initial backoff of 1s and a max backoff of 30s with unlimited retries as a reasonable starting point. pkg/server/server.go, line 1655 at r1 (raw file):
This should happen inside a transaction so that we can't end up in a state where BEGIN;
SET CLUSTER SETTING version = crdb_internal.node_executable_version();
RESET CLUSTER SETTING cluster.preserve_downgrade_option;
SHOW CLUSTER SETTING version;
COMMIT; I think you'll want to use pkg/server/server.go, line 1681 at r1 (raw file):
The signature of this function is rather confusing. I think you should just return an error. If the error is present, you should abort and retry. If it's nil, you should give up. pkg/server/server.go, line 1741 at r1 (raw file):
const clusterVersionStmt = pkg/server/server.go, line 1759 at r1 (raw file):
Extracting this into a function isn't buying you much! I'd inline it above. pkg/server/server.go, line 1773 at r1 (raw file):
Ditto here. pkg/server/server.go, line 1804 at r1 (raw file):
Isn't pkg/settings/version.go, line 92 at r1 (raw file):
Oh, I see. The callback for string validation doesn't take the right type. I'd prefer if you instead adjust the signature of all validate functions to include the *settings.Values object that you need. There's only a few validated settings. It might also be cleaner to just use a StateMachineSetting. Let's see what @tschottdorf thinks, though! pkg/sql/executor.go, line 75 at r1 (raw file):
Ugh, this opaque stuff is kind of nasty. Can you move this function into the settings/cluster package? I'd put it next to the Actually, I think this is the wrong place for this setting. Can you move the whole thing into settings/cluster? Comments from Reviewable |
d85d5a4
to
e7d19c2
Compare
a82e133
to
f00321e
Compare
@benesch In addition, I found that acceptance tests just hang indefinitely because Here's a couple tests need to be updated/removed:
Review status: 2 of 13 files reviewed at latest revision, 13 unresolved discussions. pkg/server/server.go, line 1629 at r1 (raw file): Previously, benesch (Nikhil Benesch) wrote…
Done. pkg/server/server.go, line 1632 at r1 (raw file): Previously, benesch (Nikhil Benesch) wrote…
Done. pkg/server/server.go, line 1637 at r1 (raw file): Previously, benesch (Nikhil Benesch) wrote…
Done. pkg/server/server.go, line 1655 at r1 (raw file): Previously, benesch (Nikhil Benesch) wrote…
Done. pkg/server/server.go, line 1681 at r1 (raw file): Previously, benesch (Nikhil Benesch) wrote…
Done. pkg/server/server.go, line 1741 at r1 (raw file): Previously, benesch (Nikhil Benesch) wrote…
Done. pkg/server/server.go, line 1759 at r1 (raw file): Previously, benesch (Nikhil Benesch) wrote…
Done. pkg/server/server.go, line 1773 at r1 (raw file): Previously, benesch (Nikhil Benesch) wrote…
Done. pkg/server/server.go, line 1804 at r1 (raw file): Previously, benesch (Nikhil Benesch) wrote…
I think it makes more sense to pull from admin server's liveness end point. I checked the implementation of pkg/cli/start.go, line 604 at r1 (raw file): Previously, benesch (Nikhil Benesch) wrote…
Done. pkg/settings/version.go, line 92 at r1 (raw file): Previously, benesch (Nikhil Benesch) wrote…
I adjusted the signature of all validation functions. Done. pkg/sql/executor.go, line 75 at r1 (raw file): Previously, benesch (Nikhil Benesch) wrote…
Done. Comments from Reviewable |
839c59a
to
616225a
Compare
Review status: all files reviewed at latest revision, 23 unresolved discussions, some commit checks failed. pkg/server/version_cluster_test.go, line 137 at r5 (raw file): Previously, windchan7 (Victor Chen) wrote…
Yes, that seems like it would be what's happening. I think the most reasonable way to deal with this is to introduce a TestingKnob that gets passed in here: cockroach/pkg/server/version_cluster_test.go Line 101 in f6e5ab6
The knob would be an atomically updated bool ( The good thing about that is that this allows you to make a subtest that verifies the manual upgrade path as well. Comments from Reviewable |
Review status: all files reviewed at latest revision, 23 unresolved discussions, some commit checks failed. pkg/server/version_cluster_test.go, line 137 at r5 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Oh, and re: UseVersion: in the RFC, there was originally a distinction between the two that was never implemented. The idea was that you could bump Comments from Reviewable |
6a97619
to
2e079a4
Compare
Review status: 15 of 21 files reviewed at latest revision, 23 unresolved discussions. pkg/cmd/roachtest/upgrade.go, line 203 at r4 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
I'll keep it for now. pkg/server/server.go, line 1635 at r5 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done. pkg/server/server.go, line 1636 at r5 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done. pkg/server/server.go, line 1645 at r5 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done. pkg/server/server_update.go, line 1 at r6 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done. pkg/server/server_update.go, line 21 at r6 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done. pkg/server/server_update.go, line 146 at r6 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done. pkg/server/server_update.go, line 156 at r6 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done. pkg/server/version_cluster_test.go, line 142 at r2 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done. pkg/server/version_cluster_test.go, line 146 at r4 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done. pkg/server/version_cluster_test.go, line 133 at r5 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done. pkg/server/version_cluster_test.go, line 137 at r5 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done. pkg/server/version_cluster_test.go, line 140 at r5 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done. pkg/server/version_cluster_test.go, line 144 at r5 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done. pkg/server/version_cluster_test.go, line 150 at r5 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
I added some test logic here that is kinda similar to the roachtest. pkg/server/version_cluster_test.go, line 126 at r6 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done. pkg/settings/cluster/settings.go, line 91 at r4 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done. pkg/settings/cluster/settings.go, line 371 at r4 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done. pkg/sql/logictest/testdata/logic_test/cluster_version, line 6 at r4 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
It is racy but I moved it down a bit to make it less racy. pkg/sql/logictest/testdata/logic_test/cluster_version, line 46 at r5 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
I think all behaviors are expected. Comments from Reviewable |
Reviewed 7 of 19 files at r2, 5 of 8 files at r3, 1 of 5 files at r4, 1 of 4 files at r5, 1 of 3 files at r6, 6 of 6 files at r7. pkg/server/server_update.go, line 45 at r7 (raw file):
This should be RunAsyncTask, not RunWorker (we're not perfectly consistent about this, but the idea is that workers run for the lifetime of the server, while tasks are finite). pkg/server/server_update.go, line 92 at r7 (raw file):
Why is this reset needed? If we successfully set the version, doesn't that mean that preserve_downgrade_option is not set? Comments from Reviewable |
Review status: 20 of 21 files reviewed at latest revision, 25 unresolved discussions. pkg/server/server_update.go, line 45 at r7 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Done. pkg/server/server_update.go, line 92 at r7 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Oh, thanks for the catch. I missed it because the initial RFC does not have the restriction. Done. Comments from Reviewable |
Almost there 💪! Reviewed 5 of 6 files at r7, 1 of 1 files at r8. pkg/server/server_update.go, line 156 at r6 (raw file): Previously, windchan7 (Victor Chen) wrote…
nit: needs a dot. pkg/server/server_update.go, line 37 at r8 (raw file):
// accessed atomically Also this would be an pkg/server/server_update.go, line 60 at r8 (raw file):
This simplifies because there's no nil check. pkg/server/version_cluster_test.go, line 100 at r6 (raw file):
This isn't needed? pkg/server/version_cluster_test.go, line 159 at r8 (raw file):
You still need to remove pkg/server/version_cluster_test.go, line 175 at r8 (raw file):
This will just be pkg/server/version_cluster_test.go, line 187 at r8 (raw file):
No sleeping in unit tests. Just remove this (you would miss a failure here, but that's ok -- it would still be flaky and we're covering this functionality in the roachtests). pkg/server/version_cluster_test.go, line 189 at r8 (raw file):
still pkg/server/version_cluster_test.go, line 202 at r8 (raw file):
Never sleep in unit tests. Use a pkg/settings/cluster/settings.go, line 89 at r8 (raw file):
"diable", and also consider that this comment is still fuzzy. I think you can just remove this comment, now that the help for the cluster setting is concise and clear. pkg/sql/logictest/testdata/logic_test/cluster_version, line 6 at r4 (raw file): Previously, windchan7 (Victor Chen) wrote…
That won't be good enough. I think you should change the cockroach/pkg/sql/logictest/logic.go Line 390 in 835476d
cockroach/pkg/sql/logictest/logic.go Line 852 in 92fdda1
Perhaps rename the config to default-v1.1@v1.0-noupgrade .
pkg/sql/logictest/testdata/logic_test/cluster_version, line 46 at r5 (raw file): Previously, windchan7 (Victor Chen) wrote…
But you're testing less behaviors than before because you're moving off the manual upgrade path. Comments from Reviewable |
Review status: 17 of 22 files reviewed at latest revision, 20 unresolved discussions, some commit checks failed. pkg/server/server_update.go, line 156 at r6 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done. pkg/server/server_update.go, line 37 at r8 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done. pkg/server/server_update.go, line 60 at r8 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done. pkg/server/version_cluster_test.go, line 143 at r5 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done. pkg/server/version_cluster_test.go, line 100 at r6 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
We pass in TestingKnobs as an argument, which has the store testingKnob information. pkg/server/version_cluster_test.go, line 159 at r8 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done. pkg/server/version_cluster_test.go, line 175 at r8 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done. pkg/server/version_cluster_test.go, line 187 at r8 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done. pkg/server/version_cluster_test.go, line 189 at r8 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done. pkg/server/version_cluster_test.go, line 202 at r8 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done. pkg/settings/cluster/settings.go, line 89 at r8 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done. pkg/sql/logictest/testdata/logic_test/cluster_version, line 6 at r4 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done. pkg/sql/logictest/testdata/logic_test/cluster_version, line 46 at r5 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done. Comments from Reviewable |
pkg/server/server_update.go
Outdated
defer session.Finish(s.sqlExecutor) | ||
|
||
// Check if auto upgrade is disabled for test purposes. | ||
upgradeTestingKnobs := s.cfg.TestingKnobs.Upgrade.(*UpgradeTestingKnobs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sentry caught a crash on this type conversion. It's dangerous to cast without checking for success (or even nil):
stopper.go:176: *runtime.TypeAssertionError: interface conversion: base.ModuleTestingKnobs is nil, not *server.UpgradeTestingKnobs
https://sentry.io/cockroach-labs/cockroachdb/issues/555937530/?referrer=slack#
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed back with a nil check. I don't understand what Tobi means by saying: "This simplifies because there's no nil check."
pkg/server/server_update.go, line 60 at r8 (raw file): Previously, windchan7 (Victor Chen) wrote…
Can you clarify what you mean by that? Removing the nil check will cause errors as some tests does not set UpgradeTestingKnobs. Comments from Reviewable |
Review status: 17 of 22 files reviewed at latest revision, 3 unresolved discussions, some commit checks failed. pkg/server/server_update.go, line 60 at r9 (raw file): Previously, windchan7 (Victor Chen) wrote…
Discussed offline, I was just wrong. pkg/sql/logictest/testdata/logic_test/cluster_version, line 73 at r10 (raw file):
Not sure what happened here. If (and only if) you eat another CI run anyway, could you restore the original line ending? Comments from Reviewable |
Reviewed 2 of 5 files at r9, 3 of 3 files at r10. Comments from Reviewable |
Added a feature to auto upgrade the cluster setting version after a rolling upgrade. Now operators no longer have to manually type the upgrade command. Fixes cockroachdb#23912. Release note: None
25184: cluster: remove legacy version upgrade tests for auto upgrade merge r=windchan7 a=windchan7 Legacy tests such as `TestClusterVersionUpgrade1_0To2_0`, `TestDockerReadWriteBidirectionalReferenceVersion`, and `TestDockerReadWriteForwardReferenceVersion` no longer applies. Therefore, we remove and they will be replaced with new roachtests when the auto upgrade PR #24987 merges. Release note: None 25473: opt/optbuilder: Fix bug with order by, aggregate alias, and having. r=rytaft a=rytaft Previously, queries such as: `SELECT SUM(a) AS a2 FROM abcd GROUP BY c HAVING SUM(a)=10 ORDER BY a2` were incorrectly causing an error: `error: column name "a2" not found` This commit fixes the error by ensuring that a column alias for an aggregate function is applied even when the same aggregate appears multiple times in the same query (e.g., in the HAVING clause). Release note: None Co-authored-by: Victor Chen <victor@cockroachlabs.com> Co-authored-by: Rebecca Taft <becca@cockroachlabs.com>
bors r+ |
24987: cluster: Finalize cluster version upgrade automatically. r=windchan7 a=windchan7 Added a feature to auto upgrade the cluster setting version after a rolling upgrade. Now operators no longer have to manually type the upgrade command. Fixes #23912. Release note: None Co-authored-by: Victor Chen <victor@cockroachlabs.com>
Build succeeded |
Wohoo!
On Mon, May 14, 2018 at 4:22 PM craig[bot] ***@***.***> wrote:
Merged #24987 <#24987>.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#24987 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AE135McNtxx6OAZg-mSRgmlzgqk-Xu7Wks5tyed-gaJpZM4TgBxA>
.
--
…-- Tobias
|
🎉
On Tue, May 15, 2018 at 4:36 AM, Tobias Schottdorf <notifications@github.com
… wrote:
Wohoo!
On Mon, May 14, 2018 at 4:22 PM craig[bot] ***@***.***>
wrote:
> Merged #24987 <#24987>.
>
> —
> You are receiving this because you were mentioned.
>
>
> Reply to this email directly, view it on GitHub
> <#24987 (comment)>,
> or mute the thread
> <https://github.com/notifications/unsubscribe-auth/AE135McNtxx6OAZg-
mSRgmlzgqk-Xu7Wks5tyed-gaJpZM4TgBxA>
> .
>
--
-- Tobias
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#24987 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AA15INXO1mNpXPfvGxszE9VbTtVZ_oyRks5tyerJgaJpZM4TgBxA>
.
|
Added a feature to auto upgrade the cluster setting version after a rolling
upgrade. Now operators no longer have to manually type the upgrade command.
Fixes #23912.
Release note: None