-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
roachtest: fix panic in mixedversion
when cockroach fails to start
#103799
roachtest: fix panic in mixedversion
when cockroach fails to start
#103799
Conversation
When a mixed-version test (using the `mixedversion` package) starts, the `cockroach` binary is uploaded to every node, and then the cluster is started. If, for some reason, the cockroach process crashes in this startup phase, the `mixedversion` package would panic while generating an error message. The reason for that is that there was an assumption that the connection cache was initialized at that point, which does not hold when the test failure happened on test setup. This fixes this issue by making sure we check for the status of the connection cache when generating error messages. We also make sure concurrent accesses to the connection cache are safe; while this is not strictly necessary (no concurrent reads and writes to it right now), it will likely help in the future as this code changes. Epic: none Release note: None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @herkolategan and @renatolabs)
pkg/cmd/roachtest/roachtestutil/mixedversion/runner.go
line 310 at r1 (raw file):
clusterVersionsBefore := tr.clusterVersions var clusterVersionsAfter atomic.Value if tr.connCacheInitialized() {
Should we log the fact that the failure occurred during the first step, i.e., before initialization finished? Or is it already captured somewhere else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
bors r=srosenberg
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @herkolategan and @srosenberg)
pkg/cmd/roachtest/roachtestutil/mixedversion/runner.go
line 310 at r1 (raw file):
The error itself already includes that. Here's the error I see when I forced roachtest to use bad fixtures:
[mixed-version-test/1_starting-cluster-from-fixtures-for-version-22.2.9] 16:28:48 runner.go:328: mixed-version test failure while running step 1 (starting cluster from fixtures for version "22.2.9"): ~ ./cockroach.sh
Build failed: |
Flaked: #103531 (comment). bors retry |
Build succeeded: |
When a mixed-version test (using the
mixedversion
package) starts, thecockroach
binary is uploaded to every node, and then the cluster is started. If, for some reason, the cockroach process crashes in this startup phase, themixedversion
package would panic while generating an error message. The reason for that is that there was an assumption that the connection cache was initialized at that point, which does not hold when the test failure happened on test setup.This fixes this issue by making sure we check for the status of the connection cache when generating error messages. We also make sure concurrent accesses to the connection cache are safe; while this is not strictly necessary (no concurrent reads and writes to it right now), it will likely help in the future as this code changes.
Epic: none
Release note: None