Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachtest: fix panic in mixedversion when cockroach fails to start #103799

Merged

Conversation

renatolabs
Copy link
Collaborator

When a mixed-version test (using the mixedversion package) starts, the cockroach binary is uploaded to every node, and then the cluster is started. If, for some reason, the cockroach process crashes in this startup phase, the mixedversion package would panic while generating an error message. The reason for that is that there was an assumption that the connection cache was initialized at that point, which does not hold when the test failure happened on test setup.

This fixes this issue by making sure we check for the status of the connection cache when generating error messages. We also make sure concurrent accesses to the connection cache are safe; while this is not strictly necessary (no concurrent reads and writes to it right now), it will likely help in the future as this code changes.

Epic: none

Release note: None

When a mixed-version test (using the `mixedversion` package) starts,
the `cockroach` binary is uploaded to every node, and then the cluster
is started. If, for some reason, the cockroach process crashes in this
startup phase, the `mixedversion` package would panic while generating
an error message. The reason for that is that there was an assumption
that the connection cache was initialized at that point, which does
not hold when the test failure happened on test setup.

This fixes this issue by making sure we check for the status of the
connection cache when generating error messages. We also make sure
concurrent accesses to the connection cache are safe; while this is
not strictly necessary (no concurrent reads and writes to it right
now), it will likely help in the future as this code changes.

Epic: none

Release note: None
@renatolabs renatolabs requested a review from a team as a code owner May 23, 2023 17:46
@renatolabs renatolabs requested review from herkolategan and srosenberg and removed request for a team May 23, 2023 17:46
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@srosenberg srosenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @herkolategan and @renatolabs)


pkg/cmd/roachtest/roachtestutil/mixedversion/runner.go line 310 at r1 (raw file):

	clusterVersionsBefore := tr.clusterVersions
	var clusterVersionsAfter atomic.Value
	if tr.connCacheInitialized() {

Should we log the fact that the failure occurred during the first step, i.e., before initialization finished? Or is it already captured somewhere else?

Copy link
Collaborator Author

@renatolabs renatolabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

bors r=srosenberg

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @herkolategan and @srosenberg)


pkg/cmd/roachtest/roachtestutil/mixedversion/runner.go line 310 at r1 (raw file):
The error itself already includes that. Here's the error I see when I forced roachtest to use bad fixtures:

[mixed-version-test/1_starting-cluster-from-fixtures-for-version-22.2.9] 16:28:48 runner.go:328: mixed-version test failure while running step 1 (starting cluster from fixtures for version "22.2.9"): ~ ./cockroach.sh

@craig
Copy link
Contributor

craig bot commented May 23, 2023

Build failed:

@renatolabs
Copy link
Collaborator Author

Flaked: #103531 (comment).

bors retry

@craig
Copy link
Contributor

craig bot commented May 23, 2023

Build succeeded:

@craig craig bot merged commit 87c2ab0 into cockroachdb:master May 23, 2023
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants