roachtest: acceptance/gossip/restart-node-one failed [slow replication] #68107

Closed
cockroach-teamcity opened this issue Jul 27, 2021 · 17 comments
Labels
branch-master Failures on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. skipped-test

Comments

@cockroach-teamcity
Member

roachtest.acceptance/gossip/restart-node-one failed with artifacts on master @ 1e0daf805e2f0559add9a8a637c2338b478beeab:

The test failed on branch=master, cloud=local:
test artifacts and logs in: artifacts/acceptance/gossip/restart-node-one/run_1
	gossip.go:435,acceptance.go:104,test_runner.go:770: node 1 still has 1 replicas
		(1) attached stack trace
		  -- stack trace:
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runGossipRestartNodeOne.func3
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/gossip.go:426
		  | github.com/cockroachdb/cockroach/pkg/util/retry.ForDuration
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/util/retry/retry.go:197
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runGossipRestartNodeOne
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/gossip.go:415
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerAcceptance.func2
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/acceptance.go:104
		  | main.(*testRunner).runTest.func2
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:770
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1371
		Wraps: (2) node 1 still has 1 replicas
		Error types: (1) *withstack.withStack (2) *errutil.leafError
Reproduce

See the corresponding section in the [roachtest README](https://github.com/cockroachdb/cockroach/tree/master/pkg/cmd/roachtest)

/cc @cockroachdb/kv-triage

This test on roachdash | Improve this report!

@cockroach-teamcity cockroach-teamcity added branch-master Failures on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Jul 27, 2021
@cockroach-teamcity cockroach-teamcity added this to roachtest/unit test backlog in KV Jul 27, 2021
@tbg
Member

tbg commented Jul 27, 2021

@AlexTalks could you take a look at this one? The link above should have "usable" artifacts. I'd start by downloading them (make sure you download only those for the test, not those for all tests - just the two zip files), and then figure out what the test is trying to do (the code is in ./pkg/cmd/roachtest/tests/ somewhere; searching for "restart-node-one" is an easy way to find it) and what goes wrong. From the looks of it, at some point in the test all replicas should move off a node but didn't, so you'll probably want to figure out which replica stayed behind (the debug.zip might be helpful - nodes/1/ranges/*.json), and then work out from the logs (cockroach debug merge-logs . | grep -F 'rXXX/', where XXX is the rangeID) what might have happened.
Let me (or if I'm out, someone else) know if/where you get stuck!

@tbg tbg moved this from roachtest/unit test backlog to Current Milestone in KV Jul 27, 2021
@cockroach-teamcity
Member Author

roachtest.acceptance/gossip/restart-node-one failed with artifacts on master @ 4718beb99f1b90b4f64e769eb3d1a7dd64e51f64:

The test failed on branch=master, cloud=local:
test artifacts and logs in: artifacts/acceptance/gossip/restart-node-one/run_1
	gossip.go:435,acceptance.go:104,test_runner.go:770: node 1 still has 1 replicas
		(1) attached stack trace
		  -- stack trace:
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runGossipRestartNodeOne.func3
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/gossip.go:426
		  | github.com/cockroachdb/cockroach/pkg/util/retry.ForDuration
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/util/retry/retry.go:197
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runGossipRestartNodeOne
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/gossip.go:415
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerAcceptance.func2
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/acceptance.go:104
		  | main.(*testRunner).runTest.func2
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:770
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1371
		Wraps: (2) node 1 still has 1 replicas
		Error types: (1) *withstack.withStack (2) *errutil.leafError
Reproduce

See the corresponding section in the [roachtest README](https://github.com/cockroachdb/cockroach/tree/master/pkg/cmd/roachtest)

/cc @cockroachdb/kv-triage

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.acceptance/gossip/restart-node-one failed with artifacts on master @ 4718beb99f1b90b4f64e769eb3d1a7dd64e51f64:

The test failed on branch=master, cloud=local:
test artifacts and logs in: artifacts/acceptance/gossip/restart-node-one/run_1
	gossip.go:435,acceptance.go:104,test_runner.go:770: node 1 still has 2 replicas
		(1) attached stack trace
		  -- stack trace:
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runGossipRestartNodeOne.func3
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/gossip.go:426
		  | github.com/cockroachdb/cockroach/pkg/util/retry.ForDuration
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/util/retry/retry.go:197
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runGossipRestartNodeOne
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/gossip.go:415
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerAcceptance.func2
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/acceptance.go:104
		  | main.(*testRunner).runTest.func2
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:770
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1371
		Wraps: (2) node 1 still has 2 replicas
		Error types: (1) *withstack.withStack (2) *errutil.leafError
Reproduce

See the corresponding section in the [roachtest README](https://github.com/cockroachdb/cockroach/tree/master/pkg/cmd/roachtest)

/cc @cockroachdb/kv-triage

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.acceptance/gossip/restart-node-one failed with artifacts on master @ c7e039fa912c296b3ecf2bfb63679d7e6cbff8ca:

The test failed on branch=master, cloud=local:
test artifacts and logs in: artifacts/acceptance/gossip/restart-node-one/run_1
	gossip.go:435,acceptance.go:104,test_runner.go:770: node 1 still has 1 replicas
		(1) attached stack trace
		  -- stack trace:
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runGossipRestartNodeOne.func3
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/gossip.go:426
		  | github.com/cockroachdb/cockroach/pkg/util/retry.ForDuration
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/util/retry/retry.go:197
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runGossipRestartNodeOne
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/gossip.go:415
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerAcceptance.func2
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/acceptance.go:104
		  | main.(*testRunner).runTest.func2
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:770
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1371
		Wraps: (2) node 1 still has 1 replicas
		Error types: (1) *withstack.withStack (2) *errutil.leafError
Reproduce

See the corresponding section in the [roachtest README](https://github.com/cockroachdb/cockroach/tree/master/pkg/cmd/roachtest)

/cc @cockroachdb/kv-triage

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.acceptance/gossip/restart-node-one failed with artifacts on master @ 131db990b54f5fd526e1194ce1a4d6138bd04698:

The test failed on branch=master, cloud=local:
test artifacts and logs in: artifacts/acceptance/gossip/restart-node-one/run_1
	gossip.go:435,acceptance.go:104,test_runner.go:770: node 1 still has 1 replicas
		(1) attached stack trace
		  -- stack trace:
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runGossipRestartNodeOne.func3
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/gossip.go:426
		  | github.com/cockroachdb/cockroach/pkg/util/retry.ForDuration
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/util/retry/retry.go:197
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runGossipRestartNodeOne
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/gossip.go:415
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerAcceptance.func2
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/acceptance.go:104
		  | main.(*testRunner).runTest.func2
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:770
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1371
		Wraps: (2) node 1 still has 1 replicas
		Error types: (1) *withstack.withStack (2) *errutil.leafError
Reproduce

See the corresponding section in the [roachtest README](https://github.com/cockroachdb/cockroach/tree/master/pkg/cmd/roachtest)

/cc @cockroachdb/kv-triage

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.acceptance/gossip/restart-node-one failed with artifacts on master @ 131db990b54f5fd526e1194ce1a4d6138bd04698:

The test failed on branch=master, cloud=local:
test artifacts and logs in: artifacts/acceptance/gossip/restart-node-one/run_1
	gossip.go:435,acceptance.go:104,test_runner.go:770: node 1 still has 1 replicas
		(1) attached stack trace
		  -- stack trace:
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runGossipRestartNodeOne.func3
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/gossip.go:426
		  | github.com/cockroachdb/cockroach/pkg/util/retry.ForDuration
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/util/retry/retry.go:197
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runGossipRestartNodeOne
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/gossip.go:415
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerAcceptance.func2
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/acceptance.go:104
		  | main.(*testRunner).runTest.func2
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:770
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1371
		Wraps: (2) node 1 still has 1 replicas
		Error types: (1) *withstack.withStack (2) *errutil.leafError
Reproduce

See the corresponding section in the [roachtest README](https://github.com/cockroachdb/cockroach/tree/master/pkg/cmd/roachtest)

/cc @cockroachdb/kv-triage

This test on roachdash | Improve this report!

@knz
Contributor

knz commented Jul 28, 2021

skipping in #68158

@cockroach-teamcity
Member Author

roachtest.acceptance/gossip/restart-node-one failed with artifacts on master @ 15b5d06e0702dc0ca5f4aad98d09c6da06708b7b:

The test failed on branch=master, cloud=local:
test artifacts and logs in: artifacts/acceptance/gossip/restart-node-one/run_1
	gossip.go:435,acceptance.go:104,test_runner.go:770: node 1 still has 1 replicas
		(1) attached stack trace
		  -- stack trace:
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runGossipRestartNodeOne.func3
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/gossip.go:426
		  | github.com/cockroachdb/cockroach/pkg/util/retry.ForDuration
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/util/retry/retry.go:197
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runGossipRestartNodeOne
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/gossip.go:415
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerAcceptance.func2
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/acceptance.go:104
		  | main.(*testRunner).runTest.func2
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:770
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1371
		Wraps: (2) node 1 still has 1 replicas
		Error types: (1) *withstack.withStack (2) *errutil.leafError
Reproduce

See the corresponding section in the [roachtest README](https://github.com/cockroachdb/cockroach/tree/master/pkg/cmd/roachtest)

/cc @cockroachdb/kv-triage

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.acceptance/gossip/restart-node-one failed with artifacts on master @ 412117e27d3e11f1ea56982a6a95180ce0a5b7f5:

The test failed on branch=master, cloud=local:
test artifacts and logs in: artifacts/acceptance/gossip/restart-node-one/run_1
	gossip.go:435,acceptance.go:104,test_runner.go:770: node 1 still has 2 replicas
		(1) attached stack trace
		  -- stack trace:
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runGossipRestartNodeOne.func3
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/gossip.go:426
		  | github.com/cockroachdb/cockroach/pkg/util/retry.ForDuration
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/util/retry/retry.go:197
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runGossipRestartNodeOne
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/gossip.go:415
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerAcceptance.func2
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/acceptance.go:104
		  | main.(*testRunner).runTest.func2
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:770
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1371
		Wraps: (2) node 1 still has 2 replicas
		Error types: (1) *withstack.withStack (2) *errutil.leafError
Reproduce

See the corresponding section in the [roachtest README](https://github.com/cockroachdb/cockroach/tree/master/pkg/cmd/roachtest)

/cc @cockroachdb/kv-triage

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.acceptance/gossip/restart-node-one failed with artifacts on master @ 115928196de1beaa63e3e696c14dcd9392690ca2:

The test failed on branch=master, cloud=local:
test artifacts and logs in: artifacts/acceptance/gossip/restart-node-one/run_1
	gossip.go:435,acceptance.go:104,test_runner.go:770: node 1 still has 1 replicas
		(1) attached stack trace
		  -- stack trace:
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runGossipRestartNodeOne.func3
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/gossip.go:426
		  | github.com/cockroachdb/cockroach/pkg/util/retry.ForDuration
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/util/retry/retry.go:197
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runGossipRestartNodeOne
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/gossip.go:415
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerAcceptance.func2
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/acceptance.go:104
		  | main.(*testRunner).runTest.func2
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:770
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1371
		Wraps: (2) node 1 still has 1 replicas
		Error types: (1) *withstack.withStack (2) *errutil.leafError
Reproduce

See the corresponding section in the [roachtest README](https://github.com/cockroachdb/cockroach/tree/master/pkg/cmd/roachtest)

/cc @cockroachdb/kv-triage

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.acceptance/gossip/restart-node-one failed with artifacts on master @ c4702ef7d1959c8d1a231cc73ef5b2ccaefdfea4:

The test failed on branch=master, cloud=local:
test artifacts and logs in: artifacts/acceptance/gossip/restart-node-one/run_1
	gossip.go:435,acceptance.go:104,test_runner.go:770: node 1 still has 1 replicas
		(1) attached stack trace
		  -- stack trace:
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runGossipRestartNodeOne.func3
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/gossip.go:426
		  | github.com/cockroachdb/cockroach/pkg/util/retry.ForDuration
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/util/retry/retry.go:197
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runGossipRestartNodeOne
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/gossip.go:415
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerAcceptance.func2
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/acceptance.go:104
		  | main.(*testRunner).runTest.func2
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:770
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1371
		Wraps: (2) node 1 still has 1 replicas
		Error types: (1) *withstack.withStack (2) *errutil.leafError
Reproduce

See the corresponding section in the [roachtest README](https://github.com/cockroachdb/cockroach/tree/master/pkg/cmd/roachtest)

/cc @cockroachdb/kv-triage

This test on roachdash | Improve this report!

@tbg
Member

tbg commented Jul 28, 2021

I would guess that this shares a root cause with #68171 (comment) (and the other tests linked to that issue).

It might be worth starting to pull on #68171, since that's a nice short test that can be run via --local, and we should be able to bisect down to the offending commit "easily".

@cockroach-teamcity
Member Author

roachtest.acceptance/gossip/restart-node-one failed with artifacts on master @ f19914b8c6281e463645580e1411774c3b0c20c9:

The test failed on branch=master, cloud=local:
test artifacts and logs in: artifacts/acceptance/gossip/restart-node-one/run_1
	gossip.go:435,acceptance.go:104,test_runner.go:770: node 1 still has 1 replicas
		(1) attached stack trace
		  -- stack trace:
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runGossipRestartNodeOne.func3
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/gossip.go:426
		  | github.com/cockroachdb/cockroach/pkg/util/retry.ForDuration
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/util/retry/retry.go:197
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runGossipRestartNodeOne
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/gossip.go:415
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerAcceptance.func2
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/acceptance.go:104
		  | main.(*testRunner).runTest.func2
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:770
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1371
		Wraps: (2) node 1 still has 1 replicas
		Error types: (1) *withstack.withStack (2) *errutil.leafError
Reproduce

See the corresponding section in the [roachtest README](https://github.com/cockroachdb/cockroach/tree/master/pkg/cmd/roachtest)

/cc @cockroachdb/kv-triage

This test on roachdash | Improve this report!

@cockroach-teamcity
Member Author

roachtest.acceptance/gossip/restart-node-one failed with artifacts on master @ f19914b8c6281e463645580e1411774c3b0c20c9:

The test failed on branch=master, cloud=local:
test artifacts and logs in: artifacts/acceptance/gossip/restart-node-one/run_1
	gossip.go:435,acceptance.go:104,test_runner.go:770: node 1 still has 1 replicas
		(1) attached stack trace
		  -- stack trace:
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runGossipRestartNodeOne.func3
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/gossip.go:426
		  | github.com/cockroachdb/cockroach/pkg/util/retry.ForDuration
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/util/retry/retry.go:197
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runGossipRestartNodeOne
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/gossip.go:415
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerAcceptance.func2
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/acceptance.go:104
		  | main.(*testRunner).runTest.func2
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:770
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1371
		Wraps: (2) node 1 still has 1 replicas
		Error types: (1) *withstack.withStack (2) *errutil.leafError
Reproduce

See the corresponding section in the [roachtest README](https://github.com/cockroachdb/cockroach/tree/master/pkg/cmd/roachtest)

/cc @cockroachdb/kv-triage

This test on roachdash | Improve this report!

@tbg tbg changed the title from "roachtest: acceptance/gossip/restart-node-one failed" to "roachtest: acceptance/gossip/restart-node-one failed [slow replication]" Jul 29, 2021
tbg added a commit to tbg/cockroach that referenced this issue Jul 29, 2021
craig bot pushed a commit that referenced this issue Jul 29, 2021
68156: roachtest: fix replicagc-changed-peers (again) r=erikgrinaker a=tbg

I wrote buggy code. Since it isn't hit in every invocation, it slipped
through my local testing.

Fixes #68155
Fixes #68162

Release note: None


68157: CODEOWNERS: own catalog package to obs-inf-prs r=erikgrinaker a=tbg

Noticed in #67866.

Release note: None


68203: roachtest: avoid releasing alloc multiple times r=erikgrinaker a=tbg

As of #68103, existing problems in the cluster creation retry logic
are tickled reliably: on retry we are releasing an alloc twice,
resulting in a panic. Unfortunately, the alloc is acquired several
layers above the retry, so for now the best I can do is to avoid
releasing the alloc if we are bound for a retry.

The reason the above PR tickles it more reliably is because it
accidentally removed a sensible failfast: when the cluster already
exists, we shouldn't destroy & try to recreate it again (which
would then explode on the double-release anyway).

Release note: None


68224: roachtest: fix inconsistency r=erikgrinaker a=tbg

The way that test was passing the args no longer works.

Fixes #64806.

Release note: None


68226: roachtest: allow local runs with roachstress r=erikgrinaker a=tbg

It's appealing to use roachstress also as a tool to simply run
a roachtest, since it avoids having to think much about the flags
and incantation.

This commit adds a prompt to roachstress about running in local
mode. If selected, the binaries are built to target the local
architecture, and the builder is not invoked.

Release note: None


68227: roachtest: skip acceptance/gossip/restart-node-one r=erikgrinaker a=tbg

Touches #68107.

Release note: None


Co-authored-by: Tobias Grieger <tobias.schottdorf@gmail.com>
Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
@cockroach-teamcity
Member Author

roachtest.acceptance/gossip/restart-node-one failed with artifacts on master @ baec9cc9d80faba95fdab715a5834d294df1062f:

The test failed on branch=master, cloud=local:
test artifacts and logs in: artifacts/acceptance/gossip/restart-node-one/run_1
	gossip.go:435,acceptance.go:104,test_runner.go:770: node 1 still has 1 replicas
		(1) attached stack trace
		  -- stack trace:
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runGossipRestartNodeOne.func3
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/gossip.go:426
		  | github.com/cockroachdb/cockroach/pkg/util/retry.ForDuration
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/util/retry/retry.go:197
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runGossipRestartNodeOne
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/gossip.go:415
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerAcceptance.func2
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/acceptance.go:104
		  | main.(*testRunner).runTest.func2
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:770
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1371
		Wraps: (2) node 1 still has 1 replicas
		Error types: (1) *withstack.withStack (2) *errutil.leafError
Reproduce

See the corresponding section in the [roachtest README](https://github.com/cockroachdb/cockroach/tree/master/pkg/cmd/roachtest)

/cc @cockroachdb/kv-triage

This test on roachdash | Improve this report!

@tbg
Member

tbg commented Aug 23, 2021

#68169 (comment)

@tbg tbg removed the release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. label Aug 23, 2021
@tbg
Member

tbg commented Aug 24, 2021

The test is now unskipped (#69232), after I verified that it passes (10x).

@tbg tbg closed this as completed Aug 24, 2021
KV automation moved this from Current Milestone to Closed Aug 24, 2021