roachtest: tpccbench/nodes=12/cpu=16 failed #86987

Closed
cockroach-teamcity opened this issue Aug 27, 2022 · 3 comments
Labels
branch-master Failures on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot.

Comments

@cockroach-teamcity
Member

cockroach-teamcity commented Aug 27, 2022

roachtest.tpccbench/nodes=12/cpu=16 failed with artifacts on master @ 770ff3c545a51752490403da64d56fb397f49c5e:

test artifacts and logs in: /artifacts/bazel-20220827-6260294/tpccbench/nodes=12/cpu=16/run_1
	monitor.go:127,tpcc.go:1111,tpcc.go:948,test_runner.go:897: monitor failure: monitor task failed: Non-zero exit code: 1
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	main/pkg/cmd/roachtest/monitor.go:115
		  | main.(*monitorImpl).Wait
		  | 	main/pkg/cmd/roachtest/monitor.go:123
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runTPCCBench
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:1111
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCCBenchSpec.func1
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:948
		  | [...repeated from below...]
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).wait.func2
		  | 	main/pkg/cmd/roachtest/monitor.go:171
		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1571
		Wraps: (4) monitor task failed
		Wraps: (5) Non-zero exit code: 1
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *install.NonZeroExitCode

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=16 , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

/cc @cockroachdb/test-eng

This test on roachdash | Improve this report!

Jira issue: CRDB-19072

@cockroach-teamcity cockroach-teamcity added branch-master Failures on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Aug 27, 2022
@cockroach-teamcity cockroach-teamcity added this to the 22.2 milestone Aug 27, 2022
@srosenberg
Member

srosenberg commented Aug 30, 2022

While the TC artifacts weren't published owing to [1], I was able to grab them from the TC agent. The workload generator timed out after an hour,

cat run_075025.210195150_n13_cockroach_workload_run_tpcc.log
run_075025.210195150_n13_cockroach_workload_run_tpcc: 07:50:25 cluster.go:2023: running ./cockroach workload run tpcc --warehouses=10000 --workers=10000 --max-rate=1962 --wait=false --ramp=10m0s --duration=30m0s --scatter --tolerate-errors {pgurl:1-12} on nodes: :13
teamcity-6260294-1661577517-05-n13cpu16: ./cockroach workload run tp...
I220827 07:50:27.965708 1 workload/cli/run.go:427  [-] 1  creating load generator...
Initializing 10000 connections...
E220827 08:51:05.325793 1 workload/cli/run.go:450  [-] 2  Attempt to create load generator failed. It's been more than 1h0m0s since we started trying to create the load generator so we're giving up. Last failure: failed to initialize the load generator: context deadline exceeded
Error: failed to initialize the load generator: context deadline exceeded
LAST EXIT STATUS: 1
run_075025.210195150_n13_cockroach_workload_run_tpcc: 08:51:05 cluster.go:2038: > Error for Node 13: Non-zero exit code: 1

while the import completed successfully,

cat run_073304.702636586_n1_cockroach_workload_fixtures_import_tpcc.log
run_073304.702636586_n1_cockroach_workload_fixtures_import_tpcc: 07:33:04 cluster.go:260: > ././cockroach workload fixtures import tpcc --warehouses=10000 --checks=false
I220827 07:33:06.107720 1 ccl/workloadccl/fixture.go:318  [-] 1  starting import of 9 tables
I220827 07:33:06.341697 58 ccl/workloadccl/fixture.go:481  [-] 2  imported 532 KiB in warehouse table (10000 rows, 0 index entries, took 172.361785ms, 3.02 MiB/s)
I220827 07:33:08.827526 64 ccl/workloadccl/fixture.go:481  [-] 3  imported 7.9 MiB in item table (100000 rows, 0 index entries, took 2.657863143s, 2.96 MiB/s)
I220827 07:33:09.042443 59 ccl/workloadccl/fixture.go:481  [-] 4  imported 9.9 MiB in district table (100000 rows, 0 index entries, took 2.872985975s, 3.43 MiB/s)
I220827 07:33:50.526213 63 ccl/workloadccl/fixture.go:481  [-] 5  imported 1.3 GiB in new_order table (90000000 rows, 0 index entries, took 44.356576254s, 30.89 MiB/s)
I220827 07:36:55.937964 62 ccl/workloadccl/fixture.go:481  [-] 6  imported 16 GiB in order table (300000000 rows, 300000000 index entries, took 3m49.768335883s, 72.74 MiB/s)
I220827 07:37:36.666097 61 ccl/workloadccl/fixture.go:481  [-] 7  imported 22 GiB in history table (300000000 rows, 0 index entries, took 4m30.496501566s, 81.62 MiB/s)
I220827 07:44:00.352537 60 ccl/workloadccl/fixture.go:481  [-] 8  imported 173 GiB in customer table (300000000 rows, 300000000 index entries, took 10m54.183029402s, 270.18 MiB/s)
I220827 07:47:18.286149 65 ccl/workloadccl/fixture.go:481  [-] 9  imported 302 GiB in stock table (1000000000 rows, 0 index entries, took 14m12.116449399s, 362.29 MiB/s)
I220827 07:50:23.220516 114 ccl/workloadccl/fixture.go:481  [-] 10  imported 172 GiB in order_line table (3000058576 rows, 0 index entries, took 17m17.050802938s, 170.36 MiB/s)
I220827 07:50:23.270776 1 ccl/workloadccl/fixture.go:326  [-] 11  imported 686 GiB bytes in 9 tables (took 17m17.157435137s, 677.16 MiB/s)
run_073304.702636586_n1_cockroach_workload_fixtures_import_tpcc: 07:50:25 cluster.go:1962: > result: <nil>

It's unclear why the load generator timed out. The cluster appears to have been healthy. However, none of the nodes show up in Grafana, which means that Prometheus was unable to reach them. There isn't anything else actionable (other than the linked issue), so I am closing.

@tbg Perhaps you've seen this type of timeout before?

[1] #82899 (comment)

Test Engineering automation moved this from Triage to Done Aug 30, 2022
@renatolabs renatolabs removed the release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. label Sep 6, 2022
@srosenberg
Member

Apparently, --scatter now takes longer than 60 minutes [1], which exceeds the one-hour limit on creating the load generator. The timeout has been bumped to 90 minutes [2].

[1] #72083 (comment)
[2] #88641


3 participants