
roachtest: schemachange/index/tpcc/w=1000 failed #54304

Closed
cockroach-teamcity opened this issue Sep 12, 2020 · 14 comments
Labels
C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)
Comments

@cockroach-teamcity
Member

(roachtest).schemachange/index/tpcc/w=1000 failed on release-20.2@a504fc7f26be39586c14c3fa22ad728ed8fa9b6d:

		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:171
		  | main.makeIndexAddTpccTest.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/schemachange.go:302
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:754
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1357
		Wraps: (2) failed to get pgurl for nodes: teamcity-2264915-1599891051-76-n5cpu16:1
		Wraps: (3) /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod pgurl --external teamcity-2264915-1599891051-76-n5cpu16:1 returned
		  | stderr:
		  |
		  | stdout:
		  | problem loading clusters: could not read /root/.roachprod/hosts/teamcity-2264875-1599890790-55-n4cpu8: open /root/.roachprod/hosts/teamcity-2264875-1599890790-55-n4cpu8: no such file or directory
		  | Error: UNCLASSIFIED_PROBLEM: unknown cluster: teamcity-2264915-1599891051-76-n5cpu16
		  | (1) UNCLASSIFIED_PROBLEM
		  | Wraps: (2) Use "roachprod sync" to update the list of available clusters.
		  | Wraps: (3)
		  |   | Available clusters:
		  |   |   piyush-sso
		  |   |   sumeer-1599862316-01-n5cpu16
		  |   |   sumeer-1599862316-03-n5cpu16
		  |   |   sumeer-1599862316-04-n5cpu16
		  |   |   teamcity-2264875-1599890790-18-n4cpu32
		  |   |   teamcity-2264875-1599890790-21-n10cpu4
		  |   |   teamcity-2264875-1599890790-35-n6cpu4
		  |   |   teamcity-2264875-1599890790-38-n32cpu4
		  | Wraps: (4) attached stack trace
		  |   -- stack trace:
		  |   | main.newCluster
		  |   | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:139
		  |   | main.glob..func24
		  |   | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1455
		  |   | main.wrap.func1
		  |   | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:267
		  |   | github.com/spf13/cobra.(*Command).execute
		  |   | 	/home/agent/work/.go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:830
		  |   | github.com/spf13/cobra.(*Command).ExecuteC
		  |   | 	/home/agent/work/.go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:914
		  |   | github.com/spf13/cobra.(*Command).Execute
		  |   | 	/home/agent/work/.go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:864
		  |   | main.main
		  |   | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1839
		  |   | runtime.main
		  |   | 	/usr/local/go/src/runtime/proc.go:203
		  |   | runtime.goexit
		  |   | 	/usr/local/go/src/runtime/asm_amd64.s:1357
		  | Wraps: (5) unknown cluster: teamcity-2264915-1599891051-76-n5cpu16
		  | Error types: (1) errors.Unclassified (2) *hintdetail.withHint (3) *hintdetail.withHint (4) *withstack.withStack (5) *errutil.leafError
		Wraps: (4) exit status 1
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *main.withCommandDetails (4) *exec.ExitError


Artifacts: /schemachange/index/tpcc/w=1000

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity cockroach-teamcity added branch-release-20.2 C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Sep 12, 2020
@cockroach-teamcity cockroach-teamcity added this to the 20.2 milestone Sep 12, 2020
@thoszhang thoszhang self-assigned this Sep 15, 2020
@cockroach-teamcity
Member Author

(roachtest).schemachange/index/tpcc/w=1000 failed on release-20.2@85cc9fe61acc633d38c5c7078725e7ea68b5352c:

		  | main.makeIndexAddTpccTest.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/schemachange.go:302
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:754
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitor).wait.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2700
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  -- stack trace:
		  | main.init
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2614
		  | runtime.doInit
		  | 	/usr/local/go/src/runtime/proc.go:5228
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:190
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1357
		Wraps: (6) t.Fatal() was called
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError

	cluster.go:1651,context.go:135,cluster.go:1640,test_runner.go:823: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-2285049-1600668475-77-n5cpu16 --oneshot --ignore-empty-nodes: exit status 1 5: skipped
		1: dead
		3: 4921
		4: 4323
		2: 4655
		Error: UNCLASSIFIED_PROBLEM: 1: dead
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  -- stack trace:
		  | main.glob..func14
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1143
		  | main.wrap.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:267
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/home/agent/work/.go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:830
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/home/agent/work/.go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:914
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/home/agent/work/.go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:864
		  | main.main
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1839
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:203
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1357
		Wraps: (3) 1: dead
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *errutil.leafError


Artifacts: /schemachange/index/tpcc/w=1000

@ajwerner ajwerner self-assigned this Sep 22, 2020
@ajwerner
Contributor

This last one is an OOM.

@cockroach-teamcity
Member Author

(roachtest).schemachange/index/tpcc/w=1000 failed on release-20.2@bdb8cd0e7b2f25a08569a56464838486b6d16421:

		  |   143.0s        0           72.0           63.4    100.7    268.4    352.3    453.0 stockLevel
		  |   144.0s        0           54.0           63.2    520.1    738.2    906.0    973.1 delivery
		  |   144.0s        0          702.9          633.9    352.3    570.4    671.1   1744.8 newOrder
		  |   144.0s        0           81.0           63.5     37.7    167.8    218.1    268.4 orderStatus
		  |   144.0s        0          745.9          634.6    234.9    402.7    503.3    637.5 payment
		  |   144.0s        0           82.0           63.6    104.9    268.4    369.1    369.1 stockLevel
		  | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		  |   145.0s        0           84.0           63.3    486.5    771.8    838.9    872.4 delivery
		  |   145.0s        0          748.1          634.7    369.1    536.9    637.5    671.1 newOrder
		  |   145.0s        0           59.0           63.4     41.9    117.4    151.0    159.4 orderStatus
		  |   145.0s        0          700.1          635.0    260.0    402.7    486.5    570.4 payment
		  |   145.0s        0           60.0           63.5     83.9    268.4    302.0    352.3 stockLevel
		Wraps: (4) secondary error attachment
		  | signal: killed
		  | (1) signal: killed
		  | Error types: (1) *exec.ExitError
		Wraps: (5) context canceled
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *main.withCommandDetails (4) *secondary.withSecondaryError (5) *errors.errorString

	cluster.go:2651,tpcc.go:168,schemachange.go:302,test_runner.go:755: monitor failure: monitor task failed: t.Fatal() was called
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitor).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2639
		  | main.(*monitor).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2647
		  | main.runTPCC
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:168
		  | main.makeIndexAddTpccTest.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/schemachange.go:302
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:755
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitor).wait.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2695
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  -- stack trace:
		  | main.init
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2609
		  | runtime.doInit
		  | 	/usr/local/go/src/runtime/proc.go:5228
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:190
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1357
		Wraps: (6) t.Fatal() was called
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError


Artifacts: /schemachange/index/tpcc/w=1000

@cockroach-teamcity
Member Author

(roachtest).schemachange/index/tpcc/w=1000 failed on release-20.2@f180b8d178f7c81dbb73e7f4b139bd7b7bf829dc:

		  |   149.0s        0          685.8          634.7    268.4    436.2    536.9    637.5 payment
		  |   149.0s        0           69.0           63.5    117.4    285.2    318.8    352.3 stockLevel
		  |   150.0s        0           65.0           63.4    570.4    771.8    838.9    906.0 delivery
		  |   150.0s        0          719.0          634.5    385.9    570.4    637.5    738.2 newOrder
		  |   150.0s        0           70.0           63.5     33.6    113.2    134.2    151.0 orderStatus
		  |   150.0s        0          653.0          634.9    251.7    385.9    469.8    637.5 payment
		  |   150.0s        0           72.0           63.6    113.2    302.0    419.4    469.8 stockLevel
		  |   151.0s        0           51.0           63.4    503.3    838.9    939.5   1073.7 delivery
		  |   151.0s        0          701.9          635.0    385.9    637.5    738.2    805.3 newOrder
		  |   151.0s        0           75.0           63.5     56.6    142.6    184.5    218.1 orderStatus
		  |   151.0s        0          702.9          635.3    260.0    436.2    486.5    671.1 payment
		  |   151.0s        0           74.0           63.6    113.2    369.1    453.0    469.8 stockLevel
		Wraps: (4) secondary error attachment
		  | signal: killed
		  | (1) signal: killed
		  | Error types: (1) *exec.ExitError
		Wraps: (5) context canceled
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *main.withCommandDetails (4) *secondary.withSecondaryError (5) *errors.errorString

	cluster.go:2651,tpcc.go:168,schemachange.go:302,test_runner.go:755: monitor failure: monitor task failed: t.Fatal() was called
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitor).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2639
		  | main.(*monitor).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2647
		  | main.runTPCC
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:168
		  | main.makeIndexAddTpccTest.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/schemachange.go:302
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:755
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitor).wait.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2695
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  -- stack trace:
		  | main.init
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2609
		  | runtime.doInit
		  | 	/usr/local/go/src/runtime/proc.go:5228
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:190
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1357
		Wraps: (6) t.Fatal() was called
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError


Artifacts: /schemachange/index/tpcc/w=1000

@thoszhang
Contributor

The first failure is an infra flake and the most recent two are from the backfiller memory accounting bug (fixed in #55092). That leaves the OOM.

I don't know what's going on with that one. The OOM happened 3 minutes into an index backfill. A heap profile taken less than a minute before the process was killed reports less than 700 MB of in-use memory, and nothing in it stands out; the runtime stats indicate that most of the allocation is happening in cgo anyway. Everything just looks slow, and some of the SST ingestion delays are very large (this is about a minute before the OOM):

I200921 11:48:17.946835 9497079 kv/kvserver/store_send.go:72 ⋮ [n1,s1] SST ingestion was delayed by 18.907066419s (14.242144ms for storage engine back-pressure)
I200921 11:48:18.617801 454 server/status/runtime.go:522 ⋮ [n1] runtime stats: 6.5 GiB RSS, 2746 goroutines, 1.1 GiB/585 MiB/1.6 GiB GO alloc/idle/total, 3.9 GiB/4.8 GiB CGO alloc/total, 21946.9 CGO/sec, 1264.1/90.4 %(u/s)time, 0.2 %gc (15x), 129 MiB/139 MiB (r/w)net
I200921 11:48:19.096545 88 kv/kvserver/queue.go:582 ⋮ [n1,s1,r3401/1:‹/Table/56/2/6{35/303…-40/341…}›] rate limited in MaybeAdd (merge): throttled on async limiting semaphore
I200921 11:48:20.070481 10994609 kv/kvserver/store_send.go:72 ⋮ [n1,s1] SST ingestion was delayed by 1.186435878s (1.000601105s for storage engine back-pressure)
W200921 11:48:22.222736 330 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r3470/2:‹/Table/60/4/{NULL/7…-5/225/…}›] handle raft ready: 1.5s [applied=1, batches=1, state_assertions=0]
W200921 11:48:22.279866 377 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r587/1:‹/Table/57/1/35{8/679…-9/719…}›] handle raft ready: 0.6s [applied=2, batches=2, state_assertions=0]
I200921 11:48:24.241798 9497079 kv/kvserver/store_send.go:72 ⋮ [n1,s1] SST ingestion was delayed by 4.042709349s (2.004452235s for storage engine back-pressure)
I200921 11:48:25.420251 374 kv/kvserver/queue.go:582 ⋮ [n1,s1,r2364/1:‹/Table/57/1/28{3/1920-4/5877}›] rate limited in MaybeAdd (merge): throttled on async limiting semaphore
W200921 11:48:26.867554 297 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r3470/2:‹/Table/60/4/{NULL/7…-5/225/…}›] handle raft ready: 2.0s [applied=1, batches=1, state_assertions=0]
I200921 11:48:28.630949 454 server/status/runtime.go:522 ⋮ [n1] runtime stats: 6.6 GiB RSS, 3139 goroutines, 1.2 GiB/522 MiB/1.6 GiB GO alloc/idle/total, 4.0 GiB/4.9 GiB CGO alloc/total, 17962.1 CGO/sec, 1364.1/71.7 %(u/s)time, 0.1 %gc (17x), 122 MiB/118 MiB (r/w)net
I200921 11:48:29.882386 11059166 kv/kvserver/store_send.go:72 ⋮ [n1,s1] SST ingestion was delayed by 6.629895953s (3.005107616s for storage engine back-pressure)
W200921 11:48:34.269199 328 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r3470/2:‹/Table/60/4/{NULL/7…-5/225/…}›] handle raft ready: 3.5s [applied=1, batches=1, state_assertions=0]
W200921 11:48:35.095743 341 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r3470/2:‹/Table/60/4/{NULL/7…-5/225/…}›] handle raft ready: 0.6s [applied=0, batches=0, state_assertions=0]

Some of this might just be typical for this test, though. @ajwerner I can't remember whether you were looking at this one. Did you find out anything interesting about it?

I think a rare OOM on this test warrants more investigation but is not a release blocker, so I'm removing the tag.

@thoszhang thoszhang added branch-release-20.2 and removed branch-release-20.2 release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Oct 6, 2020
@thoszhang
Contributor

Also, we've seen a few similar-looking failures on master over the last few months. The first one seems to be #44071 (comment).

@thoszhang
Contributor

I'm also confused because this test is running on machines with 16 CPUs and 14.4 GB of memory, and dmesg says

[ 2689.787463] Out of memory: Kill process 4991 (cockroach) score 958 or sacrifice child
[ 2689.795641] Killed process 4991 (cockroach) total-vm:28748424kB, anon-rss:14107552kB, file-rss:1200kB, shmem-rss:0kB
[ 2690.664641] oom_reaper: reaped process 4991 (cockroach), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

But our runtime stats report the RSS for the process hovering around 6.5 GiB before it gets killed, the last entry being

I200921 11:49:18.653212 454 server/status/runtime.go:522 ⋮ [n1] runtime stats: 6.7 GiB RSS, 3737 goroutines, 1.3 GiB/349 MiB/1.8 GiB GO alloc/idle/total, 4.0 GiB/4.9 GiB CGO alloc/total, 13850.8 CGO/sec, 1389.1/68.6 %(u/s)time, 0.1 %gc (14x), 100 MiB/92 MiB (r/w)net

Is anon-rss not some subset of the RSS reported in our runtime stats? I don't know what I'm looking at here.
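To put numbers on the discrepancy: the kernel's OOM messages report sizes in 1024-byte units, so `anon-rss:14107552kB` is about 13.45 GiB, roughly twice the 6.7 GiB RSS in the last runtime stats entry. A minimal Go sketch that extracts and converts both figures follows; the regexps and function names are hypothetical, and the runtime stats regexp only handles RSS values printed in GiB.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// dmesgKillRE pulls anon-rss out of an OOM-killer line such as
// "Killed process 4991 (cockroach) total-vm:28748424kB, anon-rss:14107552kB, ...".
var dmesgKillRE = regexp.MustCompile(`anon-rss:(\d+)kB`)

// runtimeRSSRE pulls the RSS figure out of a "runtime stats" line such as
// "runtime stats: 6.7 GiB RSS, ...". Assumes the value is printed in GiB.
var runtimeRSSRE = regexp.MustCompile(`runtime stats: ([\d.]+) GiB RSS`)

// anonRSSGiB converts the kernel's anon-rss (in KiB) to GiB.
func anonRSSGiB(line string) (float64, bool) {
	m := dmesgKillRE.FindStringSubmatch(line)
	if m == nil {
		return 0, false
	}
	kib, err := strconv.ParseUint(m[1], 10, 64)
	if err != nil {
		return 0, false
	}
	return float64(kib) / (1024 * 1024), true
}

// runtimeRSSGiB returns the RSS value from a runtime stats line, in GiB.
func runtimeRSSGiB(line string) (float64, bool) {
	m := runtimeRSSRE.FindStringSubmatch(line)
	if m == nil {
		return 0, false
	}
	v, err := strconv.ParseFloat(m[1], 64)
	if err != nil {
		return 0, false
	}
	return v, true
}

func main() {
	kill := "Killed process 4991 (cockroach) total-vm:28748424kB, anon-rss:14107552kB, file-rss:1200kB, shmem-rss:0kB"
	stats := "[n1] runtime stats: 6.7 GiB RSS, 3737 goroutines, 1.3 GiB/349 MiB/1.8 GiB GO alloc/idle/total"
	a, _ := anonRSSGiB(kill)
	r, _ := runtimeRSSGiB(stats)
	fmt.Printf("anon-rss at kill: %.2f GiB; last runtime-stats RSS: %.1f GiB; gap: %.2f GiB\n", a, r, a-r)
}
```

The conversion only sizes the gap; it does not by itself say whether memory grew after the last stats entry or whether the two numbers measure different things.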

@thoszhang thoszhang added release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. and removed release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Oct 6, 2020
@cockroach-teamcity
Member Author

(roachtest).schemachange/index/tpcc/w=1000 failed on release-20.2@8c79e2bc4b35d36c8527f4c40c974f03d9034f46:

		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		Wraps: (2) output in run_100806.480_n5_workload_run_tpcc
		Wraps: (3) /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-2657161-1612856692-114-n5cpu16:5 -- ./workload run tpcc --warehouses=1000 --histograms=perf/stats.json --wait=false --tolerate-errors --workers=1000 --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned
		  | stderr:
		  | ./workload: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.29' not found (required by ./workload)
		  | Error: COMMAND_PROBLEM: exit status 1
		  | (1) COMMAND_PROBLEM
		  | Wraps: (2) Node 5. Command with error:
		  |   | ```
		  |   | ./workload run tpcc --warehouses=1000 --histograms=perf/stats.json --wait=false --tolerate-errors --workers=1000 --ramp=5m0s --duration=2h0m0s {pgurl:1-4}
		  |   | ```
		  | Wraps: (3) exit status 1
		  | Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError
		  |
		  | stdout:
		Wraps: (4) exit status 20
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *main.withCommandDetails (4) *exec.ExitError

	cluster.go:2654,tpcc.go:174,schemachange.go:302,test_runner.go:755: monitor failure: monitor task failed: t.Fatal() was called
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitor).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2642
		  | main.(*monitor).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2650
		  | main.runTPCC
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:174
		  | main.makeIndexAddTpccTest.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/schemachange.go:302
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:755
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitor).wait.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2698
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  -- stack trace:
		  | main.init
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2612
		  | runtime.doInit
		  | 	/usr/local/go/src/runtime/proc.go:5652
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:191
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		Wraps: (6) t.Fatal() was called
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError


Artifacts: /schemachange/index/tpcc/w=1000

@cockroach-teamcity
Member Author

(roachtest).schemachange/index/tpcc/w=1000 failed on release-20.2@b0012907c1bc9627ae2de83e6099c4930a32699e:

		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		Wraps: (2) output in run_094126.924_n5_workload_run_tpcc
		Wraps: (3) /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-2661584-1612941367-112-n5cpu16:5 -- ./workload run tpcc --warehouses=1000 --histograms=perf/stats.json --wait=false --tolerate-errors --workers=1000 --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned
		  | stderr:
		  | ./workload: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.29' not found (required by ./workload)
		  | Error: COMMAND_PROBLEM: exit status 1
		  | (1) COMMAND_PROBLEM
		  | Wraps: (2) Node 5. Command with error:
		  |   | ```
		  |   | ./workload run tpcc --warehouses=1000 --histograms=perf/stats.json --wait=false --tolerate-errors --workers=1000 --ramp=5m0s --duration=2h0m0s {pgurl:1-4}
		  |   | ```
		  | Wraps: (3) exit status 1
		  | Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError
		  |
		  | stdout:
		Wraps: (4) exit status 20
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *main.withCommandDetails (4) *exec.ExitError

	cluster.go:2654,tpcc.go:174,schemachange.go:302,test_runner.go:755: monitor failure: monitor task failed: t.Fatal() was called
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitor).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2642
		  | main.(*monitor).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2650
		  | main.runTPCC
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:174
		  | main.makeIndexAddTpccTest.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/schemachange.go:302
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:755
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitor).wait.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2698
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  -- stack trace:
		  | main.init
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2612
		  | runtime.doInit
		  | 	/usr/local/go/src/runtime/proc.go:5652
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:191
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		Wraps: (6) t.Fatal() was called
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError


Artifacts: /schemachange/index/tpcc/w=1000

@tbg
Member

tbg commented Apr 26, 2021

Will triage these recent failures and assign based on the outcome.

@tbg tbg assigned tbg and unassigned ajwerner Apr 26, 2021
@cockroach-teamcity
Member Author

(roachtest).schemachange/index/tpcc/w=1000 failed on release-20.2@01dad2f783c4be5060d332b975f96a3ad4be8ecb:

		  |   807.0s        0           72.0           66.2    192.9    419.4    469.8    570.4 stockLevel
		  |   808.0s        0           91.0           66.2   1073.7   2147.5   2818.6   3355.4 delivery
		  |   808.0s        0          742.0          661.8    704.6   1073.7   1409.3   1677.7 newOrder
		  |   808.0s        0           73.0           66.3     96.5    209.7    251.7    285.2 orderStatus
		  |   808.0s        0          726.0          662.2    503.3    838.9    973.1   1342.2 payment
		  |   808.0s        0           67.0           66.2    184.5    352.3    385.9    469.8 stockLevel
		  | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		  |   809.0s        0           97.0           66.2    906.0   1677.7   1946.2   2281.7 delivery
		  |   809.0s        0          682.9          661.8    704.6   1073.7   1275.1   1342.2 newOrder
		  |   809.0s        0           67.0           66.3     88.1    218.1    302.0    302.0 orderStatus
		  |   809.0s        0          734.9          662.3    570.4    872.4   1140.9   1208.0 payment
		  |   809.0s        0           80.0           66.3    159.4    402.7    486.5    570.4 stockLevel
		Wraps: (4) secondary error attachment
		  | signal: interrupt
		  | (1) signal: interrupt
		  | Error types: (1) *exec.ExitError
		Wraps: (5) context canceled
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *main.withCommandDetails (4) *secondary.withSecondaryError (5) *errors.errorString

	cluster.go:2655,tpcc.go:174,schemachange.go:302,test_runner.go:755: monitor failure: monitor task failed: t.Fatal() was called
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitor).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2643
		  | main.(*monitor).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2651
		  | main.runTPCC
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:174
		  | main.makeIndexAddTpccTest.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/schemachange.go:302
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:755
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitor).wait.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2699
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  -- stack trace:
		  | main.init
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2613
		  | runtime.doInit
		  | 	/usr/local/go/src/runtime/proc.go:5652
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:191
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		Wraps: (6) t.Fatal() was called
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError


Artifacts: /schemachange/index/tpcc/w=1000

@tbg tbg removed their assignment May 4, 2021
@tbg
Member

tbg commented May 4, 2021

This wasn't actually the test I wanted to bisect (that was #62320). What we saw over there is that this test runs close to overload, so it sometimes falls over.

I'd suggest toning down the workload here a bunch.

@cockroach-teamcity
Member Author

(roachtest).schemachange/index/tpcc/w=1000 failed on release-20.2@e9f553b570957d34f5bb39f8976f6d2c893bd8b4:

		Wraps: (3) /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-3005181-1621623765-78-n5cpu16:1 -- ./cockroach workload fixtures import tpcc --warehouses=1000  returned
		  | stderr:
		  | I210521 23:48:40.110449 1 ccl/workloadccl/fixture.go:342  starting import of 9 tables
		  | I210521 23:48:41.283148 116 ccl/workloadccl/fixture.go:472  imported 1006 KiB in district table (10000 rows, 0 index entries, took 1.166353525s, 0.84 MiB/s)
		  | I210521 23:48:44.844799 115 ccl/workloadccl/fixture.go:472  imported 53 KiB in warehouse table (1000 rows, 0 index entries, took 4.728024991s, 0.01 MiB/s)
		  | I210521 23:48:48.142275 121 ccl/workloadccl/fixture.go:472  imported 7.8 MiB in item table (100000 rows, 0 index entries, took 8.025440014s, 0.97 MiB/s)
		  | I210521 23:48:55.894642 120 ccl/workloadccl/fixture.go:472  imported 126 MiB in new_order table (9000000 rows, 0 index entries, took 15.777853238s, 7.96 MiB/s)
		  | I210521 23:49:54.474240 119 ccl/workloadccl/fixture.go:472  imported 1.6 GiB in order table (30000000 rows, 30000000 index entries, took 1m14.357524227s, 21.45 MiB/s)
		  | I210521 23:49:58.088755 118 ccl/workloadccl/fixture.go:472  imported 2.2 GiB in history table (30000000 rows, 0 index entries, took 1m17.971884343s, 28.29 MiB/s)
		  | Error: importing fixture: importing table customer: pq: communication error: rpc error: code = Canceled desc = context canceled
		  | Error: COMMAND_PROBLEM: exit status 1
		  | (1) COMMAND_PROBLEM
		  | Wraps: (2) Node 1. Command with error:
		  |   | ```
		  |   | ./cockroach workload fixtures import tpcc --warehouses=1000
		  |   | ```
		  | Wraps: (3) exit status 1
		  | Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError
		  |
		  | stdout:
		Wraps: (4) exit status 20
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *main.withCommandDetails (4) *exec.ExitError

	cluster.go:1658,context.go:135,cluster.go:1647,test_runner.go:836: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-3005181-1621623765-78-n5cpu16 --oneshot --ignore-empty-nodes: exit status 1 5: skipped
		2: 4719
		4: dead
		1: 4983
		3: 4663
		Error: UNCLASSIFIED_PROBLEM: 4: dead
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  -- stack trace:
		  | main.glob..func14
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1143
		  | main.wrap.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:267
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:830
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:914
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:864
		  | main.main
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1839
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:204
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		Wraps: (3) 4: dead
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *errutil.leafError


Artifacts: /schemachange/index/tpcc/w=1000


@ajwerner
Contributor

ajwerner commented Feb 8, 2022

Closing this as stale.

@ajwerner ajwerner closed this as completed Feb 8, 2022
@exalate-issue-sync exalate-issue-sync bot added T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) and removed T-sql-schema-deprecated Use T-sql-foundations instead labels May 10, 2023

5 participants