
roachtest: loqrecovery/half-online/workload=tpcc/rangeSize=16mb failed #97912

Closed
cockroach-teamcity opened this issue Mar 2, 2023 · 10 comments
Labels
A-kv-replication: Relating to Raft, consensus, and coordination.
A-testing: Testing tools and infrastructure
branch-master: Failures on the master branch.
C-bug: Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
C-test-failure: Broken test (automatically or manually discovered).
O-roachtest
O-robot: Originated from a bot.
T-kv-replication: KV Replication Team

Comments

@cockroach-teamcity
Member

cockroach-teamcity commented Mar 2, 2023

roachtest.loqrecovery/half-online/workload=tpcc/rangeSize=16mb failed with artifacts on master @ 20e2adda3c76c7172dd986c871df0ae9a346918f:

test artifacts and logs in: /artifacts/loqrecovery/half-online/workload=tpcc/rangeSize=16mb/run_1
(cluster.go:1387).FailOnInvalidDescriptors: invalid descriptors check failed: pq: query execution canceled due to statement timeout

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_encrypted=false , ROACHTEST_ssd=0


/cc @cockroachdb/replication


Jira issue: CRDB-24961

@cockroach-teamcity cockroach-teamcity added branch-master Failures on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. T-kv-replication KV Replication Team labels Mar 2, 2023
@cockroach-teamcity cockroach-teamcity added this to the 23.1 milestone Mar 2, 2023
@pav-kv
Collaborator

pav-kv commented Mar 6, 2023

teardownTest fails after timing out while reading from crdb_internal.invalid_objects. Could this timeout be caused by the loss of quorum or something similar? (The test is about LoQ.)

@pav-kv pav-kv added A-kv-replication Relating to Raft, consensus, and coordination. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-testing Testing tools and infrastructure labels Mar 6, 2023
@aliher1911
Contributor

@herkolategan in #96949 you added a mandatory consistency check at the end of every roachtest. Quorum recovery tests often leave the cluster broken when it is impossible to recover all replicas.

Any ideas on how to make that check optional when the test knows recovery didn't succeed? For this test, an unsuccessful recovery is still a pass.

@herkolategan
Collaborator

@aliher1911 Thanks for pointing this out! I'll create an issue to add a way to opt out of the mandatory consistency check.

herkolategan added a commit to herkolategan/cockroach that referenced this issue Mar 17, 2023
A few post validations are currently performed after a test completes. In most
cases they apply to all tests, but some tests are incompatible with particular
post validations. This change adds the ability for a test to specify that it
wants to opt out of certain validations.

Refs: cockroachdb#97912
Epic: None
herkolategan added a commit to herkolategan/cockroach that referenced this issue Mar 21, 2023
herkolategan added a commit to herkolategan/cockroach that referenced this issue Mar 23, 2023
craig bot pushed a commit that referenced this issue Mar 23, 2023
98848: roachtest: allow opting out of post validation(s) r=aliher1911 a=herkolategan

A few post validations are currently performed after a test completes. In most cases they apply to all tests, but some tests are incompatible with particular post validations. This change adds the ability for a test to specify that it wants to opt out of certain validations.

Refs: #97912
Epic: None

Co-authored-by: Herko Lategan <herko@cockroachlabs.com>
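The opt-out described in #98848 can be pictured as a per-test set of skipped checks. Below is a minimal Go sketch under that assumption; the type `PostValidation`, its constants, the `SkipPostValidations` field, and `validationsToRun` are hypothetical names chosen for illustration, not taken from the roachtest source. It models the two checks mentioned in this thread (the consistency check and the invalid descriptors check) as bit flags, so a test such as loqrecovery can skip only the one it is incompatible with.

```go
package main

import "fmt"

// PostValidation is a hypothetical bitmask of checks run after a test completes.
type PostValidation int

const (
	// PostValidationConsistency stands in for the replica consistency check (hypothetical name).
	PostValidationConsistency PostValidation = 1 << iota
	// PostValidationInvalidDescriptors stands in for the crdb_internal.invalid_objects check (hypothetical name).
	PostValidationInvalidDescriptors
)

// PostValidationAll skips every post validation when set on a test spec.
const PostValidationAll = PostValidationConsistency | PostValidationInvalidDescriptors

// TestSpec is a hypothetical, trimmed-down test specification.
type TestSpec struct {
	Name                string
	SkipPostValidations PostValidation
}

// validationsToRun returns the names of the checks the spec did not opt out of.
func validationsToRun(spec TestSpec) []string {
	checks := []struct {
		flag PostValidation
		name string
	}{
		{PostValidationConsistency, "consistency"},
		{PostValidationInvalidDescriptors, "invalid descriptors"},
	}
	var run []string
	for _, c := range checks {
		// A cleared bit means the test did not ask to skip this check.
		if spec.SkipPostValidations&c.flag == 0 {
			run = append(run, c.name)
		}
	}
	return run
}

func main() {
	loq := TestSpec{
		Name:                "loqrecovery/half-online/workload=tpcc/rangeSize=16mb",
		SkipPostValidations: PostValidationInvalidDescriptors,
	}
	fmt.Println(loq.Name, "runs:", validationsToRun(loq))
	fmt.Println("skip-all spec runs:", validationsToRun(TestSpec{SkipPostValidations: PostValidationAll}))
}
```

In this sketch the skip is declared statically on the spec, so a test whose recovery only sometimes leaves the cluster broken would opt out unconditionally rather than deciding at the end of the run.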
herkolategan added a commit to herkolategan/cockroach that referenced this issue Mar 23, 2023
@cockroach-teamcity
Member Author

roachtest.loqrecovery/half-online/workload=tpcc/rangeSize=16mb failed with artifacts on master @ 0fcc33bc2870961b9387999d8a9fed97fccbb2ae:

test artifacts and logs in: /artifacts/loqrecovery/half-online/workload=tpcc/rangeSize=16mb/run_1
(assertions.go:262).Fail: 
	Error Trace:	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/loss_of_quorum_recovery.go:418
	            				main/pkg/cmd/roachtest/monitor.go:105
	            				golang.org/x/sync/errgroup/external/org_golang_x_sync/errgroup/errgroup.go:75
	            				GOROOT/src/runtime/asm_amd64.s:1594
	Error:      	Received unexpected error:
	            	pq: could not validate zone config: RangeMaxBytes 16777216 less than minimum allowed 67108864
	Test:       	loqrecovery/half-online/workload=tpcc/rangeSize=16mb
	Messages:   	failed to set range limits configuration
(require.go:1264).NoError: FailNow called

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_encrypted=false , ROACHTEST_fs=ext4 , ROACHTEST_localSSD=true , ROACHTEST_ssd=0


@cockroach-teamcity
Member Author

roachtest.loqrecovery/half-online/workload=tpcc/rangeSize=16mb failed with artifacts on master @ d107217dac5d817cc115bc0e97b7e53c0f2878bf:

test artifacts and logs in: /artifacts/loqrecovery/half-online/workload=tpcc/rangeSize=16mb/run_1
(assertions.go:262).Fail: 
	Error Trace:	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/loss_of_quorum_recovery.go:418
	            				main/pkg/cmd/roachtest/monitor.go:105
	            				golang.org/x/sync/errgroup/external/org_golang_x_sync/errgroup/errgroup.go:75
	            				GOROOT/src/runtime/asm_amd64.s:1594
	Error:      	Received unexpected error:
	            	pq: could not validate zone config: RangeMaxBytes 16777216 less than minimum allowed 67108864
	Test:       	loqrecovery/half-online/workload=tpcc/rangeSize=16mb
	Messages:   	failed to set range limits configuration
(require.go:1264).NoError: FailNow called

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_encrypted=false , ROACHTEST_ssd=0


@aliher1911
Contributor

Looks like this is a completely new issue, related to changing the range size.

@erikgrinaker
Contributor

Yes, see #96725.
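The error in these reports is plain byte arithmetic: the test requests a `RangeMaxBytes` of 16777216 bytes (16 MiB, matching `rangeSize=16mb`), while the server rejects anything below 67108864 bytes (64 MiB). A minimal Go sketch of such a lower-bound check follows; `validateRangeMaxBytes` and `minRangeMaxBytes` are hypothetical names, and the 64 MiB floor is taken from the error message rather than from the CockroachDB source.

```go
package main

import "fmt"

const (
	mib = 1 << 20
	// minRangeMaxBytes is the floor reported in the error message: 67108864 bytes = 64 MiB.
	minRangeMaxBytes int64 = 64 * mib
)

// validateRangeMaxBytes is a hypothetical stand-in for the server-side zone config check.
func validateRangeMaxBytes(rangeMaxBytes int64) error {
	if rangeMaxBytes < minRangeMaxBytes {
		return fmt.Errorf("RangeMaxBytes %d less than minimum allowed %d", rangeMaxBytes, minRangeMaxBytes)
	}
	return nil
}

func main() {
	// 16 MiB is what the rangeSize=16mb variant requests; 64 MiB sits exactly at the floor.
	for _, size := range []int64{16 * mib, 64 * mib} {
		fmt.Printf("%d MiB: %v\n", size/mib, validateRangeMaxBytes(size))
	}
}
```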

@cockroach-teamcity
Member Author

roachtest.loqrecovery/half-online/workload=tpcc/rangeSize=16mb failed with artifacts on master @ 2bd2c806ab3044569b09e0a205b5bc0452ad4e2b:

test artifacts and logs in: /artifacts/loqrecovery/half-online/workload=tpcc/rangeSize=16mb/run_1
(assertions.go:262).Fail: 
	Error Trace:	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/loss_of_quorum_recovery.go:418
	            				main/pkg/cmd/roachtest/monitor.go:105
	            				golang.org/x/sync/errgroup/external/org_golang_x_sync/errgroup/errgroup.go:75
	            				GOROOT/src/runtime/asm_amd64.s:1594
	Error:      	Received unexpected error:
	            	pq: could not validate zone config: RangeMaxBytes 16777216 less than minimum allowed 67108864
	Test:       	loqrecovery/half-online/workload=tpcc/rangeSize=16mb
	Messages:   	failed to set range limits configuration
(require.go:1264).NoError: FailNow called

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_encrypted=false , ROACHTEST_ssd=0


@cockroach-teamcity
Member Author

roachtest.loqrecovery/half-online/workload=tpcc/rangeSize=16mb failed with artifacts on master @ 143b63a6a27aeb286637dd2e5abddafdf0c51874:

test artifacts and logs in: /artifacts/loqrecovery/half-online/workload=tpcc/rangeSize=16mb/run_1
(assertions.go:262).Fail: 
	Error Trace:	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/loss_of_quorum_recovery.go:418
	            				main/pkg/cmd/roachtest/monitor.go:105
	            				golang.org/x/sync/errgroup/external/org_golang_x_sync/errgroup/errgroup.go:75
	            				GOROOT/src/runtime/asm_amd64.s:1594
	Error:      	Received unexpected error:
	            	pq: could not validate zone config: RangeMaxBytes 16777216 less than minimum allowed 67108864
	Test:       	loqrecovery/half-online/workload=tpcc/rangeSize=16mb
	Messages:   	failed to set range limits configuration
(require.go:1264).NoError: FailNow called

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_encrypted=false , ROACHTEST_fs=ext4 , ROACHTEST_localSSD=true , ROACHTEST_ssd=0


@aliher1911
Contributor

The subsequent failure is now fixed by #99636. All good.


5 participants