ccl/multiregionccl: TestRegionAddDropWithConcurrentBackupOps failed #124352

Closed
github-actions bot opened this issue May 17, 2024 · 4 comments
Labels
branch-master Failures on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. P-2 Issues/test failures with a fix SLA of 3 months T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@github-actions

github-actions bot commented May 17, 2024

ccl/multiregionccl.TestRegionAddDropWithConcurrentBackupOps failed on master @ f53b1337f77915d6ab71109a158c8b9f9cf16248:

Fatal error:

panic: test timed out after 14m57s
running tests:
	TestRegionAddDropWithConcurrentBackupOps (4m6s)
	TestRegionAddDropWithConcurrentBackupOps/add-region-succeed-backup-database (4s)

Stack:

goroutine 1037376 [running]:
testing.(*M).startAlarm.func1()
	GOROOT/src/testing/testing.go:2366 +0x385
created by time.goFunc
	GOROOT/src/time/sleep.go:177 +0x2d

Log preceding fatal error

=== RUN   TestRegionAddDropWithConcurrentBackupOps
    test_log_scope.go:170: test logs captured to: outputs.zip/logTestRegionAddDropWithConcurrentBackupOps3418894860
    test_log_scope.go:81: use -show-logs to present logs inline
=== RUN   TestRegionAddDropWithConcurrentBackupOps/add-region-succeed-backup-database

Parameters:

  • attempt=1
  • run=1
  • shard=2

Help

See also: How To Investigate a Go Test Failure (internal)

/cc @cockroachdb/sql-foundations

This test on roachdash | Improve this report!

Jira issue: CRDB-38843

@github-actions github-actions bot added branch-master Failures on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) labels May 17, 2024
@github-actions github-actions bot added this to Triage in SQL Foundations May 17, 2024
@rafiss rafiss removed the release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. label May 21, 2024
@exalate-issue-sync exalate-issue-sync bot added the P-2 Issues/test failures with a fix SLA of 3 months label May 22, 2024
@fqazi
Collaborator

fqazi commented May 22, 2024

We got stuck waiting for full replication here:

github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*baseQueue).lockProcessing(...)
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/queue.go:556
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*baseQueue).DrainQueue(0xc0114eaf20, {0x7f58040, 0xc362780}, 0xc01dbfc480)
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/queue.go:1447 +0x90
github.com/cockroachdb/cockroach/pkg/kv/kvserver.forceScanAndProcess({0x7f58040, 0xc362780}, 0xc01f5ab508, 0xc0114eaf20)
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/queue_helpers_testutil.go:42 +0xd1
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).ForceReplicationScanAndProcess(...)
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/queue_helpers_testutil.go:55
github.com/cockroachdb/cockroach/pkg/testutils/testcluster.(*TestCluster).WaitForFullReplication.func2(0xc01f5ab508)
	github.com/cockroachdb/cockroach/pkg/testutils/testcluster/testcluster.go:1440 +0x8e
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Stores).VisitStores.func1(0xc018badd40?, 0xc0478a7800?)
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/stores.go:150 +0x26
github.com/cockroachdb/cockroach/pkg/util/syncutil.(*IntMap).Range(0x4000000000000000?, 0xc00ab377c0)
	github.com/cockroachdb/cockroach/pkg/util/syncutil/int_map.go:385 +0xd6
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Stores).VisitStores(0x7f57e48?, 0xc362780?)
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/stores.go:149 +0x4e
github.com/cockroachdb/cockroach/pkg/testutils/testcluster.(*TestCluster).WaitForFullReplication(0xc0243dc708)
	github.com/cockroachdb/cockroach/pkg/testutils/testcluster/testcluster.go:1432 +0x40c
github.com/cockroachdb/cockroach/pkg/testutils/testcluster.(*TestCluster).Start(0xc0243dc708, {0x7f97450, 0xc026290340})
	github.com/cockroachdb/cockroach/pkg/testutils/testcluster/testcluster.go:456 +0x493
github.com/cockroachdb/cockroach/pkg/testutils/testcluster.StartTestCluster({_, _}, _, {{{{0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, ...}, ...}, ...}, ...})
	github.com/cockroachdb/cockroach/pkg/testutils/testcluster/testcluster.go:238 +0x65
github.com/cockroachdb/cockroach/pkg/ccl/multiregionccl/multiregionccltestutils.TestingCreateMultiRegionClusterWithRegionList({_, _}, {_, _, _}, _, {{0x0, 0x0}, {0x0, 0x0}, ...}, ...)
	github.com/cockroachdb/cockroach/pkg/ccl/multiregionccl/multiregionccltestutils/testutils.go:132 +0x1b3
github.com/cockroachdb/cockroach/pkg/ccl/multiregionccl/multiregionccltestutils.TestingCreateMultiRegionCluster({_, _}, _, {{0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, ...}, ...}, ...)
	github.com/cockroachdb/cockroach/pkg/ccl/multiregionccl/multiregionccltestutils/testutils.go:89 +0x18f
github.com/cockroachdb/cockroach/pkg/ccl/multiregionccl_test.TestRegionAddDropWithConcurrentBackupOps.func1(0xc026290340)
	github.com/cockroachdb/cockroach/pkg/ccl/multiregionccl_test/pkg/ccl/multiregionccl/region_test.go:942 +0x39d
testing.tRunner(0xc026290340, 0xc01ac9a000)
	GOROOT/src/testing/testing.go:1689 +0xfb
created by testing.(*T).Run in goroutine 733200
	GOROOT/src/testing/testing.go:1742 +0x390

@fqazi fqazi added the T-kv KV Team label May 22, 2024
@blathers-crl blathers-crl bot added this to Incoming in KV May 22, 2024
@fqazi
Collaborator

fqazi commented May 22, 2024

We are seeing this test and another (#124457) stuck inside WaitForFullReplication. Does KV have any insight into why this might be happening?

@kvoli
Collaborator

kvoli commented May 23, 2024

It seems the last wait for full replication was blocked draining the queue, because the replicate queue's purgatory was processing an up-replication at the same time:

goroutine 1028406 [select]:
google.golang.org/grpc/internal/transport.(*Stream).waitOnHeader(0xc00e5b39e0)
	google.golang.org/grpc/internal/transport/external/org_golang_google_grpc/internal/transport/transport.go:328 +0x7c
google.golang.org/grpc/internal/transport.(*Stream).RecvCompress(...)
	google.golang.org/grpc/internal/transport/external/org_golang_google_grpc/internal/transport/transport.go:343
google.golang.org/grpc.(*csAttempt).recvMsg(0xc043a89ba0, {0x63f5b40, 0xc0168b93c0}, 0xc02155c140?)
	google.golang.org/grpc/external/org_golang_google_grpc/stream.go:1046 +0xc9
google.golang.org/grpc.(*clientStream).RecvMsg.func1(0x0?)
	google.golang.org/grpc/external/org_golang_google_grpc/stream.go:900 +0x1f
google.golang.org/grpc.(*clientStream).withRetry(0xc00e5b37a0, 0xc028cb8110, 0xc028cb8100)
	google.golang.org/grpc/external/org_golang_google_grpc/stream.go:751 +0x13a
google.golang.org/grpc.(*clientStream).RecvMsg(0xc00e5b37a0, {0x63f5b40?, 0xc0168b93c0?})
	google.golang.org/grpc/external/org_golang_google_grpc/stream.go:899 +0x113
github.com/cockroachdb/cockroach/pkg/util/tracing/grpcinterceptor.(*tracingClientStream).RecvMsg(0xc030033f20, {0x63f5b40?, 0xc0168b93c0?})
	github.com/cockroachdb/cockroach/pkg/util/tracing/grpcinterceptor/grpc_interceptor.go:392 +0x31
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*multiRaftDelegateRaftSnapshotClient).Recv(0xc02155c130)
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/bazel-out/k8-fastbuild/bin/pkg/kv/kvserver/kvserver_go_proto_/github.com/cockroachdb/cockroach/pkg/kv/kvserver/storage_services.pb.go:195 +0x46
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*RaftTransport).DelegateSnapshot(0x7f57f98?, {0x7f57f60, 0xc01de23a40}, 0xc0360eb480)
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/raft_transport.go:1180 +0x1dc
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).sendSnapshotUsingDelegate.func2({0x7f57f60?, 0xc01de23a40?})
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_command.go:3004 +0x45
github.com/cockroachdb/cockroach/pkg/util/timeutil.RunWithTimeout({0x7f57f98?, 0xc0265f2c60?}, {0x6676979, 0xd}, 0x34630b8a000, 0xc028cb8598)
	github.com/cockroachdb/cockroach/pkg/util/timeutil/timeout.go:29 +0x97
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).sendSnapshotUsingDelegate(0xc005eee008, {0x7f57f98, 0xc0265f2c60}, {0x4, 0x4, 0x2, 0x1}, 0x1, 0x40c3888000000000)
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_command.go:3001 +0x9dc
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).initializeRaftLearners(0xc005eee008, {0x7f57f98, 0xc0265f2c60}, 0xc0010cb3b0, 0x1, 0x40c3888000000000, {0x66b31d7, 0x16}, {0xc020a15590, 0x46}, ...)
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_command.go:1981 +0xaa5
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).changeReplicasImpl(0xc005eee008, {0x7f57f98, 0xc0265f2c60}, 0x7fd7e50?, 0x1, 0x40c3888000000000, {0x66b31d7, 0x16}, {0xc020a15590, 0x46}, ...)
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replica_command.go:1219 +0x458
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*replicateQueue).changeReplicas(...)
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replicate_queue.go:1046
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*replicateQueue).applyChange(0x0?, {0x7f57f98, 0xc0265f2c60}, {0x4, {0x7fb35d0, 0xc005eee008}, {0x7f21a10, 0xc00e4fc640}, {0x1, 0x1, ...}}, ...)
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replicate_queue.go:797 +0x22d
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*replicateQueue).processOneChange(0xc027d3c000, {0x7f57f98, 0xc0265f2c60}, 0xc005eee008, 0xc0010cb3b0, 0xc00e4fc500, 0x0, 0x0)
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replicate_queue.go:885 +0x2bc
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*replicateQueue).processOneChangeWithTracing(0xc027d3c000, {0x7f58008, 0xc02ee0bb90}, 0xc005eee008, 0xc0010cb3b0, 0xc00e4fc500)
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replicate_queue.go:741 +0x159
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*replicateQueue).process(0xc027d3c000, {0x7f58008, 0xc02ee0bb90}, 0xc005eee008, {0x7fa60168b208, 0xc0172cf560})
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/replicate_queue.go:650 +0x65d
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*baseQueue).processReplica.func1({0x7f58008, 0xc02ee0bb90})
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/queue.go:984 +0xcd
github.com/cockroachdb/cockroach/pkg/util/timeutil.RunWithTimeout({0x7f57f98?, 0xc0265f2c30?}, {0xc024e3a840, 0x22}, 0xdf8475800, 0xc028cb9950)
	github.com/cockroachdb/cockroach/pkg/util/timeutil/timeout.go:29 +0x97
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*baseQueue).processReplica(0xc0114eaf20, {0x7f57f98, 0xc0265f2c00}, {0x7fad920, 0xc005eee008})
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/queue.go:977 +0x454
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*baseQueue).processReplicasInPurgatory.func1.2({0x7f57f98, 0xc0265f2c00})
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/queue.go:1328 +0xa5
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunTask(0xc01dbfc480, {0x7f57f98, 0xc0265f2c00}, {0x1?, 0x1?}, 0xc028cb9b90)
	github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:320 +0xcb
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*baseQueue).processReplicasInPurgatory.func1(0xc0114eaf20, {0x7f57f98, 0xc01d31bb00}, 0xc01dbfc480)
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/queue.go:1323 +0x5bd
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*baseQueue).processReplicasInPurgatory(0xc0114eaf20, {0x7f57f98, 0xc01d31bb00}, 0x0?)
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/queue.go:1340 +0x26
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*baseQueue).addToPurgatoryLocked.func2({0x7f57f98, 0xc01d31bb00})
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/queue.go:1264 +0x2ac
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2()
	github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:485 +0x13a
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx in goroutine 1028291
	github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:476 +0x3fe

This doesn't suggest that the test is deadlocked, livelocked, or similar; it is just slow, waiting on snapshots.
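
The two timeutil.RunWithTimeout frames in the trace above carry raw integer arguments (0x34630b8a000 and 0xdf8475800). Assuming the third argument is the timeout as a `time.Duration`, which is an int64 count of nanoseconds, they can be decoded with a short sketch. The argument position is an assumption about the function's signature, and `decodeTimeout` is a hypothetical helper:

```go
package main

import (
	"fmt"
	"time"
)

// decodeTimeout interprets a raw integer taken from a stack frame as a
// time.Duration (nanoseconds). Which argument holds the timeout is an
// assumption about RunWithTimeout's signature.
func decodeTimeout(raw int64) time.Duration {
	return time.Duration(raw)
}

func main() {
	// Value on the frame called from sendSnapshotUsingDelegate.
	fmt.Println(decodeTimeout(0x34630b8a000)) // prints 1h0m0s
	// Value on the frame called from processReplica.
	fmt.Println(decodeTimeout(0xdf8475800)) // prints 1m0s
}
```

If that reading is right, the delegated snapshot is allowed up to an hour, which would fit a slow but still progressing operation rather than a hang.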

@exalate-issue-sync exalate-issue-sync bot removed the T-kv KV Team label Jun 3, 2024
@fqazi
Collaborator

fqazi commented Jun 5, 2024

This hasn't occurred again, and based on Austen's analysis there wasn't any deadlock or livelock, so I'm assuming this was just an overload issue.

@fqazi fqazi closed this as completed Jun 5, 2024
SQL Foundations automation moved this from Triage to Done Jun 5, 2024