upgrade/upgrades: TestMigrationWithFailures failed #97457

Closed
cockroach-teamcity opened this issue Feb 22, 2023 · 2 comments · Fixed by #97539
Labels: branch-master, C-test-failure, O-robot, T-sql-foundations

cockroach-teamcity commented Feb 22, 2023

upgrade/upgrades.TestMigrationWithFailures failed with artifacts on master @ 286b3e235171a39b8f9910555affcc7ce310741a:

github.com/cockroachdb/cockroach/pkg/sql/row/kv_batch_fetcher.go:375 row.(*txnKVFetcher).SetupNextFetch ???
github.com/cockroachdb/cockroach/pkg/sql/colfetcher/cfetcher.go:500 colfetcher.cFetcherFirstBatchLimit ???
github.com/cockroachdb/cockroach/pkg/sql/colfetcher/cfetcher.go:549 colfetcher.(*cFetcher).StartScan ???
github.com/cockroachdb/cockroach/pkg/sql/colfetcher/colbatch_scan.go:229 colfetcher.(*ColBatchScan).Init ???
github.com/cockroachdb/cockroach/pkg/sql/colexec/invariants_checker.go:67 colexec.(*invariantsChecker).Init ???
github.com/cockroachdb/cockroach/pkg/sql/colexec/colexecutils/cancel_checker.go:53 colexecutils.(*CancelChecker).Init ???
github.com/cockroachdb/cockroach/pkg/sql/colexec/invariants_checker.go:67 colexec.(*invariantsChecker).Init ???
github.com/cockroachdb/cockroach/pkg/sql/colexecop/operator.go:358 colexecop.(*OneInputInitCloserHelper).Init ???
github.com/cockroachdb/cockroach/pkg/sql/colexecop/operator.go:279 colexecop.(*OneInputHelper).Init ???
github.com/cockroachdb/cockroach/pkg/sql/colexecop/operator.go:279 colexecop.(*OneInputHelper).Init ???
github.com/cockroachdb/cockroach/pkg/sql/colexecop/operator.go:358 colexecop.(*OneInputInitCloserHelper).Init ???
github.com/cockroachdb/cockroach/pkg/sql/colexecop/operator.go:358 colexecop.(*OneInputInitCloserHelper).Init ???
github.com/cockroachdb/cockroach/pkg/sql/colexecop/operator.go:279 colexecop.(*OneInputHelper).Init ???
github.com/cockroachdb/cockroach/pkg/sql/colexecop/operator.go:358 colexecop.(*OneInputInitCloserHelper).Init ???
github.com/cockroachdb/cockroach/pkg/sql/colexecop/operator.go:358 colexecop.(*OneInputInitCloserHelper).Init ???
github.com/cockroachdb/cockroach/pkg/sql/colexecop/operator.go:358 colexecop.(*OneInputInitCloserHelper).Init ???
github.com/cockroachdb/cockroach/pkg/sql/colexecop/operator.go:279 colexecop.(*OneInputHelper).Init ???
github.com/cockroachdb/cockroach/pkg/sql/colexecop/operator.go:358 colexecop.(*OneInputInitCloserHelper).Init ???
github.com/cockroachdb/cockroach/pkg/sql/colexecop/operator.go:279 colexecop.(*OneInputHelper).Init ???
github.com/cockroachdb/cockroach/pkg/sql/colexecop/operator.go:358 colexecop.(*OneInputInitCloserHelper).Init ???
github.com/cockroachdb/cockroach/pkg/sql/colexec/invariants_checker.go:67 colexec.(*invariantsChecker).Init ???
github.com/cockroachdb/cockroach/pkg/sql/colexec/materializer.go:260 colexec.(*Materializer).Start.func1 ???
github.com/cockroachdb/cockroach/pkg/sql/colexecerror/error.go:92 colexecerror.CatchVectorizedRuntimeError ???
github.com/cockroachdb/cockroach/pkg/sql/colexec/materializer.go:260 colexec.(*Materializer).Start ???
github.com/cockroachdb/cockroach/pkg/sql/row_source_to_plan_node.go:72 sql.(*rowSourceToPlanNode).startExec ???
github.com/cockroachdb/cockroach/pkg/sql/plan.go:523 sql.startExec.func2 ???
github.com/cockroachdb/cockroach/pkg/sql/walk.go:111 sql.(*planVisitor).visitInternal.func1 ???
github.com/cockroachdb/cockroach/pkg/sql/walk.go:298 sql.(*planVisitor).visitInternal ???
github.com/cockroachdb/cockroach/pkg/sql/walk.go:79 sql.(*planVisitor).visit ???
github.com/cockroachdb/cockroach/pkg/sql/walk.go:199 sql.(*planVisitor).visitInternal ???
github.com/cockroachdb/cockroach/pkg/sql/walk.go:92 sql.(*planVisitor).visitConcrete ???
github.com/cockroachdb/cockroach/pkg/sql/walk.go:298 sql.(*planVisitor).visitInternal ???
github.com/cockroachdb/cockroach/pkg/sql/walk.go:79 sql.(*planVisitor).visit ???
github.com/cockroachdb/cockroach/pkg/sql/walk.go:43 sql.walkPlan ???
github.com/cockroachdb/cockroach/pkg/sql/plan.go:526 sql.startExec ???
github.com/cockroachdb/cockroach/pkg/sql/plan_node_to_row_source.go:172 sql.(*planNodeToRowSource).Start ???
github.com/cockroachdb/cockroach/pkg/sql/colflow/flow_coordinator.go:122 colflow.(*FlowCoordinator).Start.func1 ???
github.com/cockroachdb/cockroach/pkg/sql/colexecerror/error.go:92 colexecerror.CatchVectorizedRuntimeError ???
github.com/cockroachdb/cockroach/pkg/sql/colflow/flow_coordinator.go:122 colflow.(*FlowCoordinator).Start ???
github.com/cockroachdb/cockroach/pkg/sql/execinfra/processorsbase.go:732 execinfra.(*ProcessorBaseNoHelper).Run ???
github.com/cockroachdb/cockroach/pkg/sql/flowinfra/flow.go:490 flowinfra.(*FlowBase).Run ???
github.com/cockroachdb/cockroach/pkg/sql/colflow/vectorized_flow.go:296 colflow.(*vectorizedFlow).Run ???
github.com/cockroachdb/cockroach/pkg/sql/distsql_running.go:863 sql.(*DistSQLPlanner).Run ???
github.com/cockroachdb/cockroach/pkg/sql/distsql_running.go:1805 sql.(*DistSQLPlanner).PlanAndRun ???
github.com/cockroachdb/cockroach/pkg/sql/distsql_running.go:1555 sql.(*DistSQLPlanner).PlanAndRunAll ???
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:1654 sql.(*connExecutor).execWithDistSQLEngine ???
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:1316 sql.(*connExecutor).dispatchToExecutionEngine ???



Parameters: TAGS=bazel,gss,deadlock

/cc @cockroachdb/sql-schema

Jira issue: CRDB-24711

cockroach-teamcity added the branch-master, C-test-failure, and O-robot labels Feb 22, 2023
cockroach-teamcity added this to the 23.1 milestone Feb 22, 2023
blathers-crl bot added the T-sql-schema-deprecated label Feb 22, 2023
kvoli self-assigned this Feb 22, 2023
kvoli commented Feb 22, 2023

Caused by #97424. I'll look into this.

kvoli added a commit to kvoli/cockroach that referenced this issue Feb 23, 2023
Previously, changing the rebalance objective could lead to inconsistent
locking order between the load-based splitter and the rebalance objective.
When the objective was updated, the previous method also blocked
batch requests from completing until every replica's load-based splitter
was reset.

This commit makes the split objective a variable owned by the
decider, rather than one inferred on each decider operation. On a
rebalance objective change, the split objective is updated atomically
for each replica but not atomically across a store. This removes the
need to block batch requests until every replica is updated.

Resolves: cockroachdb#97000
Resolves: cockroachdb#97445
Resolves: cockroachdb#97450
Resolves: cockroachdb#97452
Resolves: cockroachdb#97457

Release note: None
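
The approach described in the commit message above can be illustrated with a minimal Go sketch. This is not the code from #97539: the `Objective`, `Decider`, and `Store` types and all of their methods are hypothetical names chosen for illustration. The sketch assumes each decider guards its own objective with its own mutex, so an objective change takes one decider lock at a time and never holds a store-wide lock while requests are being recorded.

```go
package main

import (
	"fmt"
	"sync"
)

// Objective is a hypothetical stand-in for the split objective.
type Objective int

const (
	ObjectiveQPS Objective = iota
	ObjectiveCPU
)

// Decider is a hypothetical per-replica load-based split decider.
// It owns its objective instead of inferring it on every operation.
type Decider struct {
	mu        sync.Mutex
	objective Objective
	samples   int
}

// Record notes one request and returns the objective it was counted under.
// Only the decider's own lock is taken.
func (d *Decider) Record() Objective {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.samples++
	return d.objective
}

// SetObjective switches the objective and resets the collected samples.
// The switch is atomic for this one replica only.
func (d *Decider) SetObjective(obj Objective) {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.objective == obj {
		return
	}
	d.objective = obj
	d.samples = 0
}

// Samples returns the number of requests recorded since the last reset.
func (d *Decider) Samples() int {
	d.mu.Lock()
	defer d.mu.Unlock()
	return d.samples
}

// Store is a hypothetical container of per-replica deciders.
type Store struct {
	deciders []*Decider
}

// OnObjectiveChange updates each decider in turn. No lock is held across
// deciders, so the update is not atomic over the store and concurrent
// Record calls on other replicas are never blocked by it.
func (s *Store) OnObjectiveChange(obj Objective) {
	for _, d := range s.deciders {
		d.SetObjective(obj)
	}
}

func main() {
	s := &Store{deciders: []*Decider{{}, {}, {}}}

	var wg sync.WaitGroup
	for _, d := range s.deciders {
		wg.Add(1)
		go func(d *Decider) {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				d.Record() // stands in for a batch request
			}
		}(d)
	}

	s.OnObjectiveChange(ObjectiveCPU) // runs concurrently with Record calls
	wg.Wait()

	for i, d := range s.deciders {
		fmt.Printf("replica %d objective=%d\n", i, d.Record())
	}
}
```

Because every code path in the sketch acquires at most one mutex at a time, there is no lock ordering to get wrong. The trade-off, as the commit message notes, is that replicas on one store may briefly disagree about the objective while the update is in progress.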
@cockroach-teamcity

upgrade/upgrades.TestMigrationWithFailures failed with artifacts on master @ fb6a8838344c7c0486ef92319a86312697196200:

github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/auth.go:91 rpc.kvAuth.unaryInterceptor ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1164 grpc.chainUnaryInterceptors.func1.1 ???
github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:232 rpc.NewServerEx.func1.1 ???
github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:321 stop.(*Stopper).RunTaskWithErr ???
github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:234 rpc.NewServerEx.func1 ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1164 grpc.chainUnaryInterceptors.func1.1 ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1166 grpc.chainUnaryInterceptors.func1 ???
github.com/cockroachdb/cockroach/pkg/server/serverpb/bazel-out/k8-fastbuild/bin/pkg/server/serverpb/serverpb_go_proto_/github.com/cockroachdb/cockroach/pkg/server/serverpb/migration.pb.go:601 serverpb._Migration_BumpClusterVersion_Handler ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1368 grpc.(*Server).processUnaryRPC ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1713 grpc.(*Server).handleStream ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:965 grpc.(*Server).serveStreams.func1.2 ???

goroutine 7475664 lock 0xc0109ca3d8
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/rebalance_objective.go:219 kvserver.(*RebalanceObjectiveManager).maybeUpdateRebalanceObjective ??? <<<<<
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/rebalance_objective.go:218 kvserver.(*RebalanceObjectiveManager).maybeUpdateRebalanceObjective ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/rebalance_objective.go:189 kvserver.newRebalanceObjectiveManager.func2 ???
github.com/cockroachdb/cockroach/pkg/clusterversion/pkg/clusterversion/clusterversion.go:227 clusterversion.(*handleImpl).SetOnChange.func1 ???
github.com/cockroachdb/cockroach/pkg/settings/pkg/settings/values.go:145 settings.(*Values).settingChanged ???
github.com/cockroachdb/cockroach/pkg/settings/pkg/settings/values.go:178 settings.(*Values).setGeneric ???
github.com/cockroachdb/cockroach/pkg/clusterversion/pkg/clusterversion/clusterversion.go:220 clusterversion.(*handleImpl).SetActiveVersion ???
github.com/cockroachdb/cockroach/pkg/clusterversion/pkg/clusterversion/clusterversion.go:219 clusterversion.(*handleImpl).SetActiveVersion ???
github.com/cockroachdb/cockroach/pkg/server/migration.go:146 server.bumpClusterVersion ???
github.com/cockroachdb/cockroach/pkg/server/migration.go:101 server.(*migrationServer).BumpClusterVersion.func1 ???
github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:321 stop.(*Stopper).RunTaskWithErr ???
github.com/cockroachdb/cockroach/pkg/server/migration.go:102 server.(*migrationServer).BumpClusterVersion ???
github.com/cockroachdb/cockroach/pkg/server/serverpb/bazel-out/k8-fastbuild/bin/pkg/server/serverpb/serverpb_go_proto_/github.com/cockroachdb/cockroach/pkg/server/serverpb/migration.pb.go:599 serverpb._Migration_BumpClusterVersion_Handler.func1 ???
github.com/cockroachdb/cockroach/pkg/util/tracing/grpcinterceptor/grpc_interceptor.go:115 grpcinterceptor.ServerInterceptor.func1 ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1161 grpc.chainUnaryInterceptors.func1.1 ???
github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:265 rpc.NewServerEx.func3 ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1164 grpc.chainUnaryInterceptors.func1.1 ???
github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/auth.go:91 rpc.kvAuth.unaryInterceptor ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1164 grpc.chainUnaryInterceptors.func1.1 ???
github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:232 rpc.NewServerEx.func1.1 ???
github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:321 stop.(*Stopper).RunTaskWithErr ???
github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:234 rpc.NewServerEx.func1 ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1164 grpc.chainUnaryInterceptors.func1.1 ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1166 grpc.chainUnaryInterceptors.func1 ???
github.com/cockroachdb/cockroach/pkg/server/serverpb/bazel-out/k8-fastbuild/bin/pkg/server/serverpb/serverpb_go_proto_/github.com/cockroachdb/cockroach/pkg/server/serverpb/migration.pb.go:601 serverpb._Migration_BumpClusterVersion_Handler ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1368 grpc.(*Server).processUnaryRPC ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1713 grpc.(*Server).handleStream ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:965 grpc.(*Server).serveStreams.func1.2 ???

goroutine 6819170 lock 0xc01853c800
github.com/cockroachdb/cockroach/pkg/util/schedulerlatency/sampler.go:189 schedulerlatency.(*sampler).sampleOnTickAndInvokeCallbacks ??? <<<<<
github.com/cockroachdb/cockroach/pkg/util/schedulerlatency/sampler.go:188 schedulerlatency.(*sampler).sampleOnTickAndInvokeCallbacks ???
github.com/cockroachdb/cockroach/pkg/util/schedulerlatency/sampler.go:143 schedulerlatency.StartSampler.func1 ???
github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:470 stop.(*Stopper).RunAsyncTaskEx.func2 ???



Parameters: TAGS=bazel,gss,deadlock


craig bot pushed a commit that referenced this issue Feb 24, 2023
97148: changefeedccl: Expire protected timestamps r=miretskiy a=miretskiy

Changefeeds use the protected timestamp system (PTS)
to ensure that the data targeted by a changefeed is not
garbage collected prematurely. A running changefeed manages
its PTS record by periodically advancing the record's
timestamp, so that data older than that timestamp may be
GCed. However, if the changefeed stops running because it is
paused (either due to operator action or due to the
`on_error=pause` option), the PTS record remains so that the
changefeed can be resumed at a later time. It is also possible
that the operator does not notice that the job has been paused
for too long, which causes a buildup of garbage data.

Excessive buildup of GC work is undesirable: it
impacts overall cluster performance and, once GC can resume,
its cost is proportional to how much GC work needs to be done.
This PR introduces a new changefeed option,
`gc_protect_expires_after`, to automatically expire PTS records that
are too old. This automatic expiration is a safety mechanism
in case a changefeed job gets paused by an operator or due to
an error while holding onto its PTS record due to the
`protect_gc_on_pause` option.
The operator is still expected to monitor changefeed jobs
and to restart paused changefeeds promptly. If the changefeed
job remains paused and the underlying PTS record expires, then
the changefeed job will be canceled to prevent buildup of GC data.

Epic: [CRDB-21953](https://cockroachlabs.atlassian.net/browse/CRDB-21953)
Informs #84598

Release note (enterprise change): Changefeeds will automatically
expire PTS records for paused jobs if the changefeed is configured
with the `gc_protect_expires_after` option.

97539: kvserver: fix deadlock on rebalance obj change r=kvoli a=kvoli

Previously, changing the rebalance objective could lead to inconsistent
locking order between the load-based splitter and the rebalance objective.
When the objective was updated, the previous method also blocked
batch requests from completing until every replica's load-based splitter
was reset.

This commit makes the split objective a variable owned by the
decider, rather than one inferred on each decider operation. On a
rebalance objective change, the split objective is updated atomically
for each replica but not atomically across a store. This removes the
need to block batch requests until every replica is updated.

Resolves: #97000
Resolves: #97445
Resolves: #97450
Resolves: #97452
Resolves: #97457

Release note: None

Co-authored-by: Yevgeniy Miretskiy <yevgeniy@cockroachlabs.com>
Co-authored-by: Austen McClernon <austen@cockroachlabs.com>
craig bot closed this as completed in #97539 (commit 51f8f8e) Feb 24, 2023
SQL Foundations automation moved this from Triage to Done Feb 24, 2023
exalate-issue-sync bot added the T-sql-foundations label and removed the T-sql-schema-deprecated label May 10, 2023