ccl/kvccl/kvtenantccl: TestTenantUpgradeFailure failed #97445

Closed
cockroach-teamcity opened this issue Feb 22, 2023 · 2 comments · Fixed by #97539

Labels:
branch-master: Failures on the master branch.
C-test-failure: Broken test (automatically or manually discovered).
O-robot: Originated from a bot.


cockroach-teamcity commented Feb 22, 2023

ccl/kvccl/kvtenantccl.TestTenantUpgradeFailure failed with artifacts on master @ 286b3e235171a39b8f9910555affcc7ce310741a:

github.com/cockroachdb/cockroach/pkg/server/migration.go:100 server.(*migrationServer).BumpClusterVersion.func1 ??? <<<<<
github.com/cockroachdb/cockroach/pkg/server/migration.go:99 server.(*migrationServer).BumpClusterVersion.func1 ???
github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:321 stop.(*Stopper).RunTaskWithErr ???
github.com/cockroachdb/cockroach/pkg/server/migration.go:102 server.(*migrationServer).BumpClusterVersion ???
github.com/cockroachdb/cockroach/pkg/server/serverpb/bazel-out/k8-fastbuild/bin/pkg/server/serverpb/serverpb_go_proto_/github.com/cockroachdb/cockroach/pkg/server/serverpb/migration.pb.go:598 serverpb._Migration_BumpClusterVersion_Handler.func1 ???
github.com/cockroachdb/cockroach/pkg/util/tracing/grpcinterceptor/grpc_interceptor.go:115 grpcinterceptor.ServerInterceptor.func1 ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1161 grpc.chainUnaryInterceptors.func1.1 ???
github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:265 rpc.NewServerEx.func3 ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1164 grpc.chainUnaryInterceptors.func1.1 ???
github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/auth.go:91 rpc.kvAuth.unaryInterceptor ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1164 grpc.chainUnaryInterceptors.func1.1 ???
github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:232 rpc.NewServerEx.func1.1 ???
github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:321 stop.(*Stopper).RunTaskWithErr ???
github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:234 rpc.NewServerEx.func1 ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1164 grpc.chainUnaryInterceptors.func1.1 ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1166 grpc.chainUnaryInterceptors.func1 ???
github.com/cockroachdb/cockroach/pkg/server/serverpb/bazel-out/k8-fastbuild/bin/pkg/server/serverpb/serverpb_go_proto_/github.com/cockroachdb/cockroach/pkg/server/serverpb/migration.pb.go:600 serverpb._Migration_BumpClusterVersion_Handler ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1368 grpc.(*Server).processUnaryRPC ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1713 grpc.(*Server).handleStream ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:965 grpc.(*Server).serveStreams.func1.2 ???

goroutine 2588946 lock 0xc00d3a4660
github.com/cockroachdb/cockroach/pkg/kv/txn.go:361 kv.(*Txn).ReadTimestamp ??? <<<<<
github.com/cockroachdb/cockroach/pkg/kv/txn.go:360 kv.(*Txn).ReadTimestamp ???
github.com/cockroachdb/cockroach/pkg/sql/catalog/descs/leased_descriptors.go:162 descs.(*leasedDescriptors).getResult ???
github.com/cockroachdb/cockroach/pkg/sql/catalog/descs/leased_descriptors.go:102 descs.(*leasedDescriptors).getByName ???
github.com/cockroachdb/cockroach/pkg/sql/catalog/descs/descriptor.go:576 descs.(*Collection).getNonVirtualDescriptorID.func6 ???
github.com/cockroachdb/cockroach/pkg/sql/catalog/descs/descriptor.go:614 descs.(*Collection).getNonVirtualDescriptorID ???
github.com/cockroachdb/cockroach/pkg/sql/catalog/descs/descriptor.go:398 descs.getDescriptorByName ???
github.com/cockroachdb/cockroach/pkg/sql/catalog/descs/getters.go:297 descs.ByNameGetter.Table ???
github.com/cockroachdb/cockroach/pkg/sql/catalog/descs/helpers.go:138 descs.PrefixAndTable ???
github.com/cockroachdb/cockroach/pkg/sql/schema_resolver.go:164 sql.(*schemaResolver).LookupObject ???
github.com/cockroachdb/cockroach/pkg/sql/catalog/resolver/resolver.go:366 resolver.ResolveExisting ???
github.com/cockroachdb/cockroach/pkg/sql/catalog/resolver/resolver.go:183 resolver.ResolveExistingObject ???
github.com/cockroachdb/cockroach/pkg/sql/catalog/resolver/resolver.go:109 resolver.ResolveExistingTableObject ???
github.com/cockroachdb/cockroach/pkg/sql/opt_catalog.go:239 sql.(*optCatalog).ResolveDataSource ???
github.com/cockroachdb/cockroach/pkg/sql/opt/metadata.go:306 opt.(*Metadata).CheckDependencies ???
github.com/cockroachdb/cockroach/pkg/sql/opt/memo/memo.go:365 memo.(*Memo).IsStale ???
github.com/cockroachdb/cockroach/pkg/sql/plan_opt.go:138 sql.(*planner).prepareUsingOptimizer ???
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_prepare.go:292 sql.(*connExecutor).populatePrepared ???
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_prepare.go:244 sql.(*connExecutor).prepare.func2 ???
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_prepare.go:249 sql.(*connExecutor).prepare ???
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_prepare.go:113 sql.(*connExecutor).addPreparedStmt ???
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_prepare.go:81 sql.(*connExecutor).execPrepare ???
github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:2075 sql.(*connExecutor).execCmd ???
github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1895 sql.(*connExecutor).run ???
github.com/cockroachdb/cockroach/pkg/sql/internal.go:171 sql.(*InternalExecutor).runWithEx.func1 ???



Parameters: TAGS=bazel,gss,deadlock
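The `deadlock` tag in these parameters appears to build the test with a lock-order-checking mutex, which is what produced the per-goroutine lock reports above (for example, `goroutine 2588946 lock 0xc00d3a4660`, with `<<<<<` apparently marking the frame where the lock was taken). To illustrate the kind of problem such a checker flags, here is a toy lock-order inversion detector in Go. It is a simplified sketch, not CockroachDB's or any library's implementation: `orderChecker`, the lock names `decider` and `objective`, and the two code paths are hypothetical, modeled loosely on the load-based splitter / rebalance objective inversion described later in this thread.

```go
package main

import (
	"fmt"
	"sync"
)

// orderChecker is a toy lock-order checker. It records every
// "held A, then acquired B" edge and reports when the reverse edge shows up.
// Real checkers also track goroutines, lock instances, and timeouts.
type orderChecker struct {
	mu    sync.Mutex
	edges map[[2]string]bool // {held, acquired} pairs seen so far
}

func newOrderChecker() *orderChecker {
	return &orderChecker{edges: make(map[[2]string]bool)}
}

// acquire records that next is being locked while every lock in held is held.
// It returns an error if some held lock was previously acquired while next
// was held, i.e. two code paths take the same pair of locks in opposite orders.
func (c *orderChecker) acquire(held []string, next string) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	for _, h := range held {
		if c.edges[[2]string{next, h}] {
			return fmt.Errorf("potential deadlock: %q taken while holding %q, but elsewhere %q was taken while holding %q",
				next, h, h, next)
		}
		c.edges[[2]string{h, next}] = true
	}
	return nil
}

func main() {
	c := newOrderChecker()
	// Hypothetical path 1: a batch request holds a replica's split decider
	// lock, then reads the rebalance objective.
	fmt.Println(c.acquire([]string{"decider"}, "objective"))
	// Hypothetical path 2: an objective change holds the objective lock,
	// then resets each replica's decider. This reverses path 1's order.
	fmt.Println(c.acquire([]string{"objective"}, "decider"))
}
```

Neither path deadlocks on its own; the checker reports the second call because, run concurrently, the two orders can each hold one lock while waiting for the other.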

Help

See also: How To Investigate a Go Test Failure (internal)

/cc @cockroachdb/multi-tenant


Jira issue: CRDB-24699

cockroach-teamcity added the branch-master, C-test-failure, and O-robot labels Feb 22, 2023
cockroach-teamcity added this to the 23.1 milestone Feb 22, 2023
kvoli self-assigned this Feb 22, 2023

kvoli commented Feb 22, 2023

Caused by #97424. I'll look into this.

kvoli added a commit to kvoli/cockroach that referenced this issue Feb 23, 2023
Previously, changing the rebalance objective could lead to an inconsistent
locking order between the load-based splitter and the rebalance objective.
When the objective was updated, the previous method also blocked batch
requests from completing until every replica's load-based splitter had
been reset.

This commit moves the split objective to be a variable owned by the
decider, rather than inferred on each decider operation. When the rebalance
objective changes, the split objective is updated atomically for each
replica, but not atomically across the store. This removes the need to
block batch requests until every replica is updated.

Resolves: cockroachdb#97000
Resolves: cockroachdb#97445
Resolves: cockroachdb#97450
Resolves: cockroachdb#97452
Resolves: cockroachdb#97457

Release note: None
cockroach-teamcity commented

ccl/kvccl/kvtenantccl.TestTenantUpgradeFailure failed with artifacts on master @ e028ce5b14505dfd17ef8b13001c0ab8ac811e3c:

github.com/cockroachdb/cockroach/pkg/sql/execinfra/processorsbase.go:733 execinfra.(*ProcessorBaseNoHelper).Run ???
github.com/cockroachdb/cockroach/pkg/sql/flowinfra/flow.go:490 flowinfra.(*FlowBase).Run ???
github.com/cockroachdb/cockroach/pkg/sql/colflow/vectorized_flow.go:296 colflow.(*vectorizedFlow).Run ???
github.com/cockroachdb/cockroach/pkg/sql/distsql_running.go:863 sql.(*DistSQLPlanner).Run ???
github.com/cockroachdb/cockroach/pkg/sql/distsql_running.go:1805 sql.(*DistSQLPlanner).PlanAndRun ???
github.com/cockroachdb/cockroach/pkg/sql/distsql_running.go:1555 sql.(*DistSQLPlanner).PlanAndRunAll ???
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:1654 sql.(*connExecutor).execWithDistSQLEngine ???
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:1316 sql.(*connExecutor).dispatchToExecutionEngine ???
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:743 sql.(*connExecutor).execStmtInOpenState ???
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:130 sql.(*connExecutor).execStmt.func1 ???
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:2541 sql.(*connExecutor).execWithProfiling ???
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:129 sql.(*connExecutor).execStmt ???
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:232 sql.(*connExecutor).execPortal ???
github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:2051 sql.(*connExecutor).execCmd.func2 ???
github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:2061 sql.(*connExecutor).execCmd ???
github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1895 sql.(*connExecutor).run ???
github.com/cockroachdb/cockroach/pkg/sql/internal.go:171 sql.(*InternalExecutor).runWithEx.func1 ???

goroutine 2599442 lock 0xc006f1a6d8
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/rebalance_objective.go:219 kvserver.(*RebalanceObjectiveManager).maybeUpdateRebalanceObjective ??? <<<<<
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/rebalance_objective.go:218 kvserver.(*RebalanceObjectiveManager).maybeUpdateRebalanceObjective ???
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/rebalance_objective.go:189 kvserver.newRebalanceObjectiveManager.func2 ???
github.com/cockroachdb/cockroach/pkg/clusterversion/pkg/clusterversion/clusterversion.go:227 clusterversion.(*handleImpl).SetOnChange.func1 ???
github.com/cockroachdb/cockroach/pkg/settings/pkg/settings/values.go:145 settings.(*Values).settingChanged ???
github.com/cockroachdb/cockroach/pkg/settings/pkg/settings/values.go:178 settings.(*Values).setGeneric ???
github.com/cockroachdb/cockroach/pkg/clusterversion/pkg/clusterversion/clusterversion.go:220 clusterversion.(*handleImpl).SetActiveVersion ???
github.com/cockroachdb/cockroach/pkg/clusterversion/pkg/clusterversion/clusterversion.go:219 clusterversion.(*handleImpl).SetActiveVersion ???
github.com/cockroachdb/cockroach/pkg/server/migration.go:146 server.bumpClusterVersion ???
github.com/cockroachdb/cockroach/pkg/server/migration.go:101 server.(*migrationServer).BumpClusterVersion.func1 ???
github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:321 stop.(*Stopper).RunTaskWithErr ???
github.com/cockroachdb/cockroach/pkg/server/migration.go:102 server.(*migrationServer).BumpClusterVersion ???
github.com/cockroachdb/cockroach/pkg/server/serverpb/bazel-out/k8-fastbuild/bin/pkg/server/serverpb/serverpb_go_proto_/github.com/cockroachdb/cockroach/pkg/server/serverpb/migration.pb.go:599 serverpb._Migration_BumpClusterVersion_Handler.func1 ???
github.com/cockroachdb/cockroach/pkg/util/tracing/grpcinterceptor/grpc_interceptor.go:115 grpcinterceptor.ServerInterceptor.func1 ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1161 grpc.chainUnaryInterceptors.func1.1 ???
github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:265 rpc.NewServerEx.func3 ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1164 grpc.chainUnaryInterceptors.func1.1 ???
github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/auth.go:91 rpc.kvAuth.unaryInterceptor ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1164 grpc.chainUnaryInterceptors.func1.1 ???
github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:232 rpc.NewServerEx.func1.1 ???
github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:321 stop.(*Stopper).RunTaskWithErr ???
github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:234 rpc.NewServerEx.func1 ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1164 grpc.chainUnaryInterceptors.func1.1 ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1166 grpc.chainUnaryInterceptors.func1 ???
github.com/cockroachdb/cockroach/pkg/server/serverpb/bazel-out/k8-fastbuild/bin/pkg/server/serverpb/serverpb_go_proto_/github.com/cockroachdb/cockroach/pkg/server/serverpb/migration.pb.go:601 serverpb._Migration_BumpClusterVersion_Handler ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1368 grpc.(*Server).processUnaryRPC ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:1713 grpc.(*Server).handleStream ???
google.golang.org/grpc/external/org_golang_google_grpc/server.go:965 grpc.(*Server).serveStreams.func1.2 ???



Parameters: TAGS=bazel,gss,deadlock
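This second report appears to show `maybeUpdateRebalanceObjective` taking the objective manager's lock from inside a cluster-version `SetOnChange` callback. One generic way to keep such callbacks out of lock-order cycles is to update state under the lock and invoke callbacks only after releasing it. The Go sketch below illustrates that pattern only; `objectiveManager`, `maybeUpdate`, `current`, and `onChange` are hypothetical, and this is not necessarily how #97539 resolved the deadlock.

```go
package main

import (
	"fmt"
	"sync"
)

// objectiveManager is a hypothetical store-wide manager. It guards the
// current objective with mu but invokes onChange only after releasing mu,
// so a callback that takes other locks (for example per-replica decider
// locks) cannot form a cycle with mu.
type objectiveManager struct {
	mu        sync.Mutex
	objective string
	onChange  func(string)
}

// maybeUpdate sets the objective and reports whether it changed.
func (m *objectiveManager) maybeUpdate(next string) bool {
	m.mu.Lock()
	if m.objective == next {
		m.mu.Unlock()
		return false
	}
	m.objective = next
	cb := m.onChange // copy under the lock
	m.mu.Unlock()

	if cb != nil {
		cb(next) // runs with no manager lock held
	}
	return true
}

func (m *objectiveManager) current() string {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.objective
}

func main() {
	m := &objectiveManager{objective: "qps"}
	m.onChange = func(obj string) {
		// Re-entering the manager here would self-deadlock if maybeUpdate
		// still held mu, because sync.Mutex is not reentrant.
		fmt.Println("objective now", m.current(), "notified", obj)
	}
	fmt.Println(m.maybeUpdate("cpu"))
	fmt.Println(m.maybeUpdate("cpu"))
}
```

Releasing the lock first means two concurrent updates can deliver notifications out of order, so a callback that cares about the latest value should re-read it, as `main` does with `current`.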


craig bot pushed a commit that referenced this issue Feb 24, 2023
97148: changefeedccl: Expire protected timestamps r=miretskiy a=miretskiy

Changefeeds use the protected timestamp (PTS) system to ensure that the
data targeted by a changefeed is not garbage collected prematurely. A
running changefeed manages its PTS record by periodically advancing the
record's timestamp, so that data older than that timestamp may be GCed.
When the changefeed is paused (either by operator action or due to the
`on_error=pause` option), it stops running, but the PTS record remains so
that the changefeed can be resumed later. However, an operator may not
notice that a job has been paused for too long, causing garbage data to
build up.

Excessive buildup of GC work hurts overall cluster performance, and once
GC can resume, its cost is proportional to the amount of accumulated GC
work. This PR introduces a new changefeed option,
`gc_protect_expires_after`, that automatically expires PTS records that
are too old. This expiration is a safety mechanism for the case where a
changefeed job is paused, by an operator or due to an error, while holding
onto its PTS record because of the `protect_gc_on_pause` option.
The operator is still expected to monitor changefeed jobs and to resume
paused changefeeds promptly. If a changefeed job remains paused and its
underlying PTS record expires, the changefeed job is canceled to prevent
GC data from building up.

Epic: [CRDB-21953](https://cockroachlabs.atlassian.net/browse/CRDB-21953)
Informs #84598

Release note (enterprise change): Changefeeds automatically expire PTS
records for paused jobs if the changefeed is configured with the
`gc_protect_expires_after` option.

97539: kvserver: fix deadlock on rebalance obj change r=kvoli a=kvoli

Previously, changing the rebalance objective could lead to an inconsistent
locking order between the load-based splitter and the rebalance objective.
When the objective was updated, the previous method also blocked batch
requests from completing until every replica's load-based splitter had
been reset.

This commit moves the split objective to be a variable owned by the
decider, rather than inferred on each decider operation. When the rebalance
objective changes, the split objective is updated atomically for each
replica, but not atomically across the store. This removes the need to
block batch requests until every replica is updated.

Resolves: #97000
Resolves: #97445
Resolves: #97450
Resolves: #97452
Resolves: #97457

Release note: None

Co-authored-by: Yevgeniy Miretskiy <yevgeniy@cockroachlabs.com>
Co-authored-by: Austen McClernon <austen@cockroachlabs.com>
craig bot closed this as completed in 51f8f8e Feb 24, 2023