Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kv: segfault in rangecache.(*EvictionToken).SyncTokenAndMaybeUpdateCache #123146

Closed
sumeerbhola opened this issue Apr 26, 2024 · 1 comment · Fixed by #123151
Closed

kv: segfault in rangecache.(*EvictionToken).SyncTokenAndMaybeUpdateCache #123146

sumeerbhola opened this issue Apr 26, 2024 · 1 comment · Fixed by #123151
Assignees
Labels
backport-24.1.x Flags PRs that need to be backported to 24.1. branch-release-24.1 Used to mark GA and release blockers and technical advisories for 24.1 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. GA-blocker T-kv KV Team
Projects

Comments

@sumeerbhola
Copy link
Collaborator

sumeerbhola commented Apr 26, 2024

This is running master with a severely overloaded cluster (from a cpu perspective) due to admission-control/tpcc-severe-overload. Full dump in https://drive.google.com/file/d/1wB68PYml74nFWf3c3qxktZo6ERyhrYkB/view?usp=sharing

panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1f6ed24]

goroutine 5032729 gp=0xc16b2fcc40 m=12 mp=0xc000a01008 [running]:
panic({0x5acc8a0?, 0xb86d190?})
	GOROOT/src/runtime/panic.go:779 +0x158 fp=0xc387653290 sp=0xc3876531e0 pc=0x49bd58
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).recover(0xc001091b20?, {0x7d2fef0, 0xc3876209c0})
	github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:231 +0x65 fp=0xc3876532d8 sp=0xc387653290 pc=0x10caee5
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunTaskWithErr.deferwrap1()
	github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:334 +0x28 fp=0xc387653300 sp=0xc3876532d8 pc=0x10cba68
panic({0x5acc8a0?, 0xb86d190?})
	GOROOT/src/runtime/panic.go:770 +0x132 fp=0xc3876533b0 sp=0xc387653300 pc=0x49bd32
runtime.panicmem(...)
	GOROOT/src/runtime/panic.go:261
runtime.sigpanic()
	GOROOT/src/runtime/signal_unix.go:881 +0x378 fp=0xc387653410 sp=0xc3876533b0 pc=0x4b56f8
github.com/cockroachdb/cockroach/pkg/kv/kvclient/rangecache.(*EvictionToken).SyncTokenAndMaybeUpdateCache(0xc387653dc0, {0x7d2fef0, 0xc39967d770}, 0xc387381c10, 0xc387381ba0)
	github.com/cockroachdb/cockroach/pkg/kv/kvclient/rangecache/range_cache.go:408 +0x64 fp=0xc3876536e8 sp=0xc387653410 pc=0x1f6ed24
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*DistSender).sendPartialBatch(0xc003758608, {0x7d2fef0, 0xc39967d770}, 0xc387633b00, {{0xc386f33170, 0x7, 0x8}, {0xc386f33170, 0x8, 0x8}}, ...)
	github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go:2051 +0x37d fp=0xc387653d60 sp=0xc3876536e8 pc=0x1f9417d
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*DistSender).divideAndSendBatchToRanges(0xc003758608, {0x7d2fef0, 0xc39967d770}, 0xc387633b00, {{0xc386f33170, 0x7, 0x8}, {0xc386f33170, 0x8, 0x8}}, ...)
	github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go:1635 +0x36a fp=0xc3876541b0 sp=0xc387653d60 pc=0x1f91b0a
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*DistSender).Send(0xc003758608, {0x7d2fef0, 0xc387620a80}, 0xc387633b00)
	github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go:1251 +0x646 fp=0xc387654420 sp=0xc3876541b0 pc=0x1f8fe66
github.com/cockroachdb/cockroach/pkg/server.(*Node).maybeProxyRequest(0xc0032ddc08, {0x7d2fef0, 0xc387620a80}, 0xc387633b00, 0xc39708fa80)
	github.com/cockroachdb/cockroach/pkg/server/node.go:1533 +0x1d9 fp=0xc387654520 sp=0xc387654420 pc=0x4e0f479
github.com/cockroachdb/cockroach/pkg/server.(*Node).batchInternal(0xc0032ddc08, {0x7d2fef0?, 0xc387620a50?}, {0xc003aff5c0?}, 0xc3876337a0)
	github.com/cockroachdb/cockroach/pkg/server/node.go:1434 +0x607 fp=0xc387654d20 sp=0xc387654520 pc=0x4e0e5a7
github.com/cockroachdb/cockroach/pkg/server.(*Node).Batch(0xc0032ddc08, {0x7d2fef0, 0xc3876209c0}, 0xc3876337a0)
	github.com/cockroachdb/cockroach/pkg/server/node.go:1637 +0x2da fp=0xc3876553f0 sp=0xc387654d20 pc=0x4e0fdfa
github.com/cockroachdb/cockroach/pkg/kv/kvpb._Internal_Batch_Handler.func1({0x7d2fef0?, 0xc3876209c0?}, {0x640f7a0?, 0xc3876337a0?})
	github.com/cockroachdb/cockroach/pkg/kv/kvpb/bazel-out/aarch64-opt/bin/pkg/kv/kvpb/kvpb_go_proto_/github.com/cockroachdb/cockroach/pkg/kv/kvpb/api.pb.go:10585 +0xcb fp=0xc387655428 sp=0xc3876553f0 pc=0xfaf0eb
github.com/cockroachdb/cockroach/pkg/rpc.NewServerEx.ServerInterceptor.func12({0x7d2fef0, 0xc3876209c0}, {0x640f7a0, 0xc3876337a0}, 0xc2d2c833c0, 0xc386d43020)
	github.com/cockroachdb/cockroach/pkg/util/tracing/grpcinterceptor/grpc_interceptor.go:97 +0x4b2 fp=0xc3876555e8 sp=0xc387655428 pc=0x1c65012
google.golang.org/grpc.getChainUnaryHandler.func1({0x7d2fef0, 0xc3876209c0}, {0x640f7a0, 0xc3876337a0})
	google.golang.org/grpc/external/org_golang_google_grpc/server.go:1163 +0xb2 fp=0xc387655650 sp=0xc3876555e8 pc=0xc181f2
github.com/cockroachdb/cockroach/pkg/rpc.NewServerEx.func3({0x7d2fef0, 0xc3876209c0}, {0x640f7a0, 0xc3876337a0}, 0xc2d2c833c0?, 0xc3873fcd40)
	github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:169 +0x76 fp=0xc387655680 sp=0xc387655650 pc=0x1c65256
google.golang.org/grpc.getChainUnaryHandler.func1({0x7d2fef0, 0xc3876209c0}, {0x640f7a0, 0xc3876337a0})
	google.golang.org/grpc/external/org_golang_google_grpc/server.go:1163 +0xb2 fp=0xc3876556e8 sp=0xc387655680 pc=0xc181f2
github.com/cockroachdb/cockroach/pkg/rpc.kvAuth.unaryInterceptor({0xc002406000?, {{0x4706e5?}, {0x7d6f180?, 0xc003a2e960?}}}, {0x7d2fef0, 0xc3876209c0}, {0x640f7a0, 0xc3876337a0}, 0xc2d2c833c0, 0xc3873fccc0)
	github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/auth.go:95 +0x1e4 fp=0xc3876557b8 sp=0xc3876556e8 pc=0x1c5b084
github.com/cockroachdb/cockroach/pkg/rpc.kvAuth.unaryInterceptor-fm({0x7d2fef0?, 0xc3876209c0?}, {0x640f7a0?, 0xc3876337a0?}, 0xc2d2c833c0?, 0xc386d43020?)
	<autogenerated>:1 +0x65 fp=0xc387655818 sp=0xc3876557b8 pc=0x1c7c965
google.golang.org/grpc.getChainUnaryHandler.func1({0x7d2fef0, 0xc3876209c0}, {0x640f7a0, 0xc3876337a0})
	google.golang.org/grpc/external/org_golang_google_grpc/server.go:1163 +0xb2 fp=0xc387655880 sp=0xc387655818 pc=0xc181f2
github.com/cockroachdb/cockroach/pkg/rpc.NewServerEx.func1.1({0x7d2fef0?, 0xc3876209c0?})
	github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:136 +0x36 fp=0xc3876558b8 sp=0xc387655880 pc=0x1c65576
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunTaskWithErr(0xc001d00d80, {0x7d2fef0, 0xc3876209c0}, {0xc3873fcc80?, 0xc387655978?}, 0xc387655970)
	github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:337 +0xcb fp=0xc387655930 sp=0xc3876558b8 pc=0x10cb92b
github.com/cockroachdb/cockroach/pkg/rpc.NewServerEx.func1({0x7d2fef0?, 0xc3876209c0?}, {0x640f7a0?, 0xc3876337a0?}, 0xc2d2c833c0?, 0xc386d43020?)
	github.com/cockroachdb/cockroach/pkg/rpc/pkg/rpc/context.go:134 +0x8f fp=0xc3876559a8 sp=0xc387655930 pc=0x1c654cf
google.golang.org/grpc.NewServer.chainUnaryServerInterceptors.chainUnaryInterceptors.func1({0x7d2fef0, 0xc3876209c0}, {0x640f7a0, 0xc3876337a0}, 0xc2d2c833c0, 0x70?)
	google.golang.org/grpc/external/org_golang_google_grpc/server.go:1154 +0x85 fp=0xc3876559f8 sp=0xc3876559a8 pc=0xc145c5
github.com/cockroachdb/cockroach/pkg/kv/kvpb._Internal_Batch_Handler({0x63dfd40, 0xc0032ddc08}, {0x7d2fef0, 0xc3876209c0}, 0xc2cd149730, 0xc001dd09e0)
	github.com/cockroachdb/cockroach/pkg/kv/kvpb/bazel-out/aarch64-opt/bin/pkg/kv/kvpb/kvpb_go_proto_/github.com/cockroachdb/cockroach/pkg/kv/kvpb/api.pb.go:10587 +0x143 fp=0xc387655a48 sp=0xc3876559f8 pc=0xfaef43
google.golang.org/grpc.(*Server).processUnaryRPC(0xc00340dc20, {0x7d857e0, 0xc00d65c1a0}, 0xc385ba97a0, 0xc0017695f0, 0xb9517a0, 0x0)
	google.golang.org/grpc/external/org_golang_google_grpc/server.go:1336 +0xd16 fp=0xc387655e40 sp=0xc387655a48 pc=0xc18f56
google.golang.org/grpc.(*Server).handleStream(0xc00340dc20, {0x7d857e0, 0xc00d65c1a0}, 0xc385ba97a0, 0x0)
	google.golang.org/grpc/external/org_golang_google_grpc/server.go:1704 +0x9da fp=0xc387655f68 sp=0xc387655e40 pc=0xc1dcba
google.golang.org/grpc.(*Server).serveStreams.func1.2()
	google.golang.org/grpc/external/org_golang_google_grpc/server.go:965 +0x8d fp=0xc387655fe0 sp=0xc387655f68 pc=0xc16cad
runtime.goexit({})
	src/runtime/asm_amd64.s:1695 +0x1 fp=0xc387655fe8 sp=0xc387655fe0 pc=0x4d7dc1
created by google.golang.org/grpc.(*Server).serveStreams.func1 in goroutine 529
	google.golang.org/grpc/external/org_golang_google_grpc/server.go:963 +0x226

cc @andrewbaptist

Jira issue: CRDB-38216

@sumeerbhola sumeerbhola added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-kv KV Team labels Apr 26, 2024
@blathers-crl blathers-crl bot added this to Incoming in KV Apr 26, 2024
@andrewbaptist andrewbaptist self-assigned this Apr 26, 2024
andrewbaptist added a commit to andrewbaptist/cockroach that referenced this issue Apr 26, 2024
Previously in `DistSender.sendPartialBatch` we could hit an error path
during processing of a proxy request if the method was called with an
invalid token. This could happen if there was a range split or merge
which could invalidate the routing token prior to the method being
called.

This fix validates the token is not null before attempting to Sync.

Epic: none
Fixes: cockroachdb#123146

Release note: None
Copy link

blathers-crl bot commented Apr 26, 2024

Hi @andrewbaptist, please add branch-* labels to identify which branch(es) this GA-blocker affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@andrewbaptist andrewbaptist added branch-release-24.1 Used to mark GA and release blockers and technical advisories for 24.1 backport-24.1.x Flags PRs that need to be backported to 24.1. labels Apr 26, 2024
craig bot pushed a commit that referenced this issue May 2, 2024
123151: kvclient: check for invalid routing token r=arulajmani a=andrewbaptist

Previously in `DistSender.sendPartialBatch` we could hit an error path during processing of a proxy request if the method was called with an invalid token. This could happen if there was a range split or merge which could invalidate the routing token prior to the method being called.

This fix validates the token is not null before attempting to Sync.

Epic: none
Fixes: #123146

Release note: None

Co-authored-by: Andrew Baptist <baptist@cockroachlabs.com>
@craig craig bot closed this as completed in 3ca2126 May 2, 2024
KV automation moved this from Incoming to Closed May 2, 2024
blathers-crl bot pushed a commit that referenced this issue May 2, 2024
Previously in `DistSender.sendPartialBatch` we could hit an error path
during processing of a proxy request if the method was called with an
invalid token. This could happen if there was a range split or merge
which could invalidate the routing token prior to the method being
called. Hitting this error required a specific sequence of events.

The scenario is:
1) Start the proxy request with “stale” routing info  (v1)
2) Apply the changes from the client (v2) which invalidates the token
	 due to a merge or split.
3) Attempt to directly load the routng info from meta2.
4) Fail to load the routing info with a retriable range lookup error.
5) Attempt to apply the client a second time with an invalid token.

This fix only attempts to sync the token if the token is currently
valid.

Epic: none
Fixes: #123146

Release note: None
andrewbaptist added a commit to andrewbaptist/cockroach that referenced this issue May 13, 2024
Previously a proxy request was handled using the same logic within
sendPartialBatch and sendToReplicas. There were short circuits to handle
the different cases of retries, but this was still incorrect in some
places. Instead this change intercepts proxy request at the start of
Send and creates and sends to the transport directly. This greatly
simplifies the code for proxy requests and additionally fixes complex
cases where the routing information changes.

Epic: none
Fixes: cockroachdb#123965
Informs: cockroachdb#123146

Release note: None
andrewbaptist added a commit to andrewbaptist/cockroach that referenced this issue May 13, 2024
Previously a proxy request was handled using the same logic within
sendPartialBatch and sendToReplicas. There were short circuits to handle
the different cases of retries, but this was still incorrect in some
places. Instead this change intercepts proxy request at the start of
Send and creates and sends to the transport directly. This greatly
simplifies the code for proxy requests and additionally fixes complex
cases where the routing information changes.

Epic: none
Fixes: cockroachdb#123965
Informs: cockroachdb#123146

Release note: None
andrewbaptist added a commit to andrewbaptist/cockroach that referenced this issue May 13, 2024
Previously in DistSender, a proxy request was handled using the same
logic within sendPartialBatch and sendToReplicas. There were short
circuits to handle the different cases of retries, but in cases with
changing range boundaries, it could return the wrong error. This change
intercepts proxy request at the start of DistSender.Send and sends to
the transport bypassing the rest of the DistSender retry logic. This
simplifies the code for proxy requests.

Epic: none
Fixes: cockroachdb#123965
Informs: cockroachdb#123146

Release note: None
andrewbaptist added a commit to andrewbaptist/cockroach that referenced this issue May 14, 2024
Previously in DistSender, a proxy request was handled using the same
logic within sendPartialBatch and sendToReplicas. There were short
circuits to handle the different cases of retries, but in cases with
changing range boundaries, it could return the wrong error. This change
intercepts proxy request at the start of DistSender.Send and sends to
the transport bypassing the rest of the DistSender retry logic. This
simplifies the code for proxy requests.

Epic: none
Fixes: cockroachdb#123965
Informs: cockroachdb#123146

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-24.1.x Flags PRs that need to be backported to 24.1. branch-release-24.1 Used to mark GA and release blockers and technical advisories for 24.1 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. GA-blocker T-kv KV Team
Projects
KV
Closed
Development

Successfully merging a pull request may close this issue.

2 participants