roachpb,storage: Remove the IsNonKV flag #15355

Merged 1 commit on Apr 26, 2017


bdarnell
Contributor

With the move to propEvalKV, the command queue is critical for correct
operation and all commands, even non-KV ones, need to go through it.
The original motivation for this flag (in #8130) was that non-KV
commands were inappropriately synchronizing on the start key of their
range; this is no longer true with the move to per-command DeclareKeys
functions.

This is expected to reduce Merge (timeseries) performance somewhat
because it reverts #9889.

Fixes #15003
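
As a rough illustration of the idea in the description above (every command declares the key spans it touches, and the command queue serializes commands whose spans conflict), here is a minimal sketch. The `span`, `command`, and `mustWait` names are hypothetical and are not the actual CockroachDB types or the real `DeclareKeys` signature.

```go
package main

import "fmt"

// span is a half-open key range [start, end). Hypothetical type for illustration.
type span struct{ start, end string }

// overlaps reports whether two half-open spans share at least one key.
func (a span) overlaps(b span) bool {
	return a.start < b.end && b.start < a.end
}

// command is a toy stand-in for a request: a name, whether it writes,
// and the spans it declares before evaluation.
type command struct {
	name  string
	write bool
	spans []span
}

// mustWait reports whether cmd has to wait for an in-flight command.
// Two commands conflict if any declared spans overlap and at least one
// of the two is a write; two reads never conflict.
func mustWait(cmd, inflight command) bool {
	if !cmd.write && !inflight.write {
		return false
	}
	for _, a := range cmd.spans {
		for _, b := range inflight.spans {
			if a.overlaps(b) {
				return true
			}
		}
	}
	return false
}

func main() {
	wide := command{"wide-write", true, []span{{"a", "m"}}}
	narrow := command{"narrow-write", true, []span{{"c", "d"}}}
	far := command{"far-read", false, []span{{"x", "z"}}}

	fmt.Println(mustWait(narrow, wide)) // spans overlap, so narrow waits
	fmt.Println(mustWait(far, wide))    // disjoint spans, no waiting
}
```

In this toy model a command that declares only the spans it really touches no longer blocks unrelated work, which is the property the description attributes to per-command declarations.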

@bdarnell bdarnell added this to the 1.0 milestone Apr 26, 2017
@cockroach-teamcity
Member

This change is Reviewable

@petermattis
Collaborator

Did you run make stressrace on TestScatter? It was taking me 5-10 minutes to reproduce a failure.


Review status: 0 of 3 files reviewed at latest revision, 3 unresolved discussions, some commit checks pending.


pkg/roachpb/api.go, line 908 at r1 (raw file):

// by the command queue (reordering is ok) and they operate on non-MVCC data so
// the timestamp cache is also unnecessary.
func (*MergeRequest) flags() int { return isWrite | isNonKV }

Can/should we change merge request "declare-keys" to return an empty set?


pkg/roachpb/api.go, line 911 at r1 (raw file):

func (*RequestLeaseRequest) flags() int {
	return isWrite | isAlone | isNonKV | skipLeaseCheck

Is it useful to go through the command queue for RequestLeaseRequest given that the command is not necessarily being evaluated on the leaseholder? I suppose it doesn't hurt.


pkg/roachpb/api.go, line 927 at r1 (raw file):

	// `redirectOnOrAcquireLease` already tentatively redirects to the
	// future lease holder.
	return isWrite | isAlone | isNonKV | skipLeaseCheck

I'm not clear on why skipLeaseCheck is specified here. Doesn't TransferLease have to run on the leaseholder?
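
For readers following the flag discussion, here is a minimal sketch of how bit flags of this kind compose and are tested. The constant values and the `hasFlag` helper are assumptions made for illustration; the real definitions live in pkg/roachpb/api.go and may differ.

```go
package main

import "fmt"

// Hypothetical flag bits, named after the ones quoted in the review.
// The actual constants and their values are defined in pkg/roachpb/api.go.
const (
	isWrite = 1 << iota
	isAlone
	skipLeaseCheck
)

// hasFlag reports whether every bit in want is set in flags.
func hasFlag(flags, want int) bool { return flags&want == want }

func main() {
	// The quoted combination with isNonKV dropped (an assumption about
	// what the line looks like once the flag is removed).
	transferLease := isWrite | isAlone | skipLeaseCheck

	fmt.Println(hasFlag(transferLease, skipLeaseCheck))
	fmt.Println(hasFlag(isWrite, isAlone))
}
```

Because each flag occupies its own bit, removing one flag from a combination leaves the others unaffected.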


Comments from Reviewable

Contributor

@a-robinson a-robinson left a comment

LGTM. It appears to fix things for me locally.

@a-robinson
Contributor

Huh, well I haven't been able to repro the original failure, but I did hit a different panic:

E170426 15:43:13.477420 2711 util/log/stderr_redirect.go:66  [n1] a panic has occurred!
panic: span used after call to Finish():
&{tracer:<nil> event:0xccdd80 Mutex:{state:0 sema:0} raw:{Context:{TraceID:8535667686755371615 SpanID:2911314615026932369 Sampled:false Baggage:map[]} ParentSpanID:9096957308716108751 Operation:node.Batch Start:{sec:63628818192 nsec:306743425 loc:0x3315380} Duration:1137823811 Tags:map[] Logs:[]} numDroppedLogs:0} [recovered]
	panic: span used after call to Finish():
&{tracer:<nil> event:0xccdd80 Mutex:{state:0 sema:0} raw:{Context:{TraceID:8535667686755371615 SpanID:2911314615026932369 Sampled:false Baggage:map[]} ParentSpanID:9096957308716108751 Operation:node.Batch Start:{sec:63628818192 nsec:306743425 loc:0x3315380} Duration:1137823811 Tags:map[] Logs:[]} numDroppedLogs:0} [recovered]
	panic: span used after call to Finish():
&{tracer:<nil> event:0xccdd80 Mutex:{state:0 sema:0} raw:{Context:{TraceID:8535667686755371615 SpanID:2911314615026932369 Sampled:false Baggage:map[]} ParentSpanID:9096957308716108751 Operation:node.Batch Start:{sec:63628818192 nsec:306743425 loc:0x3315380} Duration:1137823811 Tags:map[] Logs:[]} numDroppedLogs:0} [recovered]
	panic: span used after call to Finish():
&{tracer:<nil> event:0xccdd80 Mutex:{state:0 sema:0} raw:{Context:{TraceID:8535667686755371615 SpanID:2911314615026932369 Sampled:false Baggage:map[]} ParentSpanID:9096957308716108751 Operation:node.Batch Start:{sec:63628818192 nsec:306743425 loc:0x3315380} Duration:1137823811 Tags:map[] Logs:[]} numDroppedLogs:0}

goroutine 2711 [running]:
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).Recover(0xc4201806e0, 0x7f01f10bc600, 0xc422d53080)
	/home/alex/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:200 +0x104
panic(0x1ef4880, 0xc42357b240)
	/home/alex/go1.8/src/runtime/panic.go:489 +0x2f0
github.com/cockroachdb/cockroach/pkg/storage.(*Store).Send.func1(0xc422e15420, 0xc422e15498, 0x14b8fcf1d74ac810, 0x0, 0xc422e15490, 0xc4208fc400)
	/home/alex/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:2425 +0x57b
panic(0x1ef4880, 0xc42357b240)
	/home/alex/go1.8/src/runtime/panic.go:489 +0x2f0
github.com/cockroachdb/cockroach/pkg/kv.(*DistSender).divideAndSendBatchToRanges.func1(0xc422e13798, 0xc422e13de0, 0xc422e13dd8)
	/home/alex/go/src/github.com/cockroachdb/cockroach/pkg/kv/dist_sender.go:684 +0x3d3
panic(0x1ef4880, 0xc42357b240)
	/home/alex/go1.8/src/runtime/panic.go:489 +0x2f0
github.com/cockroachdb/cockroach/vendor/github.com/opentracing/basictracer-go.(*spanImpl).maybeAssertSanityLocked(0xc421faa3c0)
	/home/alex/go/src/github.com/cockroachdb/cockroach/vendor/github.com/opentracing/basictracer-go/debug.go:31 +0x443
github.com/cockroachdb/cockroach/vendor/github.com/opentracing/basictracer-go.(*spanImpl).Lock(0xc421faa3c0)
	/home/alex/go/src/github.com/cockroachdb/cockroach/vendor/github.com/opentracing/basictracer-go/debug.go:25 +0x4d
github.com/cockroachdb/cockroach/vendor/github.com/opentracing/basictracer-go.(*spanImpl).LogFields(0xc421faa3c0, 0xc42341dec0, 0x1, 0x1)
	/home/alex/go/src/github.com/cockroachdb/cockroach/vendor/github.com/opentracing/basictracer-go/span.go:123 +0xf3
github.com/cockroachdb/cockroach/pkg/util/log.eventInternal(0x7f01f10bc600, 0xc421c3a0f0, 0x1e90000, 0x20c858e, 0x8, 0xc422e12a88, 0x3, 0x3)
	/home/alex/go/src/github.com/cockroachdb/cockroach/pkg/util/log/trace.go:132 +0x2cb
github.com/cockroachdb/cockroach/pkg/util/log.addStructured(0x7f01f10bc600, 0xc421c3a0f0, 0x1, 0x2, 0x210358d, 0x2a, 0xc422e12c90, 0x3, 0x3)
	/home/alex/go/src/github.com/cockroachdb/cockroach/pkg/util/log/structured.go:145 +0x2c6
github.com/cockroachdb/cockroach/pkg/util/log.logDepth(0x7f01f10bc600, 0xc421c3a0f0, 0x1, 0x1, 0x210358d, 0x2a, 0xc422e12c90, 0x3, 0x3)
	/home/alex/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:68 +0x9a
github.com/cockroachdb/cockroach/pkg/util/log.Infof(0x7f01f10bc600, 0xc421c3a0f0, 0x210358d, 0x2a, 0xc422e12c90, 0x3, 0x3)
	/home/alex/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:86 +0x90
github.com/cockroachdb/cockroach/pkg/kv.(*RangeDescriptorCache).evictCachedRangeDescriptorLocked(0xc420082be0, 0x7f01f10bc600, 0xc421c3a0f0, 0xc4237bae93, 0x0, 0xd, 0xc423468d80, 0xc422e12d00, 0x572d22, 0xc421b09580)
	/home/alex/go/src/github.com/cockroachdb/cockroach/pkg/kv/range_cache.go:452 +0x2fe
github.com/cockroachdb/cockroach/pkg/kv.(*RangeDescriptorCache).lookupRangeDescriptorInternal.func3.1(0xc42341d4e8, 0x2ecf1a8)
	/home/alex/go/src/github.com/cockroachdb/cockroach/pkg/kv/range_cache.go:300 +0xe2
github.com/cockroachdb/cockroach/pkg/kv.(*EvictionToken).EvictAndReplace.func1()
	/home/alex/go/src/github.com/cockroachdb/cockroach/pkg/kv/range_cache.go:210 +0xfc
sync.(*Once).Do(0xc42341d4c8, 0xc422e12e80)
	/home/alex/go1.8/src/sync/once.go:44 +0xe2
github.com/cockroachdb/cockroach/pkg/kv.(*EvictionToken).EvictAndReplace(0xc42341d4c0, 0x7f01f10bc600, 0xc42225c780, 0x0, 0x0, 0x0, 0x41, 0x0)
	/home/alex/go/src/github.com/cockroachdb/cockroach/pkg/kv/range_cache.go:219 +0x127
github.com/cockroachdb/cockroach/pkg/kv.(*EvictionToken).Evict(0xc42341d4c0, 0x7f01f10bc600, 0xc42225c780, 0x41, 0xc422e13140)
	/home/alex/go/src/github.com/cockroachdb/cockroach/pkg/kv/range_cache.go:196 +0x6c
github.com/cockroachdb/cockroach/pkg/kv.(*DistSender).sendPartialBatch(0xc42047c700, 0x7f01f10bc600, 0xc42225c780, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/home/alex/go/src/github.com/cockroachdb/cockroach/pkg/kv/dist_sender.go:965 +0x774
github.com/cockroachdb/cockroach/pkg/kv.(*DistSender).divideAndSendBatchToRanges(0xc42047c700, 0x7f01f10bc600, 0xc42225c780, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/home/alex/go/src/github.com/cockroachdb/cockroach/pkg/kv/dist_sender.go:814 +0xb6b
github.com/cockroachdb/cockroach/pkg/kv.(*DistSender).Send(0xc42047c700, 0x7f01f10bc600, 0xc42225c780, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/home/alex/go/src/github.com/cockroachdb/cockroach/pkg/kv/dist_sender.go:629 +0x3ca
github.com/cockroachdb/cockroach/pkg/kv.(*TxnCoordSender).Send(0xc4201f0f20, 0x7f01f10bc600, 0xc42225c6c0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/home/alex/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn_coord_sender.go:456 +0x490
github.com/cockroachdb/cockroach/pkg/internal/client.(*DB).send(0xc42000c640, 0x7f01f10bc600, 0xc42354d860, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/home/alex/go/src/github.com/cockroachdb/cockroach/pkg/internal/client/db.go:528 +0x228
github.com/cockroachdb/cockroach/pkg/internal/client.(*DB).(github.com/cockroachdb/cockroach/pkg/internal/client.send)-fm(0x7f01f10bc600, 0xc42354d860, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/home/alex/go/src/github.com/cockroachdb/cockroach/pkg/internal/client/db.go:464 +0x97
github.com/cockroachdb/cockroach/pkg/internal/client.sendAndFill(0x7f01f10bc600, 0xc42354d860, 0xc422e14830, 0xc420ef5500, 0x28, 0x1fd4340)
	/home/alex/go/src/github.com/cockroachdb/cockroach/pkg/internal/client/db.go:436 +0x13b
github.com/cockroachdb/cockroach/pkg/internal/client.(*DB).Run(0xc42000c640, 0x7f01f10bc600, 0xc42354d860, 0xc420ef5500, 0x0, 0x0)
	/home/alex/go/src/github.com/cockroachdb/cockroach/pkg/internal/client/db.go:464 +0xf8
github.com/cockroachdb/cockroach/pkg/storage.(*intentResolver).resolveIntents.func1(0xc42225c6b0, 0xc42225c690)
	/home/alex/go/src/github.com/cockroachdb/cockroach/pkg/storage/intent_resolver.go:431 +0xb3
github.com/cockroachdb/cockroach/pkg/storage.(*intentResolver).resolveIntents(0xc42055aba0, 0x7f01f10bc600, 0xc42354d860, 0xc422a47c20, 0x1, 0x1, 0x14b8fcf1d74a0000, 0x0, 0x0)
	/home/alex/go/src/github.com/cockroachdb/cockroach/pkg/storage/intent_resolver.go:442 +0x851
github.com/cockroachdb/cockroach/pkg/storage.(*intentResolver).processWriteIntentError(0xc42055aba0, 0x7f01f10bc600, 0xc42354d860, 0xc4216b33b0, 0x2ee38c0, 0xc4233b3230, 0x14b8fcf1d74ac810, 0x0, 0x100000001, 0x1, ...)
	/home/alex/go/src/github.com/cockroachdb/cockroach/pkg/storage/intent_resolver.go:93 +0x1c0
github.com/cockroachdb/cockroach/pkg/storage.(*Store).Send(0xc4208fc400, 0x7f01f10bc600, 0xc42354d860, 0x14b8fcf1d6d8baf0, 0x0, 0x100000001, 0x1, 0x1, 0x0, 0xc4229540d0, ...)
	/home/alex/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:2602 +0xed6
github.com/cockroachdb/cockroach/pkg/storage.(*Stores).Send(0xc42056e100, 0x7f01f10bc600, 0xc42354d740, 0x0, 0x0, 0x100000001, 0x1, 0x1, 0x0, 0xc4229540d0, ...)
	/home/alex/go/src/github.com/cockroachdb/cockroach/pkg/storage/stores.go:187 +0x24b
github.com/cockroachdb/cockroach/pkg/server.(*Node).batchInternal.func1(0x7f01f10bc600, 0xc42354d740, 0x0, 0x0)
	/home/alex/go/src/github.com/cockroachdb/cockroach/pkg/server/node.go:840 +0x20f
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunTaskWithErr(0xc4201806e0, 0x7f01f10bc600, 0xc422d53080, 0xc422e157d8, 0x0, 0x0)
	/home/alex/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:272 +0x14f
github.com/cockroachdb/cockroach/pkg/server.(*Node).batchInternal(0xc4201a8580, 0x7f01f10bc600, 0xc422d53080, 0xc422df4028, 0xc422d53080, 0x1, 0xc400000001)
	/home/alex/go/src/github.com/cockroachdb/cockroach/pkg/server/node.go:851 +0x1c1
github.com/cockroachdb/cockroach/pkg/server.(*Node).Batch(0xc4201a8580, 0x7f01f10bc600, 0xc422d53080, 0xc422df4028, 0x1f, 0x0, 0x0)
	/home/alex/go/src/github.com/cockroachdb/cockroach/pkg/server/node.go:868 +0xb8
github.com/cockroachdb/cockroach/pkg/kv.(*grpcTransport).SendNext.func1.1(0xc42355b540, 0xc422df4000, 0x7f01f10bc600, 0xc4233b3890, 0xc4210c1f80, 0xc58ef1, 0xc4210c1f90)
	/home/alex/go/src/github.com/cockroachdb/cockroach/pkg/kv/transport.go:252 +0x73d
github.com/cockroachdb/cockroach/pkg/kv.(*grpcTransport).SendNext.func1(0xc42355b540, 0xc422df4000, 0x7f01f10bc600, 0xc4233b3890, 0x2ee6d80, 0xc422c235e0, 0xc4230ee3c0)
	/home/alex/go/src/github.com/cockroachdb/cockroach/pkg/kv/transport.go:265 +0xdd
created by github.com/cockroachdb/cockroach/pkg/kv.(*grpcTransport).SendNext
	/home/alex/go/src/github.com/cockroachdb/cockroach/pkg/kv/transport.go:278 +0x236

@tamird
Contributor

tamird commented Apr 26, 2017

Reviewed 3 of 3 files at r1.
Review status: all files reviewed at latest revision, 6 unresolved discussions, some commit checks failed.


pkg/roachpb/api.go, line 927 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I'm not clear on why skipLeaseCheck is specified here. Doesn't TransferLease have to run on the leaseholder?

I guess this is what the above comment is trying to justify. Not that it's exactly clear to me.


pkg/storage/replica.go, line 2014 at r1 (raw file):

	}

	var endCmds *endCmds

remove this and use := below?


pkg/storage/replica.go, line 2033 at r1 (raw file):

	// pErr evaluation to its value when returning.
	defer func() {
		if endCmds != nil {

remove this nil check?


pkg/storage/replica.go, line 2182 at r1 (raw file):

	// wrapped to delay pErr evaluation to its value when returning.
	defer func() {
		if endCmds != nil {

this nil check can go away now.


Comments from Reviewable

@tamird
Contributor

tamird commented Apr 26, 2017

Review status: all files reviewed at latest revision, 7 unresolved discussions, some commit checks failed.


pkg/storage/replica.go, line 2167 at r1 (raw file):

	}

	var endCmds *endCmds

remove this.


Comments from Reviewable

@petermattis
Collaborator

@a-robinson The "span used after call to Finish" panic is likely a separate issue. I'd file a new issue and see if @RaduBerinde or @andreimatei can take a look.

@RaduBerinde
Member

I've been seeing "snapshot intersects existing range" errors during TESTING_RELOCATE (they also show up in a few open issues). Do you think this change addresses those as well?

@bdarnell
Contributor Author

Review status: 2 of 3 files reviewed at latest revision, 7 unresolved discussions, all commit checks successful.


pkg/roachpb/api.go, line 908 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Can/should we change merge request "declare-keys" to return an empty set?

No; we still need to use the command queue to manage the merge command's relationship with any concurrent range splits. We might be able to treat merge like a read instead of a write, but I don't know whether that would actually be safe.


pkg/roachpb/api.go, line 927 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

I guess this is what the above comment is trying to justify. Not that it's exactly clear to me.

Yeah, that's what the comment is about but I'm having a hard time understanding it through all the layers of history.


pkg/storage/replica.go, line 2014 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

remove this and use := below?

Done.


pkg/storage/replica.go, line 2033 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

remove this nil check?

Done.


pkg/storage/replica.go, line 2167 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

remove this.

Done.


pkg/storage/replica.go, line 2182 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

this nil check can go away now.

Not this one; we assign endCmds = nil sometimes below.
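
A minimal sketch of why such a guard is still needed when the variable can later be set to nil: a deferred closure reads the variable when the function returns, not when the `defer` statement runs. The `endCmds` type and the `run` and `finish` functions below are toy stand-ins, not the code in pkg/storage/replica.go.

```go
package main

import "fmt"

// endCmds is a toy stand-in for the type discussed in the review.
type endCmds struct{ cleanups *int }

// finish records that cleanup ran. It dereferences the receiver, so
// calling it on a nil *endCmds would panic.
func (ec *endCmds) finish() { *ec.cleanups++ }

// run mimics the pattern: the deferred closure captures the variable ec
// itself, so it observes a later "ec = nil" assignment at return time.
func run(handOff bool, cleanups *int) {
	ec := &endCmds{cleanups: cleanups}
	defer func() {
		if ec != nil { // without this guard, the handOff path would panic
			ec.finish()
		}
	}()
	if handOff {
		// Cleanup responsibility moves elsewhere in this hypothetical path.
		ec = nil
	}
}

func main() {
	n := 0
	run(false, &n)
	fmt.Println(n) // cleanup ran once
	run(true, &n)
	fmt.Println(n) // unchanged: the deferred closure saw ec == nil
}
```

When the variable is never reassigned to nil after initialization, as in the other two call sites mentioned above, the guard is dead code and can be dropped.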


Comments from Reviewable

@bdarnell
Contributor Author

@RaduBerinde I don't see why this would lead to "snapshot intersects existing range" issues.

@tamird
Contributor

tamird commented Apr 26, 2017

Reviewed 1 of 1 files at r2.
Review status: all files reviewed at latest revision, 3 unresolved discussions, all commit checks successful.


Comments from Reviewable

@bdarnell
Contributor Author

I'm going to merge this so we can start testing it. I still need to think some more about whether there are issues with RequestLease that aren't addressed by this change (and whether the confusing comment mentioned above means we're missing something with TransferLease).

@bdarnell bdarnell merged commit 05acf1e into cockroachdb:master Apr 26, 2017
@bdarnell bdarnell deleted the remove-nonkv-flag branch April 26, 2017 17:39
bdarnell added a commit to bdarnell/cockroach that referenced this pull request Apr 28, 2017
Restores the pre-cockroachdb#15355 behavior for this command. RequestLease
is unique in that it is evaluated on followers too, where the command
queue is at best meaningless, and at worst can cause hangs and
possibly deadlocks.

Fixes cockroachdb#15391
@andreimatei
Contributor

Review status: all files reviewed at latest revision, 3 unresolved discussions, all commit checks successful.


pkg/roachpb/api.go, line 927 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Yeah, that's what the comment is about but I'm having a hard time understanding it through all the layers of history.

FWIW, I think this comment is trying to say that the TransferLeaseRequest can't be checked in redirectOnOrAcquireLease, for technical reasons. But it is checked that it is evaluated on the correct node in other places.


Comments from Reviewable

@petermattis
Collaborator

Review status: all files reviewed at latest revision, 3 unresolved discussions, all commit checks successful.


pkg/roachpb/api.go, line 927 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

FWIW, I think this comment is trying to say that the TransferLeaseRequest can't be checked in redirectOnOrAcquireLease, for technical reasons. But it is checked that it is evaluated on the correct node in other places.

I agree with the TODO that it would be good to update the comment and point to the mechanism that is protecting TransferLeaseRequest.


Comments from Reviewable

@andreimatei
Contributor

Review status: all files reviewed at latest revision, 3 unresolved discussions, all commit checks successful.


pkg/roachpb/api.go, line 927 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I agree with the TODO that it would be good to update the comment and point to the mechanism that is protecting TransferLeaseRequest.

Attempting to improve this comment in #15788


Comments from Reviewable
