Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cli: TestDumpData fails with KV error #48147

Closed
petermattis opened this issue Apr 29, 2020 · 11 comments · Fixed by #48239
Closed

cli: TestDumpData fails with KV error #48147

petermattis opened this issue Apr 29, 2020 · 11 comments · Fixed by #48239
Assignees

Comments

@petermattis
Copy link
Collaborator

Failure seen on this CI build: https://teamcity.cockroachdb.com/viewLog.html?buildId=1908081&buildTypeId=Cockroach_UnitTests

I'm pretty sure this failure is unrelated to that PR. I didn't look closely at the failure. One interesting bit is this error which I believe indicates a KV programming error:

ERROR: cannot read undeclared span /NamespaceTable/30/1/0/0/"tmp"/4/1/NULL
declared:
read global: /NamespaceTable/30/1/0/0/"tmp"/4/1 at 1588162624.118438438,0
read local: /Local/RangeID/26/r/AbortSpan/"739f1b67-399a-41d3-bd85-0e39b6c2e9a9" at 0,0

Cc @tbg for triage

@petermattis petermattis added this to Incoming in KV via automation Apr 29, 2020
@tbg tbg self-assigned this Apr 29, 2020
@tbg
Copy link
Member

tbg commented Apr 29, 2020

Tried repro'ing this via https://github.com/cockroachdb/cockroach/compare/master...tbg:TestDumpData-48147?expand=1 (to avoid stressrace) but no dice so far (20 8cpu nodes):

7648 runs so far, 0 failures, over 27m20s

Going to try stressrace now.

Looking at the error above, it seems that we're evaluating a point lookup (i.e. GetRequest) to /NamespaceTable/30/1/0/0/"tmp"/4/1 but are finding that we're accessing that key's .Next().

One thing we should certainly do is dump a stack trace so it's more obvious where it's happening, I'll send a PR.

tbg added a commit to tbg/cockroach that referenced this issue Apr 29, 2020
It's going to be much easier figuring out what happened when the stack
is included.

See cockroachdb#48147.

Release note: None
craig bot pushed a commit that referenced this issue Apr 29, 2020
48161: spanset: print stack with assertion error r=petermattis,nvanbenschoten a=tbg

It's going to be much easier figuring out what happened when the stack
is included.

See #48147.

Release note: None

Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
@tbg
Copy link
Member

tbg commented Apr 29, 2020

1313 runs so far, 0 failures, over 42m35s

Going to put this on the back burner.

@petermattis
Copy link
Collaborator Author

So maybe I'm wrong and this is due to my PR as I have another failure on CI: https://teamcity.cockroachdb.com/viewLog.html?buildId=1909094&buildTypeId=Cockroach_UnitTests. Relevant portion of the error (with your stacktrace):

ERROR: cannot read undeclared span /NamespaceTable/30/1/0/0/"tmp"/4/1/NULL
declared:
read global: /NamespaceTable/30/1/0/0/"tmp"/4/1 at 1588188642.699403696,0
read local: /Local/RangeID/26/r/AbortSpan/"82db2595-e3d6-4381-91aa-f2c42e6204c8" at 0,0

stack:
goroutine 128324 [running]:
runtime/debug.Stack(0xc001b0ce70, 0xc, 0x10)
	/usr/local/go/src/runtime/debug/stack.go:24 +0xab
github.com/cockroachdb/cockroach/pkg/kv/kvserver/spanset.(*SpanSet).checkAllowed(0xc001b0ce40, 0x0, 0xc007681380, 0xd, 0x10, 0x0, 0x0, 0x0, 0xc00378bbb8, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/spanset/spanset.go:293 +0x3ad
github.com/cockroachdb/cockroach/pkg/kv/kvserver/spanset.(*SpanSet).CheckAllowedAt(0xc001b0ce40, 0x0, 0xc007681380, 0xd, 0x10, 0x0, 0x0, 0x0, 0x160a6127ec6861b0, 0xc000000000, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/spanset/spanset.go:235 +0x116
github.com/cockroachdb/cockroach/pkg/kv/kvserver/spanset.(*Iterator).SeekGE(0xc006c60b40, 0xc007681380, 0xd, 0x10, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/spanset/batch.go:74 +0x367
github.com/cockroachdb/cockroach/pkg/storage.(*pebbleMVCCScanner).iterSeek(0xc008e50b00, 0xc007681380, 0xd, 0x10, 0x0, 0x0, 0x1)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/pebble_mvcc_scanner.go:638 +0xb4
github.com/cockroachdb/cockroach/pkg/storage.(*pebbleMVCCScanner).nextKey(0xc008e50b00, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/pebble_mvcc_scanner.go:461 +0x57e
github.com/cockroachdb/cockroach/pkg/storage.(*pebbleMVCCScanner).advanceKey(0xc008e50b00, 0x7fc7d580d480)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/pebble_mvcc_scanner.go:519 +0xba
github.com/cockroachdb/cockroach/pkg/storage.(*pebbleMVCCScanner).addAndAdvance(0xc008e50b00, 0x7fc7d580d480, 0x0, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/pebble_mvcc_scanner.go:571 +0x6b
github.com/cockroachdb/cockroach/pkg/storage.(*pebbleMVCCScanner).getAndAdvance(0xc008e50b00, 0xc007681301)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/pebble_mvcc_scanner.go:276 +0x53b
github.com/cockroachdb/cockroach/pkg/storage.(*pebbleMVCCScanner).get(0xc008e50b00)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/pebble_mvcc_scanner.go:162 +0x117
github.com/cockroachdb/cockroach/pkg/storage.mvccGet(0x7021e80, 0xc007e08510, 0x70ab980, 0xc006c60b40, 0xc007681320, 0xc, 0x10, 0x160a6127ec6861b0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/mvcc.go:793 +0x2d7
github.com/cockroachdb/cockroach/pkg/storage.MVCCGet(0x7021e80, 0xc007e08510, 0x706c460, 0xc001b0cf60, 0xc007681320, 0xc, 0x10, 0x160a6127ec6861b0, 0xc000000000, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/mvcc.go:755 +0x1b8
github.com/cockroachdb/cockroach/pkg/kv/kvserver/batcheval.Get(0x7021e80, 0xc007e08510, 0x706c460, 0xc001b0cf60, 0x70bd8a0, 0xc005a05420, 0x160a6127ec6861b0, 0x0, 0x100000001, 0x1, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/batcheval/cmd_get.go:34 +0x19f
github.com/cockroachdb/cockroach/pkg/kv/kvserver.evaluateCommand(0x7021e80, 0xc007e08510, 0x0, 0x0, 0x0, 0x70a99a0, 0xc001b0cf60, 0x70bd8a0, 0xc005a05420, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_evaluate.go:473 +0x74c
github.com/cockroachdb/cockroach/pkg/kv/kvserver.evaluateBatch(0x7021e80, 0xc007e08510, 0x0, 0x0, 0x70a99a0, 0xc001b0cf60, 0x70bd8a0, 0xc005a05420, 0x0, 0xc004c5cd00, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_evaluate.go:241 +0x492
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).executeReadOnlyBatchWithServersideRefreshes(0xc0044a1800, 0x7021e80, 0xc007e08510, 0x70a99a0, 0xc001b0cf60, 0x70bd8a0, 0xc005a05420, 0xc004c5cd00, 0xc001b0ce40, 0xc006255e01, ...)

Let me see if I can reproduce locally.

@petermattis
Copy link
Collaborator Author

Yep, reproduces easily on #48145 using make testrace PKG=./cli TESTS=TestDumpData. Hrmm.

@petermattis
Copy link
Collaborator Author

And fails reliably on master with COCKROACH_STORAGE_ENGINE=pebble make testrace PKG=./pkg/cli TESTS=TestDumpData.

@petermattis
Copy link
Collaborator Author

I'm going to tag @itsbilal into this investigation as it looks like it could be a bug in pebbleMVCCScanner. Somewhat strange that this happens reliably (i.e. every run) under testrace, but not at all under non-race (i.e. stress).

@tbg
Copy link
Member

tbg commented Apr 30, 2020 via email

@itsbilal
Copy link
Member

This one is pretty easy to explain, and I confirmed it with some good ol' debug statements. It's not concerning at all.

When we Get() on a key, and the key doesn't exist (or there's a tombstone as in this case), we advanceKey before returning, with both rocksdb and pebble. But since the RocksDB iterator has specialized MVCCOps (because the MVCC scanner is in C++) while the Pebble one doesn't, the spanset iterator is able to run an additional assertion when SeekGE is called to skip over the deleted MVCC values, in the pebble case.

RocksDB is also SeekGE-ing to the same disallowed key, it's just that all of that action happens in the C++ MVCC scanner unchecked.

@petermattis
Copy link
Collaborator Author

Quick work tracking this down. Any thoughts on what should be done to fix it? Should we change pebbleMVCCScanner, or "unwrap" the iterator passed to pebbleMVCCScanner?

@itsbilal
Copy link
Member

Given an additional seek is expensive anyway, it's probably best to avoid it in pebbleMVCCScanner by short-circuiting advanceKey if we're in a Get.

We're usually wary of making changes to pebbleMVCCScanner (especially those that make it diverge from RocksDB), but this seems like a worthwhile, low-risk change to make.

@petermattis
Copy link
Collaborator Author

Given an additional seek is expensive anyway, it's probably best to avoid it in pebbleMVCCScanner by short-circuiting advanceKey if we're in a Get.

SGTM

itsbilal added a commit to itsbilal/cockroach that referenced this issue Apr 30, 2020
Previously, when doing an MVCCGet with the Pebble MVCC scanner,
we would advance keys after getting (or not getting) the value
we're looking for. In some cases this could result in a seek,
such as if there are too many revisions of that MVCC key. Seeks
are costly and at best avoided when it's ultimatey unnecessary.

Another side effect of doing this seek was that it would trip
an assertion in spanSet.Iterator that checked if all reads
were happening on allowed key spans. This caused TestDumpData
to fail on Pebble (but not on RocksDB since the RocksDB MVCC
scanner is specialized in C++ and doesn't flow through the same
assertion for seeks).

There's one additional change, stemming from a follow-up conversation
in cockroachdb#48160, around unifying logic for terminating an MVCC scan.
The getFromIntentHistory() case also calls into addAndAdvance()
now.

Fixes cockroachdb#48147.

Release note: None.
craig bot pushed a commit that referenced this issue Apr 30, 2020
48239: storage: Don't advance keys in MVCCGet with Pebble r=itsbilal a=itsbilal

Previously, when doing an MVCCGet with the Pebble MVCC scanner,
we would advance keys after getting (or not getting) the value
we're looking for. In some cases this could result in a seek,
such as if there are too many revisions of that MVCC key. Seeks
are costly and at best avoided when it's ultimatey unnecessary.

Another side effect of doing this seek was that it would trip
an assertion in spanSet.Iterator that checked if all reads
were happening on allowed key spans. This caused TestDumpData
to fail on Pebble (but not on RocksDB since the RocksDB MVCC
scanner is specialized in C++ and doesn't flow through the same
assertion for seeks).

There's one additional change, stemming from a follow-up conversation
in #48160, around unifying logic for terminating an MVCC scan.
The getFromIntentHistory() case also calls into addAndAdvance()
now.

Fixes #48147.

Release note: None.

Co-authored-by: Bilal Akhtar <bilal@cockroachlabs.com>
@craig craig bot closed this as completed in d0da8ef Apr 30, 2020
KV automation moved this from Incoming to Closed Apr 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants