fetcher: don't hold batches across boundaries #66376
Conversation
Force-pushed from fb753d6 to ae5b218.
This PR was failing tests that uncovered an inaccuracy in the way that interleaved "seek prefixes" were previously computed. Added a first commit that fixes that problem.
Reviewed 2 of 2 files at r1, 3 of 5 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis)
pkg/sql/row/fetcher.go, line 719 at r2 (raw file):
gcassert:inline
Interesting. Do we run gcassert as part of our build?
pkg/sql/row/kv_batch_fetcher.go, line 544 at r2 (raw file):
if len(f.remainingBatches) > 0 {
    batchResp, f.remainingBatches = popBatch(f.remainingBatches)
    atBatchBoundary = len(f.remainingBatches) == 0 && !f.fetchEnd
len(f.remainingBatches) == 0 doesn't seem to be a sufficient condition for this fetcher to have to issue a new BatchRequest -- there may still be responses left in f.responses.
The code in this file could use some more comments. It is doing something quite straightforward, but it took me a while to figure out that the results from a BatchResponse are being extracted in two levels -- first in each response, and then the BatchResponses in each response.
And the origSpan is actually staying the same for multiple []byte that share the same BatchResponses. Hopefully the higher-layer code doesn't interpret the span as something that is fully satisfied by a single batch it has been given. Speaking of that, I don't see any write to f.origSpan -- where is it updated? Do we have a bug here when a scan uses BATCH_RESPONSE instead of KEY_VALUES?
pkg/sql/row/kv_batch_fetcher.go, line 559 at r2 (raw file):
    batchResp, f.remainingBatches = popBatch(t.BatchResponses)
}
atBatchBoundary = len(f.responses) == 0 && len(f.remainingBatches) == 0 && !f.fetchEnd
How about a separate function with this condition -- I am assuming it will get inlined.
pkg/sql/row/kv_fetcher.go, line 105 at r2 (raw file):
f.newSpan = false
nKvs := len(f.kvs)
if nKvs != 0 {
Can you add a comment that f.kvs is not used when f.batchResponse is non-nil, and vice versa?
Nice find! I have some nits, but I'll defer to Sumeer for approval.
Reviewed 2 of 2 files at r1, 5 of 5 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @sumeerbhola)
pkg/sql/row/fetcher.go, line 55 at r2 (raw file):
// version - both must be handled by calling code.
nextBatch(ctx context.Context) (ok bool, kvs []roachpb.KeyValue, batchResponse []byte, origSpan roachpb.Span, moreKeys bool, err error)
nit: s/moreKeys/atBatchBoundary/, right?
pkg/sql/row/fetcher.go, line 719 at r2 (raw file):
Previously, sumeerbhola wrote…
gcassert:inline
Interesting. Do we run gcassert as part of our build?
It is the TestGCAssert linter.
Jordan, you'll need to include the sql/row package to run GCAssert against. Or maybe we should switch to opt-in by default for the linter? Will it be too slow?
pkg/sql/row/fetcher.go, line 722 at r2 (raw file):
//gcassert:inline
func (rf *Fetcher) setNextKV(kv roachpb.KeyValue, needsCopy bool) {
    if needsCopy {
super nit: I like having symmetry if possible; maybe refactor one of the setNextKV methods to have the structure of the other (e.g. having an if !needsCopy check here and return early).
pkg/sql/row/kv_batch_fetcher.go, line 559 at r2 (raw file):
Previously, sumeerbhola wrote…
how about a separate function with this condition -- I am assuming it will get inlined.
+1
Previously, there was a bug in the decoding methods that returned the expected "seek prefix" for an interleaved table when encountering a row from an unwanted table. The returned seek prefix was not as precise as it should have been: instead of returning the first key of the desired table, it returned a key somewhere in the middle of the found table. This did not cause any user-visible issues, since the cfetcher code is resilient to finding an incorrect interleaved table row. But it caused interleaved table fetches to be less efficient than necessary.

This commit also adds explicit capacities to the returned keys and values from the KV Fetcher, to prevent issues where callers accidentally slice past the length of a large-capacity backing slice that backs the returned kvs.

Release note: None
Force-pushed from ec081dc to 138d77e.
Alright, PTAL. Sorry for the long delay, thanks for the reviews. I think I was able to simplify this by a decent amount.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola and @yuzefovich)
pkg/sql/row/fetcher.go, line 719 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
It is the TestGCAssert linter. Jordan, you'll need to include the sql/row package to run GCAssert against. Or maybe we should switch to opt-in by default for the linter? Will it be too slow?
Done.
Yeah I think it would be too slow to turn on for all packages.
pkg/sql/row/fetcher.go, line 722 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
super nit: I like having symmetry if possible; maybe refactor one of the setNextKV methods to have the structure of the other (e.g. having an if !needsCopy check here and return early).
Done.
pkg/sql/row/kv_batch_fetcher.go, line 544 at r2 (raw file):
Previously, sumeerbhola wrote…
len(f.remainingBatches) == 0 doesn't seem to be a sufficient condition for this fetcher to have to issue a new BatchRequest -- there may still be responses left in f.responses.
The code in this file could use some more comments. It is doing something quite straightforward, but it took me a while to figure out that the results from a BatchResponse are being extracted in two levels -- first in each response, and then the BatchResponses in each response.
And the origSpan is actually staying the same for multiple []byte that share the same BatchResponses. Hopefully the higher-layer code doesn't interpret the span as something that is fully satisfied by a single batch it has been given. Speaking of that, I don't see any write to f.origSpan -- where is it updated? Do we have a bug here when a scan uses BATCH_RESPONSE instead of KEY_VALUES?
Thinking this through some more, I actually always want this function to return true, so I've removed the parameter. The purpose of this exercise was to allow higher-level code to know, given a returned roachpb.KeyValue, whether it was the final reference to a larger backing byte slice - indicating that the next call to NextKV would potentially ask for another larger backing byte slice before the caller had a chance to clear its references to the previous one.
This is always true for every byte slice batch: for every batch, the only remaining pointer to that batch once it's finished is a returned KV. So we can handle this logic completely in the layer that slices up BatchResponses into KVs one by one (the kv_fetcher.go code).
I added some comments about what is actually going on in this loop. I agree it's confusing; the recursion doesn't help, although I'm not sure of the best way to structure the loop without the recursive call at the end.
You're right about f.origSpan: we don't seem to update it properly. It actually turns out that nobody uses this return value - I deleted it altogether because, to your point, it's confusing and broken.
pkg/sql/row/kv_batch_fetcher.go, line 559 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
+1
I was able to delete this altogether.
pkg/sql/row/kv_fetcher.go, line 105 at r2 (raw file):
Previously, sumeerbhola wrote…
Can you add a comment that f.kvs is not used when f.batchResponse is non-nil and vice versa.
Done.
Previously, the cfetcher was completely blind to whether or not the keys it retrieves from the kvfetcher lie on a batch boundary. This led to the cfetcher retaining a pointer to 2 fetched batches at once during fetches, since it doesn't deep copy key or value memory out of the batches. The retained pointer is required, because the cfetcher needs to keep the bytes of the last key and value it saw for various reasons.

Now, the kvfetcher and kvbatchfetcher participate by returning whether or not the most recent returned key or batch lies on a *batch boundary*, meaning that the next invocation to the fetcher would require sending a BatchRequest to KV, getting more bytes back. If this condition is true, the cFetcher will now deep copy its last-seen KV instead of keeping a pointer into the most recent batch.

This improves memory usage by 2x for a brief moment, since before, the cFetcher would always have a live pointer to both the last batch and the next batch until it had a chance to decode the first key of the next batch. Commit 5c00812 improved this same situation by drastically cutting down the window in which the last batch was retained. But it wasn't quite enough. After this commit, there should be no moment in time when the fetcher ever retains a pointer to more than one batch at the same time.

Release note (sql change): reduce instantaneous memory usage during scans by up to 2x.
This commit reduces the number of interface calls in the hot path for fetching data by 1 per column per row.

Release note: None
Force-pushed from 138d77e to e42fe76.
Reviewed 1 of 5 files at r3, 3 of 6 files at r6, 3 of 3 files at r8, 1 of 1 files at r9.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @sumeerbhola and @yuzefovich)
TFTR! bors r+
(this is just a test, not really backporting) blathers backport 21.1
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool.
error creating merge commit from 99a96f1 to blathers/backport-release-21.1-66376: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []; you may need to manually resolve merge conflicts with the backport tool.
Backport to branch 21.1 failed. See errors above.
🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.
Build succeeded.
Updates #64906.