fetcher: don't hold batches across boundaries #66376

Previously, there was a bug in the decoding methods that returned the expected "seek prefix" for interleaved table when encountering a row from an unwanted table. The returned seek prefix was not as precise as it should have been: instead of returning the first key of the desired table, it returned a key somewhere in the middle of the found table. This did not cause any user-visible issues, since the cfetcher code is resilient to finding an incorrect interleaved table row. But, it caused interleaved table fetches to be less efficient than necessary. This commit also adds explicit capacities to the returned keys and values from the KV Fetcher, to prevent issues where callers accidentally slice past the length of a large-capacity backing slice that backs the returned kvs. Release note: None

Previously, the cfetcher was completely blind to whether or not the keys it retrieves from the kvfetcher lie on a batch boundary. This led to the cfetcher retaining a pointer to 2 fetched batches at once during fetches, since it doesn't deep copy key or value memory out of the batches. The retained pointer is required, because the cfetcher needs to keep the bytes of the last key and value it saw for various reasons. Now, the kvfetcher and kvbatchfetcher participate by returning whether or not the most recent returned key or batch lies on a *batch boundary*, meaning that the next invocation to the fetcher would require sending a BatchRequest to KV, getting more bytes back. If this condition is true, the cFetcher will now deep copy its last-seen KV instead of keeping a pointer into the most recent batch. This improves memory usage by 2x for a brief moment, since before, the cFetcher would always have a live pointer to both the last batch and the next batch until it had a chance to decode the first key of the next batch. Commit 5c00812 improved this same situation by drastically cutting down the window in which the last batch was retained. But it wasn't quite enough. After this commit, there should be no moment in time when the fetcher ever retains a pointer to more than one batch at the same time. Release note (sql change): reduce instantaneous memory usage during scans by up to 2x.

This commit reduces the number of interface calls in the hot path for fetching data by 1 per column per row. Release note: None

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fetcher: don't hold batches across boundaries #66376

fetcher: don't hold batches across boundaries #66376

Commits on Jul 2, 2021