release-20.2: colexec: partially fix memory accounting in the external sorter #61016

Merged
merged 2 commits into release-20.2 on Feb 24, 2021

Conversation

yuzefovich
Member

Backport 1/4 commits from #60284.
Backport 1/2 commits from #60593.

/cc @cockroachdb/release


colexec: remove double-counting of some memory in external sort

Previously, we would register the memory usage of the partition that is
currently being spilled twice:

  • in a standalone memory account in inputPartitioningOperator, which is
    needed to know when to start a new partition
  • in the in-memory sorter used by the external sort to sort each
    partition, during the "spooling" phase.

However, the first usage is "fake": we don't actually buffer all of the
data flowing through the input partitioning operator, so we would
effectively double-count some memory. This commit fixes that behavior by
removing the memory account altogether, because a simple variable is
sufficient to determine where the partitions' boundaries are.
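The idea can be sketched as follows. This is a simplified, hypothetical illustration — the names `inputPartitioner`, `observeBatch`, and `maxPartitionSize` are invented here and are not the actual CockroachDB code — showing how a plain counter, rather than a registered memory account, suffices to decide partition boundaries:

```go
package main

import "fmt"

// inputPartitioner is a toy sketch of tracking partition boundaries with
// a plain variable. Because the partitioner does not actually buffer the
// data flowing through it, registering these bytes with a memory account
// would double-count memory already accounted for by the in-memory sorter.
type inputPartitioner struct {
	// bytesInPartition counts data seen in the current partition;
	// it is not registered with any memory account.
	bytesInPartition int64
	// maxPartitionSize is the threshold at which a new partition starts.
	maxPartitionSize int64
}

// observeBatch records the footprint of a batch flowing through and
// reports whether the current partition should be closed.
func (p *inputPartitioner) observeBatch(batchSize int64) bool {
	p.bytesInPartition += batchSize
	if p.bytesInPartition >= p.maxPartitionSize {
		p.bytesInPartition = 0
		return true
	}
	return false
}

func main() {
	p := &inputPartitioner{maxPartitionSize: 100}
	for i, size := range []int64{40, 40, 40, 90} {
		fmt.Printf("batch %d: start new partition = %v\n", i, p.observeBatch(size))
	}
}
```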

Release note: None

colexec: register memory used by dequeued batches from partitioned queue

Previously, we forgot to perform memory accounting for the batches that
are dequeued from the partitions in the external sort (which can be
substantial when we're merging multiple partitions at once and the
tuples are wide) and in the hash-based partitioner. This is now fixed.
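A minimal sketch of the kind of accounting step that was missing; the types and names here (`memoryAccount`, `batch`, `dequeue`) are invented for illustration and stand in for CockroachDB's actual memory-monitoring infrastructure:

```go
package main

import "fmt"

// memoryAccount is a toy stand-in for a real byte-tracking account.
type memoryAccount struct{ used int64 }

func (a *memoryAccount) grow(n int64)   { a.used += n }
func (a *memoryAccount) shrink(n int64) { a.used -= n }

// batch is a hypothetical placeholder for a dequeued columnar batch.
type batch struct{ footprint int64 }

// dequeue pops a batch read back from a spilled partition and, crucially,
// registers its memory footprint with the account — the step that was
// previously forgotten for batches dequeued during merging.
func dequeue(partition []batch, acc *memoryAccount) ([]batch, batch) {
	b := partition[0]
	acc.grow(b.footprint)
	return partition[1:], b
}

func main() {
	acc := &memoryAccount{}
	partition := []batch{{footprint: 4096}, {footprint: 8192}}
	partition, _ = dequeue(partition, acc)
	partition, _ = dequeue(partition, acc)
	fmt.Printf("accounted bytes after dequeuing: %d\n", acc.used)
}
```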

Additionally, this commit retains references to some internal operators
in the external sort in order to reuse the memory under the dequeued
batches (this will be beneficial if we perform repeated merging).

Also, this commit fixes an issue with repeated re-initialization of the
disk-backed operators in the disk spiller when the latter has been reset
(the problem led to redundant allocations and prevented reuse of the
available memory).

Accounting was slightly complicated by the fact that we were using the
same allocator for all usages. That would be quite wrong because in the
merge phase we have two distinct memory usages with different
lifecycles: the memory under the dequeued batches is kept (and reused
later), whereas the memory under the output batch of the ordered
synchronizer is released. We now handle these lifecycles correctly by
using separate allocators.
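The separate-allocator idea might be sketched like this; `allocator` here is a toy stand-in invented for illustration, not the real CockroachDB allocator type:

```go
package main

import "fmt"

// allocator is a toy stand-in that tracks bytes registered against a
// memory budget; it is not the actual CockroachDB allocator.
type allocator struct {
	name string
	used int64
}

func (a *allocator) adjust(delta int64) { a.used += delta }

func main() {
	// Two allocators with distinct lifecycles, mirroring the fix described
	// above: one owns the dequeued batches, whose memory is retained and
	// reused across merges; the other owns the ordered synchronizer's
	// output batch, whose memory is released after each merge.
	mergeAlloc := &allocator{name: "merge"}
	outputAlloc := &allocator{name: "output"}

	mergeAlloc.adjust(1 << 20)  // dequeued batches: kept for reuse
	outputAlloc.adjust(1 << 16) // output batch: transient

	// After the merge completes, only the output allocator is released;
	// the dequeued batches stay accounted for because they are reused.
	outputAlloc.adjust(-(1 << 16))

	fmt.Printf("%s: %d bytes, %s: %d bytes\n",
		mergeAlloc.name, mergeAlloc.used, outputAlloc.name, outputAlloc.used)
}
```

With a single allocator, releasing the output batch's bytes would be indistinguishable from releasing the retained batches' bytes, which is why the two lifecycles need separate accounting.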

Release note (bug fix): CockroachDB previously didn't account for some
RAM used when disk-spilling operations (like sorts and hash joins) were
using the temporary storage in the vectorized execution engine. This
could result in OOM crashes, especially when the rows are large in size.


@yuzefovich
Member Author

I picked these two commits for a backport because the first one has very low risk but is a very nice cleanup (and makes backporting the second easier), and the second commit fixes an outright omission of some memory accounting.

@yuzefovich yuzefovich merged commit 32b5305 into cockroachdb:release-20.2 Feb 24, 2021
@yuzefovich yuzefovich deleted the backport20.2-60284 branch February 24, 2021 16:29