release-20.2: colexec: make unordered distinct streaming-like #57643
Backport 1/1 commits from #57579.
/cc @cockroachdb/release
Previously, when executing an unordered distinct, we would build the
whole hash table and consume the input source entirely before emitting
any output. This is suboptimal when the query has a limit: we're likely
to reach the limit long before consuming the whole input source.
This commit makes the unordered distinct more streaming-like - it builds
the hash table one batch at a time, and whenever some distinct tuples
are appended to the hash table, all of them are emitted in the output.
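
For illustration only, here is a minimal sketch of the batch-at-a-time idea. It is not the actual colexec code: the `batch`, `source`, `sliceSource`, `streamingDistinct`, and `firstN` names are hypothetical, each tuple is simplified to a single int key, and a plain Go map stands in for the vectorized hash table.

```go
package main

// batch is a hypothetical stand-in for a columnar batch. Each tuple is
// reduced to a single int key to keep the sketch short.
type batch []int

// source yields batches; a zero-length batch signals the end of input.
type source interface {
	Next() batch
}

// sliceSource replays a fixed list of batches and records how many of
// them have been pulled, so that early termination can be observed.
type sliceSource struct {
	batches []batch
	pulled  int
}

func (s *sliceSource) Next() batch {
	if s.pulled >= len(s.batches) {
		return nil
	}
	b := s.batches[s.pulled]
	s.pulled++
	return b
}

// streamingDistinct extends its set of seen keys one input batch at a
// time and emits the newly seen keys right away, instead of consuming
// the whole input first.
type streamingDistinct struct {
	input source
	seen  map[int]struct{} // assumption: a map stands in for the hash table
}

func newStreamingDistinct(input source) *streamingDistinct {
	return &streamingDistinct{input: input, seen: make(map[int]struct{})}
}

func (d *streamingDistinct) Next() batch {
	for {
		in := d.input.Next()
		if len(in) == 0 {
			return nil // input exhausted
		}
		var out batch
		for _, key := range in {
			if _, ok := d.seen[key]; !ok {
				d.seen[key] = struct{}{}
				out = append(out, key)
			}
		}
		if len(out) > 0 {
			return out // emit all distinct tuples found in this batch
		}
		// Every tuple in this batch was a duplicate; pull the next one.
	}
}

// firstN plays the role of a LIMIT: it stops pulling from op as soon
// as n tuples have been collected.
func firstN(op source, n int) []int {
	var res []int
	for len(res) < n {
		b := op.Next()
		if len(b) == 0 {
			break
		}
		for _, k := range b {
			if len(res) == n {
				break
			}
			res = append(res, k)
		}
	}
	return res
}
```

In this sketch, wrapping the batches `{1, 1, 2}`, `{2, 1}`, `{3, 4}`, `{5}` in a `sliceSource` and calling `firstN(newStreamingDistinct(src), 3)` stops after the third batch; the last batch is never read.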
Fixes: #57566.
Release note (performance improvement): Previously, when performing an
unordered DISTINCT operation via the vectorized execution engine,
CockroachDB would buffer up all tuples from the input before emitting
any output, which is suboptimal when the query has a LIMIT clause. This
has now been fixed. This behavior was introduced in 20.1. Note that the
old row-by-row engine doesn't have this issue.