colexec: make unordered distinct streaming-like #57579

yuzefovich · 2020-12-04T18:07:27Z

Previously, when executing an unordered distinct, we would build the
whole hash table and consume the input source entirely before emitting
any output. This is suboptimal when the query has a limit - we're likely
to reach the limit long before consuming the whole input source.

This commit makes the unordered distinct more streaming-like - it builds
the hash table one batch at a time, and whenever a batch appends some new
distinct tuples to the hash table, those tuples are emitted in the output
right away.
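
To illustrate the idea, here is a minimal sketch of the batch-at-a-time approach. It is not the code in this PR: `tuple`, `streamingDistinct`, `next`, and `distinctWithLimit` are hypothetical names, a Go map stands in for the vectorized hash table, and plain slices stand in for columnar batches.

```go
package main

// tuple is a hypothetical stand-in for a row; the real operator works on
// columnar batches rather than slices of structs.
type tuple struct{ a, b int }

// streamingDistinct is an illustrative sketch only, not the colexec operator.
type streamingDistinct struct {
	input [][]tuple          // input batches that have not been consumed yet
	seen  map[tuple]struct{} // plays the role of the hash table
}

func newStreamingDistinct(input [][]tuple) *streamingDistinct {
	return &streamingDistinct{input: input, seen: make(map[tuple]struct{})}
}

// next consumes the input one batch at a time and returns as soon as a batch
// contributes at least one previously unseen tuple. It returns nil once the
// input is exhausted.
func (d *streamingDistinct) next() []tuple {
	for len(d.input) > 0 {
		batch := d.input[0]
		d.input = d.input[1:]
		var out []tuple
		for _, t := range batch {
			if _, ok := d.seen[t]; !ok {
				d.seen[t] = struct{}{}
				out = append(out, t)
			}
		}
		if len(out) > 0 {
			return out
		}
	}
	return nil
}

// distinctWithLimit pulls from the operator until limit tuples are collected
// and reports how many input batches had to be read to get there.
func distinctWithLimit(input [][]tuple, limit int) (result []tuple, batchesRead int) {
	d := newStreamingDistinct(input)
	for len(result) < limit {
		out := d.next()
		if out == nil {
			break
		}
		for _, t := range out {
			if len(result) == limit {
				break
			}
			result = append(result, t)
		}
	}
	return result, len(input) - len(d.input)
}
```

With a limit that the first batch already satisfies, `distinctWithLimit` reads only that one batch, whereas building the whole hash table first would read every batch before emitting anything.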

Fixes: #57566.

Release note (performance improvement): Previously, when performing an
unordered DISTINCT operation via the vectorized execution engine,
CockroachDB would buffer up all tuples from the input, which is
suboptimal when the query has a LIMIT clause. This has now been fixed.
The behavior was introduced in 20.1. Note that the old row-by-row
engine doesn't have this issue.

cockroach-teamcity · 2020-12-04T18:07:33Z

yuzefovich · 2020-12-04T18:23:38Z

Benchmarks are here. I think the increase in allocations in some cases is because we can no longer allocate one large output batch up front; instead, the output batch size is now dynamic.

asubiotto

Reviewed 3 of 3 files at r1.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @yuzefovich)

pkg/sql/colexec/hashtable.go, line 359 at r1 (raw file):

	if ht.buildMode != hashTableDistinctBuildMode {
		colexecerror.InternalError(errors.AssertionFailedf(
			"hashTable.fullBuild is called in unexpected build mode %d", ht.buildMode,

nit: s/fullBuild/distinctBuild

yuzefovich

I decided to add a release note for this, and I think it'll be worth it to backport it to 20.2 since it is a regression between using the vec and the row engines.

TFTR!

bors r+

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto)

craig · 2020-12-07T16:35:49Z

Build succeeded:

GitHub CI (Cockroach)

yuzefovich requested review from asubiotto and a team December 4, 2020 18:07

asubiotto approved these changes Dec 7, 2020

yuzefovich force-pushed the distinct-limit branch from b3a567c to 18b429f Compare December 7, 2020 15:55

craig bot merged commit 278214f into cockroachdb:master Dec 7, 2020

yuzefovich deleted the distinct-limit branch December 7, 2020 17:04

yuzefovich mentioned this pull request Dec 7, 2020

release-20.2: colexec: make unordered distinct streaming-like #57643

Merged
