colmem: reset flat bytes vector when reused #49223

yuzefovich · 2020-05-18T22:05:02Z

Our MaybeAppendColumn reuses the same vector if a vector of the
desired type is already in the requested position. However, the flat
bytes vector needs special treatment when reused - it needs to be reset
and previously we forgot to do so. This is now fixed.
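
To illustrate why reuse without a reset is a problem, here is a minimal sketch. It assumes a heavily simplified model of a flat bytes vector (one contiguous buffer plus offsets). The names flatBytes, batch, and maybeAppendBytes, and the resetOnReuse flag, are hypothetical and exist only for this example; they are not the actual colmem or coldata API.

```go
package main

import "fmt"

// flatBytes is a hypothetical, simplified flat bytes vector: all values live
// in one contiguous buffer and offsets[i] marks where element i begins.
type flatBytes struct {
	data    []byte
	offsets []int // always one longer than the number of elements
}

func newFlatBytes() *flatBytes { return &flatBytes{offsets: []int{0}} }

// Append adds a value after the last element.
func (b *flatBytes) Append(v []byte) {
	b.data = append(b.data, v...)
	b.offsets = append(b.offsets, len(b.data))
}

// Len returns the number of elements.
func (b *flatBytes) Len() int { return len(b.offsets) - 1 }

// Get returns element i.
func (b *flatBytes) Get(i int) []byte { return b.data[b.offsets[i]:b.offsets[i+1]] }

// Reset drops all elements but keeps the allocated capacity for reuse.
func (b *flatBytes) Reset() {
	b.data = b.data[:0]
	b.offsets = b.offsets[:1]
}

// batch is a hypothetical batch that holds only bytes columns.
type batch struct{ cols []*flatBytes }

// maybeAppendBytes returns the vector at colIdx. An existing vector is
// reused; resetOnReuse (an assumption of this sketch) toggles whether the
// reused vector is reset first.
func maybeAppendBytes(b *batch, colIdx int, resetOnReuse bool) *flatBytes {
	if colIdx < len(b.cols) {
		if resetOnReuse {
			b.cols[colIdx].Reset()
		}
		return b.cols[colIdx]
	}
	if colIdx > len(b.cols) {
		panic("columns must be appended in order")
	}
	b.cols = append(b.cols, newFlatBytes())
	return b.cols[colIdx]
}

func main() {
	for _, reset := range []bool{false, true} {
		b := &batch{}
		v := maybeAppendBytes(b, 0, reset)
		v.Append([]byte("a"))
		v.Append([]byte("b"))
		// The same position is requested again, so the vector is reused.
		v = maybeAppendBytes(b, 0, reset)
		v.Append([]byte("c"))
		fmt.Printf("reset=%t len=%d first=%q\n", reset, v.Len(), v.Get(0))
	}
}
```

Without the reset, the reused vector still carries the two values from its previous use, so the new value lands behind stale data. With the reset, the vector starts empty again.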

The addition of vector resetting behavior required a change to
batchSchemaPrefixEnforcer (it is actually more like a fix) so that it
enforces only the range (a "subset") of vectors that the projecting
operator (and its internal projecting operator chains) owns. For example,
consider a scenario in which a hash joiner outputs two columns and
feeds into a case operator that has one output column and whose
internal projecting chains use two other columns. Previously, the batch
schema prefix enforcer would be "maybe appending" all five columns
although it should only care about the columns at indices 2, 3, 4 (the
ones that the case operator owns) and not pay attention to columns 0 and
1 because those are owned by the hash joiner. Now, the schema enforcer
is renamed to batchSchemaSubsetEnforcer and it correctly operates only
on the requested subset of columns. Without this change we would need to
modify the Allocator.MaybeAppendColumn signature to include a boolean that
tells whether it is ok to reset the reused vector - if we didn't do it,
the batch schema enforcer would incorrectly reset the columns populated
by the hash joiner, which would lead to incorrect results.
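
The hash joiner and case operator scenario can be sketched as follows. This is a simplified model under stated assumptions, not the actual colexec code: vec, batch, maybeAppend, and enforceSubset are hypothetical names, column types are plain strings, and the subset is given as a half-open index range.

```go
package main

import "fmt"

// vec is a hypothetical column: a type name plus its values.
type vec struct {
	typ  string
	vals []string
}

// batch is a hypothetical batch of columns.
type batch struct{ cols []*vec }

// maybeAppend appends a column of the given type at idx, or reuses the
// existing column there. In this sketch a reused column is always reset.
func maybeAppend(b *batch, typ string, idx int) {
	if idx < len(b.cols) {
		if b.cols[idx].typ != typ {
			panic("unexpected type at position " + fmt.Sprint(idx))
		}
		b.cols[idx].vals = b.cols[idx].vals[:0]
		return
	}
	if idx > len(b.cols) {
		panic("columns must be appended in order")
	}
	b.cols = append(b.cols, &vec{typ: typ})
}

// enforceSubset "maybe appends" only the columns in [start, end), that is,
// the ones owned by the projecting operator. Columns outside the range are
// left untouched.
func enforceSubset(b *batch, typs []string, start, end int) {
	for i := start; i < end; i++ {
		maybeAppend(b, typs[i], i)
	}
}

// nextBatch simulates one batch flowing through the plan: the joiner fills
// columns 0 and 1, then the enforcer runs over [start, 5).
func nextBatch(b *batch, typs []string, start int) {
	b.cols[0].vals = append(b.cols[0].vals[:0], "k1")
	b.cols[1].vals = append(b.cols[1].vals[:0], "v1")
	enforceSubset(b, typs, start, 5)
}

func main() {
	// Columns 0 and 1 belong to the hash joiner, 2 through 4 to the case
	// operator and its internal projecting chains.
	typs := []string{"int", "bytes", "bytes", "int", "bool"}
	b := &batch{}
	enforceSubset(b, typs, 0, 5) // initial allocation of all five columns

	nextBatch(b, typs, 0) // prefix behavior: every column is touched
	fmt.Println("prefix, joiner values left in column 1:", len(b.cols[1].vals))

	nextBatch(b, typs, 2) // subset behavior: only columns 2, 3, 4
	fmt.Println("subset, joiner values left in column 1:", len(b.cols[1].vals))
}
```

With the range starting at 0, the enforcer resets the joiner's freshly populated columns. With the range starting at 2, they survive, and maybeAppend needs no extra flag to say whether a reset is allowed.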

Release note (bug fix): Previously, in some cases an internal error
could occur when queries that have columns of BYTES type in the output
were executed via the vectorized engine, and this has been fixed.

jordanlewis · 2020-05-19T03:31:25Z

Is there an issue this closes?

yuzefovich · 2020-05-19T03:32:15Z

No, I encountered this bug on a CI run of the datum-backed vector PR. But the query I put in the logic test triggers an internal error on master (and probably on 20.1).

yuzefovich · 2020-05-19T03:34:01Z

What exposed the problem is the support of "unknown null" - the query in this PR is a slight modification of another query from the sqlsmith logic test file, with a single NULL return value removed.

asubiotto

Reviewed 1 of 4 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto and @yuzefovich)

pkg/sql/colexec/operator.go, line 335 at r2 (raw file):

// columns that the projecting operator and its internal projecting operators
// own.
func newBatchSchemaSubsetEnforcer(

The commit message only seems to mention the bytes reset issue. Could you add an explanation of the change from prefix to subset as well?

yuzefovich

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto)

pkg/sql/colexec/operator.go, line 335 at r2 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

The commit message only seems to mention the bytes reset issue. Could you add an explanation of the change from prefix to subset as well?

Added an extensive explanation of the reasoning to the commit and PR description.

I initially implemented the fix differently but then decided this change is cleaner and more "philosophically" correct.

yuzefovich · 2020-05-20T16:24:57Z

Friendly ping on this since it blocks the datum-backed vector PR (that PR would introduce flakiness to logic tests).

asubiotto

Reviewed 3 of 3 files at r2.
Reviewable status: complete! 1 of 0 LGTMs obtained

yuzefovich · 2020-05-21T14:55:16Z

TFTR!

bors r+

craig · 2020-05-21T16:13:07Z

Build succeeded

GitHub CI (Cockroach)

yuzefovich added the backport-20.1.x label May 18, 2020

yuzefovich requested review from asubiotto and a team May 18, 2020 22:05

yuzefovich force-pushed the fix-flat-bytes branch 3 times, most recently from 88f66b6 to 78fb5a6 Compare May 19, 2020 01:56

asubiotto suggested changes May 19, 2020

yuzefovich force-pushed the fix-flat-bytes branch from 78fb5a6 to 88bbbce Compare May 19, 2020 16:27

yuzefovich commented May 19, 2020

asubiotto approved these changes May 21, 2020

craig bot merged commit a5bfe86 into cockroachdb:master May 21, 2020

yuzefovich deleted the fix-flat-bytes branch May 21, 2020 17:04

yuzefovich mentioned this pull request May 21, 2020

release-20.1: colmem: reset flat bytes vector when reused #49384

Merged
