Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

colmem: reset flat bytes vector when reused #49223

Merged
merged 1 commit into from
May 21, 2020

Conversation

yuzefovich
Copy link
Member

@yuzefovich yuzefovich commented May 18, 2020

Our MaybeAppendColumn reuses the same vector if a vector of the
desired type is already in the requested position. However, the flat
bytes vector needs special treatment when reused - it needs to be reset
and previously we forgot to do so. This is now fixed.

The addition of vector resetting behavior required a change of
batchSchemaPrefixEnforcer (it is actually more like a fix) to make it
enforce that only a range (a "subset") of vectors that the projecting
operator (and its internal projecting operator chains) own. For example,
consider a scenario when we have a hash joiner that outputs two columns
which feeds into a case operator that has an output column and its
internal projecting chains use two other columns. Previously, the batch
schema prefix enforcer would be "maybe appending" all five columns
although it should only care about the columns at indices 2, 3, 4 (the
ones that case operator owns) and not pay attention to columns 0 and
1 because those are owned by the hash joiner. Now, the schema enforcer
is renamed to batchSchemaSubsetEnforcer and it correctly operates only
on the requested subset of columns. Without this change we would need to
modify Allocator.MaybeAppendColumn signature to include a boolean to
tell whether it is ok to reset the reused vector - if we didn't do it,
the batch schema enforcer would incorrectly reset the columns populated
by the hash joiner which would lead to incorrect results.

Release note (bug fix): Previously, in some cases an internal error
could occur when queries that have columns of BYTES type in the output
were executed via the vectorized engine, and this has been fixed.

@yuzefovich yuzefovich requested review from asubiotto and a team May 18, 2020 22:05
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@yuzefovich yuzefovich force-pushed the fix-flat-bytes branch 3 times, most recently from 88f66b6 to 78fb5a6 Compare May 19, 2020 01:56
@jordanlewis
Copy link
Member

Is there an issue this closes?

@yuzefovich
Copy link
Member Author

No, I encountered this bug on CI run of datum-backed vector PR. But the query I put in the logic test triggers an internal error on master (and probably on 20.1).

@yuzefovich
Copy link
Member Author

What exposed the problem is the support of "unknown null" - the query in this PR is slight modification of another query from sqlsmith logic test file with a single NULL return value removed.

Copy link
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 4 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto and @yuzefovich)


pkg/sql/colexec/operator.go, line 335 at r2 (raw file):

// columns that the projecting operator and its internal projecting operators
// own.
func newBatchSchemaSubsetEnforcer(

The commit message only seems to mention the bytes reset issue. Could you add an explanation of the change from prefix to subset as well?

Our `MaybeAppendColumn` reuses the same vector if a vector of the
desired type is already in the requested position. However, the flat
bytes vector needs special treatment when reused - it needs to be reset
and previously we forgot to do so. This is now fixed.

The addition of vector resetting behavior required a change of
`batchSchemaPrefixEnforcer` (it is actually more like a fix) to make it
enforce that only a range (a "subset") of vectors that the projecting
operator (and its internal projecting operator chains) own. For example,
consider a scenario when we have a hash joiner that outputs two columns
which feeds into a case operator that has an output column and its
internal projecting chains use two other columns. Previously, the batch
schema prefix enforcer would be "maybe appending" all five columns
although it should only care about the columns at indices 2, 3, 4 (the
ones that case operator owns) and not pay attention to columns 0 and
1 because those are owned by the hash joiner. Now, the schema enforcer
is renamed to `batchSchemaSubsetEnforcer` and it correctly operates only
on the requested subset of columns. Without this change we would need to
modify `Allocator.MaybeAppendColumn` signature to include a boolean to
tell whether it is ok to reset the reused vector - if we didn't do it,
the batch schema enforcer would incorrectly reset the columns populated
by the hash joiner which would lead to incorrect results.

Release note (bug fix): Previously, in some cases an internal error
could occur when queries that have columns of BYTES type in the output
were executed via the vectorized engine, and this has been fixed.
Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto)


pkg/sql/colexec/operator.go, line 335 at r2 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

The commit message only seems to mention the bytes reset issue. Could you add an explanation of the change from prefix to subset as well?

Added extensive explanation for the reasoning to the commit and PR description.

I initially implemented the fix differently but then decided this change is cleaner and more "philosophically" correct.

@yuzefovich
Copy link
Member Author

Friendly ping on this since it blocks datum-backed vector PR (since that PR would introduce flakiness to logic tests).

Copy link
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 3 of 3 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained

@yuzefovich
Copy link
Member Author

TFTR!

bors r+

@craig
Copy link
Contributor

craig bot commented May 21, 2020

Build succeeded

@craig craig bot merged commit a5bfe86 into cockroachdb:master May 21, 2020
@yuzefovich yuzefovich deleted the fix-flat-bytes branch May 21, 2020 17:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants