colmem: reset flat bytes vector when reused #49223
Is there an issue this closes?
No, I encountered this bug on a CI run of the datum-backed vector PR. But the query I put in the logic test triggers an internal error on master (and probably on 20.1).
What exposed the problem is the support of "unknown null": the query in this PR is a slight modification of another query from …
Reviewed 1 of 4 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto and @yuzefovich)
pkg/sql/colexec/operator.go, line 335 at r2 (raw file):
    // columns that the projecting operator and its internal projecting operators
    // own.
    func newBatchSchemaSubsetEnforcer(
The commit message only seems to mention the bytes reset issue. Could you add an explanation of the change from prefix to subset as well?
Our `MaybeAppendColumn` reuses the existing vector if a vector of the desired type is already in the requested position. However, a flat bytes vector needs special treatment when reused: it must be reset, and previously we forgot to do so. This is now fixed.

Adding the reset behavior required a change to `batchSchemaPrefixEnforcer` (really more of a fix) so that it enforces only the range (a "subset") of vectors that the projecting operator (and its internal projecting operator chains) owns. For example, consider a hash joiner that outputs two columns and feeds into a case operator that has one output column and whose internal projecting chains use two other columns. Previously, the batch schema prefix enforcer would "maybe append" all five columns, although it should only care about the columns at indices 2, 3, and 4 (the ones the case operator owns) and ignore columns 0 and 1, which are owned by the hash joiner. Now the schema enforcer is renamed to `batchSchemaSubsetEnforcer` and correctly operates only on the requested subset of columns.

Without this change we would need to modify the `Allocator.MaybeAppendColumn` signature to include a boolean indicating whether it is ok to reset the reused vector. Otherwise, the batch schema enforcer would incorrectly reset the columns populated by the hash joiner, which would lead to incorrect results.

Release note (bug fix): Previously, an internal error could occur in some cases when queries with BYTES columns in the output were executed via the vectorized engine. This has been fixed.
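To make the hash joiner / case operator scenario concrete, here is a minimal sketch of why resetting on reuse has to be limited to the columns an operator owns. It is a toy model, not the actual colmem or colexec code: `flatBytes`, `batch`, `maybeAppendColumn`, `enforceSubset`, and `populatedBatch` are all hypothetical names, and the real `Allocator.MaybeAppendColumn` has a different signature.

```go
package main

import "fmt"

// flatBytes is a hypothetical, simplified flat bytes vector: every value is
// stored in one shared buffer, and offsets[i] marks where value i starts.
type flatBytes struct {
	data    []byte
	offsets []int // always holds one more entry than there are values
}

func newFlatBytes() *flatBytes { return &flatBytes{offsets: []int{0}} }

// Append stores v at the end of the shared buffer.
func (b *flatBytes) Append(v []byte) {
	b.data = append(b.data, v...)
	b.offsets = append(b.offsets, len(b.data))
}

// Len returns the number of stored values.
func (b *flatBytes) Len() int { return len(b.offsets) - 1 }

// Get returns value i as a slice into the shared buffer.
func (b *flatBytes) Get(i int) []byte { return b.data[b.offsets[i]:b.offsets[i+1]] }

// Reset drops all values but keeps the allocated memory for reuse.
func (b *flatBytes) Reset() {
	b.data = b.data[:0]
	b.offsets = b.offsets[:1]
}

// batch is a hypothetical batch whose columns are all flat bytes vectors.
type batch struct{ cols []*flatBytes }

// maybeAppendColumn reuses the vector at idx if one exists, resetting it so
// that stale values do not leak into the next use; otherwise it appends one.
func maybeAppendColumn(b *batch, idx int) {
	switch {
	case idx < len(b.cols):
		b.cols[idx].Reset()
	case idx == len(b.cols):
		b.cols = append(b.cols, newFlatBytes())
	default:
		panic("column index leaves a gap in the batch")
	}
}

// enforceSubset touches only the n columns starting at start, that is, the
// columns the calling operator owns.
func enforceSubset(b *batch, start, n int) {
	for i := start; i < start+n; i++ {
		maybeAppendColumn(b, i)
	}
}

// populatedBatch builds five columns with one value each: columns 0 and 1
// play the hash joiner's output, columns 2 to 4 hold stale values left over
// from an earlier use by the case operator.
func populatedBatch() *batch {
	b := &batch{}
	for i := 0; i < 5; i++ {
		col := newFlatBytes()
		col.Append([]byte(fmt.Sprintf("col%d", i)))
		b.cols = append(b.cols, col)
	}
	return b
}

func main() {
	// Subset behavior: only the case operator's columns are reset.
	b := populatedBatch()
	enforceSubset(b, 2, 3)
	fmt.Println(b.cols[0].Len(), b.cols[1].Len(), b.cols[2].Len()) // 1 1 0

	// Prefix-style behavior: the joiner's columns are wiped as well.
	b = populatedBatch()
	enforceSubset(b, 0, 5)
	fmt.Println(b.cols[0].Len(), b.cols[1].Len(), b.cols[2].Len()) // 0 0 0
}
```

In this model, the second call in `main` corresponds to the old prefix enforcer combined with the new reset: the joiner's values in columns 0 and 1 are lost, which is the incorrect-results case the commit message describes.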
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto)
pkg/sql/colexec/operator.go, line 335 at r2 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
The commit message only seems to mention the bytes reset issue. Could you add an explanation of the change from prefix to subset as well?
Added extensive explanation for the reasoning to the commit and PR description.
I initially implemented the fix differently but then decided this change is cleaner and more "philosophically" correct.
Friendly ping on this, since it blocks the datum-backed vector PR (that PR would introduce flakiness to logic tests).
Reviewed 3 of 3 files at r2.
Reviewable status: complete! 1 of 0 LGTMs obtained
TFTR!
bors r+
Build succeeded