-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt: index accelerate <@ (contained by) expressions for array inverted indexes #61219
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really well done! Super exciting to see!
In the PR/commit message:
Fixes: #59763
This should just say "Informs #59763", since this PR doesn't cover JSON or inverted joins.
You should also add a "Release note (performance improvement)", since this PR supports index-acceleration of new types of expressions.
I think we'll probably want to wait to merge this PR until after the branch is cut on March 8th, so no need to worry about a release justification.
Reviewed 8 of 8 files at r1.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @angelazxu)
pkg/sql/logictest/testdata/logic_test/inverted_index, line 998 at r1 (raw file):
4 {1,2,3} {b,NULL,c} subtest contained_by_arrays
Nice tests! Can you add these same tests (or a subset) to one of the execbuilder
test files to show the EXPLAIN
output?
pkg/sql/opt/invertedidx/json_array.go, line 92 at r1 (raw file):
// getInvertedExprForJSONOrArrayIndex gets an inverted.Expression that // constrains a json or array index according to the given constant. func getInvertedExprForJSONOrArrayIndex(
[nit] would be good to add an explanation of the contained
parameter
pkg/sql/opt/invertedidx/json_array.go, line 193 at r1 (raw file):
var spanExpr *inverted.SpanExpression if d, ok := nonIndexParam.(tree.Datum); ok { invertedExpr := getInvertedExprForJSONOrArrayIndex(evalCtx, d, false /* contained */)
no need to make the change in this PR, but we'll want to support inverted joins with <@
in a future PR too
pkg/sql/opt/invertedidx/json_array.go, line 343 at r1 (raw file):
if right.DataType().Family() == types.ArrayFamily && j.index.Version() < descpb.EmptyArraysInInvertedIndexesVersion { if arr, ok := d.(*tree.DArray); ok && arr.Len() == 0 {
I don't think we want to check the length here. If the index doesn't include empty arrays we will return the wrong result for all arrays in the contained-by case.
pkg/sql/opt/invertedidx/json_array.go, line 352 at r1 (raw file):
} // If neither conditions are met, we cannot create an InvertedExpression.
[nit] neither conditions are -> neither condition is
pkg/sql/opt/invertedidx/json_array_test.go, line 273 at r1 (raw file):
tight: false, unique: true, remainingFilters: "a <@ '{1}'",
It would be good to add a couple more test cases here, for example combining a contained-by filter with another expression using AND or OR (see examples below)
pkg/sql/opt/memo/testdata/stats/inverted-array, line 148 at r1 (raw file):
│ ├── key: (1) │ ├── fd: (1)-->(2) │ └── inverted-filter
seems like we should be able to remove the inverted filter in this case. Probably requires setting the correct value of unique
pkg/sql/opt/memo/testdata/stats/inverted-array, line 210 at r1 (raw file):
# right. An additional filter is required. opt SELECT * FROM t WHERE a <@ '{2}' OR a <@ '{3}'
how about adding another test with a conjunction (i.e. AND)
pkg/sql/rowenc/index_encoding.go, line 830 at r1 (raw file):
return json.EncodeInvertedIndexKeys(inKey, val.(*tree.DJSON).JSON) case types.ArrayFamily: return encodeArrayInvertedIndexTableKeys(val.(*tree.DArray), inKey, version, false)
[nit] this should be false /* excludeNulls */
pkg/sql/rowenc/index_encoding.go, line 900 at r1 (raw file):
// // This function does not return keys for empty arrays or for NULL array elements // unless the version is at least descpb.EmptyArraysInInvertedIndexesVersion.
Update this comment to mention the excludeNulls
param
pkg/sql/rowenc/index_encoding.go, line 984 at r1 (raw file):
val *tree.DArray, inKey []byte, version descpb.IndexDescriptorVersion, ) (invertedExpr inverted.Expression, err error) {
It might be a good idea to check the version here and return an error if it's less than descpb.EmptyArraysInInvertedIndexesVersion
.
pkg/sql/rowenc/index_encoding.go, line 989 at r1 (raw file):
invertedExpr = inverted.ExprForSpan( inverted.MakeSingleValSpan(encoding.EncodeEmptyArray(inKey)), false, /* tight */ )
I think you're missing a spanExpr.Unique=true
here.
pkg/sql/rowenc/index_encoding.go, line 996 at r1 (raw file):
} // We always exclude nulls from the list of keys when evaluating <@.
would help to add an example here to explain why
pkg/sql/rowenc/index_encoding.go, line 1009 at r1 (raw file):
} // The inverted expression produced for <@ will never be tight.
an example would help
pkg/sql/rowenc/index_encoding_test.go, line 533 at r1 (raw file):
result string expected bool contained bool
[nit] I was scratching my head over this test for a little bit because I was expecting it to be more similar to the test above. I like the changes that you've made here (including an expected value for whether the spans should contain the keys, since they aren't tight), but it might help to make these names more self-documenting. For example, I might change "result" to "contains" or "indexedValue", and "expected" to "containsKeys" (or something like that....)
pkg/sql/rowenc/index_encoding_test.go, line 542 at r1 (raw file):
// Not all results included in the spans are contained by the value, so the // expression is never tight. Also, the expression is a union of spans, so // unique should never be true.
(unless the value is an empty array)
pkg/sql/rowenc/index_encoding_test.go, line 615 at r1 (raw file):
if actual != expected { if expected { t.Errorf("expected spans of %s to include %s but it did not", value, result)
[nit] it did not -> they did not
pkg/sql/rowenc/index_encoding_test.go, line 617 at r1 (raw file):
t.Errorf("expected spans of %s to include %s but it did not", value, result) } else { t.Errorf("expected spans of %s not to include %s but it did", value, result)
[nit] it did -> they did
pkg/sql/rowenc/index_encoding_test.go, line 621 at r1 (raw file):
} // Since the spans are never tight, an additional filter to determine if
[nit] an additional filter -> apply an additional filter
pkg/sql/rowenc/index_encoding_test.go, line 623 at r1 (raw file):
// Since the spans are never tight, an additional filter to determine if // the result is contained. isContained, err := tree.ArrayContains(&evalCtx, value.(*tree.DArray), result.(*tree.DArray))
I think you only want to execute ArrayContains
if ContainsKeys
returns true, right?
pkg/sql/rowenc/index_encoding_test.go, line 638 at r1 (raw file):
runTest(value, result, c.expected, c.contained) } }
It would be good to add some randomly generated test cases as well (similar to the Containing test). It won't catch false positives (due to the fact that the spans aren't tight), but might catch false negatives.
pkg/sql/opt/invertedidx/json_array.go, line 343 at r1 (raw file): Previously, rytaft (Rebecca Taft) wrote…
We have an |
eb5b7d6
to
ed432c1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for all the feedback!! I updated the commit message too.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @rytaft)
pkg/sql/logictest/testdata/logic_test/inverted_index, line 998 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Nice tests! Can you add these same tests (or a subset) to one of the
execbuilder
test files to show theEXPLAIN
output?
added to the inverted_index
test file in execbuilder
!
pkg/sql/opt/invertedidx/json_array.go, line 92 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
[nit] would be good to add an explanation of the
contained
parameter
Done.
pkg/sql/opt/invertedidx/json_array.go, line 193 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
no need to make the change in this PR, but we'll want to support inverted joins with
<@
in a future PR too
Okay!
pkg/sql/opt/invertedidx/json_array.go, line 343 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
We have an
index-version
directive that you can use to test this (see some examples inxform/testdata/rules/select
)
Done - oh wow, that makes sense, fixed it and added a test in the inverted-array
test file!
pkg/sql/opt/invertedidx/json_array.go, line 352 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
[nit] neither conditions are -> neither condition is
Done.
pkg/sql/opt/invertedidx/json_array_test.go, line 273 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
It would be good to add a couple more test cases here, for example combining a contained-by filter with another expression using AND or OR (see examples below)
Done.
pkg/sql/opt/memo/testdata/stats/inverted-array, line 148 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
seems like we should be able to remove the inverted filter in this case. Probably requires setting the correct value of
unique
Done - made some changes in encodeContainedArrayInvertedIndexSpans
to set unique
correctly
pkg/sql/opt/memo/testdata/stats/inverted-array, line 210 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
how about adding another test with a conjunction (i.e. AND)
Done.
pkg/sql/rowenc/index_encoding.go, line 830 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
[nit] this should be
false /* excludeNulls */
Done.
pkg/sql/rowenc/index_encoding.go, line 900 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Update this comment to mention the
excludeNulls
param
Done.
pkg/sql/rowenc/index_encoding.go, line 984 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
It might be a good idea to check the version here and return an error if it's less than
descpb.EmptyArraysInInvertedIndexesVersion
.
Done.
pkg/sql/rowenc/index_encoding.go, line 989 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
I think you're missing a
spanExpr.Unique=true
here.
Done.
pkg/sql/rowenc/index_encoding.go, line 996 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
would help to add an example here to explain why
Done.
pkg/sql/rowenc/index_encoding.go, line 1009 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
an example would help
Done.
pkg/sql/rowenc/index_encoding_test.go, line 533 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
[nit] I was scratching my head over this test for a little bit because I was expecting it to be more similar to the test above. I like the changes that you've made here (including an expected value for whether the spans should contain the keys, since they aren't tight), but it might help to make these names more self-documenting. For example, I might change "result" to "contains" or "indexedValue", and "expected" to "containsKeys" (or something like that....)
Done - also added unique
pkg/sql/rowenc/index_encoding_test.go, line 542 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
(unless the value is an empty array)
Done - I rephrased it as "unless the value produces a single empty array span" because a value like ARRAY[NULL] or ARRAY[NULL, NULL] would produce the same thing (since nulls are excluded).
pkg/sql/rowenc/index_encoding_test.go, line 615 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
[nit] it did not -> they did not
Done.
pkg/sql/rowenc/index_encoding_test.go, line 617 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
[nit] it did -> they did
Done.
pkg/sql/rowenc/index_encoding_test.go, line 621 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
[nit] an additional filter -> apply an additional filter
Done.
pkg/sql/rowenc/index_encoding_test.go, line 623 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
I think you only want to execute
ArrayContains
ifContainsKeys
returns true, right?
Done - yes that's right, fixed it!
pkg/sql/rowenc/index_encoding_test.go, line 638 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
It would be good to add some randomly generated test cases as well (similar to the Containing test). It won't catch false positives (due to the fact that the spans aren't tight), but might catch false negatives.
Added! I made it so that the randomly generated tests will only run if containsKeys=true
so we just test for false negatives.
ed432c1
to
d873471
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Expressions using the <@ (contained by) and @> (contains) operators now support index-acceleration with the indexed column is on either side of the expression.
[nit] I'd say "Some additional expressions with ARRAY columns using the...."
(Don't forget to update the PR message to match the commit message -- you'll need to do that from the GitHub UI)
Reviewed 6 of 6 files at r2.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @angelazxu)
pkg/sql/opt/invertedidx/json_array.go, line 98 at r2 (raw file):
// The contained parameter indicates that we are checking if the indexed // column is contained by a constant. This is handled differently than // checking if the indexed column contains a constant.
[nit] I'd make this part of the function comment, and I'd explicitly reference the operators (<@ and @>) to make this clearer. Maybe renaming the parameter to containedBy
would also help. Instead of saying they are "handled differently", I'd give a few more details. Something like, "contains results in a span expression representing the intersection of all paths through the JSON or Array, while containedBy results in a span expression representing the union of all paths through the JSON or Array".
pkg/sql/opt/invertedidx/json_array_test.go, line 344 at r2 (raw file):
}, { // We can constrain the JSON index when the functions are AND-ed.
[nit] mention that this will work once JSON is supported for <@
pkg/sql/opt/memo/testdata/stats/inverted-array, line 242 at r2 (raw file):
# the right. An additional filter is required. opt SELECT * FROM t WHERE a <@ '{2}' OR a <@ '{3}'
I think you meant to make this AND? (Also "conjunction" would be a bit more correct than "intersection" here)
pkg/sql/opt/memo/testdata/stats/inverted-array, line 278 at r2 (raw file):
└── (ARRAY[2] @> a:2) OR (ARRAY[3] @> a:2) [type=bool, outer=(2), immutable] # The inverted index will not be used when the version does not have empty
It would be better to put this test case in xform/testdata/rules/select
(and also add a few more tests for <@ there), since that's where the all the tests for the GenerateInvertedIndexScans
rule live. This test file is for testing that the stats are calculated correctly for inverted array indexes, so it doesn't make much sense to include this test here.
pkg/sql/rowenc/index_encoding.go, line 888 at r2 (raw file):
datum := tree.UnwrapDatum(evalCtx, val) typ := val.ResolvedType().Family() if typ == types.ArrayFamily {
[nit] change this to a switch
pkg/sql/rowenc/index_encoding.go, line 905 at r2 (raw file):
// unless the version is at least descpb.EmptyArraysInInvertedIndexesVersion. // It does not return keys for NULL array elements when a <@ (contained by) // expression is being evaluated. This is indicated by excludeNulls.
[nit] I would phrase this differently, because this function doesn't doesn't know that a contained by expression is being evaluated. It only knows that excludeNulls
is set. Maybe change this to: "It also does not return keys for NULL array elements if excludeNulls is true. This option is used by encodeContainedArrayInvertedIndexSpans
, which builds index spans to evaluate <@ (contained by) expressions."
pkg/sql/rowenc/index_encoding.go, line 1023 at r2 (raw file):
// inverted expression would scan for all elements in x that contain the // empty array or ARRAY[1]. The resulting elements could contain other values // and would need to be passed through an additional filter.
[nit] the word "elements" is a bit confusing since it's not clear whether it means entire arrays or elements of the array. I'd change it to say "arrays" instead. You might also give one additional example at the end to make it really clear, like: "For example, ARRRAY[1, 2, 3] would be returned by the scan, but it should be filtered out since ARRRAY[1, 2, 3] <@ ARRAY[1] is false"
pkg/sql/rowenc/index_encoding_test.go, line 542 at r1 (raw file):
Previously, angelazxu (Angela Xu) wrote…
Done - I rephrased it as "unless the value produces a single empty array span" because a value like ARRAY[NULL] or ARRAY[NULL, NULL] would produce the same thing (since nulls are excluded).
Nice!
pkg/sql/rowenc/index_encoding_test.go, line 638 at r1 (raw file):
Previously, angelazxu (Angela Xu) wrote…
Added! I made it so that the randomly generated tests will only run if
containsKeys=true
so we just test for false negatives.
Good idea!
pkg/sql/rowenc/index_encoding_test.go, line 409 at r2 (raw file):
} func TestEncodeContainingArrayInvertedIndexSpans(t *testing.T) {
If you don't mind, it would be great to update this test case to match the new one you're adding, so people reading both can easily see the parallels. I think that corresponds to changing value
here to indexedValue
, and contains
to value
. I'd also change the order of value
and indexedValue
in your test case so that indexedValue
is first, since that matches the order of operands (e.g. indexedValue @> value
and indexedValue <@ value
) (although it doesn't really matter -- just be consistent across the two tests). So the only difference will be that your test case also includes a containsKeys
boolean, which is useful since the <@ spans are not tight.
pkg/sql/rowenc/index_encoding_test.go, line 533 at r2 (raw file):
indexedValue string expected bool containsKeys bool
I think you want these two booleans to be swapped: containsKeys
should always be true for the first block below, and expected
should be the expected result of evaluating indexedValue <@ value
pkg/sql/rowenc/index_encoding_test.go, line 539 at r2 (raw file):
// This test uses EncodeInvertedIndexTableKeys and EncodeContainedInvertedIndexSpans // to determine if the spans produced from the given array value will correctly // include and exclude given results. It then checks if result <@ value.
need to update this comment to refer to the new names
pkg/sql/rowenc/index_encoding_test.go, line 633 at r2 (raw file):
isContained, err := tree.ArrayContains(&evalCtx, value.(*tree.DArray), indexedValue.(*tree.DArray)) require.NoError(t, err) if bool(*isContained) != containsKeys {
I think you should declare isContained
outside of the if expected
block, since we still want to perform this check even if expected
is false. (although I think you'll be changing expected
to containsKeys
...)
pkg/sql/rowenc/index_encoding_test.go, line 659 at r2 (raw file):
// We cannot check for false positives with these tests (due to the fact that // the spans are not tight), so we will only test for false negatives. containsKeys, err := tree.ArrayContains(&evalCtx, left.(*tree.DArray), right.(*tree.DArray))
Similar to above, I'd name this something other than containsKeys
. contains
, isContained
, or expected
would work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @angelazxu)
pkg/sql/rowenc/index_encoding_test.go, line 659 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Similar to above, I'd name this something other than
containsKeys
.contains
,isContained
, orexpected
would work.
Also -- isn't this evaluating left @> right
, not left <@ right
? I think this is only working right now since right
is getting passed as the indexedValue
in runTest
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @angelazxu and @rytaft)
pkg/sql/opt/invertedidx/json_array.go, line 98 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
[nit] I'd make this part of the function comment, and I'd explicitly reference the operators (<@ and @>) to make this clearer. Maybe renaming the parameter to
containedBy
would also help. Instead of saying they are "handled differently", I'd give a few more details. Something like, "contains results in a span expression representing the intersection of all paths through the JSON or Array, while containedBy results in a span expression representing the union of all paths through the JSON or Array".
I think we should consider removing this wrapper and simplifying the rowenc APIs instead. AFAICT this is the only non-test code that calls these functions, so there is no reason why they need to be more complicated that this wrapper:
- do we need to pass
b
? I don't think we ever pass anything non-nil here - do we need to pass the version? It's always
EmptyArraysInInvertedIndexesVersion
except for test code that specifically tests the other version
I think it would then be more clear if we just use the rowenc
functions instead, even if it means we have to add an if err != nil { panic(err) }
every time. If we still want a wrapper, I'd make separate wrappers instead of a boolean argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @angelazxu and @rytaft)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @rytaft)
pkg/sql/opt/invertedidx/json_array.go, line 98 at r2 (raw file):
Previously, RaduBerinde wrote…
I think we should consider removing this wrapper and simplifying the rowenc APIs instead. AFAICT this is the only non-test code that calls these functions, so there is no reason why they need to be more complicated that this wrapper:
- do we need to pass
b
? I don't think we ever pass anything non-nil here- do we need to pass the version? It's always
EmptyArraysInInvertedIndexesVersion
except for test code that specifically tests the other versionI think it would then be more clear if we just use the
rowenc
functions instead, even if it means we have to add anif err != nil { panic(err) }
every time. If we still want a wrapper, I'd make separate wrappers instead of a boolean argument.
I ended up getting rid of contained
and making separate wrappers. The rowenc
functions do show up in quite a few places so I think having the wrapper makes it look cleaner. I did get rid of b
and version
though - now I just declare var inKey [byte]
directly in the encoding spans functions, and I pass in EmptyArraysInInvertedIndexesVersion
to encodeArrayInvertedIndexTableKeys
(the only place it is needed).
pkg/sql/opt/invertedidx/json_array_test.go, line 344 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
[nit] mention that this will work once JSON is supported for <@
Done.
pkg/sql/opt/memo/testdata/stats/inverted-array, line 242 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
I think you meant to make this AND? (Also "conjunction" would be a bit more correct than "intersection" here)
Fixed, oops!!
pkg/sql/opt/memo/testdata/stats/inverted-array, line 278 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
It would be better to put this test case in
xform/testdata/rules/select
(and also add a few more tests for <@ there), since that's where the all the tests for theGenerateInvertedIndexScans
rule live. This test file is for testing that the stats are calculated correctly for inverted array indexes, so it doesn't make much sense to include this test here.
Done.
pkg/sql/rowenc/index_encoding.go, line 888 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
[nit] change this to a switch
Done.
pkg/sql/rowenc/index_encoding.go, line 905 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
[nit] I would phrase this differently, because this function doesn't doesn't know that a contained by expression is being evaluated. It only knows that
excludeNulls
is set. Maybe change this to: "It also does not return keys for NULL array elements if excludeNulls is true. This option is used byencodeContainedArrayInvertedIndexSpans
, which builds index spans to evaluate <@ (contained by) expressions."
Done.
pkg/sql/rowenc/index_encoding.go, line 1023 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
[nit] the word "elements" is a bit confusing since it's not clear whether it means entire arrays or elements of the array. I'd change it to say "arrays" instead. You might also give one additional example at the end to make it really clear, like: "For example, ARRRAY[1, 2, 3] would be returned by the scan, but it should be filtered out since ARRRAY[1, 2, 3] <@ ARRAY[1] is false"
Done.
pkg/sql/rowenc/index_encoding_test.go, line 409 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
If you don't mind, it would be great to update this test case to match the new one you're adding, so people reading both can easily see the parallels. I think that corresponds to changing
value
here toindexedValue
, andcontains
tovalue
. I'd also change the order ofvalue
andindexedValue
in your test case so thatindexedValue
is first, since that matches the order of operands (e.g.indexedValue @> value
andindexedValue <@ value
) (although it doesn't really matter -- just be consistent across the two tests). So the only difference will be that your test case also includes acontainsKeys
boolean, which is useful since the <@ spans are not tight.
Done!
pkg/sql/rowenc/index_encoding_test.go, line 533 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
I think you want these two booleans to be swapped:
containsKeys
should always be true for the first block below, andexpected
should be the expected result of evaluatingindexedValue <@ value
Done!
pkg/sql/rowenc/index_encoding_test.go, line 539 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
need to update this comment to refer to the new names
Done.
pkg/sql/rowenc/index_encoding_test.go, line 633 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
I think you should declare
isContained
outside of theif expected
block, since we still want to perform this check even ifexpected
is false. (although I think you'll be changingexpected
tocontainsKeys
...)
Ohh I see what you mean. When I put it inside the if expected
block, it was to mimic the logic where we are applying an additional filter only on the arrays contained by the spans. So if the array was excluded from the spans (aka containsKeys
was false), the filter wouldn't be applied to it in the first place (since it wouldn't have been included in the results). In other words, whenever containsKeys
is false, expected
is always false because the previous step will have already removed it as a possibility. I agree we should be checking it regardless though so I've changed it!
pkg/sql/rowenc/index_encoding_test.go, line 659 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Also -- isn't this evaluating
left @> right
, notleft <@ right
? I think this is only working right now sinceright
is getting passed as theindexedValue
inrunTest
.
Yeah you're right! I reordered runtest
parameters to be indexedValue, value
now and also passed in right, left
as haystack, needles
in ArrayContains
, so now we are evaluating left <@ right
:) Also renamed everything so that isContained
matches containsKeys
, and actual
matches expected
when we are checking the results.
d873471
to
ce059ee
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @angelazxu and @rytaft)
pkg/sql/opt/invertedidx/json_array.go, line 98 at r2 (raw file):
Previously, angelazxu (Angela Xu) wrote…
I ended up getting rid of
contained
and making separate wrappers. Therowenc
functions do show up in quite a few places so I think having the wrapper makes it look cleaner. I did get rid ofb
andversion
though - now I just declarevar inKey [byte]
directly in the encoding spans functions, and I pass inEmptyArraysInInvertedIndexesVersion
toencodeArrayInvertedIndexTableKeys
(the only place it is needed).
Thanks, this is much better!
pkg/sql/opt/invertedidx/json_array.go, line 94 at r3 (raw file):
// This results in a span expression representing the intersection of all paths // through the JSON or Array. func getInvertedExprForJSONOrArrayIndex(
[nit] add ForContaining
pkg/sql/opt/invertedidx/json_array.go, line 109 at r3 (raw file):
// through the JSON or Array. This function is only used when checking if an // indexed column is contained by (<@) a constant. func getInvertedExprForJSONOrArrayIndexForContained(
[nit] ForContainedBy (to match the ComparisonOperator
).
pkg/sql/rowenc/index_encoding.go, line 854 at r3 (raw file):
return nil, nil } var inKey []byte
[nit] you can also directly pass nil /* inKey */
pkg/sql/rowenc/index_encoding.go, line 862 at r3 (raw file):
return encodeContainingArrayInvertedIndexSpans(val.(*tree.DArray), inKey) } return nil, errors.AssertionFailedf(
[nit] move to default:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking great! Just a few remaining nits
Reviewed 6 of 6 files at r3.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @angelazxu and @RaduBerinde)
pkg/sql/opt/invertedidx/json_array.go, line 94 at r3 (raw file):
Previously, RaduBerinde wrote…
[nit] add ForContaining
Also mention @> in the comment (like you did below for the <@ function)
pkg/sql/opt/memo/testdata/stats/inverted-array, line 242 at r2 (raw file):
Previously, angelazxu (Angela Xu) wrote…
Fixed, oops!!
Looks good -- you'll probably need to use TESTFLAGS='-rewrite=true'
to fix the output
pkg/sql/opt/xform/testdata/rules/select, line 2606 at r3 (raw file):
# The inverted index will not be used when the version does not have empty # arrays in the inverted index.
[nit] mention in the comment that we can never use the inverted index for <@ expressions if the index version does not have empty arrays
pkg/sql/rowenc/index_encoding.go, line 847 at r3 (raw file):
// comments in the SpanExpression definition for details. // // The input inKey is prefixed to the keys in all returned spans.
you can remove this last bit from the comment since you've removed the inKey
param
pkg/sql/rowenc/index_encoding.go, line 880 at r3 (raw file):
// SpanExpression definition for details. // // The input inKey is prefixed to the keys in all returned spans.
you can remove this last bit from the comment since you've removed the inKey
param
pkg/sql/rowenc/index_encoding.go, line 889 at r3 (raw file):
var inKey []byte datum := tree.UnwrapDatum(evalCtx, val) switch typ := val.ResolvedType().Family(); typ {
[nit] are you using typ
? If not, you can just switch on val.ResolvedType().Family()
pkg/sql/rowenc/index_encoding_test.go, line 538 at r3 (raw file):
// This test uses EncodeInvertedIndexTableKeys and EncodeContainedInvertedIndexSpans // to determine if the spans produced from the first Array value will // correctly include or exclude the second value, indicated by
I think the spans are actually produced from the second value, so I think you want to swap these
(first <-> second)
pkg/sql/rowenc/index_encoding_test.go, line 588 at r3 (raw file):
} runTest := func(indexedValue, value tree.Datum, containsKeys, expected, unique bool) {
[nit] I'd change containsKeys
to expectContainsKeys
and unique
to expectUnique
pkg/sql/rowenc/index_encoding_test.go, line 618 at r3 (raw file):
require.NoError(t, err) if isContained != containsKeys {
[nit] If you change containsKeys
to expectContainsKeys
, then you can change isContained
to containsKeys
pkg/sql/rowenc/index_encoding_test.go, line 633 at r3 (raw file):
if expected { t.Errorf("expected %s to be contained by %s but it was not", indexedValue, value) }
I don't think t.Errorf
stops execution, so you probably need an else
here
ce059ee
to
1669563
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @rytaft)
pkg/sql/opt/xform/testdata/rules/select, line 2606 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
[nit] mention in the comment that we can never use the inverted index for <@ expressions if the index version does not have empty arrays
Done.
pkg/sql/rowenc/index_encoding.go, line 847 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
you can remove this last bit from the comment since you've removed the
inKey
param
Done.
pkg/sql/rowenc/index_encoding.go, line 880 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
you can remove this last bit from the comment since you've removed the
inKey
param
Done.
pkg/sql/rowenc/index_encoding.go, line 889 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
[nit] are you using
typ
? If not, you can just switch onval.ResolvedType().Family()
Done.
pkg/sql/rowenc/index_encoding_test.go, line 538 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
I think the spans are actually produced from the second value, so I think you want to swap these
(first <-> second)
Done.
pkg/sql/rowenc/index_encoding_test.go, line 588 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
[nit] I'd change
containsKeys
toexpectContainsKeys
andunique
toexpectUnique
Done.
pkg/sql/rowenc/index_encoding_test.go, line 618 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
[nit] If you change
containsKeys
toexpectContainsKeys
, then you can changeisContained
tocontainsKeys
Done.
pkg/sql/rowenc/index_encoding_test.go, line 633 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
I don't think
t.Errorf
stops execution, so you probably need anelse
here
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just realized I forgot to change the names for getInvertedExprForJSONOrArrayIndexForContaining/ContainedBy in the comments, will change that in a bit!!
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @rytaft)
1669563
to
b8d3023
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 5 files at r4.
Reviewable status:complete! 1 of 0 LGTMs obtained
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 8 files at r1, 1 of 6 files at r2, 1 of 6 files at r3, 5 of 5 files at r4.
Reviewable status:complete! 2 of 0 LGTMs obtained (waiting on @angelazxu)
pkg/sql/opt/invertedidx/json_array.go, line 344 at r4 (raw file):
// When the second argument is a variable or expression corresponding to // the index column and the first argument is a constant, we get the // equivalent InvertedExpression for right <@ left.
[nit] It might be clearer to put this comment inside the else if
block below, since that's what it is describing. If you're concerned about consistency with the comment regarding the left @> right
case above, you could move that into the if
block too.
pkg/sql/opt/invertedidx/json_array.go, line 353 at r4 (raw file):
return inverted.NonInvertedColExpression{} } return getInvertedExprForJSONOrArrayIndexForContainedBy(evalCtx, d)
[nit] you might be able to clean up some of the duplicated code in these two paths. Something like:
var indexColumn opt.ScalarExpr
var constantVal opt.ScalarExpr
if isIndexColumn(j.tabID, j.index, left, j.computedColumns && memo.CanExtractConstDatum(right) {
indexColumn = left
constantVal = right
} else if isIndexColumn(j.tabID, j.index, right, j.computedColumns && memo.CanExtractConstDatum(left) {
indexColumn = right
constantVal = left
}
// rest of the logic on indexColumn and constantVal
pkg/sql/rowenc/index_encoding.go, line 865 at r4 (raw file):
} // EncodeContainedInvertedIndexSpans takes in a key prefix and returns the
I think this needs to be updated - there's no key prefix argument.
pkg/sql/rowenc/index_encoding.go, line 1020 at r4 (raw file):
// and would need to be passed through an additional filter. For example, // ARRAY[1, 2, 3] would be returned by the scan, but it should be filtered // out since ARRAY[1, 2, 3] <@ ARRAY[1] is false.
Great explanation and example! 💯
b8d3023
to
e9a2a7b
Compare
Previously, we did not support index acceleration when checking if an indexed column is <@ (contained by) a constant, or in other words, when the indexed column is on the right side of a @> (contains) expression. We already perform index acceleration for @> (contains) expressions where an indexed JSON or Array column is on the left side of the expression. This change adds support for using the inverted index with <@ expressions on Array columns. When there is an inverted index available, a scan will be done on the Array column using the spans found from the constant value. An additional filter will then be applied, as the span expression will never be tight. Support for JSON columns will be added later. Informs: cockroachdb#59763 Release note (performance improvement): Some additional expressions using the <@ (contained by) and @> (contains) operators now support index-acceleration with the indexed column on either side of the expression.
e9a2a7b
to
7c5154b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r5.
Reviewable status:complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @angelazxu)
bors r+ |
👎 Rejected by label |
bors r+ |
Build succeeded: |
Previously, we did not support index acceleration when checking if an indexed
column is <@ (contained by) a constant, or in other words, when the indexed
column is on the right side of a @> (contains) expression. We already perform
index acceleration for @> (contains) expressions where an indexed JSON or Array
column is on the left side of the expression.
This change adds support for using the inverted index with <@ expressions on
Array columns. When there is an inverted index available, a scan will be done on
the Array column using the spans found from the constant value. An additional
filter will then be applied, as the span expression will never be tight.
Support for JSON columns will be added later.
Informs: #59763
Release note (performance improvement): Some additional expressions using the <@ (contained by) and @> (contains) operators now support index-acceleration with the indexed column on either side of the expression.