opt: index accelerate <@ (contained by) expressions for array inverted indexes #61219

angelazxu · 2021-02-27T02:24:17Z

Previously, we did not support index acceleration when checking if an indexed
column is <@ (contained by) a constant, or in other words, when the indexed
column is on the right side of a @> (contains) expression. We already perform
index acceleration for @> (contains) expressions where an indexed JSON or Array
column is on the left side of the expression.

This change adds support for using the inverted index with <@ expressions on
Array columns. When there is an inverted index available, a scan will be done on
the Array column using the spans found from the constant value. An additional
filter will then be applied, as the span expression will never be tight.
Support for JSON columns will be added later.

Informs: #59763

Release note (performance improvement): Some additional expressions using the <@ (contained by) and @> (contains) operators now support index-acceleration with the indexed column on either side of the expression.

cockroach-teamcity · 2021-02-27T02:24:24Z

This change is

rytaft

This is really well done! Super exciting to see!

In the PR/commit message:

Fixes: #59763

This should just say "Informs #59763", since this PR doesn't cover JSON or inverted joins.

You should also add a "Release note (performance improvement)", since this PR supports index-acceleration of new types of expressions.

I think we'll probably want to wait to merge this PR until after the branch is cut on March 8th, so no need to worry about a release justification.

Reviewed 8 of 8 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @angelazxu)

pkg/sql/logictest/testdata/logic_test/inverted_index, line 998 at r1 (raw file):

4  {1,2,3}  {b,NULL,c}

subtest contained_by_arrays

Nice tests! Can you add these same tests (or a subset) to one of the execbuilder test files to show the EXPLAIN output?

pkg/sql/opt/invertedidx/json_array.go, line 92 at r1 (raw file):

// getInvertedExprForJSONOrArrayIndex gets an inverted.Expression that
// constrains a json or array index according to the given constant.
func getInvertedExprForJSONOrArrayIndex(

[nit] would be good to add an explanation of the contained parameter

pkg/sql/opt/invertedidx/json_array.go, line 193 at r1 (raw file):

			var spanExpr *inverted.SpanExpression
			if d, ok := nonIndexParam.(tree.Datum); ok {
				invertedExpr := getInvertedExprForJSONOrArrayIndex(evalCtx, d, false /* contained */)

no need to make the change in this PR, but we'll want to support inverted joins with <@ in a future PR too

pkg/sql/opt/invertedidx/json_array.go, line 343 at r1 (raw file):

		if right.DataType().Family() == types.ArrayFamily &&
			j.index.Version() < descpb.EmptyArraysInInvertedIndexesVersion {
			if arr, ok := d.(*tree.DArray); ok && arr.Len() == 0 {

I don't think we want to check the length here. If the index doesn't include empty arrays we will return the wrong result for all arrays in the contained-by case.

pkg/sql/opt/invertedidx/json_array.go, line 352 at r1 (raw file):

	}

	// If neither conditions are met, we cannot create an InvertedExpression.

[nit] neither conditions are -> neither condition is

pkg/sql/opt/invertedidx/json_array_test.go, line 273 at r1 (raw file):

			tight:            false,
			unique:           true,
			remainingFilters: "a <@ '{1}'",

It would be good to add a couple more test cases here, for example combining a contained-by filter with another expression using AND or OR (see examples below)

pkg/sql/opt/memo/testdata/stats/inverted-array, line 148 at r1 (raw file):

 │    ├── key: (1)
 │    ├── fd: (1)-->(2)
 │    └── inverted-filter

seems like we should be able to remove the inverted filter in this case. Probably requires setting the correct value of unique

pkg/sql/opt/memo/testdata/stats/inverted-array, line 210 at r1 (raw file):

# right. An additional filter is required.
opt
SELECT * FROM t WHERE a <@ '{2}' OR a <@ '{3}'

how about adding another test with a conjunction (i.e. AND)

pkg/sql/rowenc/index_encoding.go, line 830 at r1 (raw file):

		return json.EncodeInvertedIndexKeys(inKey, val.(*tree.DJSON).JSON)
	case types.ArrayFamily:
		return encodeArrayInvertedIndexTableKeys(val.(*tree.DArray), inKey, version, false)

[nit] this should be false /* excludeNulls */

pkg/sql/rowenc/index_encoding.go, line 900 at r1 (raw file):

//
// This function does not return keys for empty arrays or for NULL array elements
// unless the version is at least descpb.EmptyArraysInInvertedIndexesVersion.

Update this comment to mention the excludeNulls param

pkg/sql/rowenc/index_encoding.go, line 984 at r1 (raw file):

	val *tree.DArray, inKey []byte, version descpb.IndexDescriptorVersion,
) (invertedExpr inverted.Expression, err error) {

It might be a good idea to check the version here and return an error if it's less than descpb.EmptyArraysInInvertedIndexesVersion.

pkg/sql/rowenc/index_encoding.go, line 989 at r1 (raw file):

	invertedExpr = inverted.ExprForSpan(
		inverted.MakeSingleValSpan(encoding.EncodeEmptyArray(inKey)), false, /* tight */
	)

I think you're missing a spanExpr.Unique=true here.

pkg/sql/rowenc/index_encoding.go, line 996 at r1 (raw file):

	}

	// We always exclude nulls from the list of keys when evaluating <@.

would help to add an example here to explain why

pkg/sql/rowenc/index_encoding.go, line 1009 at r1 (raw file):

	}

	// The inverted expression produced for <@ will never be tight.

an example would help

pkg/sql/rowenc/index_encoding_test.go, line 533 at r1 (raw file):

		result    string
		expected  bool
		contained bool

[nit] I was scratching my head over this test for a little bit because I was expecting it to be more similar to the test above. I like the changes that you've made here (including an expected value for whether the spans should contain the keys, since they aren't tight), but it might help to make these names more self-documenting. For example, I might change "result" to "contains" or "indexedValue", and "expected" to "containsKeys" (or something like that....)

pkg/sql/rowenc/index_encoding_test.go, line 542 at r1 (raw file):

		// Not all results included in the spans are contained by the value, so the
		// expression is never tight. Also, the expression is a union of spans, so
		// unique should never be true.

(unless the value is an empty array)

pkg/sql/rowenc/index_encoding_test.go, line 615 at r1 (raw file):

		if actual != expected {
			if expected {
				t.Errorf("expected spans of %s to include %s but it did not", value, result)

[nit] it did not -> they did not

pkg/sql/rowenc/index_encoding_test.go, line 617 at r1 (raw file):

				t.Errorf("expected spans of %s to include %s but it did not", value, result)
			} else {
				t.Errorf("expected spans of %s not to include %s but it did", value, result)

[nit] it did -> they did

pkg/sql/rowenc/index_encoding_test.go, line 621 at r1 (raw file):

		}

		// Since the spans are never tight, an additional filter to determine if

[nit] an additional filter -> apply an additional filter

pkg/sql/rowenc/index_encoding_test.go, line 623 at r1 (raw file):

		// Since the spans are never tight, an additional filter to determine if
		// the result is contained.
		isContained, err := tree.ArrayContains(&evalCtx, value.(*tree.DArray), result.(*tree.DArray))

I think you only want to execute ArrayContains if ContainsKeys returns true, right?

pkg/sql/rowenc/index_encoding_test.go, line 638 at r1 (raw file):

		runTest(value, result, c.expected, c.contained)
	}
}

It would be good to add some randomly generated test cases as well (similar to the Containing test). It won't catch false positives (due to the fact that the spans aren't tight), but might catch false negatives.

rytaft · 2021-03-01T14:03:57Z

pkg/sql/opt/invertedidx/json_array.go, line 343 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

I don't think we want to check the length here. If the index doesn't include empty arrays we will return the wrong result for all arrays in the contained-by case.

We have an index-version directive that you can use to test this (see some examples in xform/testdata/rules/select)

angelazxu

Thank you for all the feedback!! I updated the commit message too.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rytaft)

pkg/sql/logictest/testdata/logic_test/inverted_index, line 998 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Nice tests! Can you add these same tests (or a subset) to one of the execbuilder test files to show the EXPLAIN output?

added to the inverted_index test file in execbuilder!

pkg/sql/opt/invertedidx/json_array.go, line 92 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

[nit] would be good to add an explanation of the contained parameter

Done.

pkg/sql/opt/invertedidx/json_array.go, line 193 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

no need to make the change in this PR, but we'll want to support inverted joins with <@ in a future PR too

Okay!

pkg/sql/opt/invertedidx/json_array.go, line 343 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

We have an index-version directive that you can use to test this (see some examples in xform/testdata/rules/select)

Done - oh wow, that makes sense, fixed it and added a test in the inverted-array test file!

pkg/sql/opt/invertedidx/json_array.go, line 352 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

[nit] neither conditions are -> neither condition is

Done.

pkg/sql/opt/invertedidx/json_array_test.go, line 273 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

It would be good to add a couple more test cases here, for example combining a contained-by filter with another expression using AND or OR (see examples below)

Done.

pkg/sql/opt/memo/testdata/stats/inverted-array, line 148 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

seems like we should be able to remove the inverted filter in this case. Probably requires setting the correct value of unique

Done - made some changes in encodeContainedArrayInvertedIndexSpans to set unique correctly

pkg/sql/opt/memo/testdata/stats/inverted-array, line 210 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

how about adding another test with a conjunction (i.e. AND)

Done.

pkg/sql/rowenc/index_encoding.go, line 830 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

[nit] this should be false /* excludeNulls */

Done.

pkg/sql/rowenc/index_encoding.go, line 900 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Update this comment to mention the excludeNulls param

Done.

pkg/sql/rowenc/index_encoding.go, line 984 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

It might be a good idea to check the version here and return an error if it's less than descpb.EmptyArraysInInvertedIndexesVersion.

Done.

pkg/sql/rowenc/index_encoding.go, line 989 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

I think you're missing a spanExpr.Unique=true here.

Done.

pkg/sql/rowenc/index_encoding.go, line 996 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

would help to add an example here to explain why

Done.

pkg/sql/rowenc/index_encoding.go, line 1009 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

an example would help

Done.

pkg/sql/rowenc/index_encoding_test.go, line 533 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

[nit] I was scratching my head over this test for a little bit because I was expecting it to be more similar to the test above. I like the changes that you've made here (including an expected value for whether the spans should contain the keys, since they aren't tight), but it might help to make these names more self-documenting. For example, I might change "result" to "contains" or "indexedValue", and "expected" to "containsKeys" (or something like that....)

Done - also added unique

pkg/sql/rowenc/index_encoding_test.go, line 542 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

(unless the value is an empty array)

Done - I rephrased it as "unless the value produces a single empty array span" because a value like ARRAY[NULL] or ARRAY[NULL, NULL] would produce the same thing (since nulls are excluded).

pkg/sql/rowenc/index_encoding_test.go, line 615 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

[nit] it did not -> they did not

Done.

pkg/sql/rowenc/index_encoding_test.go, line 617 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

[nit] it did -> they did

Done.

pkg/sql/rowenc/index_encoding_test.go, line 621 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

[nit] an additional filter -> apply an additional filter

Done.

pkg/sql/rowenc/index_encoding_test.go, line 623 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

I think you only want to execute ArrayContains if ContainsKeys returns true, right?

Done - yes that's right, fixed it!

pkg/sql/rowenc/index_encoding_test.go, line 638 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

It would be good to add some randomly generated test cases as well (similar to the Containing test). It won't catch false positives (due to the fact that the spans aren't tight), but might catch false negatives.

Added! I made it so that the randomly generated tests will only run if containsKeys=true so we just test for false negatives.

rytaft

Expressions using the <@ (contained by) and @> (contains) operators now support index-acceleration with the indexed column is on either side of the expression.

[nit] I'd say "Some additional expressions with ARRAY columns using the...."

(Don't forget to update the PR message to match the commit message -- you'll need to do that from the GitHub UI)

Reviewed 6 of 6 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @angelazxu)

pkg/sql/opt/invertedidx/json_array.go, line 98 at r2 (raw file):

	// The contained parameter indicates that we are checking if the indexed
	// column is contained by a constant. This is handled differently than
	// checking if the indexed column contains a constant.

[nit] I'd make this part of the function comment, and I'd explicitly reference the operators (<@ and @>) to make this clearer. Maybe renaming the parameter to containedBy would also help. Instead of saying they are "handled differently", I'd give a few more details. Something like, "contains results in a span expression representing the intersection of all paths through the JSON or Array, while containedBy results in a span expression representing the union of all paths through the JSON or Array".

pkg/sql/opt/invertedidx/json_array_test.go, line 344 at r2 (raw file):

		},
		{
			// We can constrain the JSON index when the functions are AND-ed.

[nit] mention that this will work once JSON is supported for <@

pkg/sql/opt/memo/testdata/stats/inverted-array, line 242 at r2 (raw file):

# the right. An additional filter is required.
opt
SELECT * FROM t WHERE a <@ '{2}' OR a <@ '{3}'

I think you meant to make this AND? (Also "conjunction" would be a bit more correct than "intersection" here)

pkg/sql/opt/memo/testdata/stats/inverted-array, line 278 at r2 (raw file):

      └── (ARRAY[2] @> a:2) OR (ARRAY[3] @> a:2) [type=bool, outer=(2), immutable]

# The inverted index will not be used when the version does not have empty

It would be better to put this test case in xform/testdata/rules/select (and also add a few more tests for <@ there), since that's where the all the tests for the GenerateInvertedIndexScans rule live. This test file is for testing that the stats are calculated correctly for inverted array indexes, so it doesn't make much sense to include this test here.

pkg/sql/rowenc/index_encoding.go, line 888 at r2 (raw file):

	datum := tree.UnwrapDatum(evalCtx, val)
	typ := val.ResolvedType().Family()
	if typ == types.ArrayFamily {

[nit] change this to a switch

pkg/sql/rowenc/index_encoding.go, line 905 at r2 (raw file):

// unless the version is at least descpb.EmptyArraysInInvertedIndexesVersion.
// It does not return keys for NULL array elements when a <@ (contained by)
// expression is being evaluated. This is indicated by excludeNulls.

[nit] I would phrase this differently, because this function doesn't doesn't know that a contained by expression is being evaluated. It only knows that excludeNulls is set. Maybe change this to: "It also does not return keys for NULL array elements if excludeNulls is true. This option is used by encodeContainedArrayInvertedIndexSpans, which builds index spans to evaluate <@ (contained by) expressions."

pkg/sql/rowenc/index_encoding.go, line 1023 at r2 (raw file):

	// inverted expression would scan for all elements in x that contain the
	// empty array or ARRAY[1]. The resulting elements could contain other values
	// and would need to be passed through an additional filter.

[nit] the word "elements" is a bit confusing since it's not clear whether it means entire arrays or elements of the array. I'd change it to say "arrays" instead. You might also give one additional example at the end to make it really clear, like: "For example, ARRRAY[1, 2, 3] would be returned by the scan, but it should be filtered out since ARRRAY[1, 2, 3] <@ ARRAY[1] is false"

pkg/sql/rowenc/index_encoding_test.go, line 542 at r1 (raw file):

Previously, angelazxu (Angela Xu) wrote…

Done - I rephrased it as "unless the value produces a single empty array span" because a value like ARRAY[NULL] or ARRAY[NULL, NULL] would produce the same thing (since nulls are excluded).

Nice!

pkg/sql/rowenc/index_encoding_test.go, line 638 at r1 (raw file):

Previously, angelazxu (Angela Xu) wrote…

Added! I made it so that the randomly generated tests will only run if containsKeys=true so we just test for false negatives.

Good idea!

pkg/sql/rowenc/index_encoding_test.go, line 409 at r2 (raw file):

}

func TestEncodeContainingArrayInvertedIndexSpans(t *testing.T) {

If you don't mind, it would be great to update this test case to match the new one you're adding, so people reading both can easily see the parallels. I think that corresponds to changing value here to indexedValue, and contains to value. I'd also change the order of value and indexedValue in your test case so that indexedValue is first, since that matches the order of operands (e.g. indexedValue @> value and indexedValue <@ value) (although it doesn't really matter -- just be consistent across the two tests). So the only difference will be that your test case also includes a containsKeys boolean, which is useful since the <@ spans are not tight.

pkg/sql/rowenc/index_encoding_test.go, line 533 at r2 (raw file):

		indexedValue string
		expected     bool
		containsKeys bool

I think you want these two booleans to be swapped: containsKeys should always be true for the first block below, and expected should be the expected result of evaluating indexedValue <@ value

pkg/sql/rowenc/index_encoding_test.go, line 539 at r2 (raw file):

		// This test uses EncodeInvertedIndexTableKeys and EncodeContainedInvertedIndexSpans
		// to determine if the spans produced from the given array value will correctly
		// include and exclude given results. It then checks if result <@ value.

need to update this comment to refer to the new names

pkg/sql/rowenc/index_encoding_test.go, line 633 at r2 (raw file):

			isContained, err := tree.ArrayContains(&evalCtx, value.(*tree.DArray), indexedValue.(*tree.DArray))
			require.NoError(t, err)
			if bool(*isContained) != containsKeys {

I think you should declare isContained outside of the if expected block, since we still want to perform this check even if expected is false. (although I think you'll be changing expected to containsKeys...)

pkg/sql/rowenc/index_encoding_test.go, line 659 at r2 (raw file):

		// We cannot check for false positives with these tests (due to the fact that
		// the spans are not tight), so we will only test for false negatives.
		containsKeys, err := tree.ArrayContains(&evalCtx, left.(*tree.DArray), right.(*tree.DArray))

Similar to above, I'd name this something other than containsKeys. contains, isContained, or expected would work.

rytaft

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @angelazxu)

pkg/sql/rowenc/index_encoding_test.go, line 659 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Similar to above, I'd name this something other than containsKeys. contains, isContained, or expected would work.

Also -- isn't this evaluating left @> right, not left <@ right? I think this is only working right now since right is getting passed as the indexedValue in runTest.

RaduBerinde

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @angelazxu and @rytaft)

pkg/sql/opt/invertedidx/json_array.go, line 98 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

[nit] I'd make this part of the function comment, and I'd explicitly reference the operators (<@ and @>) to make this clearer. Maybe renaming the parameter to containedBy would also help. Instead of saying they are "handled differently", I'd give a few more details. Something like, "contains results in a span expression representing the intersection of all paths through the JSON or Array, while containedBy results in a span expression representing the union of all paths through the JSON or Array".

I think we should consider removing this wrapper and simplifying the rowenc APIs instead. AFAICT this is the only non-test code that calls these functions, so there is no reason why they need to be more complicated that this wrapper:

do we need to pass b? I don't think we ever pass anything non-nil here
do we need to pass the version? It's always EmptyArraysInInvertedIndexesVersion except for test code that specifically tests the other version

I think it would then be more clear if we just use the rowenc functions instead, even if it means we have to add an if err != nil { panic(err) } every time. If we still want a wrapper, I'd make separate wrappers instead of a boolean argument.

RaduBerinde

Nice work!

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @angelazxu and @rytaft)

angelazxu

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rytaft)

pkg/sql/opt/invertedidx/json_array.go, line 98 at r2 (raw file):

Previously, RaduBerinde wrote…

I think we should consider removing this wrapper and simplifying the rowenc APIs instead. AFAICT this is the only non-test code that calls these functions, so there is no reason why they need to be more complicated that this wrapper:

do we need to pass b? I don't think we ever pass anything non-nil here

do we need to pass the version? It's always EmptyArraysInInvertedIndexesVersion except for test code that specifically tests the other version

I think it would then be more clear if we just use the rowenc functions instead, even if it means we have to add an if err != nil { panic(err) } every time. If we still want a wrapper, I'd make separate wrappers instead of a boolean argument.

I ended up getting rid of contained and making separate wrappers. The rowenc functions do show up in quite a few places so I think having the wrapper makes it look cleaner. I did get rid of b and version though - now I just declare var inKey [byte] directly in the encoding spans functions, and I pass in EmptyArraysInInvertedIndexesVersion to encodeArrayInvertedIndexTableKeys (the only place it is needed).

pkg/sql/opt/invertedidx/json_array_test.go, line 344 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

[nit] mention that this will work once JSON is supported for <@

Done.

pkg/sql/opt/memo/testdata/stats/inverted-array, line 242 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

I think you meant to make this AND? (Also "conjunction" would be a bit more correct than "intersection" here)

Fixed, oops!!

pkg/sql/opt/memo/testdata/stats/inverted-array, line 278 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

It would be better to put this test case in xform/testdata/rules/select (and also add a few more tests for <@ there), since that's where the all the tests for the GenerateInvertedIndexScans rule live. This test file is for testing that the stats are calculated correctly for inverted array indexes, so it doesn't make much sense to include this test here.

Done.

pkg/sql/rowenc/index_encoding.go, line 888 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

[nit] change this to a switch

Done.

pkg/sql/rowenc/index_encoding.go, line 905 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

[nit] I would phrase this differently, because this function doesn't doesn't know that a contained by expression is being evaluated. It only knows that excludeNulls is set. Maybe change this to: "It also does not return keys for NULL array elements if excludeNulls is true. This option is used by encodeContainedArrayInvertedIndexSpans, which builds index spans to evaluate <@ (contained by) expressions."

Done.

pkg/sql/rowenc/index_encoding.go, line 1023 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

[nit] the word "elements" is a bit confusing since it's not clear whether it means entire arrays or elements of the array. I'd change it to say "arrays" instead. You might also give one additional example at the end to make it really clear, like: "For example, ARRRAY[1, 2, 3] would be returned by the scan, but it should be filtered out since ARRRAY[1, 2, 3] <@ ARRAY[1] is false"

Done.

pkg/sql/rowenc/index_encoding_test.go, line 409 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

If you don't mind, it would be great to update this test case to match the new one you're adding, so people reading both can easily see the parallels. I think that corresponds to changing value here to indexedValue, and contains to value. I'd also change the order of value and indexedValue in your test case so that indexedValue is first, since that matches the order of operands (e.g. indexedValue @> value and indexedValue <@ value) (although it doesn't really matter -- just be consistent across the two tests). So the only difference will be that your test case also includes a containsKeys boolean, which is useful since the <@ spans are not tight.

Done!

pkg/sql/rowenc/index_encoding_test.go, line 533 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

I think you want these two booleans to be swapped: containsKeys should always be true for the first block below, and expected should be the expected result of evaluating indexedValue <@ value

Done!

pkg/sql/rowenc/index_encoding_test.go, line 539 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

need to update this comment to refer to the new names

Done.

pkg/sql/rowenc/index_encoding_test.go, line 633 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

I think you should declare isContained outside of the if expected block, since we still want to perform this check even if expected is false. (although I think you'll be changing expected to containsKeys...)

Ohh I see what you mean. When I put it inside the if expected block, it was to mimic the logic where we are applying an additional filter only on the arrays contained by the spans. So if the array was excluded from the spans (aka containsKeys was false), the filter wouldn't be applied to it in the first place (since it wouldn't have been included in the results). In other words, whenever containsKeys is false, expected is always false because the previous step will have already removed it as a possibility. I agree we should be checking it regardless though so I've changed it!

pkg/sql/rowenc/index_encoding_test.go, line 659 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Also -- isn't this evaluating left @> right, not left <@ right? I think this is only working right now since right is getting passed as the indexedValue in runTest.

Yeah you're right! I reordered runtest parameters to be indexedValue, value now and also passed in right, left as haystack, needles in ArrayContains, so now we are evaluating left <@ right :) Also renamed everything so that isContained matches containsKeys, and actual matches expected when we are checking the results.

RaduBerinde

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @angelazxu and @rytaft)

pkg/sql/opt/invertedidx/json_array.go, line 98 at r2 (raw file):

Previously, angelazxu (Angela Xu) wrote…

I ended up getting rid of contained and making separate wrappers. The rowenc functions do show up in quite a few places so I think having the wrapper makes it look cleaner. I did get rid of b and version though - now I just declare var inKey [byte] directly in the encoding spans functions, and I pass in EmptyArraysInInvertedIndexesVersion to encodeArrayInvertedIndexTableKeys (the only place it is needed).

Thanks, this is much better!

pkg/sql/opt/invertedidx/json_array.go, line 94 at r3 (raw file):

// This results in a span expression representing the intersection of all paths
// through the JSON or Array.
func getInvertedExprForJSONOrArrayIndex(

[nit] add ForContaining

pkg/sql/opt/invertedidx/json_array.go, line 109 at r3 (raw file):

// through the JSON or Array. This function is only used when checking if an
// indexed column is contained by (<@) a constant.
func getInvertedExprForJSONOrArrayIndexForContained(

[nit] ForContainedBy (to match the ComparisonOperator).

pkg/sql/rowenc/index_encoding.go, line 854 at r3 (raw file):

		return nil, nil
	}
	var inKey []byte

[nit] you can also directly pass nil /* inKey */

pkg/sql/rowenc/index_encoding.go, line 862 at r3 (raw file):

		return encodeContainingArrayInvertedIndexSpans(val.(*tree.DArray), inKey)
	}
	return nil, errors.AssertionFailedf(

[nit] move to default:

rytaft

This is looking great! Just a few remaining nits

Reviewed 6 of 6 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @angelazxu and @RaduBerinde)

pkg/sql/opt/invertedidx/json_array.go, line 94 at r3 (raw file):

Previously, RaduBerinde wrote…

[nit] add ForContaining

Also mention @> in the comment (like you did below for the <@ function)

pkg/sql/opt/memo/testdata/stats/inverted-array, line 242 at r2 (raw file):

Previously, angelazxu (Angela Xu) wrote…

Fixed, oops!!

Looks good -- you'll probably need to use TESTFLAGS='-rewrite=true' to fix the output

pkg/sql/opt/xform/testdata/rules/select, line 2606 at r3 (raw file):


# The inverted index will not be used when the version does not have empty
# arrays in the inverted index.

[nit] mention in the comment that we can never use the inverted index for <@ expressions if the index version does not have empty arrays

pkg/sql/rowenc/index_encoding.go, line 847 at r3 (raw file):

// comments in the SpanExpression definition for details.
//
// The input inKey is prefixed to the keys in all returned spans.

you can remove this last bit from the comment since you've removed the inKey param

pkg/sql/rowenc/index_encoding.go, line 880 at r3 (raw file):

// SpanExpression definition for details.
//
// The input inKey is prefixed to the keys in all returned spans.

you can remove this last bit from the comment since you've removed the inKey param

pkg/sql/rowenc/index_encoding.go, line 889 at r3 (raw file):

	var inKey []byte
	datum := tree.UnwrapDatum(evalCtx, val)
	switch typ := val.ResolvedType().Family(); typ {

[nit] are you using typ? If not, you can just switch on val.ResolvedType().Family()

pkg/sql/rowenc/index_encoding_test.go, line 538 at r3 (raw file):

		// This test uses EncodeInvertedIndexTableKeys and EncodeContainedInvertedIndexSpans
		// to determine if the spans produced from the first Array value will
		// correctly include or exclude the second value, indicated by

I think the spans are actually produced from the second value, so I think you want to swap these
(first <-> second)

pkg/sql/rowenc/index_encoding_test.go, line 588 at r3 (raw file):

	}

	runTest := func(indexedValue, value tree.Datum, containsKeys, expected, unique bool) {

[nit] I'd change containsKeys to expectContainsKeys and unique to expectUnique

pkg/sql/rowenc/index_encoding_test.go, line 618 at r3 (raw file):

		require.NoError(t, err)

		if isContained != containsKeys {

[nit] If you change containsKeys to expectContainsKeys, then you can change isContained to containsKeys

pkg/sql/rowenc/index_encoding_test.go, line 633 at r3 (raw file):

			if expected {
				t.Errorf("expected %s to be contained by %s but it was not", indexedValue, value)
			}

I don't think t.Errorf stops execution, so you probably need an else here

angelazxu

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rytaft)

pkg/sql/opt/xform/testdata/rules/select, line 2606 at r3 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

[nit] mention in the comment that we can never use the inverted index for <@ expressions if the index version does not have empty arrays

Done.

pkg/sql/rowenc/index_encoding.go, line 847 at r3 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

you can remove this last bit from the comment since you've removed the inKey param

Done.

pkg/sql/rowenc/index_encoding.go, line 880 at r3 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

you can remove this last bit from the comment since you've removed the inKey param

Done.

pkg/sql/rowenc/index_encoding.go, line 889 at r3 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

[nit] are you using typ? If not, you can just switch on val.ResolvedType().Family()

Done.

pkg/sql/rowenc/index_encoding_test.go, line 538 at r3 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

I think the spans are actually produced from the second value, so I think you want to swap these
(first <-> second)

Done.

pkg/sql/rowenc/index_encoding_test.go, line 588 at r3 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

[nit] I'd change containsKeys to expectContainsKeys and unique to expectUnique

Done.

pkg/sql/rowenc/index_encoding_test.go, line 618 at r3 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

[nit] If you change containsKeys to expectContainsKeys, then you can change isContained to containsKeys

Done.

pkg/sql/rowenc/index_encoding_test.go, line 633 at r3 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

I don't think t.Errorf stops execution, so you probably need an else here

Done.

angelazxu

Just realized I forgot to change the names for getInvertedExprForJSONOrArrayIndexForContaining/ContainedBy in the comments, will change that in a bit!!

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rytaft)

rytaft

Nice work!!

Reviewed 5 of 5 files at r4.
Reviewable status: complete! 1 of 0 LGTMs obtained

mgartner

Great work! 💯

Reviewed 3 of 8 files at r1, 1 of 6 files at r2, 1 of 6 files at r3, 5 of 5 files at r4.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @angelazxu)

pkg/sql/opt/invertedidx/json_array.go, line 344 at r4 (raw file):

		// When the second argument is a variable or expression corresponding to
		// the index column and the first argument is a constant, we get the
		// equivalent InvertedExpression for right <@ left.

[nit] It might be clearer to put this comment inside the else if block below, since that's what it is describing. If you're concerned about consistency with the comment regarding the left @> right case above, you could move that into the if block too.

pkg/sql/opt/invertedidx/json_array.go, line 353 at r4 (raw file):

			return inverted.NonInvertedColExpression{}
		}
		return getInvertedExprForJSONOrArrayIndexForContainedBy(evalCtx, d)

[nit] you might be able to clean up some of the duplicated code in these two paths. Something like:

var indexColumn opt.ScalarExpr
var constantVal opt.ScalarExpr
if isIndexColumn(j.tabID, j.index, left, j.computedColumns && memo.CanExtractConstDatum(right) {
    indexColumn = left
    constantVal = right
} else if isIndexColumn(j.tabID, j.index, right, j.computedColumns && memo.CanExtractConstDatum(left) {
    indexColumn = right
    constantVal = left
}
// rest of the logic on indexColumn and constantVal

pkg/sql/rowenc/index_encoding.go, line 865 at r4 (raw file):

}

// EncodeContainedInvertedIndexSpans takes in a key prefix and returns the

I think this needs to be updated - there's no key prefix argument.

pkg/sql/rowenc/index_encoding.go, line 1020 at r4 (raw file):

	// and would need to be passed through an additional filter. For example,
	// ARRAY[1, 2, 3] would be returned by the scan, but it should be filtered
	// out since ARRAY[1, 2, 3] <@ ARRAY[1] is false.

Great explanation and example! 💯

Previously, we did not support index acceleration when checking if an indexed column is <@ (contained by) a constant, or in other words, when the indexed column is on the right side of a @> (contains) expression. We already perform index acceleration for @> (contains) expressions where an indexed JSON or Array column is on the left side of the expression. This change adds support for using the inverted index with <@ expressions on Array columns. When there is an inverted index available, a scan will be done on the Array column using the spans found from the constant value. An additional filter will then be applied, as the span expression will never be tight. Support for JSON columns will be added later. Informs: cockroachdb#59763 Release note (performance improvement): Some additional expressions using the <@ (contained by) and @> (contains) operators now support index-acceleration with the indexed column on either side of the expression.

mgartner

Reviewed 2 of 2 files at r5.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @angelazxu)

angelazxu · 2021-03-10T02:26:13Z

bors r+

craig · 2021-03-10T02:26:15Z

👎 Rejected by label

angelazxu · 2021-03-10T02:26:50Z

bors r+

craig · 2021-03-10T02:56:03Z

Build succeeded:

GitHub CI (Cockroach)

angelazxu requested a review from a team as a code owner February 27, 2021 02:24

rytaft reviewed Mar 1, 2021

View reviewed changes

angelazxu force-pushed the opt_59673_arrays branch from eb5b7d6 to ed432c1 Compare March 2, 2021 02:09

angelazxu commented Mar 2, 2021

View reviewed changes

angelazxu force-pushed the opt_59673_arrays branch from ed432c1 to d873471 Compare March 2, 2021 04:02

rytaft reviewed Mar 2, 2021

View reviewed changes

RaduBerinde approved these changes Mar 2, 2021

View reviewed changes

angelazxu commented Mar 4, 2021

View reviewed changes

angelazxu force-pushed the opt_59673_arrays branch from d873471 to ce059ee Compare March 4, 2021 02:16

RaduBerinde reviewed Mar 4, 2021

View reviewed changes

rytaft reviewed Mar 4, 2021

View reviewed changes

angelazxu force-pushed the opt_59673_arrays branch from ce059ee to 1669563 Compare March 4, 2021 19:11

angelazxu commented Mar 4, 2021

View reviewed changes

angelazxu force-pushed the opt_59673_arrays branch from 1669563 to b8d3023 Compare March 4, 2021 20:55

rytaft approved these changes Mar 4, 2021

View reviewed changes

RaduBerinde added the do-not-merge bors won't merge a PR with this label. label Mar 4, 2021

mgartner approved these changes Mar 5, 2021

View reviewed changes

angelazxu force-pushed the opt_59673_arrays branch from b8d3023 to e9a2a7b Compare March 10, 2021 00:19

angelazxu force-pushed the opt_59673_arrays branch from e9a2a7b to 7c5154b Compare March 10, 2021 01:30

mgartner approved these changes Mar 10, 2021

View reviewed changes

angelazxu removed the do-not-merge bors won't merge a PR with this label. label Mar 10, 2021

craig bot merged commit 3bae381 into cockroachdb:master Mar 10, 2021

angelazxu deleted the opt_59673_arrays branch March 10, 2021 03:01

jseldess mentioned this pull request Sep 8, 2021

opt: index accelerate <@ (contained by) expressions for array inverted indexes cockroachdb/docs#11570

Open

mgartner mentioned this pull request Jan 26, 2022

opt: index accelerate array overlap operator && #75477

Closed

exalate-issue-sync bot mentioned this pull request Jun 8, 2022

opt: index accelerate <@ (contained by) expressions for array inverted indexes cockroachdb/docs#14160

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

opt: index accelerate <@ (contained by) expressions for array inverted indexes #61219

opt: index accelerate <@ (contained by) expressions for array inverted indexes #61219

angelazxu commented Feb 27, 2021 •

edited

Loading

cockroach-teamcity commented Feb 27, 2021

rytaft left a comment

rytaft commented Mar 1, 2021

angelazxu left a comment

rytaft left a comment

rytaft left a comment

RaduBerinde left a comment

RaduBerinde left a comment

angelazxu left a comment

RaduBerinde left a comment

rytaft left a comment

angelazxu left a comment

angelazxu left a comment

rytaft left a comment

mgartner left a comment

mgartner left a comment

angelazxu commented Mar 10, 2021

craig bot commented Mar 10, 2021

angelazxu commented Mar 10, 2021

craig bot commented Mar 10, 2021

opt: index accelerate <@ (contained by) expressions for array inverted indexes #61219

opt: index accelerate <@ (contained by) expressions for array inverted indexes #61219

Conversation

angelazxu commented Feb 27, 2021 • edited Loading

cockroach-teamcity commented Feb 27, 2021

rytaft left a comment

Choose a reason for hiding this comment

rytaft commented Mar 1, 2021

angelazxu left a comment

Choose a reason for hiding this comment

rytaft left a comment

Choose a reason for hiding this comment

rytaft left a comment

Choose a reason for hiding this comment

RaduBerinde left a comment

Choose a reason for hiding this comment

RaduBerinde left a comment

Choose a reason for hiding this comment

angelazxu left a comment

Choose a reason for hiding this comment

RaduBerinde left a comment

Choose a reason for hiding this comment

rytaft left a comment

Choose a reason for hiding this comment

angelazxu left a comment

Choose a reason for hiding this comment

angelazxu left a comment

Choose a reason for hiding this comment

rytaft left a comment

Choose a reason for hiding this comment

mgartner left a comment

Choose a reason for hiding this comment

mgartner left a comment

Choose a reason for hiding this comment

angelazxu commented Mar 10, 2021

craig bot commented Mar 10, 2021

angelazxu commented Mar 10, 2021

craig bot commented Mar 10, 2021

angelazxu commented Feb 27, 2021 •

edited

Loading