Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: index accelerate <@ (contained by) expressions for array inverted indexes #61219

Merged
merged 1 commit into from
Mar 10, 2021

Conversation

angelazxu
Copy link
Contributor

@angelazxu angelazxu commented Feb 27, 2021

Previously, we did not support index acceleration when checking if an indexed
column is <@ (contained by) a constant, or in other words, when the indexed
column is on the right side of a @> (contains) expression. We already perform
index acceleration for @> (contains) expressions where an indexed JSON or Array
column is on the left side of the expression.

This change adds support for using the inverted index with <@ expressions on
Array columns. When there is an inverted index available, a scan will be done on
the Array column using the spans found from the constant value. An additional
filter will then be applied, as the span expression will never be tight.
Support for JSON columns will be added later.

Informs: #59763

Release note (performance improvement): Some additional expressions using the <@ (contained by) and @> (contains) operators now support index-acceleration with the indexed column on either side of the expression.

@angelazxu angelazxu requested a review from a team as a code owner February 27, 2021 02:24
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really well done! Super exciting to see!

In the PR/commit message:

Fixes: #59763

This should just say "Informs #59763", since this PR doesn't cover JSON or inverted joins.

You should also add a "Release note (performance improvement)", since this PR supports index-acceleration of new types of expressions.

I think we'll probably want to wait to merge this PR until after the branch is cut on March 8th, so no need to worry about a release justification.

Reviewed 8 of 8 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @angelazxu)


pkg/sql/logictest/testdata/logic_test/inverted_index, line 998 at r1 (raw file):

4  {1,2,3}  {b,NULL,c}

subtest contained_by_arrays

Nice tests! Can you add these same tests (or a subset) to one of the execbuilder test files to show the EXPLAIN output?


pkg/sql/opt/invertedidx/json_array.go, line 92 at r1 (raw file):

// getInvertedExprForJSONOrArrayIndex gets an inverted.Expression that
// constrains a json or array index according to the given constant.
func getInvertedExprForJSONOrArrayIndex(

[nit] would be good to add an explanation of the contained parameter


pkg/sql/opt/invertedidx/json_array.go, line 193 at r1 (raw file):

			var spanExpr *inverted.SpanExpression
			if d, ok := nonIndexParam.(tree.Datum); ok {
				invertedExpr := getInvertedExprForJSONOrArrayIndex(evalCtx, d, false /* contained */)

no need to make the change in this PR, but we'll want to support inverted joins with <@ in a future PR too


pkg/sql/opt/invertedidx/json_array.go, line 343 at r1 (raw file):

		if right.DataType().Family() == types.ArrayFamily &&
			j.index.Version() < descpb.EmptyArraysInInvertedIndexesVersion {
			if arr, ok := d.(*tree.DArray); ok && arr.Len() == 0 {

I don't think we want to check the length here. If the index doesn't include empty arrays we will return the wrong result for all arrays in the contained-by case.


pkg/sql/opt/invertedidx/json_array.go, line 352 at r1 (raw file):

	}

	// If neither conditions are met, we cannot create an InvertedExpression.

[nit] neither conditions are -> neither condition is


pkg/sql/opt/invertedidx/json_array_test.go, line 273 at r1 (raw file):

			tight:            false,
			unique:           true,
			remainingFilters: "a <@ '{1}'",

It would be good to add a couple more test cases here, for example combining a contained-by filter with another expression using AND or OR (see examples below)


pkg/sql/opt/memo/testdata/stats/inverted-array, line 148 at r1 (raw file):

 │    ├── key: (1)
 │    ├── fd: (1)-->(2)
 │    └── inverted-filter

seems like we should be able to remove the inverted filter in this case. Probably requires setting the correct value of unique


pkg/sql/opt/memo/testdata/stats/inverted-array, line 210 at r1 (raw file):

# right. An additional filter is required.
opt
SELECT * FROM t WHERE a <@ '{2}' OR a <@ '{3}'

how about adding another test with a conjunction (i.e. AND)


pkg/sql/rowenc/index_encoding.go, line 830 at r1 (raw file):

		return json.EncodeInvertedIndexKeys(inKey, val.(*tree.DJSON).JSON)
	case types.ArrayFamily:
		return encodeArrayInvertedIndexTableKeys(val.(*tree.DArray), inKey, version, false)

[nit] this should be false /* excludeNulls */


pkg/sql/rowenc/index_encoding.go, line 900 at r1 (raw file):

//
// This function does not return keys for empty arrays or for NULL array elements
// unless the version is at least descpb.EmptyArraysInInvertedIndexesVersion.

Update this comment to mention the excludeNulls param


pkg/sql/rowenc/index_encoding.go, line 984 at r1 (raw file):

	val *tree.DArray, inKey []byte, version descpb.IndexDescriptorVersion,
) (invertedExpr inverted.Expression, err error) {

It might be a good idea to check the version here and return an error if it's less than descpb.EmptyArraysInInvertedIndexesVersion.


pkg/sql/rowenc/index_encoding.go, line 989 at r1 (raw file):

	invertedExpr = inverted.ExprForSpan(
		inverted.MakeSingleValSpan(encoding.EncodeEmptyArray(inKey)), false, /* tight */
	)

I think you're missing a spanExpr.Unique=true here.


pkg/sql/rowenc/index_encoding.go, line 996 at r1 (raw file):

	}

	// We always exclude nulls from the list of keys when evaluating <@.

would help to add an example here to explain why


pkg/sql/rowenc/index_encoding.go, line 1009 at r1 (raw file):

	}

	// The inverted expression produced for <@ will never be tight.

an example would help


pkg/sql/rowenc/index_encoding_test.go, line 533 at r1 (raw file):

		result    string
		expected  bool
		contained bool

[nit] I was scratching my head over this test for a little bit because I was expecting it to be more similar to the test above. I like the changes that you've made here (including an expected value for whether the spans should contain the keys, since they aren't tight), but it might help to make these names more self-documenting. For example, I might change "result" to "contains" or "indexedValue", and "expected" to "containsKeys" (or something like that....)


pkg/sql/rowenc/index_encoding_test.go, line 542 at r1 (raw file):

		// Not all results included in the spans are contained by the value, so the
		// expression is never tight. Also, the expression is a union of spans, so
		// unique should never be true.

(unless the value is an empty array)


pkg/sql/rowenc/index_encoding_test.go, line 615 at r1 (raw file):

		if actual != expected {
			if expected {
				t.Errorf("expected spans of %s to include %s but it did not", value, result)

[nit] it did not -> they did not


pkg/sql/rowenc/index_encoding_test.go, line 617 at r1 (raw file):

				t.Errorf("expected spans of %s to include %s but it did not", value, result)
			} else {
				t.Errorf("expected spans of %s not to include %s but it did", value, result)

[nit] it did -> they did


pkg/sql/rowenc/index_encoding_test.go, line 621 at r1 (raw file):

		}

		// Since the spans are never tight, an additional filter to determine if

[nit] an additional filter -> apply an additional filter


pkg/sql/rowenc/index_encoding_test.go, line 623 at r1 (raw file):

		// Since the spans are never tight, an additional filter to determine if
		// the result is contained.
		isContained, err := tree.ArrayContains(&evalCtx, value.(*tree.DArray), result.(*tree.DArray))

I think you only want to execute ArrayContains if ContainsKeys returns true, right?


pkg/sql/rowenc/index_encoding_test.go, line 638 at r1 (raw file):

		runTest(value, result, c.expected, c.contained)
	}
}

It would be good to add some randomly generated test cases as well (similar to the Containing test). It won't catch false positives (due to the fact that the spans aren't tight), but might catch false negatives.

@rytaft
Copy link
Collaborator

rytaft commented Mar 1, 2021


pkg/sql/opt/invertedidx/json_array.go, line 343 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

I don't think we want to check the length here. If the index doesn't include empty arrays we will return the wrong result for all arrays in the contained-by case.

We have an index-version directive that you can use to test this (see some examples in xform/testdata/rules/select)

Copy link
Contributor Author

@angelazxu angelazxu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for all the feedback!! I updated the commit message too.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rytaft)


pkg/sql/logictest/testdata/logic_test/inverted_index, line 998 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Nice tests! Can you add these same tests (or a subset) to one of the execbuilder test files to show the EXPLAIN output?

added to the inverted_index test file in execbuilder!


pkg/sql/opt/invertedidx/json_array.go, line 92 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

[nit] would be good to add an explanation of the contained parameter

Done.


pkg/sql/opt/invertedidx/json_array.go, line 193 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

no need to make the change in this PR, but we'll want to support inverted joins with <@ in a future PR too

Okay!


pkg/sql/opt/invertedidx/json_array.go, line 343 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

We have an index-version directive that you can use to test this (see some examples in xform/testdata/rules/select)

Done - oh wow, that makes sense, fixed it and added a test in the inverted-array test file!


pkg/sql/opt/invertedidx/json_array.go, line 352 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

[nit] neither conditions are -> neither condition is

Done.


pkg/sql/opt/invertedidx/json_array_test.go, line 273 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

It would be good to add a couple more test cases here, for example combining a contained-by filter with another expression using AND or OR (see examples below)

Done.


pkg/sql/opt/memo/testdata/stats/inverted-array, line 148 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

seems like we should be able to remove the inverted filter in this case. Probably requires setting the correct value of unique

Done - made some changes in encodeContainedArrayInvertedIndexSpans to set unique correctly


pkg/sql/opt/memo/testdata/stats/inverted-array, line 210 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

how about adding another test with a conjunction (i.e. AND)

Done.


pkg/sql/rowenc/index_encoding.go, line 830 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

[nit] this should be false /* excludeNulls */

Done.


pkg/sql/rowenc/index_encoding.go, line 900 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Update this comment to mention the excludeNulls param

Done.


pkg/sql/rowenc/index_encoding.go, line 984 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

It might be a good idea to check the version here and return an error if it's less than descpb.EmptyArraysInInvertedIndexesVersion.

Done.


pkg/sql/rowenc/index_encoding.go, line 989 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

I think you're missing a spanExpr.Unique=true here.

Done.


pkg/sql/rowenc/index_encoding.go, line 996 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

would help to add an example here to explain why

Done.


pkg/sql/rowenc/index_encoding.go, line 1009 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

an example would help

Done.


pkg/sql/rowenc/index_encoding_test.go, line 533 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

[nit] I was scratching my head over this test for a little bit because I was expecting it to be more similar to the test above. I like the changes that you've made here (including an expected value for whether the spans should contain the keys, since they aren't tight), but it might help to make these names more self-documenting. For example, I might change "result" to "contains" or "indexedValue", and "expected" to "containsKeys" (or something like that....)

Done - also added unique


pkg/sql/rowenc/index_encoding_test.go, line 542 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

(unless the value is an empty array)

Done - I rephrased it as "unless the value produces a single empty array span" because a value like ARRAY[NULL] or ARRAY[NULL, NULL] would produce the same thing (since nulls are excluded).


pkg/sql/rowenc/index_encoding_test.go, line 615 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

[nit] it did not -> they did not

Done.


pkg/sql/rowenc/index_encoding_test.go, line 617 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

[nit] it did -> they did

Done.


pkg/sql/rowenc/index_encoding_test.go, line 621 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

[nit] an additional filter -> apply an additional filter

Done.


pkg/sql/rowenc/index_encoding_test.go, line 623 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

I think you only want to execute ArrayContains if ContainsKeys returns true, right?

Done - yes that's right, fixed it!


pkg/sql/rowenc/index_encoding_test.go, line 638 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

It would be good to add some randomly generated test cases as well (similar to the Containing test). It won't catch false positives (due to the fact that the spans aren't tight), but might catch false negatives.

Added! I made it so that the randomly generated tests will only run if containsKeys=true so we just test for false negatives.

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expressions using the <@ (contained by) and @> (contains) operators now support index-acceleration with the indexed column is on either side of the expression.

[nit] I'd say "Some additional expressions with ARRAY columns using the...."

(Don't forget to update the PR message to match the commit message -- you'll need to do that from the GitHub UI)

Reviewed 6 of 6 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @angelazxu)


pkg/sql/opt/invertedidx/json_array.go, line 98 at r2 (raw file):

	// The contained parameter indicates that we are checking if the indexed
	// column is contained by a constant. This is handled differently than
	// checking if the indexed column contains a constant.

[nit] I'd make this part of the function comment, and I'd explicitly reference the operators (<@ and @>) to make this clearer. Maybe renaming the parameter to containedBy would also help. Instead of saying they are "handled differently", I'd give a few more details. Something like, "contains results in a span expression representing the intersection of all paths through the JSON or Array, while containedBy results in a span expression representing the union of all paths through the JSON or Array".


pkg/sql/opt/invertedidx/json_array_test.go, line 344 at r2 (raw file):

		},
		{
			// We can constrain the JSON index when the functions are AND-ed.

[nit] mention that this will work once JSON is supported for <@


pkg/sql/opt/memo/testdata/stats/inverted-array, line 242 at r2 (raw file):

# the right. An additional filter is required.
opt
SELECT * FROM t WHERE a <@ '{2}' OR a <@ '{3}'

I think you meant to make this AND? (Also "conjunction" would be a bit more correct than "intersection" here)


pkg/sql/opt/memo/testdata/stats/inverted-array, line 278 at r2 (raw file):

      └── (ARRAY[2] @> a:2) OR (ARRAY[3] @> a:2) [type=bool, outer=(2), immutable]

# The inverted index will not be used when the version does not have empty

It would be better to put this test case in xform/testdata/rules/select (and also add a few more tests for <@ there), since that's where the all the tests for the GenerateInvertedIndexScans rule live. This test file is for testing that the stats are calculated correctly for inverted array indexes, so it doesn't make much sense to include this test here.


pkg/sql/rowenc/index_encoding.go, line 888 at r2 (raw file):

	datum := tree.UnwrapDatum(evalCtx, val)
	typ := val.ResolvedType().Family()
	if typ == types.ArrayFamily {

[nit] change this to a switch


pkg/sql/rowenc/index_encoding.go, line 905 at r2 (raw file):

// unless the version is at least descpb.EmptyArraysInInvertedIndexesVersion.
// It does not return keys for NULL array elements when a <@ (contained by)
// expression is being evaluated. This is indicated by excludeNulls.

[nit] I would phrase this differently, because this function doesn't doesn't know that a contained by expression is being evaluated. It only knows that excludeNulls is set. Maybe change this to: "It also does not return keys for NULL array elements if excludeNulls is true. This option is used by encodeContainedArrayInvertedIndexSpans, which builds index spans to evaluate <@ (contained by) expressions."


pkg/sql/rowenc/index_encoding.go, line 1023 at r2 (raw file):

	// inverted expression would scan for all elements in x that contain the
	// empty array or ARRAY[1]. The resulting elements could contain other values
	// and would need to be passed through an additional filter.

[nit] the word "elements" is a bit confusing since it's not clear whether it means entire arrays or elements of the array. I'd change it to say "arrays" instead. You might also give one additional example at the end to make it really clear, like: "For example, ARRRAY[1, 2, 3] would be returned by the scan, but it should be filtered out since ARRRAY[1, 2, 3] <@ ARRAY[1] is false"


pkg/sql/rowenc/index_encoding_test.go, line 542 at r1 (raw file):

Previously, angelazxu (Angela Xu) wrote…

Done - I rephrased it as "unless the value produces a single empty array span" because a value like ARRAY[NULL] or ARRAY[NULL, NULL] would produce the same thing (since nulls are excluded).

Nice!


pkg/sql/rowenc/index_encoding_test.go, line 638 at r1 (raw file):

Previously, angelazxu (Angela Xu) wrote…

Added! I made it so that the randomly generated tests will only run if containsKeys=true so we just test for false negatives.

Good idea!


pkg/sql/rowenc/index_encoding_test.go, line 409 at r2 (raw file):

}

func TestEncodeContainingArrayInvertedIndexSpans(t *testing.T) {

If you don't mind, it would be great to update this test case to match the new one you're adding, so people reading both can easily see the parallels. I think that corresponds to changing value here to indexedValue, and contains to value. I'd also change the order of value and indexedValue in your test case so that indexedValue is first, since that matches the order of operands (e.g. indexedValue @> value and indexedValue <@ value) (although it doesn't really matter -- just be consistent across the two tests). So the only difference will be that your test case also includes a containsKeys boolean, which is useful since the <@ spans are not tight.


pkg/sql/rowenc/index_encoding_test.go, line 533 at r2 (raw file):

		indexedValue string
		expected     bool
		containsKeys bool

I think you want these two booleans to be swapped: containsKeys should always be true for the first block below, and expected should be the expected result of evaluating indexedValue <@ value


pkg/sql/rowenc/index_encoding_test.go, line 539 at r2 (raw file):

		// This test uses EncodeInvertedIndexTableKeys and EncodeContainedInvertedIndexSpans
		// to determine if the spans produced from the given array value will correctly
		// include and exclude given results. It then checks if result <@ value.

need to update this comment to refer to the new names


pkg/sql/rowenc/index_encoding_test.go, line 633 at r2 (raw file):

			isContained, err := tree.ArrayContains(&evalCtx, value.(*tree.DArray), indexedValue.(*tree.DArray))
			require.NoError(t, err)
			if bool(*isContained) != containsKeys {

I think you should declare isContained outside of the if expected block, since we still want to perform this check even if expected is false. (although I think you'll be changing expected to containsKeys...)


pkg/sql/rowenc/index_encoding_test.go, line 659 at r2 (raw file):

		// We cannot check for false positives with these tests (due to the fact that
		// the spans are not tight), so we will only test for false negatives.
		containsKeys, err := tree.ArrayContains(&evalCtx, left.(*tree.DArray), right.(*tree.DArray))

Similar to above, I'd name this something other than containsKeys. contains, isContained, or expected would work.

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @angelazxu)


pkg/sql/rowenc/index_encoding_test.go, line 659 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Similar to above, I'd name this something other than containsKeys. contains, isContained, or expected would work.

Also -- isn't this evaluating left @> right, not left <@ right? I think this is only working right now since right is getting passed as the indexedValue in runTest.

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @angelazxu and @rytaft)


pkg/sql/opt/invertedidx/json_array.go, line 98 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

[nit] I'd make this part of the function comment, and I'd explicitly reference the operators (<@ and @>) to make this clearer. Maybe renaming the parameter to containedBy would also help. Instead of saying they are "handled differently", I'd give a few more details. Something like, "contains results in a span expression representing the intersection of all paths through the JSON or Array, while containedBy results in a span expression representing the union of all paths through the JSON or Array".

I think we should consider removing this wrapper and simplifying the rowenc APIs instead. AFAICT this is the only non-test code that calls these functions, so there is no reason why they need to be more complicated that this wrapper:

  • do we need to pass b? I don't think we ever pass anything non-nil here
  • do we need to pass the version? It's always EmptyArraysInInvertedIndexesVersion except for test code that specifically tests the other version

I think it would then be more clear if we just use the rowenc functions instead, even if it means we have to add an if err != nil { panic(err) } every time. If we still want a wrapper, I'd make separate wrappers instead of a boolean argument.

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @angelazxu and @rytaft)

Copy link
Contributor Author

@angelazxu angelazxu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rytaft)


pkg/sql/opt/invertedidx/json_array.go, line 98 at r2 (raw file):

Previously, RaduBerinde wrote…

I think we should consider removing this wrapper and simplifying the rowenc APIs instead. AFAICT this is the only non-test code that calls these functions, so there is no reason why they need to be more complicated that this wrapper:

  • do we need to pass b? I don't think we ever pass anything non-nil here
  • do we need to pass the version? It's always EmptyArraysInInvertedIndexesVersion except for test code that specifically tests the other version

I think it would then be more clear if we just use the rowenc functions instead, even if it means we have to add an if err != nil { panic(err) } every time. If we still want a wrapper, I'd make separate wrappers instead of a boolean argument.

I ended up getting rid of contained and making separate wrappers. The rowenc functions do show up in quite a few places so I think having the wrapper makes it look cleaner. I did get rid of b and version though - now I just declare var inKey [byte] directly in the encoding spans functions, and I pass in EmptyArraysInInvertedIndexesVersion to encodeArrayInvertedIndexTableKeys (the only place it is needed).


pkg/sql/opt/invertedidx/json_array_test.go, line 344 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

[nit] mention that this will work once JSON is supported for <@

Done.


pkg/sql/opt/memo/testdata/stats/inverted-array, line 242 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

I think you meant to make this AND? (Also "conjunction" would be a bit more correct than "intersection" here)

Fixed, oops!!


pkg/sql/opt/memo/testdata/stats/inverted-array, line 278 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

It would be better to put this test case in xform/testdata/rules/select (and also add a few more tests for <@ there), since that's where the all the tests for the GenerateInvertedIndexScans rule live. This test file is for testing that the stats are calculated correctly for inverted array indexes, so it doesn't make much sense to include this test here.

Done.


pkg/sql/rowenc/index_encoding.go, line 888 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

[nit] change this to a switch

Done.


pkg/sql/rowenc/index_encoding.go, line 905 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

[nit] I would phrase this differently, because this function doesn't doesn't know that a contained by expression is being evaluated. It only knows that excludeNulls is set. Maybe change this to: "It also does not return keys for NULL array elements if excludeNulls is true. This option is used by encodeContainedArrayInvertedIndexSpans, which builds index spans to evaluate <@ (contained by) expressions."

Done.


pkg/sql/rowenc/index_encoding.go, line 1023 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

[nit] the word "elements" is a bit confusing since it's not clear whether it means entire arrays or elements of the array. I'd change it to say "arrays" instead. You might also give one additional example at the end to make it really clear, like: "For example, ARRRAY[1, 2, 3] would be returned by the scan, but it should be filtered out since ARRRAY[1, 2, 3] <@ ARRAY[1] is false"

Done.


pkg/sql/rowenc/index_encoding_test.go, line 409 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

If you don't mind, it would be great to update this test case to match the new one you're adding, so people reading both can easily see the parallels. I think that corresponds to changing value here to indexedValue, and contains to value. I'd also change the order of value and indexedValue in your test case so that indexedValue is first, since that matches the order of operands (e.g. indexedValue @> value and indexedValue <@ value) (although it doesn't really matter -- just be consistent across the two tests). So the only difference will be that your test case also includes a containsKeys boolean, which is useful since the <@ spans are not tight.

Done!


pkg/sql/rowenc/index_encoding_test.go, line 533 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

I think you want these two booleans to be swapped: containsKeys should always be true for the first block below, and expected should be the expected result of evaluating indexedValue <@ value

Done!


pkg/sql/rowenc/index_encoding_test.go, line 539 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

need to update this comment to refer to the new names

Done.


pkg/sql/rowenc/index_encoding_test.go, line 633 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

I think you should declare isContained outside of the if expected block, since we still want to perform this check even if expected is false. (although I think you'll be changing expected to containsKeys...)

Ohh I see what you mean. When I put it inside the if expected block, it was to mimic the logic where we are applying an additional filter only on the arrays contained by the spans. So if the array was excluded from the spans (aka containsKeys was false), the filter wouldn't be applied to it in the first place (since it wouldn't have been included in the results). In other words, whenever containsKeys is false, expected is always false because the previous step will have already removed it as a possibility. I agree we should be checking it regardless though so I've changed it!


pkg/sql/rowenc/index_encoding_test.go, line 659 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Also -- isn't this evaluating left @> right, not left <@ right? I think this is only working right now since right is getting passed as the indexedValue in runTest.

Yeah you're right! I reordered runtest parameters to be indexedValue, value now and also passed in right, left as haystack, needles in ArrayContains, so now we are evaluating left <@ right :) Also renamed everything so that isContained matches containsKeys, and actual matches expected when we are checking the results.

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @angelazxu and @rytaft)


pkg/sql/opt/invertedidx/json_array.go, line 98 at r2 (raw file):

Previously, angelazxu (Angela Xu) wrote…

I ended up getting rid of contained and making separate wrappers. The rowenc functions do show up in quite a few places so I think having the wrapper makes it look cleaner. I did get rid of b and version though - now I just declare var inKey [byte] directly in the encoding spans functions, and I pass in EmptyArraysInInvertedIndexesVersion to encodeArrayInvertedIndexTableKeys (the only place it is needed).

Thanks, this is much better!


pkg/sql/opt/invertedidx/json_array.go, line 94 at r3 (raw file):

// This results in a span expression representing the intersection of all paths
// through the JSON or Array.
func getInvertedExprForJSONOrArrayIndex(

[nit] add ForContaining


pkg/sql/opt/invertedidx/json_array.go, line 109 at r3 (raw file):

// through the JSON or Array. This function is only used when checking if an
// indexed column is contained by (<@) a constant.
func getInvertedExprForJSONOrArrayIndexForContained(

[nit] ForContainedBy (to match the ComparisonOperator).


pkg/sql/rowenc/index_encoding.go, line 854 at r3 (raw file):

		return nil, nil
	}
	var inKey []byte

[nit] you can also directly pass nil /* inKey */


pkg/sql/rowenc/index_encoding.go, line 862 at r3 (raw file):

		return encodeContainingArrayInvertedIndexSpans(val.(*tree.DArray), inKey)
	}
	return nil, errors.AssertionFailedf(

[nit] move to default:

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking great! Just a few remaining nits

Reviewed 6 of 6 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @angelazxu and @RaduBerinde)


pkg/sql/opt/invertedidx/json_array.go, line 94 at r3 (raw file):

Previously, RaduBerinde wrote…

[nit] add ForContaining

Also mention @> in the comment (like you did below for the <@ function)


pkg/sql/opt/memo/testdata/stats/inverted-array, line 242 at r2 (raw file):

Previously, angelazxu (Angela Xu) wrote…

Fixed, oops!!

Looks good -- you'll probably need to use TESTFLAGS='-rewrite=true' to fix the output


pkg/sql/opt/xform/testdata/rules/select, line 2606 at r3 (raw file):


# The inverted index will not be used when the version does not have empty
# arrays in the inverted index.

[nit] mention in the comment that we can never use the inverted index for <@ expressions if the index version does not have empty arrays


pkg/sql/rowenc/index_encoding.go, line 847 at r3 (raw file):

// comments in the SpanExpression definition for details.
//
// The input inKey is prefixed to the keys in all returned spans.

you can remove this last bit from the comment since you've removed the inKey param


pkg/sql/rowenc/index_encoding.go, line 880 at r3 (raw file):

// SpanExpression definition for details.
//
// The input inKey is prefixed to the keys in all returned spans.

you can remove this last bit from the comment since you've removed the inKey param


pkg/sql/rowenc/index_encoding.go, line 889 at r3 (raw file):

	var inKey []byte
	datum := tree.UnwrapDatum(evalCtx, val)
	switch typ := val.ResolvedType().Family(); typ {

[nit] are you using typ? If not, you can just switch on val.ResolvedType().Family()


pkg/sql/rowenc/index_encoding_test.go, line 538 at r3 (raw file):

		// This test uses EncodeInvertedIndexTableKeys and EncodeContainedInvertedIndexSpans
		// to determine if the spans produced from the first Array value will
		// correctly include or exclude the second value, indicated by

I think the spans are actually produced from the second value, so I think you want to swap these
(first <-> second)


pkg/sql/rowenc/index_encoding_test.go, line 588 at r3 (raw file):

	}

	runTest := func(indexedValue, value tree.Datum, containsKeys, expected, unique bool) {

[nit] I'd change containsKeys to expectContainsKeys and unique to expectUnique


pkg/sql/rowenc/index_encoding_test.go, line 618 at r3 (raw file):

		require.NoError(t, err)

		if isContained != containsKeys {

[nit] If you change containsKeys to expectContainsKeys, then you can change isContained to containsKeys


pkg/sql/rowenc/index_encoding_test.go, line 633 at r3 (raw file):

			if expected {
				t.Errorf("expected %s to be contained by %s but it was not", indexedValue, value)
			}

I don't think t.Errorf stops execution, so you probably need an else here

Copy link
Contributor Author

@angelazxu angelazxu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rytaft)


pkg/sql/opt/xform/testdata/rules/select, line 2606 at r3 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

[nit] mention in the comment that we can never use the inverted index for <@ expressions if the index version does not have empty arrays

Done.


pkg/sql/rowenc/index_encoding.go, line 847 at r3 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

you can remove this last bit from the comment since you've removed the inKey param

Done.


pkg/sql/rowenc/index_encoding.go, line 880 at r3 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

you can remove this last bit from the comment since you've removed the inKey param

Done.


pkg/sql/rowenc/index_encoding.go, line 889 at r3 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

[nit] are you using typ? If not, you can just switch on val.ResolvedType().Family()

Done.


pkg/sql/rowenc/index_encoding_test.go, line 538 at r3 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

I think the spans are actually produced from the second value, so I think you want to swap these
(first <-> second)

Done.


pkg/sql/rowenc/index_encoding_test.go, line 588 at r3 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

[nit] I'd change containsKeys to expectContainsKeys and unique to expectUnique

Done.


pkg/sql/rowenc/index_encoding_test.go, line 618 at r3 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

[nit] If you change containsKeys to expectContainsKeys, then you can change isContained to containsKeys

Done.


pkg/sql/rowenc/index_encoding_test.go, line 633 at r3 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

I don't think t.Errorf stops execution, so you probably need an else here

Done.

Copy link
Contributor Author

@angelazxu angelazxu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just realized I forgot to change the names for getInvertedExprForJSONOrArrayIndexForContaining/ContainedBy in the comments, will change that in a bit!!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rytaft)

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Nice work!!

Reviewed 5 of 5 files at r4.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained

@RaduBerinde RaduBerinde added the do-not-merge bors won't merge a PR with this label. label Mar 4, 2021
Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Great work! 💯

Reviewed 3 of 8 files at r1, 1 of 6 files at r2, 1 of 6 files at r3, 5 of 5 files at r4.
Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @angelazxu)


pkg/sql/opt/invertedidx/json_array.go, line 344 at r4 (raw file):

		// When the second argument is a variable or expression corresponding to
		// the index column and the first argument is a constant, we get the
		// equivalent InvertedExpression for right <@ left.

[nit] It might be clearer to put this comment inside the else if block below, since that's what it is describing. If you're concerned about consistency with the comment regarding the left @> right case above, you could move that into the if block too.


pkg/sql/opt/invertedidx/json_array.go, line 353 at r4 (raw file):

			return inverted.NonInvertedColExpression{}
		}
		return getInvertedExprForJSONOrArrayIndexForContainedBy(evalCtx, d)

[nit] you might be able to clean up some of the duplicated code in these two paths. Something like:

var indexColumn opt.ScalarExpr
var constantVal opt.ScalarExpr
if isIndexColumn(j.tabID, j.index, left, j.computedColumns && memo.CanExtractConstDatum(right) {
    indexColumn = left
    constantVal = right
} else if isIndexColumn(j.tabID, j.index, right, j.computedColumns && memo.CanExtractConstDatum(left) {
    indexColumn = right
    constantVal = left
}
// rest of the logic on indexColumn and constantVal

pkg/sql/rowenc/index_encoding.go, line 865 at r4 (raw file):

}

// EncodeContainedInvertedIndexSpans takes in a key prefix and returns the

I think this needs to be updated - there's no key prefix argument.


pkg/sql/rowenc/index_encoding.go, line 1020 at r4 (raw file):

	// and would need to be passed through an additional filter. For example,
	// ARRAY[1, 2, 3] would be returned by the scan, but it should be filtered
	// out since ARRAY[1, 2, 3] <@ ARRAY[1] is false.

Great explanation and example! 💯

Previously, we did not support index acceleration when checking if an indexed
column is <@ (contained by) a constant, or in other words, when the indexed
column is on the right side of a @> (contains) expression. We already perform
index acceleration for @> (contains) expressions where an indexed JSON or Array
column is on the left side of the expression.

This change adds support for using the inverted index with <@ expressions on
Array columns. When there is an inverted index available, a scan will be done on
the Array column using the spans found from the constant value. An additional
filter will then be applied, as the span expression will never be tight.
Support for JSON columns will be added later.

Informs: cockroachdb#59763

Release note (performance improvement): Some additional expressions using the <@ (contained by) and @> (contains) operators now support index-acceleration with the indexed column on either side of the expression.
Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 2 files at r5.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @angelazxu)

@angelazxu
Copy link
Contributor Author

bors r+

@craig
Copy link
Contributor

craig bot commented Mar 10, 2021

👎 Rejected by label

@angelazxu angelazxu removed the do-not-merge bors won't merge a PR with this label. label Mar 10, 2021
@angelazxu
Copy link
Contributor Author

bors r+

@craig
Copy link
Contributor

craig bot commented Mar 10, 2021

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants