opt: inverted-index accelerate filters of the form j = ... and j IN (...) #101178

Merged
merged 1 commit into from Apr 14, 2023

Conversation

Shivs11
Contributor

@Shivs11 Shivs11 commented Apr 10, 2023

An inverted index can be used to accelerate queries that use the = and IN operators
when the filter contains the fetch value operator (->),
e.g. j->0 = '{"b": "c"}' and j->0 IN ('1', '2'),

where j is a JSON column.
This has been completed here: #96471 and #94666

This PR aims to add support to generate inverted spans for queries that use the
= operator or the IN operator without the fetch value operator. This is done by
creating JSON objects from the JSON values present on the right side of the equality or
the IN expression. Previously, keys extracted from the left-hand side of the operator were
used to build these JSON objects; in this case there are no such keys, so the inverted
spans are created from the right-hand side values alone.

Epic: CRDB-3301

Fixes: #96658

Release note (performance improvement): The optimizer now plans
inverted index scans for queries that use the IN and = operators without the JSON fetch
value operator (->), e.g. json_col = '{"b": "c"}' OR json_col IN ('"a"', '1')
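
As a rough illustration of the idea in the description, here is a toy Go model (not CockroachDB code) of answering `j = ...` and `j IN (...)` from an inverted index. The `doc`, `invertedIndex`, `buildIndex`, `candidates`, and `lookupIn` names are all hypothetical, and documents are simplified to flat objects with string values. The sketch assumes the index lookup only narrows the candidates, so each candidate is re-checked for exact equality.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// doc is a toy stand-in for a JSON object: flat, with string values only.
type doc map[string]string

// invertedIndex maps a "key=value" entry to the set of row IDs containing it.
type invertedIndex map[string]map[int]bool

// buildIndex adds one index entry per key/value pair of every row.
func buildIndex(rows map[int]doc) invertedIndex {
	idx := invertedIndex{}
	for id, d := range rows {
		for k, v := range d {
			entry := k + "=" + v
			if idx[entry] == nil {
				idx[entry] = map[int]bool{}
			}
			idx[entry][id] = true
		}
	}
	return idx
}

// candidates returns the rows that contain every key/value pair of want.
// Rows with extra keys are included, so the result is a superset of the
// rows equal to want. An empty want yields no candidates in this toy model.
func (idx invertedIndex) candidates(want doc) map[int]bool {
	var out map[int]bool
	for k, v := range want {
		posting := idx[k+"="+v]
		if out == nil {
			out = map[int]bool{}
			for id := range posting {
				out[id] = true
			}
			continue
		}
		for id := range out {
			if !posting[id] {
				delete(out, id)
			}
		}
	}
	return out
}

// lookupIn models `j IN (values...)`: the union of the per-value candidate
// sets, with every candidate re-checked against the original equality.
func lookupIn(idx invertedIndex, rows map[int]doc, values ...doc) []int {
	seen := map[int]bool{}
	for _, want := range values {
		for id := range idx.candidates(want) {
			if reflect.DeepEqual(rows[id], want) {
				seen[id] = true
			}
		}
	}
	ids := make([]int, 0, len(seen))
	for id := range seen {
		ids = append(ids, id)
	}
	sort.Ints(ids)
	return ids
}

func main() {
	rows := map[int]doc{
		1: {"b": "c"},
		2: {"b": "c", "x": "y"},
		3: {"a": "a"},
	}
	idx := buildIndex(rows)
	fmt.Println(lookupIn(idx, rows, doc{"b": "c"}))                // j = '{"b": "c"}'
	fmt.Println(lookupIn(idx, rows, doc{"b": "c"}, doc{"a": "a"})) // j IN (...)
}
```

In this model a single `=` is just `lookupIn` with one value. Row 2 is returned by the index lookup for `{"b": "c"}` but is dropped by the equality re-check.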

@Shivs11 Shivs11 requested a review from a team as a code owner April 10, 2023 22:27
@Shivs11 Shivs11 requested a review from michae2 April 10, 2023 22:27
@cockroach-teamcity
Member

This change is Reviewable

@Shivs11 Shivs11 requested review from mgartner and rytaft and removed request for michae2 April 10, 2023 22:27
@Shivs11
Contributor Author

Shivs11 commented Apr 11, 2023

bors r+

@rytaft
Collaborator

rytaft commented Apr 11, 2023

bors r-

@rytaft
Collaborator

rytaft commented Apr 11, 2023

This hasn't been reviewed yet. I think maybe you meant to bors the other PR?

@Shivs11
Contributor Author

Shivs11 commented Apr 11, 2023

This hasn't been reviewed yet. I think maybe you meant to bors the other PR?

Ah my apologies. Changes have been made.

Collaborator

@rytaft rytaft left a comment

:lgtm: Nice work!

Reviewed 5 of 5 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @Shivs11)


-- commits line 7 at r1:
I don't understand this last sentence


pkg/sql/opt/invertedidx/json_array.go line 591 at r1 (raw file):

	// on which an inverted index is constructed. Other entries inside of this
	// index, such as JSON arrays and objects, may consist of the element 1 present
	// within them. Since the encodings inside the index do not contain the position

nit: "may consist of the element 1 present within them" sounds a bit convoluted. I'd change this to "may contain the element 1 along with other elements" or something like that.


pkg/sql/logictest/testdata/logic_test/inverted_index line 829 at r1 (raw file):


query T
SELECT j FROM f@i WHERE j = '{"a": "a"}' ORDER BY k

this is a duplicate of the one above it

Contributor Author

@Shivs11 Shivs11 left a comment

Thank you for the review @rytaft !

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner and @rytaft)


-- commits line 7 at r1:

Previously, rytaft (Rebecca Taft) wrote…

I don't understand this last sentence

Replaced this with the intended PR numbers.


pkg/sql/opt/invertedidx/json_array.go line 591 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

nit: "may consist of the element 1 present within them" sounds a bit convoluted. I'd change this to "may contain the element 1 along with other elements" or something like that.

I agree. Done.


pkg/sql/logictest/testdata/logic_test/inverted_index line 829 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

this is a duplicate of the one above it

Removed.

Collaborator

@mgartner mgartner left a comment

Reviewed 1 of 5 files at r1, 2 of 2 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @rytaft and @Shivs11)


-- commits line 7 at r1:

Previously, Shivs11 (Shivam) wrote…

Replaced this with the intended PR numbers.

Did you push your latest changes? The last sentence looks incomplete: "This has been completed here:"


-- commits line 2 at r2:
nit: Should "and" be "or" here?


-- commits line 13 at r2:
I don't understand this last sentence. Are you suggesting that keys from the LHS were useful in creating inverted spans previously, but no longer are useful? Or simply that there are not keys on the LHS?


-- commits line 21 at r2:
It's odd to write this with the negative perspective: "queries that do not use...". I think you can word it something like "for queries of this form..." to be less confusing.


pkg/sql/opt/invertedidx/json_array.go line 437 at r2 (raw file):

		fetch = false
	default:
		return nil

nit: I think you should return inverted.NonInvertedColExpression here to match the comment for the function and the rest of the function.


pkg/sql/opt/invertedidx/json_array.go line 562 at r2 (raw file):

}

// extractJSONFetchValEqCondition extracts an InvertedExpression representing an

nit: incorrect function name


pkg/sql/opt/invertedidx/json_array.go line 582 at r2 (raw file):

	// For Equals expressions, we will generate the inverted expression for the
	// single object built from the keys and val.
	invertedExpr := getInvertedExprForJSONOrArrayIndexForContaining(ctx, evalCtx, val)

See my comment in the tests - I'm skeptical that we can use getInvertedExprForJSONOrArrayIndexForContaining in this case.


pkg/sql/opt/xform/testdata/rules/select line 2808 at r2 (raw file):

      │         │    └── spans
      │         │         ├── ["7\x00\x01\x00", "7\x00\x01\x00"]
      │         │         └── ["7\x00\x03\x00\x01\x00", "7\x00\x03\x00\x01\x00"]

We should only have a single span here, right? AFAIK there's only one possible JSON path for the JSON value 1. I think it's because we are generating spans with getInvertedExprForJSONOrArrayIndexForContaining. "Containment" is different than "equality", and I believe it generates two spans to handle the case when the JSON value is an array containing the value 1.
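
To make the containment-versus-equality point concrete, here is a toy Go sketch. It does not reproduce CockroachDB's key encoding: the `toyIndex` type, the `$:` and `$[]:` key prefixes, and the `containsKeys`, `equalsKeys`, and `probe` helpers are all made up for illustration. It assumes, as the comment above suggests, that containment of a scalar must also match arrays holding that scalar, while equality only matches the scalar itself.

```go
package main

import (
	"fmt"
	"sort"
)

// toyIndex maps a simplified path key to the row IDs stored under it.
// "$:1" means the whole document is the scalar 1; "$[]:1" means the
// document is an array with 1 as an element. Illustrative keys only.
type toyIndex map[string][]int

// containsKeys lists the keys to probe for `j @> scalar`: two lookups.
func containsKeys(scalar string) []string {
	return []string{"$:" + scalar, "$[]:" + scalar}
}

// equalsKeys lists the keys to probe for `j = scalar`: a single lookup.
func equalsKeys(scalar string) []string {
	return []string{"$:" + scalar}
}

// probe returns the sorted, de-duplicated row IDs found under the keys.
func probe(idx toyIndex, keys []string) []int {
	seen := map[int]bool{}
	for _, k := range keys {
		for _, id := range idx[k] {
			seen[id] = true
		}
	}
	ids := make([]int, 0, len(seen))
	for id := range seen {
		ids = append(ids, id)
	}
	sort.Ints(ids)
	return ids
}

func main() {
	// Row 1 stores 1, row 2 stores [1, 2], row 3 stores [1].
	idx := toyIndex{
		"$:1":   {1},
		"$[]:1": {2, 3},
		"$[]:2": {2},
	}
	fmt.Println(probe(idx, containsKeys("1"))) // containment probes two keys
	fmt.Println(probe(idx, equalsKeys("1")))   // equality probes one key
}
```

With the containment keys, rows 2 and 3 come back as extra candidates that a later equality check has to discard, which is why the containment-based spans are looser than equality strictly needs.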


pkg/sql/opt/invertedidx/json_array_test.go line 1089 at r2 (raw file):

			ok:               true,
			tight:            false,
			unique:           false,

Ideally, this should be unique=true, correct? Is it possible for the generated spans to scan two entries with the same PK? I don't think it should be, but maybe it's due to the use of "containment" spans not "equality" spans (see my other comments).

Contributor Author

@Shivs11 Shivs11 left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner and @rytaft)


-- commits line 7 at r1:

Previously, mgartner (Marcus Gartner) wrote…

Did you push your latest changes? The last sentence looks incomplete: "This has been completed here:"

For some reason the PR numbers were not showing up on the log message. This has been fixed and the issue was about line wrapping the text to 72 columns.


-- commits line 2 at r2:

Previously, mgartner (Marcus Gartner) wrote…

nit: Should "and" be "or" here?

I believe "and". The current PR adds support in two separate cases.


-- commits line 13 at r2:

Previously, mgartner (Marcus Gartner) wrote…

I don't understand this last sentence. Are you suggesting that keys from the LHS were useful in creating inverted spans previously, but no longer are useful? Or simply that there are not keys on the LHS?

What I meant is the following: keys from the LHS were useful in creating inverted spans previously, but no longer are useful since there are none present this time around. Shall rephrase this, thank you.


-- commits line 21 at r2:

Previously, mgartner (Marcus Gartner) wrote…

It's odd to write this with the negative perspective: "queries that do not use...". I think you can word it something like "for queries of this form..." to be less confusing.

Sure. Changes have been made.


pkg/sql/opt/invertedidx/json_array.go line 437 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: I think you should return inverted.NonInvertedColExpression here to match the comment for the function and the rest of the function.

Hmm, okay. Changed.


pkg/sql/opt/invertedidx/json_array.go line 562 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: incorrect function name

Resolved.


pkg/sql/opt/invertedidx/json_array.go line 582 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

See my comment in the tests - I'm skeptical that we can use getInvertedExprForJSONOrArrayIndexForContaining in this case.

Resolved.


pkg/sql/opt/xform/testdata/rules/select line 2808 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

We should only have a single span here, right? AFAIK there's only one possible JSON path for the JSON value 1. I think it's because we are generating spans with getInvertedExprForJSONOrArrayIndexForContaining. "Containment" is different than "equality", and I believe it generates two spans to handle the case when the JSON value is an array containing the value 1.

This was resolved offline. The decision, for now, is to use the function getInvertedExprForJSONOrArrayIndexForContaining even though it is not the optimal solution.


pkg/sql/opt/invertedidx/json_array_test.go line 1089 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Ideally, this should be unique=true, correct? Is it possible for the generated spans to scan two entries with the same PK? I don't think it should be, but maybe it's due to the use of "containment" spans not "equality" spans (see my other comments).

Resolved offline.

Collaborator

@rytaft rytaft left a comment

Reviewed 1 of 2 files at r2, 1 of 1 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner)

Collaborator

@mgartner mgartner left a comment

:lgtm:

Reviewed 2 of 5 files at r1, 1 of 1 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @Shivs11)


-- commits line 7 at r1:

Previously, Shivs11 (Shivam) wrote…

For some reason the PR numbers were not showing up on the log message. This has been fixed and the issue was about line wrapping the text to 72 columns.

I see. In a git commit message the # character at the beginning of a line denotes a comment, so you have to be careful with #123 references to issues and PRs.
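
A small Go sketch of why the PR numbers vanished: when a commit message is edited in an editor, git's default cleanup drops lines that start with `#`. The `stripCommentLines` helper below is a hypothetical approximation of only that one step (git also trims whitespace and collapses blank lines, which is omitted here).

```go
package main

import (
	"fmt"
	"strings"
)

// stripCommentLines drops every line whose first character is '#',
// approximating one step of git's default commit message cleanup.
func stripCommentLines(msg string) string {
	var kept []string
	for _, line := range strings.Split(msg, "\n") {
		if strings.HasPrefix(line, "#") {
			continue // treated as a comment, so the whole line is lost
		}
		kept = append(kept, line)
	}
	return strings.Join(kept, "\n")
}

func main() {
	// Wrapping put the PR references at the start of a line: they are dropped.
	wrapped := "This has been completed here:\n#96471 and #94666"
	fmt.Printf("%q\n", stripCommentLines(wrapped))

	// Wrapping elsewhere keeps '#' away from the start of a line.
	safe := "This has been completed here: #96471\nand #94666"
	fmt.Printf("%q\n", stripCommentLines(safe))
}
```

The first message comes out as just "This has been completed here:", which matches the truncated sentence noticed earlier in this thread.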


pkg/sql/opt/xform/testdata/rules/select line 2808 at r2 (raw file):

Previously, Shivs11 (Shivam) wrote…

This was resolved offline. The decision, for now, is to use the function getInvertedExprForJSONOrArrayIndexForContaining even though it is not the optimal solution.

👍


pkg/sql/opt/xform/testdata/rules/select line 2785 at r4 (raw file):

# applied again.

nit: remove extra new line

(...)

An inverted index can be used to accelerate queries, using the = and the
IN operator, with filters having the fetch value operator present; eg:
j->0  = '{"b": "c"}' and j->0 IN ('1', '2') where j is a JSON column.
This has been completed here: cockroachdb#96471 and cockroachdb#94666

This PR aims to add support to generate inverted spans for queries not
involving the fetch val operator and having the `=` operator or the `IN`
operator. This was done by creating JSON objects from the JSON values
present on the right side of the equality or the IN expression.
Previously, keys from the LHS were useful in creating inverted spans
but are no longer useful since they are absent in this scenario.

Epic: CRDB-3301

Fixes: cockroachdb#96658

Release note (performance improvement): The optimizer now plans inverted
index scans for queries using `IN` or the `=` operators without the
fetch val (`->`) operator.  eg: json_col = '{"b":"c"}' OR json_col
IN ('"a"', '1')

Contributor Author

@Shivs11 Shivs11 left a comment

Thank you for your review! @mgartner

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner)

@Shivs11
Contributor Author

Shivs11 commented Apr 14, 2023

bors r+

@craig
Contributor

craig bot commented Apr 14, 2023

Build succeeded:

@craig craig bot merged commit 14615c5 into cockroachdb:master Apr 14, 2023
7 checks passed
mgartner added a commit to mgartner/cockroach that referenced this pull request Oct 18, 2023
This commit fixes a bug introduced in cockroachdb#101178 that allows the optimizer
to generate inverted index scans on columns that are not filtered by
the query. For example, an inverted index over the column `j1` could be
scanned for a filter involving a different column, like `j2 = '5'`. The
bug is caused by a simple omission of code that must check that the
column in the filter is an indexed column.

Fixes cockroachdb#111963

There is no release note because this bug is not present in any
releases.

Release note: None
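
The missing check described in this commit message can be pictured with a minimal Go sketch. The `colID` and `eqFilter` types and the `canConstrainInvertedIndex` function are hypothetical stand-ins, not the optimizer's actual types; the only point shown is that the filtered column must be the column the inverted index covers.

```go
package main

import "fmt"

// colID is a hypothetical column identifier.
type colID int

// eqFilter is a hypothetical stand-in for a filter of the form `col = value`.
type eqFilter struct {
	col   colID
	value string
}

// canConstrainInvertedIndex reports whether filter f may be used to build
// spans over an inverted index on indexedCol. Without this comparison, a
// filter on one column could constrain an index over a different column.
func canConstrainInvertedIndex(f eqFilter, indexedCol colID) bool {
	return f.col == indexedCol
}

func main() {
	const j1, j2 colID = 1, 2
	f := eqFilter{col: j2, value: "5"} // j2 = '5'

	fmt.Println(canConstrainInvertedIndex(f, j1)) // index on j1: must not be used
	fmt.Println(canConstrainInvertedIndex(f, j2)) // index on j2: may be used
}
```
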
mgartner added a commit to mgartner/cockroach that referenced this pull request Oct 20, 2023
craig bot pushed a commit that referenced this pull request Oct 20, 2023
111713: sql: fix nil-pointer error in local retry r=DrewKimball a=DrewKimball

#### tree: return correct parse error for pg_lsn

This patch changes the error returned upon failing to parse a PG_LSN
value to match postgres. Previously, the error was an internal error.

Informs #111327

Release note: None

#### sql: fix nil-pointer error in local retry

In #105451, we added logic to locally retry a distributed query
after an error. However, the retry logic unconditionally updated a
field of `DistSQLReceiver` that may be nil, which could cause a
nil-pointer error in some code paths (e.g. apply-join). This patch
adds a check that the field is non-nil, as is done for other places
where it is updated.

There is no release note because the change has not yet made it into
a release.

Fixes #111327

Release note: None

112654: opt: fix inverted index constrained scans for equality filters r=mgartner a=mgartner

#### opt: fix inverted index constrained scans for equality filters

This commit fixes a bug introduced in #101178 that allows the optimizer
to generate inverted index scans on columns that are not filtered by
the query. For example, an inverted index over the column `j1` could be
scanned for a filter involving a different column, like `j2 = '5'`. The
bug is caused by a simple omission of code that must check that the
column in the filter is an indexed column.

Fixes #111963

There is no release note because this bug is not present in any
releases.

Release note: None

#### randgen: generate single-column indexes more often

This commit makes `randgen` more likely to generate single-column
indexes. It is motivated by the bug #111963, which surprisingly lived on
the master branch for six months without being detected. It's not
entirely clear why TLP or other randomized tests did not catch the bug,
which has such a simple reproduction.

One theory is that indexes tend to be multi-column and constrained scans
on multi-column inverted indexes are not commonly planned for randomly
generated queries because the set of requirements to generate the scan
are very specific: the query must hold each prefix column constant, e.g.
`a=1 AND b=2 AND j='5'::JSON`. The likelihood of randomly generating
such an expression may be so low that the bug was not caught.

By making 10% of indexes single-column, this bug may have been more
likely to be caught because only the inverted index column needs to be
constrained by an equality filter.

Release note: None


112690: sql: disallow invocation of procedures outside of CALL r=mgartner a=mgartner

#### sql: disallow invocation of procedures outside of CALL

This commit adds some missing checks to ensure that procedures cannot be
invoked in any context besides as the root expression in `CALL`
statements.

Epic: CRDB-25388

Release note: None

#### sql: add tests with function invocation in procedure argument

This commit adds a couple of tests that show that functions can be used
in procedure argument expressions.

Release note: None


112698: sql: clarify comments/naming of descriptorChanged flag r=rafiss a=rafiss

fixes #110727
Release note: None

112701: sql/logictest: fix flakes in select_for_update_read_committed r=mgartner a=mgartner

The `select_for_update_read_committed` tests were flaking because not
all statements were being run under READ COMMITTED isolation. The logic
test infrastructure does not allow fine-grained control of sessions, and
setting the isolation level in one statement would only apply to a
single session. Subsequent statements are not guaranteed to run in the
same session because they could run in any session in the connection
pool. This commit wraps each statement in an explicit transaction with
an explicit isolation level to ensure READ COMMITTED is used.
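
The wrapping described above can be sketched in Go as follows. The `wrapInReadCommitted` helper is hypothetical and only builds the SQL text; how the logic test files actually lay out these statements is not shown in this message.

```go
package main

import (
	"fmt"
	"strings"
)

// wrapInReadCommitted wraps one SQL statement in an explicit transaction
// that states its isolation level, so the statement does not depend on a
// session-level setting that another pooled session may lack.
func wrapInReadCommitted(stmt string) string {
	stmt = strings.TrimSuffix(strings.TrimSpace(stmt), ";")
	return "BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;\n" +
		stmt + ";\n" +
		"COMMIT;"
}

func main() {
	fmt.Println(wrapInReadCommitted("SELECT * FROM t WHERE a = 1 FOR UPDATE;"))
}
```

Because the isolation level travels with the transaction itself, it no longer matters which session in the connection pool ends up running the statement.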

In the future, we should investigate allowing fine-grained and explicit
control of sessions in logic tests.

Fixes #112677

Release note: None


112726: sql: make tests error if a leaf txn is not created when expected r=rharding6373 a=rharding6373

This adds a test-only error if a leaf transaction is expected to be used by a plan but a root transaction is used instead.

Epic: none
Informs: #111097

Release note: None

112767: log: fix stacktrace test goroutine counts r=rickystewart a=dhartunian

Previously, we would use the count of the string `goroutine ` as a proxy for the number of goroutines in the stacktrace. This stopped working in go 1.21 due to this change:
golang/go@51225f6

We should consider using a stacktrace parser in the future.
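
To illustrate why counting the substring breaks, here is a Go sketch that compares the substring count with a count of goroutine header lines. It assumes a dump format in which newer Go versions add lines such as `created by main.main in goroutine 1`; the sample dump is hand-written, and the regular expression is only one lightweight alternative, not necessarily the fix used in this change.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// goroutineHeader matches the first line of each goroutine's section in a
// stack dump, for example "goroutine 7 [running]:".
var goroutineHeader = regexp.MustCompile(`(?m)^goroutine \d+ \[`)

// countBySubstring is the fragile proxy: every "goroutine " counts.
func countBySubstring(dump string) int {
	return strings.Count(dump, "goroutine ")
}

// countByHeader counts only the per-goroutine header lines.
func countByHeader(dump string) int {
	return len(goroutineHeader.FindAllString(dump, -1))
}

func main() {
	// Hand-written sample with two goroutines and one "created by" line.
	dump := "goroutine 1 [running]:\nmain.main()\n\t/tmp/x.go:5 +0x1d\n\n" +
		"goroutine 18 [chan receive]:\nmain.worker()\n\t/tmp/x.go:9 +0x2a\n" +
		"created by main.main in goroutine 1\n\t/tmp/x.go:4 +0x1a\n"

	fmt.Println(countBySubstring(dump), countByHeader(dump))
}
```

On this sample the substring count is 3 while the header count is 2, because the "created by" line mentions a goroutine without starting a new section.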

Supports #112088

Epic: None
Release note: None

Co-authored-by: Drew Kimball <drewk@cockroachlabs.com>
Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com>
Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
Co-authored-by: rharding6373 <rharding6373@users.noreply.github.com>
Co-authored-by: David Hartunian <davidh@cockroachlabs.com>
blathers-crl bot pushed a commit that referenced this pull request Oct 20, 2023
Development

Successfully merging this pull request may close these issues.

opt: inverted-index accelerate filters of the form: j = ... and j IN (...)
4 participants