opt: inverted-index accelerate filters of the form j = ... and j IN (...) #101178
bors r+
bors r-
This hasn't been reviewed yet. I think maybe you meant to bors the other PR?
Ah my apologies. Changes have been made.
Reviewed 5 of 5 files at r1, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @Shivs11)
-- commits
line 7 at r1:
I don't understand this last sentence
pkg/sql/opt/invertedidx/json_array.go
line 591 at r1 (raw file):
// on which an inverted index is constructed. Other entries inside of this
// index, such as JSON arrays and objects, may consist of the element 1 present
// within them. Since the encodings inside the index do not contain the position
nit: "may consist of the element 1 present within them" sounds a bit convoluted. I'd change this to "may contain the element 1 along with other elements" or something like that.
pkg/sql/logictest/testdata/logic_test/inverted_index
line 829 at r1 (raw file):
query T
SELECT j FROM f@i WHERE j = '{"a": "a"}' ORDER BY k
this is a duplicate of the one above it
f93a42e to de6b2eb
Thank you for the review @rytaft !
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner and @rytaft)
Previously, rytaft (Rebecca Taft) wrote…
I don't understand this last sentence
Replaced this with the intended PR numbers.
pkg/sql/opt/invertedidx/json_array.go
line 591 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: "may consist of the element 1 present within them" sounds a bit convoluted. I'd change this to "may contain the element 1 along with other elements" or something like that.
I agree. Done.
pkg/sql/logictest/testdata/logic_test/inverted_index
line 829 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
this is a duplicate of the one above it
Removed.
Reviewed 1 of 5 files at r1, 2 of 2 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @rytaft and @Shivs11)
Previously, Shivs11 (Shivam) wrote…
Replaced this with the intended PR numbers.
Did you push your latest changes? The last sentence looks incomplete: "This has been completed here:"
-- commits
line 2 at r2:
nit: Should "and" be "or" here?
-- commits
line 13 at r2:
I don't understand this last sentence. Are you suggesting that keys from the LHS were useful in creating inverted spans previously, but no longer are useful? Or simply that there are not keys on the LHS?
-- commits
line 21 at r2:
It's odd to write this with the negative perspective: "queries that do not use...". I think you can word it something like "for queries of this form..." to be less confusing.
pkg/sql/opt/invertedidx/json_array.go
line 437 at r2 (raw file):
	fetch = false
default:
	return nil
nit: I think you should return `inverted.NonInvertedColExpression` here to match the comment for the function and the rest of the function.
pkg/sql/opt/invertedidx/json_array.go
line 562 at r2 (raw file):
}
// extractJSONFetchValEqCondition extracts an InvertedExpression representing an
nit: incorrect function name
pkg/sql/opt/invertedidx/json_array.go
line 582 at r2 (raw file):
// For Equals expressions, we will generate the inverted expression for the
// single object built from the keys and val.
invertedExpr := getInvertedExprForJSONOrArrayIndexForContaining(ctx, evalCtx, val)
See my comment in the tests - I'm skeptical that we can use `getInvertedExprForJSONOrArrayIndexForContaining` in this case.
pkg/sql/opt/xform/testdata/rules/select
line 2808 at r2 (raw file):
│    │    └── spans
│    │         ├── ["7\x00\x01\x00", "7\x00\x01\x00"]
│    │         └── ["7\x00\x03\x00\x01\x00", "7\x00\x03\x00\x01\x00"]
We should only have a single span here, right? AFAIK there's only one possible JSON path for the JSON value `1`. I think it's because we are generating spans with `getInvertedExprForJSONOrArrayIndexForContaining`. "Containment" is different than "equality", and I believe it generates two spans to handle the case when the JSON value is an array containing the value `1`.
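To make the containment-versus-equality point concrete, here is a minimal sketch using a toy inverted index. Everything in it is an assumption made for illustration: the `doc` type, the `scalar/` and `array/` key prefixes, and the helper names are hypothetical and do not reflect CockroachDB's actual key encoding or API. It only shows why scanning containment keys for the scalar `1` can return a row holding an array such as `[1, 2]`, so the original equality filter still has to be re-applied afterwards.

```go
package main

import (
	"fmt"
	"sort"
)

// doc is a toy JSON value: a single scalar (isArray == false, one element)
// or an array of scalars. Hypothetical type, for illustration only.
type doc struct {
	isArray bool
	elems   []string
}

// indexKeys returns the toy inverted-index keys for a document. The key
// records the element but not its position, and uses a different prefix
// for array elements than for top-level scalars.
func indexKeys(d doc) []string {
	prefix := "scalar/"
	if d.isArray {
		prefix = "array/"
	}
	keys := make([]string, 0, len(d.elems))
	for _, e := range d.elems {
		keys = append(keys, prefix+e)
	}
	return keys
}

// buildIndex maps every key to the primary keys of the rows that produce it.
func buildIndex(rows map[int]doc) map[string][]int {
	idx := map[string][]int{}
	for pk, d := range rows {
		for _, k := range indexKeys(d) {
			idx[k] = append(idx[k], pk)
		}
	}
	return idx
}

// containmentKeys lists the keys to scan for "value contains scalar s":
// one for the bare scalar and one for arrays that hold it.
func containmentKeys(s string) []string {
	return []string{"scalar/" + s, "array/" + s}
}

// scan returns the sorted, de-duplicated primary keys found under the keys.
func scan(idx map[string][]int, keys []string) []int {
	seen := map[int]bool{}
	var pks []int
	for _, k := range keys {
		for _, pk := range idx[k] {
			if !seen[pk] {
				seen[pk] = true
				pks = append(pks, pk)
			}
		}
	}
	sort.Ints(pks)
	return pks
}

// equalsScalar is the original filter, re-applied to each candidate row.
func equalsScalar(d doc, s string) bool {
	return !d.isArray && len(d.elems) == 1 && d.elems[0] == s
}

func main() {
	rows := map[int]doc{
		1: {elems: []string{"1"}},                     // 1
		2: {isArray: true, elems: []string{"1", "2"}}, // [1, 2]
		3: {elems: []string{"2"}},                     // 2
	}
	idx := buildIndex(rows)
	candidates := scan(idx, containmentKeys("1"))
	fmt.Println("candidates from containment keys:", candidates)
	for _, pk := range candidates {
		if equalsScalar(rows[pk], "1") {
			fmt.Println("row", pk, "satisfies j = '1'")
		}
	}
}
```

In this toy model an equality-specific key set would contain only the `scalar/` key, which corresponds to the single span the reviewer expected.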
pkg/sql/opt/invertedidx/json_array_test.go
line 1089 at r2 (raw file):
ok: true,
tight: false,
unique: false,
Ideally, this should be `unique=true`, correct? Is it possible for the generated spans to scan two entries with the same PK? I don't think it should be, but maybe it's due to the use of "containment" spans not "equality" spans (see my other comments).
de6b2eb to 246c1f1
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner and @rytaft)
Previously, mgartner (Marcus Gartner) wrote…
Did you push your latest changes? The last sentence looks incomplete: "This has been completed here:"
For some reason the PR numbers were not showing up on the log message. This has been fixed and the issue was about line wrapping the text to 72 columns.
Previously, mgartner (Marcus Gartner) wrote…
nit: Should "and" be "or" here?
I believe "and". The current PR adds support in two separate cases.
Previously, mgartner (Marcus Gartner) wrote…
I don't understand this last sentence. Are you suggesting that keys from the LHS were useful in creating inverted spans previously, but no longer are useful? Or simply that there are not keys on the LHS?
What I meant is the following: keys from the LHS were useful in creating inverted spans previously, but no longer are useful since there are none present this time around. Shall rephrase this, thank you.
Previously, mgartner (Marcus Gartner) wrote…
It's odd to write this with the negative perspective: "queries that do not use...". I think you can word it something like "for queries of this form..." to be less confusing.
Sure. Changes have been made.
pkg/sql/opt/invertedidx/json_array.go
line 437 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: I think you should return `inverted.NonInvertedColExpression` here to match the comment for the function and the rest of the function.
Hmm, okay. Changed.
pkg/sql/opt/invertedidx/json_array.go
line 562 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: incorrect function name
Resolved.
pkg/sql/opt/invertedidx/json_array.go
line 582 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
See my comment in the tests - I'm skeptical that we can use `getInvertedExprForJSONOrArrayIndexForContaining` in this case.
Resolved.
pkg/sql/opt/xform/testdata/rules/select
line 2808 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
We should only have a single span here, right? AFAIK there's only one possible JSON path for the JSON value `1`. I think it's because we are generating spans with `getInvertedExprForJSONOrArrayIndexForContaining`. "Containment" is different than "equality", and I believe it generates two spans to handle the case when the JSON value is an array containing the value `1`.
Was resolved offline. The decision, for now, is to use the function `getInvertedExprForJSONOrArrayIndexForContaining` even though it is not the most optimum solution.
pkg/sql/opt/invertedidx/json_array_test.go
line 1089 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Ideally, this should be `unique=true`, correct? Is it possible for the generated spans to scan two entries with the same PK? I don't think it should be, but maybe it's due to the use of "containment" spans not "equality" spans (see my other comments).
Resolved offline.
246c1f1 to 6e4921a
Reviewed 1 of 2 files at r2, 1 of 1 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner)
6e4921a to a09ddc8
Reviewed 2 of 5 files at r1, 1 of 1 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @Shivs11)
Previously, Shivs11 (Shivam) wrote…
For some reason the PR numbers were not showing up on the log message. This has been fixed and the issue was about line wrapping the text to 72 columns.
I see. In a git commit message the `#` character at the beginning of a line denotes a comment, so you have to be careful with `#123` references to issues and PRs.
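The effect described here can be modeled with a short sketch. It is only a rough approximation of the comment-stripping step git applies by default to a message edited in the editor, not git's actual implementation, and `stripCommentLines` is a hypothetical helper. It shows how a line that happens to begin with a `#123`-style reference after re-wrapping disappears, leaving the dangling "This has been completed here:" that was observed earlier in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// stripCommentLines drops every line whose first character is '#',
// approximating how git treats such lines as comments when it cleans up
// an edited commit message. Hypothetical helper, for illustration only.
func stripCommentLines(msg string) string {
	var kept []string
	for _, line := range strings.Split(msg, "\n") {
		if strings.HasPrefix(line, "#") {
			continue // treated as a comment and removed
		}
		kept = append(kept, line)
	}
	return strings.Join(kept, "\n")
}

func main() {
	// After wrapping, the PR references ended up at the start of a line.
	wrapped := "This has been completed here:\n#96471 and #94666"
	fmt.Printf("%q\n", stripCommentLines(wrapped))

	// Keeping the references on the same line as other text avoids the problem.
	safe := "This has been completed here: #96471 and #94666"
	fmt.Printf("%q\n", stripCommentLines(safe))
}
```

Prefixing the reference with other text on the same line (for example the `cockroachdb#96471` form used in the commit message) keeps it from starting a line with `#`.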
pkg/sql/opt/xform/testdata/rules/select
line 2808 at r2 (raw file):
Previously, Shivs11 (Shivam) wrote…
Was resolved offline. The decision, for now, is to use the function `getInvertedExprForJSONOrArrayIndexForContaining` even though it is not the most optimum solution.
👍
pkg/sql/opt/xform/testdata/rules/select
line 2785 at r4 (raw file):
# applied again.
nit: remove extra new line
(...)

An inverted index can be used to accelerate queries, using the = and the IN operator, with filters having the fetch value operator present; eg: j->0 = '{"b": "c"}' and j->0 IN ('1', '2') where j is a JSON column. This has been completed here: cockroachdb#96471 and cockroachdb#94666

This PR aims to add support to generate inverted spans for queries not involving the fetch val operator and having the `=` operator or the `IN` operator. This was done by creating JSON objects from the JSON values present on the right side of the equality or the IN expression. Previously, keys from the LHS were useful in creating inverted spans but are no longer useful since they are absent in this scenario.

Epic: CRDB-3301
Fixes: cockroachdb#96658

Release note (performance improvement): The optimizer now plans inverted index scans for queries using `IN` or the `=` operators without the fetch val (`->`) operator. eg: json_col = '{"b":"c"}' OR json_col IN ('"a"', '1')
a09ddc8 to 85400d3
Thank you for your review! @mgartner
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner)
bors r+
Build succeeded:
This commit fixes a bug introduced in cockroachdb#101178 that allows the optimizer to generate inverted index scans on columns that are not filtered by the query. For example, an inverted index over the column `j1` could be scanned for a filter involving a different column, like `j2 = '5'`. The bug is caused by a simple omission of code that must check that the column in the filter is an indexed column.

Fixes cockroachdb#111963

There is no release note because this bug is not present in any releases.

Release note: None
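The missing check described in this commit message can be sketched as follows. This is an illustration of the idea only: `columnID`, `eqFilter`, and `canConstrainIndex` are hypothetical names and are not taken from the optimizer's code.

```go
package main

import "fmt"

// columnID identifies a table column. Hypothetical type for this sketch.
type columnID int

// eqFilter is a toy model of a filter of the form <col> = <value>.
type eqFilter struct {
	col   columnID
	value string
}

// canConstrainIndex reports whether an equality filter may be used to build
// spans for an inverted index over indexedCol. The filter is only usable
// when it references the indexed column itself.
func canConstrainIndex(f eqFilter, indexedCol columnID) bool {
	return f.col == indexedCol
}

func main() {
	const j1, j2 columnID = 1, 2

	// An inverted index over j1 must not be scanned for a filter on j2.
	fmt.Println(canConstrainIndex(eqFilter{col: j2, value: "5"}, j1))
	// A filter on j1 itself is eligible.
	fmt.Println(canConstrainIndex(eqFilter{col: j1, value: "5"}, j1))
}
```

Without a guard of this kind, the `j2 = '5'` filter from the example would be turned into spans over the `j1` index and could return the wrong rows.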
111713: sql: fix nil-pointer error in local retry r=DrewKimball a=DrewKimball

#### tree: return correct parse error for pg_lsn

This patch changes the error returned upon failing to parse a PG_LSN value to match postgres. Previously, the error was an internal error.

Informs #111327

Release note: None

#### sql: fix nil-pointer error in local retry

In #105451, we added logic to locally retry a distributed query after an error. However, the retry logic unconditionally updated a field of `DistSQLReceiver` that may be nil, which could cause a nil-pointer error in some code paths (e.g. apply-join). This patch adds a check that the field is non-nil, as is done for other places where it is updated.

There is no release note because the change has not yet made it into a release.

Fixes #111327

Release note: None

112654: opt: fix inverted index constrained scans for equality filters r=mgartner a=mgartner

#### opt: fix inverted index constrained scans for equality filters

This commit fixes a bug introduced in #101178 that allows the optimizer to generate inverted index scans on columns that are not filtered by the query. For example, an inverted index over the column `j1` could be scanned for a filter involving a different column, like `j2 = '5'`. The bug is caused by a simple omission of code that must check that the column in the filter is an indexed column.

Fixes #111963

There is no release note because this bug is not present in any releases.

Release note: None

#### randgen: generate single-column indexes more often

This commit makes `randgen` more likely to generate single-column indexes. It is motivated by the bug #111963, which surprisingly lived on the master branch for six months without being detected. It's not entirely clear why TLP or other randomized tests did not catch the bug, which has such a simple reproduction.

One theory is that indexes tend to be multi-column and constrained scans on multi-column inverted indexes are not commonly planned for randomly generated queries because the set of requirements to generate the scan are very specific: the query must hold each prefix column constant, e.g. `a=1 AND b=2 AND j='5'::JSON`. The likelihood of randomly generating such an expression may be so low that the bug was not caught. By making 10% of indexes single-column, this bug may have been more likely to be caught because only the inverted index column needs to be constrained by an equality filter.

Release note: None

112690: sql: disallow invocation of procedures outside of CALL r=mgartner a=mgartner

#### sql: disallow invocation of procedures outside of CALL

This commit adds some missing checks to ensure that procedures cannot be invoked in any context besides as the root expression in `CALL` statements.

Epic: CRDB-25388

Release note: None

#### sql: add tests with function invocation in procedure argument

This commit adds a couple of tests that show that functions can be used in procedure argument expressions.

Release note: None

112698: sql: clarify comments/naming of descriptorChanged flag r=rafiss a=rafiss

fixes #110727

Release note: None

112701: sql/logictest: fix flakes in select_for_update_read_committed r=mgartner a=mgartner

The `select_for_update_read_committed` tests were flaking because not all statements were being run under READ COMMITTED isolation. The logic test infrastructure does not allow fine-grained control of sessions, and setting the isolation level in one statement would only apply to a single session. Subsequent statements are not guaranteed to run in the same session because they could run in any session in the connection pool.

This commit wraps each statement in an explicit transaction with an explicit isolation level to ensure READ COMMITTED is used.

In the future, we should investigate allowing fine-grained and explicit control of sessions in logic tests.

Fixes #112677

Release note: None

112726: sql: make tests error if a leaf txn is not created when expected r=rharding6373 a=rharding6373

This adds a test-only error if a leaf transaction is expected to be used by a plan but a root transaction is used instead.

Epic: none
Informs: #111097

Release note: None

112767: log: fix stacktrace test goroutine counts r=rickystewart a=dhartunian

Previously, we would use the count of the string `goroutine ` as a proxy for the number of goroutines in the stacktrace. This stopped working in go 1.21 due to this change: golang/go@51225f6

We should consider using a stacktrace parser in the future.

Supports #112088
Epic: None
Release note: None

Co-authored-by: Drew Kimball <drewk@cockroachlabs.com>
Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com>
Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
Co-authored-by: rharding6373 <rharding6373@users.noreply.github.com>
Co-authored-by: David Hartunian <davidh@cockroachlabs.com>
An inverted index can be used to accelerate queries that use the `=` and `IN` operators in filters where the fetch value operator is present, e.g. `j->0 = '{"b": "c"}'` and `j->0 IN ('1', '2')`, where `j` is a JSON column.
This has been completed here: #96471 and #94666

This PR adds support for generating inverted spans for queries that use the `=` operator or the `IN` operator without the fetch val operator. This was done by creating JSON objects from the JSON values present on the right side of the equality or the IN expression. Unlike the fetch value case, no "keys" are extracted from the left-hand side of the operators when creating these JSON objects, since none are present there; the inverted spans are built from the right-hand side values alone.

Epic: CRDB-3301
Fixes: #96658

Release note (performance improvement): The optimizer now plans inverted index scans for queries that use the `IN` or `=` operators without the JSON fetch value operator (`->`), e.g. `json_col = '{"b": "c"}'` OR `json_col IN ('"a"', '1')`.
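
As a rough illustration of how an `IN` filter can reuse the equality path, the sketch below treats `j IN (a, b)` as `j = a OR j = b` and unions the keys produced for each right-hand-side value. This is an assumption about the general approach, not a description of the PR's code: `keysForEquality` and `keysForIN` are hypothetical helpers, and the toy `key(...)` strings stand in for real index keys. Per the review discussion, the PR itself builds its expression with the containment helper rather than an equality-specific one.

```go
package main

import (
	"fmt"
	"sort"
)

// keysForEquality is a hypothetical stand-in for the step that turns one
// right-hand-side JSON value into the index keys to scan.
func keysForEquality(jsonValue string) []string {
	return []string{"key(" + jsonValue + ")"}
}

// keysForIN unions the keys of every value in the IN list, treating
// j IN (a, b) as j = a OR j = b. Duplicate values contribute a key once.
func keysForIN(values []string) []string {
	seen := map[string]bool{}
	var keys []string
	for _, v := range values {
		for _, k := range keysForEquality(v) {
			if !seen[k] {
				seen[k] = true
				keys = append(keys, k)
			}
		}
	}
	sort.Strings(keys)
	return keys
}

func main() {
	// Mirrors the release note example: json_col IN ('"a"', '1').
	fmt.Println(keysForIN([]string{`"a"`, `1`}))
}
```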