sql: incorrect joins to inverted expression indexes on json equality #111963
Comments
Hi @michae2, please add branch-* labels to identify which branch(es) this release-blocker affects. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
Looks like this wasn't present in any releases prior to 23.2, adding release-blocker.
One approach to diagnosing this is to
Great catch @michae2. How did you discover this one? I'm surprised that randomized tests haven't caught this; maybe there is a hole in our inverted expression index testing.
I've bisected this to 85400d3.
This one is pretty bad. We'll scan an inverted index on a column that's not even filtered in the query:
As I work on fixing this, I'll try to figure out why our randomized testing did not catch this.
This commit fixes a bug introduced in cockroachdb#101178 that allows the optimizer to generate inverted index scans on columns that are not filtered by the query. For example, an inverted index over the column `j1` could be scanned for a filter involving a different column, like `j2 = '5'`. The bug is caused by a simple omission of code that must check that the column in the filter is an indexed column.

Fixes cockroachdb#111963

There is no release note because this bug is not present in any releases.

Release note: None
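The omitted check described in this commit message can be illustrated with a minimal Go sketch. This is not the optimizer's actual code: `ColumnID`, `EqualityFilter`, `InvertedIndex`, and `CanConstrainScan` are hypothetical names invented for illustration, and the sketch assumes a single-column inverted index and a simple `col = constant` filter.

```go
package main

import "fmt"

// ColumnID is a hypothetical stand-in for an optimizer column identifier.
type ColumnID int

// EqualityFilter models a filter of the form `col = constant`.
// It is an illustrative type, not a CockroachDB type.
type EqualityFilter struct {
	Col   ColumnID
	Value string
}

// InvertedIndex models a single-column inverted index (illustrative only).
type InvertedIndex struct {
	Name       string
	IndexedCol ColumnID
}

// CanConstrainScan reports whether the equality filter may be used to
// constrain a scan of the inverted index. The comparison below is the kind
// of check the commit message says was missing: the filtered column must be
// the indexed column.
func CanConstrainScan(idx InvertedIndex, f EqualityFilter) bool {
	return f.Col == idx.IndexedCol
}

func main() {
	const j1, j2 ColumnID = 1, 2
	idx := InvertedIndex{Name: "t_j1_idx", IndexedCol: j1}

	// A filter on j1 may use the index over j1.
	fmt.Println(CanConstrainScan(idx, EqualityFilter{Col: j1, Value: "5"})) // true

	// A filter on j2 must not use the index over j1.
	fmt.Println(CanConstrainScan(idx, EqualityFilter{Col: j2, Value: "5"})) // false
}
```

Without the column comparison, the second call would also be accepted, which corresponds to the behavior reported in this issue: an index over `j1` scanned for a filter on `j2`.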
This commit makes `randgen` more likely to generate single-column indexes. It is motivated by the bug cockroachdb#111963, which surprisingly lived on the master branch for six months without being detected. It's not entirely clear why TLP or other randomized tests did not catch the bug, which has such a simple reproduction.

One theory is that indexes tend to be multi-column, and constrained scans on multi-column inverted indexes are not commonly planned for randomly generated queries because the set of requirements to generate the scan is very specific: the query must hold each prefix column constant, e.g. `a=1 AND b=2 AND j='5'::JSON`. The likelihood of randomly generating such an expression may be so low that the bug was not caught. With 10% of indexes single-column, this bug might have been more likely to be caught because only the inverted index column needs to be constrained by an equality filter.

Release note: None
This commit makes `randgen` more likely to generate single-column indexes. It is motivated by the bug cockroachdb#111963, which surprisingly lived on the master branch for six months without being detected. It's not entirely clear why TLP or other randomized tests did not catch the bug, which has such a simple reproduction.

One theory is that indexes tend to be multi-column, and constrained scans on multi-column inverted indexes are not commonly planned for randomly generated queries because the set of requirements to generate the scan is very specific: the query must hold each prefix column constant, e.g. `a=1 AND b=2 AND j='5'::JSON`. The likelihood of randomly generating such an expression may be so low that the bug was not caught. With 50% of indexes single-column, this bug might have been more likely to be caught because only the inverted index column needs to be constrained by an equality filter.

Release note: None
111713: sql: fix nil-pointer error in local retry r=DrewKimball a=DrewKimball

#### tree: return correct parse error for pg_lsn

This patch changes the error returned upon failing to parse a PG_LSN value to match postgres. Previously, the error was an internal error.

Informs #111327

Release note: None

#### sql: fix nil-pointer error in local retry

In #105451, we added logic to locally retry a distributed query after an error. However, the retry logic unconditionally updated a field of `DistSQLReceiver` that may be nil, which could cause a nil-pointer error in some code paths (e.g. apply-join). This patch adds a check that the field is non-nil, as is done for other places where it is updated.

There is no release note because the change has not yet made it into a release.

Fixes #111327

Release note: None

112654: opt: fix inverted index constrained scans for equality filters r=mgartner a=mgartner

#### opt: fix inverted index constrained scans for equality filters

This commit fixes a bug introduced in #101178 that allows the optimizer to generate inverted index scans on columns that are not filtered by the query. For example, an inverted index over the column `j1` could be scanned for a filter involving a different column, like `j2 = '5'`. The bug is caused by a simple omission of code that must check that the column in the filter is an indexed column.

Fixes #111963

There is no release note because this bug is not present in any releases.

Release note: None

#### randgen: generate single-column indexes more often

This commit makes `randgen` more likely to generate single-column indexes. It is motivated by the bug #111963, which surprisingly lived on the master branch for six months without being detected. It's not entirely clear why TLP or other randomized tests did not catch the bug, which has such a simple reproduction.

One theory is that indexes tend to be multi-column, and constrained scans on multi-column inverted indexes are not commonly planned for randomly generated queries because the set of requirements to generate the scan is very specific: the query must hold each prefix column constant, e.g. `a=1 AND b=2 AND j='5'::JSON`. The likelihood of randomly generating such an expression may be so low that the bug was not caught. With 10% of indexes single-column, this bug might have been more likely to be caught because only the inverted index column needs to be constrained by an equality filter.

Release note: None

112690: sql: disallow invocation of procedures outside of CALL r=mgartner a=mgartner

#### sql: disallow invocation of procedures outside of CALL

This commit adds some missing checks to ensure that procedures cannot be invoked in any context besides as the root expression in `CALL` statements.

Epic: CRDB-25388

Release note: None

#### sql: add tests with function invocation in procedure argument

This commit adds a couple of tests that show that functions can be used in procedure argument expressions.

Release note: None

112698: sql: clarify comments/naming of descriptorChanged flag r=rafiss a=rafiss

fixes #110727

Release note: None

112701: sql/logictest: fix flakes in select_for_update_read_committed r=mgartner a=mgartner

The `select_for_update_read_committed` tests were flaking because not all statements were being run under READ COMMITTED isolation. The logic test infrastructure does not allow fine-grained control of sessions, and setting the isolation level in one statement would only apply to a single session. Subsequent statements are not guaranteed to run in the same session because they could run in any session in the connection pool.

This commit wraps each statement in an explicit transaction with an explicit isolation level to ensure READ COMMITTED is used.

In the future, we should investigate allowing fine-grained and explicit control of sessions in logic tests.

Fixes #112677

Release note: None

112726: sql: make tests error if a leaf txn is not created when expected r=rharding6373 a=rharding6373

This adds a test-only error if a leaf transaction is expected to be used by a plan but a root transaction is used instead.

Epic: none

Informs: #111097

Release note: None

112767: log: fix stacktrace test goroutine counts r=rickystewart a=dhartunian

Previously, we would use the count of the string `goroutine ` as a proxy for the number of goroutines in the stacktrace. This stopped working in go 1.21 due to this change: golang/go@51225f6

We should consider using a stacktrace parser in the future.

Supports #112088

Epic: None

Release note: None

Co-authored-by: Drew Kimball <drewk@cockroachlabs.com>
Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com>
Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
Co-authored-by: rharding6373 <rharding6373@users.noreply.github.com>
Co-authored-by: David Hartunian <davidh@cockroachlabs.com>
Inverted indexes on JSON can be used both for (a) matching within the value and (b) matching the entire value. For some reason, though, it appears that inverted indexes on JSON expressions can only be used for (a) matching within the value and not (b) matching the entire value. More alarmingly, we sometimes construct the inverted join to the inverted expression index incorrectly. Here's a demonstration using v23.2.0-alpha.2-dev:

Jira issue: CRDB-32156