Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: inverted-index accelerate queries with filters in the form json->'a' IN (...) #87856

Closed
Tracked by #59331
mgartner opened this issue Sep 12, 2022 · 0 comments · Fixed by #96471
Closed
Tracked by #59331

opt: inverted-index accelerate queries with filters in the form json->'a' IN (...) #87856

mgartner opened this issue Sep 12, 2022 · 0 comments · Fixed by #96471
Assignees
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-queries SQL Queries Team
Projects

Comments

@mgartner
Copy link
Collaborator

mgartner commented Sep 12, 2022

For a query like SELECT * FROM t WHERE json_col->'field' IN ('"val1"', '"val2"'), the optimizer should explore using an inverted index scan on an INVERTED INDEX (json_col), if it one exists.

We should also handle filters like json_col->0 IN (...).

Epic: CRDB-24501
Jira issue: CRDB-19574

@mgartner mgartner added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Sep 12, 2022
@mgartner mgartner added this to Triage in SQL Queries via automation Sep 12, 2022
@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label Sep 12, 2022
@mgartner mgartner moved this from Triage to Backlog in SQL Queries Sep 12, 2022
Shivs11 added a commit to Shivs11/cockroach that referenced this issue Feb 2, 2023
…>'field' IN (...)

Previously, the optimizer did not plan inverted index scans for queries consisting
of the IN operator.

To address this, we now build JSON arrays from fetch value expressions consisting
of both integers and strings. Each entity present inside of the tuple following the
IN operator has it's own seperate JSON array being created. From these JSON arrays,
inverted spans are built. Each inverted span generated is OR'ed together to replicate
the functionality of the IN operator. In other words, a filter of the form:
j->0 IN ('1', '2') is thought of as (j->0 = '1') OR (j->0 = '2') and the generated
inverted span replicates this behaviour.

Epic: CRDB-3301
Fixes: cockroachdb#87856

Release note (performance improvement): The optimizer now plans
inverted index scans for queries that filter by JSON fetch value
operators (->), consisting of both string and integer index values,
alongside the IN operator, e.g, json_col->0 IN ('1', '2') and
json_col->'a' IN ('1', '2')
Shivs11 added a commit to Shivs11/cockroach that referenced this issue Feb 6, 2023
…>'field' IN (...)

Previously, the optimizer did not plan inverted index scans for queries consisting
of the IN operator.

To address this, we now build JSON objects from fetch value expressions consisting
of both integers and strings. Each entity present inside of the tuple following the
IN operator has its own seperate JSON array being created. From these JSON arrays,
inverted spans are built. Each inverted span generated is OR'ed together to replicate
the functionality of the IN operator. In other words, a filter of the form:
j->0 IN ('1', '2') is thought of as (j->0 = '1') OR (j->0 = '2') and the generated
inverted span replicates this behaviour.

Epic: CRDB-3301
Fixes: cockroachdb#87856

Release note (performance improvement): The optimizer now plans
inverted index scans for queries that filter by JSON fetch value
operators (->), consisting of both string and integer index values,
alongside the IN operator, e.g, json_col->0 IN ('1', '2') and
json_col->'a' IN ('1', '2')
Shivs11 added a commit to Shivs11/cockroach that referenced this issue Feb 7, 2023
…>'field' IN (...)

Previously, the optimizer did not plan inverted index scans for queries consisting
of the IN operator.

To address this, we now build JSON objects from fetch value expressions consisting
of both integers and strings. Each entity present inside of the tuple following the
IN operator has its own seperate JSON array being created. From these JSON arrays,
inverted spans are built. Each inverted span generated is OR'ed together to replicate
the functionality of the IN operator. In other words, a filter of the form:
j->0 IN ('1', '2') is thought of as (j->0 = '1') OR (j->0 = '2') and the generated
inverted span replicates this behaviour.

Epic: CRDB-3301
Fixes: cockroachdb#87856

Release note (performance improvement): The optimizer now plans
inverted index scans for queries that filter by JSON fetch value
operators (->), consisting of both string and integer index values,
alongside the IN operator, e.g, json_col->0 IN ('1', '2') and
json_col->'a' IN ('1', '2')
Shivs11 added a commit to Shivs11/cockroach that referenced this issue Feb 8, 2023
…>'field' IN (...)

Previously, the optimizer did not plan inverted index scans for queries consisting
of the IN operator.

To address this, we now build JSON objects from fetch value expressions consisting
of both integers and strings. Each entity present inside of the tuple following the
IN operator has its own seperate JSON array being created. From these JSON arrays,
inverted spans are built. Each inverted span generated is OR'ed together to replicate
the functionality of the IN operator. In other words, a filter of the form:
j->0 IN ('1', '2') is thought of as (j->0 = '1') OR (j->0 = '2') and the generated
inverted span replicates this behaviour.

Epic: CRDB-3301
Fixes: cockroachdb#87856

Release note (performance improvement): The optimizer now plans
inverted index scans for queries that filter by JSON fetch value
operators (->), consisting of both string and integer index values,
alongside the IN operator, e.g, json_col->0 IN ('1', '2') and
json_col->'a' IN ('1', '2')
Shivs11 added a commit to Shivs11/cockroach that referenced this issue Feb 9, 2023
…>'field' IN (...)

Previously, the optimizer did not plan inverted index scans for queries consisting
of the IN operator.

To address this, we now build JSON objects from fetch value expressions consisting
of both integers and strings. Each entity present inside of the tuple following the
IN operator has its own seperate JSON array being created. From these JSON arrays,
inverted spans are built. Each inverted span generated is OR'ed together to replicate
the functionality of the IN operator. In other words, a filter of the form:
j->0 IN ('1', '2') is thought of as (j->0 = '1') OR (j->0 = '2') and the generated
inverted span replicates this behaviour.

Epic: CRDB-3301
Fixes: cockroachdb#87856

Release note (performance improvement): The optimizer now plans
inverted index scans for queries that filter by JSON fetch value
operators (->), consisting of both string and integer index values,
alongside the IN operator, e.g, json_col->0 IN ('1', '2') and
json_col->'a' IN ('1', '2')
@rytaft rytaft moved this from Backlog to Active in SQL Queries Feb 14, 2023
Shivs11 added a commit to Shivs11/cockroach that referenced this issue Feb 14, 2023
…>'field' IN (...)

Previously, the optimizer did not plan inverted index scans for queries consisting
of the IN operator.

To address this, we now build JSON objects from fetch value expressions consisting
of both integers and strings. Each entity present inside of the tuple following the
IN operator has its own seperate JSON array being created. From these JSON arrays,
inverted spans are built. Each inverted span generated is OR'ed together to replicate
the functionality of the IN operator. In other words, a filter of the form:
j->0 IN ('1', '2') is thought of as (j->0 = '1') OR (j->0 = '2') and the generated
inverted span replicates this behaviour.

Epic: CRDB-3301
Fixes: cockroachdb#87856

Release note (performance improvement): The optimizer now plans
inverted index scans for queries that filter by JSON fetch value
operators (->), consisting of both string and integer index values,
alongside the IN operator, e.g, json_col->0 IN ('1', '2') and
json_col->'a' IN ('1', '2')
craig bot pushed a commit that referenced this issue Feb 14, 2023
96471: opt: inverted-index accelerate queries with filters in the form `json_col ->'field' IN (...)` r=Shivs11 a=Shivs11

Previously, the optimizer did not plan inverted index scans for queries consisting of the IN operator.

To address this, we now build JSON arrays from fetch value expressions consisting of both integers and strings. Each entity present inside of the tuple following the IN operator has it's own seperate JSON array being created. From these JSON arrays, inverted spans are built. Each inverted span generated is `OR'ed` together to replicate the functionality of the `IN` operator. In other words, a filter of the form: `j->0 IN ('1', '2')` is thought of as `(j->0 = '1') OR (j->0 = '2')` and the generated inverted span replicates this behaviour.

Epic: CRDB-3301
Fixes: #87856

Release note (performance improvement): The optimizer now plans inverted index scans for queries that filter by JSON fetch value operators (->), consisting of both string and integer index values, alongside the IN operator, e.g, `json_col->0 IN ('1', '2')` and `json_col->'a' IN ('1', '2')`

96696: sql: handle implicit record return types better r=rharding6373 a=rharding6373

This PR validates the UDF return type at build time if it is a user-defined type. If the return type is no longer compatible with what the UDF body returns, we return an error instead. This is more in line with postgres behavior.

Fixes: 95558
Epic: CRDB-19496
Release note (sql change): UDFs with implicit record return types will return an error when called if the return type has been altered and is no longer compatible with the body of the UDF.

96698: sql: allow UDFs with SETOF return types r=mgartner a=mgartner

#### distsql: check projectSetNode exprs for distribution recommendation

Previously, the scalar expressions within a `projectSetNode` were not
checked for their ability to be distributed. This commit fixes this.

Release note: None

#### sql: use `eval.ValueGenerator` for evaluation of routines

This commit creates a new implementation of the `eval.ValueGenerator`
that is used to evaluate `tree.Routine` expressions. This will allow for
a more unified implementation of both scalar and set-returning UDFs in a
future commit.

Release note: None

#### sql: allow UDFs with SETOF return types

Set-returning UDFs can now be created by specifying a `SETOF` return
type. For example:

    CREATE FUNCTION fn() RETURNS SETOF INT LANGUAGE SQL AS $$
      SELECT a FROM abc;
    $$

Fixes #86391

Release note (sql change): Set-returning user-defined functions can now
be created by specific a `SETOF` return type.


Co-authored-by: Shivam Saraf <shivam.saraf@cockroachlabs.com>
Co-authored-by: rharding6373 <rharding6373@users.noreply.github.com>
Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com>
@craig craig bot closed this as completed in d5e25c3 Feb 14, 2023
SQL Queries automation moved this from Active to Done Feb 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-queries SQL Queries Team
Projects
Archived in project
2 participants