Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can't create execution plan for correlated subqueries with JOINs #14671

Closed
proddata opened this issue Sep 9, 2023 · 1 comment · Fixed by #14946
Closed

Can't create execution plan for correlated subqueries with JOINs #14671

proddata opened this issue Sep 9, 2023 · 1 comment · Fixed by #14946
Assignees
Labels
bug Clear identification of incorrect behaviour

Comments

@proddata
Copy link
Member

proddata commented Sep 9, 2023

CrateDB version

5.4

CrateDB setup information

No response

Problem description

When trying to connect to Beekeeper Studio the connection fails. This seems to be related to a 🐛 connected to correlated subqueries.

the query Beekeeper is trying to run

SELECT      n.nspname as schema, t.typname as typename, t.oid::int4 as typeid
      FROM        pg_type t
      LEFT JOIN   pg_catalog.pg_namespace n ON n.oid = t.typnamespace
      WHERE       (t.typrelid = 0 OR (SELECT c.relkind = 'c' FROM pg_catalog.pg_class c WHERE c.oid = t.typrelid))
      AND     NOT EXISTS(SELECT 1 FROM pg_catalog.pg_type el WHERE el.oid = t.typelem AND el.typarray = t.oid)
      AND     n.nspname NOT IN ('pg_catalog', 'information_schema');

fails with

Couldn't create execution plan from logical plan because of: Couldn't find (SELECT 1 FROM (el)) in SourceSymbols{inputs={typelem=INPUT(4), typnamespace=INPUT(2), nspname=INPUT(5), oid=INPUT(1), typrelid=INPUT(3), typname=INPUT(0), oid=INPUT(6)}, nonDeterministicFunctions={}}:
Eval[nspname AS schema, typname AS typename, oid AS typeid] (rows=0)
  └ Filter[((typrelid = 0) OR (SELECT (relkind = 'c') FROM (c)))] (rows=0)
    └ CorrelatedJoin[typname, oid, typnamespace, typrelid, typelem, nspname, oid, (SELECT (relkind = 'c') FROM (c)), (SELECT 1 FROM (el))] (rows=0)
      └ CorrelatedJoin[typname, oid, typnamespace, typrelid, typelem, nspname, oid, (SELECT (relkind = 'c') FROM (c))] (rows=0)
        └ Filter[(NOT EXISTS (SELECT 1 FROM (el)))] (rows=0)
          └ HashJoin[(oid = typnamespace)] (rows=unknown)
            ├ Rename[typname, oid, typnamespace, typrelid, typelem] AS t (rows=unknown)
            │  └ Collect[pg_catalog.pg_type | [typname, oid, typnamespace, typrelid, typelem] | true] (rows=unknown)
            └ Rename[nspname, oid] AS n (rows=unknown)
              └ Collect[pg_catalog.pg_namespace | [nspname, oid] | (NOT (nspname = ANY(['pg_catalog', 'information_schema'])))] (rows=unknown)
        └ SubPlan
          └ Eval[(relkind = 'c')] (rows=unknown)
            └ Rename[(relkind = 'c')] AS c (rows=unknown)
              └ Limit[2::bigint;0::bigint] (rows=unknown)
                └ Collect[pg_catalog.pg_class | [(relkind = 'c')] | (oid = typrelid)] (rows=unknown)
      └ SubPlan
        └ Eval[1] (rows=unknown)
          └ Rename[1] AS el (rows=unknown)
            └ Limit[1;0] (rows=unknown)
              └ Collect[pg_catalog.pg_type | [1] | ((oid = typelem) AND (typarray = oid))] (rows=unknown)

Steps to Reproduce

CREATE TABLE t01 (a TEXT);
CREATE TABLE t02 (b TEXT);
SELECT *
    FROM t01
    JOIN t02 ON t01.a = t02.b
    WHERE NOT EXISTS (SELECT 1 FROM t02 WHERE t01.a = t02.b);

Actual Result

SQLParseException[Couldn't create execution plan from logical plan because of: Couldn't find (SELECT 1 FROM (doc.t02)) in SourceSymbols{inputs={b=INPUT(1), a=INPUT(0)}, nonDeterministicFunctions={}}:
Eval[a, b] (rows=0)
  └ CorrelatedJoin[a, b, (SELECT 1 FROM (doc.t02))] (rows=0)
    └ Filter[(NOT EXISTS (SELECT 1 FROM (doc.t02)))] (rows=0)
      └ HashJoin[(a = b)] (rows=unknown)
        ├ Collect[doc.t01 | [a] | true] (rows=unknown)
        └ Collect[doc.t02 | [b] | true] (rows=unknown)
    └ SubPlan
      └ Eval[1] (rows=unknown)
        └ Limit[1;0] (rows=unknown)
          └ Collect[doc.t02 | [1] | (b = a)] (rows=unknown)]

Expected Result

query works

@proddata proddata added the triage An issue that needs to be triaged by a maintainer label Sep 9, 2023
@matriv matriv self-assigned this Sep 21, 2023
@matriv matriv added bug Clear identification of incorrect behaviour and removed triage An issue that needs to be triaged by a maintainer labels Sep 26, 2023
@matriv
Copy link
Contributor

matriv commented Oct 9, 2023

If we change this line: https://github.com/crate/crate/blob/master/server/src/main/java/io/crate/planner/operators/LogicalPlanner.java#L399
with:

if (splitPoints.outerColumns().isEmpty())

we get a correct plan involving 2 joins but still execution plan cannot be created as the SelectSymbol modeling SELECT 1 FROM t02... is not used for join outputs therefore is not added to InputColumns and we end up not finding it in the InputColumns#visitSelectSymbol().

  • I've tried to debug and artificially add it but this didn't solve the issue as the Exists operator ends up receiving single value instead of array and thus failing.
  • I've tried to artificially return a Literal.of(new ArrayType(DataTypes.Integer), List.of(1) when the lookup in visitSelectSymbols fails (returns null), and we end up with a valid execution plan but then we get wrong results, as this SELECT 1 should return a value only if there is at least one row returned by the t01, t02 join, otherwise 0 rows, in order to satisfy correctly the EXISTS operator.

In my opinion we should just throw a nicer error for the time being, stating that this kind of correlated join is not supported, and implement this as a feature.

@mkleen mkleen self-assigned this Oct 9, 2023
@matriv matriv removed their assignment Oct 18, 2023
@mkleen mkleen removed their assignment Oct 23, 2023
@mfussenegger mfussenegger self-assigned this Oct 23, 2023
mfussenegger added a commit that referenced this issue Oct 23, 2023
In a query like:

    SELECT
        n.nspname AS schema,
        t.typname AS typename,
        t.oid::int4 AS typeid
    FROM
        pg_type t
        LEFT JOIN pg_catalog.pg_namespace n ON n.oid = t.typnamespace
    WHERE
        EXISTS (SELECT 1 FROM pg_catalog.pg_type el WHERE el.oid = t.typelem);

The filter `EXISTS (SELECT 1 FROM pg_catalog.pg_type el WHERE el.oid =
t.typelem)` was placed in-between correlated-join and join:

    └ CorrelatedJoin[typname, oid, typnamespace, typelem, nspname, oid, (SELECT 1 FROM (el))] (rows=0)
      └ Filter[EXISTS (SELECT 1 FROM (el))] (rows=0)
        └ NestedLoopJoin[LEFT | (oid = typnamespace)] (rows=unknown)

Given that the filter uses the output of the `CorrelatedJoin`, it must
be placed above it, not below.

Fixes #14671
mfussenegger added a commit that referenced this issue Oct 23, 2023
In a query like:

    SELECT
        n.nspname AS schema,
        t.typname AS typename,
        t.oid::int4 AS typeid
    FROM
        pg_type t
        LEFT JOIN pg_catalog.pg_namespace n ON n.oid = t.typnamespace
    WHERE
        EXISTS (SELECT 1 FROM pg_catalog.pg_type el WHERE el.oid = t.typelem);

The filter `EXISTS (SELECT 1 FROM pg_catalog.pg_type el WHERE el.oid =
t.typelem)` was placed in-between correlated-join and join:

    └ CorrelatedJoin[typname, oid, typnamespace, typelem, nspname, oid, (SELECT 1 FROM (el))] (rows=0)
      └ Filter[EXISTS (SELECT 1 FROM (el))] (rows=0)
        └ NestedLoopJoin[LEFT | (oid = typnamespace)] (rows=unknown)

Given that the filter uses the output of the `CorrelatedJoin`, it must
be placed above it, not below.

Fixes #14671
mfussenegger added a commit that referenced this issue Oct 23, 2023
In a query like:

    SELECT
        n.nspname AS schema,
        t.typname AS typename,
        t.oid::int4 AS typeid
    FROM
        pg_type t
        LEFT JOIN pg_catalog.pg_namespace n ON n.oid = t.typnamespace
    WHERE
        EXISTS (SELECT 1 FROM pg_catalog.pg_type el WHERE el.oid = t.typelem);

The filter `EXISTS (SELECT 1 FROM pg_catalog.pg_type el WHERE el.oid =
t.typelem)` was placed in-between correlated-join and join:

    └ CorrelatedJoin[typname, oid, typnamespace, typelem, nspname, oid, (SELECT 1 FROM (el))] (rows=0)
      └ Filter[EXISTS (SELECT 1 FROM (el))] (rows=0)
        └ NestedLoopJoin[LEFT | (oid = typnamespace)] (rows=unknown)

Given that the filter uses the output of the `CorrelatedJoin`, that led
to an error:

    Couldn't create execution plan from logical plan because of: Couldn't find (SELECT 1 FROM (el)) in SourceSymbols

The filter must be placed above the correlated join, not below.

Fixes #14671
@mergify mergify bot closed this as completed in #14946 Oct 23, 2023
mergify bot pushed a commit that referenced this issue Oct 23, 2023
In a query like:

    SELECT
        n.nspname AS schema,
        t.typname AS typename,
        t.oid::int4 AS typeid
    FROM
        pg_type t
        LEFT JOIN pg_catalog.pg_namespace n ON n.oid = t.typnamespace
    WHERE
        EXISTS (SELECT 1 FROM pg_catalog.pg_type el WHERE el.oid = t.typelem);

The filter `EXISTS (SELECT 1 FROM pg_catalog.pg_type el WHERE el.oid =
t.typelem)` was placed in-between correlated-join and join:

    └ CorrelatedJoin[typname, oid, typnamespace, typelem, nspname, oid, (SELECT 1 FROM (el))] (rows=0)
      └ Filter[EXISTS (SELECT 1 FROM (el))] (rows=0)
        └ NestedLoopJoin[LEFT | (oid = typnamespace)] (rows=unknown)

Given that the filter uses the output of the `CorrelatedJoin`, that led
to an error:

    Couldn't create execution plan from logical plan because of: Couldn't find (SELECT 1 FROM (el)) in SourceSymbols

The filter must be placed above the correlated join, not below.

Fixes #14671
mergify bot pushed a commit that referenced this issue Oct 23, 2023
In a query like:

    SELECT
        n.nspname AS schema,
        t.typname AS typename,
        t.oid::int4 AS typeid
    FROM
        pg_type t
        LEFT JOIN pg_catalog.pg_namespace n ON n.oid = t.typnamespace
    WHERE
        EXISTS (SELECT 1 FROM pg_catalog.pg_type el WHERE el.oid = t.typelem);

The filter `EXISTS (SELECT 1 FROM pg_catalog.pg_type el WHERE el.oid =
t.typelem)` was placed in-between correlated-join and join:

    └ CorrelatedJoin[typname, oid, typnamespace, typelem, nspname, oid, (SELECT 1 FROM (el))] (rows=0)
      └ Filter[EXISTS (SELECT 1 FROM (el))] (rows=0)
        └ NestedLoopJoin[LEFT | (oid = typnamespace)] (rows=unknown)

Given that the filter uses the output of the `CorrelatedJoin`, that led
to an error:

    Couldn't create execution plan from logical plan because of: Couldn't find (SELECT 1 FROM (el)) in SourceSymbols

The filter must be placed above the correlated join, not below.

Fixes #14671

(cherry picked from commit fec2966)
mergify bot pushed a commit that referenced this issue Oct 23, 2023
In a query like:

    SELECT
        n.nspname AS schema,
        t.typname AS typename,
        t.oid::int4 AS typeid
    FROM
        pg_type t
        LEFT JOIN pg_catalog.pg_namespace n ON n.oid = t.typnamespace
    WHERE
        EXISTS (SELECT 1 FROM pg_catalog.pg_type el WHERE el.oid = t.typelem);

The filter `EXISTS (SELECT 1 FROM pg_catalog.pg_type el WHERE el.oid =
t.typelem)` was placed in-between correlated-join and join:

    └ CorrelatedJoin[typname, oid, typnamespace, typelem, nspname, oid, (SELECT 1 FROM (el))] (rows=0)
      └ Filter[EXISTS (SELECT 1 FROM (el))] (rows=0)
        └ NestedLoopJoin[LEFT | (oid = typnamespace)] (rows=unknown)

Given that the filter uses the output of the `CorrelatedJoin`, that led
to an error:

    Couldn't create execution plan from logical plan because of: Couldn't find (SELECT 1 FROM (el)) in SourceSymbols

The filter must be placed above the correlated join, not below.

Fixes #14671

(cherry picked from commit fec2966)
mergify bot pushed a commit that referenced this issue Oct 23, 2023
In a query like:

    SELECT
        n.nspname AS schema,
        t.typname AS typename,
        t.oid::int4 AS typeid
    FROM
        pg_type t
        LEFT JOIN pg_catalog.pg_namespace n ON n.oid = t.typnamespace
    WHERE
        EXISTS (SELECT 1 FROM pg_catalog.pg_type el WHERE el.oid = t.typelem);

The filter `EXISTS (SELECT 1 FROM pg_catalog.pg_type el WHERE el.oid =
t.typelem)` was placed in-between correlated-join and join:

    └ CorrelatedJoin[typname, oid, typnamespace, typelem, nspname, oid, (SELECT 1 FROM (el))] (rows=0)
      └ Filter[EXISTS (SELECT 1 FROM (el))] (rows=0)
        └ NestedLoopJoin[LEFT | (oid = typnamespace)] (rows=unknown)

Given that the filter uses the output of the `CorrelatedJoin`, that led
to an error:

    Couldn't create execution plan from logical plan because of: Couldn't find (SELECT 1 FROM (el)) in SourceSymbols

The filter must be placed above the correlated join, not below.

Fixes #14671

(cherry picked from commit fec2966)
mergify bot pushed a commit that referenced this issue Oct 23, 2023
In a query like:

    SELECT
        n.nspname AS schema,
        t.typname AS typename,
        t.oid::int4 AS typeid
    FROM
        pg_type t
        LEFT JOIN pg_catalog.pg_namespace n ON n.oid = t.typnamespace
    WHERE
        EXISTS (SELECT 1 FROM pg_catalog.pg_type el WHERE el.oid = t.typelem);

The filter `EXISTS (SELECT 1 FROM pg_catalog.pg_type el WHERE el.oid =
t.typelem)` was placed in-between correlated-join and join:

    └ CorrelatedJoin[typname, oid, typnamespace, typelem, nspname, oid, (SELECT 1 FROM (el))] (rows=0)
      └ Filter[EXISTS (SELECT 1 FROM (el))] (rows=0)
        └ NestedLoopJoin[LEFT | (oid = typnamespace)] (rows=unknown)

Given that the filter uses the output of the `CorrelatedJoin`, that led
to an error:

    Couldn't create execution plan from logical plan because of: Couldn't find (SELECT 1 FROM (el)) in SourceSymbols

The filter must be placed above the correlated join, not below.

Fixes #14671

(cherry picked from commit fec2966)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Clear identification of incorrect behaviour
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants