Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Start erroring out for unsupported lateral subqueries #5753

Merged
merged 4 commits into from
Mar 11, 2022

Conversation

JelteF
Copy link
Contributor

@JelteF JelteF commented Mar 2, 2022

DESCRIPTION: Start erroring out for unsupported lateral subqueries

With the introduction of #4385 we inadvertently started allowing and
pushing down certain lateral subqueries that were unsafe to push down.
To be precise the type of LATERAL subqueries that is unsafe to push down
has all of the following properties:

  1. The lateral subquery contains some non recurring tuples
  2. The lateral subquery references a recurring tuple from
    outside of the subquery (recurringRelids)
  3. The lateral subquery requires a merge step (e.g. a LIMIT)
  4. The reference to the recurring tuple should be something else than an
    equality check on the distribution column, e.g. equality on a non
    distribution column.

Property number four is considered both hard to detect and probably not
used very often. Thus this PR ignores property number four and causes
query planning to error out if the first three properties hold.

Fixes #5327

@JelteF JelteF changed the title Error out for unsupported lateral subqueries Start erroring out for unsupported lateral subqueries Mar 2, 2022
continue;
}

/* TODO: What about others kinds? */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bool		lateral;		/* subquery, function, or values is LATERAL? */

From RangeTblEntry

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to consider something like LEFT JOIN LATERAL fn((SELECT count(*) FROM dist)) ON (true)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As of today, we replace FUNCTIONs in the JOIN trees with subqueries automatically: https://github.com/citusdata/citus/blob/master/src/backend/distributed/planner/recursive_planning.c#L1588

But seems to ignore LATERALS: https://github.com/citusdata/citus/blob/master/src/backend/distributed/planner/recursive_planning.c#L1803-L1812

So, definitely worth testing but I wouldn't mind if it fails

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems this is incorrectly allowed:

 CREATE OR REPLACE FUNCTION fn(x int)
RETURNS int
AS $$
DECLARE
BEGIN
        RETURN x + 1;
END;
$$ LANGUAGE plpgsql;

SELECT count(*)
FROM ref JOIN test on ref.b = test.x,
    LATERAL fn((
        SELECT
            test_2.x
        FROM test test_2
        WHERE
            test_2.x = test.x
        LIMIT 1
    )) q
;

Copy link
Contributor Author

@JelteF JelteF Mar 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but it's not just this new code that doesn't handle it. It doesn't even get to the DeferredErrorIfUnsupportedLateralSubquery check. This is because RelationInfoContainsOnlyRecurringTuples is not handling this correctly. I discussed with Onder and the following query is also allowed incorrectly, which even uses a LEFT JOIN with a reference table on the left side.

SELECT count(*)
FROM ref LEFT JOIN GREATEST(
        0,
        (SELECT
            test.y
        FROM test
        WHERE
            test.y = ref.a
        LIMIT 1)
    ) ON TRUE;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without the left join also returns incorrect results:

SELECT count(*)
FROM ref,
    LATERAL GREATEST(
        0,
        (SELECT
            test.y
        FROM test
        WHERE
            test.y = ref.a
        LIMIT 1)
    );

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another one that returns wrong results:

SELECT count(*)
FROM (select GREATEST(1, 3)) ref(a),
    LATERAL (
        SELECT
            test.y
        FROM test
        WHERE
            test.y = ref.a
        LIMIT 2
    ) q;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For future reference: Our brief check with Jelte showed that there are SubPlans and they do not show up in plannerInfo->simple_rte_array[relationId]; So, we might need to include the Rtes of subplans on all the places we rely on plannerInfo->simple_rte_array[relationId];

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some elaboration on the TODO with this context would be helpful

@JelteF JelteF force-pushed the lateral-join-with-limit branch 12 times, most recently from 4a475b8 to 0a54f54 Compare March 4, 2022 15:32
@JelteF JelteF enabled auto-merge (squash) March 4, 2022 15:34
@JelteF JelteF disabled auto-merge March 4, 2022 16:04
plannerInfo, innerrelRelids);
bool outerrelOnlyRecurring = RelationInfoContainsOnlyRecurringTuples(
plannerInfo, outerrelRelids);
if (innerrelOnlyRecurring && !outerrelOnlyRecurring)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might error on , LATERAL (SELECT .. ) type queries, which work fine so we probably don't want to break those

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's this query that looks fine, but also fails

SELECT count(*)
FROM test,
    LATERAL (
        SELECT
            test_2.y
        FROM test test_2
        WHERE
            test_2.x = test.x
        LIMIT 2
    ) q
    JOIN ref ON q.y = ref.a;

@metdos metdos added this to the 11.0 Release milestone Mar 7, 2022
@JelteF JelteF force-pushed the lateral-join-with-limit branch 3 times, most recently from bb047f8 to 989761c Compare March 7, 2022 15:11
@codecov
Copy link

codecov bot commented Mar 7, 2022

Codecov Report

Merging #5753 (989761c) into master (3f72cda) will decrease coverage by 0.12%.
The diff coverage is 84.61%.

❗ Current head 989761c differs from pull request most recent head 4af45fd. Consider uploading reports for the commit 4af45fd to get more accurate results

@@            Coverage Diff             @@
##           master    #5753      +/-   ##
==========================================
- Coverage   92.70%   92.58%   -0.13%     
==========================================
  Files         228      226       -2     
  Lines       47865    47426     -439     
==========================================
- Hits        44372    43907     -465     
- Misses       3493     3519      +26     

context, flags);
context->level -= 1;

return found;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible to cover this code?

continue;
}

/* TODO: What about others kinds? */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some elaboration on the TODO with this context would be helpful

@JelteF JelteF enabled auto-merge (squash) March 11, 2022 08:05
@JelteF
Copy link
Contributor Author

JelteF commented Mar 11, 2022

Merging this now so it's included for release testing. I'll add some more coverage somewhere next week.

With the introduction of #4385 we inadvertently started allowing and
pushing down certain lateral subqueries that were unsafe to push down.
To be precise the type of LATERAL subqueries that is unsafe to push down
has all of the following properties:
1. The subquery references a recurring tuple (e.g. a reference table) from
   outside of the subquery
2. The subquery requires a merge step (e.g. a LIMIT)
3. The reference to the recurring tuple should be something else than an
   equality check on the distribution column, e.g. equality on a non
   distribution column.

Property number three is considered both hard to detect and probably not
used very often. Thus this PR ignores property number three and causes
query planning to error out if the first two properties hold.

Fixes #5327
@JelteF JelteF merged commit e5d5c7b into master Mar 11, 2022
@JelteF JelteF deleted the lateral-join-with-limit branch March 11, 2022 10:59
@onderkalaci
Copy link
Member

should we backport this to 10.2?

@marcocitus
Copy link
Member

No, some queries might start erroring out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

LATERAL JOIN with reference table on left side and LIMIT and/or OFFSET returns incorrect results
4 participants