
Fix segfault on execution of multilevel correlated queries. #204

Merged
Stolb27 merged 2 commits into 6.16.3_arenadata22 from ADBDEV-1729
Jul 9, 2021

@InnerLife0

Executing multilevel correlated queries with a high level of nesting can cause a segfault (when using array_agg or json_agg) or produce wrong results (when using classic aggregates such as sum()). Due to some GP limitations, correlated subqueries with skip-level correlations are not supported, and an additional check condition is provided to prevent such queries from being planned. However, the QueryHasDistributedRelation function used by this check doesn't recurse over subplans, so it may return wrong results for distributed RTE_RELATION entries hidden behind RTE_SUBQUERY entries.
This commit fixes that behavior by adding optional recursion to the QueryHasDistributedRelation function. An additional regression test is included. More information can be found in issue #12054.
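A minimal C sketch of the idea behind the fix, under stated assumptions: a check walks a query's range table looking for a distributed relation, and an optional flag makes it also descend into RTE_SUBQUERY entries. The types `SimpleQuery` and `SimpleRte`, the field `is_distributed`, and the function `has_distributed_relation` are hypothetical, heavily simplified stand-ins modeled on the names in the description above. They are not Greenplum's real node definitions or the actual QueryHasDistributedRelation signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for planner nodes; NOT the real
 * Greenplum Query / RangeTblEntry definitions. */
typedef enum { RTE_RELATION, RTE_SUBQUERY } RteKind;

typedef struct SimpleQuery SimpleQuery;

typedef struct SimpleRte {
    RteKind      rtekind;
    bool         is_distributed; /* RTE_RELATION only: table spread over segments */
    SimpleQuery *subquery;       /* RTE_SUBQUERY only: the nested query */
} SimpleRte;

struct SimpleQuery {
    SimpleRte *rtable;   /* range table entries of this query level */
    size_t     n_rtable; /* number of entries in rtable */
};

/* Hypothetical helper modeled on QueryHasDistributedRelation.
 * recursive == false: inspect only this query level's range table, so a
 *   distributed table hidden behind an RTE_SUBQUERY entry is missed.
 * recursive == true: also descend into RTE_SUBQUERY entries. */
bool has_distributed_relation(const SimpleQuery *q, bool recursive)
{
    assert(q != NULL);

    for (size_t i = 0; i < q->n_rtable; i++) {
        const SimpleRte *rte = &q->rtable[i];

        if (rte->rtekind == RTE_RELATION && rte->is_distributed)
            return true;

        if (recursive && rte->rtekind == RTE_SUBQUERY &&
            rte->subquery != NULL &&
            has_distributed_relation(rte->subquery, true))
            return true;
    }
    return false;
}
```

With recursion off, a query whose only distributed table sits inside a FROM-clause subquery looks as if it had no distributed relations at all, which is the kind of case the description says the check could get wrong. Making the recursion optional lets existing callers keep their current behavior.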

@InnerLife0
Author

Still waiting for Pivotal to explain some of the GP limitations.

@InnerLife0
Author

Got an explanation and PR pre-approval from Pivotal.

…l regression test. Unnecessary space removed.

(cherry picked from commit 0cc2fb5)
@InnerLife0
Author

Cherry-picked additional changes proposed by a Pivotal developer.


@darthunix left a comment

If we treat this PR as a hotfix, then LGTM. But I agree with @guofengrichard: we should think about how to build a correct plan for correlated subqueries with SubPlans instead of InitPlans (surely, that is a separate task). @Stolb27 what do you think?

@Stolb27
Collaborator

Stolb27 commented Jun 28, 2021

what do you think?

@darthunix, I tried to reproduce the case you mentioned on 6X_STABLE and got the following results.

```
test=# set optimizer = off;
SET
test=# explain (costs off) select * from a where a.i in (select j from b where b.j = (select count(*) from c where c.i = a.i));
ERROR:  correlated subquery with skip-level correlations is not supported
test=# set optimizer=on;
SET
test=# explain (costs off) select * from a where a.i in (select j from b where b.j = (select count(*) from c where c.i = a.i));
                                 QUERY PLAN
----------------------------------------------------------------------------
 Result
   Filter: (SubPlan 2)
   ->  Gather Motion 3:1  (slice1; segments: 3)
         ->  Seq Scan on a
   SubPlan 2  (slice0)
     ->  Result
           Filter: (b.j = (SubPlan 1))
           ->  Materialize
                 ->  Gather Motion 3:1  (slice2; segments: 3)
                       ->  Seq Scan on b
           SubPlan 1  (slice0)
             ->  Aggregate
                   ->  Result
                         Filter: (c.i = a.i)
                         ->  Materialize
                               ->  Gather Motion 3:1  (slice3; segments: 3)
                                     ->  Seq Scan on c
 Optimizer: Pivotal Optimizer (GPORCA)
(18 rows)
```

There are only correlated SubPlans here. Can you explain the problem you mentioned?

@darthunix

@Stolb27 nested correlated SubPlans are legal; the problem is only with InitPlans. It is good that ORCA can build a correct plan with SubPlans instead of InitPlans (though I don't know whether ORCA always uses SubPlans instead of InitPlans where needed). The current PR fixes only the Postgres optimizer.

@Stolb27 changed the base branch from adb-6.x to 6.16.3_arenadata22 July 9, 2021 08:29
@Stolb27 merged commit a70454d into 6.16.3_arenadata22 Jul 9, 2021
@Stolb27 deleted the ADBDEV-1729 branch July 9, 2021 08:41
InnerLife0 pushed a commit that referenced this pull request Aug 6, 2021

(cherry picked from commit a70454d)
InnerLife0 added a commit that referenced this pull request Aug 17, 2021
…#204) (#239)


(cherry picked from commit a70454d)

3 participants