Skip to content

ADBDEV-2885-3: Postgres planner produces bogus plan for query to replicated table with SIRV function#378

Closed
RekGRpth wants to merge 5 commits into
adb-6.x-devfrom
ADBDEV-2885-3
Closed

ADBDEV-2885-3: Postgres planner produces bogus plan for query to replicated table with SIRV function#378
RekGRpth wants to merge 5 commits into
adb-6.x-devfrom
ADBDEV-2885-3

Conversation

@RekGRpth

@RekGRpth RekGRpth commented Aug 23, 2022

Copy link
Copy Markdown
Member

Query with subplan containing volatile functions on distributed replicated tables does not make motion gather. This produces wrong plan and leads to segfault.
Steps to reproduce this problem:

create table t (i int) distributed replicated;
insert into t select 1;
create function f(i int) returns int language sql security definer as $$ select i; $$;
select (select f(i) from t); -- <-- segfault here!!!
select (select f(i) from t group by f(i)); -- <-- segfault here!!!
select (select i from t group by i having f(i) > 0); -- <-- segfault here!!!

Wrong plans:

explain (costs off, verbose) SELECT (select f(i) from t);
             QUERY PLAN              
-------------------------------------
 Result
   Output: $0
   InitPlan 1 (returns $0)
     ->  Seq Scan on public.t
           Output: f(i)
 Optimizer: Postgres query optimizer
(6 rows)
explain (costs off, verbose) SELECT (select f(i) from t group by f(i));
             QUERY PLAN              
-------------------------------------
 Result
   Output: $0
   InitPlan 1 (returns $0)
     ->  HashAggregate
           Output: (f(i))
           Group Key: f(t.i)
           ->  Seq Scan on public.t
                 Output: i, f(i)
 Optimizer: Postgres query optimizer
(9 rows)
explain (costs off, verbose) SELECT (select i from t group by i having f(i) > 0);
             QUERY PLAN              
-------------------------------------
 Result
   Output: $0
   InitPlan 1 (returns $0)
     ->  HashAggregate
           Output: i
           Group Key: t.i
           Filter: (f(t.i) > 0)
           ->  Seq Scan on public.t
                 Output: i
 Optimizer: Postgres query optimizer
(10 rows)

Solving problem includes #355 and make motion gather for such subplan included replication tables with volatile functions.
Right plans:

explain (costs off, verbose) SELECT (select f(i) from t);
                    QUERY PLAN                    
--------------------------------------------------
 Result
   Output: $0
   InitPlan 1 (returns $0)  (slice2)
     ->  Gather Motion 1:1  (slice1; segments: 1)
           Output: (f(i))
           ->  Seq Scan on public.t
                 Output: f(i)
 Optimizer: Postgres query optimizer
(8 rows)
explain (costs off, verbose) SELECT (select f(i) from t group by f(i));
                    QUERY PLAN                    
--------------------------------------------------
 Result
   Output: $0
   InitPlan 1 (returns $0)  (slice2)
     ->  Gather Motion 1:1  (slice1; segments: 1)
           Output: (f(i))
           ->  HashAggregate
                 Output: (f(i))
                 Group Key: f(t.i)
                 ->  Seq Scan on public.t
                       Output: i, f(i)
 Optimizer: Postgres query optimizer
(11 rows)
explain (costs off, verbose) SELECT (select i from t group by i having f(i) > 0);
                    QUERY PLAN                    
--------------------------------------------------
 Result
   Output: $0
   InitPlan 1 (returns $0)  (slice2)
     ->  Gather Motion 1:1  (slice1; segments: 1)
           Output: i
           ->  HashAggregate
                 Output: i
                 Group Key: t.i
                 Filter: (f(t.i) > 0)
                 ->  Seq Scan on public.t
                       Output: i
 Optimizer: Postgres query optimizer
(12 rows)

Andrey Sokolov and others added 4 commits July 10, 2022 22:37
…sequence generation

The last query from the series
"set optimizer=off;
create table t_replicate_dst(id serial, i integer) distributed replicated;
create table t_replicate_src(i integer) distributed replicated;
insert into t_replicate_src select i from generate_series(1, 5) i;
insert into t_replicate_dst (i) select i from t_replicate_src;
select distinct id from gp_dist_random('t_replicate_dst');"
returned 15 lines (not 5) on cluster with 3 segments.
The plan of the second insert was:
 Insert on t_replicate_dst
   ->  Seq Scan on t_replicate_src

The error is not reproduced on master, because when volatile functions are
detected in the query at the stage of adding the Insert node,
the Broadcast Motion node will be added. The plan on master is:
 Insert on t_replicate_dst
   ->  Broadcast Motion 1:3  (slice1; segments: 1)
         ->  Seq Scan on t_replicate_src

After adding a check for the absence of volatile functions to the condition for
refusing to insert the Broadcast Motion node, the query plan became the same as
on master.
@RekGRpth RekGRpth marked this pull request as ready for review August 23, 2022 04:17
@RekGRpth RekGRpth requested a review from a team August 23, 2022 04:18
@RekGRpth RekGRpth changed the title ADBDEV-2885-3: Postgres planner produces bogus plan for query to replicated table with SIRV function ADBDEV-2885: Postgres planner produces bogus plan for query to replicated table with SIRV function Aug 23, 2022
Output: $0
InitPlan 1 (returns $0) (slice2)
-> Gather Motion 1:1 (slice1; segments: 1)
Output: i

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RekGRpth, can you explain how this patch makes this plan correct? At first glance, there is no volatile function in the target list.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since make_subplan does not set CdbLocusType_SingleQE and FLOW_SINGLETON for CdbLocusType_SegmentGeneral, then in ParallelizeSubplan value of containingPlanDistributed became false and plan is focused.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

containingPlanDistributed is calculated for outer for subplan plan. subPlanDistributed described subplan distribution.

@Stolb27

Stolb27 commented Aug 24, 2022

Copy link
Copy Markdown
Collaborator

The current solution breaks the next queries:

create table t (i int) distributed replicated;
create table t1 (a int);
explain select * from t1 where a in (select random() from t where i=a);

Plan without the patch:

 Gather Motion 3:1  (slice2; segments: 3)
   ->  Seq Scan on t1
         Filter: (SubPlan 1)
         SubPlan 1  (slice2; segments: 3)
           ->  Result
                 Filter: (t.i = t1.a)
                 ->  Materialize
                       ->  Broadcast Motion 1:3  (slice1; segments: 1)
                             ->  Seq Scan on t

Plan with patch:

 Gather Motion 3:1  (slice1; segments: 3)
   ->  Seq Scan on t1
         Filter: (SubPlan 1)
         SubPlan 1  (slice1; segments: 1)
           ->  Result
                 Filter: (t.i = t1.a)
                 ->  Materialize
                       ->  Seq Scan on t

One more similar correlated subquery, that not worked properly and without patch:

test=# explain (costs off) select * from t1 where a in (select random() from t where i=a group by i);
                  QUERY PLAN                   
-----------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)
   ->  Seq Scan on t1
         Filter: (SubPlan 1)
         SubPlan 1  (slice1; segments: 1)
           ->  GroupAggregate
                 Group Key: t.i
                 ->  Result
                       Filter: (t.i = t1.a)
                       ->  Materialize
                             ->  Seq Scan on t

@Stolb27

Stolb27 commented Aug 24, 2022

Copy link
Copy Markdown
Collaborator

Example of broken query with UNcorrelated subquery:

test=# explain (costs off, verbose)select 1 as mrs_t1 from t1 where 1 <= ALL (select i from t group by i having random() > 0);
                         QUERY PLAN                         
------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)
   Output: 1
   ->  Result
         Output: 1
         One-Time Filter: (SubPlan 1)
         ->  Seq Scan on public.t1
               Output: t1.a
         SubPlan 1  (slice1; segments: 1)
           ->  HashAggregate
                 Output: t.i
                 Group Key: t.i
                 Filter: (random() > '0'::double precision)
                 ->  Seq Scan on public.t
                       Output: t.i
 Optimizer: Postgres query optimizer
 Settings: enable_bitmapscan=off, optimizer=off

As we discussed a few days ago, we should handle all broadcastPlan usages properly.

@Stolb27

Stolb27 commented Aug 25, 2022

Copy link
Copy Markdown
Collaborator

Query with subplan containing volatile functions on distributed replicated tables does not make motion gather. This produces wrong plan and leads to segfault.

Why does query with subplan containing volatile functions on distributed replicated tables not make motion gather? Why is it incorrect to replace SegmentGeneral locus with SingleQE on top of subplan? Where will the presence of volatile functions be checked after your patch?

  1. Provide a complete commit message for your patch, too.
  2. Change base branch to adb-6.x-dev for this PR according to instruction.
  3. Update your PR with changes from adb-6.x-dev (e.g. using GitHub UI)

@RekGRpth RekGRpth changed the base branch from adb-6.x to adb-6.x-dev August 25, 2022 03:07
@RekGRpth

Copy link
Copy Markdown
Member Author

2. Change base branch to adb-6.x-dev for this PR according to instruction.

done

@RekGRpth

Copy link
Copy Markdown
Member Author

3. Update your PR with changes from adb-6.x-dev (e.g. using GitHub UI)

done

@RekGRpth RekGRpth changed the title ADBDEV-2885: Postgres planner produces bogus plan for query to replicated table with SIRV function ADBDEV-2885-3: Postgres planner produces bogus plan for query to replicated table with SIRV function Aug 25, 2022
@RekGRpth RekGRpth marked this pull request as draft August 25, 2022 03:22
@RekGRpth

Copy link
Copy Markdown
Member Author

explain select * from t1 where a in (select random() from t where i=a);

it covers by first version of #365

@RekGRpth

Copy link
Copy Markdown
Member Author

explain (costs off, verbose)select 1 as mrs_t1 from t1 where 1 <= ALL (select i from t group by i having random() > 0);

and it covers by same #365

@RekGRpth

RekGRpth commented Aug 25, 2022

Copy link
Copy Markdown
Member Author

explain (costs off) select * from t1 where a in (select random() from t where i=a group by i);

yes, but I think it is other issue than segfault? And even #355 does not solve this

@Stolb27

Stolb27 commented Aug 25, 2022

Copy link
Copy Markdown
Collaborator

it covers by first version of #365
and it covers by same #365

Besides that fact, that #365 implement focus_plan and changes only focusPlan function, but affects broadcastPlan somehow (Broadcast Motion created).

The main problem of existing implementation is using locus SingleQE that semantically doesn't fit replicated table with volatile function case (there is no data on QD in this case):

    CdbLocusType_SingleQE,      /* a single backend process on any db: the
                                 * qDisp itself, or a qExec started by a
                                 * segment postmaster or the entry postmaster.
                                 */

And you suggest adding one more kludge to this behavior in #365.

yes, but I think it is other issue than segfault? And even #355 does not solve this

Maybe. BUT, the main reason for this problem is altering locus to SingleQE only for topmost node in subplans in make_subplan function. So, with #365 we will need one more kluge in the same place later.

@RekGRpth

Copy link
Copy Markdown
Member Author

Besides that fact, that #365 implement focus_plan and changes only focusPlan function, but affects broadcastPlan somehow (Broadcast Motion created).

No, it does not does not affect broadcast! And it does not break these two cases!

@Stolb27

Stolb27 commented Aug 25, 2022

Copy link
Copy Markdown
Collaborator

No, it does not does not affect broadcast! And it does not break these two cases!

My bad. It's still work because topmost subplan node has SingleQE locus before broadcastPlan. But remaining comments are actual.

@RekGRpth

Copy link
Copy Markdown
Member Author

It's still work because topmost subplan node has SingleQE locus before broadcastPlan.

yes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants