Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ADBDEV-4345: Fix wrong allocation of writer slice in case of modifying shared CTE #660

Merged
merged 17 commits into from
Jun 6, 2024

Conversation

trexxet2
Copy link

@trexxet2 trexxet2 commented Dec 11, 2023

Fix wrong allocation of writer slice in case of modifying shared CTE

Initially, when the CTE was first encountered, the create_ctescan_plan()
function constructed a subplan, and its slice was assigned the
GANGTYPE_PRIMARY_WRITER gang type. However, for subsequent occurrences of the
CTE, the subplan construction was bypassed. If these occurrences were located in
different slices and the CTE involved DML, their slices would sometimes be
assigned the GANGTYPE_PRIMARY_WRITER gang type. This lead to issues when the
ShareInputScans were assigned producer and consumer roles by the
apply_shareinput_dag_to_tree(), which uses a specific tree traversal order, and
in some cases, there was a scenario when the consumer was assigned to a
GANGTYPE_PRIMARY_WRITER slice, while the producer was placed in a
GANGTYPE_PRIMARY_READER slice, resulting in an execution error when the reader
gang attempted to execute a ModifyTable node.

To resolve this issue, the patch introduces a check for the gang type of
root->curSlice when creating the shared plan for the first time. If the gang
type has changed compared to its previous value during planning, the
ShareInputScan is marked as a writer slice maker by setting the
rootSliceIsWriter flag to true. This state is also applied to all subsequent
occurrences of the CTE. After assigning consumer and producer roles to each
ShareInputScan, the correct gang type is set for the producer
(GANGTYPE_PRIMARY_WRITER) and for the consumers (GANGTYPE_PRIMARY_READER, if the
consumer is in a different slice from the producer) within the
shareinput_mutator_xslice_2() function. This ensures that the gang type is
correctly assigned, preventing the execution error encountered in the original
scenario.

Co-authored-by: Alexander Kondakov <a.kondakov@arenadata.io>

When the shared CTE with modifying DML inside is planned, at the first
occurrence of the CTE create_ctescan_plan function constructs the subplan, so
it's slice gets the gang type GANGTYPE_PRIMARY_WRITER. For the other occurrences
the subplan construction is being skipped, so if they are located in other
slices, and the CTE is modifying, their slices may not have
GANGTYPE_PRIMARY_WRITER gang type. Later, the ShareInputScans are assigned the
producer and consumer roles by apply_shareinput_dag_to_tree function, which
uses tree walker with specific traverse order. There was a case when the
consumer was created at GANGTYPE_PRIMARY_WRITER slice, while the producer were
put to the GANGTYPE_PRIMARY_READER slice. This lead to execution error when the
reader gang attempted to execute ModifyTable node.

This patch fixed the issue the following way. When creating the shared plan at
the first time, the gangType of root->curSlice is checked. If it has switched
compared to the value before the planning, we mark ShareInputScan as writer
slice maker (the rootSliceIsWriter flag is set to true). At other occurences of
CTE their ShareInputScan nodes are also marked with this state. After the
consumer, producer roles are assigned to each ShareInputScan, we set the
correct gang type to the producer (GANGTYPE_PRIMARY_WRITER), and to the
consumers (GANGTYPE_PRIMARY_READER, if the consumer is in different slice from
producer) in shareinput_mutator_xslice_2() function.
src/backend/cdb/cdbmutate.c Outdated Show resolved Hide resolved
src/backend/cdb/cdbmutate.c Outdated Show resolved Hide resolved
src/backend/optimizer/plan/createplan.c Show resolved Hide resolved
src/backend/optimizer/plan/createplan.c Show resolved Hide resolved
src/include/nodes/pathnodes.h Show resolved Hide resolved
@Stolb27 Stolb27 changed the base branch from adb-7.0.0 to adb-7.1.0 March 1, 2024 12:37
@RekGRpth

This comment was marked as resolved.

@bimboterminator1
Copy link
Member

bimboterminator1 commented May 29, 2024

Why does analyze show never executed?

Probably because of the portal strategy:
#557

@bandetto
Copy link
Member

Why does analyze show never executed?

EXPLAIN (ANALYZE, SLICETABLE)
WITH cte AS (
        INSERT INTO with_test VALUES (1) RETURNING *
)
SELECT i FROM cte;
                                                      QUERY PLAN                                                      
----------------------------------------------------------------------------------------------------------------------
 Gather Motion 1:1  (slice1; segments: 1)  (cost=0.00..0.07 rows=3 width=4) (actual time=9.510..9.514 rows=1 loops=1)
   ->  Insert on with_test  (cost=0.00..0.03 rows=1 width=4) (never executed)
         ->  Result  (cost=0.00..0.01 rows=1 width=4) (never executed)
 Optimizer: Postgres-based planner
 Planning Time: 5.980 ms
 Slice 0: Dispatcher; root 0; parent -1; gang size 0
 Slice 1: Primary Writer; root 0; parent 0; gang size 1
   (slice0)    Executor memory: 36K bytes.
   (slice1)    
 Memory used:  128000kB
 Execution Time: 10.138 ms
(11 rows)

(never executed) shows up even without the patch on working DML queries. With diff:

diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 251860682e..48b17d4ded 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -375,10 +375,12 @@ ChoosePortalStrategy(List *stmts)
 					pstmt->copyIntoClause == NULL &&
 					pstmt->refreshClause == NULL)
 				{
-					if (pstmt->hasModifyingCTE)
-						return PORTAL_ONE_MOD_WITH;
-					else
+					if (!pstmt->hasModifyingCTE)
 						return PORTAL_ONE_SELECT;
+					else if (Gp_role != GP_ROLE_EXECUTE)
+						return PORTAL_ONE_MOD_WITH;
+
+					return PORTAL_MULTI_QUERY;
 				}
 				if (pstmt->commandType == CMD_UTILITY)
 				{

Stats are back:

EXPLAIN (ANALYZE, SLICETABLE)
WITH cte AS (
        INSERT INTO with_test VALUES (1) RETURNING *
)
SELECT i FROM cte;
                                                      QUERY PLAN                                                      
----------------------------------------------------------------------------------------------------------------------
 Gather Motion 1:1  (slice1; segments: 1)  (cost=0.00..0.07 rows=3 width=4) (actual time=2.382..2.386 rows=1 loops=1)
   ->  Insert on with_test  (cost=0.00..0.03 rows=1 width=4) (actual time=0.213..0.214 rows=1 loops=1)
         ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.012..0.012 rows=1 loops=1)
 Optimizer: Postgres-based planner
 Planning Time: 26.801 ms
 Slice 0: Dispatcher; root 0; parent -1; gang size 0
 Slice 1: Primary Writer; root 0; parent 0; gang size 1
   (slice0)    Executor memory: 36K bytes.
   (slice1)    Executor memory: 38K bytes (seg1).
 Memory used:  128000kB
 Execution Time: 3.348 ms
(11 rows)

Please look at this patch:

Probably because of the portal strategy:
#557

@bandetto bandetto changed the title ADBDEV-4345 Fix wrong allocation of writer slice in case of modifying shared CTE ADBDEV-4345: Fix wrong allocation of writer slice in case of modifying shared CTE Jun 5, 2024
@andr-sokolov andr-sokolov merged commit 908a851 into adb-7.1.0 Jun 6, 2024
5 checks passed
@andr-sokolov andr-sokolov deleted the ADBDEV-4345 branch June 6, 2024 07:09
KnightMurloc pushed a commit that referenced this pull request Jul 4, 2024
…660)

Initially, when the CTE was first encountered, the create_ctescan_plan()
function constructed a subplan, and its slice was assigned the
GANGTYPE_PRIMARY_WRITER gang type. However, for subsequent occurrences of the
CTE, the subplan construction was bypassed. If these occurrences were located in
different slices and the CTE involved DML, their slices would sometimes be
assigned the GANGTYPE_PRIMARY_WRITER gang type. This lead to issues when the
ShareInputScans were assigned producer and consumer roles by the
apply_shareinput_dag_to_tree(), which uses a specific tree traversal order, and
in some cases, there was a scenario when the consumer was assigned to a
GANGTYPE_PRIMARY_WRITER slice, while the producer was placed in a
GANGTYPE_PRIMARY_READER slice, resulting in an execution error when the reader
gang attempted to execute a ModifyTable node.

To resolve this issue, the patch introduces a check for the gang type of
root->curSlice when creating the shared plan for the first time. If the gang
type has changed compared to its previous value during planning, the
ShareInputScan is marked as a writer slice maker by setting the
rootSliceIsWriter flag to true. This state is also applied to all subsequent
occurrences of the CTE. After assigning consumer and producer roles to each
ShareInputScan, the correct gang type is set for the producer
(GANGTYPE_PRIMARY_WRITER) and for the consumers (GANGTYPE_PRIMARY_READER, if the
consumer is in a different slice from the producer) within the
shareinput_mutator_xslice_2() function. This ensures that the gang type is
correctly assigned, preventing the execution error encountered in the original
scenario.

Co-authored-by: Alexander Kondakov <a.kondakov@arenadata.io>
KnightMurloc pushed a commit that referenced this pull request Jul 11, 2024
…660)

Initially, when the CTE was first encountered, the create_ctescan_plan()
function constructed a subplan, and its slice was assigned the
GANGTYPE_PRIMARY_WRITER gang type. However, for subsequent occurrences of the
CTE, the subplan construction was bypassed. If these occurrences were located in
different slices and the CTE involved DML, their slices would sometimes be
assigned the GANGTYPE_PRIMARY_WRITER gang type. This lead to issues when the
ShareInputScans were assigned producer and consumer roles by the
apply_shareinput_dag_to_tree(), which uses a specific tree traversal order, and
in some cases, there was a scenario when the consumer was assigned to a
GANGTYPE_PRIMARY_WRITER slice, while the producer was placed in a
GANGTYPE_PRIMARY_READER slice, resulting in an execution error when the reader
gang attempted to execute a ModifyTable node.

To resolve this issue, the patch introduces a check for the gang type of
root->curSlice when creating the shared plan for the first time. If the gang
type has changed compared to its previous value during planning, the
ShareInputScan is marked as a writer slice maker by setting the
rootSliceIsWriter flag to true. This state is also applied to all subsequent
occurrences of the CTE. After assigning consumer and producer roles to each
ShareInputScan, the correct gang type is set for the producer
(GANGTYPE_PRIMARY_WRITER) and for the consumers (GANGTYPE_PRIMARY_READER, if the
consumer is in a different slice from the producer) within the
shareinput_mutator_xslice_2() function. This ensures that the gang type is
correctly assigned, preventing the execution error encountered in the original
scenario.

Co-authored-by: Alexander Kondakov <a.kondakov@arenadata.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants