forked from greenplum-db/gpdb-archive
-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a fallback to PO when a CTE consumer under hazard motion is found #302
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
hughcapet
force-pushed
the
ADBDEV-2411-2
branch
3 times, most recently
from
February 11, 2022 07:14
f3ae5d6
to
8630664
Compare
maksm90
reviewed
Feb 16, 2022
src/backend/gporca/libgpopt/src/translate/CTranslatorExprToDXL.cpp
Outdated
Show resolved
Hide resolved
src/backend/gporca/libgpopt/src/translate/CTranslatorExprToDXL.cpp
Outdated
Show resolved
Hide resolved
InnerLife0
reviewed
Feb 22, 2022
src/backend/gporca/libgpopt/src/translate/CTranslatorExprToDXL.cpp
Outdated
Show resolved
Hide resolved
InnerLife0
reviewed
Feb 24, 2022
hughcapet
force-pushed
the
ADBDEV-2411-2
branch
2 times, most recently
from
February 24, 2022 07:56
be50421
to
4f35f81
Compare
InnerLife0
previously approved these changes
Feb 24, 2022
InnerLife0
previously approved these changes
Feb 28, 2022
maksm90
reviewed
Mar 10, 2022
This commit introduces a temporary fix for greenplum-db/gpdb#13039 by forcing a fallback to Postgres Optimizer in the Expr2DXL translation, when a CTE Consumer without the necessary CTE Producer is found under a duplicate-hazard motion. Orca currently doesn't handle CTEs over replicated tables properly: while translating Expressions to DXL it simply cuts off all but one input segment (changes the Motion subtree's ExecLocalityType from EeltSegments to EeltSingleton) for a Motion, that has a child with Strict/Tainted Replicated/Universal distribution without taking into account the possible CTE Consumers underneath. This current logic actually violates results of the check previously made in CUtils::ValidateCTEProducerConsumerLocality(). This can result in a hanging query because of the broken Producer-Consumer locality.
hughcapet
force-pushed
the
ADBDEV-2411-2
branch
from
March 11, 2022 07:19
4048f57
to
517aa03
Compare
maksm90
approved these changes
Mar 11, 2022
InnerLife0
approved these changes
Mar 15, 2022
Stolb27
pushed a commit
that referenced
this pull request
Jun 7, 2023
…#302) This commit introduces a temporary fix for greenplum-db/gpdb#13039 by forcing a fallback to Postgres Optimizer in the Expr2DXL translation, when a CTE Consumer without the necessary CTE Producer is found under a duplicate-hazard motion. Orca currently doesn't handle CTEs over replicated tables properly: while translating Expressions to DXL it simply cuts off all but one input segment (changes the Motion subtree's ExecLocalityType from EeltSegments to EeltSingleton) for a Motion, that has a child with Strict/Tainted Replicated/Universal distribution without taking into account the possible CTE Consumers underneath. This current logic actually violates results of the check previously made in CUtils::ValidateCTEProducerConsumerLocality(). This can result in a hanging query because of the broken Producer-Consumer locality. Cherry-picked from: 51fe92e to solve conflicts with 68a9baf and incorporate formatting modifications from #490
5 tasks
HustonMmmavr
added a commit
that referenced
this pull request
Jul 17, 2023
We assume that plan before hazard-motion transformation was validated (such check exists) and CTE producers/consumers are consistent. The next cases may appear after hazard-motion transformation: 1. all slices, which contains shared scans, was transformed 2. the slices, which contains CTE producers, and some CTE consumers (the plan may contains other slice with consumers of these producers) was transformed 3. the slice, which contains only producer/s, was transformed 4. the slice, which contains only consumer/s, was transformed 5. the slice, without CTE producers/consumers, was transformed This patch is a follow-up for #302. Now detection of unpaired CTE's is performed on the current slice (current motion) instead of recursing into it's children slices (motions), also validation of unpaired CTE was changed: previous approach collected all CTE consumers into array and CTE producers was collected into set at the end each element of CTE producer array was probed at producers hash-set (this may lead to a false-negative cases when the slice contains two different producers, but only a single consumer). Now the sets of CTE consumers and producers compares. Current approach correctly validates 3, 4, 5 cases, while 1 case may give a false-positive result. The 2 case maybe dangerous: there is a possible situation, when the hazard-motion (slice) contains a CTE producer and it's consumer, but the plan may contain other unmodified motion (slice) with the CTE consumer of the producer from modified slice, thus this approach won't reconnize that there are unpaired CTE producers and consumers, because it checks only the consistency of producers/consumers within the hazzard motion (one slice), on the other side, this situation is possible if the replicated distribution requested for the Sequence node, but due to this patch (https://github.com/greenplum-db/gpdb/pull/15124) such situation shouldn't appear. Follow-up for #302 (51fe92e)
Stolb27
pushed a commit
that referenced
this pull request
Jul 19, 2023
We assume that plan before hazard-motion transformation was validated (such check exists) and CTE producers/consumers are consistent. The next cases may appear after hazard-motion transformation: 1. all slices, which contains shared scans, was transformed 2. the slices, which contains CTE producers, and some CTE consumers (the plan may contains other slice with consumers of these producers) was transformed 3. the slice, which contains only producer/s, was transformed 4. the slice, which contains only consumer/s, was transformed 5. the slice, without CTE producers/consumers, was transformed This patch is a follow-up for #302. Now detection of unpaired CTE's is performed on the current slice (current motion) instead of recursing into it's children slices (motions), also validation of unpaired CTE was changed: previous approach collected all CTE consumers into array and CTE producers was collected into set at the end each element of CTE producer array was probed at producers hash-set (this may lead to a false-negative cases when the slice contains two different producers, but only a single consumer). Now the sets of CTE consumers and producers compares. Current approach correctly validates 3, 4, 5 cases, while 1 case may give a false-positive result. The 2 case maybe dangerous: there is a possible situation, when the hazard-motion (slice) contains a CTE producer and it's consumer, but the plan may contain other unmodified motion (slice) with the CTE consumer of the producer from modified slice, thus this approach won't reconnize that there are unpaired CTE producers and consumers, because it checks only the consistency of producers/consumers within the hazzard motion (one slice), on the other side, this situation is possible if the replicated distribution requested for the Sequence node, but due to this patch (https://github.com/greenplum-db/gpdb/pull/15124) such situation shouldn't appear. Follow-up for #302 (51fe92e)
Stolb27
pushed a commit
that referenced
this pull request
Jul 26, 2023
We assume that plan before hazard-motion transformation was validated (such check exists) and CTE producers/consumers are consistent. The next cases may appear after hazard-motion transformation: 1. all slices, which contains shared scans, was transformed 2. the slices, which contains CTE producers, and some CTE consumers (the plan may contains other slice with consumers of these producers) was transformed 3. the slice, which contains only producer/s, was transformed 4. the slice, which contains only consumer/s, was transformed 5. the slice, without CTE producers/consumers, was transformed This patch is a follow-up for #302. Now detection of unpaired CTE's is performed on the current slice (current motion) instead of recursing into it's children slices (motions), also validation of unpaired CTE was changed: previous approach collected all CTE consumers into array and CTE producers was collected into set at the end each element of CTE producer array was probed at producers hash-set (this may lead to a false-negative cases when the slice contains two different producers, but only a single consumer). Now the sets of CTE consumers and producers compares. Current approach correctly validates 3, 4, 5 cases, while 1 case may give a false-positive result. The 2 case maybe dangerous: there is a possible situation, when the hazard-motion (slice) contains a CTE producer and it's consumer, but the plan may contain other unmodified motion (slice) with the CTE consumer of the producer from modified slice, thus this approach won't reconnize that there are unpaired CTE producers and consumers, because it checks only the consistency of producers/consumers within the hazzard motion (one slice), on the other side, this situation is possible if the replicated distribution requested for the Sequence node, but due to this patch (https://github.com/greenplum-db/gpdb/pull/15124) such situation shouldn't appear. Follow-up for #302 (51fe92e) Cherry-picked-from: 2de3b73 to reapply above 781663c
Stolb27
pushed a commit
that referenced
this pull request
Oct 2, 2023
We assume that plan before hazard-motion transformation was validated (such check exists) and CTE producers/consumers are consistent. The next cases may appear after hazard-motion transformation: 1. all slices, which contains shared scans, was transformed 2. the slices, which contains CTE producers, and some CTE consumers (the plan may contains other slice with consumers of these producers) was transformed 3. the slice, which contains only producer/s, was transformed 4. the slice, which contains only consumer/s, was transformed 5. the slice, without CTE producers/consumers, was transformed This patch is a follow-up for #302. Now detection of unpaired CTE's is performed on the current slice (current motion) instead of recursing into it's children slices (motions), also validation of unpaired CTE was changed: previous approach collected all CTE consumers into array and CTE producers was collected into set at the end each element of CTE producer array was probed at producers hash-set (this may lead to a false-negative cases when the slice contains two different producers, but only a single consumer). Now the sets of CTE consumers and producers compares. Current approach correctly validates 3, 4, 5 cases, while 1 case may give a false-positive result. The 2 case maybe dangerous: there is a possible situation, when the hazard-motion (slice) contains a CTE producer and it's consumer, but the plan may contain other unmodified motion (slice) with the CTE consumer of the producer from modified slice, thus this approach won't reconnize that there are unpaired CTE producers and consumers, because it checks only the consistency of producers/consumers within the hazzard motion (one slice), on the other side, this situation is possible if the replicated distribution requested for the Sequence node, but due to this patch (https://github.com/greenplum-db/gpdb/pull/15124) such situation shouldn't appear. Follow-up for #302 (51fe92e) Cherry-picked-from: a5b5c04 to reapply above 9e03478
Stolb27
pushed a commit
that referenced
this pull request
Mar 4, 2024
We assume that plan before hazard-motion transformation was validated (such check exists) and CTE producers/consumers are consistent. The next cases may appear after hazard-motion transformation: 1. all slices, which contains shared scans, was transformed 2. the slices, which contains CTE producers, and some CTE consumers (the plan may contains other slice with consumers of these producers) was transformed 3. the slice, which contains only producer/s, was transformed 4. the slice, which contains only consumer/s, was transformed 5. the slice, without CTE producers/consumers, was transformed This patch is a follow-up for #302. Now detection of unpaired CTE's is performed on the current slice (current motion) instead of recursing into it's children slices (motions), also validation of unpaired CTE was changed: previous approach collected all CTE consumers into array and CTE producers was collected into set at the end each element of CTE producer array was probed at producers hash-set (this may lead to a false-negative cases when the slice contains two different producers, but only a single consumer). Now the sets of CTE consumers and producers compares. Current approach correctly validates 3, 4, 5 cases, while 1 case may give a false-positive result. The 2 case maybe dangerous: there is a possible situation, when the hazard-motion (slice) contains a CTE producer and it's consumer, but the plan may contain other unmodified motion (slice) with the CTE consumer of the producer from modified slice, thus this approach won't reconnize that there are unpaired CTE producers and consumers, because it checks only the consistency of producers/consumers within the hazzard motion (one slice), on the other side, this situation is possible if the replicated distribution requested for the Sequence node, but due to this patch (https://github.com/greenplum-db/gpdb/pull/15124) such situation shouldn't appear. Follow-up for #302 (51fe92e) Cherry-picked-from: a5b5c04 to reapply above 9e03478 (cherry picked from commit 5a10d0a)
Stolb27
pushed a commit
that referenced
this pull request
Mar 14, 2024
We assume that plan before hazard-motion transformation was validated (such check exists) and CTE producers/consumers are consistent. The next cases may appear after hazard-motion transformation: 1. all slices, which contains shared scans, was transformed 2. the slices, which contains CTE producers, and some CTE consumers (the plan may contains other slice with consumers of these producers) was transformed 3. the slice, which contains only producer/s, was transformed 4. the slice, which contains only consumer/s, was transformed 5. the slice, without CTE producers/consumers, was transformed This patch is a follow-up for #302. Now detection of unpaired CTE's is performed on the current slice (current motion) instead of recursing into it's children slices (motions), also validation of unpaired CTE was changed: previous approach collected all CTE consumers into array and CTE producers was collected into set at the end each element of CTE producer array was probed at producers hash-set (this may lead to a false-negative cases when the slice contains two different producers, but only a single consumer). Now the sets of CTE consumers and producers compares. Current approach correctly validates 3, 4, 5 cases, while 1 case may give a false-positive result. The 2 case maybe dangerous: there is a possible situation, when the hazard-motion (slice) contains a CTE producer and it's consumer, but the plan may contain other unmodified motion (slice) with the CTE consumer of the producer from modified slice, thus this approach won't reconnize that there are unpaired CTE producers and consumers, because it checks only the consistency of producers/consumers within the hazzard motion (one slice), on the other side, this situation is possible if the replicated distribution requested for the Sequence node, but due to this patch (https://github.com/greenplum-db/gpdb/pull/15124) such situation shouldn't appear. Follow-up for #302 (51fe92e) Cherry-picked-from: a5b5c04 to reapply above 9e03478 (cherry picked from commit 5a10d0a)
red1452
pushed a commit
that referenced
this pull request
May 28, 2024
We assume that plan before hazard-motion transformation was validated (such check exists) and CTE producers/consumers are consistent. The next cases may appear after hazard-motion transformation: 1. all slices, which contains shared scans, was transformed 2. the slices, which contains CTE producers, and some CTE consumers (the plan may contains other slice with consumers of these producers) was transformed 3. the slice, which contains only producer/s, was transformed 4. the slice, which contains only consumer/s, was transformed 5. the slice, without CTE producers/consumers, was transformed This patch is a follow-up for #302. Now detection of unpaired CTE's is performed on the current slice (current motion) instead of recursing into it's children slices (motions), also validation of unpaired CTE was changed: previous approach collected all CTE consumers into array and CTE producers was collected into set at the end each element of CTE producer array was probed at producers hash-set (this may lead to a false-negative cases when the slice contains two different producers, but only a single consumer). Now the sets of CTE consumers and producers compares. Current approach correctly validates 3, 4, 5 cases, while 1 case may give a false-positive result. The 2 case maybe dangerous: there is a possible situation, when the hazard-motion (slice) contains a CTE producer and it's consumer, but the plan may contain other unmodified motion (slice) with the CTE consumer of the producer from modified slice, thus this approach won't reconnize that there are unpaired CTE producers and consumers, because it checks only the consistency of producers/consumers within the hazzard motion (one slice), on the other side, this situation is possible if the replicated distribution requested for the Sequence node, but due to this patch (https://github.com/greenplum-db/gpdb/pull/15124) such situation shouldn't appear. Follow-up for #302 (51fe92e) Cherry-picked-from: a5b5c04 to reapply above 9e03478 (cherry picked from commit 5a10d0a) (cherry picked from commit 54ee058)
red1452
pushed a commit
that referenced
this pull request
May 28, 2024
We assume that plan before hazard-motion transformation was validated (such check exists) and CTE producers/consumers are consistent. The next cases may appear after hazard-motion transformation: 1. all slices, which contains shared scans, was transformed 2. the slices, which contains CTE producers, and some CTE consumers (the plan may contains other slice with consumers of these producers) was transformed 3. the slice, which contains only producer/s, was transformed 4. the slice, which contains only consumer/s, was transformed 5. the slice, without CTE producers/consumers, was transformed This patch is a follow-up for #302. Now detection of unpaired CTE's is performed on the current slice (current motion) instead of recursing into it's children slices (motions), also validation of unpaired CTE was changed: previous approach collected all CTE consumers into array and CTE producers was collected into set at the end each element of CTE producer array was probed at producers hash-set (this may lead to a false-negative cases when the slice contains two different producers, but only a single consumer). Now the sets of CTE consumers and producers compares. Current approach correctly validates 3, 4, 5 cases, while 1 case may give a false-positive result. The 2 case maybe dangerous: there is a possible situation, when the hazard-motion (slice) contains a CTE producer and it's consumer, but the plan may contain other unmodified motion (slice) with the CTE consumer of the producer from modified slice, thus this approach won't reconnize that there are unpaired CTE producers and consumers, because it checks only the consistency of producers/consumers within the hazzard motion (one slice), on the other side, this situation is possible if the replicated distribution requested for the Sequence node, but due to this patch (https://github.com/greenplum-db/gpdb/pull/15124) such situation shouldn't appear. Follow-up for #302 (51fe92e) Cherry-picked-from: a5b5c04 to reapply above 9e03478 (cherry picked from commit 5a10d0a) (cherry picked from commit 54ee058)
red1452
pushed a commit
that referenced
this pull request
May 29, 2024
We assume that plan before hazard-motion transformation was validated (such check exists) and CTE producers/consumers are consistent. The next cases may appear after hazard-motion transformation: 1. all slices, which contains shared scans, was transformed 2. the slices, which contains CTE producers, and some CTE consumers (the plan may contains other slice with consumers of these producers) was transformed 3. the slice, which contains only producer/s, was transformed 4. the slice, which contains only consumer/s, was transformed 5. the slice, without CTE producers/consumers, was transformed This patch is a follow-up for #302. Now detection of unpaired CTE's is performed on the current slice (current motion) instead of recursing into it's children slices (motions), also validation of unpaired CTE was changed: previous approach collected all CTE consumers into array and CTE producers was collected into set at the end each element of CTE producer array was probed at producers hash-set (this may lead to a false-negative cases when the slice contains two different producers, but only a single consumer). Now the sets of CTE consumers and producers compares. Current approach correctly validates 3, 4, 5 cases, while 1 case may give a false-positive result. The 2 case maybe dangerous: there is a possible situation, when the hazard-motion (slice) contains a CTE producer and it's consumer, but the plan may contain other unmodified motion (slice) with the CTE consumer of the producer from modified slice, thus this approach won't reconnize that there are unpaired CTE producers and consumers, because it checks only the consistency of producers/consumers within the hazzard motion (one slice), on the other side, this situation is possible if the replicated distribution requested for the Sequence node, but due to this patch (https://github.com/greenplum-db/gpdb/pull/15124) such situation shouldn't appear. Follow-up for #302 (51fe92e) Cherry-picked-from: a5b5c04 to reapply above 9e03478 (cherry picked from commit 5a10d0a) (cherry picked from commit 54ee058)
red1452
pushed a commit
that referenced
this pull request
May 30, 2024
We assume that plan before hazard-motion transformation was validated (such check exists) and CTE producers/consumers are consistent. The next cases may appear after hazard-motion transformation: 1. all slices, which contains shared scans, was transformed 2. the slices, which contains CTE producers, and some CTE consumers (the plan may contains other slice with consumers of these producers) was transformed 3. the slice, which contains only producer/s, was transformed 4. the slice, which contains only consumer/s, was transformed 5. the slice, without CTE producers/consumers, was transformed This patch is a follow-up for #302. Now detection of unpaired CTE's is performed on the current slice (current motion) instead of recursing into it's children slices (motions), also validation of unpaired CTE was changed: previous approach collected all CTE consumers into array and CTE producers was collected into set at the end each element of CTE producer array was probed at producers hash-set (this may lead to a false-negative cases when the slice contains two different producers, but only a single consumer). Now the sets of CTE consumers and producers compares. Current approach correctly validates 3, 4, 5 cases, while 1 case may give a false-positive result. The 2 case maybe dangerous: there is a possible situation, when the hazard-motion (slice) contains a CTE producer and it's consumer, but the plan may contain other unmodified motion (slice) with the CTE consumer of the producer from modified slice, thus this approach won't reconnize that there are unpaired CTE producers and consumers, because it checks only the consistency of producers/consumers within the hazzard motion (one slice), on the other side, this situation is possible if the replicated distribution requested for the Sequence node, but due to this patch (https://github.com/greenplum-db/gpdb/pull/15124) such situation shouldn't appear. Follow-up for #302 (51fe92e) Cherry-picked-from: a5b5c04 to reapply above 9e03478 (cherry picked from commit 5a10d0a) (cherry picked from commit 54ee058)
red1452
pushed a commit
that referenced
this pull request
May 30, 2024
We assume that plan before hazard-motion transformation was validated (such check exists) and CTE producers/consumers are consistent. The next cases may appear after hazard-motion transformation: 1. all slices, which contains shared scans, was transformed 2. the slices, which contains CTE producers, and some CTE consumers (the plan may contains other slice with consumers of these producers) was transformed 3. the slice, which contains only producer/s, was transformed 4. the slice, which contains only consumer/s, was transformed 5. the slice, without CTE producers/consumers, was transformed This patch is a follow-up for #302. Now detection of unpaired CTE's is performed on the current slice (current motion) instead of recursing into it's children slices (motions), also validation of unpaired CTE was changed: previous approach collected all CTE consumers into array and CTE producers was collected into set at the end each element of CTE producer array was probed at producers hash-set (this may lead to a false-negative cases when the slice contains two different producers, but only a single consumer). Now the sets of CTE consumers and producers compares. Current approach correctly validates 3, 4, 5 cases, while 1 case may give a false-positive result. The 2 case maybe dangerous: there is a possible situation, when the hazard-motion (slice) contains a CTE producer and it's consumer, but the plan may contain other unmodified motion (slice) with the CTE consumer of the producer from modified slice, thus this approach won't reconnize that there are unpaired CTE producers and consumers, because it checks only the consistency of producers/consumers within the hazzard motion (one slice), on the other side, this situation is possible if the replicated distribution requested for the Sequence node, but due to this patch (https://github.com/greenplum-db/gpdb/pull/15124) such situation shouldn't appear. Follow-up for #302 (51fe92e) Cherry-picked-from: a5b5c04 to reapply above 9e03478 (cherry picked from commit 5a10d0a) (cherry picked from commit 54ee058)
red1452
pushed a commit
that referenced
this pull request
May 30, 2024
We assume that plan before hazard-motion transformation was validated (such check exists) and CTE producers/consumers are consistent. The next cases may appear after hazard-motion transformation: 1. all slices, which contains shared scans, was transformed 2. the slices, which contains CTE producers, and some CTE consumers (the plan may contains other slice with consumers of these producers) was transformed 3. the slice, which contains only producer/s, was transformed 4. the slice, which contains only consumer/s, was transformed 5. the slice, without CTE producers/consumers, was transformed This patch is a follow-up for #302. Now detection of unpaired CTE's is performed on the current slice (current motion) instead of recursing into it's children slices (motions), also validation of unpaired CTE was changed: previous approach collected all CTE consumers into array and CTE producers was collected into set at the end each element of CTE producer array was probed at producers hash-set (this may lead to a false-negative cases when the slice contains two different producers, but only a single consumer). Now the sets of CTE consumers and producers compares. Current approach correctly validates 3, 4, 5 cases, while 1 case may give a false-positive result. The 2 case maybe dangerous: there is a possible situation, when the hazard-motion (slice) contains a CTE producer and it's consumer, but the plan may contain other unmodified motion (slice) with the CTE consumer of the producer from modified slice, thus this approach won't reconnize that there are unpaired CTE producers and consumers, because it checks only the consistency of producers/consumers within the hazzard motion (one slice), on the other side, this situation is possible if the replicated distribution requested for the Sequence node, but due to this patch (https://github.com/greenplum-db/gpdb/pull/15124) such situation shouldn't appear. Follow-up for #302 (51fe92e) Cherry-picked-from: a5b5c04 to reapply above 9e03478 (cherry picked from commit 5a10d0a) (cherry picked from commit 54ee058)
red1452
pushed a commit
that referenced
this pull request
May 31, 2024
We assume that plan before hazard-motion transformation was validated (such check exists) and CTE producers/consumers are consistent. The next cases may appear after hazard-motion transformation: 1. all slices, which contains shared scans, was transformed 2. the slices, which contains CTE producers, and some CTE consumers (the plan may contains other slice with consumers of these producers) was transformed 3. the slice, which contains only producer/s, was transformed 4. the slice, which contains only consumer/s, was transformed 5. the slice, without CTE producers/consumers, was transformed This patch is a follow-up for #302. Now detection of unpaired CTE's is performed on the current slice (current motion) instead of recursing into it's children slices (motions), also validation of unpaired CTE was changed: previous approach collected all CTE consumers into array and CTE producers was collected into set at the end each element of CTE producer array was probed at producers hash-set (this may lead to a false-negative cases when the slice contains two different producers, but only a single consumer). Now the sets of CTE consumers and producers compares. Current approach correctly validates 3, 4, 5 cases, while 1 case may give a false-positive result. The 2 case maybe dangerous: there is a possible situation, when the hazard-motion (slice) contains a CTE producer and it's consumer, but the plan may contain other unmodified motion (slice) with the CTE consumer of the producer from modified slice, thus this approach won't reconnize that there are unpaired CTE producers and consumers, because it checks only the consistency of producers/consumers within the hazzard motion (one slice), on the other side, this situation is possible if the replicated distribution requested for the Sequence node, but due to this patch (https://github.com/greenplum-db/gpdb/pull/15124) such situation shouldn't appear. Follow-up for #302 (51fe92e) Cherry-picked-from: a5b5c04 to reapply above 9e03478 (cherry picked from commit 5a10d0a) (cherry picked from commit 54ee058)
Stolb27
pushed a commit
that referenced
this pull request
May 31, 2024
We assume that plan before hazard-motion transformation was validated (such check exists) and CTE producers/consumers are consistent. The next cases may appear after hazard-motion transformation: 1. all slices, which contains shared scans, was transformed 2. the slices, which contains CTE producers, and some CTE consumers (the plan may contains other slice with consumers of these producers) was transformed 3. the slice, which contains only producer/s, was transformed 4. the slice, which contains only consumer/s, was transformed 5. the slice, without CTE producers/consumers, was transformed This patch is a follow-up for #302. Now detection of unpaired CTE's is performed on the current slice (current motion) instead of recursing into it's children slices (motions), also validation of unpaired CTE was changed: previous approach collected all CTE consumers into array and CTE producers was collected into set at the end each element of CTE producer array was probed at producers hash-set (this may lead to a false-negative cases when the slice contains two different producers, but only a single consumer). Now the sets of CTE consumers and producers compares. Current approach correctly validates 3, 4, 5 cases, while 1 case may give a false-positive result. The 2 case maybe dangerous: there is a possible situation, when the hazard-motion (slice) contains a CTE producer and it's consumer, but the plan may contain other unmodified motion (slice) with the CTE consumer of the producer from modified slice, thus this approach won't reconnize that there are unpaired CTE producers and consumers, because it checks only the consistency of producers/consumers within the hazzard motion (one slice), on the other side, this situation is possible if the replicated distribution requested for the Sequence node, but due to this patch (https://github.com/greenplum-db/gpdb/pull/15124) such situation shouldn't appear. Follow-up for #302 (51fe92e) Cherry-picked-from: a5b5c04 to reapply above 9e03478 (cherry picked from commit 5a10d0a) (cherry picked from commit 54ee058)
red1452
pushed a commit
that referenced
this pull request
Jun 3, 2024
We assume that plan before hazard-motion transformation was validated (such check exists) and CTE producers/consumers are consistent. The next cases may appear after hazard-motion transformation: 1. all slices, which contains shared scans, was transformed 2. the slices, which contains CTE producers, and some CTE consumers (the plan may contains other slice with consumers of these producers) was transformed 3. the slice, which contains only producer/s, was transformed 4. the slice, which contains only consumer/s, was transformed 5. the slice, without CTE producers/consumers, was transformed This patch is a follow-up for #302. Now detection of unpaired CTE's is performed on the current slice (current motion) instead of recursing into it's children slices (motions), also validation of unpaired CTE was changed: previous approach collected all CTE consumers into array and CTE producers was collected into set at the end each element of CTE producer array was probed at producers hash-set (this may lead to a false-negative cases when the slice contains two different producers, but only a single consumer). Now the sets of CTE consumers and producers compares. Current approach correctly validates 3, 4, 5 cases, while 1 case may give a false-positive result. The 2 case maybe dangerous: there is a possible situation, when the hazard-motion (slice) contains a CTE producer and it's consumer, but the plan may contain other unmodified motion (slice) with the CTE consumer of the producer from modified slice, thus this approach won't reconnize that there are unpaired CTE producers and consumers, because it checks only the consistency of producers/consumers within the hazzard motion (one slice), on the other side, this situation is possible if the replicated distribution requested for the Sequence node, but due to this patch (https://github.com/greenplum-db/gpdb/pull/15124) such situation shouldn't appear. Follow-up for #302 (51fe92e) Cherry-picked-from: a5b5c04 to reapply above 9e03478 (cherry picked from commit 5a10d0a) (cherry picked from commit 54ee058)
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Orca currently doesn't handle CTEs over replicated tables properly:
while translating Expressions to DXL it simply cuts off all but one input segment
(changes the Motion subtree's ExecLocalityType from EeltSegments to EeltSingleton)
for a Motion, that has a child with Strict/Tainted Replicated/Universal distribution
without taking into account the possible CTE Consumers underneath.
This current logic actually violates results of the check previously made in
CUtils::ValidateCTEProducerConsumerLocality().
This can result in a hanging query because of the broken Producer-Consumer locality.
See: greenplum-db#13039 and ADBDEV-2351 ticket
This PR introduces a temporary fix for this issue by forcing a fallback
to Postgres Optimizer in the Expr2DXL translation, when a CTE Consumer without
the necessary CTE Producer is found under a duplicate-hazard motion.