Conversation
In DXL to PlStmt translation ORCA decides whether to translate a motion node
into a redistribute motion or into a result node with a hash filter. In the case
of a tainted replicated distribution, the wrong translation can lead to
incorrect results.
The following demonstrates the issue with the result hash filter node:
```
CREATE TABLE replicated_table(v VARCHAR) DISTRIBUTED REPLICATED;
CREATE TABLE distributed_table(id BIGSERIAL, v VARCHAR) DISTRIBUTED BY (id);
EXPLAIN (COSTS OFF) INSERT INTO distributed_table (v) SELECT v FROM replicated_table;
                   QUERY PLAN
------------------------------------------------
 Insert on distributed_table
   ->  Result
         ->  Result
               ->  Seq Scan on replicated_table
 Optimizer: Pivotal Optimizer (GPORCA)
```
In this plan the replicated table is scanned on each segment, and each segment
generates its own sequence values. Since a sequence value is unique across all
segments, we now have a tainted replicated distribution. And because every tuple
on each segment is unique, we are not guaranteed that exactly one copy of a
given row in the replicated table hashes to its own segment. Instead we can get
missing and even duplicate rows.
The fix is to force any motion produced because of a tainted replicated
distribution to be translated into a redistribute motion during DXL to PlStmt
translation. With the fix we generate the following plan:
```
                                  QUERY PLAN
------------------------------------------------------------------------------
 Insert on distributed_table
   ->  Result
         ->  Redistribute Motion 1:3  (slice1; segments: 1)
               Hash Key: (nextval('distributed_table__id_seq'::regclass))
               ->  Seq Scan on replicated_table
 Optimizer: Pivotal Optimizer (GPORCA)
```
Reported-by: Oleg Skrobuk <oskrobuk@gmail.com>
Fixes #12219
(cherry picked from commit 717df62)
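The failure mode above can be illustrated with a small Python simulation. The segment count, the modulo "hash", and the row values are illustrative stand-ins, not GPDB's actual cdbhash:

```python
# Three segments, each holding its own replicated copy of the same logical
# row, but tagged with a segment-local nextval() result (1, 2, 3).
NSEGS = 3
copies = {seg: ("row-A", seg + 1) for seg in range(NSEGS)}  # seg -> (v, id)

def target_seg(hash_key, nsegs=NSEGS):
    # Stand-in for the distribution hash; GPDB's real cdbhash differs.
    return hash_key % nsegs

# A result hash filter keeps a copy only on the segment its id hashes to.
kept = [row for seg, row in copies.items() if target_seg(row[1]) == seg]

# Because the ids differ per copy, the logical row can survive zero or
# multiple times instead of exactly once.  Here every copy hashes to some
# other segment, so the row is lost entirely:
print(len(kept))  # 0
```

A redistribute motion avoids this: the fixed plan scans the table on a single segment (`segments: 1`) and moves each row to exactly one target segment, so exactly one copy survives.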
In the case of inserting from a replicated distribution into a randomly distributed table, we must ensure there is a random/redistribute motion to prevent skew toward a single segment. This patch builds on commit d14e61d by adding the necessary logic to handle tainted and strict replicated distributions.
Co-authored-by: Bhuvnesh Chaudhary <bchaudhary@vmware.com>
Consider the below example:
```
EXPLAIN WITH CTE AS
  (SELECT c::int8, b FROM badestimate1)
SELECT 1 FROM CTE WHERE c <> 1 OR c = 1 AND b <> 101;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..431.13 rows=2 width=4)
   ->  Result  (cost=0.00..431.13 rows=1 width=4)
         ->  Result  (cost=0.00..431.13 rows=1 width=1)
               Filter: ((c::bigint) <> 1 OR ((c::bigint) = 1 AND b <> 101)) AND ((c::bigint) <> 1 OR (c::bigint) = 1)
               ->  Result  (cost=0.00..431.13 rows=1 width=12)
                     ->  Table Scan on badestimate1  (cost=0.00..431.08 rows=3334 width=8)
 Optimizer status: PQO version 3.116.0
```
The histogram for c::int8 is not well defined, so while merging the
histograms due to other predicates on c::int8, we cannot create a
correctly defined histogram. But currently we still merge two not-well-defined
histograms and generate a well-defined histogram in
CHistogram::MakeUnionHistogramNormalize.
This commit adds a condition to not merge the histograms if they
are not well defined.
Co-authored-by: Bhuvnesh Chaudhary <bchaudhary@vmware.com>
(cherry picked from commit 8e7b0fc)
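The guard this commit adds can be sketched in a few lines of Python. This is a simplified model, not ORCA's actual `CHistogram` API; the `Histogram` type and `union_normalize` function are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Histogram:
    buckets: list        # (lower, upper, frequency) triples
    well_defined: bool

def union_normalize(h1, h2):
    # The guard: if either input histogram is not well defined, do not
    # fabricate a well-defined result from the merge.
    if not (h1.well_defined and h2.well_defined):
        return Histogram(buckets=[], well_defined=False)
    # Real bucket merging elided; only the guard matters for this sketch.
    return Histogram(buckets=h1.buckets + h2.buckets, well_defined=True)

bad = Histogram([], well_defined=False)
good = Histogram([(0, 10, 1.0)], well_defined=True)
print(union_normalize(bad, good).well_defined)   # False
print(union_normalize(good, good).well_defined)  # True
```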
The branch as it stood was dead code due to a prior mismerge.
Co-authored-by: Kalen Krempely <kkrempely@vmware.com>
This ensures that when pg_dump is invoked in binary_upgrade mode against a 5X cluster during pg_upgrade, it collects and dumps the functions under pg_catalog that belong to an extension. We found this bug when we tried to upgrade PXF: PXF's functions, which are created under pg_catalog, did not get dumped in binary upgrade mode. This change in 6X should have accompanied the backport of the extension machinery into 5X.
Co-authored-by: Kalen Krempely <kkrempely@vmware.com>
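The selection rule described here can be sketched as a predicate. This is an illustrative simplification of pg_dump's object-selection logic, with a hypothetical function name; the real code has many more conditions:

```python
def should_dump_function(namespace, is_extension_member, binary_upgrade):
    # pg_dump normally skips objects under pg_catalog entirely.  In
    # binary-upgrade mode, extension member functions that live under
    # pg_catalog (such as PXF's) must still be collected so the new
    # cluster can recreate them.
    if namespace != "pg_catalog":
        return True
    return binary_upgrade and is_extension_member

print(should_dump_function("pg_catalog", True, True))    # True: the fix
print(should_dump_function("pg_catalog", True, False))   # False: normal dump
print(should_dump_function("pg_catalog", False, True))   # False: not extension-owned
```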
The typo was introduced in bdad273 Co-authored-by: Kalen Krempely <kkrempely@vmware.com>
After optimizing with the "natural" distribution spec of a CTE producer, we try to translate the query's distribution spec to the columns of the producer and do another round of optimization. This could lead to incorrect results when the query had a distribution spec (including equivalent distribution specs) that came from multiple CTE consumers, with some of these columns being equivalent because of "=" predicates between them. For example:
```
with cte as (select a,b from foo where b<10)
select * from cte x1 join cte x2 on x1.a=x2.b
```
On the query side, columns x1.a and x2.b are equivalent, but we should NOT treat columns a and b of the producer as equivalent.
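Why producer columns a and b must not be treated as equivalent can be shown at the data level. A small Python sketch with made-up rows, comparing the correct join against the join we would get if the producer were wrongly pre-filtered to rows with a = b:

```python
# Producer output: rows of "cte" = select a, b from foo where b < 10.
cte = [(1, 2), (2, 1), (3, 3)]

# Correct semantics: x1.a = x2.b equates a column of one CONSUMER with a
# column of the OTHER consumer; each consumer scans the full producer.
correct = [(x1, x2) for x1 in cte for x2 in cte if x1[0] == x2[1]]

# Wrong reasoning: treating producer columns a and b as equivalent lets
# the optimizer restrict the producer itself to rows with a = b.
wrong_producer = [r for r in cte if r[0] == r[1]]
wrong = [(x1, x2) for x1 in wrong_producer for x2 in wrong_producer
         if x1[0] == x2[1]]

print(len(correct), len(wrong))  # 3 1 -- two join rows silently lost
```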
Authored-by: Bradford D. Boyle <bradfordb@vmware.com>
The -t and -f flags are deprecated, yet gpcheckperf kept printing:
NOTICE: -t is deprecated, and has no effect
NOTICE: -f is deprecated, and has no effect
Co-authored-by: M Hari Krishna <hmaddileti@vmware.com>
When writing the server log, GPDB always writes out the statement string, ignoring the GUC log_min_error_statement. Fix it to respect that GUC.
gpinitsystem currently creates a backout script on the coordinator and on each segment, then appends all the segment backout scripts to the coordinator backout script, and we ask the user to run only that script. If some of the gpcreateseg calls fail, the segment backout scripts aren't appended correctly in some cases, and the user has to manually clean up those directories. This change creates a single backout script early in the gpinitsystem run, so even if anything fails later, we would always be able to clean up all segment directories. To simplify, we are using 'echo' to add lines into the backout script instead of using BACKOUT_COMMAND.
Co-authored-by: Divyesh Vanjare <vanjared@vmware.com>
(cherry picked from commit b0493ce)
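The approach can be sketched in shell. The file and directory names are demo placeholders, not gpinitsystem's real paths; the point is that the single backout script exists before any segment work starts and grows one echo'd line per created directory:

```shell
#!/bin/sh
# Create ONE backout script up front, then append a cleanup line per
# segment directory as soon as it is created, via plain echo.
BACKOUT=./backout_gpinitsystem_demo.sh
echo '#!/bin/sh' > "$BACKOUT"
chmod +x "$BACKOUT"

for SEGDIR in ./demo_seg0 ./demo_seg1; do
    mkdir -p "$SEGDIR"
    # Recorded immediately: a later failure still leaves a complete
    # backout script covering every directory created so far.
    echo "rm -rf $SEGDIR" >> "$BACKOUT"
done

sh "$BACKOUT"   # running the backout removes both demo directories
```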
…ported tarball. The bytecode is compiled and installed during package installation. Vendor pkgconfig files for libzstd, libuv, quicklz.
Authored-by: Brent Doil <bdoil@vmware.com>
We should check unique hostnames instead of checking the hostname associated with each dbid: each hostname has multiple dbids associated with it. In larger clusters this led to hundreds of liveness checks and failed with a "filedescriptor out of range in select()" error. In the future we should move from `select` to `poll`.
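The deduplication is a one-liner. A Python sketch with a made-up segment configuration (the mapping and function name are illustrative, not the utility's real code):

```python
# dbid -> hostname, shaped like gp_segment_configuration; large clusters
# have many dbids per host.
seg_config = {1: "cdw", 2: "sdw1", 3: "sdw1", 4: "sdw2",
              5: "sdw2", 6: "sdw1", 7: "sdw2"}

def hosts_to_check(config):
    # One liveness check per unique host, not per dbid, keeps the number
    # of concurrent sockets (and thus open file descriptors passed to
    # select()) bounded by the host count.
    return sorted(set(config.values()))

print(hosts_to_check(seg_config))  # ['cdw', 'sdw1', 'sdw2']
```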
… in Orca
In CPhysicalInnerHashJoin::PppsRequired, we support 2 different partitioning requests: 1 with DPE and 1 without DPE. Currently, we only generate 1 partitioning request by default, but generate 2 if the join order GUC is set to greedy/mincard/query. This was done to allow for more alternatives without increasing the optimization time significantly.
Now, we generate 2 requests for the greedy xform regardless of the GUC setting. We saw that this helps generate some lower cost alternatives with a small increase in optimization time (~8%). If we always generated 2 requests, the optimization time would increase by a much larger amount.
Some optimization time performance numbers for running the set of tpcds queries with EXPLAIN (~100 queries) when `optimizer_join_order` is set to `exhaustive`:
tpcds base optimization time: 76s
tpcds optimization time with this change: 82s (~8% increase)
Co-authored-by: Chris Hajas <chajas@vmware.com>
Co-authored-by: Domino Valdano <dvaldano@vmware.com>
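The decision rule can be summarized as a small function. This is a hypothetical Python restatement of the policy, not ORCA's C++ code:

```python
def num_partition_requests(is_greedy_xform, join_order_guc):
    # After this change: the greedy xform always explores both the DPE
    # and non-DPE partitioning requests; other xforms still key off the
    # join-order GUC to bound optimization time.
    if is_greedy_xform:
        return 2
    return 2 if join_order_guc in ("greedy", "mincard", "query") else 1

print(num_partition_requests(True, "exhaustive"))   # 2: new behavior
print(num_partition_requests(False, "exhaustive"))  # 1: unchanged
print(num_partition_requests(False, "greedy"))      # 2: unchanged
```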
darthunix approved these changes (Aug 23, 2021)
darthunix left a comment:
- 0ef47d0 reminds me of @Stolb27's catch that we pack python bytecode files from the bundle.
- https://github.com/greenplum-db/gpdb/pull/12318 is one more nice case about replicated tables and serial/volatile functions (our custom fallback #108 doesn't fix this case)
maksm90 (Collaborator) approved these changes (Aug 23, 2021):
066d975 doesn't fix gpload tests
hilltracer pushed a commit that referenced this pull request (Mar 6, 2026):
Remove deprecated mode 'rU' for Python 3
Python 2 has a file open mode 'U' that enables universal end-of-line handling. This mode was deprecated in Python 3 and removed in Python 3.11. Instead, the `newline` argument is used, and its default value `newline=None` is equivalent to 'U' mode. So it is sufficient to use 'r' mode on Python 3 everywhere 'rU' was used on Python 2. Also make sure the subprocess32 module is disabled in Python 3.
Ticket: GG-182
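The equivalence is easy to verify: in Python 3 text mode, the default `newline=None` translates `\r\n` and `\r` to `\n` on read, which is exactly what 'rU' did in Python 2:

```python
import os
import tempfile

# A file with mixed line endings.
path = os.path.join(tempfile.mkdtemp(), "mixed.txt")
with open(path, "wb") as f:
    f.write(b"one\r\ntwo\rthree\n")

# Plain 'r' with the default newline=None gives universal newlines:
# every line ending is normalized to '\n' on read.
with open(path, "r") as f:
    print(f.read().splitlines())  # ['one', 'two', 'three']
```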
Stolb27 pushed a commit that referenced this pull request (Mar 10, 2026): Remove deprecated mode 'rU' for Python 3 (same change as the commit above)