
ADBDEV-1975 ADB 6.17.2 sync #245

Merged
Stolb27 merged 19 commits into adb-6.x from 6.17.2-sync
Aug 24, 2021

Conversation

@deart2k
Member

@deart2k deart2k commented Aug 23, 2021

Here are some reminders before you submit the pull request

  • Add tests for the change
  • Document changes
  • Communicate in the mailing list if needed
  • Pass make installcheck
  • Review a PR in return to support the community

dgkimura and others added 18 commits July 20, 2021 10:51
In DXL to PlStmt translation, ORCA decides whether to translate a motion node
into a redistribute motion or a result hash filter node. In the case of a
tainted replicated distribution, the wrong translation can lead to incorrect results.

The following demonstrates the issue with the result hash filter node:
```
CREATE TABLE replicated_table(v VARCHAR) DISTRIBUTED REPLICATED;
CREATE TABLE distributed_table(id BIGSERIAL, v VARCHAR) DISTRIBUTED BY (id);
EXPLAIN (COSTS OFF) INSERT INTO distributed_table (v) SELECT v FROM replicated_table;
                   QUERY PLAN
------------------------------------------------
 Insert on distributed_table
   ->  Result
         ->  Result
               ->  Seq Scan on replicated_table
 Optimizer: Pivotal Optimizer (GPORCA)
```

In this plan the replicated table is scanned on each segment, and each scan
generates its own sequence values. Since the sequence values are unique across
all segments, we now have a tainted replicated distribution. And because each
tuple on every segment is unique, we are not guaranteed that each logical row
of the replicated table hashes to exactly one segment. Instead we can get
missing and even duplicate rows.

The fix is to force any motion produced because of a tainted replicated
distribution to translate into a redistribute motion during DXL to PlStmt
translation. We now generate the following plan:
```
                                  QUERY PLAN
------------------------------------------------------------------------------
 Insert on distributed_table
   ->  Result
         ->  Redistribute Motion 1:3  (slice1; segments: 1)
               Hash Key: (nextval('distributed_table__id_seq'::regclass))
               ->  Seq Scan on replicated_table
 Optimizer: Pivotal Optimizer (GPORCA)
```

Reported-by: Oleg Skrobuk <oskrobuk@gmail.com>

Fixes #12219

(cherry picked from commit 717df62)
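To see why a result hash filter is unsafe here, consider a toy simulation (hypothetical code, not ORCA internals; `toy_hash` and the tuple shapes are made up for illustration). With a truly replicated table every segment holds identical tuples, so exactly one segment keeps each row; once `nextval()` taints the copies, the same logical row can survive zero or several times:

```python
# Sketch (hypothetical, not ORCA code): a result hash filter is only safe
# when every segment holds *identical* copies of each row.

NUM_SEGMENTS = 3

def toy_hash(row):
    """Deterministic stand-in for the distribution hash function."""
    return (len(row[0]) + row[1]) % NUM_SEGMENTS

def hash_filter(per_segment_rows):
    """Each segment keeps its row only if the row hashes to that segment."""
    return [row for seg, row in per_segment_rows.items()
            if toy_hash(row) == seg]

# Truly replicated: all segments hold the same tuple -> exactly one survives.
replicated = {seg: ("a", 100) for seg in range(NUM_SEGMENTS)}
print(hash_filter(replicated))   # exactly one copy kept

# Tainted replicated: nextval() makes each segment's copy unique, so the
# single logical row may be kept 0 or several times. Here it is lost.
tainted = {seg: ("a", 100 + seg) for seg in range(NUM_SEGMENTS)}
print(hash_filter(tainted))      # empty: the row went missing
```

A redistribute motion avoids this by picking one sender and routing each row to exactly one target segment, regardless of whether the copies agree.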
When inserting from a replicated distribution into a randomly distributed
table, we must ensure there is a random/redistribute motion to prevent
skew to a single segment. This patch builds on commit d14e61d by
adding the necessary logic to handle tainted and strict replicated
distributions.

Co-authored-by: Bhuvnesh Chaudhary <bchaudhary@vmware.com>
Consider the below example:
EXPLAIN WITH  CTE AS
 (SELECT c::int8, b  FROM badestimate1)
SELECT 1 FROM CTE WHERE c <> 1 OR c = 1 AND b <> 101;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..431.13 rows=2 width=4)
   ->  Result  (cost=0.00..431.13 rows=1 width=4)
         ->  Result  (cost=0.00..431.13 rows=1 width=1)
               Filter: ((c::bigint) <> 1 OR ((c::bigint) = 1 AND b <> 101)) AND ((c::bigint) <> 1 OR (c::bigint) = 1)
               ->  Result  (cost=0.00..431.13 rows=1 width=12)
                     ->  Table Scan on badestimate1  (cost=0.00..431.08 rows=3334 width=8)
 Optimizer status: PQO version 3.116.0

The histogram for c::int8 is not well defined, so while merging the
histograms due to other predicates on c::int8, we cannot create a
correctly defined histogram. But currently, we still merge two not-well-defined
histograms and generate a well-defined histogram in
CHistogram::MakeUnionHistogramNormalize.
This commit adds a condition to not merge the histograms if the
histograms are not well defined.

Co-authored-by: Bhuvnesh Chaudhary <bchaudhary@vmware.com>

(cherry picked from commit 8e7b0fc)
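The guard can be pictured with a toy sketch (hypothetical class and function names, not the actual GPORCA `CHistogram` API): a union of two histograms should only be marked well defined when both inputs are, rather than unconditionally.

```python
# Toy sketch (hypothetical, not the GPORCA CHistogram API): a union of two
# histograms must not manufacture a well-defined result from ill-defined
# inputs.

class Histogram:
    def __init__(self, buckets, well_defined):
        self.buckets = buckets            # list of (lo, hi, rows)
        self.well_defined = well_defined  # can the stats be trusted?

def union_normalized(h1, h2):
    merged = sorted(h1.buckets + h2.buckets)
    # The guard: the union is well defined only if both inputs were.
    return Histogram(merged, h1.well_defined and h2.well_defined)

h1 = Histogram([(0, 10, 100.0)], well_defined=False)
h2 = Histogram([(10, 20, 50.0)], well_defined=False)
print(union_normalized(h1, h2).well_defined)   # False: stays ill defined
```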
The branch as it stood was dead code due to a prior mismerge.

Co-authored-by: Kalen Krempely <kkrempely@vmware.com>
This ensures that when pg_dump is invoked in binary_upgrade mode against
a 5X cluster during pg_upgrade, it collects and dumps the functions
under pg_catalog belonging to an extension.

We found this bug when we tried to upgrade PXF and PXF's functions did
not get dumped in binary upgrade mode. (PXF's functions are created under
pg_catalog)

This change in 6X should have accompanied the backport of the extension
machinery into 5X.

Co-authored-by: Kalen Krempely <kkrempely@vmware.com>
The typo was introduced in bdad273

Co-authored-by: Kalen Krempely <kkrempely@vmware.com>
After optimizing with the "natural" distribution spec of a CTE
producer, we try to translate the query's distribution spec to
the column of the producer and do another round of optimization.

This could lead to incorrect results when the query had a distribution
spec (including equivalent distribution specs) that came from multiple
CTE consumers, with some of these columns being equivalent because
of "=" predicates between them.

For example:

   with cte as (select a,b from foo where b<10)
   select * from cte x1 join cte x2 on x1.a=x2.b

On the query side, columns x1.a and x2.b are equivalent, but we should
NOT treat columns a and b of the producer as equivalent.
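The hazard can be sketched in a few lines (hypothetical names and data shapes, not the ORCA data structures): both consumers map their output columns back to the same producer columns, so naively translating a query-side equivalence between x1.a and x2.b collapses onto producer columns a and b, which the producer computes independently.

```python
# Sketch (hypothetical, not ORCA code): translating a query-side column
# equivalence back through two CTE consumers of the same producer.

# Each consumer's output columns map to the same producer columns.
consumer_to_producer = {
    "x1.a": "cte.a", "x1.b": "cte.b",
    "x2.a": "cte.a", "x2.b": "cte.b",
}

# The join predicate x1.a = x2.b makes these equivalent on the query side.
query_equiv = {("x1.a", "x2.b")}

# Naively pushing that equivalence through the mapping claims producer
# columns a and b are equivalent -- which is wrong, since the producer
# computes them independently of the join predicate.
naive = {(consumer_to_producer[l], consumer_to_producer[r])
         for l, r in query_equiv}
print(naive)   # the bogus producer-side equivalence (cte.a, cte.b)
```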
Authored-by: Bradford D. Boyle <bradfordb@vmware.com>
-t and -f flags are deprecated and gpcheckperf kept printing

NOTICE: -t is deprecated, and has no effect
NOTICE: -f is deprecated, and has no effect

Co-authored-by: M Hari Krishna <hmaddileti@vmware.com>
When writing to the server log, GPDB always writes out the statement string
and ignores the GUC log_min_error_statement. Fix it so the statement is
logged only when the message severity reaches that threshold.
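The intended behavior can be sketched as a severity gate (a hypothetical helper, not GPDB's actual elog code, and a simplified severity ordering): the statement text accompanies a log entry only when the message's level reaches log_min_error_statement.

```python
# Sketch (hypothetical helper, not GPDB's elog.c): gate statement logging
# on the log_min_error_statement GUC instead of always emitting it.
# Severity ordering simplified for illustration.

SEVERITY = ["DEBUG", "LOG", "NOTICE", "WARNING", "ERROR", "FATAL", "PANIC"]
RANK = {name: i for i, name in enumerate(SEVERITY)}

def should_log_statement(elevel, log_min_error_statement="ERROR"):
    """True when the message severity reaches the GUC threshold."""
    return RANK[elevel] >= RANK[log_min_error_statement]

print(should_log_statement("ERROR"))              # True: at the threshold
print(should_log_statement("WARNING"))            # False: below it
print(should_log_statement("WARNING", "NOTICE"))  # True: lower threshold
```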
gpinitsystem currently creates a backout script on the coordinator and on
each segment, then appends all the segment backout scripts to the
coordinator backout script, and we ask the user to run only that script.
If some of the gpcreateseg calls fail, the segment backout scripts aren't
appended correctly in some cases, and the user has to manually
clean up those directories.

This change creates a single backout script early in the gpinitsystem run,
so even if anything fails later, we will always be able to
clean up all segment directories. To simplify, we use 'echo' to
add lines to the backout script instead of using BACKOUT_COMMAND.

Co-authored-by: Divyesh Vanjare <vanjared@vmware.com>
(cherry picked from commit b0493ce)
…ported tarball.

The bytecode is compiled and installed during package installation.

Vendor pkgconfig files for libzstd, libuv, quicklz.

Authored-by: Brent Doil <bdoil@vmware.com>
We should check unique hostnames instead of checking the hostname associated
with each dbid. Each hostname has multiple dbids associated with it. In larger
clusters this led to hundreds of liveness checks and failed with a
"filedescriptor out of range in select()" error. In the future we should move
from `select` to `poll`.
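The idea can be sketched in a few lines (hypothetical data shapes, not the actual management-utility code): probe each unique hostname once rather than once per dbid, keeping the number of open sockets well under select()'s FD_SETSIZE limit (typically 1024).

```python
# Sketch (hypothetical, not the actual utility code): liveness-check each
# unique host once instead of once per dbid.

segments = [
    {"dbid": 1, "hostname": "sdw1"},
    {"dbid": 2, "hostname": "sdw1"},
    {"dbid": 3, "hostname": "sdw2"},
    {"dbid": 4, "hostname": "sdw2"},
]

# Before: one check per dbid -> one socket per segment, which can exceed
# select()'s FD_SETSIZE on large clusters.
per_dbid = [seg["hostname"] for seg in segments]

# After: one check per unique hostname -> far fewer sockets.
unique_hosts = sorted({seg["hostname"] for seg in segments})

print(per_dbid)       # one entry per dbid
print(unique_hosts)   # one entry per host
```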
… in Orca

In CPhysicalInnerHashJoin::PppsRequired, we support 2 different
partitioning requests: 1 with DPE and 1 without DPE. Currently, we only
generate 1 partitioning request by default, but generate 2 if the join
order GUC is set to greedy/mincard/query. This was done to allow for
more alternatives without increasing the optimization time
significantly.

Now, we generate 2 requests for the greedy xform regardless of the GUC
setting. We saw that this helps generate some lower cost alternatives
with a small increase in optimization time (~8%). If we always
generated 2 requests, the optimization time would increase by a much
larger amount.

Some optimization time performance numbers for running the set of tpcds
queries with EXPLAIN (~100 queries) when `optimizer_join_order` is set to `exhaustive`:

tpcds base optimization time: 76s
tpcds optimization time with this change: 82s (~8% increase)

Co-authored-by: Chris Hajas <chajas@vmware.com>
Co-authored-by: Domino Valdano <dvaldano@vmware.com>

@darthunix darthunix left a comment


  1. 0ef47d0 reminds me of @Stolb27's catch that we pack python bytecode files from the bundle.
  2. https://github.com/greenplum-db/gpdb/pull/12318 is one more nice case involving replicated tables and serial/volatile functions (our custom fallback #108 doesn't fix this case)

@Stolb27
Collaborator

Stolb27 commented Aug 24, 2021

066d975 doesn't fix gpload tests

@Stolb27 Stolb27 merged commit 903cc2d into adb-6.x Aug 24, 2021
@Stolb27 Stolb27 deleted the 6.17.2-sync branch August 24, 2021 09:08
hilltracer pushed a commit that referenced this pull request Mar 6, 2026
Remove deprecated mode 'rU' for Python 3

Python 2 has a file open mode 'U' that enables universal end of line. This
mode was deprecated in Python 3 and removed in Python 3.11. Instead, `newline`
argument is used, and its default value `newline=None` is equivalent to 'U'
mode. So, it's sufficient to use 'r' mode on Python 3 everywhere 'rU' was used
on Python 2.

Also make sure subprocess32 module is disabled in Python 3.

Ticket: GG-182
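The replacement is mechanical: on Python 3, `open(..., 'r')` with the default `newline=None` already performs universal newline translation, so `'rU'` can simply become `'r'`. A minimal demonstration:

```python
# On Python 3, plain 'r' with the default newline=None gives the same
# universal-newline behavior that 'rU' provided on Python 2.
import os
import tempfile

with tempfile.NamedTemporaryFile("wb", delete=False) as f:
    f.write(b"one\r\ntwo\rthree\n")   # mixed line endings
    path = f.name

# Equivalent to open(path, "r"): \r\n and \r are normalized to \n.
with open(path, "r", newline=None) as f:
    lines = f.read().splitlines()

print(lines)   # ['one', 'two', 'three']
os.remove(path)
```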
Stolb27 pushed a commit that referenced this pull request Mar 10, 2026
Remove deprecated mode 'rU' for Python 3