Skip to content

ADBDEV-1905 6.17.1 sync#232

Merged
deart2k merged 152 commits intoadb-6.xfrom
6.17.1-sync
Aug 4, 2021
Merged

ADBDEV-1905 6.17.1 sync#232
deart2k merged 152 commits intoadb-6.xfrom
6.17.1-sync

Conversation

@deart2k
Copy link
Member

@deart2k deart2k commented Jul 22, 2021

Here are some reminders before you submit the pull request

  • Add tests for the change
  • Document changes
  • Communicate in the mailing list if needed
  • Pass make installcheck
  • Review a PR in return to support the community

paul-guo- and others added 30 commits June 3, 2021 15:34
Some tests enable fsync temporarily, but disable fsync incorrectly finally
since they use "gpconfig -r". "gpconfig -r" comments out the guc setting so
finally the system uses the default guc value but for guc fsync the default
value is on. Fixing this by explicitly set fsync off with gpconfig in those
tests.

This change should be able to speed up testing a lot since all subsequent tests
could be accelerated.

Reviewed by: Ashwin Agrawal <aashwin@vmware.com>

(cherry picked from commit b43ffa0)
NOT IN is equivalent to "<> ALL" SQL expression and should be replaced
with LAS-Apply-NotIn in CSubqueryHandler::FRemoveAllSubquery().

Earlier, ORCA transformed "<> ALL" subquery to a LAS-Apply-NotIn with the apply
predicate as a filter over it's inner child (See CXformSelect2Apply). This is
problematic when the subquery contains NULLs, as the generated filter is NULL
rejecting. Thus, although LAS-Apply-NotIn correctly handles the NULL-semantics
of a NOT IN subquery, an incorrect NULL rejecting filter on inner side will
prevent the Apply from seeing any NULLs, resulting in wrong results.

For example: A LOJ inside the NOT IN subquery (see issue #10967)

create table t1(a int) distributed by (a);
insert into t1 values (1), (2), (3);
set optimizer_join_order='query';
select * from t1 where a not in (
    select b.a from t1 a left join t1 b on false);

ORCA transforms the following SubqueryAll operator:

+--CScalarSubqueryAll(<>)["a" (16)]
    |--CLogicalLeftOuterJoin
    |  |--CLogicalGet "t1" ("t1")
    |  |--CLogicalConstTableGet
    |  +--CScalarConst (1)
    +--CScalarIdent "a" (0)

to a LAS-Apply-NotIn, with a filter over it's inner child:

+--CLogicalLeftAntiSemiApplyNotIn (Reqd Inner Cols: "a" (16))
    |--CLogicalGet "t1" ("t1")
    |--CLogicalSelect
    |  |--CLogicalLeftOuterJoin
    |  |  |--CLogicalGet "t1" ("t1"),
    |  |  |--CLogicalConstTableGet
    |  |  +--CScalarConst (1)
    |  +--CScalarCmp (=)
    |     |--CScalarIdent "a" (0)
    |     +--CScalarIdent "a" (16)
    +--CScalarConst (1)

The filter is NULL rejecting, and thus the inner Left Outer Join is
normalized to an Inner Join:

+--CLogicalLeftAntiSemiApplyNotIn (Reqd Inner Cols: "b" (16))
   |--CLogicalGet "t1" ("t1")
   |--CLogicalInnerJoin
   |  |--CLogicalGet "t2" ("t2")
   |  |--CLogicalConstTableGet
   |  +--CScalarCmp (=)
   |     |--CScalarIdent "a" (0)
   |     +--CScalarIdent "b" (16)
   +--CScalarConst (1)

The Apply operator sees an empty set on the inner side, thus produce the
incorrect output of : (1), (2), (3).

This commit fixes issue #10967, by placing the predicate in the
LAS-Apply-NotIn, instead of generating a NULL rejecting filter:

+--CLogicalLeftAntiSemiApplyNotIn (Reqd Inner Cols: "b" (16))
   |--CLogicalGet "t1" ("t1")
   |--CLogicalLeftOuterJoin
   |  |--CLogicalGet "t2" ("t2")
   |  |--CLogicalConstTableGet
   |  +--CScalarConst (1)
   +--CScalarCmp (=)
      |--CScalarIdent "a" (0)
      +--CScalarIdent "b" (16)
This commit introduces a pg_upgrade check to look for views referencing
deprecated tables in a 5X cluster.

This is accompanied by 5X PR: #12072, which introduces a
pg_upgrade_support UDF to detect such views.
Currently, when recovering segments gprecoverseg updates the pg_hba.conf
entries on every acting primary regardless of whether those primaries
have mirrors to be recovered.  This commit restricts the update to only
those primaries that need to be updated.

Co-authored-by: Jamie McAtamney <jmcatamney@vmware.com>
Co-authored-by: Nikhil Kak <nkak@vmware.com>
This fixes the upgrade path for 6X->6X upgrades which was broken by
67df554 with the following error during a 6X->6X upgrade:

Checking for views referencing deprecated tables            SQL command failed
CREATE OR REPLACE FUNCTION view_references_deprecated_tables(OID) RETURNS BOOL AS '$libdir/pg_upgrade_support' LANGUAGE C STRICT;
ERROR:  could not find function "view_references_deprecated_tables" in file "/usr/local/greenplum-db-devel/lib/postgresql/pg_upgrade_support.so"

For upgrades from 6X->6X, we should not have to run this check as
deprecated object detection only applies to major version upgrades.
The SQL 2003 standard specified 4 different levels of "SQL data access
indication":

1. NO SQL
2. CONTAINS SQL
3. READS SQL DATA
4. MODIFIES SQL DATA

Upstream PostgreSQL doesn't support any of the above. Greenplum has had
syntactic support for this as part of `CREATE FUNCTION` command, but so
far we only store that in the catalog (and possibly dump back out),
without taking it into consideration in our planning or execution
process. Except ORCA does, for a seemingly unnecessary purpose: ORCA
opts out of various optimizations when it sees a function that is not
"NO SQL" (that's a mouthful). The intent, it seems, is we really want to
execute such functions on the QD. But that seems questionable for two
fundamental reasons:

1. Forgoing the various optimizations (as evidenced by the test cases in
   this patch) does NOT ensure we execute these functions on the QD
   node.
1. "executing on QD" is only an approximate intent, the real desired
   outcome is that we want to enable a *dispatch* from within functions
   that contain SQL. But we don't have any control over that, either:
   often an active dispatch is already in-flight and the plan is
   effectively non-executable.
1. There is a small niche that *does* match the "execute on QD" intent,
   namely, querying a "master-only" table (such tables also exist on the
   segments, but they contain no data) in a SQL function. That's where
   the `EXECUTE ON COORDINATOR` syntax is the best signal.

TL;DR: we are effectively pessimizing for any function that possibly
contains SQL, without ensuring that we can achieve the intended goal
(dispatch nested queries from QD). We should rethink whether the SQL
data access indication is even the right signal for us to achieve such
goals (I don't think so).

This patch still keeps the data access attribute in metadata (DXL), but
removes it from the rest of ORCA, including conditions that used to
consider them alongside volatile. As a result, we can expect various
more optimal plans in different situations. See the patch for details.

One thing that this commit differs from 384e5f8 is that we need
to revert to using C++98 syntaxes (default initializers are out).

Co-authored-by: Ekta Khanna <ekhanna@pivotal.io>
(cherry picked from commit 384e5f8)
* Putting recovermatrix.png in correct subdir.

* Additional small fixes
…ld is External Scan (#11970) (#12080)

- For the child of UNION ALL, based on the distribution spec of the outer
  child, the distribution spec is requested on the inner child.  In this
  specific case, the outer child had EdtExternal distribution spec, and it
  requested Any Distribution spec from the inner child.
- UNION ALL derives Replicated Distribution Spec if any of the child has
  Replicated Distribution Spec.

So, for the query:
explain select * from ext_tbl union all select * from replicated_tbl
where ext_tbl derives External Distribution Spec, and replicated_tbl has
Replicated distribution spec.
UNION ALL derived Replicated Distribution Spec, and it was requested Singleton
Distribution Spec, so a Gather Motion from only 1 segment was added on top of
Union ALL, due to which data was missing from the other segments for the
external table. The resulting bad plan is below:
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Gather Motion 1:1  (slice1; segments: 1)  (cost=0.00..914.99 rows=1000001 width=4)
   ->  Append  (cost=0.00..879.07 rows=2000002 width=4)
         ->  Foreign Scan on ext_tbl  (cost=0.00..440.35 rows=1000000 width=4)
         ->  Seq Scan on replicated_tbl  (cost=0.00..431.00 rows=2 width=4)

Ideally, it should have pulled data from all the segments for the external
table, and for the replicated table it should have filtered out data from all
segments except one.

In order to achive the same, this commit replaces External Distribution Spec
with Random Distribution Spec because External Distribution spec implies Random
Distribution. Now, UNION ALL will request Non Singleton Distribution spec from
it's inner child and place a filter on top of the scan over the replicated
table. The Derived Spec of UNION ALL will be NULL, and a Gather Motion from 3
segments will be placed on top of it resulting in a plan like below:
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Gather Motion 2:1  (slice1; segments: 2)  (cost=0.00..895.03 rows=1000001 width=4)
   ->  Append  (cost=0.00..877.07 rows=500001 width=4)
         ->  Foreign Scan on ext_tbl  (cost=0.00..440.35 rows=500000 width=4)
         ->  Result  (cost=0.00..431.00 rows=1 width=4)
               One-Time Filter: (gp_execution_segment() = 0)
               ->  Seq Scan on replicated_tbl  (cost=0.00..431.00 rows=1 width=4)

Co-authored-by: Bhuvnesh Chaudhary <bchaudhary@vmware.com>
Co-authored-by: David Kimura <dkimura@vmware.com>

(cherry picked from commit 794cd4e)
CreateReplicationSlot() and DropReplicationSlot() were not cleaning up
the query buffer in some cases (mostly error conditions) which meant a
small leak.  Not generally an issue as the error case would result in an
immediate exit, but not difficult to fix either and reduces the number
of false positives from code analyzers.

In passing, also add appropriate PQclear() calls to RunIdentifySystem().

Pointed out by Coverity.

(cherry picked from commit 273b29d)
AIX has never been supported for GP6. Remove code supporting AIX in the
enterprise build system.

Authored-by: Brent Doil <bdoil@vmware.com>
* Fix typos

* Correct command option

* Update link

* Correct example insert statement

* Add note: run commands as root user

* Redact info

Co-authored-by: David Yozie <dyozie@pivotal.io>
* Adding missing option and its explanation

* Changing order of options, per reviewer feedback.
…ides (#11903)

* First pass at synching up backup/restore content across both guides

* Adding new command to menu
gpmovemirrors and gprecoverseg do no spawn the correct number of
parallel processes as intended by -B option.
This change fixes that behavior
We would be using two options for finer contol.
'-B' option limits the number of hosts working in parallel for recovery
'-b' option limits the number of segments working in parallel on a host

Co-authored-by: David Krieger <dkrieger@vmware.com>
(cherry picked from commit 695a73a)
Adding test steps and tests to check if the -B and -b options passed in
to gpmovemirrors and gprecoverseg are used as intended

(cherry picked from commit 25e463d)
gprecoverseg -r didn't pick up the -B/-b options for spawning correct
number of parallel processes.
This change fixes that behaviour.

Co-authored-by: Orhan Kislal <okislal@vmware.com>
(cherry picked from commit 00d691d)
gpaddmirrors currently does not spawn correct number of parallel
processes as intended by -B option.
This change fixes that behavior.
We would be using two options -B/-b for finer contol.

(cherry picked from commit 9c399a7)
* Change GPText to Tanzu Greenplum Text in GPDB content.

* Reviewer feedback
* added x-log

* Added refernce to Xlog
If one (or more) of the hosts are unreachable, gprecoverseg -p
showed an error (ssh: unreachable host) even though the operation
continued and finalized successfully. This commit adds checks to
eliminate these redundant ssh calls.

Co-authored-by: Orhan Kislal <okislal@vmware.com>
At the moment, an error will always generate a debug_query_string
exception message in the logs even when log_min_messages is set to
PANIC. This can be a security concern if the query string is not meant
to be logged (e.g. a failed pgcrypto decrypt query is logged which can
be equivalent to logging a plaintext password).

Fix the issue by changing the call function from write_stderr to elog.
With elog, the log message will be controlled by log_min_messages and
may enhance the original debugging intent (the debug_query_string log
will always show the user-executed query along with the actual query
that failed).

GPDB references:
https://github.com/greenplum-db/gpdb/pull/1456
https://github.com/greenplum-db/gpdb/commit/bd7f682392b30b859588d0d60f2625095f1cd63f

Backport of https://github.com/greenplum-db/gpdb/commit/e1c553bb9e4772a4124f7c2a41c552998f14cf88.
(Jerome)Junfeng Yang and others added 5 commits July 20, 2021 09:59
This is a regression caused by commit: 6ae9179.
The refactor forget to consider agg on entryDB.
So fix the issue and add a test case to cover it.
some problem may caused by standard_conforming_strings,
we check the standard_conforming_strings in gpdb and give some advice for it
grps_tlist for MppGroupContext may contains extra vars with the
group expression in some cases, for example subplan in group expression.
But for normal case it only contains group expression in grps_tlist.
We used to sure that there could only be one target entry in grps_tlist
for a group.
When we build the 3 stage agg and try to split the Aggref and find or
add target list into different Aggref stages with function `split_aggref`,
it'll match to wrong Vars in `AGGSTAGE_INTERMEDIATE` and `AGGSTAGE_FINAL`
stage.

Reviewed-by: Zhenghua Lyu <zlyu@vmware.com>
Reviewed-by: Wen Lin <linw@vmware.com>
Previous commit d87d5e6 use GUC
optimizer_enable_groupagg to force orca generate same plan with
planner for a test, but seems it can not guarantee this right now.
So disable orca for this case.
@deart2k deart2k changed the title 6.17.1 sync ADBDEV-1905 6.17.1 sync Jul 22, 2021
@Stolb27
Copy link
Collaborator

Stolb27 commented Jul 23, 2021

GPORCA conflicts are caused by the mismatch of our commit and upstream backport. Upstream backport contains only part of our changes. Moreover, we can't create a minidump for AllSubqueryWithSubqueryInScalar with the upstream version of the patch, because ORCA can't produce a valid plan:

test=#      EXPLAIN SELECT * FROM foo WHERE (SELECT a FROM foo limit 1) = ALL(SELECT b FROM bar);
LOG:  statement: EXPLAIN SELECT * FROM foo WHERE (SELECT a FROM foo limit 1) = ALL(SELECT b FROM bar);
LOG:  2021-07-23 03:19:50:933783 UTC,THD000,TRACE,"[OPT]: Number of plan alternatives: 12
",
2021-07-23 03:19:50:942197 UTC,THD000,ERROR,"DXL-to-PlStmt Translation: Attribute number 22661984 not found in project list",
LOG:  Pivotal Optimizer (GPORCA) failed to produce plan (unexpected)

We don't which query creates AllSubqueryWithSubqueryInScalar.mdp but it passes the test successfully.
After discussion with @darthunix, we decided to revert the upstream patch and open an issue about the problem.

Stolb27 added 3 commits July 23, 2021 05:05
Upstream implementation break down minidump creation
for AllSubqueryWithSubqueryInScalar.mdp (see #232)

This reverts commit af7c79b.
we remove such functionality previously 89329d8
but it was added again in 1e375f0
@Stolb27
Copy link
Collaborator

Stolb27 commented Jul 27, 2021

69f0824 implements new gpload test suite that requires pytest in the environment.

1e375f0 removes gcc sourcing from test pipelines
Upstream are going to use docker environment variables to inject PATH and LD_LIBRARY_PATH
But this variables is applied only for root, but not for gpadmin user that run test too.
But gpadmin uses gcc to build debug extensions during test run.
As a result, gpadmin uses system gcc with pg_config flags from GCC 7
@Stolb27 Stolb27 force-pushed the 6.17.1-sync branch 3 times, most recently from 2cad289 to d231860 Compare July 28, 2021 03:12
69f0824 introduced pytest tests for gpload
its backport from master and currently it doestn't work correctly
and is disabled for different distributions (centos6, redhat, ubuntu).
I've decided to disable this tests and open ADBDEV-1915
to resolve this problem later.
@Stolb27
Copy link
Collaborator

Stolb27 commented Jul 28, 2021

@maksm90, @darthunix, @leskin-in this PR is ready for review. Please, take a look.

Copy link

@darthunix darthunix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a series of potentially dangerous commits about fsync on startup (I wish we would not be in trouble one day because if them). It was also funny to see that we still have logical errors in resource queues.

Thanks @Stolb27 for his bug fixes in a current release.

@Stolb27
Copy link
Collaborator

Stolb27 commented Aug 3, 2021

  1. 894d868 removes ORCA's usages for prodataaccess attributes
  2. c99892a by @InnerLife0. We need to rebase and correct Check if subpath can do motion. #219 after merging this one

@deart2k deart2k merged commit f215ca3 into adb-6.x Aug 4, 2021
@deart2k deart2k deleted the 6.17.1-sync branch August 4, 2021 06:09
hilltracer pushed a commit that referenced this pull request Mar 6, 2026
In python 3.12+, adding a backslash before a non-special character results in
SyntaxWarning. This escaping is often used in regex strings. This patch fixes
string literals that cause such a warning. Regex strings are now raw strings. In
the other strings, escaping is fixed.
Stolb27 pushed a commit that referenced this pull request Mar 10, 2026
In python 3.12+, adding a backslash before a non-special character results in
SyntaxWarning. This escaping is often used in regex strings. This patch fixes
string literals that cause such a warning. Regex strings are now raw strings. In
the other strings, escaping is fixed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.