
ADBDEV-2124 ADB 6.18.0 sync #269

Merged
Stolb27 merged 109 commits into adb-6.x from 6.18.0-sync
Oct 18, 2021
Conversation

@deart2k (Member) commented Oct 11, 2021

Here are some reminders before you submit the pull request

  • Add tests for the change
  • Document changes
  • Communicate in the mailing list if needed
  • Pass make installcheck
  • Review a PR in return to support the community

kainwen and others added 30 commits August 10, 2021 09:29
…n Statement.

Greenplum has specific logic to fix up unknown Vars while parse-analyzing a
set-operation SQL statement; it invokes fixup_unknown_vars_in_setop to do
the job. For a set-operation statement Q: q1 union all q2, when fixing up
Q we are at level 0. Parse-analyzing q1 happens in a sub parse-state of Q, and
that state is freed when parsing of q1 finishes. So when we are in Q's context
and decide to fix up unknown-type Vars in q1 or q2, we need to shift the context
level by 1. See Github Issue greenplum-db/gpdb#12407
for details.

This commit fixes the issue.
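A toy illustration of the level shift (the dict-based Var and `shift_levelsup` are invented for this sketch, not GPDB code): a Var resolved at level 0 inside q1's sub parse-state must be referenced one level up from Q's context.

```python
# Toy model of the fix-up: a Var's levelsup counts how many query levels up
# its range table lives. Vars resolved at level 0 inside q1's (freed) sub
# parse-state must be referenced as level 1 from Q's outer context.
def shift_levelsup(unknown_vars, delta=1):
    """Shift each var's levelsup by delta when fixing up from the outer context."""
    return [{**v, "levelsup": v["levelsup"] + delta} for v in unknown_vars]

q1_vars = [{"name": "a", "levelsup": 0}]  # as resolved inside q1 itself
fixed = shift_levelsup(q1_vars)           # as seen from Q, one level up
print(fixed[0]["levelsup"])               # 1
```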
dispatchResult->resultbuf created by cdbdisp_makeResult() may be NULL due to OOM.
We should check the pointer during Gang cleanup in cdbdisp_resetResult().
Add a fault injector (make_dispatch_result_error) to simulate this scenario.

Github issue greenplum-db/gpdb#12399

Co-authored-by: Zhenghua Lyu <kainwen@gmail.com>
Co-authored-by: Mingli Zhang <avamingli@gmail.com>
* Docs: added reference to extension pljava

* modifications
Greenplum has a special method to implement semijoin: it
might create a unique rowid path to first do an inner join and
then de-duplicate based on ctid. The method was added to
Greenplum long ago, but after merging many commits
from upstream, some new logic can fail when applying this
method. IndexOnlyScan is a case the de-dup logic did
not consider: an IndexOnlyScan's Var points to the index
rel, not the heap rel. Fully fixing the logic seems risky
on a stable branch like 6X, so this commit just disallows
the unique rowid method when an index-only scan is seen, to avoid
erroring out. Index paths are set while building base rels, and the planner
then generates join rels, so the disallow logic works.
Greenplum has a special path to handle semijoin: the
planner might add a unique rowid path to first do an inner join
and then de-duplicate. The logic was added to
Greenplum long ago, but as more code was merged from
upstream, the old logic failed to consider new paths.
There are two relevant issues:

  * https://github.com/greenplum-db/gpdb/issues/3719
  * https://github.com/greenplum-db/gpdb/issues/12402

Issue 12402: while creating a plan for the second append rel,
a pseudo column is added to the rtable of the subroot, but not
to final_rtable (the subroot->rtable of the first append rel),
so the corresponding pseudo column cannot be found in cdbpath_dedup_fixup.

Issue 3719: the pseudocols of root->rtable is not null
while creating a plan for the first append rel, and build_simple_rel
fires 'Assert(!rte->pseudocols)' because a rel isn't expected to have any
pseudo columns yet.

Perhaps we should enhance the old logic in cdbpath_dedup_fixup,
but on a stable branch like 6X that seems risky. Here we introduce
a switch to turn it off in inheritance_planner.
By disallowing the unique rowid path, both issues are resolved.

Co-authored-by: Zhenghua Lyu <kainwen@gmail.com>
Before this commit we initialized pg_aocsseg entries with frozen
tuples to make these entries implicitly visible even after rollback.
That is a working approach for AO tables, but AOCS tables store an additional
"vpinfo" structure with serialized information about every column EOF.
This can cause inconsistency between the "vpinfo" structure in pg_aocsseg
entries and the actual number of columns in a table when we modify the
number of columns inside an explicit transaction and roll it back later.
As a result we fail on asserts in "GetAllAOCSFileSegInfo_pg_aocsseg_rel()"
or (in a build without asserts) in unpredictable places
in the code when retrieving metadata from pg_aocsseg.

As a fix, this commit inserts plain tuples into pg_aocsseg but still inserts
frozen tuples into gp_fastsequence (as it generates row numbers for segment
files and should never be rolled back, to avoid inconsistency in the index
pointers of AO/AOCS tables).

Co-authored-by: Vasiliy Ivanov <ivi@arenadata.io>
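A minimal model of why frozen insertion was problematic (the `Tuple`/`visible` names are invented for this sketch, not the actual catalog code): a frozen tuple stays visible even if its inserting transaction rolls back, while a plain tuple does not.

```python
# Frozen tuples bypass normal MVCC visibility: they are visible to every
# snapshot regardless of whether the inserting transaction committed.
class Tuple:
    def __init__(self, xmin, frozen=False):
        self.xmin = xmin
        self.frozen = frozen

def visible(tup, committed_xids):
    """Crude visibility check: frozen wins; otherwise xmin must have committed."""
    return tup.frozen or tup.xmin in committed_xids

committed = set()                          # xid 100 aborted (rolled back)
old_entry = Tuple(xmin=100, frozen=True)   # old pg_aocsseg behavior
new_entry = Tuple(xmin=100, frozen=False)  # fixed behavior: plain tuple
print(visible(old_entry, committed))       # True  - stale entry survives rollback
print(visible(new_entry, committed))       # False - entry vanishes with rollback
```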
* Changed default values for gucs

* Modified default for unix_socket_permissions and added log_file_mode

* modification
This is a backport of PR: #12465

This commit ensures that the following lock table corruption scenario is
avoided:

Consider two sessions S1 and S2.

1. S1 is currently stuck in ResProcSleep(), waiting for a resource queue
lock (one possible reason being S1 has hit the active_statements limit
on the resource queue)

2. We cancel S1 with pg_cancel_backend() or kill SIGINT.

3. S1 exits ResLockAcquire() with an ERROR due to the cancel (from
within the ProcessInterrupts() call inside PGSemaphoreLock() within
ResProcSleep()).

4. S1 catches the ERROR in the PG_CATCH in ResLockPortal() and calls
ResLockWaitCancel(), which removes the proclock and lock shmem hash
table entries (with call to ResRemoveFromWaitQueue()). However, we don't
remove the locallock entry from the locallock hash table.

Note: When a hash table entry is removed, the shmem chunk used for it is
recycled into the hash table level freeList.

5. S1 now calls ResLockRelease(). Since we haven't removed the locallock
in step 4, S1's subsequent search for the locallock in the locallock
hash table (with the same locallocktag from step 1) will be successful.
Let's say execution proceeds and halts on the line just *after* the
branch with the check:

* Verify that our LOCALLOCK still matches the shared tables.

6. S2 now acquires a relation lock and the same shmem chunks that were
sent to the freeList in step 4 are used for the lock/proclock hash table
entries in LockAcquire(). Also, further consider that the partition# for
the lock/proclock is different from the lock/proclock from step 1-4.

7. Now S1 proceeds to call ResCleanupLock(). Note that its locallock's
lock/proclock pointers point to the lock/proclock entries for the
relation lock grabbed by S2! These pointers are dangling! So when S1
will call ResCleanupLock(), it will clean S2's lock/proclock state. We
will thus end up with a situation where ResCleanupLock() cleans
non-resource-queue locks (in this case a relation lock)!

8. S2 now tries to release the relation lock and cores out in
CleanupLock() as it tries to access memory freed by S1 in step 7.

The fix is simply to remove the locallock at the end of step 4. It
shouldn't exist if its shmem overlords cease to exist.
Note that LockWaitCancel() does not do this as the same locallocktag is
not reused in this manner in PG locking code.
We also added a sanity check and PANIC in ResCleanupLock() to scream if
we ever try to clean up a non-resource queue lock.

IMP note: If in step 2, S1 errors out with a deadlock report, as opposed
to being cancelled, it can give rise to the same corruption. So, we
have to clean up the locallock after the ResRemoveFromWaitQueue() call
in CheckDeadLock() also.
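The core hazard in steps 4-7 can be modeled in a few lines (a toy free-list model, not the real shmem hash code; `hash_remove`/`hash_insert` are invented names): the stale locallock pointer ends up aliasing a chunk recycled for another backend's lock.

```python
# Toy model: removed hash-table entries go to a freeList and get recycled.
free_list = []

def hash_remove(entry):
    free_list.append(entry)             # chunk recycled into the freeList

def hash_insert(tag):
    entry = free_list.pop() if free_list else {}
    entry.clear()
    entry["tag"] = tag
    return entry

s1_lock = hash_insert("resqueue lock")   # step 1: S1's lock/proclock entry
locallock = {"lock": s1_lock}            # step 4 bug: locallock not removed
hash_remove(s1_lock)                     # ResLockWaitCancel() frees the entry
s2_lock = hash_insert("relation lock")   # step 6: S2 reuses the same chunk
# step 7: S1's stale locallock now points at S2's lock entry
print(locallock["lock"] is s2_lock)      # True - dangling pointer aliases S2
```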

Steps to manually reproduce the corruption scenario:

1. Create a resource queue with active_statements=1 and hold the active
slot forever with a query. (See isolation2 test resource_queue_deadlock
to see how to do this with before_auto_stats suspend fault)

2. In another psql session, run a SELECT query against the resource
queue (e.g. SELECT version()). This will block because step 1 is holding
the resource queue active slot. This SELECT query will act as S1.

3. Run a concurrent workload which will simulate S2. (e.g. 10 concurrent
CTAS/DROP TABLE). Do this in a long-running/infinite loop.

4. In another psql session, run pg_cancel_backend() on the blocked
SELECT query (S1) in an infinite loop.

5. Eventually, a segfault and resultant PANIC will be observed in S1 or
S2.

Co-authored-by: Yao Wang <wayao@vmware.com>
Co-authored-by: Ashwin Agrawal <aashwin@vmware.com>
Co-authored-by: Kate Dontsova <edontsova@pivotal.io>
Co-authored-by: Jimmy Yih <jyih@vmware.com>
(cherry picked from commit 2fa7c06)
When creating a plan with a subplan, ParallelizeSubplan() used
the wrong flow to decide the motion type for the subplan. The wrong
flow caused the subplan to get the wrong motionType, triggering the assertion
failure Assert(recvSlice->gangSize == 1) in nodeMotion.c.

In ParallelizeSubplan(), if the subplan is an uncorrelated, multi-row
subquery, it either focuses or broadcasts the subplan based on the
flow describing the containing plan node's slice execution position.
Actually the flow should be the top-level flow for the corresponding slice
instead of the containing plan node's flow. To select the correct flow during
the plan tree iteration, we only set currentPlanFlow when jumping into a new slice.

This fixes issue #12371.
…12452)

Previously, `IS DISTINCT FROM FALSE` was simplified to checking whether the
expression is true. However, due to SQL's three-valued logic, these are not
equivalent. This change replaces the direct equality check with an OR between
the equality check and `IS NULL`.

```
CREATE TABLE tt1 (a int, b int);
CREATE TABLE tt2 (c int, d int);

INSERT INTO tt1 VALUES (1, NULL), (2, 2), (3, 4), (NULL, 5);
INSERT INTO tt2 VALUES (1, 1), (2, NULL), (4, 4), (NULL, 2);

EXPLAIN SELECT b FROM tt1 WHERE NOT EXISTS (SELECT * FROM tt2 WHERE (tt2.d = tt1.b) IS DISTINCT FROM false);
```

Expected/Patched:
```
                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 Result  (cost=0.00..1324032.17 rows=1 width=4)
   Filter: (SubPlan 1)
   ->  Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..431.00 rows=1 width=4)
         ->  Seq Scan on tt1  (cost=0.00..431.00 rows=1 width=4)
   SubPlan 1
     ->  Result  (cost=0.00..431.00 rows=1 width=4)
           Filter: ((tt2.d = tt1.b) IS DISTINCT FROM false)
           ->  Materialize  (cost=0.00..431.00 rows=1 width=8)
                 ->  Gather Motion 3:1  (slice2; segments: 3)  (cost=0.00..431.00 rows=1 width=8)
                       ->  Seq Scan on tt2  (cost=0.00..431.00 rows=1 width=8)
 Optimizer: Pivotal Optimizer (GPORCA)
(11 rows)

RESET
 b
---
(0 rows)
```

Actual:
```
                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..862.00 rows=1 width=4)
   ->  Hash Anti Join  (cost=0.00..862.00 rows=1 width=4)
         Hash Cond: (tt1.b = tt2.d)
         ->  Seq Scan on tt1  (cost=0.00..431.00 rows=1 width=4)
         ->  Hash  (cost=431.00..431.00 rows=1 width=4)
               ->  Broadcast Motion 3:3  (slice2; segments: 3)  (cost=0.00..431.00 rows=1 width=4)
                     ->  Seq Scan on tt2  (cost=0.00..431.00 rows=1 width=4)
 Optimizer: Pivotal Optimizer (GPORCA)
(8 rows)

RESET
 b
---
 5

(2 rows)
```
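The difference between `IS DISTINCT FROM FALSE` and a plain truth check comes down to SQL's three-valued logic, which can be sketched with Python's `None` standing in for NULL (`is_distinct_from` and `equals` are helpers invented for illustration):

```python
# SQL three-valued logic, with None playing the role of NULL.
def is_distinct_from(a, b):
    """SQL IS DISTINCT FROM: NULL-safe inequality, always returns TRUE/FALSE."""
    if a is None and b is None:
        return False
    if a is None or b is None:
        return True
    return a != b

def equals(a, b):
    """SQL =, which yields NULL (None) when either operand is NULL."""
    if a is None or b is None:
        return None
    return a == b

print(is_distinct_from(None, False))  # True - a NULL value IS DISTINCT FROM false
print(equals(None, True))             # None - but NULL = true is NULL, not true
```

So for a NULL operand the two forms disagree, which is why the simplification changed query results.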
…a (#12491)

In some edge cases in Orca, groups may not have had statistics derived
during the exploration stage. This is ok in many cases, as these groups
may not be used during implementation or their stats may not be accessed
during implementation. However, there are some cases where we do need
these stats, and trying to access them resulted in a crash.

This commit adds a defensive check to fall back to planner if we
encounter such a scenario, similar to checks added in
a6f92be1b8327aadc7l.

Co-authored-by: Chris Hajas <chajas@vmware.com>
Co-authored-by: Shreedhar Hardikar <shardikar@vmware.com>
Retrieve the correct pstate for GroupingFunc according to varlevelsup. With
asserts enabled, this hits an assertion failure; without asserts, it leads to a
null pointer dereference and PANICs the QD.

This commit fixes github issue: https://github.com/greenplum-db/gpdb/issues/12343
…irected releases.

Add SERVER_SRC_RC_PREFIX to server git repo tarball.

Authored-by: Bhanu Kiran Atturu <batturu@vmware.com>
…pability when CREATE/ALTER resource group. (#12476)

In some scenarios, the AccessExclusiveLock on table pg_resgroupcapability may leave database setup/recovery pending. Below is why we need to change the AccessExclusiveLock to an ExclusiveLock.

The lock on pg_resgroupcapability serializes concurrent updates to the table when running a CREATE/ALTER RESOURCE GROUP statement. There is a CPU limit: after modifying one resource group, we must check that the total CPU usage of all resource groups doesn't exceed 100%.

Before this fix, AccessExclusiveLock was used. Suppose a user runs an ALTER RESOURCE GROUP statement. The QD dispatches the statement to all QEs, so it is a two-phase commit (2PC) transaction. When the QD dispatches the statement, each QE acquires the AccessExclusiveLock on pg_resgroupcapability and cannot release it until the distributed transaction commits.

In the second phase, the QD calls doNotifyingCommitPrepared to broadcast the "commit prepared" command to all QEs; each QE has already finished preparing, so the transaction is a prepared transaction. Suppose at this point a primary segment goes down and a mirror is promoted to primary.

The mirror gets the "promoted" message from the coordinator and recovers from the primary's xlog. To recover the prepared transaction, it reads the prepared-transaction log entry and re-acquires the AccessExclusiveLock on pg_resgroupcapability.

After that, the database instance starts up and all related initialization functions are called. One of them, InitResGroups, acquires AccessShareLock on pg_resgroupcapability to do some initialization work.

The AccessExclusiveLock has not been released, and it is not compatible with any other lock, so the startup process blocks on this lock and the mirror can't become primary.

Even if users run "gprecoverseg" to recover the primary segment, the result is similar: the primary recovers from xlog, recovers the prepared transactions, and acquires the AccessExclusiveLock on pg_resgroupcapability, so the startup process blocks on this lock. Only if users switch the resource type to "queue" is InitResGroups not called; then nothing blocks and the primary segment can start up normally.

After this fix, ExclusiveLock is acquired when altering a resource group. In the above case, the startup process acquires AccessShareLock; ExclusiveLock and AccessShareLock are compatible, so the startup process proceeds. After startup, the QE gets the RECOVERY_COMMIT_PREPARED command from the QD, finishes the second phase of the distributed transaction, and releases the ExclusiveLock on pg_resgroupcapability.

The test case of this commit simulates a repro of this bug.

Backport of 3f58c32
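The compatibility claim can be cross-checked against PostgreSQL's lock conflict table. Below is a hand-encoded sketch of the relevant rows (transcribed here for illustration; verify against the real LockConflicts table):

```python
# Sketch of PostgreSQL's lock conflict table: AccessExclusiveLock conflicts
# with every mode including AccessShareLock, while ExclusiveLock is
# compatible with AccessShareLock (and only AccessShareLock).
CONFLICTS = {
    "AccessShareLock":     {"AccessExclusiveLock"},
    "ExclusiveLock":       {"RowShareLock", "RowExclusiveLock",
                            "ShareUpdateExclusiveLock", "ShareLock",
                            "ShareRowExclusiveLock", "ExclusiveLock",
                            "AccessExclusiveLock"},
    "AccessExclusiveLock": {"AccessShareLock", "RowShareLock",
                            "RowExclusiveLock", "ShareUpdateExclusiveLock",
                            "ShareLock", "ShareRowExclusiveLock",
                            "ExclusiveLock", "AccessExclusiveLock"},
}

def conflicts(held, requested):
    return requested in CONFLICTS[held]

# Before the fix: recovery holds AccessExclusiveLock, startup's
# AccessShareLock request blocks.
print(conflicts("AccessExclusiveLock", "AccessShareLock"))  # True
# After the fix: ExclusiveLock lets InitResGroups' AccessShareLock through.
print(conflicts("ExclusiveLock", "AccessShareLock"))        # False
```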
…ashrc banner present on

Management utilities fail when the user has configured a banner through bashrc.
gprecoverseg and gpinitstandby are addressed as part of this PR. Most of
the changes are related to parsing failures, which are addressed below.

Info: backporting changes from master to 6X with minor additional
changes.
pg_ctl supports two internal cmdline arguments, --wrapper and
--wrapper-args, they are gpdb specific arguments for debugging purpose,
however we lost them since gpdb 5 during the postgres merging. The
arguments are still accepted, just not being used.

Now we bring the function back.

Co-Authored-By: Ning Yu <nyu@pivotal.io>
Co-Authored-By: Adam Lee <adlee@vmware.com>
Commit ee37736 pulled in part of PR #10672.  That uncovered a bug
in gpinitstandby where dbid and numcontents were being set during
pg_ctl startup.  This is no longer necessary; dbid is set through
internal.postgresql.conf.  Let's stop setting dbid and numsegments.
Without this change, the incorrect use of the --wrapper option causes
pg_ctl to fail with an error such as:

   /bin/sh: line 0: exec: 3: not found
Authored-by: Shaoqi Bai <bshaoqi@vmware.com>
02e054c tried to improve gpload merge performance by deleting rows in the staging table that already exist and then inserting new rows.
This change caused a regression when ORCA is on, so we revert it and still use a left outer join to implement merge.
This removes a leftover debug message from commit 22c1ec8. It floods the
log files unnecessarily when using bitmap index scans.

Reviewed-by: Bhuvnesh Chaudhary <bhuvnesh2703@gmail.com>
Reviewed-by: Jimmy Yih <jyih@vmware.com>
(cherry picked from commit 2ce4111)
log_lock_waits was never officially supported and it was missing from
our official documentation. This commit disallows the GUC from being
set. Setting the GUC to on will be a no-op with a WARNING emitted.

The reason why we can't support this GUC boils down to inconsistencies
between ResProcSleep() and its upstream equivalent ProcSleep().
ResProcSleep() clearly did not get the memo about log_lock_waits.
(introduced in e52c4a6, much after resource queues were incepted)

The way the GUC works is simple: When anything other than a hard
deadlock is detected in CheckDeadlock(), the waiting process wakes up
inside ProcSleep(), logs the lock wait, and then goes back to sleep.

Unlike ProcSleep(), if a process is waiting on a resource queue in
ResProcSleep(), after waking up, we don't log the lock wait. That's not
all - the code flow leads to a spurious and possibly empty (no errdetail
capturing deadlock info) deadlock report. This is because
MyProc->waitStatus is still set to STATUS_ERROR after the wakeup in
ResProcSleep().

The spurious deadlock report has further ramifications.

1) Unlike we would in a hard deadlock report, we don't call
ResRemoveFromWaitQueue(). This has adverse consequences:

a) The PGPROC entry for the backend is not removed from the resource
queue lock's wait queue (waitProcs). Now, the same PGPROC entry will be
recycled and used for another backend after this backend exits. So, the
waitProcs link will be dangling and segfaults can result whenever the
waitProcs list is traversed (such as ResProcLockRemoveSelfAndWakeup(),
FindLockCycleRecurse() etc).

b) The portal increment is not removed. This can lead to:
WARNING: duplicate portal id <> for proc <>
and a misleading ERROR which follows immediately after:
ERROR: out of shared memory adding portal increments
for the same process, on a subsequent ResLockAcquire().

2) We don't clean up the locallock, opening up a possibility for the
same memory corruption scenario as the one fixed in f8348f9.

Note: Fixing the misleading out of shared memory message is left for a
later commit.
Historically, GP hasn't supported non-standby recovery (i.e. recovery
with standby_mode=off). However, with the PITR functionality introduced
in 6X, that can no longer hold. This is a follow-up commit which
ensures that a GP cluster can be started in non-standby mode.

A TAP test is added along with basic infrastructure to support TAP tests
under src/test/recovery. 010_logical_decoding_timelines.pl is removed,
since it doesn't pass and 6X does not support logical decoding anyway.
This has been causing segmentation faults while trying to write error
messages to the logs.

Also: fix the logic for detecting errors while calling addr2line or atos
(previously, addr2line_ok was always set to true, even after a failure).
* Docs: updated gp_vmem calculation

* Few mods

* Small typo fix

* small edit

* Same fixes in different file

* Porting edits to another file

* Porting edits to file.

Co-authored-by: David Yozie <dyozie@pivotal.io>
Ashwin Agrawal and others added 13 commits September 24, 2021 18:16
Bitmap index bitmap pages don't use the standard buffer page
structure: they store the hwords and cwords as page content, which
forms the full content of the
page. `XLOG_BITMAP_UPDATEWORD` and `XLOG_BITMAP_UPDATEWORDS` WAL
records incorrectly set `buffer_std` to true when writing WAL. This
caused the backup block created for these pages to be empty, because
it conveyed a hole starting at 24 with length 32728 (which essentially
maps the entire page to be skipped from backup). We should copy the
full 32K page as the backup block for bitmap pages.

Co-authored-by: Jimmy Yih <jyih@vmware.com>
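A sketch of why `buffer_std = true` yields a near-empty backup block for such pages (assuming the standard 24-byte page header and 32K block size; `backup_block_size` is an invented helper, not the WAL code):

```python
# With buffer_std = true, WAL full-page images omit the "hole" between
# pd_lower and pd_upper, assuming a standard page keeps no data there.
BLCKSZ = 32768            # 32K block
SIZEOF_PAGE_HEADER = 24   # standard page header size

def backup_block_size(pd_lower, pd_upper, buffer_std):
    """Bytes actually saved in the WAL backup block."""
    if buffer_std:
        hole = pd_upper - pd_lower   # region assumed to hold no data
        return BLCKSZ - hole
    return BLCKSZ                    # full page image

# A bitmap page keeps its words exactly where a standard page would have
# its hole, so treating it as standard saves almost nothing:
print(backup_block_size(SIZEOF_PAGE_HEADER, BLCKSZ, True))   # 24
print(backup_block_size(SIZEOF_PAGE_HEADER, BLCKSZ, False))  # 32768
```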
The test validates that the meta-page is read from disk during bitmap LOV item
insert replay. There used to be a bug where, if the metapage was not
present in shared buffers, it would not be fetched from disk; instead,
a zeroed-out page was returned from memory. A subsequent flush of
the metapage would lead to an inadvertent permanent overwrite.

Reviewed-by: Soumyadeep Chakraborty <soumyadeep2007@gmail.com>
When the environment does not have a $HOSTNAME database present,
the gpstate command throws a FATAL error into the master logs.

Solution: use the default DB defined in PGDATABASE from
gpstate.
… line

The COPY command reads data from a file and processes it one character at a
time. In non-CSV mode, anything after a backslash is skipped over, so an EOL
character after a backslash cannot be recognized.

An example file with content:
	1, message1
	2, message2\
	3, message3

The COPY FROM command will fail because line 2 and line 3 will be regarded as one line.
This PR also fixes the other EOLs ('\r', '\r\n') to keep the same behavior as 5X.
This PR is cherry-picked from commit e07d849, but fixes a test data
generation issue on 6X.

(cherry picked from commit e07d849)

Co-authored-by: Mingli Zhang <avamingli@gmail.com>
Co-authored-by: zhaorui <zhaoru@vmware.com>
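The line-merging behavior can be sketched with a tiny scanner (an illustration of the described bug, not the actual copy.c code; `split_lines` is an invented helper):

```python
# Bug sketch: in non-CSV COPY, the character after a backslash is consumed
# blindly, so a backslash at end of line swallows the newline and merges
# two input lines. The `fixed` flag models recognizing EOL after a backslash.
def split_lines(data, fixed=False):
    lines, cur, i = [], "", 0
    while i < len(data):
        c = data[i]
        if c == "\\":
            nxt = data[i + 1] if i + 1 < len(data) else ""
            if fixed and nxt == "\n":
                lines.append(cur)        # fixed: EOL still ends the line
                cur = ""
            else:
                cur += c + nxt           # buggy: newline swallowed
            i += 2
            continue
        if c == "\n":
            lines.append(cur)
            cur = ""
        else:
            cur += c
        i += 1
    if cur:
        lines.append(cur)
    return lines

data = "1, message1\n2, message2\\\n3, message3\n"
print(len(split_lines(data)))              # 2 - lines 2 and 3 merged
print(len(split_lines(data, fixed=True)))  # 3 - each line recognized
```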
The script fails on the pipeline when downloading postgres source from
the URL for testing postgres_fdw tests. This is because there are new
updated certificates.
This commit adds a temporary workaround to skip the certificate check
for now.
…ap page"

This reverts commit 3420e77.

We found a regression with incorrect results being returned when
multiple bitmap indexes are used in a plan (reproduced with both ORCA
and planner). See repro below.

This regression was introduced in the reverted commit because of the
following reasons:

1. In the reverted commit, we extended the use of
BMBatchWords->firstTid, but we did not update the firstTid in
_bitmap_union().

2. Furthermore, the logic in _bitmap_catchup_to_next_tid is not correct, in
that it should check the number of remaining words.

We need to revisit the fix.

Repro with ORCA:

CREATE TABLE public.sales (
    txn_id integer,
    qty integer,
    year integer
)
WITH (appendonly='true', compresslevel='5')
 DISTRIBUTED BY (txn_id) PARTITION BY RANGE(year)
          (
          START (2008) END (2009) EVERY (1) WITH (tablename='sales_1_prt_2', appendonly='true', compresslevel='5' ),
          START (2009) END (2010) EVERY (1) WITH (tablename='sales_1_prt_3', appendonly='true', compresslevel='5' ),
          START (2010) END (2011) EVERY (1) WITH (tablename='sales_1_prt_4', appendonly='true', compresslevel='5' ),
          START (2011) END (2012) EVERY (1) WITH (tablename='sales_1_prt_5', appendonly='true', compresslevel='5' ),
          START (2012) END (2013) EVERY (1) WITH (tablename='sales_1_prt_6', appendonly='true', compresslevel='5' ),
          START (2013) END (2014) EVERY (1) WITH (tablename='sales_1_prt_7', appendonly='true', compresslevel='5' ),
          START (2014) END (2015) EVERY (1) WITH (tablename='sales_1_prt_8', appendonly='true', compresslevel='5' ),
          START (2015) END (2016) EVERY (1) WITH (tablename='sales_1_prt_9', appendonly='true', compresslevel='5' ),
          DEFAULT PARTITION outlying_years  WITH (tablename='sales_1_prt_outlying_years', appendonly='true', compresslevel='5' )
          );
CREATE INDEX b1 ON public.sales USING bitmap (txn_id);
CREATE INDEX sales_indx ON public.sales USING bitmap (qty);
insert into sales SELECT generate_series (1,10000000), (random()*2000::int), 2013 FROM generate_series (1,1) AS x(n) ;
insert into sales SELECT generate_series (1,100000), (random()*2000::int), 2012 FROM generate_series (1,1) AS x(n) ;
update sales set txn_id = 1;

and the following query:

set optimizer=on;
set optimizer_enable_tablescan=off;
set optimizer_enable_dynamictablescan=off;
select count(1) from sales where txn_id =1 and qty>1998 and year = 2013;

Co-authored-by: Chris Hajas <chajas@vmware.com>
Co-authored-by: Ashwin Agrawal <aashwin@vmware.com>
Co-authored-by: Jimmy Yih <jyih@vmware.com>
Co-authored-by: Ekta Khanna <ekhanna@vmware.com>
Specifically, this blocks DECLARE ... WITH HOLD and firing of deferred
triggers within index expressions and materialized view queries.  An
attacker having permission to create non-temp objects in at least one
schema could execute arbitrary SQL functions under the identity of the
bootstrap superuser.  One can work around the vulnerability by disabling
autovacuum and not manually running ANALYZE, CLUSTER, REINDEX, CREATE
INDEX, VACUUM FULL, or REFRESH MATERIALIZED VIEW.  (Don't restore from
pg_dump, since it runs some of those commands.)  Plain VACUUM (without
FULL) is safe, and all commands are fine when a trusted user owns the
target object.  Performance may degrade quickly under this workaround,
however.  Back-patch to 9.5 (all supported versions).

Reviewed by Robert Haas.  Reported by Etienne Stalmans.

Security: CVE-2020-25695
(cherry picked from commit ff3de4c)
…r code

This is a GPDB-specific additional change and test for the upstream fix
"In security-restricted operations, block enqueue of at-commit user
code", kept in a separate commit.

This commit adds GPDB-specific logic to the serialization and
deserialization code to dispatch the security context from the QD to the
QEs. Otherwise we face a problem where the QD's backend has security
context flags in static memory while the QEs don't know anything about them.

The commit adds a test to demonstrate CVE-2020-25695 on Greenplum. The idea
of CVE-2020-25695 is to make a superuser fire a deferred trigger with a
security-invoker function that executes malicious code with
superuser privileges. We create a two-step trap for a superuser
running a maintenance operation (ANALYZE) on a "trap" table. First we
create an index with a stable function and replace it with a
volatile one after creation. This index function is invoked by ANALYZE
and acts as the vulnerability firestarter: it inserts data into a side
table ("executor"). The "executor" table has a deferred insert
trigger with a security-invoker function that executes malicious code
as superuser. In the current example, this grants superuser to our
unprivileged user on the segments (not on the coordinator node). Without
the fix, the test demonstrates that the unprivileged user becomes
superuser on the segments.

Co-authored-by: Denis Smirnov <sd@arenadata.io>
If an interactive psql session used \gset when querying a compromised
server, the attacker could execute arbitrary code as the operating
system account running psql.  Using a prefix not found among specially
treated variables, e.g. every lowercase string, precluded the attack.
Fix by issuing a warning and setting no variable for the column in
question.  Users wanting the old behavior can use a prefix and then a
meta-command like "\set HISTSIZE :prefix_HISTSIZE".  Back-patch to 9.5
(all supported versions).

Reviewed by Robert Haas.  Reported by Nick Cleaton.

Security: CVE-2020-25696
(cherry picked from commit 12fd81c)
While we were (mostly) careful about ensuring that the dimensions of
arrays aren't large enough to cause integer overflow, the lower bound
values were generally not checked.  This allows situations where
lower_bound + dimension overflows an integer.  It seems that that's
harmless so far as array reading is concerned, except that array
elements with subscripts notionally exceeding INT_MAX are inaccessible.
However, it confuses various array-assignment logic, resulting in a
potential for memory stomps.

Fix by adding checks that array lower bounds aren't large enough to
cause lower_bound + dimension to overflow.  (Note: this results in
disallowing cases where the last subscript position would be exactly
INT_MAX.  In principle we could probably allow that, but there's a lot
of code that computes lower_bound + dimension and would need adjustment.
It seems doubtful that it's worth the trouble/risk to allow it.)
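The added guard amounts to checking that `lower_bound + dimension` stays within a 32-bit int. A sketch (Python ints don't overflow, so the check can be written in plain arithmetic; `check_array_bounds` is an invented name, not the PostgreSQL function):

```python
# Reject arrays whose last subscript would reach or pass INT_MAX, i.e.
# where lower_bound + dimension would overflow a 32-bit int in C.
INT_MAX = 2**31 - 1

def check_array_bounds(lower_bound, dimension):
    """Model of the added guard against lower_bound + dimension overflow."""
    if lower_bound + dimension > INT_MAX:
        raise ValueError("array lower bound too large")

check_array_bounds(1, 1000)              # fine: last subscript is 1000
try:
    check_array_bounds(INT_MAX - 5, 10)  # last subscript would pass INT_MAX
except ValueError as err:
    print(err)                           # array lower bound too large
```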

Somewhat independently of that, array_set_element() was careless
about possible overflow when checking the subscript of a fixed-length
array, creating a different route to memory stomps.  Fix that too.

Security: CVE-2021-32027
(cherry picked from commit 0c1caa4)
```python
def _get_segment_status(segment):
    cmd = base.Command('pg_isready for segment',
    PGDATABASE = os.getenv('pgdatabase')
    if (PGDATABASE == 'None'):
```
@Stolb27 (Collaborator) commented Oct 11, 2021

It seems that this condition is always false. I have no idea how this test could pass.

PS. Because the regexp in the Then command, which should match "pg_isready -q -h .* -p .*" printed to stdout, is too wide.
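The reviewer's point is easy to confirm: `os.getenv` returns the `None` object, never the string `'None'`, when the variable is unset, so the quoted condition can never be true.

```python
import os

# os.getenv returns the None object (not the string 'None') when the
# variable is unset, so comparing against 'None' is always False.
os.environ.pop("pgdatabase", None)   # make sure the variable is unset
PGDATABASE = os.getenv("pgdatabase")
print(PGDATABASE == "None")          # False - None is not the string 'None'
print(PGDATABASE is None)            # True  - the correct check
```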

maksm90 previously approved these changes Oct 11, 2021
…resent"

We implemented a better solution in #123.
Moreover, this commit contains mistakes and a useless test.

This reverts commit da43f86.
@Stolb27 (Collaborator) commented Oct 11, 2021

  1. I've reverted da43f86 because we have our own implementation in ADBDEV-869 #123. Moreover, this commit contains a mistake and a useless test.
  2. There was a conflict between 19ef688 (a cherry-pick (partial?) from a bugfix branch) and 9f5b3ea.
  3. There were conflicts between "ADB is affected by CVE-2020-25695 (sandbox escape)" #226 implemented by @darthunix and e5050d6 (spaces vs tabs). See 0248396.

Commit e417327 fixed three-valued
logic for the IS DISTINCT FROM FALSE transformation of the LASJ to HJ.

The problem was that this commit created too general a restriction
on the inner and outer sides of the LASJ: they had to be strictly
NOT NULL for any type of LASJ. But this is redundant for the LASJ Not-In
case, as it returns nothing if the inner-outer tuple match status
is NULL.

The reason we have to care about this clarification is that we have
our own implementation of a preprocessing fix for NOT NULL in the LASJ
filter (90afebc). It differs from
Pivotal's implementation (ee9dc5d)
and adds IS DISTINCT FROM FALSE for joins during transformation.
Before this patch, some joins produced inefficient plans due
to e417327.
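Why the NOT NULL restriction is redundant for the Not-In case: under three-valued logic, `x NOT IN (...)` can never be TRUE when the inner list contains a NULL and x has no definite match, so those rows are filtered anyway. A sketch (`not_in` is an invented helper modeling the semantics):

```python
# Model SQL's `o NOT IN (inner)` under three-valued logic: a row survives
# only when the predicate is definitely TRUE.
def not_in(outer_values, inner_values):
    kept = []
    inner_non_null = [i for i in inner_values if i is not None]
    inner_has_null = any(i is None for i in inner_values)
    for o in outer_values:
        if o is None:
            continue          # NULL NOT IN (...) is never TRUE
        if o in inner_non_null:
            continue          # definite match -> predicate is FALSE
        if inner_has_null:
            continue          # match status is NULL -> row filtered anyway
        kept.append(o)
    return kept

print(not_in([1, 2, 3], [2]))        # [1, 3]
print(not_in([1, 2, 3], [2, None]))  # [] - a NULL inner poisons every non-match
```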
@Stolb27 (Collaborator) commented Oct 18, 2021

It seems that the walreceiver test flaked on bbd42d3:

```diff
--- /home/gpadmin/gpdb_src/src/test/walrep/expected/walreceiver.out	2021-10-18 01:11:03.034359576 +0000
+++ /home/gpadmin/gpdb_src/src/test/walrep/results/walreceiver.out	2021-10-18 01:11:03.034359576 +0000
@@ -119,7 +119,7 @@
 SELECT test_receive();
  test_receive 
 --------------
- f
+ t
 (1 row)
 
 SELECT test_disconnect();
```

@Stolb27 Stolb27 merged commit 2d3b2e0 into adb-6.x Oct 18, 2021
@Stolb27 Stolb27 deleted the 6.18.0-sync branch October 18, 2021 08:51
hilltracer pushed a commit that referenced this pull request Mar 6, 2026
* Move PyGreSQL code to submodule

It would be nice to avoid patching this module. Also, this patch fixes
Greengage installation scripts for PyGreSQL to support non-root
release builds over DESTDIR; it was a problem of Greengage, not PyGreSQL.
Additionally, include the new PyGreSQL license in the NOTICE file.
Stolb27 added a commit that referenced this pull request Mar 10, 2026
The following points have been fixed:
1. PyGreSQL 5 has added support for converting additional data types.
Analyzedb: Converting datetime to a string for correct comparison with the value
saved in the file.
el8_migrate_localte.py, gparray.py, gpcatalog.py and gpcheckcat: using the Bool
type instead of comparing with a string.
gpcheckcat, repair_missing_extraneous.py and unique_index_violation_check:
using python list instead of string parsing.
2. PyGreSQL 5 added support for closing a connection when using the with
construct. Because of this, in a number of places, reading from the cursor took
place after the connection was closed.
3. PyGreSQL 5 does not end the transaction if an error occurs, which leads to a
possible connection leak if an error occurs in the connect function. So catch
errors that happen in the connect function.
4. Add closure of the connection saved in context after the scenario in behave
tests.
5. Add closure to the connection if it does not return from the function.
6. Use the python wrapper for the connect function instead of C one.
7. Use a custom cursor to disable row postprocessing to avoid correcting a large
amount of code.
8. Fix the bool and array format in isolation2 tests.
9. Add notifications processing to isolation2 tests.
10. Also fix the notifications processing in the resgroup_query_mem test.
11. Fix the notifications processing in gpload.
12. Fix pg_config search when building deb packages.
13. Fix gpexpand behave tests (#176). The previous commit added a few
    regressions, related to replacing the comparison with 't' by a
    truth check. This change is due to the fact that PyGreSQL 5, unlike
    the 4th version, converts bool values, but it was not taken into
    account that such values can also be set in Python code. The error
    of calling verify in TestDML has also been fixed: the verify method
    was called without passing a connection, and although the verify
    implementation in the class itself does not require a connection,
    this function may be overloaded in a child class.
14. Fix PyGreSQL install to be compatible with both python versions
    (#183) PyGreSQL install works in Python 2 but breaks in Python 3
    because the _pg extension must be importable as a top-level module
    (e.g. from _pg import *). Python 3 resolves extension modules via
    sys.path, so _pg*.so has to be located at the sys.path root, not
    only inside the pygresql/ package directory. Move _pg*.so from
    pygresql directory to the top-level, so the same install layout
    works for both Python versions. Update _pg*.so RPATH to match its
    installed location so dpkg-shlibdeps can resolve libpq.so during
    Debian packaging.
15. Fix Python unit tests after PyGreSQL update (#222)
    - test_it_retries_the_connection: use a mock object that supports
      context management
    - GpArrayTestCase: use bool type instead str 't'/'f'
    - GpCheckCatTestCase: check connection in DbWrapper.
    - DifferentialRecoveryClsTestCase and GpStopSmartModeTestCase: mock
      GgdbCursor to return connection.
    - RepairMissingExtraneousTestCase and
      UniqueIndexViolationCheckTestCase: use python arrays instead of
      string representation of Postgres arrays. Also fix seg ids set in
      get_segment_to_oid_mapping. Since seg ids in issues are now ints,
      we do not need to cast all_seg_ids array elements to strings.
16. Move PyGreSQL code to submodule (#269)
    It would be nice to avoid patching this module. Also this patch
    fixes Greengage installation scripts for PyGreSQL to support
    non-root release builds over DESTDIR; it was a problem of Greengage,
    not PyGreSQL. Additionally, include the new PyGreSQL license in the
    NOTICE file.
17. Fix minirepro and gpsd utility for PyGreSQL-5.2.5 (#291)
    Both utils used an outdated version of the method pgdb.connect().
    The patch changes the way pgdb.connect() is used by avoiding the
    single parameter that later gets parsed; instead both utils now
    pass parameters under their own names.

Co-authored-by: Denis Garsh <d.garsh@arenadata.io>
Co-authored-by: Vasiliy Ivanov <ivi@arenadata.io>