6.27.1 sync #956

Merged
merged 155 commits into adb-6.x from 6.27.1-sync
Jun 3, 2024
Conversation


@red1452 red1452 commented May 27, 2024

Conflicts:

New commit:

Reverted:

New commit:

  • fix panic when concurrent update with before row update trigger. commit fe2aa2f

Reverted:

New commit:

  • Keep partition child lock until end of select commit 4ac7227

Reverted:

New commit:

  • [6X Backport] Fix issue with WAL archiving in standby. commit 5e3e0f7

Reverted:

New commit:

  • Fix Orca to not generate unnecessary slices for plans with Subplans commit 6b7a830

Reverted:

New commit:

  • Add multi-thread compression for gpfdist in gpdb6 (#16914) commit 54a7f13

Reverted:

New commit:

  • Avoid replay dtx info in checkpoint for newly expanded segments (#17408) commit ab5a612

Reverted:

New commit:

  • Reject substituting extension schemas or owners matching ["$']. commit 482d4e8

Reverted:

liuxueyang and others added 30 commits January 23, 2024 13:51
The prove_installcheck recipe in src/Makefile.global.in was emitting
bogus paths for a couple of elements when used with PGXS. Here we create
a separate recipe for the PGXS case that does it correctly. We also take
the opportunity to make the file more readable by breaking up
the prove_installcheck and prove_check recipes across several lines, and
to move the setting of REGRESS_SHLIB to src/test/recovery/Makefile,
which is the only test suite that actually needs it.

Backpatch to all live branches

Discussion: https://postgr.es/m/f2401388-936b-f4ef-a07c-a0bcc49b3300@dunslane.net

Co-authored-by: Andrew Dunstan <andrew@dunslane.net>
ABI problems with the symbol ConfigureNamesInt_gp:

> Size of this global data has been changed from 18880 bytes to 19040 bytes.

The reason is that an additional GUC was added in PR #16957 since 6.26.1.

For now, we can add this symbol to the ABI check-ignore list.

Similar PR: #16992
Backport of c270dae.

Previously, BitmapIndexScan both created and freed the bitmap memory. When
BitmapHeapScan has retrieved all tuples, it frees the bitmap state's memory
(the tbmiterator, but not the bitmap itself), and the next relation's
BitmapHeapScan (of the same dynamic scan) first rescans the child
BitmapIndexScan node, which frees the bitmap memory.

But if QueryFinishPending is true while iterating over the bitmap, the scan
returns NULL without freeing the bitmap state, and the next bitmap heap
scan still performs the rescan and frees the bitmap memory. Finally, when
ExecEndBitmapHeapScan frees the bitmap iterator, it accesses the already
freed bitmap.

There is no need to let BitmapIndexScan free the bitmap memory; instead,
free the bitmap iterator and the bitmap memory together.
…& agg(distinct).

GP optimizes DISTINCT aggregates: when each aggregate is initially grouped,
the sort states that belong to DISTINCT need to be deduplicated, which
reduces the memory footprint of the sorted sort states and improves the
performance of deduplication in the later stage. However, due to a mistake
in the code, when a SQL statement contains multiple aggregates, one of which
needs deduplication but more than one of which needs sorting, all sort
states are marked as needing deduplication. This affects the results of the
non-DISTINCT aggregates, because their input is mistakenly deduplicated, so
this is in fact a general correctness problem.
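
A minimal query shape that exercises this, with hypothetical table and column
names: one aggregate needs deduplication, while a second sorted aggregate must
not be deduplicated:

-- count(DISTINCT a) needs sort + dedup; array_agg(b ORDER BY b) needs sort
-- only, but the bug also deduplicated its input, skewing the result
SELECT c,
       count(DISTINCT a),
       array_agg(b ORDER BY b)
FROM t
GROUP BY c;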

Fix Github Issue #17028

Authored-by: erchuan.ty <erchuan.ty@alibaba-inc.com>
Co-authored-by: Zhenghua Lyu <kainwen@gmail.com>
Backport of f03f27e and a42848c

Current behaviour: gprecoverseg -o outputs a config file containing a row for
each failed segment, but it skips adding a row for a failed segment whose host
is unreachable, since such a segment can't be recovered during the recovery
process.

Requested behavior: gprecoverseg -o should generate a config file that contains
a row for each failed segment from gp_segment_configuration, even if the failed
segment host is unreachable.

To inform the user about failed segment entries with unreachable hosts, such
entries should be commented out in the output config file. gprecoverseg skips
the recovery of a segment if its host is unreachable, so if the user wants to
recover it to another host, they can do so by uncommenting the line and adding
the failover details.

Fix: While creating the list of triplets (which will be considered for
recovery), a triplet is not added to the list if the failed host is
unreachable. This triplet should not be skipped if we just want to generate an
output config file, so the reachability check of the failed segment host is now
combined with a check for the -o option. This way the triplet is added to the
final list even if the failed host is unreachable.

Testing: Added unit tests and a behave test.
This is required for us to locate the Postgres configuration in workload
pipelines and append necessary environment settings, such as the number of
segments and optional GUCs.
Backport of ea553f6.
Note that f86af5e introduces `gpconntype` in `pg_conn`, but this
field was not being freed in `freePGconn`.

Authored-by: dreamedcheng <drec.wu@gmail.com>
…7017)

* fix gpload to create the staging table with the right distribution key order
…#17048)

* Revert "Queries on Distributed Replicated tables hangs when using optimizer"

This reverts commit cdd532f.

This commit is reverted for two reasons:

1. It causes certain queries involving CTEs of replicated tables to fall back.
2. It introduced a flag "allow enforcement" to Non-Singleton spec to prohibit an
optimization context, so the distribution request is abandoned. It was used to
indicate an illegitimate request, but the implementation results in ambiguity.

* Fix query hang / fallback if involving CTE of replicated

**Motivation**

This subject has been revisited on several occasions, as seen in the merged PR
13833, PR 14896, and closed PR 13728. However, sporadically, we still encounter
unexpected fallbacks. The hang in the CTE query is a result of a mismatch
between the number of CTE producers and consumers. This mismatch can occur when
there are either more producers than consumers or more consumers than producers,
leading to unconsumed producers or starved consumers that cause the query to
hang.

Historically, we have addressed query hangs by resorting to intentional
fallbacks. While ORCA falling back on the planner is not an ideal solution, it
is preferable to the query not completing. The intentional fallback was
implemented in PR 13833 by restricting the optimization of potential
alternatives that could result in a producer/consumer number mismatch.

In this PR, we reevaluate several test cases previously examined, aiming to find
a solution that avoids hanging or falling back.

**RCA**

The hang in a CTE query commonly arises when the CTE is of a replicated nature,
either tainted or strict. Such occurrences are less frequent if the CTE is hash
or randomly distributed. In the latter case, both producer and consumer
naturally execute on all segments. However, given a replicated input, producers
and consumers can choose to execute on either all segments and then deduplicate,
or, just on one segment. This flexibility results in potential mismatch between
the numbers of producers and consumers. This PR tackles the problem by sending
appropriate distribution requests on both the producer and consumer. Note that a
query hang is usually not a concern with universal input, despite the presence
of duplicate tuples. Universal producer guarantees universal data availability,
and the consumer execution operates based on necessity. This avoids any mismatch
between producer and consumer numbers.
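
For illustration (hypothetical table and column names), the problematic shape
is a CTE over a replicated table that is consumed more than once; the producer
and its consumers may each choose to run on one segment or on all segments,
and a mismatch between those choices is what used to cause the hang or
fallback:

CREATE TABLE r (a int, b int) DISTRIBUTED REPLICATED;

WITH cte AS (SELECT a, b FROM r)
SELECT c1.a, c2.b
FROM cte c1 JOIN cte c2 ON c1.a = c2.b;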

**Implementation**

Historically, we initiate two sets of distribution requests from the physical
sequence operator to its two children. In the first request, we specify both
children to be `Non-Singleton`, aiming for the producer and consumer to execute
across all segments. In the second request, we request `Any` on the producer,
allowing it to retain its inherent distribution property. We align the
consumer's request accordingly. Specifically, if the producer is universal, the
consumer's locality can be `Any`. If the producer is replicated, we forego the
request, which occasionally results in fallbacks due to a missing plan alternative
that satisfies all the required properties. In all the other scenarios, we
request `Non-Singleton`, mirroring the first request to spread out data
consumption. Note if the producer is a singleton, we always request the consumer
to be a matching singleton, either on the coordinator or one segment.

As mentioned earlier, query hang arises when dealing with replicated CTEs. In
response, we modify the first distribution request to be `Non-Singleton
Non-Replicated`. This request exclusively handles non-replicated input, where
both the producer and consumer naturally execute across all segments. For
replicated input, a hash filter is applied to enforce the producer into a random
distribution, allowing the consumer to still execute across all segments. In the
second request, we maintain the request for the producer to be `Any` to preserve
its original property. If the original property is replicated, we request the
consumer to be replicated as well. This avoids abandoning the optimization
context.

It's important to clarify that the terms "producer" and "consumer" in the above
requests pertain to the producer and consumer **sides** of the physical sequence
operator. The requests don't directly impact the actual producer, i.e., the
shared scan, or the actual consumer, i.e., consumption of the shared scan.
Therefore, the updated requests don't eliminate the possibility of a number
mismatch between the producer and consumer. However, it does significantly
minimize the likelihood. This is because the plan with the minimal cost is
unlikely to incorporate motions that alter the locality of the actual producer
or consumer from what is requested on their respective sides in an asymmetric
manner that would cause number mismatch.

Interestingly, the physical sequence operator receives a third distribution
request originating from the CTE consumer. This bottom-up request stands out as
an exception in ORCA's top-down optimization framework. The methodology is
documented in this [VLDB paper, section
7.1](https://www.vldb.org/pvldb/vol8/p1704-elhelw.pdf), with the goal of
minimizing unnecessary motions, hash redistribution in particular. It was
implemented by PR 11904. What occurs is the casting of the consumer's derived
property (post partial optimization) as a request back onto the producer.

Currently, this request does not specifically address a replicated CTE input.
However, based on the analysis presented earlier, it becomes evident that this
bottom-up request should not override the producer's duplicate sensitivity.
Specifically, if the CTE starts as replicated but is then converted to random by
the hash filter, we should not allow a duplicate-insensitive request to be sent
to the producer, as it may lead to duplicate hazards and query hang. In this PR,
we assess the derived duplicate sensitivity of the producer and consumer,
preventing mismatching requests.

**Tests**

This PR reverts commit b350160, which solved the query hang by intentional fall
back, while retaining the tests introduced by that commit. Now we no longer
encounter query hang or fallback.

The MDP test, NLJ-Rewindability-CTAS.mdp, reveals a plan diff in which the
duplicate sensitive random distribution request is no more sent to the physical
filter on the CTE consumer side. Pushing a duplicate sensitive random motion
below the physical filter is a conservative maneuver used to handle the scenario
where a CTE is inlined by a filter. In this context, one CTE is referenced
within another CTE, and the results of the first CTE contribute to the filtering
criteria in the second CTE.  While the physical filter is unaware of whether it
references an inlined CTE, the above restriction on replicated distribution
eliminates the possibility of costly query hangs. In this case, the cost model
determines it's less expensive to enforce the duplicate sensitive random
property after the CTE consumption instead of before.

(cherry picked from commit 3e37d72)

---------

Co-authored-by: Jingyu Wang <wjingyu@vmware.com>
Introduce a new GUC gp_keep_partition_children_locks that preserves the
relation locks on AO partition leaves that are taken during planning.

This fixes concurrency issues such as #16545, where a SELECT pre-dating
a concurrent lazy vacuum can read past eof of a segfile, since the
vacuum would "drop" the segment. Adequate locking during planning is one
way to ensure that vacuumStatement_Relation() can't acquire the
AccessExclusiveLock on the coordinator. This would cause it to give up
on the drop, and the concurrent SELECT isn't compromised.

Unfortunately, we can't quite turn this GUC on by default, as it increases
the number of locks consumed:

Setting this GUC on means that we keep the acquired locks in
expand_inherited_rtentry() around longer now, up until the end of the
transaction. This can have consequences for transactions dealing with a
large number of partitions, leading us to inch closer to the
max_locks_per_transaction limit. Joins/unions etc with large partition
tables now consume more locks than before.

Notes:
(0) This conforms locking behavior to the upstream behavior that we have
in 7X (for append-optimized relations).

(1) For ORCA, we take the locks during DXL to planned statement
translation, borrowing from expand_inherited_rtentry().

(2) Unlike 7X, neither ORCA nor planner considers the statically pruned
list of partitions for locking.

(3) Currently these locks are taken only for append-optimized leaves,
but the GUC can easily be extended to cover heap tables as well, as
similar concurrency issues do exist.
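
A sketch of the intended usage; the GUC name is the one introduced above, the
table name is hypothetical:

SET gp_keep_partition_children_locks = on;

BEGIN;
-- planning this SELECT now also takes relation locks on the AO leaf partitions
-- and keeps them until end of transaction, so a concurrent lazy VACUUM cannot
-- drop a segfile the scan still needs
SELECT count(*) FROM sales_ao;
COMMIT;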

Co-authored-by: Divyesh Vanjare <vanjared@vmware.com>
Co-authored-by: Soumyadeep Chakraborty <soumyadeep2007@gmail.com>
Co-authored-by: Ashwin Agrawal <aashwin@vmware.com>
Backported from 0ce4c65 with minor
conflicts in test output (DECLARE vs DECLARE CURSOR etc). Original
commit message follows:

This test exposes accounting for locks when we have multiple active
portals in the same session. It also shows that we display the
session_user in the gp_locks_on_resqueue view, even though we use the
current_user to grab the resource queue lock.

Reviewed-by: Xuejing Zhao <zxuejing@vmware.com>
Reviewed-by: Jimmy Yih <jyih@vmware.com>
Reviewed-by: Huansong Fu <fuhuansong@gmail.com>
Backported from 37bb438 with minor conflict in the schedule file and
test output (CREATE ROLE vs CREATE etc).

Reviewed-by: Xuejing Zhao <zxuejing@vmware.com>
Reviewed-by: Jimmy Yih <jyih@vmware.com>
Reviewed-by: Huansong Fu <fuhuansong@gmail.com>
Backported from a78e33d with minor conflict in isolation2 schedule
and test output (CREATE QUEUE vs CREATE etc).

Original commit message follows:

Some of the behavior for ALTER RESOURCE QUEUE with respect to holders
and waiters can be somewhat non-intuitive. So, add a test to document it
and add coverage.
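
For context, a minimal resource queue setup of the kind such a test exercises
(hypothetical queue and role names); the interesting part is what happens to
existing holders and waiters when the limit changes:

CREATE RESOURCE QUEUE test_q WITH (ACTIVE_STATEMENTS=1);
CREATE ROLE rq_user LOGIN RESOURCE QUEUE test_q;
-- ...run statements as rq_user so that one portal holds the slot and another waits...
ALTER RESOURCE QUEUE test_q WITH (ACTIVE_STATEMENTS=3);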

Reviewed-by: Xuejing Zhao <zxuejing@vmware.com>
Reviewed-by: Jimmy Yih <jyih@vmware.com>
Reviewed-by: Huansong Fu <fuhuansong@gmail.com>
Backported from 2365410 with minor conflict in isolation2 schedule
and in test output (DECLARE CURSOR vs DECLARE etc).

It is possible to leak a slot in a resource queue in the face of
concurrent self-deadlock, query cancellation/deadlock and an ALTER
QUEUE operation that bumps the resource queue limit. This can be
demonstrated with a retail build (asserts-build trips various
assertions):

1. Session 1: Has tripped the self-deadlock check while trying to lock
say portal C2, and is currently ERRORing out (it has already been
granted the extra grant that follows a positive self-deadlock check). We
haven't reached cleanup yet.

2. Session 2: Is waiting on the same resource queue as Session 1. We
then induce a query cancel and let's say Session 2 is in the middle of
ERRORing out and is at the very top of ResLockWaitCancel().

3. Session 3: Bumps the queue limit with ALTER such that portal C2 from
Session 1 will now fit into the queue.

4. Session 2: Resumes in ResLockWaitCancel() and encounters Session1/C2
by looking at lock->waitProcs and proc->waitPortalId in:
ResLockWaitCancel()->ResRemoveFromWaitQueue()->ResCleanupLock()->ResProcLockRemoveSelfAndWakeup()
It then does an extra grant and limit update for C2, before completing.

5. Session 1: Now proceeds and cleans up the extra grant for C2 from
Step 1 and errors out.

6. Running gp_resqueue_status shows that there is still 1 active
statement. This is the leak.
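
The leak in step 6 can be observed with a query like the following, assuming
the gp_toolkit.gp_resqueue_status view mentioned above:

-- after all sessions have finished, the active-statement counter should be
-- back to 0; with the leak it stays at 1
SELECT rsqname, rsqcountlimit, rsqcountvalue
FROM gp_toolkit.gp_resqueue_status;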

Observation:
The extra grant and limit update in Step 4 is never cleaned up, which
causes the leak. Note that this extra grant and limit update could also
have been caused by a regular deadlock check (instead of query cancellation).

The main reason why we have the leak is that other backends can find the
self-deadlocking process/portal in LOCK->waitProcs and
PGPROC->waitPortalId.

There is no real need to start waiting before conducting the
self-deadlock check. Thus, we now do the self-deadlock check much
earlier.

Reviewed-by: Ashwin Agrawal <aashwin@vmware.com>
Reviewed-by: Xuejing Zhao <zxuejing@vmware.com>
Reviewed-by: Jimmy Yih <jyih@vmware.com>
Reviewed-by: Huansong Fu <fuhuansong@gmail.com>
Backported from e8bae53 with minor conflicts in regress/init_file
and a LOG vs DEBUG1 conflict in ResLockRelease's "proclock not held"
message.

Reviewed-by: Xuejing Zhao <zxuejing@vmware.com>
Reviewed-by: Jimmy Yih <jyih@vmware.com>
Reviewed-by: Huansong Fu <fuhuansong@gmail.com>
Backported from 8a8a203 with minor conflict in isolation2 schedule
and minor conflicts in test output (CREATE QUEUE vs CREATE etc).

Reviewed-by: Xuejing Zhao <zxuejing@vmware.com>
Reviewed-by: Jimmy Yih <jyih@vmware.com>
Reviewed-by: Huansong Fu <fuhuansong@gmail.com>
Backported from 3d981ee with very minor conflicts in code and in
test output (DECLARE CURSOR vs DECLARE etc).

Original commit message follows:

We fix a leak of a resource queue slot that can be caused by the
following sequence of events:

1. A backend waiting on a resource queue lock receives the TERM signal
and is just about to perform cleanup in:
AbortOutOfAnyTransaction() -> .. -> ResLockWaitCancel()

2. Another backend releases its position in the queue, finds the 1st
backend in the wait queue, evicts it and grants it a slot in the
resource queue.

3. ResLockWaitCancel() resumes for the 1st backend, doesn't find itself
on the wait queue and is a no-op. Thereafter, the backend finishes
termination without cleaning up its increment set, causing the leak.

Note:
Termination is special in the sense that we don't call ResLockRelease()
in the PG_CATCH() in ResLockPortal()/ResLockUtilityPortal(), as we do
for other exceptional happenstances such as cancellation/deadlock. This is
because we can't catch FATAL errors (like the one that arises from
termination). We never get to clean up the external grant in this case.

To solve this issue, we clean up such dangling increment sets/locks during
PortalDrop() and in AtExitCleanup_ResPortals().

PS: The fact that we have to recover in this fashion is due to our
TRY/CATCH cleanup workflow, something that upstream code advises against
in WaitOnLock(). Fixing that is non-trivial and is another matter.

Co-authored-by: Ashwin Agrawal <aashwin@vmware.com>
Reviewed-by: Xuejing Zhao <zxuejing@vmware.com>
Reviewed-by: Jimmy Yih <jyih@vmware.com>
Reviewed-by: Huansong Fu <fuhuansong@gmail.com>
Actually use the return value of ResLockRelease()!

Reviewed-by: Xuejing Zhao <zxuejing@vmware.com>
Reviewed-by: Jimmy Yih <jyih@vmware.com>
Reviewed-by: Huansong Fu <fuhuansong@gmail.com>
(cherry picked from commit 1ad11b8)
Backported from 5bcfe1b with very minor conflicts.

Reviewed-by: Xuejing Zhao <zxuejing@vmware.com>
Reviewed-by: Jimmy Yih <jyih@vmware.com>
Reviewed-by: Huansong Fu <fuhuansong@gmail.com>
Backported from 168bcb0 with very minor conflicts.

Reviewed-by: Xuejing Zhao <zxuejing@vmware.com>
Reviewed-by: Jimmy Yih <jyih@vmware.com>
Reviewed-by: Huansong Fu <fuhuansong@gmail.com>
This is a backport of 12e28aa with the
file conflict resolved, and the is_set output preserved. It is safe to
PANIC here, as ERRORing out here in InitProcess() -> OwnLatch() leads to
an elevated FATAL, and since
PMSignalState->PMChildFlags[slot] = PM_CHILD_ACTIVE at this time, the
postmaster considers this failure to warrant a reset
(ReleasePostmasterChildSlot() returns false and it calls
HandleChildCrash()). We now get the benefit of a core file, which we
didn't get before, as a result of this change.

Original commit message follows:

Build farm animal gharial recently failed a few times in a parallel
worker's call to OwnLatch() with "ERROR:  latch already owned".  Let's
turn that into a PANIC and show the PID of the owner, to try to learn
more.

Discussion: https://postgr.es/m/CA%2BhUKGJ_0RGcr7oUNzcHdn7zHqHSB_wLSd3JyS2YC_DYB%2B-V%3Dg%40mail.gmail.com
Backported from 0709b7e with minor
conflict that arose due to the absence of sparcv8 as a platform.

Original commit message follows:

Previously, they functioned as barriers against CPU reordering but not
compiler reordering, an odd API that required extensive use of volatile
everywhere that spinlocks are used.  That's error-prone and has negative
implications for performance, so change it.

In theory, this makes it safe to remove many of the uses of volatile
that we currently have in our code base, but we may find that there are
some bugs in this effort when we do.  In the long run, though, this
should make for much more maintainable code.

Patch by me.  Review by Andres Freund.
Buildfarm member castoroides is unhappy with this, for entirely
understandable reasons.
Yet another silly mistake in 0709b7e,
again found by buildfarm member castoroides.
Buildfarm member anole caught this one.
Failing to do so can lead to a race condition between an exiting backend
(Y) and another backend (X) undergoing init. If the act of returning the PGPROC
to the freelist for Y gets reordered before the store for procLatch->pid
= 0, then there is a chance that X can grab the same PGPROC object from
the freelist and can still see that procLatch->pid = Y. This can lead to
the PANIC inside OwnLatch(). This can only happen if X's OwnLatch() is
scheduled before Y's store for pid = 0.

Using a memory barrier in DisownLatch() is sufficient to avoid this
issue. Technically, the SpinLockAcquire() and Release() that follow
the DisownLatch() call should act as compiler barriers and should
suffice on x86; we still take this precautionary measure to be doubly
sure. The memory barrier inside OwnLatch() is an extension of that
paranoia. Since OwnLatch() and DisownLatch() aren't in hot code paths,
the added memory barriers should not be a performance sink.

Co-authored-by: Brent Doil <bdoil@vmware.com>
Co-authored-by: Jimmy Yih <jyih@vmware.com>
Co-authored-by: Ashwin Agrawal <aashwin@vmware.com>
Issue: gpexpand --help was showing the wrong range for the -n option:

-n <parallel_processes>
The number of tables to redistribute simultaneously. Valid values
are 1 - 16. Each table redistribution process requires two database
connections: one to alter the table, and another to update the table's
status in the expansion schema. Before increasing -n, check the current
value of the server configuration parameter max_connections and make
sure the maximum connection limit is not exceeded.
Instead of 1 - 16 it should be 1 - 96, as per the code (MAX_PARALLEL_EXPANDS = 96).

Fix: updated the maximum range value in the help file.
We have a new release tag,
[6.26.3](https://github.com/greenplum-db/gpdb/releases/tag/6.26.3), that
doesn't contain commit [4ac7227](
greenplum-db/gpdb@4ac7227).
Add the symbol introduced by that commit to the ignore list for the
6.26.3 release, to avoid ABI check failures for new PRs on 6X.
red1452 and others added 22 commits June 3, 2024 16:51
to solve conflicts with 482d4e8

This reverts commit 10ec9c7.
Unlike TOAST tables, AO aux tables are linked to an extension during its
creation. When a table is dropped from the extension using ALTER, the aux
tables are left behind as extension dependencies. This means we can't DROP the
table afterwards, because its AO aux tables are still considered used by the
extension. In other words, this behavior makes it impossible to DROP an AO
table created by an extension without dropping the extension itself.

The goal of the patch is to change this behavior and allow such a table to be
dropped. The unnecessary dependencies are no longer created in
heap_create_with_catalog(). The behavior is now the same as for TOAST tables,
which are aux tables too; the dependency created in CreateAOAuxiliaryTable()
is enough.

The patch contains a test that describes the target behavior.

This is a backport of dfe49e3. The test was changed because 6X has no
test_ext* extensions; to achieve the same result, a new test extension was
added.
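
As an illustration of the target behavior (hypothetical extension and table
names), dropping the table after detaching it from the extension now succeeds:

-- my_ext is a hypothetical extension whose script creates the AO table my_ext_ao
CREATE EXTENSION my_ext;
ALTER EXTENSION my_ext DROP TABLE my_ext_ao;
DROP TABLE my_ext_ao;  -- previously failed because the AO aux tables still
                       -- depended on the extension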

(cherry picked from commit 10ec9c7)
This commit is a cherry-pick of 6b7e4c3.
In summary, `gpexpand` redistribution previously errored out when a table had
already been expanded manually by the user or had been damaged. This commit
changes that behaviour so that any already-expanded or broken tables are
skipped without affecting the redistribution of the other tables.

The error message was changed due to differences between the 6X and 7X
messages. execSingleton was changed to execSQLForSingleton in mgmt_utils.py to
make it work on 6X. Also, the gpexpand.feature test was brought to a common
standard by adding 'rm -rf ~/gpAdminLogs/gpinitsystem*'. Without this line a
failing test would cause other tests to fail.

Co-authored-by: Dmitry Smal <mialinx@yandex-team.ru>
Co-authored-by: Nihal Jain <jnihal@vmware.com>
(cherry picked from commit cfb97d1)
Currently, a gpfdist transformation can only be applied to a table source at
the external table level, by specifying the #transform tag. Sometimes it is
not convenient to specify the same transformation for every source file. The
goal of the patch is to implement new flags that set a default transformation
name for input and output transformations. The default transformation(s) can
be set at gpfdist start and are applied only if no table-level transformation
is specified.

The new flags are subject only to a basic check. The idea is to keep the
existing behavior and check for the transformation's existence only at
runtime. The test shows the behavior described in the patch goal.
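
For reference, this is the existing table-level form that the new gpfdist-level
default complements (hypothetical host, file, and transformation names); the
default applies when the #transform tag is omitted from the LOCATION URL:

CREATE EXTERNAL TABLE ext_orders (id int, amount numeric)
LOCATION ('gpfdist://etlhost:8081/orders.xml#transform=orders_in')
FORMAT 'TEXT' (DELIMITER '|');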

(cherry picked from commit 2f6e3ef)
We assume that the plan before the hazard-motion transformation was validated
(such a check exists) and that CTE producers/consumers are consistent. The
following cases may appear after the hazard-motion transformation:
1. all slices that contain shared scans were transformed
2. the slices that contain CTE producers and some CTE consumers were
   transformed (the plan may contain other slices with consumers of these
   producers)
3. a slice that contains only producer(s) was transformed
4. a slice that contains only consumer(s) was transformed
5. a slice without CTE producers/consumers was transformed

This patch is a follow-up to #302. Detection of unpaired CTEs is now performed
on the current slice (current motion) instead of recursing into its child
slices (motions). The validation of unpaired CTEs was also changed: the
previous approach collected all CTE consumers into an array and all CTE
producers into a set, and at the end each element of the consumer array was
probed against the producer hash set (which may lead to false negatives when
the slice contains two different producers but only a single consumer). Now
the sets of CTE consumers and producers are compared directly.

The current approach correctly validates cases 3, 4 and 5, while case 1 may
give a false-positive result. Case 2 may be dangerous: it is possible that the
hazard motion (slice) contains a CTE producer and its consumer, while the plan
contains another, unmodified motion (slice) with a consumer of the producer
from the modified slice. This approach won't recognize that there are unpaired
CTE producers and consumers, because it checks only the consistency of
producers/consumers within the hazard motion (one slice). On the other hand,
this situation can only arise if a replicated distribution is requested for
the Sequence node, and due to this patch
(https://github.com/greenplum-db/gpdb/pull/15124) such a situation shouldn't
appear.

Follow-up for #302 (51fe92e)

Cherry-picked-from: a5b5c04
to reapply above 9e03478

(cherry picked from commit 5a10d0a)
(cherry picked from commit 54ee058)
Greenplum Python scripts try to parse a system command's STDOUT and STDERR
in English and may fail if the locale differs from en_US. This patch adds
tests that cover this case.

Also, this patch reinstalls glibc-common in the Docker container. This is
necessary to get langpacks in Docker, because the Docker images don't contain
them.

Cherry-picked-from: 6298e77
(cherry picked from commit fc2aade)
(cherry picked from commit a3a94c5)
…es_now_fast external table (#636)

gpperfmon writes a string to queries_now.dat in the expectation that it will
be processed by gpmon_catqrynow.py, but the queries_now_fast external table
reads from the file directly. This leads to an error, because queries_now.dat
contains the query_hash column, which is not declared in the queries_now_fast
table. queries_now.dat also contains empty columns at the end that are
supposed to be filled in by the Python script; these columns are not declared
in the table either.

This patch fixes the issue by removing the extra columns from queries_now.dat
and adding them directly in the Python script and in the write_qlog_full
function (which writes into queries_tail.dat).

Co-authored-by: Andrey Sokolov <a.sokolov@arenadata.io>
(cherry picked from commit 742bf6f)
A missing DROP echo was added to the end of the
src/test/isolation2/expected/concurrent_select_vacuum_ao_partitioned_table.out
src/test/isolation2/expected/concurrent_select_vacuum_ao_partitioned_table_optimizer.out
files.

Initially these changes were incorporated in a897532
…#295)

Commit fa287d0 introduced early lock release on partitions within operations
on a partitioned parent table, presumably to save memory in the lock manager.
But this behavior causes issues, e.g., a possible segfault when we drop a
partitioned table while a concurrent `pg_total_relation_size()` call runs on
some AO partition.

Obviously, we have to restore full locking of the inheritance hierarchy, as is
done in vanilla PostgreSQL. To avoid exhausting lock memory we need to
increase the max_locks_per_transaction setting. Memory is cheap, and if we're
working with tens of thousands of partitions, we can afford to reserve some
memory for the lock manager.

For DML operations the problem of partition locking was partially resolved in
commit 99fbc8e. Therefore, the current work completes that fix to cover
partition locking for DDL operations and other cases. It is in fact a backport
of the similar commit to master, 86ded36, with small additions. A test
scenario has also been added to reproduce the segfault noted above, along with
a related assert check inside the `calculate_table_size()` function.

Backport of PR https://github.com/greenplum-db/gpdb/pull/10468.
Fixes https://github.com/greenplum-db/gpdb/issues/12962

-------

Changes from 4ac7227 in src/backend/optimizer/prep/prepunion.c were overwritten
with changes from this commit.

(cherry-picked from commit 2763ff1)

(cherry picked from commit 74fb2b3)
In a897532 the new GUC gp_keep_partition_children_locks was introduced. This
setting makes ORCA take locks on child AO partitions during planning of a
SELECT statement. This behaviour protects the concurrent execution of a SELECT
on the root partition from other commands, like VACUUM, modifying a leaf AO
partition. Without locks at the QD, the SELECT could dispatch an invalid
snapshot for accessing AO table metadata, because the VACUUM could already
have committed and broken access to the leaf partition for the SELECT on some
segment. The GUC is disabled by default on the grounds that taking locks on
all the partitions (the mentioned commit takes locks on the whole partition
hierarchy in ORCA) can exhaust the available locks and waste lock manager
memory. The commit above changes the planner's behaviour in the same way, but
during its backport the planner's previous behaviour was left unchanged
(74fb2b3): the planner takes locks unconditionally on all partitions of any
storage type.

The described commit mainly addresses the issue with AO tables and concurrent
VACUUM. However, similar issues exist for heap tables. Consider the scenario
where a SELECT from the root partition and a TRUNCATE on a leaf partition run
concurrently. Without locks on the leaf partitions at the QD, the SELECT could
return data of the leaf partition from only one segment, because on the other
segments the TRUNCATE had managed to truncate the relation files before the
SELECT started to scan them, while on the first segment the TRUNCATE ran after
the SELECT, breaking the atomicity of the SELECT.

Therefore, this patch addresses the concurrency issue for SELECT in a more
generalized way. First, the GUC is now enabled by default: while VACUUM merely
causes the SELECT to throw an error rather than return incorrect results,
TRUNCATE can break the SELECT's atomicity and return unexpected data, which
could lead to inconsistent behaviour. Second, the locks are acquired on both
AO and heap leaf partitions, protecting the SELECT execution from any
destructive operations. Additionally, in order to reduce the number of locks,
this patch makes ORCA take locks only on statically selected partitions when
possible; in other cases ORCA behaves like the planner, which takes locks on
all partitions. The lock mode is determined in EvalLockModeForChildRelations()
and expand_inherited_rtentry() respectively.
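
With the GUC now on by default, the leaf-partition locks taken during planning
can be observed on the QD for the duration of the transaction (hypothetical
partitioned table name):

BEGIN;
SELECT count(*) FROM sales;  -- SELECT from the root partition
-- the leaf partitions now appear as locked relations until COMMIT
SELECT relation::regclass, mode, granted
FROM pg_locks
WHERE relation IS NOT NULL;
COMMIT;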

(cherry picked from commit c336098)
…es to ORCA (#746)

Problem description: ORCA fails to handle queries whose target list contains
columns not listed in the GROUP BY clause when grouping is done by a primary
key (meaning that these columns are functionally dependent on the grouping
columns). In such a case, it falls back to the standard planner. Expected
behavior: ORCA generates a plan for a query with such columns in the target
list if they are functionally dependent on the GROUP BY columns.

Root cause: During query normalization, CQueryMutators::ShouldFallback finds a
target list entry that is not a grouping column and triggers fallback to the
standard planner.

Fix: All target list columns that are functionally dependent on the grouping
columns are added explicitly to the GROUP BY clause at the query normalization
stage (at the start of group-by normalization, before checking whether
fallback is required), prior to the translation to DXL. This requires the
following steps:
1) Extract such columns from target list expressions, and if they do not have a
relevant target list entry, add resjunk target list entries for them.
2) Store current unique target list entry references in groupClause - they will
be required at step 4.
3) Update all grouping sets that contain the primary key - add the functionally
dependent columns from the target list.
4) Update arguments of GROUPING functions (based on target list entry references
stored on step 2 and updated target list entry references), because they could
be shifted after step 3.

Plus, new test cases were added to the functional_deps test.
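
An example of the query shape now handled by ORCA (hypothetical table; `name`
is functionally dependent on the primary key `id`):

CREATE TABLE customers (
    id   int PRIMARY KEY,
    name text,
    val  numeric
) DISTRIBUTED BY (id);

-- name is not listed in GROUP BY but depends on the primary key id;
-- previously ORCA fell back to the standard planner for this query
SELECT id, name, sum(val)
FROM customers
GROUP BY id;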

-------

An #include directive was moved in src/include/gpopt/gpdbwrappers.h to resolve
conflicts.

(cherry-picked from commit b1373f8)

(cherry picked from commit 89430f7)
Otherwise the detection can spuriously report the symbol as available,
because the compiler may just emit a reference to a non-existent symbol.

(cherry picked from commit 85abb5b)
On some systems the results of 64 bit __builtin_mul_overflow()
operations can be computed at compile time, but not at runtime. The
known cases are arm buildfarm animals using clang where the runtime
operation is implemented in an unavailable function.

Try to avoid compile-time computation by using volatile arguments to
__builtin_mul_overflow(). In that case we hopefully will get a link
error when unavailable, similar to what buildfarm animals dangomushi
and gull are reporting.

Author: Andres Freund
Discussion: https://postgr.es/m/20171213213754.pydkyjs6bt2hvsdb@alap3.anarazel.de
(cherry picked from commit c04d35f)
Commit c04d35f didn't quite do the job here, because it still allowed
the compiler to deduce that the function call could be optimized away.
Prevent that by putting the arguments and results in global variables.

Discussion: https://postgr.es/m/20171213213754.pydkyjs6bt2hvsdb@alap3.anarazel.de
(cherry picked from commit c6d21d5)
Commit 0cb79ba added support for including the datadir in the segment
configuration dump, and the gpsegconfig_dump file used in the unit tests was
changed accordingly.

This patch adds the datadir path to each segment configuration in the
gpsegconfig_dump file.
Commit 708502f enables the GUC
optimizer_trace_fallback in the bfv_cte test. This leads to an error message
being printed by ORCA for the query:
create table rep1 (id bigserial not null, isc varchar(15) not null,iscd
varchar(15) null) distributed replicated;
create table rep2 (id numeric null, rc varchar(255) null,ri numeric null)
distributed replicated;
insert into rep1 (isc,iscd) values ('cmn_bin_yes', 'cmn_bin_yes');

The error message looks like:
INFO:  GPORCA failed to produce a plan, falling back to planner
DETAIL:  Feature not supported: Volatile functions with replicated relations

ORCA never produces a plan for this query because of e749b2d,
but the error message is printed only if the GUC optimizer_trace_fallback is
enabled.

This patch adds the missing error message.
In 'always' mode, the standby independently archives all files it receives
from the primary.

Original patch by Fujii Masao, docs and review by me.

(cherry picked from commit fa759d1)

Resolved conflicts:
In 5e3e0f7 the close logic was moved to a new
function, `XLogWalRcvClose()`. As upstream 6X has no archive_mode `always`, the
branch for `XLogArchiveMode != ARCHIVE_MODE_ALWAYS` was removed by them during
that back-port. It was restored later in 4f2db19,
but in a form adapted to that commit's needs. For our needs,
`XLogArchiveNotify()` should be called for `ARCHIVE_MODE_ALWAYS` too, so the
condition was slightly reworked while back-porting the current patch. The same
was done for the changes in `WalReceiverMain()`.
A restart point or a checkpoint recycling WAL segments treats segments
marked with neither ".done" (archiving is done) nor ".ready" (segment is
ready to be archived) in archive_status the same way for archive_mode
being "on" or "always".  While for a primary this is fine, a standby
running a restart point with archive_mode = on would try to mark such a
segment as ready for archiving, which is something that will never
happen except after the standby is promoted.

Note that this problem applies only to WAL segments coming from the
local pg_wal the first time archive recovery is run.  Segments part of a
self-contained base backup are the most common case where this could
happen, however even in this case normally the .done markers would be
most likely part of the backup.  Segments recovered from an archive are
marked as .ready or .done by the startup process, and segments finished
streaming are marked as such by the WAL receiver, so they are handled
already.

Reported-by: Haruka Takatsuka
Author: Michael Paquier
Discussion: https://postgr.es/m/15402-a453c90ed4cf88b2@postgresql.org
Backpatch-through: 9.5, where archive_mode = always has been added.

(cherry picked from commit 62ab24a)
…very.

After the archiver dies, the postmaster tries to start a new one immediately.
But previously this could happen only while the server was running normally,
even though archiving was enabled always (i.e., archive_mode was set to
always). So an archiver running during recovery could not restart soon
after it died. This is an oversight in commit ffd3774.

This commit changes reaper(), the postmaster's signal handler that cleans up
after a child process dies, so that it tries to start a new archiver even
during recovery if necessary.

Patch by me. Review by Alvaro Herrera.

(cherry picked from commit 71a4062)
@BenderArenadata

Allure report https://allure.adsw.io/launch/72189

@BenderArenadata

Failed job Regression tests with ORCA on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1471305

@Stolb27 Stolb27 enabled auto-merge June 3, 2024 18:12
@Stolb27 Stolb27 merged commit baa5f69 into adb-6.x Jun 3, 2024
5 checks passed
@Stolb27 Stolb27 deleted the 6.27.1-sync branch June 3, 2024 20:20