forked from greenplum-db/gpdb-archive
6.27.1 sync #956
Merged
Conversation
The prove_installcheck recipe in src/Makefile.global.in was emitting bogus paths for a couple of elements when used with PGXS. Here we create a separate recipe for the PGXS case that does it correctly. We also take the opportunity to make the file more readable by breaking up the prove_installcheck and prove_check recipes across several lines, and to move the setting of REGRESS_SHLIB to src/test/recovery/Makefile, which is the only set of tests that actually needs it. Backpatch to all live branches. Discussion: https://postgr.es/m/f2401388-936b-f4ef-a07c-a0bcc49b3300@dunslane.net Co-authored-by: Andrew Dunstan <andrew@dunslane.net>
ABI problems with the symbol ConfigureNamesInt_gp: > Size of this global data has been changed from 18880 bytes to 19040 bytes. The reason is that an additional GUC was added in PR #16957 since 6.26.1. For now, we can ignore this symbol in the ABI check-ignore list. Similar PR: #16992
Backport of c270dae. Previously, BitmapIndexScan created and freed the bitmap memory. When BitmapHeapScan retrieved all tuples, it would free the bitmapstate's memory (the tbmiterator, not the bitmap itself), and the next relation's BitmapHeapScan (of the same dynamic scan) would first rescan the child BitmapIndexScan node, which would free the bitmap memory. But if QueryFinishPending is true during iteration over the bitmap, the scan returns NULL without freeing the bitmapstate, and the next bitmap heap scan still needs to do the rescan and free the bitmap memory. Finally, when ExecEndBitmapHeapScan frees the bitmap iterator, it accesses an already-freed bitmap. There is no need for BitmapIndexScan to free the bitmap memory; instead, free the bitmap iterator and the bitmap memory together.
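The use-after-free lifecycle above can be modeled with a small, purely illustrative sketch; all names here are invented stand-ins, and the real fix lives in GPDB's C executor nodes:

```python
# Toy model of the double free: the child index scan's rescan frees the
# bitmap early, then end-of-scan cleanup touches the freed memory.

class Bitmap:
    """Stand-in for the shared bitmap memory."""
    def __init__(self):
        self.freed = False

    def free(self):
        # Touching already-freed memory is the crash described above.
        if self.freed:
            raise RuntimeError("access to freed bitmap")
        self.freed = True

def end_bitmap_heap_scan(bitmap):
    # Models ExecEndBitmapHeapScan freeing the iterator, which
    # dereferences the bitmap.
    bitmap.free()

# Buggy flow: rescan of the BitmapIndexScan frees the bitmap first.
bm = Bitmap()
bm.free()                      # early free during rescan
try:
    end_bitmap_heap_scan(bm)   # later cleanup hits freed memory
    crashed = False
except RuntimeError:
    crashed = True

# Fixed flow: a single owner frees iterator and bitmap together, once.
bm2 = Bitmap()
end_bitmap_heap_scan(bm2)
```

The fix is an ownership change: only one node is responsible for the final free, so no path can free the bitmap twice.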
…& agg(distinct). GP optimizes DISTINCT aggregates: when each aggregate is initially grouped, the sortstates belonging to DISTINCT aggregates need to be deduplicated, which reduces the memory footprint of the sorted sortstates and improves deduplication performance in the later stage. However, due to a mistake in the code, when a query contains multiple aggregates, one of which needs deduplication but more than one of which needs sorting, all sortstates were marked as needing deduplication. This affects the results of the non-DISTINCT aggregates, because rows are mistakenly removed from them, so from this point of view it is actually a general correctness problem. Fix Github Issue #17028 Authored-by: erchuan.ty <erchuan.ty@alibaba-inc.com> Co-authored-by: Zhenghua Lyu <kainwen@gmail.com>
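The flag mix-up can be sketched as follows; this is a hypothetical illustration with invented names, not the actual executor code:

```python
# Sketch of the bug: marking sortstates for deduplication.

def mark_dedup_buggy(aggs):
    # Bug: if ANY aggregate is DISTINCT, every sorted aggregate's
    # sortstate gets flagged for deduplication.
    any_distinct = any(a["distinct"] for a in aggs)
    for a in aggs:
        if a["sorted"]:
            a["dedup"] = any_distinct

def mark_dedup_fixed(aggs):
    # Fix: only the DISTINCT aggregate's own sortstate is deduplicated.
    for a in aggs:
        if a["sorted"]:
            a["dedup"] = a["distinct"]

aggs = [
    {"name": "count_distinct_x", "distinct": True,  "sorted": True, "dedup": False},
    {"name": "array_agg_y",      "distinct": False, "sorted": True, "dedup": False},
]
mark_dedup_buggy(aggs)
buggy = [a["dedup"] for a in aggs]   # array_agg_y wrongly deduplicated
mark_dedup_fixed(aggs)
fixed = [a["dedup"] for a in aggs]   # only the DISTINCT agg deduplicated
```

In the buggy version, deduplicating `array_agg_y`'s sortstate silently drops duplicate input rows from a non-DISTINCT aggregate, which is exactly the wrong-results symptom described above.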
Backport of f03f27e and a42848c. Current behaviour: gprecoverseg -o outputs a config file containing a row for each failed segment, but skips adding a row for a segment whose failed host is unreachable, since it can't be recovered during the recovery process. Requested behavior: gprecoverseg -o should generate a config file that contains a row for each failed segment from gp_segment_configuration, even if the failed segment host is unreachable. To inform the user about failed segment entries with unreachable hosts, those entries should be commented out in the output config file. gprecoverseg skips the recovery of a segment if the host is unreachable, so if the user wants to recover it to another host, they can do so by uncommenting the line and adding the failover details. Fix: While creating the list of triplets (which will be considered for recovery), a triplet is not added to the list if the failed host is unreachable. This triplet should not be skipped if we just want to generate an output config file, so a check for the -o option was added alongside the reachability check of the failed segment host. This way the triplet is added to the final list even if the failed host is unreachable. Testing: Added unit tests and a behave test.
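The requested behavior can be sketched roughly as below; the config-row format is simplified for illustration and is not gprecoverseg's exact format:

```python
# Illustrative sketch: emit one config row per failed segment, commenting
# out rows whose failed host is unreachable so the user can edit them.

def build_recovery_config(failed_segments, unreachable_hosts):
    lines = []
    for seg in failed_segments:
        row = "{host}|{port}|{datadir}".format(**seg)
        if seg["host"] in unreachable_hosts:
            # Keep the entry but commented; the user can uncomment it
            # and append failover details to recover to another host.
            lines.append("# " + row)
        else:
            lines.append(row)
    return "\n".join(lines)

cfg = build_recovery_config(
    [{"host": "sdw1", "port": 40000, "datadir": "/data/primary/gpseg0"},
     {"host": "sdw2", "port": 40001, "datadir": "/data/primary/gpseg1"}],
    unreachable_hosts={"sdw2"},
)
print(cfg)
```

The point of the fix is only the branch taken when generating an output file: the unreachable segment is no longer dropped, it is emitted as a comment.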
This is required for us to locate the Postgres configuration in workload pipelines and append necessary environment settings, such as the number of segments and optional GUCs.
…7017) * fix gpload to create the staging table with the right distribution key order
…#17048) * Revert "Queries on Distributed Replicated tables hangs when using optimizer" This reverts commit cdd532f. This commit is reverted for two reasons: 1. It causes certain queries involving CTE of replicated table to fall back. 2. It introduced a flag "allow enforcement" to Non-Singleton spec to prohibit an optimization context, so the distribution request is abandoned. It was used to indicate an illegitimate request, but the implementation results in ambiguity. * Fix query hang / fallback if involving CTE of replicated **Motivation** This subject has been revisited on several occasions, as seen in the merged PR 13833, PR 14896, and closed PR 13728. However, sporadically, we still encounter unexpected fallbacks. The hang in the CTE query is a result of a mismatch between the number of CTE producers and consumers. This mismatch can occur when there are either more producers than consumers or more consumers than producers, leading to unconsumed producers or starved consumers that cause the query to hang. Historically, we have addressed query hangs by resorting to intentional fallbacks. While ORCA falling back on the planner is not an ideal solution, it is preferable to the query not completing. The intentional fallback was implemented in PR 13833 by restricting the optimization of potential alternatives that could result in a producer/consumer number mismatch. In this PR, we reevaluate several test cases previously examined, aiming to find a solution that avoids hanging or falling back. **RCA** The hang in a CTE query commonly arises when the CTE is of a replicated nature, either tainted or strict. Such occurrences are less frequent if the CTE is hash or randomly distributed. In the latter case, both producer and consumer naturally execute on all segments. However, given a replicated input, producers and consumers can choose to execute on either all segments and then deduplicate, or, just on one segment. 
This flexibility results in potential mismatch between the numbers of producers and consumers. This PR tackles the problem by sending appropriate distribution requests on both the producer and consumer. Note that a query hang is usually not a concern with universal input, despite the presence of duplicate tuples. Universal producer guarantees universal data availability, and the consumer execution operates based on necessity. This avoids any mismatch between producer and consumer numbers. **Implementation** Historically, we initiate two sets of distribution requests from the physical sequence operator to its two children. In the first request, we specify both children to be `Non-Singleton`, aiming for the producer and consumer to execute across all segments. In the second request, we request `Any` on the producer, allowing it to retain its inherent distribution property. We align the consumer's request accordingly. Specifically, if the producer is universal, the consumer's locality can be `Any`. If the producer is replicated, we forego the request, which occasionally results in fallbacks due to missing plan alternative that satisfies all the required properties. In all the other scenarios, we request `Non-Singleton`, mirroring the first request to spread out data consumption. Note if the producer is a singleton, we always request the consumer to be a matching singleton, either on the coordinator or one segment. As mentioned earlier, query hang arises when dealing with replicated CTEs. In response, we modify the first distribution request to be `Non-Singleton Non-Replicated`. This request exclusively handles non-replicated input, where both the producer and consumer naturally execute across all segments. For replicated input, a hash filter is applied to enforce the producer into a random distribution, allowing the consumer to still execute across all segments. 
In the second request, we maintain the request for the producer to be `Any` to preserve its original property. If the original property is replicated, we request the consumer to be replicated as well. This avoids abandoning the optimization context. It's important to clarify that the terms "producer" and "consumer" in the above requests pertain to the producer and consumer **sides** of the physical sequence operator. The requests don't directly impact the actual producer, i.e., the shared scan, or the actual consumer, i.e., consumption of the shared scan. Therefore, the updated requests don't eliminate the possibility of a number mismatch between the producer and consumer. However, it does significantly minimize the likelihood. This is because the plan with the minimal cost is unlikely to incorporate motions that alter the locality of the actual producer or consumer from what is requested on their respective sides in an asymmetric manner that would cause number mismatch. Interestingly, the physical sequence operator receives a third distribution request originating from the CTE consumer. This bottom-up request stands out as an exception in ORCA's top-down optimization framework. The methodology is documented in this [VLDB paper, section 7.1](https://www.vldb.org/pvldb/vol8/p1704-elhelw.pdf), with the goal of minimizing unnecessary motions, hash redistribution in particular. It was implemented by PR 11904. What occurs is the casting of the consumer's derived property (post partial optimization) as a request back onto the producer. Currently, this request does not specifically address a replicated CTE input. However, based on the analysis presented earlier, it becomes evident that this bottom-up request should not override the producer's duplicate sensitivity. 
Specifically, if the CTE starts as replicated but is then converted to random by the hash filter, we should not allow a duplicate-insensitive request to be sent to the producer, as it may lead to duplicate hazards and query hang. In this PR, we assess the derived duplicate sensitivity of the producer and consumer, preventing mismatching requests. **Tests** This PR reverts commit b350160, which solved the query hang by an intentional fallback, while retaining the tests introduced by that commit. Now we no longer encounter query hang or fallback. The MDP test, NLJ-Rewindability-CTAS.mdp, reveals a plan diff in which the duplicate sensitive random distribution request is no longer sent to the physical filter on the CTE consumer side. Pushing a duplicate sensitive random motion below the physical filter is a conservative maneuver used to handle the scenario where a CTE is inlined by a filter. In this context, one CTE is referenced within another CTE, and the results of the first CTE contribute to the filtering criteria in the second CTE. While the physical filter is unaware of whether it references an inlined CTE, the above restriction on replicated distribution eliminates the possibility of costly query hangs. In this case, the cost model determines it's less expensive to enforce the duplicate sensitive random property after the CTE consumption instead of before. (cherry picked from commit 3e37d72) --------- Co-authored-by: Jingyu Wang <wjingyu@vmware.com>
Introduce a new GUC gp_keep_partition_children_locks that preserves relation locks on AO partition leaves, that are taken during planning. This fixes concurrency issues such as #16545, where a SELECT pre-dating a concurrent lazy vacuum can read past eof of a segfile, since the vacuum would "drop" the segment. Adequate locking during planning is one way to ensure that vacuumStatement_Relation() can't acquire the AccessExclusiveLock on the coordinator. This would cause it to give up on the drop, and the concurrent SELECT isn't compromised. Unfortunately, we can't quite mark this GUC on by default, as it has consequences on the amount of locks consumed: Setting this GUC on means that we keep the acquired locks in expand_inherited_rtentry() around longer now, up until the end of the transaction. This can have consequences for transactions dealing with a large number of partitions, leading us to inch closer to the max_locks_per_transaction limit. Joins/unions etc with large partition tables now consume more locks than before. Notes: (0) This conforms locking behavior to upstream behavior, that we have in 7X (for append-optimized relations). (1) For ORCA, we take the locks during DXL to planned statement translation, borrowing from expand_inherited_rtentry(). (2) Unlike 7X, neither ORCA nor planner considers the statically pruned list of partitions for locking. (3) Currently these locks are taken only for append-optimized leaves, but the GUC can easily be extended to cover heap tables as well, as similar concurrency issues do exist. Co-authored-by: Divyesh Vanjare <vanjared@vmware.com> Co-authored-by: Soumyadeep Chakraborty <soumyadeep2007@gmail.com> Co-authored-by: Ashwin Agrawal <aashwin@vmware.com>
Backported from 0ce4c65 with minor conflicts in test output (DECLARE vs DECLARE CURSOR etc). Original commit message follows: This test exposes accounting for locks when we have multiple active portals in the same session. It also shows that we display the session_user in the gp_locks_on_resqueue view, even though we use the current_user to grab the resource queue lock. Reviewed-by: Xuejing Zhao <zxuejing@vmware.com> Reviewed-by: Jimmy Yih <jyih@vmware.com> Reviewed-by: Huansong Fu <fuhuansong@gmail.com>
Backported from 37bb438 with minor conflict in the schedule file and test output (CREATE ROLE vs CREATE etc). Reviewed-by: Xuejing Zhao <zxuejing@vmware.com> Reviewed-by: Jimmy Yih <jyih@vmware.com> Reviewed-by: Huansong Fu <fuhuansong@gmail.com>
Backported from a78e33d with minor conflict in isolation2 schedule and test output (CREATE QUEUE vs CREATE etc). Original commit message follows: Some of the behavior for ALTER RESOURCE QUEUE with respect to holders and waiters can be somewhat non-intuitive. So, add a test to document it and add coverage. Reviewed-by: Xuejing Zhao <zxuejing@vmware.com> Reviewed-by: Jimmy Yih <jyih@vmware.com> Reviewed-by: Huansong Fu <fuhuansong@gmail.com>
Backported from 2365410 with minor conflict in isolation2 schedule and in test output (DECLARE CURSOR vs DECLARE etc). It is possible to leak a slot in a resource queue in the face of concurrent self-deadlock, query cancellation/deadlock and an ALTER QUEUE operation that bumps the resource queue limit. This can be demonstrated with a retail build (asserts-build trips various assertions): 1. Session 1: Has tripped the self-deadlock check while trying to lock say portal C2, and is currently ERRORing out (it has already been granted the extra grant that follows a positive self-deadlock check). We haven't reached cleanup yet. 2. Session 2: Is waiting on the same resource queue as Session 1. We then induce a query cancel and let's say Session 2 is in the middle of ERRORing out and is at the very top of ResLockWaitCancel(). 3. Session 3: Bumps the queue limit with ALTER such that portal C2 from Session 1 will now fit into the queue. 4. Session 2: Resumes in ResLockWaitCancel() and encounters Session1/C2 by looking at lock->waitProcs and proc->waitPortalId in: ResLockWaitCancel()->ResRemoveFromWaitQueue()->ResCleanupLock()->ResProcLockRemoveSelfAndWakeup() It then does an extra grant and limit update for C2, before completing. 5. Session 1: Now proceeds and cleans up the extra grant for C2 from Step 1 and errors out. 6. Running gp_resqueue_status shows that there is still 1 active statement. This is the leak. Observation: The extra grant and limit update in Step 4 is never cleaned up, which causes the leak. Note that this extra grant and limit update could also have been caused by a regular deadlock check (instead of query cancellation) The main reason why we have the leak is other backends can find the self-deadlocking process/portal in LOCK->waitProcs and PGPROC->waitPortalId. There is no real need to start waiting before conducting the self-deadlock check. Thus, we now do the self-deadlock check much earlier. 
Reviewed-by: Ashwin Agrawal <aashwin@vmware.com> Reviewed-by: Xuejing Zhao <zxuejing@vmware.com> Reviewed-by: Jimmy Yih <jyih@vmware.com> Reviewed-by: Huansong Fu <fuhuansong@gmail.com>
Backported from e8bae53 with minor conflicts in regress/init_file and a LOG vs DEBUG1 conflict in ResLockRelease's "proclock not held" message. Reviewed-by: Xuejing Zhao <zxuejing@vmware.com> Reviewed-by: Jimmy Yih <jyih@vmware.com> Reviewed-by: Huansong Fu <fuhuansong@gmail.com>
Backported from 8a8a203 with minor conflict in isolation2 schedule and minor conflicts in test output (CREATE QUEUE vs CREATE etc). Reviewed-by: Xuejing Zhao <zxuejing@vmware.com> Reviewed-by: Jimmy Yih <jyih@vmware.com> Reviewed-by: Huansong Fu <fuhuansong@gmail.com>
Backported from 3d981ee with very minor conflicts in code and in test output (DECLARE CURSOR vs DECLARE etc). Original commit message follows: We fix a leak of a resource queue slot that can be caused by the following sequence of events: 1. A backend waiting on a resource queue lock receives the TERM signal and is just about to perform cleanup in: AbortOutOfAnyTransaction() -> .. -> ResLockWaitCancel() 2. Another backend releases its position in the queue, finds the 1st backend in the wait queue, evicts it and grants it a slot in the resource queue. 3. ResLockWaitCancel() resumes for the 1st backend, doesn't find itself on the wait queue and is a no-op. Thereafter, the backend finishes termination without cleaning up its increment set, causing the leak. Note: Termination is special in the sense that we don't call ResLockRelease() in the PG_CATCH() in ResLockPortal()/ResLockUtilityPortal(), as we do for other exceptional happenstances such as cancellation/deadlock. This is because we can't catch FATAL errors (like the one that arises from termination). We never get to clean up the external grant in this case. To solve this issue, we clean up such dangling increment sets/locks during PortalDrop() and in AtExitCleanup_ResPortals(). PS: The fact that we have to recover in this fashion is due to our TRY/CATCH cleanup workflow, something that upstream code advises against in WaitOnLock(). Fixing that is non-trivial and is another matter. Co-authored-by: Ashwin Agrawal <aashwin@vmware.com> Reviewed-by: Xuejing Zhao <zxuejing@vmware.com> Reviewed-by: Jimmy Yih <jyih@vmware.com> Reviewed-by: Huansong Fu <fuhuansong@gmail.com>
Actually use the return value of ResLockRelease()! Reviewed-by: Xuejing Zhao <zxuejing@vmware.com> Reviewed-by: Jimmy Yih <jyih@vmware.com> Reviewed-by: Huansong Fu <fuhuansong@gmail.com> (cherry picked from commit 1ad11b8)
Backported from 5bcfe1b with very minor conflicts. Reviewed-by: Xuejing Zhao <zxuejing@vmware.com> Reviewed-by: Jimmy Yih <jyih@vmware.com> Reviewed-by: Huansong Fu <fuhuansong@gmail.com>
Backported from 168bcb0 with very minor conflicts. Reviewed-by: Xuejing Zhao <zxuejing@vmware.com> Reviewed-by: Jimmy Yih <jyih@vmware.com> Reviewed-by: Huansong Fu <fuhuansong@gmail.com>
This is a backport of 12e28aa with the file conflict resolved, and the is_set output preserved. It is safe to PANIC here, as ERRORing out here in InitProcess() -> OwnLatch() leads to an elevated FATAL, and since PMSignalState->PMChildFlags[slot] = PM_CHILD_ACTIVE at this time, the postmaster considers this failure to warrant a reset (ReleasePostmasterChildSlot() returns false and it calls HandleChildCrash()). We now get the benefit of a core file, which we didn't get before, as a result of this change. Original commit message follows: Build farm animal gharial recently failed a few times in a parallel worker's call to OwnLatch() with "ERROR: latch already owned". Let's turn that into a PANIC and show the PID of the owner, to try to learn more. Discussion: https://postgr.es/m/CA%2BhUKGJ_0RGcr7oUNzcHdn7zHqHSB_wLSd3JyS2YC_DYB%2B-V%3Dg%40mail.gmail.com
Backported from 0709b7e with minor conflict that arose due to the absence of sparcv8 as a platform. Original commit message follows: Previously, they functioned as barriers against CPU reordering but not compiler reordering, an odd API that required extensive use of volatile everywhere that spinlocks are used. That's error-prone and has negative implications for performance, so change it. In theory, this makes it safe to remove many of the uses of volatile that we currently have in our code base, but we may find that there are some bugs in this effort when we do. In the long run, though, this should make for much more maintainable code. Patch by me. Review by Andres Freund.
Buildfarm member castoroides is unhappy with this, for entirely understandable reasons.
Yet another silly mistake in 0709b7e, again found by buildfarm member castoroides.
Buildfarm member anole caught this one.
Failing to do so can lead to a race condition between an exiting backend (Y) and another backend (X) undergoing init. If the act of returning the PGPROC to the freelist for Y gets reordered before the store for procLatch->pid = 0, then there is a chance that X can grab the same PGPROC object from the freelist and can still see that procLatch->pid = Y. This can lead to the PANIC inside OwnLatch(). This can only happen if X's OwnLatch() is scheduled before Y's store for pid = 0. Using a memory barrier in DisownLatch() is sufficient to avoid this issue. Technically, the SpinLockAcquire() and Release() calls that follow the DisownLatch() call should act as compiler barriers and should suffice on x86, but we still take this precautionary measure to be doubly sure. The memory barrier inside OwnLatch() is an extension of that paranoia. Since OwnLatch() and DisownLatch() aren't in hot code paths, the added memory barriers should not be a performance sink. Co-authored-by: Brent Doil <bdoil@vmware.com> Co-authored-by: Jimmy Yih <jyih@vmware.com> Co-authored-by: Ashwin Agrawal <aashwin@vmware.com>
Issue: gpexpand --help was showing the wrong range for the -n option: "-n <parallel_processes> The number of tables to redistribute simultaneously. Valid values are 1 - 16. Each table redistribution process requires two database connections: one to alter the table, and another to update the table's status in the expansion schema. Before increasing -n, check the current value of the server configuration parameter max_connections and make sure the maximum connection limit is not exceeded." Instead of 1 - 16 it should be 1 - 96, as per the code: MAX_PARALLEL_EXPANDS = 96. Fix: updated the max range value in the help file.
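The documented range should track the code's constant; a minimal sketch of the bound (the constant name is from the commit message above, the validation function is a hypothetical illustration):

```python
# The help text must reflect the real upper bound from gpexpand's source.
MAX_PARALLEL_EXPANDS = 96

def validate_parallel_processes(n):
    # Hypothetical check mirroring what -n should accept: 1..96.
    if not 1 <= n <= MAX_PARALLEL_EXPANDS:
        raise ValueError("-n must be between 1 and %d" % MAX_PARALLEL_EXPANDS)
    return n

accepted = validate_parallel_processes(96)   # now documented correctly
```

Keeping the help text next to (or derived from) the constant avoids this kind of drift between documentation and code.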
We have a new release tag for [6.26.3](https://github.com/greenplum-db/gpdb/releases/tag/6.26.3) that doesn't have the commit [4ac7227]( greenplum-db/gpdb@4ac7227). Add the symbol that was introduced by the above-mentioned commit to the ignore list for the 6.26.3 release, to avoid ABI check failures for new PRs on 6X.
Unlike TOAST tables, AO aux tables are linked to an extension during its creation. When a table is dropped from an extension using ALTER, the aux tables are left behind as extension dependencies. This means we can't DROP the table afterwards, because the AO aux tables are still used by the extension. In other words, this behavior makes it impossible to DROP an AO table created by an extension without dropping the extension itself. The goal of the patch is to change this behavior, allowing the table to be dropped. The unnecessary dependencies are no longer created in heap_create_with_catalog(). The behavior is now the same as for TOAST tables, which are aux tables too; the dependency created in CreateAOAuxiliaryTable() is enough. The patch contains a test that describes the target behavior. This is a backport of dfe49e3. The test was changed because 6X has no test_ext* extensions; to achieve the same result, a new test extension was made. (cherry picked from commit 10ec9c7)
This commit is a cherry-pick of 6b7e4c3. In summary, gpexpand redistribution used to error out when a table had already been expanded manually by the user or had been damaged. This commit changes this behaviour so that any expanded/broken tables are skipped without affecting the redistribution of other tables. The error message was changed due to differences between the messages of 6X and 7X. execSingleton was changed to execSQLForSingleton in mgmt_utils.py to make it work on 6X. Also, the gpexpand.feature test was brought to a common standard by adding 'rm -rf ~/gpAdminLogs/gpinitsystem*'; without this line a failing test would cause other tests to fail. Co-authored-by: Dmitry Smal <mialinx@yandex-team.ru> Co-authored-by: Nihal Jain <jnihal@vmware.com> (cherry picked from commit cfb97d1)
Currently, a gpfdist transformation can only be applied to a table source at the external table level, by specifying the #transform tag. It is not always convenient to specify the same transformation for all source files. The goal of the patch is to implement new flags that set a default transformation name for input and output transformations. The default transformation(s) can be set at gpfdist start and are applied only if no table-level transformation is specified. The new flags are subject to only a basic check; the idea was to keep the same behavior and check transformation existence only at runtime. The test shows the behavior described in the patch goal. (cherry picked from commit 2f6e3ef)
We assume that the plan before the hazard-motion transformation was validated (such a check exists) and that CTE producers/consumers are consistent. The following cases may appear after the hazard-motion transformation: 1. all slices that contain shared scans were transformed; 2. slices that contain CTE producers and some CTE consumers were transformed (the plan may contain other slices with consumers of those producers); 3. a slice that contains only producer(s) was transformed; 4. a slice that contains only consumer(s) was transformed; 5. a slice without CTE producers/consumers was transformed. This patch is a follow-up for #302. Detection of unpaired CTEs is now performed on the current slice (current motion) instead of recursing into its child slices (motions). The validation of unpaired CTEs was also changed: the previous approach collected all CTE consumers into an array and all CTE producers into a hash set, and at the end each element of the consumer array was probed against the producer set (this can lead to false negatives when a slice contains two different producers but only a single consumer). Now the sets of CTE consumers and producers are compared. The current approach correctly validates cases 3, 4 and 5, while case 1 may give a false positive. Case 2 may be dangerous: there is a possible situation where the hazard motion (slice) contains a CTE producer and its consumer, but the plan contains another unmodified motion (slice) with a CTE consumer of the producer from the modified slice. This approach won't recognize that there are unpaired CTE producers and consumers, because it only checks the consistency of producers/consumers within the hazard motion (one slice). On the other hand, that situation is only possible if a replicated distribution is requested for the Sequence node, and due to this patch (https://github.com/greenplum-db/gpdb/pull/15124) such a situation shouldn't appear. 
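The change from one-sided probing to set comparison can be sketched as below; this is an invented illustration of the validation idea, not ORCA's actual data structures:

```python
# Per-slice CTE pairing check: compare the full sets of producer ids and
# consumer ids seen within one slice.

def slice_ctes_consistent_old(producer_ids, consumer_ids):
    # Old approach: probe each consumer against the producer set.
    # Misses the case of two producers but only one consumer.
    producers = set(producer_ids)
    return all(c in producers for c in consumer_ids)

def slice_ctes_consistent_new(producer_ids, consumer_ids):
    # New approach: the sets must match exactly.
    return set(producer_ids) == set(consumer_ids)

# Two producers (1 and 2) but only one consumer (1) in the slice:
old_result = slice_ctes_consistent_old([1, 2], [1])   # false negative
new_result = slice_ctes_consistent_new([1, 2], [1])   # mismatch caught
```

The old check answers "does every consumer have a producer?", while the new one also catches producers left without a consumer, which is the unpaired-producer hang described above.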
Follow-up for #302 (51fe92e) Cherry-picked-from: a5b5c04 to reapply above 9e03478 (cherry picked from commit 5a10d0a) (cherry picked from commit 54ee058)
Greenplum Python scripts try to parse a system command's STDOUT and STDERR in English and may fail if the locale is different from en_US. This patch adds tests that cover this case. Also, this patch reinstalls glibc-common in the docker container; this is necessary to get langpacks in docker, because the docker images don't contain them. Cherry-picked-from: 6298e77 (cherry picked from commit fc2aade) (cherry picked from commit a3a94c5)
…es_now_fast external table (#636) gpperfmon writes a string to queries_now.dat in the expectation that it will be processed by gpmon_catqrynow.py, but the queries_now_fast external table reads from the file directly. This leads to an error, because queries_now.dat contains the query_hash column, which is not declared in the queries_now_fast table. queries_now.dat also contains empty columns at the end that should be filled in by the Python script; these columns are not declared in the table either. This patch fixes the issue by removing the extra columns from queries_now.dat and adding them directly in the Python script and in the write_qlog_full function (which writes into queries_tail.dat). Co-authored-by: Andrey Sokolov <a.sokolov@arenadata.io> (cherry picked from commit 742bf6f)
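The shape of the fix can be sketched as follows; the row layout and helper name are invented for illustration (only query_hash is named in the commit), but the idea is that the post-processing script, not the data file, supplies the extra columns:

```python
# After the fix, queries_now.dat rows no longer carry query_hash or the
# trailing empty columns; the script appends them during processing, so
# the external table reading the raw file sees only declared columns.

def catqrynow_line(raw_line, query_hash, n_trailing_empty):
    fields = raw_line.rstrip("\n").split(",")
    fields.append(query_hash)             # added by the script, not the file
    fields.extend([""] * n_trailing_empty)  # empty columns filled later
    return ",".join(fields)

out = catqrynow_line("1001,sess1,select 1", "abc123", 2)
print(out)  # 1001,sess1,select 1,abc123,,
```

This keeps a single source of truth for the file format: queries_now_fast reads exactly what gpperfmon writes, and only gpmon_catqrynow.py's output carries the additional columns.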
A missing DROP echo was added to the end of the src/test/isolation2/expected/concurrent_select_vacuum_ao_partitioned_table.out and src/test/isolation2/expected/concurrent_select_vacuum_ao_partitioned_table_optimizer.out files. Initially these changes were incorporated in a897532.
…#295) Commit fa287d0 introduced early lock releasing on partitions within operations on a partitioned parent table. This was presumably done to save memory in the lock manager. But such behavior incurs some issues, e.g., a possible segfault when we drop a partitioned table while a concurrent `pg_total_relation_size()` call runs on some AO partition. Obviously, we have to restore full locking for the inheritance hierarchy as it's done in vanilla PostgreSQL. To avoid exhausting lock memory we need to increase the max_locks_per_transaction setting. Memory is cheap, and if we're working with tens of thousands of partitions, we can afford to reserve some memory for the lock manager. The problem of partition locking was partially resolved for DML operations in commit 99fbc8e. Therefore, the current work completes that fix to support partition locking for DDL operations and other cases; it is in fact a backport of the similar commit to master, 86ded36, with small additions. A test scenario has also been added to reproduce the segfault case noted above, along with a related assert check inside the `calculate_table_size()` function. Backport of PR https://github.com/greenplum-db/gpdb/pull/10468. Fixes https://github.com/greenplum-db/gpdb/issues/12962 ------- Changes from 4ac7227 in src/backend/optimizer/prep/prepunion.c were overwritten with changes from this commit. (cherry-picked from commit 2763ff1) (cherry picked from commit 74fb2b3)
In a897532 the new GUC gp_keep_partition_children_locks was introduced. This setting made ORCA take locks on child AO partitions during planning of a SELECT statement. This behaviour secures the concurrent execution of a SELECT on the root partition against other commands, like a VACUUM modifying a leaf AO partition. In this case, without locks at the QD, the SELECT could dispatch an invalid snapshot for accessing AO table metadata, because the VACUUM could already have committed and broken the leaf partition access for the SELECT on certain segments. The GUC was disabled by default on the grounds that taking locks on all the partitions (the mentioned commit takes locks on the whole partition hierarchy in ORCA) can exhaust the available lock slots and waste lock manager memory. The commit above changes the planner's behaviour in the same way, but during its backport the planner's previous behaviour remained unchanged (74fb2b3): the planner takes locks unconditionally on all partitions of all storage types. The described commit mainly addresses the issue with AO tables and concurrent VACUUM. However, similar issues exist for heap tables. Consider the scenario where a SELECT from the root partition and a TRUNCATE on a leaf partition are performed concurrently. Without locks on leaf partitions at the QD, the SELECT could return data of the leaf partition from only one segment, because on the other segments the TRUNCATE succeeded in truncating the relation files before the SELECT started to scan them, while on the first segment the TRUNCATE ran after the SELECT, breaking the atomicity of the SELECT. Therefore, this patch addresses the concurrency issue for SELECT in a more generalized way. First of all, the GUC is now enabled by default, because while VACUUM merely causes the SELECT to throw an error without returning incorrect results, TRUNCATE can break the SELECT's atomicity and return unexpected data, which could lead to inconsistent behaviour. 
Secondly, the locks are acquired on both AO and heap leaf partitions, protecting the SELECT execution from any destructive operations. Additionally, in order to reduce the amount of locks, this patch makes ORCA take locks only for statically selected partitions when possible. In other cases ORCA behaves like planner, which takes locks on all partitions. The locktype is defined in EvalLockModeForChildRelations() and expand_inherited_rtentry() respectively. (cherry picked from commit c336098)
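A minimal sketch of the heap-table race described above (schema and partition names are hypothetical):

```sql
-- Hypothetical partitioned table with heap leaf partitions.
-- Session 1: scan through the root partition
SELECT count(*) FROM sales;

-- Session 2, concurrently: destructive DDL on one leaf partition
TRUNCATE sales_1_prt_jan;

-- Without QD-side locks on leaf partitions, the SELECT could read the
-- leaf's rows on segments the TRUNCATE had not reached yet and miss
-- them on segments it had, breaking atomicity. With the GUC enabled
-- (now the default) the TRUNCATE waits for the SELECT's locks.
SHOW gp_keep_partition_children_locks;
```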
…es to ORCA (#746) Problem description: ORCA fails to handle queries whose target list contains columns not listed in the GROUP BY clause when grouping is done by a primary key (meaning those columns are functionally dependent on the grouping columns). In such cases it falls back to the standard planner. Expected behaviour: ORCA generates a plan for a query with such columns in the target list, since they are functionally dependent on the GROUP BY columns.

Root cause: during query normalization, the function CQueryMutators::ShouldFallback detects a target list entry that is not a grouping column and triggers the fallback to the standard planner.

Fix: all target list columns that are functionally dependent on the grouping columns are added explicitly to the GROUP BY clause at the query normalization stage (at the start of the GROUP BY normalization, before the fallback check) before the translation to DXL. This requires the following steps:
1) Extract such columns from target list expressions and, if they have no corresponding target list entry, add resjunk target list entries for them.
2) Store the current unique target list entry references from groupClause - they will be needed at step 4.
3) Update all grouping sets that contain the primary key - add the functionally dependent columns from the target list.
4) Update the arguments of GROUPING functions (based on the target list entry references stored at step 2 and the updated references), because they could have shifted after step 3.

Additionally, new test cases were added to the functional_deps test.

-------

An #include directive was moved in src/include/gpopt/gpdbwrappers.h to resolve conflicts.

(cherry-picked from commit b1373f8) (cherry picked from commit 89430f7)
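The fixed behaviour can be illustrated with a query of this shape (table and column names are hypothetical, in the spirit of the functional_deps test):

```sql
-- id is the primary key, so name and payload are functionally
-- dependent on the grouping column
CREATE TABLE accounts (
    id      int PRIMARY KEY,
    name    text,
    payload text
);

-- Previously ORCA fell back to the standard planner here, because
-- name and payload do not appear in GROUP BY; after the fix they are
-- added to the GROUP BY clause internally during normalization and
-- ORCA produces a plan.
SELECT id, name, payload
FROM accounts
GROUP BY id;
```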
Otherwise the detection can spuriously report a symbol as available, because the compiler may simply emit a reference to a non-existent symbol. (cherry picked from commit 85abb5b)
On some systems the results of 64 bit __builtin_mul_overflow() operations can be computed at compile time, but not at runtime. The known cases are arm buildfarm animals using clang, where the runtime operation is implemented in an unavailable function. Try to avoid the compile-time computation by using volatile arguments to __builtin_mul_overflow(). In that case we hopefully will get a link error when the function is unavailable, similar to what buildfarm animals dangomushi and gull are reporting. Author: Andres Freund Discussion: https://postgr.es/m/20171213213754.pydkyjs6bt2hvsdb@alap3.anarazel.de (cherry picked from commit c04d35f)
Commit c04d35f didn't quite do the job here, because it still allowed the compiler to deduce that the function call could be optimized away. Prevent that by putting the arguments and results in global variables. Discussion: https://postgr.es/m/20171213213754.pydkyjs6bt2hvsdb@alap3.anarazel.de (cherry picked from commit c6d21d5)
Commit 0cb79ba added support for including the datadir in the segment configuration dump. The gpsegconfig_dump file used by the unit tests was changed accordingly: this patch adds the datadir path to each segment configuration in the gpsegconfig_dump file.
Commit 708502f enables the GUC optimizer_trace_fallback in the bfv_cte test. This makes ORCA print a fallback message for the following query:

create table rep1 (id bigserial not null, isc varchar(15) not null, iscd varchar(15) null) distributed replicated;
create table rep2 (id numeric null, rc varchar(255) null, ri numeric null) distributed replicated;
insert into rep1 (isc, iscd) values ('cmn_bin_yes', 'cmn_bin_yes');

The message looks like:

INFO: GPORCA failed to produce a plan, falling back to planner
DETAIL: Feature not supported: Volatile functions with replicated relations

ORCA never produces a plan for this query because of e749b2d, but the message is printed only when the GUC optimizer_trace_fallback is enabled. This patch adds the missing message to the expected test output.
In 'always' mode, the standby independently archives all files it receives from the primary. Original patch by Fujii Masao, docs and review by me. (cherry picked from commit fa759d1)

Resolved conflicts: in 5e3e0f7 the close logic was moved to a new function, `XLogWalRcvClose()`. Since upstream has no archive_mode `always` in 6X, the branch for `XLogArchiveMode != ARCHIVE_MODE_ALWAYS` was removed by them during the back-port. It was restored later in 4f2db19, but in a modified way according to that commit's needs. For our needs, `XLogArchiveNotify()` should be called for `ARCHIVE_MODE_ALWAYS` too, so the condition was slightly reworked to back-port the current patch. The same was done for the changes in `WalReceiverMain()`.
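For reference, one way to enable the mode in question might look like this (the archive command path is a placeholder; archive_mode changes take effect only after a server restart):

```sql
-- On the standby:
ALTER SYSTEM SET archive_mode = 'always';
ALTER SYSTEM SET archive_command = 'cp %p /path/to/archive/%f';
-- With 'on', a standby archives nothing until it is promoted;
-- with 'always', it archives every WAL segment it receives
-- from the primary.
```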
A restart point or a checkpoint recycling WAL segments treats segments marked with neither ".done" (archiving is done) nor ".ready" (segment is ready to be archived) in archive_status the same way for archive_mode being "on" or "always". While for a primary this is fine, a standby running a restart point with archive_mode = on would try to mark such a segment as ready for archiving, which is something that will never happen except after the standby is promoted. Note that this problem applies only to WAL segments coming from the local pg_wal the first time archive recovery is run. Segments that are part of a self-contained base backup are the most common case where this could happen; however, even in this case the .done markers would most likely be part of the backup. Segments recovered from an archive are marked as .ready or .done by the startup process, and segments finished streaming are marked as such by the WAL receiver, so they are handled already. Reported-by: Haruka Takatsuka Author: Michael Paquier Discussion: https://postgr.es/m/15402-a453c90ed4cf88b2@postgresql.org Backpatch-through: 9.5, where archive_mode = always has been added. (cherry picked from commit 62ab24a)
…very. After the archiver dies, the postmaster tries to start a new one immediately. But previously this could happen only while the server was running normally, even when archiving was enabled during recovery as well (i.e., archive_mode was set to always). So an archiver running during recovery could not be restarted soon after it died. This is an oversight in commit ffd3774. This commit changes reaper(), the postmaster's signal handler that cleans up after a child process dies, so that it tries to start a new archiver even during recovery if necessary. Patch by me. Review by Alvaro Herrera. (cherry picked from commit 71a4062)
Allure report: https://allure.adsw.io/launch/72189
Failed job (Regression tests with ORCA on x86_64): https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1471305
Stolb27 approved these changes Jun 3, 2024