Patch set arenadata 56 #955

Merged
merged 56 commits into from
May 24, 2024
Conversation

@Stolb27 Stolb27 commented May 24, 2024

  1. ADBDEV-4152 ADBDEV-4152: Optimize the Explicit Redistribute application logic #666
  2. ADBDEV-4954 ADBDEV-4954 gpexpand: skip expanded/broken tables when redistributing #887
  3. ADBDEV-4969 ADBDEV-4969: Fix resource group tests #883
  4. ADBDEV-5103 ADBDEV-5103: Make frozen_insert_crash test stable #893
  5. ADBDEV-4858 Fix SIGSEGV while executing lateral subqueries with aggregate functions  #886
  6. ADBDEV-4965 Fix WAL file removing on mirrors in archive_mode always #877
  7. ADBDEV-4507 Add support for target cols with functional dependency on grouping ones to ORCA #746
  8. ADBDEV-4487 alias support for ORCA optional with GUC parameter #780
  9. ADBDEV-5025 ADBDEV-5025 [ORCA] Find proper matching distribution for the combined hashed spec #888
  10. ADBDEV-5093 ADBDEV-5093: Split temp tables storage from temp files storage  #889
  11. ADBDEV-5258 Bunch of fixes for issues detected by static analyzer
  12. ADBDEV-5090 Fix panic with 'stuck spinlock' during crash recovery tests #901
  13. ADBDEV-4950 ADBDEV-4950: Backport a patch for partition locking #897
  14. ADBDEV-5303 Make "lc_time" and "lc_monetary" sync GUCs #926
  15. ADBDEV-5187 ADBDEV-5187: Fix case pg_rewind_fail_missing_xlog #902
  16. ADBDEV-5289 ADBDEV-5289: Fix an incorrect exit code in case of failure of resource group tests. #921
  17. ADBDEV-5147 Avoid linking AO aux tables to extension #930
  18. ADBDEV-5036 ADBDEV-5036: Fix gpstop -u not updating GUCs on segments #881
  19. ADBDEV-4949 ADBDEV-4949 Keep locks on child partitions until end of SELECT #928
  20. ADBDEV-5039 ADBDEV-5039: Do not treat interruption of LDAP requests by signals as an auth error. #899
  21. ADBDEV-5235 Fix segfault in get_ao_compression_ratio #907
  22. ADBDEV-5329 Fix 'adb_collect_table_stats' failure #929
  23. ADBDEV-5393 ADBDEV-5393: Fix segfault when copying dropped column ACLs to a new partition #941
  24. ADBDEV-5344 Implement default transformation flags for gpfdist #934
  25. ADBDEV-5579 ADBDEV-5579: Fix pip update in Dockerfile #950
  26. ADBDEV-5561 Fix Orca assertion for USING with different types #947
  27. ADBDEV-5547 ADBDEV-5547: Reimplement address availability check for UDP interconnect #946
  28. ADBDEV-5477 ADBDEV-5477: Invent adb_hba_file_rules_view view to show the content of pg_hba.conf #949

Stolb27 and others added 30 commits March 15, 2024 17:46
Sync 6.26.2 changes to dev branch
Previously, if a plan contained any motions and a ModifyTable node, Explicit
Redistribute Motion would be added under the ModifyTable to send tuples back to
a specific segment based on gp_segment_id.

We only really need an Explicit Redistribute Motion if the tuples that we are
going to modify were sent to a different segment by a Motion, because DELETE and
UPDATE use ctid to find tuples that need to be modified, and ctid only makes
sense in its original segment.

This patch updates the logic: add Explicit Redistribute Motion only if there is
a motion that distributes the relation we are going to update and this motion is
between the scan and the ModifyTable.

UPDATE/DELETE operations are represented by ModifyTable node. We expand
ApplyMotionState with a list of resulting relations retrieved from the
ModifyTable node, and a motion counter. A counter is used to account for the
possibility of multiple motions before the scan.

When we encounter a scan node, we search for its scanrelid (the index of the
relation) in our list of resulting relations. If the list contains this
scanrelid, we have found the scan on the table that's going to be modified by
ModifyTable, and remember the motion count. We ignore scans under InitPlans,
since there will not be any scan nodes that perform a scan on the same range
table entry as ModifyTable under InitPlans.

When we encounter the Explicit Redistribute Motion itself, we decide based on
the counter. If the motion counter is not 0, we can't elide the Explicit
Redistribute Motion, since the tuples that are going to be modified were
previously sent to a different segment by a motion. Otherwise, if the motion
counter is 0, we can elide it.

The solution is based on a46400e.
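The elision rule above can be sketched as a plan-tree walk. This is a toy model with invented names; the real logic extends ApplyMotionState in the C executor code:

```python
# Toy model of the Explicit Redistribute elision check (names invented).
class Node:
    def __init__(self, kind, children=(), scanrelid=None):
        self.kind = kind              # 'motion', 'scan', ...
        self.children = list(children)
        self.scanrelid = scanrelid    # set for scan nodes

def motions_above_result_scan(node, result_relids, motions_above=0):
    """Motions between the ModifyTable and the scan of a result relation,
    or None if no such scan exists in this subtree."""
    if node.kind == 'scan' and node.scanrelid in result_relids:
        return motions_above
    bump = 1 if node.kind == 'motion' else 0
    for child in node.children:
        count = motions_above_result_scan(child, result_relids,
                                          motions_above + bump)
        if count is not None:
            return count
    return None

def can_elide_explicit_redistribute(plan, result_relids):
    # Elide only when no motion moved the to-be-modified tuples.
    return motions_above_result_scan(plan, result_relids) == 0
```

A scan directly under the ModifyTable keeps its tuples on their original segment, so the motion is elidable; a motion between the two means the ctids are meaningless on the receiving segment, so the Explicit Redistribute Motion must stay.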
This commit is a cherry-pick of 6b7e4c3.
In summary, `gpexpand` redistribution used to error out when a table had
already been expanded manually by the user or had been damaged. This commit
changes this behaviour so that any expanded/broken tables are skipped without
affecting the redistribution of other tables.

The error message was changed due to a difference between the messages on 6x
and 7x. execSingleton was changed to execSQLForSingleton in mgmt_utils.py to
make it work on 6x. Also, the gpexpand.feature test was brought to a common
standard by adding 'rm -rf ~/gpAdminLogs/gpinitsystem*'. Without this line a
failing test would cause other tests to fail.

Co-authored-by: Dmitry Smal <mialinx@yandex-team.ru>
Co-authored-by: Nihal Jain <jnihal@vmware.com>
Patch 52bbf95 increased memory consumption on
the coordinator and caused the resgroup_cpu_rate_limit test to fail.

This patch adds the ability to create a demo cluster on multiple hosts, which
allows us to create an environment close to the upstream.
It also disables the ORCA planner for resource group tests for two reasons.
First, starting with commit 39de39e, upstream runs the tests without it.
Second, in the resgroup_cpu_rate_limit test, ORCA can take a long time to plan
a query when the CPU is limited, which makes the test fail.
Commit 1502bef introduced a new flaky test. The test expects a panic to occur
in the process that performs the insert. But sometimes the panic happens first
in a process started by the dtx recovery process, which periodically checks for
orphaned transactions, and this leads to an error in the test. The solution is
to panic only the required process, which is done by extracting the session
from pg_stat_activity with a query.
…ns (#886)

Problem description:
Some queries with a lateral join cause a segmentation fault. It happens if the
query has a subquery with aggregate functions and without a GROUP BY clause, or
has more than one level of nesting.

Cause:
Postgres planner for these queries creates a plan with slices, and the lateral
variable is accessed from a different slice. Lateral variables can't be sent
across motions. So, it causes a segmentation fault.

Fix:
The fix is to force the postgres planner to bring such queries to a single
query executor by setting 'force_singleQE' in the PlannerConfig structure.
Previously,
this flag was set only for queries with lateral variables and a subquery with
GROUP BY / LIMIT / DISTINCT clauses. Now this condition is complemented by
checking the presence of:
 - aggregate functions;
 - set operations that can cause motion (INTERSECT, EXCEPT, UNION without ALL);
 - recursive CTEs that also can cause motion.
Plus, a recursive check for all CTEs and SUBQUERY RTEs was added.
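A minimal sketch of the broadened condition; the feature flags are invented names, and the real check inspects the Query tree before setting force_singleQE in PlannerConfig:

```python
# Sketch of the broadened force_singleQE condition (invented feature flags).
def needs_single_qe(q):
    """q: dict of boolean features of a subquery referenced laterally."""
    motion_hazard = (
        q.get('has_group_by', False)
        or q.get('has_limit', False)
        or q.get('has_distinct', False)
        # conditions added by this patch:
        or q.get('has_aggs', False)
        or q.get('has_motion_setop', False)   # INTERSECT/EXCEPT/UNION (no ALL)
        or q.get('has_recursive_cte', False)
    )
    return q.get('has_lateral_vars', False) and motion_hazard
```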
…or segment"

This partially reverts commit 4a5a95b.
The fix to src/backend/access/transam/xlogarchive.c was incomplete.
A restart point or a checkpoint recycling WAL segments treats segments
marked with neither ".done" (archiving is done) or ".ready" (segment is
ready to be archived) in archive_status the same way for archive_mode
being "on" or "always".  While for a primary this is fine, a standby
running a restart point with archive_mode = on would try to mark such a
segment as ready for archiving, which is something that will never
happen except after the standby is promoted.

Note that this problem applies only to WAL segments coming from the
local pg_wal the first time archive recovery is run.  Segments part of a
self-contained base backup are the most common case where this could
happen, however even in this case normally the .done markers would be
most likely part of the backup.  Segments recovered from an archive are
marked as .ready or .done by the startup process, and segments finished
streaming are marked as such by the WAL receiver, so they are handled
already.

Reported-by: Haruka Takatsuka
Author: Michael Paquier
Discussion: https://postgr.es/m/15402-a453c90ed4cf88b2@postgresql.org
Backpatch-through: 9.5, where archive_mode = always has been added.

Cherry-picked from 78ea8b5
The commit adds tap-test to check that WAL files are not deleted at checkpoint.

Long ago, we backported (fa759d1)
archive_mode `always` to 6x. Since then, it has been tested only manually.
Later, there was a bug around WAL recycling depending on archive_mode. Upstream
backported a fix (4a5a95b), but the condition for `XLogArchivingAlways()` was
not backported, because upstream doesn't support it. As a result, WAL files on
mirrors could be considered removable and actually removed during a restart
point. Backporting 78ea8b5 solves the problem; a new tap-test was also added to
prove it works and to show such cases.

The upstream backport already covers WAL files on the master instance, but I
decided to add this check to the test file too. The test file is heavily
inspired by other test files from the `recovery` dir.
…es to ORCA (#746)

Problem description: ORCA fails to handle queries with columns in the target
list, not listed in GROUP BY clause, if grouping is done by a primary key
(meaning that these columns have a functional dependency on grouping columns).
In such a case, it falls back to the standard planner. Expected behavior - ORCA
generates a plan for a query with columns in target list, if these columns have
a functional dependency on columns in GROUP BY list.

Root cause: during query normalization, the function
CQueryMutators::ShouldFallback finds a target list entry that is not a grouping
column, and this triggers fallback to the standard planner.

Fix: All columns from the target list with functional dependency on grouping 
columns are added explicitly to group by clause at the query normalization stage
(at start of groupby normalization, before checking if fallback is required)
before the translation to DXL. It requires the following steps:
1) Extract such columns from target list expressions, and if they do not have a
relevant target list entry, add resjunk target list entries for them.
2) Store current unique target list entry references in groupClause - they will
be required at step 4.
3) Update all grouping sets that contain the primary key - add the functionally 
dependent columns from the target list. 
4) Update arguments of GROUPING functions (based on target list entry references
stored on step 2 and updated target list entry references), because they could
be shifted after step 3.

Plus, new test cases were added to the functional_deps test.
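A toy model of steps 1 and 3 above, assuming the simple case where the whole primary key of one table is grouped; the representation and function name are invented for illustration:

```python
# Toy model of expanding GROUP BY with functionally dependent columns.
def expand_group_by(target_cols, group_by, primary_key, table_cols):
    """If the whole primary key is grouped, the other columns of that table
    are functionally dependent on it and can be added to GROUP BY."""
    expanded = list(group_by)
    if set(primary_key) <= set(group_by):
        for col in target_cols:
            if col in table_cols and col not in expanded:
                expanded.append(col)   # resjunk-style addition
    return expanded
```

After this expansion every target-list column is a grouping column, so the fallback check in ShouldFallback no longer triggers.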
Currently the ORCA-generated query plan does not show aliases, unlike the
original PostgreSQL planner. Having the aliases in the plan is usually much
more descriptive, since they are specified by the user in the query and may
carry meaning.

This patch adds table aliases support to ORCA. Table aliases are added to both
internal structures and DXL schema and for debug representation of internal
objects. Since explain command uses RangeTableEntry fields of PlannedStmt to
extract the alias, we also fill rte->alias field during PlannedStmt creation.

Table aliases are widely used in the tests, so simple adding them requires lots
of tests changes and patch becomes rather huge. To eliminate this, GUC parameter
optimizer_enable_table_alias has been introduced. Default value is on, it
enables table aliases. Setting it to off makes aliases not used, as before the
patch.

The optimizer_enable_table_alias parameter is intended mainly for automated
tests, so tests have to set it explicitly for all the cases. This is done in
two steps:
1. When optimizer=on is used, i.e. we are running ORCA-enabled tests,
the OPTIMIZER_ENABLE_TABLE_ALIAS environment variable is set to off. 
2. At the demo cluster startup, in demo_cluster.sh, if
OPTIMIZER_ENABLE_TABLE_ALIAS is set, the line optimizer_enable_table_alias=off
is appended to the file clusterConfigPostgresAddonsFile which is used to
distribute configuration across demo cluster. Basically,
optimizer_enable_table_alias is needed only for the master node and could be set
on per-connection basis, but this approach is simple, consistent and has minimum
required changes.
Some hash join operations that contained an outer hash join as their inner
part could wrongly enforce data redistribution only for the outer part, which
led to inconsistent join results.

During [CPhysical\*Join] optimization PdshashedMatching function could wrongly
interpret the combined hashed distribution spec derived for the outer join,
which was the inner child of the [CPhysical\*Join] expression being optimized.
The combined hashed spec for the inner child (like hash left join in the test)
is derived from both outer and inner relations (it's done in PdsDerive function
of CPhysicalLeftOuterHashJoin), i.e. the derived distribution spec for the outer
join consists of 2 CDistributionSpecHashed entries, one for each relation. This
concept was introduced by a15fdc9, where the HOJ distribution spec was
reconsidered. Because the join is outer and NULLs are only added to unmatched
rows, the distribution spec for the inner relation has the NullsColocated flag
set to false.
And those 2 entries are represented as linked list, where the inner spec goes
first. Because of that, when we try to find matching distribution for such
combined hashed spec inside the PdshashedMatching function, we could build a
distribution matching the first spec from combined spec (one that has
NullsColocated false) judging only by hash exprs. This logic is valid only for
the cases when the matching distribution does not require NULLs to be co-located
(NullsColocated false). And in cases when matching distribution spec requires
co-located NULLs this logic could lead to a discrepancy between parts of join
expression being optimized, because inner distribution has NullsColocated false,
but the outer (matching) distribution is required to have NullsColocated to be
true (taking into account that the hash exprs are equivalent).

This patch prevents distribution mismatch for described case in the following
way: In case when matching distribution has NullsColocated set to true but the
input distribution has NullsColocated set to false, we try to build a matching
distribution spec based on input's equivalent distribution (pdshashedEquiv). If
none of the equivalent specs from combined distribution satisfies the condition
of NULLs colocation, we eventually raise the exception.
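The matching rule can be sketched roughly like this; the structures are invented stand-ins for CDistributionSpecHashed and its pdshashedEquiv chain:

```python
# Sketch of finding a matching spec inside a combined hashed distribution.
class HashedSpec:
    def __init__(self, exprs, nulls_colocated, equiv=None):
        self.exprs = frozenset(exprs)
        self.nulls_colocated = nulls_colocated
        self.equiv = equiv            # next entry of the combined spec

def matching_spec(combined, required_exprs, require_nulls_colocated):
    spec = combined
    while spec is not None:
        if spec.exprs == frozenset(required_exprs) and (
                not require_nulls_colocated or spec.nulls_colocated):
            return spec               # hash exprs AND NULL co-location match
        spec = spec.equiv             # fall through to equivalent specs
    raise LookupError("no equivalent spec satisfies NULL co-location")
```

The pre-fix behavior corresponds to matching on hash expressions alone, which would incorrectly accept the first (NullsColocated false) entry even when co-located NULLs are required.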
Some operations create temporary files on disk. The growth of temporary files in
conjunction with temporary tables can lead to 100% usage of the data directory
and an emergency stop of the DBMS.

Implement temp_spill_files_tablespaces GUC that controls tablespaces where
temporary files are stored. Temporary tables will still be stored in
temp_tablespaces.

In case temp_spill_files_tablespaces is not set: the behavior will remain the
same, and both files and tables will be stored in the specified table spaces
according to the list in temp_tablespaces, or if it's also empty, in system
table spaces.

If the temp_spill_files_tablespaces is not empty, but temp_tablespaces is empty,
only temporary files will be saved in the specified table spaces. Temporary
table files will still be stored in the system tablespace.

If temp_spill_files_tablespaces is set to the empty string ("") or 'pg_default',
it will not fall back to temp_tablespaces and will instead use the default
tablespace.
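The decision table above, as an illustrative sketch; the '<default>' marker and the function name are invented, and only the GUC names come from the description:

```python
# Sketch of the tablespace choice for temporary spill files.
def spill_file_tablespaces(temp_spill_files_tablespaces, temp_tablespaces):
    if temp_spill_files_tablespaces is None:        # GUC not set at all
        return temp_tablespaces or ['<default>']    # old behaviour preserved
    if temp_spill_files_tablespaces in ('', 'pg_default'):
        return ['<default>']                        # no fallback to temp_tablespaces
    return temp_spill_files_tablespaces.split(',')  # spill files only
```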
The sscanf function is not safe in the sense that it can write data into a
buffer without regard for the buffer's size. To prevent buffer overflows, this
patch adds an explicit maximum read size.
canonical_grpsets is asserted to be not-NULL before the affected condition.
Remove the redundant NULL check.
The sqlca->sqlstate variable is an array of 5 characters without a terminating
0. The patch explicitly adds a size to print this variable correctly.
If fcntl with F_DUPFD_CLOEXEC fails, an attempt is made to duplicate the file
descriptor using a regular dup. But since write_fd becomes -1 as a result of
the fcntl failure, dup will also fail.

This patch restores write_fd to its original value in case of fcntl failure.
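The corrected fallback order can be illustrated in Python, which exposes the same fcntl interface; write_fd is the variable name from the description, the function name is invented:

```python
import fcntl
import os

# Sketch of the corrected fallback: don't clobber write_fd before dup().
def dup_write_fd(write_fd):
    try:
        return fcntl.fcntl(write_fd, fcntl.F_DUPFD_CLOEXEC, 0)
    except OSError:
        # write_fd still holds its original value here, so dup() can work.
        return os.dup(write_fd)
```

The bug was the equivalent of assigning the (failed) fcntl result to write_fd before attempting the dup fallback.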
memcpy was used to copy overlapping buffers in the SWAPN macro. For memcpy,
according to its specification, "if the objects overlap, the behavior is
undefined". Thus, it may potentially cause errors. The memcpy function is
replaced with memmove, which also copies data from one buffer to another one,
but the buffers may overlap according to its specification.
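Why the overlap matters, simulated on a bytearray: a forward byte-by-byte copy (memcpy-like) rereads bytes it has already overwritten, while a memmove-like copy snapshots the source first:

```python
# memcpy-like forward copy vs. memmove-like copy on overlapping ranges.
def copy_forward(buf, dst, src, n):
    for i in range(n):                    # rereads already-overwritten bytes
        buf[dst + i] = buf[src + i]

def copy_safe(buf, dst, src, n):
    buf[dst:dst + n] = bytes(buf[src:src + n])   # snapshot source first
```

With buf = bytearray(b'abcdef') and an overlapping shift (dst=2, src=0, n=4), the forward copy corrupts the data while the safe copy yields the intended 'ababcd'.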
Problem:
Isolation crash recovery tests (for example, uao_crash_compaction_column, 
crash_recovery_dtm, etc.) failed with 'stuck spinlock' panic with very low
reproduction rate.

Cause:
Problem happened in the following case:
1. During the test, a backend process was forced into a panic (according to test
scenario).
2. At the very same moment, the background writer process called 
'SIMPLE_FAULT_INJECTOR("fault_in_background_writer_main")', which it did
regularly from the loop in 'BackgroundWriterMain'. The bg writer acquired the
spinlock
inside the fault injector, but before it released it...
3. Postmaster started to send the SIGQUIT signal to all child processes.
4. Background writer process received SIGQUIT and halted its execution without
releasing the spinlock.
5. In the background writer process, a handler for the SIGQUIT signal was 
invoked - 'bg_quickdie'. And 'bg_quickdie' called 
'SIMPLE_FAULT_INJECTOR("fault_in_background_writer_quickdie");'. As the spinlock
was not released, 'bg_quickdie' hung on the spinlock.
6. The postmaster failed to wait for child processes to complete. After a
timeout it tried to check
'SIMPLE_FAULT_INJECTOR("postmaster_server_loop_no_sigkill")' and hung on the
same spinlock as well.
7. Finally, both postmaster and 'bg_quickdie' failed with 'stuck spinlock'
panic.

Fix:
The general rule is that it is NOT allowed to access the fault injector from
the postmaster process or from any SIGQUIT handler of child processes while the
system is resetting after the crash of some backend. This is because during the
reset the postmaster terminates child processes, and it might terminate a
process that has acquired the spinlock of the fault injector but hasn't
released it yet. So, subsequent calls to the fault injector API from the
postmaster or any SIGQUIT handler will lead to deadlock. Only 'doomed'
processes can still call the fault injector API, as they will soon be
terminated anyway. In accordance with the above, the following changes are
made:
1. Access to the fault injector during a reset is removed from 'bg_quickdie' and 
postmaster's ServerLoop. 
2. The fts_segment_reset test and related code are redesigned. Now, instead of 
sleeping for a delay defined by the test in 'bg_quickdie', postmaster sends 
SIGSTOP to the bg writer, and starts a timer for the delay. Once the delay 
elapses, the timer handler sends SIGCONT and SIGQUIT to the bg writer. This 
logic is triggered if the postmaster detects the new fault 
"postmaster_delay_termination_bg_writer" (which is a replacement for 
"fault_in_background_writer_quickdie" and "postmaster_server_loop_no_sigkill"). 
This fault is checked only when the postmaster is in the PM_RUN state. So it
should be safe to check for it. 

New tests specific to this issue are not added because of the unstable nature of
the problem.
Introduce a new GUC gp_keep_partition_children_locks that preserves the
relation locks on AO partition leaves that are taken during planning.

This fixes concurrency issues such as #16545, where a SELECT pre-dating
a concurrent lazy vacuum can read past eof of a segfile, since the
vacuum would "drop" the segment. Adequate locking during planning is one
way to ensure that vacuumStatement_Relation() can't acquire the
AccessExclusiveLock on the coordinator. This would cause it to give up
on the drop, and the concurrent SELECT isn't compromised.

Unfortunately, we can't quite mark this GUC on by default, as it has
consequences on the amount of locks consumed:

Setting this GUC on means that we keep the acquired locks in
expand_inherited_rtentry() around longer now, up until the end of the
transaction. This can have consequences for transactions dealing with a
large number of partitions, leading us to inch closer to the
max_locks_per_transaction limit. Joins/unions etc with large partition
tables now consume more locks than before.

Notes:
(0) This conforms locking behavior to upstream behavior, that we have
in 7X (for append-optimized relations).

(1) For ORCA, we take the locks during DXL to planned statement
translation, borrowing from expand_inherited_rtentry().

(2) Unlike 7X, neither ORCA nor planner considers the statically pruned
list of partitions for locking.

(3) Currently these locks are taken only for append-optimized leaves,
but the GUC can easily be extended to cover heap tables as well, as
similar concurrency issues do exist.

Co-authored-by: Divyesh Vanjare <vanjared@vmware.com>
Co-authored-by: Soumyadeep Chakraborty <soumyadeep2007@gmail.com>
Co-authored-by: Ashwin Agrawal <aashwin@vmware.com>

-------

Missing DROP echo was added to the end of the
src/test/isolation2/expected/concurrent_select_vacuum_ao_partitioned_table.out
src/test/isolation2/expected/concurrent_select_vacuum_ao_partitioned_table_optimizer.out
files.

(cherry-picked from 4ac7227)
…es to ORCA (#746)

Problem description: ORCA fails to handle queries with columns in the target
list, not listed in GROUP BY clause, if grouping is done by a primary key
(meaning that these columns have a functional dependency on grouping columns).
In such a case, it falls back to the standard planner. Expected behavior - ORCA
generates a plan for a query with columns in target list, if these columns have
a functional dependency on columns in GROUP BY list.

Root cause: Function CQueryMutators::ShouldFallback during query normalization
finds out that there is a target list entry, that is not a grouping column, and
it triggers fallback to the standard planner.

Fix: All columns from the target list with functional dependency on grouping
columns are added explicitly to group by clause at the query normalization stage
(at start of groupby normalization, before checking if fallback is required)
before the translation to DXL. It requires the following steps:
1) Extract such columns from target list expressions, and if they do not have a
relevant target list entry, add resjunk target list entries for them.
2) Store current unique target list entry references in groupClause - they will
be required at step 4.
3) Update all grouping sets that contain the primary key - add the functionally
dependent columns from the target list.
4) Update arguments of GROUPING functions (based on target list entry references
stored on step 2 and updated target list entry references), because they could
be shifted after step 3.

Plus, new test cases were added to the functional_deps test.

-------

An #include clause was moved in src/include/gpopt/gpdbwrappers.h to resolve
conflicts.

(cherry-picked from commit b1373f8)
…#295)

Commit fa287d0 introduced early lock release on partitions within operations
on a partitioned parent table. This was presumably done to save memory in the
lock manager. But such behavior causes issues, e.g., a possible segfault when
we drop a partitioned table while a concurrent `pg_total_relation_size()` call
runs on some AO partition.

Obviously, we have to restore full locking for inheritance hierarchy as
it's done in vanilla PostgreSQL. To eliminate exhaustion of lock memory
we need to increase max_locks_per_transactions setting. Memory is cheap,
and if we're working with tens of thousands of partitions, we can afford
reserving some memory for the lock manager.

For DML operations, the problem of partition locking was partially resolved in
commit 99fbc8e. Therefore, in the current work we complete that fix to support
partition locking for DDL operations and other cases. It is in fact a backport
of the similar master commit 86ded36 with small additions. A test scenario was
also added to reproduce the segfault case noted above, along with a related
assert check inside `calculate_table_size()`.

Backport of PR https://github.com/greenplum-db/gpdb/pull/10468.
Fixes https://github.com/greenplum-db/gpdb/issues/12962

-------

Changes from 4ac7227 in src/backend/optimizer/prep/prepunion.c were overwritten
with changes from this commit.

(cherry-picked from commit 2763ff1)
…on. (#619)

When a heap tuple is updated by the legacy planner and the updated tuple is
placed on the same page (heap-only tuple, HOT), an update chain is created.
It's a chain of updated tuples, in which each tuple's ctid points to the next
tuple in the chain.

HOT chains allow storing only one index entry, which points to the first tuple
in the chain. During an Index Scan we pass through the chain, and the first
tuple visible to the current transaction is taken (for more information, see
src/backend/access/heap/README.HOT).

If we create a second index on a column that has been updated, it will store
the ctid of the beginning of the existing HOT chain. If a repeatable read
transaction started before the transaction in which the second index was
created, this index could be used in the query plan. As a result of a search
over this index, a tuple could be found that does not meet the search condition
(by a new value that is not visible to the transaction).

In the case of the legacy planner, this problem is solved the following way:

"To address this issue, regular (non-concurrent) CREATE INDEX makes the
new index usable only by new transactions and transactions that don't
have snapshots older than the CREATE INDEX command. This prevents
queries that can see the inconsistent HOT chains from trying to use the
new index and getting incorrect results. Queries that can see the index
can only see the rows that were visible after the index was created,
hence the HOT chains are consistent for them."

But ORCA does not handle this case and can use an index with a broken HOT-chain.

This patch resolves the issue for ORCA in the same way as legacy planner. During
planning we ignore newly created indexes based on their xmin.

Additionally, ORCA faced another related problem. Since ORCA has its own cache
(MD Cache) and can cache a relation object without an index that cannot be used
under the current snapshot (because the MDCacheSetTransientState function
returns true), we would not be able to use the index after the problematic
snapshot changes. Therefore, we need to reset the cache after the snapshot
changes in order to use the index.

This patch solves the problem in the following way: during index filtering, if
we encounter an index that we cannot use, we save TransactionXmin in the
mdcache_transaction_xmin variable. In the next queries, we check the saved xmin,
and if it is valid and differs from the current one, we reset the cache.

The create_index_hot test has also been changed: the optimizer is now turned
off before the update, since ORCA always uses Split Update, in which case HOT
chains are not created and the problem is not reproduced. That is also why ORCA
wasn't actually tested before.

(cherry-picked from commit 5894018)
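A sketch of the cache-reset rule described above; the class shape is invented, and only TransactionXmin and the mdcache_transaction_xmin idea come from the description:

```python
# Sketch of resetting the MD cache when TransactionXmin changes.
class MDCache:
    def __init__(self):
        self.entries = {}
        self.saved_xmin = None        # stands in for mdcache_transaction_xmin

    def on_unusable_index(self, current_xmin):
        self.saved_xmin = current_xmin   # remember why the index was skipped

    def maybe_reset(self, current_xmin):
        if self.saved_xmin is not None and self.saved_xmin != current_xmin:
            self.entries.clear()         # snapshot moved on: drop stale cache
            self.saved_xmin = None
            return True
        return False
```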
The make_partition_rules function expects pElem to be non-NULL, but the check
performed before the call still allows pElem to be NULL.

This patch corrects the condition to prevent calling make_partition_rules with
pElem being NULL.
Also, fix pager enable selection for such cases, and other cleanups for
zero-column output.

Report by Thom Brown

cherry-picked from 376a0c4
It's at least theoretically possible to overflow int32 when adding up
column width estimates to make a row width estimate.  (The bug example
isn't terribly convincing as a real use-case, but perhaps wide joins
would provide a more plausible route to trouble.)  This'd lead to
assertion failures or silly planner behavior.  To forestall it, make
the relevant functions compute their running sums in int64 arithmetic
and then clamp to int32 range at the end.  We can reasonably assume
that MaxAllocSize is a hard limit on actual tuple width, so clamping
to that is simply a correction for dubious input values, and there's
no need to go as far as widening width variables to int64 everywhere.

Per bug #18247 from RekGRpth.  There've been no reports of this issue
arising in practical cases, so I feel no need to back-patch.

Richard Guo and Tom Lane

Discussion: https://postgr.es/m/18247-11ac477f02954422@postgresql.org

This is a backport of the original PostgreSQL commit postgres/postgres@7e1ce2b
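The clamping approach can be sketched with Python ints standing in for the int64 accumulator used in the C code (MaxAllocSize is 0x3fffffff in PostgreSQL):

```python
MAX_ALLOC_SIZE = 0x3FFFFFFF   # PostgreSQL's MaxAllocSize

def clamped_row_width(col_widths):
    total = 0                 # int64-style running sum in the C code
    for w in col_widths:
        total += w
    return min(total, MAX_ALLOC_SIZE)   # clamp back to int32 range at the end
```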
Problem:
A static code analyzer found a potential issue with the 'pNode'/'pNode2'
pointers, which are dereferenced before being checked for NULL.

Fix:
1. An assert is added at the beginning of the function to ensure that 'pNode'
and the accessed fields of 'pNode' are not NULL.
2. An assert is added at the beginning of the loop to ensure that the accessed
fields of 'pNode2' are not NULL.
3. The redundant check for 'pNode2' not being NULL is removed, as it is checked
by the 'while' condition.
4. An assert is added after the update of 'pNode2' to ensure that the accessed
fields of 'pNode2' are not NULL.
5. The second redundant check for 'pNode2' not being NULL is removed, as it is
guaranteed above by the statements 'if (prule && prule->children)' and
'pNode2 = prule->children;'.
dreamedcheng and others added 22 commits April 11, 2024 08:49
Note that lc_monetary and lc_time are related to formatting
output data. Besides, formatting functions will be pushed down
to QEs in some common cases. So to keep the output data consistent
with the locale value set in QD, we need to sync these GUCs
between QD and QEs.

Co-authored-by: wuchengwen <wcw190496@alibaba-inc.com>

This is a backport of commit e1a4fdb

In the test, the query with the call to 'pg_terminate_backend' has been
wrapped in an 'ignore' section. The reason is that on 6x this query is planned
in such a way that the function is called on the segments with arguments (PIDs)
collected from the whole cluster. That resulted in several warnings "WARNING:
PID XXXXXX is not a PostgreSQL server process" (with a different count each
time) before it actually terminated the backend process.
When the fault `checkpoint_after_redo_calculated` or
'checkpoint_control_file_updated' was injected in a newly promoted mirror node,
the connection failed with the fatal log 'mirror is being promoted'. That is,
when the connection state is MIRROR_READY but the role changes to primary
during promotion, the connection is declined to avoid confusion.

Wait for the segment promotion to finish and the segment to accept connections
before injecting the fault.

This is a backport of commit 01d9c59.
The connectSeg function has been adapted for the old Python.

Co-authored-by: Xing Guo <higuoxing@gmail.com>
Co-authored-by: wenru yan <ywenru@vmware.com>
#921)

If the resource group tests fail, the script should return a non-zero exit
code. But bash returns the exit code of the last command; this led to the
script always exiting with a zero exit code.

This patch adds strict mode to the bash script and makes the shutdown of
containers a separate function that is called on exit. The command that runs
the tests is now the last command, which ensures the script will return its
exit code.

Ticket: ADBDEV-5289
Unlike TOAST tables, AO aux tables are linked to an extension during its
creation. When a table is dropped from an extension using ALTER, the aux tables
are left as extension dependencies. This means we can't DROP the table after
that, because the AO aux tables are still used by the extension. In other
words, this behavior makes it impossible to DROP an AO table created by an
extension without dropping the extension.

The goal of the patch is to change this behavior, allowing to DROP a table.
Unnecessary dependencies are now not created in heap_create_with_catalog(). The
behavior is the same as for TOAST tables, which are aux tables too. The
dependency created in CreateAOAuxiliaryTable() is enough.

The patch contains a test that describes target behavior.

This is a backport of dfe49e3. The test was changed because 6X has no
test_ext* extensions. To achieve the same result, a new test extension was
made.
Backend process GUCs marked as GUC_GPDB_NEED_SYNC are overwritten by the
coordinator upon receiving a new connection. The coordinator sends its
configuration to the segments the same way as if it were a frontend
communicating with the coordinator. This means the default priority for these
GUCs becomes the client priority.

Previously, when gpstop -u was executed, it sent a SIGHUP signal, prompting the
coordinator and segments to re-read their configuration files. However, the
priority of options derived from the configuration file is lower than that of
client options. This difference caused the segments to ignore overwritten
GUCs when they re-read their configuration files, and these GUCs were not
re-dispatched by the QD either. This situation resulted in a desynchronization
of GUCs that are required to be synchronized between the coordinator and the
segments.

The solution is to explicitly dispatch the changed configuration from the
coordinator to the segments, similar to how it is done when initializing the
backend process.

When the QD receives a SIGHUP signal, it reads its configuration file to
identify any GUCs that have been modified. If these GUCs are marked with the
GUC_GPDB_NEED_SYNC flag, the QD dispatches SET commands containing these
modified GUCs to the QEs. To differentiate between this type of synchronization
and user-initiated SET commands, a flag GP_OPT_SYNCHRONIZATION_SET has been
introduced to the distributed transaction options.
In a897532 the new GUC
gp_keep_partition_children_locks was introduced. This setting made ORCA take
locks on child AO partitions during the planning of a SELECT statement. This
behaviour protects the concurrent execution of a SELECT on the root partition
from other commands like VACUUM modifying a leaf AO partition. Without locks at
the QD, the SELECT could dispatch an invalid snapshot for accessing AO table
metadata, because the VACUUM could already have committed and broken the leaf
partition access for the SELECT on a certain segment. The GUC is disabled by
default on the grounds that taking locks on all the partitions (the mentioned
commit takes locks on the whole partition hierarchy in ORCA) can exhaust the
available locks and waste lock manager memory. The commit above changes the
planner's behaviour in the same way, but during its backport the planner's
previous behaviour remained unchanged (74fb2b3): the planner takes locks
unconditionally on all partitions of various storage types.

The described commit mainly addresses the issue with AO tables and concurrent
VACUUM. However, similar issues exist for heap tables. Consider the scenario
where a SELECT from the root partition and a TRUNCATE on a leaf partition are
performed concurrently. Without locks on the leaf partitions at the QD, the
SELECT could return the data of a leaf partition from only one segment, because
on the other segments the TRUNCATE had managed to truncate the relation files
before the SELECT started to scan them. On the first segment the TRUNCATE went
after the SELECT, resulting in broken atomicity of the SELECT.

Therefore, this patch addresses the concurrency issue for SELECT in a more
generalized way. First, the GUC is now enabled by default: while VACUUM merely
causes SELECT to throw an error without returning incorrect results, TRUNCATE
can break SELECT's atomicity and return unexpected data, which could lead to
inconsistent behaviour. Second, locks are acquired on both AO and heap leaf
partitions, protecting the SELECT execution from any destructive operations.
Additionally, in order to reduce the number of locks, this patch makes ORCA
take locks only on statically selected partitions when possible. In other cases
ORCA behaves like the planner, which takes locks on all partitions. The lock
mode is determined in EvalLockModeForChildRelations() and
expand_inherited_rtentry() respectively.
…r. (#899)

When authorizing via LDAP, the ldap_simple_bind_s function may return an error.
This occurs when an internal call to the ldap_int_poll utility function is
interrupted by a signal. The openldap library can handle this properly using the
LDAP_OPT_RESTART option. Setting this option to LDAP_OPT_ON causes the select()
system call to be restarted when it is interrupted by a signal. The default
value is LDAP_OPT_OFF. This patch sets the value to LDAP_OPT_ON after ldap is
initialized (this also requires explicitly specifying the -llber library to
build).

Tests are not provided because they require a separate LDAP container and the
error is not reproducible in a stable manner.
Problem:
A segfault happened in queries with the 'get_ao_compression_ratio' function in
the target list when the argument for this function was taken from a table. The
problem was reproduced in the release build. In the development build the same
queries led to an assertion failure.

Cause:
The function 'get_ao_compression_ratio' executes a query on the
'pg_aoseg.pg_aoseg_' table to calculate the compression ratio. A function
that accesses hash-distributed or randomly distributed tables must be executed
on the coordinator. But the planner created plans in which
'get_ao_compression_ratio' was executed on segments. Inside
'get_ao_compression_ratio' there was an assert to check this case, but in the
release build the assert was compiled out. So the code of the function was
executed on a segment and got unexpected results from 'pg_aoseg.pg_aoseg_',
which led to a segfault.

Fix:
The assert is changed to a runtime check. Now, if the planner creates a plan in
which the function is executed on segments, the user gets a proper error
message. The same is done for the similar functions 'gp_update_ao_master_stats'
and 'get_ao_distribution'. Also, unreachable code is removed from internal
functions.

Note: it seems that the fully proper way would be to mark these functions as
PROEXECLOCATION_MASTER instead of PROEXECLOCATION_ANY, but that would require
changing the system catalog, which is acceptable only between major version
updates. So this approach is not used.

Note 2: for an empty column-oriented table 'get_ao_compression_ratio' returns
-1, but for an empty row-oriented table it returns 1. This behavior is
inconsistent, but it is not altered by this commit, as it is not related to the
original issue. It is reflected in the added test case. Be aware.
Problem:
The function 'adb_collect_table_stats' failed to execute after the creation of
a table partitioned by a column of a non-timestamp type (in particular, by a
'bigint' column).

Cause:
There is a query in 'adb_collect_table_stats' which checks whether there is a
partition of 'db_files_history' that covers today's date. When this query was
planned, the filter that checks the partitioning was planned to be executed
before the filter on relname. Thus the query tried to cast 'bigint' to
'timestamp', which led to an error.

Fix:
The query is rewritten so that the filter that checks the partitioning is
executed after all other filters. Also, the 'adb_collect_table_stats_test' test
is updated to check the case with an additional partition in
'db_files_history'. This is required to verify the filter that checks the
presence of a partition for the current month.
…ion. (#872)"

This patch did not solve the problem when the dropped column was not at the
end, since when skipping dropped columns, the discrepancy between the attribute
numbers in the source and target relations was not taken into account.

This reverts commit 10ec6fb.
While copying column ACLs when adding a new partition, we assumed that the
attribute numbers in the source and target relations would match, which is not
true when there are dropped columns in the original relation. In this case, we
would try to get an ACL for a non-existent column in the target table, which
would lead to a segfault.

This patch adds a separate iterator over the list of columns in the target
relation to correctly skip dropped columns in the original relation and to
obtain the correct attribute number for the target relation. We assume that
there are no dropped columns in the target relation, since it is a newly
created relation.
Currently, a gpfdist transformation can only be applied to a table source at
the external table level, by specifying the #transform tag. Sometimes it may be
inconvenient to specify the same transformation for all source files. The goal
of the patch is to implement new flags with the default transformation names
for input and output transformations. The default transformation(s) can be set
at gpfdist start and are applied only if no table-level transformation is
specified.

The new flags are subject to only basic checks. The idea was to keep the same
behavior and check transformation existence only at runtime.
The test shows the behavior described in the patch goal.
pip from the CentOS 7 repository can no longer update itself to the actual
version (20.3.4) because of an SSL error.

This patch replaces the installation of pip from the repository and the
subsequent update with installing the actual version via the get-pip.py script.
If the columns of a join USING condition have different types, the parser
represents them as RelabelType nodes to indicate that these columns probably
need type adjustment. Orca failed to process such join conditions due to assert
restrictions during the DXL translation of the join expression.

This patch allows RelabelType nodes to be processed by relaxing the assertion.
This is safe and no additional changes are required because Orca is able to
handle nodes of RelabelType.
…ckets in IPv6 disabled environments (#679)"

This reverts commit 5beccaa.
`SendDummyPacket` eventually calls `getaddrinfo` (which is reentrant);
however, `getaddrinfo` is not an async-signal-safe function.
`getaddrinfo` internally calls `malloc`, which is strongly advised
against within a signal handler as it may cause deadlocks.
Cache the accepted socket information for the listener, so that it can
be reused in `SendDummyPacket()`.

The purpose of `SendDummyPacket` is to exit more quickly; it
circumvents the polling that happens, which otherwise times out after 250ms.

Without `SendDummyPacket()`, there will be multiple test failures since
some tests expect the backend connection to terminate almost
immediately.

For the list of async-signal-safe functions, see
the signal-safety(7) Linux manual page.

====================

Accommodate for AF_INET6 and AF_INET when doing a motion layer IPC teardown

Previously, in commit 70306db, we removed pg_getaddrinfo_all from
signal handlers. However, in doing so, the capability of supporting both
AF_INET6 and AF_INET was lost; this responsibility must now be handled
by us. The commit mentioned above fixed the issue for AF_INET (IPv4),
but not for AF_INET6 (IPv6).

This commit addresses the situation for both AF_INET and AF_INET6.

====================

This commit backports: 70306db and
1ee34a4.
Conflicts resolved:
- setupUDPListeningSocket is not one-to-one with 7X, so the conflict on
  the location where `listenerSockaddr` is `memcpy`ed was resolved.
- conflicts caused by the differences between the modernized version and
  the refactor from 50748b7 were resolved.

Reviewed-by: Soumyadeep Chakraborty <soumyadeep2007@gmail.com>

(cherry-picked from 7f3c91f)
This patch reimplements changes introduced in 5beccaa (which backported 70306db,
1ee34a4) and fixes them by correctly handling IPv6 addresses, including
link-local ones.

In these patches, a fix was created for UDPIFC that reworked SendDummyPacket()
by removing the unnecessary call to getaddrinfo() to avoid using malloc() in a
signal handler, making setupUDPListeningSocket cache the address of the socket
used for the UDPIFC interconnect when it was first created.

The original patch (5beccaa) didn't account for the fact that the
AI_ADDRCONFIG flag does not protect us from link-local addresses (and other
non-global addresses), which are assigned automatically upon connecting to some
networks. These addresses cause a false positive even with that flag.

When IPv6 is disabled, all IPv6 addresses are removed from the network
interfaces. However, upon reconnecting to the network, these addresses reappear,
even though IPv6 remains disabled, with a single exception being loopback
interface, IPv6 addresses of which do not reappear.

AI_ADDRCONFIG usage is replaced with a manual check, which looks for the first
available IPv6 address and simultaneously makes sure that the loopback interface
has an IPv6 address assigned.

This is a rework of commit 5beccaa.
…ucts.

tokenize_file() now returns a single list of TokenizedLine structs,
carrying the same information as before.  We were otherwise going to grow a
fourth list to deal with error messages, and that was getting a bit silly.

Haribabu Kommi, revised a bit by me

Discussion: https://postgr.es/m/CAJrrPGfbgbKsjYp=bgZXhMcgxoaGSoBb9fyjrDoOW_YymXv1Kw@mail.gmail.com

This is full backport of commit 350cb92
This is a partial backport of a676201 by Georgy Shelkovy <g.shelkovy@arenadata.io>;
it adds only one auxiliary function, strlist_to_textarray, which returns a TEXT
array out of a list of C-strings.
This view is designed so that it shows what is currently in the file, not what
the postmaster has loaded as the active settings. That allows it to be used to
pre-vet edits before issuing SIGHUP. As with the earlier view, we go out of our
way to allow errors in the file to be reflected in the view, to assist that
use-case.

(We might at some point invent a view to show the current active settings,
but this is not that patch; and it's not trivial to do.)

Haribabu Kommi, reviewed by Ashutosh Bapat, Michael Paquier, Simon Riggs,
and myself

Discussion: https://postgr.es/m/CAJrrPGerH4jiwpcXT1-46QXUDmNp2QDrG9+-Tek_xC8APHShYw@mail.gmail.com

This is a partial backport of de16ab7 by Georgy Shelkovy <g.shelkovy@arenadata.io>,
without backporting the catalog changes and the handling of changes to SSL
settings in version 10.
This patch adds a public interface to the pg_hba_file_rules function via the
arenadata_toolkit extension. The new function is called adb_hba_file_rules and
the new view is called adb_hba_file_rules_view.
Allure report https://allure.adsw.io/launch/71412

Failed job Regression tests with ORCA on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1430262

@Stolb27 Stolb27 merged commit 25d47a7 into adb-6.x May 24, 2024
5 checks passed