Patch set arenadata 56 #955

Merged
merged 56 commits into from
May 24, 2024
Conversation

@Stolb27 Stolb27 commented May 24, 2024

  1. ADBDEV-4152 ADBDEV-4152: Optimize the Explicit Redistribute application logic #666
  2. ADBDEV-4954 ADBDEV-4954 gpexpand: skip expanded/broken tables when redistributing #887
  3. ADBDEV-4969 ADBDEV-4969: Fix resource group tests #883
  4. ADBDEV-5103 ADBDEV-5103: Make frozen_insert_crash test stable #893
  5. ADBDEV-4858 Fix SIGSEGV while executing lateral subqueries with aggregate functions  #886
  6. ADBDEV-4965 Fix WAL file removing on mirrors in archive_mode always #877
  7. ADBDEV-4507 Add support for target cols with functional dependency on grouping ones to ORCA #746
  8. ADBDEV-4487 alias support for ORCA optional with GUC parameter #780
  9. ADBDEV-5025 ADBDEV-5025 [ORCA] Find proper matching distribution for the combined hashed spec #888
  10. ADBDEV-5093 ADBDEV-5093: Split temp tables storage from temp files storage  #889
  11. ADBDEV-5258 Bunch of fixes for issues detected by static analyzer
  12. ADBDEV-5090 Fix panic with 'stuck spinlock' during crash recovery tests #901
  13. ADBDEV-4950 ADBDEV-4950: Backport a patch for partition locking #897
  14. ADBDEV-5303 Make "lc_time" and "lc_monetary" sync GUCs #926
  15. ADBDEV-5187 ADBDEV-5187: Fix case pg_rewind_fail_missing_xlog #902
  16. ADBDEV-5289 ADBDEV-5289: Fix an incorrect exit code in case of failure of resource group tests. #921
  17. ADBDEV-5147 Avoid linking AO aux tables to extension #930
  18. ADBDEV-5036 ADBDEV-5036: Fix gpstop -u not updating GUCs on segments #881
  19. ADBDEV-4949 ADBDEV-4949 Keep locks on child partitions until end of SELECT #928
  20. ADBDEV-5039 ADBDEV-5039: Do not treat interruption of LDAP requests by signals as an auth error. #899
  21. ADBDEV-5235 Fix segfault in get_ao_compression_ratio #907
  22. ADBDEV-5329 Fix 'adb_collect_table_stats' failure #929
  23. ADBDEV-5393 ADBDEV-5393: Fix segfault when copying dropped column ACLs to a new partition #941
  24. ADBDEV-5344 Implement default transformation flags for gpfdist #934
  25. ADBDEV-5579 ADBDEV-5579: Fix pip update in Dockerfile #950
  26. ADBDEV-5561 Fix Orca assertion for USING with different types #947
  27. ADBDEV-5547 ADBDEV-5547: Reimplement address availability check for UDP interconnect #946
  28. ADBDEV-5477 ADBDEV-5477: Invent adb_hba_file_rules_view view to show the content of pg_hba.conf #949

Stolb27 and others added 30 commits March 15, 2024 17:46
Sync 6.26.2 changes to dev branch
Previously, if a plan contained any motions and a ModifyTable node, Explicit
Redistribute Motion would be added under the ModifyTable to send tuples back to
a specific segment based on gp_segment_id.

We only really need an Explicit Redistribute Motion if the tuples that we are
going to modify were sent to a different segment by a Motion, because DELETE and
UPDATE use ctid to find tuples that need to be modified, and ctid only makes
sense in its original segment.

This patch updates the logic: add Explicit Redistribute Motion only if there is
a motion that distributes the relation we are going to update and this motion is
between the scan and the ModifyTable.

UPDATE/DELETE operations are represented by ModifyTable node. We expand
ApplyMotionState with a list of resulting relations retrieved from the
ModifyTable node, and a motion counter. A counter is used to account for the
possibility of multiple motions before the scan.

When we encounter a scan node, we search for its scanrelid (the index of the
relation) in our list of resulting relations. If the list contains this
scanrelid, we have found the scan on the table that's going to be modified by
ModifyTable, and remember the motion count. We ignore scans under InitPlans,
since there will not be any scan nodes that perform a scan on the same range
table entry as ModifyTable under InitPlans.

When we encounter the Explicit Redistribute Motion itself, we decide based on
the counter. If the motion counter is not 0, we can't elide the Explicit
Redistribute Motion, since the tuples that are going to be modified were
previously sent to a different segment by a motion. Otherwise, if the motion
counter is 0, we can elide it.

The solution is based on a46400e.
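The elision rule above can be sketched as a plan-tree walk. This is a toy model with invented names; the real logic extends ApplyMotionState in the C executor code:

```python
# Toy model of the Explicit Redistribute elision check (names invented).
class Node:
    def __init__(self, kind, children=(), scanrelid=None):
        self.kind = kind              # 'motion', 'scan', ...
        self.children = list(children)
        self.scanrelid = scanrelid    # set for scan nodes

def motions_above_result_scan(node, result_relids, motions_above=0):
    """Motions between the ModifyTable and the scan of a result relation,
    or None if no such scan exists in this subtree."""
    if node.kind == 'scan' and node.scanrelid in result_relids:
        return motions_above
    bump = 1 if node.kind == 'motion' else 0
    for child in node.children:
        count = motions_above_result_scan(child, result_relids,
                                          motions_above + bump)
        if count is not None:
            return count
    return None

def can_elide_explicit_redistribute(plan, result_relids):
    # Elide only when no motion moved the to-be-modified tuples.
    return motions_above_result_scan(plan, result_relids) == 0
```

A scan directly under the ModifyTable keeps its tuples on their original segment, so the motion is elidable; a motion between the two means the ctids are meaningless on the receiving segment, so the Explicit Redistribute Motion must stay.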
This commit is a cherry-pick of 6b7e4c3.
In summary, `gpexpand` redistribution used to error out when a table had
already been expanded manually by the user or had been damaged. This commit
changes this behaviour so that any expanded/broken tables are skipped without
affecting the redistribution of other tables.

The error message was changed due to a difference between the messages on 6x
and 7x. execSingleton was changed to execSQLForSingleton in mgmt_utils.py to
make it work on 6x. Also, the gpexpand.feature test was brought to a common
standard by adding 'rm -rf ~/gpAdminLogs/gpinitsystem*'. Without this line a
failing test would cause other tests to fail.

Co-authored-by: Dmitry Smal <mialinx@yandex-team.ru>
Co-authored-by: Nihal Jain <jnihal@vmware.com>
Patch 52bbf95 increased memory consumption on
the coordinator and caused the resgroup_cpu_rate_limit test to fail.

This patch adds the ability to create a demo cluster on multiple hosts, which
allows us to create an environment close to the upstream.
It also disables the ORCA planner for resource group tests for two reasons.
First, starting with commit 39de39e, upstream runs the tests without it.
Second, in the resgroup_cpu_rate_limit test, ORCA can take a long time to plan
a query when the CPU is limited, which makes the test fail.
Commit 1502bef introduced a new flaky test. The test expects a panic to occur
in the process that performs the insert. But sometimes the panic happens first
in a process started by the dtx recovery process, which periodically checks for
orphaned transactions, and this leads to an error in the test. The solution is
to panic only the required process, which is done by extracting the session
from pg_stat_activity with a query.
…ns (#886)

Problem description:
Some queries with a lateral join cause a segmentation fault. It happens if the
query has a subquery with aggregate functions and without a GROUP BY clause, or
has more than one level of nesting.

Cause:
Postgres planner for these queries creates a plan with slices, and the lateral
variable is accessed from a different slice. Lateral variables can't be sent
across motions. So, it causes a segmentation fault.

Fix:
The fix is to force the postgres planner to bring such queries to a single
query executor by setting 'force_singleQE' in the PlannerConfig structure.
Previously,
this flag was set only for queries with lateral variables and a subquery with
GROUP BY / LIMIT / DISTINCT clauses. Now this condition is complemented by
checking the presence of:
 - aggregate functions;
 - set operations that can cause motion (INTERSECT, EXCEPT, UNION without ALL);
 - recursive CTEs that also can cause motion.
Plus, a recursive check for all CTEs and SUBQUERY RTEs was added.
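A minimal sketch of the broadened condition; the feature flags are invented names, and the real check inspects the Query tree before setting force_singleQE in PlannerConfig:

```python
# Sketch of the broadened force_singleQE condition (invented feature flags).
def needs_single_qe(q):
    """q: dict of boolean features of a subquery referenced laterally."""
    motion_hazard = (
        q.get('has_group_by', False)
        or q.get('has_limit', False)
        or q.get('has_distinct', False)
        # conditions added by this patch:
        or q.get('has_aggs', False)
        or q.get('has_motion_setop', False)   # INTERSECT/EXCEPT/UNION (no ALL)
        or q.get('has_recursive_cte', False)
    )
    return q.get('has_lateral_vars', False) and motion_hazard
```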
…or segment"

This partially reverts commit 4a5a95b.
The fix to src/backend/access/transam/xlogarchive.c was incomplete.
A restart point or a checkpoint recycling WAL segments treats segments
marked with neither ".done" (archiving is done) or ".ready" (segment is
ready to be archived) in archive_status the same way for archive_mode
being "on" or "always".  While for a primary this is fine, a standby
running a restart point with archive_mode = on would try to mark such a
segment as ready for archiving, which is something that will never
happen except after the standby is promoted.

Note that this problem applies only to WAL segments coming from the
local pg_wal the first time archive recovery is run.  Segments part of a
self-contained base backup are the most common case where this could
happen, however even in this case normally the .done markers would be
most likely part of the backup.  Segments recovered from an archive are
marked as .ready or .done by the startup process, and segments finished
streaming are marked as such by the WAL receiver, so they are handled
already.

Reported-by: Haruka Takatsuka
Author: Michael Paquier
Discussion: https://postgr.es/m/15402-a453c90ed4cf88b2@postgresql.org
Backpatch-through: 9.5, where archive_mode = always has been added.

Cherry-picked from 78ea8b5
The commit adds tap-test to check that WAL files are not deleted at checkpoint.

Long ago, we backported (fa759d1)
archive_mode `always` to 6x. Since then, it has been tested only manually.
Later, there was a bug around WAL recycling depending on archive_mode. Upstream
backported a fix (4a5a95b), but the condition for `XLogArchivingAlways()` was
not backported, because upstream doesn't support it. As a result, WAL files on
mirrors could be considered removable and actually removed during a restart
point. Backporting 78ea8b5 solves the problem; a new tap-test was also added to
prove it works and to show such cases.

The upstream backport already covers WAL files on the master instance, but I
decided to add this check to the test file too. The test file is heavily
inspired by other test files from the `recovery` dir.
…es to ORCA (#746)

Problem description: ORCA fails to handle queries with columns in the target
list, not listed in GROUP BY clause, if grouping is done by a primary key
(meaning that these columns have a functional dependency on grouping columns).
In such a case, it falls back to the standard planner. Expected behavior - ORCA
generates a plan for a query with columns in target list, if these columns have
a functional dependency on columns in GROUP BY list.

Root cause: during query normalization, the function
CQueryMutators::ShouldFallback finds a target list entry that is not a grouping
column, and this triggers fallback to the standard planner.

Fix: All columns from the target list with functional dependency on grouping 
columns are added explicitly to group by clause at the query normalization stage
(at start of groupby normalization, before checking if fallback is required)
before the translation to DXL. It requires the following steps:
1) Extract such columns from target list expressions, and if they do not have a
relevant target list entry, add resjunk target list entries for them.
2) Store current unique target list entry references in groupClause - they will
be required at step 4.
3) Update all grouping sets that contain the primary key - add the functionally 
dependent columns from the target list. 
4) Update arguments of GROUPING functions (based on target list entry references
stored on step 2 and updated target list entry references), because they could
be shifted after step 3.

Plus, new test cases were added to the functional_deps test.
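A toy model of steps 1 and 3 above, assuming the simple case where the whole primary key of one table is grouped; the representation and function name are invented for illustration:

```python
# Toy model of expanding GROUP BY with functionally dependent columns.
def expand_group_by(target_cols, group_by, primary_key, table_cols):
    """If the whole primary key is grouped, the other columns of that table
    are functionally dependent on it and can be added to GROUP BY."""
    expanded = list(group_by)
    if set(primary_key) <= set(group_by):
        for col in target_cols:
            if col in table_cols and col not in expanded:
                expanded.append(col)   # resjunk-style addition
    return expanded
```

After this expansion every target-list column is a grouping column, so the fallback check in ShouldFallback no longer triggers.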
Currently the ORCA-generated query plan does not show aliases, unlike the
original PostgreSQL planner. Having the aliases in the plan is usually much
more descriptive, since they are specified by the user in the query and may
carry meaning.

This patch adds table aliases support to ORCA. Table aliases are added to both
internal structures and DXL schema and for debug representation of internal
objects. Since explain command uses RangeTableEntry fields of PlannedStmt to
extract the alias, we also fill rte->alias field during PlannedStmt creation.

Table aliases are widely used in the tests, so simple adding them requires lots
of tests changes and patch becomes rather huge. To eliminate this, GUC parameter
optimizer_enable_table_alias has been introduced. Default value is on, it
enables table aliases. Setting it to off makes aliases not used, as before the
patch.

The optimizer_enable_table_alias parameter is intended mainly for automated
tests, so tests have to set it explicitly for all the cases. This is done in
two steps:
1. When optimizer=on is used, i.e. we are running ORCA-enabled tests,
the OPTIMIZER_ENABLE_TABLE_ALIAS environment variable is set to off. 
2. At the demo cluster startup, in demo_cluster.sh, if
OPTIMIZER_ENABLE_TABLE_ALIAS is set, the line optimizer_enable_table_alias=off
is appended to the file clusterConfigPostgresAddonsFile which is used to
distribute configuration across demo cluster. Basically,
optimizer_enable_table_alias is needed only for the master node and could be set
on per-connection basis, but this approach is simple, consistent and has minimum
required changes.
Some hash join operations that contained an outer hash join as their inner
part could wrongly enforce data redistribution only for the outer part, which
led to inconsistent join results.

During [CPhysical\*Join] optimization PdshashedMatching function could wrongly
interpret the combined hashed distribution spec derived for the outer join,
which was the inner child of the [CPhysical\*Join] expression being optimized.
The combined hashed spec for the inner child (like hash left join in the test)
is derived from both outer and inner relations (it's done in PdsDerive function
of CPhysicalLeftOuterHashJoin), i.e. the derived distribution spec for the outer
join consists of 2 CDistributionSpecHashed entries, one for each relation. This
concept was introduced by a15fdc9, where the HOJ distribution spec was
reconsidered. Because the join is outer and NULLs are only added to unmatched
rows, the distribution spec for the inner relation has the NullsColocated flag
set to false.
And those 2 entries are represented as linked list, where the inner spec goes
first. Because of that, when we try to find matching distribution for such
combined hashed spec inside the PdshashedMatching function, we could build a
distribution matching the first spec from combined spec (one that has
NullsColocated false) judging only by hash exprs. This logic is valid only for
the cases when the matching distribution does not require NULLs to be co-located
(NullsColocated false). And in cases when matching distribution spec requires
co-located NULLs this logic could lead to a discrepancy between parts of join
expression being optimized, because inner distribution has NullsColocated false,
but the outer (matching) distribution is required to have NullsColocated to be
true (taking into account that the hash exprs are equivalent).

This patch prevents distribution mismatch for described case in the following
way: In case when matching distribution has NullsColocated set to true but the
input distribution has NullsColocated set to false, we try to build a matching
distribution spec based on input's equivalent distribution (pdshashedEquiv). If
none of the equivalent specs from combined distribution satisfies the condition
of NULLs colocation, we eventually raise the exception.
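The matching rule can be sketched roughly like this; the structures are invented stand-ins for CDistributionSpecHashed and its pdshashedEquiv chain:

```python
# Sketch of finding a matching spec inside a combined hashed distribution.
class HashedSpec:
    def __init__(self, exprs, nulls_colocated, equiv=None):
        self.exprs = frozenset(exprs)
        self.nulls_colocated = nulls_colocated
        self.equiv = equiv            # next entry of the combined spec

def matching_spec(combined, required_exprs, require_nulls_colocated):
    spec = combined
    while spec is not None:
        if spec.exprs == frozenset(required_exprs) and (
                not require_nulls_colocated or spec.nulls_colocated):
            return spec               # hash exprs AND NULL co-location match
        spec = spec.equiv             # fall through to equivalent specs
    raise LookupError("no equivalent spec satisfies NULL co-location")
```

The pre-fix behavior corresponds to matching on hash expressions alone, which would incorrectly accept the first (NullsColocated false) entry even when co-located NULLs are required.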
Some operations create temporary files on disk. The growth of temporary files in
conjunction with temporary tables can lead to 100% usage of the data directory
and an emergency stop of the DBMS.

Implement temp_spill_files_tablespaces GUC that controls tablespaces where
temporary files are stored. Temporary tables will still be stored in
temp_tablespaces.

In case temp_spill_files_tablespaces is not set: the behavior will remain the
same, and both files and tables will be stored in the specified table spaces
according to the list in temp_tablespaces, or if it's also empty, in system
table spaces.

If the temp_spill_files_tablespaces is not empty, but temp_tablespaces is empty,
only temporary files will be saved in the specified table spaces. Temporary
table files will still be stored in the system tablespace.

If temp_spill_files_tablespaces is set to the empty string ("") or 'pg_default',
it will not fall back to temp_tablespaces and will instead use the default
tablespace.
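The decision table above, as an illustrative sketch; the '<default>' marker and the function name are invented, and only the GUC names come from the description:

```python
# Sketch of the tablespace choice for temporary spill files.
def spill_file_tablespaces(temp_spill_files_tablespaces, temp_tablespaces):
    if temp_spill_files_tablespaces is None:        # GUC not set at all
        return temp_tablespaces or ['<default>']    # old behaviour preserved
    if temp_spill_files_tablespaces in ('', 'pg_default'):
        return ['<default>']                        # no fallback to temp_tablespaces
    return temp_spill_files_tablespaces.split(',')  # spill files only
```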
The sscanf function is not safe in the sense that it can write data into a
buffer without regard for the buffer's size. To prevent buffer overflows, this
patch adds an explicit maximum read size.
canonical_grpsets is asserted to be not-NULL before the affected condition.
Remove the redundant NULL check.
The sqlca->sqlstate variable is an array of 5 characters without a terminating
0. The patch explicitly adds a size to print this variable correctly.
If fcntl with F_DUPFD_CLOEXEC fails, an attempt is made to duplicate the file
descriptor using a regular dup. But since write_fd becomes -1 as a result of
the fcntl failure, dup will also fail.

This patch restores write_fd to its original value in case of fcntl failure.
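The corrected fallback order can be illustrated in Python, which exposes the same fcntl interface; write_fd is the variable name from the description, the function name is invented:

```python
import fcntl
import os

# Sketch of the corrected fallback: don't clobber write_fd before dup().
def dup_write_fd(write_fd):
    try:
        return fcntl.fcntl(write_fd, fcntl.F_DUPFD_CLOEXEC, 0)
    except OSError:
        # write_fd still holds its original value here, so dup() can work.
        return os.dup(write_fd)
```

The bug was the equivalent of assigning the (failed) fcntl result to write_fd before attempting the dup fallback.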
memcpy was used to copy overlapping buffers in the SWAPN macro. For memcpy,
according to its specification, "if the objects overlap, the behavior is
undefined". Thus, it may potentially cause errors. The memcpy function is
replaced with memmove, which also copies data from one buffer to another one,
but the buffers may overlap according to its specification.
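Why the overlap matters, simulated on a bytearray: a forward byte-by-byte copy (memcpy-like) rereads bytes it has already overwritten, while a memmove-like copy snapshots the source first:

```python
# memcpy-like forward copy vs. memmove-like copy on overlapping ranges.
def copy_forward(buf, dst, src, n):
    for i in range(n):                    # rereads already-overwritten bytes
        buf[dst + i] = buf[src + i]

def copy_safe(buf, dst, src, n):
    buf[dst:dst + n] = bytes(buf[src:src + n])   # snapshot source first
```

With buf = bytearray(b'abcdef') and an overlapping shift (dst=2, src=0, n=4), the forward copy corrupts the data while the safe copy yields the intended 'ababcd'.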
Problem:
Isolation crash recovery tests (for example, uao_crash_compaction_column, 
crash_recovery_dtm, etc.) failed with 'stuck spinlock' panic with very low
reproduction rate.

Cause:
Problem happened in the following case:
1. During the test, a backend process was forced into a panic (according to test
scenario).
2. At the very same moment, the background writer process called 
'SIMPLE_FAULT_INJECTOR("fault_in_background_writer_main")', which it did
regularly from the loop in 'BackgroundWriterMain'. The bg writer acquired the
spinlock
inside the fault injector, but before it released it...
3. Postmaster started to send the SIGQUIT signal to all child processes.
4. Background writer process received SIGQUIT and halted its execution without
releasing the spinlock.
5. In the background writer process, a handler for the SIGQUIT signal was 
invoked - 'bg_quickdie'. And 'bg_quickdie' called 
'SIMPLE_FAULT_INJECTOR("fault_in_background_writer_quickdie");'. As the spinlock
was not released, 'bg_quickdie' hung on the spinlock.
6. The postmaster failed to wait for child processes to complete. After a
timeout it tried to check
'SIMPLE_FAULT_INJECTOR("postmaster_server_loop_no_sigkill")' and hung on the
same spinlock as well.
7. Finally, both postmaster and 'bg_quickdie' failed with 'stuck spinlock'
panic.

Fix:
The general rule is that it is NOT allowed to access the fault injector from
the postmaster process or from any SIGQUIT handler of child processes while the
system is resetting after the crash of some backend. This is because during the
reset the postmaster terminates child processes, and it might terminate a
process that has acquired the spinlock of the fault injector but hasn't
released it yet. So, subsequent calls to the fault injector API from the
postmaster or any SIGQUIT handler will lead to deadlock. Only 'doomed'
processes can still call the fault injector API, as they will soon be
terminated anyway. In accordance with the above, the following changes are
made:
1. Access to the fault injector during a reset is removed from 'bg_quickdie' and 
postmaster's ServerLoop. 
2. The fts_segment_reset test and related code are redesigned. Now, instead of 
sleeping for a delay defined by the test in 'bg_quickdie', postmaster sends 
SIGSTOP to the bg writer, and starts a timer for the delay. Once the delay 
elapses, the timer handler sends SIGCONT and SIGQUIT to the bg writer. This 
logic is triggered if the postmaster detects the new fault 
"postmaster_delay_termination_bg_writer" (which is a replacement for 
"fault_in_background_writer_quickdie" and "postmaster_server_loop_no_sigkill"). 
This fault is checked only when the postmaster is in the PM_RUN state. So it
should be safe to check for it. 

New tests specific to this issue are not added because of the unstable nature of
the problem.
Introduce a new GUC gp_keep_partition_children_locks that preserves the
relation locks on AO partition leaves that are taken during planning.

This fixes concurrency issues such as #16545, where a SELECT pre-dating
a concurrent lazy vacuum can read past eof of a segfile, since the
vacuum would "drop" the segment. Adequate locking during planning is one
way to ensure that vacuumStatement_Relation() can't acquire the
AccessExclusiveLock on the coordinator. This would cause it to give up
on the drop, and the concurrent SELECT isn't compromised.

Unfortunately, we can't quite mark this GUC on by default, as it has
consequences on the amount of locks consumed:

Setting this GUC on means that we keep the acquired locks in
expand_inherited_rtentry() around longer now, up until the end of the
transaction. This can have consequences for transactions dealing with a
large number of partitions, leading us to inch closer to the
max_locks_per_transaction limit. Joins/unions etc with large partition
tables now consume more locks than before.

Notes:
(0) This conforms locking behavior to upstream behavior, that we have
in 7X (for append-optimized relations).

(1) For ORCA, we take the locks during DXL to planned statement
translation, borrowing from expand_inherited_rtentry().

(2) Unlike 7X, neither ORCA nor planner considers the statically pruned
list of partitions for locking.

(3) Currently these locks are taken only for append-optimized leaves,
but the GUC can easily be extended to cover heap tables as well, as
similar concurrency issues do exist.

Co-authored-by: Divyesh Vanjare <vanjared@vmware.com>
Co-authored-by: Soumyadeep Chakraborty <soumyadeep2007@gmail.com>
Co-authored-by: Ashwin Agrawal <aashwin@vmware.com>

-------

Missing DROP echo was added to the end of the
src/test/isolation2/expected/concurrent_select_vacuum_ao_partitioned_table.out
src/test/isolation2/expected/concurrent_select_vacuum_ao_partitioned_table_optimizer.out
files.

(cherry-picked from 4ac7227)
…es to ORCA (#746)

Problem description: ORCA fails to handle queries with columns in the target
list, not listed in GROUP BY clause, if grouping is done by a primary key
(meaning that these columns have a functional dependency on grouping columns).
In such a case, it falls back to the standard planner. Expected behavior - ORCA
generates a plan for a query with columns in target list, if these columns have
a functional dependency on columns in GROUP BY list.

Root cause: Function CQueryMutators::ShouldFallback during query normalization
finds out that there is a target list entry, that is not a grouping column, and
it triggers fallback to the standard planner.

Fix: All columns from the target list with functional dependency on grouping
columns are added explicitly to group by clause at the query normalization stage
(at start of groupby normalization, before checking if fallback is required)
before the translation to DXL. It requires the following steps:
1) Extract such columns from target list expressions, and if they do not have a
relevant target list entry, add resjunk target list entries for them.
2) Store current unique target list entry references in groupClause - they will
be required at step 4.
3) Update all grouping sets that contain the primary key - add the functionally
dependent columns from the target list.
4) Update arguments of GROUPING functions (based on target list entry references
stored on step 2 and updated target list entry references), because they could
be shifted after step 3.

Plus, new test cases were added to the functional_deps test.

-------

An #include clause was moved in src/include/gpopt/gpdbwrappers.h to resolve
conflicts.

(cherry-picked from commit b1373f8)
…#295)

Commit fa287d0 introduced early lock release on partitions within operations
on a partitioned parent table. This was presumably done to save memory in the
lock manager. But such behavior causes issues, e.g., a possible segfault when
we drop a partitioned table while a concurrent `pg_total_relation_size()` call
runs on some AO partition.

Obviously, we have to restore full locking for inheritance hierarchy as
it's done in vanilla PostgreSQL. To eliminate exhaustion of lock memory
we need to increase max_locks_per_transactions setting. Memory is cheap,
and if we're working with tens of thousands of partitions, we can afford
reserving some memory for the lock manager.

For DML operations, the problem of partition locking was partially resolved in
commit 99fbc8e. Therefore, in the current work we complete that fix to support
partition locking for DDL operations and other cases. It is in fact a backport
of the similar master commit 86ded36 with small additions. A test scenario was
also added to reproduce the segfault case noted above, along with a related
assert check inside `calculate_table_size()`.

Backport of PR https://github.com/greenplum-db/gpdb/pull/10468.
Fixes https://github.com/greenplum-db/gpdb/issues/12962

-------

Changes from 4ac7227 in src/backend/optimizer/prep/prepunion.c were overwritten
with changes from this commit.

(cherry-picked from commit 2763ff1)
…on. (#619)

When a heap tuple is updated by the legacy planner and the updated tuple is
placed on the same page (heap-only tuple, HOT), an update chain is created.
It's a chain of updated tuples, in which each tuple's ctid points to the next
tuple in the chain.

HOT chains allow storing only one index entry, which points to the first tuple
in the chain. During an Index Scan we pass through the chain, and the first
tuple visible to the current transaction is taken (for more information, see
src/backend/access/heap/README.HOT).

If we create a second index on a column that has been updated, it will store
the ctid of the beginning of the existing HOT chain. If a repeatable read
transaction started before the transaction in which the second index was
created, this index could be used in the query plan. As a result of a search
over this index, a tuple could be found that does not meet the search condition
(by a new value that is not visible to the transaction).

In the case of the legacy planner, this problem is solved the following way:

"To address this issue, regular (non-concurrent) CREATE INDEX makes the
new index usable only by new transactions and transactions that don't
have snapshots older than the CREATE INDEX command. This prevents
queries that can see the inconsistent HOT chains from trying to use the
new index and getting incorrect results. Queries that can see the index
can only see the rows that were visible after the index was created,
hence the HOT chains are consistent for them."

But ORCA does not handle this case and can use an index with a broken HOT-chain.

This patch resolves the issue for ORCA in the same way as legacy planner. During
planning we ignore newly created indexes based on their xmin.

Additionally, ORCA faced another related problem. Since ORCA has its own cache
(MD Cache) and can cache a relation object without an index that cannot be used
under the current snapshot (because the MDCacheSetTransientState function
returns true), we would not be able to use the index after the problematic
snapshot changes. Therefore, we need to reset the cache after the snapshot
changes in order to use the index.

This patch solves the problem in the following way: during index filtering, if
we encounter an index that we cannot use, we save TransactionXmin in the
mdcache_transaction_xmin variable. In the next queries, we check the saved xmin,
and if it is valid and differs from the current one, we reset the cache.

The create_index_hot test has also been changed: the optimizer is now turned
off before the update, since ORCA always uses Split Update, in which case HOT
chains are not created and the problem is not reproduced. That is also why ORCA
wasn't actually tested before.

(cherry-picked from commit 5894018)
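A sketch of the cache-reset rule described above; the class shape is invented, and only TransactionXmin and the mdcache_transaction_xmin idea come from the description:

```python
# Sketch of resetting the MD cache when TransactionXmin changes.
class MDCache:
    def __init__(self):
        self.entries = {}
        self.saved_xmin = None        # stands in for mdcache_transaction_xmin

    def on_unusable_index(self, current_xmin):
        self.saved_xmin = current_xmin   # remember why the index was skipped

    def maybe_reset(self, current_xmin):
        if self.saved_xmin is not None and self.saved_xmin != current_xmin:
            self.entries.clear()         # snapshot moved on: drop stale cache
            self.saved_xmin = None
            return True
        return False
```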
The make_partition_rules function expects pElem to be non-NULL, but the check
performed before the call still allows pElem to be NULL.

This patch corrects the condition to prevent calling make_partition_rules with
pElem being NULL.
Also, fix pager enable selection for such cases, and other cleanups for
zero-column output.

Report by Thom Brown

cherry-picked from 376a0c4
It's at least theoretically possible to overflow int32 when adding up
column width estimates to make a row width estimate.  (The bug example
isn't terribly convincing as a real use-case, but perhaps wide joins
would provide a more plausible route to trouble.)  This'd lead to
assertion failures or silly planner behavior.  To forestall it, make
the relevant functions compute their running sums in int64 arithmetic
and then clamp to int32 range at the end.  We can reasonably assume
that MaxAllocSize is a hard limit on actual tuple width, so clamping
to that is simply a correction for dubious input values, and there's
no need to go as far as widening width variables to int64 everywhere.

Per bug #18247 from RekGRpth.  There've been no reports of this issue
arising in practical cases, so I feel no need to back-patch.

Richard Guo and Tom Lane

Discussion: https://postgr.es/m/18247-11ac477f02954422@postgresql.org

This is a backport of the original PostgreSQL commit postgres/postgres@7e1ce2b
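The clamping approach can be sketched with Python ints standing in for the int64 accumulator used in the C code (MaxAllocSize is 0x3fffffff in PostgreSQL):

```python
MAX_ALLOC_SIZE = 0x3FFFFFFF   # PostgreSQL's MaxAllocSize

def clamped_row_width(col_widths):
    total = 0                 # int64-style running sum in the C code
    for w in col_widths:
        total += w
    return min(total, MAX_ALLOC_SIZE)   # clamp back to int32 range at the end
```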
Problem:
A static code analyzer found a potential issue with the 'pNode'/'pNode2'
pointers, which are dereferenced before being checked for NULL.

Fix:
1. An assert is added at the beginning of the function to ensure that 'pNode'
and the accessed fields of 'pNode' are not NULL.
2. An assert is added at the beginning of the loop to ensure that the accessed
fields of 'pNode2' are not NULL.
3. The redundant check for 'pNode2' not being NULL is removed, as it is checked
by the 'while' condition.
4. An assert is added after the update of 'pNode2' to ensure that the accessed
fields of 'pNode2' are not NULL.
5. The second redundant check for 'pNode2' not being NULL is removed, as it is
guaranteed above by the statements 'if (prule && prule->children)' and
'pNode2 = prule->children;'.
dreamedcheng and others added 22 commits April 11, 2024 08:49
Note that lc_monetary and lc_time are related to formatting
output data. Besides, formatting functions will be pushed down
to QEs in some common cases. So to keep the output data consistent
with the locale value set in QD, we need to sync these GUCs
between QD and QEs.

Co-authored-by: wuchengwen <wcw190496@alibaba-inc.com>

This is a backport of commit e1a4fdb

In the test, the query with the call to 'pg_terminate_backend' has been
wrapped in an 'ignore' section. The reason is that on 6x this query is planned
in such a way that the function is called on the segments with arguments (PIDs)
collected from the whole cluster. That resulted in several warnings "WARNING:
PID XXXXXX is not a PostgreSQL server process" (with a different count each
time) before it actually terminated the backend process.
When the fault `checkpoint_after_redo_calculated` or
'checkpoint_control_file_updated' was injected in a newly promoted mirror node,
the connection failed with the fatal log 'mirror is being promoted'. That is,
when the connection state is MIRROR_READY but the role changes to primary
during promotion, the connection is declined to avoid confusion.

Wait for the segment promotion to finish and the segment to accept connections
before injecting the fault.

This is a backport of commit 01d9c59.
The connectSeg function has been adapted for the old Python.

Co-authored-by: Xing Guo <higuoxing@gmail.com>
Co-authored-by: wenru yan <ywenru@vmware.com>
#921)

If the resource group tests fail, the script should return a non-zero exit
code. But bash returns the exit code of the last command; this led to the
script always exiting with a zero exit code.

This patch adds strict mode to the bash script and makes the shutdown of
containers a separate function that is called on exit. The command that runs
the tests is now the last command, which ensures the script will return its
exit code.

Ticket: ADBDEV-5289
Unlike TOAST tables, AO aux tables are linked to an extension during its
creation. When a table is dropped from an extension using ALTER, the aux tables
are left as extension dependencies. This means we can't DROP the table after
that, because the AO aux tables are still used by the extension. In other
words, this behavior makes it impossible to DROP an AO table created by an
extension without dropping the extension.

The goal of the patch is to change this behavior, allowing to DROP a table.
Unnecessary dependencies are now not created in heap_create_with_catalog(). The
behavior is the same as for TOAST tables, which are aux tables too. The
dependency created in CreateAOAuxiliaryTable() is enough.

The patch contains a test that describes target behavior.

This is a backport of dfe49e3. The test was changed because 6X has no
test_ext* extensions. To achieve the same result, a new test extension was
made.
Backend process GUCs marked as GUC_GPDB_NEED_SYNC are overwritten by the
coordinator upon receiving a new connection. The coordinator sends its
configuration to the segments the same way as if it were a frontend
communicating with the coordinator. This means the default priority for these
GUCs becomes the client priority.

Previously, when gpstop -u was executed, it sent a SIGHUP signal, prompting the
coordinator and segments to re-read their configuration files. However, the
priority of options derived from the configuration file is lower than that of
client options. This difference caused the segments to ignore overwritten
GUCs when they re-read their configuration files, and these GUCs were not
re-dispatched by the QD either. This situation resulted in a desynchronization
of GUCs that are required to be synchronized between the coordinator and the
segments.

The solution is to explicitly dispatch the changed configuration from the
coordinator to the segments, similar to how it is done when initializing the
backend process.

When the QD receives a SIGHUP signal, it reads its configuration file to
identify any GUCs that have been modified. If these GUCs are marked with the
GUC_GPDB_NEED_SYNC flag, the QD dispatches SET commands containing these
modified GUCs to the QEs. To differentiate between this type of synchronization
and user-initiated SET commands, a flag GP_OPT_SYNCHRONIZATION_SET has been
introduced to the distributed transaction options.
In a897532 the new GUC
gp_keep_partition_children_locks was introduced. This setting made ORCA take
locks on child AO partitions during the planning of a SELECT statement. This
behaviour protects the concurrent execution of a SELECT on the root partition
from other commands like VACUUM modifying a leaf AO partition. Without locks at
the QD, the SELECT could dispatch an invalid snapshot for accessing AO table
metadata, because the VACUUM could already have committed and broken the leaf
partition access for the SELECT on a certain segment. The GUC is disabled by
default on the grounds that taking locks on all the partitions (the mentioned
commit takes locks on the whole partition hierarchy in ORCA) can exhaust the
available locks and waste lock manager memory. The commit above changes the
planner's behaviour in the same way, but during its backport the planner's
previous behaviour remained unchanged (74fb2b3): the planner takes locks
unconditionally on all partitions of various storage types.

The described commit mainly addresses the issue with AO tables and concurrent
VACUUM. However, similar issues exist for heap tables. Consider the scenario
where a SELECT from the root partition and a TRUNCATE on a leaf partition are
performed concurrently. Without locks on the leaf partitions at the QD, the
SELECT could return the data of a leaf partition from only one segment, because
on the other segments the TRUNCATE had managed to truncate the relation files
before the SELECT started to scan them. On the first segment the TRUNCATE went
after the SELECT, resulting in broken atomicity of the SELECT.

Therefore, this patch addresses the concurrency issue for SELECT in a more
generalized way. First, the GUC is now enabled by default: while VACUUM merely
causes SELECT to throw an error without returning incorrect results, TRUNCATE
can break SELECT's atomicity and return unexpected data, which could lead to
inconsistent behaviour. Second, locks are acquired on both AO and heap leaf
partitions, protecting the SELECT execution from any destructive operations.
Additionally, in order to reduce the number of locks, this patch makes ORCA
take locks only on statically selected partitions when possible. In other cases
ORCA behaves like the planner, which takes locks on all partitions. The lock
mode is determined in EvalLockModeForChildRelations() and
expand_inherited_rtentry() respectively.
…r. (#899)

When authorizing via LDAP, the ldap_simple_bind_s function may return an error.
This occurs when an internal call to the ldap_int_poll utility function is
interrupted by a signal. The openldap library can handle this properly using the
LDAP_OPT_RESTART option. Setting this option to LDAP_OPT_ON causes the select()
system call to be restarted when it is interrupted by a signal. The default
value is LDAP_OPT_OFF. This patch sets the value to LDAP_OPT_ON after ldap is
initialized (this also requires explicitly specifying the -llber library to
build).

Tests are not provided because they require a separate LDAP container and the
error is not reproducible in a stable manner.
Problem:
A segfault happened in queries with the 'get_ao_compression_ratio' function in
the target list when the argument for this function was taken from a table. The
problem was reproduced in the release build. In the development build the same
queries led to an assertion failure.

Cause:
The function 'get_ao_compression_ratio' executes a query on the
'pg_aoseg.pg_aoseg_' table to calculate the compression ratio. A function
that accesses hash-distributed or randomly distributed tables must be executed
on the coordinator. But the planner created plans in which
'get_ao_compression_ratio' was executed on segments. Inside
'get_ao_compression_ratio' there was an assert to check this case, but in the
release build the assert was compiled out. So the code of the function was
executed on a segment and got unexpected results from 'pg_aoseg.pg_aoseg_',
which led to a segfault.

Fix:
The assert is changed to a runtime check. Now, if the planner creates a plan in
which the function is executed on segments, the user gets a proper error
message. The same is done for the similar functions 'gp_update_ao_master_stats'
and 'get_ao_distribution'. Also, unreachable code is removed from internal
functions.

Note: it seems that the fully proper way would be to mark these functions as
PROEXECLOCATION_MASTER instead of PROEXECLOCATION_ANY, but that would require
changing the system catalog, which is acceptable only between major version
updates. So this approach is not used.

Note 2: for an empty column-oriented table 'get_ao_compression_ratio' returns
-1, but for an empty row-oriented table it returns 1. This behavior is
inconsistent, but it is not altered by this commit, as it is not related to the
original issue. It is reflected in the added test case. Be aware.
Problem:
The function 'adb_collect_table_stats' failed to execute after the creation of
a table partitioned by a column of a non-timestamp type (in particular, by a
'bigint' column).

Cause:
There is a query in 'adb_collect_table_stats' which checks whether there is a
partition of 'db_files_history' that covers today's date. When this query was
planned, the filter that checks the partitioning was planned to be executed
before the filter on relname. Thus the query tried to cast 'bigint' to
'timestamp', which led to an error.

Fix:
The query is rewritten so that the filter that checks the partitioning is
executed after all other filters. Also, the 'adb_collect_table_stats_test' test
is updated to check the case with an additional partition in
'db_files_history'. This is required to verify the filter that checks the
presence of a partition for the current month.
…ion. (#872)"

This patch did not solve the problem when the dropped column was not at the
end, since when skipping dropped columns, the discrepancy between the attribute
numbers in the source and target relations was not taken into account.

This reverts commit 10ec6fb.
While copying column ACLs when adding a new partition, we assumed that the
attribute numbers in the source and target relations would match, which is not
true when there are dropped columns in the original relation. In this case, we
would try to get an ACL for a non-existent column in the target table, which
would lead to a segfault.

This patch adds a separate iterator over the list of columns in the target
relation to correctly skip dropped columns in the original relation and to
obtain the correct attribute number for the target relation. We assume that
there are no dropped columns in the target relation, since it is a newly
created relation.
Currently, a gpfdist transformation can only be applied to a table source at
the external table level, by specifying the #transform tag. Sometimes it may be
inconvenient to specify the same transformation for all source files. The goal
of the patch is to implement new flags with the default transformation names
for input and output transformations. The default transformation(s) can be set
at gpfdist start and are applied only if no table-level transformation is
specified.

The new flags are subject to only basic checks. The idea was to keep the same
behavior and check transformation existence only at runtime.
The test shows the behavior described in the patch goal.
pip from the CentOS 7 repository can no longer update itself to the actual
version (20.3.4) because of an SSL error.

This patch replaces the installation of pip from the repository and the
subsequent update with installing the actual version via the get-pip.py script.
If the columns of a join USING condition have different types, the parser
represents them as RelabelType nodes to indicate that these columns probably
need type adjustment. Orca failed to process such join conditions due to assert
restrictions during the DXL translation of the join expression.

This patch allows RelabelType nodes to be processed by relaxing the assertion.
This is safe and no additional changes are required because Orca is able to
handle nodes of RelabelType.
…ckets in IPv6 disabled environments (#679)"

This reverts commit 5beccaa.
`SendDummyPacket` eventually calls `getaddrinfo` (which is reentrant);
however, `getaddrinfo` is not an async-signal-safe function.
`getaddrinfo` internally calls `malloc`, which is strongly advised
against within a signal handler as it may cause deadlocks.
Cache the accepted socket information for the listener, so that it can
be reused in `SendDummyPacket()`.

The purpose of `SendDummyPacket` is to exit more quickly; it
circumvents the polling that happens, which otherwise times out after 250ms.

Without `SendDummyPacket()`, there will be multiple test failures since
some tests expect the backend connection to terminate almost
immediately.

For the list of async-signal-safe functions, see
the signal-safety(7) Linux manual page.

====================

Accommodate for AF_INET6 and AF_INET when doing a motion layer IPC teardown

Previously, in commit 70306db, we removed pg_getaddrinfo_all from
signal handlers. However, in doing so, the capability of supporting both
AF_INET6 and AF_INET was lost; this responsibility must now be handled
by us. The commit mentioned above fixed the issue for AF_INET (IPv4),
but not for AF_INET6 (IPv6).

This commit addresses the situation for both AF_INET and AF_INET6.

====================

This commit backports: 70306db and
1ee34a4.
Conflicts resolved:
- setupUDPListeningSocket is not one-to-one with 7X, so the conflict on
  the location where `listenerSockaddr` is `memcpy`ed was resolved.
- conflicts caused by the differences between the modernized version and
  the refactor from 50748b7 were resolved.

Reviewed-by: Soumyadeep Chakraborty <soumyadeep2007@gmail.com>

(cherry-picked from 7f3c91f)
This patch reimplements changes introduced in 5beccaa (which backported 70306db,
1ee34a4) and fixes them by correctly handling IPv6 addresses, including
link-local ones.

In these patches, a fix was created for UDPIFC that reworked SendDummyPacket()
by removing the unnecessary call to getaddrinfo() to avoid using malloc() in a
signal handler, making setupUDPListeningSocket cache the address of the socket
used for the UDPIFC interconnect when it was first created.

The original patch (5beccaa) didn't account for the fact that the
AI_ADDRCONFIG flag does not protect us from link-local addresses (and other
non-global addresses), which are assigned automatically upon connecting to some
networks. These addresses cause a false positive even with that flag.

When IPv6 is disabled, all IPv6 addresses are removed from the network
interfaces. However, upon reconnecting to the network, these addresses reappear,
even though IPv6 remains disabled, with a single exception being loopback
interface, IPv6 addresses of which do not reappear.

AI_ADDRCONFIG usage is replaced with a manual check, which looks for the first
available IPv6 address and simultaneously makes sure that the loopback interface
has an IPv6 address assigned.

This is a rework of commit 5beccaa.
…ucts.

tokenize_file() now returns a single list of TokenizedLine structs,
carrying the same information as before.  We were otherwise going to grow a
fourth list to deal with error messages, and that was getting a bit silly.

Haribabu Kommi, revised a bit by me

Discussion: https://postgr.es/m/CAJrrPGfbgbKsjYp=bgZXhMcgxoaGSoBb9fyjrDoOW_YymXv1Kw@mail.gmail.com

This is full backport of commit 350cb92
This is a partial backport of a676201 by Georgy Shelkovy <g.shelkovy@arenadata.io>;
it adds only one auxiliary function, strlist_to_textarray, which returns a TEXT
array out of a list of C-strings.
This view is designed so that it shows what is currently in the file, not what
the postmaster has loaded as the active settings. That allows it to be used to
pre-vet edits before issuing SIGHUP. As with the earlier view, we go out of our
way to allow errors in the file to be reflected in the view, to assist that
use-case.

(We might at some point invent a view to show the current active settings,
but this is not that patch; and it's not trivial to do.)

Haribabu Kommi, reviewed by Ashutosh Bapat, Michael Paquier, Simon Riggs,
and myself

Discussion: https://postgr.es/m/CAJrrPGerH4jiwpcXT1-46QXUDmNp2QDrG9+-Tek_xC8APHShYw@mail.gmail.com

This is a partial backport of de16ab7 by Georgy Shelkovy <g.shelkovy@arenadata.io>,
without backporting the catalog changes and the handling of changes to SSL
settings in version 10.
This patch adds a public interface to the pg_hba_file_rules function via the
arenadata_toolkit extension. The new function is called adb_hba_file_rules and
the new view is called adb_hba_file_rules_view.
Allure report https://allure.adsw.io/launch/71412

Failed job Regression tests with ORCA on x86_64: https://gitlab.adsw.io/arenadata/github_mirroring/gpdb/-/jobs/1430262

@Stolb27 Stolb27 merged commit 25d47a7 into adb-6.x May 24, 2024
5 checks passed