Add commits from Pg14 archive branch #1840
Open
leborchuk wants to merge 127 commits into
This commit performs a comprehensive license compliance cleanup to align
with the release requirements pointed out by the Incubator PMC
review.
The main changes include:
1. Add License Headers: Added the standard Apache License Version 2.0
header to numerous source files that were missing it. This covers
multiple file types, including YAML, Markdown, SQL, C/C++, Python,
and shell scripts. These files were originally created by the
Cloudberry community.
2. Simplify LICENSE and NOTICE:
- Restructured the root LICENSE file for better clarity.
- Cleaned up the NOTICE file by removing redundant information that
is already listed in LICENSE.
3. Remove the unused deployment docs from `deploy/build`, which makes
the file licenses easier to manage.
4. Update RAT Configuration: Updated `pom.xml` to reflect the changes of
the file license headers and attribution.
See: #1236
Regenerate configure file from configure.ac by autoconf
…tats_ext_exprs. (#1551) This is the postgres/postgres@c342538 commit, applied to Cloudberry. It applied without issues; the only changes are to the gporca expected output. The original commit message follows. === The catalog view pg_stats_ext fails to consider privileges for expression statistics. The catalog view pg_stats_ext_exprs fails to consider privileges and row-level security policies. To fix, restrict the data in these views to table owners or roles that inherit privileges of the table owner. It may be possible to apply less restrictive privilege checks in some cases, but that is left as a future exercise. Furthermore, for pg_stats_ext_exprs, do not return data for tables with row-level security enabled, as is already done for pg_stats_ext. On the back-branches, a fix-CVE-2024-4317.sql script is provided that will install into the "share" directory. This file can be used to apply the fix to existing clusters. Bumps catversion on 'master' branch only. Reported-by: Lukas Fittl Reviewed-by: Noah Misch, Tomas Vondra, Tom Lane Security: CVE-2024-4317 Backpatch-through: 14
setup_cdb_schema() checked errno after a readdir() loop without resetting it beforehand. In some environments (e.g., Ubuntu 24.04), a stale errno value from operations inside the loop (such as pg_realloc or pg_strdup) could persist, causing readdir's normal termination to be misinterpreted as a failure (e.g., "Function not implemented").
This commit fixes the issue by adopting the standard PostgreSQL idiom:
- Use "while (errno = 0, (file = readdir(dir)) != NULL)" to ensure errno is cleared strictly before each readdir() call.
- Move closedir() after the errno check to prevent it from overwriting the error code from readdir().
- Add defensive error checking for the closedir() call itself.
This ensures robust directory scanning and reliable error reporting during cluster initialization.
Fix SyntaxWarning caused by invalid escape sequences in mainUtils.py and logfilter.py. These warnings appear on Python 3.12+ (e.g., Ubuntu 24.04) and will become SyntaxError in Python 3.14. Changes: - mainUtils.py: Use raw strings for shell commands containing '\$' - logfilter.py: Use raw docstrings for functions containing regex examples See: #1587
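The difference between the two spellings can be observed with a minimal sketch like the one below. The sample command string is illustrative and is not taken from mainUtils.py or logfilter.py; the sketch assumes an interpreter where an invalid escape is still a warning (DeprecationWarning before Python 3.12, SyntaxWarning from 3.12 on).

```python
import warnings

# Source text compiled at runtime so the warning can be captured.
# The compiled code sees: cmd = 'echo \$HOME'  (invalid escape '\$')
BAD_SOURCE = "cmd = 'echo \\$HOME'"
# The compiled code sees: cmd = r'echo \$HOME' (raw string, no escape processing)
GOOD_SOURCE = "cmd = r'echo \\$HOME'"


def escape_warnings(source):
    """Compile `source` and return the escape-related warnings it triggers."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<example>", "exec")
    return [w for w in caught
            if issubclass(w.category, (SyntaxWarning, DeprecationWarning))]


bad = escape_warnings(BAD_SOURCE)
good = escape_warnings(GOOD_SOURCE)

# Both spellings yield the same string value; only the raw form is warning-free.
ns_bad, ns_good = {}, {}
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    exec(compile(BAD_SOURCE, "<example>", "exec"), ns_bad)
exec(compile(GOOD_SOURCE, "<example>", "exec"), ns_good)
same_value = ns_bad["cmd"] == ns_good["cmd"]
```

Because the resulting string is identical, switching to a raw string silences the warning without changing the shell command that is eventually run.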
Correct a typo in LICENSE that referenced the ISC license file with a duplicated directory name. No functional change.
* remove unused build_xerces.py
When CREATE TABLESPACE ... LOCATION '' is dispatched from QD to QE, the serialization converts the empty string to NULL. On the segment, pstrdup(stmt->location) crashes with SIGSEGV because stmt->location is NULL. Add a NULL guard to treat NULL the same as empty string, preserving in-place tablespace semantics. Fixes #1627
movedb() acquires a session-level AccessExclusiveLock on the database
via MoveDbSessionLockAcquire(). The release in CommitTransaction()
only checked for GP_ROLE_DISPATCH and IS_SINGLENODE(), missing
GP_ROLE_UTILITY. This caused the lock to leak in standalone backends
(e.g. TAP tests), triggering a proc.c assertion failure at exit:
FailedAssertion("SHMQueueEmpty(&(MyProc->myProcLocks[i]))")
Add GP_ROLE_UTILITY to the release condition.
Also fix a spurious "could not read symbolic link" log message when
dropping in-place tablespaces: readlink() on a directory returns
EINVAL, which is expected and can be safely skipped.
Fixes #1626
Two fixes:
- Use TestLib::slurp_file instead of PostgreSQL::Test::Utils::slurp_file
in adjust_conf(). PostgresNode.pm only imports TestLib, not
PostgreSQL::Test::Utils, so the latter is undefined and causes
t/101_restore_point_and_startup_pause to fail.
- Replace cp with install -m 644 in enable_archiving() archive_command.
coreutils 8.32 (Rocky 8, Ubuntu 22.04) uses copy_file_range() in cp
which crashes on Docker overlayfs. install does not use
copy_file_range(), avoiding the crash.
Also add ic-recovery test suite to rocky8 and deb CI pipelines.
Guard pfree/list_free calls with pointer-equality checks to avoid freeing live nodes when flatten_join_alias_vars returns the same pointer unchanged (e.g., outer-reference Vars with varlevelsup > 0). The unconditional pfree(havingQual) freed the Var node, whose memory was later reused by palloc for a T_List. copyObjectImpl then copied the wrong node type into havingQual, causing ORCA to encounter an unexpected RangeTblEntry and fall back to the Postgres planner. Applies the same guard pattern to all six fields: targetList, returningList, havingQual, scatterClause, limitOffset, limitCount. Reported-in: #1618
Force correlated execution (SubPlan) for scalar subqueries with GROUP BY () and a correlated HAVING clause. Previously ORCA decorrelated such subqueries into Left Outer Join + COALESCE(count,0), which incorrectly returned 0 instead of NULL when the HAVING condition was false. Add FHasCorrelatedSelectAboveGbAgg() to detect the pattern where NormalizeHaving() has converted the HAVING clause into a CLogicalSelect with outer refs above a CLogicalGbAgg with empty grouping columns. When detected, set m_fCorrelatedExecution = true in Psd() to bypass the COALESCE decorrelation path. Update groupingsets_optimizer.out expected output to reflect the new ORCA SubPlan format instead of Postgres planner fallback. Reported-in: #1618
Add a new AQUMV code path that rewrites multi-table JOIN queries to
scan materialized views when the query exactly matches the MV
definition. This compares the saved raw parse tree against stored
viewQuery from gp_matview_aux, bypassing the single-table AQUMV
logic entirely.
This enables significant query acceleration for common analytical
patterns: instead of repeatedly computing expensive multi-table joins
at query time, the planner can directly read pre-computed results
from the materialized view, turning O(N*M) join operations into a
simple sequential scan.
For example, given:
    CREATE MATERIALIZED VIEW mv AS
      SELECT t1.a, t2.b FROM t1 JOIN t2 ON t1.a = t2.a;
    -- Before (GUC off): original join plan
    Gather Motion 3:1
      -> Hash Join
           Hash Cond: (t1.a = t2.a)
           -> Seq Scan on t1
           -> Hash
                -> Seq Scan on t2
    -- After (GUC on): rewritten to MV scan
    Gather Motion 3:1
      -> Seq Scan on mv
Fix three issues in aqumv_query_is_exact_match():
- Add groupDistinct comparison (GROUP BY vs GROUP BY DISTINCT)
- Add limitOption comparison (LIMIT vs FETCH FIRST WITH TIES)
- Clear qp_extra in-place via aqumv_context->qp_extra instead of
  allocating a local char array; move standard_qp_extra typedef to
  planner.h so aqumv.c can reference the proper struct type
Add test cases 26-28 to verify the new comparisons:
- LIMIT vs FETCH FIRST WITH TIES non-match and exact match
- GROUP BY DISTINCT vs GROUP BY non-match
The rewrite cleared sortClause, so grouping_planner() skipped adding a Sort node, and queries with ORDER BY returned unsorted results from the MV scan. Fix: preserve sortClause and copy ressortgroupref to rewritten target entries so the upper planner generates Sort correctly.
Before: Limit -> Gather -> Limit -> Seq Scan on mv
After: Limit -> Gather -> Limit -> Sort -> Seq Scan on mv
When the relkind of a relcache entry changes, because a table is converted into a view, pgstats can get confused in 15+, leading to crashes or assertion failures. For HEAD, Tom fixed this in b23cd18, by removing support for converting a table to a view, removing the source of the inconsistency. This commit just adds an assertion that a relcache entry's relkind does not change, just in case we end up with another case of that in the future. As there are no cases of changing relkind anymore, we can't add a test that that's handled correctly. For 15, fix the problem by not maintaining the association with the old pgstat entry when the relkind changes during a relcache invalidation processing. In that case the pgstat entry needs to be unlinked first, to avoid PgStat_TableStatus->relation getting out of sync. Also add a test reproducing the issues. No known problem exists in 11-14, so just add the test there. Reported-by: vignesh C <vignesh21@gmail.com> Author: Andres Freund <andres@anarazel.de> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CALDaNm2yXz+zOtv7y5zBd5WKT8O0Ld3YxikuU3dcyCvxF7gypA@mail.gmail.com Discussion: https://postgr.es/m/CALDaNm3oZA-8Wbps2Jd1g5_Gjrr-x3YWrJPek-mF5Asrrvz2Dg@mail.gmail.com Backpatch: 15-
When pg_dump retrieves the list of database objects and performs the data dump, there was a possibility that objects could be replaced with others of the same name, such as views, and then accessed. This vulnerability could result in code execution with superuser privileges during the pg_dump process. This issue can arise when dumping data of sequences, foreign tables (only 13 or later), or tables registered with a WHERE clause in the extension configuration table. To address this, pg_dump now utilizes the newly introduced restrict_nonsystem_relation_kind GUC parameter to restrict the accesses to non-system views and foreign tables during the dump process. This new GUC parameter is added to back branches too, but these changes do not require cluster recreation. Back-patch to all supported branches. Reviewed-by: Noah Misch Security: CVE-2024-7348 Backpatch-through: 12
This is a further update based on the PR: - #1625
Remove generation and installation of diskquota-build-info from diskquota to avoid writing extra files into the install prefix root (e.g. $GPHOME or /usr/local/cloudberry-db). Drop the unused CMake helper cmake/BuildInfo.cmake and remove cmake/Git.cmake and its invocation now that no build-info is produced.
It was always false in single-user mode, in autovacuum workers, and in background workers. This had no specifically-identified security consequences, but non-core code or future work might make it security-relevant. Back-patch to v11 (all supported versions). Jelte Fennema-Nio. Reported by Jelte Fennema-Nio.
This commit changes libpq so that errors reported by the backend during the protocol negotiation for SSL and GSS are discarded by the client, as these may include bytes that could be consumed by the client and write arbitrary bytes to a client's terminal. A failure with the SSL negotiation now leads to an error immediately reported, without a retry on any other methods allowed, like a fallback to a plaintext connection. A failure with GSS discards the error message received, and we allow a fallback as it may be possible that the error is caused by a connection attempt with a pre-11 server, GSS encryption having been introduced in v12. This was a problem only with v17 and newer versions; older versions discard the error message already in this case, assuming a failure caused by a lack of support for GSS encryption. Author: Jacob Champion Reviewed-by: Peter Eisentraut, Heikki Linnakangas, Michael Paquier Security: CVE-2024-10977 Backpatch-through: 12 Back-ported-by: reshke <reshke@double.cloud> ====== CBDB source commit is https://git.postgresql.org/cgit/postgresql.git/commit/?h=e6c9454764d880ee30735aa8c1e05d3674722ff9
Full and right outer joins were not supported in the initial implementation of Parallel Hash Join because of deadlock hazards (see discussion). Therefore FULL JOIN inhibited parallelism, as the other join strategies can't do that in parallel either. Add a new PHJ phase PHJ_BATCH_SCAN that scans for unmatched tuples on the inner side of one batch's hash table. For now, sidestep the deadlock problem by terminating parallelism there. The last process to arrive at that phase emits the unmatched tuples, while others detach and are free to go and work on other batches, if there are any, but otherwise they finish the join early. That unfairness is considered acceptable for now, because it's better than no parallelism at all. The build and probe phases are run in parallel, and the new scan-for-unmatched phase, while serial, is usually applied to the smaller of the two relations and is either limited by some multiple of work_mem, or it's too big and is partitioned into batches and then the situation is improved by batch-level parallelism. Author: Melanie Plageman <melanieplageman@gmail.com> Author: Thomas Munro <thomas.munro@gmail.com> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/CA%2BhUKG%2BA6ftXPz4oe92%2Bx8Er%2BxpGZqto70-Q_ERwRaSyA%3DafNg%40mail.gmail.com
Hash join tuples reuse the HOT status bit to indicate match status during hash join execution. Correct reuse requires clearing the bit in all tuples. Serial hash join and parallel multi-batch hash join do so upon inserting the tuple into the hashtable. Single batch parallel hash join and batch 0 of unexpected multi-batch hash joins forgot to do this. It hadn't come up before because hashtable tuple match bits are only used for right and full outer joins and parallel ROJ and FOJ were unsupported. 11c2d6f introduced support for parallel ROJ/FOJ but neglected to ensure the match bits were reset. Author: Melanie Plageman <melanieplageman@gmail.com> Reported-by: Richard Guo <guofenglinux@gmail.com> Discussion: https://postgr.es/m/flat/CAMbWs48Nde1Mv%3DBJv6_vXmRKHMuHZm2Q_g4F6Z3_pn%2B3EV6BGQ%40mail.gmail.com
As reported by buildfarm member conchuela, one of the regression tests added by 558c9d7 is having some ordering issues. This commit adds an ORDER BY clause to make the output more stable for the problematic query. Fix suggested by Tom Lane. The plan of the query updated still uses a parallel hash full join. Author: Melanie Plageman Discussion: https://postgr.es/m/623596.1684541098@sss.pgh.pa.us
PostgreSQL originally excluded FULL and RIGHT outer joins from parallel
hash join because of deadlock hazards in the per-batch barrier protocol.
PostgreSQL 16 resolved this by introducing a dedicated PHJ_BATCH_SCAN
phase: one elected worker emits unmatched inner-side rows after probing,
while the others detach and move on.
In CBDB, distributed execution adds a second dimension: after a full
outer join the unmatched NULL-filled rows may come from any segment, so
the result carries a HashedOJ locus rather than a plain Hashed locus.
This change teaches the parallel planner about that:
- FULL JOIN and RIGHT JOIN are now valid parallel join types in the
distributed planner. Previously they were unconditionally rejected,
forcing serial execution across all segments.
- The HashedOJ locus produced by a parallel full join now carries
parallel_workers, so operators above the join (aggregates, further
joins) can remain parallel.
- A crash that could occur when a parallel LASJ_NOTIN (NOT IN) join
encountered NULL inner keys is fixed. The worker would exit early
but the batch barrier, which was never attached to, would be touched
on shutdown causing an assertion failure.
Example plans (3 segments, parallel_workers=2):
-- FULL JOIN: result locus is HashedOJ with Parallel Workers: 2
EXPLAIN(costs off, locus)
SELECT count(*) FROM t1 FULL JOIN t2 USING (id);
 Finalize Aggregate
   Locus: Entry
   -> Gather Motion 6:1 (slice1; segments: 6)
        -> Partial Aggregate
             Locus: HashedOJ
             Parallel Workers: 2
             -> Parallel Hash Full Join
                  Locus: HashedOJ
                  Parallel Workers: 2
                  Hash Cond: (t1.id = t2.id)
                  -> Parallel Seq Scan on t1
                       Locus: HashedWorkers
                  -> Parallel Hash
                       -> Parallel Seq Scan on t2
                            Locus: HashedWorkers
-- RIGHT JOIN: when t1 is larger the planner hashes the smaller t2
-- and probes with t1; result locus HashedWorkers
EXPLAIN(costs off, locus)
SELECT count(*) FROM t1 RIGHT JOIN t2 USING (id);
 Finalize Aggregate
   Locus: Entry
   -> Gather Motion 6:1 (slice1; segments: 6)
        -> Partial Aggregate
             Locus: HashedWorkers
             Parallel Workers: 2
             -> Parallel Hash Right Join
                  Locus: HashedWorkers
                  Parallel Workers: 2
                  Hash Cond: (t1.id = t2.id)
                  -> Parallel Seq Scan on t1
                       Locus: HashedWorkers
                  -> Parallel Hash
                       -> Parallel Seq Scan on t2
                            Locus: HashedWorkers
Performance (3 segments x 2 parallel workers, 6M rows each, 50% overlap):
  FULL JOIN   parallel: 4040 ms   serial: 6347 ms   speedup: 1.57x
  RIGHT JOIN  parallel: 3039 ms   serial: 5568 ms   speedup: 1.83x
cbdb_parallel.sql: add a new test block covering:
- Parallel Hash Full Join (HashedWorkers FULL JOIN HashedWorkers
produces HashedOJ with parallel_workers=2)
- Parallel Hash Right Join (pj_t1 is 3x larger than pj_t2, so the
planner hashes the smaller pj_t2 and probes with pj_t1; result
locus HashedWorkers)
- Correctness checks: count(*) matches serial execution
- Locus propagation: HashedOJ(parallel) followed by INNER JOIN
produces HashedOJ; followed by FULL JOIN produces HashedOJ
join_hash.sql/out: CBDB-specific adaptations for the upstream parallel
full join test -- disable parallel mode for tests that require serial
plans, fix SAVEPOINT inside a parallel worker context, and update
expected output to match CBDB plan shapes.
MotionLayerState accumulates stats across all MotionNodeEntry instances. Per-node entries already use uint64, but the global counters were still 32-bit. Because the global sum is at least as large as any individual node's value, it overflows first, at 4GB. Fix by widening the global counters to uint64. Also fix the debug elog() format specifiers to match.
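The wraparound can be illustrated with a small sketch that emulates C unsigned arithmetic by masking. The three nodes and their byte counts are made up for the example and do not come from a real workload.

```python
UINT32_MAX = 2**32 - 1
UINT64_MAX = 2**64 - 1


def accumulate(total, added, mask):
    """Add `added` to `total` with C-style unsigned wraparound at `mask`."""
    return (total + added) & mask


# Three hypothetical motion nodes, each reporting 1.5 GiB of traffic.
# Every individual value fits comfortably in 32 bits.
per_node_bytes = [3 * 2**29] * 3

sum32 = 0  # emulates a 32-bit global counter
sum64 = 0  # emulates the widened 64-bit global counter
for nbytes in per_node_bytes:
    sum32 = accumulate(sum32, nbytes, UINT32_MAX)
    sum64 = accumulate(sum64, nbytes, UINT64_MAX)

# The true total is 4.5 GiB; the 32-bit sum has wrapped to 0.5 GiB.
wrapped = sum32 != sum64
```

No single node is close to the 32-bit limit here, yet the aggregate already wraps, which is why the global counter is the first to go wrong.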
…resgroup_status This is a defect of the original GPDB. When resource group management is enabled and the transaction switches from the "Assign" state to the "Bypass" state, the "num_executed" counter is incremented more than once.
If we are building with openssl but USE_SSL_ENGINE didn't get set, initialize_SSL's variable "pkey" is declared but used nowhere. Apparently this combination hasn't been exercised in the buildfarm before now, because I've not seen this warning before, even though the code has been like this a long time. Move the declaration to silence the warning (and remove its useless initialization). Per buildfarm member sawshark. Back-patch to all supported branches.
We have two sections in a Makefile: one for CPP_OBJS and one for OBJS. CPP_OBJS uses wildcards, and src/protos is included both in CPP_OBJS and in OBJS. So the generated gcc command line includes the proto *.o files multiple times, which leads to multiple-definition errors at link time. Do not include proto files in the CPP_OBJS macro; include them only in the OBJS macro.
Some functions are used in the tree and are currently marked as deprecated by upstream. This commit refreshes the code to use the recommended functions, leading to the following changes: - xmlSubstituteEntitiesDefault() is gone, and needs to be replaced with XML_PARSE_NOENT for the paths doing the parsing. - xmlParseMemory() -> xmlReadMemory(). These functions, as well as more functions setting global states, have been officially marked as deprecated by upstream in August 2022. Their replacements exist since the 2001-ish area, as far as I have checked, so that should be safe. Author: Dmitry Koval Discussion: https://postgr.es/m/18274-98d16bc03520665f@postgresql.org
ORCA's window frame translation always emits a BETWEEN frame (start + end bound), so include FRAMEOPTION_BETWEEN alongside FRAMEOPTION_NONDEFAULT to match the executor's expectations.
…ated host (#1702) * Fix null dereference on dedicated hot standby coordinator getCdbComponentInfo() populates hostPrimaryCountHash with primary hosts only. When IS_HOT_STANDBY_QD() is true, mirror and standby hosts are also looked up in the hash but return NULL on dedicated standby nodes that host no primary segments. Replace Assert(found) with a null-safe check to prevent SIGSEGV.
…nsumer (#1719) * orca: fallback to Postgres optimizer on cross-slice replicated CTE Consumer. Inspired by greengage 51fe92e: before Expr->DXL translation, walk the physical tree and track which slice each CTE Producer and Consumer lives on. If a Consumer is on a different slice than its Producer and the Producer's distribution is replicated, force a fallback to the Postgres optimizer. The replicated filter is essential: ordinary cross-slice CTE plans (non-replicated Producer with Gather/Redistribute Consumer) are a normal ORCA pattern and must not trigger fallback. 51fe92e doesn't trigger when a CTE over a replicated table is referenced from a scalar subquery, so the query hangs. This commit replaces the single-point check with a whole-tree walker that catches both cases. Tests: shared_scan adds a scalar-subquery reproducer guarded by statement_timeout. qp_orca_fallback adds two cases over a replicated CTE: a scalar-subquery form that triggers the walker (the hang case 51fe92e missed -- fallback to Postgres), and the original 51fe92e JOIN form where ORCA emits a safe plan with a One-Time Filter (gp_execution_segment() = N) and the walker correctly stays silent (guards against false positives). (cherry picked from commit open-gpdb/gpdb@3a9aebf)
Update Go version to 1.25.10 across all development Docker images for Rocky Linux 8/9/10 and Ubuntu 22.04/24.04. Changes: - Go version: go1.24.13 -> go1.25.10 - Updated SHA256 checksums for linux-amd64 and linux-arm64 archives See: apache/cloudberry-go-libs#19 (comment)
A scalar (plain) aggregate with no grouping columns always emits exactly one row regardless of input cardinality. Predicates above it (from a HAVING clause) filter that output row, so they cannot be moved onto the aggregate's input without changing semantics: SELECT count(*) FROM t HAVING false -- 0 rows SELECT count(*) FROM t WHERE false -- 1 row (count=0) CNormalizer::FPushable previously only blocked pushing volatile predicates below a GbAgg. Any other predicate -- including a constant false -- was considered pushable because its used-column set was trivially contained in the aggregate's output columns. The normalizer then routed the Select's predicate through the GbAgg and down into its logical child, dropping HAVING semantics for scalar aggregates.
Add comprehensive parallel table scan capability to GPORCA optimizer, enabling worker-level parallelism within segments for improved query performance on large table scans. Key components: - New CPhysicalParallelTableScan operator and CDistributionSpecWorkerRandom distribution specification for worker-level data distribution - CXformGet2ParallelTableScan transformation with parallel safety checks (excludes CTEs, dynamic scans, foreign tables, replicated tables, etc.) - Cost model integration with parallel_setup_cost and efficiency degradation scaling (logarithmic based on worker count) - DXL serialization/deserialization for CDXLPhysicalParallelTableScan - Plan translation to PostgreSQL SeqScan nodes with parallel_aware=true - Rewindability constraints (parallel scans are non-rewindable) - GUC integration: max_parallel_workers_per_gather controls worker count
Add libicu-devel package to Rocky Linux 8, 9, and 10 Dockerfiles to provide ICU (International Components for Unicode) library support required for PostgreSQL 16 kernel compilation. This dependency is already present in Ubuntu 22.04 and Ubuntu 24.04 development images, ensuring consistency across all supported build platforms for PostgreSQL 16 compilation requirements.
The command `gppkg --clean` fails with the following error: "'SyncPackages' object has no attribute 'ret'". This occurs because `operations` was being passed positionally during the OperationWorkerPool initialization, which incorrectly bound it to the `should_stop` argument instead of `items` in the base WorkerPool class. The solution is to pass `operations` as a keyword argument.
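A minimal sketch of this binding mistake follows. The class bodies and the exact parameter order are hypothetical simplifications; only the names `WorkerPool`, `OperationWorkerPool`, `should_stop`, `items`, and `operations` come from the description above.

```python
class WorkerPool:
    # Hypothetical, simplified signature: `should_stop` precedes `items`.
    def __init__(self, numWorkers=4, should_stop=False, items=None):
        self.numWorkers = numWorkers
        self.should_stop = should_stop
        self.items = list(items) if items is not None else []


class BuggyOperationWorkerPool(WorkerPool):
    def __init__(self, numWorkers=4, operations=None):
        # Positional call: `operations` lands in the `should_stop` slot.
        super().__init__(numWorkers, operations)


class FixedOperationWorkerPool(WorkerPool):
    def __init__(self, numWorkers=4, operations=None):
        # Keyword call: `operations` is bound to `items` regardless of order.
        super().__init__(numWorkers, items=operations)


ops = ["sync-package-a", "sync-package-b"]  # stand-ins for operation objects
buggy = BuggyOperationWorkerPool(2, ops)
fixed = FixedOperationWorkerPool(2, ops)
```

In the buggy variant the pool has no work items, so nothing is ever executed and any result attribute the caller later reads is never set. Passing by keyword keeps the call correct even if the base class gains or reorders parameters.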
* Fix: FDW OPTIONS encoding accepts symbolic names (issue #1726) Both the FDW catalog reader (src/backend/access/external/external.c) and the gp_exttable_fdw option validator (gpcontrib/gp_exttable_fdw/option.c) parsed the "encoding" OPTIONS value with atoi(). atoi("UTF8") returns 0 (PG_SQL_ASCII) and PG_VALID_ENCODING(0) is true, so symbolic names like 'UTF8', 'utf-8', 'GBK' silently fell through validation and were stored as SQL_ASCII at read time. By contrast, the legacy CREATE EXTERNAL TABLE ... ENCODING ... path resolves names via pg_char_to_encoding() and persists a numeric form into OPTIONS — only the FDW OPTIONS entry point bypassed that translation. Add a small shared helper parse_fdw_encoding_option(const char *) in src/backend/access/external/external.c (declared in src/include/access/external.h): - first try pg_char_to_encoding(name) — same logic as the legacy path; - otherwise try a strict numeric form via strtol() with end-of-string and PG_VALID_ENCODING() checks (atoi is intentionally avoided, since atoi("UTF8")==0 is the bug being fixed); - otherwise ereport(ERROR). Both the validator and GetExtFromForeignTableOptions() call this helper. On-disk values in pg_foreign_table.ftoptions are stored verbatim as the user wrote them; correctness is established at read time. This avoids a ProcessUtility_hook approach, which is unworkable here because the extension's _PG_init runs lazily on the first dlopen, after the current statement's hook check has already passed. Affected scope: gp_exttable_fdw (used by gp_exttable_server). The standalone pxf_fdw is unaffected — its validator already routes encoding through ProcessCopyOptions, which is name-aware. Behavior change on upgrade: existing rows whose ftoptions literally contain encoding=<name> have, until now, been silently interpreted as SQL_ASCII. After this fix they are interpreted as the named encoding. 
This will be called out in the release notes; a detection query is provided in the PR description for operators who wish to pin specific tables to numeric form before upgrade. Tests added in gpcontrib/gp_exttable_fdw/{input,output}/gp_exttable_fdw.source cover encoding '6' / 'UTF8' / 'utf-8' / 'GBK' / 'bogus' and an ALTER FOREIGN TABLE ... OPTIONS (SET encoding 'UTF8') path. The pre-existing encoding '-1' error case has its expected error message updated to match the new helper's wording. * test: pad expected output headers to match psql separator widths The new tests added in the previous commit had column header lines without the trailing-space padding that psql's aligned output emits to match the separator. The pre-existing ext_special_uri header (' a | b') was also unintentionally stripped of its trailing space during the same edit. Pure whitespace fix. No behavior change. * test: drop trailing blank line in gp_exttable_fdw expected output pg_regress diffs the expected and actual .out files strictly, including the final newline count. The new encoding test block ended with a stray empty line (";\n\n") while psql produces ";\n", causing a 1-line diff at end-of-file. Pure whitespace fix. * test: reject mixed numeric+letters in FDW encoding option Add a regression case for `encoding '6abc'`. atoi("6abc") would have silently returned 6 (= UTF8), which is the class of bug that motivated moving the FDW encoding option parser off atoi() and onto a strict strtol() form in parse_fdw_encoding_option(). Without this test, the strictness of the numeric path was not directly exercised — only the "unknown name" path ('bogus') was. Pure test addition; no code change. Lands the third of the reviewer's suggestions on issue #1726 (the first two — strict strtol parsing and a single shared helper between the validator and the read path — were already in place in the original fix commit). 
* ci: retrigger to clear flaky alter_distribution_policy --------- Co-authored-by: chenqiang <chenqiang@hashdata.cn>
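The parsing order described for parse_fdw_encoding_option() (name lookup, then a strict number, then an error) can be sketched as below. This is an illustration only, not the C helper: the two-entry table is a stand-in for the server's encoding list (only 0 = SQL_ASCII and 6 = UTF8 are taken from the text above), `c_atoi` merely mimics C `atoi()` for contrast, and the numeric path here is an all-digits check rather than a faithful `strtol()` emulation.

```python
import re

# Tiny stand-in for the server's encoding table (assumption for this sketch).
ENCODINGS = {"sqlascii": 0, "utf8": 6}
VALID_IDS = set(ENCODINGS.values())


def c_atoi(text):
    """Mimic C atoi(): parse leading digits, return 0 if there are none."""
    m = re.match(r"\s*([+-]?\d+)", text)
    return int(m.group(1)) if m else 0


def parse_encoding_option(text):
    """Try a name lookup first, then a strict number, else raise."""
    key = re.sub(r"[^a-z0-9]", "", text.lower())  # 'utf-8' -> 'utf8'
    if key in ENCODINGS:
        return ENCODINGS[key]
    # Strict numeric form: the whole string must be digits and a known id.
    if re.fullmatch(r"\d+", text) and int(text) in VALID_IDS:
        return int(text)
    raise ValueError(f"invalid encoding option: {text!r}")


def rejected(text):
    """Return True if parse_encoding_option() refuses `text`."""
    try:
        parse_encoding_option(text)
    except ValueError:
        return True
    return False
```

With the lenient parser, `c_atoi("UTF8")` yields 0 and `c_atoi("6abc")` yields 6, which are exactly the two silent misreadings the fix and its regression tests target. The strict version maps 'UTF8' and 'utf-8' to 6 and refuses '6abc', 'bogus', and '-1'.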
ClearAOCSFileSegInfo/ClearFileSegInfo (called from ao_vacuum_rel_recycle_dead_segments) updates pg_aoseg rows via simple_heap_update, which assigns the current CommandId to the new tuple. AppendOptimizedTruncateToEOF then opens a catalog snapshot via GetCatalogSnapshot, which also uses GetCurrentCommandId. Because both operations share the same CommandId, the just-zeroed rows are invisible to the snapshot (cid >= snapshot->curcid), while the old rows with their original non-zero EOF values remain visible. TruncateAOSegmentFile then sees a 0-byte physical file but a non-zero logical EOF and raises: "file size smaller than logical eof" Advancing the command counter before AppendOptimizedTruncateToEOF ensures the zeroed rows are visible to its catalog snapshot (their cid is now strictly less than the new curcid). Fixes: #1746
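The visibility rule behind this fix can be shown with a toy model. It is a deliberately simplified sketch of same-transaction visibility (a row version is seen only if it was created by an earlier command and not yet replaced by an earlier command); the class, field names, and the EOF value 8192 are hypothetical and are not PostgreSQL or Cloudberry structures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RowVersion:
    eof: int                          # logical EOF stored in the row
    cid: int                          # command id that created this version
    replaced_cid: Optional[int] = None  # command id that superseded it


def visible(v, curcid):
    """A version is visible if created before curcid and not replaced before it."""
    created = v.cid < curcid
    replaced = v.replaced_cid is not None and v.replaced_cid < curcid
    return created and not replaced


def visible_eof(versions, curcid):
    """Return the EOF of the single version a snapshot at `curcid` can see."""
    [row] = [v for v in versions if visible(v, curcid)]
    return row.eof


# Command 1 zeroes the EOF: the old version is replaced, a new one is created.
versions = [
    RowVersion(eof=8192, cid=0, replaced_cid=1),
    RowVersion(eof=0, cid=1),
]

eof_same_command = visible_eof(versions, curcid=1)    # snapshot in command 1
eof_after_increment = visible_eof(versions, curcid=2)  # after advancing the counter
```

In this model the snapshot taken within the same command still sees the stale non-zero EOF, while a snapshot taken after the counter advances sees the zeroed row, which matches the sequence of events described above.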
ReleaseSysCache(htup) was called before NameStr(staForm->stxname) was read, returning a pointer into the already-released tuple buffer. Copy the name with pstrdup() first, then release the cache entry.
This PR fixes the recovery flow when the internal WAL replication slot does not already exist on the source segment. Before this change, both gpsegrecovery and gpconfigurenewsegment would start pg_basebackup first and only retry with slot creation after the backup failed. In practice, that meant a full base backup could run for a long time and then fail at the end because the slot was missing.
This change fixes that at the root:
- adds a shared helper to check whether the replication slot already exists
- creates the slot up front when needed, before pg_basebackup starts
- removes the fallback second pg_basebackup attempt from both recovery paths
- updates unit tests to cover the new behavior and the new failure mode
---------
Co-authored-by: Leonid <63977577+leborchuk@users.noreply.github.com>
When expanding a cluster, gpexpand copies the postgresql.conf file directly from the template segment (content 0). This causes issues for tools like wal-g which use a --content-id flag in archive_command and restore_command. Previously, new segments inherited --content-id=0 from the template. This caused them to push WAL segments to the wrong location, potentially overwriting segment 0's segments. This fix ensures the content ID in archive_command and restore_command is updated to match the new segment's ID during expansion. If the commands do not contain the --content-id flag, they remain unchanged.
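One way to express this rewrite is a small regular-expression substitution, sketched below. The function name, the pattern, and the sample commands are assumptions made for illustration; the actual gpexpand change may be implemented differently.

```python
import re

# Matches '--content-id=<n>' or '--content-id <n>' (n may be negative).
CONTENT_ID_RE = re.compile(r"(--content-id[= ])(-?\d+)")


def rewrite_content_id(command, new_content_id):
    """Replace the value of --content-id; leave other commands untouched."""
    return CONTENT_ID_RE.sub(
        lambda m: f"{m.group(1)}{new_content_id}", command
    )


# Hypothetical archive_command inherited from the template segment (content 0).
template_cmd = "wal-g wal-push %p --content-id=0 --config /etc/wal-g.yaml"
new_cmd = rewrite_content_id(template_cmd, 3)

# A command without the flag is returned unchanged.
plain_cmd = "cp %p /archive/%f"
unchanged = rewrite_content_id(plain_cmd, 3)
```

Using a callable as the replacement avoids ambiguity between the group reference and the digits of the new content ID that a template string such as `\1` followed by `3` would have.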
Historically, Yandex Greenplum has allowed non-superusers to manage resource groups: a regular non-superuser role is allowed to run pg_resgroup_move_query() and tune CPU/memory limits if granted mdb_admin. This feature was introduced as early as 6.22; see also gpdb commit 3ac99962. This commit introduces the same feature for managed Cloudberry. To disallow altering predefined roles, fixed-OID hardening is used, reserving OID 8067 as the mdb_admin role OID. We chose this (effectively a catalog change) over complex bookkeeping of what CREATEROLE can do and what is disallowed. We use the Yandex managed predefined roles bootstrap utility via an auxiliary contrib extension, based on what the Yandex Postgres fork does; see also the pg-sharding/cpg repo. Co-authored-by: Andrey Borodin <x4mmm@yandex-team.ru> Co-authored-by: reshke <reshke@double.cloud>
9b7e2b2 to
9768d92
reshke
approved these changes
Jul 2, 2026
This is the set of remaining commits from the PG14_ARCHIVE branch that weren't committed to REL_2_STABLE.
I created a special rebased PG_14_ARCHIVE branch that leaves out the commits that break catalog compatibility (or that depend on the removed commits):
Generate gp_ view for desired pg_ system views
Fix system_views_gp.in and fix test query_conflict
Fix names of pg_stat_all_tables|indexes
Add gp_stat_progress_%_summary system views
Add gp summary system views
Fix: Adapt system view after cherry-pick
PAX stats test: wait for seq_tup_read before exiting wait_for_stats() (047ed41)
Added new commits to fix tests:
Fix: change main version back to 2