Skip to content

ADBDEV-2281 ADB 5.29.1 sync#285

Merged
deart2k merged 19 commits intoadb-5.xfrom
5.29.1-sync
Nov 23, 2021
Merged

ADBDEV-2281 ADB 5.29.1 sync#285
deart2k merged 19 commits intoadb-5.xfrom
5.29.1-sync

Conversation

@deart2k
Copy link
Member

@deart2k deart2k commented Nov 19, 2021

Here are some reminders before you submit the pull request

  • Add tests for the change
  • Document changes
  • Communicate in the mailing list if needed
  • Pass make installcheck
  • Review a PR in return to support the community

Lisa Owen and others added 18 commits September 29, 2021 13:41
* docs - ditamap updates for 5.29 release

* update oss book subnavs
Co-authored-by: David Yozie <dyozie@pivotal.io>
Before current commit we initailized pg_aocsseg entries with frozen
tuples to make these entries implicitly visible even after rollback.
It is a working approach for AO tables, but AOC tables store additional
"vpinfo" structure with serialized information about every column EOF.
This can cause inconsistency between "vpinfo" structure in pg_aocsseg
entries and the actual number of columns in a table when we modify the
amount of columns under explicit transaction and rollback it later.
As a result we fail on asserts in "GetAllAOCSFileSegInfo_pg_aocsseg_rel()"
or (in a case of a build without asserts) in some unpredictable places
in the code when retrieve metadata from pg_aocsseg.

As a fix this commit inserts simple tuples to pg_aocsseg but still insert
frozen tuples to gp_fastsequence (as it generates row numbers for segment
files and should never be rollbacked to avoid inconsistency in index pointers
of the AO/AOC tables).

Co-authored-by: Vasiliy Ivanov <ivi@arenadata.io>
Co-authored-by: Denis Smirnov <sd@arenadata.io>
This relaxes the need to have Debug_filerep_print on for cases where the
resync manager skips a relfile for resync. There are some improvements
made to these log messages along the way.

Additionally, we now log by default whenever a resync worker updates a
resync hash table entry as completed.

This is done in response to bug reports where a full recovery, following
an interrupted resync from an incremental recovery, has inadvertently
skipped over scheduling relfiles. The relaxed logging can give us clues
as to why certain relfiles were skipped - as all files should be
scheduled for resync, when running a full recovery.

Co-authored-by: Ashwin Agrawal <aashwin@vmware.com>
This was introduced in aa6c213. fsObjName is not initialized before
the logging statement. Use relFileNode instead to log the persistent
object being skipped.

This was causing PANICs such as:
Unexpected persistent file-system object type: 17419088
We were using the passed in entry in the log statement. Use the updated
entry instead, as the passed in entry might not have all of its fields
populated - as is the case for the callsite in
FileRepPrimary_ResyncBufferPoolIncrementalWrite().

Also, in case the lookup returned NULL, we used to log as if nothing
happened. So, we add an appropriate WARNING here.

Co-authored-by: Jimmy Yih <jyih@pivotal.io>
Remove behave step "the database is killed on hosts mdw,sdw1,sdw2" in
favor of "the database is not running. This shuts down the databse more
cleanly, and avoids any potential race conditons.

(cherry picked from commit cb5d0ba)

This was causing the gpaddmirrors behave tests to flake with the
following error during gpinitsystem:

gpinitsystem:mdw:gpadmin-[FATAL]:-Host sdw1 has an active database process on port = 20500 Script Exiting!:

Conflicts:

* Decided to keep:
@given('the database is killed on hosts "{hostnames}"')
since it's still used by gpexpand tests. Removing that can be tackled in
a separate commit.

* Removed the rm -rf /tmp/gpaddmirrors/* statements - we need the
datadirs to gpstop: Incidentally, this was also removed n 6X commit:
76a8111

* check_database_is_running() was getting the wrong value from
master_data_directory. Using os.environ['MASTER_DATA_DIRECTORY']
instead.

Co-authored-by: Jimmy Yih <jyih@vmware.com>
When primary and unique key constraints are added to a partition
hierarchy with an ALTER TABLE ADD statement on the root table, the
auto-generated name of the constraint differs on the partition root vs
the partition children.  As a result, when the same constraint is
dropped from the parent, the constraint is NOT dropped from the
partition children. To make matters worse, any attempt at dropping the
constraint from the children directly, results in:
ERROR:  can't drop a constraint from <child_table_name>; it is part of a partitioned table
HINT:  You may be able to perform the operation on the partitioned table as a whole.

The above was first raised in issue: #3750 and was fixed in 6X+ by #7047
which employs large changes to constraint and partition table
infrastructure, making it not viable to backport.

Furthermore, renaming the constraints is not viable as a workaround, as
renaming constraints is not supported in 5X.

Thus, this commit adds gp_enable_drop_key_constraint_child_partition, a
GUC that provides the user a way to drop the primary/unique key
constraint directly from the partition children.

Note: the GUC does not apply to:
i) CHECK constraints - they don't have the naming inconsistency
ii) FOREIGN KEY constraints - they are not auto-created on partition
children.
iii) NOT NULL constraints = they are not auto-created on partition children.

Furthermore, the GUC should not be necessary for primary/unique key
constraints added at table creation time - as any constraints added at
table creation time don't suffer from the naming inconsistency described
above. Thus dropping such constraints from the root should automatically
drop the constraints from the children.

Co-authored-by: Jimmy Yih <jyih@vmware.com>
Our getnameinfo() replacement implementation in getaddrinfo.c failed
unless NI_NUMERICHOST and NI_NUMERICSERV were given as flags, because
it doesn't resolve host names, only numeric IPs.  But per standard,
when those flags are not given, an implementation can still degrade to
not returning host names, so this restriction is unnecessary.  When we
remove it, we can eliminate some code in postmaster.c that apparently
tried to work around that.

Backported from upstream Postgres:
postgres/postgres@c424d0d

There were some minor merge conflicts because of a previous backport:
https://github.com/greenplum-db/gpdb/commit/a9768e32a0a310bd7e7a45c089898542670bf2c2
…too.

Per testing of previous patch.

Backported from upstream Postgres:
postgres/postgres@1997f34
There was a regression caused by a previous incomplete backport that
made it so that invalid WARNING logs were being logged when
log_hostname GUC was enabled and a user connects via AF_UNIX
socket. As part of completing the backport to fix the issue, a change
was made to getnameinfo_unix() to remove an unnecessary
restriction. Add simple CMockery unit test to prevent future
regressions.

Example of WARNING log message:
pg_getnameinfo_all() failed: Non-recoverable failure in name resolution

Incomplete backport GPDB commit:
https://github.com/greenplum-db/gpdb/commit/a9768e32a0a310bd7e7a45c089898542670bf2c2

Related missing upstream Postgres patch that fixes the issue:
postgres/postgres@1997f34
The ip.t CMockery unit test was failing on SuSE because the
sockaddr_un struct is different on SuSE compared to other Linux
operating systems. Fix the issue by just using the sockaddr_un struct
directly to set the sun_path.

GPDB commit reference:
https://github.com/greenplum-db/gpdb/commit/fc13ac07b59a33a80ff35d6a1d1dccf17ff69cea
@deart2k deart2k requested review from Stolb27 and maksm90 November 19, 2021 13:08
Copy link
Collaborator

@Stolb27 Stolb27 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maksm90, please take a look in the morning. I've requested disabling building on power for adb-5.x. Failed isolation test is a known issue that is going to be resolved in https://arenadata.atlassian.net/browse/ADBDEV-2154.

@Stolb27
Copy link
Collaborator

Stolb27 commented Nov 23, 2021

@deart2k, it's ready to merge

@deart2k deart2k merged commit bd04fb7 into adb-5.x Nov 23, 2021
@deart2k deart2k deleted the 5.29.1-sync branch November 23, 2021 04:41
RekGRpth added a commit that referenced this pull request Mar 2, 2026
There was a 11 years old bug in the python function fetching postmaster
PID for remote host. The bash command had unescaped $ symbol. As a
result part of awk command was interpolated as bash argument before ssh.
So awk do nothing (`awk {print }`), full ps output grepped by pid and
any part of string could match, e.g. postmaster port. Moreover, even PID
could be matched as a substring for longest PIDs. The last problem
affected function for local search too. So sometimes gprecoverseg could
kill arbitrary postgres processes on the same segment host.

We simplify bash command with proper escaping and cover these functions
by unit tests

Ticket: GG-227

Co-authored-by: Georgy Shelkovy <g.shelkovy@arenadata.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants