Release 6.16.3 arenadata22 #222

Merged
Stolb27 merged 7 commits into adb-6.x from 6.16.3_arenadata22 on Jul 9, 2021

Stolb27 (collaborator) commented on Jul 9, 2021

PR to the 6X master branch to test the fixes together.

Fixed:

  1. ADBDEV-1710 Fix catalog snapshot AO/AOCO data corruption
  2. ADBDEV-1532 Detect client disconnection while running query and immediately interrupt its execution
  3. ADBDEV-1729 Fix segfault on execution of multilevel correlated queries

darthunix and others added 7 commits June 25, 2021 10:07
In 5X, a snapshot satisfying "now" was used when a transaction extracted
the AO/AOCO segment file EOF. In other words, "now" allowed us to see all
committed transactions. When this code was ported to 6X, a catalog
snapshot was chosen for this purpose. As a result, we don't see
transactions that committed after the snapshot was taken, so a
concurrent transaction can change the segment file EOF between the
moment the snapshot is taken and the moment the exclusive segment file
lock is acquired. So AO/AOCO tables in 6X suffer from data corruption
under concurrent DML queries.

This is the first commit demonstrating the problem (all added tests
should fail).
The previous tests showed how to get the AO/AOCO data corruption in
mixed mode, where cluster-wide queries were mixed with utility-mode
ones. The new test proves that we get the same corruption with
concurrent cluster-wide queries as well.
This commit fixes the problem by using SnapshotSelf instead of the
catalog snapshot: it allows us to see all committed data, as "now" did
in 5X. Added a new test to demonstrate that there are no anomalies
after a two-phase transaction has been prepared on a QE when
SnapshotSelf is used.
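The race described in these commits can be illustrated with a toy model. This is a sketch under loudly stated assumptions: `SegmentFile`, `eof_as_of`, `eof_now` and `append_and_commit` are hypothetical names, not GPDB's aosegfiles API. `eof_as_of` plays the role of the catalog snapshot (it sees only EOF values committed before the snapshot was taken), while `eof_now` stands in for the 5X "now" / SnapshotSelf behavior of seeing every committed EOF.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SegmentFile:
    """Toy model of one AO segment file; not the real GPDB storage code."""
    data: bytearray = field(default_factory=bytearray)
    # Committed EOF values as (commit_time, eof) pairs, oldest first.
    eof_history: List[Tuple[int, int]] = field(default_factory=lambda: [(0, 0)])
    clock: int = 0

    def eof_as_of(self, snapshot_time: int) -> int:
        """EOF visible to an MVCC snapshot taken at snapshot_time (catalog-snapshot-like)."""
        visible = [eof for t, eof in self.eof_history if t <= snapshot_time]
        return visible[-1]

    def eof_now(self) -> int:
        """EOF visible when every committed change is seen ("now"/SnapshotSelf-like)."""
        return self.eof_history[-1][1]

    def append_and_commit(self, offset: int, payload: bytes) -> None:
        """Write payload at offset (overwriting whatever is there) and commit the new EOF."""
        self.data[offset:offset + len(payload)] = payload
        self.clock += 1
        self.eof_history.append((self.clock, offset + len(payload)))


def run(use_now: bool) -> bytes:
    seg = SegmentFile()
    snapshot_a = seg.clock  # writer A takes its snapshot first
    # Writer B appends and commits while A is still waiting for the segment lock.
    seg.append_and_commit(seg.eof_now(), b"BBBB")
    # A now holds the exclusive segment lock and reads the EOF.
    eof = seg.eof_now() if use_now else seg.eof_as_of(snapshot_a)
    seg.append_and_commit(eof, b"AAAA")
    return bytes(seg.data)
```

With the stale snapshot, writer A appends at offset 0 and silently overwrites B's committed rows; reading the latest committed EOF after acquiring the lock makes A append after them instead.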
Wrong snapshot for EOF corrupts data in AO/AOCO
…rupt its execution (#198)

Fixes the case when the server is executing a lengthy query and the
client breaks the connection. The operating system will be aware that
the connection is gone, but the postgres node doesn't notice this,
because it doesn't try to read from or write to the socket while
running the query. So we get a zombie connection. In theory, the query
could run for a million years, continuing to chew up CPU and I/O and
occupying a connection slot, which is sad. Worse still, the query might
modify data and return nothing, in which case the disconnected client
might be surprised that its previously sent modification takes effect
at some later point, when execution completes. For these reasons, the
query has to be interrupted as early as possible.

The patch provides a new GUC, check_client_connection_interval, that
can be used to periodically check, via CLIENT_CHECK_CONNECTION_TIMEOUT
interrupts, whether the client connection has gone away while running
very long queries. It is disabled by default.
For a non-blocking check of the socket state, the patch uses a
non-standard Linux extension (also adopted by at least one other OS):
the POLLRDHUP event flag, which is not defined by POSIX.
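A minimal Python sketch of the kind of non-blocking POLLRDHUP check described above, assuming Linux. `client_connection_gone` is a hypothetical helper; the actual patch implements the check in C inside the backend. Python's `select` module exposes `POLLRDHUP` only where the platform defines it, so the sketch falls back to POLLHUP/POLLERR elsewhere.

```python
import select
import socket


def client_connection_gone(sock: socket.socket) -> bool:
    """Return True if the peer of a connected stream socket has hung up.

    Uses poll() with a zero timeout, so the call never blocks.
    """
    # POLLRDHUP is a Linux extension; treat it as "no extra bits" if missing.
    rdhup = getattr(select, "POLLRDHUP", 0)
    poller = select.poll()
    # Pending input alone (POLLIN) is not a disconnect; we only look at hang-up bits.
    poller.register(sock, select.POLLIN | rdhup)
    for _fd, revents in poller.poll(0):
        if revents & (select.POLLHUP | select.POLLERR | rdhup):
            return True
    return False
```

Because only the hang-up and error bits are inspected, a client that has simply sent more data is not mistaken for a disconnected one.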

Backport from PostgreSQL commits:
 - https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=c30f54ad732ca5c8762bb68bbe0f51de9137dd72
 - https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=22f6f2c1ccb56e0d6a159d4562418587e4b10e01
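The periodic part controlled by check_client_connection_interval can be sketched as a loop that re-checks the client at most once per interval. This is a simplification under stated assumptions: the real patch arms a CLIENT_CHECK_CONNECTION_TIMEOUT timeout and handles it as an interrupt, whereas this hypothetical `run_query` polls a clock between units of work. A non-positive interval disables the check, mirroring the GUC being off by default.

```python
import time
from typing import Callable


class QueryCanceled(Exception):
    """Raised when the (hypothetical) executor notices the client is gone."""


def run_query(steps: int, interval_s: float, client_gone: Callable[[], bool],
              clock: Callable[[], float] = time.monotonic) -> int:
    """Run `steps` units of work, checking the client every `interval_s` seconds.

    An interval <= 0 disables the check entirely. Returns the number of
    completed steps, or raises QueryCanceled once the client has gone away.
    """
    next_check = clock() + interval_s
    for done in range(steps):
        if interval_s > 0 and clock() >= next_check:
            if client_gone():
                raise QueryCanceled(f"client disconnected after {done} steps")
            next_check = clock() + interval_s
        # One unit of executor work would run here.
    return steps
```

Injecting `client_gone` and `clock` keeps the sketch testable; in a real server, `client_gone` would be a socket check like the POLLRDHUP one described above.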
Execution of multilevel correlated queries with a high level of nesting can cause a segfault (when using array_agg, json_agg) or produce wrong results (when using classic aggregates like sum()). Due to some GP limitations, correlated subqueries with skip-level correlations are not supported, and an additional check condition prevents such queries from being planned. The QueryHasDistributedRelation function, used by this check, doesn't recurse into subplans and may return wrong results for distributed RTE_RELATION entries hidden behind RTE_SUBQUERY entries.
This commit fixes that behavior by adding optional recursion to the QueryHasDistributedRelation function. An additional regression test is included. More information can be found in issue #12054.
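The effect of the optional recursion can be modeled in a few lines of Python. This is a hypothetical sketch: the real QueryHasDistributedRelation is C code in the GPDB planner, and the `RangeTblEntry`/`Query` classes below only borrow PostgreSQL's `rtekind`/`subquery` naming to mimic the shape of a range table.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RangeTblEntry:
    """Hypothetical, heavily simplified stand-in for a planner range table entry."""
    rtekind: str                          # "RTE_RELATION" or "RTE_SUBQUERY"
    distributed: bool = False             # only meaningful for RTE_RELATION
    subquery: Optional["Query"] = None    # only set for RTE_SUBQUERY


@dataclass
class Query:
    rtable: List[RangeTblEntry] = field(default_factory=list)


def query_has_distributed_relation(query: Query, recursive: bool = False) -> bool:
    """Return True if a distributed relation appears in the query's range table.

    With recursive=False only the top-level range table is inspected, so a
    relation wrapped in an RTE_SUBQUERY entry is missed; recursive=True also
    descends into subqueries.
    """
    for rte in query.rtable:
        if rte.rtekind == "RTE_RELATION" and rte.distributed:
            return True
        if recursive and rte.rtekind == "RTE_SUBQUERY" and rte.subquery is not None:
            if query_has_distributed_relation(rte.subquery, recursive=True):
                return True
    return False
```

Keeping recursion behind a flag lets existing callers retain the old top-level behavior while the skip-level correlation check opts into the deeper search.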
Stolb27 merged commit 3d6cc3d into adb-6.x on Jul 9, 2021
Stolb27 deleted the 6.16.3_arenadata22 branch on July 9, 2021 13:16
hilltracer pushed a commit that referenced this pull request Mar 6, 2026
- test_it_retries_the_connection: use a mock object that supports context
management.
- GpArrayTestCase: use the bool type instead of the strings 't'/'f'.
- GpCheckCatTestCase: check the connection in DbWrapper.
- DifferentialRecoveryClsTestCase and GpStopSmartModeTestCase: mock GgdbCursor
to return a connection.
- RepairMissingExtraneousTestCase and UniqueIndexViolationCheckTestCase: use
Python lists instead of the string representation of Postgres arrays.

Also fix the seg ids set in get_segment_to_oid_mapping. Since seg ids in issues
are now ints, we do not need to cast all_seg_ids array elements to strings.
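The first fix relies on `unittest.mock.MagicMock` preconfiguring the context manager magic methods (`__enter__`/`__exit__`), which a plain `Mock` does not. A sketch, assuming a hypothetical `first_value` helper; the `query(...).getresult()` chain only mimics the shape of PyGreSQL's classic interface and is not the code under test in gppylib.

```python
from unittest import mock


def first_value(connect):
    """Hypothetical helper: open a connection in a with statement and query it."""
    with connect() as conn:
        return conn.query("SELECT 1").getresult()[0][0]


def make_connect_mock():
    """Build a connect() mock whose return value works as a context manager."""
    conn = mock.MagicMock()
    conn.query.return_value.getresult.return_value = [(1,)]
    connect = mock.MagicMock()
    # MagicMock supports __enter__/__exit__; route __enter__ to the fake connection.
    connect.return_value.__enter__.return_value = conn
    return connect
```

A plain `mock.Mock()` passed to `first_value` fails at the `with` statement, which is the kind of breakage that appears once production code starts using connections as context managers.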
Stolb27 added a commit that referenced this pull request Mar 10, 2026
The following points have been fixed:
1. PyGreSQL 5 has added support for converting additional data types.
analyzedb: convert datetime to a string for a correct comparison with the
value saved in the file.
el8_migrate_locale.py, gparray.py, gpcatalog.py and gpcheckcat: use the bool
type instead of comparing with a string.
gpcheckcat, repair_missing_extraneous.py and unique_index_violation_check:
use Python lists instead of string parsing.
2. PyGreSQL 5 added support for closing a connection when it is used in a with
statement. Because of this, in a number of places the cursor was read after
the connection had been closed.
3. PyGreSQL 5 does not end the transaction if an error occurs, which can leak a
connection when an error occurs in the connect function. So catch errors that
happen in the connect function.
4. Close the connection saved in the context after each scenario in the behave
tests.
5. Close the connection if it is not returned from the function.
6. Use the Python wrapper for the connect function instead of the C one.
7. Use a custom cursor that disables row postprocessing, to avoid changing a
large amount of code.
8. Fix the bool and array format in isolation2 tests.
9. Add notification processing to isolation2 tests.
10. Also fix the notification processing in the resgroup_query_mem test.
11. Fix the notification processing in gpload.
12. Fix the pg_config search when building deb packages.
13. Fix gpexpand behave tests (#176). The previous commit introduced a few
    regressions. They were related to replacing comparisons with 't' by a
    truth check. That change was needed because PyGreSQL 5, unlike
    version 4, converts bool values, but it was not taken into account
    that such values can also be set as strings in Python code. The
    incorrect call of verify in TestDML has also been fixed: the verify
    method was called without passing a connection, and although the
    verify implementation in the class itself does not require a
    connection, the method may be overridden in a child class.
14. Fix PyGreSQL install to be compatible with both Python versions
    (#183). The PyGreSQL install works with Python 2 but breaks with
    Python 3 because the _pg extension must be importable as a top-level
    module (e.g. from _pg import *). Python 3 has no implicit relative
    imports and resolves extension modules via sys.path, so _pg*.so has
    to be located at a sys.path root, not only inside the pygresql/
    package directory. Move _pg*.so from the pygresql directory to the
    top level, so the same install layout works for both Python
    versions. Update the _pg*.so RPATH to match its installed location
    so that dpkg-shlibdeps can resolve libpq.so during Debian packaging.
15. Fix Python unit tests after PyGreSQL update (#222)
    - test_it_retries_the_connection: use a mock object that supports
      context management
    - GpArrayTestCase: use the bool type instead of the strings 't'/'f'
    - GpCheckCatTestCase: check the connection in DbWrapper.
    - DifferentialRecoveryClsTestCase and GpStopSmartModeTestCase: mock
      GgdbCursor to return a connection.
    - RepairMissingExtraneousTestCase and
      UniqueIndexViolationCheckTestCase: use Python lists instead of the
      string representation of Postgres arrays. Also fix the seg ids set
      in get_segment_to_oid_mapping. Since seg ids in issues are now
      ints, we do not need to cast all_seg_ids array elements to strings.
16. Move PyGreSQL code to a submodule (#269).
    It would be nice to avoid patching this module. This patch also
    fixes the Greengage installation scripts for PyGreSQL to support
    non-root release builds with DESTDIR. That was a Greengage problem,
    not a PyGreSQL one. Additionally, add the new PyGreSQL license to
    the NOTICE file.
17. Fix the minirepro and gpsd utilities for PyGreSQL-5.2.5 (#291).
    Both utilities used an outdated form of pgdb.connect(). The patch
    changes the way pgdb.connect() is called: instead of passing a
    parameter that later gets parsed, both utilities now pass separate
    parameters with the same names.

Co-authored-by: Denis Garsh <d.garsh@arenadata.io>
Co-authored-by: Vasiliy Ivanov <ivi@arenadata.io>
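Items 1 and 13 hinge on a Python pitfall: once PyGreSQL 5 returns real bool values, code that switches from `== 't'` to a plain truth test breaks for any 't'/'f' strings still produced elsewhere (for example set in Python code, as item 13 notes), because every non-empty string is truthy. A hypothetical normalizing helper that accepts both forms could look like this; it is a sketch, not taken from the Greengage code.

```python
from typing import Union


def as_bool(value: Union[bool, str]) -> bool:
    """Normalize a PostgreSQL boolean given either as bool or as the text 't'/'f'."""
    if isinstance(value, bool):
        return value
    if value in ("t", "f"):
        return value == "t"
    # Anything else is neither a converted bool nor PostgreSQL's text form.
    raise ValueError(f"not a PostgreSQL boolean: {value!r}")
```

Raising on unexpected input, rather than falling back to `bool(value)`, keeps a stray string from silently evaluating as true.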