Avoid using pg_locks with distributed tables (#291)
Merged

InnerLife0 reviewed Jan 10, 2022
In upstream Postgres, pg_locks exposes part of the lock manager so that
DBAs can inspect the locks taken by various backends. In Greenplum, we
modified pg_lock_status() -- the function that underlies pg_locks -- to
a) provide additional Greenplum-specific information (e.g.
mppsessionid); and
b) aggregate the locks from the master and all primary segments.
One consequence of the implementation we chose to achieve point b above
is that queries involving both pg_locks and a distributed table won't
work. If you're lucky (the planner or ORCA places the function call in
the top slice), it fails loudly, throwing an error like this at you:
ERROR: query plan with multiple segworker groups is not supported
If you're not lucky (the planner or ORCA schedules the function call on
a different slice running on the master), it most likely silently skips
point b and returns locks from the master only.
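For example, a join like the following hits one of the two failure modes above (a minimal sketch; my_dist_table is a hypothetical distributed table, not from the test suite):

```sql
-- Hypothetical repro: mixing pg_locks with a distributed table in one
-- query. Depending on which slice the plan places pg_lock_status() in,
-- this either raises the segworker-groups error or silently returns
-- locks from the master only.
SELECT l.locktype, l.mode, t.id
FROM pg_locks l
JOIN my_dist_table t ON l.relation = 'my_dist_table'::regclass;
```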
Before we fix pg_locks, rewrite the isolation2 test case "starve_case"
to separate the repeated queries against pg_locks from the main query.
The PL/pgSQL does logically the same thing, but it is now safe.
This commit also adds starve_case back to the isolation2 schedule.
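The safe pattern can be sketched like this (a minimal sketch, not the actual test code; the function name wait_for_lock and the polling interval are assumptions):

```sql
-- Sketch of the safe pattern: poll pg_locks in its own statement, so
-- no single query ever mixes pg_locks with a distributed table.
CREATE OR REPLACE FUNCTION wait_for_lock(rel regclass) RETURNS void AS $$
BEGIN
    WHILE NOT EXISTS (
        SELECT 1 FROM pg_locks WHERE relation = rel AND granted
    ) LOOP
        PERFORM pg_sleep(0.1);  -- pg_locks is the only relation in this query
    END LOOP;
END
$$ LANGUAGE plpgsql VOLATILE;

-- The distributed table is then queried in a separate statement, e.g.:
-- SELECT count(*) FROM my_dist_table;
```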
(cherry picked from commit 2ff0127)
For a PL/pgSQL function like the following:
set optimizer_trace_fallback to on;

CREATE OR REPLACE FUNCTION boom()
RETURNS bool AS $$
DECLARE
    mel bool;
    sesh int[];
BEGIN
    sesh := '{42,1}'::int[];  -- query 1
    SELECT c = ANY (sesh) INTO mel FROM (VALUES (42), (0)) nums(c);  -- query 2
    RETURN mel;
END
$$ LANGUAGE plpgsql VOLATILE;

SELECT boom();
With ORCA enabled, the database crashes. Starting in 9.2, PL/pgSQL
supplies bound parameter values in more statement types, enabling the
planner to fold constants in more cases (in contrast to leaving the
param intact and substituting its value only at execution time).
Previously, only dynamic execution ("EXECUTE 'SELECT $1' USING sesh")
got this treatment. The change revealed the bug: ORCA could not plan
queries whose query trees included params that were not in subplans
(external params), so it would simply fall back.
When query 1 is planned, it is translated into select '{42,1}'::int[];
For uninteresting reasons, the planner-produced plan for query 1 is
considered "simple", while the ORCA-produced plan is considered regular
(not simple). PL/pgSQL has a fast path for "simple" plans, minimally
starting the executor via `ExecEvalExpr`; regular plans are executed
through SPI. During execution, SPI will pack (as part of
`heap_form_tuple`) the 4-byte-header datum into a 1-byte-header datum.
While planning query 2, we attempt to substitute the param "sesh" with
its actual const value during pre-processing. Since ORCA doesn't
recognize const arrays as arrays, the translator takes the additional
step of translating the const into an array expression. When accessing
the array-typed const, we need to "unpack" (`DatumGetArrayTypeP`) the
datum. This commit does that.
Co-authored-by: Melanie Plageman <mplageman@pivotal.io>
(cherry picked from commit c417d2b)
e4e7960 to 89a40df
InnerLife0 approved these changes Jan 11, 2022
hilltracer pushed a commit that referenced this pull request Mar 6, 2026
Both utils used an outdated calling convention for pgdb.connect(). The patch changes how pgdb.connect() is called, avoiding the single parameter that is later parsed; instead, both utils now pass parameters by name.
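The shape of the change can be sketched in Python (a minimal sketch; the helper name dsn_to_kwargs and the exact colon-separated DSN layout are assumptions, not the utilities' actual code):

```python
# Hypothetical illustration of the pgdb.connect() change.
# Old style: pass one colon-separated DSN string that pgdb re-parses.
# New style: pass the same values as named parameters directly.

def dsn_to_kwargs(dsn):
    """Split an old-style 'host:port:database:user' string into the
    keyword arguments that pgdb.connect() accepts directly."""
    host, port, database, user = dsn.split(":")
    return {"host": "%s:%s" % (host, port), "database": database, "user": user}

# old: conn = pgdb.connect("localhost:5432:postgres:gpadmin")
# new: conn = pgdb.connect(**dsn_to_kwargs("localhost:5432:postgres:gpadmin"))
```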
Stolb27 added a commit that referenced this pull request Mar 10, 2026
The following points have been fixed:

1. PyGreSQL 5 has added support for converting additional data types. analyzedb: convert datetime to a string for correct comparison with the value saved in the file. el8_migrate_localte.py, gparray.py, gpcatalog.py, and gpcheckcat: use the bool type instead of comparing with a string. gpcheckcat, repair_missing_extraneous.py, and unique_index_violation_check: use a Python list instead of string parsing.
2. PyGreSQL 5 added support for closing a connection via the with construct. Because of this, in a number of places, reading from the cursor took place after the connection was closed.
3. PyGreSQL 5 does not end the transaction if an error occurs, which can lead to a connection leak if an error occurs in the connect function. So catch errors that happen in the connect function.
4. Add closing of the connection saved in context after the scenario in behave tests.
5. Add closing of the connection if it is not returned from the function.
6. Use the Python wrapper for the connect function instead of the C one.
7. Use a custom cursor to disable row postprocessing, to avoid correcting a large amount of code.
8. Fix the bool and array format in isolation2 tests.
9. Add notification processing to isolation2 tests.
10. Also fix the notification processing in the resgroup_query_mem test.
11. Fix the notification processing in gpload.
12. Fix the pg_config search when building deb packages.
13. Fix gpexpand behave tests (#176). The previous commit introduced a few regressions related to replacing the comparison with 't' with a truth check. That change stems from the fact that PyGreSQL 5, unlike version 4, converts bool values; but it was not taken into account that such values can also be set in Python code. The error of calling verify in TestDML has also been fixed: the verify method was called without passing a connection, and although the verify implementation in the class itself does not require a connection, the function may be overridden in a child class.
14. Fix the PyGreSQL install to be compatible with both Python versions (#183). The PyGreSQL install works in Python 2 but breaks in Python 3, because the _pg extension must be importable as a top-level module (e.g. from _pg import *). Python 3 resolves extension modules via sys.path, so _pg*.so has to be located at the sys.path root, not only inside the pygresql/ package directory. Move _pg*.so from the pygresql directory to the top level, so the same install layout works for both Python versions. Update the _pg*.so RPATH to match its installed location so dpkg-shlibdeps can resolve libpq.so during Debian packaging.
15. Fix Python unit tests after the PyGreSQL update (#222).
    - test_it_retries_the_connection: use a mock object that supports context management.
    - GpArrayTestCase: use the bool type instead of the strings 't'/'f'.
    - GpCheckCatTestCase: check the connection in DbWrapper.
    - DifferentialRecoveryClsTestCase and GpStopSmartModeTestCase: mock GgdbCursor to return a connection.
    - RepairMissingExtraneousTestCase and UniqueIndexViolationCheckTestCase: use Python arrays instead of the string representation of Postgres arrays. Also fix the seg-id set in get_segment_to_oid_mapping; since seg ids in issues are now ints, we no longer need to cast all_seg_ids elements to strings.
16. Move the PyGreSQL code to a submodule (#269). It would be nice to avoid patching this module. This patch also fixes the Greengage installation scripts for PyGreSQL to support non-root release builds over DESTDIR (a problem in Greengage, not in PyGreSQL). Additionally, include the new PyGreSQL license in the NOTICE file.
17. Fix the minirepro and gpsd utilities for PyGreSQL-5.2.5 (#291). Both utils used an outdated calling convention for pgdb.connect(). The patch changes how pgdb.connect() is called, avoiding the single parameter that is later parsed; instead, both utils now pass parameters by name.

Co-authored-by: Denis Garsh <d.garsh@arenadata.io>
Co-authored-by: Vasiliy Ivanov <ivi@arenadata.io>