[6X Backport] Fix preprocessing of NOT IN to LAS-Apply in filter#170
Merged
[6X Backport] Fix preprocessing of NOT IN to LAS-Apply in filter#170
Conversation
Collaborator
|
@darthunix, can you rebase it to run tests? |
Stolb27
previously approved these changes
Apr 9, 2021
NOT IN is equivalent to "<> ALL" SQL expression and should be replaced with LAS-Apply in CSubqueryHandler::FRemoveAllSubquery(). Before current commit ORCA transformed "<> ALL" subquery in filter incorrectly. A query select * from R where R.a <> ALL (select S.b from S) was translated to select * from R LAS-APPLY-Not-In (select S.b from S where S.b = R.a) on true; This plan returns wrong results when S.b is NULL. To handle it (S.b = R.a) should be checked with IS_NOT_FALSE expression. For example in https://github.com/greenplum-db/gpdb/issues/10967 create table t1(a int) distributed by (a); insert into t1 values (1), (2), (3); set optimizer_join_order='query'; select * from t1 where a not in ( select b.a from t1 a left join t1 b on false ); returned incorrect (1), (2), (3) result on ORCA and correct empty result on postgres planner. The root of it was in FRemoveAllSubquery(). Previously this function transformed +--CScalarSubqueryAll(<>)["a" (16)] origin: [Grp:6, GrpExpr:0] |--CLogicalLeftOuterJoin origin: [Grp:4, GrpExpr:0] | |--CLogicalGet "t1" ("t1"), Columns: ["a" (8),..] | |--CLogicalConstTableGet Columns: ["a" (16),..] | +--CScalarConst (1) origin: [Grp:3, GrpExpr:0] +--CScalarIdent "a" (0) origin: [Grp:5, GrpExpr:0] to +--CLogicalLeftAntiSemiApplyNotIn (Reqd Inner Cols: "a" (16)) |--CLogicalGet "t1" ("t1"), Columns: ["a" (0),..] |--CLogicalSelect | |--CLogicalLeftOuterJoin origin: [Grp:4, GrpExpr:0] | | |--CLogicalGet "t1" ("t1"), Columns: ["a" (8),..] | | |--CLogicalConstTableGet Columns: ["a" (16),..] | | +--CScalarConst (1) origin: [Grp:3, GrpExpr:0] | +--CScalarCmp (=) | |--CScalarIdent "a" (0) origin: [Grp:5, GrpExpr:0] | +--CScalarIdent "a" (16) +--CScalarConst (1) and didn't add IS_NOT_FALSE check as we should expect. Now we get a correct logical tree with empty result: +--CLogicalLeftAntiSemiApplyNotIn (Reqd Inner Cols: "a" (16)) |--CLogicalGet "t1" ("t1"), Columns: ["a" (0),..] |--CLogicalSelect | |--CLogicalLeftOuterJoin origin: [Grp:4, GrpExpr:0] | | |--CLogicalGet "t1" ("t1"), Columns: ["a" (8),..] | | |--CLogicalConstTableGet Columns: ["a" (16),..] | | +--CScalarConst (1) origin: [Grp:3, GrpExpr:0] | +--CScalarIsDistinctFrom (=) | |--CScalarCmp (=) | | |--CScalarIdent "a" (0) origin: [Grp:5, GrpExpr:0] | | +--CScalarIdent "a" (16) | +--CScalarConst (0) +--CScalarConst (1) Backport of 2e91028
Stolb27
approved these changes
Apr 19, 2021
hilltracer
pushed a commit
that referenced
this pull request
Mar 6, 2026
Add Python 3 support 1. Fix syntax in Python code in string literals. 2. Use `key` instead of `cmp` in sort 3. Remove the `is` comparison with non-None. 4. Remove string decoding in gpexpand 5. Check string emptiness by comparing their length instead of comparing them with empty string literals in order not to depend on the type of string. 6. Convert the result of subcommands to a regular string. 7. Implement modern comparators for the Segment, GPCatalogTable, and GpVersion classes 8. Fix RuntimeError caused by throwing StopIteration inside loop body 9. Wrap pickle serialized data in base64 in remote calling commands for more stable behavior between versions. 10. Remove the use of the platform module in gppkg. Use info from /etc/os-release instead. 11. Use the subprocess module instead of popen2 in gpscp. 12. Output data in base64 encoding in gpoperation.py. This allows us to convert all subcommand results to regular strings, which in turn reduces diff. 13. Use the correct builtin module name in unit tests according to the python version. 14. Convert the urlsafe_b64decode argument to a string for compatibility with python 2. 15. In the test_inner_exceptions test, declare an exception class inside function so that it cannot be serialized in python 3. 16. Skip test_unsafe_exceptions_with_args test in python 3 because it's not applicable to Python 3. 17. Remove usage of assertItemsEqual 18. Rename assertEquals to assertEqual for compatibility 19. Remove bundled mock. 20. Replace warn with warning for logging. 21. Replace getName() with name for threads.
Stolb27
pushed a commit
that referenced
this pull request
Mar 10, 2026
Add Python 3 support 1. Fix syntax in Python code in string literals. 2. Use `key` instead of `cmp` in sort 3. Remove the `is` comparison with non-None. 4. Remove string decoding in gpexpand 5. Check string emptiness by comparing their length instead of comparing them with empty string literals in order not to depend on the type of string. 6. Convert the result of subcommands to a regular string. 7. Implement modern comparators for the Segment, GPCatalogTable, and GpVersion classes 8. Fix RuntimeError caused by throwing StopIteration inside loop body 9. Wrap pickle serialized data in base64 in remote calling commands for more stable behavior between versions. 10. Remove the use of the platform module in gppkg. Use info from /etc/os-release instead. 11. Use the subprocess module instead of popen2 in gpscp. 12. Output data in base64 encoding in gpoperation.py. This allows us to convert all subcommand results to regular strings, which in turn reduces diff. 13. Use the correct builtin module name in unit tests according to the python version. 14. Convert the urlsafe_b64decode argument to a string for compatibility with python 2. 15. In the test_inner_exceptions test, declare an exception class inside function so that it cannot be serialized in python 3. 16. Skip test_unsafe_exceptions_with_args test in python 3 because it's not applicable to Python 3. 17. Remove usage of assertItemsEqual 18. Rename assertEquals to assertEqual for compatibility 19. Remove bundled mock. 20. Replace warn with warning for logging. 21. Replace getName() with name for threads. 22. Fix gpssh behave tests on Python 3 - Fix the error "TypeError: '<=' not supported between instances of 'int' and 'NoneType'" when comparing retry_attempt <= num_retries when num_retries is None. - Fix the error "TypeError: write() argument must be str, not bytes" when DEBUG_VERBOSE_PRINTING is True. - In Python 3, file operations return bytes, not strings, so the CRNL for comparison must be in bytes. 23. Replace the remaining warn with warning. 24. Fix the remaining row checks for emptiness. 25. Fix the remaining six usage. 26. Restore old dbconn connect logic to fix connection using ipv6. 27. Gpload: - Decode results of sub commands - Fix handling of unicode - Fix handling escaping - Add the ability to use different ans files for different versions of Python. The test_76_gpload_merge_chinese_standard_conforming_str_off part of the test expects a string decoding error when handling an exception from DB, but Python 3 can handle the exception text, which makes the Python 3 test result different. 28. Remove the gpssh dependency from configparser using an alias for Python 3. 29. Use non-order-dependent comparisons in unit tests to compare jsons. 30. Remove depends on LooseVersion. 31. Import io only for Python 3 when use for import StringIO 32. Decode result of subprocess 33. Fix indent in gpsys1 34. Fix packcore with Python 3 35. Fix pytest_requirement.txt for Python 3 36. Silence notices in gpcheckcat 37. Fix gpmemreport and gpmemwatcher for Python 3 Co-authored-by: Georgy Shelkovy <g.shelkovy@arenadata.io>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
NOT IN is equivalent to "<> ALL" SQL expression and should be
replaced with LAS-Apply in CSubqueryHandler::FRemoveAllSubquery().
Before current commit ORCA transformed "<> ALL" subquery in filter
incorrectly. A query
was translated to
This plan returns wrong results when S.b is NULL. To handle it
(S.b = R.a) should be checked with IS_NOT_FALSE expression.
For example in https://github.com/greenplum-db/gpdb/issues/10967
returned incorrect (1), (2), (3) result on ORCA and correct empty
result on postgres planner. The root of it was in FRemoveAllSubquery().
Previously this function transformed
to
and didn't add IS_NOT_FALSE check as we should expect.
Now we get a correct logical tree with empty result:
Backport of 2e91028