[6X Backport] Fix preprocessing of NOT IN to LAS-Apply in filter by darthunix · Pull Request #170 · arenadata/gpdb

darthunix · 2021-03-23T09:55:46Z

NOT IN is equivalent to "<> ALL" SQL expression and should be
replaced with LAS-Apply in CSubqueryHandler::FRemoveAllSubquery().
Before current commit ORCA transformed "<> ALL" subquery in filter
incorrectly. A query

select * from R where R.a <> ALL (select S.b from S)

was translated to

select * from R LAS-APPLY-Not-In (select S.b from S where S.b = R.a) on true;

This plan returns wrong results when S.b is NULL. To handle it
(S.b = R.a) should be checked with IS_NOT_FALSE expression.

For example in https://github.com/greenplum-db/gpdb/issues/10967

create table t1(a int) distributed by (a);
insert into t1 values (1), (2), (3);
set optimizer_join_order='query';
select * from t1 where a not in (
    select b.a from t1 a left join t1 b on false
);

returned incorrect (1), (2), (3) result on ORCA and correct empty
result on postgres planner. The root of it was in FRemoveAllSubquery().
Previously this function transformed

+--CScalarSubqueryAll(<>)["a" (16)]   origin: [Grp:6, GrpExpr:0]
    |--CLogicalLeftOuterJoin   origin: [Grp:4, GrpExpr:0]
    |  |--CLogicalGet "t1" ("t1"), Columns: ["a" (8),..]
    |  |--CLogicalConstTableGet Columns: ["a" (16),..]
    |  +--CScalarConst (1)   origin: [Grp:3, GrpExpr:0]
    +--CScalarIdent "a" (0)   origin: [Grp:5, GrpExpr:0]

to

+--CLogicalLeftAntiSemiApplyNotIn (Reqd Inner Cols: "a" (16))
    |--CLogicalGet "t1" ("t1"), Columns: ["a" (0),..]
    |--CLogicalSelect
    |  |--CLogicalLeftOuterJoin   origin: [Grp:4, GrpExpr:0]
    |  |  |--CLogicalGet "t1" ("t1"), Columns: ["a" (8),..]
    |  |  |--CLogicalConstTableGet Columns: ["a" (16),..]
    |  |  +--CScalarConst (1)   origin: [Grp:3, GrpExpr:0]
    |  +--CScalarCmp (=)
    |     |--CScalarIdent "a" (0)   origin: [Grp:5, GrpExpr:0]
    |     +--CScalarIdent "a" (16)
    +--CScalarConst (1)

and didn't add IS_NOT_FALSE check as we should expect.
Now we get a correct logical tree with empty result:

+--CLogicalLeftAntiSemiApplyNotIn (Reqd Inner Cols: "a" (16))
    |--CLogicalGet "t1" ("t1"), Columns: ["a" (0),..]
    |--CLogicalSelect
    |  |--CLogicalLeftOuterJoin   origin: [Grp:4, GrpExpr:0]
    |  |  |--CLogicalGet "t1" ("t1"), Columns: ["a" (8),..]
    |  |  |--CLogicalConstTableGet Columns: ["a" (16),..]
    |  |  +--CScalarConst (1)   origin: [Grp:3, GrpExpr:0]
    |  +--CScalarIsDistinctFrom (=)
    |     |--CScalarCmp (=)
    |     |  |--CScalarIdent "a" (0)   origin: [Grp:5, GrpExpr:0]
    |     |  +--CScalarIdent "a" (16)
    |     +--CScalarConst (0)
    +--CScalarConst (1)

Backport of 2e91028

Stolb27 · 2021-04-09T03:47:32Z

@darthunix, can you rebase it to run tests?

NOT IN is equivalent to "<> ALL" SQL expression and should be replaced with LAS-Apply in CSubqueryHandler::FRemoveAllSubquery(). Before current commit ORCA transformed "<> ALL" subquery in filter incorrectly. A query select * from R where R.a <> ALL (select S.b from S) was translated to select * from R LAS-APPLY-Not-In (select S.b from S where S.b = R.a) on true; This plan returns wrong results when S.b is NULL. To handle it (S.b = R.a) should be checked with IS_NOT_FALSE expression. For example in https://github.com/greenplum-db/gpdb/issues/10967 create table t1(a int) distributed by (a); insert into t1 values (1), (2), (3); set optimizer_join_order='query'; select * from t1 where a not in ( select b.a from t1 a left join t1 b on false ); returned incorrect (1), (2), (3) result on ORCA and correct empty result on postgres planner. The root of it was in FRemoveAllSubquery(). Previously this function transformed +--CScalarSubqueryAll(<>)["a" (16)] origin: [Grp:6, GrpExpr:0] |--CLogicalLeftOuterJoin origin: [Grp:4, GrpExpr:0] | |--CLogicalGet "t1" ("t1"), Columns: ["a" (8),..] | |--CLogicalConstTableGet Columns: ["a" (16),..] | +--CScalarConst (1) origin: [Grp:3, GrpExpr:0] +--CScalarIdent "a" (0) origin: [Grp:5, GrpExpr:0] to +--CLogicalLeftAntiSemiApplyNotIn (Reqd Inner Cols: "a" (16)) |--CLogicalGet "t1" ("t1"), Columns: ["a" (0),..] |--CLogicalSelect | |--CLogicalLeftOuterJoin origin: [Grp:4, GrpExpr:0] | | |--CLogicalGet "t1" ("t1"), Columns: ["a" (8),..] | | |--CLogicalConstTableGet Columns: ["a" (16),..] | | +--CScalarConst (1) origin: [Grp:3, GrpExpr:0] | +--CScalarCmp (=) | |--CScalarIdent "a" (0) origin: [Grp:5, GrpExpr:0] | +--CScalarIdent "a" (16) +--CScalarConst (1) and didn't add IS_NOT_FALSE check as we should expect. Now we get a correct logical tree with empty result: +--CLogicalLeftAntiSemiApplyNotIn (Reqd Inner Cols: "a" (16)) |--CLogicalGet "t1" ("t1"), Columns: ["a" (0),..] |--CLogicalSelect | |--CLogicalLeftOuterJoin origin: [Grp:4, GrpExpr:0] | | |--CLogicalGet "t1" ("t1"), Columns: ["a" (8),..] | | |--CLogicalConstTableGet Columns: ["a" (16),..] | | +--CScalarConst (1) origin: [Grp:3, GrpExpr:0] | +--CScalarIsDistinctFrom (=) | |--CScalarCmp (=) | | |--CScalarIdent "a" (0) origin: [Grp:5, GrpExpr:0] | | +--CScalarIdent "a" (16) | +--CScalarConst (0) +--CScalarConst (1) Backport of 2e91028

Add Python 3 support 1. Fix syntax in Python code in string literals. 2. Use `key` instead of `cmp` in sort 3. Remove the `is` comparison with non-None. 4. Remove string decoding in gpexpand 5. Check string emptiness by comparing their length instead of comparing them with empty string literals in order not to depend on the type of string. 6. Convert the result of subcommands to a regular string. 7. Implement modern comparators for the Segment, GPCatalogTable, and GpVersion classes 8. Fix RuntimeError caused by throwing StopIteration inside loop body 9. Wrap pickle serialized data in base64 in remote calling commands for more stable behavior between versions. 10. Remove the use of the platform module in gppkg. Use info from /etc/os-release instead. 11. Use the subprocess module instead of popen2 in gpscp. 12. Output data in base64 encoding in gpoperation.py. This allows us to convert all subcommand results to regular strings, which in turn reduces diff. 13. Use the correct builtin module name in unit tests according to the python version. 14. Convert the urlsafe_b64decode argument to a string for compatibility with python 2. 15. In the test_inner_exceptions test, declare an exception class inside function so that it cannot be serialized in python 3. 16. Skip test_unsafe_exceptions_with_args test in python 3 because it's not applicable to Python 3. 17. Remove usage of assertItemsEqual 18. Rename assertEquals to assertEqual for compatibility 19. Remove bundled mock. 20. Replace warn with warning for logging. 21. Replace getName() with name for threads.

Add Python 3 support 1. Fix syntax in Python code in string literals. 2. Use `key` instead of `cmp` in sort 3. Remove the `is` comparison with non-None. 4. Remove string decoding in gpexpand 5. Check string emptiness by comparing their length instead of comparing them with empty string literals in order not to depend on the type of string. 6. Convert the result of subcommands to a regular string. 7. Implement modern comparators for the Segment, GPCatalogTable, and GpVersion classes 8. Fix RuntimeError caused by throwing StopIteration inside loop body 9. Wrap pickle serialized data in base64 in remote calling commands for more stable behavior between versions. 10. Remove the use of the platform module in gppkg. Use info from /etc/os-release instead. 11. Use the subprocess module instead of popen2 in gpscp. 12. Output data in base64 encoding in gpoperation.py. This allows us to convert all subcommand results to regular strings, which in turn reduces diff. 13. Use the correct builtin module name in unit tests according to the python version. 14. Convert the urlsafe_b64decode argument to a string for compatibility with python 2. 15. In the test_inner_exceptions test, declare an exception class inside function so that it cannot be serialized in python 3. 16. Skip test_unsafe_exceptions_with_args test in python 3 because it's not applicable to Python 3. 17. Remove usage of assertItemsEqual 18. Rename assertEquals to assertEqual for compatibility 19. Remove bundled mock. 20. Replace warn with warning for logging. 21. Replace getName() with name for threads. 22. Fix gpssh behave tests on Python 3 - Fix the error "TypeError: '<=' not supported between instances of 'int' and 'NoneType'" when comparing retry_attempt <= num_retries when num_retries is None. - Fix the error "TypeError: write() argument must be str, not bytes" when DEBUG_VERBOSE_PRINTING is True. - In Python 3, file operations return bytes, not strings, so the CRNL for comparison must be in bytes. 23. Replace the remaining warn with warning. 24. Fix the remaining row checks for emptiness. 25. Fix the remaining six usage. 26. Restore old dbconn connect logic to fix connection using ipv6. 27. Gpload: - Decode results of sub commands - Fix handling of unicode - Fix handling escaping - Add the ability to use different ans files for different versions of Python. The test_76_gpload_merge_chinese_standard_conforming_str_off part of the test expects a string decoding error when handling an exception from DB, but Python 3 can handle the exception text, which makes the Python 3 test result different. 28. Remove the gpssh dependency from configparser using an alias for Python 3. 29. Use non-order-dependent comparisons in unit tests to compare jsons. 30. Remove depends on LooseVersion. 31. Import io only for Python 3 when use for import StringIO 32. Decode result of subprocess 33. Fix indent in gpsys1 34. Fix packcore with Python 3 35. Fix pytest_requirement.txt for Python 3 36. Silence notices in gpcheckcat 37. Fix gpmemreport and gpmemwatcher for Python 3 Co-authored-by: Georgy Shelkovy <g.shelkovy@arenadata.io>

darthunix requested a review from Stolb27 March 23, 2021 09:55

darthunix changed the title ~~[6X backport] Fix preprocessing of NOT IN to LAS-Apply in filter~~ [6X Backport] Fix preprocessing of NOT IN to LAS-Apply in filter Mar 23, 2021

darthunix force-pushed the ADBDEV-1438 branch from ceef4ee to 33459cf Compare March 23, 2021 11:21

Stolb27 previously approved these changes Apr 9, 2021

View reviewed changes

darthunix dismissed Stolb27’s stale review via 90afebc April 12, 2021 07:13

darthunix force-pushed the ADBDEV-1438 branch from 106b705 to 90afebc Compare April 12, 2021 07:13

Stolb27 approved these changes Apr 19, 2021

View reviewed changes

deart2k merged commit 592acb7 into adb-6.x Apr 20, 2021

deart2k deleted the ADBDEV-1438 branch April 20, 2021 13:01

Stolb27 mentioned this pull request Jan 19, 2022

ADBDEV-2435 ADB 6.19.0 sync #294

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[6X Backport] Fix preprocessing of NOT IN to LAS-Apply in filter#170

[6X Backport] Fix preprocessing of NOT IN to LAS-Apply in filter#170
deart2k merged 1 commit intoadb-6.xfrom
ADBDEV-1438

darthunix commented Mar 23, 2021 •

edited

Loading

Uh oh!

Stolb27 commented Apr 9, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

darthunix commented Mar 23, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Stolb27 commented Apr 9, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

darthunix commented Mar 23, 2021 •

edited

Loading