Skip to content

[6X Backport] Fix preprocessing of NOT IN to LAS-Apply in filter#170

Merged
deart2k merged 1 commit intoadb-6.xfrom
ADBDEV-1438
Apr 20, 2021
Merged

[6X Backport] Fix preprocessing of NOT IN to LAS-Apply in filter#170
deart2k merged 1 commit intoadb-6.xfrom
ADBDEV-1438

Conversation

@darthunix
Copy link

@darthunix darthunix commented Mar 23, 2021

NOT IN is equivalent to "<> ALL" SQL expression and should be
replaced with LAS-Apply in CSubqueryHandler::FRemoveAllSubquery().
Before current commit ORCA transformed "<> ALL" subquery in filter
incorrectly. A query

select * from R where R.a <> ALL (select S.b from S)

was translated to

select * from R LAS-APPLY-Not-In (select S.b from S where S.b = R.a) on true;

This plan returns wrong results when S.b is NULL. To handle it
(S.b = R.a) should be checked with IS_NOT_FALSE expression.

For example in https://github.com/greenplum-db/gpdb/issues/10967

create table t1(a int) distributed by (a);
insert into t1 values (1), (2), (3);
set optimizer_join_order='query';
select * from t1 where a not in (
    select b.a from t1 a left join t1 b on false
);

returned incorrect (1), (2), (3) result on ORCA and correct empty
result on postgres planner. The root of it was in FRemoveAllSubquery().
Previously this function transformed

+--CScalarSubqueryAll(<>)["a" (16)]   origin: [Grp:6, GrpExpr:0]
    |--CLogicalLeftOuterJoin   origin: [Grp:4, GrpExpr:0]
    |  |--CLogicalGet "t1" ("t1"), Columns: ["a" (8),..]
    |  |--CLogicalConstTableGet Columns: ["a" (16),..]
    |  +--CScalarConst (1)   origin: [Grp:3, GrpExpr:0]
    +--CScalarIdent "a" (0)   origin: [Grp:5, GrpExpr:0]

to

+--CLogicalLeftAntiSemiApplyNotIn (Reqd Inner Cols: "a" (16))
    |--CLogicalGet "t1" ("t1"), Columns: ["a" (0),..]
    |--CLogicalSelect
    |  |--CLogicalLeftOuterJoin   origin: [Grp:4, GrpExpr:0]
    |  |  |--CLogicalGet "t1" ("t1"), Columns: ["a" (8),..]
    |  |  |--CLogicalConstTableGet Columns: ["a" (16),..]
    |  |  +--CScalarConst (1)   origin: [Grp:3, GrpExpr:0]
    |  +--CScalarCmp (=)
    |     |--CScalarIdent "a" (0)   origin: [Grp:5, GrpExpr:0]
    |     +--CScalarIdent "a" (16)
    +--CScalarConst (1)

and didn't add IS_NOT_FALSE check as we should expect.
Now we get a correct logical tree with empty result:

+--CLogicalLeftAntiSemiApplyNotIn (Reqd Inner Cols: "a" (16))
    |--CLogicalGet "t1" ("t1"), Columns: ["a" (0),..]
    |--CLogicalSelect
    |  |--CLogicalLeftOuterJoin   origin: [Grp:4, GrpExpr:0]
    |  |  |--CLogicalGet "t1" ("t1"), Columns: ["a" (8),..]
    |  |  |--CLogicalConstTableGet Columns: ["a" (16),..]
    |  |  +--CScalarConst (1)   origin: [Grp:3, GrpExpr:0]
    |  +--CScalarIsDistinctFrom (=)
    |     |--CScalarCmp (=)
    |     |  |--CScalarIdent "a" (0)   origin: [Grp:5, GrpExpr:0]
    |     |  +--CScalarIdent "a" (16)
    |     +--CScalarConst (0)
    +--CScalarConst (1)

Backport of 2e91028

@darthunix darthunix requested a review from Stolb27 March 23, 2021 09:55
@darthunix darthunix changed the title [6X backport] Fix preprocessing of NOT IN to LAS-Apply in filter [6X Backport] Fix preprocessing of NOT IN to LAS-Apply in filter Mar 23, 2021
@Stolb27
Copy link
Collaborator

Stolb27 commented Apr 9, 2021

@darthunix, can you rebase it to run tests?

Stolb27
Stolb27 previously approved these changes Apr 9, 2021
NOT IN is equivalent to "<> ALL" SQL expression and should be
replaced with LAS-Apply in CSubqueryHandler::FRemoveAllSubquery().
Before current commit ORCA transformed "<> ALL" subquery in filter
incorrectly. A query

select * from R where R.a <> ALL (select S.b from S)

was translated to

select * from R LAS-APPLY-Not-In (select S.b from S where S.b = R.a) on true;

This plan returns wrong results when S.b is NULL. To handle it
(S.b = R.a) should be checked with IS_NOT_FALSE expression.

For example in https://github.com/greenplum-db/gpdb/issues/10967

create table t1(a int) distributed by (a);
insert into t1 values (1), (2), (3);
set optimizer_join_order='query';
select * from t1 where a not in (
    select b.a from t1 a left join t1 b on false
);

returned incorrect (1), (2), (3) result on ORCA and correct empty
result on postgres planner. The root of it was in FRemoveAllSubquery().
Previously this function transformed

+--CScalarSubqueryAll(<>)["a" (16)]   origin: [Grp:6, GrpExpr:0]
    |--CLogicalLeftOuterJoin   origin: [Grp:4, GrpExpr:0]
    |  |--CLogicalGet "t1" ("t1"), Columns: ["a" (8),..]
    |  |--CLogicalConstTableGet Columns: ["a" (16),..]
    |  +--CScalarConst (1)   origin: [Grp:3, GrpExpr:0]
    +--CScalarIdent "a" (0)   origin: [Grp:5, GrpExpr:0]

to

+--CLogicalLeftAntiSemiApplyNotIn (Reqd Inner Cols: "a" (16))
    |--CLogicalGet "t1" ("t1"), Columns: ["a" (0),..]
    |--CLogicalSelect
    |  |--CLogicalLeftOuterJoin   origin: [Grp:4, GrpExpr:0]
    |  |  |--CLogicalGet "t1" ("t1"), Columns: ["a" (8),..]
    |  |  |--CLogicalConstTableGet Columns: ["a" (16),..]
    |  |  +--CScalarConst (1)   origin: [Grp:3, GrpExpr:0]
    |  +--CScalarCmp (=)
    |     |--CScalarIdent "a" (0)   origin: [Grp:5, GrpExpr:0]
    |     +--CScalarIdent "a" (16)
    +--CScalarConst (1)

and didn't add IS_NOT_FALSE check as we should expect.
Now we get a correct logical tree with empty result:

+--CLogicalLeftAntiSemiApplyNotIn (Reqd Inner Cols: "a" (16))
    |--CLogicalGet "t1" ("t1"), Columns: ["a" (0),..]
    |--CLogicalSelect
    |  |--CLogicalLeftOuterJoin   origin: [Grp:4, GrpExpr:0]
    |  |  |--CLogicalGet "t1" ("t1"), Columns: ["a" (8),..]
    |  |  |--CLogicalConstTableGet Columns: ["a" (16),..]
    |  |  +--CScalarConst (1)   origin: [Grp:3, GrpExpr:0]
    |  +--CScalarIsDistinctFrom (=)
    |     |--CScalarCmp (=)
    |     |  |--CScalarIdent "a" (0)   origin: [Grp:5, GrpExpr:0]
    |     |  +--CScalarIdent "a" (16)
    |     +--CScalarConst (0)
    +--CScalarConst (1)

Backport of 2e91028
@deart2k deart2k merged commit 592acb7 into adb-6.x Apr 20, 2021
@deart2k deart2k deleted the ADBDEV-1438 branch April 20, 2021 13:01
@Stolb27 Stolb27 mentioned this pull request Jan 19, 2022
5 tasks
hilltracer pushed a commit that referenced this pull request Mar 6, 2026
Add Python 3 support
1.  Fix syntax in Python code in string literals.
2.  Use `key` instead of `cmp` in sort
3.  Remove the `is` comparison with non-None.
4.  Remove string decoding in gpexpand
5.  Check string emptiness by comparing their length instead of comparing them
with empty string literals in order not to depend on the type of string.
6.  Convert the result of subcommands to a regular string.
7.  Implement modern comparators for the Segment, GPCatalogTable, and GpVersion
classes
8.  Fix RuntimeError caused by throwing StopIteration inside loop body
9. Wrap pickle serialized data in base64 in remote calling commands for more
stable behavior between versions.
10. Remove the use of the platform module in gppkg. Use info from
/etc/os-release instead.
11. Use the subprocess module instead of popen2 in gpscp.
12. Output data in base64 encoding in gpoperation.py. This allows us to convert
all subcommand results to regular strings, which in turn reduces diff.
13. Use the correct builtin module name in unit tests according to the python
version.
14. Convert the urlsafe_b64decode argument to a string for compatibility with
python 2.
15. In the test_inner_exceptions test, declare an exception class inside
function so that it cannot be serialized in python 3.
16. Skip test_unsafe_exceptions_with_args test in python 3 because it's not
applicable to Python 3.
17. Remove usage of assertItemsEqual
18. Rename assertEquals to assertEqual for compatibility
19. Remove bundled mock.
20. Replace warn with warning for logging.
21. Replace getName() with name for threads.
Stolb27 pushed a commit that referenced this pull request Mar 10, 2026
Add Python 3 support
1.  Fix syntax in Python code in string literals.
2.  Use `key` instead of `cmp` in sort
3.  Remove the `is` comparison with non-None.
4.  Remove string decoding in gpexpand
5.  Check string emptiness by comparing their length instead of comparing them
with empty string literals in order not to depend on the type of string.
6.  Convert the result of subcommands to a regular string.
7.  Implement modern comparators for the Segment, GPCatalogTable, and GpVersion
classes
8.  Fix RuntimeError caused by throwing StopIteration inside loop body
9. Wrap pickle serialized data in base64 in remote calling commands for more
stable behavior between versions.
10. Remove the use of the platform module in gppkg. Use info from
/etc/os-release instead.
11. Use the subprocess module instead of popen2 in gpscp.
12. Output data in base64 encoding in gpoperation.py. This allows us to convert
all subcommand results to regular strings, which in turn reduces diff.
13. Use the correct builtin module name in unit tests according to the python
version.
14. Convert the urlsafe_b64decode argument to a string for compatibility with
python 2.
15. In the test_inner_exceptions test, declare an exception class inside
function so that it cannot be serialized in python 3.
16. Skip test_unsafe_exceptions_with_args test in python 3 because it's not
applicable to Python 3.
17. Remove usage of assertItemsEqual
18. Rename assertEquals to assertEqual for compatibility
19. Remove bundled mock.
20. Replace warn with warning for logging.
21. Replace getName() with name for threads.
22. Fix gpssh behave tests on Python 3
- Fix the error "TypeError: '<=' not supported between instances of
  'int' and 'NoneType'" when comparing retry_attempt <= num_retries when
  num_retries is None.
- Fix the error "TypeError: write() argument must be str, not bytes"
  when DEBUG_VERBOSE_PRINTING is True.
- In Python 3, file operations return bytes, not strings, so the CRNL
  for comparison must be in bytes.
23. Replace the remaining warn with warning.
24. Fix the remaining row checks for emptiness.
25. Fix the remaining six usage.
26. Restore old dbconn connect logic to fix connection using ipv6.
27. Gpload:
- Decode results of sub commands
- Fix handling of unicode
- Fix handling escaping
- Add the ability to use different ans files for different versions of
  Python. The test_76_gpload_merge_chinese_standard_conforming_str_off
  part of the test expects a string decoding error when handling an
  exception from DB, but Python 3 can handle the exception text, which
  makes the Python 3 test result different.
28. Remove the gpssh dependency from configparser using an alias for Python 3.
29. Use non-order-dependent comparisons in unit tests to compare jsons.
30. Remove depends on LooseVersion.
31. Import io only for Python 3 when use for import StringIO
32. Decode result of subprocess
33. Fix indent in gpsys1
34. Fix packcore with Python 3
35. Fix pytest_requirement.txt for Python 3
36. Silence notices in gpcheckcat
37. Fix gpmemreport and gpmemwatcher for Python 3

Co-authored-by: Georgy Shelkovy <g.shelkovy@arenadata.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants