Add support for target cols with functional dependency on grouping ones to ORCA by whitehawk · Pull Request #1195 · arenadata/gpdb

whitehawk · 2025-01-28T04:49:26Z

Add support for target cols with functional dependency on grouping ones to ORCA

Problem description: ORCA fails to handle queries whose target list contains
columns that are not listed in the GROUP BY clause when grouping is done by a
primary key (meaning that these columns are functionally dependent on the
grouping columns). In such a case, it falls back to the standard planner.
Expected behavior: ORCA generates a plan for a query with such columns in the
target list if they are functionally dependent on the columns in the GROUP BY
list.

Root cause: during query normalization, the function
CQueryMutators::ShouldFallback finds a target list entry that is not a grouping
column and triggers a fallback to the standard planner.

Fix: All target list columns that are functionally dependent on the grouping
columns are explicitly added to the GROUP BY clause at the query normalization
stage (at the start of GROUP BY normalization, before checking whether a
fallback is required), before the translation to DXL. This requires the
following steps:

1) Extract such columns from target list expressions, and if they do not have a
relevant target list entry, add resjunk target list entries for them.
2) Store the current unique target list entry references in groupClause - they
will be required at step 4.
3) Update all grouping sets that contain the primary key - add the functionally
dependent columns from the target list.
4) Update the arguments of GROUPING functions (based on the target list entry
references stored at step 2 and the updated target list entry references),
because they could be shifted after step 3.
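
Steps 1 and 3 can be illustrated with a small toy model. This is only a sketch under simplifying assumptions, not the code from CQueryMutators.cpp: columns are plain strings, a grouping set is a list of column names, and the names 'TargetEntry' and 'add_dependent_columns' are hypothetical helpers introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class TargetEntry:
    expr: str              # toy model: the expression is just its text
    resjunk: bool = False  # True for helper entries that are not part of the output


def add_dependent_columns(target_list, referenced_columns, grouping_sets, primary_key):
    """Toy version of steps 1 and 3 (hypothetical helper, not the ORCA code).

    referenced_columns: columns used by target list expressions.
    grouping_sets: list of lists of column names.
    primary_key: columns of the table's primary key.
    """
    pk = set(primary_key)
    new_target_list = list(target_list)
    present = {te.expr for te in new_target_list}
    new_grouping_sets = []
    for grouping_set in grouping_sets:
        columns = list(grouping_set)
        # Only a grouping set that covers the whole primary key
        # functionally determines the other columns of the table.
        if pk and pk.issubset(columns):
            for col in referenced_columns:
                if col in columns:
                    continue
                # Step 1: add a resjunk entry if the column has no entry yet.
                if col not in present:
                    new_target_list.append(TargetEntry(col, resjunk=True))
                    present.add(col)
                # Step 3: extend the grouping set with the dependent column.
                columns.append(col)
        new_grouping_sets.append(columns)
    return new_target_list, new_grouping_sets


# Models: SELECT a, b + 1 FROM t GROUP BY a;  with PRIMARY KEY (a)
target_list = [TargetEntry("a"), TargetEntry("b + 1")]
new_target_list, new_sets = add_dependent_columns(
    target_list, ["a", "b"], [["a"]], ["a"]
)
print(new_sets)
```

In this model, the grouping set that contains the primary key is extended with column b, and a resjunk entry for b is appended to the target list, so a later check for non-grouping target columns has nothing left to reject.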

Plus, new test cases were added to the functional_deps test.

(cherry picked from commit b1373f8)

Changes compared to the original commit:

1. Using the 'grouping()' function with a column that is not in a grouping set
is now rejected at the parser stage (even if the column is functionally
dependent on a column in the grouping set). So the part related to the
adjustment of the 'grouping()' function arguments is removed (steps 2 and 4 of
the algorithm).
2. Functions 'GetSortGroupClauseExpr()', 'LAppendUniqueInt()' and
'LAppendUnique()' are no longer needed, so they are removed from the patch.
3. Function 'get_constraint_relation_oids()' was removed at some point between
the 6x and 7x versions, but the patch uses it, so this function is brought back.
4. The internals of the 'groupClause' field of the Query structure have changed
significantly compared to 6x. It is now a plain list of SortGroupClause nodes.
The structure of the GROUP BY statement is now defined by the 'groupingSets'
field of the Query structure. Therefore, the patch now mutates the
'groupingSets' field instead of 'groupClause', and a missing SortGroupClause
node is simply appended to the 'groupClause' list. The mutator and the
'GroupingListContainsPrimaryKey()' function have been updated to work with
'groupingSets'.
5. After this change, some old tests stopped falling back to the standard
planner (which is an expected consequence of the patch). Some old tests started
to fall back for a different reason (as the previous reason is fixed by this
patch). All answer files for such tests were updated accordingly.
6. Tests were aligned with the changes in 6a737f5,
as it is the expected behavior after PostgreSQL 9.5.
7. Some test cases newly added by the patch fall back to the standard planner
on 7x for a different reason (e.g. "GPORCA does not support the following
feature: nested grouping set"). These cases are left untouched, as they
reproduce even without target cols with functional dependency, so they are out
of scope of the patch. Only the optimizer answer file is updated.

For item 6:
Co-authored-by: Aleksandr Kopytov <a.kopytov@arenadata.io>
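
The change to 'groupClause' and 'groupingSets' described above can be pictured with a small model. This is a rough sketch under assumptions rather than the patch itself: 'group_clause' stands for the flat list of SortGroupClause nodes, each grouping set is modeled as a list of integer target list entry references, and 'add_dependent_ref' is a hypothetical helper name.

```python
from dataclasses import dataclass, field


@dataclass
class SortGroupClause:
    tle_sort_group_ref: int  # reference to a target list entry


@dataclass
class Query:
    # Flat list of SortGroupClause nodes (toy stand-in for 'groupClause').
    group_clause: list = field(default_factory=list)
    # Each grouping set is a list of references (toy stand-in for 'groupingSets').
    grouping_sets: list = field(default_factory=list)


def add_dependent_ref(query, new_ref, pk_refs):
    """Add new_ref to every grouping set that contains all primary key refs.

    Hypothetical helper: returns True if at least one grouping set
    contains the primary key.
    """
    pk = set(pk_refs)
    contains_pk = False
    for grouping_set in query.grouping_sets:
        if pk and pk <= set(grouping_set):
            contains_pk = True
            if new_ref not in grouping_set:
                grouping_set.append(new_ref)
    # The missing SortGroupClause node is simply appended to the flat list.
    if contains_pk and all(
        c.tle_sort_group_ref != new_ref for c in query.group_clause
    ):
        query.group_clause.append(SortGroupClause(new_ref))
    return contains_pk


# Two grouping sets over refs 1 and 2; ref 1 is the primary key column.
query = Query(
    group_clause=[SortGroupClause(1), SortGroupClause(2)],
    grouping_sets=[[1], [2]],
)
add_dependent_ref(query, new_ref=3, pk_refs=[1])
print(query.grouping_sets)
```

Only the grouping set that contains the primary key reference is extended, while the flat clause list grows by one node regardless of which set needed it.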

…es to ORCA (#746)

src/backend/gpopt/translate/CQueryMutators.cpp

src/test/regress/expected/functional_deps_optimizer.out

whitehawk force-pushed the ADBDEV-5466 branch from 4d56437 to b087fe1 Compare January 28, 2025 05:45

whitehawk force-pushed the ADBDEV-5466 branch from b087fe1 to 0b0e280 Compare January 28, 2025 12:10

Merge branch 'adb-7.2.0' into ADBDEV-5466

2fe0fb7

whitehawk marked this pull request as ready for review January 29, 2025 01:46

mos65o2 reviewed Feb 3, 2025

src/backend/gpopt/translate/CQueryMutators.cpp

mos65o2 reviewed Feb 3, 2025

src/test/regress/expected/functional_deps_optimizer.out (outdated)

whitehawk added 2 commits February 4, 2025 13:18

Add comments to the test

e3a5f47

Merge branch 'adb-7.2.0' into ADBDEV-5466

a9f4966

mos65o2 approved these changes Feb 4, 2025

bimboterminator1 approved these changes Feb 6, 2025

Merge branch 'adb-7.2.0' into ADBDEV-5466

778d223

whitehawk merged commit d7d2186 into adb-7.2.0 Feb 6, 2025
4 checks passed

whitehawk deleted the ADBDEV-5466 branch February 6, 2025 11:00

whitehawk merged 5 commits into adb-7.2.0 from ADBDEV-5466