Add support for target cols with functional dependency on grouping ones to ORCA#1195
Merged
Add support for target cols with functional dependency on grouping ones to ORCA#1195
Conversation
4d56437 to
b087fe1
Compare
…es to ORCA (#746) Problem description: ORCA fails to handle queries with columns in the target list, not listed in GROUP BY clause, if grouping is done by a primary key (meaning that these columns have a functional dependency on grouping columns). In such a case, it falls back to the standard planner. Expected behavior - ORCA generates a plan for a query with columns in target list, if these columns have a functional dependency on columns in GROUP BY list. Root cause: Function CQueryMutators::ShouldFallback during query normalization finds out that there is a target list entry, that is not a grouping column, and it triggers fallback to the standard planner. Fix: All columns from the target list with functional dependency on grouping columns are added explicitly to group by clause at the query normalization stage (at start of groupby normalization, before checking if fallback is required) before the translation to DXL. It requires the following steps: 1) Extract such columns from target list expressions, and if they do not have a relevant target list entry, add resjunk target list entries for them. 2) Store current unique target list entry references in groupClause - they will be required at step 4. 3) Update all grouping sets that contain the primary key - add the functionally dependent columns from the target list. 4) Update arguments of GROUPING functions (based on target list entry references stored on step 2 and updated target list entry references), because they could be shifted after step 3. Plus, new test cases were added to the functional_deps test. (cherry picked from commit b1373f8) Changes comparing to the original commit: 1. Usage of 'grouping()' function with a column that is not in a grouping sets is now declined on the parser stage (even if the column has a functional dependency on the column in the grouping set). So the part related to the adjustment of the 'grouping()' function arguments is removed (steps 2 and 4 of algorithm). 2. Functions 'GetSortGroupClauseExpr()', 'LAppendUniqueInt()' and 'LAppendUnique()' are not needed anymore, so they are removed from the patch. 3. Function 'get_constraint_relation_oids()' was removed at some point between 6x and 7x versions, but the patch uses it. So this function is brought back. 4. Internals of 'groupClause' field of the Query structure has changed significantly comparing to 6x. Now it is a plain list of SortGroupClause nodes. The structure of the GROUP BY statement is now defined by 'groupingSets' field in the Query structure. Therefore, now the patch mutates not 'groupClause' field, but 'groupingSets' field. And missing SortGroupClause node is simply added to 'groupClause' list. The mutator and 'GroupingListContainsPrimaryKey()' functions has been updated to work with 'groupingSets'. 5. Tests were aligned with changes in 6a737f5, as it is the expected behavior after PostgreSQL 9.5. 6. Some newly added with the patch test cases fallback to standard planner on 7x due to different reason (for ex. "GPORCA does not support the following feature: nested grouping set"). These cases are left untouched, as they are reproduced even without target cols with functional dependency, so it is out of scope of the patch. Only optimizer answer file is updated. For item #5: Co-authored-by: Aleksandr Kopytov <a.kopytov@arenadata.io>
b087fe1 to
0b0e280
Compare
mos65o2
reviewed
Feb 3, 2025
mos65o2
reviewed
Feb 3, 2025
mos65o2
approved these changes
Feb 4, 2025
bimboterminator1
approved these changes
Feb 6, 2025
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Add support for target cols with functional dependency on grouping ones to ORCA
Problem description: ORCA fails to handle queries with columns in the target
list, not listed in GROUP BY clause, if grouping is done by a primary key
(meaning that these columns have a functional dependency on grouping columns).
In such a case, it falls back to the standard planner. Expected behavior - ORCA
generates a plan for a query with columns in target list, if these columns have
a functional dependency on columns in GROUP BY list.
Root cause: Function CQueryMutators::ShouldFallback during query normalization
finds out that there is a target list entry, that is not a grouping column, and
it triggers fallback to the standard planner.
Fix: All columns from the target list with functional dependency on grouping
columns are added explicitly to group by clause at the query normalization stage
(at start of groupby normalization, before checking if fallback is required)
before the translation to DXL. It requires the following steps:
relevant target list entry, add resjunk target list entries for them.
be required at step 4.
dependent columns from the target list.
stored on step 2 and updated target list entry references), because they could
be shifted after step 3.
Plus, new test cases were added to the functional_deps test.
(cherry picked from commit b1373f8)
Changes comparing to the original commit:
is now declined on the parser stage (even if the column has a functional
dependency on the column in the grouping set). So the part related to the
adjustment of the 'grouping()' function arguments is removed (steps 2 and 4 of
algorithm).
'LAppendUnique()' are not needed anymore, so they are removed from the patch.
6x and 7x versions, but the patch uses it. So this function is brought back.
significantly comparing to 6x. Now it is a plain list of SortGroupClause nodes.
The structure of the GROUP BY statement is now defined by 'groupingSets' field
in the Query structure. Therefore, now the patch mutates not 'groupClause'
field, but 'groupingSets' field. And missing SortGroupClause node is simply
added to 'groupClause' list. The mutator and 'GroupingListContainsPrimaryKey()'
functions has been updated to work with 'groupingSets'.
(which is an expected consequence of the patch). Some old tests started to
fallback with a different reason (as previous reason is fixed with this patch).
All answer files for such tests were updated accordingly.
as it is the expected behavior after PostgreSQL 9.5.
due to different reason (for ex. "GPORCA does not support the following feature:
nested grouping set"). These cases are left untouched, as they are reproduced
even without target cols with functional dependency, so it is out of scope of
the patch. Only optimizer answer file is updated.
For item 6:
Co-authored-by: Aleksandr Kopytov a.kopytov@arenadata.io