Add support for target cols with functional dependency on grouping ones to ORCA#1195

Merged
whitehawk merged 5 commits into adb-7.2.0 from ADBDEV-5466
Feb 6, 2025

@whitehawk whitehawk commented Jan 28, 2025

Add support for target cols with functional dependency on grouping ones to ORCA

Problem description: ORCA fails to handle queries whose target list contains
columns that are not listed in the GROUP BY clause, even when grouping is done
by a primary key (meaning that these columns have a functional dependency on
the grouping columns). In such a case, it falls back to the standard planner.
Expected behavior: ORCA generates a plan for a query with columns in the target
list if these columns have a functional dependency on the columns in the
GROUP BY list.
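
To make the condition concrete, here is a toy model in Python (not ORCA code; `Column`, `is_functionally_dependent` and the table `t` are hypothetical names used only for illustration). It assumes the simplest form of the rule: a column is functionally dependent on the grouping columns when they include the complete primary key of the column's table, as in `SELECT id, name FROM t GROUP BY id`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    table: str
    name: str


def is_functionally_dependent(col, group_by, primary_keys):
    """Return True if `col` is determined by the GROUP BY columns.

    primary_keys maps a table name to the set of its primary key column names.
    """
    pk = primary_keys.get(col.table)
    if not pk:
        return False  # no primary key known, so no dependency can be proven
    grouped = {c.name for c in group_by if c.table == col.table}
    return pk <= grouped  # the whole primary key must be grouped on


# Hypothetical table t(id PRIMARY KEY, name, price), grouped by t.id
primary_keys = {"t": {"id"}}
group_by = [Column("t", "id")]
print(is_functionally_dependent(Column("t", "name"), group_by, primary_keys))
```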

Root cause: During query normalization, the function
CQueryMutators::ShouldFallback finds a target list entry that is not a grouping
column and triggers fallback to the standard planner.
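
As a rough illustration of that check (a simplified Python stand-in, not the actual CQueryMutators::ShouldFallback logic, which works on Query trees), assume the target list is reduced to the names of its non-aggregated column references:

```python
def should_fallback(target_cols, group_cols):
    """Toy check: fall back if any plain target column is not a grouping column."""
    return any(col not in group_cols for col in target_cols)


# Before the fix: "name" is in the target list but not in GROUP BY.
before = should_fallback(["id", "name"], ["id"])
# After normalization adds "name" to the grouping columns, the check passes.
after = should_fallback(["id", "name"], ["id", "name"])
```

This is why the fix has to run before the fallback check: once the dependent columns are grouping columns, the check no longer sees a mismatch.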

Fix: All columns from the target list with a functional dependency on the
grouping columns are added explicitly to the GROUP BY clause at the query
normalization stage (at the start of group-by normalization, before checking
whether fallback is required), before the translation to DXL. This requires
the following steps:

  1. Extract such columns from target list expressions, and if they do not have a
    relevant target list entry, add resjunk target list entries for them.
  2. Store the current unique target list entry references in groupClause; they
    will be required at step 4.
  3. Update all grouping sets that contain the primary key by adding the
    functionally dependent columns from the target list.
  4. Update the arguments of GROUPING functions (based on the target list entry
    references stored at step 2 and the updated target list entry references),
    because they could be shifted after step 3.
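
Steps 1 and 3 can be sketched roughly as follows. This is a simplified Python model under stated assumptions, not the patch itself: `TargetEntry` and `add_dependent_columns` are hypothetical names, columns are plain strings, and the GROUPING argument bookkeeping of steps 2 and 4 is left out.

```python
from dataclasses import dataclass


@dataclass
class TargetEntry:
    column: str
    resjunk: bool = False  # True for entries added only to support grouping


def add_dependent_columns(target_list, grouping_sets, primary_key, dependent_cols):
    """Return the extended target list and the updated grouping sets."""
    # Step 1: every dependent column needs a target list entry; missing ones
    # are added as resjunk so they do not appear in the query output.
    present = {te.column for te in target_list}
    for col in dependent_cols:
        if col not in present:
            target_list.append(TargetEntry(col, resjunk=True))
            present.add(col)

    # Step 3: extend only the grouping sets that contain the whole primary key.
    updated = []
    for gset in grouping_sets:
        new_set = list(gset)
        if primary_key <= set(gset):
            for col in dependent_cols:
                if col not in new_set:
                    new_set.append(col)
        updated.append(new_set)
    return target_list, updated


tl, sets = add_dependent_columns(
    [TargetEntry("id"), TargetEntry("name")],
    [["id"], ["category"]],
    {"id"},
    ["name", "price"],
)
```

In this example the set grouped by `id` gains `name` and `price`, while the set grouped by `category` is left alone because it does not contain the primary key.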

Plus, new test cases were added to the functional_deps test.

(cherry picked from commit b1373f8)

Changes compared to the original commit:

  1. Usage of the 'grouping()' function with a column that is not in a grouping
    set is now rejected at the parser stage (even if the column has a functional
    dependency on a column in the grouping set). So the part related to the
    adjustment of the 'grouping()' function arguments is removed (steps 2 and 4 of
    the algorithm).
  2. Functions 'GetSortGroupClauseExpr()', 'LAppendUniqueInt()' and
    'LAppendUnique()' are not needed anymore, so they are removed from the patch.
  3. Function 'get_constraint_relation_oids()' was removed at some point between
    6x and 7x versions, but the patch uses it. So this function is brought back.
  4. The internals of the 'groupClause' field of the Query structure have changed
    significantly compared to 6x. It is now a plain list of SortGroupClause nodes.
    The structure of the GROUP BY statement is now defined by the 'groupingSets'
    field of the Query structure. Therefore, the patch now mutates the
    'groupingSets' field rather than 'groupClause', and a missing SortGroupClause
    node is simply added to the 'groupClause' list. The mutator and the
    'GroupingListContainsPrimaryKey()' function have been updated to work with
    'groupingSets'.
  5. After this change, some old tests stopped falling back to the standard planner
    (which is an expected consequence of the patch). Some old tests started to
    fall back for a different reason (as the previous reason is fixed by this
    patch). All answer files for such tests were updated accordingly.
  6. Tests were aligned with changes in 6a737f5,
    as it is the expected behavior after PostgreSQL 9.5.
  7. Some test cases newly added with the patch fall back to the standard planner
    on 7x for a different reason (e.g. "GPORCA does not support the following
    feature: nested grouping set"). These cases are left untouched, as they are
    reproduced even without target cols with functional dependency, so they are
    out of scope of the patch. Only the optimizer answer file is updated.
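
The layout described in item 4 can be pictured with a small Python model. It is only a sketch under assumptions: `SortGroupClause`, `Query` and `add_grouping_ref` here are illustrative stand-ins, each grouping set is modeled as a list of target list entry references, and only the case where grouping sets are present is covered.

```python
from dataclasses import dataclass, field


@dataclass
class SortGroupClause:
    tle_sort_group_ref: int  # reference to a target list entry


@dataclass
class Query:
    group_clause: list = field(default_factory=list)   # flat list of SortGroupClause
    grouping_sets: list = field(default_factory=list)  # each set is a list of refs


def add_grouping_ref(query, pk_refs, new_ref):
    """Add new_ref to every grouping set that contains all of pk_refs."""
    # A missing SortGroupClause is simply appended to the flat list.
    if all(c.tle_sort_group_ref != new_ref for c in query.group_clause):
        query.group_clause.append(SortGroupClause(new_ref))
    # The grouping structure itself lives in grouping_sets, so that is mutated.
    for gset in query.grouping_sets:
        if pk_refs <= set(gset) and new_ref not in gset:
            gset.append(new_ref)


q = Query(
    group_clause=[SortGroupClause(1), SortGroupClause(2)],
    grouping_sets=[[1], [2], [1, 2]],
)
add_grouping_ref(q, {1}, 3)  # ref 1 plays the role of the primary key column
```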

For item 6:
Co-authored-by: Aleksandr Kopytov <a.kopytov@arenadata.io>

@whitehawk whitehawk marked this pull request as ready for review January 29, 2025 01:46
@whitehawk whitehawk merged commit d7d2186 into adb-7.2.0 Feb 6, 2025
4 checks passed
@whitehawk whitehawk deleted the ADBDEV-5466 branch February 6, 2025 11:00