Skip to content

[CALCITE-6749] RelMdUtil#setAggChildKeys may return an incorrect result#4115

Merged
rubenada merged 1 commit intoapache:mainfrom
rubenada:CALCITE-6749
Jan 8, 2025
Merged

[CALCITE-6749] RelMdUtil#setAggChildKeys may return an incorrect result#4115
rubenada merged 1 commit intoapache:mainfrom
rubenada:CALCITE-6749

Conversation

@rubenada
Copy link
Copy Markdown
Contributor

@rubenada
Copy link
Copy Markdown
Contributor Author

As a consequence of the fix, there are a couple of plan changes on *.iq tests, but the result is still correct.

/** Test case for
* <a href="https://issues.apache.org/jira/browse/CALCITE-6749">[CALCITE-6749]
* RelMdUtil#setAggChildKeys may return an incorrect result</a>. */
@Test void testSetAggChildKeys() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Frankly, this is hard to review.
I saw the discussion in the JIRA, but it's not obvious at all that "9" is the right answer.
I wonder whether having the full plan here would actually be easier - at least in a comment.

Copy link
Copy Markdown
Contributor Author

@rubenada rubenada Dec 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically we need to convert: "field index X on the Aggregate corresponds to which field index Y on the Aggregate's input?"
If we have this plan

LogicalAggregate(group=[{0}], EXPR$1=[COUNT(DISTINCT $1)])
  LogicalProject(DEPTNO=[$9], JOB=[$2])
    LogicalJoin(condition=[=($7, $9)], joinType=[right])
      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
      LogicalTableScan(table=[[CATALOG, SALES, DEPT]])

The Aggregate rowType has two fields (0 and 1), the first one (index 0) corresponds to the group 0, and the second one (index 1) corresponds to the argCall COUNT(DISTINCT $1). In this case, if we want to convert into the input's, we'll have 0 => 0, and 1 => 1; there is no problem on the current code.

However, if we have this equivalent plan (same query plan, but without the intermediate Project):

LogicalAggregate(group=[{9}], EXPR$1=[COUNT(DISTINCT $2)])
  LogicalJoin(condition=[=($7, $9)], joinType=[right])
    LogicalTableScan(table=[[CATALOG, SALES, EMP]])
    LogicalTableScan(table=[[CATALOG, SALES, DEPT]])

Now the Aggregate still has two fields (0 and 1), the first one (index 0) corresponds to the group 9, and the second one (index 1) corresponds to the argCall COUNT(DISTINCT $2). In this case, if we want to convert into the input's, we'll have 0 => 9, and 1 => 2; the current code will work fine only for the argCall conversion, but not for the group conversion.

Notice I'm just applying here the same conversion that exists already e.g. when propagating the RelMdColumnOrigins computation past an Aggregate (in terms of our second example: "if we want to get origin of the column 0 of the Aggregate, we need to compute the origin of the column 9 of the Aggregate's input"). The auxiliary method RelMdUtil.setAggChildKeys (used by RelMdDistinctRowCount and RelMdPopulationSize for Aggregates) should behave in the same way as RelMdColumnOrigin for this conversion.

@mihaibudiu mihaibudiu added the LGTM-will-merge-soon Overall PR looks OK. Only minor things left. label Jan 6, 2025
@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud bot commented Jan 7, 2025

@rubenada rubenada merged commit c686e2e into apache:main Jan 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

LGTM-will-merge-soon Overall PR looks OK. Only minor things left.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants