-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Bug](materialized-view) check groupby in agg-bottom-plan when rewrite agg query by mv #34274
Conversation
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website. |
run buildall |
TPC-H: Total hot run time: 41812 ms
|
TPC-DS: Total hot run time: 185787 ms
|
} | ||
finalOutputExpressions.add(new Alias(rollupedExpression)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
final output should contains both aggregate function and group by dimension, this may lost group by dimension out put and change original agg output
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
final output should contains both aggregate function and group by dimension, this may lost group by dimension out put and change original agg output
I think maybe line 261 does the job you mentioned.
This PR does not modify the logic in rewriteQueryByView().
The key change is line 428 in topPlanSplitToGroupAndFunction() that prevent the loss of groupby in child agg for rules like MaterializedViewProjectAggregateRule.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get your point, the change may change the original agg output order.
such as query is as following:
SELECT sum(o_totalprice), o_orderpriority
FROM orders
GROUP BY o_orderstatus, o_orderpriority;
after mv rewreite will be
SELECT o_orderpriority, sum(o_totalprice)
FROM orders
GROUP BY o_orderstatus, o_orderpriority;
Maybe we should add another check as following
WDYT?
// if top plan doesn't use group by but bottom agg has group by, should also check
if (!queryGroupByExpressions.isEmpty() && queryTopPlanGroupBySet.isEmpty()) {
// todo check as your code in line 242 to 262
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx, I will try
} | ||
finalOutputExpressions.add(new Alias(rollupedExpression)); | ||
} | ||
// add project to guarantee group by column ref is slot reference, | ||
// this is necessary because physical createHash will need slotReference later |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add the following logic here is better, WDYT?
// if top plan doesn't use group by but bottom agg has group by, should also check
if (!queryGroupByExpressions.isEmpty() && queryTopPlanGroupBySet.isEmpty()) {
// todo check as code in line 263 to 278
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried here but caused duplicated code about rewrittenGroupByExpression
, so I just add the compensate groupby to queryExpressions
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicated code can extract a common method and call the method in different place.
queryExpressions
means the expression query used readly, if you add compensate groupby to queryExpressions
, this may change the queryExpressions
original meaning.
@@ -284,6 +285,26 @@ protected Plan rewriteQueryByView(MatchMode matchMode, | |||
} | |||
// add project to guarantee group by column ref is slot reference, | |||
// this is necessary because physical createHash will need slotReference later | |||
for (Expression expression : queryChildrenGroupBySet) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can add if condition as following for performance
if (!queryGroupByExpressions.isEmpty() && queryTopPlanGroupBySet.isEmpty()) {
do your check logic
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
有没有可能queryTopPlanGroupBySet不为空,但是也没有包含全部的groupby字段 这种情况?
比如 select sum(o_totalprice), a from orders group by o_orderdate, a, b
这种情况的话if (!queryGroupByExpressions.isEmpty() && queryTopPlanGroupBySet.isEmpty())
是不是筛查不出来
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are right, this check condition should be if top group by doesn't equals bottom group by, we should add check
@@ -426,7 +447,8 @@ private Pair<Set<? extends Expression>, Set<? extends Expression>> topPlanSplitT | |||
topGroupByExpressions.add(expression); | |||
} | |||
}); | |||
return Pair.of(topGroupByExpressions, topFunctionExpressions); | |||
groupByExpressionSet.removeAll(topGroupByExpressions); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
topPlanSplitToGroupAndFunction
means get the group by and agg fuctions expression pair from top plan, this change the measing of the method
May be we can get the result of method topPlanSplitToGroupAndFunction
and then do your removeAll logic
} | ||
NamedExpression groupByExpression = rewrittenGroupByExpression instanceof NamedExpression | ||
? (NamedExpression) rewrittenGroupByExpression : new Alias(rewrittenGroupByExpression); | ||
finalGroupExpressions.add(groupByExpression); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if doesn't pass when check, we return null as the code line 302, this is useful
We should not add the groupByExpression to finalGroupExpressions
like code line 306,
finalGroupExpressions
means mv rewrite result out put
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
感觉得留在finalGroupExpressions里边吧。比如像下面这个select语句在改写的时候,logicalProject里边不包含groupby的字段,这个时候不把child里边的logicalAggregate的groupby o_orderdate留着,那改写成功以后这个groupby不就被丢掉了吗?我理解错了吗
select sum(o_totalprice) from orders group by o_orderdate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are right, this should add to the finalGroupExpressions.
your test case is nice, can you add the test case to regression-test/suites/nereids_rules_p0/mv/agg_with_roll_up/aggregate_with_roll_up.groovy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I'd like to try.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After add the test case to aggregate_with_roll_up.groovy
you can run the command as following to re generate aggregate_with_roll_up.out:
./run-regression-test.sh --run -d nereids_rules_p0/mv/agg_with_roll_up -s aggregate_with_roll_up -forceGenOut
see regresson-test detail in https://doris.apache.org/community/developer-guide/regression-testing
run buildall |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
PR approved by anyone and no changes requested. |
Next We should make sure that required pipeline tests all pass |
PR approved by at least one committer and no changes requested. |
…y mv (#34274) check groupby in agg-bottom-plan when rewrite and rollup agg query by mv
…terialized view (apache#38958) This is brought by apache#34274 if mv def is select o_orderdate from orders group by o_orderdate; query is as followiing, the result is wrong. select 1 from orders group by o_orderdate;
Proposed changes
Issue Number: close #34155
check groupby in agg-bottom-plan when rewrite and rollup agg query by mv
Further comments
If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...