New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support non-correlated subquery in having clause #3150
Conversation
newTableRefList.add(inlineViewRef); | ||
FromClause newFromClause = new FromClause(newTableRefList); | ||
SelectStmt result = new SelectStmt(newSelectList, newFromClause, newWherePredicate, null, null, null, | ||
LimitElement.NO_LIMIT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if the origin query is
select cs_item_sk, sum(cs_sales_price) from catalog_sales a group by cs_item_sk
having sum(cs_sales_price) >
(select min(cs_sales_price) from catalog_sales b where a.cs_item_sk = b.cs_item_sk) limit 1;
on limit after rewrite
what about order by ?
This commit support the non-correlated subquery in having clause. For example: select k1, sum(k2) from table group by k1 having sum(k2) > (select avg(k1) from table) Also the non-scalar subquery is supportted in Doris. For example: select k1, sum(k2) from table group by k1 having sum(k2) > (select avg(k1) from table group by k2) Doris will check the result row numbers of subquery in executing. If more then one row returned by subquery, the query will thrown exception. The implement method: The entire outer query is regarded as inline view of new query. The subquery in having clause is changed to the where predicate in this new query. After this commit, tpc-ds 23,24,44 are supported. This commit also support the subquery in ArithmeticExpr. For example: select k1 from table where k1=0.9*(select k1 from t);
7e24bec
to
1581b3d
Compare
fe/src/main/java/org/apache/doris/analysis/BinaryPredicate.java
Outdated
Show resolved
Hide resolved
throws AnalysisException { | ||
if (parsedStmt instanceof QueryStmt) { | ||
QueryStmt analyzedStmt = (QueryStmt) parsedStmt; | ||
Preconditions.checkNotNull(analyzedStmt.analyzer); | ||
rewriteQueryStatement(analyzedStmt, analyzer); | ||
parsedStmt = rewriteQueryStatement(analyzedStmt, analyzer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to use a new local variable and return it. Not to return a parameter
variable.
And I think you can just return here.
return rewriteQueryStatement(analyzedStmt, analyzer);
throws AnalysisException { | ||
Preconditions.checkNotNull(stmt); | ||
if (stmt instanceof SelectStmt) { | ||
rewriteSelectStatement((SelectStmt) stmt, analyzer); | ||
stmt = rewriteSelectStatement((SelectStmt) stmt, analyzer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stmt = rewriteSelectStatement((SelectStmt) stmt, analyzer); | |
return rewriteSelectStatement((SelectStmt) stmt, analyzer); |
stmt.sqlString_ = null; | ||
if (LOG.isDebugEnabled()) LOG.debug("rewritten stmt: " + stmt.toSql()); | ||
return stmt; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not returning the parameter
, use local variable instead.
* Inline view of subquery: | ||
* from (select b.cs_item_sk, min(cs_sales_price) from catalog_sales b group by cs_item_sk) c | ||
* Rewritten correlated predicate: | ||
* where c.cs_item_sk = a.cs_item_sk and a.sum_cs_sales_price > c.min(cs_sales_price) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about adding a final stmt? Is that
select cs_item_sk, a.sum_cs_sales_price from
(select cs_item_sk, sum(cs_sales_price) sum_cs_sales_price from catalog_sales group by cs_item_sk) a
join
(select b.cs_item_sk, min(cs_sales_price) from catalog_sales b group by cs_item_sk) c
where c.cs_item_sk = a.cs_item_sk and a.sum_cs_sales_price > c.min(cs_sales_price);
|
||
// rewrite where subquery | ||
result = rewriteSelectStatement(result, analyzer); | ||
LOG.info("The final stmt is " + result.toSql()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LOG.info("The final stmt is " + result.toSql()); | |
LOG.debug("The final stmt is " + result.toSql()); |
} | ||
|
||
@Test | ||
public void testRewriteHavingClauseSubqueries() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add this case to the queryPlanTest to reduce the running time of FE ut.
And how about adding more tests to cover this case? such as query with limit ?
And could you add the final rewritten stmt in comment:
select empid, x from
(select empid, sum(salary) x from tbl group by empid) v
where v.x > (select avg(salary) from tbl);
Co-Authored-By: Mingyu Chen <morningman.cmy@gmail.com>
Co-Authored-By: Mingyu Chen <morningman.cmy@gmail.com>
…cubator-doris into subquery_in_having
public boolean isCorrelatedPredicate(List<TupleId> tupleIdList) { | ||
if ((this instanceof BinaryPredicate || this instanceof SlotRef) && !this.isBoundByTupleIds(tupleIdList)) { | ||
return true; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why you add “this instanceof BinaryPredicate ”?
only slotref has override isBoundByTupleIds function.
@@ -95,13 +95,21 @@ | |||
// if we have grouping extensions like cube or rollup or grouping sets | |||
private GroupingInfo groupingInfo; | |||
|
|||
// having clause which has been analyzed | |||
// For example: select k1, sum(k2) a from t group by k1 having a>1; | |||
// this parameter: having sum(t.k2) > 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// this parameter: having sum(t.k2) > 1 | |
// this parameter: sum(t.k2) > 1 |
* having predicate: table.k1 > 1 | ||
*/ | ||
/* | ||
* TODO(ml): support substitute outer column in subquery |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* TODO(ml): support substitute outer column in subquery | |
* TODO(ml): support substitute outer column in correlated subquery |
/* | ||
* All of columns of result and having clause are replaced by new slot ref which is bound by top tuple of agg info. | ||
* For example: | ||
* ResultExprs: SlotRef(k1), FunctionCall(sum(SlotRef(k2))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you add FunctionCall to decorate sum(SlotRef(k2))?And There is no function at below “After rewritten” 。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Em...The "after rewritten" is mean that after Expr.substitueList
as following. This commit is used to analyze what happen with this line resultExprs = Expr.substituteList(resultExprs, combinedSmap, analyzer, false);
if (whereClause == null) { | ||
return false; | ||
} | ||
return whereClause.isCorrelatedPredicate(getTableRefIds()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you only analyze where clause? what other clasue?for example having clasue,selectList。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function has been removed.
throws AnalysisException { | ||
Preconditions.checkNotNull(stmt); | ||
if (stmt instanceof SelectStmt) { | ||
rewriteSelectStatement((SelectStmt) stmt, analyzer); | ||
return rewriteSelectStatement((SelectStmt) stmt, analyzer); | ||
} else if (stmt instanceof SetOperationStmt) { | ||
rewriteUnionStatement((SetOperationStmt) stmt, analyzer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what‘s the difference between 70 line and 74 line ? why 70 line add return, and 74 line not return?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rewriteUnionStatement is a function without return parameters.
return result; | ||
} | ||
|
||
/** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you’d better add “corelated having subqueries are not supported” here。
* For example: | ||
* Query: select cs_item_sk, sum(cs_sales_price) from catalog_sales a group by cs_item_sk having ...; | ||
* Inline view: | ||
* from (select cs_item_sk $ColumnA, sum(cs_sales_price) $ColumnB from catalog_sales a group by cs_item_sk) $TableA |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inline view does not need to write the ‘from’ keyword
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am afraid of missing understanding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This commit support the non-correlated subquery in having clause.
For example:
select k1, sum(k2) from table group by k1 having sum(k2) > (select avg(k1) from table)
Also the non-scalar subquery is supported in Doris.
For example:
select k1, sum(k2) from table group by k1 having sum(k2) > (select avg(k1) from table group by k2)
Doris will check the result row numbers of subquery in executing.
If more then one row returned by subquery, the query will throw exception.
The implement method:
The entire outer query is regarded as inline view of new query.
The subquery in having clause is changed to the where predicate in this new query.
After this commit, tpc-ds 23,24,44 are supported.
This commit also support the subquery in ArithmeticExpr.
For example:
select k1 from table where k1=0.9*(select k1 from t);