Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support non-correlated subquery in having clause #3150

Merged
merged 15 commits into from Mar 25, 2020

Conversation

EmmyMiao87
Copy link
Contributor

@EmmyMiao87 EmmyMiao87 commented Mar 19, 2020

This commit support the non-correlated subquery in having clause.
For example:
select k1, sum(k2) from table group by k1 having sum(k2) > (select avg(k1) from table)

Also the non-scalar subquery is supported in Doris.
For example:
select k1, sum(k2) from table group by k1 having sum(k2) > (select avg(k1) from table group by k2)
Doris will check the result row numbers of subquery in executing.
If more then one row returned by subquery, the query will throw exception.

The implement method:
The entire outer query is regarded as inline view of new query.
The subquery in having clause is changed to the where predicate in this new query.

After this commit, tpc-ds 23,24,44 are supported.

This commit also support the subquery in ArithmeticExpr.
For example:
select k1 from table where k1=0.9*(select k1 from t);

@EmmyMiao87 EmmyMiao87 added the area/sql/compatibility Issues or PRs related to the SQL compatibililty label Mar 19, 2020
@EmmyMiao87 EmmyMiao87 added this to In progress in TPC-DS via automation Mar 19, 2020
@EmmyMiao87
Copy link
Contributor Author

#2848

@EmmyMiao87 EmmyMiao87 linked an issue Mar 19, 2020 that may be closed by this pull request
newTableRefList.add(inlineViewRef);
FromClause newFromClause = new FromClause(newTableRefList);
SelectStmt result = new SelectStmt(newSelectList, newFromClause, newWherePredicate, null, null, null,
LimitElement.NO_LIMIT);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if the origin query is

select cs_item_sk, sum(cs_sales_price) from catalog_sales a group by cs_item_sk
     having sum(cs_sales_price) >
     (select min(cs_sales_price) from catalog_sales b where a.cs_item_sk = b.cs_item_sk) limit 1;

on limit after rewrite
what about order by ?

This commit support the non-correlated subquery in having clause.
For example:
select k1, sum(k2) from table group by k1 having sum(k2) > (select avg(k1) from table)

Also the non-scalar subquery is supportted in Doris.
For example:
select k1, sum(k2) from table group by k1 having sum(k2) > (select avg(k1) from table group by k2)
Doris will check the result row numbers of subquery in executing.
If more then one row returned by subquery, the query will thrown exception.

The implement method:
The entire outer query is regarded as inline view of new query.
The subquery in having clause is changed to the where predicate in this new query.

After this commit, tpc-ds 23,24,44 are supported.

This commit also support the subquery in ArithmeticExpr.
For example:
select k1  from table where k1=0.9*(select k1 from t);
throws AnalysisException {
if (parsedStmt instanceof QueryStmt) {
QueryStmt analyzedStmt = (QueryStmt) parsedStmt;
Preconditions.checkNotNull(analyzedStmt.analyzer);
rewriteQueryStatement(analyzedStmt, analyzer);
parsedStmt = rewriteQueryStatement(analyzedStmt, analyzer);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to use a new local variable and return it. Not to return a parameter variable.
And I think you can just return here.

return rewriteQueryStatement(analyzedStmt, analyzer);

throws AnalysisException {
Preconditions.checkNotNull(stmt);
if (stmt instanceof SelectStmt) {
rewriteSelectStatement((SelectStmt) stmt, analyzer);
stmt = rewriteSelectStatement((SelectStmt) stmt, analyzer);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
stmt = rewriteSelectStatement((SelectStmt) stmt, analyzer);
return rewriteSelectStatement((SelectStmt) stmt, analyzer);

stmt.sqlString_ = null;
if (LOG.isDebugEnabled()) LOG.debug("rewritten stmt: " + stmt.toSql());
return stmt;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not returning the parameter, use local variable instead.

* Inline view of subquery:
* from (select b.cs_item_sk, min(cs_sales_price) from catalog_sales b group by cs_item_sk) c
* Rewritten correlated predicate:
* where c.cs_item_sk = a.cs_item_sk and a.sum_cs_sales_price > c.min(cs_sales_price)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about adding a final stmt? Is that

select cs_item_sk, a.sum_cs_sales_price from
(select cs_item_sk, sum(cs_sales_price) sum_cs_sales_price from catalog_sales group by cs_item_sk) a
join
(select b.cs_item_sk, min(cs_sales_price) from catalog_sales b group by cs_item_sk) c
where c.cs_item_sk = a.cs_item_sk and a.sum_cs_sales_price > c.min(cs_sales_price);


// rewrite where subquery
result = rewriteSelectStatement(result, analyzer);
LOG.info("The final stmt is " + result.toSql());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
LOG.info("The final stmt is " + result.toSql());
LOG.debug("The final stmt is " + result.toSql());

}

@Test
public void testRewriteHavingClauseSubqueries() throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add this case to the queryPlanTest to reduce the running time of FE ut.
And how about adding more tests to cover this case? such as query with limit ?

And could you add the final rewritten stmt in comment:

select empid, x from
(select empid, sum(salary) x from tbl group by empid) v
where v.x > (select avg(salary) from tbl);

@EmmyMiao87 EmmyMiao87 closed this Mar 24, 2020
TPC-DS automation moved this from In progress to Done Mar 24, 2020
@EmmyMiao87 EmmyMiao87 reopened this Mar 24, 2020
TPC-DS automation moved this from Done to In progress Mar 24, 2020
public boolean isCorrelatedPredicate(List<TupleId> tupleIdList) {
if ((this instanceof BinaryPredicate || this instanceof SlotRef) && !this.isBoundByTupleIds(tupleIdList)) {
return true;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why you add “this instanceof BinaryPredicate ”?
only slotref has override isBoundByTupleIds function.

@@ -95,13 +95,21 @@
// if we have grouping extensions like cube or rollup or grouping sets
private GroupingInfo groupingInfo;

// having clause which has been analyzed
// For example: select k1, sum(k2) a from t group by k1 having a>1;
// this parameter: having sum(t.k2) > 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// this parameter: having sum(t.k2) > 1
// this parameter: sum(t.k2) > 1

* having predicate: table.k1 > 1
*/
/*
* TODO(ml): support substitute outer column in subquery
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* TODO(ml): support substitute outer column in subquery
* TODO(ml): support substitute outer column in correlated subquery

/*
* All of columns of result and having clause are replaced by new slot ref which is bound by top tuple of agg info.
* For example:
* ResultExprs: SlotRef(k1), FunctionCall(sum(SlotRef(k2)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you add FunctionCall to decorate sum(SlotRef(k2))?And There is no function at below “After rewritten” 。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Em...The "after rewritten" is mean that after Expr.substitueList as following. This commit is used to analyze what happen with this line resultExprs = Expr.substituteList(resultExprs, combinedSmap, analyzer, false);

if (whereClause == null) {
return false;
}
return whereClause.isCorrelatedPredicate(getTableRefIds());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you only analyze where clause? what other clasue?for example having clasue,selectList。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function has been removed.

throws AnalysisException {
Preconditions.checkNotNull(stmt);
if (stmt instanceof SelectStmt) {
rewriteSelectStatement((SelectStmt) stmt, analyzer);
return rewriteSelectStatement((SelectStmt) stmt, analyzer);
} else if (stmt instanceof SetOperationStmt) {
rewriteUnionStatement((SetOperationStmt) stmt, analyzer);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what‘s the difference between 70 line and 74 line ? why 70 line add return, and 74 line not return?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rewriteUnionStatement is a function without return parameters.

return result;
}

/**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you’d better add “corelated having subqueries are not supported” here。

* For example:
* Query: select cs_item_sk, sum(cs_sales_price) from catalog_sales a group by cs_item_sk having ...;
* Inline view:
* from (select cs_item_sk $ColumnA, sum(cs_sales_price) $ColumnB from catalog_sales a group by cs_item_sk) $TableA
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inline view does not need to write the ‘from’ keyword

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am afraid of missing understanding.

@EmmyMiao87 EmmyMiao87 closed this Mar 25, 2020
TPC-DS automation moved this from In progress to Done Mar 25, 2020
@EmmyMiao87 EmmyMiao87 reopened this Mar 25, 2020
TPC-DS automation moved this from Done to In progress Mar 25, 2020
Copy link
Contributor

@morningman morningman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@morningman morningman merged commit b2518fc into apache:master Mar 25, 2020
TPC-DS automation moved this from In progress to Done Mar 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/sql/compatibility Issues or PRs related to the SQL compatibililty
Projects
No open projects
TPC-DS
  
Done
Development

Successfully merging this pull request may close these issues.

Support subquery in having clause
4 participants