-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UNION ALLs in MSQ #14981
UNION ALLs in MSQ #14981
Conversation
|
extensions-core/multi-stage-query/src/test/java/org/apache/druid/msq/exec/MSQFaultsTest.java
Outdated
Show resolved
Hide resolved
+ "SELECT * FROM foo\n") | ||
.setExpectedRowSignature(rowSignature) | ||
.setExpectedDataSource("foo1") | ||
.setExpectedMSQFault(QueryNotSupportedFault.instance()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm… I am confused about why this yields a QueryNotSupportedFault
. Shouldn't it fail to plan, and generate a planner error instead of an MSQ fault?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does plan using the UnionDataSource
, which then goes into MSQ.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment on how it is getting planned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this test can be removed. It should be planned using a UnionDataSource.
Can we also add the NativeQuery for assertion ?
extensions-core/multi-stage-query/src/test/java/org/apache/druid/msq/exec/MSQFaultsTest.java
Outdated
Show resolved
Hide resolved
DruidException.Persona.ADMIN, | ||
DruidException.Category.INVALID_INPUT, | ||
"general" | ||
).expectMessageIs("Query planning failed for unknown reason, our best guess is this " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is this error coming from? Looking at the code for the union rule, I would think it can't happen, because it's generated by isCompatible
, which isn't called when the ALLOW_TOP_LEVEL_UNION_ALL
feature is missing. The error should be something about UNION ALL
being unsupported for this engine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It tries to plan the query using the UnionDataSourceRule, goes into the isCompatible
then, and then rewrites the already set planning error.
This gets executed using UnionDataSourceRule
since the column names match, isCompatible
returns and not the top-level union all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll rename the two test cases that I added
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comments and renamed the test cases. Hope they clarify the confusion
sql/src/main/java/org/apache/druid/sql/calcite/rule/DruidUnionRule.java
Outdated
Show resolved
Hide resolved
sql/src/main/java/org/apache/druid/sql/calcite/rule/DruidFreeUnionDataSourceRule.java
Fixed
Show fixed
Hide fixed
sql/src/main/java/org/apache/druid/sql/calcite/rule/DruidFreeUnionDataSourceRule.java
Fixed
Show fixed
Hide fixed
sql/src/main/java/org/apache/druid/sql/calcite/rule/DruidFreeUnionDataSourceRule.java
Fixed
Show fixed
Hide fixed
sql/src/main/java/org/apache/druid/sql/calcite/rule/DruidFreeUnionDataSourceRule.java
Fixed
Show fixed
Hide fixed
sql/src/main/java/org/apache/druid/sql/calcite/rule/DruidFreeUnionDataSourceRule.java
Fixed
Show fixed
Hide fixed
sql/src/main/java/org/apache/druid/sql/calcite/rule/DruidFreeUnionDataSourceRule.java
Fixed
Show fixed
Hide fixed
sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidFreeUnionDataSourceRel.java
Fixed
Show fixed
Hide fixed
// No need to set the planning error here | ||
return false; | ||
} | ||
if (!firstColumnNames.equals(secondColumnNames)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Due to casting Calcite might change the name to something like EXPR$0. In such a case this does not allow the onMatch to trigger. What's our plan of action for handling such cases ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am debating whether to act stringently and only allow the same names and types (no implicit casts) when using union all with this rule as well. Is there a way to realize that there is an implicit cast done, if so, then we can build something around it to remap it to the original variable name, otherwise it would be a hassle for the user since he won't be able to reference the original column.
Sidenote: I am debating whether to pull the planning changes out of the PR into their separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the union is nested inside a subquery, it would be difficult for the upper callers to reference it once it has changed into the form EXPR$0, due to implicit casting. Therefore if we can identify that it has been cast implicitly, then we should remap it back to the original column name, else I am debating that we should include name and the type check,
sql/src/main/java/org/apache/druid/sql/calcite/rule/DruidFreeUnionDataSourceRule.java
Fixed
Show fixed
Hide fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Went through the initial code. Left some initial review.
@@ -36,13 +37,17 @@ | |||
import java.util.function.Function; | |||
import java.util.stream.Collectors; | |||
|
|||
/** | |||
* TODO(laksh): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets add a java doc here.
+ "SELECT * FROM foo\n") | ||
.setExpectedRowSignature(rowSignature) | ||
.setExpectedDataSource("foo1") | ||
.setExpectedMSQFault(QueryNotSupportedFault.instance()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this test can be removed. It should be planned using a UnionDataSource.
Can we also add the NativeQuery for assertion ?
*/ | ||
@Test | ||
@Override | ||
public void testUnionIsUnplannable() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this still required ?
} | ||
|
||
@Test | ||
public void testUnionOnSubqueries() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this test can be marked with ignore till we have the new calcite rule in place.
@@ -170,6 +171,18 @@ public static DataSourcePlan forDataSource( | |||
minStageNumber, | |||
broadcast | |||
); | |||
} else if (dataSource instanceof UnionDataSource) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update the MSQ known issues and the docs where ever we are calling union all as unsupported in MSQ.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes lgtm!!
|
||
/** | ||
* Doesn't pass through Druid however the planning error is different as it rewrites to a union datasource. | ||
* This test is disabled because MSQ wants to support union datasources, and it makes little sense to add highly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment seems outdated.
if (!(input instanceof TableDataSource)) { | ||
throw DruidException.defensive("should be table"); | ||
} | ||
return Iterables.getOnlyElement(input.getTableNames()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit:Lets avoid using Iterables.getOnlyElement(). Lets use CollectionUtils.getOnlyElement()
MSQ now supports UNION ALL with UnionDataSource
MSQ now supports UNION ALL with UnionDataSource
Description
(Note: The description might use union and union all interchangeably unless specified, both of which mean union all in SQL)
This PR updates the following:
UnionDataSource
can have data sources apart from theTableDataSource
. This will be used for MSQ only, since MSQ, in theory, can plan arbitrary unions. Also, it is required to plan unions in MSQ in the DataSourcePlan.UnionDataSource
in MSQ. This will provide MSQ feature parity with the native engine, as it will allow the1
MSQ needs to plan the individual data sources of the union and perform a replace operation so that each data source can be represented by the input specs that it requires. This warrants
UnionDataSource
to accept other data sources as its children. The current methods in theUnionDataSource
have been refactored to perform the original checks only when the data source is used in certain contexts, like the native stack.2
MSQ currently doesn't support UNION queries. However, in the query stack, there are two types of UNIONs:
QueryNotSupported
fault, which is the expected behaviorHowever 2) is executed sequentially by the SQL layer and the results are appended sequentially. For a simple query like
SQL would execute
SELECT * FROM foo
andSELECT * FROM foo2
and concat the results together.This works fine for working with engines producing results synchronously like
sql-native
where we return the results, however, for MSQ, which produces results asynchronously, the concatenation logic doesn't work as expected since don't wait for the query to finish, fetch the results and submit the second query.To make matters worse, the SQL layer submits the first query, gets the query ID back as the result, and then executes the second query (that fails). Therefore we only submit the partial query successfully and we might even get the incorrect results back.
This PR introduces the engine feature
ALLOW_TOP_LEVEL_UNION_ALL
that dictates whether the planner can plan the query using top-level union alls. MSQ disallows this, so the queries are forced to plan using the union data source, which will return query not supported exception.This flag will also be useful once we start supporting unions in MSQ, which we'd want to exclusively execute using
UnionDataSource
, and the flag would seamlessly tie in with the query paths we'd wanna take when planning unions then.With the change, the following query:
native tasks plan query as before (top-level union all)
MSQ tasks plan can't plan query with top-level union all, therefore use the
UnionDataSource
to plan the query, which then ultimately fails withQueryNotSupported
in MSQ3
Check out the changes in the
DataSourcePlan
which allows the union to be planned in the SQL stack4
TBD
Release note
MSQ can execute UNION ALL queries with UnionDataSource.
This PR has: