Disabling rollup on post agg operators for MSQ sql based ingestion. #13179
This PR is related to #13180.
List<String> outputColumns = columnMappings.getOutputColumnsForQueryColumn(aggregatorFactory.getName());
if (outputColumns == null || outputColumns.size() != 1) {
  throw new ISE(
      "Unable to run the statement in roll up mode. Please try disabling the rollup mode. Check SQL-based ingestion docs for instructions.");
Suggested change:
- "Unable to run the statement in roll up mode. Please try disabling the rollup mode. Check SQL-based ingestion docs for instructions.");
+ "Cannot handle <aggregator-name> in the rollup mode. You can disable the rollup mode or use different aggregators. Please refer to SQL-based ingestion docs for more details.");
We should also log the outputColumns and any other info for troubleshooting.
@cryptoe thank you for looking into this issue. As a user, I'm somewhat confused - it appears as if ingesting records using multi-stage SQL does not support rollups on post-aggregators such as a sketch intersection. What are the consequences of this? For example, if my SQL statement is as follows:
Is the rollup automatic, or does it replicate the rollup configuration of the source table? I'm trying to understand where the rollup is implied in the statement above.
If tomorrow you decide to partition by month using a reindex, Druid needs to understand how to merge the THETA_SKETCH_INTERSECT results across the various days. There are always limits to this.
See https://druid.apache.org/docs/latest/multi-stage-query/concepts.html#rollup. The web console automatically figures out whether the statement is a rollup statement or not.
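To make the merge problem concrete, here is a minimal sketch that uses exact Java sets as a stand-in for theta sketches (an assumption for illustration only; real sketches are approximate, and the class and method names below are hypothetical). It shows that per-day intersection results, once stored, cannot be merged into the correct month-level intersection, whereas the underlying per-day sets could have been.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration: exact sets stand in for theta sketches.
public class RollupMergeSketch {
    static Set<Integer> union(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new HashSet<>(a);
        result.addAll(b);
        return result;
    }

    static Set<Integer> intersect(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new HashSet<>(a);
        result.retainAll(b);
        return result;
    }

    // What a later reindex would have to work with if only the per-day
    // intersection results (the post-aggregator outputs) had been stored.
    static Set<Integer> mergedDailyIntersections() {
        Set<Integer> day1 = intersect(Set.of(1, 2), Set.of(2, 3)); // {2}
        Set<Integer> day2 = intersect(Set.of(3), Set.of(1));       // empty
        return union(day1, day2);
    }

    // The answer a month-level query actually wants: merge each input
    // across days first, then intersect.
    static Set<Integer> monthlyIntersection() {
        Set<Integer> monthA = union(Set.of(1, 2), Set.of(3)); // {1, 2, 3}
        Set<Integer> monthB = union(Set.of(2, 3), Set.of(1)); // {1, 2, 3}
        return intersect(monthA, monthB);
    }

    public static void main(String[] args) {
        System.out.println("merged daily intersections: " + mergedDailyIntersections());
        System.out.println("true monthly intersection:  " + monthlyIntersection());
    }
}
```

The two results differ (one element versus three), which is why storing the mergeable inputs, rather than the post-aggregated result, is what keeps a rollup column re-aggregatable.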
# Conflicts: # extensions-core/multi-stage-query/src/test/java/org/apache/druid/msq/exec/MSQInsertTest.java
List<String> outputColumns = columnMappings.getOutputColumnsForQueryColumn(aggregatorFactory.getName());
if (outputColumns == null || outputColumns.size() != 1) {
  throw new ISE(
      "Cannot use aggregator [%s] with input fields [%s] in the rollup mode. It might be using a post aggregator. Please check the native plan to figure out more information about the aggregator. You can disable the rollup mode or use a different aggregator. Please refer to SQL-based ingestion docs for more details.",
This error should really be in the SQL layer, for a couple reasons:
- It's a validation-style error that doesn't require actually executing the query. So if we can detect this problem in the SQL layer, users will get their error faster, and won't have tasks launched needlessly.
- We can refer to SQL concepts and SQL names rather than native concepts like "aggregator" and native names like `a0`.
There's also a couple of issues with actionability that should be fixed when moving this error to the SQL layer:
- It suggests "disable the rollup mode" as a way to fix the problem, but "rollup mode" is not a thing in MSQ. As we mention in `multi-stage-query/concepts.md`, rollup is something that MSQ does automatically when certain conditions are met. The user would need to switch on aggregation finalization, which may not be what they want.
- It suggests "refer to SQL-based ingestion docs", but doesn't provide a link or a hint on what to look for. Doc references should be URLs.
- It suggests "check the native plan", but doesn't say how to do that. (Although I think when we move this to SQL layer, we won't need this part.)
There's three places we can check things in the SQL layer:
1. In the SQL validation phase (after parsing, prior to optimizing), i.e. in `validate` in `DruidPlanner` or `IngestHandler`.
2. Immediately prior to translation from logical plan to Druid plan, i.e. in `MSQTaskSqlEngine#buildQueryMakerForInsert`.
3. Immediately prior to execution, i.e. in `MSQTaskQueryMaker` (the latest possible time to validate something before a controller task is launched).
Ideally we validate as much as possible as early as possible. Also, ideally, we validate it in a place where we can know the SQL name of the agg function (e.g. from an instance of `SqlAggFunction`). That'll let us include the SQL name in the error message.
I'm not totally sure which place is best for this. My guess is (1) or (2), since by (3) we've gone pretty much fully native. Perhaps @paul-rogers would have some advice as he has some experience with the validation stack.
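As a rough sketch of the shape such an early check could take, the following standalone Java mirrors the "exactly one output column per aggregator" condition from the diff above, but reports the SQL function name and a docs URL. Everything here is a hypothetical stand-in: `SqlAggregation`, `validate`, and the plain `Map` used in place of the column mappings are assumptions for illustration and are not Druid classes, and `DS_THETA`, `a0`, and `a1` are just example values.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these types are Druid's.
public class EarlyRollupValidationSketch {
    static final String ROLLUP_DOCS_URL =
        "https://druid.apache.org/docs/latest/multi-stage-query/concepts.html#rollup";

    /** Pairs the SQL-facing function name with the native aggregator name. */
    static final class SqlAggregation {
        final String sqlFunctionName;
        final String nativeName;

        SqlAggregation(String sqlFunctionName, String nativeName) {
            this.sqlFunctionName = sqlFunctionName;
            this.nativeName = nativeName;
        }
    }

    /** Fails fast, before any task would be launched, using SQL names in the message. */
    static void validate(
        List<SqlAggregation> aggregations,
        Map<String, List<String>> outputColumnsByQueryColumn
    ) {
        for (SqlAggregation agg : aggregations) {
            List<String> outputColumns = outputColumnsByQueryColumn.get(agg.nativeName);
            if (outputColumns == null || outputColumns.size() != 1) {
                throw new IllegalArgumentException(String.format(
                    "Cannot ingest the result of SQL function [%s] with rollup: expected exactly "
                    + "one output column for it but found %s. See %s",
                    agg.sqlFunctionName, outputColumns, ROLLUP_DOCS_URL));
            }
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> mappings = new HashMap<>();
        mappings.put("a0", Collections.singletonList("cnt"));
        // "a1" is only consumed by a post-aggregator, so it has no output column.

        validate(Arrays.asList(new SqlAggregation("COUNT", "a0")), mappings); // passes

        try {
            validate(Arrays.asList(new SqlAggregation("DS_THETA", "a1")), mappings);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Where the real check would live (options 1 to 3 above) determines which of these inputs are actually available; the sketch only assumes that both the SQL name and the column mapping can be seen at the same point.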
This pull request has been marked as stale due to 60 days of inactivity.
This pull request/issue has been closed due to lack of activity. If you think that
While running the query:
We get an unknown error:
In SQL-based ingestion with rollup enabled, we do not know how to ingest the output of post-aggregators, so I have disabled the code path that attempts it.
Instead, it nudges the user to disable rollup by following the instructions here: https://druid.apache.org/docs/24.0.0/multi-stage-query/concepts.html#rollup
Fixed the bug ...
Key changed/added classes in this PR
ControllerImpl
This PR has: