-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SQL: Morph QueryMakerFactory into SqlEngine. #12897
Conversation
Groundwork for introducing an indexing-service-task-based SQL engine under the umbrella of apache#12262. Also includes some other changes related to improving error behavior. Main changes: 1) Elevate the QueryMakerFactory interface (an extension point that allows customization of how queries are made) into SqlEngine. SQL engines can influence planner behavior through EngineFeatures, and can fully control the mechanics of query execution using QueryMakers. 2) Remove the server-wide QueryMakerFactory choice, in favor of the choice being made by the SQL entrypoint. The indexing-service-task-based SQL engine would be associated with its own entrypoint, like /druid/v2/sql/task. Other changes: 1) Adjust DruidPlanner to try either DRUID or BINDABLE convention based on analysis of the planned rels; never try both. In particular, we no longer try BINDABLE when DRUID fails. This simplifies the logic and improves error messages. 2) Adjust error message "Cannot build plan for query" to omit the SQL query text. Useful because the text can be quite long, which makes it easy to miss the text about the problem. 3) Add a feature to block context parameters used internally by the SQL planner from being supplied by end users. 4) Add a feature to enable adding row signature to the context for Scan queries. This is useful in building the task-based engine. 5) Add saffron.properties file that turns off sets and graphviz dumps in "cannot plan" errors. Significantly reduces log spam on the Broker.
fyi, @paul-rogers this patch has some conflicts with #12845 as it is working in the same general area of the code base. I will resolve them if yours gets merged first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comments
LGTM overall
+1 non binding
public RelDataType resultTypeForInsert(RelDataTypeFactory typeFactory, RelDataType validatedRowType) | ||
{ | ||
// Can't have views of INSERT or REPLACE statements. | ||
throw new UnsupportedOperationException(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I think we prefer UOE over java.lang.UnsupportedOperationException. Should we add a message also ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UOE's major value over UnsupportedOperationException is that it has a built-in String.format, which has a cleaner source-code look than using StringUtils.format
. This exception doesn't use string formatting, however, so it doesn't need to use UOE.
return super.matches(call); | ||
} else { | ||
plannerContext.setPlanningError("SQL query requires scanning external datasources that is not suported."); | ||
plannerContext.setPlanningError( | ||
"Cannot use '%s' with the current SQL engine.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add what is the current sql engine ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about it, but as usual in CS: naming is the hard part 🙂. I think we can add engine names in the future. I thought it would involve some discussion about what to call them in user-facing messages, so better to have a dedicated PR for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps at least include the class name and we can improve it later. Else, we'll wonder what the "current engine" is if we get this error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like including Java class names in user-facing error messages: it seems leaky. But I get why both reviewers here want to add something other than "current engine".
Maybe we can try having the debate about names now, and see if we can reach agreement quickly.
Suggestion: native
for the current sole production engine, and msq-task
for the task-based multi-stage-capable engine we plan to introduce in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a name()
method, and went with native
for the name.
@@ -43,17 +43,26 @@ public ExternalTableScanRule(final PlannerContext plannerContext) | |||
@Override | |||
public boolean matches(RelOptRuleCall call) | |||
{ | |||
if (plannerContext.getQueryMaker().feature(QueryFeature.CAN_READ_EXTERNAL_DATA)) { | |||
if (plannerContext.engineHasFeature(EngineFeature.CAN_READ_EXTERNAL_DATA)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks much more readable!!
return false; | ||
} | ||
} | ||
|
||
@Override | ||
public void onMatch(final RelOptRuleCall call) | ||
{ | ||
if (!plannerContext.engineHasFeature(EngineFeature.CAN_READ_EXTERNAL_DATA)) { | ||
// Not called because "matches" returns false. | ||
throw new UnsupportedOperationException(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: UOE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that isn't needed if you don't have to format a string.
/** | ||
* Can execute INSERT and REPLACE statements. | ||
*/ | ||
CAN_INSERT, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: should we change this to CAN_INSERT_OR_REPLACE ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is shorter? 🙂
I could go either way on this one to be honest. I don't have a super strong argument for one vs. the other. For that reason I'm inclined to leave it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we change the value as suggested by @cryptoe, perhaps make this more granular for each statement type: CAN_SELECT, CAN_INSERT, CAN_REPLACE, CAN_UPSERT, CAN_DELETE, ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The granular-by-statement-type is really what I had in mind here. I just also made two unspoken assumptions:
- All engines support SELECT so there is no feature for it
- All engines support INSERT if and only if they also support REPLACE, so there is no need for CAN_INSERT and CAN_REPLACE to be different
I suppose it's fine to get rid of these assumptions and go with CAN_SELECT, CAN_INSERT, CAN_REPLACE. I was trying to keep it neat and simple, but maybe I failed, since it generated a bunch of comments!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By popular demand I've elected to go with more explicit choices: CAN_SELECT, CAN_INSERT, and CAN_REPLACE. I've also made the error messages nicer. Now, errors about statements like this look like:
Cannot execute INSERT with SQL engine 'native'.
return planWithDruidConvention(rootQueryRel, parsed.getExplainNode(), parsed.getInsertOrReplace()); | ||
if (hasBindableTables) { | ||
// Consider BINDABLE convention if necessary. Used for metadata tables. | ||
if (!parsed.isSelect()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be parsed.getInsertOrReplace()!=null ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good question. Right now, it's equivalent, so your question is really about future proofing. I thought about it, but BINDABLE only supports SELECT and always will only support SELECT. So we do want to reject any non-SELECT statements. If we add other statements beyond INSERT / REPLACE, then the error message wouldn't make sense here, but I thought that was slightly better than the logic not making sense. Either way, I think it's a very minor point, since we'd need to update this code regardless when we add more statements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All this will change in a PR that will be added after #12845 is in. We'll refactor this code into "handlers" for each statement type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, no use worrying about it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -57,9 +58,10 @@ public SqlLifecycleFactory( | |||
this.defaultQueryConfig = defaultQueryConfig.get(); | |||
} | |||
|
|||
public SqlLifecycle factorize() | |||
public SqlLifecycle factorize(final SqlEngine engine) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this change, engine selection is up to the client. In the PR #12845 we'll need to pass this into the three factory method for the three kinds of statements. There is code that chooses the factory. It could be that factory selection affects other attributes.
So, I wonder, should there be a "factory selector" that can sniff the sql info and make a choice? Or, should there be a planner factory per engine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm imagining that the engine is tied to the endpoint; I mentioned in the PR description that I'm imagining the task-based SQL engine being at /druid/v2/sql/task
. So the idea is the two endpoints would use the same SqlLifecycleFactory with different SqlEngines. In #12845 the SqlLifecycleFactory becomes a SqlStatementFactory. I was thinking the same thing would work: each endpoint would use the same SqlStatementFactory, but with different SqlEngines.
The idea is that there is one universal SqlLifecycleFactory, which ensures that common stuff like validation, authorization, logging, works the same way across all SQL engines. The SqlEngine interface plugs into that, and gets full control over the actual execution, as well as the ability to do some light customization of planning.
Thoughts? Do you see a better way?
return super.matches(call); | ||
} else { | ||
plannerContext.setPlanningError("SQL query requires scanning external datasources that is not suported."); | ||
plannerContext.setPlanningError( | ||
"Cannot use '%s' with the current SQL engine.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps at least include the class name and we can improve it later. Else, we'll wonder what the "current engine" is if we get this error.
) | ||
); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion, move this into the engine as validateContext()
so that code here does not need to now what rules each engine might enforce.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good idea. I'll do it.
return planWithDruidConvention(rootQueryRel, parsed.getExplainNode(), parsed.getInsertOrReplace()); | ||
if (hasBindableTables) { | ||
// Consider BINDABLE convention if necessary. Used for metadata tables. | ||
if (!parsed.isSelect()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All this will change in a PR that will be added after #12845 is in. We'll refactor this code into "handlers" for each statement type.
} else { | ||
assert parsed.insertOrReplace != null; | ||
rowType = engine.resultTypeForInsert(typeFactory, rootQueryRel.validatedRowType); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is exactly the kind of silliness that the handler refactoring will address.
/** | ||
* Can execute INSERT and REPLACE statements. | ||
*/ | ||
CAN_INSERT, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we change the value as suggested by @cryptoe, perhaps make this more granular for each statement type: CAN_SELECT, CAN_INSERT, CAN_REPLACE, CAN_UPSERT, CAN_DELETE, ...
/** | ||
* Queries of type {@link org.apache.druid.query.timeboundary.TimeBoundaryQuery} are usable. | ||
*/ | ||
CAN_RUN_TIME_BOUNDARY, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Since these items are logically part of a feature set, maybe just use the feature name itself. TOPN_QUERY, SELECT, TIME_BOUNDARY_QUERY, etc.
@paul-rogers I've resolved the conflicts with #12845. The resolution involved creating a SqlStatementFactoryFactory, as an alternative to passing around the SqlEngines everywhere. Let me know what you think 😳 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changes LGTM. I have one comment about the changes in what error message we throw.
return false; | ||
} | ||
} | ||
|
||
@Override | ||
public void onMatch(final RelOptRuleCall call) | ||
{ | ||
if (!plannerContext.engineHasFeature(EngineFeature.CAN_READ_EXTERNAL_DATA)) { | ||
// Not called because "matches" returns false. | ||
throw new UnsupportedOperationException(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that isn't needed if you don't have to format a string.
} else { | ||
// Re-phrase since planning errors are more like hints | ||
errorMessage = "Possible error: " + errorMessage; | ||
} | ||
// Finally, add the query itself to error message that user will get. | ||
return StringUtils.format("Cannot build plan for query: %s. %s", plannerContext.getSql(), errorMessage); | ||
return StringUtils.format("Cannot build plan for query. %s", errorMessage); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
having the error message overshadowed by the large query text is indeed annoying. I thought about this before but didn't make any change. One major reason is that many times users are not issuing queries themselves. They are using a BI tool that is building and issuing queries behind the scenes. In such a case, knowing the actual query that was run can be very useful for troubleshooting.
what do you think? An alternative is to change the order a bit and have errorMessage
appear before the actual query text. Or we log the whole thing (which we might be doing already)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One major reason is that many times users are not issuing queries themselves. They are using a BI tool that is building and issuing queries behind the scenes. In such a case, knowing the actual query that was run can be very useful for troubleshooting.
That's a good point. Including the query text, but swapping position so it's after the error message, makes sense to me, so the end user gets a copy of the query if it was made through some app. (I think we also do log it, but end users typically don't have access to server logs.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Follow up here: #12903
there is going to be one more side effect of this PR btw. Right now, if for whatever reason, we can't plan a query with inline data using druid convention, they would still work. Now we only plan them using the druid convention. e.g. does the query This might be fine since that's not a good use case for druid. I know about this particular query because @vogievetsky was trying out all kinds of such queries on druid sometime back. He was getting a bad exception for this query and I fixed it by tweaking the error handling a bit. |
import javax.servlet.http.HttpServletRequest; | ||
|
||
/** | ||
* Factory factories: when design patterns go too far. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
heh
That's true, |
Two changes: 1) Restore the text of the SQL query. It was removed in apache#12897, but then it was later pointed out that the text is helpful for end users querying Druid through tools that do not show the SQL queries that they are making. 2) Adjust wording slightly, from "Cannot build plan for query" to "Query not supported". This will be clearer to most users. Generally the reason we get these errors is due to unsupported SQL constructs.
@gianm, looks great. Thanks for merging into the "statement" PR, amazing you could do the merge so quickly. Your "factory factory" is a good solution. In fact, it solves another problem: unit tests tend to want to change the planner config for different tests. The current code jumps through hoops to create all the static objects when the planner config changes. Your "factory factory" creates a handy place to allow us to change the planner config per test. |
Two changes: 1) Restore the text of the SQL query. It was removed in #12897, but then it was later pointed out that the text is helpful for end users querying Druid through tools that do not show the SQL queries that they are making. 2) Adjust wording slightly, from "Cannot build plan for query" to "Query not supported". This will be clearer to most users. Generally the reason we get these errors is due to unsupported SQL constructs.
Two changes: 1) Restore the text of the SQL query. It was removed in #12897, but then it was later pointed out that the text is helpful for end users querying Druid through tools that do not show the SQL queries that they are making. 2) Adjust wording slightly, from "Cannot build plan for query" to "Query not supported". This will be clearer to most users. Generally the reason we get these errors is due to unsupported SQL constructs.
Two changes: 1) Restore the text of the SQL query. It was removed in apache#12897, but then it was later pointed out that the text is helpful for end users querying Druid through tools that do not show the SQL queries that they are making. 2) Adjust wording slightly, from "Cannot build plan for query" to "Query not supported". This will be clearer to most users. Generally the reason we get these errors is due to unsupported SQL constructs.
Groundwork for introducing an indexing-service-task-based SQL engine
under the umbrella of #12262. Also includes some other changes related
to improving error behavior.
Main changes:
Elevate the QueryMakerFactory interface (an extension point that allows
customization of how queries are made) into SqlEngine. SQL engines
can influence planner behavior through EngineFeatures, and can fully
control the mechanics of query execution using QueryMakers.
Remove the server-wide QueryMakerFactory choice, in favor of the choice
being made by the SQL entrypoint. The indexing-service-task-based
SQL engine would be associated with its own entrypoint, like
/druid/v2/sql/task.
Other changes:
Adjust DruidPlanner to try either DRUID or BINDABLE convention based
on analysis of the planned rels; never try both. In particular, we
no longer try BINDABLE when DRUID fails. This simplifies the logic
and improves error messages.
Adjust error message "Cannot build plan for query" to omit the SQL
query text. Useful because the text can be quite long, which makes it
easy to miss the text about the problem.
Add a feature to block context parameters used internally by the SQL
planner from being supplied by end users.
Add a feature to enable adding row signature to the context for
Scan queries. This is useful in building the task-based engine.
Add saffron.properties file that turns off sets and graphviz dumps
in "cannot plan" errors. Significantly reduces log spam on the Broker.