Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL: Morph QueryMakerFactory into SqlEngine. #12897

Merged
merged 8 commits into from
Aug 15, 2022

Conversation

gianm
Copy link
Contributor

@gianm gianm commented Aug 12, 2022

Groundwork for introducing an indexing-service-task-based SQL engine
under the umbrella of #12262. Also includes some other changes related
to improving error behavior.

Main changes:

  1. Elevate the QueryMakerFactory interface (an extension point that allows
    customization of how queries are made) into SqlEngine. SQL engines
    can influence planner behavior through EngineFeatures, and can fully
    control the mechanics of query execution using QueryMakers.

  2. Remove the server-wide QueryMakerFactory choice, in favor of the choice
    being made by the SQL entrypoint. The indexing-service-task-based
    SQL engine would be associated with its own entrypoint, like
    /druid/v2/sql/task.

Other changes:

  1. Adjust DruidPlanner to try either DRUID or BINDABLE convention based
    on analysis of the planned rels; never try both. In particular, we
    no longer try BINDABLE when DRUID fails. This simplifies the logic
    and improves error messages.

  2. Adjust error message "Cannot build plan for query" to omit the SQL
    query text. Useful because the text can be quite long, which makes it
    easy to miss the text about the problem.

  3. Add a feature to block context parameters used internally by the SQL
    planner from being supplied by end users.

  4. Add a feature to enable adding row signature to the context for
    Scan queries. This is useful in building the task-based engine.

  5. Add saffron.properties file that turns off sets and graphviz dumps
    in "cannot plan" errors. Significantly reduces log spam on the Broker.

Groundwork for introducing an indexing-service-task-based SQL engine
under the umbrella of apache#12262. Also includes some other changes related
to improving error behavior.

Main changes:

1) Elevate the QueryMakerFactory interface (an extension point that allows
   customization of how queries are made) into SqlEngine. SQL engines
   can influence planner behavior through EngineFeatures, and can fully
   control the mechanics of query execution using QueryMakers.

2) Remove the server-wide QueryMakerFactory choice, in favor of the choice
   being made by the SQL entrypoint. The indexing-service-task-based
   SQL engine would be associated with its own entrypoint, like
   /druid/v2/sql/task.

Other changes:

1) Adjust DruidPlanner to try either DRUID or BINDABLE convention based
   on analysis of the planned rels; never try both. In particular, we
   no longer try BINDABLE when DRUID fails. This simplifies the logic
   and improves error messages.

2) Adjust error message "Cannot build plan for query" to omit the SQL
   query text. Useful because the text can be quite long, which makes it
   easy to miss the text about the problem.

3) Add a feature to block context parameters used internally by the SQL
   planner from being supplied by end users.

4) Add a feature to enable adding row signature to the context for
   Scan queries. This is useful in building the task-based engine.

5) Add saffron.properties file that turns off sets and graphviz dumps
   in "cannot plan" errors. Significantly reduces log spam on the Broker.
@gianm
Copy link
Contributor Author

gianm commented Aug 12, 2022

fyi, @paul-rogers this patch has some conflicts with #12845 as it is working in the same general area of the code base. I will resolve them if yours gets merged first.

@abhishekagarwal87 abhishekagarwal87 added the Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 label Aug 12, 2022
Copy link
Contributor

@cryptoe cryptoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments
LGTM overall
+1 non binding

public RelDataType resultTypeForInsert(RelDataTypeFactory typeFactory, RelDataType validatedRowType)
{
// Can't have views of INSERT or REPLACE statements.
throw new UnsupportedOperationException();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think we prefer UOE over java.lang.UnsupportedOperationException. Should we add a message also ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UOE's major value over UnsupportedOperationException is that it has a built-in String.format, which has a cleaner source-code look than using StringUtils.format. This exception doesn't use string formatting, however, so it doesn't need to use UOE.

return super.matches(call);
} else {
plannerContext.setPlanningError("SQL query requires scanning external datasources that is not suported.");
plannerContext.setPlanningError(
"Cannot use '%s' with the current SQL engine.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add what is the current sql engine ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about it, but as usual in CS: naming is the hard part 🙂. I think we can add engine names in the future. I thought it would involve some discussion about what to call them in user-facing messages, so better to have a dedicated PR for that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps at least include the class name and we can improve it later. Else, we'll wonder what the "current engine" is if we get this error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like including Java class names in user-facing error messages: it seems leaky. But I get why both reviewers here want to add something other than "current engine".

Maybe we can try having the debate about names now, and see if we can reach agreement quickly.

Suggestion: native for the current sole production engine, and msq-task for the task-based multi-stage-capable engine we plan to introduce in the future?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a name() method, and went with native for the name.

@@ -43,17 +43,26 @@ public ExternalTableScanRule(final PlannerContext plannerContext)
@Override
public boolean matches(RelOptRuleCall call)
{
if (plannerContext.getQueryMaker().feature(QueryFeature.CAN_READ_EXTERNAL_DATA)) {
if (plannerContext.engineHasFeature(EngineFeature.CAN_READ_EXTERNAL_DATA)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks much more readable!!

return false;
}
}

@Override
public void onMatch(final RelOptRuleCall call)
{
if (!plannerContext.engineHasFeature(EngineFeature.CAN_READ_EXTERNAL_DATA)) {
// Not called because "matches" returns false.
throw new UnsupportedOperationException();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: UOE

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that isn't needed if you don't have to format a string.

/**
* Can execute INSERT and REPLACE statements.
*/
CAN_INSERT,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: should we change this to CAN_INSERT_OR_REPLACE ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is shorter? 🙂

I could go either way on this one to be honest. I don't have a super strong argument for one vs. the other. For that reason I'm inclined to leave it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we change the value as suggested by @cryptoe, perhaps make this more granular for each statement type: CAN_SELECT, CAN_INSERT, CAN_REPLACE, CAN_UPSERT, CAN_DELETE, ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The granular-by-statement-type is really what I had in mind here. I just also made two unspoken assumptions:

  • All engines support SELECT so there is no feature for it
  • All engines support INSERT if and only if they also support REPLACE, so there is no need for CAN_INSERT and CAN_REPLACE to be different

I suppose it's fine to get rid of these assumptions and go with CAN_SELECT, CAN_INSERT, CAN_REPLACE. I was trying to keep it neat and simple, but maybe I failed, since it generated a bunch of comments!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By popular demand I've elected to go with more explicit choices: CAN_SELECT, CAN_INSERT, and CAN_REPLACE. I've also made the error messages nicer. Now, errors about statements like this look like:

Cannot execute INSERT with SQL engine 'native'.

return planWithDruidConvention(rootQueryRel, parsed.getExplainNode(), parsed.getInsertOrReplace());
if (hasBindableTables) {
// Consider BINDABLE convention if necessary. Used for metadata tables.
if (!parsed.isSelect()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be parsed.getInsertOrReplace()!=null ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question. Right now, it's equivalent, so your question is really about future proofing. I thought about it, but BINDABLE only supports SELECT and always will only support SELECT. So we do want to reject any non-SELECT statements. If we add other statements beyond INSERT / REPLACE, then the error message wouldn't make sense here, but I thought that was slightly better than the logic not making sense. Either way, I think it's a very minor point, since we'd need to update this code regardless when we add more statements.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All this will change in a PR that will be added after #12845 is in. We'll refactor this code into "handlers" for each statement type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, no use worrying about it!

Copy link
Contributor

@paul-rogers paul-rogers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gianm, yes this will collide heavily with #12845. I understand this PR is on a tight deadline. While it won't be much fun, this can go in first, and I'll redo #12845 on top of this one.

@@ -57,9 +58,10 @@ public SqlLifecycleFactory(
this.defaultQueryConfig = defaultQueryConfig.get();
}

public SqlLifecycle factorize()
public SqlLifecycle factorize(final SqlEngine engine)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this change, engine selection is up to the client. In the PR #12845 we'll need to pass this into the three factory method for the three kinds of statements. There is code that chooses the factory. It could be that factory selection affects other attributes.

So, I wonder, should there be a "factory selector" that can sniff the sql info and make a choice? Or, should there be a planner factory per engine?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm imagining that the engine is tied to the endpoint; I mentioned in the PR description that I'm imagining the task-based SQL engine being at /druid/v2/sql/task. So the idea is the two endpoints would use the same SqlLifecycleFactory with different SqlEngines. In #12845 the SqlLifecycleFactory becomes a SqlStatementFactory. I was thinking the same thing would work: each endpoint would use the same SqlStatementFactory, but with different SqlEngines.

The idea is that there is one universal SqlLifecycleFactory, which ensures that common stuff like validation, authorization, logging, works the same way across all SQL engines. The SqlEngine interface plugs into that, and gets full control over the actual execution, as well as the ability to do some light customization of planning.

Thoughts? Do you see a better way?

return super.matches(call);
} else {
plannerContext.setPlanningError("SQL query requires scanning external datasources that is not suported.");
plannerContext.setPlanningError(
"Cannot use '%s' with the current SQL engine.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps at least include the class name and we can improve it later. Else, we'll wonder what the "current engine" is if we get this error.

)
);
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion, move this into the engine as validateContext() so that code here does not need to now what rules each engine might enforce.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea. I'll do it.

return planWithDruidConvention(rootQueryRel, parsed.getExplainNode(), parsed.getInsertOrReplace());
if (hasBindableTables) {
// Consider BINDABLE convention if necessary. Used for metadata tables.
if (!parsed.isSelect()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All this will change in a PR that will be added after #12845 is in. We'll refactor this code into "handlers" for each statement type.

} else {
assert parsed.insertOrReplace != null;
rowType = engine.resultTypeForInsert(typeFactory, rootQueryRel.validatedRowType);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is exactly the kind of silliness that the handler refactoring will address.

/**
* Can execute INSERT and REPLACE statements.
*/
CAN_INSERT,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we change the value as suggested by @cryptoe, perhaps make this more granular for each statement type: CAN_SELECT, CAN_INSERT, CAN_REPLACE, CAN_UPSERT, CAN_DELETE, ...

/**
* Queries of type {@link org.apache.druid.query.timeboundary.TimeBoundaryQuery} are usable.
*/
CAN_RUN_TIME_BOUNDARY,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Since these items are logically part of a feature set, maybe just use the feature name itself. TOPN_QUERY, SELECT, TIME_BOUNDARY_QUERY, etc.

@gianm
Copy link
Contributor Author

gianm commented Aug 14, 2022

@paul-rogers I've resolved the conflicts with #12845. The resolution involved creating a SqlStatementFactoryFactory, as an alternative to passing around the SqlEngines everywhere. Let me know what you think 😳

Copy link
Contributor

@abhishekagarwal87 abhishekagarwal87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changes LGTM. I have one comment about the changes in what error message we throw.

return false;
}
}

@Override
public void onMatch(final RelOptRuleCall call)
{
if (!plannerContext.engineHasFeature(EngineFeature.CAN_READ_EXTERNAL_DATA)) {
// Not called because "matches" returns false.
throw new UnsupportedOperationException();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that isn't needed if you don't have to format a string.

} else {
// Re-phrase since planning errors are more like hints
errorMessage = "Possible error: " + errorMessage;
}
// Finally, add the query itself to error message that user will get.
return StringUtils.format("Cannot build plan for query: %s. %s", plannerContext.getSql(), errorMessage);
return StringUtils.format("Cannot build plan for query. %s", errorMessage);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

having the error message overshadowed by the large query text is indeed annoying. I thought about this before but didn't make any change. One major reason is that many times users are not issuing queries themselves. They are using a BI tool that is building and issuing queries behind the scenes. In such a case, knowing the actual query that was run can be very useful for troubleshooting.

what do you think? An alternative is to change the order a bit and have errorMessage appear before the actual query text. Or we log the whole thing (which we might be doing already)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One major reason is that many times users are not issuing queries themselves. They are using a BI tool that is building and issuing queries behind the scenes. In such a case, knowing the actual query that was run can be very useful for troubleshooting.

That's a good point. Including the query text, but swapping position so it's after the error message, makes sense to me, so the end user gets a copy of the query if it was made through some app. (I think we also do log it, but end users typically don't have access to server logs.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow up here: #12903

@abhishekagarwal87
Copy link
Contributor

abhishekagarwal87 commented Aug 15, 2022

there is going to be one more side effect of this PR btw. Right now, if for whatever reason, we can't plan a query with inline data using druid convention, they would still work. Now we only plan them using the druid convention. e.g. does the query select (1, 2) still work with these changes?

This might be fine since that's not a good use case for druid. I know about this particular query because @vogievetsky was trying out all kinds of such queries on druid sometime back. He was getting a bad exception for this query and I fixed it by tweaking the error handling a bit.

import javax.servlet.http.HttpServletRequest;

/**
* Factory factories: when design patterns go too far.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

heh

@vogievetsky vogievetsky merged commit 6c5a431 into apache:master Aug 15, 2022
@gianm gianm deleted the sql-engine branch August 15, 2022 15:48
@gianm
Copy link
Contributor Author

gianm commented Aug 15, 2022

there is going to be one more side effect of this PR btw. Right now, if for whatever reason, we can't plan a query with inline data using druid convention, they would still work. Now we only plan them using the druid convention. e.g. does the query select (1, 2) still work with these changes?

This might be fine since that's not a good use case for druid. I know about this particular query because @vogievetsky was trying out all kinds of such queries on druid sometime back. He was getting a bad exception for this query and I fixed it by tweaking the error handling a bit.

That's true, select (1, 2) won't run anymore, and I actually had to update one test case due to this. I think it's OK though: it's weird that some tableless SELECT queries use Druid expressions and some use Calcite's own interpreter. Especially because these are not equivalent: for example, Druid expressions have the full range of Druid functions but Calcite's own interpreter does not. Better if tableless SELECT queries all go through Druid expressions. If we need select (1, 2) to run then IMO we should add the necessary type support to Druid expressions.

gianm added a commit to gianm/druid that referenced this pull request Aug 15, 2022
Two changes:

1) Restore the text of the SQL query. It was removed in apache#12897, but
   then it was later pointed out that the text is helpful for end
   users querying Druid through tools that do not show the SQL queries
   that they are making.

2) Adjust wording slightly, from "Cannot build plan for query" to
   "Query not supported". This will be clearer to most users. Generally
   the reason we get these errors is due to unsupported SQL constructs.
@paul-rogers
Copy link
Contributor

@gianm, looks great. Thanks for merging into the "statement" PR, amazing you could do the merge so quickly.

Your "factory factory" is a good solution. In fact, it solves another problem: unit tests tend to want to change the planner config for different tests. The current code jumps through hoops to create all the static objects when the planner config changes. Your "factory factory" creates a handy place to allow us to change the planner config per test.

@abhishekagarwal87 abhishekagarwal87 added this to the 24.0.0 milestone Aug 26, 2022
abhishekagarwal87 pushed a commit that referenced this pull request Aug 29, 2022
Two changes:

1) Restore the text of the SQL query. It was removed in #12897, but
   then it was later pointed out that the text is helpful for end
   users querying Druid through tools that do not show the SQL queries
   that they are making.

2) Adjust wording slightly, from "Cannot build plan for query" to
   "Query not supported". This will be clearer to most users. Generally
   the reason we get these errors is due to unsupported SQL constructs.
abhishekagarwal87 pushed a commit that referenced this pull request Aug 29, 2022
Two changes:

1) Restore the text of the SQL query. It was removed in #12897, but
   then it was later pointed out that the text is helpful for end
   users querying Druid through tools that do not show the SQL queries
   that they are making.

2) Adjust wording slightly, from "Cannot build plan for query" to
   "Query not supported". This will be clearer to most users. Generally
   the reason we get these errors is due to unsupported SQL constructs.
599166320 pushed a commit to 599166320/apache-druid that referenced this pull request Aug 29, 2022
Two changes:

1) Restore the text of the SQL query. It was removed in apache#12897, but
   then it was later pointed out that the text is helpful for end
   users querying Druid through tools that do not show the SQL queries
   that they are making.

2) Adjust wording slightly, from "Cannot build plan for query" to
   "Query not supported". This will be clearer to most users. Generally
   the reason we get these errors is due to unsupported SQL constructs.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 Area - Querying Area - SQL
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants