SQL: Morph QueryMakerFactory into SqlEngine. #12897

gianm · 2022-08-12T10:04:22Z

Groundwork for introducing an indexing-service-task-based SQL engine
under the umbrella of #12262. Also includes some other changes related
to improving error behavior.

Main changes:

1) Elevate the QueryMakerFactory interface (an extension point that allows
   customization of how queries are made) into SqlEngine. SQL engines
   can influence planner behavior through EngineFeatures, and can fully
   control the mechanics of query execution using QueryMakers.
2) Remove the server-wide QueryMakerFactory choice, in favor of the choice
   being made by the SQL entrypoint. The indexing-service-task-based
   SQL engine would be associated with its own entrypoint, like
   /druid/v2/sql/task.

Other changes:

1) Adjust DruidPlanner to try either DRUID or BINDABLE convention based
   on analysis of the planned rels; never try both. In particular, we
   no longer try BINDABLE when DRUID fails. This simplifies the logic
   and improves error messages.
2) Adjust error message "Cannot build plan for query" to omit the SQL
   query text. Useful because the text can be quite long, which makes it
   easy to miss the text about the problem.
3) Add a feature to block context parameters used internally by the SQL
   planner from being supplied by end users.
4) Add a feature to enable adding row signature to the context for
   Scan queries. This is useful in building the task-based engine.
5) Add saffron.properties file that turns off sets and graphviz dumps
   in "cannot plan" errors. Significantly reduces log spam on the Broker.

gianm · 2022-08-12T10:13:41Z

fyi, @paul-rogers this patch has some conflicts with #12845 as it is working in the same general area of the code base. I will resolve them if yours gets merged first.

cryptoe

Minor comments
LGTM overall
+1 non binding

cryptoe · 2022-08-12T17:17:40Z

sql/src/main/java/org/apache/druid/sql/calcite/view/ViewSqlEngine.java

+  public RelDataType resultTypeForInsert(RelDataTypeFactory typeFactory, RelDataType validatedRowType)
+  {
+    // Can't have views of INSERT or REPLACE statements.
+    throw new UnsupportedOperationException();


Nit: I think we prefer UOE over java.lang.UnsupportedOperationException. Should we also add a message?

UOE's major value over UnsupportedOperationException is that it has a built-in String.format, which has a cleaner source-code look than using StringUtils.format. This exception doesn't use string formatting, however, so it doesn't need to use UOE.
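
To make the trade-off concrete, here is a minimal sketch. FormattingUnsupportedException is a hypothetical stand-in written for this example, not Druid's actual UOE class, and the message strings are invented. It only illustrates why a formatting constructor helps when there are arguments to interpolate and adds nothing when there are none.

```java
import java.util.Locale;

// Hypothetical stand-in for an exception with built-in formatting (not Druid's UOE).
class FormattingUnsupportedException extends UnsupportedOperationException
{
  FormattingUnsupportedException(String formatText, Object... arguments)
  {
    super(String.format(Locale.ENGLISH, formatText, arguments));
  }
}

class UnsupportedExceptionDemo
{
  // With arguments to interpolate, the formatting constructor keeps the call site short.
  static RuntimeException withFormatting(String statementKind)
  {
    return new FormattingUnsupportedException("Cannot create a view of a %s statement", statementKind);
  }

  // With a fixed message, the plain JDK exception is just as concise.
  static RuntimeException withoutFormatting()
  {
    return new UnsupportedOperationException("Cannot create a view of an INSERT or REPLACE statement");
  }

  public static void main(String[] args)
  {
    System.out.println(withFormatting("REPLACE").getMessage());
    System.out.println(withoutFormatting().getMessage());
  }
}
```

Both exceptions are UnsupportedOperationException instances, so callers that catch the JDK type behave the same either way.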

cryptoe · 2022-08-12T17:49:09Z

sql/src/main/java/org/apache/druid/sql/calcite/external/ExternalTableScanRule.java

      return super.matches(call);
    } else {
-      plannerContext.setPlanningError("SQL query requires scanning external datasources that is not suported.");
+      plannerContext.setPlanningError(
+          "Cannot use '%s' with the current SQL engine.",


Should we say which SQL engine is the current one?

I thought about it, but as usual in CS: naming is the hard part 🙂. I think we can add engine names in the future. I thought it would involve some discussion about what to call them in user-facing messages, so better to have a dedicated PR for that.

Perhaps at least include the class name and we can improve it later. Else, we'll wonder what the "current engine" is if we get this error.

I don't like including Java class names in user-facing error messages: it seems leaky. But I get why both reviewers here want to add something other than "current engine".

Maybe we can try having the debate about names now, and see if we can reach agreement quickly.

Suggestion: native for the current sole production engine, and msq-task for the task-based multi-stage-capable engine we plan to introduce in the future?

I added a name() method, and went with native for the name.

cryptoe · 2022-08-12T17:49:28Z

sql/src/main/java/org/apache/druid/sql/calcite/external/ExternalTableScanRule.java

@@ -43,17 +43,26 @@ public ExternalTableScanRule(final PlannerContext plannerContext)
  @Override
  public boolean matches(RelOptRuleCall call)
  {
-    if (plannerContext.getQueryMaker().feature(QueryFeature.CAN_READ_EXTERNAL_DATA)) {
+    if (plannerContext.engineHasFeature(EngineFeature.CAN_READ_EXTERNAL_DATA)) {


This looks much more readable!!

cryptoe · 2022-08-12T17:49:47Z

sql/src/main/java/org/apache/druid/sql/calcite/external/ExternalTableScanRule.java

      return false;
    }
  }

  @Override
  public void onMatch(final RelOptRuleCall call)
  {
+    if (!plannerContext.engineHasFeature(EngineFeature.CAN_READ_EXTERNAL_DATA)) {
+      // Not called because "matches" returns false.
+      throw new UnsupportedOperationException();


that isn't needed if you don't have to format a string.

cryptoe · 2022-08-12T17:52:28Z

sql/src/main/java/org/apache/druid/sql/calcite/run/EngineFeature.java

+  /**
+   * Can execute INSERT and REPLACE statements.
+   */
+  CAN_INSERT,


Nit: should we change this to CAN_INSERT_OR_REPLACE?

This one is shorter? 🙂

I could go either way on this one to be honest. I don't have a super strong argument for one vs. the other. For that reason I'm inclined to leave it.

If we change the value as suggested by @cryptoe, perhaps make this more granular for each statement type: CAN_SELECT, CAN_INSERT, CAN_REPLACE, CAN_UPSERT, CAN_DELETE, ...

The granular-by-statement-type is really what I had in mind here. I just also made two unspoken assumptions:

All engines support SELECT so there is no feature for it

All engines support INSERT if and only if they also support REPLACE, so there is no need for CAN_INSERT and CAN_REPLACE to be different

I suppose it's fine to get rid of these assumptions and go with CAN_SELECT, CAN_INSERT, CAN_REPLACE. I was trying to keep it neat and simple, but maybe I failed, since it generated a bunch of comments!

By popular demand I've elected to go with more explicit choices: CAN_SELECT, CAN_INSERT, and CAN_REPLACE. I've also made the error messages nicer. Now, errors about statements like this look like:

Cannot execute INSERT with SQL engine 'native'.
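
For readers following along, the outcome of this thread can be sketched roughly as below. This is a simplified illustration, not the code in the PR: StatementFeature, NamedEngine, SelectOnlyEngine, and checkStatement are hypothetical names, and the assumption that the 'native' engine advertises only CAN_SELECT is inferred from the error message above.

```java
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical, simplified per-statement feature flags.
enum StatementFeature
{
  CAN_SELECT,
  CAN_INSERT,
  CAN_REPLACE
}

// Hypothetical engine interface: a user-facing name plus feature checks.
interface NamedEngine
{
  String name();

  boolean feature(StatementFeature feature);
}

// Assumed for illustration: an engine named "native" that only runs SELECT.
class SelectOnlyEngine implements NamedEngine
{
  private static final Set<StatementFeature> FEATURES = EnumSet.of(StatementFeature.CAN_SELECT);

  @Override
  public String name()
  {
    return "native";
  }

  @Override
  public boolean feature(StatementFeature feature)
  {
    return FEATURES.contains(feature);
  }
}

class StatementCheckDemo
{
  // Returns null when the statement is allowed, otherwise the user-facing error text.
  static String checkStatement(NamedEngine engine, String statementKind, StatementFeature required)
  {
    if (engine.feature(required)) {
      return null;
    }
    return String.format(Locale.ENGLISH, "Cannot execute %s with SQL engine '%s'.", statementKind, engine.name());
  }

  public static void main(String[] args)
  {
    NamedEngine engine = new SelectOnlyEngine();
    System.out.println(checkStatement(engine, "SELECT", StatementFeature.CAN_SELECT));
    System.out.println(checkStatement(engine, "INSERT", StatementFeature.CAN_INSERT));
  }
}
```

Keeping one flag per statement type means the error text can name both the statement and the engine without any special-casing.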

cryptoe · 2022-08-12T17:56:07Z

sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidPlanner.java

-      return planWithDruidConvention(rootQueryRel, parsed.getExplainNode(), parsed.getInsertOrReplace());
+      if (hasBindableTables) {
+        // Consider BINDABLE convention if necessary. Used for metadata tables.
+        if (!parsed.isSelect()) {


Should this be parsed.getInsertOrReplace() != null?

That's a good question. Right now, it's equivalent, so your question is really about future proofing. I thought about it, but BINDABLE only supports SELECT and always will only support SELECT. So we do want to reject any non-SELECT statements. If we add other statements beyond INSERT / REPLACE, then the error message wouldn't make sense here, but I thought that was slightly better than the logic not making sense. Either way, I think it's a very minor point, since we'd need to update this code regardless when we add more statements.

All this will change in a PR that will be added after #12845 is in. We'll refactor this code into "handlers" for each statement type.

In that case, no use worrying about it!
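
The decision discussed in this thread boils down to picking exactly one convention up front. The sketch below is an assumption-laden illustration, not DruidPlanner itself: PlanningConvention and ConventionChooser are hypothetical names, the exception type is a placeholder, and the error wording is invented.

```java
// Hypothetical sketch of choosing a single planning convention; never both.
enum PlanningConvention
{
  DRUID,
  BINDABLE
}

class ConventionChooser
{
  static PlanningConvention choose(boolean hasBindableTables, boolean isSelect)
  {
    if (!hasBindableTables) {
      // No fallback: if DRUID planning fails later, the failure is reported as-is.
      return PlanningConvention.DRUID;
    }
    if (!isSelect) {
      // BINDABLE only supports SELECT, so reject every other statement type.
      // Placeholder wording, not the message used in the PR.
      throw new IllegalArgumentException("BINDABLE planning supports SELECT only");
    }
    return PlanningConvention.BINDABLE;
  }

  public static void main(String[] args)
  {
    System.out.println(choose(false, true));
    System.out.println(choose(true, true));
  }
}
```

Checking isSelect rather than "is INSERT or REPLACE" keeps the logic correct if new statement types are added, at the cost of a possibly less specific error message.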

paul-rogers

@gianm, yes this will collide heavily with #12845. I understand this PR is on a tight deadline. While it won't be much fun, this can go in first, and I'll redo #12845 on top of this one.

paul-rogers · 2022-08-12T19:47:51Z

sql/src/main/java/org/apache/druid/sql/SqlLifecycleFactory.java

@@ -57,9 +58,10 @@ public SqlLifecycleFactory(
    this.defaultQueryConfig = defaultQueryConfig.get();
  }

-  public SqlLifecycle factorize()
+  public SqlLifecycle factorize(final SqlEngine engine)


With this change, engine selection is up to the client. In PR #12845 we'll need to pass this into the three factory methods for the three kinds of statements. There is code that chooses the factory. It could be that factory selection affects other attributes.

So, I wonder, should there be a "factory selector" that can sniff the sql info and make a choice? Or, should there be a planner factory per engine?

I'm imagining that the engine is tied to the endpoint; I mentioned in the PR description that I'm imagining the task-based SQL engine being at /druid/v2/sql/task. So the idea is the two endpoints would use the same SqlLifecycleFactory with different SqlEngines. In #12845 the SqlLifecycleFactory becomes a SqlStatementFactory. I was thinking the same thing would work: each endpoint would use the same SqlStatementFactory, but with different SqlEngines.

The idea is that there is one universal SqlLifecycleFactory, which ensures that common stuff like validation, authorization, logging, works the same way across all SQL engines. The SqlEngine interface plugs into that, and gets full control over the actual execution, as well as the ability to do some light customization of planning.

Thoughts? Do you see a better way?
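
A rough sketch of the shape being proposed, under stated assumptions: ExecutionEngine, LabelingEngine, and SharedStatementFactory are hypothetical names made up for this example, the "common stuff" is reduced to a trivial validation and an audit log, and the engine names follow the suggestion made elsewhere in this review.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical engine interface: only execution differs between engines.
interface ExecutionEngine
{
  String name();

  String execute(String sql);
}

// Toy engine that tags the SQL with its own name instead of running it.
class LabelingEngine implements ExecutionEngine
{
  private final String name;

  LabelingEngine(String name)
  {
    this.name = name;
  }

  @Override
  public String name()
  {
    return name;
  }

  @Override
  public String execute(String sql)
  {
    return "[" + name + "] " + sql;
  }
}

// One shared factory: validation and logging are identical for every engine.
class SharedStatementFactory
{
  private final List<String> auditLog = new ArrayList<>();

  String run(ExecutionEngine engine, String sql)
  {
    if (sql == null || sql.trim().isEmpty()) {
      throw new IllegalArgumentException("SQL must not be empty");
    }
    auditLog.add(engine.name() + ": " + sql);
    return engine.execute(sql);
  }

  List<String> auditLog()
  {
    return Collections.unmodifiableList(auditLog);
  }
}

class EndpointDemo
{
  public static void main(String[] args)
  {
    SharedStatementFactory factory = new SharedStatementFactory();
    // Each endpoint would hold its own engine but share the factory.
    System.out.println(factory.run(new LabelingEngine("native"), "SELECT 1"));
    System.out.println(factory.run(new LabelingEngine("msq-task"), "SELECT 1"));
    System.out.println(factory.auditLog());
  }
}
```

The point of the design is that an endpoint chooses the engine, while everything in run() before engine.execute() stays the same for all endpoints.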

paul-rogers · 2022-08-12T19:49:04Z

sql/src/main/java/org/apache/druid/sql/calcite/external/ExternalTableScanRule.java

      return super.matches(call);
    } else {
-      plannerContext.setPlanningError("SQL query requires scanning external datasources that is not suported.");
+      plannerContext.setPlanningError(
+          "Cannot use '%s' with the current SQL engine.",


Perhaps at least include the class name and we can improve it later. Else, we'll wonder what the "current engine" is if we get this error.

paul-rogers · 2022-08-12T19:54:52Z

sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidPlanner.java

+            )
+        );
+      }
+    }


Suggestion: move this into the engine as validateContext() so that code here does not need to know what rules each engine might enforce.

That's a good idea. I'll do it.
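
As an illustration of what moving the check into the engine could look like, here is a minimal sketch. Only the method name validateContext() comes from the suggestion above; the interface, the class, the reserved key "hypotheticalInternalKey", the exception type, and the message wording are all assumptions made for this example.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical interface: each engine decides which context keys it accepts.
interface ContextValidatingEngine
{
  void validateContext(Map<String, Object> queryContext);
}

// Rejects any user-supplied key that the engine reserves for internal use.
class ReservedKeyEngine implements ContextValidatingEngine
{
  private final Set<String> reservedKeys;

  ReservedKeyEngine(Set<String> reservedKeys)
  {
    this.reservedKeys = new HashSet<>(reservedKeys);
  }

  @Override
  public void validateContext(Map<String, Object> queryContext)
  {
    for (String key : queryContext.keySet()) {
      if (reservedKeys.contains(key)) {
        throw new IllegalArgumentException(
            String.format(Locale.ENGLISH, "Context parameter [%s] cannot be set by users", key)
        );
      }
    }
  }
}

class ContextValidationDemo
{
  public static void main(String[] args)
  {
    ContextValidatingEngine engine = new ReservedKeyEngine(Collections.singleton("hypotheticalInternalKey"));

    // An ordinary user-supplied key passes validation.
    engine.validateContext(Collections.<String, Object>singletonMap("timeout", 1000));

    // A reserved key is rejected before planning starts.
    try {
      engine.validateContext(Collections.<String, Object>singletonMap("hypotheticalInternalKey", "x"));
    }
    catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

With this shape the planner only calls validateContext() and stays unaware of which keys any particular engine reserves.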

paul-rogers · 2022-08-12T19:56:03Z

sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidPlanner.java

-      return planWithDruidConvention(rootQueryRel, parsed.getExplainNode(), parsed.getInsertOrReplace());
+      if (hasBindableTables) {
+        // Consider BINDABLE convention if necessary. Used for metadata tables.
+        if (!parsed.isSelect()) {


All this will change in a PR that will be added after #12845 is in. We'll refactor this code into "handlers" for each statement type.

paul-rogers · 2022-08-12T19:56:39Z

sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidPlanner.java

+      } else {
+        assert parsed.insertOrReplace != null;
+        rowType = engine.resultTypeForInsert(typeFactory, rootQueryRel.validatedRowType);
+      }


This is exactly the kind of silliness that the handler refactoring will address.

paul-rogers · 2022-08-12T19:58:41Z

sql/src/main/java/org/apache/druid/sql/calcite/run/EngineFeature.java

+  /**
+   * Can execute INSERT and REPLACE statements.
+   */
+  CAN_INSERT,


If we change the value as suggested by @cryptoe, perhaps make this more granular for each statement type: CAN_SELECT, CAN_INSERT, CAN_REPLACE, CAN_UPSERT, CAN_DELETE, ...

paul-rogers · 2022-08-12T20:00:13Z

sql/src/main/java/org/apache/druid/sql/calcite/run/EngineFeature.java

+  /**
+   * Queries of type {@link org.apache.druid.query.timeboundary.TimeBoundaryQuery} are usable.
+   */
+  CAN_RUN_TIME_BOUNDARY,


Nit: Since these items are logically part of a feature set, maybe just use the feature name itself. TOPN_QUERY, SELECT, TIME_BOUNDARY_QUERY, etc.

gianm · 2022-08-14T08:54:36Z

@paul-rogers I've resolved the conflicts with #12845. The resolution involved creating a SqlStatementFactoryFactory, as an alternative to passing around the SqlEngines everywhere. Let me know what you think 😳

abhishekagarwal87

Changes LGTM. I have one comment about the change to the error message we throw.

abhishekagarwal87 · 2022-08-15T05:14:27Z

sql/src/main/java/org/apache/druid/sql/calcite/external/ExternalTableScanRule.java

      return false;
    }
  }

  @Override
  public void onMatch(final RelOptRuleCall call)
  {
+    if (!plannerContext.engineHasFeature(EngineFeature.CAN_READ_EXTERNAL_DATA)) {
+      // Not called because "matches" returns false.
+      throw new UnsupportedOperationException();


that isn't needed if you don't have to format a string.

abhishekagarwal87 · 2022-08-15T05:21:58Z

sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidPlanner.java

    } else {
      // Re-phrase since planning errors are more like hints
      errorMessage = "Possible error: " + errorMessage;
    }
    // Finally, add the query itself to error message that user will get.
-    return StringUtils.format("Cannot build plan for query: %s. %s", plannerContext.getSql(), errorMessage);
+    return StringUtils.format("Cannot build plan for query. %s", errorMessage);


having the error message overshadowed by the large query text is indeed annoying. I thought about this before but didn't make any change. One major reason is that many times users are not issuing queries themselves. They are using a BI tool that is building and issuing queries behind the scenes. In such a case, knowing the actual query that was run can be very useful for troubleshooting.

What do you think? An alternative is to change the order so that errorMessage appears before the actual query text. Or we log the whole thing (which we might already be doing).

One major reason is that many times users are not issuing queries themselves. They are using a BI tool that is building and issuing queries behind the scenes. In such a case, knowing the actual query that was run can be very useful for troubleshooting.

That's a good point. Including the query text, but swapping position so it's after the error message, makes sense to me, so the end user gets a copy of the query if it was made through some app. (I think we also do log it, but end users typically don't have access to server logs.)

Follow up here: #12903
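
The reordering agreed on in this thread could look something like the sketch below. It is illustrative only: PlanErrorMessages and the "SQL was:" wording are assumptions made for this example and are not taken from #12903.

```java
import java.util.Locale;

class PlanErrorMessages
{
  // Puts the explanation first so that a long query cannot push it out of view.
  static String describe(String errorMessage, String sql)
  {
    return String.format(Locale.ENGLISH, "Cannot build plan for query. %s SQL was: %s", errorMessage, sql);
  }

  public static void main(String[] args)
  {
    System.out.println(describe("Possible error: unsupported construct.", "SELECT a FROM t"));
  }
}
```

End users who reach Druid through a BI tool still get a copy of the generated query, and the reason for the failure stays at the front of the message.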

abhishekagarwal87 · 2022-08-15T05:41:46Z

there is going to be one more side effect of this PR btw. Right now, if for whatever reason, we can't plan a query with inline data using druid convention, they would still work. Now we only plan them using the druid convention. e.g. does the query select (1, 2) still work with these changes?

This might be fine since that's not a good use case for druid. I know about this particular query because @vogievetsky was trying out all kinds of such queries on druid sometime back. He was getting a bad exception for this query and I fixed it by tweaking the error handling a bit.

clintropolis · 2022-08-15T05:24:17Z

sql/src/main/java/org/apache/druid/sql/SqlStatementFactoryFactory.java

+import javax.servlet.http.HttpServletRequest;
+
+/**
+ * Factory factories: when design patterns go too far.


gianm · 2022-08-15T15:52:55Z

there is going to be one more side effect of this PR btw. Right now, if for whatever reason, we can't plan a query with inline data using druid convention, they would still work. Now we only plan them using the druid convention. e.g. does the query select (1, 2) still work with these changes?

This might be fine since that's not a good use case for druid. I know about this particular query because @vogievetsky was trying out all kinds of such queries on druid sometime back. He was getting a bad exception for this query and I fixed it by tweaking the error handling a bit.

That's true, select (1, 2) won't run anymore, and I actually had to update one test case due to this. I think it's OK though: it's weird that some tableless SELECT queries use Druid expressions and some use Calcite's own interpreter. Especially because these are not equivalent: for example, Druid expressions have the full range of Druid functions but Calcite's own interpreter does not. Better if tableless SELECT queries all go through Druid expressions. If we need select (1, 2) to run then IMO we should add the necessary type support to Druid expressions.

Two changes: 1) Restore the text of the SQL query. It was removed in apache#12897, but then it was later pointed out that the text is helpful for end users querying Druid through tools that do not show the SQL queries that they are making. 2) Adjust wording slightly, from "Cannot build plan for query" to "Query not supported". This will be clearer to most users. Generally the reason we get these errors is due to unsupported SQL constructs.

paul-rogers · 2022-08-15T16:53:16Z

@gianm, looks great. Thanks for merging into the "statement" PR, amazing you could do the merge so quickly.

Your "factory factory" is a good solution. In fact, it solves another problem: unit tests tend to want to change the planner config for different tests. The current code jumps through hoops to create all the static objects when the planner config changes. Your "factory factory" creates a handy place to allow us to change the planner config per test.

gianm added the Area - Querying and Area - SQL labels Aug 12, 2022

abhishekagarwal87 added the Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 label Aug 12, 2022

Fixes from CI.

d694c67

cryptoe approved these changes Aug 12, 2022

paul-rogers reviewed Aug 12, 2022

gianm added 3 commits August 13, 2022 21:12

Changes from review.

54506d2

Merge branch 'master' into sql-engine

34de081

Can vectorize, now that join-to-filter is on by default.

71daec5

gianm mentioned this pull request Aug 14, 2022

Refactor SqlLifecycle into statement classes #12845

Merged

gianm added 2 commits August 14, 2022 01:42

Merge branch 'master' into sql-engine

34bf9a9

Checkstyle! And variable renames!

f95b523

Remove throws from test.

d8ab26e

abhishekagarwal87 approved these changes Aug 15, 2022

clintropolis approved these changes Aug 15, 2022

vogievetsky merged commit 6c5a431 into apache:master Aug 15, 2022

gianm deleted the sql-engine branch August 15, 2022 15:48

gianm mentioned this pull request Aug 15, 2022

Adjust SQL "cannot plan" error message. #12903

Merged

abhishekagarwal87 added this to the 24.0.0 milestone Aug 26, 2022

techdocsmith mentioned this pull request Aug 31, 2022

[Draft] 24.0 Release notes #12825

Closed

abhishekagarwal87 mentioned this pull request Sep 8, 2022

Test issue [Please ignore] #13055

Closed
