Decouple logical planning and native query generation in SQL planning #14232
cheddar merged 12 commits into apache:master
Conversation
Check notice — Code scanning / CodeQL: Confusing overloading of methods (on `public boolean equals(Object obj)`)
Check notice — Code scanning / CodeQL: Exposing internal representation (on `public List<DruidTable> getQueryTables()`)
Check notice — Code scanning / CodeQL: Exposing internal representation (on `public List<PartialDruidQuery> getQueryList()`)
Check notice — Code scanning / CodeQL: Chain of 'instanceof' tests (in `public RelNode visit(RelNode other)`)
Check failure — Code scanning / CodeQL: No clone method (on `public class DruidAggregate extends Aggregate implements DruidLogicalNode`, the `DruidLogicalNode` convention node for the `Aggregate` plan node)
Check failure — Code scanning / CodeQL: No clone method (on `public class DruidTableScan extends TableScan implements DruidLogicalNode`, the `DruidLogicalNode` convention node for the `TableScan` plan node)

Hi Rohan - can you write a separate GH proposal that lays out the motivation behind these changes in a bit more detail? It seems like this is the first PR towards the end goal you have in mind. The proposal could also include a short overview of the upcoming changes.

+1 for a design proposal
(Resolved review threads on sql/src/main/java/org/apache/druid/sql/calcite/planner/PlannerFactory.java)

There are requests for a design doc. The description of this PR attempts to lay out the explanation (though it's very terse and might be hard to understand if you aren't familiar with the problems and the work that has been done). The basic part of this PR is that the new method "decouples" planning of native queries into a two-phase process: the DAG is returned from the SQL planner, and then that DAG is used to build the native query.

The reason we are doing this is to make it easier to iterate on and work with SQL for Druid. Druid's native query planning follows a relatively uncommon pattern of building up the physical execution plan as the output of the volcano planner. On the plus side, this was done so that we could leverage the volcano planner to explore different methods of physical planning in case something wasn't natively plannable. On the negative side, it makes it incredibly difficult to actually work with SQL planning as it exists in Druid.

The change itself does not impact current behavior and is being done as an add-on. Once it passes all of the existing tests, we will be able to switch the default planning mode over to it and deprecate the old native query planning. Given that this isn't actually changing behavior and isn't fundamentally changing any public API, I believe it should be safe to iterate on this work quite rapidly.
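The two-phase idea described above can be sketched with a tiny self-contained model. All class and method names below are hypothetical stand-ins, not Druid's or Calcite's actual APIs: phase 1 produces a logical tree, and phase 2 is a separate walk over that tree that builds the native query.

```java
import java.util.List;

// Phase 1 output: a logical tree (stand-in for a Calcite RelNode DAG).
interface LogicalNode {
  List<LogicalNode> inputs();
}

record Scan(String table) implements LogicalNode {
  public List<LogicalNode> inputs() { return List.of(); }
}

record Filter(LogicalNode input, String condition) implements LogicalNode {
  public List<LogicalNode> inputs() { return List.of(input); }
}

record Aggregate(LogicalNode input, String aggregator) implements LogicalNode {
  public List<LogicalNode> inputs() { return List.of(input); }
}

// Phase 2: a visitor-style conversion of the logical tree into a
// (string-typed, purely illustrative) "native query".
final class NativeQueryBuilder {
  String convert(LogicalNode node) {
    if (node instanceof Scan s) {
      return "scan(" + s.table() + ")";
    } else if (node instanceof Filter f) {
      return "filter(" + convert(f.input()) + ", " + f.condition() + ")";
    } else if (node instanceof Aggregate a) {
      return "agg(" + convert(a.input()) + ", " + a.aggregator() + ")";
    }
    throw new IllegalArgumentException("unsupported node: " + node);
  }
}

public class DecoupledPlanningSketch {
  public static void main(String[] args) {
    // Phase 1 result, built by hand here; in the real planner this
    // would come out of the Calcite optimization pass.
    LogicalNode plan =
        new Aggregate(new Filter(new Scan("foo"), "dim1 = 'x'"), "count(*)");
    // Phase 2: walk the tree to produce the native query.
    System.out.println(new NativeQueryBuilder().convert(plan));
    // prints: agg(filter(scan(foo), dim1 = 'x'), count(*))
  }
}
```

The point of the split is that the `LogicalNode` tree is a usable artifact on its own (it can be logged, inspected, or cached) before any native query exists.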

I validated that this won't impact the current code paths: it requires a specific context parameter to be set to opt in, and will not impact any current behaviors. As such, I'm going to go ahead and merge this so that we can get it in and keep iterating on making more tests pass.

Those problems are indeed relatable. I am wondering if the new design will help with some other problems that you didn't intend to solve immediately. The one thing I had in mind was plan caching: plan generation can be costly and sometimes forces us to switch to native queries for high-QPS use cases. I am wondering if that will be easier to do in this new decoupled mode, where the Druid conversion happens after the DAG is generated.

@rohangarg - also, what does that intermediate DAG look like? Can you post an example?

@abhishekagarwal87: An example of the intermediate DAG, for a sample SQL query, looks like:
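(The attached plan did not survive extraction. Purely as a hypothetical illustration — the `DruidAggregate`/`DruidTableScan` node names come from this PR, while the rendering style and attributes are Calcite-like and may differ from the real output — such a logical tree for a query like `SELECT COUNT(*) FROM foo` might look like:)

```text
DruidAggregate(group=[{}], EXPR$0=[COUNT()])
  DruidTableScan(table=[[druid, foo]])
```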
This patch starts the process of decoupling logical planning and native query generation in SQL planning. Currently, the DAG optimization and the native query generation are both done simultaneously in the planner via Calcite optimization and conversion rules. That coupling makes it difficult to obtain a logical DAG from the planner when needed, and it also makes debugging and logging for erroneous queries very difficult to do or comprehend.

The patch decouples the two by allowing logical planning to happen via the Calcite planner, letting it generate a DAG of `RelNode` nodes in a defined logical Druid convention. After that, the DAG is converted to a native query through a visitor over the produced `RelNode` tree.

The patch is put behind a feature flag `plannerStrategy`, which can have two modes: `COUPLED` and `DECOUPLED`. Currently, the default mode is `COUPLED`, which means planning behaves the same as before. The new planning mode is enabled by setting the mode to `DECOUPLED`.

Further, the feature is WIP in the sense that some operators (like join/window/union) haven't been rewritten to allow planning through the new mechanism. Also, for the currently converted operators (scan/filter/project/sort/aggregate), there are a few test failures in corner cases. Both of these are planned to be fixed in future patches for the feature.
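Based on the description, opting in means setting `plannerStrategy` in the query context. A sketch of what that might look like in a Druid SQL HTTP request body (the `plannerStrategy` key and `DECOUPLED` value are from this PR; the sample query and the rest of the request shape are illustrative, following Druid's standard `POST /druid/v2/sql` API):

```json
{
  "query": "SELECT COUNT(*) FROM foo WHERE dim1 = 'x'",
  "context": {
    "plannerStrategy": "DECOUPLED"
  }
}
```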