Add pinot-query-planner module #8340

walterddr · 2022-03-11T21:32:45Z

Summary

Initial commit for the multi-stage query planner.

Design Doc

https://docs.google.com/document/d/10-vL_bUrI-Pi2oYudWyUlQl9Kf0cLrW-Z8hGczkCPik/edit#heading=h.f7j5q82j0slb

TODO

Create a more performant serialization format for StagePlan.
Address type system and parser/validator TODOs to support all existing Pinot SQL.

pinot-core/src/main/java/org/apache/pinot/core/routing/RouteManager.java

pinot-query-planner/src/main/java/org/apache/pinot/query/catalog/PinotCatalog.java

pinot-query-planner/src/main/java/org/apache/pinot/query/planner/nodes/MailboxReceiveNode.java

mcvsubbu · 2022-03-11T23:44:13Z

Please add a link to the design doc in your PRs, thanks.

codecov-commenter · 2022-03-16T16:33:46Z

Codecov Report

❗ No coverage uploaded for pull request base (multi_stage_query_engine@66de3ba).
The diff coverage is n/a.

@@                     Coverage Diff                     @@
##             multi_stage_query_engine    #8340   +/-   ##
===========================================================
  Coverage                            ?   30.47%           
===========================================================
  Files                               ?     1642           
  Lines                               ?    86111           
  Branches                            ?    12999           
===========================================================
  Hits                                ?    26246           
  Misses                              ?    57485           
  Partials                            ?     2380

Flag	Coverage Δ
integration1	`28.63% <0.00%> (?)`
integration2	`27.20% <0.00%> (?)`

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 66de3ba...f8dab0c.

pinot-broker/src/main/java/org/apache/pinot/broker/routing/RoutingManager.java

pinot-core/src/main/java/org/apache/pinot/core/routing/RouteManager.java

pinot-query-planner/src/main/java/org/apache/calcite/jdbc/CalciteSchemaBuilder.java

siddharthteotia · 2022-03-16T21:10:00Z

pinot-query-planner/src/main/java/org/apache/pinot/query/QueryEnvironment.java

+
+
+/**
+ * The {@code QueryEnvironment} contains the main entrypoint for query planning.


For completeness, and so readers are aware, can you also add info to the javadoc on the mapping between QueryEnvironment and a SQL query? Is this created on a per-query basis?

This is actually a good question on second thought. Some of the components are actually not reusable. Let me rethink this design.

Discussed offline with @walterddr: this is global, but serialization is needed at the Calcite planner level, so there is still a major TODO to make this scalable. It is probably OK to instantiate the planner on each call to planQuery() (and eat that cost) while keeping QueryEnvironment global, to avoid instantiating the catalog per query.
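To illustrate the arrangement described above, here is a minimal sketch, assuming a long-lived environment that owns the catalog and creates a fresh planner inside every planQuery() call. All class names (SketchCatalog, SketchPlanner, SketchQueryEnvironment) are hypothetical stand-ins and do not use the Calcite or Pinot APIs.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the shared catalog that is expensive to build.
final class SketchCatalog {
  private final Map<String, String> _tables = new ConcurrentHashMap<>();

  void addTable(String name, String schemaDescription) {
    _tables.put(name, schemaDescription);
  }

  boolean hasTable(String name) {
    return _tables.containsKey(name);
  }
}

// Hypothetical stand-in for a stateful planner that should not be shared across queries.
final class SketchPlanner {
  // Counts constructed planners so the per-call behaviour is observable.
  static final AtomicInteger INSTANCES_CREATED = new AtomicInteger();
  private final SketchCatalog _catalog;

  SketchPlanner(SketchCatalog catalog) {
    _catalog = catalog;
    INSTANCES_CREATED.incrementAndGet();
  }

  String plan(String tableName) {
    if (!_catalog.hasTable(tableName)) {
      throw new IllegalArgumentException("unknown table: " + tableName);
    }
    return "SCAN(" + tableName + ")";
  }
}

// One global environment: the catalog is created once, the planner once per planQuery() call.
final class SketchQueryEnvironment {
  private final SketchCatalog _catalog;

  SketchQueryEnvironment(SketchCatalog catalog) {
    _catalog = catalog;
  }

  String planQuery(String tableName) {
    // Pay the planner construction cost on every call; no planner state is shared.
    SketchPlanner planner = new SketchPlanner(_catalog);
    return planner.plan(tableName);
  }
}
```

The trade-off in this sketch is the one named in the comment: each query pays for a new planner, but the catalog is never rebuilt.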

pinot-query-planner/src/main/java/org/apache/pinot/query/QueryEnvironment.java

siddharthteotia · 2022-03-17T00:10:53Z

pinot-query-planner/src/main/java/org/apache/pinot/query/QueryEnvironment.java

+    SqlNode validated = _validator.validate(parsed);
+    if (null == validated || !validated.getKind().belongsTo(SqlKind.QUERY)) {
+      throw new IllegalArgumentException(
+          String.format("unsupported SQL query, cannot validate out a valid sql from:\n%s", parsed));


Can we include the original SQL query as well?

The original query will be wrapped in the upper-level planQuery.

Basically I wanted to include the original SQL query string in the message. Maybe SqlNode.toString() takes care of that?

Yes. One will not call validate directly. When it goes through planQuery, the original query string is printed, because the exception is caught and rethrown with the sqlString attached to the message.
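The catch-and-rethrow behaviour described in this reply could look roughly like the sketch below. It is an assumption-laden illustration: the class name, the string-based "parsing", and the exact message text are hypothetical, and only the method names planQuery and validate come from the thread.

```java
// Hypothetical sketch; parsing and validation are reduced to string checks.
final class SketchRethrowEnvironment {

  // Lower-level step: it only sees the parsed form, so its message quotes that form.
  static String validate(String parsed) {
    if (parsed == null || !parsed.regionMatches(true, 0, "SELECT", 0, 6)) {
      throw new IllegalArgumentException("unsupported SQL query, cannot validate: " + parsed);
    }
    return parsed;
  }

  // Entry point: any failure is wrapped so the original SQL string appears in the message.
  static String planQuery(String sqlQuery) {
    try {
      String parsed = sqlQuery.trim(); // stand-in for real parsing
      return validate(parsed);
    } catch (Exception e) {
      throw new RuntimeException("Error composing query plan for: " + sqlQuery, e);
    }
  }
}
```

Callers that go through planQuery see both the original SQL (in the outer message) and the validator's own message (as the cause).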

pinot-query-planner/src/main/java/org/apache/pinot/query/QueryEnvironment.java

siddharthteotia · 2022-03-17T00:29:05Z

pinot-query-planner/src/main/java/org/apache/pinot/query/QueryEnvironment.java

+ *
+ * <p>It provide the higher level entry interface to convert a SQL string into a {@link QueryPlan}.
+ */
+public class QueryEnvironment {


Should this be called PinotQueryPlanner or QueryPlanner, since it is not just an environment holder and does the entire planning?

The reason I called it an environment is that it holds some stateful info during planning; it is not stateless. But let me think more about the naming (and the javadoc).

pinot-query-planner/src/main/java/org/apache/pinot/query/context/PlannerContext.java

pinot-query-planner/src/main/java/org/apache/pinot/query/QueryEnvironment.java

siddharthteotia · 2022-03-17T00:41:18Z

pinot-query-planner/src/main/java/org/apache/pinot/query/type/TypeFactory.java

+        return this.createSqlType(SqlTypeName.VARCHAR);
+      case BYTES:
+        return this.createSqlType(SqlTypeName.VARBINARY);
+      case JSON:


JSON is a recognized type in Pinot, so we should not throw unsupported for it?

JSON will also be implemented as a Struct type, I suppose.

A TODO is fine for now, I guess. We must handle JSON since it's a first-class type in Pinot now.

siddharthteotia · 2022-03-17T00:41:44Z

pinot-query-planner/src/main/java/org/apache/pinot/query/type/TypeFactory.java

+    return builder.build();
+  }
+
+  private RelDataType toRelDataType(FieldSpec fieldSpec) {


How do we handle arrays / multi-value (MV) columns?

Calcite supports Map, Array and Struct types, but we are throwing here until operator support is added.
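As a rough picture of the behaviour discussed in the two TypeFactory threads above, here is a minimal sketch, assuming a mapper that rejects multi-value columns and JSON with explicit TODOs until support lands. The types SketchDataType, SketchFieldSpec and SketchTypeMapper are hypothetical and return plain type-name strings rather than Calcite RelDataType objects; the INT mapping is an assumption, while the STRING and BYTES mappings mirror the quoted diff.

```java
// Hypothetical subset of column data types.
enum SketchDataType { INT, STRING, BYTES, JSON }

// Hypothetical, simplified field description.
final class SketchFieldSpec {
  final String name;
  final SketchDataType dataType;
  final boolean singleValue;

  SketchFieldSpec(String name, SketchDataType dataType, boolean singleValue) {
    this.name = name;
    this.dataType = dataType;
    this.singleValue = singleValue;
  }
}

final class SketchTypeMapper {
  static String toSqlTypeName(SketchFieldSpec fieldSpec) {
    if (!fieldSpec.singleValue) {
      // TODO: map multi-value columns to an array type once operators support it.
      throw new UnsupportedOperationException("multi-value column not supported yet: " + fieldSpec.name);
    }
    switch (fieldSpec.dataType) {
      case INT:
        return "INTEGER"; // assumed mapping, not shown in the quoted diff
      case STRING:
        return "VARCHAR";
      case BYTES:
        return "VARBINARY";
      case JSON:
        // TODO: JSON is a first-class type and must be handled, possibly as a struct type.
        throw new UnsupportedOperationException("JSON not supported yet: " + fieldSpec.name);
      default:
        throw new UnsupportedOperationException("unsupported type: " + fieldSpec.dataType);
    }
  }
}
```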

siddharthteotia · 2022-03-17T01:01:25Z

pinot-query-planner/src/main/java/org/apache/pinot/query/type/TypeFactory.java

+/**
+ * Extends Java-base TypeFactory from Calcite.
+ */
+public class TypeFactory extends JavaTypeFactoryImpl {


Not sure why we need to extend JavaTypeFactoryImpl instead of RelDataTypeFactory.

The interface is experimental and subject to change in the future, as per Calcite.

JavaTypeFactory is not experimental, but the purpose of that interface seems to be to map a type / recordType to a Java class? Why do we need that model?

This is for convenience in implementing the JavaTypeFactory. But yes, we can definitely create our own clean impl in a future optimization.

Maybe add a TODO here?

pinot-query-planner/src/main/java/org/apache/pinot/query/QueryEnvironment.java

pinot-query-planner/src/main/java/org/apache/pinot/query/catalog/PinotCatalog.java

pinot-common/src/main/java/org/apache/pinot/common/config/provider/TableCache.java

pinot-query-planner/src/main/java/org/apache/pinot/query/QueryEnvironment.java

siddharthteotia · 2022-03-17T07:48:19Z

pinot-query-planner/src/main/java/org/apache/pinot/query/validate/Validator.java

+public class Validator extends SqlValidatorImpl {
+
+  public Validator(SqlOperatorTable opTab, SqlValidatorCatalogReader catalogReader, RelDataTypeFactory typeFactory) {
+    super(opTab, catalogReader, typeFactory, Config.DEFAULT);


The current CalciteSqlParser code in Pinot uses SqlConformanceEnum.BABEL, and IIUC that was done during the migration from PQL to SQL to relax a few things in syntax and semantics.

Should we use BABEL here as well instead of DEFAULT?

Given that PQL is deprecated, I don't think we should use BABEL solely because of this.

What I meant to say was that after we migrated to SQL, we use BABEL conformance in the CalciteSqlParser code.

pinot-query-planner/src/main/java/org/apache/pinot/query/planner/nodes/StageNode.java

pinot-query-planner/src/main/java/org/apache/pinot/query/QueryEnvironment.java

siddharthteotia · 2022-03-17T15:33:09Z

pinot-query-planner/src/main/java/org/apache/pinot/query/planner/nodes/MailboxSendNode.java

+import org.apache.calcite.rel.RelDistribution;
+
+
+public class MailboxSendNode extends AbstractStageNode {


Based on the discussion thread in the design doc, I think we should have an ExchangeNode abstraction. ExchangeNode should encapsulate the sender and receiver nodes.

Similarly, there should be abstractions for the sender and receiver themselves.

Something like the following:

Exchange
  BroadcastExchange
  SingleMergeExchange
  HashPartitionExchange

Sender
  BroadcastSender
  SingleSender
  HashPartitionSender

Receiver
  OrderedReceiver
  UnorderedReceiver

BroadcastExchange encapsulates
  BroadcastSender
  SomeReceiver

HashPartitionExchange encapsulates
  HashPartitionSender
  SomeReceiver

So ideally MailboxSend and MailboxReceive should be modeled as sender and receiver abstractions respectively, as opposed to concrete implementations, IMO.

Yes, I agree that we need to add more attributes to the stage nodes.
Should we consider starting simple and adding attributes to the ExchangeNode?

To me the only things we need to separate are SendExchangeNode and ReceiveExchangeNode. The items you mentioned above can be inferred from the Exchange.Type,

e.g. a SendExchangeNode with Exchange.Type == BROADCAST results in a BroadcastSender.

The benefit of this is that we can add more attributes to the ExchangeNode without exploding the combinations of possible attributes, say if we later want a HashPartitionButOrderedWithinPartitionSender.
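A minimal sketch of the attribute-based alternative proposed in this reply, assuming one send node whose behaviour is derived from an exchange type plus independent flags rather than from one class per combination. SketchExchangeType, SketchSendExchangeNode and the orderedWithinPartition flag are hypothetical names chosen for illustration.

```java
// Hypothetical exchange types; the real enum may differ.
enum SketchExchangeType { BROADCAST, SINGLETON, HASH_PARTITION }

// One send node type; sender behaviour is inferred from its attributes.
final class SketchSendExchangeNode {
  private final SketchExchangeType _exchangeType;
  private final boolean _orderedWithinPartition;

  SketchSendExchangeNode(SketchExchangeType exchangeType, boolean orderedWithinPartition) {
    _exchangeType = exchangeType;
    _orderedWithinPartition = orderedWithinPartition;
  }

  // Describes which sender this node resolves to.
  String describeSender() {
    String base;
    switch (_exchangeType) {
      case BROADCAST:
        base = "BroadcastSender";
        break;
      case SINGLETON:
        base = "SingleSender";
        break;
      case HASH_PARTITION:
        base = "HashPartitionSender";
        break;
      default:
        throw new IllegalStateException("unknown exchange type: " + _exchangeType);
    }
    // A new attribute composes with every type without a new class per combination.
    return _orderedWithinPartition ? base + "(ordered within partition)" : base;
  }
}
```

With three types and one boolean flag, this sketch covers six sender variants in a single class; a class hierarchy would need a new subclass for each added combination.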

pinot-query-planner/src/main/java/org/apache/pinot/query/planner/nodes/TableScanNode.java

siddharthteotia · 2022-03-17T18:45:17Z

pinot-query-planner/src/main/java/org/apache/pinot/query/planner/nodes/AbstractStageNode.java

+
+public abstract class AbstractStageNode implements StageNode {
+
+  protected final String _stageId;


int (to reduce heap usage)? Or do we think this is arbitrary bytes and String is better?

Discussed offline: we will consider this during serialization design.

Adding a TODO to the PR description.

pinot-query-planner/src/main/java/org/apache/pinot/query/planner/nodes/AbstractStageNode.java

pinot-query-planner/src/main/java/org/apache/pinot/query/planner/StageNodeConverter.java

pinot-query-planner/src/main/java/org/apache/pinot/query/planner/StagePlanner.java

siddharthteotia · 2022-03-17T21:31:03Z

pinot-query-planner/src/main/java/org/apache/pinot/query/planner/StagePlanner.java

+    return new QueryPlan(_queryStageMap, _stageMetadataMap);
+  }
+
+  // non-threadsafe


I had a thread-safety question on QueryEnvironment. If that class is instantiated per compiled query, does that imply calls to StagePlanner should be thread-safe?

The non-thread-safety comes more from Calcite's planner (and RelNode). My interpretation is that there cannot be two queries in planning at the same time, but the planner can be reused.
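One way to read "only one query in planning at a time, but the planner can be reused" is a single planner instance guarded by a lock. The sketch below is an assumption about how that could be arranged, not the PR's implementation; SketchStatefulPlanner and SketchSerializedPlanning are hypothetical names.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical planner with per-query scratch state, which is what makes it non-thread-safe.
final class SketchStatefulPlanner {
  private String _currentQuery;

  String plan(String query) {
    _currentQuery = query;
    String plan = "PLAN[" + _currentQuery + "]";
    _currentQuery = null; // reset so the instance can be reused for the next query
    return plan;
  }
}

// Reuses one planner but lets only one query plan at a time.
final class SketchSerializedPlanning {
  private final SketchStatefulPlanner _planner = new SketchStatefulPlanner();
  private final ReentrantLock _lock = new ReentrantLock();

  String planQuery(String query) {
    _lock.lock();
    try {
      return _planner.plan(query);
    } finally {
      _lock.unlock();
    }
  }
}
```

The lock makes planning a serial bottleneck, which matches the scalability TODO raised earlier in the review.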

- fix calcite upgrade compilation issue
- fix query compilation runtime after calcite 1.29 upgrade
- linter

siddharthteotia · 2022-03-24T00:30:12Z

pinot-query-planner/src/main/java/org/apache/pinot/query/catalog/PinotCatalog.java

+  @Override
+  public Expression getExpression(@Nullable SchemaPlus parentSchema, String name) {
+    requireNonNull(parentSchema, "parentSchema");
+    return Schemas.subSchemaExpression(parentSchema, name, getClass());


We have a flat namespace as of now, so we don't support sub-schemas, and the Calcite root schema is created with an empty name. So what is this code doing with a sub-schema?

This code is a default implementation. In our case it is as good as returning null, since we don't support it.

siddharthteotia · 2022-03-24T00:32:44Z

pinot-query-planner/src/main/java/org/apache/pinot/query/catalog/PinotCatalog.java

+
+  @Override
+  public RelProtoDataType getType(String name) {
+    return null;


@walterddr For this and all the functions below, we should ideally throw UnsupportedOperationException instead of returning null or an empty list, since we probably can't predict where the Calcite planning code will call them from; if it does, it is better to fail the compilation through this exception.

According to the Schema javadoc ( https://calcite.apache.org/javadocAggregate/org/apache/calcite/schema/Schema.html ), Calcite expects null if not found. The same logic exists in its default implementation ( https://calcite.apache.org/javadocAggregate/org/apache/calcite/schema/impl/AbstractSchema.html ).
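To show why a "null means not found" contract matters to callers, here is a minimal sketch, assuming a caller that probes the schema first and then falls back to built-in names. SketchSchema, SketchFlatCatalog, SketchTypeResolver and the built-in table are hypothetical and only mimic the shape of the contract; they are not Calcite classes.

```java
import java.util.Map;

// Hypothetical schema contract: null signals "not found".
interface SketchSchema {
  String getType(String name);
}

// A flat catalog without user-defined types.
final class SketchFlatCatalog implements SketchSchema {
  @Override
  public String getType(String name) {
    return null; // nothing registered, so report "not found" rather than throwing
  }
}

// A caller that relies on the null contract to fall back to built-in types.
final class SketchTypeResolver {
  private static final Map<String, String> BUILT_INS = Map.of("INT", "INTEGER", "STRING", "VARCHAR");

  static String resolve(SketchSchema schema, String name) {
    String fromSchema = schema.getType(name);
    if (fromSchema != null) {
      return fromSchema;
    }
    String builtIn = BUILT_INS.get(name);
    if (builtIn == null) {
      throw new IllegalArgumentException("unknown type: " + name);
    }
    return builtIn;
  }
}
```

If getType threw instead of returning null, the fallback branch in this sketch would never run, which is the risk the reply points at.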

siddharthteotia · 2022-03-24T01:11:41Z

pinot-query-planner/src/main/java/org/apache/pinot/query/catalog/PinotCatalog.java

+
+  @Override
+  public Schema getSubSchema(String name) {
+    return null;


@walterddr It looks like Calcite doesn't expect this to be null?

As per the Calcite docs, during query validation Calcite will call getSubSchema() on the registered root schema, and then call getTable(schemaName) on the retrieved Schema to get the Table / PinotTable?

Our root schema should be PinotCatalog, but based on the above, I wonder how query validation is going to work when this function is invoked?

On the other hand, if we create a dummy root schema with exactly one child / sub-schema, PinotCatalog, then this seems to work:

dummyRootSchema.getSubSchema("Pinot")

returns instance of PinotCatalog

catalog.getTable(tableName)

returns corresponding PinotTable

This is only true if we registered the user-facing Schema class and hoisted its contents up into CalciteSchema.

For example, if the user-overridden schema class contains tables and user-defined functions, one can register those by extracting all the tables into CalciteSchema.tableMap, the functions into CalciteSchema.functionMap, etc.

This is not ideal for Pinot because PinotCatalog is actually backed by TableCache, which is an ever-changing list of tables.
Therefore we use SimpleCalciteSchema, which doesn't go through the protected member variables inside CalciteSchema and instead falls through directly to the user-facing schema to acquire the data.
E.g. instead of getTable() { return tableMap.get(tableName); } it directly calls Schema.getTable().

This way we don't have to create a Calcite schema object every time a new query comes in, which is one of the reasons we can have one query environment and reuse it across multiple queries.

Obviously there's a drawback: if the schema/table changes in the middle of planning, there can potentially be a race condition. But IMO we are better off failing the query and retrying in that case, since schema/table config changes don't happen often, and recreating an entire planner context for every query would cost more valuable E2E latency.
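The difference between hoisting tables into an internal map and falling through to the backing schema can be sketched as follows. This is a simplified illustration under stated assumptions: SketchTableCache, SketchSnapshotSchema and SketchPassThroughSchema are hypothetical stand-ins, not the Calcite CalciteSchema / SimpleCalciteSchema classes or Pinot's TableCache.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for an ever-changing table cache.
final class SketchTableCache {
  private final Map<String, String> _tables = new ConcurrentHashMap<>();

  void put(String tableName, String schema) {
    _tables.put(tableName, schema);
  }

  String get(String tableName) {
    return _tables.get(tableName);
  }

  Map<String, String> snapshot() {
    return new HashMap<>(_tables);
  }
}

// Strategy A: copy the tables into an internal map once; goes stale when the cache changes.
final class SketchSnapshotSchema {
  private final Map<String, String> _tableMap;

  SketchSnapshotSchema(SketchTableCache cache) {
    _tableMap = cache.snapshot();
  }

  String getTable(String tableName) {
    return _tableMap.get(tableName);
  }
}

// Strategy B: fall through to the backing cache on every lookup; always current.
final class SketchPassThroughSchema {
  private final SketchTableCache _cache;

  SketchPassThroughSchema(SketchTableCache cache) {
    _cache = cache;
  }

  String getTable(String tableName) {
    return _cache.get(tableName);
  }
}
```

The pass-through schema can be built once and reused across queries, at the price noted above: a lookup may observe a table change that lands in the middle of planning.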

siddharthteotia

lgtm

* add pinot-query-planner
  - fix calcite upgrade compilation issue
  - fix query compilation runtime after calcite 1.29 upgrade
  - linter
* address diff comments and add more TODOs

Co-authored-by: Rong Rong <rongr@startree.ai>

yupeng9 reviewed Mar 11, 2022

walterddr changed the title ~~query planner~~ Add pinot-query-planner module Mar 14, 2022

walterddr force-pushed the pr_query_planner branch from bf796b8 to 105a255 Compare March 16, 2022 15:09

siddharthteotia reviewed Mar 16, 2022

pinot-broker/src/main/java/org/apache/pinot/broker/routing/RoutingManager.java (outdated)

siddharthteotia reviewed Mar 16, 2022

pinot-broker/src/main/java/org/apache/pinot/broker/routing/RoutingManager.java (outdated)

siddharthteotia reviewed Mar 16, 2022

pinot-core/src/main/java/org/apache/pinot/core/routing/RouteManager.java (outdated)

siddharthteotia reviewed Mar 16, 2022

pinot-query-planner/src/main/java/org/apache/calcite/jdbc/CalciteSchemaBuilder.java

siddharthteotia reviewed Mar 16, 2022

pinot-query-planner/src/main/java/org/apache/pinot/query/QueryEnvironment.java