Support SQL in Pinot. #4219

kishoreg · 2019-05-17T21:46:48Z

While existing PQL is almost similar to SQL, it has some nuances that make it difficult for users to query pinot. Our eventual goal should be able to support most of the sql features on a single table (excluding joins). This will allow tools such as superset, grafana, etc to be work seamlessly on top of Pinot.

Currently, adding a new feature requires us to change pql grammar, broker request, and the execution layer. This means there is a high turn around time for each feature.

The right end state would be - "change only the execution layer to add new features - grammar and broker request should be complete and flexible". I am proposing we move to calcite and leverage the full sql support and parser that comes with Calcite.

Proposal

Create PinotQuery IDL(thrift)- this should represent the pql/sql. Additional information such as debugoptions should not be part of this IDL
Embed PinotQuery inside BrokerRequest.
AST Parser to convert PQL to PinotQuery

For backward compatibility, populate old objects in BrokerRequest (selection aggregation groupby etc) from pinot query.

Change the execution layer to use BrokerRequest.pinotQuery
Introduce another endpoint that takes calcite sql and converts it into brokerrequest

Jackie-Jiang · 2019-05-17T22:37:21Z

Suggest changing Enhanced Broker Request to Pinot Query to make them totally separate from each other. No need to nest them IMO.

For steps of migration:

Implement PQL -> Pinot Query and Pinot Query -> Broker Request
Support Pinot Query in Execution Layer (probably through a separate endpoint)
Implement SQL -> Pinot Query
(Optional) Remove PQL and Broker Request

kishoreg · 2019-05-18T03:49:52Z

@Jackie-Jiang yes, I would love to do this and this was how I started. The problem was that brokerrequest was very widely used in the code and it was very hard to replace it with pinot query in one shot.

That's why we put pinotquery inside brokerrequest which kind of makes sense if you think request consisting of query, debugoptions etc.

One key goal was to never have something like this in the query execution code.

//use broker request
} else {
  //pinot query
}

If we keep old fields (selection, aggInfoList etc) and pinotQuery in the brokerRequest, we can migrate to pinotQuery in steps - operator by operator. For e.g. solve only the selection operator then transform operator and so on.

I agree with you only doubling the size of brokerRequest until we migrate the execution layer to use brokerrequest.pinotquery at which point we can delete the references to Selection, etc. I don't think this will add significant overhead in terms of network or cpu usage. We have the validation flag to test this easily.

Final Structure of BrokerRequest will be as follows

  PinotQuery pinotQuery;
  bool enableTrace;
  Map<String,String> requestOptions;
}```

Jackie-Jiang · 2019-05-18T22:22:51Z

@kishoreg Short term wise, we can nest PinotQuery into BrokerRequest so we can support the functionalities step by step if the overhead is not an issue. Long term wise, I'll still suggest replacing BrokerRequest with PinotQuery once all the functionalities are supported. enableTrace should be inside the debugOptions, and requestOptions should be part of queryOptions from PinotQuery.

kishoreg · 2019-05-21T16:45:28Z

@Jackie-Jiang, yes once we migrate the execution layer to work on PinotQuery, we can definitely move over to PinotQuery if needed.

The way I see it is broker takes in a bunch of request params - pql, enableTrace, debugOptions and we will add few more things in the future. The brokerRequest we have today makes sense because it captures everything. The only difference is that while enableTrace and debugOptions have 1:1 mapping in brokerRequest, pql is split into many different fields. Adding PinotQuery will encapsulate all elements of the pql/sql within it.

Anyways, we can revisit this when we refactor the execution layer to use PinotQuery.

walterddr · 2019-05-22T22:07:21Z

Hi here, I was mostly a user for pinot but pretty new to the development - really glad to see that Pinot is adopting ANSI SQL through Calcite.

One question: based on the design description, seems like only the parser phase of the Calcite project is being considered here (Stringify SQL -> SQLNode); is there any plan to consider adopting more of the Calcite capacity, e.g. the optimizer?

sunithabeeram · 2019-05-22T23:28:16Z

@kishoreg - I missed seeing this issue earlier. It will be good to see the reasons for going the Calcite route. Based on internal discussions, it appears that leveraging Calcite for just parsing wouldn't yield a lot of benefit and infact might be harder to maintain than the corresponding antlr grammar based one.

Also, it will be good define how close to sql compliance we want to get and how PQL will evolve - will everything have to fit into sql parlance?

kishoreg · 2019-05-23T00:30:11Z

@walterddr the plan is to only use the parser phase of calcite. I don't see using the optimizers in calcite will provide any value because of the following

Most of the optimizers in calcite optimize the logical plan (suitable for complex joins/nested queries etc). We cannot leverage them since Pinot does not support joins and nested queries
Pinot comes up with per segment optimized physical plan based on each segment's metadata. Calcite does not support this.

kishoreg · 2019-05-23T00:35:56Z

@sunithabeeram Yes, we are planning to use calcite just for parsing. I had the same opinion but I realized that it is more flexible and complete than the PQL grammar. Every feature we need to add in Pinot, we end up touching the grammar, the parser, and the execution layer. For e.g. supporting regex, expression, alias, map syntax etc. It's just a distraction and additional code to maintain.
We already tried converting sqlNode to BrokerRequest and it worked well for all the queries we have. The only exception is the options we have in Pinot and we will have to handle that separately.
Calcite is widely adopted and supporting those queries will allow us Pinot to be used easily with other UI tools.

walterddr · 2019-05-23T17:10:56Z

Thanks for the explanation @kishoreg . Yeah I agree that many of the optimizations might not work for Pinot and can understand why this could be a good idea to separate the grammar changes from the other core changes.

One of advantages, as you also mentioned, is the optimizer's capability to simplify and optimize logical plan. This might open up some possibilities when writing standard SQL: for example, nested queries that are more readable, but ultimately can be compiled as a simple query.

Regarding metadata support, I wounldn't say there's no support on calcite but I think this is definitely a limitation. Similar problem has been raised in Flink's calcite metadata handler as well: https://issues.apache.org/jira/browse/FLINK-11822.

kishoreg · 2019-05-24T17:31:54Z

@walterddr I looked at the Jira link. If my understanding is right, I think the metadata it refers to is global and not per segment. In the case of Pinot, the physical plan is optimized on a per segment basis based on its metadata (cardinality, indexing, star-tree, dictionary etc)

Having said that, in future if we start supporting nested queries and look up joins, we might consider using the optimizers that come with calcite.

walterddr · 2019-05-28T19:57:32Z

Thanks @kishoreg, I am not familiar with the usage of per-segment data stats in Calcite and not sure if that can be easily supported.
Thank you for the prompt clarification! they make perfect sense!

sunny19930321 · 2019-05-29T06:26:42Z

@kishoreg @Jackie-Jiang ,I am also a pinot user at present. When will this be online? I think it's great and I'm looking forward to it

siddharthteotia · 2019-06-27T06:17:53Z

I believe using Calcite can prove to be useful in the long run as we converge closer to standard SQL support and achieving parity with general OLAP systems. Calcite's rich grammar, parser, semantic analysis and logical planning optimization rules will all help towards Pinot supporting a wide variety of complex SQL queries, including nested queries, query rewriting, materialized view substitution etc.

I say logical planning because physical planning or execution planning is something that Pinot will have to do because the standard operator tree will then be optimized and divided into per segment execution fragment.

As part of SQL support, what do people think about supporting some flavor of distributed joins? especially broadcast join which can be very helpful for large star schema joins having a large fact table and multiple smaller dimension tables and we can broadcast the smaller dimension table to each node and do a local in-memory hash join.

kishoreg · 2019-06-27T15:17:33Z

We have deliberately stayed away from implementing joins within Pinot. Having said that, lookup join is definitely on the table. If you are interested in contributing, please create an issue and start the discussion.

sunny19930321 · 2019-06-28T08:09:31Z

Can we use Calcite to support adding, subtracting, multiplying and dividing aggregates,For example: SUM(a) /1024, SUM(a) /SUM(b) /1024, (SUM(a) * 1000) /SUM(b), etc. I think these functions are very useful for connecting to open source visualized superset, grafana, redash and metabase

xiangfu0 · 2019-11-26T20:15:57Z

Proposed the changes in execution layer in order to simplify the sql rollout.
https://docs.google.com/document/d/1uNIq0cybUtVtdtJ38-4ewFNEQorbg-2KYr-CMSj6H_8/edit?usp=sharing

…filter (#6056) Add `FilterOptimizer` which supports optimizing query filter from both `BrokerRequest` and `PinotQuery`. `FilterOptimizer` will replace `FilterQueryTreeOptimizer` which only works on `BrokerRequest` query filter. In order to fully support SQL (#4219), the query optimizer should perform the same optimization to `PinotQuery` as `BrokerRequest`. Add `FlattenAndOrFilterOptimizer` to replace `FlattenNestedPredicatesFilterQueryTreeOptimizer`, and removes the limitation of flatten depth Add `MergeEqInFilterOptimizer` to replace `MultipleOrEqualitiesToInClauseFilterQueryTreeOptimizer` Add `MergeRangeFilterOptimizer` to replace `RangeMergeOptimizer`, and supports merging range for all single-value columns (based on schema) This PR only adds the new code. The following PR will wire the new code and remove the old code.

kishoreg · 2020-10-03T16:00:19Z

This is done and is available since 0.3.0

s-sahay added the feature label Jun 4, 2019

kishoreg self-assigned this Jun 27, 2019

s-sahay added the In Progress This work is in progress. label Oct 18, 2019

Jackie-Jiang mentioned this issue Sep 24, 2020

Add FilterOptimizer which supports optimizing query filter from both BrokerRequest and PinotQuery #6056

Merged

kishoreg removed the In Progress This work is in progress. label Oct 3, 2020

kishoreg closed this as completed Oct 3, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support SQL in Pinot. #4219

Support SQL in Pinot. #4219

kishoreg commented May 17, 2019

Jackie-Jiang commented May 17, 2019

kishoreg commented May 18, 2019 •

edited

Loading

Jackie-Jiang commented May 18, 2019

kishoreg commented May 21, 2019

walterddr commented May 22, 2019

sunithabeeram commented May 22, 2019

kishoreg commented May 23, 2019

kishoreg commented May 23, 2019

walterddr commented May 23, 2019

kishoreg commented May 24, 2019

walterddr commented May 28, 2019

sunny19930321 commented May 29, 2019

siddharthteotia commented Jun 27, 2019

kishoreg commented Jun 27, 2019

sunny19930321 commented Jun 28, 2019 •

edited

Loading

xiangfu0 commented Nov 26, 2019

kishoreg commented Oct 3, 2020

Support SQL in Pinot. #4219

Support SQL in Pinot. #4219

Comments

kishoreg commented May 17, 2019

Jackie-Jiang commented May 17, 2019

kishoreg commented May 18, 2019 • edited Loading

Jackie-Jiang commented May 18, 2019

kishoreg commented May 21, 2019

walterddr commented May 22, 2019

sunithabeeram commented May 22, 2019

kishoreg commented May 23, 2019

kishoreg commented May 23, 2019

walterddr commented May 23, 2019

kishoreg commented May 24, 2019

walterddr commented May 28, 2019

sunny19930321 commented May 29, 2019

siddharthteotia commented Jun 27, 2019

kishoreg commented Jun 27, 2019

sunny19930321 commented Jun 28, 2019 • edited Loading

xiangfu0 commented Nov 26, 2019

kishoreg commented Oct 3, 2020

kishoreg commented May 18, 2019 •

edited

Loading

sunny19930321 commented Jun 28, 2019 •

edited

Loading