Support SQL in Pinot. #4219
Suggest changing Enhanced Broker Request to Pinot Query to make them totally separate from each other. No need to nest them IMO. For the migration steps:
@Jackie-Jiang yes, I would love to do this, and this was how I started. The problem was that BrokerRequest is very widely used in the code, and it was very hard to replace it with PinotQuery in one shot. That's why we put PinotQuery inside BrokerRequest, which kind of makes sense if you think of the request as consisting of the query, debugOptions, etc. One key goal was to never have something like this in the query execution code.
If we keep the old fields (selection, aggInfoList, etc.) and pinotQuery in the BrokerRequest, we can migrate to pinotQuery in steps, operator by operator: e.g., solve only the selection operator, then the transform operator, and so on. I agree with you that this only doubles the size of BrokerRequest until we migrate the execution layer to use brokerRequest.pinotQuery, at which point we can delete the references to Selection, etc. I don't think this will add significant overhead in terms of network or CPU usage, and we have the validation flag to test this easily. The final structure of BrokerRequest will be as follows:
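The transitional shape described above (legacy fields plus a nested PinotQuery) can be sketched as follows. This is a hypothetical plain-Java rendering; the real definitions live in Pinot's Thrift IDL, and the field names here are approximations, not the actual generated classes.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the new nested representation produced by the
// Calcite-based parser. Field names are approximate, not Pinot's actual IDL.
class PinotQuery {
  String tableName;
  List<String> selectList;           // select expressions / aliases
  Object filterExpression;           // expression tree for the WHERE clause
  Map<String, String> queryOptions;  // requestOptions folded in here
  Map<String, String> debugOptions;  // enableTrace folded in here
}

// Transitional BrokerRequest: the legacy parsed fields stay in place so the
// execution layer can migrate operator by operator, while pinotQuery rides
// along for the operators that have already been migrated.
class BrokerRequest {
  Object selections;                 // legacy selection description
  List<Object> aggregationsInfo;     // legacy aggregation descriptions
  Object groupBy;                    // legacy group-by description
  PinotQuery pinotQuery;             // new representation, eventually the only one
}
```

Once every operator reads `brokerRequest.pinotQuery`, the legacy fields can be deleted.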
@kishoreg In the short term, we can nest PinotQuery into BrokerRequest so we can support the functionalities step by step, if the overhead is not an issue. In the long term, I'll still suggest replacing BrokerRequest with PinotQuery once all the functionalities are supported. enableTrace should be inside the debugOptions, and requestOptions should be part of queryOptions from PinotQuery.
@Jackie-Jiang, yes, once we migrate the execution layer to work on PinotQuery, we can definitely move over to PinotQuery if needed. The way I see it, the broker takes in a bunch of request params (pql, enableTrace, debugOptions), and we will add a few more things in the future. The BrokerRequest we have today makes sense because it captures everything. The only difference is that while enableTrace and debugOptions have a 1:1 mapping in BrokerRequest, pql is split into many different fields. Adding PinotQuery will encapsulate all elements of the PQL/SQL within it. Anyway, we can revisit this when we refactor the execution layer to use PinotQuery.
Hi there, I have mostly been a user of Pinot and am pretty new to the development. Really glad to see that Pinot is adopting ANSI SQL through Calcite. One question: based on the design description, it seems like only the parser phase of the Calcite project is being considered here (stringified SQL -> SqlNode). Is there any plan to adopt more of Calcite's capabilities, e.g. the optimizer?
@kishoreg - I missed seeing this issue earlier. It would be good to see the reasons for going the Calcite route. Based on internal discussions, it appears that leveraging Calcite for just parsing wouldn't yield a lot of benefit and in fact might be harder to maintain than the corresponding ANTLR-grammar-based one. Also, it would be good to define how close to SQL compliance we want to get and how PQL will evolve: will everything have to fit into SQL parlance?
@walterddr the plan is to only use the parser phase of Calcite. I don't see how using the optimizers in Calcite would provide any value, for the following reasons:
@sunithabeeram Yes, we are planning to use Calcite just for parsing. I had the same opinion, but I realized that it is more flexible and complete than the PQL grammar. For every feature we need to add in Pinot, we end up touching the grammar, the parser, and the execution layer, e.g., supporting regex, expressions, aliases, map syntax, etc. It's just a distraction and additional code to maintain.
Thanks for the explanation @kishoreg. Yeah, I agree that many of the optimizations might not work for Pinot, and I can understand why it could be a good idea to separate the grammar changes from the other core changes. One of the advantages, as you also mentioned, is the optimizer's capability to simplify and optimize the logical plan. This might open up some possibilities when writing standard SQL: for example, nested queries that are more readable but can ultimately be compiled into a simple query. Regarding metadata support, I wouldn't say there's no support in Calcite, but I think this is definitely a limitation. A similar problem has been raised in Flink's Calcite metadata handler as well: https://issues.apache.org/jira/browse/FLINK-11822.
@walterddr I looked at the Jira link. If my understanding is right, the metadata it refers to is global and not per segment. In the case of Pinot, the physical plan is optimized on a per-segment basis based on each segment's metadata (cardinality, indexing, star-tree, dictionary, etc.). Having said that, if in the future we start supporting nested queries and lookup joins, we might consider using the optimizers that come with Calcite.
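As an illustration of that per-segment decision (with invented names, not Pinot's actual planner classes), a minimal sketch might look like:

```java
// Illustrative only: the same logical filter can be executed by a different
// physical operator on each segment, depending on that segment's metadata.
class PerSegmentPlannerSketch {
  static String pickFilterOperator(boolean hasSortedIndex, boolean hasInvertedIndex) {
    if (hasSortedIndex) {
      return "SortedIndexBasedFilter";   // cheapest: binary search on the sorted column
    }
    if (hasInvertedIndex) {
      return "InvertedIndexBasedFilter"; // bitmap lookup on the inverted index
    }
    return "ScanBasedFilter";            // fall back to a full column scan
  }
}
```

A global Calcite optimization pass runs once per query, so it cannot make a choice like this, which depends on metadata known only at the segment level.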
Thanks @kishoreg. I am not familiar with the usage of per-segment data stats in Calcite and am not sure if that can be easily supported.
@kishoreg @Jackie-Jiang, I am also a Pinot user at present. When will this be released? I think it's great and I'm looking forward to it.
I believe using Calcite can prove to be useful in the long run as we converge closer to standard SQL support and achieve parity with general OLAP systems. Calcite's rich grammar, parser, semantic analysis, and logical-plan optimization rules will all help Pinot support a wide variety of complex SQL queries, including nested queries, query rewriting, materialized-view substitution, etc. I say logical planning because physical (execution) planning is something that Pinot will have to do itself, since the standard operator tree will then be optimized and divided into per-segment execution fragments. As part of SQL support, what do people think about supporting some flavor of distributed joins? A broadcast join in particular can be very helpful for large star-schema joins with a large fact table and multiple smaller dimension tables: we can broadcast the smaller dimension tables to each node and do a local in-memory hash join.
We have deliberately stayed away from implementing joins within Pinot. Having said that, a lookup join is definitely on the table. If you are interested in contributing, please create an issue and start the discussion.
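For readers unfamiliar with the broadcast-join idea raised above, here is a minimal sketch (hypothetical code, not a Pinot API): the small dimension table is shipped to every server, which builds an in-memory map and probes it with its local fact rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class BroadcastHashJoinSketch {
  // dimBroadcast: the broadcast dimension table, keyed by the join key.
  // factRows: this server's local fact rows as {joinKey, measure} pairs.
  static List<String> join(Map<Integer, String> dimBroadcast, List<int[]> factRows) {
    List<String> out = new ArrayList<>();
    for (int[] row : factRows) {
      String dimValue = dimBroadcast.get(row[0]); // probe the in-memory hash map
      if (dimValue != null) {
        out.add(dimValue + ":" + row[1]);         // emit the joined row
      }
    }
    return out;
  }
}
```

Because each server joins only against its own segments, no fact data moves over the network; only the small dimension table does.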
Can we use Calcite to support adding, subtracting, multiplying, and dividing aggregates? For example: SUM(a) / 1024, SUM(a) / SUM(b) / 1024, (SUM(a) * 1000) / SUM(b), etc. I think these functions are very useful for connecting to open-source visualization tools such as Superset, Grafana, Redash, and Metabase.
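Expressions like these are typically handled as post-aggregation arithmetic: each server computes the plain SUMs, the broker merges them, and only then evaluates the arithmetic. A hedged sketch of that final step (hypothetical helpers, not Pinot's API):

```java
class PostAggregationSketch {
  // Evaluate SUM(a) / SUM(b) / 1024 on the already-merged aggregate values.
  static double sumRatioKb(double sumA, double sumB) {
    return sumA / sumB / 1024.0;
  }

  // Evaluate (SUM(a) * 1000) / SUM(b).
  static double scaledRatio(double sumA, double sumB) {
    return (sumA * 1000.0) / sumB;
  }
}
```

Computing the arithmetic after the merge is what makes the result correct: SUM is distributive across segments, but SUM(a)/SUM(b) is not, so the division cannot be pushed down.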
Proposed the changes in the execution layer in order to simplify the SQL rollout.
…filter (#6056)
Add `FilterOptimizer`, which supports optimizing the query filter from both `BrokerRequest` and `PinotQuery`. `FilterOptimizer` will replace `FilterQueryTreeOptimizer`, which only works on the `BrokerRequest` query filter. In order to fully support SQL (#4219), the query optimizer should perform the same optimization on `PinotQuery` as on `BrokerRequest`.
- Add `FlattenAndOrFilterOptimizer` to replace `FlattenNestedPredicatesFilterQueryTreeOptimizer`, removing the limitation on flatten depth.
- Add `MergeEqInFilterOptimizer` to replace `MultipleOrEqualitiesToInClauseFilterQueryTreeOptimizer`.
- Add `MergeRangeFilterOptimizer` to replace `RangeMergeOptimizer`, supporting range merging for all single-value columns (based on the schema).
This PR only adds the new code. A following PR will wire in the new code and remove the old code.
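As a rough illustration of what a `MergeEqInFilterOptimizer`-style rewrite does (a standalone sketch, not the actual Pinot implementation), an OR of equalities such as `col = 'a' OR col = 'b' OR col = 'a'` collapses into `col IN ('a', 'b')`:

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class MergeEqInSketch {
  // Collect the OR'ed equality predicates per column; a set naturally
  // deduplicates repeated values, yielding the IN-clause value list.
  static Map<String, Set<String>> mergeEqualities(List<String[]> eqPredicates) {
    Map<String, Set<String>> inValues = new LinkedHashMap<>();
    for (String[] predicate : eqPredicates) { // predicate = {column, value}
      inValues.computeIfAbsent(predicate[0], k -> new LinkedHashSet<>())
              .add(predicate[1]);
    }
    return inValues;
  }
}
```

The payoff is that a single IN predicate can be answered with one index lookup per value instead of evaluating a deep OR tree.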
This is done and has been available since 0.3.0.
While the existing PQL is similar to SQL, it has some nuances that make it difficult for users to query Pinot. Our eventual goal should be to support most SQL features on a single table (excluding joins). This will allow tools such as Superset, Grafana, etc. to work seamlessly on top of Pinot.
Currently, adding a new feature requires us to change the PQL grammar, the broker request, and the execution layer. This means there is a high turnaround time for each feature.
The right end state would be: "change only the execution layer to add new features; the grammar and broker request should be complete and flexible." I am proposing we move to Calcite and leverage the full SQL support and parser that come with it.
Proposal
For backward compatibility, populate the old objects in BrokerRequest (selection, aggregation, groupBy, etc.) from PinotQuery.