Add IN_SUBQUERY support #6022
Why do we need a transform function to support subqueries? Can we use the standard SQL syntax of putting the subquery inside parentheses, to clearly distinguish the outer query from the inner one?
@siddharthteotia Good question. Pinot currently does not support nested queries or joins (in either the query parser or the engine), so we use a transform function as a workaround to achieve the same semantics. In the future we might want to provide native nested query and join support, or a translation layer that rewrites such queries into the format used in this PR. Supporting nested queries and joins is far beyond the scope of this PR, so for now users must use the transform function for the id_set subquery.
Codecov Report
@@ Coverage Diff @@
## master #6022 +/- ##
==========================================
+ Coverage 66.44% 73.07% +6.62%
==========================================
Files 1075 1236 +161
Lines 54773 58510 +3737
Branches 8168 8671 +503
==========================================
+ Hits 36396 42757 +6361
+ Misses 15700 12927 -2773
- Partials 2677 2826 +149
Description
Add the IN_SUBQUERY transform function to support the IDSET aggregation function as a subquery. The subquery is handled as a separate query on the broker side.
E.g. the following 2 queries can be combined into one query:
SELECT ID_SET(col) FROM table WHERE date = 20200901
SELECT DISTINCT_COUNT(col), date FROM table WHERE IN_ID_SET(col, '<serializedIdSet>') = 1 GROUP BY date
->
SELECT DISTINCT_COUNT(col), date FROM table WHERE IN_SUBQUERY(col, 'SELECT ID_SET(col) FROM table WHERE date = 20200901') = 1 GROUP BY date
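The equivalence above can be illustrated with a small sketch of the broker-side flow: run the inner query first, then substitute its serialized result into the outer query. This is not the code from this PR. The names `rewrite_in_subquery` and `run_subquery` are hypothetical, the regex-based matching is a simplification, and the sketch assumes the inner query contains no single quotes.

```python
import re
from typing import Callable

# Matches IN_SUBQUERY(<column>, '<inner query>').
# Simplifying assumption: the inner query contains no single quotes.
_IN_SUBQUERY = re.compile(
    r"IN_SUBQUERY\(\s*(\w+)\s*,\s*'([^']*)'\s*\)", re.IGNORECASE
)


def rewrite_in_subquery(query: str, run_subquery: Callable[[str], str]) -> str:
    """Replace each IN_SUBQUERY call with IN_ID_SET on the subquery's result.

    run_subquery is a hypothetical callback that executes the inner query
    as a separate query and returns the serialized ID set as a string.
    """

    def _replace(match):
        column, inner_query = match.group(1), match.group(2)
        serialized_id_set = run_subquery(inner_query)
        return f"IN_ID_SET({column}, '{serialized_id_set}')"

    return _IN_SUBQUERY.sub(_replace, query)


if __name__ == "__main__":
    outer = (
        "SELECT DISTINCT_COUNT(col), date FROM table WHERE "
        "IN_SUBQUERY(col, 'SELECT ID_SET(col) FROM table WHERE date = 20200901') = 1 "
        "GROUP BY date"
    )
    # Stand-in for real execution of the inner query.
    print(rewrite_in_subquery(outer, lambda inner: "<serializedIdSet>"))
```

With the stub callback, the rewritten string is the second of the two separate queries listed above, which is the sense in which the combined query replaces the manual two-step process.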