
HIVE-15757: Allow EXISTS/NOT EXISTS correlated subquery with aggregates #2039

Merged
merged 5 commits into from Mar 10, 2021


kasakrisz
Contributor

What changes were proposed in this pull request?

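  1. When transforming the subquery AST into a Calcite plan, check whether the subquery is a full aggregation, i.e. it has aggregate functions but no GROUP BY clause.
  2. When building the subquery expression, return a TRUE literal instead of a RexSubQuery if the subquery is a full aggregation under an EXISTS operator. This is safe because a full aggregation always produces exactly one row, even when no input rows match the correlation predicate, so EXISTS over it is always true.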
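The two steps above can be sketched roughly as follows. This is a toy Python model, not Hive's implementation: `SubQuery`, `is_full_aggregation` and `build_exists_expr` are hypothetical names, and the actual patch operates on Hive's AST and Calcite's `RexSubQuery` in Java.

```python
from dataclasses import dataclass, field
from typing import List, Union


# Hypothetical, simplified model of a parsed subquery (not Hive's AST classes).
@dataclass
class SubQuery:
    aggregate_calls: List[str] = field(default_factory=list)  # e.g. ["SUM(smallint_col)"]
    group_by_keys: List[str] = field(default_factory=list)    # empty when there is no GROUP BY


def is_full_aggregation(sq: SubQuery) -> bool:
    """Step 1: the subquery has aggregate functions and no GROUP BY clause."""
    return bool(sq.aggregate_calls) and not sq.group_by_keys


def build_exists_expr(sq: SubQuery) -> Union[bool, str]:
    """Step 2: fold EXISTS over a full aggregation into a TRUE literal.

    A full aggregation returns exactly one row even on empty input, so
    EXISTS is always true. Any other subquery keeps a placeholder that
    stands in for the real subquery expression (RexSubQuery in Hive).
    """
    if is_full_aggregation(sq):
        return True
    return "EXISTS(<subquery>)"


print(build_exists_expr(SubQuery(aggregate_calls=["SUM(smallint_col)"])))
print(build_exists_expr(SubQuery(aggregate_calls=["SUM(smallint_col)"], group_by_keys=["string_col"])))
```

Folding to a constant at plan-construction time means the correlated subquery never has to be decorrelated at all, which is why the sketch returns a plain boolean rather than a rewritten subquery.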

Why are the changes needed?

To support queries like:

SELECT t1.bigint_col
FROM alltypestiny t1
WHERE EXISTS
  (SELECT SUM(smallint_col) AS int_col
   FROM alltypestiny
   WHERE t1.date_string_col = string_col AND t1.timestamp_col = timestamp_col)
GROUP BY t1.bigint_col
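The semantics the rewrite relies on can be checked with a small experiment. This sketch uses SQLite (via Python's `sqlite3`) as a stand-in engine, not Hive, and a hypothetical two-row, five-column version of `alltypestiny`. The correlated full aggregation keeps every outer row, while adding a GROUP BY to the subquery makes EXISTS false for outer rows that have no matching inner rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical minimal schema: only the columns the example query touches.
conn.execute("""CREATE TABLE alltypestiny (
    bigint_col INTEGER, smallint_col INTEGER,
    date_string_col TEXT, string_col TEXT, timestamp_col TEXT)""")
conn.executemany("INSERT INTO alltypestiny VALUES (?, ?, ?, ?, ?)", [
    (10, 1, "a", "a", "t1"),  # correlation predicate matches (its own row)
    (20, 2, "b", "x", "t2"),  # no inner row has string_col = 'b'
])

# The query from the PR description: EXISTS over a full aggregation.
with_exists = """
SELECT t1.bigint_col FROM alltypestiny t1
WHERE EXISTS (SELECT SUM(smallint_col) AS int_col FROM alltypestiny
              WHERE t1.date_string_col = string_col AND t1.timestamp_col = timestamp_col)
GROUP BY t1.bigint_col ORDER BY t1.bigint_col"""

# The same query with the EXISTS predicate folded to TRUE, i.e. dropped.
without_filter = """
SELECT bigint_col FROM alltypestiny GROUP BY bigint_col ORDER BY bigint_col"""

# Contrast: with GROUP BY, an empty input yields zero groups, so EXISTS can be false.
with_group_by = """
SELECT t1.bigint_col FROM alltypestiny t1
WHERE EXISTS (SELECT SUM(smallint_col) FROM alltypestiny
              WHERE t1.date_string_col = string_col AND t1.timestamp_col = timestamp_col
              GROUP BY string_col)
GROUP BY t1.bigint_col ORDER BY t1.bigint_col"""

for name, sql in [("with_exists", with_exists),
                  ("without_filter", without_filter),
                  ("with_group_by", with_group_by)]:
    print(name, conn.execute(sql).fetchall())
```

The first two queries return the same rows, which is what makes replacing the EXISTS with a TRUE literal a semantics-preserving rewrite; the third shows why the check in step 1 must require the absence of GROUP BY.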

Does this PR introduce any user-facing change?

Yes. The current Hive implementation throws an exception when compiling queries like the example above. With this patch, such queries compile and run.

How was this patch tested?

mvn test -Dtest.output.overwrite -DskipSparkTests -Dtest=TestMiniLlapLocalCliDriver -Dqfile=subquery_full_aggregate.q -pl itests/qtest -Pitests

@jcamachor
Contributor

jcamachor commented Mar 5, 2021

I gave +1 based on the code, but I have just seen that there are some test failures. Please let me know when those are addressed and I will take another look. Thanks!
