SPARK-2180: support HAVING clauses in Hive queries #1136

willb · 2014-06-19T19:02:11Z

This PR extends Spark's HiveQL support to handle HAVING clauses in aggregations. The HAVING test from the Hive compatibility suite doesn't appear to be runnable from within Spark, so I added a simple comparable test to HiveQuerySuite.

AmplabJenkins · 2014-06-19T19:04:56Z

Merged build triggered.

AmplabJenkins · 2014-06-19T19:05:06Z

Merged build started.

AmplabJenkins · 2014-06-19T20:21:31Z

Merged build finished. All automated tests passed.

AmplabJenkins · 2014-06-19T20:21:31Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15917/

rxin · 2014-06-19T20:30:00Z

Any idea why the having test from Hive is not runnable?

willb · 2014-06-19T20:58:14Z

@rxin, I'm not 100% sure but I think it's a problem with local map/reduce (the stack trace isn't too informative, but it's the same as the one for tests that are blacklisted due to missing local map/reduce).

I have another commit to push here (adding a semantic exception when HAVING is specified without GROUP BY and test coverage for same).

AmplabJenkins · 2014-06-19T21:19:57Z

Merged build triggered.

AmplabJenkins · 2014-06-19T21:20:06Z

Merged build started.

AmplabJenkins · 2014-06-19T21:21:33Z

Merged build finished.

AmplabJenkins · 2014-06-19T21:21:33Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15925/

AmplabJenkins · 2014-06-19T21:39:56Z

Merged build triggered.

AmplabJenkins · 2014-06-19T21:40:01Z

Merged build started.

AmplabJenkins · 2014-06-19T22:53:34Z

Merged build finished. All automated tests passed.

AmplabJenkins · 2014-06-19T22:53:34Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15928/

rxin · 2014-06-20T00:50:13Z

Thanks, @willb. I found at least one problem: I think you'd need to add a cast to the HAVING expression. Otherwise, try running the following:
select key, count(*) c from src group by key having c

In Hive this returns nothing, but in Spark SQL with this patch it throws a runtime exception failing to cast integer to boolean.

rxin · 2014-06-20T01:03:47Z

To be more specific, I think you can always add a cast that casts the HAVING expression to boolean; SimplifyCasts in the optimizer would then remove any unnecessary casts.
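
A minimal sketch of the "always cast, then simplify" idea described above, written as a toy Python model rather than Spark's actual Catalyst code. The `Expr` and `Cast` classes and the `having_predicate` and `simplify_casts` functions are hypothetical stand-ins invented for illustration; only the general behavior (a redundant cast is dropped, a needed cast is kept) comes from the discussion.

```python
from dataclasses import dataclass

BOOLEAN = "boolean"
INTEGER = "integer"


@dataclass(frozen=True)
class Expr:
    """Leaf expression with a known result type (hypothetical stand-in)."""
    sql: str
    data_type: str


@dataclass(frozen=True)
class Cast:
    """Cast of a child expression to a target type (hypothetical stand-in)."""
    child: object
    data_type: str


def having_predicate(expr):
    """Unconditionally wrap the HAVING expression in a cast to boolean."""
    return Cast(expr, BOOLEAN)


def simplify_casts(expr):
    """Remove any cast whose child already has the target type."""
    if isinstance(expr, Cast):
        child = simplify_casts(expr.child)
        if child.data_type == expr.data_type:
            return child  # the cast is a no-op, so drop it
        return Cast(child, expr.data_type)
    return expr


comparison = Expr("c > 4", BOOLEAN)   # already boolean
count_alias = Expr("c", INTEGER)      # the `having c` case from the comment above

simplified_comparison = simplify_casts(having_predicate(comparison))
simplified_count = simplify_casts(having_predicate(count_alias))
```

In this model the cast around `c > 4` disappears after simplification, while the cast around the integer `c` survives, so the parser never has to know the expression's type when it adds the cast.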

willb · 2014-06-20T01:11:18Z

Thanks for the catch, @rxin! I'll make the change and add tests for it.

willb · 2014-06-20T02:49:14Z

So I've added a cast in cases in which non-boolean expressions are supplied to having expressions. It appears that Cast(_, BooleanType) isn't idempotent, though -- if you apply it to a Boolean (say, x > 4), it will translate that to NOT ((x > 4) = 0). This seems like a bug, but it's possible that I'm missing the reason why it should work that way. Should I change Cast so that casting an X to X is a no-op?

(Checking the type of a variable during parse doesn't work, so I wind up with a different exception in examples like the one you posted. I'll either need to fix the behavior of Cast or delay adding the cast until I have type information.)

rxin · 2014-06-20T03:41:54Z

That's definitely a bug - I will take a look at it later.

willb · 2014-06-20T03:53:21Z

Thanks! I'm happy to put together a preliminary patch as well, but probably won't be able to take a look until tomorrow morning.

rxin · 2014-06-20T04:53:45Z

I found the issue and fixed it. Will push out a pull request soon.

If you can just add the boolean cast (always add it - no need to check if the type is already boolean since once I fix the bug, the extra cast on boolean value will be removed), that'd be great.

rxin · 2014-06-20T05:37:05Z

Here's the patch: #1144

rxin · 2014-06-20T07:11:18Z

BTW I really want this to go into 1.0.1, which will probably have a release candidate soon. So if you have a chance to rebase your PR and add the cast, please do. Thanks a lot, @willb!

AmplabJenkins · 2014-06-20T13:14:58Z

Merged build triggered.

AmplabJenkins · 2014-06-20T13:15:07Z

Merged build started.

willb · 2014-06-20T13:24:49Z

Thanks for the quick review and patch, @rxin!

AmplabJenkins · 2014-06-20T14:32:04Z

Merged build finished. All automated tests passed.

AmplabJenkins · 2014-06-20T14:32:04Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15959/

yhuai · 2014-06-20T19:08:04Z

I tried having.q in hive and got an error when running SELECT key FROM src GROUP BY key HAVING max(value) > "val_255". The reason is that the output of an Aggregate only has selectExpressions, so max(value) is not available to the HAVING filter.
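
For reference, here is a small sketch of what that query is expected to do in an engine that accepts aggregates in HAVING that are absent from the select list. It uses SQLite through Python's standard `sqlite3` module, not Hive or Spark SQL, and the table contents are made up for illustration. The string literal uses single quotes because SQLite treats double quotes as identifier quoting.

```python
import sqlite3

# In-memory table loosely modeled on Hive's `src` test table (rows are invented).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (key INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO src VALUES (?, ?)",
    [(1, "val_100"), (1, "val_300"), (2, "val_200"), (3, "val_255"), (3, "val_256")],
)

# max(value) appears only in HAVING, not in the select list.
rows = conn.execute(
    "SELECT key FROM src GROUP BY key HAVING max(value) > 'val_255' ORDER BY key"
).fetchall()
keys = [key for (key,) in rows]
print(keys)
```

Groups 1 and 3 have a maximum value that sorts after `val_255` as text, so only those keys are returned; group 2 is filtered out even though `max(value)` is never projected.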

rxin · 2014-06-20T20:38:53Z

I'm going to merge this in master & branch-1.0. I will create a separate ticket to track progress on HAVING. Basically there are two things missing:

1. HAVING without GROUP BY should just become a normal WHERE
2. HAVING should be able to contain aggregate expressions that don't appear in the aggregation list. This test contains that: https://github.com/apache/hive/blob/trunk/ql/src/test/queries/clientpositive/having.q

This PR extends Spark's HiveQL support to handle HAVING clauses in aggregations. The HAVING test from the Hive compatibility suite doesn't appear to be runnable from within Spark, so I added a simple comparable test to `HiveQuerySuite`.

Author: William Benton <willb@redhat.com>

Closes #1136 from willb/SPARK-2180 and squashes the following commits:

3bbaf26 [William Benton] Added casts to HAVING expressions
83f1340 [William Benton] scalastyle fixes
18387f1 [William Benton] Add test for HAVING without GROUP BY
b880bef [William Benton] Added semantic error for HAVING without GROUP BY
942428e [William Benton] Added test coverage for SPARK-2180.
56084cc [William Benton] Add support for HAVING clauses in Hive queries.

(cherry picked from commit 171ebb3)
Signed-off-by: Reynold Xin <rxin@apache.org>

willb · 2014-06-20T20:43:07Z

@rxin, re: the former, seems like most implementations signal this as an error.

rxin · 2014-06-20T20:54:04Z

There are databases that support that, and it seems to me a very simple change (actually just removing the check code you added is probably enough).

rxin · 2014-06-20T20:54:33Z

BTW two follow up tickets created:

https://issues.apache.org/jira/browse/SPARK-2225

https://issues.apache.org/jira/browse/SPARK-2226

Let me know if you'd like to work on them.

willb · 2014-06-20T20:55:28Z

OK, I wasn't sure if strict Hive compatibility was the goal. I'm happy to take these tickets. Thanks again!

rxin · 2014-06-20T21:02:54Z

I actually did 2225 already. I will assign 2226 to you. Thanks!

willb added 6 commits June 20, 2014 07:24

Add support for HAVING clauses in Hive queries.

56084cc

Added test coverage for SPARK-2180.

942428e

This is a simple test for HAVING clauses.

Added semantic error for HAVING without GROUP BY

b880bef

Add test for HAVING without GROUP BY

18387f1

scalastyle fixes

83f1340

Added casts to HAVING expressions

3bbaf26

asfgit closed this in 171ebb3 Jun 20, 2014

udaynpusa pushed a commit to mapr/spark that referenced this pull request Jan 30, 2024

MapR [SPARK-1216] java.net.URISyntaxException: Relative path in absol…

f71f47f

…ute URI: ${system:user.name%7D (apache#1136) Co-authored-by: Egor Krivokon <>
