Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-33131][SQL] Fix grouping sets with having clause can not resolve qualified col name #30029

Closed
wants to merge 1 commit into from

Conversation

ulysses-you
Copy link
Contributor

@ulysses-you ulysses-you commented Oct 13, 2020

What changes were proposed in this pull request?

Correct the resolution of having clause.

Why are the changes needed?

Grouping sets construct new aggregate lost the qualified name of grouping expression. Here is a example:

-- Works resolved by `ResolveReferences`
select c1 from values (1) as t1(c1) group by grouping sets(t1.c1) having c1 = 1

-- Works because of the extra expression c1
select c1 as c2 from values (1) as t1(c1) group by grouping sets(t1.c1) having t1.c1 = 1

-- Failed
select c1 from values (1) as t1(c1) group by grouping sets(t1.c1) having t1.c1 = 1

It wroks with Aggregate without grouping sets through ResolveReferences, but Grouping sets not works since the exprId has been changed.

Does this PR introduce any user-facing change?

Yes, bug fix.

How was this patch tested?

add test.

SELECT c1 FROM VALUES (1, 2) as t(c1, c2) GROUP BY GROUPING SETS(t.c1) HAVING t.c1 = 1;
SELECT c1 FROM VALUES (1, 2) as t(c1, c2) GROUP BY CUBE(t.c1) HAVING t.c1 = 1;
SELECT c1 FROM VALUES (1, 2) as t(c1, c2) GROUP BY ROLLUP(t.c1) HAVING t.c1 = 1;
SELECT c1 FROM VALUES (1, 2) as t(c1, c2) GROUP BY t.c1 HAVING t.c1 = 1;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test is just ensure we support it.

@ulysses-you
Copy link
Contributor Author

@maropu @cloud-fan do you have time to take a look ?

val exprMap = extraAggExprs.zip(
newChild.asInstanceOf[Aggregate].aggregateExpressions.takeRight(
extraAggExprs.length)).toMap
// We also add the original aggregateExpressions in Map to avoid extraAggExprs is empty.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's wrong if extraAggExprs is empty?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we resolved the having condition but extraAggExprs is empty, that means the resolution is in original aggregateExpressions.

We need use original aggregateExpressions mapping having condition NamedExpression to the new since grouping sets constructAggregate will build the new alias for it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get it. The comment says

        // Since the exprId of extraAggExprs will be changed in the constructed aggregate, and the
        // aggregateExpressions keeps the input order. So here we build an exprMap to resolve the
        // condition again.

I don't see why we need to include the original aggregateExpressions in exprMap.

Copy link
Contributor Author

@ulysses-you ulysses-you Oct 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment is right if we not return the resolved having condition when exprMap is empty.

This pr break the behavior that return it. If we not add the original aggregateExpressions, we can't mapping the resolved having condition since exprMap is empty.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make the comment clearer?

@@ -2315,7 +2315,8 @@ class Analyzer(
if (aggregateExpressions.nonEmpty) {
Some(aggregateExpressions.toSeq, transformedAggregateFilter)
} else {
None
// We need to return the resolved having condition
Some(Seq.empty, transformedAggregateFilter.asInstanceOf[Alias].child)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if we don't trim the Alias here?



-- !query
SELECT c1 FROM VALUES (1, 2) as t(c1, c2) GROUP BY t.c1 HAVING t.c1 = 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems this test can pass even in master.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this test is just for better coverage.

@SparkQA
Copy link

SparkQA commented Oct 14, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34354/

@SparkQA
Copy link

SparkQA commented Oct 14, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34354/

@SparkQA
Copy link

SparkQA commented Oct 14, 2020

Test build #129748 has finished for PR 30029 at commit d3893b2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ulysses-you
Copy link
Contributor Author

Seems this way will introduce unexpect thing.

We want to return the resolved having condition which is extra expression or exists in grouping expression. Current pr implementation will break such sql

select c1, count(*) as c2 from values(1, 2) as t(c1, c2) group by c1 having c1 = 1 and c2 = 1;

Let me find an another way to solve it.

@SparkQA
Copy link

SparkQA commented Oct 15, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34398/

@SparkQA
Copy link

SparkQA commented Oct 15, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34398/

@@ -470,7 +470,7 @@ class Analyzer(
*/
private def constructGroupByAlias(groupByExprs: Seq[Expression]): Seq[Alias] = {
groupByExprs.map {
case e: NamedExpression => Alias(e, e.name)()
case e: NamedExpression => Alias(e, e.name)(qualifier = e.qualifier)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah this is a good catch! do we have this bug in 3.0 and 2.4?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, seems it's a long time bug.

@SparkQA
Copy link

SparkQA commented Oct 15, 2020

Test build #129791 has finished for PR 30029 at commit a4c2028.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

GA passed, thanks, merging to master!

@cloud-fan cloud-fan closed this in 3ae1520 Oct 16, 2020
@cloud-fan
Copy link
Contributor

@ulysses-you can you send backport PRs for 3.0 and 2.4? thanks!

@ulysses-you
Copy link
Contributor Author

@cloud-fan sure will do it tomorrow.

dongjoon-hyun pushed a commit that referenced this pull request Oct 17, 2020
…resolve qualified col name

This is [#30029](#30029) backport for branch-3.0.

### What changes were proposed in this pull request?

Correct the resolution of having clause.

### Why are the changes needed?

Grouping sets construct new aggregate lost the qualified name of grouping expression. Here is a example:
```
-- Works resolved by `ResolveReferences`
select c1 from values (1) as t1(c1) group by grouping sets(t1.c1) having c1 = 1

-- Works because of the extra expression c1
select c1 as c2 from values (1) as t1(c1) group by grouping sets(t1.c1) having t1.c1 = 1

-- Failed
select c1 from values (1) as t1(c1) group by grouping sets(t1.c1) having t1.c1 = 1
```

It wroks with `Aggregate` without grouping sets through `ResolveReferences`, but Grouping sets not works since the exprId has been changed.

### Does this PR introduce _any_ user-facing change?

Yes, bug fix.

### How was this patch tested?

add test.

Closes #30077 from ulysses-you/SPARK-33131-branch-3.0.

Authored-by: ulysses <youxiduo@weidian.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
dongjoon-hyun pushed a commit that referenced this pull request Oct 18, 2020
…resolve qualified col name

This is [#30029](#30029) backport for branch-2.4.

### What changes were proposed in this pull request?

Correct the resolution of having clause.

### Why are the changes needed?

Grouping sets construct new aggregate lost the qualified name of grouping expression. Here is a example:
```
-- Works resolved by `ResolveReferences`
select c1 from values (1) as t1(c1) group by grouping sets(t1.c1) having c1 = 1

-- Works because of the extra expression c1
select c1 as c2 from values (1) as t1(c1) group by grouping sets(t1.c1) having t1.c1 = 1

-- Failed
select c1 from values (1) as t1(c1) group by grouping sets(t1.c1) having t1.c1 = 1
```

It wroks with `Aggregate` without grouping sets through `ResolveReferences`, but Grouping sets not works since the exprId has been changed.

### Does this PR introduce _any_ user-facing change?

Yes, bug fix.

### How was this patch tested?

add test.

Closes #30075 from ulysses-you/SPARK-33131-branch-2.4.

Authored-by: ulysses <youxiduo@weidian.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
holdenk pushed a commit to holdenk/spark that referenced this pull request Oct 27, 2020
…resolve qualified col name

This is [apache#30029](apache#30029) backport for branch-3.0.

### What changes were proposed in this pull request?

Correct the resolution of having clause.

### Why are the changes needed?

Grouping sets construct new aggregate lost the qualified name of grouping expression. Here is a example:
```
-- Works resolved by `ResolveReferences`
select c1 from values (1) as t1(c1) group by grouping sets(t1.c1) having c1 = 1

-- Works because of the extra expression c1
select c1 as c2 from values (1) as t1(c1) group by grouping sets(t1.c1) having t1.c1 = 1

-- Failed
select c1 from values (1) as t1(c1) group by grouping sets(t1.c1) having t1.c1 = 1
```

It wroks with `Aggregate` without grouping sets through `ResolveReferences`, but Grouping sets not works since the exprId has been changed.

### Does this PR introduce _any_ user-facing change?

Yes, bug fix.

### How was this patch tested?

add test.

Closes apache#30077 from ulysses-you/SPARK-33131-branch-3.0.

Authored-by: ulysses <youxiduo@weidian.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
@ulysses-you ulysses-you deleted the SPARK-33131 branch February 19, 2021 10:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants