Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-8972][SQL]Incorrect result for rollup #7343

Closed
wants to merge 4 commits into from

Conversation

chenghao-intel
Copy link
Contributor

We don't support the complex expression keys in the rollup/cube, and we even will not report it if we have the complex group by keys, that will cause very confusing/incorrect result.

e.g. SELECT key%100 FROM src GROUP BY key %100 with ROLLUP

This PR adds an additional project during the analyzing for the complex GROUP BY keys, and that projection will be the child of Expand, so to Expand, the GROUP BY KEY are always the simple key(attribute names).

@SparkQA
Copy link

SparkQA commented Jul 10, 2015

Test build #37024 has finished for PR 7343 at commit c695760.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@chenghao-intel
Copy link
Contributor Author

cc @marmbrus @yhuai

@yhuai
Copy link
Contributor

yhuai commented Jul 13, 2015

Can you explain what is the problem in the JIRA?

@chenghao-intel
Copy link
Contributor Author

Thanks you @yhuai , I've updated the description of the bug, and also add more unit tests for the fixing.

@SparkQA
Copy link

SparkQA commented Jul 13, 2015

Test build #37112 has finished for PR 7343 at commit 34def69.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai
Copy link
Contributor

yhuai commented Jul 13, 2015

@chenghao-intel Thanks for the update. I find complex group by keys is a confusing term. If I understand correctly, we do not handle expressions other than AttributeReferences, right? Also, with your change, seems we will change the group by expressions (e.g. from key % 100 to key)? What will happen if an expression involves multiple columns (e.g. key1 + key2)?

@chenghao-intel
Copy link
Contributor Author

That's correct, complex group by keys is the expression other than AttributeReferences.

I am not just simply change the group by keys, but with an alias. e.g.

SELECT key+key FROM src GROUP BY key+key WITH ROLLUP 
==>
SELECT k FROM (SELECT key + key as k FROM src) t GROUP BY k WITH ROLLUP

nonAttributeGroupByExpression += alias
(other, alias.toAttribute) // (Aliased complex expression, the associated attribute)
}
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add toMap?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably we want to keep the order group by expressions in the Aggregate operator, let's just keep it as it used to be.

@yhuai
Copy link
Contributor

yhuai commented Jul 13, 2015

oh, i see. Yeah, you are right.

Can we use another term for complex group by keys? It is confusing and something like key1 % 100 is not complex.

@chenghao-intel
Copy link
Contributor Author

@yhuai thanks for the reviewing, any better idea for the term complex group by keys? Or should I just call it non-attribute expressions?

@SparkQA
Copy link

SparkQA commented Jul 13, 2015

Test build #1050 has finished for PR 7343 at commit 34def69.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 15, 2015

Test build #37300 has finished for PR 7343 at commit 827873f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// find all of the non-attribute expressions in the GROUP BY keys
val nonAttributeGroupByExpressions = new ArrayBuffer[Alias]()

// The pair of (the non-attributes expression, associated attribute (alias))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is not very accurate since you also have named expressions as keys.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I will update it.

@yhuai
Copy link
Contributor

yhuai commented Jul 15, 2015

The fix looks good to me.

@SparkQA
Copy link

SparkQA commented Jul 15, 2015

Test build #37313 has finished for PR 7343 at commit 1ebbb59.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@chenghao-intel
Copy link
Contributor Author

seems not related.
retest this please

@chenghao-intel
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Jul 15, 2015

Test build #21 has finished for PR 7343 at commit 1ebbb59.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 15, 2015

Test build #37340 has finished for PR 7343 at commit 1ebbb59.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@chenghao-intel
Copy link
Contributor Author

@yhuai any more comments?

@yhuai
Copy link
Contributor

yhuai commented Jul 16, 2015

LGTM. I am merging it to master.

@asfgit asfgit closed this in e272123 Jul 16, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants