[SPARK-17237][SPARK-17458][SQL][Backport-2.0] Preserve aliases that are given for pivot aggregations #16565

Closed
wants to merge 1 commit into from

Conversation

maropu
Member

@maropu maropu commented Jan 13, 2017

What changes were proposed in this pull request?

This PR preserves the aliases given for pivot aggregations and fixes the issue reported in SPARK-17237. Pivoting currently adds backticks to column names (e.g. 3_count(`c`)), which in some cases causes analysis exceptions such as:

scala> val df = Seq((2, 3, 4), (3, 4, 5)).toDF("a", "x", "y")
scala> df.groupBy("a").pivot("x").agg(count("y"), avg("y")).na.fill(0)
org.apache.spark.sql.AnalysisException: syntax error in attribute name: `3_count(`y`)`;
  at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.e$1(unresolved.scala:134)
  at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.parseAttributeName(unresolved.scala:144)
...

So, this PR also removes these backticks from column names.
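
As a rough illustration of the intended behaviour (a sketch, not code from this patch), the snippet below gives each aggregate an explicit alias before pivoting. The alias names `cnt` and `mean`, the object name `PivotAliasSketch`, and the local `SparkSession` settings are assumptions made for the example. With aliases preserved, the pivoted columns should be named from the pivot value plus the alias rather than from the backtick-quoted SQL text, so `na.fill(0)` should no longer hit the attribute-name parser error shown above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, count}

// Hypothetical standalone driver; requires Spark SQL on the classpath.
object PivotAliasSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")              // assumption: local mode for the example
      .appName("pivot-alias-sketch")
      .getOrCreate()
    import spark.implicits._

    // Same data as in the failing example above.
    val df = Seq((2, 3, 4), (3, 4, 5)).toDF("a", "x", "y")

    // Give each aggregate an explicit alias (names are illustrative).
    val pivoted = df.groupBy("a")
      .pivot("x")
      .agg(count("y").as("cnt"), avg("y").as("mean"))

    // Inspect the generated column names, then fill nulls as in the report.
    pivoted.printSchema()
    pivoted.na.fill(0).show()

    spark.stop()
  }
}
```

Running `printSchema()` before and after applying the patch is a simple way to compare the generated column names.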

How was this patch tested?

Added a test in DataFrameAggregateSuite.

@SparkQA

SparkQA commented Jan 13, 2017

Test build #71281 has finished for PR 16565 at commit 875e2c0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

Could you change the PR title to [SPARK-17237][SQL][Backport-2.0] Remove backticks in a pivot result schema?

@maropu
Member Author

maropu commented Jan 13, 2017

okay! I'm now looking into the test failure, so just a sec, thanks

@maropu maropu changed the title [SPARK-17237][SQL] Remove backticks in a pivot result schema [SPARK-17237][SQL][Backport-2.0] Remove backticks in a pivot result schema Jan 13, 2017
@SparkQA

SparkQA commented Jan 13, 2017

Test build #71290 has finished for PR 16565 at commit e2c2fae.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

I checked the change history. Actually, you also backported #15111. Could you please update your PR description and PR title?

@gatorsmile
Member

LGTM except one comment

@maropu
Member Author

maropu commented Jan 13, 2017

@gatorsmile oh, I see. Is it okay to mix this PR with the fix of #15111? Or would it be better to backport #15111 first and then backport this?

@maropu
Member Author

maropu commented Jan 15, 2017

@gatorsmile ping

@gatorsmile
Member

I think it is fine to do it together. Basically, your PR is to fix the bug of #15111

@maropu
Member Author

maropu commented Jan 15, 2017

okay! I'll update them

@maropu maropu changed the title [SPARK-17237][SQL][Backport-2.0] Remove backticks in a pivot result schema [SPARK-17237][SPARK-17458][SQL][Backport-2.0] Alias specified for aggregates in a pivot are not honored Jan 15, 2017
@maropu maropu changed the title [SPARK-17237][SPARK-17458][SQL][Backport-2.0] Alias specified for aggregates in a pivot are not honored [SPARK-17237][SPARK-17458][SQL][Backport-2.0] Preserve aliases that are given for pivot aggregations Jan 15, 2017
@maropu
Member Author

maropu commented Jan 15, 2017

@gatorsmile How about this fix? plz check this again?

Member

@gatorsmile gatorsmile left a comment

LGTM

asfgit pushed a commit that referenced this pull request Jan 15, 2017
…re given for pivot aggregations

## What changes were proposed in this pull request?
This PR preserves the aliases given for pivot aggregations and fixes the issue reported in `SPARK-17237`. Pivoting currently adds backticks to column names (e.g. 3_count(\`c\`)), which in some cases causes analysis exceptions such as:
```
scala> val df = Seq((2, 3, 4), (3, 4, 5)).toDF("a", "x", "y")
scala> df.groupBy("a").pivot("x").agg(count("y"), avg("y")).na.fill(0)
org.apache.spark.sql.AnalysisException: syntax error in attribute name: `3_count(`y`)`;
  at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.e$1(unresolved.scala:134)
  at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.parseAttributeName(unresolved.scala:144)
...
```
So, this PR also removes these backticks from column names.

## How was this patch tested?
Added a test in `DataFrameAggregateSuite`.

Author: Takeshi YAMAMURO <linguin.m.s@gmail.com>

Closes #16565 from maropu/SPARK-17237-3.
@gatorsmile
Member

Thanks! Merging to 2.0

Could you please close it?

@maropu
Member Author

maropu commented Jan 15, 2017

Okay and thanks!
