[SPARK-17237][SQL] Remove backticks in a pivot result schema #14812
Test build #64432 has finished for PR 14812 at commit

Test build #68844 has finished for PR 14812 at commit

Test build #68860 has finished for PR 14812 at commit

@gatorsmile Do you have time to check this? Thanks!

Test build #71132 has finished for PR 14812 at commit

limit2Df.select($"id"))
}

test("handle missing data after pivoting") {
The test case name is misleading. Maybe just use the PR title here.

Sorry, I missed this ping. Could you fix the test case failure? Thanks!

The fix looks good to me. We just need to resolve the test case failure. Thanks!

okay, thanks! I'll check again soon

Test build #71256 has finished for PR 14812 at commit

@gatorsmile okay, fixed.

LGTM

## What changes were proposed in this pull request?
Pivoting adds backticks (e.g. 3_count(\`c\`)) to column names and, in some cases,
this causes analysis exceptions such as:
```
scala> val df = Seq((2, 3, 4), (3, 4, 5)).toDF("a", "x", "y")
scala> df.groupBy("a").pivot("x").agg(count("y"), avg("y")).na.fill(0)
org.apache.spark.sql.AnalysisException: syntax error in attribute name: `3_count(`y`)`;
at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.e$1(unresolved.scala:134)
at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.parseAttributeName(unresolved.scala:144)
...
```
So, this PR proposes to remove these backticks from column names.
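
To illustrate why the embedded backticks are a problem, here is a minimal, self-contained sketch of backtick-aware attribute-name parsing. It is a simplified model written for this explanation, not Spark's actual code: `parseName` and `quote` are hypothetical names, and the assumption that the caller wraps the column name in an outer pair of backticks is inferred from the error message above, where the pivot column name appears quoted that way.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper: models a caller that quotes a column name
// by wrapping it in backticks before resolving it.
def quote(col: String): String = "`" + col + "`"

// Hypothetical, simplified model of backtick-aware name parsing.
// Returns Right(name parts) on success, Left(message) on a syntax error.
def parseName(name: String): Either[String, Seq[String]] = {
  val err = Left(s"syntax error in attribute name: $name")
  val parts = ArrayBuffer.empty[String]
  val cur = new StringBuilder
  var inBacktick = false
  var i = 0
  while (i < name.length) {
    val c = name(i)
    if (inBacktick) {
      if (c == '`') {
        // A closing backtick must end the name or be followed by '.'
        if (i + 1 < name.length && name(i + 1) != '.') return err
        inBacktick = false
      } else {
        cur.append(c)
      }
    } else if (c == '`') {
      // An opening backtick is only allowed at the start of a name part
      if (cur.nonEmpty) return err
      inBacktick = true
    } else if (c == '.') {
      // A dot separates name parts; empty parts are rejected
      if (i == 0 || name(i - 1) == '.' || i == name.length - 1) return err
      parts += cur.toString
      cur.clear()
    } else {
      cur.append(c)
    }
    i += 1
  }
  if (inBacktick) err
  else {
    parts += cur.toString
    Right(parts.toSeq)
  }
}

// The inner backtick closes the quoted part too early, so parsing fails.
val withBackticks = parseName(quote("3_count(`y`)"))
// Without inner backticks, the quoted name parses as a single part.
val withoutBackticks = parseName(quote("3_count(y)"))
```

Under this model, the first call fails because the backtick before `y` is read as the end of the quoted name while more characters follow, whereas the second call succeeds. This is why dropping the backticks from the generated pivot column names avoids the exception.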
## How was this patch tested?
Added a test in `DataFrameAggregateSuite`.
Author: Takeshi YAMAMURO <linguin.m.s@gmail.com>
Closes #14812 from maropu/SPARK-17237.
(cherry picked from commit 5585ed9)
Signed-off-by: gatorsmile <gatorsmile@gmail.com>

Thanks! Merging to master/2.1. Could you please open a PR to backport it to 2.0?

@maropu JIRA is down. Will update the JIRA later.

okay, thanks!