[SPARK-41391][SQL] The output column name of groupBy.agg(count_distinct) is incorrect
#38917
LGTM nice catch!
@@ -1134,6 +1134,11 @@ class DataFrameSuite extends QueryTest
     checkAnswer(approxSummaryDF, approxSummaryResult)
   }

   test("SPARK-41391: Correct the output column name of groupBy.agg(count_distinct)") {
     val df = person.groupBy("id").agg(count_distinct(col("name")))
     assert(df.columns === Array("id", "count(DISTINCT name)"))
nit: does it make sense to compare with columns from the SQL example?
nice, will update
need to take
this PR causes
I believe the analyzer needs to be changed to fix this issue. Let me close this PR and ping @cloud-fan and @viirya to take a look, since I think it's related to #24482.
What changes were proposed in this pull request?
Correct the output column name of groupBy.agg(count_distinct).
Why are the changes needed?
before this PR:
[id: bigint, count(value): bigint]
after this PR:
[id: bigint, count(DISTINCT value): bigint]
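A minimal sketch for reproducing the comparison locally, assuming a local SparkSession and Spark 3.2 or later (where `count_distinct` is available). The object name, the sample rows, and the temp view name `t` are made up for illustration and are not part of this PR. It prints the column names produced by the DataFrame API next to those produced by the equivalent SQL query; the actual output depends on the Spark version in use.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, count_distinct}

// Hypothetical reproduction harness, not code from this PR.
object CountDistinctColumnName {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-41391-sketch")
      .getOrCreate()
    import spark.implicits._

    // Illustrative data only: a bigint id and a value column.
    val df = Seq((1L, "a"), (1L, "b"), (2L, "a")).toDF("id", "value")

    // Column names generated by the DataFrame API.
    val viaApi = df.groupBy("id").agg(count_distinct(col("value")))
    println(viaApi.columns.mkString("[", ", ", "]"))

    // Column names generated by the equivalent SQL query.
    df.createOrReplaceTempView("t")
    val viaSql = spark.sql("SELECT id, count(DISTINCT value) FROM t GROUP BY id")
    println(viaSql.columns.mkString("[", ", ", "]"))

    spark.stop()
  }
}
```

If the two printed lines differ, the DataFrame API and SQL paths disagree on the default name, which is the inconsistency this PR targets.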
Does this PR introduce any user-facing change?
Yes. The default output column name of groupBy.agg(count_distinct) changes, e.g. from count(value) to count(DISTINCT value).
How was this patch tested?
Added a unit test in DataFrameSuite.