Skip to content

Conversation

@itholic
Copy link
Contributor

@itholic itholic commented Nov 23, 2022

What changes were proposed in this pull request?

This PR proposes to rename COLUMN_NOT_IN_GROUP_BY_CLAUSE to MISSING_AGGREGATION.

Also, improve its error message.

Why are the changes needed?

The current error class name and its error message doesn't illustrate the error cause and resolution correctly.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

./build/sbt “sql/testOnly org.apache.spark.sql.SQLQueryTestSuite*”

@itholic
Copy link
Contributor Author

itholic commented Nov 24, 2022

Fixed the Python test first since we have unresolved discussion #38769 (comment)

@MaxGekk
Copy link
Member

MaxGekk commented Nov 28, 2022

@itholic Please, rebase on the recent master.

@MaxGekk
Copy link
Member

MaxGekk commented Nov 29, 2022

@itholic Could you fix the test failure. It seems it is related to your changes:

python/pyspark/sql/tests/pandas/test_pandas_udf_grouped_agg.py.test_invalid_args
"COLUMN_NOT_IN_GROUP_BY_CLAUSE" does not match "[MISSING_AGGREGATION] The non-aggregating expression "v" is based on columns which are not participating in the GROUP BY clause.
Add the columns or the expression to the GROUP BY, aggregate the expression, or use "any_value("v")" if you do not care which of the values within a group is returned.;
Aggregate [id#668L], [id#668L, plus_one(v#674)#687 AS plus_one(v)#688]


with QuietTest(self.sc):
with self.assertRaisesRegex(AnalysisException, "nor.*aggregate function"):
with self.assertRaisesRegex(AnalysisException, "[MISSING_AGGREGATION]"):
Copy link
Contributor Author

@itholic itholic Nov 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think maybe we should test this with similar logic to checkError in Scala side.

Let me create the JIRA and update soon after getting some feedback from community.

Copy link
Member

@MaxGekk MaxGekk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@itholic Please, construct AnyValue.

@MaxGekk
Copy link
Member

MaxGekk commented Dec 1, 2022

+1, LGTM. Merging to master.
Thank you, @itholic.

@MaxGekk MaxGekk closed this in 5badb24 Dec 1, 2022
beliefer pushed a commit to beliefer/spark that referenced this pull request Dec 18, 2022
…GROUP_BY_CLAUSE`

### What changes were proposed in this pull request?

This PR proposes to rename `COLUMN_NOT_IN_GROUP_BY_CLAUSE` to `MISSING_AGGREGATION`.

Also, improve its error message.

### Why are the changes needed?

The current error class name and its error message doesn't illustrate the error cause and resolution correctly.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

```
./build/sbt “sql/testOnly org.apache.spark.sql.SQLQueryTestSuite*”
```

Closes apache#38769 from itholic/SPARK-41128.

Authored-by: itholic <haejoon.lee@databricks.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
@itholic itholic deleted the SPARK-41128 branch April 22, 2023 05:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants