[SPARK-41406][SQL] Refactor error message for NUM_COLUMNS_MISMATCH to make it more generic #38937

@panbingkun (Contributor) commented Dec 6, 2022

What changes were proposed in this pull request?

This PR refactors the error message for NUM_COLUMNS_MISMATCH to make it more generic.

Why are the changes needed?

The changes improve the error framework.

Does this PR introduce any user-facing change?

Yes.

How was this patch tested?

Updated the existing UTs.
Passed GA.

@AmplabJenkins

Can one of the admins verify this patch?

@panbingkun
Contributor Author

cc @MaxGekk

@@ -230,10 +230,9 @@ org.apache.spark.sql.AnalysisException
{
"errorClass" : "NUM_COLUMNS_MISMATCH",
"messageParameters" : {
"invalidNumColumns" : "2",
"invalidOrdinalNum" : "second",
"leftNumCols" : "1",
"operator" : "EXCEPTALL",
Member

hmm, could you output it as EXCEPT ALL

Contributor Author

Done

@@ -932,7 +932,7 @@
 },
 "NUM_COLUMNS_MISMATCH" : {
 "message" : [
-"<operator> can only be performed on tables with the same number of columns, but the first table has <refNumColumns> columns and the <invalidOrdinalNum> table has <invalidNumColumns> columns."
+"<operator> expects matching number of columns. But the left side (target) has <leftNumCols> while the right side (source) has <rightNumCols>."
Contributor

expects matching number of columns -> expect matching numbers of columns

Member

cc @srielau @srowen Please, review this error message.

Member

What is an 'operator' here and can the left always be called the target?

Contributor

I'm thinking of the operator being UNION (hence left and right).
So:
SET
INSERT
UNION
INTERSECT
EXCEPT
IN (?) (this may go through the struct logic; not sure)

Member

I would avoid words like left/right and reference/target, and just slightly improve the existing error template by replacing the word "table" with, let's say, "input":

"<operator> can only be performed on inputs with the same number of columns, but the first input has <firstNumColumns> columns and the <invalidOrdinalNum> input has <invalidNumColumns> columns."

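To see how the suggested template reads once its placeholders are filled in, here is a minimal, self-contained sketch. `TemplateSketch` and `render` are hypothetical names made up for this illustration and are not Spark's actual error-framework code. The parameter values are borrowed from the golden-file hunk earlier in this thread, with the operator written as EXCEPT ALL.

```scala
import scala.util.matching.Regex

// Hypothetical helper for illustration only; this is NOT Spark's implementation.
object TemplateSketch {
  // Matches placeholders of the form <name>.
  private val Placeholder: Regex = "<([a-zA-Z0-9_]+)>".r

  // Replaces every <name> with params(name); unknown placeholders are left as-is.
  def render(template: String, params: Map[String, String]): String =
    Placeholder.replaceAllIn(
      template,
      m => Regex.quoteReplacement(params.getOrElse(m.group(1), m.matched)))

  // The template text suggested in the comment above.
  val template: String =
    "<operator> can only be performed on inputs with the same number of columns, " +
      "but the first input has <firstNumColumns> columns and the " +
      "<invalidOrdinalNum> input has <invalidNumColumns> columns."

  def main(args: Array[String]): Unit = {
    val msg = render(template, Map(
      "operator" -> "EXCEPT ALL",
      "firstNumColumns" -> "1",
      "invalidOrdinalNum" -> "second",
      "invalidNumColumns" -> "2"))
    println(msg)
  }
}
```

With these values the rendered text avoids left/right and target/source wording entirely, so the same template can serve any operator that takes two or more inputs.
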
Member

@panbingkun Could you change the error message, please?

@@ -1011,7 +1012,7 @@ class DataFrameSetOperationsSuite extends QueryTest with SharedSparkSession {
val errMsg = intercept[AnalysisException] {
df1.unionByName(df2)
}.getMessage
-assert(errMsg.contains("Union can only be performed on tables with" +
+assert(errMsg.contains("UNION can only be performed on tables with" +
Member

This is another error class, right?

@MaxGekk (Member) left a comment

Waiting for CI.

@MaxGekk (Member) commented Dec 13, 2022

+1, LGTM. All GAs passed. Merging to master.
Thank you, @panbingkun and @srielau @srowen @bjornjorgensen for review.

@MaxGekk MaxGekk closed this in 0e2d604 Dec 13, 2022
beliefer pushed a commit to beliefer/spark that referenced this pull request Dec 18, 2022
…to make it more generic

Closes apache#38937 from panbingkun/SPARK-41406.

Authored-by: panbingkun <pbk1982@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
6 participants