
[SPARK-43969][SQL] Refactor & Assign names to the error class _LEGACY_ERROR_TEMP_1170 #41458

Closed · wants to merge 21 commits

Conversation

panbingkun (Contributor) commented Jun 5, 2023

What changes were proposed in this pull request?

This PR aims to:

  • Refactor `PreWriteCheck` to use the error framework.
  • Make `INSERT_COLUMN_ARITY_MISMATCH` more generic and avoid embedding the error's text in source code.
  • Assign a name to `_LEGACY_ERROR_TEMP_1170`.
  • In the `INSERT_PARTITION_COLUMN_ARITY_MISMATCH` error message, quote the table column name with `toSQLId` instead of single quotes.
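Under the error framework, message text lives in `error-classes.json` as a template keyed by an error class and an optional subclass, and the calling code supplies only the class name plus parameters. A minimal Python sketch of that lookup (the class and subclass names are the ones this PR touches; the resolver and the exact message wording are illustrative, not Spark's real Scala implementation):

```python
# Illustrative resolver for error-class templates. Spark's real reader
# is Scala and loads error-classes.json; message wording is simplified.
ERROR_CLASSES = {
    "INSERT_COLUMN_ARITY_MISMATCH": {
        "subClass": {
            "NOT_ENOUGH_DATA_COLUMNS": (
                "Cannot write to <tableName>: not enough data columns."),
            "TOO_MANY_DATA_COLUMNS": (
                "Cannot write to <tableName>: too many data columns."),
        }
    },
}

def resolve(error_class: str, params: dict) -> str:
    """Split 'MAIN.SUBCLASS', fetch the template, fill <placeholders>."""
    main, _, sub = error_class.partition(".")
    template = ERROR_CLASSES[main]["subClass"][sub]
    for key, value in params.items():
        template = template.replace(f"<{key}>", value)
    return template

msg = resolve(
    "INSERT_COLUMN_ARITY_MISMATCH.TOO_MANY_DATA_COLUMNS",
    {"tableName": "`spark_catalog`.`default`.`t`"})
```

This is why the test failures later in the thread show the full name `INSERT_COLUMN_ARITY_MISMATCH.TOO_MANY_DATA_COLUMNS`: the subclass becomes part of the error-class identifier.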

Why are the changes needed?

The changes improve the error framework.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

  • Manual test.
  • Pass GA (GitHub Actions).

```diff
   query: LogicalPlan): Throwable = {
     new AnalysisException(
-      errorClass = "INSERT_COLUMN_ARITY_MISMATCH",
+      errorClass = "INSERT_COLUMN_ARITY_MISMATCH.TOO_MANY_DATA_COLUMNS",
       messageParameters = Map(
         "tableName" -> tableName,
```
Member: Use `toSQLId`, please.

In core/src/main/resources/error/error-classes.json:
```diff
@@ -1651,6 +1665,11 @@
     ],
     "sqlState" : "46110"
   },
+  "NOT_SUPPORTED_COMMAND_WITHOUT_HIVE_SUPPORT" : {
+    "message" : [
+      "<cmd> is not supported, if you want to enable it, please set `spark.sql.catalogImplementation` to `hive`."
```
Member: Please quote the SQL config and its value in the same way as `toSQLConf` and `toSQLConfVal`.

```diff
-      errorClass = "_LEGACY_ERROR_TEMP_1170",
-      messageParameters = Map("detail" -> detail))
+      errorClass = "NOT_SUPPORTED_COMMAND_WITHOUT_HIVE_SUPPORT",
+      messageParameters = Map("cmd" -> cmd))
```
Member: Is it a SQL statement? If so, quote it with `toSQLStmt`.
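The review asks for different quoting helpers depending on what kind of value appears in the message: identifiers get backticks (`toSQLId`), config names get double quotes (`toSQLConf`), and statement keywords are upper-cased (`toSQLStmt`). A simplified Python sketch of those conventions (the names mirror Spark's `QueryErrorsBase`; the real Scala versions also escape embedded backticks and handle multi-part identifiers more carefully):

```python
# Simplified approximations of Spark's quoting helpers. The real
# versions live in Scala's QueryErrorsBase and do proper escaping.
def to_sql_id(name: str) -> str:
    """Backtick-quote each dot-separated part of an identifier."""
    return ".".join(f"`{part}`" for part in name.split("."))

def to_sql_conf(conf: str) -> str:
    """Double-quote a SQL config name."""
    return f'"{conf}"'

def to_sql_stmt(text: str) -> str:
    """Upper-case a statement or keyword sequence."""
    return text.upper()
```

For example, `to_sql_id("spark_catalog.default.t")` yields `` `spark_catalog`.`default`.`t` ``, matching the backtick-quoted values in the updated golden files further down the thread.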

```
"subClass" : {
  "NOT_ENOUGH_DATA_COLUMNS" : {
    "message" : [
      "not enough data columns: ",
```
panbingkun (Author): detail reason

```
  },
  "TOO_MANY_DATA_COLUMNS" : {
    "message" : [
      "too many data columns: ",
```
panbingkun (Author): detail reason

@panbingkun panbingkun requested a review from MaxGekk June 7, 2023 06:08
@github-actions github-actions bot added the PYTHON label Jun 8, 2023
```diff
-        "dataColumns" -> "'1', '2', '3'",
         "staticPartCols" -> "`b`, `c`",
         "tableColumns" -> "`a`, `d`, `b`, `c`",
+        "dataColumns" -> "`1`, `2`, `3`",
         "tableName" -> s"`spark_catalog`.`default`.`$tableName`")
```
Member: Wondering why `$tableName` still starts with `$`?
panbingkun (Author) commented Jun 12, 2023:

Because it is a variable; its value in this case may be 'hive_table' or 'ds_table'.
Perhaps `s"spark_catalog.default.${tableName}"` can better express this?

```diff
-        "reason" -> "not enough data columns",
-        "tableColumns" -> "'a', 'b'",
-        "dataColumns" -> "'a'"))
+        "tableName" -> toSQLId("unknown"),
```
Member: Just quote it with backticks.

panbingkun (Author): Does this mean that in tests we should avoid using `toSQLId` as much as possible? I'm actually a bit confused. 😄

Member: First, we need to be consistent with other tests, where we just place the expected values. Second, if there is a bug in `toSQLId`, the test should catch it. But the main reason is the first one.

```diff
-      errorClass = "_LEGACY_ERROR_TEMP_1170",
-      messageParameters = Map("detail" -> detail))
+      errorClass = "NOT_SUPPORTED_COMMAND_WITHOUT_HIVE_SUPPORT",
+      messageParameters = Map("cmd" -> toSQLStmt(cmd)))
```
Member: `cmd` is not a SQL statement in general; see what is passed here:

"INSERT OVERWRITE DIRECTORY with the Hive format"

panbingkun (Author): OK

@panbingkun panbingkun requested a review from MaxGekk June 12, 2023 23:07
MaxGekk (Member) left a comment:

Waiting for CI. @panbingkun Could you re-trigger GAs, please.

panbingkun (Author):

> Waiting for CI. @panbingkun Could you re-trigger GAs, please.

Done.

MaxGekk (Member) commented Jun 18, 2023

@panbingkun Could you fix the test failure, please:

```
[info] - insert by name: mismatch column name *** FAILED *** (161 milliseconds)
[info]   "...OLUMN_ARITY_MISMATCH[.TOO_MANY_DATA_COLUMNS]" did not equal "...OLUMN_ARITY_MISMATCH[]" (SparkFunSuite.scala:315)
[info]   Analysis:
[info]   "...OLUMN_ARITY_MISMATCH[.TOO_MANY_DATA_COLUMNS]" -> "...OLUMN_ARITY_MISMATCH[]"
[info]   org.scalatest.exceptions.TestFailedException:
```

panbingkun (Author) commented Jun 19, 2023:

> Waiting for CI. @panbingkun Could you re-trigger GAs, please.

Done, let's wait for CI.
@MaxGekk GA is finally going green. 😄

MaxGekk (Member) commented Jun 19, 2023:

+1, LGTM. Merging to master.
Thank you, @panbingkun.

panbingkun added a commit to panbingkun/spark that referenced this pull request Jun 19, 2023
…_ERROR_TEMP_1170

### What changes were proposed in this pull request?
This PR aims to:
- Refactor `PreWriteCheck` to use the error framework.
- Make `INSERT_COLUMN_ARITY_MISMATCH` more generic and avoid embedding the error's text in source code.
- Assign a name to `_LEGACY_ERROR_TEMP_1170`.
- In the `INSERT_PARTITION_COLUMN_ARITY_MISMATCH` error message, quote the table column name with `toSQLId` instead of single quotes.

### Why are the changes needed?
The changes improve the error framework.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
- Manual test.
- Pass GA (GitHub Actions).

Closes apache#41458 from panbingkun/refactor_PreWriteCheck.

Lead-authored-by: panbingkun <pbk1982@gmail.com>
Co-authored-by: panbingkun <84731559@qq.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
@MaxGekk MaxGekk closed this Jun 20, 2023
dongjoon-hyun pushed a commit that referenced this pull request Jun 25, 2023
### What changes were proposed in this pull request?
#41458 updated `numeric.sql.out` but not `numeric.sql.out.java21`; this PR updates `numeric.sql.out.java21` for Java 21.

### Why are the changes needed?
Fix golden file for Java 21.

https://github.com/apache/spark/actions/runs/5362442727/jobs/9729315685

```
[info] - postgreSQL/numeric.sql *** FAILED *** (1 minute, 4 seconds)
[info]   postgreSQL/numeric.sql
[info]   Expected "...OLUMN_ARITY_MISMATCH[",
[info]     "sqlState" : "21S01",
[info]     "messageParameters" : {
[info]       "dataColumns" : "'id', 'id', 'val', 'val', '(val * val)'",
[info]       "reason" : "too many data columns",
[info]       "tableColumns" : "'id1', 'id2', 'result']",
[info]       "tableName" :...", but got "...OLUMN_ARITY_MISMATCH[.TOO_MANY_DATA_COLUMNS",
[info]     "sqlState" : "21S01",
[info]     "messageParameters" : {
[info]       "dataColumns" : "`id`, `id`, `val`, `val`, `(val * val)`",
[info]       "tableColumns" : "`id1`, `id2`, `result`]",
[info]       "tableName" :..." Result did not match for query #474
[info]   INSERT INTO num_result SELECT t1.id, t2.id, t1.val, t2.val, t1.val * t2.val
[info]       FROM num_data t1, num_data t2 (SQLQueryTestSuite.scala:848)
[info]   org.scalatest.exceptions.TestFailedException:
[info]   at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
[info]   at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
[info]   at org.scalatest.funsuite.AnyFunSuite.newAssertionFailedException(AnyFunSuite.scala:1564)
[info]   at org.scalatest.Assertions.assertResult(Assertions.scala:847)
[info]   at org.scalatest.Assertions.assertResult$(Assertions.scala:842)
[info]   at org.scalatest.funsuite.AnyFunSuite.assertResult(AnyFunSuite.scala:1564)
[info]   at org.apache.spark.sql.SQLQueryTestSuite.$anonfun$readGoldenFileAndCompareResults$3(SQLQueryTestSuite.scala:848)
```
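Golden-file suites like `SQLQueryTestSuite` compare each query's rendered output against a checked-in expected file per environment, which is why the Java 21 variant had to be regenerated separately after the error-message change. A rough Python sketch of the compare-or-regenerate step (the regenerate switch mirrors Spark's `SPARK_GENERATE_GOLDEN_FILES=1` convention; the logic here is simplified):

```python
from pathlib import Path

def check_golden(actual: str, golden_path: Path, regenerate: bool) -> bool:
    """Rewrite the golden file when regenerating; otherwise compare the
    actual output against its stored contents."""
    if regenerate:
        golden_path.write_text(actual)
        return True
    return golden_path.read_text() == actual
```

A stale golden file fails exactly as shown in the log above: the stored expected text still carries the old error-class name and quoting style.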

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Pass GitHub Actions.
- Manually checked using Java 21.

Closes #41720 from LuciferYang/SPARK-43969-FOLLOWUP-2.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@panbingkun panbingkun deleted the refactor_PreWriteCheck branch July 9, 2023 11:32