
[SPARK-43969][SQL] Refactor & Assign names to the error class _LEGACY_ERROR_TEMP_1170 #41458

Closed · wants to merge 21 commits

Conversation

panbingkun (Contributor) commented Jun 5, 2023

What changes were proposed in this pull request?

This PR aims to:

  • Refactor `PreWriteCheck` to use the error framework.
  • Make `INSERT_COLUMN_ARITY_MISMATCH` more generic and avoid embedding the error's text in source code.
  • Assign a name to `_LEGACY_ERROR_TEMP_1170`.
  • In the `INSERT_PARTITION_COLUMN_ARITY_MISMATCH` error message, quote the table column name with `toSQLId` instead of single quotes.
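Under the error framework, message text lives in `error-classes.json` as a template keyed by an error class and an optional subclass, and the calling code supplies only the class name plus parameters. A minimal Python sketch of that lookup (the class and subclass names are the ones this PR touches; the resolver and the exact message wording are illustrative, not Spark's real Scala implementation):

```python
# Illustrative resolver for error-class templates. Spark's real reader
# is Scala and loads error-classes.json; message wording is simplified.
ERROR_CLASSES = {
    "INSERT_COLUMN_ARITY_MISMATCH": {
        "subClass": {
            "NOT_ENOUGH_DATA_COLUMNS": (
                "Cannot write to <tableName>: not enough data columns."),
            "TOO_MANY_DATA_COLUMNS": (
                "Cannot write to <tableName>: too many data columns."),
        }
    },
}

def resolve(error_class: str, params: dict) -> str:
    """Split 'MAIN.SUBCLASS', fetch the template, fill <placeholders>."""
    main, _, sub = error_class.partition(".")
    template = ERROR_CLASSES[main]["subClass"][sub]
    for key, value in params.items():
        template = template.replace(f"<{key}>", value)
    return template

msg = resolve(
    "INSERT_COLUMN_ARITY_MISMATCH.TOO_MANY_DATA_COLUMNS",
    {"tableName": "`spark_catalog`.`default`.`t`"})
```

This is why the test failures later in the thread show the full name `INSERT_COLUMN_ARITY_MISMATCH.TOO_MANY_DATA_COLUMNS`: the subclass becomes part of the error-class identifier.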

Why are the changes needed?

The changes improve the error framework.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

  • Manual test.
  • Pass GA (GitHub Actions).

```diff
   query: LogicalPlan): Throwable = {
     new AnalysisException(
-      errorClass = "INSERT_COLUMN_ARITY_MISMATCH",
+      errorClass = "INSERT_COLUMN_ARITY_MISMATCH.TOO_MANY_DATA_COLUMNS",
       messageParameters = Map(
         "tableName" -> tableName,
```
Member: Use `toSQLId`, please.

In core/src/main/resources/error/error-classes.json:
```diff
@@ -1651,6 +1665,11 @@
     ],
     "sqlState" : "46110"
   },
+  "NOT_SUPPORTED_COMMAND_WITHOUT_HIVE_SUPPORT" : {
+    "message" : [
+      "<cmd> is not supported, if you want to enable it, please set `spark.sql.catalogImplementation` to `hive`."
```
Member: Please quote the SQL config and its value in the same way as `toSQLConf` and `toSQLConfVal`.

```diff
-      errorClass = "_LEGACY_ERROR_TEMP_1170",
-      messageParameters = Map("detail" -> detail))
+      errorClass = "NOT_SUPPORTED_COMMAND_WITHOUT_HIVE_SUPPORT",
+      messageParameters = Map("cmd" -> cmd))
```
Member: Is it a SQL statement? If so, quote it with `toSQLStmt`.
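The review asks for different quoting helpers depending on what kind of value appears in the message: identifiers get backticks (`toSQLId`), config names get double quotes (`toSQLConf`), and statement keywords are upper-cased (`toSQLStmt`). A simplified Python sketch of those conventions (the names mirror Spark's `QueryErrorsBase`; the real Scala versions also escape embedded backticks and handle multi-part identifiers more carefully):

```python
# Simplified approximations of Spark's quoting helpers. The real
# versions live in Scala's QueryErrorsBase and do proper escaping.
def to_sql_id(name: str) -> str:
    """Backtick-quote each dot-separated part of an identifier."""
    return ".".join(f"`{part}`" for part in name.split("."))

def to_sql_conf(conf: str) -> str:
    """Double-quote a SQL config name."""
    return f'"{conf}"'

def to_sql_stmt(text: str) -> str:
    """Upper-case a statement or keyword sequence."""
    return text.upper()
```

For example, `to_sql_id("spark_catalog.default.t")` yields `` `spark_catalog`.`default`.`t` ``, matching the backtick-quoted values in the updated golden files further down the thread.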

```
"subClass" : {
  "NOT_ENOUGH_DATA_COLUMNS" : {
    "message" : [
      "not enough data columns: ",
```
panbingkun (Author): detail reason

```
  },
  "TOO_MANY_DATA_COLUMNS" : {
    "message" : [
      "too many data columns: ",
```
panbingkun (Author): detail reason

@panbingkun panbingkun requested a review from MaxGekk June 7, 2023 06:08
@github-actions github-actions bot added the PYTHON label Jun 8, 2023
```diff
-        "dataColumns" -> "'1', '2', '3'",
         "staticPartCols" -> "`b`, `c`",
         "tableColumns" -> "`a`, `d`, `b`, `c`",
+        "dataColumns" -> "`1`, `2`, `3`",
         "tableName" -> s"`spark_catalog`.`default`.`$tableName`")
```
Member: Wondering why `$tableName` still starts with `$`?
panbingkun (Author) commented Jun 12, 2023:

Because it is a variable; its value in this case may be 'hive_table' or 'ds_table'.
Perhaps `s"spark_catalog.default.${tableName}"` can better express this?

```diff
-        "reason" -> "not enough data columns",
-        "tableColumns" -> "'a', 'b'",
-        "dataColumns" -> "'a'"))
+        "tableName" -> toSQLId("unknown"),
```
Member: Just quote it with backticks.

panbingkun (Author): Does this mean that in tests we should avoid using `toSQLId` as much as possible? I'm actually a bit confused. 😄

Member: First, we need to be consistent with other tests, where we just place the expected values. Second, if there is a bug in `toSQLId`, the test should catch it. But the main reason is the first one.

```diff
-      errorClass = "_LEGACY_ERROR_TEMP_1170",
-      messageParameters = Map("detail" -> detail))
+      errorClass = "NOT_SUPPORTED_COMMAND_WITHOUT_HIVE_SUPPORT",
+      messageParameters = Map("cmd" -> toSQLStmt(cmd)))
```
Member: `cmd` is not a SQL statement in general; see what is passed here:

"INSERT OVERWRITE DIRECTORY with the Hive format"

panbingkun (Author): OK

@panbingkun panbingkun requested a review from MaxGekk June 12, 2023 23:07
MaxGekk (Member) left a comment:

Waiting for CI. @panbingkun Could you re-trigger GAs, please.

panbingkun (Author):

> Waiting for CI. @panbingkun Could you re-trigger GAs, please.

Done.

MaxGekk (Member) commented Jun 18, 2023

@panbingkun Could you fix the test failure, please:

```
[info] - insert by name: mismatch column name *** FAILED *** (161 milliseconds)
[info]   "...OLUMN_ARITY_MISMATCH[.TOO_MANY_DATA_COLUMNS]" did not equal "...OLUMN_ARITY_MISMATCH[]" (SparkFunSuite.scala:315)
[info]   Analysis:
[info]   "...OLUMN_ARITY_MISMATCH[.TOO_MANY_DATA_COLUMNS]" -> "...OLUMN_ARITY_MISMATCH[]"
[info]   org.scalatest.exceptions.TestFailedException:
```

panbingkun (Author) commented Jun 19, 2023:

> Waiting for CI. @panbingkun Could you re-trigger GAs, please.

Done, let's wait for CI.
@MaxGekk GA is finally going green. 😄

MaxGekk (Member) commented Jun 19, 2023:

+1, LGTM. Merging to master.
Thank you, @panbingkun.

panbingkun added a commit to panbingkun/spark that referenced this pull request Jun 19, 2023
…_ERROR_TEMP_1170

### What changes were proposed in this pull request?
This PR aims to:
- Refactor `PreWriteCheck` to use the error framework.
- Make `INSERT_COLUMN_ARITY_MISMATCH` more generic and avoid embedding the error's text in source code.
- Assign a name to `_LEGACY_ERROR_TEMP_1170`.
- In the `INSERT_PARTITION_COLUMN_ARITY_MISMATCH` error message, quote the table column name with `toSQLId` instead of single quotes.

### Why are the changes needed?
The changes improve the error framework.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
- Manual test.
- Pass GA (GitHub Actions).

Closes apache#41458 from panbingkun/refactor_PreWriteCheck.

Lead-authored-by: panbingkun <pbk1982@gmail.com>
Co-authored-by: panbingkun <84731559@qq.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
@MaxGekk MaxGekk closed this Jun 20, 2023
dongjoon-hyun pushed a commit that referenced this pull request Jun 25, 2023
### What changes were proposed in this pull request?
#41458 updated `numeric.sql.out` but not `numeric.sql.out.java21`; this PR updates `numeric.sql.out.java21` for Java 21.

### Why are the changes needed?
Fix golden file for Java 21.

https://github.com/apache/spark/actions/runs/5362442727/jobs/9729315685

```
[info] - postgreSQL/numeric.sql *** FAILED *** (1 minute, 4 seconds)
[info]   postgreSQL/numeric.sql
[info]   Expected "...OLUMN_ARITY_MISMATCH[",
[info]     "sqlState" : "21S01",
[info]     "messageParameters" : {
[info]       "dataColumns" : "'id', 'id', 'val', 'val', '(val * val)'",
[info]       "reason" : "too many data columns",
[info]       "tableColumns" : "'id1', 'id2', 'result']",
[info]       "tableName" :...", but got "...OLUMN_ARITY_MISMATCH[.TOO_MANY_DATA_COLUMNS",
[info]     "sqlState" : "21S01",
[info]     "messageParameters" : {
[info]       "dataColumns" : "`id`, `id`, `val`, `val`, `(val * val)`",
[info]       "tableColumns" : "`id1`, `id2`, `result`]",
[info]       "tableName" :..." Result did not match for query #474
[info]   INSERT INTO num_result SELECT t1.id, t2.id, t1.val, t2.val, t1.val * t2.val
[info]       FROM num_data t1, num_data t2 (SQLQueryTestSuite.scala:848)
[info]   org.scalatest.exceptions.TestFailedException:
[info]   at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
[info]   at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
[info]   at org.scalatest.funsuite.AnyFunSuite.newAssertionFailedException(AnyFunSuite.scala:1564)
[info]   at org.scalatest.Assertions.assertResult(Assertions.scala:847)
[info]   at org.scalatest.Assertions.assertResult$(Assertions.scala:842)
[info]   at org.scalatest.funsuite.AnyFunSuite.assertResult(AnyFunSuite.scala:1564)
[info]   at org.apache.spark.sql.SQLQueryTestSuite.$anonfun$readGoldenFileAndCompareResults$3(SQLQueryTestSuite.scala:848)
```
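Golden-file suites like `SQLQueryTestSuite` compare each query's rendered output against a checked-in expected file per environment, which is why the Java 21 variant had to be regenerated separately after the error-message change. A rough Python sketch of the compare-or-regenerate step (the regenerate switch mirrors Spark's `SPARK_GENERATE_GOLDEN_FILES=1` convention; the logic here is simplified):

```python
from pathlib import Path

def check_golden(actual: str, golden_path: Path, regenerate: bool) -> bool:
    """Rewrite the golden file when regenerating; otherwise compare the
    actual output against its stored contents."""
    if regenerate:
        golden_path.write_text(actual)
        return True
    return golden_path.read_text() == actual
```

A stale golden file fails exactly as shown in the log above: the stored expected text still carries the old error-class name and quoting style.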

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Pass GitHub Actions.
- Manually checked using Java 21.

Closes #41720 from LuciferYang/SPARK-43969-FOLLOWUP-2.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@panbingkun panbingkun deleted the refactor_PreWriteCheck branch July 9, 2023 11:32