
[WIP][SPARK-47338][SQL] Introduce UNCLASSIFIED for default error class #45457

Closed
itholic wants to merge 22 commits into apache:master from itholic:set_default_error_class

Conversation

@itholic (Contributor) commented Mar 11, 2024

What changes were proposed in this pull request?

This PR proposes to introduce UNCLASSIFIED as the default error class when no error class is defined.

Why are the changes needed?

In Spark, when an errorClass is not explicitly defined for an exception, the getErrorClass method currently returns null.

This behavior can lead to ambiguity and makes debugging more challenging, since there is no clear indication that the error class was not set.
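
(Editorial note: a minimal sketch of the proposed fallback, using a simplified Scala stand-in for Spark's SparkThrowable interface. The trait and class names below are illustrative and do not appear in the diff.)

// Simplified stand-in for SparkThrowable; in Spark it is a Java interface.
trait ThrowableWithErrorClass {
  // None when the exception was raised with a bare message.
  def errorClass: Option[String]

  // Current behavior: callers effectively observe null here.
  // Proposed behavior: fall back to a sentinel class instead of null.
  def getErrorClass: String = errorClass.getOrElse("UNCLASSIFIED")
}

case class LegacyStyleException(message: String) extends ThrowableWithErrorClass {
  override def errorClass: Option[String] = None
}

// LegacyStyleException("boom").getErrorClass yields "UNCLASSIFIED" rather
// than null, so logs and messages always carry a class tag.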

Does this PR introduce any user-facing change?

No API changes, but the user-facing error message will contain UNCLASSIFIED when the error class is not specified.

How was this patch tested?

Updated the existing UT (SparkThrowableSuite)

Was this patch authored or co-authored using generative AI tooling?

No

github-actions bot added the CORE label Mar 12, 2024
Member:

@itholic Why did you name it with the prefix _LEGACY_? Do you plan to eliminate it in the future?

@itholic (Author):

Because I thought that not having an error class assigned basically meant it was a LEGACY error, but I don't have a strong opinion. Do you have any preference? Also cc @srielau FYI.

Member:

> Because I thought that not having an error class assigned basically meant it was a LEGACY error

I would say it is true. SparkException can still be raised with just a message, since it has not been fully ported to error classes. For instance:

case NonFatal(t) if !t.isInstanceOf[TimeoutException] =>
  throw new SparkException("Exception thrown in awaitResult: ", t)

Member:

Since we know the cases where the error class is not set, how about just naming the error class something like UNCLASSIFIED?

@itholic (Author):

Sounds reasonable to me. Let me address it.
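
(Editorial note: for context, entries in error-classes.json pair an error class name with a message template. A hypothetical UNCLASSIFIED entry might look like the following; the message text here is an assumption, not taken from the PR's diff.)

"UNCLASSIFIED": {
  "message": [
    "<message>"
  ]
}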

github-actions bot added the DOCS label Mar 14, 2024
@MaxGekk (Member) left a comment:

@itholic Please update the PR's title and description according to your changes.

itholic changed the title from "[WIP][SPARK-47338][SQL] Introduce _LEGACY_ERROR_UNKNOWN for default error class" to "[WIP][SPARK-47338][SQL] Introduce UNCLASSIFIED for default error class" on Mar 19, 2024.
@itholic (Author) commented Mar 19, 2024

Updated the PR title and description. Let me take a look at the CI failure.

itholic changed the title from "[WIP][SPARK-47338][SQL] Introduce UNCLASSIFIED for default error class" to "[SPARK-47338][SQL] Introduce UNCLASSIFIED for default error class" on Mar 19, 2024.
itholic marked this pull request as ready for review on March 19, 2024 at 23:55.
@MaxGekk (Member) left a comment:

Waiting for CI.

@xinrong-meng (Member):
LGTM once CI passes, thank you!

// Diff context from the test suite; the intercept block opened above this
// hunk captures the SparkException whose cause is checked below.
  pairs.saveAsNewAPIHadoopFile[NewFakeFormatWithCallback]("ignored")
}
assert(e.getCause.getMessage contains "failed to write")
assert(e.getCause.getMessage contains "Task failed while writing rows")
Member:

How did it happen that you had to change this?

@itholic (Author):

That is also my question. I believe this error message should not be affected by the current change, but CI keeps complaining about it.

So I modified it for testing purposes, to see whether this would really change the CI result.

struct<>
-- !query output
org.apache.spark.api.python.PythonException
pyspark.errors.exceptions.base.PySparkRuntimeError: [UDTF_EXEC_ERROR] User defined table function encountered an error in the 'eval' or 'terminate' method: Column 0 within a returned row had a value of None, either directly or within array/struct/map subfields, but the corresponding column type was declared as non-nullable; please update the UDTF to return a non-None value at this location or otherwise declare the column type as nullable.
Member:

The deleted error message seems reasonable. Do you know why it was replaced?

@itholic (Author):

Yeah, I agree that this looks a bit weird.

The reason is that UDTF_EXEC_ERROR is defined on the PySpark side, so technically it is UNCLASSIFIED from the JVM's perspective, as it is not defined in error-classes.json.

But the existing error message still shows up in user space, such as:

org.apache.spark.SparkException: [UNCLASSIFIED] pyspark.errors.exceptions.base.PySparkRuntimeError: [UDTF_EXEC_ERROR] User defined table function encountered an error in the 'eval' or 'terminate' method: Column 0 within a returned row had a value of None, either directly or within array/struct/map subfields, but the corresponding column type was declared as non-nullable; please update the UDTF to return a non-None value at this location or otherwise declare the column type as nullable.

I'm not sure if it would be better to keep the existing error message for PythonException or mark it as UNCLASSIFIED.
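
(Editorial note: a minimal sketch of how the doubled tagging arises. SparkException can be constructed from a bare message; the commented expectation at the end reflects the PR's proposed behavior, not master at the time.)

import org.apache.spark.SparkException

// The Python worker already embeds its own error class in the message text.
val e = new SparkException(
  "pyspark.errors.exceptions.base.PySparkRuntimeError: [UDTF_EXEC_ERROR] ...")

// Without an explicit errorClass, master at the time returns null here:
assert(e.getErrorClass == null)

// Under this PR, getErrorClass would instead return "UNCLASSIFIED", so the
// rendered message would carry both tags:
//   [UNCLASSIFIED] pyspark.errors...PySparkRuntimeError: [UDTF_EXEC_ERROR] ...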

itholic changed the title from "[SPARK-47338][SQL] Introduce UNCLASSIFIED for default error class" to "[WIP][SPARK-47338][SQL] Introduce UNCLASSIFIED for default error class" on Mar 26, 2024.
@itholic (Author) commented Mar 26, 2024

Let me mark it as a draft for now, as I haven't been able to find a clear cause as to why the CI is complaining.

itholic marked this pull request as a draft on March 26, 2024 at 06:14.
github-actions bot removed the DOCS label Apr 23, 2024
@itholic (Author) commented Jul 24, 2024

Sorry, but let me reopen this PR against the current master branch, since it has been stale for too long.

itholic closed this on Jul 24, 2024.