
[WIP][SPARK-47338][SQL] Introduce UNCLASSIFIED for default error class #45457

Closed
itholic wants to merge 22 commits into apache:master from itholic:set_default_error_class

Conversation

@itholic (Contributor) commented Mar 11, 2024

What changes were proposed in this pull request?

This PR proposes to introduce UNCLASSIFIED as the default error class when no error class is defined.

Why are the changes needed?

In Spark, when an errorClass is not explicitly defined for an exception, the getErrorClass method currently returns null.

This behavior can lead to ambiguity and makes debugging more challenging, since there is no clear indication that the error class was not set.
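
(Editorial note: a minimal sketch of the proposed fallback, using a simplified Scala stand-in for Spark's SparkThrowable interface. The trait and class names below are illustrative and do not appear in the diff.)

// Simplified stand-in for SparkThrowable; in Spark it is a Java interface.
trait ThrowableWithErrorClass {
  // None when the exception was raised with a bare message.
  def errorClass: Option[String]

  // Current behavior: callers effectively observe null here.
  // Proposed behavior: fall back to a sentinel class instead of null.
  def getErrorClass: String = errorClass.getOrElse("UNCLASSIFIED")
}

case class LegacyStyleException(message: String) extends ThrowableWithErrorClass {
  override def errorClass: Option[String] = None
}

// LegacyStyleException("boom").getErrorClass yields "UNCLASSIFIED" rather
// than null, so logs and messages always carry a class tag.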

Does this PR introduce any user-facing change?

No API changes, but the user-facing error message will contain UNCLASSIFIED when the error class is not specified.

How was this patch tested?

Updated the existing UT (SparkThrowableSuite)

Was this patch authored or co-authored using generative AI tooling?

No

github-actions bot added the CORE label Mar 12, 2024
Member:

@itholic Why did you name it with the prefix _LEGACY_? Do you plan to eliminate it in the future?

@itholic (Author):

Because I thought that not having an error class assigned basically meant it was a LEGACY error, but I don't have a strong opinion. Do you have any preference? Also cc @srielau FYI.

Member:

> Because I thought that not having an error class assigned basically meant it was a LEGACY error

I would say it is true. SparkException can still be raised with just a message, since it has not been fully ported to error classes. For instance:

case NonFatal(t) if !t.isInstanceOf[TimeoutException] =>
  throw new SparkException("Exception thrown in awaitResult: ", t)

Member:

Since we know the cases where the error class is not set, how about just naming the error class something like UNCLASSIFIED?

@itholic (Author):

Sounds reasonable to me. Let me address it.
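
(Editorial note: for context, entries in error-classes.json pair an error class name with a message template. A hypothetical UNCLASSIFIED entry might look like the following; the message text here is an assumption, not taken from the PR's diff.)

"UNCLASSIFIED": {
  "message": [
    "<message>"
  ]
}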

github-actions bot added the DOCS label Mar 14, 2024
@MaxGekk (Member) left a comment:

@itholic Please update the PR's title and description according to your changes.

itholic changed the title from "[WIP][SPARK-47338][SQL] Introduce _LEGACY_ERROR_UNKNOWN for default error class" to "[WIP][SPARK-47338][SQL] Introduce UNCLASSIFIED for default error class" on Mar 19, 2024.
@itholic (Author) commented Mar 19, 2024

Updated the PR title and description. Let me take a look at the CI failure.

itholic changed the title from "[WIP][SPARK-47338][SQL] Introduce UNCLASSIFIED for default error class" to "[SPARK-47338][SQL] Introduce UNCLASSIFIED for default error class" on Mar 19, 2024.
itholic marked this pull request as ready for review on March 19, 2024 at 23:55.
@MaxGekk (Member) left a comment:

Waiting for CI.

@xinrong-meng (Member):
LGTM once CI passes, thank you!

// Diff context from the test suite; the intercept block opened above this
// hunk captures the SparkException whose cause is checked below.
  pairs.saveAsNewAPIHadoopFile[NewFakeFormatWithCallback]("ignored")
}
assert(e.getCause.getMessage contains "failed to write")
assert(e.getCause.getMessage contains "Task failed while writing rows")
Member:

How did it happen that you had to change this?

@itholic (Author):

That is also my question. I believe this error message should not be affected by the current change, but CI keeps complaining about it.

So I modified it for testing purposes, to see whether this would really change the CI result.

struct<>
-- !query output
org.apache.spark.api.python.PythonException
pyspark.errors.exceptions.base.PySparkRuntimeError: [UDTF_EXEC_ERROR] User defined table function encountered an error in the 'eval' or 'terminate' method: Column 0 within a returned row had a value of None, either directly or within array/struct/map subfields, but the corresponding column type was declared as non-nullable; please update the UDTF to return a non-None value at this location or otherwise declare the column type as nullable.
Member:

The deleted error message seems reasonable. Do you know why it was replaced?

@itholic (Author):

Yeah, I agree that this looks a bit weird.

The reason is that UDTF_EXEC_ERROR is defined on the PySpark side, so technically it is UNCLASSIFIED from the JVM's perspective, as it is not defined in error-classes.json.

But the existing error message still shows up in user space, such as:

org.apache.spark.SparkException: [UNCLASSIFIED] pyspark.errors.exceptions.base.PySparkRuntimeError: [UDTF_EXEC_ERROR] User defined table function encountered an error in the 'eval' or 'terminate' method: Column 0 within a returned row had a value of None, either directly or within array/struct/map subfields, but the corresponding column type was declared as non-nullable; please update the UDTF to return a non-None value at this location or otherwise declare the column type as nullable.

I'm not sure if it would be better to keep the existing error message for PythonException or mark it as UNCLASSIFIED.
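
(Editorial note: a minimal sketch of how the doubled tagging arises. SparkException can be constructed from a bare message; the commented expectation at the end reflects the PR's proposed behavior, not master at the time.)

import org.apache.spark.SparkException

// The Python worker already embeds its own error class in the message text.
val e = new SparkException(
  "pyspark.errors.exceptions.base.PySparkRuntimeError: [UDTF_EXEC_ERROR] ...")

// Without an explicit errorClass, master at the time returns null here:
assert(e.getErrorClass == null)

// Under this PR, getErrorClass would instead return "UNCLASSIFIED", so the
// rendered message would carry both tags:
//   [UNCLASSIFIED] pyspark.errors...PySparkRuntimeError: [UDTF_EXEC_ERROR] ...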

itholic changed the title from "[SPARK-47338][SQL] Introduce UNCLASSIFIED for default error class" to "[WIP][SPARK-47338][SQL] Introduce UNCLASSIFIED for default error class" on Mar 26, 2024.
@itholic (Author) commented Mar 26, 2024

Let me mark it as a draft for now, as I haven't been able to find a clear cause as to why the CI is complaining.

itholic marked this pull request as a draft on March 26, 2024 at 06:14.
github-actions bot removed the DOCS label Apr 23, 2024
@itholic (Author) commented Jul 24, 2024

Sorry, but let me reopen this PR against the current master branch, since it has been stale for too long.

itholic closed this on Jul 24, 2024.