
[WIP][SPARK-45022][SQL] Provide context for dataset API errors #42816

Conversation

@MaxGekk (Member) commented Sep 5, 2023

What changes were proposed in this pull request?

This PR captures which Dataset API methods are invoked by user code, together with their call sites in that code, and uses this information to provide better error messages.

For example, consider the following Spark app, SimpleApp.scala:

   1  import org.apache.spark.sql.SparkSession
   2  import org.apache.spark.sql.functions._
   3
   4  object SimpleApp {
   5    def main(args: Array[String]): Unit = {
   6      val spark = SparkSession.builder.appName("Simple Application").config("spark.sql.ansi.enabled", true).getOrCreate()
   7      import spark.implicits._
   8
   9      val c = col("a") / col("b")
  10
  11      Seq((1, 0)).toDF("a", "b").select(c).show()
  12
  13      spark.stop()
  14    }
  15  }

After this PR, the error message contains the error context (which Spark Dataset API was called, and from where in the user code) in the following form:

Exception in thread "main" org.apache.spark.SparkArithmeticException: [DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
== Dataset ==
"div" was called from SimpleApp$.main(SimpleApp.scala:9)

	at org.apache.spark.sql.errors.QueryExecutionErrors$.divideByZeroError(QueryExecutionErrors.scala:201)
	at org.apache.spark.sql.catalyst.expressions.DivModLike.eval(arithmetic.scala:672)
...

which is similar to the context already provided for SQL queries:

org.apache.spark.SparkArithmeticException: [DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
== SQL(line 1, position 1) ==
a / b
^^^^^

	at org.apache.spark.sql.errors.QueryExecutionErrors$.divideByZeroError(QueryExecutionErrors.scala:201)
	at org.apache.spark.sql.errors.QueryExecutionErrors.divideByZeroError(QueryExecutionErrors.scala)
...
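
For comparison, here is a minimal sketch of a SQL query that triggers the SQL-side context above (assuming ANSI mode is enabled, as in SimpleApp; the query text is illustrative):

// The line/position reported in "== SQL(...) ==" points at the failing
// `a / b` fragment within the query text, so it varies with the query.
spark.sql("SELECT a / b FROM VALUES (1, 0) AS t(a, b)").show()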

Please note that the stack trace in spark-shell doesn't contain meaningful elements:

scala> Thread.currentThread().getStackTrace.foreach(println)
java.base/java.lang.Thread.getStackTrace(Thread.java:1602)
$line15.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:23)
$line15.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:27)
$line15.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:29)
$line15.$read$$iw$$iw$$iw$$iw$$iw.<init>(<console>:31)
$line15.$read$$iw$$iw$$iw$$iw.<init>(<console>:33)
$line15.$read$$iw$$iw$$iw.<init>(<console>:35)
$line15.$read$$iw$$iw.<init>(<console>:37)
$line15.$read$$iw.<init>(<console>:39)
$line15.$read.<init>(<console>:41)
$line15.$read$.<init>(<console>:45)
$line15.$read$.<clinit>(<console>)
$line15.$eval$.$print$lzycompute(<console>:7)
$line15.$eval$.$print(<console>:6)
$line15.$eval.$print(<console>)
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
...

so this change doesn't help with that use case.
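
To illustrate why, here is a minimal sketch (not this PR's actual implementation) of recovering a user call site by filtering the current stack trace; the object name and prefix list below are illustrative only:

object CallSiteSketch {
  // Frames from these packages are treated as internal and skipped.
  private val internalPrefixes =
    Seq("org.apache.spark.", "java.", "jdk.", "scala.")

  // Return the first non-internal frame, i.e. the user code that invoked
  // the Dataset API. drop(1) skips the getStackTrace frame itself.
  def userCallSite(): Option[StackTraceElement] =
    Thread.currentThread().getStackTrace.drop(1).find { frame =>
      !internalPrefixes.exists(frame.getClassName.startsWith)
    }
}

If such a capture runs while the Column `/` operator is being constructed, it finds SimpleApp$.main(SimpleApp.scala:9) for the app above; in spark-shell the first non-internal frame is only a synthetic wrapper like $line15.$read$$iw, which carries no useful location.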

Why are the changes needed?

To provide more user-friendly error messages.

Does this PR introduce any user-facing change?

Yes, error messages raised from Dataset API calls now include an error context identifying the API and the user call site.

How was this patch tested?

Added new UTs to QueryExecutionAnsiErrorsSuite.
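
The new tests are not shown here; as a rough sketch, assuming a QueryTest-style suite (the suite and test names below are illustrative, not the actual additions to QueryExecutionAnsiErrorsSuite), such a check could look like:

import org.apache.spark.SparkArithmeticException
import org.apache.spark.sql.QueryTest
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.test.SharedSparkSession

class DatasetErrorContextSketchSuite extends QueryTest with SharedSparkSession {
  import testImplicits._

  test("DIVIDE_BY_ZERO carries the Dataset API call site") {
    withSQLConf("spark.sql.ansi.enabled" -> "true") {
      val e = intercept[SparkArithmeticException] {
        Seq((1, 0)).toDF("a", "b").select(col("a") / col("b")).collect()
      }
      // The new context block should name the API and the user call site.
      assert(e.getMessage.contains("== Dataset =="))
      assert(e.getMessage.contains("was called from"))
    }
  }
}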

Was this patch authored or co-authored using generative AI tooling?

No.

@peter-toth changed the title from [WIP][SPARK-45022][SQL] Provide context for dataset API errors to [WIP][SPARK-45022][SQL][test-java11] Provide context for dataset API errors on Sep 5, 2023
@peter-toth changed the title from [WIP][SPARK-45022][SQL][test-java11] Provide context for dataset API errors back to [WIP][SPARK-45022][SQL] Provide context for dataset API errors on Sep 5, 2023
@MaxGekk closed this on Sep 5, 2023