
[WIP][SPARK-45022][SQL] Provide context for dataset API errors #42816

Conversation

@MaxGekk (Member) commented Sep 5, 2023

What changes were proposed in this pull request?

This PR captures which Dataset API methods are invoked by user code, together with their call sites in that code, and uses this information to provide better error messages.

For example, consider the following Spark app, SimpleApp.scala:

   1  import org.apache.spark.sql.SparkSession
   2  import org.apache.spark.sql.functions._
   3
   4  object SimpleApp {
   5    def main(args: Array[String]): Unit = {
   6      val spark = SparkSession.builder.appName("Simple Application").config("spark.sql.ansi.enabled", true).getOrCreate()
   7      import spark.implicits._
   8
   9      val c = col("a") / col("b")
  10
  11      Seq((1, 0)).toDF("a", "b").select(c).show()
  12
  13      spark.stop()
  14    }
  15  }

After this PR, the error message contains the error context (which Spark Dataset API was called, and from where in the user code) in the following form:

Exception in thread "main" org.apache.spark.SparkArithmeticException: [DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
== Dataset ==
"div" was called from SimpleApp$.main(SimpleApp.scala:9)

	at org.apache.spark.sql.errors.QueryExecutionErrors$.divideByZeroError(QueryExecutionErrors.scala:201)
	at org.apache.spark.sql.catalyst.expressions.DivModLike.eval(arithmetic.scala:672)
...

which is similar to the context already provided for SQL queries:

org.apache.spark.SparkArithmeticException: [DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
== SQL(line 1, position 1) ==
a / b
^^^^^

	at org.apache.spark.sql.errors.QueryExecutionErrors$.divideByZeroError(QueryExecutionErrors.scala:201)
	at org.apache.spark.sql.errors.QueryExecutionErrors.divideByZeroError(QueryExecutionErrors.scala)
...
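
For comparison, here is a minimal sketch of a SQL query that triggers the SQL-side context above (assuming ANSI mode is enabled, as in SimpleApp; the query text is illustrative):

// The line/position reported in "== SQL(...) ==" points at the failing
// `a / b` fragment within the query text, so it varies with the query.
spark.sql("SELECT a / b FROM VALUES (1, 0) AS t(a, b)").show()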

Please note that the stack trace in spark-shell doesn't contain meaningful elements:

scala> Thread.currentThread().getStackTrace.foreach(println)
java.base/java.lang.Thread.getStackTrace(Thread.java:1602)
$line15.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:23)
$line15.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:27)
$line15.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:29)
$line15.$read$$iw$$iw$$iw$$iw$$iw.<init>(<console>:31)
$line15.$read$$iw$$iw$$iw$$iw.<init>(<console>:33)
$line15.$read$$iw$$iw$$iw.<init>(<console>:35)
$line15.$read$$iw$$iw.<init>(<console>:37)
$line15.$read$$iw.<init>(<console>:39)
$line15.$read.<init>(<console>:41)
$line15.$read$.<init>(<console>:45)
$line15.$read$.<clinit>(<console>)
$line15.$eval$.$print$lzycompute(<console>:7)
$line15.$eval$.$print(<console>:6)
$line15.$eval.$print(<console>)
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
...

so this change doesn't help with that use case.
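
To illustrate why, here is a minimal sketch (not this PR's actual implementation) of recovering a user call site by filtering the current stack trace; the object name and prefix list below are illustrative only:

object CallSiteSketch {
  // Frames from these packages are treated as internal and skipped.
  private val internalPrefixes =
    Seq("org.apache.spark.", "java.", "jdk.", "scala.")

  // Return the first non-internal frame, i.e. the user code that invoked
  // the Dataset API. drop(1) skips the getStackTrace frame itself.
  def userCallSite(): Option[StackTraceElement] =
    Thread.currentThread().getStackTrace.drop(1).find { frame =>
      !internalPrefixes.exists(frame.getClassName.startsWith)
    }
}

If such a capture runs while the Column `/` operator is being constructed, it finds SimpleApp$.main(SimpleApp.scala:9) for the app above; in spark-shell the first non-internal frame is only a synthetic wrapper like $line15.$read$$iw, which carries no useful location.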

Why are the changes needed?

To provide more user-friendly error messages.

Does this PR introduce any user-facing change?

Yes, error messages raised from Dataset API calls now include an error context identifying the API and the user call site.

How was this patch tested?

Added new UTs to QueryExecutionAnsiErrorsSuite.
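
The new tests are not shown here; as a rough sketch, assuming a QueryTest-style suite (the suite and test names below are illustrative, not the actual additions to QueryExecutionAnsiErrorsSuite), such a check could look like:

import org.apache.spark.SparkArithmeticException
import org.apache.spark.sql.QueryTest
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.test.SharedSparkSession

class DatasetErrorContextSketchSuite extends QueryTest with SharedSparkSession {
  import testImplicits._

  test("DIVIDE_BY_ZERO carries the Dataset API call site") {
    withSQLConf("spark.sql.ansi.enabled" -> "true") {
      val e = intercept[SparkArithmeticException] {
        Seq((1, 0)).toDF("a", "b").select(col("a") / col("b")).collect()
      }
      // The new context block should name the API and the user call site.
      assert(e.getMessage.contains("== Dataset =="))
      assert(e.getMessage.contains("was called from"))
    }
  }
}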

Was this patch authored or co-authored using generative AI tooling?

No.

@peter-toth changed the title from [WIP][SPARK-45022][SQL] Provide context for dataset API errors to [WIP][SPARK-45022][SQL][test-java11] Provide context for dataset API errors on Sep 5, 2023
@peter-toth changed the title from [WIP][SPARK-45022][SQL][test-java11] Provide context for dataset API errors back to [WIP][SPARK-45022][SQL] Provide context for dataset API errors on Sep 5, 2023
@MaxGekk closed this on Sep 5, 2023