
[SPARK-45826][SQL] Add a SQL config for stack traces in DataFrame query context #43695

Closed
wants to merge 6 commits

Conversation

@MaxGekk (Member) commented Nov 7, 2023

What changes were proposed in this pull request?

In the PR, I propose to add a new SQL config `spark.sql.stackTracesInDataFrameContext` which defines how many non-Spark stack traces should be captured in the DataFrame query context. By default, the config is set to 1.

Why are the changes needed?

To improve user experience with Spark SQL. When users troubleshoot an issue, they might need more stack traces in the DataFrame context. For example:

scala> spark.conf.set("spark.sql.ansi.enabled", true)
scala> spark.conf.set("spark.sql.stackTracesInDataFrameContext", 3)
scala> spark.range(1).select(lit(1) / lit(0)).collect()
org.apache.spark.SparkArithmeticException: [DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error. SQLSTATE: 22012
== DataFrame ==
"div" was called from
<init>(<console>:1)
<init>(<console>:16)
.<clinit>(<console>:1)
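To make the "how many non-Spark traces" semantics concrete, below is a minimal, hypothetical sketch of limiting a captured stack trace to N user frames; the filtering predicate and the variable names are illustrative assumptions, not the PR's actual implementation:

```scala
// Illustrative only: keep up to N non-Spark frames of the current stack trace,
// where N plays the role of spark.sql.stackTracesInDataFrameContext.
val stackTracesInDataFrameContext = 3  // assumed config value

val allFrames: Array[StackTraceElement] = Thread.currentThread().getStackTrace
val userFrames = allFrames
  .dropWhile(f => f.getClassName.startsWith("java.") ||
                  f.getClassName.startsWith("org.apache.spark"))
  .take(stackTracesInDataFrameContext)

// Print in the same "method(file:line)" form as the DataFrame context above.
userFrames.foreach(f =>
  println(s"${f.getMethodName}(${f.getFileName}:${f.getLineNumber})"))
```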

Does this PR introduce any user-facing change?

No, it doesn't change the default behaviour.

How was this patch tested?

By running the modified test suite:

$ build/sbt "test:testOnly *QueryContextSuite"

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Nov 7, 2023
@MaxGekk MaxGekk changed the title [WIP][SQL] Add a SQL config for extra traces in Origin [WIP][SPARK-45826][SQL] Add a SQL config for extra traces in Origin Nov 7, 2023
@MaxGekk MaxGekk changed the title [WIP][SPARK-45826][SQL] Add a SQL config for extra traces in Origin [SPARK-45826][SQL] Add a SQL config for extra traces in Origin Nov 8, 2023
@MaxGekk MaxGekk marked this pull request as ready for review November 8, 2023 13:44
"When it is set to 0, captured one Spark traces and a followed non-Spark trace.")
.version("4.0.0")
.intConf
.checkValue(_ >= 0, "The number of extra thread traces must be non-negative.")
Contributor

should it be > 0?

Member Author

These are extra traces; see the config description:

When it is set to 0, captured one Spark traces and a followed non-Spark trace.

For instance
[Screenshot 2023-11-01 at 21:29:18]
when it is 0, we return 4 and 5
when it is 1, we return 4, 5, 6

Contributor

I'm a bit confused.
Intuitively, I feel that 0 should represent the absence of non-Spark trace.

Member Author

Intuitively, I feel that 0 should represent the absence of non-Spark trace.

Actually, it works that way. Let me modify the config and the PR description. The `slice` method excludes the `until` index.
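For reference, a tiny illustration of that exclusive `until` bound; the starting index 4 and the baseline of two frames are taken from the comment above and are purely illustrative:

```scala
val frames = (0 to 7).map(i => s"frame$i").toArray

// slice(from, until) excludes `until`, so even extraTraces = 0 yields two frames.
def captured(extraTraces: Int): Array[String] =
  frames.slice(4, 4 + 2 + extraTraces)

captured(0)  // Array(frame4, frame5)          -> "we return 4 and 5"
captured(1)  // Array(frame4, frame5, frame6)  -> "we return 4, 5, 6"
```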

Contributor

I feel that:
default 1: this is consistent with the previous behaviour.
The value of extraOriginTraces should be consistent with the number of non-Spark traces.
Then we use > 0 as the constraint.
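For context, this is roughly how the final config definition might look after the rename in the PR title; the doc string, default, and error message below are assumptions pieced together from the diff snippet above and the PR description, not necessarily the merged code:

```scala
// Sketch only; a definition like this would live in Spark's SQLConf object.
val STACK_TRACES_IN_DATAFRAME_CONTEXT =
  buildConf("spark.sql.stackTracesInDataFrameContext")
    .doc("The number of non-Spark stack traces captured in the DataFrame query context.")
    .version("4.0.0")
    .intConf
    .checkValue(_ > 0, "The number of stack traces in the DataFrame context must be positive.")
    .createWithDefault(1)
```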

@cloud-fan (Contributor)

Can we put a real example in the PR description?

dongjoon-hyun pushed a commit that referenced this pull request Nov 12, 2023
… context

### What changes were proposed in this pull request?
In the PR, I propose to include all available stack traces of the DataFrame context in the `callSite` field and, consequently, in the `summary`. For now, the DataFrame context contains only one stack trace item, but later we'll add a config to control the number of items (see #43695).

### Why are the changes needed?
To improve the user experience with Spark SQL while debugging an issue. Users can see all available stack traces and see where in their user code the issue comes from.

### Does this PR introduce _any_ user-facing change?
No, it should not, even if the user's code parses the summary.

### How was this patch tested?
By running new test suite:
```
$ build/sbt "test:testOnly *QueryContextSuite"
```

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #43758 from MaxGekk/output-stack-trace.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
@cloud-fan (Contributor)

Can we show the impact on the real error message, instead of `val ctx = try { df.select(explode($"*")) } catch { case e: AnalysisException => e.context.head }`?

@MaxGekk MaxGekk changed the title [SPARK-45826][SQL] Add a SQL config for extra traces in Origin [SPARK-45826][SQL] Add a SQL config for stack traces in DataFrame query context Nov 25, 2023
@MaxGekk (Member Author) commented Nov 25, 2023

Can we show the impact on the real error message

@cloud-fan I added an example; please take a look at the PR.

@cloud-fan (Contributor)

@MaxGekk not quite related to this PR, but what if the expression creation is different from the df creation? Like:

val divCol = lit(1) / lit(0)
spark.range(1).select(divCol).collect()

@MaxGekk (Member Author) commented Nov 26, 2023

@cloud-fan Quite the same:

scala> val divCol = lit(1) / lit(0)
val divCol: org.apache.spark.sql.Column = `/`(1, 0)

scala> spark.range(1).select(divCol).collect()
org.apache.spark.SparkArithmeticException: [DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error. SQLSTATE: 22012
== DataFrame ==
"div" was called from
<init>(<console>:1)
<init>(<console>:15)
.<clinit>(<console>:1)

but when I create it in an object:

scala> object Obj1 {
     | val divCol = lit(1) / lit(0)
     | }
object Obj1

scala> spark.range(1).select(Obj1.divCol).collect()
org.apache.spark.SparkArithmeticException: [DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error. SQLSTATE: 22012
== DataFrame ==
"div" was called from
Obj1$.<init>(<console>:2)
Obj1$lzycompute$1(<console>:1)
Obj1(<console>:1)

@MaxGekk (Member Author) commented Nov 26, 2023

Merging to master. Thank you, @cloud-fan and @beliefer for review.

@MaxGekk MaxGekk closed this in d30c9a9 Nov 26, 2023
@beliefer (Contributor) left a comment

late LGTM.
