[AURON #1471] Add the conversion execution list in the SparkUI#2150

Open
guixiaowen wants to merge 4 commits into apache:master from guixiaowen:addConvertListSparkUINew

Conversation

Contributor

@guixiaowen guixiaowen commented Apr 1, 2026

Which issue does this PR close?

Closes #1471

Rationale for this change

Add the conversion execution list.
Display the physical execution plan of each SQL query, show which parts have been converted to the native execution plan, and present the reasons for stages that were not converted, helping users visualize the overall transformation process more intuitively.

What changes are included in this PR?

Add the conversion execution list. The core display functions of this list are described as follows:


Feature 1: Show the number of queries;
Feature 2: Show execution plans that were not converted to native, along with the reasons for the non-conversion;
Feature 3: Show the original physical execution plans;
Feature 4: Show the number of nodes converted to Auron nodes;
Feature 5: Show the number of nodes that were not converted.
Another example: the UI information can also be displayed through the Spark History Server after running the explain command.

(Screenshot: 2026-04-02 01:05:32)

Are there any user-facing changes?

How was this patch tested?

UT

@guixiaowen guixiaowen changed the title [AURON #1471] Follow #1409 Add the conversion execution list in the S… [AURON #1471] Add the conversion execution list in the SparkUI Apr 1, 2026
@cxzl25 cxzl25 requested a review from Copilot April 2, 2026 02:48

Copilot AI left a comment


Pull request overview

Adds Spark UI support for visualizing Auron native-conversion results per SQL execution (physical plan, converted vs fallback nodes, and fallback reasons), backed by new listener events and KVStore data.

Changes:

  • Emit a new AuronPlanFallbackEvent during post-columnar transitions and persist it in the UI KVStore.
  • Introduce AuronExplainUtils to render physical plans with operator IDs and collect fallback reasons.
  • Add a new “Queries” (executions) table to the Auron Spark UI page.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 8 comments.

Summary per file:

  • spark-extension/src/main/scala/org/apache/spark/sql/auron/AuronSparkSessionExtension.scala: Posts the per-execution fallback/conversion event for the UI.
  • spark-extension/src/main/scala/org/apache/spark/sql/auron/AuronExplainUtils.scala: Generates operator IDs, formats plans, and collects fallback reasons/counts.
  • spark-extension-shims-spark/src/test/scala/org/apache/spark/sql/execution/BuildInfoInSparkUISuite.scala: Adds a new test related to Spark UI conversion behavior (currently not asserting UI state).
  • auron-spark-ui/src/main/scala/org/apache/spark/sql/execution/ui/AuronSQLAppStatusStore.scala: Adds KVStore accessors for execution UI data and defines AuronSQLExecutionUIData.
  • auron-spark-ui/src/main/scala/org/apache/spark/sql/execution/ui/AuronSQLAppStatusListener.scala: Handles SQL start/end and fallback events, writes execution UI data, and implements retention cleanup.
  • auron-spark-ui/src/main/scala/org/apache/spark/sql/execution/ui/AuronAllExecutionsPage.scala: Renders the new executions list table and details expansion in the Auron tab.
  • auron-spark-ui/src/main/scala/org/apache/auron/spark/ui/AuronEvent.scala: Introduces the new listener event AuronPlanFallbackEvent.


Comment on lines +125 to +145
def collect(tmp: QueryPlan[_]): Unit = {
tmp.foreachUp {
case p: ExecutedCommandExec =>
handleVanillaSparkPlan(p, fallbackNodeToReason)
case p: AdaptiveSparkPlanExec =>
handleVanillaSparkPlan(p, fallbackNodeToReason)
collect(p.executedPlan)
case p: QueryStageExec =>
handleVanillaSparkPlan(p, fallbackNodeToReason)
collect(p.plan)
case p: NativeSupports =>
numAuronNodes += 1
p.innerChildren.foreach(collect)
case p: SparkPlan =>
handleVanillaSparkPlan(p, fallbackNodeToReason)
p.innerChildren.foreach(collect)
case _ =>
}
}

collect(plan)

Copilot AI Apr 2, 2026


collectFallbackNodes traverses the plan multiple times because it uses foreachUp (which already walks the subtree) and also recursively calls collect(...) on children/subplans inside the match arms. This will over-count numAuronNodes and can duplicate fallback collection work (and can degrade significantly on large plans). Refactor to do a single traversal (e.g., one foreachUp/foreach walk) and handle adaptive/query-stage subplans without re-traversing already-visited nodes (use a visited identity set if needed).

Suggested change
def collect(tmp: QueryPlan[_]): Unit = {
tmp.foreachUp {
case p: ExecutedCommandExec =>
handleVanillaSparkPlan(p, fallbackNodeToReason)
case p: AdaptiveSparkPlanExec =>
handleVanillaSparkPlan(p, fallbackNodeToReason)
collect(p.executedPlan)
case p: QueryStageExec =>
handleVanillaSparkPlan(p, fallbackNodeToReason)
collect(p.plan)
case p: NativeSupports =>
numAuronNodes += 1
p.innerChildren.foreach(collect)
case p: SparkPlan =>
handleVanillaSparkPlan(p, fallbackNodeToReason)
p.innerChildren.foreach(collect)
case _ =>
}
}
collect(plan)
// Use an identity-based visited set to avoid re-traversing shared subplans
val visited =
  java.util.Collections.newSetFromMap[QueryPlan[_]](
    new java.util.IdentityHashMap[QueryPlan[_], java.lang.Boolean]())
def traverse(root: QueryPlan[_]): Unit = {
// Skip traversal if we've already seen this plan instance
if (!visited.add(root)) {
return
}
root.foreachUp {
case p: ExecutedCommandExec =>
handleVanillaSparkPlan(p, fallbackNodeToReason)
case p: AdaptiveSparkPlanExec =>
handleVanillaSparkPlan(p, fallbackNodeToReason)
// Traverse the executed plan as a separate root, guarding with visited set
traverse(p.executedPlan)
case p: QueryStageExec =>
handleVanillaSparkPlan(p, fallbackNodeToReason)
// Traverse the underlying stage plan as a separate root, guarding with visited set
traverse(p.plan)
case p: NativeSupports =>
numAuronNodes += 1
// innerChildren are not visited by foreachUp, so traverse them explicitly
p.innerChildren.foreach(traverse)
case p: SparkPlan =>
handleVanillaSparkPlan(p, fallbackNodeToReason)
p.innerChildren.foreach(traverse)
case _ =>
}
}
traverse(plan)

package org.apache.spark.sql.execution.ui

import scala.jdk.CollectionConverters.asScalaIteratorConverter

Copilot AI Apr 2, 2026


scala.jdk.CollectionConverters is Scala 2.13+ only. This repo appears to support Spark 3.x as well (typically Scala 2.12), so this import can break compilation for Spark 3 builds. Prefer scala.collection.JavaConverters._ / JavaConverters.asScalaIteratorConverter in Scala 2.12, or introduce a small compatibility shim (or Spark-version-specific source) to select the right converters per build target.

Suggested change
import scala.jdk.CollectionConverters.asScalaIteratorConverter
import scala.collection.JavaConverters._
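As a small self-contained sketch of the cross-version approach this comment suggests (assuming a Scala 2.12 build target; `ConvertersDemo` is a hypothetical name, not code from this PR):

```scala
// Works on Scala 2.12 and on 2.13, where it is deprecated in favour of
// scala.jdk.CollectionConverters but still available.
import scala.collection.JavaConverters._

object ConvertersDemo {
  // asScala on a java.util.Iterator comes from JavaConverters in 2.12.
  def toScalaList(it: java.util.Iterator[String]): List[String] =
    it.asScala.toList
}
```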

Comment on lines +40 to +45
UIUtils.headerSparkPage(
request,
"Auron",
buildInfoSummary(sqlStore.buildInfo()) ++
buildExecutionsListSummary(sqlStore.executionsList(), request),
parent)

Copilot AI Apr 2, 2026


The page currently calls sqlStore.executionsList() and passes the full result into the table, and the datasource later sorts that full in-memory sequence. For applications with many SQL executions this can make the Auron tab slow and memory-heavy. Prefer a KVStore-backed paged datasource: use executionsCount() for the total, and fetch only the requested page via executionsList(offset, length) (and ideally push sorting down to the KVStore view when possible), mirroring Spark’s own UI pattern.
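A minimal sketch of the page-window arithmetic such a paged datasource would rely on (names hypothetical; Spark's own paged tables compute an equivalent offset internally):

```scala
object PageWindow {
  // Hypothetical helper: compute the (offset, length) slice for one page so
  // that only that window needs to be fetched, e.g. via
  // executionsList(offset, length), instead of the full in-memory list.
  def window(page: Int, pageSize: Int, total: Int): (Int, Int) = {
    require(page >= 1 && pageSize > 0, "page is 1-based, pageSize positive")
    val offset = (page - 1) * pageSize
    // Clamp so the last (possibly partial) page never reads past `total`.
    val length = math.max(0, math.min(pageSize, total - offset))
    (offset, length)
  }
}
```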

Comment on lines +387 to +414
private def descriptionCell(execution: AuronSQLExecutionUIData): Seq[Node] = {
val details = if (execution.description != null && execution.description.nonEmpty) {
val concat = new PlanStringConcat()
concat.append("== Fallback Summary ==\n")
val fallbackSummary = execution.fallbackNodeToReason
.map { case (name, reason) =>
val id = name.substring(0, 3)
val nodeName = name.substring(4)
s"(${id.toInt}) $nodeName: $reason"
}
.mkString("\n")
concat.append(fallbackSummary)
if (execution.fallbackNodeToReason.isEmpty) {
concat.append("No fallback nodes")
}
concat.append("\n\n")
concat.append(execution.fallbackDescription)

<span onclick="this.parentNode.querySelector('.stage-details').classList.toggle('collapsed')"
class="expand-details">
+details
</span> ++
<div class="stage-details collapsed">
<pre>{concat.toString()}</pre>
</div>
} else {
Nil
}

Copilot AI Apr 2, 2026


The “+details” expansion is currently hidden whenever execution.description is empty/null, even though fallback details (reasons/plan text) may still exist and would be useful. Gate details on the presence of fallback/plan content (e.g., fallbackDescription / fallbackNodeToReason.nonEmpty) instead of description.
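A sketch of the suggested gate, using a stand-in row type in place of `AuronSQLExecutionUIData` (field names taken from the snippet above; the case class itself is illustrative only):

```scala
// Stand-in for the UI data row, keeping only the fields the gate needs.
case class ExecRow(
    description: String,
    fallbackDescription: String,
    fallbackNodeToReason: Map[String, String])

object DetailsGate {
  // Show "+details" whenever there is fallback or plan content to reveal,
  // rather than only when description is non-empty.
  def hasDetails(e: ExecRow): Boolean =
    e.fallbackNodeToReason.nonEmpty ||
      Option(e.fallbackDescription).exists(_.nonEmpty)
}
```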

Comment on lines +457 to +463
val ordering: Ordering[AuronExecutionTableRowData] = sortColumn match {
case "ID" => Ordering.by(_.executionUIData.executionId)
case "Description" => Ordering.by(_.executionUIData.fallbackDescription)
case "Num Auron Nodes" => Ordering.by(_.executionUIData.numAuronNodes)
case "Num Fallback Nodes" => Ordering.by(_.executionUIData.numFallbackNodes)
case unknownColumn => throw new IllegalArgumentException(s"Unknown column: $unknownColumn")
}

Copilot AI Apr 2, 2026


Sorting by the “Description” column uses fallbackDescription (physical plan text) rather than executionUIData.description. This makes sorting behavior inconsistent with the displayed column label/content. Update the ordering for "Description" to use the description field.
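The one-line fix, shown against stand-in row types (field names taken from the snippet above; the case classes are illustrative, not the PR's actual types):

```scala
case class UIData(description: String, fallbackDescription: String)
case class TableRow(executionUIData: UIData)

object RowOrdering {
  // "Description" now sorts by the description field, so the sort order
  // matches what the column actually displays.
  val byDescription: Ordering[TableRow] =
    Ordering.by(_.executionUIData.description)
}
```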

def getAuronParameterOtherTable(request: HttpServletRequest, tableTag: String): String = {
request.getParameterMap.asScala
.filterNot(_._1.startsWith(tableTag))
.map(parameter => parameter._1 + "=" + parameter._2(0))

Copilot AI Apr 2, 2026


Query parameter values are concatenated into URLs without URL-encoding. This can break navigation when values contain &, =, spaces, or other reserved characters. Encode both keys and values (or use a Spark UI helper if available) when rebuilding the query string.

Suggested change
.map(parameter => parameter._1 + "=" + parameter._2(0))
.map { case (key, values) =>
// requires: import java.net.URLEncoder and java.nio.charset.StandardCharsets.UTF_8
val encodedKey = URLEncoder.encode(key, UTF_8.name())
val value = if (values != null && values.nonEmpty) values(0) else ""
val encodedValue = URLEncoder.encode(value, UTF_8.name())
s"$encodedKey=$encodedValue"
}

Comment on lines +52 to +59
test("test convert table in spark UI ") {
withTable("t1") {
sql(
"create table t1 using parquet PARTITIONED BY (part) as select 1 as c1, 2 as c2, 'test test' as part")
val df = sql("select * from t1")
checkAnswer(df, Seq(Row(1, 2, "test test")))
}
}

Copilot AI Apr 2, 2026


This test exercises table creation/querying but does not assert the new Spark UI functionality (that AuronPlanFallbackEvent is posted and persisted as AuronSQLExecutionUIData, and that counts/reasons/plan text are stored). Add assertions against the AuronSQLAppStatusListener/KVStore (or AuronSQLAppStatusStore) to verify at least one execution record is written with expected fields (executionId, num nodes, fallback map/plan description).

Comment on lines +107 to +127
if (SparkEnv.get.conf
.get(SparkAuronConfiguration.UI_ENABLED.key, "true")
.equals("true")) {
val sc = sparkSession.sparkContext
val executionId = sc.getLocalProperty(SQLExecution.EXECUTION_ID_KEY)
if (executionId == null) {
logDebug(s"Unknown execution id for plan: $sparkPlan")
return sparkPlan
}
val concat = new PlanStringConcat()
concat.append("== Physical Plan ==\n")

val (numAuronNodes, fallbackNodeToReason) =
AuronExplainUtils.processPlan(sparkPlan, concat.append)

val event = AuronPlanFallbackEvent(
executionId.toLong,
numAuronNodes,
fallbackNodeToReason.size,
concat.toString(),
fallbackNodeToReason)

Copilot AI Apr 2, 2026


Two issues here can cause unexpected failures/behavior: (1) UI enablement is checked via a case-sensitive string comparison; prefer a boolean conf read (or a safer parse) so values like TRUE / true behave consistently. (2) executionId.toLong can throw if the local property is non-numeric; guard the parse and skip posting the event (with debug logging) if it cannot be parsed.
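A sketch of both guards (object and method names are hypothetical, not from the PR):

```scala
import scala.util.Try

object UiGuards {
  // (1) Case-insensitive parse of the UI-enabled flag; a missing value falls
  // back to enabled, mirroring the existing .get(key, "true") default, but
  // values like "TRUE" now behave the same as "true".
  def uiEnabled(raw: String): Boolean =
    Option(raw).forall(_.trim.equalsIgnoreCase("true"))

  // (2) Defensive executionId parse: None means "skip posting the event"
  // instead of throwing NumberFormatException on a non-numeric property.
  def parseExecutionId(raw: String): Option[Long] =
    Option(raw).flatMap(s => Try(s.toLong).toOption)
}
```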
