[SPARK-56587][SQL] Show table names for V2 write nodes in UI #55510

Closed
wangyum wants to merge 2 commits into apache:master from wangyum:SPARK-56587


wangyum (Member) commented Apr 23, 2026

What changes were proposed in this pull request?

This PR modifies the physical execution nodes for DataSourceV2 write operations (such as AppendDataExec, OverwriteByExpressionExec, OverwritePartitionsDynamicExec, ReplaceDataExec, and WriteDeltaExec) to accept and store the destination tableName. It then updates the nodeName property in the base V2ExistingTableWriteExec trait to include this table name in its output.
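The shape of this change can be sketched in a few lines. This is a minimal, self-contained illustration and not the actual Spark source: `PlanNodeLike` and `V2ExistingTableWriteExecLike` are hypothetical stand-ins for Spark's plan node base class and the `V2ExistingTableWriteExec` trait, and the case classes only borrow the names of the real exec nodes. The default name is assumed to be the class name with the `Exec` suffix removed.

```scala
// Simplified stand-ins for illustration only; these are NOT Spark's real classes.
trait PlanNodeLike extends Product {
  // Assumed default: the case class name with the "Exec" suffix removed.
  def nodeName: String = productPrefix.stripSuffix("Exec")
}

// Hypothetical counterpart of the base trait for writes to an existing V2 table.
trait V2ExistingTableWriteExecLike extends PlanNodeLike {
  // Destination table, passed in by the planner when the node is created.
  def tableName: String

  // Append the table name to the generic node name.
  override def nodeName: String = s"${super.nodeName} $tableName"
}

// Each write node accepts and stores the destination table name.
final case class AppendDataExec(tableName: String) extends V2ExistingTableWriteExecLike
final case class ReplaceDataExec(tableName: String) extends V2ExistingTableWriteExecLike
```

With these definitions, `AppendDataExec("catalog.namespace.table_name").nodeName` evaluates to `AppendData catalog.namespace.table_name`.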

Why are the changes needed?

To improve observability and debuggability in the Spark SQL UI and Explain plans. Previously, V2 write nodes were displayed generically (e.g., AppendData). With this change, the UI will explicitly show the context of the write operation (e.g., AppendData catalog.namespace.table_name), making it much easier for users to understand which tables are being modified.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit test and manual test:
[Image: manual test screenshot]

Was this patch authored or co-authored using generative AI tooling?

No.

wangyum added 2 commits April 23, 2026 16:31
Pass relation names into V2 write execs and render them from V2ExistingTableWriteExec.nodeName so Append, Overwrite, dynamic overwrite, ReplaceData, and WriteDelta all display table context consistently in SQL UI.
  write: Write,
- rowLevelCommand: RowLevelOperation.Command) extends RowLevelWriteExec {
+ rowLevelCommand: RowLevelOperation.Command,
+ tableName: String) extends RowLevelWriteExec {

I wonder if we should define a method in o.a.s.sql.connector.write.Write as a contract that lets the connector report the table name, or anything else it wants displayed as part of the node name. That way we could display more, e.g., ReplaceData iceberg cat.ns.t (CoW)
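One way to picture this suggestion is the sketch below. Everything in it is hypothetical: `WriteLike` stands in for the connector-side write contract, `nodeDescription` is an invented method name, and `CopyOnWriteLike` is an invented connector implementation. None of these are confirmed Spark or connector APIs.

```scala
// Hypothetical stand-in for a connector write contract; not Spark's Write interface.
trait WriteLike {
  // Hypothetical hook: extra text the connector wants shown in the node name.
  // Defaults to None so existing connectors would keep the generic name.
  def nodeDescription: Option[String] = None
}

// Hypothetical connector implementation reporting format, table, and write mode.
final class CopyOnWriteLike(table: String) extends WriteLike {
  override def nodeDescription: Option[String] = Some(s"iceberg $table (CoW)")
}

object NodeNames {
  // Combine the generic operation name with whatever the connector reports.
  def render(operation: String, write: WriteLike): String =
    write.nodeDescription.fold(operation)(d => s"$operation $d")
}
```

Here `NodeNames.render("ReplaceData", new CopyOnWriteLike("cat.ns.t"))` yields `ReplaceData iceberg cat.ns.t (CoW)`, while a connector that does not override the hook falls back to plain `ReplaceData`. A default of `None` is one possible design choice that would keep the hook backward compatible.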

@aokolnychyi do you have any suggestions?

pan3793 (Member) commented Apr 24, 2026

Maybe a similar API is needed for batch scan. For built-in file-based tables the node name is Scan parquet cat.db.t, whereas the format (e.g., iceberg) is missing from the batch scan node name.

dongjoon-hyun (Member) left a comment

+1, LGTM. Thank you, @wangyum and @pan3793.

I'll merge this for Apache Spark 4.2.0.

dongjoon-hyun (Member) commented

cc @yaooqinn for SPARK-55760 Spark Web UI Modernization, too. I added this as a subtask of SPARK-55760.

yaooqinn (Member) commented May 6, 2026

Late LGTM
