[WIP][SPARK-56551][SQL] Add operation metrics for metadata-only DELETE queries in DSv2 #55430

Draft

ZiyaZa wants to merge 23 commits into apache:master from ZiyaZa:dsv2-metadata-delete-metrics

Conversation

@ZiyaZa (Contributor) commented Apr 20, 2026

What changes were proposed in this pull request?

Added a numDeletedRows metric for metadata-only DELETE queries in DSv2.
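
For illustration, a minimal sketch of the kind of statement this affects (the catalog, namespace, and predicate are made up; any DSv2 table whose connector can satisfy the predicate from metadata alone would qualify):

// Hypothetical usage; "spark" is a SparkSession. After this PR, the DELETE
// plan node for a metadata-only delete like this reports numDeletedRows.
spark.sql("DELETE FROM my_catalog.db.events WHERE day = '2026-04-01'");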

Why are the changes needed?

For better visibility into what happened as a result of a DELETE query.

Does this PR introduce any user-facing change?

Yes. Metadata-only DELETE queries now report a numDeletedRows metric on the DELETE plan node.

How was this patch tested?

Added metric value validation to most DELETE unit tests.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.7

ZiyaZa added 23 commits April 1, 2026 17:29
# Conflicts:
#	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteRowLevelCommand.scala
#	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteUpdateTable.scala
#	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/RowDeltaUtils.scala
#	sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Exec.scala
#	sql/core/src/test/scala/org/apache/spark/sql/connector/DeltaBasedNoMetadataUpdateTableSuite.scala
/**
 * Returns an array of supported custom metrics with name and description.
 * By default it returns an empty array.
 */
default CustomMetric[] supportedCustomMetrics() {
  return new CustomMetric[]{};
}
Contributor

Are these the metrics that will be routed to the output of the commands?

Contributor Author

Not to the output, just to plan node metrics. I took a similar approach to what we already had in a couple of places (hence the refactored code in this PR). This allows connectors to expose any metrics they like, and those will be visible in the plan node.

As for the output of the command, Spark will only use the metrics it understands and expects. In this case, only the numDeletedRows metric is to be exposed as command output, and for this, connectors need to use NumDeletedRowsMetric.
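
A minimal connector-side sketch of what this enables (the wrapper class and the package of NumDeletedRowsMetric are assumptions for illustration; only NumDeletedRowsMetric itself is part of this PR):

import org.apache.spark.sql.connector.metric.CustomMetric;
// NumDeletedRowsMetric is the class added in this PR; its package is
// assumed here to be org.apache.spark.sql.connector.metric.
import org.apache.spark.sql.connector.metric.NumDeletedRowsMetric;

// Hypothetical connector code: declaring NumDeletedRowsMetric among the
// supported metrics lets Spark match the value the connector reports by
// name and surface it on the DELETE plan node.
class MetadataDeleteMetrics {
  static CustomMetric[] supportedCustomMetrics() {
    return new CustomMetric[] { new NumDeletedRowsMetric() };
  }
}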

 * @since 4.2.0
 */
@Evolving
public class NumDeletedRowsMetric extends CustomSumMetric {
  @Override
  public String name() { return "numDeletedRows"; }
  // description() elided in the quoted hunk
}
@aokolnychyi (Contributor) commented Apr 20, 2026

I am not sure I fully understand this.

I thought the goal was to allow connectors to expose a set of metrics (whatever those may be) as output without defining any specific custom metrics on the Spark side?

Contributor Author

Connectors can still expose a set of metrics; this is just one that Spark understands and will expose as command output. Spark needs to specify the name of the metric somehow for all connectors to use; otherwise each connector could come up with its own naming scheme and Spark wouldn't know what to look for.

Here we could just have a String constant somewhere to store the expected metric name, numDeletedRows. Instead, I went with this class definition because it seems all connectors will want to use some class like this. But if I misunderstood, I can replace it with just a String constant.
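
Sketched out, the two options discussed here would look roughly as follows (hypothetical names; only the NumDeletedRowsMetric class quoted above is actually proposed in this PR):

// Option A: a bare String constant that every connector's metric name
// must match exactly. (Hypothetical, not in this PR.)
public final class RowLevelOperationMetrics {
  private RowLevelOperationMetrics() {}
  public static final String NUM_DELETED_ROWS = "numDeletedRows";
}

// Option B (this PR): a shared CustomSumMetric subclass, so connectors
// pick up the agreed-on name, description, and SUM aggregation together
// by instantiating one class.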

Member

Hm, yeah, I am also wondering: does this particular metric need to flow to Spark?

The connector knows how many rows were removed by a metadata-only delete and can just persist that in its summary. Is it needed in Spark somewhere?

Contributor Author

Spark will need to send this value to the command output when we implement that functionality.

Member

got it, thanks
