[WIP][SPARK-56551][SQL] Add operation metrics for metadata-only DELETE queries in DSv2 #55430
ZiyaZa wants to merge 23 commits into apache:master from
Conversation
…e operation column
# Conflicts:
#	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteRowLevelCommand.scala
#	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteUpdateTable.scala
#	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/RowDeltaUtils.scala
#	sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Exec.scala
# Conflicts:
#	sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Exec.scala
#	sql/core/src/test/scala/org/apache/spark/sql/connector/DeltaBasedNoMetadataUpdateTableSuite.scala
 * Returns an array of supported custom metrics with name and description.
 * By default it returns empty array.
 */
default CustomMetric[] supportedCustomMetrics() {
Are these the metrics that will be routed to output of the commands?
Not to the output, just to the plan node metrics. I took a similar approach to what we already had in a couple of places (hence the refactored code in this PR). This allows connectors to expose any metrics that will be visible in the plan node.
As for the output of the command, Spark will only use the metrics it understands and expects. In this case, we have only the numDeletedRows metric to be exposed as command output, and for this, connectors need to use NumDeletedRowsMetric.
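To illustrate the approach described above, here is a minimal sketch of a connector exposing plan-node metrics. The `CustomMetric` interface below is a simplified stand-in for Spark's `org.apache.spark.sql.connector.metric.CustomMetric` (the real interface lives in Spark's connector API), and the class name `MetadataDeleteWrite` and the metric name `numMetadataDeletedFiles` are hypothetical, not from this PR:

```java
// Simplified stand-in for org.apache.spark.sql.connector.metric.CustomMetric;
// the real Spark interface declares these same three methods.
interface CustomMetric {
    String name();
    String description();
    String aggregateTaskMetrics(long[] taskMetrics);
}

// Hypothetical connector-side node overriding supportedCustomMetrics() so the
// metric shows up on the plan node. The metric name here is illustrative only.
class MetadataDeleteWrite {
    public CustomMetric[] supportedCustomMetrics() {
        return new CustomMetric[] {
            new CustomMetric() {
                public String name() { return "numMetadataDeletedFiles"; }
                public String description() { return "files removed by metadata-only delete"; }
                public String aggregateTaskMetrics(long[] taskMetrics) {
                    // Sum per-task values into a single displayed value.
                    long sum = 0;
                    for (long v : taskMetrics) sum += v;
                    return String.valueOf(sum);
                }
            }
        };
    }
}
```

Any metric returned this way is visible in the plan node UI, but Spark only promotes the specific metrics it recognizes (such as numDeletedRows) to command output.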
 * @since 4.2.0
 */
@Evolving
public class NumDeletedRowsMetric extends CustomSumMetric {
I am not sure I fully understand this.
I thought the goal was to allow connectors to expose a set of metrics (whatever those may be) as output without defining any specific custom metrics on the Spark side?
Connectors can still expose a set of metrics; this is just one that Spark understands and will expose as command output. Spark needs to specify the name of the metric somehow for all connectors to use; otherwise each connector could come up with its own naming scheme and Spark wouldn't know what to look for.
Here we could just have a String constant somewhere to store the expected metric name numDeletedRows. Instead, I went with this class definition because it seems all connectors will want to use some class like this. But if I misunderstood, I can replace it with just a String constant.
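For comparison, a minimal sketch of the two options discussed above. The `CustomMetric` and `CustomSumMetric` types below are simplified stand-ins mirroring Spark's `org.apache.spark.sql.connector.metric` classes (exact signatures may differ), and `MetricNames` is a hypothetical holder for the String-constant alternative:

```java
// Simplified stand-ins for Spark's DSv2 metric interfaces.
interface CustomMetric {
    String name();
    String description();
    String aggregateTaskMetrics(long[] taskMetrics);
}

abstract class CustomSumMetric implements CustomMetric {
    // Like Spark's CustomSumMetric: aggregate per-task values by summing them.
    public String aggregateTaskMetrics(long[] taskMetrics) {
        long sum = 0;
        for (long v : taskMetrics) sum += v;
        return String.valueOf(sum);
    }
}

// Option 1: a shared class, so every connector reports the same metric name
// and inherits the sum aggregation for free.
class NumDeletedRowsMetric extends CustomSumMetric {
    public String name() { return "numDeletedRows"; }
    public String description() { return "number of deleted rows"; }
}

// Option 2 (hypothetical): just a String constant with the expected name;
// each connector would then implement its own CustomMetric around it.
final class MetricNames {
    static final String NUM_DELETED_ROWS = "numDeletedRows";
    private MetricNames() {}
}
```

The class-based option couples the agreed-upon name with the aggregation behavior, whereas the constant-only option is lighter but leaves every connector to reimplement the metric itself.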
hm, yeah, I am also wondering: does this particular metric need to flow to Spark?
The connector knows how many rows were removed by a metadata-only delete and can just persist it in the summary? Is it needed in Spark somewhere?
Spark will need to send this value to the command output when we implement that functionality.
What changes were proposed in this pull request?
Added a numDeletedRows metric for metadata-only DELETEs.
Why are the changes needed?
For better visibility into what happened as a result of a DELETE query.
Does this PR introduce any user-facing change?
Yes, metadata-only DELETE queries now report a numDeletedRows metric.
How was this patch tested?
Added metric value validation to most DELETE unit tests.
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Opus 4.7