Skip to content

[SPARK-56680][SQL] DSv2 INSERT and Insert-Only MERGE Metrics#55586

Open
ZiyaZa wants to merge 10 commits intoapache:masterfrom
ZiyaZa:insert-only-merge-metrics
Open

[SPARK-56680][SQL] DSv2 INSERT and Insert-Only MERGE Metrics#55586
ZiyaZa wants to merge 10 commits intoapache:masterfrom
ZiyaZa:insert-only-merge-metrics

Conversation

@ZiyaZa
Copy link
Copy Markdown
Contributor

@ZiyaZa ZiyaZa commented Apr 28, 2026

What changes were proposed in this pull request?

MERGE and INSERT / write metrics for DSv2.

This PR only handles pure-inserts. Overwrites will be handled in a future PR, as those require some values from the connector.

Why are the changes needed?

For better visibility into what happened as a result of DML queries.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added metric verification to existing tests.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.7

ZiyaZa added 8 commits April 28, 2026 09:06
# Conflicts:
#	sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala
#	sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Exec.scala
# Conflicts:
#	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala
#	sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Exec.scala
@ZiyaZa ZiyaZa changed the title [WIP][SQL] DSv2 INSERT and Insert-Only MERGE Metrics [SPARK-56680][SQL] DSv2 INSERT and Insert-Only MERGE Metrics Apr 30, 2026
@ZiyaZa ZiyaZa marked this pull request as ready for review April 30, 2026 17:33
ZiyaZa added 2 commits May 1, 2026 15:48
# Conflicts:
#	sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Exec.scala
Copy link
Copy Markdown
Contributor

@aokolnychyi aokolnychyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few questions but looks good otherwise.

/**
* Returns the number of output rows, or -1 if not found.
*/
long numOutputRows();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would calling this numInsertedRows be more consistent with the rest of summaries?

write: Option[Write] = None,
analyzedQuery: Option[LogicalPlan] = None) extends V2WriteCommand with TransactionalWrite {
analyzedQuery: Option[LogicalPlan] = None,
rowLevelCommand: Option[RowLevelOperation.Command] = None)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you feel about adding a new node like InsertOnlyMerge and InsertOnlyMergeExec instead of making AppendData that is heavily used aware of the row-level operation?

I vibecoded it locally and it is fairly easy.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel strongly here. More like 60/40 to have a new node, to be honest.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants