[SPARK-48397][SQL] Add data write time metric to FileFormatDataWriter #46714
What changes were proposed in this pull request?
For FileFormatDataWriter we currently record the "task commit time" and "job commit time" metrics in org.apache.spark.sql.execution.datasources.BasicWriteJobStatsTracker#metrics. This PR additionally records the time spent on the data write itself (together with the time spent producing records from the input iterator), which is usually one of the major parts of the total duration of a write operation.
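Below is a minimal, illustrative sketch (not the PR's actual code) of how a per-task data write time could be accumulated around each record write and then reported in milliseconds, in the same spirit as the existing task/job commit time metrics. The names `DataWriteTimeSketch`, `dataWriteTimeNs`, and `writeRecord` are hypothetical placeholders.

```scala
// Illustrative sketch only: accumulate time spent writing rows and report it
// as a timing metric, similar in spirit to the existing commit-time metrics.
object DataWriteTimeSketch {
  def main(args: Array[String]): Unit = {
    var dataWriteTimeNs = 0L // accumulated nanoseconds spent writing rows

    // Simulate producing records from an iterator and writing each one.
    val records = Iterator.tabulate(1000)(i => s"row-$i")
    records.foreach { record =>
      val start = System.nanoTime()
      writeRecord(record) // stand-in for the actual file writer call
      dataWriteTimeNs += System.nanoTime() - start
    }

    // Timing metrics are typically surfaced in milliseconds.
    println(s"data write time: ${dataWriteTimeNs / 1000000} ms")
  }

  // Placeholder for the real OutputWriter.write(row) call.
  private def writeRecord(record: String): Unit = ()
}
```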
Why are the changes needed?
We find the write duration very helpful for identifying bottlenecks and time skew during data writes, and it also helps with general performance tuning.
Does this PR introduce any user-facing change?
Yes, in the SQL page of the Spark History Server (and live UI), a new "data write time" metric is shown on the data write command/operation nodes. For example, an InsertIntoHadoopFsRelationCommand node shows the newly added data write time metric.
How was this patch tested?
A unit test case and manual tests.
Was this patch authored or co-authored using generative AI tooling?
No