[SPARK-26327][SQL][FOLLOW-UP] Refactor the code and restore the metrics name#23328
[SPARK-26327][SQL][FOLLOW-UP] Refactor the code and restore the metrics name#23328gatorsmile wants to merge 3 commits intoapache:masterfrom
Conversation
| } | ||
|
|
||
| /** | ||
| * Send the updated metrics to driver, while this function calling, selectedPartitions has |
| Map("numOutputRows" -> SQLMetrics.createMetric(sparkContext, "number of output rows"), | ||
| "numFiles" -> SQLMetrics.createMetric(sparkContext, "number of files"), | ||
| "fileListingTime" -> SQLMetrics.createMetric(sparkContext, "file listing time (ms)"), | ||
| "metadataTime" -> SQLMetrics.createMetric(sparkContext, "metadata time"), |
There was a problem hiding this comment.
the original name is more straightfoward to end users who has no idea about file listing.
There was a problem hiding this comment.
I'm not sure we will add more metadata operation and reuse this metrics. Anyway it was not a good idea to do the renaming in a bug fix PR. @xuanyuanking can you create a ticket and send a new PR for renaming?
There was a problem hiding this comment.
Copy that, original thinking is this bug fix is part of https://issues.apache.org/jira/browse/SPARK-26222.
| case _ => | ||
| createNonBucketedReadRDD(readFile, selectedPartitions, relation) | ||
| } | ||
| sendDriverMetrics() |
There was a problem hiding this comment.
Line 313 and line 315, both are calling selectedPartitions. Thus, it is safer to say selectedPartitions is initialized before we send the driver-side metrics
|
LGTM |
|
Test build #100208 has finished for PR 23328 at commit
|
|
Test build #100209 has finished for PR 23328 at commit
|
| val optimizerMetadataTimeNs = relation.location.metadataOpsTimeNs.getOrElse(0L) | ||
| val startTime = System.nanoTime() | ||
| val ret = relation.location.listFiles(partitionFilters, dataFilters) | ||
| driverMetrics("filesNum") = ret.map(_.files.size.toLong).sum |
|
Test build #100217 has finished for PR 23328 at commit
|
|
Thanks! Merged to master. |
…cs name ## What changes were proposed in this pull request? - The original comment about `updateDriverMetrics` is not right. - Refactor the code to ensure `selectedPartitions ` has been set before sending the driver-side metrics. - Restore the original name, which is more general and extendable. ## How was this patch tested? The existing tests. Closes apache#23328 from gatorsmile/followupSpark-26142. Authored-by: gatorsmile <gatorsmile@gmail.com> Signed-off-by: gatorsmile <gatorsmile@gmail.com>
…cs name ## What changes were proposed in this pull request? - The original comment about `updateDriverMetrics` is not right. - Refactor the code to ensure `selectedPartitions ` has been set before sending the driver-side metrics. - Restore the original name, which is more general and extendable. ## How was this patch tested? The existing tests. Closes apache#23328 from gatorsmile/followupSpark-26142. Authored-by: gatorsmile <gatorsmile@gmail.com> Signed-off-by: gatorsmile <gatorsmile@gmail.com>
What changes were proposed in this pull request?
updateDriverMetricsis not right.selectedPartitionshas been set before sending the driver-side metrics.How was this patch tested?
The existing tests.