
[SPARK-25674][FOLLOW-UP] Update the stats for each ColumnarBatch

This PR is a follow-up of #22594. This alternative avoids the unneeded computation in the hot code path.

- For row-based scans, we keep the original approach.
- For columnar scans, we only need to update the stats after each batch.

N/A
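The two code paths above can be sketched as follows. This is a minimal, self-contained sketch, not Spark's actual `InputMetrics`: the `Metrics` class, `refreshBytesRead`, and the `updateInterval` constant are hypothetical stand-ins for the real machinery in `FileScanRDD`.

```scala
// Hypothetical stand-in for Spark's InputMetrics: tracks records read and
// counts how often the (expensive) bytes-read refresh fires.
class Metrics(updateInterval: Long = 1000L) {
  var recordsRead: Long = 0L
  var bytesReadUpdates: Long = 0L

  // Stands in for inputMetrics.setBytesRead(existingBytesRead + callback()).
  private def refreshBytesRead(): Unit = bytesReadUpdates += 1

  // Columnar path: one cheap refresh per batch, then bump by the batch size.
  def recordBatch(numRows: Int): Unit = {
    refreshBytesRead()
    recordsRead += numRows
  }

  // Row path: refresh only every `updateInterval` records, keeping the
  // per-record hot path cheap.
  def recordRow(): Unit = {
    if (recordsRead % updateInterval == 0) refreshBytesRead()
    recordsRead += 1
  }
}

object Demo extends App {
  val m = new Metrics(updateInterval = 100L)
  (1 to 250).foreach(_ => m.recordRow()) // refresh fires at counts 0, 100, 200
  println(m.bytesReadUpdates)            // 3 refreshes for 250 rows
  m.recordBatch(4096)                    // exactly one refresh for the batch
  println(m.recordsRead)                 // 4346
}
```

The point of the change: a `ColumnarBatch` already amortizes the cost over thousands of rows, so one refresh per batch is cheap, while the row path still needs the interval throttle.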

Closes #22731 from gatorsmile/udpateStatsFileScanRDD.

Authored-by: gatorsmile <gatorsmile@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 4cee191)
Signed-off-by: Sean Owen <sean.owen@databricks.com>
gatorsmile authored and srowen committed Oct 16, 2018
1 parent d87896b commit 0726bc56fce83c3ec30cfbb6c12dfcd68a85cd0f
@@ -85,7 +85,7 @@ class FileScanRDD(
     // If we do a coalesce, however, we are likely to compute multiple partitions in the same
     // task and in the same thread, in which case we need to avoid override values written by
     // previous partitions (SPARK-13071).
-    private def updateBytesRead(): Unit = {
+    private def incTaskInputMetricsBytesRead(): Unit = {
       inputMetrics.setBytesRead(existingBytesRead + getBytesReadCallback())
     }

@@ -114,15 +114,16 @@ class FileScanRDD(
     // don't need to run this `if` for every record.
-    val preNumRecordsRead = inputMetrics.recordsRead
     if (nextElement.isInstanceOf[ColumnarBatch]) {
+      incTaskInputMetricsBytesRead()
       inputMetrics.incRecordsRead(nextElement.asInstanceOf[ColumnarBatch].numRows())
     } else {
+      // too costly to update every record
+      if (inputMetrics.recordsRead %
+          SparkHadoopUtil.UPDATE_INPUT_METRICS_INTERVAL_RECORDS == 0) {
+        incTaskInputMetricsBytesRead()
+      }
       inputMetrics.incRecordsRead(1)
     }
-    // The records may be incremented by more than 1 at a time.
-    if (preNumRecordsRead / SparkHadoopUtil.UPDATE_INPUT_METRICS_INTERVAL_RECORDS !=
-        inputMetrics.recordsRead / SparkHadoopUtil.UPDATE_INPUT_METRICS_INTERVAL_RECORDS) {
-      updateBytesRead()
-    }
     nextElement
   }
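The hunk above also swaps the refresh trigger for the row path: the old code checked after incrementing whether the count had crossed an interval boundary, while the new code checks before incrementing whether the current count is a multiple of the interval. A small sketch (hypothetical names; `interval` stands in for `SparkHadoopUtil.UPDATE_INPUT_METRICS_INTERVAL_RECORDS`) shows that for single-record increments both fire with the same frequency, but the new form needs no `preNumRecordsRead` bookkeeping:

```scala
object TriggerDemo extends App {
  val interval = 100L

  // Old trigger: refresh after incrementing, when the count crosses an
  // interval boundary (required remembering preNumRecordsRead).
  def oldTrigger(pre: Long, post: Long): Boolean =
    pre / interval != post / interval

  // New trigger: refresh before incrementing, when the current count is a
  // multiple of the interval.
  def newTrigger(pre: Long): Boolean = pre % interval == 0

  val newFires = (0L until 500L).count(newTrigger)                // fires at 0, 100, ...
  val oldFires = (0L until 500L).count(n => oldTrigger(n, n + 1)) // fires at 99, 199, ...
  println(newFires == oldFires) // same number of refreshes over 500 records
}
```

The trigger points are offset by one interval position, but the refresh frequency, which is what matters for keeping the hot path cheap, is identical.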

@@ -210,7 +211,7 @@ class FileScanRDD(
   }

   override def close(): Unit = {
-    updateBytesRead()
+    incTaskInputMetricsBytesRead()
     updateBytesReadWithFileSize()
     InputFileBlockHolder.unset()
   }
