Correctly use initializeStats in DataSkippingStatsTracker
Add missing call to `initializeStats` inside of `DataSkippingStatsTracker`.

Tested with existing unit tests.

## Followup
Backport this to branch-1.2, branch-2.0 in Delta Lake, too.

GitOrigin-RevId: b4e7c08d14d0e8ba6b9c6c7a12cf2e63990e0ad9
scottsand-db authored and tdas committed Aug 11, 2022
1 parent 7344149 commit ed9ff6e
Showing 1 changed file with 4 additions and 1 deletion.
@@ -131,14 +131,17 @@ class DeltaTaskStatisticsTracker(

   override def newPartition(partitionValues: InternalRow): Unit = { }

+  protected def initializeAggBuf(buffer: SpecificInternalRow): InternalRow =
+    initializeStats.target(buffer).apply(EmptyRow)
+
   override def newFile(newFilePath: String): Unit = {
     submittedFiles.getOrElseUpdate(newFilePath, {
       // `buffer` is a row that will start off by holding the initial values for the agg expressions
       // (see the initializeStats: Projection), will then be updated in place every time a new row
       // is processed (see updateStats: Projection), and will finally serve as an input for
       // computing the per-file result of statsColExpr (see getStats: Projection)
       val buffer = new SpecificInternalRow(aggBufferAttrs.map(_.dataType))
-      buffer
+      initializeAggBuf(buffer)
     })
   }
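
The buffer lifecycle described in the comment above (initialize, update in place, read the result) can be illustrated with a minimal, Spark-free sketch. Everything in it is hypothetical: `AggBuffer` is a stand-in for `SpecificInternalRow`, and the `initializeStats` and `updateStats` functions here are simplified analogues of the projections named in the diff, not their real implementations. It assumes that an uninitialized slot starts as null and that the update step propagates nulls, which is one plausible way a skipped initialization could surface as a missing statistic.

```scala
// Hypothetical stand-in for a mutable aggregation buffer (not Spark's SpecificInternalRow).
final class AggBuffer(size: Int) {
  private val slots = new Array[Any](size) // every slot starts as null
  def get(i: Int): Any = slots(i)
  def update(i: Int, v: Any): Unit = slots(i) = v
  def isNullAt(i: Int): Boolean = slots(i) == null
}

object StatsBufferSketch {
  // Slot 0 holds a running row count, an aggregate whose initial value must be 0.
  def initializeStats(buf: AggBuffer): AggBuffer = {
    buf.update(0, 0L)
    buf
  }

  // Assumption: the update behaves like null-propagating arithmetic, so null + 1 stays null.
  def updateStats(buf: AggBuffer): Unit = {
    if (!buf.isNullAt(0)) buf.update(0, buf.get(0).asInstanceOf[Long] + 1L)
  }

  // Runs the lifecycle for `rows` rows, optionally skipping the initialization step.
  def countRows(initialize: Boolean, rows: Int): Option[Long] = {
    val buf = new AggBuffer(1)
    if (initialize) initializeStats(buf)
    (1 to rows).foreach(_ => updateStats(buf))
    Option(buf.get(0)).map(_.asInstanceOf[Long])
  }

  def main(args: Array[String]): Unit = {
    println(countRows(initialize = false, rows = 3)) // None
    println(countRows(initialize = true, rows = 3))  // Some(3)
  }
}
```

In this sketch, skipping the initialization leaves the count empty no matter how many rows are processed, while initializing first yields the expected count.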
