[GLUTEN-10215][VL] Delta write: Native statistics tracker to eliminate C2R overhead#11419
Conversation
|
Run Gluten Clickhouse CI on x86 |
1 similar comment
|
Run Gluten Clickhouse CI on x86 |
07d26c7 to
f7fe766
Compare
|
Run Gluten Clickhouse CI on x86 |
1 similar comment
|
Run Gluten Clickhouse CI on x86 |
8cb9dab to
e23699c
Compare
|
Run Gluten Clickhouse CI on x86 |
3 similar comments
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
73521e7 to
8d5ac02
Compare
|
Run Gluten Clickhouse CI on x86 |
2 similar comments
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
bd9fce1 to
12d26ba
Compare
|
Run Gluten Clickhouse CI on x86 |
12d26ba to
05a72e8
Compare
|
Run Gluten Clickhouse CI on x86 |
05a72e8 to
1a28083
Compare
|
Run Gluten Clickhouse CI on x86 |
1a28083 to
fad9fdd
Compare
|
Run Gluten Clickhouse CI on x86 |
fad9fdd to
f9fca33
Compare
|
Run Gluten Clickhouse CI on x86 |
3 similar comments
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
4182a6f to
9e0028f
Compare
|
Run Gluten Clickhouse CI on x86 |
9e0028f to
d129061
Compare
|
Run Gluten Clickhouse CI on x86 |
1 similar comment
|
Run Gluten Clickhouse CI on x86 |
…R overhead The patch adds a native job statistics tracker for Delta write to eliminate C2R overhead. More PR description WIP.
3d26b84 to
d463e92
Compare
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
|
See if it's useful: Analysis of PR #11419 - Partition Key Generation IssuePR OverviewTitle: [GLUTEN-10215][VL] Delta write: Native statistics tracker to eliminate C2R overhead Purpose: Adds a native job statistics tracker for Delta write to eliminate Columnar-to-Row (C2R) conversion overhead by using Velox's native aggregation capabilities. Issue: Wrong Partition Key GenerationRoot Cause AnalysisThe PR introduces a new native statistics tracker ( override def newPartition(partitionValues: InternalRow): Unit = {}Location in patch: Line 633 Problem ExplanationBefore the PR:The fallback tracker properly delegates partition information: override def newPartition(partitionValues: InternalRow): Unit =
delegate.newPartition(partitionValues)After the PR:The native tracker ignores partition values: override def newPartition(partitionValues: InternalRow): Unit = {}Why This Causes Wrong Partition Keys
Impact
SolutionOption 1: Store and Use Partition Values (Recommended)private class GlutenDeltaTaskStatsNativeTracker(
delegate: WriteTaskStatsTracker,
dataCols: Seq[Attribute],
statsColExpr: Expression,
resultThreadRunner: ThreadPoolExecutor)
extends WriteTaskStatsTracker {
private val accumulators = mutable.Map[String, VeloxTaskStatsAccumulator]()
private val fileToPartition = mutable.Map[String, InternalRow]() // ADD THIS
private val evaluator = NativePlanEvaluator.create(
BackendsApiManager.getBackendName,
Map.empty[String, String].asJava)
override def newPartition(partitionValues: InternalRow): Unit = {
// Store current partition values for subsequent file operations
currentPartitionValues = partitionValues // ADD THIS
}
override def newFile(filePath: String): Unit = {
accumulators.getOrElseUpdate(
filePath,
new VeloxTaskStatsAccumulator(evaluator, resultThreadRunner, dataCols, statsColExpr)
)
// Associate file with its partition
if (currentPartitionValues != null) { // ADD THIS
fileToPartition(filePath) = currentPartitionValues.copy()
}
}
override def getFinalStats(taskCommitTime: Long): WriteTaskStats = {
// Use fileToPartition mapping when building statistics
// to ensure correct partition keys
// ... implementation needs to pass partition info to delegate
}
}Option 2: Delegate to Underlying TrackerIf the native tracker doesn't need to handle partitions directly: override def newPartition(partitionValues: InternalRow): Unit = {
delegate.newPartition(partitionValues)
}Option 3: Hybrid ApproachStore partition values AND delegate: private var currentPartitionValues: InternalRow = _
override def newPartition(partitionValues: InternalRow): Unit = {
currentPartitionValues = partitionValues
delegate.newPartition(partitionValues)
}Testing Recommendations
Related Code SectionsComparison with Other Trackers
ConclusionThe issue is in the Fix: Implement proper partition value handling in the native tracker, either by:
The fix should ensure partition values are correctly associated with files and propagated to the final statistics. |
|
Run Gluten Clickhouse CI on x86 |
@FelixYBW Just saw this. This was actually not the root cause, The problem was with |

Description
Currently, there is a C2R converter to get all rows from a columnar batch being written to calculate and gather Delta file statistics. This is inefficient given we expected all operations related to write can be offloaded to native.
The patch adds a native job statistics tracker for Delta write to eliminate this C2R overhead. This tracker is backed by an asynchronous barrier-enabled Velox aggregation task (for more information about Velox task barrier, see this), where all the data to write is globally aggregated into statistics rows, each of which is for one single written Parquet file.
Existing tests under directory
backends-velox/src-delta33/test/scala/org/apache/spark/sql/delta/testcan cover the change.Depends on #11405
Related issue: #10215
Also fixes #11514
Performance
The PR is benchmarked by writing TPC-DS SF10 tables in Delta format. Resource is 8 cores + 20 GiB RAM. Typical size of each written file is ~10 MiB.
Before this PR, Gluten's speedup on Delta write is observed at -2.22%.
After this PR, Gluten's speedup on Delta write is observed at 61.31%.
Detailed benchmark results per table are as follows: