Spark 4.1: native_datafusion bytesRead task metric off by 6-14x vs Spark

Sub-issue of #4098.

## Description

Three tests fail in \`Spark 4.1, JDK 17/auto [exec]\` (\`CometTaskMetricsSuite\`):

- \`native_datafusion scan reports task-level input metrics matching Spark\`
- \`input metrics aggregate across multiple native scans in a join\`
- \`input metrics aggregate across multiple native scans in a union\`

Symptom (one example):

\`\`\`
9.6 was greater than or equal to 0.7, but 9.6 was not less than or equal to 1.3
bytesRead ratio out of range: comet=90498, spark=9427, ratio=9.6
\`\`\`

Two more failures with similar 6.4 and 13.9 ratios.

## Suspected root cause

Spark 4.1 changed what \`inputMetrics.bytesRead\` accounts for, most likely now reports a smaller subset (e.g. only bytes actually read into row buffers, versus full Parquet footer plus row group). Compare \`ParquetFileReader\` / \`PartitionedFile\` accounting between 4.0 and 4.1 and adjust Comet's metric source accordingly.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Spark 4.1: native_datafusion bytesRead task metric off by 6-14x vs Spark #4194

Description

Suspected root cause

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Spark 4.1: native_datafusion bytesRead task metric off by 6-14x vs Spark #4194

Description

Description

Suspected root cause

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions