Skip to content

Conversation

@andygrove
Copy link
Member

Which issue does this PR close?

This PR adds support for native_iceberg_compat scan implementation with V2 data sources (BatchScanExec with ParquetScan).

Rationale for this change

Previously, V2 Parquet scans always used the legacy BatchReader (the native_comet approach) regardless of the COMET_NATIVE_SCAN_IMPL setting. This was inconsistent with V1 scans which respect the scan impl configuration.

This change enables V2 scans to use the DataFusion-based NativeBatchReader when native_iceberg_compat or auto is specified, which is important for deprecating the legacy mutable buffer code.

What changes are included in this PR?

New Files

  • CometNativeParquetPartitionReaderFactory: A V2 partition reader factory that uses NativeBatchReader (DataFusion-based Parquet reader)
  • CometNativeParquetScan: A V2 scan trait that creates the new reader factory

Behavior

  • V2 scans with auto or native_iceberg_compat: Use CometNativeParquetScan (DataFusion-based reader)
  • V2 scans with native_comet: Use existing CometParquetScan (legacy JNI-based BatchReader)
  • V2 scans with native_datafusion: Fall back to Spark (not yet supported for V2)

How are these changes tested?

  • Updated CometScanRuleSuite with tests for V2 scan behavior
  • Updated ParquetReadV2Suite to verify V2 scans work with the new implementation
  • All existing tests pass

This PR adds support for native_iceberg_compat scan implementation with V2 data sources (BatchScanExec with ParquetScan).

## Changes

### New Files
- `CometNativeParquetPartitionReaderFactory`: A V2 partition reader factory that uses NativeBatchReader (DataFusion-based Parquet reader)
- `CometNativeParquetScan`: A V2 scan trait that creates the new reader factory

### Behavior
- V2 scans with `auto` or `native_iceberg_compat`: Use CometNativeParquetScan (DataFusion-based reader)
- V2 scans with `native_comet`: Use existing CometParquetScan (legacy JNI-based BatchReader)
- V2 scans with `native_datafusion`: Fall back to Spark (not yet supported for V2)

This is consistent with the goal of deprecating the legacy BatchReader code - V2 scans can now use the DataFusion-based reader when using the recommended scan implementations.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@codecov-commenter
Copy link

codecov-commenter commented Jan 25, 2026

Codecov Report

❌ Patch coverage is 77.27273% with 40 lines in your changes missing coverage. Please review.
✅ Project coverage is 60.32%. Comparing base (f09f8af) to head (b198701).
⚠️ Report is 884 commits behind head on main.

Files with missing lines Patch % Lines
...n/scala/org/apache/comet/rules/CometScanRule.scala 59.01% 11 Missing and 14 partials ⚠️
...uet/CometNativeParquetPartitionReaderFactory.scala 87.34% 6 Missing and 4 partials ⚠️
.../apache/comet/parquet/CometNativeParquetScan.scala 86.11% 1 Missing and 4 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3272      +/-   ##
============================================
+ Coverage     56.12%   60.32%   +4.19%     
- Complexity      976     1490     +514     
============================================
  Files           119      177      +58     
  Lines         11743    16246    +4503     
  Branches       2251     2682     +431     
============================================
+ Hits           6591     9800    +3209     
- Misses         4012     5082    +1070     
- Partials       1140     1364     +224     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@andygrove andygrove changed the title [WIP] feat: Add native_iceberg_compat support for V2 Parquet scans feat: Add native_iceberg_compat support for V2 Parquet scans [WIP] Jan 25, 2026
…scan impls

The test was failing when the default scan impl was set to native_datafusion
because V2 scans fall back to Spark in that case. This fix ensures the test
explicitly sets SCAN_AUTO to test the V2 scan replacement behavior.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@andygrove andygrove changed the title feat: Add native_iceberg_compat support for V2 Parquet scans [WIP] feat: Add native_iceberg_compat support for V2 Parquet scans Jan 25, 2026
@andygrove andygrove marked this pull request as ready for review January 25, 2026 18:14
@andygrove andygrove changed the title feat: Add native_iceberg_compat support for V2 Parquet scans feat: Add V2 scan support for native_iceberg_compat Jan 26, 2026
andygrove and others added 2 commits January 25, 2026 17:31
Add ParquetReadV2NativeIcebergCompatSuite that runs all ParquetReadSuite
tests (278+ tests) with V2 scans and explicit native_iceberg_compat setting.

Note: ParquetReadV2Suite already provides implicit coverage via SCAN_AUTO,
but this suite explicitly sets SCAN_NATIVE_ICEBERG_COMPAT for clarity and
consistency with ParquetReadV2NativeDataFusionSuite.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@andygrove andygrove closed this Jan 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants