
chore: Run Spark SQL tests with native_datafusion in CI [WIP] #3393

Draft
andygrove wants to merge 8 commits into apache:main from andygrove:spark-sql-native-datafusion

Conversation

@andygrove (Member) commented Feb 4, 2026

Which issue does this PR close?

N/A. This PR enables running the Spark SQL tests with the native_datafusion scan in CI.

Rationale for this change

Running Spark SQL tests with native_datafusion scan helps ensure compatibility and catch regressions. This PR enables these tests in CI while ignoring known failing tests that are tracked in separate issues.

What changes are included in this PR?

  1. CI workflow changes: Added native_datafusion scan mode to the Spark SQL test matrix

  2. Test annotations: Added IgnoreCometNativeDataFusion annotations to the failing tests, linked to tracking issues (a sketch of the annotation pattern follows the table):

| Issue | Category | Tests |
| ----- | -------- | ----- |
| #3311 | Schema mismatch / type coercion | ParquetQuerySuite, ParquetIOSuite, ParquetSchemaSuite, ParquetFilterSuite, FileBasedDataSourceSuite |
| #3312 | `input_file_name()` not supported | UDFSuite, ExtractPythonUDFsSuite |
| #3313 | Static scan metrics | DynamicPartitionPruningSuite |
| #3315 | Parquet V2 / streaming sources | FileDataSourceV2FallBackSuite, StreamingQuerySuite |
| #3317 | Row index metadata | ParquetFileMetadataStructRowIndexSuite |
| #3319 | Bucketed scan | BucketedReadSuite, DisableUnnecessaryBucketedScanSuite |
| #3320 | Predicate pushdown | ParquetFilterSuite |

How are these changes tested?

The changes are exercised by the CI workflow itself: the affected suites should pass with the known failures ignored.

andygrove and others added 8 commits February 4, 2026 08:18
…sion tests

Added annotations for the following tests that fail with native_datafusion scan:

DynamicPartitionPruningSuite:
- static scan metrics → apache#3313

ParquetQuerySuite, ParquetIOSuite, ParquetSchemaSuite, ParquetFilterSuite:
- SPARK-36182: can't read TimestampLTZ as TimestampNTZ → apache#3311
- SPARK-34212 Parquet should read decimals correctly → apache#3311
- row group skipping doesn't overflow when reading into larger type → apache#3311
- SPARK-35640 tests → apache#3311
- schema mismatch failure error message tests → apache#3311
- SPARK-25207: duplicate fields case-insensitive → apache#3311
- SPARK-31026: fields with dots in names → apache#3320
- Filters should be pushed down at row group level → apache#3320

FileBasedDataSourceSuite:
- Spark native readers should respect spark.sql.caseSensitive → apache#3311

BucketedReadSuite, DisableUnnecessaryBucketedScanSuite:
- disable bucketing when output doesn't contain all bucketing columns → apache#3319
- bucket coalescing tests → apache#3319
- SPARK-32859: disable unnecessary bucketed table scan tests → apache#3319
- Aggregates with no groupby over tables having 1 BUCKET → apache#3319

ParquetFileMetadataStructRowIndexSuite:
- reading _tmp_metadata_row_index tests → apache#3317

FileDataSourceV2FallBackSuite:
- Fallback Parquet V2 to V1 → apache#3315

UDFSuite:
- SPARK-8005 input_file_name → apache#3312

ExtractPythonUDFsSuite:
- Python UDF should not break column pruning/filter pushdown -- Parquet V1 → apache#3312

StreamingQuerySuite:
- SPARK-41198: input row calculation with CTE → apache#3315
- SPARK-41199: input row calculation with mixed DSv1 and DSv2 sources → apache#3315

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Added the import statement to the test files that were missing it (a likely shape is sketched after this commit message):
- FileDataSourceV2FallBackSuite.scala
- ParquetFileMetadataStructRowIndexSuite.scala
- ExtractPythonUDFsSuite.scala
- DisableUnnecessaryBucketedScanSuite.scala
- StreamingQuerySuite.scala
- UDFSuite.scala

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
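
For reference, the import added by the commit above presumably has a shape like the following; the package is an assumption, chosen because Comet's existing test tags live in the patched Spark test tree, and only the class name comes from this PR:

```scala
// Assumed package; only the class name is taken from this PR.
import org.apache.spark.sql.IgnoreCometNativeDataFusion
```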
The method signature in IgnoreComet.scala was not properly formatted
according to scalafmt rules. This fixes the formatting to match
Spark's scalafmt configuration.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
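
As a generic illustration of the kind of reformatting involved (the method below is entirely hypothetical, not the actual signature in IgnoreComet.scala), scalafmt typically rewraps an over-long parameter list onto indented continuation lines:

```scala
import org.scalatest.Tag

object FormattingExample {
  // Hypothetical signature, before: a single line exceeding the max column width.
  //   def shouldIgnore(testName: String, tags: Seq[Tag], scanMode: String, reason: Option[String]): Boolean
  //
  // After scalafmt: each parameter on its own continuation line.
  def shouldIgnore(
      testName: String,
      tags: Seq[Tag],
      scanMode: String,
      reason: Option[String]): Boolean = ???
}
```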
Set NOLINT_ON_COMPILE=true to skip scalastyle validation during
SBT compilation, reducing CI time for Spark SQL test runs.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>