chore: Run Spark SQL tests with native_datafusion in CI [WIP] #3393
Draft
andygrove wants to merge 8 commits into apache:main
…sion tests

Added annotations for the following tests that fail with native_datafusion scan:

DynamicPartitionPruningSuite:
- static scan metrics → apache#3313

ParquetQuerySuite, ParquetIOSuite, ParquetSchemaSuite, ParquetFilterSuite:
- SPARK-36182: can't read TimestampLTZ as TimestampNTZ → apache#3311
- SPARK-34212 Parquet should read decimals correctly → apache#3311
- row group skipping doesn't overflow when reading into larger type → apache#3311
- SPARK-35640 tests → apache#3311
- schema mismatch failure error message tests → apache#3311
- SPARK-25207: duplicate fields case-insensitive → apache#3311
- SPARK-31026: fields with dots in names → apache#3320
- Filters should be pushed down at row group level → apache#3320

FileBasedDataSourceSuite:
- Spark native readers should respect spark.sql.caseSensitive → apache#3311

BucketedReadSuite, DisableUnnecessaryBucketedScanSuite:
- disable bucketing when output doesn't contain all bucketing columns → apache#3319
- bucket coalescing tests → apache#3319
- SPARK-32859: disable unnecessary bucketed table scan tests → apache#3319
- Aggregates with no groupby over tables having 1 BUCKET → apache#3319

ParquetFileMetadataStructRowIndexSuite:
- reading _tmp_metadata_row_index tests → apache#3317

FileDataSourceV2FallBackSuite:
- Fallback Parquet V2 to V1 → apache#3315

UDFSuite:
- SPARK-8005 input_file_name → apache#3312

ExtractPythonUDFsSuite:
- Python UDF should not break column pruning/filter pushdown -- Parquet V1 → apache#3312

StreamingQuerySuite:
- SPARK-41198: input row calculation with CTE → apache#3315
- SPARK-41199: input row calculation with mixed DSv1 and DSv2 sources → apache#3315

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Added the import statement to test files that were missing it:
- FileDataSourceV2FallBackSuite.scala
- ParquetFileMetadataStructRowIndexSuite.scala
- ExtractPythonUDFsSuite.scala
- DisableUnnecessaryBucketedScanSuite.scala
- StreamingQuerySuite.scala
- UDFSuite.scala

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The method signature in IgnoreComet.scala was not properly formatted according to scalafmt rules. This fixes the formatting to match Spark's scalafmt configuration.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Set NOLINT_ON_COMPILE=true to skip scalastyle validation during SBT compilation, reducing CI time for Spark SQL test runs.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Which issue does this PR close?

N/A - This PR enables running Spark SQL tests with native_datafusion scan in CI.

Rationale for this change

Running Spark SQL tests with native_datafusion scan helps ensure compatibility and catch regressions. This PR enables these tests in CI while ignoring known failing tests that are tracked in separate issues.

What changes are included in this PR?

- CI workflow changes: Added native_datafusion scan mode to the Spark SQL test matrix
- Test annotations: Added IgnoreCometNativeDataFusion annotations for failing tests, linked to tracking issues.

How are these changes tested?

The changes are tested by the CI workflow itself - tests should pass with the known failures ignored.
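
For reviewers unfamiliar with the approach, here is a minimal, dependency-free sketch of the idea behind ignoring known failures per scan mode. It is not the code in this PR: `TestTag`, `IgnoreForScan`, `TestCase`, and `TagSkipSketch` are hypothetical names invented for illustration, and the real annotations go through the test framework's tagging support instead of a hand-rolled runner. The sketch only assumes that a test can carry a tag naming a scan implementation and a tracking issue, and that the runner skips the test when that scan implementation is active.

```scala
// Hypothetical, dependency-free model of tag-based test skipping.
// These names are illustrative and are not taken from the PR's code.
sealed trait TestTag

// Marks a test as a known failure under one scan implementation,
// carrying the tracking issue as the reason.
final case class IgnoreForScan(scanImpl: String, trackingIssue: String) extends TestTag

final case class TestCase(name: String, tags: Seq[TestTag], body: () => Unit)

object TagSkipSketch {
  // Runs each test unless it carries an ignore tag for the active scan
  // implementation; returns the outcome per test name.
  def run(tests: Seq[TestCase], activeScan: String): Map[String, String] =
    tests.map { t =>
      val issue = t.tags.collectFirst {
        case IgnoreForScan(scan, tracking) if scan == activeScan => tracking
      }
      val outcome = issue match {
        case Some(tracking) => s"ignored: $tracking"
        case None =>
          t.body()
          "passed"
      }
      t.name -> outcome
    }.toMap

  // Two sample tests: one tagged as a known failure, one untagged.
  val sampleTests: Seq[TestCase] = Seq(
    TestCase(
      "static scan metrics",
      Seq(IgnoreForScan("native_datafusion", "apache#3313")),
      () => sys.error("known failure")),
    TestCase("simple projection", Nil, () => ())
  )

  def main(args: Array[String]): Unit =
    run(sampleTests, "native_datafusion").foreach { case (name, outcome) =>
      println(s"$name -> $outcome")
    }
}
```

Carrying the tracking issue in the tag keeps the reason for each skip next to the test, so a skip can be removed once the linked issue is resolved.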