Track version-specific correctness test exclusions added for Spark suites #2287

@myandpr

Background

Follow-up for #2281.

PR #2281 enables Parquet correctness suite wrappers and Spark correctness tests for additional Spark versions. To keep those jobs green, it also adds version-specific exclude, excludeByPrefix, and disable entries in AuronSparkTestSettings.scala.
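The three entry kinds can be read as follows: exclude skips one test by exact name, excludeByPrefix skips every test whose name starts with a prefix, and disable skips a whole suite with a stated reason. The sketch below models that reading in Python. `SuiteSettings` and its methods are hypothetical names invented for illustration, not the API of AuronSparkTestSettings.scala, and the real matching rules may differ.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SuiteSettings:
    """Hypothetical stand-in for one suite's entries in a test-settings file."""

    excluded: set = field(default_factory=set)              # exact test names
    excluded_prefixes: list = field(default_factory=list)   # test-name prefixes
    disabled_reason: Optional[str] = None                   # set => suite skipped

    def exclude(self, *names):
        self.excluded.update(names)
        return self

    def exclude_by_prefix(self, *prefixes):
        self.excluded_prefixes.extend(prefixes)
        return self

    def disable(self, reason):
        self.disabled_reason = reason
        return self

    def should_run(self, test_name):
        # A disabled suite runs nothing, regardless of other entries.
        if self.disabled_reason is not None:
            return False
        if test_name in self.excluded:
            return False
        return not any(test_name.startswith(p) for p in self.excluded_prefixes)


# Entries taken from the list in this issue; the test names passed to
# should_run below are illustrative inputs only.
v1_filter = (
    SuiteSettings()
    .exclude_by_prefix("filter pushdown -")
    .exclude("SPARK-17091: Convert IN predicate to Parquet filter push-down")
)
math_functions = SuiteSettings().disable("Native execution can crash in Spark 4")

print(v1_filter.should_run("filter pushdown - boolean"))   # False (prefix match)
print(v1_filter.should_run("some unrelated test"))         # True
print(math_functions.should_run("some unrelated test"))    # False (suite disabled)
```

Under this reading, a disable entry hides every test in the suite, so a single disable line may cover far more tests than a single exclude line.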

Scope

The goal of this issue is to track those exclusions so they are not lost after #2281 is merged. Each entry should eventually be reviewed and either fixed, re-enabled, or kept excluded with a clear reason.

Tracked entries by version:

  • Spark 3.1: 47 entries
  • Spark 3.2: 70 entries
  • Spark 3.4: 80 entries
  • Spark 3.5: 83 entries
  • Spark 4.0: 79 entries
  • Spark 4.1: 89 entries

Total tracked entries: 448.
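
As a quick sanity check, the per-version counts above can be summed to confirm the stated total. The dictionary name `tracked` is just an illustrative label.

```python
# Per-version entry counts as listed in this issue.
tracked = {
    "Spark 3.1": 47,
    "Spark 3.2": 70,
    "Spark 3.4": 80,
    "Spark 3.5": 83,
    "Spark 4.0": 79,
    "Spark 4.1": 89,
}

total = sum(tracked.values())
print(total)  # 448
```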

Tracking

The list below was extracted from the #2281 diff for AuronSparkTestSettings.scala. It includes Parquet suite exclusions and the Spark correctness suite exclusions added when enabling those jobs.
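
To keep the counts in sync as entries are fixed or re-enabled, the list could be tallied mechanically. The following is a minimal sketch that assumes the text layout used in this issue (a `sparkNN` line per version and `• kind: text` bullets); the function name `tally` and the sample text are illustrative, and this is not the script used to produce the list.

```python
import re
from collections import Counter

VERSION_RE = re.compile(r"^spark\d+$")
ENTRY_RE = re.compile(r"^\s*•\s*(excludeByPrefix|exclude|disable):\s*(.+)$")


def tally(text):
    """Count entries per (version, kind) in text shaped like this issue's list."""
    counts = Counter()
    version = None
    for line in text.splitlines():
        stripped = line.strip()
        if VERSION_RE.match(stripped):
            version = stripped  # entries that follow belong to this version
            continue
        match = ENTRY_RE.match(line)
        if match and version is not None:
            counts[(version, match.group(1))] += 1
    return counts


# A small excerpt in the same layout as the full list.
sample = """
spark31

AuronParquetInteroperabilitySuite

  • exclude: parquet timestamp conversion

AuronParquetThriftCompatibilitySuite

  • exclude: Read Parquet file generated by parquet-thrift

spark40

AuronMathFunctionsSuite

  • disable: Native execution can crash in Spark 4
"""

counts = tally(sample)
print(counts[("spark31", "exclude")])  # 2
print(counts[("spark40", "disable")])  # 1
```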

Full exclusion list by Spark version

spark31

AuronParquetIOSuite

  • exclude: read dictionary encoded decimals written as INT32
  • exclude: read dictionary encoded decimals written as INT64
  • exclude: read dictionary encoded decimals written as FIXED_LEN_BYTE_ARRAY
  • exclude: read dictionary and plain encoded timestamp_millis written as INT64
  • exclude: SPARK-31159: compatibility with Spark 2.4 in reading dates/timestamps
  • exclude: SPARK-31159: rebasing timestamps in write
  • exclude: SPARK-31159: rebasing dates in write

AuronParquetInteroperabilitySuite

  • exclude: parquet timestamp conversion

AuronParquetProtobufCompatibilitySuite

  • exclude: unannotated array of primitive type
  • exclude: unannotated array of struct
  • exclude: struct with unannotated array
  • exclude: unannotated array of struct with unannotated array
  • exclude: unannotated array of string

AuronParquetQuerySuite

  • exclude: SPARK-10634 timestamp written and read as INT64 - truncation
  • exclude: Enabling/disabling ignoreCorruptFiles
  • exclude: SPARK-26677: negated null-safe equality comparison should not filter matched row groups
  • exclude: Migration from INT96 to TIMESTAMP_MICROS timestamp type
  • exclude: SPARK-34212 Parquet should read decimals correctly

AuronParquetSchemaSuite

  • exclude: schema mismatch failure error message for parquet reader
  • exclude: schema mismatch failure error message for parquet vectorized reader

AuronParquetThriftCompatibilitySuite

  • exclude: Read Parquet file generated by parquet-thrift

AuronParquetV1FilterSuite

  • excludeByPrefix: filter pushdown -
  • exclude: Filters should be pushed down for vectorized Parquet reader at row group level
  • exclude: SPARK-31026: Parquet predicate pushdown for fields having dots in the names
  • exclude: Filters should be pushed down for Parquet readers at row group level
  • exclude: SPARK-23852: Broken Parquet push-down for partially-written stats
  • exclude: SPARK-17091: Convert IN predicate to Parquet filter push-down
  • exclude: SPARK-25207: exception when duplicate fields in case-insensitive mode

AuronParquetV1PartitionDiscoverySuite

  • exclude: read partitioned table - partition key included in Parquet file
  • exclude: read partitioned table - with nulls and partition keys are included in Parquet file
  • exclude: SPARK-18108 Parquet reader fails when data column types conflict with partition ones
  • exclude: SPARK-21463: MetadataLogFileIndex should respect userSpecifiedSchema for partition cols

AuronParquetV1QuerySuite

  • exclude: SPARK-10634 timestamp written and read as INT64 - truncation
  • exclude: Enabling/disabling ignoreCorruptFiles
  • exclude: SPARK-26677: negated null-safe equality comparison should not filter matched row groups
  • exclude: Migration from INT96 to TIMESTAMP_MICROS timestamp type
  • exclude: SPARK-34212 Parquet should read decimals correctly
  • exclude: returning batch for wide table

AuronParquetV2FilterSuite

  • exclude: SPARK-31026: Parquet predicate pushdown for fields having dots in the names
  • exclude: Filters should be pushed down for Parquet readers at row group level
  • exclude: SPARK-23852: Broken Parquet push-down for partially-written stats
  • exclude: SPARK-17091: Convert IN predicate to Parquet filter push-down
  • exclude: SPARK-25207: exception when duplicate fields in case-insensitive mode

AuronParquetV2QuerySuite

  • exclude: SPARK-10634 timestamp written and read as INT64 - truncation
  • exclude: SPARK-26677: negated null-safe equality comparison should not filter matched row groups
  • exclude: Migration from INT96 to TIMESTAMP_MICROS timestamp type
  • exclude: returning batch for wide table

spark32

AuronDataFrameAggregateSuite

  • exclude: SPARK-34837: Support ANSI SQL intervals by the aggregate function avg

AuronParquetIOSuite

  • exclude: SPARK-34817: Read UINT_64 as Decimal from parquet
  • exclude: SPARK-35640: read binary as timestamp should throw schema incompatible error
  • exclude: SPARK-35640: int as long should throw schema incompatible error
  • exclude: read dictionary encoded decimals written as INT32
  • exclude: read dictionary encoded decimals written as INT64
  • exclude: read dictionary encoded decimals written as FIXED_LEN_BYTE_ARRAY
  • exclude: read dictionary and plain encoded timestamp_millis written as INT64
  • exclude: SPARK-36726: test incorrect Parquet row group file offset
  • exclude: SPARK-34167: read LongDecimals with precision < 10, VectorizedReader true
  • exclude: SPARK-34167: read LongDecimals with precision < 10, VectorizedReader false

AuronParquetInteroperabilitySuite

  • exclude: parquet timestamp conversion

AuronParquetProtobufCompatibilitySuite

  • exclude: unannotated array of primitive type
  • exclude: unannotated array of struct
  • exclude: struct with unannotated array
  • exclude: unannotated array of struct with unannotated array
  • exclude: unannotated array of string

AuronParquetQuerySuite

  • exclude: SPARK-10634 timestamp written and read as INT64 - truncation
  • exclude: Enabling/disabling ignoreCorruptFiles
  • exclude: SPARK-26677: negated null-safe equality comparison should not filter matched row groups
  • exclude: Migration from INT96 to TIMESTAMP_MICROS timestamp type
  • exclude: SPARK-34212 Parquet should read decimals correctly

AuronParquetRebaseDatetimeSuite

  • exclude: SPARK-31159, SPARK-37705: compatibility with Spark 2.4/3.2 in reading dates/timestamps
  • exclude: SPARK-31159, SPARK-37705: rebasing timestamps in write
  • exclude: SPARK-31159: rebasing dates in write
  • exclude: SPARK-35427: datetime rebasing in the EXCEPTION mode

AuronParquetRebaseDatetimeV1Suite

  • exclude: SPARK-31159, SPARK-37705: compatibility with Spark 2.4/3.2 in reading dates/timestamps
  • exclude: SPARK-31159, SPARK-37705: rebasing timestamps in write
  • exclude: SPARK-31159: rebasing dates in write
  • exclude: SPARK-35427: datetime rebasing in the EXCEPTION mode

AuronParquetRebaseDatetimeV2Suite

  • exclude: SPARK-31159, SPARK-37705: compatibility with Spark 2.4/3.2 in reading dates/timestamps
  • exclude: SPARK-35427: datetime rebasing in the EXCEPTION mode

AuronParquetSchemaSuite

  • exclude: schema mismatch failure error message for parquet reader
  • exclude: schema mismatch failure error message for parquet vectorized reader
  • exclude: SPARK-40819: parquet file with TIMESTAMP(NANOS, true) (with nanosAsLong=true)
  • exclude: SPARK-40819: parquet file with TIMESTAMP(NANOS, true) (with default nanosAsLong=false)

AuronParquetThriftCompatibilitySuite

  • exclude: Read Parquet file generated by parquet-thrift

AuronParquetV1FilterSuite

  • excludeByPrefix: SPARK-40280: filter pushdown -
  • excludeByPrefix: filter pushdown -
  • exclude: Filters should be pushed down for vectorized Parquet reader at row group level
  • exclude: SPARK-31026: Parquet predicate pushdown for fields having dots in the names
  • exclude: Filters should be pushed down for Parquet readers at row group level
  • exclude: SPARK-23852: Broken Parquet push-down for partially-written stats
  • exclude: SPARK-17091: Convert IN predicate to Parquet filter push-down
  • exclude: SPARK-25207: exception when duplicate fields in case-insensitive mode
  • exclude: Support Parquet column index
  • exclude: SPARK-34562: Bloom filter push down

AuronParquetV1PartitionDiscoverySuite

  • exclude: read partitioned table - partition key included in Parquet file
  • exclude: read partitioned table - with nulls and partition keys are included in Parquet file
  • exclude: SPARK-18108 Parquet reader fails when data column types conflict with partition ones
  • exclude: SPARK-21463: MetadataLogFileIndex should respect userSpecifiedSchema for partition cols

AuronParquetV1QuerySuite

  • exclude: SPARK-10634 timestamp written and read as INT64 - truncation
  • exclude: Enabling/disabling ignoreCorruptFiles
  • exclude: SPARK-26677: negated null-safe equality comparison should not filter matched row groups
  • exclude: Migration from INT96 to TIMESTAMP_MICROS timestamp type
  • exclude: SPARK-34212 Parquet should read decimals correctly
  • exclude: returning batch for wide table
  • exclude: SPARK-39833: pushed filters with count()
  • exclude: SPARK-39833: pushed filters with project without filter columns

AuronParquetV2FilterSuite

  • excludeByPrefix: SPARK-40280: filter pushdown -
  • exclude: SPARK-31026: Parquet predicate pushdown for fields having dots in the names
  • exclude: Filters should be pushed down for Parquet readers at row group level
  • exclude: SPARK-23852: Broken Parquet push-down for partially-written stats
  • exclude: SPARK-17091: Convert IN predicate to Parquet filter push-down
  • exclude: SPARK-25207: exception when duplicate fields in case-insensitive mode
  • exclude: Support Parquet column index

AuronParquetV2QuerySuite

  • exclude: SPARK-10634 timestamp written and read as INT64 - truncation
  • exclude: SPARK-26677: negated null-safe equality comparison should not filter matched row groups
  • exclude: Migration from INT96 to TIMESTAMP_MICROS timestamp type
  • exclude: returning batch for wide table

spark34

AuronDataFrameSuite

  • exclude: SPARK-41048: Improve output partitioning and ordering with AQE cache

AuronParquetFieldIdIOSuite

  • exclude: Parquet reads infer fields using field ids correctly
  • exclude: absence of field ids
  • exclude: SPARK-38094: absence of field ids: reading nested schema
  • exclude: multiple id matches
  • exclude: read parquet file without ids
  • exclude: global read/write flag should work correctly

AuronParquetIOSuite

  • exclude: vectorized reader: missing all struct fields
  • exclude: SPARK-34817: Read UINT_64 as Decimal from parquet
  • exclude: SPARK-35640: read binary as timestamp should throw schema incompatible error
  • exclude: SPARK-35640: int as long should throw schema incompatible error
  • exclude: read dictionary encoded decimals written as INT32
  • exclude: read dictionary encoded decimals written as INT64
  • exclude: read dictionary encoded decimals written as FIXED_LEN_BYTE_ARRAY
  • exclude: read dictionary and plain encoded timestamp_millis written as INT64
  • exclude: SPARK-40128 read DELTA_LENGTH_BYTE_ARRAY encoded strings
  • exclude: SPARK-36726: test incorrect Parquet row group file offset
  • exclude: SPARK-34167: read LongDecimals with precision < 10, VectorizedReader true
  • exclude: SPARK-34167: read LongDecimals with precision < 10, VectorizedReader false

AuronParquetInteroperabilitySuite

  • exclude: parquet timestamp conversion

AuronParquetProtobufCompatibilitySuite

  • exclude: unannotated array of primitive type
  • exclude: unannotated array of struct
  • exclude: struct with unannotated array
  • exclude: unannotated array of struct with unannotated array
  • exclude: unannotated array of string

AuronParquetQuerySuite

  • exclude: SPARK-10634 timestamp written and read as INT64 - truncation
  • exclude: Enabling/disabling ignoreCorruptFiles
  • exclude: SPARK-26677: negated null-safe equality comparison should not filter matched row groups
  • exclude: Migration from INT96 to TIMESTAMP_MICROS timestamp type
  • exclude: SPARK-34212 Parquet should read decimals correctly
  • exclude: row group skipping doesn't overflow when reading into larger type

AuronParquetRebaseDatetimeSuite

  • exclude: SPARK-31159, SPARK-37705: compatibility with Spark 2.4/3.2 in reading dates/timestamps
  • exclude: SPARK-31159, SPARK-37705: rebasing timestamps in write
  • exclude: SPARK-31159: rebasing dates in write
  • exclude: SPARK-35427: datetime rebasing in the EXCEPTION mode

AuronParquetRebaseDatetimeV1Suite

  • exclude: SPARK-31159, SPARK-37705: compatibility with Spark 2.4/3.2 in reading dates/timestamps
  • exclude: SPARK-31159, SPARK-37705: rebasing timestamps in write
  • exclude: SPARK-31159: rebasing dates in write
  • exclude: SPARK-35427: datetime rebasing in the EXCEPTION mode

AuronParquetRebaseDatetimeV2Suite

  • exclude: SPARK-31159, SPARK-37705: compatibility with Spark 2.4/3.2 in reading dates/timestamps
  • exclude: SPARK-35427: datetime rebasing in the EXCEPTION mode

AuronParquetSchemaSuite

  • exclude: schema mismatch failure error message for parquet reader
  • exclude: schema mismatch failure error message for parquet vectorized reader
  • exclude: SPARK-40819: parquet file with TIMESTAMP(NANOS, true) (with nanosAsLong=true)
  • exclude: SPARK-40819: parquet file with TIMESTAMP(NANOS, true) (with default nanosAsLong=false)

AuronParquetThriftCompatibilitySuite

  • exclude: Read Parquet file generated by parquet-thrift

AuronParquetV1FilterSuite

  • excludeByPrefix: SPARK-40280: filter pushdown -
  • excludeByPrefix: filter pushdown -
  • exclude: Filters should be pushed down for vectorized Parquet reader at row group level
  • exclude: SPARK-31026: Parquet predicate pushdown for fields having dots in the names
  • exclude: Filters should be pushed down for Parquet readers at row group level
  • exclude: SPARK-23852: Broken Parquet push-down for partially-written stats
  • exclude: SPARK-17091: Convert IN predicate to Parquet filter push-down
  • exclude: SPARK-25207: exception when duplicate fields in case-insensitive mode
  • exclude: Support Parquet column index
  • exclude: SPARK-34562: Bloom filter push down

AuronParquetV1PartitionDiscoverySuite

  • exclude: read partitioned table - partition key included in Parquet file
  • exclude: read partitioned table - with nulls and partition keys are included in Parquet file
  • exclude: SPARK-18108 Parquet reader fails when data column types conflict with partition ones
  • exclude: SPARK-21463: MetadataLogFileIndex should respect userSpecifiedSchema for partition cols

AuronParquetV1QuerySuite

  • exclude: SPARK-10634 timestamp written and read as INT64 - truncation
  • exclude: Enabling/disabling ignoreCorruptFiles
  • exclude: SPARK-26677: negated null-safe equality comparison should not filter matched row groups
  • exclude: Migration from INT96 to TIMESTAMP_MICROS timestamp type
  • exclude: SPARK-34212 Parquet should read decimals correctly
  • exclude: row group skipping doesn't overflow when reading into larger type
  • exclude: returning batch for wide table
  • exclude: SPARK-39833: pushed filters with count()
  • exclude: SPARK-39833: pushed filters with project without filter columns

AuronParquetV2FilterSuite

  • exclude: SPARK-31026: Parquet predicate pushdown for fields having dots in the names
  • exclude: Filters should be pushed down for Parquet readers at row group level
  • exclude: SPARK-23852: Broken Parquet push-down for partially-written stats
  • exclude: SPARK-17091: Convert IN predicate to Parquet filter push-down
  • exclude: SPARK-25207: exception when duplicate fields in case-insensitive mode
  • excludeByPrefix: SPARK-40280: filter pushdown -
  • exclude: Support Parquet column index

AuronParquetV2QuerySuite

  • exclude: SPARK-10634 timestamp written and read as INT64 - truncation
  • exclude: SPARK-26677: negated null-safe equality comparison should not filter matched row groups
  • exclude: Migration from INT96 to TIMESTAMP_MICROS timestamp type
  • exclude: returning batch for wide table

spark35

AuronDataFrameAggregateSuite

  • exclude: SPARK-16484: hll_*_agg + hll_union negative tests
  • exclude: SPARK-43876: Enable fast hashmap for distinct queries

AuronDataFrameSuite

  • exclude: SPARK-41048: Improve output partitioning and ordering with AQE cache

AuronParquetFieldIdIOSuite

  • exclude: Parquet reads infer fields using field ids correctly
  • exclude: absence of field ids
  • exclude: SPARK-38094: absence of field ids: reading nested schema
  • exclude: multiple id matches
  • exclude: read parquet file without ids
  • exclude: global read/write flag should work correctly

AuronParquetIOSuite

  • exclude: vectorized reader: missing all struct fields
  • exclude: SPARK-34817: Read UINT_64 as Decimal from parquet
  • exclude: SPARK-35640: read binary as timestamp should throw schema incompatible error
  • exclude: SPARK-35640: int as long should throw schema incompatible error
  • exclude: read dictionary encoded decimals written as INT32
  • exclude: explode nested lists crossing a rowgroup boundary
  • exclude: read dictionary encoded decimals written as INT64
  • exclude: read dictionary encoded decimals written as FIXED_LEN_BYTE_ARRAY
  • exclude: read dictionary and plain encoded timestamp_millis written as INT64
  • exclude: SPARK-40128 read DELTA_LENGTH_BYTE_ARRAY encoded strings
  • exclude: SPARK-36726: test incorrect Parquet row group file offset
  • exclude: SPARK-34167: read LongDecimals with precision < 10, VectorizedReader true
  • exclude: SPARK-34167: read LongDecimals with precision < 10, VectorizedReader false

AuronParquetInteroperabilitySuite

  • exclude: parquet timestamp conversion

AuronParquetProtobufCompatibilitySuite

  • exclude: unannotated array of primitive type
  • exclude: unannotated array of struct
  • exclude: struct with unannotated array
  • exclude: unannotated array of struct with unannotated array
  • exclude: unannotated array of string

AuronParquetQuerySuite

  • exclude: SPARK-10634 timestamp written and read as INT64 - truncation
  • exclude: Enabling/disabling ignoreCorruptFiles
  • exclude: SPARK-26677: negated null-safe equality comparison should not filter matched row groups
  • exclude: Migration from INT96 to TIMESTAMP_MICROS timestamp type
  • exclude: SPARK-34212 Parquet should read decimals correctly
  • exclude: row group skipping doesn't overflow when reading into larger type

AuronParquetRebaseDatetimeSuite

  • exclude: SPARK-31159, SPARK-37705: compatibility with Spark 2.4/3.2 in reading dates/timestamps
  • exclude: SPARK-31159, SPARK-37705: rebasing timestamps in write
  • exclude: SPARK-31159: rebasing dates in write
  • exclude: SPARK-35427: datetime rebasing in the EXCEPTION mode

AuronParquetRebaseDatetimeV1Suite

  • exclude: SPARK-31159, SPARK-37705: compatibility with Spark 2.4/3.2 in reading dates/timestamps
  • exclude: SPARK-31159, SPARK-37705: rebasing timestamps in write
  • exclude: SPARK-31159: rebasing dates in write
  • exclude: SPARK-35427: datetime rebasing in the EXCEPTION mode

AuronParquetRebaseDatetimeV2Suite

  • exclude: SPARK-31159, SPARK-37705: compatibility with Spark 2.4/3.2 in reading dates/timestamps
  • exclude: SPARK-35427: datetime rebasing in the EXCEPTION mode

AuronParquetSchemaSuite

  • exclude: schema mismatch failure error message for parquet reader
  • exclude: schema mismatch failure error message for parquet vectorized reader
  • exclude: SPARK-40819: parquet file with TIMESTAMP(NANOS, true) (with nanosAsLong=true)
  • exclude: SPARK-40819: parquet file with TIMESTAMP(NANOS, true) (with default nanosAsLong=false)

AuronParquetThriftCompatibilitySuite

  • exclude: Read Parquet file generated by parquet-thrift

AuronParquetV1FilterSuite

  • excludeByPrefix: SPARK-40280: filter pushdown -
  • excludeByPrefix: filter pushdown -
  • exclude: Filters should be pushed down for vectorized Parquet reader at row group level
  • exclude: SPARK-31026: Parquet predicate pushdown for fields having dots in the names
  • exclude: Filters should be pushed down for Parquet readers at row group level
  • exclude: SPARK-23852: Broken Parquet push-down for partially-written stats
  • exclude: SPARK-17091: Convert IN predicate to Parquet filter push-down
  • exclude: SPARK-25207: exception when duplicate fields in case-insensitive mode
  • exclude: Support Parquet column index
  • exclude: SPARK-34562: Bloom filter push down

AuronParquetV1PartitionDiscoverySuite

  • exclude: read partitioned table - partition key included in Parquet file
  • exclude: read partitioned table - with nulls and partition keys are included in Parquet file
  • exclude: SPARK-18108 Parquet reader fails when data column types conflict with partition ones
  • exclude: SPARK-21463: MetadataLogFileIndex should respect userSpecifiedSchema for partition cols

AuronParquetV1QuerySuite

  • exclude: SPARK-10634 timestamp written and read as INT64 - truncation
  • exclude: Enabling/disabling ignoreCorruptFiles
  • exclude: SPARK-26677: negated null-safe equality comparison should not filter matched row groups
  • exclude: Migration from INT96 to TIMESTAMP_MICROS timestamp type
  • exclude: SPARK-34212 Parquet should read decimals correctly
  • exclude: row group skipping doesn't overflow when reading into larger type
  • exclude: returning batch for wide table
  • exclude: SPARK-39833: pushed filters with count()
  • exclude: SPARK-39833: pushed filters with project without filter columns

AuronParquetV2FilterSuite

  • excludeByPrefix: SPARK-40280: filter pushdown -
  • exclude: SPARK-31026: Parquet predicate pushdown for fields having dots in the names
  • exclude: Filters should be pushed down for Parquet readers at row group level
  • exclude: SPARK-23852: Broken Parquet push-down for partially-written stats
  • exclude: SPARK-17091: Convert IN predicate to Parquet filter push-down
  • exclude: SPARK-25207: exception when duplicate fields in case-insensitive mode
  • exclude: Support Parquet column index

AuronParquetV2QuerySuite

  • exclude: SPARK-10634 timestamp written and read as INT64 - truncation
  • exclude: SPARK-26677: negated null-safe equality comparison should not filter matched row groups
  • exclude: Migration from INT96 to TIMESTAMP_MICROS timestamp type
  • exclude: returning batch for wide table

spark40

AuronDataFrameFunctionsSuite

  • disable: Native execution can crash after ParquetQuery in Spark 4

AuronDateFunctionsSuite

  • exclude: SPARK-30668: use legacy timestamp parser in to_timestamp

AuronMathFunctionsSuite

  • disable: Native execution can crash in Spark 4

AuronMiscFunctionsSuite

  • exclude: reflect and java_method

AuronStringFunctionsSuite

  • exclude: string concat
  • exclude: string concat_ws
  • exclude: UTF-8 string validate
  • exclude: RegExpReplace throws the right exception when replace fails on a particular row

AuronDataFrameAggregateSuite

  • disable: Native execution can crash in Spark 4

AuronDatasetAggregatorSuite

  • disable: Native dataset aggregators fail in Spark 4

AuronTypedImperativeAggregateSuite

  • disable: Native execution can crash after ParquetQuery in Spark 4

AuronDataFrameSuite

  • disable: Native execution can crash in Spark 4

AuronParquetAvroCompatibilitySuite

  • exclude: required primitives
  • exclude: optional primitives
  • exclude: non-nullable arrays
  • exclude: SPARK-10136 array of primitive array
  • exclude: map of primitive array
  • exclude: various complex types
  • exclude: SPARK-9407 Push down predicates involving Parquet ENUM columns

AuronParquetColumnIndexSuite

  • exclude: reading from unaligned pages - test filters
  • exclude: test reading unaligned pages - test all types (dict encode)
  • exclude: SPARK-36123: reading from unaligned pages - test filters with nulls
  • exclude: test reading unaligned pages - test all types
  • exclude: reading unaligned pages - struct type

AuronParquetEncodingSuite

  • disable: Native execution can crash in Spark 4

AuronParquetFieldIdIOSuite

  • disable: Native parquet field id reads fail in Spark 4

AuronParquetIOSuite

  • disable: Native execution can crash in Spark 4

AuronParquetInteroperabilitySuite

  • disable: Native execution can crash in Spark 4

AuronParquetPartitionDiscoverySuite

  • exclude: read partitioned table - normal case
  • exclude: Resolve type conflicts - decimals, dates and timestamps in partition column

AuronParquetProtobufCompatibilitySuite

  • exclude: unannotated array of primitive type
  • exclude: unannotated array of struct
  • exclude: struct with unannotated array
  • exclude: unannotated array of struct with unannotated array
  • exclude: unannotated array of string

AuronParquetQuerySuite

  • exclude: simple select queries
  • exclude: appending
  • exclude: SPARK-10634 timestamp written and read as INT64 - truncation
  • exclude: Enabling/disabling ignoreCorruptFiles
  • exclude: SPARK-26677: negated null-safe equality comparison should not filter matched row groups
  • exclude: Migration from INT96 to TIMESTAMP_MICROS timestamp type
  • exclude: SPARK-34212 Parquet should read decimals correctly

AuronParquetRebaseDatetimeSuite

  • exclude: SPARK-31159, SPARK-37705: compatibility with Spark 2.4/3.2 in reading dates/timestamps
  • exclude: SPARK-31159, SPARK-37705: rebasing timestamps in write
  • exclude: SPARK-31159: rebasing dates in write
  • exclude: SPARK-35427: datetime rebasing in the EXCEPTION mode

AuronParquetRebaseDatetimeV1Suite

  • disable: Spark 4 test resources use jar paths unsupported by Hadoop Path

AuronParquetRebaseDatetimeV2Suite

  • disable: Spark 4 test resources use jar paths unsupported by Hadoop Path

AuronParquetSchemaPruningSuite

  • disable: Native parquet schema pruning reads fail in Spark 4

AuronParquetSchemaSuite

  • disable: Native execution can crash in Spark 4

AuronParquetThriftCompatibilitySuite

  • disable: Spark 4 test resources use jar paths unsupported by Hadoop Path

AuronParquetV1FilterSuite

  • disable: Native execution can crash in Spark 4

AuronParquetV1PartitionDiscoverySuite

  • exclude: read partitioned table - normal case
  • exclude: read partitioned table - partition key included in Parquet file
  • exclude: read partitioned table - with nulls and partition keys are included in Parquet file
  • exclude: SPARK-18108 Parquet reader fails when data column types conflict with partition ones
  • exclude: SPARK-21463: MetadataLogFileIndex should respect userSpecifiedSchema for partition cols

AuronParquetV1QuerySuite

  • exclude: simple select queries
  • exclude: appending
  • exclude: SPARK-10634 timestamp written and read as INT64 - truncation
  • exclude: Enabling/disabling ignoreCorruptFiles
  • exclude: SPARK-26677: negated null-safe equality comparison should not filter matched row groups
  • exclude: Migration from INT96 to TIMESTAMP_MICROS timestamp type
  • exclude: SPARK-34212 Parquet should read decimals correctly
  • exclude: returning batch for wide table
  • exclude: SPARK-39833: pushed filters with count()
  • exclude: SPARK-39833: pushed filters with project without filter columns

AuronParquetV1SchemaPruningSuite

  • disable: Native parquet schema pruning reads fail in Spark 4

AuronParquetV2FilterSuite

  • disable: Native execution can crash in Spark 4

AuronParquetV2PartitionDiscoverySuite

  • exclude: read partitioned table - normal case
  • exclude: SPARK-22109: Resolve type conflicts between strings and timestamps in partition column

AuronParquetV2QuerySuite

  • exclude: simple select queries
  • exclude: appending
  • exclude: self-join
  • exclude: SPARK-10634 timestamp written and read as INT64 - truncation
  • exclude: SPARK-26677: negated null-safe equality comparison should not filter matched row groups
  • exclude: Migration from INT96 to TIMESTAMP_MICROS timestamp type
  • exclude: returning batch for wide table

AuronParquetV2SchemaPruningSuite

  • disable: Native parquet schema pruning reads fail in Spark 4

spark41

AuronDataFrameFunctionsSuite

  • disable: Native execution can crash after ParquetQuery in Spark 4

AuronDateFunctionsSuite

  • exclude: SPARK-30668: use legacy timestamp parser in to_timestamp

AuronMathFunctionsSuite

  • disable: Native execution can crash in Spark 4

AuronMiscFunctionsSuite

  • exclude: reflect and java_method

AuronStringFunctionsSuite

  • exclude: string concat
  • exclude: string concat_ws
  • exclude: UTF-8 string validate
  • exclude: RegExpReplace throws the right exception when replace fails on a particular row

AuronDataFrameAggregateSuite

  • disable: Native execution can crash in Spark 4

AuronDatasetAggregatorSuite

  • disable: Native dataset aggregators fail in Spark 4

AuronTypedImperativeAggregateSuite

  • disable: Native execution can crash after ParquetQuery in Spark 4

AuronDataFrameSuite

  • disable: Native execution can crash in Spark 4

AuronParquetAvroCompatibilitySuite

  • exclude: required primitives
  • exclude: optional primitives
  • exclude: non-nullable arrays
  • exclude: SPARK-10136 array of primitive array
  • exclude: map of primitive array
  • exclude: various complex types
  • exclude: SPARK-9407 Push down predicates involving Parquet ENUM columns

AuronParquetColumnIndexSuite

  • exclude: reading from unaligned pages - test filters
  • exclude: test reading unaligned pages - test all types (dict encode)
  • exclude: SPARK-36123: reading from unaligned pages - test filters with nulls
  • exclude: test reading unaligned pages - test all types
  • exclude: reading unaligned pages - struct type

AuronParquetEncodingSuite

  • disable: Native execution can crash in Spark 4

AuronParquetFieldIdIOSuite

  • disable: Native parquet field id reads fail in Spark 4

AuronParquetFileFormatSuite

  • exclude: Write and read back TIME values

AuronParquetFileFormatV1Suite

  • exclude: Write and read back TIME values

AuronParquetFileFormatV2Suite

  • exclude: Write and read back TIME values

AuronParquetIOSuite

  • disable: Native execution can crash in Spark 4

AuronParquetInteroperabilitySuite

  • disable: Native execution can crash in Spark 4

AuronParquetPartitionDiscoverySuite

  • exclude: read partitioned table - normal case
  • exclude: Infer the TIME data type from partition values

AuronParquetProtobufCompatibilitySuite

  • exclude: unannotated array of primitive type
  • exclude: unannotated array of struct
  • exclude: struct with unannotated array
  • exclude: unannotated array of struct with unannotated array
  • exclude: unannotated array of string

AuronParquetQuerySuite

  • exclude: simple select queries
  • exclude: appending
  • exclude: SPARK-10634 timestamp written and read as INT64 - truncation
  • exclude: Enabling/disabling ignoreCorruptFiles
  • exclude: SPARK-26677: negated null-safe equality comparison should not filter matched row groups
  • exclude: Migration from INT96 to TIMESTAMP_MICROS timestamp type
  • exclude: SPARK-34212 Parquet should read decimals correctly
  • exclude: create table with TIME

AuronParquetRebaseDatetimeSuite

  • exclude: SPARK-31159, SPARK-37705: compatibility with Spark 2.4/3.2 in reading dates/timestamps
  • exclude: SPARK-31159, SPARK-37705: rebasing timestamps in write
  • exclude: SPARK-31159: rebasing dates in write
  • exclude: SPARK-35427: datetime rebasing in the EXCEPTION mode

AuronParquetRebaseDatetimeV1Suite

  • disable: Spark 4 test resources use jar paths unsupported by Hadoop Path

AuronParquetRebaseDatetimeV2Suite

  • disable: Spark 4 test resources use jar paths unsupported by Hadoop Path

AuronParquetSchemaPruningSuite

  • disable: Native parquet schema pruning reads fail in Spark 4

AuronParquetSchemaSuite

  • disable: Native execution can crash in Spark 4

AuronParquetThriftCompatibilitySuite

  • disable: Spark 4 test resources use jar paths unsupported by Hadoop Path

AuronParquetV1FilterSuite

  • disable: Native execution can crash in Spark 4

AuronParquetV1PartitionDiscoverySuite

  • exclude: read partitioned table - normal case
  • exclude: Infer the TIME data type from partition values
  • exclude: read partitioned table - partition key included in Parquet file
  • exclude: read partitioned table - with nulls and partition keys are included in Parquet file
  • exclude: SPARK-18108 Parquet reader fails when data column types conflict with partition ones
  • exclude: SPARK-21463: MetadataLogFileIndex should respect userSpecifiedSchema for partition cols

AuronParquetV1QuerySuite

  • exclude: simple select queries
  • exclude: appending
  • exclude: create table with TIME
  • exclude: SPARK-10634 timestamp written and read as INT64 - truncation
  • exclude: Enabling/disabling ignoreCorruptFiles
  • exclude: SPARK-26677: negated null-safe equality comparison should not filter matched row groups
  • exclude: Migration from INT96 to TIMESTAMP_MICROS timestamp type
  • exclude: SPARK-34212 Parquet should read decimals correctly
  • exclude: returning batch for wide table
  • exclude: SPARK-39833: pushed filters with count()
  • exclude: SPARK-39833: pushed filters with project without filter columns

AuronParquetV1SchemaPruningSuite

  • disable: Native parquet schema pruning reads fail in Spark 4

AuronParquetV2FilterSuite

  • disable: Native execution can crash in Spark 4

AuronParquetV2PartitionDiscoverySuite

  • exclude: read partitioned table - normal case
  • exclude: Infer the TIME data type from partition values
  • exclude: _SUCCESS should not break partitioning discovery
  • exclude: Resolve type conflicts - decimals, dates and timestamps in partition column
  • exclude: SPARK-22109: Resolve type conflicts between strings and timestamps in partition column

AuronParquetV2QuerySuite

  • exclude: simple select queries
  • exclude: appending
  • exclude: self-join
  • exclude: create table with TIME
  • exclude: SPARK-10634 timestamp written and read as INT64 - truncation
  • exclude: SPARK-26677: negated null-safe equality comparison should not filter matched row groups
  • exclude: Migration from INT96 to TIMESTAMP_MICROS timestamp type
  • exclude: returning batch for wide table

AuronParquetV2SchemaPruningSuite

  • disable: Native parquet schema pruning reads fail in Spark 4

Notes

Larger groups can be split into smaller issues or PRs.
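
One possible way to pick those groups is to collect entries that repeat across Spark versions, since a test excluded under the same name in several versions may share a root cause. The sketch below assumes entries have already been collected as (version, suite, test) tuples; `group_by_test` is a hypothetical helper, and the sample rows are copied from the list above.

```python
from collections import defaultdict

# Sample rows copied from the list in this issue: (version, suite, test name).
entries = [
    ("spark31", "AuronParquetInteroperabilitySuite", "parquet timestamp conversion"),
    ("spark32", "AuronParquetInteroperabilitySuite", "parquet timestamp conversion"),
    ("spark34", "AuronParquetInteroperabilitySuite", "parquet timestamp conversion"),
    ("spark35", "AuronParquetInteroperabilitySuite", "parquet timestamp conversion"),
    ("spark41", "AuronParquetFileFormatSuite", "Write and read back TIME values"),
]


def group_by_test(rows):
    """Map (suite, test) to the sorted list of versions that exclude it."""
    groups = defaultdict(list)
    for version, suite, test in rows:
        groups[(suite, test)].append(version)
    return {key: sorted(versions) for key, versions in groups.items()}


groups = group_by_test(entries)
for (suite, test), versions in groups.items():
    print(f"{suite} / {test}: {', '.join(versions)}")
```

Entries that appear in only one version, such as the TIME-related test in spark41, are more likely to be version-specific and can be tracked separately.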
