
native_datafusion: tests asserting parquet-mr's permissive overflow/narrowing behavior cannot be made to pass #4352

@andygrove

Description

Several Spark tests in ParquetTypeWideningSuite and one in ParquetQuerySuite assert that Spark's parquet-mr reader silently truncates, overflows, or returns null when the requested schema is narrower than the file's schema, and that this happens only on the non-vectorized path (PARQUET_VECTORIZED_READER_ENABLED = false). With the vectorized reader enabled, the same conversions throw SchemaColumnConvertNotSupportedException.

native_datafusion always rejects these conversions (mirroring the vectorized-reader branch via schema_adapter.rs), so the tests fail on the non-vectorized branch, where Spark's parquet-mr would have silently produced wrong but tolerated output.
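To make the difference between the two behaviors concrete, here is a minimal Rust sketch. It is a hypothetical model, not Spark or Comet source: DecimalType, is_widening, check_strict, and read_permissive are illustrative names. The widening rule (scale and integer digits may not shrink) is an assumption about what the strict path accepts, not a transcription of schema_adapter.rs, and the permissive path is modelled only for same-scale narrowing such as Decimal(5, 2) -> Decimal(3, 2), where an out-of-range value becomes null.

```rust
// Hypothetical model, not Spark or Comet source. A decimal value is an i128
// unscaled integer plus a (precision, scale) type, similar to Arrow's Decimal128.

#[derive(Debug, Clone, Copy, PartialEq)]
struct DecimalType {
    precision: u8, // assumed <= 38, as for Decimal128
    scale: i8,
}

/// Assumed widening rule: the scale may not shrink, and the number of
/// integer digits (precision - scale) may not shrink.
fn is_widening(from: DecimalType, to: DecimalType) -> bool {
    let from_int = from.precision as i32 - from.scale as i32;
    let to_int = to.precision as i32 - to.scale as i32;
    to.scale >= from.scale && to_int >= from_int
}

/// Strict policy (vectorized reader, native_datafusion): decided from the two
/// schemas alone, before any value is read.
fn check_strict(from: DecimalType, to: DecimalType) -> Result<(), String> {
    if is_widening(from, to) {
        Ok(())
    } else {
        // Stand-in for Spark's SchemaColumnConvertNotSupportedException.
        Err(format!(
            "Decimal({}, {}) -> Decimal({}, {}) not supported",
            from.precision, from.scale, to.precision, to.scale
        ))
    }
}

/// Permissive policy (parquet-mr, non-vectorized), modelled only for
/// same-scale narrowing: a value that does not fit becomes null (None).
fn read_permissive(unscaled: i128, to: DecimalType) -> Option<i128> {
    // `to.precision` digits can hold magnitudes up to 10^precision - 1.
    let limit = 10u128.pow(to.precision as u32);
    if unscaled.unsigned_abs() < limit {
        Some(unscaled)
    } else {
        None
    }
}

fn main() {
    let file = DecimalType { precision: 5, scale: 2 };
    let requested = DecimalType { precision: 3, scale: 2 };

    // 9.99 (unscaled 999) fits Decimal(3, 2); 123.45 (unscaled 12345) does not.
    println!("{:?}", read_permissive(999, requested)); // Some(999)
    println!("{:?}", read_permissive(12345, requested)); // None

    // The strict policy rejects the schema pair regardless of the stored data.
    println!("{:?}", check_strict(file, requested));
}
```

The point the model captures is where the decision happens: the strict check needs only the file and requested types, so it fails even when every stored value would fit, while the permissive reader inspects each value. That is why tests written against parquet-mr expect output (possibly null) rather than an exception.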

This is an architectural difference, not a fixable bug in the rejection logic: Comet has no parquet-mr-equivalent backend that produces silent-overflow results. The schema-adapter changes in #4297, #4343, and #4344 are correct; these tests simply have to be ignored under native_datafusion until or unless someone adds a permissive non-vectorized fallback.

Affected tests (Spark 4.1.x)

org.apache.spark.sql.execution.datasources.parquet.ParquetTypeWideningSuite:

  • unsupported parquet conversion *Type -> DecimalType(...): 17 cases (expectError = vectorized)
  • parquet decimal precision change Decimal(X, 2) -> Decimal(Y, 2): 6 narrowing cases
  • parquet decimal precision and scale change Decimal(X, Y) -> Decimal(A, B): 12 cases
  • parquet decimal type change Decimal(5, 2) -> Decimal(3, 2) overflows with parquet-mr: 1 case (specifically tests parquet-mr's null-on-overflow behavior)

Spark 3.4.x, 3.5.x, and 4.0.x already carry the first three groups under IgnoreCometNativeDataFusion("https://github.com/apache/datafusion-comet/issues/3720"); the 4.1.1 diff un-ignored them prematurely as part of the schema-adapter work, so they need to be re-ignored. The "overflows with parquet-mr" test is unannotated in both 4.0.2 and 4.1.1 and needs the same treatment.

Action

Re-add IgnoreCometNativeDataFusion(<this issue's URL>) to the affected tests in dev/diffs/4.1.1.diff, and add it to the "parquet decimal type change ... overflows with parquet-mr" test in both dev/diffs/4.0.2.diff and dev/diffs/4.1.1.diff.

Related
