Description
Several Spark tests in ParquetTypeWideningSuite and one in ParquetQuerySuite assert that Spark's parquet-mr reader silently truncates, overflows, or returns null when the requested schema is narrower than the file's schema. They also assert that this happens only on the non-vectorized path (PARQUET_VECTORIZED_READER_ENABLED = false); with the vectorized reader on, the same conversions throw SchemaColumnConvertNotSupportedException.
native_datafusion always rejects these conversions (mirroring the vectorized-reader branch via schema_adapter.rs), so the tests fail on the non-vectorized branch where Spark's parquet-mr would have silently produced wrong-but-tolerated output.
This is an architectural difference, not a fixable bug in the rejection logic — Comet has no parquet-mr-equivalent backend that produces silent-overflow results. The schema-adapter changes in #4297, #4343, #4344 are correct; these tests just have to be ignored under native_datafusion until/unless someone adds a permissive non-vectorized fallback.
Affected tests (Spark 4.1.x)
org.apache.spark.sql.execution.datasources.parquet.ParquetTypeWideningSuite:
unsupported parquet conversion *Type -> DecimalType(...) — 17 cases (expectError = vectorized)
parquet decimal precision change Decimal(X, 2) -> Decimal(Y, 2) — 6 narrowing cases
parquet decimal precision and scale change Decimal(X, Y) -> Decimal(A, B) — 12 cases
parquet decimal type change Decimal(5, 2) -> Decimal(3, 2) overflows with parquet-mr — 1 case (specifically tests parquet-mr's null-on-overflow)
Spark 3.4.x / 3.5.x / 4.0.x already carry the first three groups under IgnoreCometNativeDataFusion("https://github.com/apache/datafusion-comet/issues/3720"). The 4.1.1 diff removed those annotations prematurely as part of the schema-adapter work, so they need to be re-added. The overflows with parquet-mr test is unannotated in 4.0.2/4.1.1 and needs the same treatment.
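To make the Decimal(5, 2) -> Decimal(3, 2) case concrete, the following is an illustrative sketch of the two behaviours in plain Python. It assumes a deliberately simplified rule (same scale, narrower precision is rejected up front) and hypothetical helper names; it is not the logic in schema_adapter.rs or in parquet-mr.

```python
from decimal import Decimal
from typing import List, Optional, Tuple


class UnsupportedConversion(Exception):
    """Stand-in for a schema-level rejection (hypothetical name)."""


def fits(value: Decimal, precision: int, scale: int) -> bool:
    # A value fits Decimal(precision, scale) if its unscaled integer
    # (value * 10^scale) is whole and has at most `precision` digits.
    unscaled = value.scaleb(scale)
    if unscaled != unscaled.to_integral_value():
        return False
    return abs(unscaled) < Decimal(10) ** precision


def permissive_read(values: List[Decimal], precision: int, scale: int) -> List[Optional[Decimal]]:
    # Null-on-overflow: values that do not fit the narrower type become None.
    return [v if fits(v, precision, scale) else None for v in values]


def strict_read(values: List[Decimal], file_type: Tuple[int, int], read_type: Tuple[int, int]) -> List[Decimal]:
    # Simplified assumption: reject precision narrowing at the schema level,
    # before looking at any value.
    (file_precision, file_scale), (read_precision, read_scale) = file_type, read_type
    if read_scale == file_scale and read_precision < file_precision:
        raise UnsupportedConversion(f"Decimal{file_type} -> Decimal{read_type}")
    return list(values)


if __name__ == "__main__":
    data = [Decimal("1.23"), Decimal("123.45")]
    print(permissive_read(data, 3, 2))
    try:
        strict_read(data, (5, 2), (3, 2))
    except UnsupportedConversion as exc:
        print("rejected:", exc)
```

The permissive path keeps 1.23 and turns 123.45 into a null, while the strict path fails before reading anything, which is why a test written against the permissive result cannot pass under a reader that always rejects.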
Action
Re-add IgnoreCometNativeDataFusion(<this issue's URL>) to the affected tests in dev/diffs/4.1.1.diff, and add it to the parquet decimal type change ... overflows with parquet-mr test in dev/diffs/4.0.2.diff and dev/diffs/4.1.1.diff.
Related