Description
Spark writes `NullType` columns to parquet with a `BOOLEAN` physical type and an `Unknown` logical type annotation (the comment in `ParquetSchemaConverter.scala` reads: "Selected primitive type here doesn't have significance"). parquet-rs accepts `LogicalType::Unknown` only when it is paired with `PhysicalType::INT32`, and rejects any other physical type with `Cannot annotate Unknown from BOOLEAN for field '…'` (parquet-57.2.0/src/schema/types.rs:401, :423).
Result: any attempt to read a Spark-written parquet file that contains a `NullType` field fails in Comet with:
```
org.apache.comet.CometNativeException: Parquet error: Cannot annotate Unknown from BOOLEAN for field '_1'
```
The SPARK-54220 test in `ParquetIOSuite` (`SPARK-54220: vectorized reader: missing all struct fields, struct with NullType only`) is the concrete reproducer. It was un-ignored as part of PR #4190 / issue #4136, but it fails on the parquet read path before the new fix in `parquet_convert_struct_to_struct` is reached.
Reproducer
See `issue #4136: struct with only NullType fields in file (SPARK-54220)` in `CometNativeReaderSuite`. The failure manifests for both `native_datafusion` and `native_iceberg_compat`.
Suspected fix
Either:
- Change parquet-rs upstream to accept `(Unknown, BOOLEAN)` (and arguably any physical type, since Spark's comment makes clear the physical type is a don't-care), or
- Work around it in Comet: in the schema adapter or parquet reader factory, rewrite the physical type to INT32 before passing the schema to parquet-rs' validator, or special-case the Unknown-annotated field at read time.
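To make the Comet-side option concrete, here is a minimal sketch of the rewrite step. It is a model under stated assumptions, not the parquet-rs or Comet API: `PhysicalType`, `LogicalType`, `PrimitiveField`, `validate_strict`, and `normalize_unknown` are hypothetical stand-ins defined only for illustration. `validate_strict` mimics the rule described above (`Unknown` is accepted only on INT32), and `normalize_unknown` rewrites the physical type of an Unknown-annotated field before validation.

```rust
// Hypothetical, simplified stand-ins for the schema types involved.
// These are NOT the real parquet-rs definitions.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PhysicalType {
    Boolean,
    Int32,
}

impl PhysicalType {
    fn name(self) -> &'static str {
        match self {
            PhysicalType::Boolean => "BOOLEAN",
            PhysicalType::Int32 => "INT32",
        }
    }
}

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LogicalType {
    Unknown,
    Integer, // placeholder for "any other annotation"
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct PrimitiveField {
    name: String,
    physical: PhysicalType,
    logical: Option<LogicalType>,
}

/// Models the strict rule from the description: an `Unknown` annotation
/// is rejected unless the physical type is INT32.
fn validate_strict(field: &PrimitiveField) -> Result<(), String> {
    match (field.logical, field.physical) {
        (Some(LogicalType::Unknown), p) if p != PhysicalType::Int32 => Err(format!(
            "Cannot annotate Unknown from {} for field '{}'",
            p.name(),
            field.name
        )),
        _ => Ok(()),
    }
}

/// Workaround sketch: force Unknown-annotated fields to INT32 so that
/// the strict validator accepts them. All other fields pass through unchanged.
fn normalize_unknown(field: PrimitiveField) -> PrimitiveField {
    if field.logical == Some(LogicalType::Unknown) {
        PrimitiveField {
            physical: PhysicalType::Int32,
            ..field
        }
    } else {
        field
    }
}
```

With a field shaped like the one Spark writes (`_1`, `BOOLEAN`, `Unknown`), `validate_strict` returns the same error text as in the log above, and the output of `normalize_unknown` passes validation. Whether rewriting the schema alone is enough in the real reader is an open question: the column chunk metadata in the file still records `BOOLEAN`, so the read path may need matching handling.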