native_datafusion (Spark 3.x): shim's ParquetSchemaConvert translation produces an extra SparkException cause-chain layer #4354

@andygrove

Description

On Spark 3.x, Comet's native-error → JVM-exception shim
(spark/src/main/spark-3.{4,5}/org/apache/spark/sql/comet/shims/ShimSparkErrorConverter.scala)
translates a native ParquetSchemaConvert error into a SparkException whose
cause is SchemaColumnConvertNotSupportedException:

val cause = new SchemaColumnConvertNotSupportedException(column, physicalType, logicalType)
QueryExecutionErrors.unsupportedSchemaColumnConvertError(filePath, column, logicalType,
  physicalType, cause)
// returns: new SparkException(errorClass = "_LEGACY_ERROR_TEMP_2063", ..., cause = e)

Spark 3.x's executor / task error handling then re-wraps this SparkException
once more on the way back to the driver, producing a two-level chain:

SparkException (driver-side wrapping)
  cause -> SparkException (shim-generated, errorClass "_LEGACY_ERROR_TEMP_2063")
    cause -> SchemaColumnConvertNotSupportedException

Spark's own vectorized reader produces a one-level chain because
ParquetVectorUpdaterFactory.getUpdater throws
SchemaColumnConvertNotSupportedException directly; the file-scan code catches
it once and wraps it in a SparkException. Spark 4.0+ also produces a one-level
chain for Comet because the 4.x shim's parquetColumnDataTypeMismatchError path
appears not to be re-wrapped by the executor.

Why it matters

Spark's own test "SPARK-34212 Parquet should read decimals correctly" (and
similar tests) asserts on the cause directly:

val e = intercept[SparkException] { readParquet(schema, path).collect() }.getCause
assert(e.isInstanceOf[SchemaColumnConvertNotSupportedException])

On Comet 3.x, e.getCause is the inner SparkException, not the
SchemaColumnConvertNotSupportedException, so the assertion fails. Tests that
walk the cause chain (e.g. our regression test in ParquetReadSuite) pass.
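The difference between the two assertion styles can be reproduced with plain exceptions. A minimal, self-contained sketch (the class names below are stand-ins for Spark's real types, defined locally for illustration only):

```scala
// Stand-ins for Spark's exception types (hypothetical, for illustration only).
class SparkException(msg: String, cause: Throwable) extends Exception(msg, cause)
class SchemaColumnConvertNotSupportedException(msg: String) extends RuntimeException(msg)

object CauseChainDemo {
  def main(args: Array[String]): Unit = {
    val leaf = new SchemaColumnConvertNotSupportedException("column a: INT32 -> decimal")

    // Spark's own reader path: a single wrapping layer.
    val oneLevel = new SparkException("Task failed", leaf)
    assert(oneLevel.getCause.isInstanceOf[SchemaColumnConvertNotSupportedException])

    // Comet 3.x shim path: a shim-generated SparkException sits inside the
    // driver-side SparkException, so the direct-cause assertion fails.
    val twoLevel = new SparkException("Task failed",
      new SparkException("_LEGACY_ERROR_TEMP_2063", leaf))
    assert(!twoLevel.getCause.isInstanceOf[SchemaColumnConvertNotSupportedException])

    // Walking the chain still finds the leaf, which is why
    // chain-walking regression tests pass on both paths.
    def findCause(t: Throwable): Boolean = t match {
      case null => false
      case _: SchemaColumnConvertNotSupportedException => true
      case other => findCause(other.getCause)
    }
    assert(findCause(oneLevel) && findCause(twoLevel))
    println("ok")
  }
}
```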

Affected tests (currently kept ignored)

  • dev/diffs/3.4.3.diff — SPARK-34212 Parquet should read decimals correctly
    (ParquetQuerySuite).
  • dev/diffs/3.5.8.diff — same.

These would be unignored in 4.0.2.diff / 4.1.1.diff (where the chain is
one-level and the schema-adapter rejection from #4351 is in place).

Suggested fix

Change the 3.x shim to throw SchemaColumnConvertNotSupportedException
directly rather than wrapping it in unsupportedSchemaColumnConvertError's
SparkException. Spark's task error handling will wrap it once on the way
back to the driver, producing the same one-level chain as Spark's own
vectorized reader. The error message format ("Parquet column cannot be
converted in file …") must be preserved, since some Spark SQL tests assert
on it.
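A minimal, self-contained sketch of the proposed behavior, using stand-in classes (hypothetical; the real shim would throw Spark's SchemaColumnConvertNotSupportedException, and the exact message template and shim signature may differ):

```scala
// Stand-ins for Spark types (hypothetical, for illustration only).
class SparkException(msg: String, cause: Throwable) extends Exception(msg, cause)
class SchemaColumnConvertNotSupportedException(msg: String) extends RuntimeException(msg)

object ShimFixSketch {
  // Proposed 3.x shim behavior: throw the conversion exception directly,
  // keeping the "Parquet column cannot be converted in file ..." text that
  // some Spark SQL tests assert on. The executor's task error handling is
  // then the only layer that adds a SparkException wrapper.
  def translateParquetSchemaConvert(
      filePath: String, column: String,
      physicalType: String, logicalType: String): Nothing =
    throw new SchemaColumnConvertNotSupportedException(
      s"Parquet column cannot be converted in file $filePath. " +
        s"Column: [$column], Expected: $logicalType, Found: $physicalType")

  def main(args: Array[String]): Unit = {
    // Simulate the executor adding its single SparkException wrapper.
    val wrapped =
      try translateParquetSchemaConvert("/tmp/f.parquet", "a", "INT32", "decimal(5,2)")
      catch { case t: Throwable => new SparkException("Task failed", t) }
    // Result is the one-level chain the SPARK-34212-style assertion expects.
    assert(wrapped.getCause.isInstanceOf[SchemaColumnConvertNotSupportedException])
    assert(wrapped.getCause.getMessage.startsWith("Parquet column cannot be converted in file"))
    println("ok")
  }
}
```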
