Native scan path doesn't honour Parquet field-ID matching when spark.sql.parquet.fieldId.read.enabled=true #4189

## Summary

Comet's native scan paths (`SCAN_NATIVE_DATAFUSION` and the new
`SCAN_NATIVE_DELTA_COMPAT` in the delta-kernel-phase-1 work) resolve Parquet
columns by name. When the user enables Spark's field-ID-based column
resolution via `spark.sql.parquet.fieldId.read.enabled=true`, Spark's
Parquet reader matches columns by the `parquet.field.id` metadata on each
`StructField` rather than by name. DataFusion's Parquet path does not
honour that metadata, so columns are still resolved by name, silently
producing wrong results when names and IDs disagree.

## Repro (Delta column-mapping `id` mode)

The Delta `id` column-mapping mode relies on field-ID matching to
decouple the table's logical column name from the Parquet file's
physical name. Tests that exercise the rename-detection semantics
(e.g. `DeltaColumnMappingSuite` "column mapping batch scan should
detect physical name changes" and "explicit id matching") expect
nulls when a field's ID is changed in the Delta metadata so that it
no longer matches the ID stored in the file. Vanilla Spark + Delta
returns nulls; Comet returns the actual data because its by-name
resolver still finds the column, whose name did not change.
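
The divergence can be shown with a small, self-contained model. This is only a sketch: the dictionaries stand in for a Parquet file and a requested field, and the physical name `col-a1b2`, the ID values, and the helpers `read_by_name` / `read_by_id` are hypothetical, not Comet, Spark, or Delta APIs.

```python
# Toy model of one Parquet file written under Delta `id` column mapping.
# "col-a1b2" and the IDs are hypothetical values chosen for illustration.
FILE_COLUMNS = [{"name": "col-a1b2", "field_id": 1, "values": [10, 20, 30]}]
NUM_ROWS = 3


def read_by_name(wanted):
    """Name-only resolution, as in the native scan path described above."""
    for col in FILE_COLUMNS:
        if col["name"] == wanted["name"]:
            return col["values"]
    return [None] * NUM_ROWS


def read_by_id(wanted):
    """ID-only resolution: an ID absent from the file yields an all-null column."""
    for col in FILE_COLUMNS:
        if col["field_id"] == wanted["field_id"]:
            return col["values"]
    return [None] * NUM_ROWS


# The requested field before and after its ID is changed in table metadata.
original = {"name": "col-a1b2", "field_id": 1}
changed = {"name": "col-a1b2", "field_id": 2}

print(read_by_id(original), read_by_name(original))
print(read_by_id(changed), read_by_name(changed))
```

With the original metadata both resolvers return the data. After the ID change, `read_by_id` returns three nulls (the result the tests expect), while `read_by_name` still returns `[10, 20, 30]`.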

## Workaround

`nativeDataFusionScan` already declines when
`spark.sql.parquet.fieldId.read.enabled=true` is set and `requiredSchema`
carries field IDs (`ParquetUtils.hasFieldIds`). The same gate has now been
mirrored in `nativeDeltaScan`. However, the check returns false for
Delta because Delta's `HadoopFsRelation` strips the field-ID metadata
from `requiredSchema` -- the IDs live on the snapshot's metadata,
which the Comet rule doesn't consult. So the gate never fires for
Delta column-mapping `id` mode, and there is currently no effective
workaround for that case.
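
To illustrate why the gate stays silent, here is a minimal sketch of a `hasFieldIds`-style check over a toy schema. The dict-based schema representation, the helper names `has_field_ids` and `gate_fires`, and the sample field values are assumptions made for this example; only the `parquet.field.id` metadata key comes from the description above.

```python
FIELD_ID_KEY = "parquet.field.id"  # metadata key named in this issue


def has_field_ids(schema):
    """True if any field, at any nesting depth, carries a field ID."""
    for field in schema:
        if FIELD_ID_KEY in field.get("metadata", {}):
            return True
        if has_field_ids(field.get("children", [])):
            return True
    return False


def gate_fires(field_id_read_enabled, schema):
    """Decline the native scan only if ID matching is on AND the schema has IDs."""
    return field_id_read_enabled and has_field_ids(schema)


# Hypothetical schemas: the snapshot keeps the ID, the required schema lost it.
snapshot_schema = [{"name": "a", "metadata": {FIELD_ID_KEY: 1}}]
required_schema = [{"name": "a", "metadata": {}}]

print(gate_fires(True, required_schema))  # the schema the rule actually sees
print(gate_fires(True, snapshot_schema))  # the schema that still has the IDs
```

The first call returns `False` and the second `True`: the check itself is fine, but it is handed a schema from which the IDs have already been removed.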

## Proposed fix

Extend Comet's Parquet read path to honour `parquet.field.id` /
`field_id` Arrow metadata for column resolution when the session's
`PARQUET_FIELD_ID_READ_ENABLED` is true, mirroring the by-name versus
by-ID matching in Spark's `ParquetReadSupport`. Track per-field
IDs on `data_schema` and pass them through to the native Parquet
reader so the schema adapter prefers an ID match.
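
A rough sketch of the resolution policy such a schema adapter could apply is shown below. It assumes a per-field rule (match by ID when the flag is on and the requested field carries an ID, otherwise match by name) and returns a projection plan of file-column indices. The function `resolve_columns` and the dict layout are hypothetical, and the sketch ignores nested fields, duplicate IDs, and files with no IDs at all.

```python
def resolve_columns(requested, file_columns, field_id_read_enabled):
    """For each requested field, return the index of the file column to
    read, or None if the column should be filled with nulls."""
    by_name = {c["name"]: i for i, c in enumerate(file_columns)}
    by_id = {
        c["field_id"]: i
        for i, c in enumerate(file_columns)
        if c.get("field_id") is not None
    }
    plan = []
    for field in requested:
        fid = field.get("field_id")
        if field_id_read_enabled and fid is not None:
            # ID match preferred; no fallback to name when the ID is absent.
            plan.append(by_id.get(fid))
        else:
            plan.append(by_name.get(field["name"]))
    return plan


file_columns = [{"name": "a", "field_id": 1}, {"name": "b", "field_id": 2}]
requested = [
    {"name": "b", "field_id": 1},  # name says "b", ID says file column 0
    {"name": "a", "field_id": 9},  # ID not present in the file
    {"name": "b"},                 # no ID: resolved by name
]

print(resolve_columns(requested, file_columns, field_id_read_enabled=True))
print(resolve_columns(requested, file_columns, field_id_read_enabled=False))
```

With the flag on, the plan is `[0, None, 1]`; with it off, the same request resolves purely by name to `[1, 0, 1]`, which is the behaviour this issue reports for the native path.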

Filed against: branch delta-kernel-phase-1 (PR #3932)
Related Spark behavior: `org.apache.spark.sql.execution.datasources.parquet.ParquetReadSupport`
