AnalysisException: [UNRESOLVED_COLUMN] during MERGE INTO on wide Iceberg tables (400+ columns) in Spark 3.5 #15526

@NataliaLaurova

Description

Apache Iceberg version

1.7.1

Query engine

Spark

Please describe the bug 🐞

When performing a MERGE INTO operation on an Apache Iceberg table with a large number of columns (~450), Spark 3.5 fails during the analysis phase with an UNRESOLVED_COLUMN error. The error is paradoxical because the "suggested columns" in the error message include the exact column name and alias that the analyzer claims it cannot resolve.

The same SQL logic works successfully on a smaller "test" version of the table (e.g., 10 columns) and executes successfully in other engines (e.g., Athena/Trino), suggesting a specific regression or limitation in the Spark Catalyst analyzer's ability to bind references in extremely wide MergeIntoTable logical plans.

Environment:

Spark Version: 3.5.x

Iceberg Version: 1.7.1

Catalog: Glue Catalog

Table Schema: ~450 columns, partitioned by [Insert column, e.g., date].

Steps to Reproduce:

Create an Iceberg table with 400+ columns.

Create a source staging table/view with a similar schema.

Run a MERGE INTO statement with 10+ join keys and 10+ column updates.

Observe the AnalysisException despite the columns being present and correctly typed.
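A minimal script to generate the repro DDL and MERGE statement at this width. All names here (`glue.db.wide_target`, `wide_source`, `col_N`) are illustrative placeholders, not the actual schema from the report:

```python
# Hypothetical repro generator: builds a ~450-column CREATE TABLE and a
# MERGE INTO with 10 join keys and 10 updated columns, matching the
# steps above. Names are illustrative.
NUM_COLS = 450
NUM_KEYS = 10
NUM_UPDATES = 10

cols = [f"col_{i}" for i in range(NUM_COLS)]
keys = cols[:NUM_KEYS]
updates = cols[NUM_KEYS:NUM_KEYS + NUM_UPDATES]

ddl = (
    f"CREATE TABLE glue.db.wide_target "
    f"({', '.join(f'{c} string' for c in cols)}) USING iceberg"
)

on_clause = " AND ".join(f"target.{k} = source.{k}" for k in keys)
set_clause = ", ".join(f"target.{c} = source.{c}" for c in updates)

merge_sql = (
    "MERGE INTO glue.db.wide_target AS target "
    "USING glue.db.wide_source AS source "
    f"ON {on_clause} "
    f"WHEN MATCHED THEN UPDATE SET {set_clause} "
    "WHEN NOT MATCHED THEN INSERT *"
)
```

Running `merge_sql` through `spark.sql(...)` against a Glue-backed Iceberg catalog should trigger the exception; the same script with `NUM_COLS = 10` should succeed.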

Actual Error:

```
AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name target.dmsoptimestamp cannot be resolved. Did you mean one of the following? [target.dmsoptimestamp, source.dmsoptimestamp, target.uploadtimestamp, ...]
```
Expected Behavior:
The analyzer should successfully bind the attributes to the target alias as it does with narrower tables.

Additional Context:

Increasing spark.sql.analyzer.maxIterations does not resolve the issue.

Materializing the source view into a physical table does not resolve the issue.

The issue appears unique to the SQL MERGE syntax; the DataFrame API (Join + Overwrite) works as a workaround, indicating the issue lies in the SQL-to-Logical-Plan resolution phase.
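To make the semantics of that workaround concrete, here is a pure-Python sketch (no Spark) of the upsert logic that the Join + Overwrite approach reimplements: matched rows get their update columns overwritten from the source, unmatched source rows are inserted. The function and row layout are illustrative:

```python
# Pure-Python illustration of MERGE INTO upsert semantics, as reproduced
# by the Join + Overwrite DataFrame workaround. Not Spark code.
def upsert(target_rows, source_rows, keys, update_cols):
    """Return the merged row set, keyed by the join-key tuple."""
    merged = {tuple(r[k] for k in keys): dict(r) for r in target_rows}
    for src in source_rows:
        key = tuple(src[k] for k in keys)
        if key in merged:      # WHEN MATCHED THEN UPDATE SET ...
            for c in update_cols:
                merged[key][c] = src[c]
        else:                  # WHEN NOT MATCHED THEN INSERT *
            merged[key] = dict(src)
    return list(merged.values())

target = [{"id": 1, "v": "old"}, {"id": 2, "v": "keep"}]
source = [{"id": 1, "v": "new"}, {"id": 3, "v": "ins"}]
result = upsert(target, source, keys=["id"], update_cols=["v"])
```

In Spark terms this corresponds to a full outer join of target and source on the keys, a column-wise coalesce, and an `overwritePartitions()` style write back to the Iceberg table, which avoids the SQL MERGE resolution path entirely.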

The failure occurs during the resolution phase of the Catalyst analyzer. Specifically, when the MergeIntoTable node is being resolved:

The AttributeMap for the target relation becomes excessively large (450+ entries).

The Rule executor for ResolveReferences appears to hit a recursion or iteration limit when trying to bind the target alias to the specific AttributeReference in the RelationV2 scan.

Engine Discrepancy: The fact that Athena (Trino-based) resolves this plan successfully implies that Spark's rule-based resolution of MergeIntoTable is not scaling with schema width.

The same MERGE INTO statement also succeeded in the Athena/Trino engine when the update list was limited to a smaller number of columns.

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time
