AnalysisException: [UNRESOLVED_COLUMN] during MERGE INTO on wide Iceberg tables (400+ columns) in Spark 3.5 #15526

@NataliaLaurova

Description

Apache Iceberg version

1.7.1

Query engine

Spark

Please describe the bug 🐞

When performing a MERGE INTO operation on an Apache Iceberg table with a large number of columns (~450), Spark 3.5 fails during the analysis phase with an UNRESOLVED_COLUMN error. The error is paradoxical because the "suggested columns" in the error message include the exact column name and alias that the analyzer claims it cannot resolve.

The same SQL logic works successfully on a smaller "test" version of the table (e.g., 10 columns) and executes successfully in other engines (e.g., Athena/Trino), suggesting a specific regression or limitation in the Spark Catalyst analyzer's ability to bind references in extremely wide MergeIntoTable logical plans.

Environment:

Spark Version: 3.5.x

Iceberg Version: 1.7.1

Catalog: Glue Catalog

Table Schema: ~450 columns, partitioned by [Insert column, e.g., date].

Steps to Reproduce:

Create an Iceberg table with 400+ columns.

Create a source staging table/view with a similar schema.

Run a MERGE INTO statement with 10+ join keys and 10+ column updates.

Observe the AnalysisException despite the columns being present and correctly typed.
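A minimal script to generate the repro DDL and MERGE statement at this width. All names here (`glue.db.wide_target`, `wide_source`, `col_N`) are illustrative placeholders, not the actual schema from the report:

```python
# Hypothetical repro generator: builds a ~450-column CREATE TABLE and a
# MERGE INTO with 10 join keys and 10 updated columns, matching the
# steps above. Names are illustrative.
NUM_COLS = 450
NUM_KEYS = 10
NUM_UPDATES = 10

cols = [f"col_{i}" for i in range(NUM_COLS)]
keys = cols[:NUM_KEYS]
updates = cols[NUM_KEYS:NUM_KEYS + NUM_UPDATES]

ddl = (
    f"CREATE TABLE glue.db.wide_target "
    f"({', '.join(f'{c} string' for c in cols)}) USING iceberg"
)

on_clause = " AND ".join(f"target.{k} = source.{k}" for k in keys)
set_clause = ", ".join(f"target.{c} = source.{c}" for c in updates)

merge_sql = (
    "MERGE INTO glue.db.wide_target AS target "
    "USING glue.db.wide_source AS source "
    f"ON {on_clause} "
    f"WHEN MATCHED THEN UPDATE SET {set_clause} "
    "WHEN NOT MATCHED THEN INSERT *"
)
```

Running `merge_sql` through `spark.sql(...)` against a Glue-backed Iceberg catalog should trigger the exception; the same script with `NUM_COLS = 10` should succeed.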

Actual Error:

```
AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name target.dmsoptimestamp cannot be resolved. Did you mean one of the following? [target.dmsoptimestamp, source.dmsoptimestamp, target.uploadtimestamp, ...]
```
Expected Behavior:
The analyzer should successfully bind the attributes to the target alias as it does with narrower tables.

Additional Context:

Increasing spark.sql.analyzer.maxIterations does not resolve the issue.

Materializing the source view into a physical table does not resolve the issue.

The issue appears unique to the SQL MERGE syntax; the DataFrame API (Join + Overwrite) works as a workaround, indicating the issue lies in the SQL-to-Logical-Plan resolution phase.
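To make the semantics of that workaround concrete, here is a pure-Python sketch (no Spark) of the upsert logic that the Join + Overwrite approach reimplements: matched rows get their update columns overwritten from the source, unmatched source rows are inserted. The function and row layout are illustrative:

```python
# Pure-Python illustration of MERGE INTO upsert semantics, as reproduced
# by the Join + Overwrite DataFrame workaround. Not Spark code.
def upsert(target_rows, source_rows, keys, update_cols):
    """Return the merged row set, keyed by the join-key tuple."""
    merged = {tuple(r[k] for k in keys): dict(r) for r in target_rows}
    for src in source_rows:
        key = tuple(src[k] for k in keys)
        if key in merged:      # WHEN MATCHED THEN UPDATE SET ...
            for c in update_cols:
                merged[key][c] = src[c]
        else:                  # WHEN NOT MATCHED THEN INSERT *
            merged[key] = dict(src)
    return list(merged.values())

target = [{"id": 1, "v": "old"}, {"id": 2, "v": "keep"}]
source = [{"id": 1, "v": "new"}, {"id": 3, "v": "ins"}]
result = upsert(target, source, keys=["id"], update_cols=["v"])
```

In Spark terms this corresponds to a full outer join of target and source on the keys, a column-wise coalesce, and an `overwritePartitions()` style write back to the Iceberg table, which avoids the SQL MERGE resolution path entirely.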

The failure occurs during the resolution phase of the Catalyst analyzer. Specifically, when the MergeIntoTable node is being resolved:

The AttributeMap for the target relation becomes excessively large (450+ entries).

The Rule executor for ResolveReferences appears to hit a recursion or iteration limit when trying to bind the target alias to the specific AttributeReference in the RelationV2 scan.

Engine Discrepancy: The fact that Athena (Trino-based) resolves this plan successfully implies that Spark's rule-based resolution of MergeIntoTable is not scaling with schema width.

The same MERGE INTO statement also succeeded in the Athena/Trino engine when the update list was limited to a smaller number of columns.

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time
