Upsert fails after adding in a new column, target field not found #2467

@saul-data

Apache Iceberg version

0.10.0 (latest release)

Please describe the bug 🐞

Upsert worked fine until I added a new column to the table.

Add the new column to the table

# Add a column to an existing table

from pyiceberg.types import TimestamptzType

table = catalog.load_table(table_identifier)

(
    table.update_schema()
         .add_column("created_at", TimestamptzType(), doc="UTC created time", required=False) 
         .commit()
)

print("New schema:", table.schema())

Upsert the records

import pyarrow as pa

# Batch the records in 1000s
for rb in arrow_table_fixed.to_batches(max_chunksize=1000):
    batch_tbl = pa.Table.from_batches([rb])

    # Upsert each batch into the Iceberg table
    try:
        upd = iceberg_table.upsert(batch_tbl)
        print("Upserted data into the Iceberg table.")
        print(upd)
    except Exception as e:
        print(f"An error occurred during upsert: {e}")

The error message says the field names do not match; one of the two lists is missing the new column

An error occurred during upsert: Target schema's field names are not matching the table's field names: ['cik_str', 'ticker', 'title', 'created_at'], ['cik_str', 'ticker', 'title']
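To see exactly which names differ between the two lists, a small comparison can help. This is a minimal sketch that uses only the names printed in the error above; `diff_field_names` is a hypothetical helper for illustration and is not part of PyIceberg.

```python
def diff_field_names(left, right):
    """Return (only_in_left, only_in_right), preserving input order."""
    left_set, right_set = set(left), set(right)
    only_left = [name for name in left if name not in right_set]
    only_right = [name for name in right if name not in left_set]
    return only_left, only_right


# The two lists from the error message, in the order they were printed.
first = ["cik_str", "ticker", "title", "created_at"]
second = ["cik_str", "ticker", "title"]

only_first, only_second = diff_field_names(first, second)
print("Only in first list:", only_first)
print("Only in second list:", only_second)
```

In the real script, the same helper could be fed `batch_tbl.schema.names` and `iceberg_table.schema().as_arrow().names` just before the upsert call, to check which side lacks `created_at` at that moment.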

I checked the table schema in Iceberg and the column is definitely there

# Reload the table and get the PyArrow schema directly from the Iceberg schema
iceberg_table = catalog.load_table(table_identifier)
arrow_schema = iceberg_table.schema().as_arrow()
print(arrow_schema)

Output

cik_str: large_string not null
  -- field metadata --
  PARQUET:field_id: '1'
ticker: large_string not null
  -- field metadata --
  PARQUET:field_id: '2'
title: large_string
  -- field metadata --
  PARQUET:field_id: '3'
created_at: timestamp[us, tz=UTC]
  -- field metadata --
  doc: 'UTC created time'
  PARQUET:field_id: '5'

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time
