Supporting concurrent schema update #8254

@officialtushark

Query engine

No response

Question

Does Iceberg support concurrent schema update for the same table?

I have two transactions, t1 and t2.
Transaction t1 adds the following new columns: {col1:int,col2:int}
Transaction t2 adds the following new column: {col1:int}

In my test case, t2 fails if t1 commits first. This is the stack trace for the failure:

org.apache.iceberg.exceptions.CommitFailedException: Table metadata refresh is required
	at org.apache.iceberg.BaseTransaction$TransactionTableOperations.commit(BaseTransaction.java:545)
	at org.apache.iceberg.SchemaUpdate.commit(SchemaUpdate.java:442)
	at org.apache.iceberg.BaseTransaction.applyUpdates(BaseTransaction.java:497)
	at org.apache.iceberg.BaseTransaction.lambda$commitSimpleTransaction$5(BaseTransaction.java:420)
	at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:413)
	at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:219)
	at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:203)
	at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:196)
	at org.apache.iceberg.BaseTransaction.commitSimpleTransaction(BaseTransaction.java:418)
	at org.apache.iceberg.BaseTransaction.commitTransaction(BaseTransaction.java:302)

Looking under the hood, I found that at commit time Iceberg checks whether the table's current metadata is the same as the base metadata the transaction started from. If it is not, Iceberg throws this exception, which fails the transaction.

I could not find a way to update the base metadata of the pending updates in a transaction.

Is there a way to make these concurrent updates work?
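
To make the question concrete, here is a minimal, self-contained sketch of the base-versus-current check described above, together with one possible way around it: on failure, re-read the current metadata and re-apply only the columns that are still missing. This is a toy model and not Iceberg's code. `ToyTable`, `Metadata`, `CommitFailed`, `addColumns`, and `commitWithRetry` are hypothetical names invented for illustration, and whether the same pattern is the recommended approach with the real Iceberg API is exactly what I am asking.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

final class ConcurrentSchemaUpdateSketch {

    /** Immutable stand-in for table metadata: a version plus column name -> type. */
    static final class Metadata {
        final int version;
        final Map<String, String> columns;

        Metadata(int version, Map<String, String> columns) {
            this.version = version;
            this.columns = Collections.unmodifiableMap(new LinkedHashMap<>(columns));
        }
    }

    /** Hypothetical stand-in for the CommitFailedException in the stack trace. */
    static final class CommitFailed extends RuntimeException {
        CommitFailed(String message) {
            super(message);
        }
    }

    /** Toy table whose commit succeeds only if the caller's base is still current. */
    static final class ToyTable {
        private final AtomicReference<Metadata> current =
                new AtomicReference<>(new Metadata(0, Map.of()));

        Metadata current() {
            return current.get();
        }

        void commit(Metadata base, Metadata updated) {
            // Reference comparison: any commit in between replaces the object.
            if (!current.compareAndSet(base, updated)) {
                throw new CommitFailed("Table metadata refresh is required");
            }
        }
    }

    /** Adds only the columns that are missing; rejects a same-name column with a different type. */
    static Metadata addColumns(Metadata base, Map<String, String> newColumns) {
        Map<String, String> merged = new LinkedHashMap<>(base.columns);
        for (Map.Entry<String, String> e : newColumns.entrySet()) {
            String existing = merged.putIfAbsent(e.getKey(), e.getValue());
            if (existing != null && !existing.equals(e.getValue())) {
                throw new IllegalStateException("Conflicting type for column " + e.getKey());
            }
        }
        return new Metadata(base.version + 1, merged);
    }

    /** Rebuilds the update against fresh metadata on every attempt. */
    static Metadata commitWithRetry(ToyTable table, Map<String, String> newColumns, int maxAttempts) {
        for (int attempt = 1; ; attempt++) {
            Metadata base = table.current();           // "refresh" the base
            Metadata updated = addColumns(base, newColumns);
            if (updated.columns.equals(base.columns)) {
                return base;                           // nothing left to add
            }
            try {
                table.commit(base, updated);
                return updated;
            } catch (CommitFailed e) {
                if (attempt >= maxAttempts) {
                    throw e;
                }
            }
        }
    }

    public static void main(String[] args) {
        ToyTable table = new ToyTable();
        Metadata staleBase = table.current();          // t2 reads its base here

        // t1 commits first.
        commitWithRetry(table, Map.of("col1", "int", "col2", "int"), 3);

        // t2 commits against the stale base and fails, as in my test case.
        try {
            table.commit(staleBase, addColumns(staleBase, Map.of("col1", "int")));
        } catch (CommitFailed e) {
            System.out.println("t2 failed on stale base: " + e.getMessage());
        }

        // t2 retried against fresh metadata: col1 already exists, so nothing conflicts.
        Metadata result = commitWithRetry(table, Map.of("col1", "int"), 3);
        System.out.println("columns after retry: " + result.columns.keySet());
    }
}
```

In this model the retry works because the pending change is rebuilt from the refreshed metadata instead of being replayed against the old base. What I could not find is the equivalent step for the pending updates inside an Iceberg transaction.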
