Commit-validation failures #16659

@martinskeem

Query engine

Kafka Connect

Question

Hi,

We use the Iceberg connector with Kafka Connect to sink data to Azure Databricks using their Iceberg catalog implementation.

After a period of successful operation, we see errors such as the one below, after which the sink connector makes no further progress.

Coordinator iceberg-sink-connector-epe-log-topics-v1-0 failed to commit for commit 51a29b27-d1b1-45f4-b6f3-61228b4c8481, will try again next cycle
org.apache.iceberg.exceptions.BadRequestException: Malformed request: Commit validation failed. Please contact Databricks support for assistance. [ErrorCode: 2010]
	at org.apache.iceberg.rest.ErrorHandlers$DefaultErrorHandler.accept(ErrorHandlers.java:341)
	at org.apache.iceberg.rest.ErrorHandlers$CommitErrorHandler.accept(ErrorHandlers.java:137)
	at org.apache.iceberg.rest.ErrorHandlers$CommitErrorHandler.accept(ErrorHandlers.java:119)
	at org.apache.iceberg.rest.HTTPClient.throwFailure(HTTPClient.java:242)
	at org.apache.iceberg.rest.HTTPClient.execute(HTTPClient.java:347)
	at org.apache.iceberg.rest.HTTPClient.execute(HTTPClient.java:299)
	at org.apache.iceberg.rest.BaseHTTPClient.post(BaseHTTPClient.java:112)
	at org.apache.iceberg.rest.RESTClient.post(RESTClient.java:150)
	at org.apache.iceberg.rest.RESTTableOperations.commit(RESTTableOperations.java:206)
	at org.apache.iceberg.SnapshotProducer.lambda$commit$2(SnapshotProducer.java:501)
	at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:413)
	at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:219)
	at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:203)
	at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:196)
	at org.apache.iceberg.SnapshotProducer.commit(SnapshotProducer.java:473)
	at org.apache.iceberg.connect.channel.Coordinator.commitToTable(Coordinator.java:286)
	at org.apache.iceberg.connect.channel.Coordinator.lambda$doCommit$1(Coordinator.java:173)
	at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:413)
	at org.apache.iceberg.util.Tasks$Builder$1.run(Tasks.java:315)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)

Our analysis suggests that Databricks returns HTTP 400 for commit-validation failures rather than HTTP 409. HTTP 409 appears to be how the Iceberg code recognises a commit-validation failure, after which it retries the commit (optimistic concurrency). Because the failure arrives as HTTP 400, it surfaces as a BadRequestException instead and is never retried.
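To make the difference concrete, here is a simplified model of the behaviour described above. It is a sketch under assumptions, not the Iceberg implementation: `CommitRetrySketch`, `RetryableCommitException`, `NonRetryableException`, and `commitWithRetry` are hypothetical names standing in for the error handlers and retry loop visible in the stack trace, and the status mapping (409 retryable, 400 not) reflects our reading of the client code.

```java
import java.util.function.IntSupplier;

public class CommitRetrySketch {
    // Hypothetical stand-in for a retryable commit conflict.
    static class RetryableCommitException extends RuntimeException {
        RetryableCommitException(String msg) { super(msg); }
    }

    // Hypothetical stand-in for a non-retryable client error.
    static class NonRetryableException extends RuntimeException {
        NonRetryableException(String msg) { super(msg); }
    }

    // Assumed mapping from the HTTP status of a commit request to an outcome.
    static void handleStatus(int status) {
        if (status >= 200 && status < 300) {
            return; // commit accepted
        }
        if (status == 409) {
            throw new RetryableCommitException("commit conflict (HTTP 409)");
        }
        throw new NonRetryableException("request rejected (HTTP " + status + ")");
    }

    // Retries only on RetryableCommitException; returns the number of attempts used.
    static int commitWithRetry(IntSupplier sendCommit, int maxAttempts) {
        for (int attempt = 1; ; attempt++) {
            try {
                handleStatus(sendCommit.getAsInt());
                return attempt;
            } catch (RetryableCommitException e) {
                if (attempt >= maxAttempts) {
                    throw e;
                }
                // A real client would refresh table metadata and back off here.
            }
        }
    }

    public static void main(String[] args) {
        int[] statuses = {409, 409, 200};
        int[] next = {0};
        // Two conflicts followed by a success: the commit is retried.
        System.out.println("attempts: " + commitWithRetry(() -> statuses[next[0]++], 4));

        try {
            // HTTP 400 is not treated as a conflict, so there is no retry.
            commitWithRetry(() -> 400, 4);
        } catch (NonRetryableException e) {
            System.out.println("gave up: " + e.getMessage());
        }
    }
}
```

In this model a server that reports a conflict as HTTP 400 falls straight through to the non-retryable branch, which matches the stalled connector we observe.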

I have a couple of questions:

  1. Does the Iceberg specification define HTTP 409 as the response for commit-validation failures? Is this formalised somewhere? If so, I would like to cite it in a bug report to Databricks (if not, I wonder whether it will be accepted as a bug).
  2. Would you consider a PR that works around the issue, such as "REST: treat HTTP 400 commit-validation failures as CommitFailedException" (#16644)? We are currently testing this workaround in our development environment.
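For illustration, the kind of check such a workaround might perform can be sketched as below. This is not the code from #16644: `CommitValidationWorkaroundSketch` and `isRetryableCommitFailure` are hypothetical names, and matching on the text "Commit validation failed" is an assumption based solely on the error message in the log above.

```java
public class CommitValidationWorkaroundSketch {
    // Marker text copied from the error message in the log above.
    static final String MARKER = "Commit validation failed";

    // Returns true when the response should be treated as a retryable commit failure:
    // either a regular HTTP 409, or an HTTP 400 whose message carries the marker text.
    static boolean isRetryableCommitFailure(int status, String message) {
        if (status == 409) {
            return true;
        }
        return status == 400 && message != null && message.contains(MARKER);
    }

    public static void main(String[] args) {
        String observed = "Malformed request: Commit validation failed. "
            + "Please contact Databricks support for assistance. [ErrorCode: 2010]";
        System.out.println(isRetryableCommitFailure(400, observed));
        System.out.println(isRetryableCommitFailure(400, "Malformed request: invalid schema"));
        System.out.println(isRetryableCommitFailure(409, null));
    }
}
```

Matching on message text is brittle, since the wording could change on the server side; a stable error code would be a better key if one is guaranteed.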

Br,
Martin
