Core: Make sequence number conflicts retryable in concurrent commits #15126
base: main
When multiple processes concurrently commit to different branches of the
same table through the REST catalog, sequence number validation failures
in TableMetadata.addSnapshot() surfaced as a non-retryable ValidationException
instead of a retryable CommitFailedException.
This fix catches the sequence number validation error in CatalogHandlers.commit()
and wraps it in ValidationFailureException(CommitFailedException) to:
- Skip the server-side retry (which won't help, since the sequence number is set in the request)
- Return CommitFailedException to the client so it can retry with refreshed metadata
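Here is a minimal sketch of the catch-and-wrap flow described above. It uses simplified local stand-ins for `ValidationException`, `CommitFailedException` and `ValidationFailureException` (the real Iceberg classes differ), a hypothetical `CommitHandlerSketch` class, and an assumed message prefix for the sequence-number check. The server-side loop retries only on `CommitFailedException`. As a result, the wrapper escapes on the first attempt and is unwrapped for the client.

```java
// Hypothetical stand-ins for the Iceberg exception types; the real classes
// live in the Iceberg codebase and may differ in detail.
class ValidationException extends RuntimeException {
  ValidationException(String message) { super(message); }
}

class CommitFailedException extends RuntimeException {
  CommitFailedException(String message, Throwable cause) { super(message, cause); }
}

// Marks a failure that the server-side retry loop must not retry.
class ValidationFailureException extends RuntimeException {
  private final CommitFailedException wrapped;

  ValidationFailureException(CommitFailedException wrapped) {
    super(wrapped);
    this.wrapped = wrapped;
  }

  CommitFailedException wrapped() { return wrapped; }
}

final class CommitHandlerSketch {
  // Assumed message prefix; the real text comes from TableMetadata.addSnapshot().
  static final String SEQ_NUMBER_PREFIX = "Cannot add snapshot with sequence number";

  // Applies the updates, turning sequence-number conflicts into a wrapper that
  // carries a client-retryable CommitFailedException.
  static void applyUpdates(Runnable applyAll) {
    try {
      applyAll.run();
    } catch (ValidationException e) {
      String msg = e.getMessage();
      if (msg != null && msg.startsWith(SEQ_NUMBER_PREFIX)) {
        throw new ValidationFailureException(
            new CommitFailedException("Commit conflict, refresh and retry: " + msg, e));
      }
      throw e; // any other validation failure stays non-retryable
    }
  }

  // Server-side retry loop: only CommitFailedException is retried, so the
  // wrapper escapes immediately and the client sees CommitFailedException.
  // Returns the number of attempts used on success.
  static int commitWithRetries(Runnable applyAll, int maxAttempts) {
    int attempts = 0;
    while (true) {
      attempts++;
      try {
        applyUpdates(applyAll);
        return attempts;
      } catch (ValidationFailureException e) {
        throw e.wrapped(); // skip server retry
      } catch (CommitFailedException e) {
        if (attempts >= maxAttempts) {
          throw e;
        }
      }
    }
  }
}
```

With this shape, a `Runnable` that throws the sequence-number `ValidationException` runs once and surfaces as `CommitFailedException`. Other validation failures still propagate as `ValidationException`.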
core/src/test/java/org/apache/iceberg/rest/TestRestCatalogConcurrentWrites.java (two outdated review threads, resolved)
```java
    request.updates().forEach(update -> update.applyTo(metadataBuilder));
  } catch (ValidationException e) {
    // Sequence number conflicts from concurrent commits are retryable by the client,
    // but server-side retry won't help since the sequence number is in the request.
```
This is an interesting point! Since the snapshot object is created on the client and sent to the server, the sequence number is locked in and the server can't do much, so failing fast seems reasonable.
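The following simplified, hypothetical model illustrates this point. `SequenceNumberModel` is invented for illustration. It enforces the rule applied in `TableMetadata.addSnapshot()`: a snapshot's sequence number must be greater than the table's last sequence number. The real check has extra conditions (format version, parent snapshot) that are omitted here. Re-sending the same request can never succeed, while a client that refreshes and picks the next sequence number can.

```java
// Hypothetical, simplified model of the sequence-number check; not Iceberg code.
final class SequenceNumberModel {
  private long lastSequenceNumber;

  SequenceNumberModel(long lastSequenceNumber) {
    this.lastSequenceNumber = lastSequenceNumber;
  }

  synchronized long lastSequenceNumber() {
    return lastSequenceNumber;
  }

  // "Server" applies a snapshot whose sequence number was chosen by the client.
  synchronized boolean tryAddSnapshot(long snapshotSequenceNumber) {
    if (snapshotSequenceNumber <= lastSequenceNumber) {
      return false; // the request itself is stale
    }
    lastSequenceNumber = snapshotSequenceNumber;
    return true;
  }

  // Server-side retry re-sends the SAME sequence number, so once it is stale
  // every attempt fails.
  boolean serverRetry(long snapshotSequenceNumber, int attempts) {
    for (int i = 0; i < attempts; i++) {
      if (tryAddSnapshot(snapshotSequenceNumber)) {
        return true;
      }
    }
    return false;
  }

  // Client-side retry refreshes metadata and assigns a new sequence number on
  // each attempt.
  boolean clientRetry(int attempts) {
    for (int i = 0; i < attempts; i++) {
      long refreshed = lastSequenceNumber();
      if (tryAddSnapshot(refreshed + 1)) {
        return true;
      }
    }
    return false;
  }
}
```

For example, suppose two clients both read a last sequence number of 4 and assign 5. After the first one commits, `serverRetry(5, 3)` returns false for the second, while `clientRetry(3)` commits sequence number 6.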
I wonder if we could refactor or introduce some other mechanism rather than relying on the exception message text.
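One way to avoid matching on message text would be a dedicated subtype of `ValidationException` for this case. The sketch below assumes a hypothetical `StaleSequenceNumberException` and a helper class `TypedConflictCheck` (neither is an Iceberg class), a local stand-in for `ValidationException`, and an assumed message format. Because the subtype still extends `ValidationException`, code that already catches the broader type is unaffected. The trade-off is that the throw site in `TableMetadata` would have to change.

```java
// Hypothetical stand-in for Iceberg's ValidationException.
class ValidationException extends RuntimeException {
  ValidationException(String message) { super(message); }
}

// Hypothetical subtype thrown only for sequence-number conflicts, so callers
// can match on the type instead of parsing the message.
class StaleSequenceNumberException extends ValidationException {
  final long snapshotSequenceNumber;
  final long lastSequenceNumber;

  StaleSequenceNumberException(long snapshotSequenceNumber, long lastSequenceNumber) {
    super(String.format(
        "Cannot add snapshot with sequence number %d older than last sequence number %d",
        snapshotSequenceNumber, lastSequenceNumber));
    this.snapshotSequenceNumber = snapshotSequenceNumber;
    this.lastSequenceNumber = lastSequenceNumber;
  }
}

final class TypedConflictCheck {
  // Type check replaces the message-prefix check.
  static boolean isRetryableConflict(RuntimeException e) {
    return e instanceof StaleSequenceNumberException;
  }

  // Simplified throw site: rejects a snapshot whose sequence number is not
  // greater than the table's last sequence number.
  static void addSnapshot(long snapshotSequenceNumber, long lastSequenceNumber) {
    if (snapshotSequenceNumber <= lastSequenceNumber) {
      throw new StaleSequenceNumberException(snapshotSequenceNumber, lastSequenceNumber);
    }
  }
}
```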
Thanks for the review, @singhpk234!
Checking the exception message is not an uncommon pattern in the Iceberg repo; it helps target particular scenarios that are thrown as a more generic exception type. Refactoring the exception itself would require a TableMetadata change, which increases risk.
I'm trying to keep the change minimal to fix this issue, per my understanding of the comment on the issue; my original idea was to add an UpdateRequirement to the spec for this assertion.
Any thoughts?
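The UpdateRequirement alternative could look roughly like the following sketch. `AssertLastSequenceNumber` is hypothetical and not part of the REST spec. `Requirement`, `TableMetadataView` and `CommitFailedException` are simplified local stand-ins for Iceberg's `UpdateRequirement`, `TableMetadata` and exception types. The client would assert the last sequence number it built its snapshot from. Existing requirements such as `AssertRefSnapshotID` already report failure as `CommitFailedException`, so the client could retry without any message parsing.

```java
// Hypothetical minimal view of table metadata.
final class TableMetadataView {
  private final long lastSequenceNumber;

  TableMetadataView(long lastSequenceNumber) {
    this.lastSequenceNumber = lastSequenceNumber;
  }

  long lastSequenceNumber() {
    return lastSequenceNumber;
  }
}

// Stand-in for the client-retryable exception.
class CommitFailedException extends RuntimeException {
  CommitFailedException(String message) { super(message); }
}

// Stand-in for UpdateRequirement: validated against current metadata.
interface Requirement {
  void validate(TableMetadataView base);
}

// Hypothetical requirement: the table's last sequence number must still be
// the one the client based its snapshot on.
final class AssertLastSequenceNumber implements Requirement {
  private final long expected;

  AssertLastSequenceNumber(long expected) {
    this.expected = expected;
  }

  @Override
  public void validate(TableMetadataView base) {
    if (base.lastSequenceNumber() != expected) {
      throw new CommitFailedException(String.format(
          "Requirement failed: last sequence number changed: expected %d, found %d",
          expected, base.lastSequenceNumber()));
    }
  }
}
```

Sequence numbers only increase. Assuming the client assigns the base value plus one, asserting equality with the base value fails exactly when another commit has advanced the table, which is the case where the snapshot's sequence number would otherwise be rejected.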
Issue #15001