BigQuery: Add retry detection for idempotent commit handling #15308

Open
joyhaldar wants to merge 1 commit into apache:main from joyhaldar:bqms-table-creation-idempotency

@joyhaldar (Contributor) commented Feb 12, 2026

Description

Adds retry detection to BigQuery Metastore table commits to handle the case where a commit succeeded on the server but the client still received an error, for example a network timeout after the request reached the server.

When a create or update is retried, an earlier attempt may already have succeeded. Without checking the commit status, we might incorrectly report a failure or attempt a duplicate commit. This addresses the existing TODO in the codebase.


Changes

  • Added RetryDetector utility that wraps callables and counts invocations
  • Added overloaded create(Table, RetryDetector) and update(TableReference, Table, RetryDetector) methods to BigQueryMetastoreClient
  • Updated BigQueryTableOperations.doCommit() to use checkCommitStatus when needed
  • Extracted cleanupMetadata() helper method
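
The invocation-counting idea behind the first bullet can be sketched roughly as follows. This is a minimal illustration, not the class from this PR: the method names `wrap`, `attempts`, and `retried` are assumptions made for the example, and the real RetryDetector may differ in shape and naming.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a retry detector; names are illustrative only.
final class RetryDetector {
  // Counts how many times any wrapped callable has been invoked.
  private final AtomicInteger attempts = new AtomicInteger();

  // Wraps a callable so that every invocation, including retries, is counted.
  <T> Callable<T> wrap(Callable<T> delegate) {
    return () -> {
      attempts.incrementAndGet();
      return delegate.call();
    };
  }

  // Total number of invocations seen so far.
  int attempts() {
    return attempts.get();
  }

  // True once the wrapped callable has been invoked more than once.
  boolean retried() {
    return attempts.get() > 1;
  }
}
```

A retry helper that does not expose its retry count can be handed the wrapped callable instead of the original, and the caller can then ask the detector afterwards whether more than one attempt was made.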

Notes for Reviewers

  1. I wanted to implement idempotency for BQMS table create/update, so I looked at existing patterns and noticed that GlueCatalog has a RetryDetector mechanism. That one, however, is based on the AWS MetricPublisher.
  2. BigQueryRetryHelper (used in BQMS) does not return the number of retries, so I wrote a custom RetryDetector that wraps the create and update callables to count invocations.
  3. Since the RetryDetector object has to be passed to the create and update calls, I added overloaded methods in BigQueryMetastoreClient.java. The existing versions without a RetryDetector delegate to the new overloads.
  4. Commit status is checked only for RuntimeIOException, because convertExceptionIfUnsuccessful() already converts HTTP errors to specific Iceberg exceptions. RuntimeIOException wraps IOException, which signals a network failure where we don't know whether the request reached the server. Other exceptions, such as CommitFailedException or AlreadyExistsException, are definitive server responses, so there is no need to check the commit status for those.
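
The exception classification described in note 4 can be sketched in isolation as below. Everything here is a stand-in written for illustration: `AmbiguousIoException` plays the role of RuntimeIOException, the local `CommitStatus` enum and the `statusCheck` supplier stand in for the commit status lookup, and the actual doCommit() logic in this PR may combine these pieces differently (for example, together with the retry count).

```java
import java.util.function.Supplier;

// Hypothetical sketch: only ambiguous I/O failures trigger a status check.
final class CommitStatusSketch {
  enum CommitStatus { SUCCESS, FAILURE, UNKNOWN }

  // Stand-in for an exception that wraps a network-level IOException.
  static final class AmbiguousIoException extends RuntimeException {
    AmbiguousIoException(String message) {
      super(message);
    }
  }

  // Runs the commit attempt. If it fails with an ambiguous I/O error, the
  // outcome is resolved by asking statusCheck. Any other runtime exception is
  // treated as a definitive answer and propagates to the caller unchanged.
  static CommitStatus commit(Runnable attempt, Supplier<CommitStatus> statusCheck) {
    try {
      attempt.run();
      return CommitStatus.SUCCESS;
    } catch (AmbiguousIoException e) {
      // The request may or may not have reached the server.
      return statusCheck.get();
    }
  }
}
```

The point of the split is that a definitive server response never pays for an extra metadata lookup, while a network failure is never reported as a failed commit without first checking whether the commit actually landed.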

@github-actions github-actions bot added the GCP label Feb 12, 2026
@joyhaldar joyhaldar force-pushed the bqms-table-creation-idempotency branch from 8c5f840 to decc695 Compare February 12, 2026 17:16
@joyhaldar joyhaldar force-pushed the bqms-table-creation-idempotency branch from decc695 to 9829ac3 Compare February 12, 2026 17:34
@joyhaldar joyhaldar marked this pull request as ready for review February 13, 2026 02:58