fix(s3): fix concurrent write exception for oci (#3310)#3313
Conversation
There was a problem hiding this comment.
Pull request overview
This PR addresses OCI-specific behavior where conditional S3 writes can fail with a concurrent update error even after another concurrent attempt has already succeeded, by suppressing failure recording for “losing” attempts once a write is completed.
Changes:
- Suppress failed
putObjectStats(..., false)recording for fast-retry attempts if another attempt has already completed successfully. - Short-circuit the primary write failure handler when another attempt has already completed, releasing the buffer and skipping failure stats.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // OCI S3 and AWS S3 have different write mechanisms for conditional operations. | ||
| // AWS silently overwrites objects, while OCI returns the following error: | ||
| // FAILED_PRECONDITION: A concurrent update to object xx was modified concurrently. | ||
| // If one write has succeeded, do not log failed records. | ||
| if (completedFlag.get()) { | ||
| data.release(); | ||
| return null; | ||
| } | ||
|
|
There was a problem hiding this comment.
The new early-return in the primary write failure handler only checks completedFlag once at the top. There’s still a race where completedFlag can flip to true (fast-retry succeeds) after this check but before the code enqueues a retry, causing unnecessary extra writes/retries even though finalCf will (or already did) complete successfully. Consider re-checking completedFlag/finalCf.isDone() immediately before scheduling any retry (queued/delayed) and, if already completed, skip retry and release the remaining data reference.
| // OCI S3 and AWS S3 have different write mechanisms for conditional operations. | ||
| // AWS silently overwrites objects, while OCI returns the following error: | ||
| // FAILED_PRECONDITION: A concurrent update to object xx was modified concurrently. | ||
| // If one write has succeeded, do not log failed records. | ||
| if (!completedFlag.get()) { | ||
| S3OperationStats.getInstance().putObjectStats(objectSize, false).record(retryTimerUtil.elapsedAs(TimeUnit.NANOSECONDS)); | ||
| } |
There was a problem hiding this comment.
The added comment says “do not log failed records”, but the guarded call is recording failure metrics (putObjectStats(..., false)), not logging. Please adjust the wording (and ideally avoid duplicating the same multi-line comment in both handlers) so future readers don’t confuse metrics recording with log emission.
No description provided.