fix(core): fix flush blocking by moving IO out of synchronized and using async upload #3040
Background
Hi community, we hit a problem in production. When the system restarted automatically after an anomaly, shutdown blocked and the restart never completed. The jstack logs showed the blocked thread was stuck in com.automq.opentelemetry.exporter.s3.S3MetricsExporter#flush.
jstack:
Root Cause
flush() calls objectStorage.write(...).get() while holding the uploadBuffer lock, so the blocking S3 upload runs inside the critical section. Any thread that needs the same lock to append metrics (and the shutdown path itself) stalls until the upload returns; if the upload never completes, shutdown hangs.
Solution
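For context, the pre-fix flush path looked roughly like this (illustrative pseudocode, not the exact source):

```
// Anti-pattern: blocking network I/O inside the buffer lock.
// Any thread recording metrics needs this lock too, so it stalls
// until the S3 upload returns -- or forever, if it never does.
synchronized (uploadBuffer) {
    compressed = Utils.compress(uploadBuffer)
    objectStorage.write(path, compressed).get()   // blocks while holding the lock
    uploadBuffer.clear()
}
```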
Move the time-consuming operations out of the critical section:
- Inside the lock, only:
  - `readable = uploadBuffer.readableBytes()`
  - `slice = uploadBuffer.readRetainedSlice(readable)`
  - `uploadBuffer.clear()`
- Outside the lock:
  - `compressed = Utils.compress(slice)`
  - `objectStorage.write(...).whenComplete(...)` (completes asynchronously)
- In the callback:
  - On success, update `lastUploadTimestamp` and `nextUploadInterval`, then call `result.succeed()`
  - On failure, log the error and call `result.fail()`
  - In both cases, release `finalCompressed` and the slice to avoid leaking buffer memory
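The restructured flush can be sketched as below. This is a minimal, self-contained simplification: `FlushSketch`, the `StringBuilder` buffer (standing in for a Netty `ByteBuf`), and the `compress`/`upload` placeholders are all hypothetical names, not the actual AutoMQ classes. The point it demonstrates is that the lock now guards only the cheap slice-and-clear step, while compression and the upload run outside it and complete asynchronously.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class FlushSketch {
    private final Object lock = new Object();
    // Stand-in for the Netty ByteBuf uploadBuffer in the real exporter.
    private final StringBuilder uploadBuffer = new StringBuilder();
    private volatile long lastUploadTimestamp;

    public void append(String metrics) {
        synchronized (lock) {          // writers only contend for this cheap section
            uploadBuffer.append(metrics);
        }
    }

    public CompletableFuture<Void> flush() {
        final String slice;
        synchronized (lock) {          // critical section: slice + clear only
            slice = uploadBuffer.toString();
            uploadBuffer.setLength(0);
        }
        byte[] compressed = compress(slice);            // outside the lock
        return upload(compressed).whenComplete((v, e) -> {
            if (e == null) {
                lastUploadTimestamp = System.currentTimeMillis();
            } else {
                System.err.println("upload failed: " + e);
            }
            // With ByteBuf-based buffers, release slice/compressed here
            // regardless of outcome to avoid a memory leak.
        });
    }

    public long lastUploadTimestamp() { return lastUploadTimestamp; }

    // Placeholder for Utils.compress.
    private byte[] compress(String s) { return s.getBytes(); }

    // Placeholder for objectStorage.write: completes on another thread.
    private CompletableFuture<Void> upload(byte[] data) {
        return CompletableFuture.runAsync(() -> { /* simulated S3 PUT */ });
    }

    public static void main(String[] args) throws Exception {
        FlushSketch f = new FlushSketch();
        f.append("metric_a 1\n");
        // The caller may still wait on the future, but other threads calling
        // append() are no longer blocked while the upload is in flight.
        f.flush().get(5, TimeUnit.SECONDS);
        System.out.println("flushed");
    }
}
```

Because `whenComplete` returns a stage that finishes after the callback runs, anyone who does wait on the returned future still observes the timestamp update and buffer release in order.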