-
Notifications
You must be signed in to change notification settings - Fork 555
perf(s3stream): async heavy log cache operation #3016
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -24,6 +24,7 @@ | |||||||||||||||||||||||||||||||||||
| import com.automq.stream.s3.model.StreamRecordBatch; | ||||||||||||||||||||||||||||||||||||
| import com.automq.stream.s3.trace.context.TraceContext; | ||||||||||||||||||||||||||||||||||||
| import com.automq.stream.s3.wal.RecordOffset; | ||||||||||||||||||||||||||||||||||||
| import com.automq.stream.utils.Threads; | ||||||||||||||||||||||||||||||||||||
| import com.automq.stream.utils.biniarysearch.StreamRecordBatchList; | ||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||
| import org.slf4j.Logger; | ||||||||||||||||||||||||||||||||||||
|
|
@@ -37,7 +38,9 @@ | |||||||||||||||||||||||||||||||||||
| import java.util.List; | ||||||||||||||||||||||||||||||||||||
| import java.util.Map; | ||||||||||||||||||||||||||||||||||||
| import java.util.Optional; | ||||||||||||||||||||||||||||||||||||
| import java.util.concurrent.CompletableFuture; | ||||||||||||||||||||||||||||||||||||
| import java.util.concurrent.ConcurrentHashMap; | ||||||||||||||||||||||||||||||||||||
| import java.util.concurrent.ExecutorService; | ||||||||||||||||||||||||||||||||||||
| import java.util.concurrent.TimeUnit; | ||||||||||||||||||||||||||||||||||||
| import java.util.concurrent.atomic.AtomicInteger; | ||||||||||||||||||||||||||||||||||||
| import java.util.concurrent.atomic.AtomicLong; | ||||||||||||||||||||||||||||||||||||
|
|
@@ -59,6 +62,8 @@ public class LogCache { | |||||||||||||||||||||||||||||||||||
| private static final Consumer<LogCacheBlock> DEFAULT_BLOCK_FREE_LISTENER = block -> { | ||||||||||||||||||||||||||||||||||||
| }; | ||||||||||||||||||||||||||||||||||||
| private static final int MAX_BLOCKS_COUNT = 64; | ||||||||||||||||||||||||||||||||||||
| private static final ExecutorService LOG_CACHE_ASYNC_EXECUTOR = Threads.newFixedFastThreadLocalThreadPoolWithMonitor( | ||||||||||||||||||||||||||||||||||||
| 1, "LOG_CACHE_ASYNC", true, LOGGER); | ||||||||||||||||||||||||||||||||||||
| static final int MERGE_BLOCK_THRESHOLD = 8; | ||||||||||||||||||||||||||||||||||||
| final List<LogCacheBlock> blocks = new ArrayList<>(); | ||||||||||||||||||||||||||||||||||||
| final AtomicInteger blockCount = new AtomicInteger(1); | ||||||||||||||||||||||||||||||||||||
|
|
@@ -259,10 +264,19 @@ Optional<LogCacheBlock> archiveCurrentBlockIfContains0(long streamId) { | |||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||
| /** | |
| * Marks the given cache block as free and asynchronously attempts to merge adjacent free blocks. | |
| * <p> | |
| * This method returns immediately after marking the block as free. The merge operation is performed | |
| * asynchronously on a background executor. | |
| * <p> | |
| * Callers should await the returned {@link CompletableFuture} if they require the merge operation to be | |
| * completed before proceeding (e.g., in tests or when subsequent operations depend on merge completion). | |
| * <p> | |
| * If an exception occurs during the asynchronous merge, the returned future will be completed exceptionally. | |
| * | |
| * @param block the block to mark as free | |
| * @return a {@link CompletableFuture} that completes when the merge operation finishes, or completes | |
| * exceptionally if the merge fails | |
| */ |
Copilot
AI
Nov 17, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Setting block.free = true is not synchronized with the lock that protects the blocks list. This could lead to a race condition where:
- Thread A marks block as free (line 268)
- Thread B reads the block list and sees block.free = false (before the write is visible)
- Thread A's tryMerge checks block.free = true
While the locks in tryMerge() and tryRealFree() protect the list modifications, the initial block.free = true assignment should be either:
- Marked as
volatilein the LogCacheBlock class - Performed inside a lock
- Use an AtomicBoolean
This ensures visibility across threads.
Copilot
AI
Nov 17, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The markFree method uses a single-threaded executor (LOG_CACHE_ASYNC_EXECUTOR) for both tryMerge() and the cleanup operations in tryRealFree(). Since tryMerge() can be a long-running operation (it has an infinite loop), this could block cleanup operations from tryRealFree() that are queued after it. Consider:
- Using separate executors for merge and cleanup operations
- Adding timeout/cancellation mechanisms to prevent one operation from blocking others indefinitely
- Documenting the expected execution characteristics and ordering guarantees
Copilot
AI
Nov 17, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The signature of markFree has changed from void to CompletableFuture<Void>, which is a breaking API change. Existing callers that don't await this future may experience race conditions where they assume the merge operation has completed when it hasn't. Consider:
- Adding documentation about the async behavior and when callers need to wait for completion
- Reviewing all call sites to ensure they properly handle the returned future (note:
S3Storage.freeCache()andSnapshotReadCache.put()currently ignore it)
Copilot
AI
Nov 17, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lambda removed.forEach(b -> { blockFreeListener.accept(b); b.free(); }) is executed asynchronously without error handling. If blockFreeListener.accept(b) or b.free() throws an exception, it will be silently swallowed by the executor. Consider wrapping the operations in a try-catch block to log any failures:
LOG_CACHE_ASYNC_EXECUTOR.execute(() -> removed.forEach(b -> {
try {
blockFreeListener.accept(b);
b.free();
} catch (Throwable t) {
LOGGER.error("Failed to free block", t);
}
}));| blockFreeListener.accept(b); | |
| b.free(); | |
| try { | |
| blockFreeListener.accept(b); | |
| b.free(); | |
| } catch (Throwable t) { | |
| LOGGER.error("Failed to free block", t); | |
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
LOG_CACHE_ASYNC_EXECUTORis declared as a static field, which means it's shared across allLogCacheinstances in the JVM. This could lead to:LogCacheinstances competing for the single threadLogCachedelays operations in anotherConsider making this executor an instance field, or if global sharing is intentional, add documentation explaining the rationale and implications.