Optimize batch read and caching in compactor #3652
Conversation
Codecov Report
@@ Coverage Diff @@
## master #3652 +/- ##
============================================
+ Coverage 77.90% 77.92% +0.02%
- Complexity 5042 5043 +1
============================================
Files 477 477
Lines 23917 23927 +10
Branches 2126 2129 +3
============================================
+ Hits 18632 18644 +12
+ Misses 4233 4229 -4
- Partials 1052 1054 +2
... and 27 files with indirect coverage changes
if (maxCacheEntries == 0) {
    log.warn("Since AddressSpaceView readCache size is 0, " +
            "overriding CorfuRuntime bulkReadSize and checkpointReadBatchSize to 1.");
    runtime.getParameters().setBulkReadSize(1);
    runtime.getParameters().setCheckpointReadBatchSize(1);
}
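The guard above can be exercised in isolation. Below is a minimal self-contained sketch of the same logic, with the runtime parameters modeled as a plain stand-in class (the real `CorfuRuntime` parameters object is not part of this diff):

```java
// Sketch of the cache-size guard from the diff above. Params is a
// hypothetical stand-in for CorfuRuntime's parameters object.
public class CacheGuardSketch {
    static class Params {
        int bulkReadSize = 5;            // assumed defaults for illustration
        int checkpointReadBatchSize = 5;
    }

    // If the read cache is disabled (size 0), batch reads larger than 1
    // fetch entries that can never be cached, so force both batch sizes to 1.
    static Params applyGuard(int maxCacheEntries, Params p) {
        if (maxCacheEntries == 0) {
            p.bulkReadSize = 1;
            p.checkpointReadBatchSize = 1;
        }
        return p;
    }

    public static void main(String[] args) {
        Params p = applyGuard(0, new Params());
        System.out.println(p.bulkReadSize + " " + p.checkpointReadBatchSize);
    }
}
```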
If the batch can be successfully read and later GC'd/dropped because it is not cached, that implies there is enough memory to keep the batch in memory anyway.
Why not set the cache size to the batch size for the compactor? That could improve compaction performance.
Done.
General question: this inefficiency when reading checkpoint entries seems common to any client that goes through the processCheckpoint() method. (Of course the compactor is affected the most, since it has to come up with a new runtime every time.)
If that's the case, why have checkpointReadBatchSize default to 5 at all? Can we default it to 1 instead?
LGTM!
But the changes in CompactorBaseConfig require corresponding changes in CompactorConfigUnitTest.
The read cache does not cache checkpoint entries, which invalidates the use of checkpoint batch reads in the stream layer. This PR adds the extra parameter CHECKPOINT_READ_BATCH_SIZE to compactor_runner.py and sets it to 1 for the compactor. It also enables the read cache for the compactor so that a single batch read of non-checkpoint entries can be cached, improving the read efficiency of the checkpointer.
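The compactor-side configuration the description implies can be sketched as follows. The `Params` class and `compactorParams` helper here are hypothetical stand-ins, not the real CorfuRuntime API; only the setter names mirror the diff:

```java
// Hypothetical sketch of the compactor configuration described above:
// checkpoint reads go one entry at a time (they are never cached), while
// the read cache is sized to hold exactly one bulk-read batch of
// regular (non-checkpoint) entries.
public class CompactorConfigSketch {
    static class Params {
        int bulkReadSize;
        int checkpointReadBatchSize;
        int maxCacheEntries;
    }

    static Params compactorParams(int bulkReadSize) {
        Params p = new Params();
        p.bulkReadSize = bulkReadSize;
        // Checkpoint entries bypass the read cache, so batching them wastes reads.
        p.checkpointReadBatchSize = 1;
        // Cache exactly one batch of non-checkpoint entries.
        p.maxCacheEntries = bulkReadSize;
        return p;
    }

    public static void main(String[] args) {
        Params p = compactorParams(10);
        System.out.println(p.checkpointReadBatchSize + " " + p.maxCacheEntries);
    }
}
```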
Good point. I changed the default checkpointReadBatchSize to 1 for the Corfu runtime.
Related issue: #3651
Checklist (Definition of Done):