Optimize batch read and caching in compactor #3652

Merged
Maithem merged 1 commit into master from reduceReadSizeForCompactor on Jun 15, 2023

Conversation

Collaborator

@Lujie1996 Lujie1996 commented Jun 6, 2023

The read cache does not cache checkpoint entries, which defeats the
purpose of checkpoint batch reads in the stream layer. This PR adds a
CHECKPOINT_READ_BATCH_SIZE parameter to compactor_runner.py and sets
it to 1 for the compactor.

This PR also enables the read cache for the compactor so that exactly
one batch read of non-checkpoint entries can be cached. This improves
the read efficiency of the checkpointer.
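
To make the cost concrete, here is a toy model of the problem. It is not CorfuDB code: the class `BatchReadModel` and the method `entriesFetched` are hypothetical names, and the model assumes that every cache miss triggers a read-ahead of `batchSize` addresses whose results are reused only if they were cached.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model (hypothetical, not CorfuDB code) of read-ahead with and without caching. */
public class BatchReadModel {

    /**
     * Reads addresses 0..total-1 one at a time. Each miss fetches up to
     * batchSize consecutive addresses; fetched entries are retained only
     * when cacheEnabled is true. Returns the number of entries fetched.
     */
    public static int entriesFetched(int total, int batchSize, boolean cacheEnabled) {
        Map<Integer, String> cache = new HashMap<>();
        int fetched = 0;
        for (int addr = 0; addr < total; addr++) {
            if (cache.containsKey(addr)) {
                continue; // served from the cache, no fetch needed
            }
            int end = Math.min(addr + batchSize, total);
            for (int a = addr; a < end; a++) {
                fetched++;
                if (cacheEnabled) {
                    cache.put(a, "entry-" + a);
                }
            }
        }
        return fetched;
    }

    public static void main(String[] args) {
        System.out.println(entriesFetched(100, 5, false)); // 490: uncached read-ahead is wasted
        System.out.println(entriesFetched(100, 1, false)); // 100: batch size 1 avoids the waste
        System.out.println(entriesFetched(100, 5, true));  // 100: caching makes batching pay off
    }
}
```

Under these assumptions, a batch size of 5 without caching fetches almost five times as many entries as there are addresses, which is why the batch size is dropped to 1 wherever entries are not cached.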

Related issue: #3651

Checklist (Definition of Done):

  • There are no TODOs left in the code
  • Coding conventions (e.g. for logging, unit tests) have been followed
  • Change is covered by automated tests
  • Public API has Javadoc

codecov bot commented Jun 6, 2023

Codecov Report

Merging #3652 (4b8593d) into master (f2ec6e1) will increase coverage by 0.02%.
The diff coverage is 100.00%.

@@             Coverage Diff              @@
##             master    #3652      +/-   ##
============================================
+ Coverage     77.90%   77.92%   +0.02%     
- Complexity     5042     5043       +1     
============================================
  Files           477      477              
  Lines         23917    23927      +10     
  Branches       2126     2129       +3     
============================================
+ Hits          18632    18644      +12     
+ Misses         4233     4229       -4     
- Partials       1052     1054       +2     
Impacted Files Coverage Δ
...rc/main/java/org/corfudb/runtime/CorfuRuntime.java 75.32% <100.00%> (ø)
...ava/org/corfudb/runtime/view/AddressSpaceView.java 80.28% <100.00%> (+0.28%) ⬆️

... and 27 files with indirect coverage changes


Comment on lines +109 to +115
if (maxCacheEntries == 0) {
    log.warn("Since AddressSpaceView readCache size is 0, " +
            "overriding CorfuRuntime bulkReadSize and checkpointReadBatchSize to 1.");
    runtime.getParameters().setBulkReadSize(1);
    runtime.getParameters().setCheckpointReadBatchSize(1);
}

Contributor

@Maithem Maithem Jun 7, 2023

If the batch can be successfully read and later GC'd/dropped since it's not cached, then that implies that there is enough memory to keep that batch in memory.

Why not set the cache size to the batch size for the compactor? That could improve compaction performance.
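
A toy sketch of this suggestion, for illustration only: it is not CorfuDB code, the names `BoundedCacheModel` and `BoundedCache` are hypothetical, and it assumes a cache bounded by entry count that evicts its oldest entry first, with reads proceeding sequentially.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model (hypothetical, not CorfuDB code) of a read cache sized to the batch. */
public class BoundedCacheModel {

    /** Insertion-ordered map that evicts its oldest entry once it exceeds capacity. */
    static final class BoundedCache extends LinkedHashMap<Integer, String> {
        private final int capacity;

        BoundedCache(int capacity) {
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<Integer, String> eldest) {
            return size() > capacity;
        }
    }

    /**
     * Reads addresses 0..total-1 sequentially. Each miss fetches up to
     * batchSize consecutive addresses into the bounded cache.
     * Returns the number of entries fetched.
     */
    public static int entriesFetched(int total, int batchSize, int cacheCapacity) {
        BoundedCache cache = new BoundedCache(cacheCapacity);
        int fetched = 0;
        for (int addr = 0; addr < total; addr++) {
            if (cache.containsKey(addr)) {
                continue; // hit: the entry arrived with an earlier batch
            }
            int end = Math.min(addr + batchSize, total);
            for (int a = addr; a < end; a++) {
                fetched++;
                cache.put(a, "entry-" + a);
            }
        }
        return fetched;
    }

    public static void main(String[] args) {
        // Cache holds exactly one batch: every entry is fetched once.
        System.out.println(entriesFetched(100, 5, 5));
        // Cache smaller than the batch: read-ahead entries are evicted before use.
        System.out.println(entriesFetched(100, 5, 2));
    }
}
```

In this model, a cache that holds exactly one batch is enough to fetch each entry only once for a sequential reader, while a smaller cache evicts read-ahead entries before they are used.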

Collaborator Author

Done.

@Lujie1996 Lujie1996 force-pushed the reduceReadSizeForCompactor branch 2 times, most recently from 3fafa14 to db6d7eb Compare June 8, 2023 14:53
@Lujie1996 Lujie1996 changed the title from "Reduce read size to 1 for compactor" to "Optimize batch read and caching in compactor" Jun 8, 2023
Collaborator

@SravanthiAshokKumar SravanthiAshokKumar left a comment

General question: this inefficiency in reading checkpoint entries seems common to any client that goes through the processCheckpoint() method. (Of course, the compactor is affected the most, since it comes up with a new runtime every time.)
If that's the case, why have checkpointReadBatchSize default to 5 at all? Can we default it to 1 instead?

Collaborator

@SravanthiAshokKumar SravanthiAshokKumar left a comment

LGTM!
But the changes in CompactorBaseConfig require corresponding changes in CompactorConfigUnitTest.

@Lujie1996
Collaborator Author

> General question: this inefficiency in reading checkpoint entries seems common to any client that goes through the processCheckpoint() method. (Of course, the compactor is affected the most, since it comes up with a new runtime every time.) If that's the case, why have checkpointReadBatchSize default to 5 at all? Can we default it to 1 instead?

Good point. I changed the default checkpointReadBatchSize to 1 for corfu runtime.

@Maithem Maithem merged commit 4b69c44 into master Jun 15, 2023
9 checks passed