[SPARK-32350][CORE] Add batch-write on LevelDB to improve performance of HybridStore #29149
baohe-zhang wants to merge 2 commits into apache:master
cc @HeartSaVioR @mridulm @tgravescs ^^
```java
try (WriteBatch batch = db().createWriteBatch()) {
  while (valueIter.hasNext()) {
    final Object value = valueIter.next();
```
Adding one value (L204-L219) looks the same as write(); let's extract it and deduplicate.
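The deduplication suggested above can be sketched as follows. This is a minimal, self-contained illustration, not the Spark code: `SketchStore`, `SketchBatch`, `updateBatch`, `writeOne`, `commit`, and the `commits` counter are hypothetical stand-ins for a LevelDB-style store, and it uses plain string keys instead of the store's real index keys. The point is that the per-value logic lives in one helper called by both the single-value path and the batched path, while the batched path commits only once.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a LevelDB-style write batch: it only collects puts.
class SketchBatch {
  final Map<String, byte[]> puts = new LinkedHashMap<>();

  void put(String key, byte[] value) {
    puts.put(key, value);
  }
}

// Hypothetical store: commit() applies a whole batch in one step,
// and `commits` counts how many times that happened.
class SketchStore {
  final Map<String, byte[]> data = new LinkedHashMap<>();
  int commits = 0;

  SketchBatch createWriteBatch() {
    return new SketchBatch();
  }

  private void commit(SketchBatch batch) {
    data.putAll(batch.puts);
    commits++;
  }

  // The logic for adding one value lives only here, so the single-value
  // and batched paths cannot drift apart.
  private void updateBatch(SketchBatch batch, String key, Object value) {
    batch.put(key, value.toString().getBytes(StandardCharsets.UTF_8));
  }

  // One value: one batch, one commit.
  void writeOne(String key, Object value) {
    SketchBatch batch = createWriteBatch();
    updateBatch(batch, key, value);
    commit(batch);
  }

  // Many values: still one batch and one commit.
  void writeAll(Map<String, ?> values) {
    SketchBatch batch = createWriteBatch();
    for (Map.Entry<String, ?> entry : values.entrySet()) {
      updateBatch(batch, entry.getKey(), entry.getValue());
    }
    commit(batch);
  }
}
```

Calling `writeOne` 100 times costs 100 commits, while `writeAll` with the same 100 entries costs one; per-commit overhead is what batching is meant to amortize.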
```scala
while (it.hasNext()) {
  levelDB.write(it.next())
}
val values = Lists.newArrayList(
```
This would be OK, given that all entries come from inMemoryStore and are already materialized in memory.
HeartSaVioR left a comment:
Looks OK in general, just a minor comment. I'd like to wait for others to review as well, as long as it doesn't take too long.
ok to test
add to whitelist
Test build #126130 has finished for PR 29149 at commit
Test build #126129 has finished for PR 29149 at commit
@mridulm @tgravescs
I likely won't have time for a review, so go ahead without mine.
OK, I'll go ahead with merging. To be sure, I'll trigger the tests once again.
retest this, please
Test build #126278 has finished for PR 29149 at commit
retest this, please
Test build #126284 has finished for PR 29149 at commit
retest this, please
Test build #126290 has finished for PR 29149 at commit
Thanks! Merging to master.
Thanks for the review!
Sorry for the delay in getting to this. This can be trivially done with an inner loop doing … The performance actually improves for larger list sizes (due to reduced memory pressure, particularly in SHS), while smaller lists see only minimal impact.
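The inner-loop code from the comment above was not preserved. As an illustration only, one common way to reduce memory pressure in a batch write is to process the list in fixed-size chunks, serializing and committing each chunk in its own batch so that only one chunk's serialized bytes are held at a time. This is a sketch under that assumption, not the code from the comment; `ChunkedWriter`, `chunkSize`, `persisted`, and `commits` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: write a large list in fixed-size chunks so that only
// one chunk of serialized bytes is built up before it is committed.
class ChunkedWriter<T> {
  private final int chunkSize;
  private final Function<T, byte[]> serializer;
  final List<byte[]> persisted = new ArrayList<>(); // stands in for the on-disk store
  int commits = 0;                                   // one commit per chunk

  ChunkedWriter(int chunkSize, Function<T, byte[]> serializer) {
    if (chunkSize <= 0) {
      throw new IllegalArgumentException("chunkSize must be positive");
    }
    this.chunkSize = chunkSize;
    this.serializer = serializer;
  }

  void writeAll(List<T> values) {
    for (int start = 0; start < values.size(); start += chunkSize) {
      int end = Math.min(start + chunkSize, values.size());
      // Serialize only this chunk; earlier chunks have already been committed.
      List<byte[]> batch = new ArrayList<>(end - start);
      for (T value : values.subList(start, end)) {
        batch.add(serializer.apply(value));
      }
      commit(batch);
    }
  }

  private void commit(List<byte[]> batch) {
    persisted.addAll(batch);
    commits++;
  }
}
```

With a chunk size of 128, writing 1000 values takes ceil(1000 / 128) = 8 commits instead of 1, trading a few extra commits for a bounded amount of serialized data in flight.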
This seems like an important improvement. Should I put up a follow-up PR to include this change?
That would be great, thanks @baohe-zhang!
What changes were proposed in this pull request?
The idea is to improve the performance of HybridStore by adding batch-write support to LevelDB. #28412 introduced HybridStore, which first writes data to InMemoryStore and then uses a background thread to dump the data to LevelDB once writing to InMemoryStore is complete. In the comments on #28412, @mridulm mentioned that batch writing could improve the performance of this dumping process, and he wrote the code for writeAll().
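The HybridStore flow described above can be sketched roughly as follows. This is a hypothetical, simplified model, not Spark's implementation: `HybridSketch`, `switchToDiskInBackground`, and `diskBatches` are illustrative names, a `ConcurrentHashMap` stands in for LevelDB, and it assumes reads move to the disk store only after the background dump finishes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the HybridStore idea, not Spark's implementation.
class HybridSketch {
  private final Map<String, String> inMemory = new ConcurrentHashMap<>();
  private final Map<String, String> disk = new ConcurrentHashMap<>(); // stands in for LevelDB
  private volatile boolean useDisk = false;
  private volatile int diskBatches = 0;

  // Writes always land in memory first.
  void write(String key, String value) {
    inMemory.put(key, value);
  }

  // Reads are served from memory until the background dump has finished.
  String read(String key) {
    return useDisk ? disk.get(key) : inMemory.get(key);
  }

  boolean isUsingDisk() {
    return useDisk;
  }

  int diskBatches() {
    return diskBatches;
  }

  // Call once all in-memory writes are complete; returns the dump thread.
  Thread switchToDiskInBackground() {
    Thread dumper = new Thread(() -> {
      // Everything is already in memory, so snapshot it and write it as one
      // batch instead of issuing one disk write per entry.
      Map<String, String> batch = new HashMap<>(inMemory);
      disk.putAll(batch);
      diskBatches++;
      // Flip reads to the disk store only after the dump is done.
      useDisk = true;
    }, "hybrid-store-dump");
    dumper.setDaemon(true);
    dumper.start();
    return dumper;
  }
}
```

The time between starting the dump and flipping `useDisk` corresponds to the switching time measured below; batching shortens it by replacing many small writes with one large one.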
Why are the changes needed?
I compared the HybridStore switching time between one-by-one writes and batch writes on an HDD. When the disk is idle, batch writing gives around a 25% improvement; when the disk is 100% busy, batch writing gives a 7x-10x improvement.
When the disk is at 0% utilization:
When the disk is at 100% utilization:
I also ran some write-related benchmark tests in LevelDBBenchmark.java and measured the total time to write 1024 objects. The tests were conducted with the disk at 0% utilization.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Manually tested.