Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HUDI-2028] Implement RockDbBasedMap as an alternate to DiskBasedMap in ExternalSpillableMap #3117

Closed
wants to merge 51 commits into from

Conversation

rmahindra123
Copy link
Contributor

What is the purpose of the pull request

This pull request adds a new alternative based on RockDb for the Disk Based Map that is used within the ExternalSpillableMap. Our benchmark results shows that RockDb may improve performance significantly when the data set is large while available memory may be scarce. RockDb supports compression, efficient memory usage and native library, that may be more efficient in certain situations. By default, disk based map will be used, and a config change will be required to enable rocksDb.

In this PR, the rocksDB support is only enabled for HoodieMergeHandle, and a subsequent PR will extend it to all consumers of ExternalSpillableMap (tracked here HUDI-2044)

Brief change log

  • Adds a new alternative based on RockDb for the Disk Based Map that is used within the ExternalSpillableMap.
  • The support is currently added only for HoodieMergeHandle

Verify this pull request

This change added tests and can be verified as follows:

Added the unit test in TestSpillableRocksDBBasedMap
Updated the test for TestExternalSpillableMap

@hudi-bot
Copy link

hudi-bot commented Jun 20, 2021

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run travis re-run the last Travis build
  • @hudi-bot run azure re-run the last Azure build

@codecov-commenter
Copy link

codecov-commenter commented Jun 20, 2021

Codecov Report

Merging #3117 (72a3594) into master (c08fbb4) will decrease coverage by 1.72%.
The diff coverage is 80.19%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #3117      +/-   ##
============================================
- Coverage     46.01%   44.29%   -1.73%     
+ Complexity     5306     4615     -691     
============================================
  Files           911      826      -85     
  Lines         39476    36669    -2807     
  Branches       4254     3949     -305     
============================================
- Hits          18166    16243    -1923     
+ Misses        19456    18674     -782     
+ Partials       1854     1752     -102     
Flag Coverage Δ
hudicli 39.95% <ø> (ø)
hudiclient 16.44% <14.28%> (-14.00%) ⬇️
hudicommon 47.74% <85.10%> (+0.18%) ⬆️
hudiflink 59.94% <ø> (-1.40%) ⬇️
hudihadoopmr 51.29% <ø> (ø)
hudisparkdatasource 67.06% <ø> (+0.53%) ⬆️
hudisync 54.05% <ø> (+2.31%) ⬆️
huditimelineservice 64.36% <ø> (ø)
hudiutilities 58.40% <ø> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...ain/java/org/apache/hudi/io/HoodieMergeHandle.java 0.00% <0.00%> (ø)
...org/apache/hudi/common/util/SpillableMapUtils.java 86.84% <ø> (ø)
...java/org/apache/hudi/config/HoodieWriteConfig.java 17.03% <20.00%> (+0.03%) ⬆️
...he/hudi/common/util/collection/BitCaskDiskMap.java 80.88% <75.00%> (ø)
...apache/hudi/common/util/collection/RocksDBDAO.java 75.12% <81.81%> (+0.26%) ⬆️
...he/hudi/common/util/collection/RocksDbDiskMap.java 85.10% <85.10%> (ø)
...i/common/util/collection/ExternalSpillableMap.java 77.45% <85.71%> (+6.02%) ⬆️
...common/table/log/HoodieMergedLogRecordScanner.java 82.35% <100.00%> (ø)
.../hudi/common/util/collection/LazyFileIterable.java 74.41% <100.00%> (ø)
...e/hudi/common/util/collection/RocksDBBasedMap.java 0.00% <0.00%> (-39.29%) ⬇️
... and 151 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c08fbb4...72a3594. Read the comment docs.

@vinothchandar vinothchandar added this to Ready for Review in PR Tracker Board Jun 23, 2021
@rmahindra123
Copy link
Contributor Author

rmahindra123 commented Jun 29, 2021

I see we have more test coverage in TestExternalSpillableMap. Can we parametrize the tests that are applicable and run those for both type of spillable maps.

I had tests for TestExternalSpillableMap in the stacked diff, anyway added it here.

In general, do we have tests around diff values for maxInMemorySizeInBytes for external spillable map. If not, do you think we can add them while we are at this.

We do test if the right amount of keys were in memory and the right amount was spilled over. Wondering how different values of maxInMemorySizeInBytes will help?

@rmahindra123
Copy link
Contributor Author

Moved to #3194

PR Tracker Board automation moved this from Nearing Landing to Done Jun 30, 2021
@rmahindra123 rmahindra123 deleted the rm_rocks_db branch June 30, 2021 21:10
@rmahindra123
Copy link
Contributor Author

Moved to #3194

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

None yet